VLD Digital

  • White Papers // Jun 2014

    Concurrent Analytical Query Processing with GPUs

    In current databases, GPUs are used as dedicated accelerators to process each individual query. Sharing GPUs among concurrent queries is not supported, causing serious resource underutilization. Based on the profiling of an open-source GPU query engine running commonly used single-query data warehousing workloads, the authors observe that the utilization...

    Provided By VLD Digital

  • White Papers // Jun 2014

    NOMAD: Non-locking, stOchastic Multi-machine algorithm for Asynchronous and Decentralized matrix completion

    The authors develop an efficient parallel distributed algorithm for matrix completion, named NOMAD (Non-locking, stOchastic Multi-machine algorithm for Asynchronous and Decentralized matrix completion). NOMAD is a decentralized algorithm with non-blocking communication between processors. One of the key features of NOMAD is that the ownership of a variable is asynchronously transferred...

    Provided By VLD Digital

  • White Papers // Jun 2014

    Ibex - An Intelligent Storage Engine with Support for Advanced SQL Offloading

    Modern data appliances face severe bandwidth bottlenecks when moving vast amounts of data from storage to the query processing nodes. A possible solution to mitigate these bottlenecks is query off-loading to an intelligent storage engine, where partial or whole queries are pushed down to the storage engine. In this paper,...

    Provided By VLD Digital

  • White Papers // Jun 2014

    ConfluxDB: Multi-Master Replication for Partitioned Snapshot Isolation Databases

    Lazy replication with Snapshot Isolation (SI) has emerged as a popular choice for distributed databases. However, lazy replication often requires execution of update transactions at one (master) site so that it is relatively easy for a total SI order to be determined for consistent installation of updates in the lazily...

    Provided By VLD Digital

  • White Papers // Jun 2014

    The Case for Personal Data-Driven Decision Making

    Data-Driven Decision Making (D3M) has shown great promise in professional pursuits such as business and government. Here, policy-makers collect and analyze data to make their operations more efficient and equitable. Progress in bringing the benefits of D3M to everyday life has been slow. For example, a student asks, "If the...

    Provided By VLD Digital

  • White Papers // May 2014

    WideTable: An Accelerator for Analytical Data Processing

    In this paper, the authors present a technique called WideTable that aims to improve the speed of analytical data processing systems. A WideTable is built by denormalizing the database and then converting complex queries into simple scans on the underlying (wide) table. To avoid the pitfalls associated with denormalization, e.g....

    Provided By VLD Digital
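
    A minimal sketch of the denormalize-then-scan idea, assuming a toy star schema held in Python lists: the fact table is joined with its dimension table once, and a query that would otherwise need a join becomes a single filtered scan over the wide table. The table names, columns, and the `denormalize`/`scan` helpers are hypothetical, and the sketch ignores the denormalization pitfalls the paper addresses.

```python
from typing import Any, Callable

Row = dict[str, Any]

def denormalize(fact: list[Row], dim: list[Row], key: str) -> list[Row]:
    """Pre-join every fact row with its matching dimension row (the 'wide' table)."""
    dim_by_key = {d[key]: d for d in dim}
    return [{**f, **dim_by_key[f[key]]} for f in fact]

def scan(wide: list[Row], predicate: Callable[[Row], bool], column: str) -> float:
    """Answer a SUM query with one sequential pass over the wide table, no join."""
    return sum(row[column] for row in wide if predicate(row))

# Hypothetical toy schema: sales (fact) references stores (dimension) by store_id.
sales = [
    {"store_id": 1, "amount": 10.0},
    {"store_id": 2, "amount": 5.0},
    {"store_id": 1, "amount": 7.5},
]
stores = [
    {"store_id": 1, "region": "EU"},
    {"store_id": 2, "region": "US"},
]

wide = denormalize(sales, stores, "store_id")
eu_total = scan(wide, lambda r: r["region"] == "EU", "amount")
print(eu_total)  # 17.5
```

    The join cost is paid once at build time; every later query only pays for a scan, at the price of storing dimension attributes redundantly in each wide row.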

  • White Papers // May 2014

    The Case for Data Visualization Management Systems

    Most visualizations today are produced by retrieving data from a database and using a specialized visualization tool to render it. This decoupled approach results in significant duplication of functionality, such as aggregation and filters, and misses tremendous opportunities for cross-layer optimizations. In this paper, the authors present the case for...

    Provided By VLD Digital

  • White Papers // May 2014

    On k-Path Covers and their Applications

    The authors introduced the k-all-path cover optimization problem with the goal of computing compact yet faithful synopses of the vertex set of road networks. Their proposed pruning algorithm provides close to optimal results in practice and was experimentally proven to be very efficient on large graphs. For the special subcase...

    Provided By VLD Digital

  • White Papers // May 2014

    From Data Fusion to Knowledge Fusion

    The task of data fusion is to identify the true values of data items (e.g., the true date of birth for Tom Cruise) among multiple observed values drawn from different sources (e.g., Web sites) of varying (and unknown) reliability. A recent survey has provided a detailed comparison of various fusion...

    Provided By VLD Digital
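
    The simplest baseline for the data-fusion task is majority voting: for each data item, keep the value reported by the most sources. The sketch below shows only that baseline, assuming observations arrive as hypothetical (source, item, value) triples; it deliberately ignores source reliability, which is exactly what real fusion methods try to estimate.

```python
from collections import Counter, defaultdict

def majority_vote(observations: list[tuple[str, str, str]]) -> dict[str, str]:
    """observations: (source, data_item, observed_value) triples.
    Returns the most frequently observed value for each data item."""
    votes: dict[str, Counter] = defaultdict(Counter)
    for _source, item, value in observations:
        votes[item][value] += 1  # every source gets one equal vote
    # most_common(1) returns the top value; ties go to the value seen first
    return {item: counter.most_common(1)[0][0] for item, counter in votes.items()}

# Hypothetical observations from three made-up sources
observations = [
    ("site_a", "tom_cruise.birth_date", "1962-07-03"),
    ("site_b", "tom_cruise.birth_date", "1962-07-03"),
    ("site_c", "tom_cruise.birth_date", "1963-07-03"),
]
print(majority_vote(observations))  # {'tom_cruise.birth_date': '1962-07-03'}
```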

  • White Papers // May 2014

    When Data Management Systems Meet Approximate Hardware: Challenges and Opportunities

    Recently, approximate hardware designs have attracted considerable research interest in the computer architecture community. The essential idea of approximate hardware is that hardware components such as the CPU, memory, and storage can trade off the accuracy of results for increased performance, reduced energy consumption, or both. The authors propose a...

    Provided By VLD Digital

  • White Papers // May 2014

    Scalable Logging through Emerging Non-Volatile Memory

    Emerging byte-addressable, Non-Volatile Memory (NVM) is fundamentally changing the design principle of transaction logging. It potentially invalidates the need for flush-before-commit as log records are persistent immediately upon write. Distributed logging - a once prohibitive technique for single node systems in the DRAM era - becomes a promising solution to...

    Provided By VLD Digital

  • White Papers // May 2014

    Storage Management in AsterixDB

    Social networks, online communities, mobile devices, and instant messaging applications generate complex, unstructured data at a high rate, resulting in large volumes of data. This poses new challenges for data management systems that aim to ingest, store, index, and analyze such data efficiently. In response, the authors released the first...

    Provided By VLD Digital

  • White Papers // May 2014

    Workload Matters: Why RDF Databases Need a New Design

    The Resource Description Framework (RDF) is a standard for conceptually describing data on the Web, and SPARQL is the query language for RDF. As RDF is becoming widely utilized, RDF data management systems are being exposed to more diverse and dynamic workloads. Existing systems are workload-oblivious, and are therefore unable...

    Provided By VLD Digital

  • White Papers // May 2014

    An Evaluation of the Advantages and Disadvantages of Deterministic Database Systems

    There have been several recent proposals for database system architectures that use a deterministic execution framework to process transactions. Recent proposals for deterministic database system designs argue that deterministic database systems facilitate replication since the same input can be independently sent to two different replicas without concern for replica divergence....

    Provided By VLD Digital

  • White Papers // May 2014

    M4: A Visualization-Oriented Time Series Data Aggregation

    Visual analysis of high-volume time series data is ubiquitous in many industries, including finance, banking, and discrete manufacturing. Contemporary RDBMS-based systems for visualization of high-volume time series data have difficulty coping with the hard latency requirements and high ingestion rates of interactive visualizations. Existing solutions for lowering the volume...

    Provided By VLD Digital

  • White Papers // May 2014

    Reverse k-Ranks Query

    Finding matching customers for a given product based on individual users' preferences is critical for many applications, especially in e-commerce. Recently, the reverse top-k query was proposed to return the customers who regard a given product as one of their k favorite products based on a linear...

    Provided By VLD Digital
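
    The reverse top-k query can be stated as a brute-force sketch: each customer has a weight vector, a product's score is the weighted sum of its attributes, and a customer is returned when the query product ranks among that customer's k best. The sketch assumes higher scores are preferred and that ties favour the query product, and it uses made-up data; it is a naive baseline, not the reverse k-ranks algorithm the paper proposes.

```python
def score(weights: list[float], product: list[float]) -> float:
    """Linear preference score; higher is better in this sketch."""
    return sum(w * a for w, a in zip(weights, product))

def reverse_top_k(users: dict[str, list[float]], competitors: list[list[float]],
                  query: list[float], k: int) -> list[str]:
    """Return the users for whom `query` ranks within their top k among
    `competitors` plus `query` itself (ties favour the query product)."""
    result = []
    for name, weights in users.items():
        q = score(weights, query)
        better = sum(1 for p in competitors if score(weights, p) > q)
        if better < k:  # rank of query = better + 1, which must be <= k
            result.append(name)
    return result

# Hypothetical customers (preference weights) and competing products (2 attributes)
users = {"alice": [1.0, 0.0], "bob": [0.0, 1.0], "carol": [0.5, 0.5]}
competitors = [[0.9, 0.1], [0.1, 0.9]]
print(reverse_top_k(users, competitors, query=[0.6, 0.6], k=1))  # ['carol']
```

    The cost is one score per (customer, product) pair, which is why dedicated algorithms prune customers and products instead of enumerating them.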

  • White Papers // Apr 2014

    Incremental Record Linkage

    Record linkage clusters records such that each cluster corresponds to a single distinct real-world entity. It is a crucial step in data cleaning and data integration. In the big data era, the velocity of data updates is often high, quickly making previous linkage results obsolete. This paper presents an end-to-end...

    Provided By VLD Digital
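
    Record linkage turns pairwise match decisions into clusters, one per real-world entity. A common batch (non-incremental) way to do this is to take the transitive closure of matched pairs with union-find, sketched below; the pairwise matcher is assumed to exist already, the record IDs are hypothetical, and the paper's incremental method is not reproduced here.

```python
def cluster(records: list[str], matches: list[tuple[str, str]]) -> list[set[str]]:
    """Group records into clusters: the transitive closure of matched pairs."""
    parent = {r: r for r in records}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in matches:
        parent[find(a)] = find(b)  # merge the two clusters

    groups: dict[str, set[str]] = {}
    for r in records:
        groups.setdefault(find(r), set()).add(r)
    return list(groups.values())

clusters = cluster(["r1", "r2", "r3", "r4"], [("r1", "r2"), ("r2", "r3")])
print(sorted(sorted(c) for c in clusters))  # [['r1', 'r2', 'r3'], ['r4']]
```

    Transitive closure can over-merge (r1 and r3 end up together although they were never compared), and every new update would force a full recomputation; both limitations motivate incremental linkage.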

  • White Papers // Apr 2014

    On Arbitrage-free Pricing for General Data Queries

    Data is a commodity. Recent research has considered the mathematical problem of setting prices for different queries over data. Ideal pricing functions need to be flexible - defined for arbitrary queries (select-project-join, aggregate, random sample, and noisy privacy-preserving queries). They should be fine-grained - a consumer should not be required...

    Provided By VLD Digital

  • White Papers // Mar 2014

    Splitter: Mining Fine-Grained Sequential Patterns in Semantic Trajectories

    Driven by the advance of positioning technology and the popularity of location-sharing services, semantic-enriched trajectory data have become unprecedentedly available. The sequential patterns hidden in such data, when properly defined and extracted, can greatly benefit tasks like targeted advertising and urban planning. Unfortunately, classic sequential pattern mining algorithms developed for...

    Provided By VLD Digital

  • White Papers // Mar 2014

    Towards Building Wind Tunnels for Data Center Design

    Data center design is a tedious and expensive process. Recently, this process has become even more challenging as users of cloud services expect to have guaranteed levels of availability, durability and performance. A new challenge for the service providers is to find the most cost-effective data center design and configuration...

    Provided By VLD Digital

  • White Papers // Mar 2014

    Calibrating Data to Sensitivity in Private Data Analysis

    The authors present an approach to differentially private computation in which one does not scale up the magnitude of noise for challenging queries, but rather scales down the contributions of challenging records. While scaling down all records uniformly is equivalent to scaling up the noise magnitude, they show that scaling...

    Provided By VLD Digital
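
    The sketch below illustrates a simpler, standard relative of this idea rather than the authors' algorithm: each record's contribution to a sum is clamped to [0, c], which scales down only the large records and bounds the sensitivity at c, and Laplace noise of scale c / epsilon is then added. The function names and the clamp bound `c` are assumptions for illustration.

```python
import random

def clipped_sum(values: list[float], c: float) -> float:
    """Clamp each record's contribution to [0, c] before summing.
    Large records are scaled down; small ones are left untouched."""
    return sum(min(max(v, 0.0), c) for v in values)

def private_sum(values: list[float], c: float, epsilon: float,
                rng: random.Random) -> float:
    """Laplace mechanism on the clamped sum. Adding or removing one record
    changes the clamped sum by at most c, so noise of scale c / epsilon
    gives epsilon-differential privacy for this single query."""
    scale = c / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return clipped_sum(values, c) + noise

rng = random.Random(42)
print(clipped_sum([1.0, 2.0, 100.0], 10.0))  # 13.0
print(private_sum([1.0, 2.0, 100.0], 10.0, epsilon=1.0, rng=rng))
```

    The trade-off is bias: the record with value 100 contributes only 10, in exchange for noise that no longer has to cover a worst-case record.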

  • White Papers // Mar 2014

    String Similarity Joins: An Experimental Evaluation

    String similarity join is an important operation in data integration and cleansing that finds similar string pairs from two collections of strings. More than ten algorithms have been proposed to address this problem over the past two decades. However, existing algorithms have not been thoroughly compared under the same experimental...

    Provided By VLD Digital
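
    For reference, the brute-force baseline that specialized algorithms improve on compares every pair of strings and keeps those within an edit-distance threshold `tau`. The sketch assumes Levenshtein distance as the similarity measure and applies the length filter (strings whose lengths differ by more than `tau` cannot be within distance `tau`); the function names are hypothetical.

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance with a single rolling row of the DP table."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,               # delete cs
                           cur[j - 1] + 1,            # insert ct
                           prev[j - 1] + (cs != ct)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def similarity_join(r: list[str], s: list[str], tau: int) -> list[tuple[str, str]]:
    """All pairs (a, b) with a in r, b in s and edit_distance(a, b) <= tau."""
    return [(a, b) for a in r for b in s
            if abs(len(a) - len(b)) <= tau   # cheap filter, never drops a true match
            and edit_distance(a, b) <= tau]

print(similarity_join(["vldb", "sigmod"], ["vldbj", "sigmmod", "icde"], 1))
# [('vldb', 'vldbj'), ('sigmod', 'sigmmod')]
```

    The nested loop is quadratic in the collection sizes, which is why published algorithms add signature-based filters (e.g. on q-grams) before verifying candidates.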

  • White Papers // Mar 2014

    An Efficient Publish/Subscribe Index for E-Commerce Databases

    Many of today's publish/subscribe (pub/sub) systems have been designed to cope with a large volume of subscriptions and a high event arrival rate (velocity). However, in many novel applications (such as e-commerce), there is an increasing variety of items, each with different attributes. This leads to a very high-dimensional and sparse...

    Provided By VLD Digital

  • White Papers // Mar 2014

    A Principled Approach to Bridging the Gap between Graph Data and their Schemas

    Although RDF graph data often come with an associated schema, recent studies have proven that real RDF data rarely conform to their perceived schemas. Since a number of data management decisions, including storage layouts, indexing, and efficient query processing, use schemas to guide the decision making, it is imperative to...

    Provided By VLD Digital

  • White Papers // Feb 2014

    Toward Computational Fact-Checking

    In this paper, the authors have shown how to turn fact-checking into a computational problem. Interestingly, by regarding claims as queries with parameters, they can check them - not just for correctness, but more importantly, for more subtle measures of quality - by perturbing their parameters. This observation leads the...

    Provided By VLD Digital

  • White Papers // Feb 2014

    Optimizing Graph Algorithms on Pregel-Like Systems

    The authors study the problem of implementing graph algorithms efficiently on Pregel-like systems, which can be surprisingly challenging. Standard graph algorithms in this setting can incur unnecessary inefficiencies such as slow convergence or high communication or computation cost, typically due to structural properties of the input graphs such as large...

    Provided By VLD Digital
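
    To make the Pregel setting concrete, the sketch below simulates supersteps for HashMin connected components, a textbook vertex-centric algorithm: each vertex keeps the smallest vertex id it has heard of and messages its neighbours only when that label changes. It is an illustrative single-process simulation, not the paper's optimizations; note how the superstep count grows with the length of the path.

```python
def hash_min_components(adj: dict[int, list[int]]) -> tuple[dict[int, int], int]:
    """Simulate Pregel-style supersteps; return (component label per vertex, supersteps)."""
    label = {v: v for v in adj}
    active = set(adj)  # vertices whose label changed in the previous superstep
    supersteps = 0
    while active:
        supersteps += 1
        inbox: dict[int, int] = {}
        # Send phase: active vertices send their current label to all neighbours.
        for v in active:
            for u in adj[v]:
                inbox[u] = min(inbox.get(u, label[v]), label[v])
        # Compute phase (after the barrier): adopt any smaller label received.
        active = set()
        for u, msg in inbox.items():
            if msg < label[u]:
                label[u] = msg
                active.add(u)
    return label, supersteps

path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(hash_min_components(path))  # ({0: 0, 1: 0, 2: 0, 3: 0}, 4)
```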

  • White Papers // Feb 2014

    Schemaless and Structureless Graph Querying

    Querying complex graph databases such as knowledge graphs is a challenging task for non-professional users. Due to their complex schemas and variational information descriptions, it becomes very hard for users to formulate a query that can be properly processed by the existing systems. The authors argue that for a user-friendly...

    Provided By VLD Digital

  • White Papers // Feb 2014

    Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML

    Large-scale data analytics have become an integral part of online services, enterprise data management, system management, and scientific applications in order to gain value from huge amounts of collected data. Finding interesting unknown facts and patterns often requires analyzing the full data set instead of applying sampling techniques. Recent approaches...

    Provided By VLD Digital

  • White Papers // Feb 2014

    epiC: an Extensible and Scalable System for Processing Big Data

    The big data problem is characterized by the so-called 3V features: Volume (a huge amount of data), Velocity (a high data ingestion rate), and Variety (a mix of structured, semi-structured, and unstructured data). The state-of-the-art solutions to the big data problem are largely based...

    Provided By VLD Digital

  • White Papers // Feb 2014

    Lightweight Indexing of Observational Data in Log-Structured Storage

    Huge amounts of data are being generated by sensing devices every day, recording the status of objects and the environment. Such observational data is widely used in scientific research. As the capabilities of sensors keep improving, the data produced are drastically expanding in precision and quantity, making it a write-intensive...

    Provided By VLD Digital

  • White Papers // Feb 2014

    GRAMI: Frequent Subgraph and Pattern Mining in a Single Large Graph

    Mining frequent subgraphs is an important operation on graphs; it is defined as finding all subgraphs that appear frequently in a database according to a given frequency threshold. Most existing work assumes a database of many small graphs, but modern applications, such as social networks, citation graphs, or protein-protein interactions...

    Provided By VLD Digital

  • White Papers // Feb 2014

    Rank Join Queries in NoSQL Databases

    Cloud stores have become the storage of choice for a large variety of big data producers, consumers, and managers (e.g., Twitter, Facebook, Google, and Amazon). For many modern Big Data applications, RDBMSs were found lacking, particularly with respect to scalability (in terms of number of data items, users, operations per...

    Provided By VLD Digital

  • White Papers // Jan 2014

    Optimizing Graph Algorithms on Pregel-Like Systems

    The authors study the problem of implementing graph algorithms efficiently on Pregel-like systems, which can be surprisingly challenging. Standard graph algorithms in this setting can incur unnecessary inefficiencies such as slow convergence or high communication or computation cost, typically due to structural properties of the input graphs such as large...

    Provided By VLD Digital

  • White Papers // Jan 2014

    Edelweiss: Automatic Storage Reclamation for Distributed Programming

    Event Log Exchange (ELE) is a common programming pattern based on immutable state and messaging. ELE sidesteps traditional challenges in distributed consistency at the expense of introducing new challenges in designing space reclamation protocols that avoid consuming unbounded storage. The authors introduce Edelweiss, a sublanguage of Bloom that provides an...

    Provided By VLD Digital

  • White Papers // Jan 2014

    Tracking Entities in the Dynamic World: A Fast Algorithm for Matching Temporal Records

    Identifying records referring to the same real-world entity over time enables longitudinal data analysis. However, difficulties arise from the dynamic nature of the world: the entities described by a temporal data set often evolve their states over time. While the state-of-the-art approach to temporal entity matching...

    Provided By VLD Digital

  • White Papers // Jan 2014

    A Provenance Framework for Data-Dependent Process Analysis

    A Data-Dependent Process (DDP) models an application whose control flow is guided by a finite state machine as well as by the state of an underlying database. DDPs are commonly found, e.g., in e-commerce. In this paper, the authors develop a framework supporting the use of provenance in static (temporal)...

    Provided By VLD Digital

  • White Papers // Jan 2014

    Support the Data Enthusiast: Challenges for Next-Generation Data-Analysis Systems

    The authors present a vision of next-generation visual analytics services. They argue that these services should have three related capabilities: support visual and interactive data exploration as they do today, but also suggest relevant data to enrich visualizations, and facilitate the integration and cleaning of that data. Most importantly, they...

    Provided By VLD Digital

  • White Papers // Jan 2014

    Shared Workload Optimization

    As a result of increases in both the query load and the amount of data managed, as well as changes in hardware architecture (multi-core), recent years have seen a shift from query-at-a-time approaches towards Shared Work (SW) systems where queries are executed in groups. Such groups share operators like scans and...

    Provided By VLD Digital
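
    A minimal sketch of the shared-work idea for the simplest shareable operator, a scan: a batch of hypothetical COUNT(*) queries is answered in one pass over the table instead of one pass per query. Real SW systems also share joins and aggregations and plan the sharing globally; this only shows the principle.

```python
from typing import Callable

Row = dict[str, int]

def shared_scan(table: list[Row],
                queries: dict[str, Callable[[Row], bool]]) -> dict[str, int]:
    """Answer a batch of 'COUNT(*) WHERE <predicate>' queries in one table pass."""
    counts = {name: 0 for name in queries}
    for row in table:  # each row is read once for the whole batch
        for name, predicate in queries.items():
            if predicate(row):
                counts[name] += 1
    return counts

table = [{"price": p} for p in [5, 15, 25, 35]]
batch = {
    "cheap": lambda r: r["price"] < 20,
    "expensive": lambda r: r["price"] >= 30,
}
print(shared_scan(table, batch))  # {'cheap': 2, 'expensive': 1}
```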

  • White Papers // Dec 2013

    MaaT: Effective and Scalable Coordination of Distributed Transactions in the Cloud

    The past decade has witnessed an increasing adoption of cloud database technology, which provides better scalability, availability, and fault-tolerance via transparent partitioning and replication, and automatic load balancing and fail-over. However, only a small number of cloud databases provide strong consistency guarantees for distributed transactions, despite decades of research on...

    Provided By VLD Digital

  • White Papers // Dec 2013

    A Data and Workload-Aware Algorithm for Range Queries Under Differential Privacy

    The authors describe a new algorithm for answering a given set of range queries under differential privacy which often achieves substantially lower error than competing methods. Their algorithm satisfies differential privacy by adding noise that is adapted to the input data and to the given query set. They first privately...

    Provided By VLD Digital

  • White Papers // Mar 2012

    MonetDB/DataCell: Online Analytics in a Streaming Column-Store

    In DataCell, the authors design streaming functionalities in a modern relational database kernel which targets big data analytics. This includes exploitation of both its storage/execution engine and its optimizer infrastructure. They investigate the opportunities and challenges that arise with such a direction and they show that it carries significant advantages...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Concurrency Control for Adaptive Indexing

    Adaptive indexing initializes and optimizes indexes incrementally, as a side effect of query processing. The goal is to achieve the benefits of indexes while hiding or minimizing the costs of index creation. However, index-optimizing side effects seem to turn read-only queries into update transactions that might, for example, create lock...

    Provided By VLD Digital
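
    Database cracking is one well-known form of adaptive indexing: each range query partially partitions the column around its bounds, so later queries touch smaller pieces. The toy, single-threaded sketch below (class and method names are hypothetical) shows only that side effect and omits the concurrency control this paper is about; note that `range_query` writes to `self.data`, which is how a read-only query turns into an update.

```python
import bisect

class CrackedColumn:
    """Toy cracking: the column is physically reorganized a little on every query."""

    def __init__(self, values: list[int]):
        self.data = list(values)
        self.pivots: list[int] = []     # sorted pivot values cracked on so far
        self.positions: list[int] = []  # positions[i]: first index with value >= pivots[i]

    def _crack(self, pivot: int) -> int:
        """Ensure data[:p] < pivot <= data[p:] and return p."""
        i = bisect.bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.positions[i]  # already cracked on this value
        # Only the piece between the neighbouring cracks needs partitioning.
        lo = self.positions[i - 1] if i > 0 else 0
        hi = self.positions[i] if i < len(self.positions) else len(self.data)
        piece = self.data[lo:hi]
        smaller = [v for v in piece if v < pivot]
        self.data[lo:hi] = smaller + [v for v in piece if v >= pivot]
        p = lo + len(smaller)
        self.pivots.insert(i, pivot)
        self.positions.insert(i, p)
        return p

    def range_query(self, low: int, high: int) -> list[int]:
        """Return the values v with low <= v < high, cracking as a side effect."""
        start = self._crack(low)
        end = self._crack(high)
        return self.data[start:end]

col = CrackedColumn([13, 4, 9, 2, 12, 7, 1, 19])
print(sorted(col.range_query(5, 13)))  # [7, 9, 12]
print(col.pivots)                      # [5, 13]
print(sorted(col.range_query(1, 10)))  # [1, 2, 4, 7, 9]
```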

  • White Papers // Sep 2010

    Quality Assessment Social Networks: A Novel Approach for Assessing the Quality of Information on the Web

    The unprecedented volume of information on the Web makes it practically difficult for information consumers to assess the quality of information. The wide variety of Web users and the distinct situations they face further complicate the Quality Assessment (QA) process, which must be customizable according to the...

    Provided By VLD Digital

  • White Papers // Sep 2010

    Deriving Effectiveness Measures for Data Quality Rules

    The poor quality of data constitutes a major concern worldwide and an obstacle to data integration and analysis efforts. Detecting errors and inconsistencies using application-specific data quality rules plays an important role in data quality assessment. These rules have different efficacy and cost under different circumstances. In the authors'...

    Provided By VLD Digital

  • White Papers // Sep 2010

    DuDe: The Duplicate Detection Toolkit

    Duplicate detection, also known as entity matching or record linkage, was first defined by the researcher and has been a research topic for several decades. The challenge is to effectively and efficiently identify pairs of records that represent the same real-world entity. Researchers have developed and described a variety...

    Provided By VLD Digital

  • White Papers // Sep 2010

    Discovering Conditional Functional Dependencies to Detect Data Inconsistencies

    Poor quality data is a growing and costly problem that affects many enterprises across all aspects of their business ranging from operational efficiency to revenue protection. In this paper, the authors present an approach that efficiently and robustly discovers conditional functional dependencies for detecting inconsistencies in data and hence improves...

    Provided By VLD Digital
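
    To make the notion concrete: a conditional functional dependency (CFD) is a functional dependency required to hold only on tuples that match a pattern, e.g. "for UK addresses, zip determines street". The sketch below only detects violations of one given CFD whose condition is a set of constants; discovering CFDs, the paper's contribution, is not shown, and the rows, attributes, and function name are hypothetical.

```python
from collections import defaultdict
from typing import Any

Row = dict[str, Any]

def cfd_violations(rows: list[Row], condition: dict[str, Any],
                   lhs: list[str], rhs: str) -> list[tuple[int, int]]:
    """Return index pairs violating the CFD (condition, lhs -> rhs): among rows
    matching every constant in `condition`, rows that agree on `lhs` must agree on `rhs`."""
    groups: dict[tuple, list[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        if all(row.get(attr) == value for attr, value in condition.items()):
            groups[tuple(row[a] for a in lhs)].append(i)
    violations = []
    for members in groups.values():
        for x in range(len(members)):
            for y in range(x + 1, len(members)):
                i, j = members[x], members[y]
                if rows[i][rhs] != rows[j][rhs]:
                    violations.append((i, j))
    return violations

rows = [
    {"country": "UK", "zip": "AB1 2CD", "street": "Mayfield"},
    {"country": "UK", "zip": "AB1 2CD", "street": "Crichton"},
    {"country": "US", "zip": "07974", "street": "Main"},
    {"country": "US", "zip": "07974", "street": "Oak"},
]
print(cfd_violations(rows, {"country": "UK"}, ["zip"], "street"))  # [(0, 1)]
```

    The US rows are not reported because the dependency is only asserted for the UK; with an empty condition the same function checks a plain functional dependency and would flag both pairs.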

  • White Papers // Aug 2010

    Buffered Bloom Filters on Solid State Storage

    Bloom filters are widely used in many applications, including database management systems. With a certain allowable error rate, this data structure provides an efficient solution for membership queries. The error rate decreases as the size of the Bloom filter grows. Currently, Bloom filters are stored in main memory because...

    Provided By VLD Digital
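
    A minimal in-memory Bloom filter sketch, not the buffered, SSD-resident variant the paper studies: `k` bit positions per item are derived from one SHA-256 digest by double hashing, lookups never give false negatives, and `false_positive_rate` evaluates the standard approximation (1 - e^(-kn/m))^k for `n` items in `m` bits. The hashing scheme and names are illustrative choices.

```python
import hashlib
import math

class BloomFilter:
    def __init__(self, m: int, k: int):
        self.m = m                           # number of bits
        self.k = k                           # number of hash functions
        self.bits = bytearray((m + 7) // 8)  # all bits start at 0

    def _positions(self, item: str) -> list[int]:
        # Double hashing: derive k bit positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a non-zero (odd) step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        # False: definitely never added. True: added, or a false positive.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

def false_positive_rate(m: int, n: int, k: int) -> float:
    """Standard approximation (1 - e^(-k*n/m))^k after n insertions into m bits."""
    return (1 - math.exp(-k * n / m)) ** k

bf = BloomFilter(m=1024, k=5)
for key in ["vldb", "sigmod", "icde"]:
    bf.add(key)
print(bf.might_contain("vldb"))  # True
print(false_positive_rate(1024, 100, 5))
```

    Larger `m` lowers the false-positive rate for a fixed `n`, which is the pressure that pushes large filters out of main memory.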

  • White Papers // Aug 2010

    Towards SSD-Ready Enterprise Platforms

    High-performance Solid State Disks (SSDs) deliver a 2 - 3 orders of magnitude increase in I/O Operations Per Second (IOPS) over Hard Disk Drives (HDDs). Extreme-performance SSDs can produce up to 120,000 IOPS for random-access reads, taking as few as eight direct-attached SSDs to reach one million IOPS. However, today's...

    Provided By VLD Digital

  • White Papers // Sep 2010

    Structure Index for RDF Data

    In recent years, the amount of structured RDF data available on the Web has been increasing rapidly. Efficient query processing that can scale to large amounts of RDF data has become an important topic. Significant efforts have been dedicated to the development of solutions for RDF data management. Along this...

    Provided By VLD Digital

  • White Papers // Sep 2010

    Facilitating Fine Grained Data Provenance using Temporal Data Model

    E-science applications use fine-grained data provenance to maintain the reproducibility of scientific results, i.e., for each processed data tuple, the source data used to process the tuple as well as the approach used are documented. Since most e-science applications perform online processing of sensor data using overlapping...

    Provided By VLD Digital

  • White Papers // Sep 2010

    Data Quality Aware Query System

    Traditional query systems do not factor in data quality considerations in their response. However, the issue of data quality is of growing importance as individuals as well as corporations are increasingly relying on multiple, often external sources of data to make decisions. Previous papers have identified diverse interpretations of data...

    Provided By VLD Digital

  • White Papers // Sep 2010

    String-Based Semantic Web Data Management Using Ternary B-Trees

    The Resource Description Framework (RDF) stems from the Semantic Web but can also be regarded simply as a data model, independent of its origins. Its simple structure is ideal for describing and merging heterogeneous data from different sources quickly, without having to design a complex schema first. The different nature...

    Provided By VLD Digital

  • White Papers // Sep 2010

    Search-As-You-Type in Forms: Leveraging the Usability and the Functionality of Search Paradigm in Relational Databases

    Querying, or searching, is one of the most important issues in relational databases. There are many search paradigms, such as Structured Query Language (SQL), keyword search, and form search, a.k.a. Query-By-Example (QBE). Among them, QBE is a good trade-off between usability and functionality. However, existing QBE systems are often inconvenient...

    Provided By VLD Digital

  • White Papers // Sep 2010

    Declarative Secure Distributed Systems

    In the past decade, distributed systems have rapidly evolved and gained significant traction in the research community, with an increasing interest concentrated on developing and analyzing secure distributed systems. In this paper, the authors present DS2 (Declarative Secure Distributed Systems), a unified platform for specifying, implementing, and analyzing large-scale secure...

    Provided By VLD Digital

  • White Papers // Sep 2010

    Traceable Peer-to-Peer Record Exchange

    Peer-to-Peer (P2P) technology allows flexible information sharing and communication in a widespread network. Unlike the traditional client-server architecture, a P2P network enables a peer to publish information and share data with other peers without central server control. In such an environment, tracing how data is copied between peers...

    Provided By VLD Digital

  • White Papers // Sep 2010

    Spatiotemporal Pattern Queries

    Capturing moving objects data is now possible and becoming cheaper with the advances in the positioning and sensor technologies. The increasing amount of such data and the various fields of applications call for intensive research work for building a spatiotemporal DBMS. This involves several aspects such as modeling the moving...

    Provided By VLD Digital

  • White Papers // Sep 2010

    The Mimicking Octopus: Towards a one-size-fits-all Database Architecture

    Modern enterprises need to pick the right DBMSs for their data management problems, e.g., OLTP, OLAP, streaming, and scan-oriented systems, among others, each tailored to a specific use case. This makes using a specialized solution for each application costly due to licensing fees, integration overhead, and DBA costs. Additionally, it...

    Provided By VLD Digital

  • White Papers // Sep 2010

    Mining Subspace Clusters: Enhanced Models, Efficient Algorithms and an Objective Evaluation Study

    In the knowledge discovery process, clustering is an established technique for grouping objects based on mutual similarity. However, in today's applications, each object is described by very many attributes in large, high-dimensional databases. As multiple concepts described by different attributes are mixed in the same data set, clusters...

    Provided By VLD Digital

  • White Papers // Sep 2010

    Research on Microarray Dataset Mining

    The recent development of high-throughput bio-techniques for post-genomics has generated a large volume of gene expression data. Microarray data poses new challenges that make many traditional data mining methods infeasible for mining the hidden knowledge. With the rapid progress of bio-techniques in the post-genomic era, more and...

    Provided By VLD Digital

  • White Papers // Sep 2010

    Harnessing Data Management Technology for Web Mashups Development

    Web mashups, in a similar spirit, stem from the reuse of existing data sources, services, or Web applications into more complex assets, with the emphasis on GUI-based specification that requires no programming. Web mashups are Web applications that integrate heterogeneous data sources, services, and full applications on the Web. The new...

    Provided By VLD Digital

  • White Papers // Sep 2010

    Towards A Unified Framework For Schema Merging

    Merging schemas to create a mediated view is a recurring problem in applications related to data interoperability. The task becomes particularly challenging when the schemas are highly heterogeneous and autonomous. Classical data integration systems rely on a mediated schema created by human experts through an intensive design process. Automatic generation...

    Provided By VLD Digital

  • White Papers // Aug 2010

    Modeling Multithreaded Query Execution on Chip Multiprocessors

    Modern CPUs follow multi-core designs with multiple threads running in parallel. The data flow of query processing algorithms needs to be adapted to exploit such designs. The authors identify memory accesses and thread synchronization as the main bottlenecks in a multi-core execution environment. They present a uniform framework to mitigate...

    Provided By VLD Digital

  • White Papers // Aug 2010

    On Transactional Memory, Spinlocks, and Database Transactions

    Currently, hardware trends include a move toward multi-core processors, cheap and persistent variants of memory, and even sophisticated hardware support for mutual exclusion in the form of transactional memory. These trends, coupled with a growing desire for extremely high performance on short database transactions, raise the question of whether the...

    Provided By VLD Digital

  • White Papers // Aug 2010

    GPU-Based Speculative Query Processing for Database Operations

    With an increasing amount of data and user demands for fast query processing, the optimization of database operations continues to be a challenging task. A common optimization method is to leverage parallel hardware architectures. With the introduction of general-purpose GPU computing, massively parallel hardware has become available within commodity hardware....

    Provided By VLD Digital

  • White Papers // Sep 2010

    A Benchmark for Context Data Management in Mobile Context-Aware Applications

    Over the last few years, computational power, storage capacity, and sensing capabilities of mobile devices have significantly improved. As a consequence, they have undergone a rapid development from pure telecommunication devices to small and ubiquitous computing platforms. Most importantly, these devices are able to host context-aware applications, i.e., applications that...

    Provided By VLD Digital

  • White Papers // Sep 2010

    SQL QueRIE Recommendations: a query fragment-based approach

    Relational database systems are becoming increasingly popular in the scientific community to support the interactive exploration of large volumes of data. In this scenario, users employ a query interface (typically, a web-based client) to issue a series of SQL queries that aim to analyze the data and mine it for...

    Provided By VLD Digital

  • White Papers // Sep 2010

    UpStream: Storage-centric Load Management for Data Stream Processing Systems

    Applications that process fast-updating data streams in real time must reflect the most recent data. A number of technologies, including data stream management systems, have emerged to respond to this challenge. While running their queries in a continuous fashion on high-volume, push-based data streams (e.g. sensor data, GPS coordinates and stock quotes),...

    Provided By VLD Digital

  • White Papers // Sep 2010

    Jumbo: Beyond MapReduce for Workload Balancing

    Over the past decade, several frameworks such as Google MapReduce have been developed that allow data processing at unprecedented scale due to their high scalability and fault tolerance. However, these systems pose both new and existing challenges for workload balancing that have not yet been fully explored. The MapReduce model...

    Provided By VLD Digital

  • White Papers // Sep 2010

    A Distributed Event Stream Processing Framework for Materialized Views over Heterogeneous Data Sources

    Data-driven applications are becoming increasingly complex with support for processing events and data streams in a loosely-coupled distributed environment, providing integrated access to heterogeneous structured data sources such as relational databases and XML data. This paper provides the foundation for defining a framework for materialized views over heterogeneous data sources...

    Provided By VLD Digital

  • White Papers // Sep 2010

    Directions and Challenges for Semdata

    The data-driven world is here. What will RDF's place therein be? In this paper, the authors explore the inevitable coming together of RDF, analytics-oriented databasing, web-scale computing and reasoning. To warehouse, analyze, publish and subscribe to anything anywhere, and to draw conclusions and distill knowledge: this is the challenge....

    Provided By VLD Digital

  • White Papers // Sep 2010

    SpiderStore: Exploiting Main Memory for Efficient RDF Graph Representation and Fast Querying

    The constant growth of available RDF data requires fast and efficient querying facilities for graph data. So far, such data sets have been stored by using mapping techniques from graph structures to relational models, secondary memory structures or even complex main-memory-based models. The authors present the main memory...

    Provided By VLD Digital

  • White Papers // Sep 2010

    SPARQL Query Answering on a Shared-nothing Architecture

    The amount of semantic Web data is outgrowing the capacity of semantic Web stores. As with traditional databases, scaling up RDF stores faces a design dilemma: increase the number of nodes at the cost of increased complexity, or use sophisticated and expensive hardware that can support large amounts...

    Provided By VLD Digital

  • White Papers // Sep 2010

    Optimizing SPARQL queries over the Web of Linked Data

    The web of linked data represents a globally distributed dataspace. It can be queried with SPARQL, whose execution proceeds by asynchronously traversing the RDF links to discover data sources at run time. However, the optimization of SPARQL queries over the web of data remains a challenge, and in this paper...

    Provided By VLD Digital

  • White Papers // Aug 2008

    Efficient Implementation of Sorting on Multi-Core SIMD CPU Architectures

    Sorting a list of input numbers is one of the most fundamental problems in the field of computer science in general and high-throughput database applications in particular. Although literature abounds with various flavors of sorting algorithms, different architectures call for customized implementations to achieve faster sorting times. In this paper,...

    Provided By VLD Digital
