VLD Digital

  • White Papers // Jun 2014

    Concurrent Analytical Query Processing with GPUs

    In current databases, GPUs are used as dedicated accelerators to process each individual query. Sharing GPUs among concurrent queries is not supported, causing serious resource underutilization. Based on the profiling of an open-source GPU query engine running commonly used single-query data warehousing workloads, the authors observe that the utilization...

    Provided By VLD Digital

  • White Papers // Jun 2014

    NOMAD: Non-locking, stOchastic Multi-machine algorithm for Asynchronous and Decentralized matrix completion

    The authors develop an efficient parallel distributed algorithm for matrix completion, named NOMAD (Non-locking, stOchastic Multi-machine algorithm for Asynchronous and Decentralized matrix completion). NOMAD is a decentralized algorithm with non-blocking communication between processors. One of the key features of NOMAD is that the ownership of a variable is asynchronously transferred...

    Provided By VLD Digital

  • White Papers // Jun 2014

    Ibex - An Intelligent Storage Engine with Support for Advanced SQL Offloading

    Modern data appliances face severe bandwidth bottlenecks when moving vast amounts of data from storage to the query processing nodes. A possible solution to mitigate these bottlenecks is query off-loading to an intelligent storage engine, where partial or whole queries are pushed down to the storage engine. In this paper,...

    Provided By VLD Digital

  • White Papers // Jun 2014

    ConfluxDB: Multi-Master Replication for Partitioned Snapshot Isolation Databases

    Lazy replication with Snapshot Isolation (SI) has emerged as a popular choice for distributed databases. However, lazy replication often requires execution of update transactions at one (master) site so that it is relatively easy for a total SI order to be determined for consistent installation of updates in the lazily...

    Provided By VLD Digital

  • White Papers // Jun 2014

    The Case for Personal Data-Driven Decision Making

    Data-Driven Decision Making (D3M) has shown great promise in professional pursuits such as business and government. Here, policy-makers collect and analyze data to make their operations more efficient and equitable. Progress in bringing the benefits of D3M to everyday life has been slow. For example, a student asks, "If the...

    Provided By VLD Digital

  • White Papers // May 2014

    WideTable: An Accelerator for Analytical Data Processing

    In this paper the authors present a technique called WideTable that aims to improve the speed of analytical data processing systems. A WideTable is built by denormalizing the database, and then converting complex queries into simple scans on the underlying (wide) table. To avoid the pitfalls associated with denormalization, e.g....

    Provided By VLD Digital

  • White Papers // May 2014

    The Case for Data Visualization Management Systems

    Most visualizations today are produced by retrieving data from a database and using a specialized visualization tool to render it. This decoupled approach results in significant duplication of functionality, such as aggregation and filters, and misses tremendous opportunities for cross-layer optimizations. In this paper, the authors present the case for...

    Provided By VLD Digital

  • White Papers // May 2014

    On k-Path Covers and their Applications

    The authors introduced the k-all-path cover optimization problem with the goal of computing compact yet faithful synopses of the vertex set of road networks. Their proposed pruning algorithm provides close to optimal results in practice and was experimentally proven to be very efficient on large graphs. For the special subcase...

    Provided By VLD Digital

  • White Papers // May 2014

    From Data Fusion to Knowledge Fusion

    The task of data fusion is to identify the true values of data items (e.g., the true date of birth for Tom Cruise) among multiple observed values drawn from different sources (e.g., Web sites) of varying (and unknown) reliability. A recent survey has provided a detailed comparison of various fusion...

    Provided By VLD Digital

  • White Papers // May 2014

    When Data Management Systems Meet Approximate Hardware: Challenges and Opportunities

    Recently, approximate hardware designs have attracted considerable research interest in the computer architecture community. The essential idea of approximate hardware is that hardware components such as CPU, memory, and storage can trade off the accuracy of results for increased performance, reduced energy consumption, or both. The authors propose a...

    Provided By VLD Digital

  • White Papers // May 2014

    Scalable Logging through Emerging Non-Volatile Memory

    Emerging byte-addressable, Non-Volatile Memory (NVM) is fundamentally changing the design principle of transaction logging. It potentially invalidates the need for flush-before-commit as log records are persistent immediately upon write. Distributed logging - a once prohibitive technique for single node systems in the DRAM era - becomes a promising solution to...

    Provided By VLD Digital

  • White Papers // May 2014

    Storage Management in AsterixDB

    Social networks, online communities, mobile devices, and instant messaging applications generate complex, unstructured data at a high rate, resulting in large volumes of data. This poses new challenges for data management systems that aim to ingest, store, index, and analyze such data efficiently. In response, the authors released the first...

    Provided By VLD Digital

  • White Papers // May 2014

    Workload Matters: Why RDF Databases Need a New Design

    The Resource Description Framework (RDF) is a standard for conceptually describing data on the Web, and SPARQL is the query language for RDF. As RDF is becoming widely utilized, RDF data management systems are being exposed to more diverse and dynamic workloads. Existing systems are workload-oblivious, and are therefore unable...

    Provided By VLD Digital

  • White Papers // May 2014

    An Evaluation of the Advantages and Disadvantages of Deterministic Database Systems

    There have been several recent proposals for database system architectures that use a deterministic execution framework to process transactions. Recent proposals for deterministic database system designs argue that deterministic database systems facilitate replication since the same input can be independently sent to two different replicas without concern for replica divergence....

    Provided By VLD Digital

  • White Papers // May 2014

    M4: A Visualization-Oriented Time Series Data Aggregation

    Visual analysis of high-volume time series data is ubiquitous in many industries, including finance, banking, and discrete manufacturing. Contemporary RDBMS-based systems for visualization of high-volume time series data have difficulty coping with the hard latency requirements and high ingestion rates of interactive visualizations. Existing solutions for lowering the volume...

    Provided By VLD Digital

  • White Papers // May 2014

    Reverse k-Ranks Query

    Finding matching customers for a given product based on individual users' preferences is critical for many applications, especially in e-commerce. Recently, the reverse top-k query was proposed to return a number of customers who regard a given product as one of the k most favorite products based on a linear...

    Provided By VLD Digital

  • White Papers // Apr 2014

    Incremental Record Linkage

    Record linkage clusters records such that each cluster corresponds to a single distinct real-world entity. It is a crucial step in data cleaning and data integration. In the big data era, the velocity of data updates is often high, quickly making previous linkage results obsolete. This paper presents an end-to-end...

    Provided By VLD Digital

  • White Papers // Apr 2014

    On Arbitrage-free Pricing for General Data Queries

    Data is a commodity. Recent research has considered the mathematical problem of setting prices for different queries over data. Ideal pricing functions need to be flexible - defined for arbitrary queries (select-project-join, aggregate, random sample, and noisy privacy-preserving queries). They should be fine-grained - a consumer should not be required...

    Provided By VLD Digital

  • White Papers // Mar 2014

    Splitter: Mining Fine-Grained Sequential Patterns in Semantic Trajectories

    Driven by the advance of positioning technology and the popularity of location-sharing services, semantic-enriched trajectory data have become unprecedentedly available. The sequential patterns hidden in such data, when properly defined and extracted, can greatly benefit tasks like targeted advertising and urban planning. Unfortunately, classic sequential pattern mining algorithms developed for...

    Provided By VLD Digital

  • White Papers // Mar 2014

    Towards Building Wind Tunnels for Data Center Design

    Data center design is a tedious and expensive process. Recently, this process has become even more challenging as users of cloud services expect to have guaranteed levels of availability, durability and performance. A new challenge for the service providers is to find the most cost-effective data center design and configuration...

    Provided By VLD Digital

  • White Papers // Mar 2014

    Calibrating Data to Sensitivity in Private Data Analysis

    The authors present an approach to differentially private computation in which one does not scale up the magnitude of noise for challenging queries, but rather scales down the contributions of challenging records. While scaling down all records uniformly is equivalent to scaling up the noise magnitude, they show that scaling...

    Provided By VLD Digital

  • White Papers // Mar 2014

    String Similarity Joins: An Experimental Evaluation

    String similarity join is an important operation in data integration and cleansing that finds similar string pairs from two collections of strings. More than ten algorithms have been proposed to address this problem in the recent two decades. However, existing algorithms have not been thoroughly compared under the same experimental...

    Provided By VLD Digital

  • White Papers // Mar 2014

    An Efficient Publish/Subscribe Index for E-Commerce Databases

    Many of today's publish/subscribe (pub/sub) systems have been designed to cope with a large volume of subscriptions and high event arrival rate (velocity). However, in many novel applications (such as e-commerce), there is an increasing variety of items, each with different attributes. This leads to a very high-dimensional and sparse...

    Provided By VLD Digital

  • White Papers // Mar 2014

    A Principled Approach to Bridging the Gap between Graph Data and their Schemas

    Although RDF graph data often come with an associated schema, recent studies have proven that real RDF data rarely conform to their perceived schemas. Since a number of data management decisions, including storage layouts, indexing, and efficient query processing, use schemas to guide the decision making, it is imperative to...

    Provided By VLD Digital

  • White Papers // Feb 2014

    Lightweight Indexing of Observational Data in Log-Structured Storage

    Huge amounts of data are being generated by sensing devices every day, recording the status of objects and the environment. Such observational data is widely used in scientific research. As the capabilities of sensors keep improving, the data produced are drastically expanding in precision and quantity, making it a write-intensive...

    Provided By VLD Digital

  • White Papers // Feb 2014

    GRAMI: Frequent Subgraph and Pattern Mining in a Single Large Graph

    Mining frequent subgraphs is an important operation on graphs; it is defined as finding all subgraphs that appear frequently in a database according to a given frequency threshold. Most existing work assumes a database of many small graphs, but modern applications, such as social networks, citation graphs, or protein-protein interactions...

    Provided By VLD Digital

  • White Papers // Feb 2014

    Rank Join Queries in NoSQL Databases

    Cloud stores have become the storage of choice for a large variety of big data producers, consumers, and managers (e.g., Twitter, Facebook, Google, and Amazon). For many modern Big Data applications, RDBMSs were found lacking, particularly with respect to scalability (in terms of number of data items, users, operations per...

    Provided By VLD Digital

  • White Papers // Feb 2014

    Toward Computational Fact-Checking

    In this paper, the authors have shown how to turn fact-checking into a computational problem. Interestingly, by regarding claims as queries with parameters, they can check them - not just for correctness, but more importantly, for more subtle measures of quality - by perturbing their parameters. This observation leads the...

    Provided By VLD Digital

  • White Papers // Feb 2014

    Optimizing Graph Algorithms on Pregel-Like Systems

    The authors study the problem of implementing graph algorithms efficiently on Pregel-like systems, which can be surprisingly challenging. Standard graph algorithms in this setting can incur unnecessary inefficiencies such as slow convergence or high communication or computation cost, typically due to structural properties of the input graphs such as large...

    Provided By VLD Digital

  • White Papers // Feb 2014

    Schemaless and Structureless Graph Querying

    Querying complex graph databases such as knowledge graphs is a challenging task for non-professional users. Due to their complex schemas and variational information descriptions, it becomes very hard for users to formulate a query that can be properly processed by the existing systems. The authors argue that for a user-friendly...

    Provided By VLD Digital

  • White Papers // Feb 2014

    Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML

    Large-scale data analytics have become an integral part of online services, enterprise data management, system management, and scientific applications in order to gain value from huge amounts of collected data. Finding interesting unknown facts and patterns often requires analyzing the full data set instead of applying sampling techniques. Recent approaches...

    Provided By VLD Digital

  • White Papers // Feb 2014

    epiC: an Extensible and Scalable System for Processing Big Data

    The big data problem is characterized by the so-called 3V features: Volume - a huge amount of data, Velocity - a high data ingestion rate, and Variety - a mix of structured data, semi-structured data, and unstructured data. The state-of-the-art solutions to the big data problem are largely based...

    Provided By VLD Digital

  • White Papers // Jan 2014

    Optimizing Graph Algorithms on Pregel-like Systems

    The authors study the problem of implementing graph algorithms efficiently on Pregel-like systems, which can be surprisingly challenging. Standard graph algorithms in this setting can incur unnecessary inefficiencies such as slow convergence or high communication or computation cost, typically due to structural properties of the input graphs such as large...

    Provided By VLD Digital

  • White Papers // Jan 2014

    Tracking Entities in the Dynamic World: A Fast Algorithm for Matching Temporal Records

    Identifying records referring to the same real-world entity over time enables longitudinal data analysis. However, difficulties arise from the dynamic nature of the world: the entities described by a temporal data set often evolve their states over time. While the state-of-the-art approach to temporal entity matching...

    Provided By VLD Digital

  • White Papers // Jan 2014

    A Provenance Framework for Data-Dependent Process Analysis

    A Data-Dependent Process (DDP) models an application whose control flow is guided by a finite state machine, as well as by the state of an underlying database. DDPs are commonly found, e.g., in e-commerce. In this paper, the authors develop a framework supporting the use of provenance in static (temporal)...

    Provided By VLD Digital

  • White Papers // Jan 2014

    Support the Data Enthusiast: Challenges for Next-Generation Data-Analysis Systems

    The authors present a vision of next-generation visual analytics services. They argue that these services should have three related capabilities: support visual and interactive data exploration as they do today, but also suggest relevant data to enrich visualizations, and facilitate the integration and cleaning of that data. Most importantly, they...

    Provided By VLD Digital

  • White Papers // Jan 2014

    Shared Workload Optimization

    As a result of increases in both the query load and the data managed, as well as changes in hardware architecture (multi-core), recent years have seen a shift from query-at-a-time approaches towards Shared Work (SW) systems where queries are executed in groups. Such groups share operators like scans and...

    Provided By VLD Digital

  • White Papers // Jan 2014

    Edelweiss: Automatic Storage Reclamation for Distributed Programming

    Event Log Exchange (ELE) is a common programming pattern based on immutable state and messaging. ELE sidesteps traditional challenges in distributed consistency, at the expense of introducing new challenges in designing space reclamation protocols to avoid consuming unbounded storage. The authors introduce Edelweiss, a sublanguage of Bloom that provides an...

    Provided By VLD Digital

  • White Papers // Dec 2013

    Write-limited Sorts and Joins for Persistent Memory

    To mitigate the impact of the widening gap between the memory needs of CPUs and what standard memory technology can deliver, system architects have introduced a new class of memory technology termed persistent memory. Persistent memory is byte addressable, but exhibits asymmetric I/O: writes are typically one order of magnitude...

    Provided By VLD Digital

  • White Papers // Dec 2013

    Reverse Top-k Search using Random Walk with Restart

    With the increasing popularity of social networks, large volumes of graph data are becoming available. Large graphs are also derived by structure extraction from relational, text, or scientific data (e.g., relational tuple networks, citation graphs, ontology networks, protein-protein interaction graphs). Node-to-node proximity is the key building block for many graph-based...

    Provided By VLD Digital

  • White Papers // Aug 2013

    ISLABEL: an Independent-Set based Labeling Scheme for Point-to-Point Distance Querying

    The authors study the problem of computing shortest path or distance between two query vertices in a graph, which has numerous important applications. Quite a number of indexes have been proposed to answer such distance queries. However, all of these indexes can only process graphs of size barely up to...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Streaming Algorithms for k-core Decomposition

    A k-core of a graph is a maximal connected subgraph in which every vertex is connected to at least k vertices in the subgraph. A k-core decomposition is often used in large-scale network analysis, such as community detection, protein function prediction, visualization, and solving NP-Hard problems on real networks efficiently,...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Efficient Error-tolerant Query Autocompletion

    Query autocompletion is an important feature that saves users many keystrokes by sparing them from typing the entire query. In this paper, the authors study the problem of query autocompletion that tolerates errors in users' input using edit distance constraints. Previous approaches index data strings in a trie, and continuously maintain all the prefixes...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Spatial Keyword Query Processing: An Experimental Evaluation

    Geo-textual indices play an important role in spatial keyword querying. The existing geo-textual indices have not been compared systematically under the same experimental framework. This makes it difficult to determine which indexing technique best supports specific functionality. The authors provide an all-around survey of 12 state-of-the-art geo-textual indices. They propose...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Fast Iterative Graph Computation with Block Updates

    Scaling iterative graph processing applications to large graphs is an important problem. Performance is critical, as data scientists need to execute graph programs many times with varying parameters. The need for a high-level, high-performance programming model has inspired much research on graph programming frameworks. In this paper, the authors show...

    Provided By VLD Digital

  • White Papers // Sep 2013

    Parallel Computation of Skyline and Reverse Skyline Queries Using MapReduce

    The skyline operator and its variants, such as dynamic skyline and reverse skyline operators, have attracted considerable attention recently due to their broad applications. However, computing such operators is challenging today since applications increasingly have to deal with big data. For such data-intensive applications, the...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Simple, Fast, and Scalable Reachability Oracle

    In this paper, by introducing two simple, elegant, and effective labeling approaches, Hierarchical Labeling and Distribution Labeling, the authors are able to resolve an important open question in reachability computation: the reachability oracle can be a powerful tool to handle real, very large graphs. Their experimental results demonstrate that they...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Understanding Hierarchical Methods for Differentially Private Histograms

    In recent years, many approaches to publishing differentially private histograms have been proposed. Several approaches rely on constructing tree structures in order to decrease the error when answering large range queries. In this paper, the authors examine the factors affecting the accuracy of hierarchical approaches by studying the Mean Squared...

    Provided By VLD Digital

  • White Papers // Sep 2013

    Horton+: A Distributed System for Processing Declarative Reachability Queries over Partitioned Graphs

    Horton+ is a graph query processing system that executes declarative reachability queries on a partitioned attributed multi-graph. It employs a query language, query optimizer, and a distributed execution engine. The query language expresses declarative reachability queries, and supports closures and predicates on node and edge attributes to match graph paths....

    Provided By VLD Digital

  • White Papers // Sep 2010

    FlashStore: High Throughput Persistent Key-Value Store

    The authors present FlashStore, a high throughput persistent key value store that uses flash memory as a non-volatile cache between RAM and hard disk. FlashStore is designed to store the working set of key-value pairs on flash and use one flash read per key lookup. As the working set changes...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Distributed SociaLite: A Datalog-Based Language for Large-Scale Graph Analysis

    Large-scale graph analysis is becoming important with the rise of world-wide social network services. Recently in SociaLite, the authors proposed extensions to Datalog to efficiently and succinctly implement graph analysis programs on sequential machines. This paper describes novel extensions and optimizations of SociaLite for parallel and distributed executions to support...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Scaling Queries Over Big RDF Graphs With Semantic Hash Partitioning

    Massive volumes of big RDF data are growing beyond the performance capacity of conventional RDF data management systems operating on a single node. Applications using large RDF data demand efficient data partitioning solutions for supporting RDF data access on a cluster of compute nodes. In this paper the authors present...

    Provided By VLD Digital

  • White Papers // Aug 2013

    An Experimental Analysis of Iterated Spatial Joins in Main Memory

    Many modern applications rely on high-performance processing of spatial data. Examples include location-based services, games, virtual worlds, and scientific simulations such as molecular dynamics and behavioral simulations. These applications deal with large numbers of moving objects that continuously sense their environment, and their data access can often be abstracted as...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Counting and Sampling Triangles from a Graph Stream

    Triangle counting has emerged as an important building block in the study of social networks, identifying thematic structures of networks, spam and fraud detection, link classification and recommendation, and more. This paper presents a new space-efficient algorithm for counting and sampling triangles - and more generally, constant-sized cliques - in...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Expressiveness and Complexity of Order Dependencies

    Dependencies play an important role in databases. The authors study Order Dependencies (ODs) - and Unidirectional Order Dependencies (UODs), a proper sub-class of ODs - which describe the relationships among lexicographical orderings of sets of tuples. They consider lexicographical ordering, as by the order-by operator in SQL, because this is...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Query-Driven Approach to Entity Resolution

    The significance of data quality research is motivated by the observation that the effectiveness of data-driven technologies such as decision support tools, data exploration, analysis, and scientific discovery tools is closely tied to the quality of data to which such techniques are applied. This paper explores "On-the-fly" data cleaning in...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Efficient Bulk Updates on Multiversion B-trees

    Partially persistent index structures support efficient access to current and past versions of objects, while updates are allowed on the current version. The MultiVersion B-Tree (MVBT) represents a partially persistent index structure with both asymptotic worst-case performance and excellent performance in real-life applications. Updates are performed tuple-by-tuple with the same...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Counter Strike: Generic Top-Down Join Enumeration for Hypergraphs

    Finding the optimal execution order of join operations is a crucial task of today's cost-based query optimizers. There are two approaches to identify the best plan: bottom-up and top-down join enumeration. But only the top-down approach allows for branch-and-bound pruning, which can improve compile time by several orders of...

    Provided By VLD Digital

  • White Papers // Aug 2013

    A Sampling Algebra for Aggregate Estimation

    As of 2005, sampling has been incorporated in all major database systems. While efficient sampling techniques are realizable, determining the accuracy of an estimate obtained from the sample is still an unresolved problem. In this paper, the authors present a theoretical framework that allows an elegant treatment of the problem....

    Provided By VLD Digital

  • White Papers // Aug 2013

    Summarizing Answer Graphs Induced by Keyword Queries

    Various methods were developed to process keyword queries. In practice, these methods typically generate a set of graphs G induced by Q. Keyword search has been popularly used to query graph data. Due to the lack of structure support, a keyword query might generate an excessive number of matches, referred...

    Provided By VLD Digital

  • White Papers // Aug 2013

    A Probabilistic Optimization Framework for the Empty-Answer Problem

    The authors propose a principled optimization-based interactive query relaxation framework for queries that return no answers. Given an initial query that returns an empty answer set, their framework dynamically computes and suggests alternative queries with fewer conditions than those the user has initially requested, in order to help the user...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Understanding Insights into the Basic Structure and Essential Issues of Table Placement Methods in Clusters

    A table placement method is a critical component in big data analytics on distributed systems. It determines how data values in a two-dimensional table are organized and stored in the underlying cluster. Based on Hadoop computing environments, several table placement methods have been proposed and implemented. However, a...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Adaptive Range Filters for Cold Data: Avoiding Trips to Siberia

    Bloom filters are a great technique to test whether a key is not in a set of keys. This paper presents a novel data structure called ARF. In a nutshell, ARFs are for range queries what Bloom filters are for point queries. That is, an ARF can determine whether a...

    Provided By VLD Digital

  • White Papers // Aug 2013

    QuEval: Beyond High-Dimensional Indexing a La Carte

    In the recent past, the amount of high-dimensional data, such as feature vectors extracted from multimedia data, increased dramatically. A large variety of indexes have been proposed to store and access such data efficiently. However, due to specific requirements of a certain use case, choosing an adequate index structure is...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Probabilistic Query Rewriting for Efficient and Effective Keyword Search on Graph Data

    The problem of rewriting keyword search queries on graph data has been studied recently, where the main goal is to clean user queries by rewriting keywords as valid tokens appearing in the data and grouping them into meaningful segments. The main solution to this problem employs heuristics for ranking query...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Scalable Column Concept Determination for Web Tables Using Large Knowledge Bases

    Tabular data on the web has become a rich source of structured data that is useful for ordinary users to explore. Due to its potential, tables on the web have recently attracted a number of studies with the goals of understanding the semantics of those web tables and providing effective...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Supporting Distributed Feed-Following Apps over Edge Devices

    In feed-following applications such as Twitter and Facebook, users (consumers) follow a large number of other users (producers) to get personalized feeds, generated by blending producers' feeds. With the proliferation of cloud-connected smart edge devices such as smartphones, producers and consumers of many feed-following applications reside on edge devices and...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Multi-Tuple Deletion Propagation: Approximations and Complexity

    In this paper the authors study the computational complexity of the classic problem of deletion propagation in a relational database, where tuples are deleted from the base relations in order to realize a desired deletion of tuples from the view. Such an operation may result in a (sometimes unavoidable) side...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Discovering Denial Constraints

    Integrity Constraints (ICs) provide a valuable tool for enforcing correct application semantics. However, designing ICs requires experts and time. Proposals for automatic discovery have been made for some formalisms, such as functional dependencies and their extension conditional functional dependencies. Unfortunately, these dependencies cannot express many common business rules. For example,...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Universal Indexing of Arbitrary Similarity Models

    The increasing amount of available unstructured content, together with the growing number of large non-relational databases, puts more emphasis on content-based retrieval and, specifically, on similarity searching. Although there exist several indexing methods for efficient querying, not all of them are best-suited for arbitrary similarity models....

    Provided By VLD Digital

  • White Papers // Jun 2012

    Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems

    Two emerging hardware trends will dominate database system technology in the near future: increasing main memory capacities of several TB per server and massively parallel multi-core processing. Many algorithmic and control techniques in current database technology were devised for disk-based systems where I/O dominated the performance. In this paper,...

    Provided By VLD Digital

  • White Papers // Jun 2012

    LogBase: A Scalable Log-Structured Database System in the Cloud

    Numerous applications such as financial transactions (e.g., stock trading) are write-heavy in nature. The shift from reads to writes in web applications has also been accelerating in recent years. Write-ahead logging is a common approach for providing recovery capability while improving performance in most storage systems. However, the separation of log...

    Provided By VLD Digital

  • White Papers // May 2012

    Efficient Subgraph Similarity Search on Large Probabilistic Graph Databases

    Many studies have sought efficient solutions for subgraph similarity search over certain (deterministic) graphs due to its wide application in many fields, including bioinformatics, social network analysis, and Resource Description Framework (RDF) data management. All these works assume that the underlying data are certain. However, in...

    Provided By VLD Digital

  • White Papers // Mar 2012

    Definition, Detection, and Recovery of Single-Page Failures, a Fourth Class of Database Failures

    The three traditional failure classes are system, media, and transaction failures. Sometimes, however, modern storage exhibits failures that differ from all of those. In order to capture and describe such cases, single-page failures are introduced as a fourth failure class. This class encompasses all failures to read a data page...

    Provided By VLD Digital

  • White Papers // Mar 2012

    Pushing the Boundaries of Crowd-Enabled Databases with Query-Driven Schema Expansion

    By incorporating human workers into the query execution process, crowd-enabled databases facilitate intelligent, social capabilities like completing missing data at query time or performing cognitive operators. But despite all their flexibility, crowd-enabled databases still maintain rigid schemas. In this paper, the authors extend crowd-enabled databases by flexible query-driven schema expansion,...

    Provided By VLD Digital

  • White Papers // Mar 2012

    Stochastic Database Cracking: Towards Robust Adaptive Indexing in Main-Memory Column-Stores

    Modern business applications and scientific databases call for inherently dynamic data storage environments. Such environments are characterized by two challenging features: they have little idle system time to devote to physical design; and there is little, if any, a priori workload knowledge, while the query and data workload keeps changing...

    Provided By VLD Digital

  • White Papers // Jan 2012

    Aggregation in Probabilistic Databases Via Knowledge Compilation

    This paper presents a query evaluation technique for positive relational algebra queries with aggregates on a representation system for probabilistic data based on the algebraic structures of semiring and semimodule. The core of the authors' evaluation technique is a procedure that compiles semimodule and semiring expressions into so-called decomposition trees,...

    Provided By VLD Digital

  • White Papers // Dec 2011

    Putting Lipstick on Pig: Enabling Database-Style Workflow Provenance

    Workflow provenance typically assumes that each module is a "black box", so that each output depends on all inputs (coarse-grained dependencies). Furthermore, it does not model the internal state of a module, which can change between repeated executions. In practice, however, an output may depend on only a small subset of...

    Provided By VLD Digital

  • White Papers // Dec 2011

    High-Performance Concurrency Control Mechanisms for Main-Memory Databases

    A database system optimized for in-memory storage can support much higher transaction rates than current systems. However, standard concurrency control methods used today do not scale to the high transaction rates achievable by such systems. In this paper, the authors introduce two efficient concurrency control methods specifically designed for main-memory...

    Provided By VLD Digital

  • White Papers // Oct 2011

    View Selection in Semantic Web Databases

    The authors consider the setting of a Semantic Web database, containing both explicit data encoded in RDF triples, and implicit data, implied by the RDF semantics. Based on a query workload, they address the problem of selecting a set of views to be materialized in the database, minimizing a combination...

    Provided By VLD Digital