VLD Digital

  • White Papers // Jun 2014

    Concurrent Analytical Query Processing with GPUs

    In current databases, GPUs are used as dedicated accelerators to process each individual query. Sharing GPUs among concurrent queries is not supported, causing serious resource underutilization. Based on the profiling of an open-source GPU query engine running commonly used single-query data warehousing workloads, the authors observe that the utilization...

    Provided By VLD Digital

  • White Papers // Jun 2014

    NOMAD: Non-locking, stOchastic Multi-machine algorithm for Asynchronous and Decentralized matrix completion

    The authors develop an efficient parallel distributed algorithm for matrix completion, named NOMAD (Non-locking, stOchastic Multi-machine algorithm for Asynchronous and Decentralized matrix completion). NOMAD is a decentralized algorithm with non-blocking communication between processors. One of the key features of NOMAD is that the ownership of a variable is asynchronously transferred...

    Provided By VLD Digital

  • White Papers // Jun 2014

    Ibex - An Intelligent Storage Engine with Support for Advanced SQL Offloading

    Modern data appliances face severe bandwidth bottlenecks when moving vast amounts of data from storage to the query processing nodes. A possible solution to mitigate these bottlenecks is query off-loading to an intelligent storage engine, where partial or whole queries are pushed down to the storage engine. In this paper,...

    Provided By VLD Digital

  • White Papers // Jun 2014

    ConfluxDB: Multi-Master Replication for Partitioned Snapshot Isolation Databases

    Lazy replication with Snapshot Isolation (SI) has emerged as a popular choice for distributed databases. However, lazy replication often requires execution of update transactions at one (master) site so that it is relatively easy for a total SI order to be determined for consistent installation of updates in the lazily...

    Provided By VLD Digital

  • White Papers // Jun 2014

    The Case for Personal Data-Driven Decision Making

    Data-Driven Decision Making (D3M) has shown great promise in professional pursuits such as business and government. Here, policy-makers collect and analyze data to make their operations more efficient and equitable. Progress in bringing the benefits of D3M to everyday life has been slow. For example, a student asks, "If the...

    Provided By VLD Digital

  • White Papers // May 2014

    WideTable: An Accelerator for Analytical Data Processing

    In this paper the authors present a technique called WideTable that aims to improve the speed of analytical data processing systems. A WideTable is built by denormalizing the database, and then converting complex queries into simple scans on the underlying (wide) table. To avoid the pitfalls associated with denormalization, e.g....

    Provided By VLD Digital

  • White Papers // May 2014

    The Case for Data Visualization Management Systems

    Most visualizations today are produced by retrieving data from a database and using a specialized visualization tool to render it. This decoupled approach results in significant duplication of functionality, such as aggregation and filters, and misses tremendous opportunities for cross-layer optimizations. In this paper, the authors present the case for...

    Provided By VLD Digital

  • White Papers // May 2014

    On k-Path Covers and their Applications

    The authors introduced the k-all-path cover optimization problem with the goal of computing compact yet faithful synopses of the vertex set of road networks. Their proposed pruning algorithm provides close to optimal results in practice and was experimentally proven to be very efficient on large graphs. For the special subcase...

    Provided By VLD Digital

  • White Papers // May 2014

    From Data Fusion to Knowledge Fusion

    The task of data fusion is to identify the true values of data items (e.g., the true date of birth for Tom Cruise) among multiple observed values drawn from different sources (e.g., Web sites) of varying (and unknown) reliability. A recent survey has provided a detailed comparison of various fusion...

    Provided By VLD Digital

  • White Papers // May 2014

    When Data Management Systems Meet Approximate Hardware: Challenges and Opportunities

    Recently, approximate hardware designs have attracted considerable research interest in the computer architecture community. The essential idea of approximate hardware is that hardware components such as CPU, memory, and storage can trade off the accuracy of results for increased performance, reduced energy consumption, or both. The authors propose a...

    Provided By VLD Digital

  • White Papers // May 2014

    Scalable Logging through Emerging Non-Volatile Memory

    Emerging byte-addressable, Non-Volatile Memory (NVM) is fundamentally changing the design principle of transaction logging. It potentially invalidates the need for flush-before-commit as log records are persistent immediately upon write. Distributed logging - a once prohibitive technique for single node systems in the DRAM era - becomes a promising solution to...

    Provided By VLD Digital

  • White Papers // May 2014

    Storage Management in AsterixDB

    Social networks, online communities, mobile devices, and instant messaging applications generate complex, unstructured data at a high rate, resulting in large volumes of data. This poses new challenges for data management systems that aim to ingest, store, index, and analyze such data efficiently. In response, the authors released the first...

    Provided By VLD Digital

  • White Papers // May 2014

    Workload Matters: Why RDF Databases Need a New Design

    The Resource Description Framework (RDF) is a standard for conceptually describing data on the Web, and SPARQL is the query language for RDF. As RDF is becoming widely utilized, RDF data management systems are being exposed to more diverse and dynamic workloads. Existing systems are workload-oblivious, and are therefore unable...

    Provided By VLD Digital

  • White Papers // May 2014

    An Evaluation of the Advantages and Disadvantages of Deterministic Database Systems

    There have been several recent proposals for database system architectures that use a deterministic execution framework to process transactions. Recent proposals for deterministic database system designs argue that deterministic database systems facilitate replication since the same input can be independently sent to two different replicas without concern for replica divergence....

    Provided By VLD Digital

  • White Papers // May 2014

    M4: A Visualization-Oriented Time Series Data Aggregation

    Visual analysis of high-volume time series data is ubiquitous in many industries, including finance, banking, and discrete manufacturing. Contemporary RDBMS-based systems for visualization of high-volume time series data have difficulty coping with the hard latency requirements and high ingestion rates of interactive visualizations. Existing solutions for lowering the volume...

    Provided By VLD Digital

  • White Papers // May 2014

    Reverse k-Ranks Query

    Finding matching customers for a given product based on individual users' preferences is critical for many applications, especially in e-commerce. Recently, the reverse top-k query was proposed to return a number of customers who regard a given product as one of the k most favorite products based on a linear...

    Provided By VLD Digital

  • White Papers // Apr 2014

    Incremental Record Linkage

    Record linkage clusters records such that each cluster corresponds to a single distinct real-world entity. It is a crucial step in data cleaning and data integration. In the big data era, the velocity of data updates is often high, quickly making previous linkage results obsolete. This paper presents an end-to-end...

    Provided By VLD Digital

  • White Papers // Apr 2014

    On Arbitrage-free Pricing for General Data Queries

    Data is a commodity. Recent research has considered the mathematical problem of setting prices for different queries over data. Ideal pricing functions need to be flexible - defined for arbitrary queries (select-project-join, aggregate, random sample, and noisy privacy-preserving queries). They should be fine-grained - a consumer should not be required...

    Provided By VLD Digital

  • White Papers // Mar 2014

    Splitter: Mining Fine-Grained Sequential Patterns in Semantic Trajectories

    Driven by the advance of positioning technology and the popularity of location-sharing services, semantic-enriched trajectory data have become unprecedentedly available. The sequential patterns hidden in such data, when properly defined and extracted, can greatly benefit tasks like targeted advertising and urban planning. Unfortunately, classic sequential pattern mining algorithms developed for...

    Provided By VLD Digital

  • White Papers // Mar 2014

    Towards Building Wind Tunnels for Data Center Design

    Data center design is a tedious and expensive process. Recently, this process has become even more challenging as users of cloud services expect to have guaranteed levels of availability, durability and performance. A new challenge for the service providers is to find the most cost-effective data center design and configuration...

    Provided By VLD Digital

  • White Papers // Mar 2014

    Calibrating Data to Sensitivity in Private Data Analysis

    The authors present an approach to differentially private computation in which one does not scale up the magnitude of noise for challenging queries, but rather scales down the contributions of challenging records. While scaling down all records uniformly is equivalent to scaling up the noise magnitude, they show that scaling...

    Provided By VLD Digital

  • White Papers // Mar 2014

    String Similarity Joins: An Experimental Evaluation

    String similarity join is an important operation in data integration and cleansing that finds similar string pairs from two collections of strings. More than ten algorithms have been proposed to address this problem in the recent two decades. However, existing algorithms have not been thoroughly compared under the same experimental...

    Provided By VLD Digital

  • White Papers // Mar 2014

    An Efficient Publish/Subscribe Index for E-Commerce Databases

    Many of today's publish/subscribe (pub/sub) systems have been designed to cope with a large volume of subscriptions and high event arrival rate (velocity). However, in many novel applications (such as e-commerce), there is an increasing variety of items, each with different attributes. This leads to a very high-dimensional and sparse...

    Provided By VLD Digital

  • White Papers // Mar 2014

    A Principled Approach to Bridging the Gap between Graph Data and their Schemas

    Although RDF graph data often come with an associated schema, recent studies have proven that real RDF data rarely conform to their perceived schemas. Since a number of data management decisions, including storage layouts, indexing, and efficient query processing, use schemas to guide the decision making, it is imperative to...

    Provided By VLD Digital

  • White Papers // Feb 2014

    Toward Computational Fact-Checking

    In this paper, the authors have shown how to turn fact-checking into a computational problem. Interestingly, by regarding claims as queries with parameters, they can check them - not just for correctness, but more importantly, for more subtle measures of quality - by perturbing their parameters. This observation leads the...

    Provided By VLD Digital

  • White Papers // Feb 2014

    Optimizing Graph Algorithms on Pregel-Like Systems

    The authors study the problem of implementing graph algorithms efficiently on Pregel-like systems, which can be surprisingly challenging. Standard graph algorithms in this setting can incur unnecessary inefficiencies such as slow convergence or high communication or computation cost, typically due to structural properties of the input graphs such as large...

    Provided By VLD Digital

  • White Papers // Feb 2014

    Schemaless and Structureless Graph Querying

    Querying complex graph databases such as knowledge graphs is a challenging task for non-professional users. Due to their complex schemas and variational information descriptions, it becomes very hard for users to formulate a query that can be properly processed by the existing systems. The authors argue that for a user-friendly...

    Provided By VLD Digital

  • White Papers // Feb 2014

    Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML

    Large-scale data analytics have become an integral part of online services, enterprise data management, system management, and scientific applications in order to gain value from huge amounts of collected data. Finding interesting unknown facts and patterns often requires analyzing the full data set instead of applying sampling techniques. Recent approaches...

    Provided By VLD Digital

  • White Papers // Feb 2014

    epiC: an Extensible and Scalable System for Processing Big Data

    The big data problem is characterized by the so-called 3V features: Volume - a huge amount of data, Velocity - a high data ingestion rate, and Variety - a mix of structured data, semi-structured data, and unstructured data. The state-of-the-art solutions to the big data problem are largely based...

    Provided By VLD Digital

  • White Papers // Feb 2014

    Lightweight Indexing of Observational Data in Log-Structured Storage

    Huge amounts of data are being generated by sensing devices every day, recording the status of objects and the environment. Such observational data is widely used in scientific research. As the capabilities of sensors keep improving, the data produced are drastically expanding in precision and quantity, making it a write-intensive...

    Provided By VLD Digital

  • White Papers // Feb 2014

    GRAMI: Frequent Subgraph and Pattern Mining in a Single Large Graph

    Mining frequent subgraphs is an important operation on graphs; it is defined as finding all subgraphs that appear frequently in a database according to a given frequency threshold. Most existing work assumes a database of many small graphs, but modern applications, such as social networks, citation graphs, or protein-protein interactions...

    Provided By VLD Digital

  • White Papers // Feb 2014

    Rank Join Queries in NoSQL Databases

    Cloud stores have become the storage of choice for a large variety of big data producers, consumers, and managers (e.g., Twitter, Facebook, Google, Amazon, etc.). For many modern Big Data applications, RDBMSs were found lacking, particularly with respect to scalability (in terms of number of data items, users, operations per...

    Provided By VLD Digital

  • White Papers // Jan 2014

    Optimizing Graph Algorithms on Pregel-Like Systems

    The authors study the problem of implementing graph algorithms efficiently on Pregel-like systems, which can be surprisingly challenging. Standard graph algorithms in this setting can incur unnecessary inefficiencies such as slow convergence or high communication or computation cost, typically due to structural properties of the input graphs such as large...

    Provided By VLD Digital

  • White Papers // Jan 2014

    Edelweiss: Automatic Storage Reclamation for Distributed Programming

    Event Log Exchange (ELE) is a common programming pattern based on immutable state and messaging. ELE sidesteps traditional challenges in distributed consistency, at the expense of introducing new challenges in designing space reclamation protocols to avoid consuming unbounded storage. The authors introduce Edelweiss, a sublanguage of Bloom that provides an...

    Provided By VLD Digital

  • White Papers // Jan 2014

    Tracking Entities in the Dynamic World: A Fast Algorithm for Matching Temporal Records

    Identifying records referring to the same real world entity over time enables longitudinal data analysis. However, difficulties arise from the dynamic nature of the world: the entities described by a temporal data set often evolve their states over time. While the state of the art approach to temporal entity matching...

    Provided By VLD Digital

  • White Papers // Jan 2014

    A Provenance Framework for Data-Dependent Process Analysis

    A Data-Dependent Process (DDP) models an application whose control flow is guided by a finite state machine, as well as by the state of an underlying database. DDPs are commonly found, e.g., in e-commerce. In this paper, the authors develop a framework supporting the use of provenance in static (temporal)...

    Provided By VLD Digital

  • White Papers // Jan 2014

    Support the Data Enthusiast: Challenges for Next-Generation Data-Analysis Systems

    The authors present a vision of next-generation visual analytics services. They argue that these services should have three related capabilities: support visual and interactive data exploration as they do today, but also suggest relevant data to enrich visualizations, and facilitate the integration and cleaning of that data. Most importantly, they...

    Provided By VLD Digital

  • White Papers // Jan 2014

    Shared Workload Optimization

    As a result of increases in both the query load and the data managed, as well as changes in hardware architecture (multi-core), recent years have seen a shift from query-at-a-time approaches towards Shared Work (SW) systems where queries are executed in groups. Such groups share operators like scans and...

    Provided By VLD Digital

  • White Papers // Dec 2013

    MaaT: Effective and Scalable Coordination of Distributed Transactions in the Cloud

    The past decade has witnessed an increasing adoption of cloud database technology, which provides better scalability, availability, and fault-tolerance via transparent partitioning and replication, and automatic load balancing and fail-over. However, only a small number of cloud databases provide strong consistency guarantees for distributed transactions, despite decades of research on...

    Provided By VLD Digital

  • White Papers // Dec 2013

    A Data and Workload-Aware Algorithm for Range Queries Under Differential Privacy

    The authors describe a new algorithm for answering a given set of range queries under differential privacy which often achieves substantially lower error than competing methods. Their algorithm satisfies differential privacy by adding noise that is adapted to the input data and to the given query set. They first privately...

    Provided By VLD Digital

  • White Papers // Mar 2012

    How to Price Shared Optimizations in the Cloud

    Data-management-as-a-service systems are increasingly being used in collaborative settings, where multiple users access common datasets. Cloud providers have the choice to implement various optimizations, such as indexing or materialized views, to accelerate queries over these datasets. Each optimization carries a cost and may benefit multiple users. This creates a major...

    Provided By VLD Digital

  • White Papers // Oct 2011

    View Selection in Semantic Web Databases

    The authors consider the setting of a Semantic Web database, containing both explicit data encoded in RDF triples, and implicit data, implied by the RDF semantics. Based on a query workload, they address the problem of selecting a set of views to be materialized in the database, minimizing a combination...

    Provided By VLD Digital

  • White Papers // Sep 2011

    Fast Updates on Read-Optimized Databases Using Multi-Core CPUs

    Read-optimized columnar databases use differential updates to handle writes by maintaining a separate write-optimized delta partition which is periodically merged with the read-optimized and compressed main partition. This merge process introduces significant overheads and unacceptable downtimes in update intensive systems, aspiring to combine transactional and analytical workloads into one system....

    Provided By VLD Digital

  • White Papers // Apr 2012

    Distributed GraphLab: A Framework for Machine Learning in the Cloud

    While high-level data parallel frameworks, like MapReduce, simplify the design and implementation of large-scale data processing systems, they do not naturally or efficiently support many important data mining and machine learning algorithms and can lead to inefficient learning systems. To help fill this critical void, the authors introduced the GraphLab...

    Provided By VLD Digital

  • White Papers // Aug 2012

    ALAE: Accelerating Local Alignment with Affine Gap Exactly in Biosequence Databases

    The authors explain the problem of local alignment, which is finding pairs of similar subsequences with gaps. The problem exists in biosequence databases. BLAST is a typical heuristic-based tool for finding local alignments, but it can miss results. A recent exact approach, BWT-SW, improves the complexity of the Smith-Waterman...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Automatic Partitioning of Database Applications

    Database-backed applications are nearly ubiquitous in users' daily lives. Applications that make many small accesses to the database create two challenges for developers: increased latency and wasted resources from numerous network round trips. A well-known technique to improve transactional database application performance is to convert part of the application into...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Compacting Transactional Data in Hybrid OLTP & OLAP Databases

    Growing main memory sizes have facilitated database management systems that keep the entire database in main memory. The drastic performance improvements that came along with these in-memory systems have made it possible to reunite the two areas of OnLine Transaction Processing (OLTP) and OnLine Analytical Processing (OLAP): An emerging class...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Probabilistic Databases with MarkoViews

    Most of the work on query evaluation in probabilistic databases has focused on the simple tuple-independent data model, where tuples are independent random events. Several efficient query evaluation techniques exist in this setting, such as safe plans, algorithms based on OBDDs, tree decomposition, and a variety of approximation algorithms. However, complex...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Optimal Algorithms for Crawling a Hidden Database in the Web

    A hidden database refers to a dataset that an organization makes accessible on the web by allowing users to issue queries through a search interface. In other words, data acquisition from such a source is not by following static hyper-links. Instead, data are obtained by querying the interface, and reading...

    Provided By VLD Digital

  • White Papers // Jun 2012

    Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems

    Two emerging hardware trends will dominate the database system technology in the near future: increasing main memory capacities of several TB per server and massively parallel multi-core processing. Many algorithmic and control techniques in current database technology were devised for disk-based systems where I/O dominated the performance. In this paper,...

    Provided By VLD Digital

  • White Papers // Jun 2012

    LogBase: A Scalable Log-Structured Database System in the Cloud

    Numerous applications such as financial transactions (e.g., stock trading) are write-heavy in nature. The shift from reads to writes in web applications has also been accelerating in recent years. Write-ahead logging is a common approach for providing recovery capability while improving performance in most storage systems. However, the separation of log...

    Provided By VLD Digital

  • White Papers // May 2012

    Efficient Subgraph Similarity Search on Large Probabilistic Graph Databases

    Many studies have sought efficient solutions for subgraph similarity search over certain (deterministic) graphs due to its wide application in many fields, including bioinformatics, social network analysis, and Resource Description Framework (RDF) data management. All these works assume that the underlying data are certain. However, in...

    Provided By VLD Digital

  • White Papers // Aug 2012

    The Vertica Analytic Database: C-Store 7 Years Later

    This paper describes the system architecture of the Vertica Analytic Database (Vertica), a commercialization of the design of the C-Store research prototype. Vertica demonstrates a modern commercial RDBMS system that presents a classical relational interface while at the same time achieving the high performance expected from modern "Web scale" analytic...

    Provided By VLD Digital

  • White Papers // Aug 2012

    A Storage Advisor for Hybrid-Store Databases

    With the SAP HANA database, SAP offers a high-performance in-memory hybrid-store database. Hybrid-store databases - that is, databases supporting row- and column-oriented data management - are getting more and more prominent. While the columnar management offers high-performance capabilities for analyzing large quantities of data, the row-oriented store can handle transactional...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Mining Frequent Itemsets Over Uncertain Databases

    In recent years, due to the wide applications of uncertain data, mining frequent itemsets over uncertain databases has attracted much attention. In uncertain databases, the support of an itemset is a random variable instead of a fixed occurrence count of this itemset. Thus, unlike the corresponding problem in deterministic databases...

    Provided By VLD Digital

  • White Papers // Jan 2012

    Aggregation in Probabilistic Databases Via Knowledge Compilation

    This paper presents a query evaluation technique for positive relational algebra queries with aggregates on a representation system for probabilistic data based on the algebraic structures of semiring and semimodule. The core of the authors' evaluation technique is a procedure that compiles semimodule and semiring expressions into so-called decomposition trees,...

    Provided By VLD Digital

  • White Papers // Dec 2011

    Putting Lipstick on Pig: Enabling Database-Style Workflow Provenance

    Workflow provenance typically assumes that each module is a "Black-box", so that each output depends on all inputs (coarse-grained dependencies). Furthermore, it does not model the internal state of a module, which can change between repeated executions. In practice, however, an output may depend on only a small subset of...

    Provided By VLD Digital

  • White Papers // Dec 2011

    High-Performance Concurrency Control Mechanisms for Main-Memory Databases

    A database system optimized for in-memory storage can support much higher transaction rates than current systems. However, standard concurrency control methods used today do not scale to the high transaction rates achievable by such systems. In this paper, the authors introduce two efficient concurrency control methods specifically designed for main-memory...

    Provided By VLD Digital

  • White Papers // Mar 2012

    Definition, Detection, and Recovery of Single-Page Failures, a Fourth Class of Database Failures

    The three traditional failure classes are system, media, and transaction failures. Sometimes, however, modern storage exhibits failures that differ from all of those. In order to capture and describe such cases, single-page failures are introduced as a fourth failure class. This class encompasses all failures to read a data page...

    Provided By VLD Digital

  • White Papers // Mar 2012

    Pushing the Boundaries of Crowd-Enabled Databases with Query-Driven Schema Expansion

    By incorporating human workers into the query execution process, crowd-enabled databases facilitate intelligent, social capabilities like completing missing data at query time or performing cognitive operators. But despite all their flexibility, crowd-enabled databases still maintain rigid schemas. In this paper, the authors extend crowd-enabled databases with flexible query-driven schema expansion,...

    Provided By VLD Digital

  • White Papers // Mar 2012

    Stochastic Database Cracking: Towards Robust Adaptive Indexing in Main-Memory Column-Stores

    Modern business applications and scientific databases call for inherently dynamic data storage environments. Such environments are characterized by two challenging features: they have little idle system time to devote to physical design; and there is little, if any, a priori workload knowledge, while the query and data workload keeps changing...

    Provided By VLD Digital

  • White Papers // Jan 2012

    Relation Strength-Aware Clustering of Heterogeneous Information Networks With Incomplete Attributes

    With the rapid emergence of online social media, online shopping sites and cyber-physical systems, it has become possible to model many forms of interconnected networks as heterogeneous information networks in which objects (i.e., nodes) are of different types, and links among objects correspond to different relations, denoting different interaction semantics....

    Provided By VLD Digital

  • White Papers // Sep 2010

    Data-Oriented Transaction Execution

    While hardware technology has undergone major advancements over the past decade, transaction processing systems have remained largely unchanged. The number of cores on a chip grows exponentially, following Moore's Law, allowing for an ever-increasing number of transactions to execute in parallel. As the number of concurrently-executing transactions increases, contended critical...

    Provided By VLD Digital

  • White Papers // Feb 2013

    Actively Soliciting Feedback for Query Answers in Keyword Search-Based Data Integration

    The problem of scaling up data integration, such that new sources can be quickly utilized as they are discovered, remains elusive: global schemas for integrated data are difficult to develop and expand, and schema and record matching techniques are limited by the fact that data and metadata are often under-specified...

    Provided By VLD Digital

  • White Papers // Sep 2011

    HIWAS: Enabling Technology for Analysis of Clinical Data in XML Documents

    The information contained in large collections of clinical data can be used for many valuable purposes, such as epidemiological studies, evidence-based medicine, monitoring compliance with best clinical practices, and cost-benefit analyses. However, the emerging standards for the electronic representation of clinical data, such as the Clinical Document Architectures (CDAs), are...

    Provided By VLD Digital

  • White Papers // Sep 2011

    Inspector Gadget: A Framework for Custom Monitoring and Debugging of Distributed Dataflows

    The authors consider how to monitor and debug query processing dataflows, in distributed environments such as Pig/Hadoop. Their work is motivated by a series of informal user interviews, which revealed that monitoring and debugging needs are both pressing and diverse. In response to these interviews, they created a framework for...

    Provided By VLD Digital

  • White Papers // Sep 2011

    Consistent Synchronization Schemes for Workload Replay

    Oracle Database Replay has been recently introduced in Oracle 11g as a novel tool to test relational database systems. It involves recording the workload running on the database server in a production system, and subsequently replaying it on the database server in a test system. A key feature of workload...

    Provided By VLD Digital

  • White Papers // Sep 2011

    AIDA: An Online Tool for Accurate Disambiguation of Named Entities in Text and Tables

    The authors present AIDA, a framework and online tool for entity detection and disambiguation. Given a natural-language text or a Web table, they map mentions of ambiguous names onto canonical entities like people or places, registered in a knowledge base like DBpedia, Freebase, or YAGO. AIDA is a robust framework...

    Provided By VLD Digital

  • White Papers // Sep 2011

    UpStream: A Storage-centric Load Management System for Real-time Update Streams

    UpStream is a framework for load management over data streams with update semantics. It provides a novel storage manager architecture that can be plugged into data stream processing engines for serving streaming applications that require low-staleness results over real-time continuous queries. The authors propose to demonstrate the key aspects of...

    Provided By VLD Digital

  • White Papers // Sep 2011

    Spicy: An OpenSource Tool for Second-Generation Schema Mapping and Data Exchange

    Recent results in schema-mapping and data-exchange research may be considered the starting point for a new generation of systems, capable of dealing with a significantly larger class of applications. In this paper, the authors demonstrate the first of these second-generation systems, called ++Spicy. They introduce a number of scenarios from...

    Provided By VLD Digital

  • White Papers // Sep 2011

    A Demonstration of HYRISE - A Main Memory Hybrid Storage Engine

    The authors propose to demonstrate HYRISE, a main memory hybrid database system, which automatically partitions tables into vertical partitions consisting of variable numbers of columns based on access patterns to each table. Using an accurate model of cache misses, HYRISE is able to predict the performance of different partitionings, and...

    Provided By VLD Digital

  • White Papers // Sep 2011

    FuDoCS: A Web Service Composition System Based on Fuzzy Dominance for Preference Query Answering

    Modern enterprises are increasingly moving towards a service-oriented architecture for data sharing by putting their data sources behind services, thereby providing an interoperable way to interact with their data. This class of services is known as DaaS (Data-as-a-Service) services. DaaS Composition is a powerful solution to answer the user's...

    Provided By VLD Digital

  • White Papers // Sep 2011

    From SPARQL to MapReduce: The Journey Using a Nested TripleGroup Algebra

    MapReduce-based data processing platforms offer a promising approach for cost-effective and Web-scale processing of Semantic Web data. However, one major challenge is that this computational paradigm leads to high I/O and communication costs when processing tasks with several join operations typical in SPARQL queries. The goal of this demonstration is...

    Provided By VLD Digital

  • White Papers // Sep 2011

    InfoNetOLAPer: Integrating InfoNetWarehouse and InfoNetCube with InfoNetOLAP

    Graph OLAP operations provide a multi-dimensional and multilevel view of Information Networks (InfoNetworks) and thus have received growing research interest. With the continuous accumulation and increasing prevalence of Information Networks, OLAP and mining of InfoNetworks have become one of the new research frontiers. To support efficient graph OLAP operations on information...

    Provided By VLD Digital

  • White Papers // Sep 2011

    DataSynth: Generating Synthetic Data using Declarative Constraints

    A variety of scenarios such as database system and application testing, data masking, and benchmarking require synthetic database instances, often having complex data characteristics. The authors present DataSynth, a flexible tool for generating synthetic databases. DataSynth uses a simple and powerful declarative abstraction based on cardinality constraints to specify data...

    Provided By VLD Digital

  • White Papers // Sep 2011

    EIRENE: Interactive Design and Refinement of Schema Mappings via Data Examples

    One of the first steps in the process of integrating information from multiple sources into a desired target format is to specify the relationships, called schema mappings, between the source schemas and the target schema. In this paper, the authors showcase a new methodology for designing schema mappings. Their system...

    Provided By VLD Digital

  • White Papers // Sep 2011

    Whodunit: An Auditing Tool for Detecting Data Breaches

    Commercial database systems provide support to maintain an audit trail that can be analyzed offline to identify potential threats to data security. The authors present a tool for data auditing, which asks for an audit trail of all users and queries that referenced sensitive data, for example "Find all...

    Provided By VLD Digital

  • White Papers // Sep 2011

    Automatic Workload Driven Index Defragmentation

    Queries that scan a B-Tree index can suffer significant I/O performance degradation due to index fragmentation. The task of determining if an index should be defragmented is challenging for DataBase Administrators (DBAs) since today's database engines offer no support for quantifying the impact of defragmenting an index on query I/O...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Verifying Computations with Streaming Interactive Proofs

    When computation is outsourced, the data owner would like to be assured that the desired computation has been performed correctly by the service provider. In theory, proof systems can give the necessary assurance, but prior work is not sufficiently scalable or practical. In this paper, the authors develop new proof...

    Provided By VLD Digital

  • White Papers // Sep 2011

    Efficient Rank Join with Aggregation Constraints

    The authors show that aggregation constraints, which naturally arise in several applications, can enrich the semantics of rank join queries by allowing users to impose their application-specific preferences in a declarative way. By analyzing the properties of aggregation constraints, they develop efficient deterministic and probabilistic algorithms which can push the aggregation...

    Provided By VLD Digital