VLD Digital

  • White Papers // Jun 2014

    Concurrent Analytical Query Processing with GPUs

    In current databases, GPUs are used as dedicated accelerators to process each individual query. Sharing GPUs among concurrent queries is not supported, causing serious resource underutilization. Based on the profiling of an open-source GPU query engine running commonly used single-query data warehousing workloads, the authors observe that the utilization...

    Provided By VLD Digital

  • White Papers // Jun 2014

    NOMAD: Non-locking, stOchastic Multi-machine algorithm for Asynchronous and Decentralized matrix completion

    The authors develop an efficient parallel distributed algorithm for matrix completion, named NOMAD (Non-locking, stOchastic Multi-machine algorithm for Asynchronous and Decentralized matrix completion). NOMAD is a decentralized algorithm with non-blocking communication between processors. One of the key features of NOMAD is that the ownership of a variable is asynchronously transferred...

    Provided By VLD Digital

  • White Papers // Jun 2014

    Ibex - An Intelligent Storage Engine with Support for Advanced SQL Offloading

    Modern data appliances face severe bandwidth bottlenecks when moving vast amounts of data from storage to the query processing nodes. A possible solution to mitigate these bottlenecks is query off-loading to an intelligent storage engine, where partial or whole queries are pushed down to the storage engine. In this paper,...

    Provided By VLD Digital

  • White Papers // Jun 2014

    ConfluxDB: Multi-Master Replication for Partitioned Snapshot Isolation Databases

    Lazy replication with Snapshot Isolation (SI) has emerged as a popular choice for distributed databases. However, lazy replication often requires execution of update transactions at one (master) site so that it is relatively easy for a total SI order to be determined for consistent installation of updates in the lazily...

    Provided By VLD Digital

  • White Papers // Jun 2014

    The Case for Personal Data-Driven Decision Making

    Data-Driven Decision Making (D3M) has shown great promise in professional pursuits such as business and government. Here, policy-makers collect and analyze data to make their operations more efficient and equitable. Progress in bringing the benefits of D3M to everyday life has been slow. For example, a student asks, "If the...

    Provided By VLD Digital

  • White Papers // May 2014

    WideTable: An Accelerator for Analytical Data Processing

    In this paper the authors present a technique called WideTable that aims to improve the speed of analytical data processing systems. A WideTable is built by denormalizing the database, and then converting complex queries into simple scans on the underlying (wide) table. To avoid the pitfalls associated with denormalization, e.g....

    Provided By VLD Digital

  • White Papers // May 2014

    The Case for Data Visualization Management Systems

    Most visualizations today are produced by retrieving data from a database and using a specialized visualization tool to render it. This decoupled approach results in significant duplication of functionality, such as aggregation and filters, and misses tremendous opportunities for cross-layer optimizations. In this paper, the authors present the case for...

    Provided By VLD Digital

  • White Papers // May 2014

    On k-Path Covers and their Applications

    The authors introduced the k-all-path cover optimization problem with the goal of computing compact yet faithful synopses of the vertex set of road networks. Their proposed pruning algorithm provides close to optimal results in practice and was experimentally proven to be very efficient on large graphs. For the special subcase...

    Provided By VLD Digital

  • White Papers // May 2014

    From Data Fusion to Knowledge Fusion

    The task of data fusion is to identify the true values of data items (e.g., the true date of birth for Tom Cruise) among multiple observed values drawn from different sources (e.g., Web sites) of varying (and unknown) reliability. A recent survey has provided a detailed comparison of various fusion...

    Provided By VLD Digital

  • White Papers // May 2014

    When Data Management Systems Meet Approximate Hardware: Challenges and Opportunities

    Recently, approximate hardware designs have attracted considerable research interest in the computer architecture community. The essential idea of approximate hardware is that hardware components such as CPU, memory, and storage can trade off the accuracy of results for increased performance, reduced energy consumption, or both. The authors propose a...

    Provided By VLD Digital

  • White Papers // May 2014

    Scalable Logging through Emerging Non-Volatile Memory

    Emerging byte-addressable, Non-Volatile Memory (NVM) is fundamentally changing the design principle of transaction logging. It potentially invalidates the need for flush-before-commit as log records are persistent immediately upon write. Distributed logging - a once prohibitive technique for single node systems in the DRAM era - becomes a promising solution to...

    Provided By VLD Digital

  • White Papers // May 2014

    Storage Management in AsterixDB

    Social networks, online communities, mobile devices, and instant messaging applications generate complex, unstructured data at a high rate, resulting in large volumes of data. This poses new challenges for data management systems that aim to ingest, store, index, and analyze such data efficiently. In response, the authors released the first...

    Provided By VLD Digital

  • White Papers // May 2014

    Workload Matters: Why RDF Databases Need a New Design

    The Resource Description Framework (RDF) is a standard for conceptually describing data on the Web, and SPARQL is the query language for RDF. As RDF is becoming widely utilized, RDF data management systems are being exposed to more diverse and dynamic workloads. Existing systems are workload-oblivious, and are therefore unable...

    Provided By VLD Digital

  • White Papers // May 2014

    An Evaluation of the Advantages and Disadvantages of Deterministic Database Systems

    There have been several recent proposals for database system architectures that use a deterministic execution framework to process transactions. Recent proposals for deterministic database system designs argue that deterministic database systems facilitate replication since the same input can be independently sent to two different replicas without concern for replica divergence....

    Provided By VLD Digital

  • White Papers // May 2014

    M4: A Visualization-Oriented Time Series Data Aggregation

    Visual analysis of high-volume time series data is ubiquitous in many industries, including finance, banking, and discrete manufacturing. Contemporary RDBMS-based systems for visualization of high-volume time series data have difficulty coping with the hard latency requirements and high ingestion rates of interactive visualizations. Existing solutions for lowering the volume...

    Provided By VLD Digital

  • White Papers // May 2014

    Reverse k-Ranks Query

    Finding matching customers for a given product based on individual users' preferences is critical for many applications, especially in e-commerce. Recently, the reverse top-k query was proposed to return a number of customers who regard a given product as one of the k most favorite products based on a linear...

    Provided By VLD Digital

  • White Papers // Apr 2014

    Incremental Record Linkage

    Record linkage clusters records such that each cluster corresponds to a single distinct real-world entity. It is a crucial step in data cleaning and data integration. In the big data era, the velocity of data updates is often high, quickly making previous linkage results obsolete. This paper presents an end-to-end...

    Provided By VLD Digital

  • White Papers // Apr 2014

    On Arbitrage-free Pricing for General Data Queries

    Data is a commodity. Recent research has considered the mathematical problem of setting prices for different queries over data. Ideal pricing functions need to be flexible - defined for arbitrary queries (select-project-join, aggregate, random sample, and noisy privacy-preserving queries). They should be fine-grained - a consumer should not be required...

    Provided By VLD Digital

  • White Papers // Mar 2014

    Splitter: Mining Fine-Grained Sequential Patterns in Semantic Trajectories

    Driven by the advance of positioning technology and the popularity of location-sharing services, semantic-enriched trajectory data have become unprecedentedly available. The sequential patterns hidden in such data, when properly defined and extracted, can greatly benefit tasks like targeted advertising and urban planning. Unfortunately, classic sequential pattern mining algorithms developed for...

    Provided By VLD Digital

  • White Papers // Mar 2014

    Towards Building Wind Tunnels for Data Center Design

    Data center design is a tedious and expensive process. Recently, this process has become even more challenging as users of cloud services expect to have guaranteed levels of availability, durability and performance. A new challenge for the service providers is to find the most cost-effective data center design and configuration...

    Provided By VLD Digital

  • White Papers // Mar 2014

    Calibrating Data to Sensitivity in Private Data Analysis

    The authors present an approach to differentially private computation in which one does not scale up the magnitude of noise for challenging queries, but rather scales down the contributions of challenging records. While scaling down all records uniformly is equivalent to scaling up the noise magnitude, they show that scaling...

    Provided By VLD Digital

  • White Papers // Mar 2014

    String Similarity Joins: An Experimental Evaluation

    String similarity join is an important operation in data integration and cleansing that finds similar string pairs from two collections of strings. More than ten algorithms have been proposed to address this problem in the recent two decades. However, existing algorithms have not been thoroughly compared under the same experimental...

    Provided By VLD Digital

  • White Papers // Mar 2014

    An Efficient Publish/Subscribe Index for E-Commerce Databases

    Many of today's publish/subscribe (pub/sub) systems have been designed to cope with a large volume of subscriptions and high event arrival rate (velocity). However, in many novel applications (such as e-commerce), there is an increasing variety of items, each with different attributes. This leads to a very high-dimensional and sparse...

    Provided By VLD Digital

  • White Papers // Mar 2014

    A Principled Approach to Bridging the Gap between Graph Data and their Schemas

    Although RDF graph data often come with an associated schema, recent studies have proven that real RDF data rarely conform to their perceived schemas. Since a number of data management decisions, including storage layouts, indexing, and efficient query processing, use schemas to guide the decision making, it is imperative to...

    Provided By VLD Digital

  • White Papers // Feb 2014

    epiC: an Extensible and Scalable System for Processing Big Data

    The big data problem is characterized by the so-called 3V features: Volume - a huge amount of data, Velocity - a high data ingestion rate, and Variety - a mix of structured data, semi-structured data, and unstructured data. The state-of-the-art solutions to the big data problem are largely based...

    Provided By VLD Digital

  • White Papers // Feb 2014

    Lightweight Indexing of Observational Data in Log-Structured Storage

    Huge amounts of data are being generated by sensing devices every day, recording the status of objects and the environment. Such observational data is widely used in scientific research. As the capabilities of sensors keep improving, the data produced are drastically expanding in precision and quantity, making it a write-intensive...

    Provided By VLD Digital

  • White Papers // Feb 2014

    GRAMI: Frequent Subgraph and Pattern Mining in a Single Large Graph

    Mining frequent subgraphs is an important operation on graphs; it is defined as finding all subgraphs that appear frequently in a database according to a given frequency threshold. Most existing work assumes a database of many small graphs, but modern applications, such as social networks, citation graphs, or protein-protein interactions...

    Provided By VLD Digital

  • White Papers // Feb 2014

    Rank Join Queries in NoSQL Databases

    Cloud stores have become the storage of choice for a large variety of big data producers, consumers, and managers (e.g., Twitter, Facebook, Google, Amazon, etc.). For many modern Big Data applications, RDBMSs were found lacking, particularly with respect to scalability (in terms of number of data items, users, operations per...

    Provided By VLD Digital

  • White Papers // Feb 2014

    Toward Computational Fact-Checking

    In this paper, the authors have shown how to turn fact-checking into a computational problem. Interestingly, by regarding claims as queries with parameters, they can check them - not just for correctness, but more importantly, for more subtle measures of quality - by perturbing their parameters. This observation leads the...

    Provided By VLD Digital

  • White Papers // Feb 2014

    Optimizing Graph Algorithms on Pregel-Like Systems

    The authors study the problem of implementing graph algorithms efficiently on Pregel-like systems, which can be surprisingly challenging. Standard graph algorithms in this setting can incur unnecessary inefficiencies such as slow convergence or high communication or computation cost, typically due to structural properties of the input graphs such as large...

    Provided By VLD Digital

  • White Papers // Feb 2014

    Schemaless and Structureless Graph Querying

    Querying complex graph databases such as knowledge graphs is a challenging task for non-professional users. Due to their complex schemas and variational information descriptions, it becomes very hard for users to formulate a query that can be properly processed by the existing systems. The authors argue that for a user-friendly...

    Provided By VLD Digital

  • White Papers // Feb 2014

    Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML

    Large-scale data analytics have become an integral part of online services, enterprise data management, system management, and scientific applications in order to gain value from huge amounts of collected data. Finding interesting unknown facts and patterns often requires analyzing the full data set instead of applying sampling techniques. Recent approaches...

    Provided By VLD Digital

  • White Papers // Jan 2014

    Optimizing Graph Algorithms on Pregel-Like Systems

    The authors study the problem of implementing graph algorithms efficiently on Pregel-like systems, which can be surprisingly challenging. Standard graph algorithms in this setting can incur unnecessary inefficiencies such as slow convergence or high communication or computation cost, typically due to structural properties of the input graphs such as large...

    Provided By VLD Digital

  • White Papers // Jan 2014

    Tracking Entities in the Dynamic World: A Fast Algorithm for Matching Temporal Records

    Identifying records referring to the same real-world entity over time enables longitudinal data analysis. However, difficulties arise from the dynamic nature of the world: the entities described by a temporal data set often evolve their states over time. While the state-of-the-art approach to temporal entity matching...

    Provided By VLD Digital

  • White Papers // Jan 2014

    A Provenance Framework for Data-Dependent Process Analysis

    A Data-Dependent Process (DDP) models an application whose control flow is guided by a finite state machine, as well as by the state of an underlying database. DDPs are commonly found, e.g., in e-commerce. In this paper, the authors develop a framework supporting the use of provenance in static (temporal)...

    Provided By VLD Digital

  • White Papers // Jan 2014

    Support the Data Enthusiast: Challenges for Next-Generation Data-Analysis Systems

    The authors present a vision of next-generation visual analytics services. They argue that these services should have three related capabilities: support visual and interactive data exploration as they do today, but also suggest relevant data to enrich visualizations, and facilitate the integration and cleaning of that data. Most importantly, they...

    Provided By VLD Digital

  • White Papers // Jan 2014

    Shared Workload Optimization

    As a result of increases in both the query load and the data managed, as well as changes in hardware architecture (multi-core), the last years have seen a shift from query-at-a-time approaches towards Shared Work (SW) systems where queries are executed in groups. Such groups share operators like scans and...

    Provided By VLD Digital

  • White Papers // Jan 2014

    Edelweiss: Automatic Storage Reclamation for Distributed Programming

    Event Log Exchange (ELE) is a common programming pattern based on immutable state and messaging. ELE sidesteps traditional challenges in distributed consistency, at the expense of introducing new challenges in designing space reclamation protocols to avoid consuming unbounded storage. The authors introduce Edelweiss, a sublanguage of Bloom that provides an...

    Provided By VLD Digital

  • White Papers // Dec 2013

    MaaT: Effective and Scalable Coordination of Distributed Transactions in the Cloud

    The past decade has witnessed an increasing adoption of cloud database technology, which provides better scalability, availability, and fault-tolerance via transparent partitioning and replication, and automatic load balancing and fail-over. However, only a small number of cloud databases provide strong consistency guarantees for distributed transactions, despite decades of research on...

    Provided By VLD Digital

  • White Papers // Dec 2013

    A Data and Workload-Aware Algorithm for Range Queries Under Differential Privacy

    The authors describe a new algorithm for answering a given set of range queries under differential privacy which often achieves substantially lower error than competing methods. Their algorithm satisfies differential privacy by adding noise that is adapted to the input data and to the given query set. They first privately...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Adaptive and Big Data Scale Parallel Execution in Oracle

    In this paper the authors showcase some of the newly introduced parallel execution methods in Oracle RDBMS. These methods provide highly scalable and adaptive evaluation for the most commonly used SQL operations - joins, group-by, rollup/cube, grouping sets, and window functions. The novelty of these techniques is their use of...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Overview of Turn Data Management Platform for Digital Advertising

    In this paper the authors give an overview of the Turn Data Management Platform (DMP). They explain the purpose of this type of platform, and show how it is positioned in the current digital advertising ecosystem. They also provide a detailed description of the key components in the Turn DMP. These components...

    Provided By VLD Digital

  • White Papers // Sep 2013

    A New Service for Customer Care Based on the TrentoRise BigData Platform

    The internet of the web 2.0 era has radically changed the way people interact and behave in business and everyday life. It has been a profound revolution, affecting the way people conduct business and interact with society. Social applications have radically changed the way people interact by creating new behavioral...

    Provided By VLD Digital

  • White Papers // Aug 2013

    The Trento Big Data Platform for Public Administration and Large Companies: Use Cases and Opportunities

    Data analysis is used to drive almost every aspect of modern society, including mobile services, retail manufacturing, financial services, life sciences, and physical sciences. Novel approaches are required to extract value from such data without omitting the opportunities enabling service innovation design. TrentoRISE, in collaboration with the dbTrento...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Designing Query Optimizers for Big Data Problems of The Future

    The Vertica analytic database (Vertica) is a modern, commercially successful RDBMS. It contains a SQL query optimizer, written from scratch, especially for the Vertica storage system and execution engine. The authors wrote their own optimizer, despite a countervailing industry trend to reuse or wrap existing optimizers in new database systems....

    Provided By VLD Digital

  • White Papers // Aug 2013

    How to Maximize the Value of Big Data with the Open Source SpagoBI Suite Through a Comprehensive Approach

    Managing and analyzing Big Data is one of the most interesting challenges that new technologies have to face nowadays. Big Data does not only refer to different kinds of sources. It also means managing structured and unstructured data, real-time data streams, as well as semantic ontologies applied to data. Specifically,...

    Provided By VLD Digital

  • White Papers // Sep 2013

    Microsoft SQL Server's Integrated Database Approach for Modern Applications and Hardware

    Recently, there has been much renewed interest in re-architecting database systems to exploit new hardware. While some efforts have suggested that one needs specialized engines ("One size does not fit all"), the approach pursued by Microsoft's SQL Server has been to integrate multiple elements into a common architecture. This brings...

    Provided By VLD Digital

  • White Papers // Aug 2013

    SAP HANA: The Evolution from a Modern Main-Memory Data Platform to an Enterprise Application Platform

    SAP HANA is a pioneering data platform, and one of the best performing, designed from the ground up to heavily exploit modern hardware capabilities, including SIMD, and large memory and CPU footprints. As a comprehensive data management solution, SAP HANA supports the complete data life cycle encompassing modeling, provisioning, and...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Big Data Integration

    The big data era is upon us: data is being generated, collected and analyzed at an unprecedented scale, and data-driven decision making is sweeping through society. Since the value of data explodes when it can be linked and fused with other data, addressing the Big Data Integration (BDI) challenge...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Towards Database Virtualization for Database as a Service

    Advances in operating system and storage-level virtualization technologies have enabled the effective consolidation of heterogeneous applications in a shared cloud infrastructure. Novel research challenges arising from this new shared environment include load balancing, workload estimation, resource isolation, machine replication, live migration, and an emergent need of automation to handle large...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Mobility and Social Networking: A Data Management Perspective

    Online social networks, such as Facebook and Twitter, have become very popular in the past decade. This paper presents the state-of-the-art research that lies at the intersection of two hot topics in the data management community: social networking and mobility. In this paper, the authors give an overview of existing...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Senbazuru: A Prototype Spreadsheet Database Management System

    Spreadsheets have become a critical data management tool, but they lack explicit relational metadata, making it difficult to join or integrate data across multiple spreadsheets. Because spreadsheet data are widely available on a huge range of topics, a tool that allows easy spreadsheet integration would be hugely beneficial for a...

    Provided By VLD Digital

  • White Papers // Aug 2013

    DesTeller: A System for Destination Prediction Based on Trajectories with Privacy Protection

    Destination prediction is an essential task for a number of emerging location based applications such as recommending sightseeing places and sending targeted advertisements. A common approach to destination prediction is to derive the probability of a location being the destination based on historical trajectories. However, existing techniques suffer from the...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Functions Are Data Too

    The authors demonstrate a full-fledged implementation of first-class functions for the widely used PL/SQL database programming language. Functions are treated as regular data items that may be constructed at query runtime, stored in and retrieved from tables, assigned to variables, and passed to and from other (higher-order) functions. The resulting...

    Provided By VLD Digital

  • White Papers // Aug 2013

    SPARSI: Partitioning Sensitive Data amongst Multiple Adversaries

    The authors present SPARSI, a novel theoretical framework for partitioning sensitive data across multiple non-colluding adversaries. Most work in privacy-aware data sharing has considered disclosing summaries where the aggregate information about the data is preserved, but sensitive user information is protected. Nonetheless, there are applications, including online advertising, cloud computing...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Synthesising Changes in XML Documents as PULs

    The ability to efficiently detect changes in XML documents is crucial in many application contexts. If such changes are represented as XQuery update Pending Update Lists (PULs), they can then be applied on documents using XQuery update engines, and document management can take advantage of existing composition, inversion, reconciliation approaches...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Discovering Longest-lasting Correlation in Sequence Databases

    Most existing work on sequence databases uses correlation (e.g., Euclidean distance and Pearson correlation) as a core function for various analytical tasks. Typically, it requires users to set a length for the similarity queries. However, there is no steady way to define the proper length for different application needs. In...

    Provided By VLD Digital

  • White Papers // Aug 2013

    PREDIcT: Towards Predicting the Runtime of Large Scale Iterative Analytics

    Machine learning algorithms are widely used today for analytical tasks such as data cleaning, data categorization, or data filtering. At the same time, the rise of social media motivates recent uptake in large scale graph processing. Both categories of algorithms are dominated by iterative subtasks, i.e., processing steps which are...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Instant Loading for Main Memory Databases

    The eScience and big data analytics applications are facing the challenge of efficiently evaluating complex queries over vast amounts of structured text data archived in network storage solutions. To analyze such data in traditional disk-based database systems, it needs to be bulk loaded, an operation whose performance largely depends on...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Scalable Progressive Analytics on Big Data in the Cloud

    Analytics over the increasing quantity of data stored in the cloud has become very expensive, particularly due to the pay-as-you-go cloud computation model. Data scientists typically manually extract samples of increasing data size (progressive samples) using domain-specific sampling strategies for exploratory querying. This provides them with user control, repeatable semantics,...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Scalable XML Query Processing Using Parallel Pushdown Transducers

    In online social networking, network monitoring and financial applications, there is a need to query high rate streams of XML data, but methods for executing individual XPath queries on streaming XML data have not kept pace with multi-core CPUs. For data-parallel processing, a single XML stream is typically split into...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Supporting Keyword Search in Product Database: A Probabilistic Approach

    The ability to let users search for products conveniently in a product database is critical to the success of e-commerce. Although structured query languages (e.g., SQL) can be used to effectively access the product database, they are very difficult for end users to learn and use. In this paper, the authors...

    Provided By VLD Digital

  • White Papers // Aug 2013

    A Temporal Probabilistic Database Model for Information Extraction

    Temporal annotations of facts are a key component both for building a high-accuracy knowledge base and for answering queries over the resulting temporal knowledge base with high precision and recall. In this paper, the authors present a temporal-probabilistic database model for cleaning uncertain temporal facts obtained from information extraction methods....

    Provided By VLD Digital

  • White Papers // Aug 2013

    Anti-Caching: A New Approach to Database Management System Architecture

    The traditional wisdom for building disk-based relational DataBase Management Systems (DBMS) is to organize data in heavily-encoded blocks stored on disk, with a main memory block cache. In order to improve performance given high disk latency, these systems use a multi-threaded architecture with dynamic record-level locking that allows multiple transactions...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Aggregation and Ordering in Factorised Databases

    A common approach to data analysis involves understanding and manipulating succinct representations of data. In earlier work, the authors put forward a succinct representation system for relational data called factorised databases and reported on the main-memory query engine FDB for select-project-join queries on such databases. In this paper, they extend...

    Provided By VLD Digital

  • White Papers // Aug 2013

    A Data-adaptive and Dynamic Segmentation Index for Whole Matching on Time Series

    Similarity search on time series is an essential operation in many applications. In the state-of-the-art methods, such as the R-tree based methods, SAX and iSAX, time series are by default divided into equi-length segments globally, that is, all time series are segmented in the same way. Those methods then focus...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Sharing Data and Work Across Concurrent Analytical Queries

    Today's data deluge enables organizations to collect massive data, and analyze it with an ever-increasing number of concurrent queries. Traditional Data Warehouses (DW) face a challenging problem in executing this task, due to their query-centric model: each query is optimized and executed independently. This model results in high contention for...

    Provided By VLD Digital

  • White Papers // Aug 2012

    PIQL: Success-Tolerant Query Processing in the Cloud

    Newly-released web applications often succumb to a "Success Disaster," where overloaded database machines and resulting high response times destroy a previously good user experience. Unfortunately, the data independence provided by a traditional relational database system, while useful for agile development, only exacerbates the problem by hiding potentially expensive queries under...

    Provided By VLD Digital

  • White Papers // May 2012

    Efficient Subgraph Similarity Search on Large Probabilistic Graph Databases

    Many studies have sought efficient solutions for subgraph similarity search over certain (deterministic) graphs due to its wide application in many fields, including bioinformatics, social network analysis, and Resource Description Framework (RDF) data management. All these works assume that the underlying data are certain. However, in...

    Provided By VLD Digital

  • White Papers // Mar 2012

    Definition, Detection, and Recovery of Single-Page Failures, a Fourth Class of Database Failures

    The three traditional failure classes are system, media, and transaction failures. Sometimes, however, modern storage exhibits failures that differ from all of those. In order to capture and describe such cases, single-page failures are introduced as a fourth failure class. This class encompasses all failures to read a data page...

    Provided By VLD Digital

  • White Papers // Mar 2012

    Pushing the Boundaries of Crowd-Enabled Databases with Query-Driven Schema Expansion

    By incorporating human workers into the query execution process, crowd-enabled databases facilitate intelligent, social capabilities like completing missing data at query time or performing cognitive operators. But despite all their flexibility, crowd-enabled databases still maintain rigid schemas. In this paper, the authors extend crowd-enabled databases with flexible query-driven schema expansion,...

    Provided By VLD Digital

  • White Papers // Mar 2012

    Stochastic Database Cracking: Towards Robust Adaptive Indexing in Main-Memory Column-Stores

    Modern business applications and scientific databases call for inherently dynamic data storage environments. Such environments are characterized by two challenging features: they have little idle system time to devote to physical design; and there is little, if any, a priori workload knowledge, while the query and data workload keeps changing...

    Provided By VLD Digital

  • White Papers // Jan 2012

    Aggregation in Probabilistic Databases Via Knowledge Compilation

    This paper presents a query evaluation technique for positive relational algebra queries with aggregates on a representation system for probabilistic data based on the algebraic structures of semiring and semimodule. The core of the authors' evaluation technique is a procedure that compiles semimodule and semiring expressions into so-called decomposition trees,...

    Provided By VLD Digital

  • White Papers // Dec 2011

    Putting Lipstick on Pig: Enabling Database-Style Workflow Provenance

    Workflow provenance typically assumes that each module is a "black box", so that each output depends on all inputs (coarse-grained dependencies). Furthermore, it does not model the internal state of a module, which can change between repeated executions. In practice, however, an output may depend on only a small subset of...

    Provided By VLD Digital

  • White Papers // Dec 2011

    High-Performance Concurrency Control Mechanisms for Main-Memory Databases

    A database system optimized for in-memory storage can support much higher transaction rates than current systems. However, standard concurrency control methods used today do not scale to the high transaction rates achievable by such systems. In this paper, the authors introduce two efficient concurrency control methods specifically designed for main-memory...

    Provided By VLD Digital

  • White Papers // Oct 2011

    View Selection in Semantic Web Databases

    The authors consider the setting of a Semantic Web database, containing both explicit data encoded in RDF triples, and implicit data, implied by the RDF semantics. Based on a query workload, they address the problem of selecting a set of views to be materialized in the database, minimizing a combination...

    Provided By VLD Digital

  • White Papers // Sep 2011

    Fast Updates on Read-Optimized Databases Using Multi-Core CPUs

    Read-optimized columnar databases use differential updates to handle writes by maintaining a separate write-optimized delta partition, which is periodically merged with the read-optimized and compressed main partition. This merge process introduces significant overheads and unacceptable downtimes in update-intensive systems, aspiring to combine transactional and analytical workloads into one system....

    Provided By VLD Digital

  • White Papers // Apr 2012

    Distributed GraphLab: A Framework for Machine Learning in the Cloud

    While high-level data parallel frameworks, like MapReduce, simplify the design and implementation of large-scale data processing systems, they do not naturally or efficiently support many important data mining and machine learning algorithms and can lead to inefficient learning systems. To help fill this critical void, the authors introduced the GraphLab...

    Provided By VLD Digital

  • White Papers // Mar 2012

    How to Price Shared Optimizations in the Cloud

    Data-management-as-a-service systems are increasingly being used in collaborative settings, where multiple users access common datasets. Cloud providers have the choice to implement various optimizations, such as indexing or materialized views, to accelerate queries over these datasets. Each optimization carries a cost and may benefit multiple users. This creates a major...

    Provided By VLD Digital

  • White Papers // Aug 2013

    NADEEF: A Generalized Data Cleaning System

    Real-world data is dirty: more than 25% of critical data in the world's top companies is flawed. The authors present NADEEF, an extensible, generic and easy-to-deploy data cleaning system. NADEEF distinguishes between a programming interface and a core to achieve generality and extensibility. The programming interface allows users to specify...

    Provided By VLD Digital