VLD Digital

  • White Papers // Jun 2014

    ConfluxDB: Multi-Master Replication for Partitioned Snapshot Isolation Databases

    Lazy replication with Snapshot Isolation (SI) has emerged as a popular choice for distributed databases. However, lazy replication often requires execution of update transactions at one (master) site so that it is relatively easy for a total SI order to be determined for consistent installation of updates in the lazily...

    Provided By VLD Digital

  • White Papers // Jun 2014

    The Case for Personal Data-Driven Decision Making

    Data-Driven Decision Making (D3M) has shown great promise in professional pursuits such as business and government. Here, policy-makers collect and analyze data to make their operations more efficient and equitable. Progress in bringing the benefits of D3M to everyday life has been slow. For example, a student asks, "If the...

    Provided By VLD Digital

  • White Papers // Jun 2014

    NOMAD: Non-locking, stOchastic Multi-machine algorithm for Asynchronous and Decentralized matrix completion

    The authors develop an efficient parallel distributed algorithm for matrix completion, named NOMAD (Non-locking, stOchastic Multi-machine algorithm for Asynchronous and Decentralized matrix completion). NOMAD is a decentralized algorithm with non-blocking communication between processors. One of the key features of NOMAD is that the ownership of a variable is asynchronously transferred...

    Provided By VLD Digital

  • White Papers // Jun 2014

    Ibex - An Intelligent Storage Engine with Support for Advanced SQL Offloading

    Modern data appliances face severe bandwidth bottlenecks when moving vast amounts of data from storage to the query processing nodes. A possible solution to mitigate these bottlenecks is query off-loading to an intelligent storage engine, where partial or whole queries are pushed down to the storage engine. In this paper,...

    Provided By VLD Digital

  • White Papers // Jun 2014

    Concurrent Analytical Query Processing with GPUs

    In current databases, GPUs are used as dedicated accelerators to process each individual query. Sharing GPUs among concurrent queries is not supported, causing serious resource underutilization. Based on the profiling of an open-source GPU query engine running commonly used single-query data warehousing workloads, the authors observe that the utilization...

    Provided By VLD Digital

  • White Papers // May 2014

    When Data Management Systems Meet Approximate Hardware: Challenges and Opportunities

    Recently, approximate hardware designs have attracted considerable research interest in the computer architecture community. The essential idea of approximate hardware is that hardware components such as CPU, memory, and storage can trade off the accuracy of results for increased performance, reduced energy consumption, or both. The authors propose a...

    Provided By VLD Digital

  • White Papers // May 2014

    Workload Matters: Why RDF Databases Need a New Design

    The Resource Description Framework (RDF) is a standard for conceptually describing data on the Web, and SPARQL is the query language for RDF. As RDF is becoming widely utilized, RDF data management systems are being exposed to more diverse and dynamic workloads. Existing systems are workload-oblivious, and are therefore unable...

    Provided By VLD Digital

  • White Papers // May 2014

    Storage Management in AsterixDB

    Social networks, online communities, mobile devices, and instant messaging applications generate complex, unstructured data at a high rate, resulting in large volumes of data. This poses new challenges for data management systems that aim to ingest, store, index, and analyze such data efficiently. In response, the authors released the first...

    Provided By VLD Digital

  • White Papers // May 2014

    M4: A Visualization-Oriented Time Series Data Aggregation

    Visual analysis of high-volume time series data is ubiquitous in many industries, including finance, banking, and discrete manufacturing. Contemporary RDBMS-based systems for visualization of high-volume time series data have difficulty coping with the hard latency requirements and high ingestion rates of interactive visualizations. Existing solutions for lowering the volume...

    Provided By VLD Digital

  • White Papers // May 2014

    An Evaluation of the Advantages and Disadvantages of Deterministic Database Systems

    There have been several recent proposals for database system architectures that use a deterministic execution framework to process transactions. Recent proposals for deterministic database system designs argue that deterministic database systems facilitate replication since the same input can be independently sent to two different replicas without concern for replica divergence....

    Provided By VLD Digital

  • White Papers // May 2014

    Reverse k-Ranks Query

    Finding matching customers for a given product based on individual users' preferences is critical for many applications, especially in e-commerce. Recently, the reverse top-k query was proposed to return a number of customers who regard a given product as one of the k most favorite products based on a linear...

    Provided By VLD Digital

  • White Papers // May 2014

    From Data Fusion to Knowledge Fusion

    The task of data fusion is to identify the true values of data items (e.g., the true date of birth for Tom Cruise) among multiple observed values drawn from different sources (e.g., Web sites) of varying (and unknown) reliability. A recent survey has provided a detailed comparison of various fusion...

    Provided By VLD Digital

  • White Papers // May 2014

    On k-Path Covers and their Applications

    The authors introduced the k-all-path cover optimization problem with the goal of computing compact yet faithful synopses of the vertex set of road networks. Their proposed pruning algorithm provides close to optimal results in practice and was experimentally proven to be very efficient on large graphs. For the special subcase...

    Provided By VLD Digital

  • White Papers // May 2014

    Scalable Logging through Emerging Non-Volatile Memory

    Emerging byte-addressable, Non-Volatile Memory (NVM) is fundamentally changing the design principle of transaction logging. It potentially invalidates the need for flush-before-commit as log records are persistent immediately upon write. Distributed logging - a once prohibitive technique for single node systems in the DRAM era - becomes a promising solution to...

    Provided By VLD Digital

  • White Papers // May 2014

    WideTable: An Accelerator for Analytical Data Processing

    In this paper the authors present a technique called WideTable that aims to improve the speed of analytical data processing systems. A WideTable is built by denormalizing the database, and then converting complex queries into simple scans on the underlying (wide) table. To avoid the pitfalls associated with denormalization, e.g....

    Provided By VLD Digital

  • White Papers // May 2014

    The Case for Data Visualization Management Systems

    Most visualizations today are produced by retrieving data from a database and using a specialized visualization tool to render it. This decoupled approach results in significant duplication of functionality, such as aggregation and filters, and misses tremendous opportunities for cross-layer optimizations. In this paper, the authors present the case for...

    Provided By VLD Digital

  • White Papers // Apr 2014

    On Arbitrage-free Pricing for General Data Queries

    Data is a commodity. Recent research has considered the mathematical problem of setting prices for different queries over data. Ideal pricing functions need to be flexible - defined for arbitrary queries (select-project-join, aggregate, random sample, and noisy privacy-preserving queries). They should be fine-grained - a consumer should not be required...

    Provided By VLD Digital

  • White Papers // Apr 2014

    Incremental Record Linkage

    Record linkage clusters records such that each cluster corresponds to a single distinct real-world entity. It is a crucial step in data cleaning and data integration. In the big data era, the velocity of data updates is often high, quickly making previous linkage results obsolete. This paper presents an end-to-end...

    Provided By VLD Digital

  • White Papers // Mar 2014

    Splitter: Mining Fine-Grained Sequential Patterns in Semantic Trajectories

    Driven by the advance of positioning technology and the popularity of location-sharing services, semantic-enriched trajectory data have become unprecedentedly available. The sequential patterns hidden in such data, when properly defined and extracted, can greatly benefit tasks like targeted advertising and urban planning. Unfortunately, classic sequential pattern mining algorithms developed for...

    Provided By VLD Digital

  • White Papers // Mar 2014

    Towards Building Wind Tunnels for Data Center Design

    Data center design is a tedious and expensive process. Recently, this process has become even more challenging as users of cloud services expect to have guaranteed levels of availability, durability and performance. A new challenge for the service providers is to find the most cost-effective data center design and configuration...

    Provided By VLD Digital

  • White Papers // Mar 2014

    A Principled Approach to Bridging the Gap between Graph Data and their Schemas

    Although RDF graph data often come with an associated schema, recent studies have proven that real RDF data rarely conform to their perceived schemas. Since a number of data management decisions, including storage layouts, indexing, and efficient query processing, use schemas to guide the decision making, it is imperative to...

    Provided By VLD Digital

  • White Papers // Mar 2014

    String Similarity Joins: An Experimental Evaluation

    String similarity join is an important operation in data integration and cleansing that finds similar string pairs from two collections of strings. More than ten algorithms have been proposed to address this problem in the recent two decades. However, existing algorithms have not been thoroughly compared under the same experimental...

    Provided By VLD Digital

  • White Papers // Mar 2014

    An Efficient Publish/Subscribe Index for E-Commerce Databases

    Many of today's publish/subscribe (pub/sub) systems have been designed to cope with a large volume of subscriptions and high event arrival rate (velocity). However, in many novel applications (such as e-commerce), there is an increasing variety of items, each with different attributes. This leads to a very high-dimensional and sparse...

    Provided By VLD Digital

  • White Papers // Mar 2014

    Calibrating Data to Sensitivity in Private Data Analysis

    The authors present an approach to differentially private computation in which one does not scale up the magnitude of noise for challenging queries, but rather scales down the contributions of challenging records. While scaling down all records uniformly is equivalent to scaling up the noise magnitude, they show that scaling...

    Provided By VLD Digital

  • White Papers // Feb 2014

    Rank Join Queries in NoSQL Databases

    Cloud stores have become the storage of choice for a large variety of big data producers, consumers, and managers (e.g., Twitter, Facebook, Google, Amazon, etc.). For many modern Big Data applications, RDBMSs were found lacking, particularly with respect to scalability (in terms of number of data items, users, operations per...

    Provided By VLD Digital

  • White Papers // Feb 2014

    Lightweight Indexing of Observational Data in Log-Structured Storage

    Huge amounts of data are being generated by sensing devices every day, recording the status of objects and the environment. Such observational data is widely used in scientific research. As the capabilities of sensors keep improving, the data produced are drastically expanding in precision and quantity, making it a write-intensive...

    Provided By VLD Digital

  • White Papers // Feb 2014

    GRAMI: Frequent Subgraph and Pattern Mining in a Single Large Graph

    Mining frequent subgraphs is an important operation on graphs; it is defined as finding all subgraphs that appear frequently in a database according to a given frequency threshold. Most existing work assumes a database of many small graphs, but modern applications, such as social networks, citation graphs, or protein-protein interactions...

    Provided By VLD Digital

  • White Papers // Feb 2014

    Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML

    Large-scale data analytics have become an integral part of online services, enterprise data management, system management, and scientific applications in order to gain value from huge amounts of collected data. Finding interesting unknown facts and patterns often requires analyzing the full data set instead of applying sampling techniques. Recent approaches...

    Provided By VLD Digital

  • White Papers // Feb 2014

    epiC: an Extensible and Scalable System for Processing Big Data

    The big data problem is characterized by the so called 3V features: Volume - a huge amount of data, Velocity - a high data ingestion rate, and Variety - a mix of structured data, semi-structured data, and unstructured data. The state-of-the-art solutions to the big data problem are largely based...

    Provided By VLD Digital

  • White Papers // Feb 2014

    Optimizing Graph Algorithms on Pregel-Like Systems

    The authors study the problem of implementing graph algorithms efficiently on Pregel-like systems, which can be surprisingly challenging. Standard graph algorithms in this setting can incur unnecessary inefficiencies such as slow convergence or high communication or computation cost, typically due to structural properties of the input graphs such as large...

    Provided By VLD Digital

  • White Papers // Feb 2014

    Schemaless and Structureless Graph Querying

    Querying complex graph databases such as knowledge graphs is a challenging task for non-professional users. Due to their complex schemas and variational information descriptions, it becomes very hard for users to formulate a query that can be properly processed by the existing systems. The authors argue that for a user-friendly...

    Provided By VLD Digital

  • White Papers // Feb 2014

    Toward Computational Fact-Checking

    In this paper, the authors have shown how to turn fact-checking into a computational problem. Interestingly, by regarding claims as queries with parameters, they can check them - not just for correctness, but more importantly, for more subtle measures of quality - by perturbing their parameters. This observation leads the...

    Provided By VLD Digital

  • White Papers // Jan 2014

    Optimizing Graph Algorithms on Pregel-like Systems

    The authors study the problem of implementing graph algorithms efficiently on Pregel-like systems, which can be surprisingly challenging. Standard graph algorithms in this setting can incur unnecessary inefficiencies such as slow convergence or high communication or computation cost, typically due to structural properties of the input graphs such as large...

    Provided By VLD Digital

  • White Papers // Jan 2014

    Edelweiss: Automatic Storage Reclamation for Distributed Programming

    Event Log Exchange (ELE) is a common programming pattern based on immutable state and messaging. ELE sidesteps traditional challenges in distributed consistency, at the expense of introducing new challenges in designing space reclamation protocols to avoid consuming unbounded storage. The authors introduce Edelweiss, a sublanguage of Bloom that provides an...

    Provided By VLD Digital

  • White Papers // Jan 2014

    Shared Workload Optimization

    As a result of increases in both the query load and the data managed, as well as changes in hardware architecture (multi-core), the last years have seen a shift from query-at-a-time approaches towards Shared Work (SW) systems where queries are executed in groups. Such groups share operators like scans and...

    Provided By VLD Digital

  • White Papers // Jan 2014

    Support the Data Enthusiast: Challenges for Next-Generation Data-Analysis Systems

    The authors present a vision of next-generation visual analytics services. They argue that these services should have three related capabilities: support visual and interactive data exploration as they do today, but also suggest relevant data to enrich visualizations, and facilitate the integration and cleaning of that data. Most importantly, they...

    Provided By VLD Digital

  • White Papers // Jan 2014

    A Provenance Framework for Data-Dependent Process Analysis

    A Data-Dependent Process (DDP) models an application whose control flow is guided by a finite state machine, as well as by the state of an underlying database. DDPs are commonly found, e.g., in e-commerce. In this paper, the authors develop a framework supporting the use of provenance in static (temporal)...

    Provided By VLD Digital

  • White Papers // Jan 2014

    Tracking Entities in the Dynamic World: A Fast Algorithm for Matching Temporal Records

    Identifying records referring to the same real world entity over time enables longitudinal data analysis. However, difficulties arise from the dynamic nature of the world: the entities described by a temporal data set often evolve their states over time. While the state of the art approach to temporal entity matching...

    Provided By VLD Digital

  • White Papers // Dec 2013

    MaaT: Effective and Scalable Coordination of Distributed Transactions in the Cloud

    The past decade has witnessed an increasing adoption of cloud database technology, which provides better scalability, availability, and fault-tolerance via transparent partitioning and replication, and automatic load balancing and fail-over. However, only a small number of cloud databases provide strong consistency guarantees for distributed transactions, despite decades of research on...

    Provided By VLD Digital

  • White Papers // Dec 2013

    A Data and Workload-Aware Algorithm for Range Queries Under Differential Privacy

    The authors describe a new algorithm for answering a given set of range queries under differential privacy which often achieves substantially lower error than competing methods. Their algorithm satisfies differential privacy by adding noise that is adapted to the input data and to the given query set. They first privately...

    Provided By VLD Digital

  • White Papers // Feb 2013

    Joint Entity Resolution on Multiple Datasets

    Entity Resolution (ER) is the problem of identifying which records in a database represent the same entity. Often, records of different types are involved (e.g., authors, publications, institutions and venues), and resolving records of one type can impact the resolution of other types of records. In this paper the authors...

    Provided By VLD Digital

  • White Papers // Sep 2011

    HyperLocal, Directions-Based Ranking of Places

    Studies find that at least 20% of web queries have local intent; and the fraction of queries with local intent that originate from mobile properties may be twice as high. The emergence of standardized support for location providers in web browsers, as well as of providers of accurate locations, enables...

    Provided By VLD Digital

  • White Papers // Aug 2009

    Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs

    Join is an important database operation. As computer architectures evolve, the best join algorithm may change hands. This paper reexamines two popular join algorithms - hash join and sort-merge join - to determine if the latest computer architecture trends shift the tide that has favored hash join for many years....

    Provided By VLD Digital

  • White Papers // Aug 2012

    SWORS: A System for the Efficient Retrieval of Relevant Spatial Web Objects

    With the proliferation of geo-positioning, e.g., by means of GPS or systems that exploit the wireless communication infrastructure, accurate user location is increasingly available. Spatial web objects that possess both a geographical location and a textual description are gaining in prevalence. This gives prominence to spatial keyword queries that exploit...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Scalable K-Means++

    Over half a century old and showing no signs of aging, k-means remains one of the most popular data processing algorithms. As is well-known, a proper initialization of k-means is crucial for obtaining a good final solution. The recently proposed k-means++ initialization algorithm achieves this, obtaining an initial set of...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Capturing Topology in Graph Pattern Matching

    Graph pattern matching is often defined in terms of subgraph isomorphism, an NP-complete problem. To lower its complexity, various extensions of graph simulation have been considered instead. These extensions allow pattern matching to be conducted in cubic-time. However, they fall short of capturing the topology of data graphs, i.e., graphs...

    Provided By VLD Digital

  • White Papers // Aug 2012

    RTED: A Robust Algorithm for the Tree Edit Distance

    The authors consider the classical tree edit distance between ordered labeled trees, which is defined as the minimum-cost sequence of node edit operations that transform one tree into another. The state-of-the-art solutions for the tree edit distance are not satisfactory. The main competitors in the field either have optimal worst-case...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Type-Based Detection of XML Query-Update Independence

    In this paper, the authors present a novel static analysis technique to detect XML query-update independence, in the presence of a schema. Rather than types, the authors' system infers chains of types. Each chain represents a path that can be traversed on a valid document during query/update evaluation. The resulting...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Real Time Discovery of Dense Clusters in Highly Dynamic Graphs: Identifying Real World Events in Highly Dynamic Environments

    Due to their real time nature, microblog streams are a rich source of dynamic information, for example, about emerging events. Existing techniques for discovering such events from a microblog stream in real time (such as Twitter trending topics), have several lacunae when used for discovering emerging events; extant graph based...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Shortest Path Computation with No Information Leakage

    Shortest path computation is one of the most common queries in Location-Based Services (LBSs). Although particularly useful, such queries raise serious privacy concerns. Exposing to a (potentially untrusted) LBS the client's position and the user destination may reveal personal information, such as social habits, health condition, shopping preferences, lifestyle choices,...

    Provided By VLD Digital

  • White Papers // Aug 2012

    V-SMART-Join: A Scalable MapReduce Framework for All-Pair Similarity Joins of Multisets and Vectors

    In this paper, the authors propose V-SMART-Join, a scalable MapReduce-based framework for discovering all pairs of similar entities. The V-SMART-Join framework is applicable to sets, multisets, and vectors. V-SMART-Join is motivated by the observed skew in the underlying distributions of Internet traffic, and is a family of 2-stage algorithms, where...

    Provided By VLD Digital

  • White Papers // Aug 2012

    WETSUIT: An Efficient Mashup Tool for Searching and Fusing Web Entities

    The authors demonstrate a new powerful mashup tool called WETSUIT (Web EnTity Search and fUsIon Tool) to search and integrate web data from diverse sources and domain-specific entity search engines. WETSUIT supports adaptive search strategies to query sets of relevant entities with a minimum of communication overhead. Mashups can...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Developing and Analyzing XSDs through BonXai

    BonXai is a versatile schema specification language expressively equivalent to XML Schema. It is not intended as a replacement for XML schema but it can serve as an additional, user-friendly front-end. It offers a simple way and a lightweight syntax to specify the context of elements based on regular expressions...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Hum-a-song: A Subsequence Matching with Gaps-Range-Tolerances Query-By-Humming System

    The authors present "Hum-a-song", a system built for music retrieval, and particularly for the Query-By-Humming (QBH) application. According to QBH, the user is able to hum a part of a song that she recalls and would like to learn what this song is, or find other songs similar to it...

    Provided By VLD Digital

  • White Papers // Aug 2012

    DISKs: A System for Distributed Spatial Group Keyword Search on Road Networks

    Queries (e.g., shortest path) on road networks have been extensively studied. Although most of the existing query processing approaches are designed for centralized environments, there is a growing need to handle queries on road networks in distributed environments due to the increasing query workload and the challenge of querying large...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Generating Exact- and Ranked Partially-Matched Answers to Questions in Advertisements

    Taking advantage of the Web, many advertisement (ads for short) websites, which aspire to increase clients' transactions and thus profits, offer searching tools which allow users to post keyword queries to capture their information needs or invoke form-based interfaces to create queries by selecting search options, such as a price...

    Provided By VLD Digital

  • White Papers // Aug 2012

    REX: Explaining Relationships between Entity Pairs

    Knowledge bases of entities and relations (either constructed manually or automatically) are behind many real world search engines, including those at Yahoo, Microsoft, and Google. Those knowledge bases can be viewed as graphs with nodes representing entities and edges representing (primary) relationships, and various studies have been conducted on how...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Understanding and Managing Cascades on Large Graphs

    How do contagions spread in population networks? Which group should one market to, for maximizing product penetration? Will a given YouTube video go viral? Who are the best people to vaccinate? What happens when two products compete? This paper aims to provide an intuitive and concise overview of most...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Indexing the Earth Mover's Distance Using Normal Distributions

    Querying uncertain data sets (represented as probability distributions) presents many challenges due to the large amount of data involved and the difficulties comparing uncertainty between distributions. The Earth Mover's Distance (EMD) has increasingly been employed to compare uncertain data due to its ability to effectively capture the differences between two...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Bayesian Locality Sensitive Hashing for Fast Similarity Search

    Given a collection of objects and an associated similarity measure, the all-pairs similarity search problem asks one to find all pairs of objects with similarity greater than a certain user-specified threshold. Locality-Sensitive Hashing (LSH) based methods have become a very popular approach for this problem. However, most such methods...

    Provided By VLD Digital

  • White Papers // Mar 2012

    MonetDB/DataCell: Online Analytics in a Streaming Column-Store

    In DataCell, the authors design streaming functionalities in a modern relational database kernel which targets big data analytics. This includes exploitation of both its storage/execution engine and its optimizer infrastructure. They investigate the opportunities and challenges that arise with such a direction and they show that it carries significant advantages...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Concurrency Control for Adaptive Indexing

    Adaptive indexing initializes and optimizes indexes incrementally, as a side effect of query processing. The goal is to achieve the benefits of indexes while hiding or minimizing the costs of index creation. However, index-optimizing side effects seem to turn read-only queries into update transactions that might, for example, create lock...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Fast Updates on Read-Optimized Databases Using Multi-Core CPUs

    Read-optimized columnar databases use differential updates to handle writes by maintaining a separate write-optimized delta partition which is periodically merged with the read-optimized and compressed main partition. This merge process introduces significant overheads and unacceptable downtimes in update intensive systems, aspiring to combine transactional and analytical workloads into one system....

    Provided By VLD Digital

  • White Papers // Aug 2013

    Hadoop's Adolescence

    The authors analyze Hadoop workloads from three different research clusters from a user-centric perspective. The goal is to better understand data scientists' use of the system and how well the use of the system matches its design. Their analysis suggests that Hadoop usage is still in its adolescence. They see...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Piranha: Optimizing Short Jobs in Hadoop

    Cluster computing has emerged as a key parallel processing platform for large scale data. All major internet companies use it as their major central processing platform. One of cluster computing's most popular examples is MapReduce and its open source implementation Hadoop. These systems were originally designed for batch and massive-scale...

    Provided By VLD Digital

  • White Papers // Sep 2010

    FlashStore: High Throughput Persistent Key-Value Store

    The authors present FlashStore, a high throughput persistent key value store that uses flash memory as a non-volatile cache between RAM and hard disk. FlashStore is designed to store the working set of key-value pairs on flash and use one flash read per key lookup. As the working set changes...

    Provided By VLD Digital

  • White Papers // Sep 2013

    Horton+: A Distributed System for Processing Declarative Reachability Queries over Partitioned Graphs

    Horton+ is a graph query processing system that executes declarative reachability queries on a partitioned attributed multi-graph. It employs a query language, query optimizer, and a distributed execution engine. The query language expresses declarative reachability queries, and supports closures and predicates on node and edge attributes to match graph paths....

    Provided By VLD Digital

  • White Papers // Aug 2013

    ISLABEL: an Independent-Set based Labeling Scheme for Point-to-Point Distance Querying

    The authors study the problem of computing shortest path or distance between two query vertices in a graph, which has numerous important applications. Quite a number of indexes have been proposed to answer such distance queries. However, all of these indexes can only process graphs of size barely up to...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Streaming Algorithms for k-core Decomposition

    A k-core of a graph is a maximal connected subgraph in which every vertex is connected to at least k vertices in the subgraph. A k-core decomposition is often used in large-scale network analysis, such as community detection, protein function prediction, visualization, and solving NP-hard problems on real networks efficiently,...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Efficient Error-tolerant Query Autocompletion

    Query autocompletion is an important feature that saves users the keystrokes of typing an entire query. In this paper, the authors study the problem of query autocompletion that tolerates errors in users' input using edit distance constraints. Previous approaches index data strings in a trie, and continuously maintain all the prefixes...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Spatial Keyword Query Processing: An Experimental Evaluation

    Geo-textual indices play an important role in spatial keyword querying. The existing geo-textual indices have not been compared systematically under the same experimental framework. This makes it difficult to determine which indexing technique best supports specific functionality. The authors provide an all-around survey of 12 state-of-the-art geo-textual indices. They propose...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Fast Iterative Graph Computation with Block Updates

    Scaling iterative graph processing applications to large graphs is an important problem. Performance is critical, as data scientists need to execute graph programs many times with varying parameters. The need for a high-level, high-performance programming model has inspired much research on graph programming frameworks. In this paper, the authors show...

    Provided By VLD Digital

  • White Papers // Sep 2013

    Parallel Computation of Skyline and Reverse Skyline Queries Using MapReduce

    The skyline operator and its variants, such as dynamic skyline and reverse skyline operators, have attracted considerable attention recently due to their broad applications. However, computing such operators is challenging today, since applications increasingly deal with big data. For such data-intensive applications, the...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Simple, Fast, and Scalable Reachability Oracle

    In this paper, by introducing two simple, elegant, and effective labeling approaches, Hierarchical Labeling and Distribution Labeling, the authors are able to resolve an important open question in reachability computation: the reachability oracle can be a powerful tool to handle real, very large graphs. Their experimental results demonstrate that they...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Understanding Hierarchical Methods for Differentially Private Histograms

    In recent years, many approaches to publishing histograms under differential privacy have been proposed. Several approaches rely on constructing tree structures in order to decrease the error when answering large range queries. In this paper, the authors examine the factors affecting the accuracy of hierarchical approaches by studying the Mean Squared...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Discovering Denial Constraints

    Integrity Constraints (ICs) provide a valuable tool for enforcing correct application semantics. However, designing ICs requires experts and time. Proposals for automatic discovery have been made for some formalisms, such as functional dependencies and their extension conditional functional dependencies. Unfortunately, these dependencies cannot express many common business rules. For example,...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Multi-Tuple Deletion Propagation: Approximations and Complexity

    In this paper, the authors study the computational complexity of the classic problem of deletion propagation in a relational database, where tuples are deleted from the base relations in order to realize a desired deletion of tuples from the view. Such an operation may result in a (sometimes unavoidable) side...

    Provided By VLD Digital

  • White Papers // Aug 2013

    Supporting Distributed Feed-Following Apps over Edge Devices

    In feed-following applications such as Twitter and Facebook, users (consumers) follow a large number of other users (producers) to get personalized feeds, generated by blending producers' feeds. With the proliferation of cloud-connected smart edge devices such as smartphones, producers and consumers of many feed-following applications reside on edge devices and...

    Provided By VLD Digital