VLD Digital

  • White Papers // Jun 2014

    ConfluxDB: Multi-Master Replication for Partitioned Snapshot Isolation Databases

    Lazy replication with Snapshot Isolation (SI) has emerged as a popular choice for distributed databases. However, lazy replication often requires execution of update transactions at one (master) site so that it is relatively easy for a total SI order to be determined for consistent installation of updates in the lazily...

    Provided By VLD Digital

  • White Papers // Jun 2014

    The Case for Personal Data-Driven Decision Making

    Data-Driven Decision Making (D3M) has shown great promise in professional pursuits such as business and government. Here, policy-makers collect and analyze data to make their operations more efficient and equitable. Progress in bringing the benefits of D3M to everyday life has been slow. For example, a student asks, "If the...

    Provided By VLD Digital

  • White Papers // Jun 2014

    NOMAD: Non-locking, stOchastic Multi-machine algorithm for Asynchronous and Decentralized matrix completion

    The authors develop an efficient parallel distributed algorithm for matrix completion, named NOMAD (Non-locking, stOchastic Multi-machine algorithm for Asynchronous and Decentralized matrix completion). NOMAD is a decentralized algorithm with non-blocking communication between processors. One of the key features of NOMAD is that the ownership of a variable is asynchronously transferred...

    Provided By VLD Digital

  • White Papers // Jun 2014

    Ibex - An Intelligent Storage Engine with Support for Advanced SQL Offloading

    Modern data appliances face severe bandwidth bottlenecks when moving vast amounts of data from storage to the query processing nodes. A possible solution to mitigate these bottlenecks is query off-loading to an intelligent storage engine, where partial or whole queries are pushed down to the storage engine. In this paper,...

    Provided By VLD Digital

  • White Papers // Jun 2014

    Concurrent Analytical Query Processing with GPUs

    In current databases, GPUs are used as dedicated accelerators to process each individual query. Sharing GPUs among concurrent queries is not supported, causing serious resource underutilization. Based on the profiling of an open-source GPU query engine running commonly used single-query data warehousing workloads, the authors observe that the utilization...

    Provided By VLD Digital

  • White Papers // May 2014

    From Data Fusion to Knowledge Fusion

    The task of data fusion is to identify the true values of data items (e.g., the true date of birth for Tom Cruise) among multiple observed values drawn from different sources (e.g., Web sites) of varying (and unknown) reliability. A recent survey has provided a detailed comparison of various fusion...

    Provided By VLD Digital

  • White Papers // May 2014

    On k-Path Covers and their Applications

    The authors introduced the k-all-path cover optimization problem with the goal of computing compact yet faithful synopses of the vertex set of road networks. Their proposed pruning algorithm provides close to optimal results in practice and was experimentally proven to be very efficient on large graphs. For the special subcase...

    Provided By VLD Digital

  • White Papers // May 2014

    Scalable Logging through Emerging Non-Volatile Memory

    Emerging byte-addressable, Non-Volatile Memory (NVM) is fundamentally changing the design principle of transaction logging. It potentially invalidates the need for flush-before-commit as log records are persistent immediately upon write. Distributed logging - a once prohibitive technique for single node systems in the DRAM era - becomes a promising solution to...

    Provided By VLD Digital

  • White Papers // May 2014

    When Data Management Systems Meet Approximate Hardware: Challenges and Opportunities

    Recently, approximate hardware designs have attracted considerable research interest in the computer architecture community. The essential idea of approximate hardware is that hardware components such as CPU, memory and storage can trade off the accuracy of results for increased performance, reduced energy consumption, or both. The authors propose a...

    Provided By VLD Digital

  • White Papers // May 2014

    Workload Matters: Why RDF Databases Need a New Design

    The Resource Description Framework (RDF) is a standard for conceptually describing data on the Web, and SPARQL is the query language for RDF. As RDF is becoming widely utilized, RDF data management systems are being exposed to more diverse and dynamic workloads. Existing systems are workload-oblivious, and are therefore unable...

    Provided By VLD Digital

  • White Papers // May 2014

    Storage Management in AsterixDB

    Social networks, online communities, mobile devices, and instant messaging applications generate complex, unstructured data at a high rate, resulting in large volumes of data. This poses new challenges for data management systems that aim to ingest, store, index, and analyze such data efficiently. In response, the authors released the first...

    Provided By VLD Digital

  • White Papers // May 2014

    M4: A Visualization-Oriented Time Series Data Aggregation

    Visual analysis of high-volume time series data is ubiquitous in many industries, including finance, banking, and discrete manufacturing. Contemporary RDBMS-based systems for visualization of high-volume time series data have difficulty coping with the hard latency requirements and high ingestion rates of interactive visualizations. Existing solutions for lowering the volume...

    Provided By VLD Digital

  • White Papers // May 2014

    An Evaluation of the Advantages and Disadvantages of Deterministic Database Systems

    There have been several recent proposals for database system architectures that use a deterministic execution framework to process transactions. Recent proposals for deterministic database system designs argue that deterministic database systems facilitate replication since the same input can be independently sent to two different replicas without concern for replica divergence....

    Provided By VLD Digital

  • White Papers // May 2014

    Reverse k-Ranks Query

    Finding matching customers for a given product based on individual users' preferences is critical for many applications, especially in e-commerce. Recently, the reverse top-k query was proposed to return a number of customers who regard a given product as one of the k most favorite products based on a linear...

    Provided By VLD Digital

  • White Papers // May 2014

    WideTable: An Accelerator for Analytical Data Processing

    In this paper the authors present a technique called WideTable that aims to improve the speed of analytical data processing systems. A WideTable is built by denormalizing the database, and then converting complex queries into simple scans on the underlying (wide) table. To avoid the pitfalls associated with denormalization, e.g....

    Provided By VLD Digital

  • White Papers // May 2014

    The Case for Data Visualization Management Systems

    Most visualizations today are produced by retrieving data from a database and using a specialized visualization tool to render it. This decoupled approach results in significant duplication of functionality, such as aggregation and filters, and misses tremendous opportunities for cross-layer optimizations. In this paper, the authors present the case for...

    Provided By VLD Digital

  • White Papers // Apr 2014

    On Arbitrage-free Pricing for General Data Queries

    Data is a commodity. Recent research has considered the mathematical problem of setting prices for different queries over data. Ideal pricing functions need to be flexible - defined for arbitrary queries (select-project-join, aggregate, random sample, and noisy privacy-preserving queries). They should be fine-grained - a consumer should not be required...

    Provided By VLD Digital

  • White Papers // Apr 2014

    Incremental Record Linkage

    Record linkage clusters records such that each cluster corresponds to a single distinct real-world entity. It is a crucial step in data cleaning and data integration. In the big data era, the velocity of data updates is often high, quickly making previous linkage results obsolete. This paper presents an end-to-end...

    Provided By VLD Digital

  • White Papers // Mar 2014

    Splitter: Mining Fine-Grained Sequential Patterns in Semantic Trajectories

    Driven by the advance of positioning technology and the popularity of location-sharing services, semantic-enriched trajectory data have become unprecedentedly available. The sequential patterns hidden in such data, when properly defined and extracted, can greatly benefit tasks like targeted advertising and urban planning. Unfortunately, classic sequential pattern mining algorithms developed for...

    Provided By VLD Digital

  • White Papers // Mar 2014

    Towards Building Wind Tunnels for Data Center Design

    Data center design is a tedious and expensive process. Recently, this process has become even more challenging as users of cloud services expect to have guaranteed levels of availability, durability and performance. A new challenge for the service providers is to find the most cost-effective data center design and configuration...

    Provided By VLD Digital

  • White Papers // Mar 2014

    A Principled Approach to Bridging the Gap between Graph Data and their Schemas

    Although RDF graph data often come with an associated schema, recent studies have proven that real RDF data rarely conform to their perceived schemas. Since a number of data management decisions, including storage layouts, indexing, and efficient query processing, use schemas to guide the decision making, it is imperative to...

    Provided By VLD Digital

  • White Papers // Mar 2014

    String Similarity Joins: An Experimental Evaluation

    String similarity join is an important operation in data integration and cleansing that finds similar string pairs from two collections of strings. More than ten algorithms have been proposed to address this problem in the past two decades. However, existing algorithms have not been thoroughly compared under the same experimental...

    Provided By VLD Digital

  • White Papers // Mar 2014

    An Efficient Publish/Subscribe Index for E-Commerce Databases

    Many of today's publish/subscribe (pub/sub) systems have been designed to cope with a large volume of subscriptions and high event arrival rate (velocity). However, in many novel applications (such as e-commerce), there is an increasing variety of items, each with different attributes. This leads to a very high-dimensional and sparse...

    Provided By VLD Digital

  • White Papers // Mar 2014

    Calibrating Data to Sensitivity in Private Data Analysis

    The authors present an approach to differentially private computation in which one does not scale up the magnitude of noise for challenging queries, but rather scales down the contributions of challenging records. While scaling down all records uniformly is equivalent to scaling up the noise magnitude, they show that scaling...

    Provided By VLD Digital

  • White Papers // Feb 2014

    Rank Join Queries in NoSQL Databases

    Cloud stores have become the storage of choice for a large variety of big data producers, consumers, and managers (e.g., Twitter, Facebook, Google, and Amazon). For many modern Big Data applications, RDBMSs were found lacking, particularly with respect to scalability (in terms of number of data items, users, operations per...

    Provided By VLD Digital

  • White Papers // Feb 2014

    Lightweight Indexing of Observational Data in Log-Structured Storage

    Huge amounts of data are being generated by sensing devices every day, recording the status of objects and the environment. Such observational data is widely used in scientific research. As the capabilities of sensors keep improving, the data produced are drastically expanding in precision and quantity, making it a write-intensive...

    Provided By VLD Digital

  • White Papers // Feb 2014

    GRAMI: Frequent Subgraph and Pattern Mining in a Single Large Graph

    Mining frequent subgraphs is an important operation on graphs; it is defined as finding all subgraphs that appear frequently in a database according to a given frequency threshold. Most existing work assumes a database of many small graphs, but modern applications, such as social networks, citation graphs, or protein-protein interactions...

    Provided By VLD Digital

  • White Papers // Feb 2014

    Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML

    Large-scale data analytics have become an integral part of online services, enterprise data management, system management, and scientific applications in order to gain value from huge amounts of collected data. Finding interesting unknown facts and patterns often requires analyzing the full data set instead of applying sampling techniques. Recent approaches...

    Provided By VLD Digital

  • White Papers // Feb 2014

    epiC: an Extensible and Scalable System for Processing Big Data

    The big data problem is characterized by the so-called 3V features: Volume - a huge amount of data, Velocity - a high data ingestion rate, and Variety - a mix of structured data, semi-structured data, and unstructured data. The state-of-the-art solutions to the big data problem are largely based...

    Provided By VLD Digital

  • White Papers // Feb 2014

    Optimizing Graph Algorithms on Pregel-Like Systems

    The authors study the problem of implementing graph algorithms efficiently on Pregel-like systems, which can be surprisingly challenging. Standard graph algorithms in this setting can incur unnecessary inefficiencies such as slow convergence or high communication or computation cost, typically due to structural properties of the input graphs such as large...

    Provided By VLD Digital

  • White Papers // Feb 2014

    Schemaless and Structureless Graph Querying

    Querying complex graph databases such as knowledge graphs is a challenging task for non-professional users. Due to their complex schemas and variational information descriptions, it becomes very hard for users to formulate a query that can be properly processed by the existing systems. The authors argue that for a user-friendly...

    Provided By VLD Digital

  • White Papers // Feb 2014

    Toward Computational Fact-Checking

    In this paper, the authors have shown how to turn fact-checking into a computational problem. Interestingly, by regarding claims as queries with parameters, they can check them - not just for correctness, but more importantly, for more subtle measures of quality - by perturbing their parameters. This observation leads the...

    Provided By VLD Digital

  • White Papers // Jan 2014

    Optimizing Graph Algorithms on Pregel-Like Systems

    The authors study the problem of implementing graph algorithms efficiently on Pregel-like systems, which can be surprisingly challenging. Standard graph algorithms in this setting can incur unnecessary inefficiencies such as slow convergence or high communication or computation cost, typically due to structural properties of the input graphs such as large...

    Provided By VLD Digital

  • White Papers // Jan 2014

    Shared Workload Optimization

    As a result of increases in both the query load and the data managed, as well as changes in hardware architecture (multi-core), recent years have seen a shift from query-at-a-time approaches towards Shared Work (SW) systems where queries are executed in groups. Such groups share operators like scans and...

    Provided By VLD Digital

  • White Papers // Jan 2014

    Support the Data Enthusiast: Challenges for Next-Generation Data-Analysis Systems

    The authors present a vision of next-generation visual analytics services. They argue that these services should have three related capabilities: support visual and interactive data exploration as they do today, but also suggest relevant data to enrich visualizations, and facilitate the integration and cleaning of that data. Most importantly, they...

    Provided By VLD Digital

  • White Papers // Jan 2014

    A Provenance Framework for Data-Dependent Process Analysis

    A Data-Dependent Process (DDP) models an application whose control flow is guided by a finite state machine, as well as by the state of an underlying database. DDPs are commonly found, e.g., in e-commerce. In this paper, the authors develop a framework supporting the use of provenance in static (temporal)...

    Provided By VLD Digital

  • White Papers // Jan 2014

    Tracking Entities in the Dynamic World: A Fast Algorithm for Matching Temporal Records

    Identifying records referring to the same real-world entity over time enables longitudinal data analysis. However, difficulties arise from the dynamic nature of the world: the entities described by a temporal data set often evolve their states over time. While the state-of-the-art approach to temporal entity matching...

    Provided By VLD Digital

  • White Papers // Jan 2014

    Edelweiss: Automatic Storage Reclamation for Distributed Programming

    Event Log Exchange (ELE) is a common programming pattern based on immutable state and messaging. ELE sidesteps traditional challenges in distributed consistency, at the expense of introducing new challenges in designing space reclamation protocols to avoid consuming unbounded storage. The authors introduce Edelweiss, a sublanguage of Bloom that provides an...

    Provided By VLD Digital

  • White Papers // Dec 2013

    Exemplar Queries: Give me an Example of What You Need

    Search engines are continuously employing advanced techniques that aim to capture user intentions and provide results that go beyond the data that simply satisfy the query conditions. Examples include personalized results, related searches, similarity search, and popular and relaxed queries. In this paper, the authors introduce a novel query paradigm...

    Provided By VLD Digital

  • White Papers // Dec 2013

    Reverse Top-k Search using Random Walk with Restart

    With the increasing popularity of social networks, large volumes of graph data are becoming available. Large graphs are also derived by structure extraction from relational, text, or scientific data (e.g., relational tuple networks, citation graphs, ontology networks, protein-protein interaction graphs). Node-to-node proximity is the key building block for many graph-based...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Compacting Transactional Data in Hybrid OLTP&OLAP Databases

    Growing main memory sizes have facilitated database management systems that keep the entire database in main memory. The drastic performance improvements that came along with these in-memory systems have made it possible to reunite the two areas of OnLine Transaction Processing (OLTP) and OnLine Analytical Processing (OLAP): an emerging class...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Ranking Large Temporal Data

    Ranking temporal data has not been studied until recently, even though ranking is an important operator (being promoted as a first-class citizen) in database systems. However, only the instant top-k queries on temporal data were studied, where objects with the k highest scores at a query time instance t...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Measuring Two-Event Structural Correlations on Graphs

    Real-life graphs usually have various kinds of events happening on them, e.g., product purchases in online social networks and intrusion alerts in computer networks. The occurrences of events on the same graph could be correlated, exhibiting either attraction or repulsion. Such structural correlations can reveal important relationships between different events....

    Provided By VLD Digital

  • White Papers // Aug 2012

    Publishing Microdata with a Robust Privacy Guarantee

    Organizations, such as government agencies or hospitals, regularly release microdata (e.g., census data or medical records) to serve benign purposes. However, such data can inadvertently reveal sensitive personal information to malicious adversaries. Experience has shown that merely concealing explicit identifying attributes, such as name or phone number, does not suffice...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Injecting Uncertainty in Graphs for Identity Obfuscation

    Data collected nowadays by social-networking applications create fascinating opportunities for building novel services, as well as expanding the authors' understanding about social structures and their dynamics. Unfortunately, publishing social-network graphs is considered an ill-advised practice due to privacy concerns. To alleviate this problem, several anonymization methods have been proposed, aiming...

    Provided By VLD Digital

  • White Papers // Aug 2012

    PrivBasis: Frequent Itemset Mining with Differential Privacy

    The discovery of frequent itemsets can serve valuable economic and research purposes. Releasing discovered frequent itemsets, however, presents privacy challenges. In this paper, the authors study the problem of how to perform frequent itemset mining on transaction databases while satisfying differential privacy. They propose an approach, called PrivBasis, which leverages...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Queries with Guarded Negation

    A well-established and fundamental insight in database theory is that negation (also known as complementation) tends to make queries difficult to process and difficult to reason about. Many basic problems are decidable and admit practical algorithms in the case of unions of conjunctive queries, but become difficult or even undecidable...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Performance Guarantees for Distributed Reachability Queries

    In the real world a graph is often fragmented and distributed across different sites. This highlights the need for evaluating queries on distributed graphs. This paper proposes distributed evaluation algorithms for three classes of queries: reachability for determining whether one node can reach another, bounded reachability for deciding whether there...

    Provided By VLD Digital

  • White Papers // Aug 2012

    REX: Recursive, Delta-Based Data-Centric Computation

    In today's Web and social network environments, query workloads include ad hoc and OLAP queries, as well as iterative algorithms that analyze data relationships (e.g., link analysis, clustering, learning). Modern DBMSs support ad hoc and OLAP queries, but most are not robust enough to scale to large clusters. Conversely, "Cloud"...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Optimization of Analytic Window Functions

    Analytic functions represent the state-of-the-art way of performing complex data analysis within a single SQL statement. In particular, an important class of analytic functions that has been frequently used in commercial systems to support OLAP and decision support applications is the class of window functions. A window function returns for...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Stubby: A Transformation-based Optimizer for MapReduce Workflows

    There is a growing trend of performing analysis on large datasets using workflows composed of MapReduce jobs connected through producer-consumer relationships based on data. This trend has spurred the development of a number of interfaces, ranging from program-based to query-based interfaces, for generating MapReduce workflows. Studies have shown that the gap in...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Efficient Multi-way Theta-Join Processing Using MapReduce

    Multi-way Theta-join queries are powerful in describing complex relations and are therefore widely employed in practice. However, existing solutions from traditional distributed and parallel databases for multi-way Theta-join queries cannot be easily extended to fit a shared-nothing distributed computing paradigm, which is proven to be able to support OLAP applications...

    Provided By VLD Digital

  • White Papers // Aug 2012

    hStorage-DB: Heterogeneity-aware Data Management to Exploit the Full Capability of Hybrid Storage Systems

    As storage systems become increasingly heterogeneous and complex, they add burdens on DBAs, causing suboptimal performance even after considerable human effort. In addition, existing monitoring-based storage management by access pattern detection has difficulty handling workloads that are highly dynamic and concurrent. To achieve high...

    Provided By VLD Digital

  • White Papers // Aug 2012

    CDAS: A Crowdsourcing Data Analytics System

    Some complex problems, such as image tagging and natural language processing, are very challenging for computers, where even state-of-the-art technology is not yet able to provide satisfactory accuracy. Therefore, rather than relying solely on developing new and better algorithms to handle such tasks, the authors look to the crowdsourcing solution: employing human...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Early Accurate Results for Advanced Analytics on MapReduce

    Approximate results based on samples often provide the only way in which advanced analytical applications on very massive data sets can satisfy their time and resource constraints. Unfortunately, methods and tools for the computation of accurate early results are currently not supported in MapReduce-oriented systems although these are intended for...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Efficient Processing of k Nearest Neighbor Joins using MapReduce

    k Nearest Neighbor join (kNN join), designed to find k nearest neighbors from a dataset S for every object in another dataset R, is a primitive operation widely adopted by many data mining applications. As a combination of the k nearest neighbor query and the join operation, kNN join is...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Sketch-based Querying of Distributed Sliding-Window Data Streams

    While traditional data-management systems focus on evaluating single, ad-hoc queries over static data sets in a centralized setting, several emerging applications require (possibly, continuous) answers to queries on dynamic data that is widely distributed and constantly updated. Furthermore, such query answers often need to discount data that is "stale", and...

    Provided By VLD Digital

  • White Papers // Aug 2012

    DBToaster: Higher-order Delta Processing for Dynamic, Frequently Fresh Views

    Applications ranging from algorithmic trading to scientific data analysis require real-time analytics based on views over databases that change at very high rates. Such views have to be kept fresh at low maintenance cost and latencies. At the same time, these views have to support classical SQL, rather than window...

    Provided By VLD Digital

  • White Papers // Aug 2012

    SODA: Generating SQL for Business Users

    The purpose of data warehouses is to enable business analysts to make better decisions. Over the years the technology has matured and data warehouses have become extremely successful. As a consequence, more and more data has been added to the data warehouses and their schemas have become increasingly complex. These...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Minuet: A Scalable Distributed Multiversion B-Tree

    Data management systems have traditionally been designed to support either long-running analytics queries or short-lived transactions, but an increasing number of applications need both. For example, online games, socio-mobile apps, and e-commerce sites need to not only maintain operational state, but also analyze that data quickly to make predictions and...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Efficient Reachability Query Evaluation in Large Spatiotemporal Contact Datasets

    With the advent of reliable positioning technologies and prevalence of location-based services, it is now feasible to accurately study the propagation of items such as infectious viruses, sensitive information pieces, and malware through a population of moving objects, e.g., individuals, mobile devices, and vehicles. In such application scenarios, an item...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Truss Decomposition in Massive Networks

    The k-truss is a type of cohesive subgraph proposed recently for the study of networks. While the problem of computing most cohesive subgraphs is NP-hard, there exists a polynomial time algorithm for computing k-truss. Compared with k-core which is also efficient to compute, k-truss represents the "core" of a k-core...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Efficient Subgraph Matching on Billion Node Graphs

    The ability to handle large scale graph data is crucial to an increasing number of applications. Much work has been dedicated to supporting basic graph operations such as subgraph matching, reachability, regular expression matching, etc. In many cases, graph indices are employed to speed up query processing. Typically, most indices...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Probabilistically Bounded Staleness for Practical Partial Quorums

    Data store replication results in a fundamental trade-off between operation latency and data consistency. In this paper, the authors examine this trade-off in the context of quorum-replicated data stores. Under partial or non-strict quorum replication, a data store waits for responses from a subset of replicas before answering a query,...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Optimizing I/O for Big Array Analytics

    Big array analytics is becoming indispensable in answering important scientific and business questions. Most analysis tasks consist of multiple steps, each making one or multiple passes over the arrays to be analyzed and generating intermediate results. In the big data setting, I/O optimization is a key to efficient analytics. In...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Learning Semantic String Transformations from Examples

    The authors address the problem of performing semantic transformations on strings, which may represent a variety of data types (or their combination) such as a column in a relational table, time, date, currency, etc. Unlike syntactic transformations, which are based on regular expressions and which interpret a string as a...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Adding Logical Operators to Tree Pattern Queries on Graph Structured Data

    As data are increasingly modeled as graphs for expressing complex relationships, the tree pattern query on graph-structured data becomes an important type of query in real-world applications. Most practical query languages, such as XQuery and SPARQL, support logical expressions using logical-AND/OR/NOT operators to define structural constraints of tree patterns. In...

    Provided By VLD Digital

  • White Papers // Aug 2012

    An Analysis of Structured Data on the Web

    In this paper, the authors analyze the nature and distribution of structured data on the Web. Web-scale information extraction, or the problem of creating structured tables using extraction from the entire web, is gathering lots of research interest. They perform a study to understand and quantify the value of Web-scale...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Comments on "Stack based Algorithms for Pattern Matching on DAGs"

    The paper "Stack based Algorithms for Pattern Matching on DAGs" generalizes the classical holistic twig join algorithms and proposes PathStackD, TwigStackD and DagStackD to evaluate path, twig and DAG pattern queries, respectively, on directed acyclic graphs. In this paper, the authors investigate the major results of that paper, pointing out several discrepancies and proposing solutions to...

    Provided By VLD Digital

  • White Papers // Aug 2012

    Querying Schemas With Access Restrictions

    The authors study verification of systems whose transitions consist of accesses to a Web-based data-source. An access is a lookup on a relation within a relational database, fixing values for a set of positions in the relation. For example, a transition can represent access to a Web form, where the...

    Provided By VLD Digital

  • White Papers // Apr 2013

    Upper and Lower Bounds on the Cost of a MapReduce Computation

    In this paper, the authors study the tradeoff between parallelism and communication cost in a map-reduce computation. For any problem that is not "embarrassingly parallel," the more finely they partition the work of the reducers so that more parallelism can be extracted, the greater will be the total communication between mappers...

    Provided By VLD Digital

  • White Papers // Apr 2013

    A Distributed Graph Engine for Web Scale RDF Data

    Much work has been devoted to supporting RDF data. But state-of-the-art systems and methods still cannot handle web scale RDF data effectively. Furthermore, many useful and general purpose graph-based operations (e.g., random walk, reachability, community discovery) on RDF data are not supported, as most existing systems store and index data...

    Provided By VLD Digital

  • White Papers // Apr 2013

    DAX: A Widely Distributed Multitenant Storage Service for DBMS Hosting

    Many applications hosted on the cloud have sophisticated data management needs that are best served by a SQL-based relational DBMS. It is not difficult to run a DBMS in the cloud, and in many cases one DBMS instance is enough to support an application's workload. However, a DBMS running in...

    Provided By VLD Digital

  • White Papers // Apr 2013

    Efficient Implementation of Generalized Quantification in Relational Query Languages

    The authors present research aimed at improving their understanding of the use and implementation of quantification in relational query languages in general and SQL in particular. In order to make their results as general as possible, they use the framework of Generalized Quantification. Generalized Quantifiers (GQs) are high-level, declarative logical...

    Provided By VLD Digital

  • White Papers // Apr 2013

    XORing Elephants: Novel Erasure Codes for Big Data

    Distributed storage systems for large clusters typically use replication to provide reliability. Recently, erasure codes have been used to reduce the large storage overhead of three-replicated systems. Reed-Solomon codes are the standard design choice and their high repair cost is often considered an unavoidable price to pay for high storage...

    Provided By VLD Digital

  • White Papers // Feb 2013

    PARAS: A Parameter Space Framework for Online Association Mining

    Association rule mining is known to be computationally intensive, yet real-time decision-making applications are increasingly intolerant of delays. In this paper, the authors introduce the parameter space model, called PARAS. PARAS enables efficient rule mining by compactly maintaining the final rulesets. The PARAS model is based on the notion of...

    Provided By VLD Digital

  • White Papers // Feb 2013

    NeMa: Fast Graph Search with Label Similarity

    It is increasingly common to find real-life data represented as networks of labeled, heterogeneous entities. To query these networks, one often needs to identify the matches of a given query graph in a (typically large) network modeled as a target graph. Due to noise and the lack of fixed schema...

    Provided By VLD Digital

  • White Papers // Feb 2013

    Lightweight Privacy-Preserving Peer-to-Peer Data Integration

    Peer Data Management Systems (PDMS) are an attractive solution for managing distributed heterogeneous information. When a peer (client) requests data from another peer (server) with a different schema, translations of the query and its answer are done by a sequence of intermediate peers (translators). There are two privacy issues in...

    Provided By VLD Digital

  • White Papers // Apr 2013

    Partitioning and Ranking Tagged Data Sources

    Online forms of expression such as social networks, micro-blogging, blogs and rich content sharing platforms have proliferated in the last few years. This proliferation has contributed to the vast explosion in online data sharing seen today. One unique aspect of online data sharing is tags manually...

    Provided By VLD Digital

  • White Papers // Feb 2013

    ClouDiA: A Deployment Advisor for Public Clouds

    An increasing number of distributed data-driven applications are moving into shared public clouds. By sharing resources and operating at scale, public clouds promise higher utilization and lower costs than private clusters. To achieve high utilization, however, cloud providers inevitably allocate virtual machine instances non-contiguously, i.e., instances of a given application...

    Provided By VLD Digital