VLDB Endowment

  • White Papers // Nov 2011

    PIQL: Success-Tolerant Query Processing in the Cloud

    Newly-released web applications often succumb to a "Success Disaster," where overloaded database machines and resulting high response times destroy a previously good user experience. Unfortunately, the data independence provided by a traditional relational database system, while useful for agile development, only exacerbates the problem by hiding potentially expensive queries under...

    Provided By VLDB Endowment

  • White Papers // Sep 2011

    A Framework for Supporting DBMS-Like Indexes in the Cloud

    To support "Database as a service" (DaaS) in the cloud, the database system is expected to provide functionality similar to that of a centralized DBMS, such as efficient processing of ad hoc queries. The system must therefore support DBMS-like indexes, possibly a few indexes for each table to provide fast location of...

    Provided By VLDB Endowment

  • White Papers // Sep 2011

    RemusDB: Transparent High Availability for Database Systems

    In this paper, the authors present a technique for building a High-Availability (HA) DataBase Management System (DBMS). The proposed technique can be applied to any DBMS with little or no customization, and with reasonable performance overhead. Their approach is based on Remus, a commodity HA solution implemented in the virtualization...

    Provided By VLDB Endowment

  • White Papers // Sep 2011

    DivDB: A System for Diversifying Query Results

    With the availability of very large databases, an exploratory query can easily lead to a vast answer set, typically based on an answer's relevance (i.e., top-k, tf-idf) to the user query. Navigating through such an answer set requires huge effort and users give up after perusing through the first few...

    Provided By VLDB Endowment

  • White Papers // Sep 2011

    Online Data Fusion

    The Web contains a significant volume of structured data in various domains, but a lot of data are dirty and erroneous, and they can be propagated through copying. While data integration techniques allow querying structured data on the Web, they take the union of the answers retrieved from different sources...

    Provided By VLDB Endowment

  • White Papers // Sep 2011

    Summary Graphs for Relational Database Schemas

    Increasingly complex databases need ever more sophisticated tools to help users understand their schemas and interact with the data. Existing tools fall short of either providing the "Big picture," or of presenting useful connectivity information. In this paper, the authors define summary graphs, a novel approach for summarizing schemas. Given...

    Provided By VLDB Endowment

  • White Papers // Sep 2011

    MapReduce Programming and Cost-Based Optimization? Crossing This Chasm With Starfish

    MapReduce has emerged as a viable competitor to database systems in big data analytics. MapReduce programs are being written for a wide variety of application domains including business data processing, text analysis, natural language processing, Web graph and social network analysis, and computational science. However, MapReduce systems lack a feature...

    Provided By VLDB Endowment

  • White Papers // Sep 2011

    Proactive Detection and Repair of Data Corruption: Towards a Hassle-Free Declarative Approach With Amulet

    Occasional corruption of stored data is an unfortunate byproduct of the complexity of modern systems. Hardware errors, software bugs, and mistakes by human administrators can corrupt important sources of data. The dominant practice to deal with data corruption today involves administrators writing ad hoc scripts that run data-integrity tests at...

    Provided By VLDB Endowment

  • White Papers // Sep 2011

    Scalable SPARQL Querying of Large RDF Graphs

    The generation of RDF data has accelerated to the point where many data sets need to be partitioned across multiple machines in order to achieve reasonable performance when querying the data. Although tremendous progress has been made in the Semantic Web community for achieving high performance data management on a...

    Provided By VLDB Endowment

  • White Papers // Jul 2011

    Efficient Probabilistic Reverse Nearest Neighbor Query Processing on Uncertain Data

    Given a query object q, a Reverse Nearest Neighbor (RNN) query in a common certain database returns the objects having q as their nearest neighbor. A new challenge for databases is dealing with uncertain objects. In this paper, the authors consider Probabilistic Reverse Nearest Neighbor (PRNN) queries, which return the...

    Provided By VLDB Endowment

  • White Papers // Jun 2011

    Monitoring Reverse Top-k Queries Over Mobile Devices

    Location-based queries are widely employed to retrieve useful information based on the user's geographical position. For example, a tourist that walks around a city may seek points of interest (e.g., restaurants) in her vicinity that satisfy her preferences (e.g., cheap and highly-rated). A top-k query defined by the user preferences...

    Provided By VLDB Endowment

  • White Papers // Apr 2011

    Albatross: Lightweight Elasticity in Shared Storage Databases for the Cloud Using Live Data Migration

    Database systems serving cloud platforms must serve large numbers of applications (or tenants). In addition to managing tenants with small data footprints, different schemas, and variable load patterns, such multitenant data platforms must minimize their operating costs by efficient resource sharing. When deployed over a pay-per-use infrastructure, elastic scaling and...

    Provided By VLDB Endowment

  • White Papers // Mar 2011

    Automatic Optimization for MapReduce Programs

    The MapReduce distributed programming framework has become popular, despite evidence that current implementations are inefficient, requiring far more hardware than traditional relational databases to complete similar tasks. MapReduce jobs are amenable to many traditional database query optimizations (B+Trees for selections, column-store-style techniques for projections, etc), but existing systems do not...

    Provided By VLDB Endowment

  • White Papers // Mar 2011

    CoPhy: A Scalable, Portable, and Interactive Index Advisor for Large Workloads

    Index tuning, i.e., selecting the indexes appropriate for a workload, is a crucial problem in database system tuning. In this paper, the authors solve index tuning for large problem instances that are common in practice, e.g., thousands of queries in the workload, thousands of candidate indexes and several hard and...

    Provided By VLDB Endowment

  • White Papers // Feb 2011

    High Throughput Transaction Executions on Graphics Processors

    OLTP (On-Line Transaction Processing) is an important business system sector in various traditional and emerging online services. Due to the increasing number of users, OLTP systems require high throughput for executing tens of thousands of transactions in a short time period. Encouraged by the recent success of GPGPU (General-Purpose computation...

    Provided By VLDB Endowment

  • White Papers // Feb 2011

    Incrementally Maintaining Classification Using an RDBMS

    The proliferation of imprecise data has motivated both researchers and the database industry to push statistical techniques into Relational DataBase Management Systems (RDBMSes). The authors study strategies to maintain model-based views for a popular statistical technique, classification, inside an RDBMS in the presence of updates (to the set of training...

    Provided By VLDB Endowment

  • White Papers // Feb 2011

    Distributed Inference and Query Processing for RFID Tracking and Monitoring

    In this paper, the authors present the design of a scalable, distributed stream processing system for RFID tracking and monitoring. Since RFID data lacks containment and location information that is key to query processing, they propose to combine location and containment inference with stream query processing in a single architecture,...

    Provided By VLDB Endowment

  • White Papers // Feb 2011

    Fast Sparse Matrix-Vector Multiplication on GPUs: Implications for Graph Mining

    Scaling up the sparse matrix-vector multiplication kernel on modern Graphics Processing Units (GPUs) has been at the heart of numerous studies in both academia and industry. In this paper, the authors present a novel non-parametric, self-tunable approach to data representation for computing this kernel, particularly targeting sparse matrices representing power-law...

    Provided By VLDB Endowment

  • White Papers // Feb 2011

    Automatic Wrappers for Large Scale Web Extraction

    The authors present a generic framework to make wrapper induction algorithms tolerant to noise in the training data. This enables one to learn wrappers in a completely unsupervised manner from automatically and cheaply obtained noisy training data, e.g., using dictionaries and regular expressions. By removing the site-level supervision that wrapper-based...

    Provided By VLDB Endowment

  • White Papers // Jan 2011

    Graph Indexing of Road Networks for Shortest Path Queries With Label Restrictions

    The current widespread use of location-based services and GPS technologies has revived interest in very fast and scalable shortest path queries. The authors introduce a new shortest path query type in which dynamic constraints may be placed on the allowable set of edges that can appear on a valid shortest...

    Provided By VLDB Endowment

  • White Papers // Nov 2010

    Efficient Processing of Top-k Spatial Preference Queries

    Top-k spatial preference queries return a ranked set of the k best data objects based on the scores of feature objects in their spatial neighborhood. Despite the wide range of location-based applications that rely on spatial preference queries, existing algorithms incur non-negligible processing cost resulting in high response time. The...

    Provided By VLDB Endowment

  • White Papers // Oct 2010

    ROXXI: Reviving Witness DOcuments to EXplore EXtracted Information

    In recent years, there has been considerable research on information extraction and constructing RDF knowledge bases. In general, the goal is to extract all relevant information from a corpus of documents, store it into an ontology, and answer future queries based only on the created knowledge base. Thus, the original...

    Provided By VLDB Endowment

  • White Papers // Sep 2010

    Read-Once Functions and Query Evaluation in Probabilistic Databases

    Probabilistic databases hold promise of being a viable means for large-scale uncertainty management, increasingly needed in a number of real-world application domains. However, query evaluation in probabilistic databases remains a computational challenge. Prior work on efficient exact query evaluation in probabilistic databases has largely concentrated on query-centric formulations (e.g.,...

    Provided By VLDB Endowment

  • White Papers // Sep 2010

    On Dense Pattern Mining in Graph Streams

    Many massive web and communication network applications create data which can be represented as a massive sequential stream of edges. For example, conversations in a telecommunication network or messages in a social network can be represented as a massive stream of edges. Such streams are typically very large, because of...

    Provided By VLDB Endowment

  • White Papers // Sep 2010

    Dynamic Join Optimization in Multi-Hop Wireless Sensor Networks

    To enable smart environments and self-tuning data centers, the authors are developing the Aspen system for integrating physical sensor data, as well as stream data coming from machine logical state, and database or Web data from the Internet. A key component of this system is a query processor optimized for...

    Provided By VLDB Endowment

  • White Papers // Sep 2010

    Efficient B-Tree Based Indexing for Cloud Data Processing

    There has been increasing interest in deploying storage systems on the cloud to support applications that require massive scalability and high throughput in the storage layer. Examples of such systems include Amazon's Dynamo and Google's BigTable. Cloud storage systems are designed to meet several essential requirements of data-intensive applications: manageability,...

    Provided By VLDB Endowment

  • White Papers // Sep 2010

    PolicyReplay: Misconfiguration-Response Queries for Data Breach Reporting

    Recent legislation has increased the requirements of organizations to report data breaches, or unauthorized access to data. While access control policies are used to restrict access to a database, these policies are complex and difficult to configure. As a result, misconfigurations sometimes allow users access to unauthorized data. In this...

    Provided By VLDB Endowment

  • White Papers // Sep 2010

    Automatic Rule Refinement for Information Extraction

    Rule-based information extraction from text is increasingly being used to populate databases and to support structured queries on unstructured text. Specification of suitable information extraction rules requires considerable skill and standard practice is to refine rules iteratively, with substantial effort. In this paper, the authors show that techniques developed in...

    Provided By VLDB Endowment

  • White Papers // Sep 2010

    Nearest Neighbor Search With Strong Location Privacy

    The tremendous growth of the Internet has significantly reduced the cost of obtaining and sharing information about individuals, raising many concerns about user privacy. Spatial queries pose an additional threat to privacy because the location of a query may be sufficient to reveal sensitive information about the querier. In this...

    Provided By VLDB Endowment

  • White Papers // Sep 2010

    Secure Personal Data Servers: A Vision Paper

    An increasing amount of personal data is automatically gathered and stored on servers by administrations, hospitals, insurance companies, etc. Citizens themselves often count on internet companies to store their data and make them reliable and highly available through the internet. However, these benefits must be weighed against privacy risks incurred...

    Provided By VLDB Endowment

  • White Papers // Sep 2010

    PAO: Power-Efficient Attribution of Outliers in Wireless Sensor Networks

    Sensor nodes constitute inexpensive, disposable devices that are often scattered in harsh environments of interest so as to collect and communicate desired measurements of monitored quantities. Due to the commodity hardware used in the construction of sensor nodes, the readings of sensors are frequently tainted with outliers. Given the presence...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    SECRET: A Model for Analysis of the Execution Semantics of Stream Processing Systems

    There are many academic and commercial Stream Processing Engines (SPEs) today, each of them with its own execution semantics. This variation may lead to seemingly inexplicable differences in query results. In this paper, the authors present SECRET, a model of the behavior of SPEs. SECRET is a descriptive model that...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    FlashStore: High Throughput Persistent Key-Value Store

    The authors present FlashStore, a high throughput persistent key-value store that uses flash memory as a non-volatile cache between RAM and hard disk. FlashStore is designed to store the working set of key-value pairs on flash and use one flash read per key lookup. As the working set changes over...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    SSD Bufferpool Extensions for Database Systems

    High-end Solid State Disks (SSDs) provide much faster access to data compared to conventional hard disk drives. The authors present a technique for using solid-state storage as a caching layer between RAM and hard disks in database management systems. By caching data that is accessed frequently, disk I/O is reduced....

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    DataGarage: Warehousing Massive Performance Data on Commodity Servers

    Contemporary datacenters house tens of thousands of servers. The servers are closely monitored for operating conditions and utilizations by collecting their performance data (e.g., CPU utilization). In this paper, the authors show that existing database and file-system solutions are not suitable for warehousing performance data collected from a large number...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Cheetah: A High Performance, Custom Data Warehouse on Top of MapReduce

    Large-scale data analysis has become increasingly important for many enterprises. Recently, a new distributed computing paradigm, called MapReduce, and its open source implementation Hadoop, has been widely adopted due to its impressive scalability and flexibility to handle structured as well as unstructured data. In this paper, the authors describe the...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Distance-Based Outlier Detection: Consolidation and Renewed Bearing

    Detecting outliers in data is an important problem with interesting applications in a myriad of domains ranging from data cleaning to financial fraud detection and from network intrusion detection to clinical diagnosis of diseases. Over the last decade of research, distance-based outlier detection algorithms have emerged as a viable, scalable,...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    ROADTRACK: Scaling Location Updates for Mobile Clients on Road Networks With Query Awareness

    Mobile commerce and Location Based Services (LBS) are some of the fastest growing IT industries in the last five years. Location update of mobile clients is a fundamental capability in mobile commerce and all types of LBS. Higher update frequency leads to higher accuracy, but incurs unacceptably high cost of...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Enabling Real Time Data Analysis

    Network-based services have become a ubiquitous part of our lives, to the point where individuals and businesses have often come to critically rely on them. Building and maintaining such reliable, high performance network and service infrastructures requires the ability to rapidly investigate and resolve complex service and performance impacting issues....

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    PolicyReplay: Misconfiguration Response Queries for Data Breach Reporting

    Recent legislation has increased the requirements of organizations to report data breaches, or unauthorized access to data. While access control policies are used to restrict access to a database, these policies are complex and difficult to configure. As a result, misconfigurations sometimes allow users access to unauthorized data. In this...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    Supporting Real-World Activities in Database Management Systems

    Databases are integral to many application domains in which the cycle of processing the data is complex and may involve real-world activities that are external to the database, e.g., wet-lab experiments, manual measurements, and collecting instrument readings. As a result, an update operation in the database may render dependent data...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    Scalable Verification for Outsourced Dynamic Databases

    Query answers from servers operated by third parties need to be verified, as the third parties may not be trusted or their servers may be compromised. Most of the existing authentication methods construct validity proofs based on the Merkle Hash Tree (MHT). The MHT, however, imposes severe concurrency constraints that...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    Efficient Rewriting of XPath Queries Using Query Set Specifications

    The authors study the problem of querying XML data sources that accept only a limited set of queries, such as sources accessible by Web services which can implement very large (potentially infinite) families of XPath queries. To compactly specify such families of queries they adopt the Query Set Specifications, a formalism...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    Answering Table Augmentation Queries From Unstructured Lists on the Web

    The authors present the design of a system for assembling a table from a few example rows by harnessing the huge corpus of information-rich but unstructured lists on the web. The authors developed a totally unsupervised end-to-end approach which, given the sample query rows, retrieves HTML lists...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    Reasoning About Record Matching Rules

    Record matching is the problem of identifying tuples in one or more relations that refer to the same real-world entity. This problem is also known as record linkage, merge-purge, duplicate detection, and object identification. The need for record matching is evident. In data integration it is necessary to collate...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    Preventing Bad Plans by Bounding the Impact of Cardinality Estimation Errors

    Query optimizers rely on accurate estimations of the sizes of intermediate results. Wrong size estimations can lead to overly expensive execution plans. The authors first define the q-error to measure deviations of size estimates from actual sizes. The q-error enables the derivation of two important results: The authors provide bounds...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    WOLVES: Achieving Correct Provenance Analysis by Detecting and Resolving Unsound Workflow Views

    Workflow views abstract groups of tasks in a workflow into composite tasks, and are used for simplifying provenance analysis, workflow sharing and reuse. An unsound view does not preserve the dataflow between tasks in the workflow, and can therefore cause incorrect provenance analysis. In this demo the authors present WOLVES,...

    Provided By VLDB Endowment

  • White Papers // Jun 2009

    RankIE: Document Retrieval on Ranked Entity Graphs

    Developer communities built around software products, like the SAP Community Network, provide a knowledge base for recurring problems and their solutions. Due to the large amount of content maintained in such communities, e.g., in forums, finding relevant solutions is a major challenge beyond the scope of common keyword-based search engines....

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    Efficient Retrieval of the Top-k Most Relevant Spatial Web Objects

    The conventional Internet is acquiring a geo-spatial dimension. Web documents are being geo-tagged, and geo-referenced objects such as points of interest are being associated with descriptive text documents. The resulting fusion of geo-location and documents enables a new kind of top-k query that takes into account both location proximity and...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    Schema-Based Independence Analysis for XML Updates

    Query-update independence analysis is the problem of determining whether an update affects the results of a query. Query-update independence is useful for avoiding recomputation of materialized views and may have applications to access control and concurrency control. This paper develops static analysis techniques for query-update independence problems involving core XQuery...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    Pangea: An Eager Database Replication Middleware Guaranteeing Snapshot Isolation Without Modification of Database Servers

    Recently, several middleware-based approaches have been proposed. If the authors implement all functionalities of database replication only in a middleware layer, they can avoid the high cost of modifying existing database servers or building new ones from scratch. However, it is a big challenge to propose middleware which can enhance performance and scalability without...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    Output Space Sampling for Graph Patterns

    Recent interest in graph pattern mining has shifted from finding all frequent subgraphs to obtaining a small subset of frequent subgraphs that are representative, discriminative or significant. The main motivation behind that is to cope with the scalability problem that the graph mining algorithms suffer when mining databases of large...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    Optimal Random Perturbation at Multiple Privacy Levels

    Random perturbation is a popular method of computing anonymized data for privacy preserving data mining. It is simple to apply, ensures strong privacy protection, and permits effective mining of a large variety of data patterns. However, all the existing studies with good privacy guarantees focus on perturbation at a single...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    Scalable Delivery of Stream Query Result

    Continuous queries over data streams typically produce large volumes of result streams. To scale up the system, one should carefully study the problem of delivering the result streams to the end users, which, unfortunately, is often overlooked in existing systems. In this paper, the authors leverage a Distributed Publish/Subscribe System (DPSS),...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    Enabling Approximate Querying in Sensor Networks

    Data approximation is a popular means to support energy-efficient query processing in sensor networks. Conventional data approximation methods require users to specify fixed error bounds a priori to address the trade-off between result accuracy and energy efficiency of queries. The authors argue that this can be infeasible and inefficient when,...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    A Demonstration of SciDB: A Science-Oriented DBMS

    The dramatic growth in the number of application domains that naturally generate probabilistic, uncertain data has resulted in a need for efficiently supporting complex querying and decision-making over such data. In this paper, the authors present a unified approach to ranking and top-k query processing in probabilistic databases by viewing...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    Inverting Schema Mappings: Bridging the Gap Between Theory and Practice

    The inversion of schema mappings has been identified as one of the fundamental operators for the development of a general framework for metadata management. In fact, three alternative notions of inversion for schema mappings have been proposed in recent years. However, the procedures that have been developed for computing...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    Measure-Driven Keyword-Query Expansion

    User generated content has been fueling an explosion in the amount of available textual data. In this context, it is also common for users to express, either explicitly (through numerical ratings) or implicitly, their views and opinions on products, events, etc. This wealth of textual information necessitates the development of...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    Summarizing Relational Databases

    Complex databases are challenging to explore and query by users unfamiliar with their schemas. Enterprise databases often have hundreds of inter-linked tables, so even when extensive documentation is available, new users must spend a considerable amount of time understanding the schema before they can retrieve any information from the database....

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    Anticipatory DTW for Efficient Similarity Search in Time Series Databases

    Time series arise in many different applications in the form of sensor data, stocks data, videos, and other time-related information. Analysis of this data typically requires searching for similar time series in a database. Dynamic Time Warping (DTW) is a widely used high-quality distance measure for time series. As DTW...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    Mining Graph Patterns Efficiently Via Randomized Summaries

    Graphs are prevalent in many domains such as bioinformatics, social networks, the Web, and cyber-security. Graph pattern mining has become an important tool in the management and analysis of complexly structured data, where example applications include indexing, clustering and classification. Existing graph mining algorithms have achieved great success by exploiting various...

    Provided By VLDB Endowment