VLDB Endowment

  • White Papers // Jun 2015

    To Lock, Swap, or Elide: On the Interplay of Hardware Transactional Memory and Lock-Free Indexing

    The release of Hardware Transactional Memory (HTM) in commodity CPUs (Central Processing Units) has major implications on the design and implementation of main-memory databases, especially on the architecture of high performance lock-free indexing methods at the core of several of these systems. This paper studies the interplay of HTM and...

    Provided By VLDB Endowment

  • White Papers // Nov 2014

    Trill: A High-Performance Incremental Query Processor for Diverse Analytics

    In this paper, the authors introduce Trill - a new query processor for analytics. Trill fulfills a combination of three requirements for a query processor to serve the diverse big data analytics space: query model: Trill is based on a tempo-relational model that enables it to handle streaming and relational...

    Provided By VLDB Endowment

  • White Papers // Sep 2014

    S-Store: A Streaming NewSQL System for Big Velocity Applications

    First-generation streaming systems did not pay much attention to state management via ACID transactions. S-Store is a data management system that combines OLTP (OnLine Transaction Processing) transactions with stream processing. To create S-Store, the authors begin with H-Store, a main-memory transaction processing engine, and add primitives to support streaming. This...

    Provided By VLDB Endowment

  • White Papers // Sep 2014

    CPU Sharing Techniques for Performance Isolation in Multi-tenant Relational Database-as-a-Service

    Multi-tenancy and resource sharing are essential to make a Database-as-a-Service (DaaS) cost-effective. However, one major consequence of resource sharing is that the performance of one tenant's workload can be significantly affected by the resource demands of co-located tenants. The lack of performance isolation in a shared environment can make DaaS...

    Provided By VLDB Endowment

  • White Papers // Nov 2011

    PIQL: Success-Tolerant Query Processing in the Cloud

    Newly-released web applications often succumb to a "Success Disaster," where overloaded database machines and resulting high response times destroy a previously good user experience. Unfortunately, the data independence provided by a traditional relational database system, while useful for agile development, only exacerbates the problem by hiding potentially expensive queries under...

    Provided By VLDB Endowment

  • White Papers // Sep 2011

    A Framework for Supporting DBMS-Like Indexes in the Cloud

    To support "Database as a service" (DaaS) in the cloud, the database system is expected to provide similar functionalities as in centralized DBMS such as efficient processing of ad hoc queries. The system must therefore support DBMS-like indexes, possibly a few indexes for each table to provide fast location of...

    Provided By VLDB Endowment

  • White Papers // Sep 2011

    Scalable SPARQL Querying of Large RDF Graphs

    The generation of RDF data has accelerated to the point where many data sets need to be partitioned across multiple machines in order to achieve reasonable performance when querying the data. Although tremendous progress has been made in the Semantic Web community for achieving high performance data management on a...

    Provided By VLDB Endowment

  • White Papers // Sep 2011

    MapReduce Programming and Cost-based Optimization? Crossing This Chasm With Starfish

    MapReduce has emerged as a viable competitor to database systems in big data analytics. MapReduce programs are being written for a wide variety of application domains including business data processing, text analysis, natural language processing, Web graph and social network analysis, and computational science. However, MapReduce systems lack a feature...

    Provided By VLDB Endowment

  • White Papers // Sep 2011

    Proactive Detection and Repair of Data Corruption: Towards a Hassle-free Declarative Approach With Amulet

    Occasional corruption of stored data is an unfortunate byproduct of the complexity of modern systems. Hardware errors, software bugs, and mistakes by human administrators can corrupt important sources of data. The dominant practice to deal with data corruption today involves administrators writing ad hoc scripts that run data-integrity tests at...

    Provided By VLDB Endowment

  • White Papers // Sep 2011

    Fast Set Intersection in Memory

    Fast processing of set intersections is a key operation in many query processing tasks in the context of databases and information retrieval. For example, in the context of databases, set intersections are used in the context of various forms of data mining, text analytics, and evaluation of conjunctive predicates. They...

    Provided By VLDB Endowment

  • White Papers // Sep 2011

    DivDB: A System for Diversifying Query Results

    With the availability of very large databases, an exploratory query can easily lead to a vast answer set, typically based on an answer's relevance (i.e., top-k, tf-idf) to the user query. Navigating through such an answer set requires huge effort and users give up after perusing through the first few...

    Provided By VLDB Endowment

  • White Papers // Sep 2011

    Online Data Fusion

    The Web contains a significant volume of structured data in various domains, but a lot of data are dirty and erroneous, and they can be propagated through copying. While data integration techniques allow querying structured data on the Web, they take the union of the answers retrieved from different sources...

    Provided By VLDB Endowment

  • White Papers // Sep 2011

    Summary Graphs for Relational Database Schemas

    Increasingly complex databases need ever more sophisticated tools to help users understand their schemas and interact with the data. Existing tools fall short of either providing the "Big picture," or of presenting useful connectivity information. In this paper, the authors define summary graphs, a novel approach for summarizing schemas. Given...

    Provided By VLDB Endowment

  • White Papers // Sep 2011

    RemusDB: Transparent High Availability for Database Systems

    In this paper, the authors present a technique for building a High-Availability (HA) DataBase Management System (DBMS). The proposed technique can be applied to any DBMS with little or no customization, and with reasonable performance overhead. Their approach is based on Remus, a commodity HA solution implemented in the virtualization...

    Provided By VLDB Endowment

  • White Papers // Jul 2011

    Efficient Probabilistic Reverse Nearest Neighbor Query Processing on Uncertain Data

    Given a query object q, a Reverse Nearest Neighbor (RNN) query in a common certain database returns the objects having q as their nearest neighbor. A new challenge for databases is dealing with uncertain objects. In this paper, the authors consider Probabilistic Reverse Nearest Neighbor (PRNN) queries, which return the...

    Provided By VLDB Endowment

  • White Papers // Jun 2011

    Monitoring Reverse Top-k Queries Over Mobile Devices

    Location-based queries are widely employed to retrieve useful information based on the user's geographical position. For example, a tourist that walks around a city may seek points of interest (e.g., restaurants) in her vicinity that satisfy her preferences (e.g., cheap and highly-rated). A top-k query defined by the user preferences...

    Provided By VLDB Endowment

  • White Papers // Apr 2011

    Albatross: Lightweight Elasticity in Shared Storage Databases for the Cloud Using Live Data Migration

    Database systems serving cloud platforms must serve large numbers of applications (or tenants). In addition to managing tenants with small data footprints, different schemas, and variable load patterns, such multitenant data platforms must minimize their operating costs by efficient resource sharing. When deployed over a pay-per-use infrastructure, elastic scaling and...

    Provided By VLDB Endowment

  • White Papers // Mar 2011

    Automatic Optimization for MapReduce Programs

    The MapReduce distributed programming framework has become popular, despite evidence that current implementations are inefficient, requiring far more hardware than traditional relational databases to complete similar tasks. MapReduce jobs are amenable to many traditional database query optimizations (B+Trees for selections, column-store-style techniques for projections, etc), but existing systems do not...

    Provided By VLDB Endowment

  • White Papers // Mar 2011

    CoPhy: A Scalable, Portable, and Interactive Index Advisor for Large Workloads

    Index tuning, i.e., selecting the indexes appropriate for a workload, is a crucial problem in database system tuning. In this paper, the authors solve index tuning for large problem instances that are common in practice, e.g., thousands of queries in the workload, thousands of candidate indexes and several hard and...

    Provided By VLDB Endowment

  • White Papers // Feb 2011

    High Throughput Transaction Executions on Graphics Processors

    OLTP (On-Line Transaction Processing) is an important business system sector in various traditional and emerging online services. Due to the increasing number of users, OLTP systems require high throughput for executing tens of thousands of transactions in a short time period. Encouraged by the recent success of GPGPU (General-Purpose computation...

    Provided By VLDB Endowment

  • White Papers // Feb 2011

    Incrementally Maintaining Classification Using an RDBMS

    The proliferation of imprecise data has motivated both researchers and the database industry to push statistical techniques into Relational DataBase Management Systems (RDBMSes). The authors study strategies to maintain model-based views for a popular statistical technique, classification, inside an RDBMS in the presence of updates (to the set of training...

    Provided By VLDB Endowment

  • White Papers // Feb 2011

    Distributed Inference and Query Processing for RFID Tracking and Monitoring

    In this paper, the authors present the design of a scalable, distributed stream processing system for RFID tracking and monitoring. Since RFID data lacks containment and location information that is key to query processing, they propose to combine location and containment inference with stream query processing in a single architecture,...

    Provided By VLDB Endowment

  • White Papers // Feb 2011

    Fast Sparse Matrix-Vector Multiplication on GPUs: Implications for Graph Mining

    Scaling up the sparse matrix-vector multiplication kernel on modern Graphics Processing Units (GPU) has been at the heart of numerous studies in both academia and industry. In this paper the authors present a novel non-parametric, self-tunable, approach to data representation for computing this kernel, particularly targeting sparse matrices representing power-law...

    Provided By VLDB Endowment

  • White Papers // Feb 2011

    Automatic Wrappers for Large Scale Web Extraction

    The authors present a generic framework to make wrapper induction algorithms tolerant to noise in the training data. This enables one to learn wrappers in a completely unsupervised manner from automatically and cheaply obtained noisy training data, e.g., using dictionaries and regular expressions. By removing the site-level supervision that wrapper-based...

    Provided By VLDB Endowment

  • White Papers // Jan 2011

    Graph Indexing of Road Networks for Shortest Path Queries With Label Restrictions

    The current widespread use of location-based services and GPS technologies has revived interest in very fast and scalable shortest path queries. The authors introduce a new shortest path query type in which dynamic constraints may be placed on the allowable set of edges that can appear on a valid shortest...

    Provided By VLDB Endowment

  • White Papers // Nov 2010

    Efficient Processing of Top-k Spatial Preference Queries

    Top-k spatial preference queries return a ranked set of the k best data objects based on the scores of feature objects in their spatial neighborhood. Despite the wide range of location-based applications that rely on spatial preference queries, existing algorithms incur non-negligible processing cost resulting in high response time. The...

    Provided By VLDB Endowment

  • White Papers // Oct 2010

    ROXXI: Reviving Witness DOcuments to EXplore EXtracted Information

    In recent years, there has been considerable research on information extraction and constructing RDF knowledge bases. In general, the goal is to extract all relevant information from a corpus of documents, store it into an ontology, and answer future queries based only on the created knowledge base. Thus, the original...

    Provided By VLDB Endowment

  • White Papers // Sep 2010

    Efficient B-Tree Based Indexing for Cloud Data Processing

    There has been an increasing interest in deploying a storage system on Cloud to support applications that require massive scalability and high throughput in storage layer. Examples of such systems include Amazon's Dynamo and Google's BigTable. Cloud storage systems are designed to meet several essential requirements of data-intensive applications: manageability,...

    Provided By VLDB Endowment

  • White Papers // Sep 2010

    Nearest Neighbor Search With Strong Location Privacy

    The tremendous growth of the Internet has significantly reduced the cost of obtaining and sharing information about individuals, raising many concerns about user privacy. Spatial queries pose an additional threat to privacy because the location of a query may be sufficient to reveal sensitive information about the querier. In this...

    Provided By VLDB Endowment

  • White Papers // Sep 2010

    Secure Personal Data Servers: A Vision Paper

    An increasing amount of personal data is automatically gathered and stored on servers by administrations, hospitals, insurance companies, etc. Citizens themselves often count on internet companies to store their data and make them reliable and highly available through the internet. However, these benefits must be weighed against privacy risks incurred...

    Provided By VLDB Endowment

  • White Papers // Sep 2010

    Automatic Rule Refinement for Information Extraction

    Rule-based information extraction from text is increasingly being used to populate databases and to support structured queries on unstructured text. Specification of suitable information extraction rules requires considerable skill and standard practice is to refine rules iteratively, with substantial effort. In this paper, the authors show that techniques developed in...

    Provided By VLDB Endowment

  • White Papers // Sep 2010

    PolicyReplay: Misconfiguration-Response Queries for Data Breach Reporting

    Recent legislation has increased the requirements of organizations to report data breaches, or unauthorized access to data. While access control policies are used to restrict access to a database, these policies are complex and difficult to configure. As a result, misconfigurations sometimes allow users access to unauthorized data. In this...

    Provided By VLDB Endowment

  • White Papers // Sep 2010

    Dynamic Join Optimization in Multi-Hop Wireless Sensor Networks

    To enable smart environments and self-tuning data centers, the authors are developing the Aspen system for integrating physical sensor data, as well as stream data coming from machine logical state, and database or Web data from the Internet. A key component of this system is a query processor optimized for...

    Provided By VLDB Endowment

  • White Papers // Sep 2010

    On Dense Pattern Mining in Graph Streams

    Many massive web and communication network applications create data which can be represented as a massive sequential stream of edges. For example, conversations in a telecommunication network or messages in a social network can be represented as a massive stream of edges. Such streams are typically very large, because of...

    Provided By VLDB Endowment

  • White Papers // Sep 2010

    Read-Once Functions and Query Evaluation in Probabilistic Databases

    Probabilistic databases hold promise of being a viable means for large-scale uncertainty management, increasingly needed in a number of real world applications domains. However, query evaluation in probabilistic databases remains a computational challenge. Prior work on efficient exact query evaluation in probabilistic databases has largely concentrated on query-centric formulations (e.g.,...

    Provided By VLDB Endowment

  • White Papers // Sep 2010

    PAO: Power-Efficient Attribution of Outliers in Wireless Sensor Networks

    Sensor nodes constitute inexpensive, disposable devices that are often scattered in harsh environments of interest so as to collect and communicate desired measurements of monitored quantities. Due to the commodity hardware used in the construction of sensor nodes, the readings of sensors are frequently tainted with outliers. Given the presence...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Using XMorph to Transform XML Data

    XMorph is a new, shape polymorphic, domain-specific XML query language. A query in a shape polymorphic language adapts to the shape of the input, freeing the user from having to know the input's shape and making the query applicable to a wide variety of differently shaped inputs. An XMorph query...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Active Complex Event Processing: Applications in Real-Time Health Care

    The analysis of many real-world event based applications has revealed that existing Complex Event Processing technology (CEP), while effective for efficient pattern matching on event stream, is limited in its capability of reacting in real-time to opportunities and risks detected or environmental changes. The authors are the first to tackle...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Thirteen New Players in the Team: A Ferry-based LINQ to SQL Provider

    The authors demonstrate an efficient LINQ to SQL provider and its significant impact on the runtime performance of LINQ programs that process large data volumes. This alternative provider is based on Ferry, compilation technology that lets relational database systems participate in the evaluation of first-order functional programs over nested, ordered...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    AXART: Enabling Collaborative Work With AXML Artifacts

    The workflow models have been essentially operation-centric for many years, ignoring almost completely the data aspects. Recently, a new paradigm of data-centric workflows, called business artifacts, has been introduced by Nigam and Caswell. The authors follow this approach and propose a model where artifacts are XML documents that evolve in...

    Provided By VLDB Endowment

  • White Papers // Oct 2009

    MEET DB2: Automated Database Migration Evaluation

    Commercial databases compete for market share, which is composed of not only net-new sales to those purchasing a database for the first time, but also competitive "win-backs" and migrations. Database migration, or the act of moving both application code and its underlying database platform from one database to another, presents...

    Provided By VLDB Endowment

  • White Papers // Oct 2009

    Toward Scalable Keyword Search Over Relational Data

    Key Word Search (KWS) over relational databases has recently received significant attention. Many solutions and many prototypes have been developed. This task requires addressing many issues, including robustness, accuracy, reliability, and privacy. An emerging issue, however, appears to be performance related: current KWS systems have unpredictable running times. In particular,...

    Provided By VLDB Endowment

  • White Papers // Oct 2009

    Cloudy: A Modular Cloud Storage System

    This demonstration presents Cloudy, a modular cloud storage system. Cloudy provides a highly flexible architecture for distributed data storage and is designed to operate with multiple workloads. Based on a generic data model, Cloudy can be customized to meet application requirements. The goal of this demonstration is to show the...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    iFlow: An Approach for Fast and Reliable Internet-Scale Stream Processing Utilizing Detouring and Replication

    The authors propose to demonstrate iFlow, the replication-based system that supports both fast and reliable processing of data streams over the Internet. iFlow uses a low degree of replication in conjunction with detouring techniques to overcome network outages. iFlow also deploys replicas in a manner that improves performance and availability...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Peer Coordination Through Distributed Triggers

    This is a demonstration of data coordination in a peer data management system through the employment of distributed triggers. The latter express in a declarative manner individual security and consistency requirements of peers, that cannot be ensured by default in the P2P environment. Peers achieve to handle in a transparent...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Seaform: Search As You Type in Forms

    Form-style interfaces have been widely used to allow users to access information. In this demonstration paper, the authors develop a new search paradigm in form-style query interfaces, called SEAFORM (which stands for SEarch-As-You-Type in FORMS), which computes answers on-the-fly as a user types in a query letter by letter and...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    TimeTrails: A System for Exploring Spatio-Temporal Information in Documents

    Spatial and temporal data have become ubiquitous in many application domains such as the Geosciences or life sciences. Sophisticated database management systems are employed to manage such structured data. However, an important source of spatio-temporal information that has not been fully utilized are unstructured text documents. In this paper, combinations...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Interactive Route Search in the Presence of Order Constraints

    A route search is an enhancement of an ordinary geographic search. Instead of merely returning a set of entities, the result is a route that goes via entities that are relevant to the search. The input to the problem consists of several search queries, and each query defines a type...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Adaptive Logging for Mobile Device

    Nowadays, due to the increased user requirements of the fast and reliable data management operation for mobile applications, major device vendors use embedded DBMS for their mobile devices such as MP3 players, mobile phones, digital cameras and PDAs. However, database logging is the major bottleneck against the fast response time....

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    HaLoop: Efficient Iterative Data Processing on Large Clusters

    The growing demand for large-scale data mining and data analysis applications has led both industry and academia to design new types of highly scalable data-intensive computing platforms. MapReduce and Dryad are two popular platforms in which the dataflow takes the form of a directed acyclic graph of operators. These platforms...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Navigating in Complex Mashed-Up Applications

    Mashups integrate a set of Web-services and data sources, often referred to as mashlets. The authors study in this paper a common scenario where these mashlets are components of larger Web-Applications. In this case, integration of mashlets yields a set of inter-connected applications, referred to as Mashed-up APPlications (abbr. MashAPP)....

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Dremel: Interactive Analysis of Web-Scale Datasets

    Dremel is a scalable, interactive ad-hoc query system for analysis of read-only nested data. By combining multi-level execution trees and columnar data layout, it is capable of running aggregation queries over trillion-row tables in seconds. The system scales to thousands of CPUs and petabytes of data, and has thousands of...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Identifying the Most Influential Data Objects With Reverse Top-k Queries

    Contemporary datacenters house tens of thousands of servers. The servers are closely monitored for operating conditions and utilizations by collecting their performance data (e.g., CPU utilization). In this paper, the authors show that existing database and file-system solutions are not suitable for warehousing performance data collected from a large number...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Retrieving Top-k Prestige-Based Relevant Spatial Web Objects

    The location-aware keyword query returns ranked objects that are near a query location and that have textual descriptions that match query keywords. This query occurs inherently in many types of mobile and traditional web services and applications, e.g., Yellow Pages and Maps services. Previous work considers the potential results of...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Transforming Range Queries to Equivalent Box Queries to Optimize Page Access

    Range queries based on L1 distance are a common type of query in multimedia databases containing feature vectors. The authors propose a novel approach that transforms the feature space into a new feature space such that range queries in the original space are mapped into equivalent box queries in the...

    Provided By VLDB Endowment

  • White Papers // Oct 2009

    QUICK: Expressive and Flexible Search Over Knowledge Bases and Text Collections

    Recent work on Web-extracted data sets has produced an interesting new source of structured Web data. These data sets can be viewed as Knowledge Bases (KB) - large heterogeneous linked entity collections with millions of unique edge and node labels, often encoding rich semantic information over entities. For example, YAGO...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Transforming XML Documents as Schemas Evolve

    Database systems often use XML schema to describe the format of valid XML documents. Usually, this format is determined when the system is designed. Sometimes, in an already functioning system, a need arises to change the XML schemas. In such a situation, the system has to transform the old XML...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    XSACT: A Comparison Tool for Structured Search Results

    Studies show that about 50% of web searches are for information exploration purposes, where a user would like to investigate, compare, evaluate, and synthesize multiple relevant results. Due to the absence of general tools that can effectively analyze and differentiate multiple results, a user has to manually read and comprehend...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    ObjectRunner: Lightweight, Targeted Extraction and Querying of Structured Web Data

    The authors present in this paper ObjectRunner, a system for extracting, integrating and querying structured data from the Web. The system harvests real-world items from template-based HTML pages (the so-called structured Web). It illustrates a two-phase querying of the Web, in which an intentional description of the targeted data is...

    Provided By VLDB Endowment

  • White Papers // Oct 2010

    ROXXI: Reviving Witness DOcuments to EXplore EXtracted Information

    In recent years, there has been considerable research on information extraction and constructing RDF knowledge bases. In general, the goal is to extract all relevant information from a corpus of documents, store it in an ontology, and answer future queries based only on the created knowledge base. Thus, the original...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    EXTRUCT: Using Deep Structural Information in XML Keyword Search

    Users who are unfamiliar with database query languages can search XML data sets using keyword queries. Previous work has shown that current XML keyword search methods, although intuitive, do not effectively use the data's structural information and provide poor precision, recall, and ranking for most queries. Based on an extension...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    SQL QueRIE Recommendations

    This demonstration presents QueRIE, a recommender system that supports interactive database exploration. This system aims at assisting non-expert users of scientific databases by tracking their querying behavior and generating personalized query recommendations. The system is supported by two recommendation engines and the underlying recommendation algorithms. The first identifies potentially "Interesting"...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    P2PDocTagger: Content Management Through Automated P2P Collaborative Tagging

    As the amount of user-generated content grows, personal information management has become a challenging problem. Several information management approaches, such as desktop search, document organization and (collaborative) document tagging, have been proposed to address this; however, they are either inappropriate or inefficient. Automated collaborative document tagging approaches mitigate the...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    InZeit: Efficiently Identifying Insightful Time Points

    Web archives are useful resources to find out about the temporal evolution of persons, organizations, products, or other topics. However, even when advanced text search functionality is available, gaining insights into the temporal evolution of a topic can be a tedious task and often requires sifting through many documents. The...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    iAVATAR: An Interactive Tool for Finding and Visualizing Visual Representative Tags in Image Search

    Tags associated with social images are a valuable information source for superior image search and retrieval experiences. Due to the nature of tagging, many tags associated with images are not visually descriptive. Consequently, the presence of these noisy tags may reduce the effectiveness of tags in image retrieval. To address this...

    Provided By VLDB Endowment