Data Management

Oceans of data are generated every day by businesses and enterprises, and all of it must be prioritized, analyzed, and safeguarded with the right architecture, tools, policies, and procedures. TechRepublic provides the resources you need.

  • White Papers // Aug 2010

    CareDB: A Context and Preference-Aware Location Based Database System

    The authors demonstrate CareDB, a context and preference-aware database system. CareDB provides scalable personalized location-based services to users based on their preferences and current surrounding context. Unlike existing location-based database systems that answer queries based solely on proximity in distance, CareDB considers user preferences and various types of context in...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Geospatial Stream Query Processing Using Microsoft SQL Server StreamInsight

    Microsoft SQL Server spatial libraries contain several components that handle geometrical and geographical data types. With advances in geo-sensing technologies, there has been an increasing demand for geospatial streaming applications. Microsoft SQL Server StreamInsight (StreamInsight, for brevity) is a platform for developing and deploying streaming applications that run continuous queries...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even Noticing)

    MapReduce is a computing paradigm that has gained a lot of attention in recent years from industry and research. Unlike parallel DBMSs, MapReduce allows non-expert users to run complex analytical tasks over very large data sets on very large clusters and clouds. However, this comes at a price: MapReduce processes...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Sharing-Aware Horizontal Partitioning for Exploiting Correlations During Query Processing

    Optimization of join queries based on average selectivities is suboptimal in highly correlated databases. In such databases, relations are naturally divided into partitions, each partition having substantially different statistical characteristics. It is very compelling to discover such data partitions during query optimization and create multiple plans for a given query,...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Advanced Processing for Ontological Queries

    Ontology-based data access is a powerful form of extending database technology, where a classical Extensional DataBase (EDB) is enhanced by an ontology that generates new intensional knowledge which may contribute to answer a query. The ontological integrity constraints for generating this intensional knowledge can be specified in description logics such...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Embellishing Text Search Queries To Protect User Privacy

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Set Similarity Join on Probabilistic Data

    Set similarity join has played an important role in many real-world applications such as data cleaning, near duplication detection, data integration, and so on. In these applications, set data often contain noises and are thus uncertain and imprecise. In this paper, the authors model such probabilistic set data on two...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Aether: A Scalable Approach to Logging

    The shift to multi-core hardware brings new challenges to database systems, as the software parallelism determines performance. Even though database systems traditionally accommodate simultaneous requests, a multitude of synchronization barriers serialize execution. Write-ahead logging is a fundamental, omnipresent component in ARIES-style concurrency and recovery, and one of the most important...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    Full-Fidelity Flexible Object-Oriented XML Access

    Developers need to programmatically access persistent XML data. Object-oriented access is often the preferred method. Translating XML data into objects or vice-versa is a hard problem due to the data model mismatch and the difficulty of query translation. The authors propose a framework that addresses this problem by transforming object-based...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    Cooperative Update Exchange in the Youtopia System

    Youtopia is a platform for collaborative management and integration of relational data. At the heart of Youtopia is an update exchange abstraction: changes to the data propagate through the system to satisfy user-specified mappings. The authors present a novel change propagation model that combines a deterministic chase with human intervention....

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    Workload Aware Indexing of Continuously Moving Objects

    The increased deployment of sensors and data communication networks yields data management workloads with update loads that are intense, skewed, and highly bursty. Query loads resulting from location-based services are expected to exhibit similar characteristics. In such environments, index structures can easily become performance bottlenecks. The authors address the need...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    Tagging Stream Data for Rich Real-Time Services

    In recent years, data streams have become ubiquitous as technology is improving and the prices of portable devices are falling, e.g., sensor networks, location-based services. Most data streams transmit only data tuples based on which continuous queries are evaluated. In this paper, the authors propose to enrich data streams with...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    Effectively Indexing Uncertain Moving Objects for Predictive Queries

    Moving object indexing and query processing is a well studied research topic, with applications in areas such as intelligent transport systems and location-based services. While much existing work explicitly or implicitly assumes a deterministic object movement model, real-world objects often move in more complex and stochastic ways. This paper investigates...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    Query Mesh: Multi-Route Query Processing Technology

    The authors propose to demonstrate a practical alternative approach to the current state-of-the-art query processing techniques, called the "Query Mesh" (or QM, for short). The main idea of QM is to compute multiple routes (i.e., query plans), each designed for a particular subset of data with distinct statistical properties. Based...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    Exact Cardinality Query Optimization for Optimizer Testing

    The accuracy of cardinality estimates is crucial for obtaining a good query execution plan. Today's optimizers make several simplifying assumptions during cardinality estimation that can lead to large errors and hence poor plans. In a scenario such as query optimizer testing it is very desirable to obtain the "Best" plan,...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    Managing Massive Time Series Streams with Multi-Scale Compressed Trickles

    The authors present Cypress, a novel framework to archive and query massive time series streams such as those generated by sensor networks, data centers, and scientific computing. Cypress applies multi-scale analysis to decompose time series and to obtain sparse representations in various domains (e.g. frequency domain and time domain). Relying...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    Declarative Management in Microsoft SQL Server

    This paper describes the principles and practice of Declarative Management, a new approach to the management of database systems. The standard approach to database systems management involves a brittle coupling of interactive operations and procedural scripts. Such an ad hoc approach results in incorrect administration, which leads to increased management...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    Efficient Outer Join Data Skew Handling in Parallel DBMS

    Large enterprises have been relying on Parallel DataBase Management Systems (PDBMS) to process their ever-increasing data volume and complex queries. The scalability and performance of a PDBMS comes from load balancing on all nodes in the system. Skewed processing will significantly slow down query response time and degrade the overall...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    Path Oracles for Spatial Networks

    The advent of location-based services has led to an increased demand for performing operations on spatial networks in real time. The challenge lies in being able to cast operations on spatial networks in terms of relational operators so that they can be performed in the context of a database. A...

    Provided By VLDB Endowment

  • White Papers // Aug 2008

    RDF-3X: A RISC-Style Engine for RDF

    RDF is a data representation format for schema-free structured information that is gaining momentum in the context of Semantic-Web corpora, life sciences, and also Web 2.0 platforms. The "Pay-as-you-go" nature of RDF and the flexible pattern-matching capabilities of its query language SPARQL entail efficiency and scalability challenges for complex queries...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    On Chase Termination Beyond Stratification

    The authors study the termination problem of the chase algorithm, a central tool in various database problems such as the constraint implication problem, Conjunctive Query optimization, rewriting queries using views, data exchange, and data integration. The basic idea of the chase is, given a database instance and a set of...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    Mining Document Collections to Facilitate Accurate Approximate Entity Matching

    Many entity extraction techniques leverage large reference entity tables to identify entities in documents. Often, an entity is referenced in document collections differently from that in the reference entity tables. Therefore, the authors study the problem of determining whether or not a substring "Approximately" matches with a reference entity. Similarity...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    Reference-Based Alignment in Large Sequence Databases

    This paper introduces a novel method, called Reference-Based String Alignment (RBSA), that speeds up retrieval of optimal subsequence matches in large databases of sequences under the edit distance and the Smith-Waterman similarity measure. RBSA operates using the assumption that the optimal match deviates by a relatively small amount from the...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    Predictable Performance for Unpredictable Workloads

    This paper introduces Crescando: a scalable, distributed relational table implementation designed to perform large numbers of queries and updates with guaranteed access latency and data freshness. To this end, Crescando leverages a number of modern query processing techniques and hardware trends. Specifically, Crescando is based on parallel, collaborative scans in...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    SIMD-Scan: Ultra Fast In-Memory Table Scan Using On-Chip Vector Processing Units

    The availability of huge system memory, even on standard servers, generated a lot of interest in main memory database engines. In data warehouse systems, highly compressed column-oriented data structures are quite prominent. In order to scale with the data volume and the system load, many of these systems are highly...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    MCDB-R: Risk Analysis in the Database

    Enterprises often need to assess and manage the risk arising from uncertainty in their data. Such uncertainty is typically modeled as a probability distribution over the uncertain data values, specified by means of a complex (often predictive) stochastic model. The probability distribution over data values leads to a probability distribution...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Scalable Probabilistic Databases With Factor Graphs and MCMC

    Incorporating probabilities into the semantics of incomplete databases has posed many challenges, forcing systems to sacrifice modeling power, scalability, or treatment of relational algebra operators. The authors propose an alternative approach where the underlying relational database always represents a single world, and an external factor graph encodes a distribution over...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    On Multi-Column Foreign Key Discovery

    A foreign/primary key relationship between relational tables is one of the most important constraints in a database. From a data analysis perspective, discovering foreign keys is a crucial step in understanding and working with the data. Nevertheless, more often than not, foreign key constraints are not specified in the data,...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Building Ranked Mashups of Unstructured Sources With Uncertain Information

    Mashups are situational applications that join multiple sources to better meet the information needs of Web users. Web sources can be huge databases behind query interfaces, which creates the need to rank mashup results based on user preferences. The authors present MashRank, a mashup authoring and processing system building...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Generating Databases for Query Workloads

    To evaluate the performance of database applications and DBMSs, one usually executes workloads of queries on generated databases of different sizes and measures the response time. This paper introduces MyBenchmark, an offline data generation tool that takes a set of queries as input and generates database instances for which...

    Provided By VLDB Endowment

  • White Papers // Jul 2010

    SQL QueRIE Recommendations: A Query Fragment-Based Approach

    Relational database systems are becoming increasingly popular in the scientific community to support the interactive exploration of large volumes of data. In this scenario, users employ a query interface (typically, a web-based client) to issue a series of SQL queries that aim to analyze the data and mine it for...

    Provided By VLDB Endowment

  • White Papers // Oct 2009

    Building Disclosure Risk Aware Query Optimizers for Relational Databases

    Many DBMS products on the market provide built-in encryption support to address the security concerns of organizations. This solution is quite effective in preventing data leakage from compromised/stolen storage devices. However, recent studies show that a significant part of the leaked records have been done so by...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    Lazy Updates: An Efficient Technique to Continuously Monitoring Reverse k-NN

    In this paper, the authors study the problem of continuous monitoring of reverse k nearest neighbor queries. Existing continuous reverse nearest neighbor monitoring techniques are sensitive to the movement of objects and queries. For example, the results of a query are to be recomputed whenever the query changes its location. They present...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    Adaptively Parallelizing Distributed Range Queries

    The authors consider the problem of how to best parallelize range queries in a massive scale distributed database. In traditional systems the focus has been on maximizing parallelism, for example by laying out data to achieve the highest throughput. However, in a massive scale database such as the authors' PNUTS...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    Synergy-Based Workload Management

    Workload management aims at the efficient execution of queries on a database. In this paper, scheduling plays a crucial role. A vast number of scheduling approaches have been developed, most of them belonging to one of two categories: analysis and monitoring. However, they mainly either focus only on one possible...

    Provided By VLDB Endowment

  • White Papers // Feb 2009

    Maximizing the Data Utility of a Data Archiving & Querying System Through Joint Coding and Scheduling

    The authors study a joint scheduling and coding problem for collecting multi-snapshots spatial data in a resource constrained sensor network. Motivated by a distributed coding scheme for single snapshot data collection, they generalize the scenario to include multi-snapshots and general coding schemes. Associating a utility function with the recovered data,...

    Provided By University of Massachusetts Amherst

  • White Papers // Feb 2009

    Transparent Contribution of Storage and Memory

    Many research projects have proposed contributory systems that utilize the significant free disk space, idle memory, and wasted CPU cycles found on end-user machines. These applications include peer-to-peer backup, large-scale distributed storage, and distributed computation such as signal processing and protein folding. While users are generally willing to give up...

    Provided By University of Massachusetts Amherst

  • White Papers // Jul 2010

    Improved Memory Management for XML Data Stream Processing

    Running XPath queries on XML documents with minimum memory usage is a challenge. YFilter 1.0 stores the entire document in memory. Existing extensions to YFilter are limited, as they discuss memory management techniques for only a limited taxonomy of queries. They do not handle cases where data is being...

    Provided By University of Massachusetts Amherst

  • White Papers // Feb 2011

    Dolly: Database Provisioning for the Cloud

    The Cloud is an increasingly popular platform for e-commerce applications that can be scaled on demand in a very cost-effective way. Dynamic provisioning is used to autonomously add capacity in multi-tier cloud-based applications that see workload increases. While many solutions exist to provision tiers with little or no state in...

    Provided By University of Massachusetts Amherst

  • White Papers // Jul 2010

    Learning Causal Models of Relational Domains

    Methods for discovering causal knowledge from observational data have been a persistent topic of AI research for several decades. Essentially all of this work focuses on knowledge representations for propositional domains. In this paper, the authors present several key algorithmic and theoretical innovations that extend causal discovery to relational domains....

    Provided By University of Massachusetts Amherst