Data Management

Oceans of data are generated every day by businesses and enterprises, and all of it must be prioritized, analyzed, and safeguarded with the right architecture, tools, policies, and procedures. TechRepublic provides the resources you need.

  • White Papers // Aug 2008

    The V*-Diagram: A Query-Dependent Approach to Moving KNN Queries

    The Moving k Nearest Neighbor (MkNN) query finds the k nearest neighbors of a moving query point continuously. The high potential of reducing the query processing cost as well as the large spectrum of associated applications have attracted considerable attention to this query type from the database community. This paper...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    Probabilistic Histograms for Probabilistic Data

    There is a growing realization that modern DataBase Management Systems (DBMSs) must be able to manage data that contains uncertainties that are represented in the form of probabilistic relations. Consequently, the design of each core DBMS component must be revisited in the presence of uncertain and probabilistic information. In this...

    Provided By VLDB Endowment

  • White Papers // Aug 2008

    Rewriting Procedures for Batched Bindings

    Queries, or calls to stored procedures/user-defined functions are often invoked multiple times, either from within a loop in an application program, or from the where/select clause of an outer query. When the invoked query/procedure/function involves database access, a naive implementation can result in very poor performance, due to random I/O....

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    Discovering Relative Importance of Skyline Attributes

    Querying databases with preferences is an important research problem. Among various approaches to querying with preferences, the skyline framework is one of the most popular. A well known deficiency of that framework is that all attributes are of the same importance in skyline preference relations. Consequently, the size of the...

    Provided By VLDB Endowment

  • White Papers // Aug 2008

    A Pay-As-You-Go Framework for Query Execution Feedback

    Past work has suggested that query execution feedback can be useful in improving the quality of plans by correcting cardinality estimation errors in the query optimizer. The state-of-the-art approach for obtaining execution feedback is "Passive" monitoring which records the cardinality of each operator in the execution plan. The authors observe...

    Provided By VLDB Endowment

  • White Papers // Aug 2008

    Evita Raced: Metacompilation for Declarative Networks

    Declarative languages have recently been proposed for many new applications outside of traditional data management. Since these are relatively early research efforts, it is important that the architectures of these declarative systems be extensible, in order to accommodate unforeseen needs in these new domains. In this paper, the authors apply...

    Provided By VLDB Endowment

  • White Papers // Aug 2008

    Discovering Data Quality Rules

    Dirty data is a serious problem for businesses leading to incorrect decision making, inefficient daily operations, and ultimately wasting both time and money. Dirty data often arises when domain constraints and business rules, meant to preserve data consistency and accuracy, are enforced incompletely or not at all in application code....

    Provided By VLDB Endowment

  • White Papers // Aug 2008

    SLEUTH: Single-pubLisher attack dEtection Using correlaTion Hunting

    Several data management challenges arise in the context of Internet advertising networks, where Internet advertisers pay Internet publishers to display advertisements on their Web sites and drive traffic to the advertisers from surfers' clicks. Although advertisers can target appropriate market segments, the model allows dishonest publishers to defraud the advertisers...

    Provided By VLDB Endowment

  • White Papers // Aug 2008

    Privacy Preserving Serial Data Publishing by Role Composition

    Previous works about privacy preserving serial data publishing on dynamic databases have relied on unrealistic assumptions of the nature of dynamic databases. In many applications, some sensitive values change freely while others never change. For example, in medical applications, the disease attribute changes with time when patients recover from one...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    Truth Discovery and Copying Detection in a Dynamic World

    Modern information management applications often require integrating data from a variety of data sources, some of which may copy or buy data from other sources. When these data sources model a dynamically changing world (e.g., people's contact information changes over time, restaurants open and go out of business), sources often...

    Provided By VLDB Endowment

  • White Papers // Aug 2008

    Output Perturbation With Query Relaxation

    Given a dataset containing sensitive personal information, a statistical database answers aggregate queries in a manner that preserves individual privacy. The authors consider the problem of constructing a statistical database using output perturbation, which protects privacy by injecting a small noise into each query result. They show that the state-of-the-art...

    Provided By VLDB Endowment

  • White Papers // Aug 2008

    Transaction Time Indexing With Version Compression

    Immortal DB is a transaction time database system designed to enable high performance for temporal applications. It is built into a commercial database engine, Microsoft SQL Server. This paper describes how the authors integrated a temporal indexing technique, the TSB-tree, into Immortal DB to serve as the core access method....

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    GConnect: A Connectivity Index for Massive Disk-Resident Graphs

    The problem of connectivity is an extremely important one in the context of massive graphs. In many large communication networks, social networks and other graphs, it is desirable to determine the minimum-cut between any pair of nodes. The problem is well solved in the classical literature, since it is related...

    Provided By VLDB Endowment

  • White Papers // Aug 2008

    On Efficiently Searching Trajectories and Archival Data for Historical Similarities

    The authors study the problem of efficiently evaluating similarity queries on histories, where a history is a d-dimensional time series for d >= 1. While there are some solutions for time-series and spatio-temporal trajectories where typically d...

    Provided By VLDB Endowment

  • White Papers // Aug 2008

    Keyword Query Cleaning

    Unlike traditional database queries, keyword queries do not adhere to predefined syntax and are often dirty with irrelevant words from natural languages. This makes accurate and efficient keyword query processing over databases a very challenging task. In this paper, the authors introduce the problem of query cleaning for keyword search...

    Provided By VLDB Endowment

  • White Papers // Aug 2008

    Scalable Adhoc Entity Extraction From Text Collections

    Supporting entity extraction from large document collections is important for enabling a variety of important data analysis tasks. In this paper, the authors introduce the "Ad-hoc" entity extraction task where entities of interest are constrained to be from a list of entities that is specific to the task. In such...

    Provided By VLDB Endowment

  • White Papers // Aug 2008

    Scheduling Shared Scans of Large Data Files

    The authors study how best to schedule scans of large data files, in the presence of many simultaneous requests to a common set of files. The objective is to maximize the overall rate of processing these files, by sharing scans of the same file as aggressively as possible, without imposing...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    iNextCube: Information Network-Enhanced Text Cube

    Nowadays, most business, administration, and/or scientific databases contain both structured attributes and text attributes. The authors call a database that consists of both multi-dimensional structured data and narrative text data a multidimensional text database. Searching, OLAP, and mining such databases pose many research challenges. To enhance the power of data...

    Provided By VLDB Endowment

  • White Papers // Aug 2008

    A Skip-list Approach for Efficiently Processing Forecasting Queries

    Time series data is common in many settings including scientific and financial applications. In these applications, the amount of data is often very large. The authors seek to support prediction queries over time series data. Prediction relies on model building which can be too expensive to be practical if it...

    Provided By VLDB Endowment

  • White Papers // Aug 2008

    A Request-Routing Framework for SOA-Based Enterprise Computing

    Enterprises may use a Service-Oriented Architecture (SOA) to provide a streamlined interface to their business processes. To scale up the system, each tier in a composite service usually deploys multiple servers for load distribution and fault tolerance. Such load distribution across multiple servers within the same tier can be viewed...

    Provided By VLDB Endowment

  • White Papers // Aug 2008

    Hexastore: Sextuple Indexing for Semantic Web Data Management

    Despite the intense interest towards realizing the Semantic Web vision, most existing RDF data management schemes are constrained in terms of efficiency and scalability. Still, the growing popularity of the RDF format arguably calls for an effort to offset these drawbacks. Viewed from a relational database perspective, these constraints are...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    Modeling and Querying Possible Repairs in Duplicate Detection

    One of the most prominent data quality problems is the existence of duplicate records. Current duplicate elimination procedures usually produce one clean instance (repair) of the input data, by carefully choosing the parameters of the duplicate detection algorithms. Finding the right parameter settings can be hard, and in many cases,...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    SSD Bufferpool Extensions for Database Systems

    High-end Solid State Disks (SSDs) provide much faster access to data compared to conventional hard disk drives. The authors present a technique for using solid-state storage as a caching layer between RAM and hard disks in database management systems. By caching data that is accessed frequently, disk I/O is reduced....

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    DataGarage: Warehousing Massive Performance Data on Commodity Servers

    Contemporary datacenters house tens of thousands of servers. The servers are closely monitored for operating conditions and utilizations by collecting their performance data (e.g., CPU utilization). In this paper, the authors show that existing database and file-system solutions are not suitable for warehousing performance data collected from a large number...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Enabling Real Time Data Analysis

    Network-based services have become a ubiquitous part of people's lives, to the point where individuals and businesses have often come to critically rely on them. Building and maintaining such reliable, high performance network and service infrastructures requires the ability to rapidly investigate and resolve complex service and performance impacting issues....

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    PolicyReplay: Misconfiguration Response Queries for Data Breach Reporting

    Recent legislation has increased the requirements of organizations to report data breaches, or unauthorized access to data. While access control policies are used to restrict access to a database, these policies are complex and difficult to configure. As a result, misconfigurations sometimes allow users access to unauthorized data. In this...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Ten Thousand SQLs: Parallel Keyword Queries Computing

    Keyword search in relational databases has been extensively studied. Given a relational database, a keyword query finds a set of interconnected tuple structures connected by foreign key references. On RDBMS, a keyword query is processed in two steps, namely, Candidate Networks (CNs) generation and CNs evaluation, where a CN is...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    The Case for Determinism in Database Systems

    Replication is a widely used method for achieving high availability in database systems. Due to the nondeterminism inherent in traditional concurrency control schemes, however, special care must be taken to ensure that replicas don't diverge. Log shipping, eager commit protocols, and lazy synchronization protocols are well-understood methods for safely replicating...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Chase Termination: A Constraints Rewriting Approach

    Several database areas such as data exchange and integration share the problem of fixing database instance violations with respect to a set of constraints. The chase algorithm solves such violations by inserting tuples and setting the value of nulls. Unfortunately, the chase algorithm may not terminate and the problem of...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Scalable Data Exchange With Functional Dependencies

    The recent literature has provided a solid theoretical foundation for the use of schema mappings in data-exchange applications. Following this formalization, new algorithms have been developed to generate optimal solutions for mapping scenarios in a highly scalable way, by relying on SQL. However, these algorithms suffer from a serious drawback:...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Interactive Route Search in the Presence of Order Constraints

    A route search is an enhancement of an ordinary geographic search. Instead of merely returning a set of entities, the result is a route that goes via entities that are relevant to the search. The input to the problem consists of several search queries, and each query defines a type...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Adaptive Logging for Mobile Device

    Nowadays, due to the increased user requirements of the fast and reliable data management operation for mobile applications, major device vendors use embedded DBMS for their mobile devices such as MP3 players, mobile phones, digital cameras and PDAs. However, database logging is the major bottleneck against the fast response time....

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Updatable and Evolvable Transforms for Virtual Databases

    Applications typically have some local understanding of a database schema, a virtual database that may differ significantly from the actual schema of the data where it is stored. Application engineers often support a virtual database using custom-built middleware because the available solutions, including updatable views, are unable to express necessary...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Dremel: Interactive Analysis of WebScale Datasets

    Dremel is a scalable, interactive ad-hoc query system for analysis of read-only nested data. By combining multi-level execution trees and columnar data layout, it is capable of running aggregation queries over trillion-row tables in seconds. The system scales to thousands of CPUs and petabytes of data, and has thousands of...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Identifying the Most Influential Data Objects With Reverse Top-k Queries

    Contemporary datacenters house tens of thousands of servers. The servers are closely monitored for operating conditions and utilizations by collecting their performance data (e.g., CPU utilization). In this paper, the authors show that existing database and file-system solutions are not suitable for warehousing performance data collected from a large number...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Transforming Range Queries to Equivalent Box Queries to Optimize Page Access

    Range queries based on L1 distance are a common type of queries in multimedia databases containing feature vectors. The authors propose a novel approach that transforms the feature space into a new feature space such that range queries in the original space are mapped into equivalent box queries in the...

    Provided By VLDB Endowment

  • White Papers // Sep 2010

    PolicyReplay: Misconfiguration-Response Queries for Data Breach Reporting

    Recent legislation has increased the requirements of organizations to report data breaches, or unauthorized access to data. While access control policies are used to restrict access to a database, these policies are complex and difficult to configure. As a result, misconfigurations sometimes allow users access to unauthorized data. In this...

    Provided By VLDB Endowment

  • White Papers // Aug 2008

    Column-Store Support for RDF Data Management: Not All Swans Are White

    This paper reports on the results of an independent evaluation of the techniques presented in the VLDB 2007 paper "Scalable Semantic Web Data Management Using Vertical Partitioning", authored by D. Abadi, A. Marcus, S. R. Madden, and K. Hollenbach. The authors revisit the proposed benchmark and examine both the data...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    Column-Oriented Database Systems

    Column-oriented database systems (column-stores) have attracted a lot of attention in the past few years. Column-stores, in a nutshell, store each database table column separately, with attribute values belonging to the same column stored contiguously, compressed, and densely packed, as opposed to traditional database systems that store entire records (rows)...

    Provided By VLDB Endowment

  • White Papers // Aug 2009

    DataCell: Building a Data Stream Engine on Top of a Relational Database Kernel

    Stream applications gained significant popularity in recent years, which led to the development of specialized datastream engines. They often have been designed from scratch and are tuned towards the specific requirements posed by their initial target applications, e.g., network monitoring and financial services. However, this also meant that they lack...

    Provided By VLDB Endowment