Data Management

Oceans of data are generated every day by businesses and enterprises, and all of it must be prioritized, analyzed, and safeguarded with the right architecture, tools, policies, and procedures. TechRepublic provides the resources you need.

  • White Papers // Aug 2010

    Adaptive Logging for Mobile Device

    With growing user demand for fast and reliable data management in mobile applications, major device vendors now use embedded DBMSs in mobile devices such as MP3 players, mobile phones, digital cameras, and PDAs. However, database logging remains a major bottleneck to fast response time....

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Updatable and Evolvable Transforms for Virtual Databases

    Applications typically have some local understanding of a database schema, a virtual database that may differ significantly from the actual schema of the data where it is stored. Application engineers often support a virtual database using custom-built middleware because the available solutions, including updatable views, are unable to express necessary...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Dremel: Interactive Analysis of Web-Scale Datasets

    Dremel is a scalable, interactive ad-hoc query system for analysis of read-only nested data. By combining multi-level execution trees and columnar data layout, it is capable of running aggregation queries over trillion-row tables in seconds. The system scales to thousands of CPUs and petabytes of data, and has thousands of...

    Provided By VLDB Endowment
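
    The columnar layout is the key to the fast aggregates the abstract mentions. As a rough illustration of that single idea (not Dremel's nested data model or multi-level execution trees), the Python sketch below contrasts a row layout with a column layout for one SUM; the table and column names are made up.

      # Minimal sketch: row-oriented vs column-oriented aggregation.
      # Illustrative only; this is not Dremel's execution engine or data model.

      rows = [
          {"user": "a", "country": "US", "latency_ms": 120},
          {"user": "b", "country": "DE", "latency_ms": 80},
          {"user": "c", "country": "US", "latency_ms": 95},
      ]

      # Row layout: an aggregation walks complete records.
      total_row = sum(r["latency_ms"] for r in rows)

      # Column layout: each attribute is stored (and scanned) separately,
      # so an aggregate over one column never touches the others.
      columns = {
          "user": ["a", "b", "c"],
          "country": ["US", "DE", "US"],
          "latency_ms": [120, 80, 95],
      }
      total_col = sum(columns["latency_ms"])

      assert total_row == total_col == 295
      print("SUM(latency_ms) =", total_col)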

  • White Papers // Aug 2010

    Identifying the Most Influential Data Objects With Reverse Top-k Queries

    Contemporary datacenters house tens of thousands of servers. The servers are closely monitored for operating conditions and utilizations by collecting their performance data (e.g., CPU utilization). In this paper, the authors show that existing database and file-system solutions are not suitable for warehousing performance data collected from a large number...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Transforming Range Queries to Equivalent Box Queries to Optimize Page Access

    Range queries based on L1 distance are a common type of queries in multimedia databases containing feature vectors. The authors propose a novel approach that transforms the feature space into a new feature space such that range queries in the original space are mapped into equivalent box queries in the...

    Provided By VLDB Endowment
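
    The mapping from L1 range queries to box queries can be illustrated with the classic two-dimensional 45-degree rotation u = x + y, v = x - y, under which an L1 ball of radius r becomes an axis-aligned box of half-width r; the paper's general transform for feature spaces is not reproduced here, and the points below are made up.

      import random

      def to_box_space(x, y):
          # 45-degree rotation: |x-qx| + |y-qy| <= r  becomes  max(|u-qu|, |v-qv|) <= r
          return x + y, x - y

      def l1_range(points, q, r):
          qx, qy = q
          return {p for p in points if abs(p[0] - qx) + abs(p[1] - qy) <= r}

      def box_query(points, q, r):
          qu, qv = to_box_space(*q)
          result = set()
          for p in points:
              u, v = to_box_space(*p)
              if abs(u - qu) <= r and abs(v - qv) <= r:   # axis-aligned box test
                  result.add(p)
          return result

      random.seed(0)
      pts = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(1000)]
      query_point, radius = (5.0, 5.0), 2.0
      assert l1_range(pts, query_point, radius) == box_query(pts, query_point, radius)
      print(len(l1_range(pts, query_point, radius)), "points matched by both formulations")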

  • White Papers // Aug 2010

    CODS: Evolving Data Efficiently and Scalably in Column Oriented Databases

    Database evolution is the process of updating the schema of a database or data warehouse (schema evolution) and evolving the data to the updated schema (data evolution). Database evolution is often necessitated in relational databases by changes in data or workload, a suboptimal initial schema design, or the...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    The Picasso Database Query Optimizer Visualizer

    Modern database systems employ a query optimizer module to automatically identify the most efficient strategies for executing the declarative SQL queries submitted by users. The efficiency of these strategies, called "Plans", is measured in terms of "Costs" that are indicative of query response times. Optimization is a mandatory exercise since...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    CareDB: A Context and Preference-Aware Location-Based Database System

    The authors demonstrate CareDB, a context and preference-aware database system. CareDB provides scalable personalized location-based services to users based on their preferences and current surrounding context. Unlike existing location-based database systems that answer queries based solely on proximity in distance, CareDB considers user preferences and various types of context in...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Geospatial Stream Query Processing Using Microsoft SQL Server StreamInsight

    Microsoft SQL Server spatial libraries contain several components that handle geometrical and geographical data types. With advances in geo-sensing technologies, there has been an increasing demand for geospatial streaming applications. Microsoft SQL Server StreamInsight (StreamInsight, for brevity) is a platform for developing and deploying streaming applications that run continuous queries...

    Provided By VLDB Endowment
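
    As a rough illustration of the kind of continuous geospatial query such a platform evaluates, here is a plain Python sketch of a sliding-window count of GPS fixes inside a region. It is not the StreamInsight API or SQL Server's spatial types; the window length, region, and event schema are invented.

      # Generic sketch of a continuous geospatial query over a point stream.
      from collections import deque

      WINDOW_SECONDS = 60
      REGION = {"min_lat": 47.5, "max_lat": 47.7, "min_lon": -122.4, "max_lon": -122.2}

      def in_region(fix):
          return (REGION["min_lat"] <= fix["lat"] <= REGION["max_lat"]
                  and REGION["min_lon"] <= fix["lon"] <= REGION["max_lon"])

      def continuous_count(stream):
          window = deque()                     # fixes inside the current time window
          for fix in stream:                   # stream is assumed ordered by timestamp
              window.append(fix)
              while fix["t"] - window[0]["t"] > WINDOW_SECONDS:
                  window.popleft()             # expire old fixes
              yield fix["t"], sum(1 for f in window if in_region(f))

      stream = [
          {"t": 0,   "lat": 47.61, "lon": -122.33},
          {"t": 30,  "lat": 47.62, "lon": -122.30},
          {"t": 100, "lat": 47.00, "lon": -122.30},   # outside the region; old fixes expired
      ]
      for t, count in continuous_count(stream):
          print(f"t={t:3d}s  fixes in region during last minute: {count}")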

  • White Papers // Aug 2010

    Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even Noticing)

    MapReduce is a computing paradigm that has gained a lot of attention in recent years from industry and research. Unlike parallel DBMSs, MapReduce allows non-expert users to run complex analytical tasks over very large data sets on very large clusters and clouds. However, this comes at a price: MapReduce processes...

    Provided By VLDB Endowment
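
    For readers unfamiliar with the paradigm the paper builds on, here is a minimal single-process Python sketch of the MapReduce programming model (word count). It only illustrates map, shuffle, and reduce; Hadoop's distributed execution, and the indexing and partitioning techniques Hadoop++ adds on top, are not modeled.

      from collections import defaultdict

      def map_phase(record):
          # Emit one (key, value) pair per word occurrence.
          for word in record.split():
              yield word.lower(), 1

      def shuffle(pairs):
          # Group all values by key, as the framework does between map and reduce.
          groups = defaultdict(list)
          for key, value in pairs:
              groups[key].append(value)
          return groups

      def reduce_phase(key, values):
          return key, sum(values)

      records = ["MapReduce runs on large clusters", "MapReduce processes large data sets"]
      mapped = [pair for rec in records for pair in map_phase(rec)]
      result = dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())
      print(result["mapreduce"], result["large"])  # 2 2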

  • White Papers // Aug 2010

    Sharing-Aware Horizontal Partitioning for Exploiting Correlations During Query Processing

    Optimization of join queries based on average selectivities is suboptimal in highly correlated databases. In such databases, relations are naturally divided into partitions, each partition having substantially different statistical characteristics. It is very compelling to discover such data partitions during query optimization and create multiple plans for a given query,...

    Provided By VLDB Endowment
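
    The point that average selectivities mislead under correlation can be seen in a small Python sketch: partitioning a toy relation on a correlated attribute exposes per-partition selectivities that a single global estimate hides. The data and the partitioning rule are invented, not taken from the paper.

      # Why per-partition statistics beat one average selectivity under correlation.
      orders = (
          # (region, priority): priority is strongly correlated with region here.
          [("EU", "high")] * 90 + [("EU", "low")] * 10 +
          [("US", "high")] * 5  + [("US", "low")] * 95
      )

      def selectivity(rows, pred):
          return sum(1 for r in rows if pred(r)) / len(rows)

      def is_high(row):
          return row[1] == "high"

      # One global estimate hides the skew ...
      print("global selectivity of priority='high':", selectivity(orders, is_high))  # 0.475

      # ... while horizontal partitions on the correlated attribute expose it,
      # so an optimizer could pick a different plan per partition.
      for region in ("EU", "US"):
          part = [r for r in orders if r[0] == region]
          print(region, "partition selectivity:", selectivity(part, is_high))  # 0.9 / 0.05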

  • White Papers // Aug 2010

    Advanced Processing for Ontological Queries

    Ontology-based data access is a powerful form of extending database technology, where a classical Extensional DataBase (EDB) is enhanced by an ontology that generates new intensional knowledge which may contribute to answer a query. The ontological integrity constraints for generating this intensional knowledge can be specified in description logics such...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Embellishing Text Search Queries To Protect User Privacy

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Set Similarity Join on Probabilistic Data

    Set similarity join has played an important role in many real-world applications such as data cleaning, near-duplicate detection, data integration, and so on. In these applications, set data often contain noise and are thus uncertain and imprecise. In this paper, the authors model such probabilistic set data on two...

    Provided By VLDB Endowment
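
    As background for the probabilistic variant the paper studies, here is a minimal Python sketch of an ordinary set similarity join with a Jaccard threshold and a simple size filter; the paper's probabilistic model and pruning techniques are not reproduced, and the records are made up.

      from itertools import product

      def jaccard(a, b):
          return len(a & b) / len(a | b)

      R = {"r1": {"data", "cleaning", "tool"}, "r2": {"stream", "engine"}}
      S = {"s1": {"data", "cleaning", "toolkit"}, "s2": {"query", "engine", "stream"}}

      def similarity_join(R, S, threshold):
          for (rid, r), (sid, s) in product(R.items(), S.items()):
              # Cheap size filter: |r & s| never exceeds min(|r|, |s|), so pairs
              # whose sizes differ too much cannot reach the threshold.
              if min(len(r), len(s)) / max(len(r), len(s)) < threshold:
                  continue
              if jaccard(r, s) >= threshold:
                  yield rid, sid, round(jaccard(r, s), 2)

      print(list(similarity_join(R, S, threshold=0.5)))
      # [('r1', 's1', 0.5), ('r2', 's2', 0.67)]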

  • White Papers // Aug 2010

    Aether: A Scalable Approach to Logging

    The shift to multi-core hardware brings new challenges to database systems, as the software parallelism determines performance. Even though database systems traditionally accommodate simultaneous requests, a multitude of synchronization barriers serialize execution. Write-ahead logging is a fundamental, omnipresent component in ARIES-style concurrency and recovery, and one of the most important...

    Provided By VLDB Endowment
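
    To make the write-ahead logging bottleneck concrete, here is a toy Python sketch of a shared log buffer whose single flush makes a whole group of commits durable at once (group commit). It is a baseline illustration only, not Aether's actual design or optimizations; class and file names are invented.

      import json
      import os
      import tempfile

      class WriteAheadLog:
          def __init__(self, path):
              self.path = path
              self.buffer = []          # records not yet on stable storage
              self.next_lsn = 0

          def append(self, txn_id, payload):
              record = {"lsn": self.next_lsn, "txn": txn_id, "payload": payload}
              self.next_lsn += 1
              self.buffer.append(record)
              return record["lsn"]

          def flush(self):
              # One sequential write + fsync covers every buffered commit.
              with open(self.path, "a") as f:
                  for record in self.buffer:
                      f.write(json.dumps(record) + "\n")
                  f.flush()
                  os.fsync(f.fileno())
              self.buffer.clear()

      log = WriteAheadLog(os.path.join(tempfile.mkdtemp(), "wal.log"))
      for txn in ("T1", "T2", "T3"):
          log.append(txn, {"op": "update", "key": txn.lower()})
      log.flush()                       # all three transactions become durable together
      print("flushed up to LSN", log.next_lsn - 1)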

  • White Papers // Aug 2010

    MCDBR: Risk Analysis in the Database

    Enterprises often need to assess and manage the risk arising from uncertainty in their data. Such uncertainty is typically modeled as a probability distribution over the uncertain data values, specified by means of a complex (often predictive) stochastic model. The probability distribution over data values leads to a probability distribution...

    Provided By VLDB Endowment
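
    The "probability distribution over query results" the abstract describes can be made concrete with a Monte Carlo sketch: sample possible worlds from a stochastic model of the uncertain values, run the aggregate query in each world, and read a tail quantile of the result distribution as a simple risk measure. The model and numbers below are invented; this is not MCDB-R's machinery.

      import random
      import statistics

      random.seed(42)

      def sample_world():
          # Uncertain revenue per region, modeled here as independent Gaussians.
          return {
              "EU": random.gauss(100.0, 20.0),
              "US": random.gauss(150.0, 40.0),
          }

      def query(world):
          # The "query": total revenue across regions.
          return sum(world.values())

      totals = sorted(query(sample_world()) for _ in range(10_000))
      p05 = totals[int(0.05 * len(totals))]   # 5th percentile: a value-at-risk style measure
      print(f"mean total = {statistics.mean(totals):.1f}, 5% worst case <= {p05:.1f}")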

  • White Papers // Aug 2010

    Scalable Probabilistic Databases With Factor Graphs and MCMC

    Incorporating probabilities into the semantics of incomplete databases has posed many challenges, forcing systems to sacrifice modeling power, scalability, or treatment of relational algebra operators. The authors propose an alternative approach where the underlying relational database always represents a single world, and an external factor graph encodes a distribution over...

    Provided By VLDB Endowment
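
    As a toy illustration of sampling from a factor graph, the sketch below estimates the marginals of two Boolean variables tied by unary and pairwise factors using a single-variable Metropolis chain. The factors and weights are invented and unrelated to the paper's system.

      import random

      random.seed(1)

      def unnormalized_score(state):
          a, b = state["A"], state["B"]
          score = 1.0
          score *= 2.0 if a else 1.0          # unary factor: A leans toward True
          score *= 1.5 if b else 1.0          # unary factor: B leans toward True
          score *= 3.0 if a == b else 1.0     # pairwise factor: A and B should agree
          return score

      state = {"A": False, "B": False}
      counts = {"A": 0, "B": 0}
      samples = 50_000

      for _ in range(samples):
          var = random.choice(("A", "B"))
          proposal = dict(state, **{var: not state[var]})   # flip one variable
          accept = min(1.0, unnormalized_score(proposal) / unnormalized_score(state))
          if random.random() < accept:
              state = proposal
          for v in counts:
              counts[v] += state[v]

      print({v: round(counts[v] / samples, 2) for v in counts})  # estimated P(v = True)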

  • White Papers // Aug 2010

    On Multi-Column Foreign Key Discovery

    A foreign/primary key relationship between relational tables is one of the most important constraints in a database. From a data analysis perspective, discovering foreign keys is a crucial step in understanding and working with the data. Nevertheless, more often than not, foreign key constraints are not specified in the data,...

    Provided By VLDB Endowment
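
    The basic test behind foreign-key discovery is an inclusion dependency check: every (possibly multi-column) value combination on the referencing side must appear on the referenced side. The Python sketch below shows only that check, with made-up tables; the pruning and ranking that real discovery needs, and that the paper addresses, are not included.

      def column_values(rows, cols):
          return {tuple(row[c] for c in cols) for row in rows}

      def satisfies_inclusion(child_rows, child_cols, parent_rows, parent_cols):
          return column_values(child_rows, child_cols) <= column_values(parent_rows, parent_cols)

      orders = [
          {"cust_id": 1, "country": "US", "total": 30},
          {"cust_id": 2, "country": "DE", "total": 15},
      ]
      customers = [
          {"id": 1, "country": "US", "name": "Ada"},
          {"id": 2, "country": "DE", "name": "Bob"},
          {"id": 3, "country": "FR", "name": "Cleo"},
      ]

      # Multi-column candidate: orders(cust_id, country) -> customers(id, country)
      print(satisfies_inclusion(orders, ("cust_id", "country"),
                                customers, ("id", "country")))            # True
      # Single-column candidate that fails: customers.id -> orders.cust_id
      print(satisfies_inclusion(customers, ("id",), orders, ("cust_id",)))  # False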

  • White Papers // Aug 2010

    Building Ranked Mashups of Unstructured Sources With Uncertain Information

    Mashups are situational applications that join multiple sources to better meet the information needs of Web users. Web sources can be huge databases behind query interfaces, which triggers the need for ranking mashup results based on user preferences. The authors present MashRank, a mashup authoring and processing system building...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Generating Databases for Query Workloads

    To evaluate the performance of database applications and DBMSs, the authors usually execute workloads of queries on generated databases of different sizes and measure the response time. This paper introduces MyBenchmark, an offline data generation tool that takes a set of queries as input and generates database instances for which...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    XSACT: A Comparison Tool for Structured Search Results

    Studies show that about 50% of web searches are for information exploration purposes, where a user would like to investigate, compare, evaluate, and synthesize multiple relevant results. Due to the absence of general tools that can effectively analyze and differentiate multiple results, a user has to manually read and comprehend...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    ObjectRunner: Lightweight, Targeted Extraction and Querying of Structured Web Data

    The authors present in this paper ObjectRunner, a system for extracting, integrating and querying structured data from the Web. The system harvests real-world items from template-based HTML pages (the so-called structured Web). It illustrates a two-phase querying of the Web, in which an intentional description of the targeted data is...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    EXTRUCT: Using Deep Structural Information in XML Keyword Search

    Users who are unfamiliar with database query languages can search XML data sets using keyword queries. Previous work has shown that current XML keyword search methods, although intuitive, do not effectively use the data's structural information and provide poor precision, recall, and ranking for most queries. Based on an extension...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    SQL QueRIE Recommendations

    This demonstration presents QueRIE, a recommender system that supports interactive database exploration. This system aims at assisting non-expert users of scientific databases by tracking their querying behavior and generating personalized query recommendations. The system is supported by two recommendation engines and the underlying recommendation algorithms. The first identifies potentially "Interesting"...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Massively Parallel Data Analysis With PACTs on Nephele

    Large-scale data analysis applications require processing and analyzing terabytes or even petabytes of data, particularly in the areas of web analysis or scientific data management. This trend has been discussed as "Web-scale data management" in a panel at VLDB 2009. Formerly, parallel data processing was the domain of parallel...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    CoDA: Interactive Cluster-Based Concept Discovery

    Large data resources are ubiquitous in science and business. For these domains, an intuitive view on the data is essential to fully exploit the hidden knowledge. Often, these data can be semantically structured by concepts. Since the determination of concepts requires a thorough analysis of the data, data mining methods...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Keymantic: Semantic Keyword-Based Searching in Data Integration Systems

    Keyword queries have become a popular alternative to structured query languages, since they do not require the users to have a good knowledge of the way data has been organized in the source. Keyword-based searching techniques on databases and on XML documents typically rely on the construction of specialized indexes...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Data Auditor: Exploring Data Quality and Semantics Using Pattern Tableaux

    The authors present Data Auditor, a tool for exploring data quality and data semantics. Given a rule or an integrity constraint and a target relation, Data Auditor computes pattern tableaux, which concisely summarize subsets of the relation that (mostly) satisfy or (mostly) fail the constraint. This paper describes: The architecture...

    Provided By VLDB Endowment
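
    A rough sketch of the pattern-tableau idea: evaluate a constraint per tuple, group tuples by an attribute pattern, and keep the patterns whose subsets mostly satisfy the constraint. The constraint, attributes, and thresholds below are invented for illustration and are not Data Auditor's actual rule language.

      from collections import defaultdict

      rows = [
          {"country": "US", "phone": "2125551234"},
          {"country": "US", "phone": "3105559876"},
          {"country": "US", "phone": "555"},          # violates the rule
          {"country": "DE", "phone": "3012345"},
          {"country": "DE", "phone": "4098765"},
      ]

      def constraint(row):
          return len(row["phone"]) == 10              # "phone numbers have 10 digits"

      def pattern_tableau(rows, attribute, min_confidence=0.6):
          groups = defaultdict(list)
          for row in rows:
              groups[row[attribute]].append(constraint(row))
          tableau = []
          for value, results in groups.items():
              confidence = sum(results) / len(results)
              if confidence >= min_confidence:
                  tableau.append((f"{attribute}={value}", len(results), round(confidence, 2)))
          return tableau

      print(pattern_tableau(rows, "country"))
      # [('country=US', 3, 0.67)]  -- the US subset mostly satisfies the rule; DE does not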

  • White Papers // Aug 2010

    iGraph: A Framework for Comparisons of Disk-Based Graph Indexing Techniques

    Graphs are of growing importance in modeling complex structures such as chemical compounds, proteins, images, and program dependence. Given a query graph Q, the subgraph isomorphism problem is to find the set of graphs in a graph database that contain Q, which is NP-complete. Recently, there has been a lot of research...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Runtime Measurements in the Cloud: Observing, Analyzing, and Reducing Variance

    Replication is a widely used method for achieving high availability in database systems. Due to the nondeterminism inherent in traditional concurrency control schemes, however, special care must be taken to ensure that replicas don't diverge. Log shipping, eager commit protocols, and lazy synchronization protocols are well-understood methods for safely replicating...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    The Performance of MapReduce: An In-Depth Study

    MapReduce has been widely used for large-scale data analysis in the Cloud. The system is well recognized for its elastic scalability and fine-grained fault tolerance although its performance has been noted to be suboptimal in the database context. According to a recent study, Hadoop, an open source implementation of MapReduce,...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    NET-FLi: On-the-Fly Compression, Archiving and Indexing of Streaming Network Traffic

    The ever-increasing number of intrusions in public and commercial networks has created the need for high-speed archival solutions that continuously store streaming network data to enable forensic analysis and auditing. However, "Turning back the clock" for post-attack analyses is not a trivial task. The first major challenge is that the...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    From a Stream of Relational Queries to Distributed Stream Processing

    Applications from several domains are now being written to process live data originating from hardware and software-based streaming sources. Many of these applications have been written relying solely on database and data warehouse technologies, despite their lack of need for transactional support and ACID properties. In several extreme high-load cases,...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Active Complex Event Processing: Applications in Real-Time Health Care

    The analysis of many real-world event-based applications has revealed that existing Complex Event Processing (CEP) technology, while effective for efficient pattern matching on event streams, is limited in its capability to react in real time to detected opportunities, risks, or environmental changes. The authors are the first to tackle...

    Provided By VLDB Endowment
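
    To ground the pattern-matching side of CEP, here is a minimal Python sketch that flags a "low reading followed shortly by a spike" sequence in a heart-rate stream. The event schema, thresholds, and window are invented, and the paper's active, real-time reaction capabilities are not modeled.

      LOW, HIGH, WINDOW = 50, 90, 15   # bpm, bpm, seconds

      events = [
          {"t": 0,  "hr": 72},
          {"t": 5,  "hr": 48},   # below LOW: opens a partial match
          {"t": 9,  "hr": 95},   # above HIGH within 15 s: completes the pattern
          {"t": 40, "hr": 92},   # too late to pair with the reading at t=5
      ]

      def detect(stream):
          open_matches = []                     # low readings still waiting for a spike
          for e in stream:
              open_matches = [m for m in open_matches if e["t"] - m["t"] <= WINDOW]
              if e["hr"] > HIGH:
                  for m in open_matches:
                      yield m, e                # complete pattern: low reading then spike
                  open_matches = []             # consume the matched partial states
              if e["hr"] < LOW:
                  open_matches.append(e)

      for low, spike in detect(events):
          print(f"alert: hr {low['hr']} at t={low['t']} followed by {spike['hr']} at t={spike['t']}")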

  • White Papers // Aug 2010

    Thirteen New Players in the Team: A Ferry-Based LINQ to SQL Provider

    The authors demonstrate an efficient LINQ to SQL provider and its significant impact on the runtime performance of LINQ programs that process large data volumes. This alternative provider is based on Ferry, a compilation technology that lets relational database systems participate in the evaluation of first-order functional programs over nested, ordered...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Peer Coordination Through Distributed Triggers

    This is a demonstration of data coordination in a peer data management system through the use of distributed triggers. The latter express, in a declarative manner, individual security and consistency requirements of peers that cannot be ensured by default in the P2P environment. Peers are thus able to handle, in a transparent...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Seaform: Search As You Type in Forms

    Form-style interfaces have been widely used to allow users to access information. In this demonstration paper, the authors develop a new search paradigm in form-style query interfaces, called SEAFORM (which stands for SEarch-As-You-Type in FORMS), which computes answers on-the-fly as a user types in a query letter by letter and...

    Provided By VLDB Endowment
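
    The search-as-you-type behavior can be sketched with plain prefix search over a sorted index, recomputed after every keystroke. The generic Python sketch below is not SEAFORM's algorithm or ranking; the titles and the result limit are made up.

      import bisect

      titles = sorted([
          "data auditor", "data integration", "database logging",
          "dremel", "stream processing",
      ])

      def prefix_matches(prefix, limit=5):
          lo = bisect.bisect_left(titles, prefix)
          hi = bisect.bisect_left(titles, prefix + "\uffff")   # just past all strings with this prefix
          return titles[lo:hi][:limit]

      for typed in ("d", "da", "dat", "data"):     # the user typing letter by letter
          print(typed, "->", prefix_matches(typed))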

  • White Papers // Aug 2010

    TimeTrails: A System for Exploring Spatio-Temporal Information in Documents

    Spatial and temporal data have become ubiquitous in many application domains such as the geosciences or life sciences. Sophisticated database management systems are employed to manage such structured data. However, an important source of spatio-temporal information that has not been fully utilized is unstructured text documents. In this paper, combinations...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Transforming XML Documents as Schemas Evolve

    Database systems often use XML schema to describe the format of valid XML documents. Usually, this format is determined when the system is designed. Sometimes, in an already functioning system, a need arises to change the XML schemas. In such a situation, the system has to transform the old XML...

    Provided By VLDB Endowment

  • White Papers // Aug 2010

    Improving Collaboration In Six Sigma Projects At Dow AgroSciences

    Six Sigma projects improved the company's understanding of business issues by gathering more information from across the organization. The service has also helped the company implement new processes to address the issue of lost market share, with process improvements including changes to supply chain and commercial offers. The biggest thing...

    Provided By Grouputer