Data Management

The latest white papers, case studies and webcasts to help technology professionals ensure data is safe from risk and disaster with quality back-up, infrastructure and information management.

  • White Papers // Apr 2015

    Approximate MaxRS in Spatial Databases

    In the Maximizing Range Sum (MaxRS) problem, given a set P of 2D points, each associated with a positive weight, and a rectangle r of specific extents, the task is to decide where to place r in order to maximize the covered weight of r - that...

    Provided By VLD Digital
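
    The MaxRS problem statement above can be illustrated with a small brute-force sketch. This is not the approximate method the paper proposes; it is an exact baseline that assumes a closed, axis-aligned rectangle and relies on the observation that some optimal placement has its left and bottom edges aligned with point coordinates. The function name `max_range_sum` and the sample points are hypothetical.

```python
from itertools import product


def max_range_sum(points, width, height):
    """Exact brute-force MaxRS for a closed axis-aligned rectangle.

    points: list of (x, y, weight) tuples with positive weights.
    Returns (best_weight, bottom_left_corner).
    """
    best_weight, best_corner = 0.0, None
    # Candidate bottom-left corners: every (x_i, y_j) pair taken from the input.
    xs = sorted({x for x, _, _ in points})
    ys = sorted({y for _, y, _ in points})
    for left, bottom in product(xs, ys):
        # Sum the weights of all points inside the rectangle placed here.
        covered = sum(
            w
            for x, y, w in points
            if left <= x <= left + width and bottom <= y <= bottom + height
        )
        if covered > best_weight:
            best_weight, best_corner = covered, (left, bottom)
    return best_weight, best_corner


# Hypothetical sample data: (x, y, weight)
points = [(0, 0, 1.0), (1, 1, 2.0), (5, 5, 4.0), (6, 5, 1.5)]
weight, corner = max_range_sum(points, width=2, height=2)
print(weight, corner)
```

    The sketch runs in cubic time in the number of points, so it is only useful for checking faster or approximate solutions on small inputs.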

  • White Papers // Apr 2015

    Rank Discovery from Web Databases

    Many web databases are only accessible through a proprietary search interface which allows users to form a query by entering the desired values for a few attributes. After receiving a query, the system returns the top-k matching tuples according to a pre-determined ranking function. Since the rank of a tuple...

    Provided By VLD Digital

  • White Papers // Apr 2015

    SPARSI: Partitioning Sensitive Data amongst Multiple Adversaries

    The authors present SPARSI, a novel theoretical framework for partitioning sensitive data across multiple non-colluding adversaries. Most work in privacy-aware data sharing has considered disclosing summaries where the aggregate information about the data is preserved, but sensitive user information is protected. Nonetheless, there are applications, including online advertising, cloud computing...

    Provided By VLD Digital

  • White Papers // Apr 2015

    Synthetising Changes in XML Documents as PULs

    The ability to efficiently detect changes in XML documents is crucial in many application contexts. If such changes are represented as XQuery update Pending Update Lists (PULs), they can then be applied on documents using XQuery update engines, and document management can take advantage of existing composition, inversion, reconciliation approaches...

    Provided By VLD Digital

  • White Papers // Apr 2015

    Discovering Longest-lasting Correlation in Sequence Databases

    Most existing work on sequence databases uses correlation (e.g., Euclidean distance and Pearson correlation) as a core function for various analytical tasks. Typically, it requires users to set a length for the similarity queries. However, there is no reliable way to define the proper length for different application needs. In...

    Provided By VLD Digital

  • White Papers // Apr 2015

    PREDIcT: Towards Predicting the Runtime of Large Scale Iterative Analytics

    Machine learning algorithms are widely used today for analytical tasks such as data cleaning, data categorization, or data filtering. At the same time, the rise of social media motivates recent uptake in large scale graph processing. Both categories of algorithms are dominated by iterative subtasks, i.e., processing steps which are...

    Provided By VLD Digital

  • White Papers // Apr 2015

    Instant Loading for Main Memory Databases

    eScience and big data analytics applications face the challenge of efficiently evaluating complex queries over vast amounts of structured text data archived in network storage solutions. To analyze such data in traditional disk-based database systems, it needs to be bulk loaded, an operation whose performance largely depends on...

    Provided By VLD Digital

  • White Papers // Apr 2015

    Scalable Progressive Analytics on Big Data in the Cloud

    Analytics over the increasing quantity of data stored in the cloud has become very expensive, particularly due to the pay-as-you-go cloud computation model. Data scientists typically manually extract samples of increasing data size (progressive samples) using domain-specific sampling strategies for exploratory querying. This provides them with user-control, repeatable semantics,...

    Provided By VLD Digital

  • White Papers // Apr 2015

    Scalable XML Query Processing Using Parallel Pushdown Transducers

    In online social networking, network monitoring and financial applications, there is a need to query high rate streams of XML data, but methods for executing individual XPath queries on streaming XML data have not kept pace with multi-core CPUs. For data-parallel processing, a single XML stream is typically split into...

    Provided By VLD Digital

  • White Papers // Apr 2015

    Supporting Keyword Search in Product Database: A Probabilistic Approach

    The ability to let users search for products conveniently in a product database is critical to the success of e-commerce. Although structured query languages (e.g. SQL) can be used to effectively access the product database, they are very difficult for end users to learn and use. In this paper, the authors...

    Provided By VLD Digital

  • White Papers // Apr 2015

    A Temporal Probabilistic Database Model for Information Extraction

    Temporal annotations of facts are a key component both for building a high-accuracy knowledge base and for answering queries over the resulting temporal knowledge base with high precision and recall. In this paper, the authors present a temporal-probabilistic database model for cleaning uncertain temporal facts obtained from information extraction methods....

    Provided By VLD Digital

  • White Papers // Apr 2015

    Anti-Caching: A New Approach to Database Management System Architecture

    The traditional wisdom for building disk-based relational DataBase Management Systems (DBMS) is to organize data in heavily-encoded blocks stored on disk, with a main memory block cache. In order to improve performance given high disk latency, these systems use a multi-threaded architecture with dynamic record-level locking that allows multiple transactions...

    Provided By VLD Digital

  • White Papers // Apr 2015

    Aggregation and Ordering in Factorised Databases

    A common approach to data analysis involves understanding and manipulating succinct representations of data. In earlier work, the authors put forward a succinct representation system for relational data called factorized databases and reported on the main-memory query engine FDB for select-project-join queries on such databases. In this paper, they extend...

    Provided By VLD Digital

  • White Papers // Apr 2015

    A Data-adaptive and Dynamic Segmentation Index for Whole Matching on Time Series

    Similarity search on time series is an essential operation in many applications. In the state-of-the-art methods, such as the R-tree based methods, SAX and iSAX, time series are by default divided into equi-length segments globally, that is, all time series are segmented in the same way. Those methods then focus...

    Provided By VLD Digital

  • White Papers // Apr 2015

    Sharing Data and Work Across Concurrent Analytical Queries

    Today's data deluge enables organizations to collect massive data, and analyze it with an ever-increasing number of concurrent queries. Traditional Data Warehouses (DW) face a challenging problem in executing this task, due to their query-centric model: each query is optimized and executed independently. This model results in high contention for...

    Provided By VLD Digital

  • White Papers // Apr 2015

    MASTRO STUDIO: Managing Ontology-Based Data Access Applications

    Ontology-Based Data Access (OBDA) is a novel paradigm for accessing large data repositories through an ontology, that is, a formal description of a domain of interest. Supporting the management of OBDA applications poses new challenges, as it requires providing effective tools that allow both expert and non-expert users to...

    Provided By VLD Digital

  • White Papers // Apr 2015

    PLASMAHD: Probing the LAttice Structure and MAkeup of High-dimensional Data

    Rapidly making sense of, analyzing, and extracting useful information from large and complex data is a grand challenge. A user tasked with meeting this challenge is often befuddled with questions on where and how to begin to understand the relevant characteristics of such data. Real-world problem scenarios often involve scalability...

    Provided By VLD Digital

  • White Papers // Apr 2015

    IBminer: A Text Mining Tool for Constructing and Populating InfoBox Databases and Knowledge Bases

    Knowledge bases and structured summaries are playing a crucial role in many applications, such as text summarization, question answering, essay grading, and semantic search. Although many systems (e.g., DBpedia and YaGo2) provide massive knowledge bases of such summaries, they all suffer from incompleteness, inconsistencies, and inaccuracies. These problems can be...

    Provided By VLD Digital

  • White Papers // Apr 2015

    eSkyline: Processing Skyline Queries over Encrypted Data

    The advent of cloud computing re-defines the traditional query processing paradigm. Whereas computational overhead and memory constraints become less prohibitive, data privacy, security, and confidentiality concerns become top priorities. In particular, as data owners outsource the management of their data to service providers, query processing over such data has more...

    Provided By VLD Digital

  • White Papers // Apr 2015

    GestureQuery: A Multitouch Database Query Interface

    Multitouch interfaces allow users to directly and interactively manipulate data. The authors propose bringing such interactive manipulation to the task of querying SQL databases. This paper describes an initial implementation of such an interface for multitouch tablet devices called GestureQuery that translates multitouch gestures into database queries. It provides database...

    Provided By VLD Digital

  • White Papers // Apr 2015

    Mining and Linking Patterns Across Live Data Streams and Stream Archives

    The authors will demonstrate the visual analytics system ViStream that supports interactive mining of complex patterns within and across live data streams and stream pattern archives. Their system is equipped with both computational pattern mining and visualization techniques which allow it to not only efficiently discover and manage patterns...

    Provided By VLD Digital

  • White Papers // Apr 2015

    PhotoStand: A Map Query Interface for a Database of News Photos

    PhotoStand enables the use of a map query interface to retrieve news photos associated with news articles that are in turn associated with the principal locations that they mention collected as a result of monitoring the output of over 10,000 RSS news feeds, made available within minutes of publication, and...

    Provided By VLD Digital

  • White Papers // Apr 2015

    Why it is time for a HyPE: A Hybrid Query Processing Engine for Efficient GPU Coprocessing in DBMS

    GPU acceleration is a promising approach to speed up query processing of database systems by using low cost graphic processors as coprocessors. Two major trends have emerged in this area: the development of frameworks for scheduling tasks in heterogeneous CPU/GPU platforms, which is mainly in the context of co-processing for...

    Provided By VLD Digital

  • White Papers // Apr 2015

    Database Support for Unstructured Meshes

    Despite ubiquitous usage of unstructured mesh in many application domains (e.g., computer aided design, scientific simulation, climate modeling, etc.), there is no specialized mesh database which supports storing and querying such data structures. Existing mesh libraries use file-based APIs which do not support declarative querying and are difficult to maintain....

    Provided By VLD Digital

  • White Papers // Apr 2015

    RecDB in Action: Recommendation Made Easy in Relational Databases

    In this paper, the authors demonstrate RecDB, a full-fledged database system that provides personalized recommendation to users. They implemented RecDB using an existing open source database system, PostgreSQL, and they demonstrate the effectiveness of RecDB using two existing recommendation applications: restaurant recommendation and movie recommendation. To make this even more...

    Provided By VLD Digital

  • White Papers // Apr 2015

    Mosquito: Another One Bites the Data Upload STream

    Mosquito is a lightweight and adaptive physical design framework for Hadoop. Mosquito connects to existing data pipelines in Hadoop MapReduce and/or HDFS, observes the data, and creates better physical designs, i.e. indexes, as a byproduct. The authors' approach is minimally invasive, yet it allows users and developers to easily improve...

    Provided By VLD Digital

  • White Papers // Apr 2015

    NoFTL: Database Systems on FTL-less Flash Storage

    The database architecture and workhorse algorithms have been designed to compensate for hard disk properties. The I/O characteristics of flash memories have significant impact on database systems and many algorithms and approaches taking advantage of those have been proposed recently. Nonetheless, at the system level, flash storage devices are still treated...

    Provided By VLD Digital

  • White Papers // Apr 2015

    Lazy ETL in Action: ETL Technology Dates Scientific Data

    Both scientific data and business data have analytical needs. Analysis takes place after a scientific data warehouse is eagerly filled with all data from external data sources (repositories). This is similar to the initial loading stage of Extract, Transform, and Load (ETL) processes that drive business intelligence. ETL can also...

    Provided By VLD Digital

  • White Papers // Apr 2015

    EnviroMeter: A Platform for Querying Community-Sensed Data

    Efficiently querying data collected from Large-area Community driven Sensor Networks (LCSNs) is a new and challenging problem. In the authors' previous papers, they proposed adaptive techniques for learning models (e.g., statistical, nonparametric, etc.) from such data, considering the fact that LCSN data is typically geo-temporally skewed. In this paper, they...

    Provided By VLD Digital

  • White Papers // Apr 2015

    Scolopax: Exploratory Analysis of Scientific Data

    The formulation of hypotheses based on patterns found in data is an essential component of scientific discovery. As larger and richer data sets become available, new scalable and user-friendly tools for scientific discovery through data analysis are needed. The authors demonstrate Scolopax, which explores the idea of a search engine...

    Provided By VLD Digital

  • White Papers // Apr 2015

    PROPOLIS: Provisioned Analysis of Data-Centric Processes

    The authors consider in this demonstration the (static) analysis of data-centric process-based applications, namely applications that depend on an underlying database and whose control is guided by a finite state transition system. They observe that analysts of such applications often want to do more than analyze a specific instance of...

    Provided By VLD Digital

  • White Papers // Apr 2015

    Feature Selection in Enterprise Analytics: A Demonstration Using an R-based Data Analytics System

    Enterprise applications are analyzing ever larger amounts of data using advanced analytics techniques. Recent systems from Oracle, IBM, and SAP integrate R with a data processing system to support richer advanced analytics on large data. A key step in advanced analytics applications is feature selection, which is often an iterative...

    Provided By VLD Digital

  • White Papers // Apr 2015

    Consuming Linked data in Supply Chains: Enabling data visibility via Linked Pedigrees

    The performance of a supply chain depends critically on the coordinating actions and decisions undertaken by the trading partners. The sharing of product and process information plays a central role in the coordination and is a key driver for the success of the supply chain. In this paper, the authors...

    Provided By Aston University

  • White Papers // Apr 2015

    Convergence of IT and Data Mining with Other Technologies

    In this age of technology and communication convergence, the user cannot help but be impacted by technologies and innovations that center on computers. This paper examines the growing need for a secure environment and a general-purpose analytics engine. The need for a secure environment is physical or IT security and...

    Provided By International Scientific Research Organization for Science, Engineering and Technology

  • White Papers // Apr 2015

    Data Dependencies Mining In Database by Removing Equivalent Attributes

    Database design methodology normally starts with the first step of conceptual schema design in which users' requirements are modeled as the Entity Relationship (ER) diagram. The next step of logical design focuses on the translation of conceptual schemas into relations or database tables. Data dependency plays a key role in...

    Provided By International Scientific Research Organization for Science, Engineering and Technology

  • White Papers // Apr 2015

    A Comparison of Knives for Bread Slicing

    Vertical partitioning is a crucial step in physical database design in row-oriented databases. A number of vertical partitioning algorithms have been proposed over the last three decades for a variety of niche scenarios. In principle, the underlying problem remains the same: decompose a table into one or more vertical partitions....

    Provided By VLD Digital

  • White Papers // Apr 2015

    Efficient Querying of Inconsistent Databases with Binary Integer Programming

    An inconsistent database is a database that violates one or more integrity constraints. A typical approach for answering a query over an inconsistent database is to first clean the inconsistent database by transforming it to a consistent one and then apply the query to the consistent database. An alternative and...

    Provided By VLD Digital
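
    The definition and the clean-then-query approach described above can be sketched on a toy example. This is not the paper's binary integer programming method; it is a minimal illustration that assumes a single key constraint (no two rows share the same key value) and a deliberately naive cleaning rule that drops every row involved in a violation. The helper names and the `employees` data are hypothetical.

```python
from collections import defaultdict


def key_violations(rows, key):
    """Return {key_value: [rows]} for key values that occur more than once."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return {k: g for k, g in groups.items() if len(g) > 1}


def naive_clean(rows, key):
    """Naive cleaning (an assumption): drop all rows that take part in a violation."""
    bad = key_violations(rows, key)
    return [row for row in rows if row[key] not in bad]


# Hypothetical table in which id is meant to be a key; id 2 violates it.
employees = [
    {"id": 1, "name": "Ada", "dept": "R&D"},
    {"id": 2, "name": "Bob", "dept": "Sales"},
    {"id": 2, "name": "Bob", "dept": "Ops"},
]

violations = key_violations(employees, "id")  # the database is inconsistent
cleaned = naive_clean(employees, "id")        # step 1: transform to a consistent one
depts = sorted({row["dept"] for row in cleaned})  # step 2: query the cleaned data
print(sorted(violations), depts)
```

    Dropping every conflicting row is only one of many possible cleaning choices, and different choices can change the query answer.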

  • White Papers // Apr 2015

    Discovering Linkage Points over Web Data

    A basic step in integration is the identification of linkage points, i.e., finding attributes that are shared (or related) between data sources, and that can be used to match records or entities across sources. This is usually performed using a match operator that associates attributes of one database to another....

    Provided By VLD Digital

  • White Papers // Apr 2015

    Supporting User-Defined Functions on Uncertain Data

    Uncertain data management has become crucial in many sensing and scientific applications. As User-Defined Functions (UDFs) become widely used in these applications, an important task is to capture result uncertainty for queries that evaluate UDFs on uncertain data. In this paper, the authors provide a general framework for supporting UDFs...

    Provided By VLD Digital

  • White Papers // Apr 2015

    TripleBit: a Fast and Compact System for Large Scale RDF Data

    The volume of RDF data has continued to grow over the past decade and many known RDF datasets have billions of triples. A grand challenge of managing this huge RDF data is how to access it efficiently. A popular approach to addressing the problem is to build a...

    Provided By VLD Digital