Association for Computing Machinery

  • White Papers // Aug 2013

    EventCube: Multi-Dimensional Search and Mining of Structured and Text Data

    A large portion of real world data is either text or structured (e.g., relational) data. Moreover, such data objects are often linked together (e.g., structured specification of products linking with the corresponding product descriptions and customer comments). Even for text data such as news data, typed entities can be extracted...

    Provided By Association for Computing Machinery

  • White Papers // Aug 2013

    AMETHYST: A System for Mining and Exploring Topical Hierarchies of Heterogeneous Data

    In this paper, the authors present AMETHYST, a system for exploring and analyzing a topical hierarchy constructed from a Heterogeneous Information Network (HIN). HINs, composed of multiple types of entities and links, are very common in the real world. Many have a text component, and thus can benefit from a...

    Provided By Association for Computing Machinery

  • White Papers // Aug 2013

    Cost-Sensitive Online Active Learning with Application to Malicious URL Detection

    Malicious Uniform Resource Locator (URL) detection is an important problem in web search and mining, which plays a critical role in internet security. In the literature, many existing studies have attempted to formulate the problem as a regular supervised binary classification task, which typically aims to optimize the prediction accuracy. However,...

    Provided By Association for Computing Machinery

  • White Papers // Aug 2013

    Social Influence Based Clustering of Heterogeneous Information Networks

    Social networks continue to grow in size and the type of information hosted. The authors witness a growing interest in clustering a social network of people based on both their social relationships and their participations in activity based information networks. In this paper, they present a social influence based clustering...

    Provided By Association for Computing Machinery

  • White Papers // Aug 2013

    SAE: Social Analytic Engine for Large Networks

    The rapid proliferation of online social networks provides rich data for the user to understand the complex mechanism that governs the dynamics of social networks. This has attracted much attention from both academic and industrial communities. For example, SNAP is a general-purpose network analysis and graph mining library. It is...

    Provided By Association for Computing Machinery

  • White Papers // Aug 2013

    Connecting Users Across Social Media Sites: A Behavioral-Modeling Approach

    People use various social media for different purposes. The information on an individual site is often incomplete. When sources of complementary information are integrated, a better profile of a user can be built to improve online services such as verifying online information. To integrate these sources of information, it is...

    Provided By Association for Computing Machinery

  • White Papers // Aug 2013

    A Tool for Collecting Provenance Data in Social Media

    In recent years, social media sites have provided a large amount of information. Recipients of such information need mechanisms to know more about the received information, including the provenance. A previous paper has shown that some attributes related to the received information provide additional context, so that a recipient can assess...

    Provided By Association for Computing Machinery

  • White Papers // Aug 2013

    Understanding Twitter Data with TweetXplorer

    The term "Big data" describes data of a magnitude so large that it requires a change in methodology in order to process. In the era of big data it is increasingly difficult for an analyst to extract meaningful knowledge from a sea of information. The authors present TweetXplorer, a system...

    Provided By Association for Computing Machinery

  • White Papers // Aug 2013

    Big Data Analytics with Small Footprint: Squaring the Cloud

    The motive for the BID Data Suite is exploratory data analysis. Exploratory analysis involves sifting through data, making hypotheses about structure and rapidly testing them. This paper describes the BID Data Suite, a collection of hardware, software and design patterns that enable fast, large-scale data mining at very low cost....

    Provided By Association for Computing Machinery

  • White Papers // Aug 2013

    A General Bootstrap Performance Diagnostic

    As datasets become larger, more complex, and more available to diverse groups of analysts, it would be quite useful to be able to automatically and generically assess the quality of estimates, much as users are able to automatically train and evaluate predictive models such as classifiers. However, despite the...

    Provided By Association for Computing Machinery

  • White Papers // Aug 2013

    Subsampling for Efficient and Effective Unsupervised Outlier Detection Ensembles

    Outlier detection and ensemble learning are well established research directions in data mining yet the application of ensemble techniques to outlier detection has been rarely studied. Here, the authors propose and study sub-sampling as a technique to induce diversity among individual outlier detectors. They show analytically and experimentally that an...

    Provided By Association for Computing Machinery

  • White Papers // Aug 2013

    SiGMa: Simple Greedy Matching for Aligning Large Knowledge Bases

    The Internet has enabled the creation of a growing number of large-scale knowledge bases in a variety of domains containing complementary information. Tools for automatically aligning these knowledge bases would make it possible to unify many sources of structured knowledge and answer complex queries. However, the efficient alignment of large-scale...

    Provided By Association for Computing Machinery

  • White Papers // Aug 2013

    Kernel Density Metric Learning

    In this paper, the authors introduce a supervised metric learning algorithm, called Kernel Density Metric Learning (KDML), which is easy to use and provides nonlinear, probability-based distance measures. KDML constructs a direct nonlinear mapping from the original input space into a feature space based on kernel density estimation. The nonlinear...

    Provided By Association for Computing Machinery

  • White Papers // Aug 2013

    CAPRI: A Tool for Mining Complex Line Patterns in Large Log Data

    Log files provide important information for troubleshooting complex systems. However, the structure and contents of the log data and messages vary widely. For automated processing, it is necessary to first understand the layout and the structure of the data, which becomes very challenging when a massive amount of data and...

    Provided By Association for Computing Machinery

  • White Papers // Aug 2013

    NLSR: Named-Data Link State Routing Protocol

    In this paper, the authors present the design of the Named-data Link State Routing protocol (NLSR), a routing protocol for Named Data Networking (NDN). Since NDN uses names to identify and retrieve data, NLSR propagates reachability to name prefixes instead of IP prefixes. Moreover, NLSR differs from IP-based link-state routing...

    Provided By Association for Computing Machinery

  • White Papers // Aug 2013

    Energy-Efficient Content Retrieval in Mobile Cloud

    Mobile Cloud Computing (MCC) has recently been drawing increased attention in academia as well as industry. Content retrieval is a critical service for many mobile cloud applications and in turn relies on other resources and tools, e.g., internal storage, content searching and sharing, etc. Previous studies have shown that conventional...

    Provided By Association for Computing Machinery

  • White Papers // Aug 2013

    Autonomously Improving Query Evaluations Over Multidimensional Data in Distributed Hash Tables

    The proliferation of observational devices and sensors with networking capabilities has led to growth in both the rates and sources of data that ultimately contribute to extreme-scale data volumes. Datasets generated in such settings are often multidimensional, with each dimension accounting for a feature of interest. The authors posit that...

    Provided By Association for Computing Machinery

  • White Papers // Aug 2013

    A Flexible Elastic Control Plane for Private Clouds

    While public cloud computing platforms have become popular in recent years, private clouds - operated by enterprises for their internal use - have also begun gaining traction. The configuration and continuous tuning of a private cloud to meet user demands is a complex task. While private cloud management frameworks provide...

    Provided By Association for Computing Machinery

  • White Papers // Aug 2013

    Mobile Social Networking Through Friend-to-Friend Opportunistic Content Dissemination

    The authors focus on dissemination of content for delay tolerant applications (e.g., content sharing, advertisement propagation, etc.) where users are geographically clustered into communities. They propose a novel architecture that addresses the issues of lack of trust, delivery latency, loss of user control, and privacy-aware distributed mobile social networking by...

    Provided By Association for Computing Machinery

  • White Papers // Jul 2013

    Low-Power, Low-Storage-Overhead Chipkill Correct Via Multi-Line Error Correction

    Due to their large memory capacities, many modern servers require chipkill-correct, an advanced type of memory error detection and correction, to meet their reliability requirements. However, existing chipkill-correct solutions incur high power or storage overheads or both because they use dedicated error-correction resources per codeword to perform error correction. This...

    Provided By Association for Computing Machinery

  • White Papers // Jul 2013

    Opportunities and pitfalls of multi-core scaling using Hardware Transaction Memory

    Hardware transactional memory, which holds the promise to simplify and scale up multicore synchronization, has recently become available in mainstream processors in the form of Intel's Restricted Transactional Memory (RTM). Will RTM be a panacea for multicore scaling? This paper tries to shed some light on this question by...

    Provided By Association for Computing Machinery

  • White Papers // Jul 2013

    Cache-Conscious Performance Optimization for Similarity Search

    All-pairs similarity search can be implemented in two stages. The first stage is to partition the data and group potentially similar vectors. The second stage is to run a set of tasks where each task compares a partition of vectors with other candidate partitions. Because of data sparsity, accessing feature...

    Provided By Association for Computing Machinery

  • White Papers // Jul 2013

    Personal Cloudlets for Privacy and Resource Efficiency in Mobile In-App Advertising

    Mobile in-app ads are the major funding source for free mobile apps which users download from various app markets and install on their Smartphones. However, a number of researchers have recently pointed out that ad-supported free apps involve hidden costs to the users. Costs are primarily associated with the loss of privacy...

    Provided By Association for Computing Machinery

  • White Papers // Jul 2013

    Automated, Retargetable Back-Annotation for Host Compiled Performance and Power Modeling

    With traditional cycle-accurate or instruction-set simulations of processors often being too slow, host-compiled or source-level software execution approaches have recently become popular. Such high-level simulations can achieve order of magnitude speedups, but approaches that can achieve highly accurate characterization of both power and performance metrics are lacking. In this paper,...

    Provided By Association for Computing Machinery

  • White Papers // Jul 2013

    Locality-Aware Task Management for Unstructured Parallelism: A Quantitative Limit Study

    As the number of cores on a processor die increases, the on-chip cache hierarchies that support these cores are getting larger, deeper, and more complex. As a result, non-uniform memory access effects are now prevalent even on a single chip. To reduce execution time and energy...

    Provided By Association for Computing Machinery

  • White Papers // Jul 2013

    Preliminary Experiences with the Uintah Framework on Intel Xeon Phi and Stampede

    In this paper, the authors describe their preliminary experiences on the Stampede system in the context of the Uintah computational framework. Uintah was developed to provide an environment for solving a broad class of fluid-structure interaction problems on structured adaptive grids. Uintah uses a combination of fluid-flow solvers and particle-based...

    Provided By Association for Computing Machinery

  • White Papers // Jul 2013

    Griffin: Grouping Suspicious Memory-Access Patterns to Improve Understanding of Concurrency Bugs

    This paper presents Griffin, a new fault-comprehension technique. Griffin provides a way to explain concurrency bugs using additional information over existing fault-localization techniques, and thus, bridges the gap between fault-localization and fault-fixing techniques. Griffin inputs a list of memory-access patterns and a coverage matrix, groups those patterns responsible for the...

    Provided By Association for Computing Machinery

  • White Papers // Jul 2013

    Tri-Level-Cell Phase Change Memory: Toward an Efficient and Reliable Memory System

    There are several emerging memory technologies looming on the horizon to compensate for the physical scaling challenges of DRAM. Phase Change Memory (PCM) is one such candidate proposed for being part of the main memory in computing systems. One salient feature of PCM is its Multi-Level-Cell (MLC) property, which can be...

    Provided By Association for Computing Machinery

  • White Papers // Jul 2013

    Tutorial: Stream Processing Optimizations

    The world is increasingly connected and instrumented, with a large number and variety of data sources available from various software and hardware sensors. These data sources often take the form of continuous data streams. Examples can be found in several domains, such as live stock...

    Provided By Association for Computing Machinery

  • White Papers // Jul 2013

    StreamHub: A Massively Parallel Architecture for High-Performance Content-Based Publish/Subscribe

    By routing messages based on their content, publish/subscribe (pub/sub) systems remove the need to establish and maintain fixed communication channels. Pub/sub is a natural candidate for designing large-scale systems, composed of applications running in different domains and communicating via middleware solutions deployed on a public cloud. Such pub/sub systems must...

    Provided By Association for Computing Machinery

  • White Papers // Jul 2013

    HSG-LM: Hybrid-Copy Speculative Guest OS Live Migration without Hypervisor

    Current Virtual Machine (VM) live migration mechanisms only focus on providing a high availability service by offering minimal downtime to users. In this paper, the authors present a novel live migration technique called HSG-LM, which also aims to provide short waiting time to whoever is responsible for triggering the VM...

    Provided By Association for Computing Machinery

  • White Papers // Jul 2013

    Linux Block IO: Introducing Multi-Queue SSD Access on Multi-Core Systems

    The IO performance of storage devices has accelerated from hundreds of IOPS five years ago, to hundreds of thousands of IOPS today, and tens of millions of IOPS projected in five years. This sharp evolution is primarily due to the introduction of NAND-flash devices and their data parallel design....

    Provided By Association for Computing Machinery

  • White Papers // Jul 2013

    Precise Memory Leak Detection for Java Software Using Container Profiling

    A memory leak in a Java program occurs when object references that are no longer needed are unnecessarily maintained. Such leaks are difficult to detect because static analysis typically cannot precisely identify these redundant references, and existing dynamic leak detection tools track and report fine-grained information about individual objects, producing...

    Provided By Association for Computing Machinery

  • White Papers // Jul 2013

    Autonomic Provisioning With Self-Adaptive Neural Fuzzy Control for Percentile-Based Delay Guarantee

    Autonomic server provisioning for performance assurance is a critical issue in Internet services. It is challenging to guarantee that requests flowing through a multi-tier system will experience an acceptable distribution of delays. The difficulty is mainly due to highly dynamic workloads, the complexity of underlying computer systems, and the lack...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2013

    Data Debugging with Continuous Testing

    Today, systems rely as heavily on data as on the software that manipulates those data. Errors in these systems are incredibly costly, annually resulting in multi-billion dollar losses, and, on multiple occasions, in death. While software debugging and testing have received heavy research attention, less effort has been devoted to...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2013

    CrowdAtlas: Self-Updating Maps for Cloud and Personal Use

    The inaccuracy of manually created digital road maps is a persistent problem, despite their high economic value. The authors present CrowdAtlas, which automates map updates based on people's travels, either individually or crowdsourced. Its mobile navigation app detects significant portions of GPS traces that do not conform to the existing...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2013

    Auditeur: A Mobile-Cloud Service Platform for Acoustic Event Detection on Smartphones

    Auditeur is a general-purpose, energy-efficient, and context-aware acoustic event detection platform for Smartphones. It enables app developers to have their app register for and get notified on a wide variety of acoustic events. Auditeur is backed by a cloud service to store user contributed sound clips and to generate an...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2013

    RetroSkeleton: Retrofitting Android Apps

    An obvious asset of the Android platform is the tremendous number and variety of available apps. There is a less obvious, but potentially even more important, benefit to the fact that nearly all apps are developed using a common platform. The authors can leverage the relatively uniform nature of Android...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2013

    AdRob: Examining the Landscape and Impact of Android Application Plagiarism

    Malicious activities involving Android applications are rising rapidly. As prior work on cyber-crimes suggests, the authors need to understand the economic incentives of the criminals to design the most effective defenses. In this paper, they investigate application plagiarism on Android markets at a large scale. They take the first step...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2013

    Just-in-Time Provisioning for Cyber Foraging

    Cloud offload is an important technique in mobile computing. VM-based cloudlets have been proposed as offload sites for the resource intensive and latency-sensitive computations typically associated with mobile multimedia applications. Since cloud offload relies on precisely-configured back-end software, it is difficult to support at global scale across cloudlets in multiple...

    Provided By Association for Computing Machinery

  • White Papers // Aug 2012

    A Multi-Level DHT Routing Framework With Aggregation

    Information-Centric Networking (ICN), which has recently attracted research attention, decouples content from hosts at the network layer and retrieves a content object by its name (identifier) instead of its storage location (host IP address) in order to address IP networks' limitations in supporting content distribution. However, ICN systems face scalability...

    Provided By Association for Computing Machinery

  • White Papers // Aug 2012

    CATT: Potential Based Routing With Content Caching for ICN

    Information Centric Networking (ICN) has shown possibilities to solve several problems of the Internet. At the same time, some problems need to be tackled in order to advance this promising architecture. In this paper, the authors address two of the problems, namely routing and content caching. For the routing, they...

    Provided By Association for Computing Machinery

  • White Papers // Aug 2012

    On the Effects of Caching in Access Aggregation Networks

    All forecasts of Internet traffic point at a substantial growth over the next few years. From a network operator perspective, efficient in-network caching of data is and will be a key component in trying to cope with and profit from this increasing demand. One problem, however, is to evaluate...

    Provided By Association for Computing Machinery

  • White Papers // Aug 2012

    Caesar: A Content Router for High Speed Forwarding

    Today, high-end routers forward hundreds of millions of packets per second by means of longest prefix match on forwarding tables with less than a million IP prefixes. Information-centric networking, a novel form of networking where content is requested by its name, poses a new challenge in the design of high-end...

    Provided By Association for Computing Machinery

  • White Papers // Aug 2012

    An Information-Centric Architecture for Data Center Networks

    The authors propose a new Data Center Network (DCN) architecture, based on the principles of Information-Centric Networking (ICN). Their Info-Centric Data Center Network (IC-DCN) addresses many of the pain-points in current DCNs, such as network scalability, host mobility, etc. At the same time, IC-DCN introduces a number of new features...

    Provided By Association for Computing Machinery

  • White Papers // Aug 2012

    Access Control Enforcement Delegation for Information-Centric Networking Architectures

    Information is the building block of Information Centric Networks (ICNs). Access control policies limit information dissemination to authorized entities only. Defining access control policies in an ICN is a non-trivial task as an information item may exist in multiple copies dispersed in various network locations, including caches and content replication...

    Provided By Association for Computing Machinery

  • White Papers // Aug 2012

    ICN-RE: Redundancy Elimination for Information-Centric Networking

    This paper bridges Information-Centric Networking (ICN), a novel form of networking centered around information or content, and Redundancy Elimination (RE), a popular technique widely used to identify content with similar byte-stream. The result is ICN-RE, the first ICN design that supports redundancy elimination. The authors show by means of numerical...

    Provided By Association for Computing Machinery

  • White Papers // Aug 2012

    On Cloud-Centric Network Architecture for Multi-Dimensional Mobility

    Despite pervasive deployment of wireless networks, maintaining seamless mobile connectivity within a set of local devices and to the remote cloud is still challenging. The crux of this challenge stems from the simultaneous interplay of multiple dimensions of a user's mobility - users frequently move between multiple access networks, mobile...

    Provided By Association for Computing Machinery

  • White Papers // Aug 2012

    SCAMPI: Service Platform for Social Aware Mobile and Pervasive Computing

    Allowing mobile users to find and access resources available in the surrounding environment opportunistically via their smart devices could enable them to create and use a rich set of services. Such services can go well beyond what is possible for a mobile phone acting alone. In essence, access to diverse...

    Provided By Association for Computing Machinery

  • White Papers // Aug 2012

    Scalability of a Mobile Cloud Management System

    Ubiquitous network access allows people to access an ever increasing range of services from a variety of mobile terminals, including laptops, tablets and Smartphones. A flexible and economically efficient way of provisioning such services is through Cloud Computing. Assuming that several cloud-enabled datacenters are made available at the edges of...

    Provided By Association for Computing Machinery

  • White Papers // Aug 2012

    Computing in Cirrus Clouds: The Challenge of Intermittent Connectivity

    Mobile devices are increasingly being relied on for tasks that go beyond simple connectivity and demand more complex processing. The primary approach in wide use today uses cloud computing resources to off-load the "heavy lifting" to specially designated servers when they are well connected. In reality, a mobile device often...

    Provided By Association for Computing Machinery

  • White Papers // Aug 2012

    SNARF: A Social Networking-Inspired Accelerator Remoting Framework

    The diminishing size and battery requirements of mobile devices restrict the scope of computations possible on such devices and motivate approaches that support the selective offloading of computations to remote resources. With a variety of resources available to potentially host offloaded computations - such as cloud-provisioned resources, and devices within...

    Provided By Association for Computing Machinery

  • White Papers // Aug 2012

    Characterization of the Impact of Resource Availability on Opportunistic Computing

    With opportunistic computing, devices are no longer restricted to using their own services and resources, but can access services and resources made available by other devices. The performance of opportunistic computing is greatly affected by the resource topology in the network: what resources/services are available, as well as when and...

    Provided By Association for Computing Machinery

  • White Papers // Aug 2012

    A Cloud-Assisted Design for Autonomous Driving

    This paper presents Carcel, a cloud-assisted system for autonomous driving. Carcel enables the cloud to have access to sensor data from autonomous vehicles as well as the roadside infrastructure. The cloud assists autonomous vehicles that use this system to avoid obstacles such as pedestrians and other vehicles that may not...

    Provided By Association for Computing Machinery

  • White Papers // Aug 2012

    Fog Computing and Its Role in the Internet of Things

    Fog Computing extends the Cloud Computing paradigm to the edge of the network, thus enabling a new breed of applications and services. Defining characteristics of the Fog are: low latency and location awareness; wide-spread geographical distribution; mobility; very large number of nodes, predominant role of wireless access, strong presence of...

    Provided By Association for Computing Machinery

  • White Papers // Aug 2012

    The Case for Cloud-Enabled Mobile Sensing Services

    The authors make the case for cloud-enabled mobile sensing services that support an emerging application class, one which infers near-real time collective context using sensor data obtained continuously from a large set of consumer mobile devices. They present the high-level architecture and functional requirements for such a mobile sensing service,...

    Provided By Association for Computing Machinery

  • White Papers // Aug 2012

    An Integrated Cloud-Based Framework for Mobile Phone Sensing

    Nowadays, mobile phones are not only communication devices, but also a source of rich sensory data that can be collected and exploited by distributed people-centric sensing applications. Among them, environmental monitoring and emergency response systems can particularly benefit from people-based sensing. Due to the limited resources of mobile devices, sensed...

    Provided By Association for Computing Machinery

  • White Papers // Aug 2012

    Energy-Aware Keyword Search on Mobile Phones

    With the explosive growth of communication technologies, modern mobile phones have become more powerful than ever. Unfortunately, the battery lifetime of mobile phones is still limited, and energy awareness is a priority in designing applications for mobile phones. To demonstrate energy awareness on mobile phones, the authors propose a new...

    Provided By Association for Computing Machinery

  • White Papers // Sep 2011

    Integrating and Querying Web Databases and Documents

    There exist many interrelated information sources on the Internet that can be categorized into structured (database) and semi-structured (documents). A key challenge is to integrate, query and analyze such heterogeneous collections of information. In this paper, the authors defend the idea of building web metadata repositories using relational databases as...

    Provided By Association for Computing Machinery

  • White Papers // Sep 2011

    ONTOCUBE: Efficient Ontology Extraction Using OLAP Cubes

    Ontologies are knowledge conceptualizations of a particular domain and are commonly represented with hierarchies. While final ontologies appear deceivingly simple on paper, building ontologies represents a time-consuming task that is normally performed by natural language processing techniques or schema matching. On the other hand, OLAP cubes are most commonly used...

    Provided By Association for Computing Machinery

  • White Papers // Jan 2012

    Interactive Exploration and Visualization of OLAP Cubes

    An OLAP cube is typically explored with multiple aggregations selecting different subsets of cube dimensions to analyze trends or to discover unexpected results. Unfortunately, such an analytic process is generally manual and fails to statistically explain results. In this paper, the authors propose to combine dimension lattice traversal and parametric statistical...

    Provided By Association for Computing Machinery

  • White Papers // Apr 2011

    Keyword Search Across Databases and Documents

    Given the continuous growth of databases and the abundance of diverse files in modern IT environments, there is a pressing need to integrate keyword search on heterogeneous information sources. A particular case in which such integration is needed occurs when a collection of documents (e.g., word processing documents, spreadsheets, text...

    Provided By Association for Computing Machinery

  • White Papers // Apr 2011

    Query Recommendation in Digital Libraries Using OLAP

    Query suggestion is well-known to enhance the user's search for relevant documents. In this paper, the authors propose a novel technique that emulates a human skill when searching or exploring digital collections. In general, a user begins searching by providing a naive query and then analyzes the retrieved documents in...

    Provided By Association for Computing Machinery

  • White Papers // Apr 2011

    OLAP-Based Query Recommendation

    Query recommendation is an invaluable tool for enabling users to speed up their searches. In this paper, the authors present algorithms for generating query suggestions, assuming no previous knowledge of the collection. They developed an online OLAP algorithm to generate query suggestions for the users based on the frequency of...

    Provided By Association for Computing Machinery

  • White Papers // Apr 2011

    Repairing OLAP Queries in Databases With Referential Integrity Errors

    Many database applications and OLAP tools dynamically generate SQL queries involving join operators and aggregate functions and send these queries to a database server for execution. This dynamically generated SQL code normally assumes the underlying tables and columns are clean and lacks the necessary robustness to deal with foreign keys...

    Provided By Association for Computing Machinery

  • White Papers // Oct 2010

    Relational Versus Non-Relational Database Systems for Data Warehousing

    Relational database systems have been the dominating technology to manage and analyze large data warehouses. Moreover, the ER model, the standard in database design, has a close relationship with the relational model. Recently, there has been a surge of alternative technologies for large scale analytic processing, most of which are...

    Provided By Association for Computing Machinery

  • White Papers // Apr 2011

    Comparing SQL and MapReduce to Compute Naive Bayes in a Single Table Scan

    Most data mining processing is currently performed on flat files outside the DBMS. The authors propose novel techniques to process such data mining computations inside the DBMS. They focus on the popular Naive Bayes classification algorithm. In contrast to most approaches, their techniques work completely inside the DBMS, exploiting the...

    Provided By Association for Computing Machinery

  • White Papers // Oct 2010

    Efficient Algorithms Based on Relational Queries to Mine Frequent Graphs

    Frequent sub-graph mining is an important problem in data mining with wide application in science. For instance, graphs can be used to represent structural relationships in problems related to network topology, chemical compounds, protein structures, and so on. Searching for patterns from graph databases is difficult since graph-related operations generally...

    Provided By Association for Computing Machinery

  • White Papers // Jul 2010

    Fast PCA and Bayesian Variable Selection for Large Data Sets Based on SQL and UDFs

    Large amounts of data are stored in relational DBMSs. However, statistical analysis is frequently performed outside the DBMS using statistical tools, such as the well-known R package, leading to slow processing when data sets cannot fit in main memory and going through a file export bottleneck. In this paper, the...

    Provided By Association for Computing Machinery

  • White Papers // Apr 2011

    Fast and Dynamic OLAP Exploration Using UDFs

    OLAP is a set of database exploratory techniques to efficiently retrieve multiple sets of aggregations from a large dataset. Generally, these techniques have either involved the use of an external OLAP server or required the dataset to be exported to a specialized OLAP tool for more efficient processing. In this...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2009

    Efficient Computation of PCA With SVD in SQL

    PCA is one of the most common dimensionality reduction techniques with broad applications in data mining, statistics and signal processing. In this paper, the authors study how to leverage a DBMS's computing capabilities to solve PCA. They propose a solution that combines a summarization of the data set with the...

    Provided By Association for Computing Machinery

  • White Papers // Apr 2011

    Consistency-Aware Evaluation of OLAP Queries in Replicated Data Warehouses

    OLAP tools for distributed data warehouses generally assume underlying replicated tables are up to date. Unfortunately, maintaining updated replicas is difficult due to the inherent tradeoff between consistency and availability. In this paper, the authors propose techniques to evaluate OLAP queries in distributed data warehouses assuming a lazy replication model....

    Provided By Association for Computing Machinery

  • White Papers // Apr 2011

    Exploration and Visualization of OLAP Cubes With Statistical Tests

    In On-Line Analytical Processing (OLAP), users explore a database cube with roll-up and drill-down operations in order to find interesting results. Most approaches rely on simple aggregations and value comparisons in order to validate findings. In this paper, the authors propose to combine OLAP dimension lattice traversal and statistical tests...

    Provided By Association for Computing Machinery

  • White Papers // Apr 2011

    DBDOC: Querying and Browsing Databases and Interrelated Documents

    Large collections of documents are commonly created around a database, where a typical database schema may contain hundreds of tables and thousands of columns. The authors developed a system based on SQL code generation and user-defined functions that analyzes document-to-metadata links by extracting a basic set of relationships at different...

    Provided By Association for Computing Machinery

  • White Papers // Apr 2011

    Information Retrieval From Digital Libraries in SQL

    Information retrieval techniques have been traditionally exploited outside of relational database systems, due to storage overhead, the complexity of programming them inside the database system, and their slow performance in SQL implementations. This paper supports the idea that searching and querying digital libraries with information retrieval models in relational database...

    Provided By Association for Computing Machinery

  • White Papers // Apr 2011

    Estimating and Bounding Aggregations in Databases With Referential Integrity Errors

    Database integration builds on tables coming from multiple databases by creating a single view of all these data. Each database has different tables, columns with similar content across databases and different referential integrity constraints. Thus, a query in an integrated database is likely to involve tables and columns with referential...

    Provided By Association for Computing Machinery

  • White Papers // Jan 2012

    Building Statistical Models and Scoring With UDFs

    Multidimensional statistical models are generally computed outside a relational DBMS, exporting data sets. This paper explains how fundamental multidimensional statistical models are computed inside the DBMS in a single table scan exploiting SQL and User-Defined Functions (UDFs). The techniques described herein are used in a commercial data mining tool, called...

    Provided By Association for Computing Machinery

  • White Papers // Jan 2012

    Metadata Management for Federated Databases

    A federated database consists of several loosely integrated databases, where each database may contain hundreds of tables and thousands of columns, interrelated by complex foreign key relationships. In general, there exist many semi-structured data elements outside the database represented by documents (files), created and updated by multiple users...

    Provided By Association for Computing Machinery

  • White Papers // Nov 2011

    Q-Score: Proactive Service Quality Assessment in a Large IPTV System

    In large-scale IPTV systems, it is essential to maintain high service quality while providing a wider variety of service features than typical traditional TV. Thus service quality assessment systems are of paramount importance as they monitor the user-perceived service quality and alert when issues occur. For IPTV systems, however, there...

    Provided By Association for Computing Machinery

  • White Papers // Nov 2011

    Measurement and Analysis of a Large Scale Commercial Mobile Internet TV System

    Large scale, Internet based mobile TV deployment presents both tremendous opportunities and challenges for mobile operators and technology providers. This paper presents a measurement based study on a large scale mobile TV service offering in China. Within the one month measurement period, the authors' dataset captured over 1 million unique...

    Provided By Association for Computing Machinery