Association for Computing Machinery

  • White Papers // Dec 2013

    QoS-Aware Scheduling in Heterogeneous Datacenters with Paragon

    Large-scale Data Centers (DCs) host tens of thousands of diverse applications each day. However, interference between colocated workloads and the difficulty of matching applications to one of the many hardware platforms available can degrade performance, violating the Quality of Service (QoS) guarantees that many cloud workloads require. While previous work...

    Provided By Association for Computing Machinery

  • White Papers // Dec 2013

    XLynx - An FPGA-Based XML Filter for Hybrid XQuery Processing

    While offering unique performance and energy saving advantages, the use of Field-Programmable Gate Arrays (FPGAs) for database acceleration has demanded major concessions from system designers. Either the programmable chips have been used for very basic application tasks (such as implementing a rigid class of selection predicates), or their circuit definition...

    Provided By Association for Computing Machinery

  • White Papers // Dec 2013

    Selecting Representative Benchmark Inputs for Exploring Microprocessor Design Spaces

    The design process of a microprocessor requires representative workloads to steer the search process toward an optimum design point for the target application domain. However, considering a broad set of workloads to cover the large space of potential workloads is infeasible given how time-consuming design space exploration typically is. Hence,...

    Provided By Association for Computing Machinery

  • White Papers // Dec 2013

    Accelerating an Application Domain with Specialized Functional Units

    Hardware specialization has received renewed interest recently as chips are hitting power limits. Chip designers of traditional processor architectures have primarily focused on general-purpose computing, partially due to time-to-market pressure and simpler design processes. But new power limits require some chip specialization. Although hardware configured for a specific application yields...

    Provided By Association for Computing Machinery

  • White Papers // Dec 2013

    Beyond Reuse Distance Analysis: Dynamic Analysis for Characterization of Data Locality Potential

    Emerging computer architectures will feature drastically decreased flops/byte (ratio of peak processing rate to memory bandwidth) as highlighted by recent studies on Exascale architectural trends. Further, flops are getting cheaper, while the energy cost of data movement is increasingly dominant. The understanding and characterization of data locality properties of computations...

    Provided By Association for Computing Machinery

  • White Papers // Dec 2013

    Flexible Filters in Stream Programs

    The stream-processing model is a natural fit for multicore systems because it exposes the inherent locality and concurrency of a program and highlights its separable tasks for efficient parallel implementations. The authors present flexible filters, a load-balancing optimization technique for stream programs. Flexible filters utilize the programmability of the cores...

    Provided By Association for Computing Machinery

  • White Papers // Nov 2013

    A Flexible Framework for Detecting IPv6 Vulnerabilities

    Security has recently become a very important concern for entities using IPv6 networks. This is especially true with the recent news reports where governments and companies have admitted to credible cyber attacks against them in which confidential information and the security of data have been compromised. In this paper, the...

    Provided By Association for Computing Machinery

  • White Papers // Nov 2013

    Go with the Flow: Toward Workflow-Oriented Security Assessment

    In this paper, the authors advocate the use of workflow (a description of how a system provides its intended functionality) as a pillar of cybersecurity analysis and propose a holistic workflow-oriented assessment framework. While workflow models are currently used in the area of performance and reliability assessment, these approaches are designed neither to assess...

    Provided By Association for Computing Machinery

  • White Papers // Nov 2013

    Supporting End-to-End Social Media Data Analysis with the IndexedHBase Platform

    As data intensive applications evolve, many research projects involving big data require efficient extraction and analysis of specific data subsets, rather than the whole dataset. Social media data analysis is one such example. While social media platforms such as Twitter provide tremendous data about all kinds of social activities, most...

    Provided By Association for Computing Machinery

  • White Papers // Nov 2013

    Towards Minimal-Delay Deadline-Driven Data Center TCP

    Cloud datacenter applications such as web search, retail, advertising, and recommendation systems generate a diverse mix of short and long flows that carry widely varying deadlines due to their soft real-time nature. In this paper, the authors present MCP, a novel distributed and reactive transport protocol for Data Center...

    Provided By Association for Computing Machinery

  • White Papers // Nov 2013

    Exploiting Application Dynamism and Cloud Elasticity for Continuous Dataflows

    Contemporary continuous data flow systems use elastic scaling on distributed cloud resources to handle variable data rates and to meet applications' needs while attempting to maximize resource utilization. However, virtualized clouds present an added challenge due to the variability in resource performance - over time and space - thereby impacting...

    Provided By Association for Computing Machinery

  • White Papers // Nov 2013

    Scalable Parallel OPTICS Data Clustering Using Graph Algorithmic Techniques

    Clustering is a data mining technique that groups data into meaningful subclasses, known as clusters, such that it minimizes the intra-differences and maximizes inter-differences of these subclasses. For the purpose of knowledge discovery, it identifies dense and sparse regions and, therefore, discovers overall distribution patterns and correlations in the data....

    Provided By Association for Computing Machinery

  • White Papers // Nov 2013

    Using Automated Performance Modeling to Find Scalability Bugs in Complex Codes

    Many parallel applications suffer from latent performance limitations that may prevent them from scaling to larger machine sizes. Often, such scalability bugs manifest themselves only when an attempt to scale the code is actually being made - a point where remediation can be difficult. However, creating analytical performance models that...

    Provided By Association for Computing Machinery

  • White Papers // Nov 2013

    CooMR: Cross-Task Coordination for Efficient Data Management in MapReduce Programs

    Hadoop is a widely adopted open source implementation of the MapReduce programming model for big data processing. It represents system resources as available map and reduce slots and assigns them to various tasks. This execution model gives little regard to the need for cross-task coordination on the use of shared system...

    Provided By Association for Computing Machinery

  • White Papers // Nov 2013

    Feng Shui of Supercomputer Memory: Positional Effects in DRAM and SRAM Faults

    Several recent publications confirm that faults are common in high-performance computing systems. Therefore, further attention to the faults experienced by such computing systems is warranted. In this paper, the authors present a study of DRAM and SRAM faults in large high-performance computing systems. Their goal is to understand the factors...

    Provided By Association for Computing Machinery

  • White Papers // Nov 2013

    Using Simulation to Explore Distributed Key-Value Stores for Extreme-Scale System Services

    Owing to the significantly high rate of component failures at extreme scales, system services will need to be failure-resistant, adaptive and self-healing. A majority of HPC services are still designed around a centralized paradigm and hence are susceptible to scaling issues. Peer-to-Peer (P2P) services have proved themselves at scale for...

    Provided By Association for Computing Machinery

  • White Papers // Nov 2013

    Load-Balanced Pipeline Parallelism

    Accelerating a single thread in current parallel systems remains a challenging problem, because sequential threads do not naturally take advantage of the additional cores. A recent paper shows that automatic extraction of pipeline parallelism is an effective way to speed up single thread execution. However, two problems remain challenging - load...

    Provided By Association for Computing Machinery

  • White Papers // Nov 2013

    Tera-Scale 1D FFT With Low-Communication Algorithm and Intel® Xeon Phi Coprocessors

    In this paper, the authors demonstrate the first tera-scale performance of Intel Xeon Phi coprocessors on 1D FFT computations. Applying a disciplined performance programming methodology of sound algorithm choice, valid performance model, and well-executed optimizations, they break the tera-flop mark on a mere 64 nodes of Xeon Phi and reach...

    Provided By Association for Computing Machinery

  • White Papers // Nov 2013

    Performance Evaluation of Intel Transactional Synchronization Extensions for High-Performance Computing

    Due to limits in technology scaling, software developers have come to rely on thread-level parallelism to obtain sustainable performance improvement. However, except for the case where the computation is massively parallel (e.g., data-parallel applications), performance of threaded applications is often limited by how inter-thread synchronization is performed. For example, using...

    Provided By Association for Computing Machinery

  • White Papers // Nov 2013

    Location-Aware Cache Management for Many-Core Processors with Deep Cache Hierarchy

    As cache hierarchies become deeper and the number of cores on a chip increases, managing caches becomes more important for performance and energy. However, current hardware cache management policies do not always adapt optimally to the application's behavior: e.g., caches may be polluted by data structures whose locality cannot be...

    Provided By Association for Computing Machinery

  • White Papers // Nov 2013

    Exploring DRAM Organizations for Energy-Efficient and Resilient Exascale Memories

    The power target for exa-scale supercomputing is 20MW, with about 30% budgeted for the memory subsystem. Commodity DRAMs will not satisfy this requirement. Additionally, the large number of memory chips (>10M) required will result in crippling failure rates. Although specialized DRAM memories have been reorganized to reduce power through 3D-stacking...

    Provided By Association for Computing Machinery

  • White Papers // Nov 2013

    Investigating Applications Portability with the Uintah DAG-Based Runtime System on PetaScale Supercomputers

    Present trends in high performance computing present formidable challenges for applications code using multicore nodes possibly with accelerators and/or co-processors and reduced memory while still attaining scalability. Software frameworks that execute machine independent applications code using a runtime system that shields users from architectural complexities offer a possible solution. The...

    Provided By Association for Computing Machinery

  • White Papers // Nov 2013

    Efficient Data Partitioning Model for Heterogeneous Graphs in the Cloud

    As the size and variety of information networks continue to grow in many scientific and engineering domains, the authors witness a growing demand for efficient processing of large heterogeneous graphs using a cluster of compute nodes in the cloud. One open issue is how to effectively partition a large graph...

    Provided By Association for Computing Machinery

  • White Papers // Nov 2013

    Efficient and Customizable Data Partitioning Framework for Distributed Big RDF Data Processing in the Cloud

    Big data businesses can leverage and benefit from the clouds, the most optimized, shared, automated, and virtualized computing infrastructures. One of the important challenges in processing big data in the clouds is how to effectively partition the big data to ensure efficient distributed processing of the data. In this paper...

    Provided By Association for Computing Machinery

  • White Papers // Nov 2013

    Coordinated Energy Management in Heterogeneous Processors

    In this paper the authors examine energy management in a heterogeneous processor consisting of an integrated CPU-GPU for High-Performance Computing (HPC) applications. Energy management for HPC applications is challenged by their uncompromising performance requirements and complicated by the need for coordinating energy management across distinct core types - a new...

    Provided By Association for Computing Machinery

  • White Papers // Nov 2013

    Practical Nonvolatile Multilevel-Cell Phase Change Memory

    Multi-Level Cell (MLC) Phase Change Memory (PCM) may provide both high capacity main memory and faster-than-Flash persistent storage. But slow growth in cell resistance with time, resistance drift, can cause transient errors in MLC-PCM. Drift errors increase with time, and prior work suggests refresh before the cell loses data. The...

    Provided By Association for Computing Machinery

  • White Papers // Nov 2013

    Analysis of Computing and Energy Performance of Multicore, NUMA, and Manycore Platforms for an Irregular Application

    The exponential growth in processor performance seems to have reached a turning point. Nowadays, energy efficiency is as important as performance and has become a critical aspect of the development of scalable systems. These strict energy constraints paved the way for the development of multi and manycore processors. Research on...

    Provided By Association for Computing Machinery

  • White Papers // Nov 2013

    Asynchronous Object Storage with QoS for Scientific and Commercial Big Data

    In this paper, the authors present their design for an asynchronous object storage system intended for use in scientific and commercial big data workloads. Use cases from the target workload domains are used to motivate the key abstractions used in the Application Programming Interface (API). The architecture of the Scalable...

    Provided By Association for Computing Machinery

  • White Papers // Nov 2013

    Cost-Aware Cloud Bursting for Enterprise Applications

    The high cost of provisioning resources to meet peak application demands has led to the widespread adoption of pay-as-you-go cloud computing services to handle workload fluctuations. Some enterprises with existing IT infrastructure employ a hybrid cloud model where the enterprise uses its own private resources for the majority of its...

    Provided By Association for Computing Machinery

  • White Papers // Nov 2013

    The Energy Case for Graph Processing on Hybrid CPU and GPU Systems

    This paper investigates the power, energy, and performance characteristics of large-scale graph processing on hybrid (i.e., CPU and GPU) single-node systems. Graph processing can be accelerated on hybrid systems by properly mapping the graph-layout to processing units, such that the algorithmic tasks exercise each of the units where they perform...

    Provided By Association for Computing Machinery

  • White Papers // Nov 2013

    Modeling and Implementation of Energy Neutral Sensing Systems

    Energy Neutral Sensing Systems (ENSSys) achieve long-time operation by combining energy-harvesting hardware with software that regulates energy saving and spending. However, simply managing the energy resources is not the goal in itself. The authors present the modeling, implementation, and evaluation of a single wireless sensor network that executes energy-harvesting algorithms...

    Provided By Association for Computing Machinery

  • White Papers // Nov 2013

    Supporting Complex Queries and Access Policies for Multi-user Encrypted Databases

    Cloud computing is an emerging paradigm offering companies (virtually) unlimited data storage and computation at attractive costs. It is a cost-effective model because it does not require deployment and maintenance of any dedicated IT infrastructure. Despite its benefits, it introduces new challenges for protecting the confidentiality of the data. Sensitive...

    Provided By Association for Computing Machinery

  • White Papers // Nov 2013

    PHANTOM: Practical Oblivious Computation in a Secure Processor

    Confidentiality of data is a major concern for enterprises and individuals who wish to offload computation to the cloud. In particular, cloud operators have physical access to machines and can observe sensitive information (data and code) as it moves between a CPU and physical memory. In response to such attacks,...

    Provided By Association for Computing Machinery

  • White Papers // Nov 2013

    AndroTotal: A Flexible, Scalable Toolbox and Service for Testing Mobile Malware Detectors

    Although there are controversial opinions regarding how large the mobile malware phenomenon is in terms of absolute numbers, hype aside, the number of new Android malware variants is increasing. This trend is mainly due to the fact that, as happened with traditional malware, the malware authors are striving to repackage,...

    Provided By Association for Computing Machinery

  • White Papers // Nov 2013

    Cross-Platform Malware: Write Once, Infect Everywhere

    In this ongoing paper, the authors perform the first systematic investigation of cross-platform (X-platform) malware. As a first step, this paper presents an exploration into existing X-platform malware families and X-platform vulnerabilities used to distribute them. Their exploration shows that X-platform malware uses a wealth of methods to achieve...

    Provided By Association for Computing Machinery

  • White Papers // Nov 2013

    Delta: Automatic Identification of Unknown Web-based Infection Campaigns

    The rapid growth and widespread access to the Internet, and the ubiquity of web-based services make it easy to communicate and interact globally. However, the software used to implement the functionality of web sites is often vulnerable to different attack vectors, such as cross-site scripting or SQL injections, and access...

    Provided By Association for Computing Machinery

  • White Papers // Nov 2013

    An Empirical Study of Cryptographic Misuse in Android Applications

    Developers use cryptographic APIs in Android with the intent of securing data such as passwords and personal information on mobile devices. In this paper, the authors ask whether developers use the cryptographic APIs in a fashion that provides typical cryptographic notions of security, e.g., IND-CPA security. They develop program analysis...

    Provided By Association for Computing Machinery

  • White Papers // Nov 2013

    AppIntent: Analyzing Sensitive Data Transmission in Android for Privacy Leakage Detection

    Android phones often carry personal information, attracting malicious developers to embed code in Android applications to steal sensitive data. With known techniques in the literature, one may easily determine if sensitive data is being transmitted out of an Android phone. However, transmission of sensitive data in itself does not necessarily...

    Provided By Association for Computing Machinery

  • White Papers // Nov 2013

    POSTER - TRIPLEX: Verifying Data Minimisation in Communication Systems

    The main idea behind the TRIPLEX framework is to analyze relevant privacy aspects of privacy-enhancing protocols in a specified scenario that may involve several actors and protocol instances (of different protocols). Systems dealing with personal information are legally required to satisfy the principle of data minimization. Privacy-enhancing protocols use cryptographic...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2010

    Network on Chip Design and Optimization Using Specialized Influence Models

    In this paper, the authors propose the use of specialized influence models to capture the dynamic behavior of a Network-on-Chip (NoC). Their goal is to construct a versatile modeling framework that will help in the development and analysis of distributed and adaptive features for NoCs. As an application testbench, they...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2007

    CheckFence: Checking Consistency of Concurrent Data Types on Relaxed Memory Models

    Concurrency libraries can facilitate the development of multithreaded programs by providing concurrent implementations of familiar data types such as queues or sets. There exist many optimized algorithms that can achieve superior performance on multiprocessors by allowing concurrent data accesses without using locks. Unfortunately, such algorithms can harbor subtle concurrency bugs....

    Provided By Association for Computing Machinery

  • White Papers // Apr 2010

    A Power-Aware Mapping Approach to Map IP Cores onto NoCs under Bandwidth and Latency Constraints

    In this paper, the authors investigate the Intellectual Property (IP) mapping problem that maps a given set of IP cores onto the tiles of a mesh-based Network-on-Chip (NoC) architecture such that the power consumption due to inter core communications is minimized. This IP mapping problem is considered under both bandwidth...

    Provided By Association for Computing Machinery

  • White Papers // Nov 2013

    A System-Level Infrastructure for Multidimensional MP-SoC Design Space Co-Exploration

    In this paper, the authors present a flexible and extensible system-level MP-SoC Design Space Exploration (DSE) infrastructure, called NASA. This highly modular framework uses well-defined interfaces to easily integrate different system-level simulation tools as well as different combinations of search strategies in a simple plug-and-play fashion. Moreover, NASA deploys a...

    Provided By Association for Computing Machinery

  • White Papers // May 2013

    Process-Variation Aware Mapping of Best-Effort and Real-Time Streaming Applications to MPSoCs

    As technology scales, the impact of process variation on the MAXimum supported Frequency (FMAX) of individual cores in a Multi-Processor System-on-Chip (MPSoC) becomes more pronounced. Task allocation without variation-aware performance analysis can greatly compromise performance and lead to a significant loss in yield, defined as the percentage of manufactured chips...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2011

    Parallelizing Large-Scale Data Processing Applications with Data Skew: A Case Study in Product-Offer Matching

    The last decade has seen a surge of interest in large-scale data-parallel processing engines. While these engines share many features in common with parallel databases, they make a set of different trade-offs. In consequence, many of the lessons learned for programming parallel databases have to be re-learned in the...

    Provided By Association for Computing Machinery

  • White Papers // Aug 2012

    CUDIA: Probabilistic Cross-level Imputation using Individual Auxiliary Information

    In healthcare-related studies, individual patient or hospital data are not often publicly available due to privacy restrictions, legal issues or reporting norms. However, such measures may be provided at a higher or more aggregated level, such as state-level, county-level summaries or averages over health zones such as Hospital Referral Regions...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2011

    Performance Measurements and Modeling of a Java-based Session Initiation Protocol (SIP) Application Server

    The Session Initiation Protocol (SIP) is an Internet protocol for establishing sessions between two or more parties. It is becoming ubiquitous in uses such as voice over IP, instant messaging, internet TV, and others. Performance is a chief concern with SIP because Quality of Service (QoS) is important and SIP...

    Provided By Association for Computing Machinery

  • White Papers // May 2013

    Load Balancing in a Changing World: Dealing with Heterogeneity and Performance Variability

    Fully utilizing the power of modern heterogeneous systems requires judiciously dividing work across all of the available computational devices. Existing approaches for partitioning work require offline training and generate fixed partitions that fail to respond to fluctuations in device performance that occur at run time. The authors present a novel...

    Provided By Association for Computing Machinery

  • White Papers // Feb 2014

    Red Fox: An Execution Environment for Relational Query Processing on GPUs

    Modern enterprise applications represent an emergent application arena that requires the processing of queries and computations over massive amounts of data. Large-scale, multi-GPU cluster systems potentially present a vehicle for major improvements in throughput and consequently overall performance. However, throughput improvement using GPUs is challenged by the distinctive memory and...

    Provided By Association for Computing Machinery

  • White Papers // Feb 2010

    Axel: A Heterogeneous Cluster with FPGAs and GPUs

    In this paper, the authors describe a heterogeneous computer cluster called Axel. Axel contains a collection of nodes; each node can include multiple types of accelerators such as FPGAs (Field Programmable Gate Arrays) and GPUs (Graphics Processing Units). A MapReduce framework for the Axel cluster is presented which exploits spatial...

    Provided By Association for Computing Machinery

  • White Papers // Aug 2011

    Power Profiling and Optimization for Heterogeneous Multi-Core Systems

    Processing speed and energy efficiency are two of the most critical issues for computer systems. This paper presents a systematic approach for profiling the power and performance characteristics of applications targeting heterogeneous multi-core computing platforms. The authors' approach enables rapid and automated design space exploration involving optimization of workload distribution...

    Provided By Association for Computing Machinery

  • White Papers // May 2010

    Programming Framework for Clusters with Heterogeneous Accelerators

    The authors describe a programming framework for high performance clusters with various hardware accelerators. In this framework, users can utilize the available heterogeneous resources productively and efficiently. The distributed application is highly modularized to support dynamic system configuration with changing types and number of the accelerators. Multiple layers of communication...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2008

    A Compiler Framework for Optimization of Affine Loop Nests for GPGPUs

    GPUs are a class of specialized parallel architectures with tremendous computational power. The new Compute Unified Device Architecture (CUDA) programming model from NVIDIA facilitates programming of general purpose applications on their GPUs. However, manual development of high-performance parallel code for GPUs is still very challenging. In this paper, a number...

    Provided By Association for Computing Machinery

  • White Papers // Jan 2014

    Roofline-Aware DVFS for GPUs

    Graphics Processing Units (GPUs) are becoming increasingly popular for compute workloads, mainly because of their large number of processing elements and high-bandwidth to off-chip memory. The roofline model captures the ratio between the two (the compute-memory ratio), an important architectural parameter. This work proposes to change the compute-memory ratio dynamically,...

    Provided By Association for Computing Machinery

  • White Papers // Mar 2009

    A Secure Unidirectional Proxy Re-Encryption Using Identity and Secret Key Exchange

    Proxy re-encryption, abbreviated as PRE, is a cryptosystem which allows the proxy to re-encrypt a ciphertext without accessing the underlying message. The re-encryption protocol should be key independent to avoid compromising the private keys of the sender and the recipient. PRE should also be secure from signature re-usability, where unreliable...

    Provided By Association for Computing Machinery

  • White Papers // Sep 2011

    Multi-Task Dynamic Mapping onto NoC-based MPSoCs

    Task mapping defines the best placement of a given task in the MPSoC according to some criterion, such as energy or Manhattan distance minimization. The ITRS roadmap forecasts MPSoCs with hundreds of Processing Elements (PEs) in the near future. Therefore, dynamic mapping heuristics are required. An important gap is observed in...

    Provided By Association for Computing Machinery

  • White Papers // Sep 2012

    Towards Automatic Actor Pinning on Multi-Core Architectures

    The actor model is a high-level programming abstraction that attempts to ease the development of parallel applications, among others, by shielding the developer from the underlying platform. In this model the execution relies on a Runtime Environment (RE) to be able to efficiently use the underlying machine. Modern processors possess...

    Provided By Association for Computing Machinery

  • White Papers // Sep 2013

    Actor Scheduling for Multicore Hierarchical Memory Platforms

    Erlang applications are present in several mission-critical systems. These systems demand substantial computing resources that are usually provided by multiprocessor and multi-core platforms. Hierarchical memory platforms, or Non-Uniform Memory Access (NUMA) architectures, account for an important share of these platforms. Yet, the research on the suitability of the current Virtual...

    Provided By Association for Computing Machinery

  • White Papers // Oct 2013

    Adaptive Virtual Channel Partitioning for Network-on-Chip in Heterogeneous Architectures

    Current heterogeneous Chip Multi-Processors (CMPs) integrate a GPU architecture on a die. However, the heterogeneity of this architecture inevitably exerts different pressures on shared resource management due to differing characteristics of CPU and GPU cores. The authors consider how to efficiently share on-chip resources between cores within the heterogeneous system,...

    Provided By Association for Computing Machinery

  • White Papers // May 2008

    Address Translation for Manycore Systems

    One of the many challenges of designing efficient manycore systems is to determine where and to what degree shared information is cached locally. In this paper, the authors specifically address efficient solutions for distributing virtual-to-physical address translations and keeping them coherent throughout a Chip Multi-Processor (CMP) system with hundreds of...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2013

    DMR3D: Dynamic Memory Relocation in 3D Multicore Systems

    Three-dimensional multicore systems present unique opportunities for proximity-driven data placement in the memory banks. Coupled with distributed memory controllers, a design trend seen in recent systems, the authors propose a Dynamic Memory Relocator for 3D multicores (DMR3D) to dynamically migrate physical pages among different memory controllers. Their proposed technique...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2010

    An Integrated GPU Power and Performance Model

    GPU architectures are increasingly important in the multi-core era due to their high number of parallel processors. Performance optimization for multi-core processors has been a challenge for programmers. Furthermore, optimizing for power consumption is even more difficult. Unfortunately, as a result of the high number of processors, the power consumption...

    Provided By Association for Computing Machinery

  • White Papers // Nov 2009

    Age Based Scheduling for Asymmetric Multiprocessors

    Asymmetric (or heterogeneous) multi-processors are becoming popular in the current era of multi-cores due to their power efficiency and potential performance and energy efficiency. However, scheduling of multithreaded applications in asymmetric multi-processors is still a challenging problem. Scheduling algorithms for asymmetric multi-processors must not only be aware of asymmetry in...

    Provided By Association for Computing Machinery

  • White Papers // Apr 2013

    Efficient Virtual Memory for Big Memory Servers

    The authors' analysis shows that many "Big-memory" server workloads, such as databases, in-memory caches, and graph analytics, pay a high cost for page-based virtual memory. They consume as much as 10% of execution cycles on TLB misses, even using large pages. On the other hand, they find that these workloads...

    Provided By Association for Computing Machinery

  • White Papers // Dec 2008

    Minimizing CPU Energy in Real-Time Systems with Discrete Speed Management

    In this paper the authors present a general framework to analyze and design embedded systems minimizing the energy consumption without violating timing requirements. A set of realistic assumptions is considered in the model in order to apply the results in practical real-time applications. The processor is assumed to have as...

    Provided By Association for Computing Machinery

  • White Papers // Mar 2010

    Sporadic Server Revisited

    The Sporadic Server (SS) overcomes the major limitations of other Resource Reservation Fixed Priority based techniques, but it also presents some drawbacks, mainly related to increased scheduling overhead and inefficient behavior during overrun situations. In this paper, the authors introduce and prove the effectiveness of an...

    Provided By Association for Computing Machinery

  • White Papers // Mar 2009

    An Implementation of the Earliest Deadline First Algorithm in Linux

    Recently, many projects have been started to introduce some real-time mechanisms into General Purpose Operating Systems (GPOS) in order to make them capable of providing the users with some temporal guarantees. Many of these projects focused especially on Linux for its capillary and widespread adoption throughout many different research and...

    Provided By Association for Computing Machinery

  • White Papers // Jul 2011

    PROARTIS: Probabilistically Analysable Real-Time Systems

    Static timing analysis is the state-of-the-art practice to ascertain the timing behavior of current-generation real-time embedded systems. The adoption of more complex hardware to respond to the increasing demand for computing power in next-generation systems exacerbates some of the limitations of static timing analysis. In particular, the effort of acquiring...

    Provided By Association for Computing Machinery

  • White Papers // Mar 2011

    On-line Scheduling of Real-Time Services with Profit and Penalty

    In this paper, the authors study a new family of real-time service-oriented scheduling problems. The real-time tasks are scheduled non-preemptively with the objective of maximizing the total utility. Unlike the traditional utility accrual scheduling problem, in which each task is associated with only a single Time Utility Function...

    Provided By Association for Computing Machinery

  • White Papers // Jul 2012

    Communication-Aware HW/SW Co-design for Heterogeneous Multicore Platforms

    QUAD is an open source profiling toolset, which is an integral part of the Q2 profiling framework. In this paper, the authors extend QUAD to introduce the concept of unique data values regarding the data communication among functions. This feature is important to make a proper partitioning of the application....

    Provided By Association for Computing Machinery

  • White Papers // May 2012

    Decoupled Inter- and Intra-Application Scheduling for Composable and Robust Embedded MPSoC Platforms

    Systems-on-Chip (SoCs) typically implement complex applications, each consisting of multiple tasks. Several applications share the SoC cores, to reduce cost. Applications have mixed time-criticality, i.e., real-time or not, and are typically developed together with their schedulers, by different parties. Composability, i.e., complete functional and temporal isolation between applications, is a...

    Provided By Association for Computing Machinery

  • White Papers // Sep 2012

    Using Unfoldings in Automated Testing of Multithreaded Programs

    In multithreaded programs, both environment input data and the non-deterministic interleavings of concurrent events can affect the behavior of the program. One approach to systematically explore the non-determinism caused by input data is dynamic symbolic execution. For testing multithreaded programs the authors present a new approach that combines dynamic symbolic...

    Provided By Association for Computing Machinery

  • White Papers // Apr 2013

    SIMD Divergence Optimization through Intra-Warp Compaction

    SIMD execution units in GPUs are increasingly used for high performance and energy efficient acceleration of general purpose applications. However, SIMD control flow divergence effects can result in reduced execution efficiency in a class of GPGPU applications, classified as divergent applications. Improving SIMD efficiency, therefore, has the potential to bring...

    Provided By Association for Computing Machinery

  • White Papers // Sep 2012

    Acceleration of Bulk Memory Operations in a Heterogeneous Multicore Architecture

    In this paper, the authors present a novel approach of using the integrated GPU to accelerate bulk memory operations, such as memcpy or memset, that are normally performed by the CPUs. Offloading the bulk memory operations to the GPU has many advantages: the throughput-driven GPU outperforms...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2012

    Hybrid DRAM/PRAM-based Main Memory for Single-Chip CPU/GPU

    Single-chip CPU/GPU architecture is being adopted in high-end (embedded) systems, e.g., smartphones and tablet PCs. The main memory subsystem is expected to consist of hybrid DRAM and Phase-change RAM (PRAM) due to the difficulties in DRAM scaling. In this paper, the authors address the performance optimization of the hybrid DRAM/PRAM main...

    Provided By Association for Computing Machinery

  • White Papers // Mar 2013

    Split Tiling for GPUs: Automatic Parallelization Using Trapezoidal Tiles

    Tiling is a key technique to enhance data reuse. For computations structured as one sequential outer "Time" loop enclosing a set of parallel inner loops, tiling only the parallel inner loops may not enable enough data reuse in the cache. Tiling the inner loops along with the outer time loop...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2014

    A Framework for Enhancing Data Reuse via Associative Reordering

    The freedom to reorder computations involving associative operators has been widely recognized and exploited in designing parallel algorithms and to a more limited extent in optimizing compilers. In this paper, the authors develop a novel framework utilizing the associativity and commutativity of operations in regular loop computations to enhance register...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2011

    Compilation of Stream Programs onto Scratchpad Memory Based Embedded Multicore Processors Through Retiming

    The prevalence of stream applications in signal processing, multi-media, and network processing domains has resulted in a new trend of programming and architecture design. Several languages and multicore architectures have been developed to support streaming applications. In many of these multicore architectures Scratch-Pad Memories (SPM) have substituted caches due to...

    Provided By Association for Computing Machinery