Association for Computing Machinery

  • White Papers // Apr 2014

    An Online Auction Framework for Dynamic Resource Provisioning in Cloud Computing

    Auction mechanisms have recently attracted substantial attention as an efficient approach to pricing and resource allocation in cloud computing. This work, to the authors' knowledge, represents the first online combinatorial auction designed in the cloud computing paradigm, which is general and expressive enough to both optimize system efficiency across the...

    Provided By Association for Computing Machinery

  • White Papers // Apr 2014

    Palette: Enabling Scalable Analytics for Big-Memory, Multicore Machines

    Hadoop and its variants have been widely used for processing large scale analytics tasks in a cluster environment. However, use of a commodity cluster for analytics tasks needs to be reconsidered based on two key observations: in recent years, large memory, multicore machines have become more affordable; and recent studies...

  • White Papers // Mar 2014

    Maple: Scalable Multi-Dimensional Range Search over Encrypted Cloud Data with Tree-based Index

    Cloud computing promises users massive scale outsourced data storage services with much lower costs than traditional methods. However, privacy concerns compel sensitive data to be stored on the cloud server in an encrypted form. This poses a great challenge for effectively utilizing cloud data, such as executing common SQL queries....

  • White Papers // Mar 2014

    Optimization Method for Request Admission Control to Guarantee Performance Isolation

    Software-as-a-Service (SaaS) often shares one single application instance among different tenants to reduce costs. However, sharing potentially leads to undesired influence from one tenant onto the performance observed by the others. Furthermore, providing one tenant additional resources to support its increasing demands without increasing the performance of tenants who do...

  • White Papers // Mar 2014

    The Benefit of SMT in the Multi-Core Era: Flexibility Towards Degrees of Thread-Level Parallelism

    The number of active threads in a multi-core processor varies over time and is often much smaller than the number of supported hardware threads. This requires multi-core chip designs to balance core count and per-core performance. Low active thread counts benefit from a few big, high-performance cores, while high active...

  • White Papers // Mar 2014

    On Quantitative Dynamic Data Flow Tracking

    Information flow tracking can be used to support access and usage control. In order to enforce real-world usage control requirements on data, one must take into account that data exist in multiple representations. The authors present a non-probabilistic model for dynamic quantitative data flow tracking. Estimations of the amount of...

  • White Papers // Mar 2014

    I/O Paravirtualization at the Device File Boundary

    Paravirtualization is an important I/O virtualization technology since it uniquely provides all of the following benefits: the ability to share the device between multiple VMs, support for legacy devices without virtualization hardware, and high performance. However, existing paravirtualization solutions have one main limitation: they only support one I/O device class,...

  • White Papers // Mar 2014

    Guardrail: A High Fidelity Approach to Protecting Hardware Devices from Buggy Drivers

    While device driver code is both critical to proper system operation and more susceptible to bugs than other system software, relatively little work has been done in the area of online driver correctness monitoring (perhaps due to the performance-sensitive nature of driver software). This paper demonstrates that decoupled correctness checking...

  • White Papers // Mar 2014

    Scale-Out NUMA

    Emerging datacenter applications operate on vast datasets that are kept in DRAM to minimize latency. The large number of servers needed to accommodate this massive memory footprint requires frequent server-to-server communication in applications such as key-value stores and graph-based applications that rely on large irregular data structures. The fine-grained nature...

  • White Papers // Mar 2014

    K2: A Mobile Operating System for Heterogeneous Coherence Domains

    Mobile System-on-Chips (SoC) that incorporate heterogeneous coherence domains promise high energy efficiency to a wide range of mobile applications, yet are difficult to program. To exploit the architecture, a desirable, yet missing capability is to replicate Operating System (OS) services over multiple coherence domains with minimum inter-domain communication. In designing...

  • White Papers // Feb 2014

    Natural Language Queries over Heterogeneous Linked Data Graphs: A Distributional-Compositional Semantics Approach

    The demand to access large amounts of heterogeneous structured data is emerging as a trend for many users and applications. However, the effort involved in querying heterogeneous and distributed third-party databases can create major barriers for data consumers. At the core of this problem is the semantic gap between the...

  • White Papers // Feb 2014

    Towards Fair and Efficient SMP Virtual Machine Scheduling

    As multicore processors become prevalent in modern computer systems, there is a growing need for increasing hardware utilization and exploiting the parallelism of such platforms. With virtualization technology, hardware utilization is improved by encapsulating independent workloads into Virtual Machines (VMs) and consolidating them onto the same machine. SMP virtual machines...

  • White Papers // Feb 2014

    Leveraging Hardware Message Passing for Efficient Thread Synchronization

    As the level of parallelism in manycore processors keeps increasing, providing efficient mechanisms for thread synchronization in concurrent programs is becoming a major concern. On cache-coherent shared-memory processors, synchronization efficiency is ultimately limited by the performance of the underlying cache coherence protocol. This paper studies how hardware support for message...

  • White Papers // Feb 2014

    Predicting Crowd Behavior with Big Public Data

    With public information becoming widely accessible and shared on today's web, greater insights are possible into crowd actions by citizens and non-state actors such as large protests and cyber activism. The authors present efforts to predict the occurrence, specific timeframe, and location of such actions before they occur based...

  • White Papers // Feb 2014

    Fusing Data with Correlations

    Many applications rely on Web data and extraction systems to accomplish knowledge-driven tasks. Web information is not curated, so many sources provide inaccurate or conflicting information. Moreover, extraction systems introduce additional noise to the data. The authors wish to automatically distinguish between correct data and erroneous data for creating a cleaner...

  • White Papers // Jan 2014

    A Computational Field Framework for Collaborative Task Execution in Volunteer Clouds

    The increasing diffusion of cloud technologies offers new opportunities for distributed and collaborative computing. Volunteer clouds are a prominent example, where participants join and leave the platform and collaborate by sharing computational resources. The high complexity, dynamism and unpredictability of such scenarios call for decentralized self-* approaches. The authors present in...

  • White Papers // Jan 2014

    SOFT GRID - Big Data Analytics for Smart Grid

    People live in an information-rich society, where the amount of information in the world is presently doubling every year. Therefore, the greatest challenge of this period is data handling. Big Data Analytics (BDA) has achieved its current prominence within the utility domain in this context. As the power grid...

  • White Papers // Jan 2014

    Authenticated Data Structures, Generically

    An Authenticated Data Structure (ADS) is a data structure whose operations can be carried out by an untrusted prover, the results of which a verifier can efficiently check as authentic. This paper has the prover produce a compact proof that the verifier can check along with each operation's result. ADSs...

  • White Papers // Dec 2013

    Cloud Adoption: Prioritizing Obstacles and Obstacles Resolution Tactics Using AHP

    The enormous potential of cloud computing for improved and cost-effective service has generated unprecedented interest in its adoption. However, a potential cloud user faces numerous risks regarding service requirements, cost implications of failure and uncertainty about cloud providers' ability to meet service level agreements. These risks hinder the adoption of...

  • White Papers // Dec 2013

    Seeking Anonymity in an Internet Panopticon

    In today's "Big Data" internet, users often need to assume that, by default, their every statement or action online is monitored and tracked; moreover, statements and actions are linked with detailed user profiles built by entities ranging from commercial vendors and advertisers to state surveillance agencies to online stalkers and...

  • White Papers // Dec 2013

    Subverting System Authentication with Context-Aware, Reactive Virtual Machine Introspection

    Recent advances in bridging the semantic gap between Virtual Machines (VMs) and their guest processes have a dark side: they can be abused to subvert and compromise VM file system images and process images. To demonstrate this alarming capability, a context-aware, reactive VM Introspection (VMI) instrument is presented and...

  • White Papers // Dec 2013

    Linearly Compressed Pages: A Low-Complexity, Low-Latency Main Memory Compression Framework

    Data compression is a promising approach for meeting the increasing memory capacity demands expected in future systems. Unfortunately, existing compression algorithms do not translate well when directly applied to main memory because they require the memory controller to perform non-trivial computation to locate a cache line within a compressed memory...

  • White Papers // Dec 2013

    RowClone: Fast and Energy-Efficient In-DRAM Bulk Data Copy and Initialization

    Several system-level operations trigger bulk data copy or initialization. Even though these bulk data operations do not require any computation, current systems transfer a large quantity of data back and forth on the memory channel to perform such operations. As a result, bulk data operations consume high latency, bandwidth, and...

  • White Papers // Dec 2013

    Meet the Walkers: Accelerating Index Traversals for In-Memory Databases

    The explosive growth in digital data and its growing role in real-time decision support motivate the design of high-performance DataBase Management Systems (DBMSs). Meanwhile, slowdown in supply voltage scaling has stymied improvements in core performance and ushered in an era of power-limited chips. These developments motivate the design of DBMS accelerators...

  • White Papers // Dec 2013

    SHIFT: Shared History Instruction Fetch for Lean-Core Server Processors

    In server workloads, large instruction working sets result in high L1 instruction cache miss rates. Fast access requirements preclude large instruction caches that can accommodate the deep software stacks prevalent in server applications. Prefetching has been a promising approach to mitigate instruction-fetch stalls by relying on recurring instruction streams of...

  • White Papers // Dec 2013

    Utilizing Domain-Specific Keywords for Discovering Public SPARQL Endpoints: A Life-Sciences Use-Case

    The LOD cloud comprises billions of facts covering hundreds of datasets. In accordance with the linked data principles, these datasets are connected by a variety of typed links, forming an interlinked "Web of data". The growing diversity of the web of data makes it more and more challenging for...

  • White Papers // Dec 2013

    Cyber-Secure Communication Architecture for Active Power Distribution Networks

    Active power distribution networks require sophisticated monitoring and control strategies for efficient energy management and automatic adaptive reconfiguration of the power infrastructure. Such requirements are realized by deploying a large number of various electronic automation and communication field devices, such as Phasor Measurement Units (PMUs) or Intelligent Electronic Devices (IEDs),...

  • White Papers // Dec 2013

    Smartphones as Practical and Secure Location Verification Tokens for Payments

    The authors propose a novel location-based second-factor authentication solution for modern smartphones. They demonstrate their solution in the context of point-of-sale transactions and show how it can be effectively used for the detection of fraudulent transactions caused by card theft or counterfeiting. Their scheme makes use of Trusted...

  • White Papers // Dec 2013

    XLynx - An FPGA-Based XML Filter for Hybrid XQuery Processing

    While offering unique performance and energy saving advantages, the use of Field-Programmable Gate Arrays (FPGAs) for database acceleration has demanded major concessions from system designers. Either the programmable chips have been used for very basic application tasks (such as implementing a rigid class of selection predicates), or their circuit definition...

  • White Papers // Dec 2013

    QoS-Aware Scheduling in Heterogeneous Datacenters with Paragon

    Large-scale Data Centers (DCs) host tens of thousands of diverse applications each day. However, interference between colocated workloads and the difficulty of matching applications to one of the many hardware platforms available can degrade performance, violating the Quality of Service (QoS) guarantees that many cloud workloads require. While previous work...

  • White Papers // Nov 2013

    A Flexible Framework for Detecting IPv6 Vulnerabilities

    Security has recently become a very important concern for entities using IPv6 networks. This is especially true with the recent news reports where governments and companies have admitted to credible cyber attacks against them in which confidential information and the security of data have been compromised. In this paper, the...

  • White Papers // Nov 2013

    Go with the Flow: Toward Workflow-Oriented Security Assessment

    In this paper, the authors advocate the use of workflow (describing how a system provides its intended functionality) as a pillar of cybersecurity analysis and propose a holistic workflow-oriented assessment framework. While workflow models are currently used in the area of performance and reliability assessment, these approaches are designed neither to assess...

  • White Papers // Nov 2013

    Supporting End-to-End Social Media Data Analysis with the IndexedHBase Platform

    As data intensive applications evolve, many research projects involving big data require efficient extraction and analysis of specific data subsets, rather than the whole dataset. Social media data analysis is one such example. While social media platforms such as Twitter provide tremendous data about all kinds of social activities, most...

  • White Papers // Nov 2013

    Towards Minimal-Delay Deadline-Driven Data Center TCP

    Cloud datacenter applications such as web search, retail, advertising, and recommendation systems generate a diverse mix of short and long flows that carry widely varying deadlines due to their soft real-time nature. In this paper, the authors present MCP, a novel distributed and reactive transport protocol for Data Center...

  • White Papers // Nov 2013

    Using Simulation to Explore Distributed Key-Value Stores for Extreme-Scale System Services

    Owing to the significantly high rate of component failures at extreme scales, system services will need to be failure-resistant, adaptive and self-healing. A majority of HPC services are still designed around a centralized paradigm and hence are susceptible to scaling issues. Peer-To-Peer (P2P) services have proved themselves at scale for...

  • White Papers // Nov 2013

    Efficient and Customizable Data Partitioning Framework for Distributed Big RDF Data Processing in the Cloud

    Big data business can leverage and benefit from the clouds, the most optimized, shared, automated, and virtualized computing infrastructures. One of the important challenges in processing big data in the clouds is how to effectively partition the big data to ensure efficient distributed processing of the data. In this paper...

  • White Papers // Nov 2013

    Efficient Data Partitioning Model for Heterogeneous Graphs in the Cloud

    As the size and variety of information networks continue to grow in many scientific and engineering domains, the authors witness a growing demand for efficient processing of large heterogeneous graphs using a cluster of compute nodes in the cloud. One open issue is how to effectively partition a large graph...

  • White Papers // Nov 2013

    Using Automated Performance Modeling to Find Scalability Bugs in Complex Codes

    Many parallel applications suffer from latent performance limitations that may prevent them from scaling to larger machine sizes. Often, such scalability bugs manifest themselves only when an attempt to scale the code is actually being made - a point where remediation can be difficult. However, creating analytical performance models that...

  • White Papers // Nov 2013

    Cost-Aware Cloud Bursting for Enterprise Applications

    The high cost of provisioning resources to meet peak application demands has led to the widespread adoption of pay-as-you-go cloud computing services to handle workload fluctuations. Some enterprises with existing IT infrastructure employ a hybrid cloud model where the enterprise uses its own private resources for the majority of its...

  • White Papers // Nov 2013

    Asynchronous Object Storage with QoS for Scientific and Commercial Big Data

    In this paper, the authors present their design for an asynchronous object storage system intended for use in scientific and commercial big data workloads. Use cases from the target workload domains are used to motivate the key abstractions used in the Application Programming Interface (API). The architecture of the Scalable...

  • White Papers // Apr 2013

    ScreenPass: Secure Password Entry on Touchscreen Devices

    Users routinely access cloud services through third-party apps on smartphones by giving apps login credentials (i.e., a username and password). Unfortunately, users have no assurance that their apps will properly handle this sensitive information. In this paper, the authors describe the design and implementation of ScreenPass, which significantly improves the...

  • White Papers // Jan 2012

    VSim: Simulating Multi-Server Setups at Near Native Hardware Speed

    Simulating contemporary computer systems is a challenging endeavor, especially when it comes to simulating high-end setups involving multiple servers. The simulation environment needs to run complete software stacks, including operating systems, middleware, and application software, and it needs to simulate network and disk activity next to CPU performance. In addition,...

  • White Papers // Jun 2012

    Probabilistic Modeling for Job Symbiosis Scheduling on SMT Processors

    Symbiotic job scheduling improves Simultaneous Multi-Threading (SMT) processor performance by coscheduling jobs that have "compatible" demands on the processor's shared resources. Existing approaches, however, require a sampling phase, evaluate a limited number of possible coschedules, use heuristics to gauge symbiosis, are rigid in their optimization target, and do not preserve...

  • White Papers // Jan 2013

    Understanding Fundamental Design Choices in Single-ISA Heterogeneous Multicore Architectures

    Single-ISA heterogeneous multicore processors have gained substantial interest over the past few years because of their power efficiency, as they offer the potential for high overall chip throughput within a given power budget. Prior work in heterogeneous architectures has mainly focused on how heterogeneity can improve overall system throughput. To...

  • White Papers // Jan 2013

    Per-Thread Cycle Accounting in Multicore Processors

    While multicore processors improve overall chip throughput and hardware utilization, resource sharing among the cores leads to unpredictable performance for the individual threads running on a multicore processor. Unpredictable per-thread performance becomes a problem when considered in the context of multicore scheduling: system software assumes that all threads make equal...

  • White Papers // May 2009

    A Mechanistic Performance Model for Superscalar Out-of-Order Processors

    A mechanistic model for out-of-order superscalar processors is developed and then applied to the study of microarchitecture resource scaling. The model divides execution time into intervals separated by disruptive miss events such as branch mispredictions and cache misses. Each type of miss event results in characterizable performance behavior for the...

  • White Papers // Sep 2007

    Quantitative Analysis of the Speed/Accuracy Trade-off in Transaction Level Modeling

    The increasing complexity of embedded systems requires modeling at higher levels of abstraction. Transaction Level Modeling (TLM) has been proposed to abstract communication for high speed system simulation and rapid design space exploration. Although being widely accepted for its high performance and efficiency, TLM often exhibits a significant loss in...

  • White Papers // Nov 2012

    AFReP: Application-Guided Function-Level Registerfile Power-Gating for Embedded Processors

    With shrinking CMOS feature size, static power is growing significantly and power density has emerged as an increasing concern. At the same time, one trend of embedded processors is toward larger Register Files (RFs), which further increases static power dissipation and aggravates the issue. This paper introduces an Application-guided Function-level...

  • White Papers // Oct 2006

    The Pipeline Decomposition Tree: An Analysis Tool for Multiprocessor Implementation of Image Processing Applications

    Modern embedded systems for image processing involve increasingly complex levels of functionality under real-time and resource related constraints. As this complexity increases, the application of single-chip multiprocessor technology is attractive. To address the challenges of mapping image processing applications onto embedded multiprocessor platforms, this paper presents a novel data structure...

  • White Papers // May 2007

    Beyond Single-Appearance Schedules: Efficient DSP Software Synthesis Using Nested Procedure Calls

    Synthesis of Digital Signal-Processing (DSP) software from dataflow-based formal models is an effective approach for tackling the complexity of modern DSP applications. In this paper, an efficient method is proposed for applying subroutine call instantiation of module functionality when synthesizing embedded software from a dataflow specification. The technique is based...

  • White Papers // Oct 2012

    Exploring Multi-Threaded Java Application Performance on Multicore Hardware

    While there have been many studies of how to schedule applications to take advantage of increasing numbers of cores in modern-day multicore processors, few have focused on multi-threaded managed language applications which are prevalent from the embedded to the server domain. Managed languages complicate performance studies because they have additional...

  • White Papers // Oct 2007

    Using HPM-Sampling to Drive Dynamic Compilation

    All high-performance production JVMs employ an adaptive strategy for program execution. Methods are first executed unoptimized and then an online profiling mechanism is used to find a subset of methods that should be optimized during the same execution. This paper empirically evaluates the design space of several profilers for initiating...

  • White Papers // Oct 2013

    Bottle Graphs: Visualizing Scalability Bottlenecks in Multi-Threaded Applications

    Understanding and analyzing multi-threaded program performance and scalability is far from trivial, which severely complicates parallel software development and optimization. In this paper, the authors present bottle graphs, a powerful analysis tool that visualizes multi-threaded program performance with regard to both per-thread parallelism and execution time. Each thread is represented...

  • White Papers // Sep 2012

    Power-Aware Multi-Core Simulation for Early Design Stage Hardware/Software Co-Optimization

    With limited increases in clock frequency because of power constraints, improving next-generation processor performance has become a real challenge. One increasingly attractive way to improve performance within a given power and energy budget is to optimize the system for a specific workload - a paradigm that is broadly adopted for...

  • White Papers // Nov 2011

    Sniper: Exploring the Level of Abstraction for Scalable and Accurate Parallel Multi-Core Simulation

    Two major trends in high-performance computing, namely, larger numbers of cores and the growing size of on-chip cache memory, are creating significant challenges for evaluating the design space of future processor architectures. Fast and scalable simulations are therefore needed to allow for sufficient exploration of large multi-core systems within a...

  • White Papers // Jan 2011

    Fine-Grained DVFS Using On-Chip Regulators

    Limit studies on Dynamic Voltage and Frequency Scaling (DVFS) provide apparently contradictory conclusions. On the one end, early limit studies report that DVFS is effective at large timescales (on the order of million(s) of cycles) with large scaling overheads (on the order of tens of microseconds), and they conclude that...

  • White Papers // Mar 2009

    Memory-Level Parallelism Aware Fetch Policies for Simultaneous Multithreading Processors

    A thread executing on a Simultaneous Multi-Threading (SMT) processor that experiences a long latency load will eventually stall while holding execution resources. Existing long-latency load aware SMT fetch policies limit the amount of resources allocated by a stalled thread by identifying long-latency loads and preventing the thread from fetching more...

  • White Papers // Oct 2006

    Automatic Generation of Transaction-Level Models for Rapid Design Space Exploration

    As System-on-Chip (SoC) designs grow in complexity and size, on-chip communication is becoming an increasingly important factor. Transaction-level modeling has been touted to improve simulation performance and modeling efficiency for early design space exploration. But no tools are available to generate such transaction-level models from abstract input descriptions. Designers have...

  • White Papers // Jun 2013

    Designer-in-the-Loop Recoding of ESL Models Using Static Parallel Access Conflict Analysis

    At the Electronic System Level (ESL), a well-defined design model enables early design space exploration and automatic synthesis on custom multiprocessor platforms. However, the initial design model is usually manually recoded from unstructured and sequential source code. To efficiently create cleanly structured and parallel models, this paper proposes a designer-in-the-loop...

  • White Papers // Nov 2006

    Fast and Accurate Transaction Level Models Using Result Oriented Modeling

    Efficient communication modeling is a critical task in System-on-Chip (SoC) design and exploration. In particular, fast and accurate communication is needed to predict the performance of a system. Recently, Transaction Level Modeling (TLM) has been used to speed up communication simulation at the cost of accuracy. This paper proposes a novel modeling...

  • White Papers // Dec 2006

    Improving SDRAM Access Energy Efficiency for Low-Power Embedded Systems

    DRAM (Dynamic Random Access Memory) energy consumption in low-power embedded systems can be very high, exceeding that of the data cache or even that of the processor. This paper presents and evaluates a scheme for reducing the energy consumption of SDRAM (Synchronous DRAM) memory access by a combination of techniques...

  • White Papers // Apr 2012

    Studying Hardware and Software Trade-Offs for a Real-Life Web 2.0 Workload

    Designing data centers for web 2.0 social networking applications is a major challenge because of the large number of users, the large scale of the data centers, the distributed application base, and the cost sensitivity of a data center facility. Optimizing the data center for performance per dollar is far...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2006

    The Exigency of Benchmark and Compiler Drift: Designing Tomorrow's Processors with Yesterday's Tools

    Due to the amount of time required to design a new processor, one set of benchmark programs may be used during the design phase while another may be the standard when the design is finally delivered. Using one benchmark suite to design a processor while using a different, presumably more...

    Provided By Association for Computing Machinery

  • White Papers // Oct 2010

    SWEEP: Evaluating Computer System Energy Efficiency Using Synthetic Workloads

    Energy efficiency is a key design concern in contemporary processor and system design, in the embedded domain as well as in the enterprise domain. The focus on energy efficiency has led to a number of power benchmarking methods recently. For example, EEMBC released EnergyBench and SPEC released SPECpower to quantify...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2008

    Multithreaded Simulation for Synchronous Dataflow Graphs

    Synchronous DataFlow (SDF) has been successfully used in design tools for system-level simulation of wireless communication systems. Modern wireless communication standards involve large complexity and highly multirate behavior, and typically result in long simulation times. The traditional approach for simulating SDF graphs is to compute and execute static single-processor schedules. Nowadays,...

    Provided By Association for Computing Machinery

  • White Papers // May 2006

    Energy-Efficient Embedded Software Implementation on Multiprocessor System-on-Chip with Multiple Voltages

    Performance guarantees and energy efficiency are becoming increasingly important for the implementation of embedded software. Traditionally, the Worst-Case Execution Time (WCET) is considered in order to provide a performance guarantee; however, this often leads to overdesigning the system. This paper develops energy-driven completion ratio guaranteed scheduling techniques for the implementation of embedded software...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2007

    Java Object Header Elimination for Reduced Memory Consumption in 64-Bit Virtual Machines

    Memory performance is an important design issue for contemporary computer systems given the huge processor-memory speed gap. This paper proposes a space-efficient Java object model for reducing the memory consumption of 64-bit Java virtual machines. The authors completely eliminate the object header through Typed Virtual Addressing (TVA) or implicit typing....

    Provided By Association for Computing Machinery

  • White Papers // Jun 2008

    Automated Hardware-Independent Scenario Identification

    Scenario-based design exploits the time-varying execution behavior of applications by dynamically adapting the system on which they run. This is a particularly interesting design methodology for media applications with soft real-time constraints such as decoders: frames can be classified into scenarios based on their decode complexity, and the system can...

    Provided By Association for Computing Machinery

  • White Papers // Oct 2006

    A Performance Counter Architecture for Computing Accurate CPI Components

    A common way of representing processor performance is to use Cycles Per Instruction (CPI) "stacks", which break performance into a baseline CPI plus a number of individual miss event CPI components. CPI stacks can be very helpful in gaining insight into the behavior of an application on a given microprocessor;...

    Provided By Association for Computing Machinery

  • White Papers // May 2012

    An Efficient CPI Stack Counter Architecture for Superscalar Processors

    Cycles-Per-Instruction (CPI) stacks provide intuitive and insightful performance information to software developers. Performance bottlenecks are easily identified from CPI stacks, which hint towards software changes for improving performance. Computing CPI stacks on contemporary superscalar processors is non-trivial, though, because of various overlap effects. Prior work proposed a CPI counter architecture...

    Provided By Association for Computing Machinery

  • White Papers // Mar 2014

    The Benefit of SMT in the Multi-Core Era: Flexibility Towards Degrees of Thread-Level Parallelism

    The number of active threads in a multi-core processor varies over time and is often much smaller than the number of supported hardware threads. This requires multi-core chip designs to balance core count and per-core performance. Low active thread counts benefit from a few big, high-performance cores, while high active...

    Provided By Association for Computing Machinery

  • White Papers // Mar 2009

    Per-Thread Cycle Accounting in SMT Processors

    In this paper, the authors propose a cycle accounting architecture for Simultaneous Multi-Threading (SMT) processors that estimates the execution times for each of the threads had they been executed alone, while they are running simultaneously on the SMT processor. This is done by accounting each cycle to either a base,...

    Provided By Association for Computing Machinery

  • White Papers // Mar 2012

    Iterative Optimization for the Data Center

    Iterative optimization is a simple but powerful approach that searches for the best possible combination of compiler optimizations for a given workload. However, each program, if not each data set, potentially favors a different combination. As a result, iterative optimization is plagued by several practical issues that prevent it from...

    Provided By Association for Computing Machinery

  • White Papers // Apr 2013

    Criticality Stacks: Identifying Critical Threads in Parallel Programs Using Synchronization Behavior

    Analyzing multi-threaded programs is quite challenging, but is necessary to obtain good multicore performance while saving energy. Due to synchronization, certain threads make others wait, because they hold a lock or have yet to reach a barrier. The authors call these critical threads, i.e., threads whose performance is determinative of...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2010

    Modeling Critical Sections in Amdahl's Law and its Implications for Multicore Design

    In this paper, the authors present a fundamental law for parallel performance: it shows that parallel performance is not only limited by sequential code (as suggested by Amdahl's law) but is also fundamentally limited by synchronization through critical sections. Extending Amdahl's software model to include critical sections, they derive the...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2011

    Optimizing the Datacenter for Data-Centric Workloads

    The amount of data produced on the internet is growing rapidly. Along with data explosion comes the trend towards more and more diverse data, including rich media such as audio and video. Data explosion and diversity lead to the emergence of data-centric workloads to manipulate, manage and analyze the vast...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2006

    Accurate Memory Data Flow Modeling in Statistical Simulation

    Microprocessor design is a very complex and time-consuming activity. One of the primary reasons is the huge design space that needs to be explored in order to identify the optimal design given a number of constraints. Simulations are usually used to explore these huge design spaces; however, they are fairly...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2007

    Designer-Controlled Generation of Parallel and Flexible Heterogeneous MPSoC Specification

    Programming Multi-Processor Systems-on-Chip (MPSoC) involves partitioning and mapping of sequential reference code onto multiple parallel processing elements. The immense potential available through MPSoC architectures depends heavily on the effectiveness of this programming. Existing automatic parallelizing techniques, though effective on shared memory architectures, are insufficient for MPSoCs, which are typically characterized...

    Provided By Association for Computing Machinery

  • White Papers // Oct 2007

    Pointer Re-Coding for Creating Definitive MPSoC Models

    Today's MPSoC synthesis and exploration design flows start from an abstract input specification model captured in a system level design language. Usually this model is created from a C reference code by encapsulating the computation and the communication using behaviors and channels. However, often pointers in the reference code hamper...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2013

    Exploiting Just-Enough Parallelism When Mapping Streaming Applications in Hard Real-Time Systems

    Embedded streaming applications specified using parallel Models of Computation (MoC) often contain an ample amount of parallelism which can be exploited using Multi-Processor System-on-Chip (MPSoC) platforms. It has been shown that the various forms of parallelism in an application should be explored to achieve the maximum system performance. However, if more...

    Provided By Association for Computing Machinery