Association for Computing Machinery

  • White Papers // Jun 2014

    Resource Allocation for Hardware Implementations of Map

    The map operation, in which a function is applied independently to each element in a collection to produce a new collection, appears in many settings and is easy to parallelize. While a straightforward implementation in hardware will consist of multiple functional units with buffers to balance variable execution times, the...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2014

    Revealing Applications' Access Pattern in Collective I/O for Cache Management

    Collective I/O is a critical I/O strategy on high-performance parallel computing systems that enables programmers to reveal parallel processes' I/O accesses collectively and makes it possible for the parallel I/O middleware to carry out I/O requests in a highly efficient manner. Collective I/O has been proven as a core parallel I/O...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2014

    Scaling Up Matrix Computations on Shared-Memory Manycore Systems with 1000 CPU Cores

    While the growing number of cores per chip allows researchers to solve larger scientific and engineering problems, the parallel efficiency of the deployed parallel software starts to decrease. This scalability problem affects both vendor-provided and open-source software and wastes CPU cycles and energy. By expecting CPUs with hundreds of...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2014

    Cascading Failures in Power Grids - Analysis and Algorithms

    In this paper, the authors focus on cascading line failures in the transmission system of the power grid. Recent large-scale power outages demonstrated the limitations of percolation- and epidemic-based tools in modeling cascades. Hence, they study cascades by using computational tools and a linearized power flow model. They first obtain...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2014

    A Framework for Enhancing Data Reuse via Associative Reordering

    The freedom to reorder computations involving associative operators has been widely recognized and exploited in designing parallel algorithms and to a more limited extent in optimizing compilers. In this paper, the authors develop a novel framework utilizing the associativity and commutativity of operations in regular loop computations to enhance register...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2014

    VeriCon: Towards Verifying Controller Programs in Software-Defined Networks

    Software-Defined Networking (SDN) is a new paradigm for operating and managing computer networks. SDN enables logically centralized control over network devices through a "Controller" software that operates independently from the network hardware, and can be viewed as the network operating system. Network operators can run both in-house and third-party SDN...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2014

    A Runtime Cloud Efficiency Software Quality Metric

    In this paper, the authors introduce the Cloud Efficiency (CE) metric, a novel runtime metric which assesses how effectively an application uses software-defined infrastructure. The CE metric is computed as the ratio of two functions: a benefit function which captures the current set of benefits derived from the application, and...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2014

    Performance Regression Testing Target Prioritization via Performance Risk Analysis

    As software evolves, problematic changes can significantly degrade software performance, i.e., introduce performance regressions. Performance regression testing is an effective way to reveal such issues in early stages. Yet because of its high overhead, this activity is usually performed infrequently. Consequently, when a performance regression issue is spotted at a certain...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2014

    Inductive Verification of Data Model Invariants for Web Applications

    Modern software applications store their data in remote cloud servers. Users interact with these applications using web browsers or thin clients running on mobile devices. A key issue in dependability of these applications is the correctness of the actions that update the data store, which are triggered by user requests....

    Provided By Association for Computing Machinery

  • White Papers // Jun 2014

    VirtualSwindle: An Automated Attack Against In-App Billing on Android

    Since its introduction, Android's in-app billing service has quickly gained popularity. The in-app billing service allows users to pay for options, services, subscriptions, and virtual goods from within mobile apps themselves. In-app billing is attractive for developers because it is easy to integrate, and has the advantage that the developer...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2014

    PROSPECT: Peripheral Proxying Supported Embedded Code Testing

    Embedded systems are an integral part of almost every electronic product today. From consumer electronics to industrial components in SCADA systems, their possible fields of application are manifold. While especially in industrial and critical infrastructures the security requirements are high, recent publications have shown that embedded systems do not cope...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2014

    Shades of Gray: A Closer Look at Emails in the Gray Area

    Every day, millions of users spend a considerable amount of time browsing through the messages in their spam folders. With newsletters and automated notifications responsible for 42% of the messages in users' inboxes, inevitably some important emails get misclassified as spam. Unfortunately, users are often unable to take security...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2014

    On the Effectiveness of Risk Prediction Based on Users Browsing Behavior

    Users are typically the final target of web attacks: criminals are interested in stealing their money, their personal information, or in infecting their machines with malicious code. However, while many aspects of web attacks have been carefully studied by researchers and security companies, the reasons that make certain users more...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2014

    On the Feasibility of Software Attacks on Commodity Virtual Machine Monitors via Direct Device Assignment

    The security of Virtual Machine Monitors (VMMs) is a challenging and active field of research. In particular, due to the increasing significance of hardware virtualization in cloud solutions, it is important to clearly understand existing and arising VMM-related threats. Unfortunately, there is still a lot of confusion around this topic...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2014

    SHiFA: System-Level Hierarchy in Run-Time Fault-Aware Management of Many-Core Systems

    A system-level approach to fault-aware resource management of many-core systems is proposed. The proposed approach, called SHiFA, is able to tolerate run-time faults at system level without any hardware overhead. In contrast to the existing system-level methods, network resources are also considered to be potentially faulty. Accordingly, applications are mapped...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2014

    System-Level Security for Network Processors with Hardware Monitors

    New attacks are emerging that target the Internet infrastructure. Modern routers use programmable network processors that may be exploited by merely sending suitably crafted data packets into a network. Hardware monitors that are co-located with processor cores can detect attacks that change processor behavior with high probability. In this paper,...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2014

    A Design Methodology for Compositional High-Level Synthesis of Communication-Centric SoCs

    Systems-on-chip are increasingly designed at the system level by combining synthesizable IP components that operate concurrently while interacting through communication channels. CAD-tool vendors support this system-level design approach with high-level synthesis tools and libraries of interface primitives implementing the communication protocols. These interfaces absorb timing differences in the hardware-component implementations,...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2014

    Protecting SRAM-Based FPGAs Against Multiple Bit Upsets Using Erasure Codes

    Multiple bit upsets due to radiation-induced soft errors are a major concern in nanoscale technology nodes. Once such errors occur in the configuration frames of an FPGA device, they permanently affect the functionality of the mapped design. The combination of error correction schemes and configuration scrubbing is an efficient approach...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2014

    Autonomic Resource Provisioning for Cloud-Based Software

    Cloud elasticity provides a software system with the ability to maintain optimal user experience by automatically acquiring and releasing resources, while paying only for what has been consumed. The mechanism for automatically adding or removing resources on the fly is referred to as auto-scaling. The state-of-the-practice with respect to...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2014

    The Harvester, the Botmaster, and the Spammer: On the Relations Between the Different Actors in the Spam Landscape

    A spammer needs three elements to run a spam operation: a list of victim email addresses, content to be sent, and a botnet to send it. Each of these three elements is critical for the success of the spam operation: a good email list should be composed of valid email...

    Provided By Association for Computing Machinery

  • White Papers // May 2014

    Generation of Reduced Analog Circuit Models Using Transient Simulation Traces

    The generation of fast models for device-level circuit descriptions is a very active area of research. Model order reduction is an attractive technique for reducing the size of dynamical models. In this paper, the authors propose an approach based on clustering, curve-fitting, linearization and Krylov space projection to build reduced models...

    Provided By Association for Computing Machinery

  • White Papers // May 2014

    A Qualitative Simulation Approach for Verifying PLL Locking Property

    Simulation cannot give full coverage of Phase Locked Loop (PLL) behavior in the presence of process variation, jitter and varying initial conditions. Qualitative Simulation is an attractive method that computes behavior envelopes for dynamical systems over continuous ranges of their parameters. Therefore, this method can be employed to verify PLLs...

    Provided By Association for Computing Machinery

  • White Papers // May 2014

    A Semi-Formal Approach for Analog Circuits Behavioral Properties Verification

    The analog circuit design process is becoming very complex and therefore new verification approaches are very much needed. Simulation is the most used technique to compute the behavior of a circuit model. Statistical methods like Monte Carlo rely on repeating numerical simulations for a random sampling of parameters. The authors...

    Provided By Association for Computing Machinery

  • White Papers // May 2014

    Neural Network-Based Accelerators for Transcendental Function Approximation

    The general-purpose approximate nature of Neural Network (NN) based accelerators has the potential to sustain the historic energy and performance improvements of computing systems. The authors propose the use of NN-based accelerators to approximate mathematical functions in the GNU C Library (glibc) that commonly occur in application benchmarks. Using their...

    Provided By Association for Computing Machinery

  • White Papers // May 2014

    OCV-Aware Top-Level Clock Tree Optimization

    The clock trees of high-performance synchronous circuits have many clock logic cells (e.g., clock gating cells, multiplexers and dividers) in order to achieve aggressive clock gating and required performance across a wide range of operating modes and conditions. As a result, clock tree structures have become very complex and difficult...

    Provided By Association for Computing Machinery

  • White Papers // May 2014

    A New Methodology for Reduced Cost of Resilience

    Resilient design techniques are used to ensure correct operation under dynamic variations and to improve design performance (e.g., through timing speculation). However, significant overheads (e.g., 17% and 15% energy penalties due to throughput degradation and additional circuits) are incurred by existing resilient design techniques. For instance, resilient designs require additional circuits...

    Provided By Association for Computing Machinery

  • White Papers // May 2014

    Horizontal Benchmark Extension for Improved Assessment of Physical CAD Research

    The rapid growth in complexity and diversity of IC designs, design flows and methodologies has resulted in a benchmark-centric culture for evaluation of performance and scalability in physical design algorithm research. Landmark papers in the literature present vertical benchmarks that can be used across multiple design flow stages; artificial benchmarks...

    Provided By Association for Computing Machinery

  • White Papers // May 2014

    Towards Topic Modeling for Big Data

    Latent Dirichlet Allocation (LDA) is a popular topic modeling technique in academia but less so in industry, especially in large-scale applications involving search engines and on-line advertisement systems. A main underlying reason is that the topic models used have been too small in scale to be useful; for example, some...

    Provided By Association for Computing Machinery

  • White Papers // May 2014

    Wireless Scheduling Algorithms in Complex Environments

    Efficient spectrum use in wireless sensor networks through spatial reuse requires effective models of packet reception at the physical layer in the presence of interference. Despite recent progress in analytic and simulations research into worst-case behavior from interference effects, these efforts generally assume geometric path loss and isotropic transmission, assumptions...

    Provided By Association for Computing Machinery

  • White Papers // May 2014

    A Generic Provenance Middleware for Database Queries, Updates, and Transactions

    The authors present an architecture and prototype implementation for a Generic Provenance database Middleware (GProM) that is based on the concept of query rewrites, which are applied to an algebraic graph representation of database operations. The system supports a wide range of provenance types and representations for queries, updates, transactions,...

    Provided By Association for Computing Machinery

  • White Papers // May 2014

    Resolving Conflicts in Heterogeneous Data by Truth Discovery and Source Reliability Estimation

    In many applications, one can obtain descriptions about the same objects or events from a variety of sources. As a result, this will inevitably lead to data or information conflicts. One important problem is to identify the true information (i.e., the truths) among conflicting sources of data. It is intuitive...

    Provided By Association for Computing Machinery

  • White Papers // May 2014

    NewsNetExplorer: Automatic Construction and Exploration of News Information Networks

    News data is one of the most abundant and familiar data sources. News data can be systematically utilized and explored by database, data mining and NLP information retrieval researchers to demonstrate to the general public the power of advanced information technology. In the authors' view, news data contains rich, inter-related...

    Provided By Association for Computing Machinery

  • White Papers // May 2014

    Imprecise Datapath Design: An Overclocking Approach

    In this paper, the authors describe an alternative circuit design methodology when considering trade-offs between accuracy, performance and silicon area. They compare two different approaches that could trade accuracy for performance. One is the traditional approach where the precision used in the datapath is limited to meet target latency. The...

    Provided By Association for Computing Machinery

  • White Papers // May 2014

    Explore-by-Example: An Automatic Query Steering Framework for Interactive Data Exploration

    Interactive Data Exploration (IDE) is a key ingredient of a diverse set of discovery-oriented applications, including ones from scientific computing and evidence-based medicine. In these applications, data discovery is a highly ad hoc interactive process where users execute numerous exploration queries using varying predicates aiming to balance the trade-off between...

    Provided By Association for Computing Machinery

  • White Papers // May 2014

    Fast Distributed Transactions and Strongly Consistent Replication for OLTP Database Systems

    As more data management software is designed for deployment in public and private clouds, or on a cluster of commodity servers, new distributed storage systems increasingly achieve high data access throughput via partitioning and replication. In order to achieve high scalability, however, today's systems generally reduce transactional support, disallowing single...

    Provided By Association for Computing Machinery

  • White Papers // Apr 2014

    Autonomous Soft-error Tolerance of FPGA Configuration Bits

    Field Programmable Gate Arrays (FPGAs) are increasingly susceptible to radiation-induced Single Event Upsets (SEUs). These upsets are predominant in space environments; however, with the increasing use of Static RAM (SRAM) in modern FPGAs, SEUs are gaining prominence even in terrestrial environments. SEUs can flip the SRAM bits of an FPGA, potentially altering...

    Provided By Association for Computing Machinery

  • White Papers // Apr 2014

    Reconfiguration-Assisted Charging in Large-Scale Lithium-Ion Battery Systems

    Large-scale Lithium-ion batteries are widely adopted in many systems such as electric vehicles and energy backup in power grids. Due to factors such as manufacturing difference and heterogeneous discharging conditions, cells in the battery system may have different statuses such as diverse voltage levels. This cell diversity is commonly known...

    Provided By Association for Computing Machinery

  • White Papers // Apr 2014

    Rage Against the Virtual Machine: Hindering Dynamic Analysis of Android Malware

    Antivirus companies, mobile application marketplaces, and the security research community, employ techniques based on dynamic code analysis to detect and analyze mobile malware. In this paper, the authors present a broad range of anti-analysis techniques that malware can employ to evade dynamic analysis in emulated Android environments. Their detection heuristics...

    Provided By Association for Computing Machinery

  • White Papers // Apr 2014

    Reconciling High Server Utilization and Sub-Millisecond Quality-of-Service

    The simplest strategy to guarantee good Quality-of-Service (QoS) for a latency-sensitive workload with sub-millisecond latency in a shared cluster environment is to never run other workloads concurrently with it on the same server. Unfortunately, this inevitably leads to low server utilization, reducing both the capability and cost effectiveness of the...

    Provided By Association for Computing Machinery

  • White Papers // Apr 2014

    Aerie: Flexible File-System Interfaces to Storage-Class Memory

    Storage-class memory technologies such as phase-change memory and memristors present a radically different interface to storage than existing block devices. As a result, they provide a unique opportunity to re-examine storage architectures. The authors find that the existing kernel-based stack of components, well suited for disks, unnecessarily limits the design...

    Provided By Association for Computing Machinery

  • White Papers // Oct 2011

    Cost-Effective Safety and Fault Localization Using Distributed Temporal Redundancy

    Cost pressure is driving vendors of safety-critical systems to integrate previously distributed systems. One natural approach the authors have previously introduced is On-Demand Redundancy (ODR), which allows safety-critical and non-critical tasks, traditionally isolated to limit interference, to execute on shared resources. Their prior paper has shown that Relaxed Dedication (RD),...

    Provided By Association for Computing Machinery

  • White Papers // Mar 2013

    Practical Automatic Loop Specialization

    Program specialization optimizes a program with respect to program invariants, including known, fixed inputs. These invariants can be used to enable optimizations that are otherwise unsound. In many applications, a program input induces predictable patterns of values across loop iterations, yet existing specializers cannot fully capitalize on this opportunity. To...

    Provided By Association for Computing Machinery

  • White Papers // Oct 2012

    From Sequential Programming to Flexible Parallel Execution

    The embedded computing landscape is being transformed by three trends: growing demand for greater functionality and enriched user experience, increasing diversity and parallelism in the processing substrate, and an accelerating push for ever-greater energy efficiency. For programmers, these trends give rise to three challenges: writing code for a potentially heterogeneous...

    Provided By Association for Computing Machinery

  • White Papers // Aug 2008

    Performance Scalability of Decoupled Software Pipelining

    Any successful solution to using multicore processors to scale general-purpose program performance will have to contend with rising intercore communication costs while exposing coarse-grained parallelism. Recently proposed Pipelined Multi-Threading (PMT) techniques have been demonstrated to have general-purpose applicability and are also able to effectively tolerate intercore latencies through pipelined interthread...

    Provided By Association for Computing Machinery

  • White Papers // Sep 2006

    Static Typing for a Faulty Lambda Calculus

    A transient hardware fault occurs when an energetic particle strikes a transistor, causing it to change state. These faults do not cause permanent damage, but may result in incorrect program execution by altering signal transfers or stored values. While the likelihood that such transient faults will cause any significant damage...

    Provided By Association for Computing Machinery

  • White Papers // Dec 2011

    Bubble-Up: Increasing Utilization in Modern Warehouse Scale Computers Via Sensible Co-Locations

    As much of the world's computing continues to move into the cloud, the overprovisioning of computing resources to ensure the performance isolation of latency-sensitive tasks, such as web search, in modern datacenters is a major contributor to low machine utilization. Being unable to accurately predict performance degradation due to contention...

    Provided By Association for Computing Machinery

  • White Papers // May 2008

    Reducing the Impact of Intra-Core Process Variability with Criticality-Based Resource Allocation and Prefetching

    The authors develop architectural techniques for mitigating the impact of process variability. Their techniques hide the performance effects of slow components - including registers, functional units, and L1I and L1D cache frames - without slowing the clock frequency or pessimistically assuming that all components are slow. Using ideas previously developed...

    Provided By Association for Computing Machinery

  • White Papers // Oct 2013

    Language Support for Dynamic, Hierarchical Data Partitioning

    Applications written for distributed-memory parallel architectures must partition their data to enable parallel execution. As memory hierarchies become deeper, it is increasingly necessary that the data partitioning also be hierarchical to match. Current language proposals perform this hierarchical partitioning statically, which excludes many important applications where the appropriate partitioning is...

    Provided By Association for Computing Machinery

  • White Papers // Feb 2014

    Singe: Leveraging Warp Specialization for High Performance on GPUs

    The authors present Singe, a Domain Specific Language (DSL) compiler for combustion chemistry that leverages warp specialization to produce high performance code for GPUs. Instead of relying on traditional GPU programming models that emphasize data-parallel computations, warp specialization allows compilers like Singe to partition computations into sub-computations which are then...

    Provided By Association for Computing Machinery

  • White Papers // Jul 2013

    Precise Memory Leak Detection for Java Software Using Container Profiling

    A memory leak in a Java program occurs when object references that are no longer needed are unnecessarily maintained. Such leaks are difficult to detect because static analysis typically cannot precisely identify these redundant references, and existing dynamic leak detection tools track and report fine-grained information about individual objects, producing...

    Provided By Association for Computing Machinery

  • White Papers // Feb 2014

    LeakChecker: Practical Static Memory Leak Detection for Managed Languages

    Static detection of memory leaks in a managed language such as Java is attractive because it does not rely on any leak-triggering inputs, allowing compile-time tools to find leaks before software is released. A long-standing issue that prevents practical static memory leak detection for Java is that it can be...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2012

    Dynamic Trace-Based Analysis of Vectorization Potential of Applications

    Recent hardware trends with GPUs and the increasing vector lengths of SSE-like ISA extensions for multicore CPUs imply that effective exploitation of SIMD parallelism is critical for achieving high performance on emerging and future architectures. A vast majority of existing applications were developed without any attention by their developers towards...

    Provided By Association for Computing Machinery

  • White Papers // Dec 2013

    Beyond Reuse Distance Analysis: Dynamic Analysis for Characterization of Data Locality Potential

    Emerging computer architectures will feature drastically decreased flops/byte (ratio of peak processing rate to memory bandwidth) as highlighted by recent studies on Exascale architectural trends. Further, flops are getting cheaper, while the energy cost of data movement is increasingly dominant. The understanding and characterization of data locality properties of computations...

    Provided By Association for Computing Machinery

  • White Papers // Jan 2014

    Scalable Runtime Bloat Detection Using Abstract Dynamic Slicing

    Many large-scale Java applications suffer from runtime bloat. They execute large volumes of methods, and create many temporary objects, all to execute relatively simple operations. There are large opportunities for performance optimizations in these applications, but most are being missed by existing optimization and tooling technology. While JIT optimizations struggle...

    Provided By Association for Computing Machinery

  • White Papers // May 2010

    Dynamic Reconfiguration in NoC-based MPSoCs in the Avionics Domain

    Modern Network-on-Chip-based Multi-Processor Systems-on-Chip (NoC-based MPSoCs) bear the potential for higher performance, but may also allow the concentration of the same functionality on fewer devices in a complex system such as an aircraft. Despite these advantages, the avionics industry is still hesitant to adopt multi-core technology because software requirements such...

    Provided By Association for Computing Machinery

  • White Papers // Oct 2012

    A Novel NoC-Based Design for Fault-Tolerance of Last-Level Caches in CMPs

    Advances in technology scaling, coupled with aggressive voltage scaling, result in significant reliability challenges for emerging Chip Multi-Processor (CMP) platforms, where error-prone caches continue to dominate the chip area. Network-on-Chip (NoC) fabrics are increasingly used to manage the scalability of these CMPs. The authors present a novel fault-tolerant scheme for...

    Provided By Association for Computing Machinery

  • White Papers // May 2011

    Real-Time Address Trace Compression for Emulated and Real System-on-Chip Processor Core Debugging

    In the multicore era, capturing execution traces of processors is indispensable to debugging complex software. The inability to transfer vast amounts of trace data off-chip without significant slow-down has impeded the debugging of such software, in both pre-silicon emulation and in real designs. The authors consider on-chip trace compression performed...

    Provided By Association for Computing Machinery

  • White Papers // Sep 2008

    Particle Graphics on Reconfigurable Hardware

    Particle graphics simulations are well suited for modeling complex phenomena such as water, cloth, explosions, fire, smoke, and clouds. They are normally realized in software as part of an interactive graphics application. The computational complexity of particle graphics simulations restricts the number of particles that can be updated in software...

    Provided By Association for Computing Machinery

  • White Papers // Jul 2009

    A Computing Origami: Folding Streams in FPGAs

    Stream processing represents an important class of applications that spans telecommunications, multimedia and the Internet. The implementation of streaming programs in FPGAs has attracted significant attention because of their inherent parallelism and high performance requirements. Languages, tools, and even custom hardware for streaming have been proposed, some of which are...

    Provided By Association for Computing Machinery

  • White Papers // Apr 2008

    PiPA: Pipelined Profiling and Analysis on Multi-core Systems

    Dynamic instrumentation systems are gaining popularity as means of constructing customized program profiling and analysis tools. However, dynamic instrumentation based analysis tools still suffer from performance problems. The overhead of such systems can be broken down into two components - the overhead of dynamic instrumentation and the time consumed in...

    Provided By Association for Computing Machinery

  • White Papers // Sep 2006

    DEP: Detailed Execution Profile

    In many areas of computer architecture design and program development, the knowledge of dynamic program behavior can be very handy. Several challenges beset the accurate and complete collection of dynamic control flow and memory reference information. These include scalability issues, runtime-overhead, and code coverage. For example, while Tallam and Gupta's...

    Provided By Association for Computing Machinery

  • White Papers // Dec 2011

    Compressor Tree Synthesis on Commercial High-Performance FPGAs

    Compressor trees are a class of circuits that generalizes multioperand addition and the partial product reduction trees of parallel multipliers using carry-save arithmetic. Compressor trees naturally occur in many DSP applications, such as FIR filters, and, in the more general case, their use can be maximized through the application of...

    Provided By Association for Computing Machinery

  • White Papers // May 2011

    Routing Wire Optimization through Generic Synthesis on FPGA Carry Chains

    FPGA logic clusters comprise Look-Up Tables (LUTs) and arithmetic carry-chains, which perform specific arithmetic operations such as addition. In this paper, the authors present a generic logic synthesis technique to utilize such dedicated resources by restructuring the already mapped FPGA circuits. The basic idea is to replace the...

    Provided By Association for Computing Machinery

  • White Papers // Mar 2011

    Reducing the Pressure on Routing Resources of FPGAs with Generic Logic Chains

    Routing resources in modern FPGAs use 50% of the silicon real estate and are significant contributors to critical path delay and power consumption; the situation gets worse with each successive process generation, as transistors scale more effectively than wires. To cope with these challenges, FPGA architects have divided wires into...

    Provided By Association for Computing Machinery

  • White Papers // Nov 2009

    Iterative Layering: Optimizing Arithmetic Circuits by Structuring the Information Flow

    Current logic synthesis techniques are ineffective for arithmetic circuits. They perform poorly for XOR-dominated circuits and for those with a high fan-in dependency between inputs and outputs. Many optimizers therefore employ libraries of hand-optimized arithmetic components, but cannot optimize across component boundaries. To remedy this situation, the authors introduce a new...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2007

    Progressive Decomposition: A Heuristic to Structure Arithmetic Circuits

    Despite the impressive progress of logic synthesis in the past decade, finding the best architecture for a given circuit remains an open and largely unsolved problem. For most arithmetic circuits, the outcome of the synthesis tools depends on the input description of the circuit. In other words,...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2007

    Enhancing FPGA Performance for Arithmetic Circuits

    The Field Programmable Gate Array (FPGA) is an attractive platform for hardware design due to its flexibility. FPGAs are often used for low-volume circuits that could not be profitably synthesized as an Application Specific Integrated Circuit (ASIC). FPGAs offer flexibility and cost-effectiveness that ASICs cannot match; however, their performance is...

    Provided By Association for Computing Machinery

  • White Papers // Oct 2011

    SoC-TM: Integrated HW/SW Support for Transactional Memory Programming on Embedded MPSoCs

    Two overriding concerns in the development of embedded MPSoCs are ease of programming and hardware complexity. In this paper, the authors present SoC-TM, an integrated HW/SW solution for transactional programming on embedded MPSoCs. Their proposal leverages a Hardware Transactional Memory (HTM) design, based on a dedicated HW module for conflict...

    Provided By Association for Computing Machinery

  • White Papers // Mar 2008

    Effects of Virtualization on a Scientific Application Running a Hyperspectral Radiative Transfer Code on Virtual Machines

    The topic of system-level virtualization has recently begun to receive interest for High Performance Computing (HPC). This is in part due to the isolation and encapsulation offered by the virtual machine. These traits enable applications to customize their environments and maintain consistent software configurations in their virtual domains. Additionally, there...

    Provided By Association for Computing Machinery

  • White Papers // Feb 2013

    Area-Efficient Near-Associative Memories on FPGAs

    Associative memories can map sparsely used keys to values with low latency but can incur heavy area overheads. The lack of customized hardware for associative memories in today's mainstream FPGAs exacerbates the overhead cost of building these memories using the fixed address match BRAMs. In this paper, the authors develop...

    Provided By Association for Computing Machinery

  • White Papers // Feb 2009

    Choose-Your-Own-Adventure Routing: Lightweight Load-Time Defect Avoidance

    Aggressive scaling increases the number of devices that can be integrated per square millimeter but makes it increasingly difficult to guarantee that each device fabricated has the intended operational characteristics. Without careful mitigation, component yield rates will fall, potentially negating the economic benefits of scaling. The fine-grained reconfigurability inherent in...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2009

    Architectural Core Salvaging in a Multi-Core Processor for Hard-Error Tolerance

    The incidence of hard errors in CPUs is a challenge for future multicore designs due to increasing total core area. Even if the location and nature of hard errors are known a priori, either at manufacture-time or in the field, cores with such errors must be disabled in the absence...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2008

    Brief Announcement: RaceTM - Detecting Data Races Using Transactional Memory

    Widespread emergence of multicore processors will spur development of parallel applications, exposing programmers to more hardware concurrency. Dependable multithreaded software will have to rely on the ability to dynamically detect data races, which are non-deterministic and notoriously hard-to-reproduce symptoms of synchronization bugs. In this paper, the authors propose...

    Provided By Association for Computing Machinery

  • White Papers // May 2013

    A Hardware-Efficient Architecture for Embedded Real-Time Cascaded Support Vector Machines Classification

    In this paper, the authors present an optimized architecture for cascaded SVM processing, along with a hardware reduction method for the implementation of the additional stages in the cascade, leading to significant improvements. The architecture was implemented on a Virtex 5 FPGA platform and evaluated using face detection as the...

    Provided By Association for Computing Machinery

  • White Papers // Apr 2013

    A New Perspective for Efficient Virtual-Cache Coherence

    Coherent Shared Virtual Memory (cSVM) is highly coveted for heterogeneous architectures as it will simplify programming across different cores and manycore accelerators. Virtual L1 caches can be used to great advantage, e.g., saving energy by eliminating address translation for hits. Unfortunately, multicore virtual-cache coherence is complex...

    Provided By Association for Computing Machinery

  • White Papers // Jun 2009

    Field Programmable Compressor Trees: Acceleration of Multi-Input Addition on FPGAs

    Multi-input addition occurs in a variety of arithmetically intensive signal processing applications. The DSP blocks embedded in high-performance FPGAs perform fixed bitwidth parallel multiplication and Multiply-ACcumulate (MAC) operations. In theory, the compressor trees contained within the multipliers could implement multi-input addition; however, they are not exposed to the programmer. To...

    Provided By Association for Computing Machinery

  • White Papers // Oct 2008

    Speculative DMA for Architecturally Visible Storage in Instruction Set Extensions

    Instruction Set Extensions (ISEs) can accelerate embedded processor performance. Many algorithms for ISE generation have shown good potential; some of them have recently been expanded to include Architecturally Visible Storage (AVS) - compiler-controlled memories, similar to scratchpads, that are accessible only to ISEs. To achieve a speedup using AVS, Direct...

    Provided By Association for Computing Machinery

  • White Papers // Oct 2008

    Design Space Exploration for Field Programmable Compressor Trees

    The Field Programmable Compressor Tree (FPCT) is a programmable compressor tree (e.g., a Wallace or Dadda Tree) intended for integration in an FPGA or other reconfigurable device. This paper presents a Design Space Exploration (DSE) method that can be used to identify the best FPCT architecture for a given set...

    Provided By Association for Computing Machinery

  • White Papers // Feb 2008

    A Novel FPGA Logic Block for Improved Arithmetic Performance

    To improve FPGA performance for arithmetic circuits, this paper proposes a new architecture for FPGA logic cells that includes a 6:2 compressor. The new cell features additional fast carry-chains that concatenate adjacent compressors and can be routed locally without the global routing network. Unlike previous carry-chains for binary and ternary...

    Provided By Association for Computing Machinery

  • White Papers // Feb 2008

    Architectural Improvements for Field Programmable Counter Arrays: Enabling Efficient Synthesis of Fast Compressor Trees on FPGAs

    The Field Programmable Counter Array (FPCA) was introduced to improve FPGA performance for arithmetic circuits. An FPCA is a reconfigurable IP core that can be integrated into an FPGA. To exploit the FPCA, a circuit is transformed by merging disparate addition and multiplication operations into large multi-input addition operations, which...

    Provided By Association for Computing Machinery