Hardware

Stay current with the components, peripherals and physical parts that constitute your IT department.

  • White Papers // Nov 2011

    Classification and Elimination of Conflicts in Hardware Transactional Memory Systems

    In this paper, the authors analyze the sources of performance losses in hardware transactional memory and investigate techniques to reduce the losses. The paper dissects the root causes of data conflicts in Hardware Transactional Memory (HTM) systems into four classes of conflicts: true sharing, false sharing, silent store, and write-write conflicts....

    Provided By INRIA

  • White Papers // Jul 2011

    Register Reverse Rematerialization

    Reversible computing could, in the more or less long term, become mandatory for minimizing the heat dissipation inherent to computing. It aims at keeping all information on input and intermediate values available at any step of the computation. Rematerialization in register allocation amounts to recomputing values instead of spilling them in memory...

    Provided By INRIA

  • White Papers // Feb 2012

    Performance Evaluation and Analysis of Thread Pinning Strategies on Multi-Core Platforms: Case Study of SPEC OMP Applications on Intel Architectures

    With the introduction of multi-core processors, thread affinity has quickly emerged as one of the most important factors in reducing program execution times. This paper presents a complete experimental study on the performance of various thread pinning strategies. The authors investigate four application-independent thread pinning strategies and...

    Provided By INRIA

  • White Papers // Dec 2010

    Automatic Generation of FPGA-Specific Pipelined Accelerators

    The recent increase in circuit complexity has made high-level synthesis tools a must in digital circuit design. However, these tools come with several limitations, and one of them is the efficient use of pipelined arithmetic operators. This paper explains how to generate efficient hardware with pipelined...

    Provided By INRIA

  • White Papers // Sep 2011

    Designing a CPU Model: From a Pseudo-Formal Document to Fast Code

    For validating low-level embedded software, engineers use simulators that take the real binary as input. Like the real hardware, these full-system simulators are organized as a set of components. The main component is the CPU simulator (ISS), because it is the usual bottleneck for the simulation speed, and its...

    Provided By INRIA

  • White Papers // Nov 2010

    Scheduling, Binding and Routing System for a Run-Time Reconfigurable Operator Based Multimedia Architecture

    In this paper, the authors present a system for application scheduling, binding and routing for a run-time Reconfigurable Operator based Multimedia Architecture (ROMA). They use constraint programming to formalize their architecture model together with a specific application program. For this purpose they use an abstract representation of their architecture, which...

    Provided By INRIA

  • White Papers // Oct 2011

    Stream and Memory Hierarchy Design for Multi-Purpose Accelerators

    Power and programming challenges make heterogeneous multi-cores composed of cores and ASICs an attractive alternative to homogeneous multi-cores. Recently, multi-purpose loop-based generated accelerators have emerged as an especially attractive accelerator option. They have several assets: short design time (automatic generation), flexibility (multi-purpose) but low configuration and routing overhead (unlike FPGAs),...

    Provided By INRIA

  • White Papers // Mar 2014

    Sum-of-Product Architectures Computing Just Right

    Many digital filters and signal-processing transforms can be expressed as a Sum of Products with Constants (SPC). This paper addresses the automatic construction of low-precision but high-accuracy SPC architectures: these architectures are specified as last-bit accurate with respect to a mathematical definition. In other words, they behave as if...

    Provided By INRIA

  • White Papers // Jan 2013

    Formal Analysis of a Hardware Dynamic Task Dispatcher with CADP

    The complexity of multiprocessor architectures for mobile multimedia applications renders their validation challenging. In addition, to provide the necessary flexibility, a part of the functionality is realized by software. Thus, a formal model has to take into account both hardware and software. In this paper, the authors report on the...

    Provided By INRIA

  • White Papers // Dec 2013

    A Fine-grained Approach for Power Consumption Analysis and Prediction

    Power consumption has become a critical concern in modern computing systems for various reasons including financial savings and environmental protection. With battery-powered devices, one needs to care about the available amount of energy, since it is limited. For the case of supercomputers, as they imply a large aggregation...

    Provided By INRIA

  • White Papers // Feb 2013

    Selecting Benchmark Combinations for the Evaluation of Multicore Throughput

    Most high-performance processors today are able to execute multiple threads of execution simultaneously. Threads share processor resources, like the last-level cache, which may decrease throughput in a non-obvious way, depending on threads' characteristics. Computer architects usually study multi-programmed workloads by considering a set of benchmarks and some combinations of...

    Provided By INRIA

  • White Papers // Feb 2013

    DSLM: Dynamic Synchronous Language With Memory

    The authors propose a new language called DSLM based on the synchronous/reactive model. In DSLM, systems are composed of several sites executed asynchronously, while each site is running a number of agents in a synchronous way. An agent consists of a script and a memory. Scripts may call functions or...

    Provided By INRIA

  • White Papers // Oct 2012

    Demystifying Multicore Throughput Metrics

    Several different metrics have been proposed for quantifying the throughput of multicore processors. There is no clear consensus about which metric should be used. Some studies even use several throughput metrics. The authors show that there exists a relation between single-thread average performance metrics and throughput metrics, and that throughput...

    Provided By INRIA

  • White Papers // May 2011

    DRing: A Layered Scheme for Range Queries Over DHTs

    Traditional DHT structures provide very poor support for range queries, since uniform hashing destroys data locality. Several schemes have been proposed to overcome this issue, but they fail to combine load balancing, low message overhead, and low latency in search operations. In this paper the authors present DRing, an efficient...

    Provided By INRIA

  • White Papers // Oct 2009

    Software Transactional Memory: Worst Case Execution Time Analysis

    While real-time applications are becoming more and more concurrent and complex, the drive toward multicore systems raises new challenges related to the parallelization of such performance-critical applications. Transactional memory is an attractive concept for expressing parallelism for programming multicore systems as it avoids the problems of lock-based methods and eases...

    Provided By INRIA

  • White Papers // Aug 2010

    Sharing Resources for Performance and Energy Optimization of Concurrent Streaming Applications

    The authors aim at finding optimal mappings for concurrent streaming applications. Each application consists of a linear chain with several stages, and processes successive data sets in pipeline mode. The objective is to minimize the energy consumption of the whole platform, while satisfying given performance-related bounds on the period and...

    Provided By INRIA

  • White Papers // Mar 2014

    Optimizing Buffer Sizes for Pipeline Workflow Scheduling with Setup Times

    Mapping linear workflow applications onto a set of homogeneous processors can be optimally solved in polynomial time for the throughput objective with fewer processors than stages. This result even holds true when setup times occur in the execution and homogeneous buffers are available for the storage of intermediate results. In...

    Provided By INRIA

  • White Papers // Jan 2007

    Mapping Pipeline Skeletons Onto Heterogeneous Platforms

    Mapping applications onto parallel platforms is a challenging problem that becomes even more difficult when platforms are heterogeneous - nowadays a standard assumption. A high-level approach to parallel programming not only eases the application developer's task, but it also provides additional information which can help realize an efficient mapping of...

    Provided By INRIA

  • White Papers // Apr 2011

    Scheduling Streaming Applications on a Complex Multicore Platform

    In this paper, the authors consider the problem of scheduling streaming applications described by complex task graphs on a heterogeneous multi-core platform, the IBM QS 22 platform, embedding two STI Cell BE processors. They first derive a complete computation and communication model of the platform, based on comprehensive benchmarks. Then,...

    Provided By INRIA

  • White Papers // Dec 2012

    Formal Verification of Fault Tolerant NoC-Based Architecture

    Designing fault-tolerant Networks-on-Chip (NoC) for System-on-Chip (SoC)-based reconfigurable Field-Programmable Gate Array (FPGA) technology is a challenge in the conceptualization of Multi-Processor System-on-Chip (MPSoC) design. For this purpose, the use of rigorous formal approaches, based on incremental design and proof theory, has become an essential step in a...

    Provided By INRIA

  • White Papers // Aug 2007

    UNISIM: An Open Simulation Environment and Library for Complex Architecture Design and Collaborative Development

    Simulator development is already a huge burden for many academic and industry research groups; future complex or heterogeneous multi-cores, as well as the multiplicity of performance metrics and required functionality, will make matters worse. The authors present a new simulation environment, called UNISIM, which is designed to rationalize simulator development...

    Provided By INRIA

  • White Papers // Nov 2013

    Mapping Applications on Volatile Resources

    In this paper, the authors study the execution of iterative applications on volatile processors such as those found on desktop grids. They envision two models, one where all tasks are assumed to be independent, and another where all tasks are tightly coupled and keep exchanging information throughout the iteration. These...

    Provided By INRIA

  • White Papers // Jan 2008

    Hybrid Performance Analysis to Accelerate Compiler Optimization Space Exploration for In-Order Processors

    In this paper, the authors investigate the problem of finding the most adequate compiler optimization options to compile a given application to an in-order processor architecture. This is a well-known problem, but accuracy (effectiveness) and speed of the decision remain challenging. They present a two-phase method for a rapid...

    Provided By INRIA

  • White Papers // Nov 2007

    An MDE Approach For Implementing Partial Dynamic Reconfiguration In FPGAs

    Field Programmable Gate Arrays (FPGAs) provide an interesting solution when custom logic is needed for short time-to-market products. Products embedding FPGA system-on-chip solutions can be updated once deployed. Recent FPGA architectures, such as the Xilinx Virtex series, allow for Partial and Dynamic Run-time reconfiguration...

    Provided By INRIA

  • White Papers // Sep 2007

    Dynamicity Analysis of Delta MINs for MPSOC Architectures

    Multistage interconnection networks have very frequently been proposed as a means of connection in classical on-board multiprocessor systems; they promise to be the solution for the interconnection problems. This paper tries to adapt such networks for embedded system design. The authors' approach is to analyze the dynamicity of the link permutation of...

    Provided By INRIA

  • White Papers // Jul 2007

    Repetitive Allocation Modeling with MARTE

    With the advent of Multi-Processor Systems-on-Chip (MpSoC), the need for modeling the distribution of a parallel application onto a parallel hardware architecture is increasing. The recent standard profile for the Modeling and Analysis of Real-Time and Embedded systems (MARTE) provides a notation for the modeling of regular distributions. This notation...

    Provided By INRIA

  • White Papers // Jun 2007

    Multiple Abstraction Views of FPGA to Map Parallel Applications

    Manipulating configurable resources like FPGAs in a co-design framework has become essential: in particular, FPGAs may efficiently implement parallel systematic signal processing tasks. Nevertheless, such implementations are usually handwritten at a low level. The authors' proposition is to provide high-level modeling of an application and tools to automatically generate...

    Provided By INRIA

  • White Papers // Dec 2006

    MpNoC Design: Modeling and Simulation

    MppSoC is a SIMD architecture composed of a grid of extended MIPS R3000 processors, called Processing Elements (PEs). This embedded system delivers interesting performance in several modern applications based on parallel algorithms. Communication is clearly a key issue in such a system. In fact, regular communications between the PEs are...

    Provided By INRIA

  • White Papers // May 2006

    Hardware/Software Exploration for an Anti-Collision Radar System

    Anti-collision radars help prevent car accidents by detecting obstacles in front of vehicles equipped with such systems. This task traditionally relies on a correlator, which searches for similarities between an emitted and a received wave. Other modules can then use the information produced by the correlator to compute the distance...

    Provided By INRIA

  • White Papers // Feb 2008

    Modeling SPIRIT IP-XACT with UML MARTE

    Reuse and integration of heterogeneous Intellectual Property (IP) from multiple vendors is a major issue of System-on-Chip (SoC) design. Existing tools attempt to validate assembled designs by global co-simulation at the implementation level. This fails more and more due to the increasing complexity and size of actual SoCs. Thus, there...

    Provided By INRIA

  • White Papers // Jan 2010

    Design-Space Exploration of Stream Programs through Semantic-Preserving Transformations

    Stream languages explicitly describe fork-join parallelism and pipelines, offering a powerful programming model for many-core Multi-Processor Systems on Chip (MPSoC). In an embedded resource-constrained system, adapting stream programs to fit memory requirements is particularly important. In this paper the authors present a design-space exploration technique to reduce the minimal memory...

    Provided By INRIA

  • White Papers // Oct 2012

    Advances in Parallel-Stage Decoupled Software Pipelining: Leveraging Loop Distribution, Stream-Computing and the SSA Form

    Decoupled SoftWare Pipelining (DSWP) is a program partitioning method enabling compilers to extract pipeline parallelism from sequential programs. Parallel Stage DSWP (PS-DSWP) is an extension that also exploits the data parallelism within pipeline filters. This paper presents the preliminary design of a new PS-DSWP method capable of handling arbitrary structured...

    Provided By INRIA

  • White Papers // Jun 2010

    Predictive Power Management for Multi-Core Processors

    Predictive power management provides reduced power consumption and increased performance compared to reactive schemes. It effectively reduces the lag between workload phase changes and changes in power adaptations since adaptations can be applied immediately before a program phase change. To this end the authors present the first analysis of prediction...

    Provided By INRIA

  • White Papers // Aug 2013

    Folklore Confirmed: Compiling for Speed = Compiling for Energy

    The main motivations behind the arrival of multi-core processors were power and energy considerations. Increasing power density coupled with heat problems rendered untenable the premise that steadily increased performance could be achieved merely by steadily increasing processor clock speed. Multi-core processors were introduced based on the observation that multiple processors...

    Provided By INRIA

  • White Papers // Aug 2009

    Efficient and Flexible Dynamic Reconfiguration for Multi Context Architectures

    Dynamic reconfiguration is possible on both fine-grain and coarse-grain architectures. One of the methodologies used relies on multi-context architectures. Unfortunately, the multiple contexts bring power and area overhead. This paper introduces the Dynamic Unifier and reConfigurable blocK (DUCK) concept, a new structure to efficiently perform dynamic...

    Provided By INRIA

  • White Papers // May 2006

    Control Unit for Parallel Embedded System

    New integration methodologies such as IP reuse have become more and more popular over the last few years. These methodologies represent an opportunity to reduce the gap between the integration capacities and the ability of the designers to develop complex systems. SoCs (Systems-on-Chip), which are composed of different heterogeneous cores, have benefited...

    Provided By INRIA

  • White Papers // Sep 2007

    Hardware Task Scheduling for Heterogeneous SoC Architectures

    In this paper, the authors present their work on extending the use of artificial neural networks for real-time task scheduling to heterogeneous system-on-chip architectures. The Hopfield model is the neural network model considered in this study. They introduce new construction rules for designing neural networks so that architecture heterogeneity can be considered....

    Provided By INRIA

  • White Papers // Aug 2013

    GeCoS: A Framework for Prototyping Custom Hardware Design Flows

    GeCoS is an open source framework that provides a highly productive environment for hardware design. GeCoS primarily targets custom hardware design using High Level Synthesis, distinguishing itself from classical compiler infrastructures. Compiling for custom hardware makes use of domain specific semantics that are not considered by general purpose compilers. Finding...

    Provided By INRIA

  • White Papers // Jun 2011

    Novel Algorithms for Word-Length Optimization

    Digital Signal Processing (DSP) applications are specified with floating-point data types, but they are usually implemented in embedded systems with fixed-point arithmetic to minimize cost and power consumption. The floating-to-fixed-point conversion requires an optimization algorithm to determine a combination of optimum word-lengths for each operator. This paper proposes new...

    Provided By INRIA

  • White Papers // Dec 2009

    Loosely Time-Triggered Architectures for Cyber-Physical Systems

    Cyber-Physical Systems require distributed architectures to support safety-critical real-time control. Kopetz' Time-Triggered Architectures (TTA) have been proposed as both an architecture and a comprehensive paradigm for systems architecture for such systems. To relax the strict requirements on synchronization imposed by TTA, Loosely Time-Triggered Architectures (LTTA) have recently been proposed....

    Provided By INRIA