Argonne National Laboratory

Displaying 1-23 of 23 results

  • White Papers // Nov 2012

    An Evolutionary Path to Object Storage Access

    High-Performance Computing (HPC) storage systems typically consist of an object storage system that is accessed via the POSIX file interface. However, rapid increases in system scales and storage system complexity have uncovered a number of limitations in this model. In particular, applications and libraries are limited in their ability to...

    Provided By Argonne National Laboratory
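
    The entry above describes object storage that applications reach through the POSIX file interface. As a rough illustration of the gap between the two access styles, the sketch below contrasts a POSIX positioned read with a hypothetical object-style get; object_get() and its semantics are invented for illustration and are not an interface from the paper.

        #include <fcntl.h>
        #include <stdint.h>
        #include <unistd.h>

        /* POSIX-style access: byte-stream reads at arbitrary offsets through
         * the file interface, with POSIX consistency semantics. */
        static ssize_t read_posix(const char *path, void *buf, size_t len, off_t off)
        {
            int fd = open(path, O_RDONLY);
            if (fd < 0)
                return -1;
            ssize_t n = pread(fd, buf, len, off);   /* positioned read */
            close(fd);
            return n;
        }

        /* Hypothetical object-style access (illustrative only): data addressed
         * as whole objects by ID, outside the POSIX namespace and semantics. */
        ssize_t object_get(uint64_t object_id, void *buf, size_t len);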

  • White Papers // Nov 2012

    A Case for Optimistic Coordination in HPC Storage Systems

    High-Performance Computing (HPC) storage systems rely on access coordination to ensure that concurrent updates do not produce incoherent results. HPC storage systems typically employ pessimistic distributed locking to provide this functionality in cases where applications cannot perform their own coordination. This approach, however, introduces significant performance overhead and complicates fault...

    Provided By Argonne National Laboratory
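
    To make the pessimistic-versus-optimistic distinction above concrete, here is a minimal single-node analogy in C; it is a generic illustration of the two coordination styles, not the paper's distributed protocol. The pessimistic update always takes a lock, while the optimistic update computes without coordination and retries only when a conflict is detected at commit time.

        #include <pthread.h>
        #include <stdatomic.h>
        #include <stdint.h>
        #include <stdio.h>

        /* Toy shared state: a single 64-bit record word. */
        static _Atomic uint64_t shared = 0;
        static pthread_mutex_t  lock   = PTHREAD_MUTEX_INITIALIZER;

        /* Pessimistic style: take a lock for every update, even when there is
         * no contention at all. */
        static void update_pessimistic(uint64_t delta)
        {
            pthread_mutex_lock(&lock);
            atomic_store(&shared, atomic_load(&shared) + delta);
            pthread_mutex_unlock(&lock);
        }

        /* Optimistic style: compute the new value without coordination, then
         * commit with a compare-and-swap; retry only on a detected conflict. */
        static void update_optimistic(uint64_t delta)
        {
            uint64_t old = atomic_load(&shared);
            while (!atomic_compare_exchange_weak(&shared, &old, old + delta)) {
                /* 'old' was refreshed by the failed CAS; just retry the commit. */
            }
        }

        int main(void)
        {
            update_pessimistic(1);
            update_optimistic(41);
            printf("shared = %llu\n", (unsigned long long)atomic_load(&shared));
            return 0;
        }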

  • White Papers // May 2012

    Transparent Accelerator Migration in a Virtualized GPU Environment

    This paper presents a framework to support transparent, live migration of virtual GPU accelerators in a virtualized execution environment. Migration is a critical capability in such environments because it provides support for fault tolerance, on-demand system maintenance, resource management, and load balancing in the mapping of virtual to physical GPUs....

    Provided By Argonne National Laboratory

  • White Papers // May 2012

    VOCL: An Optimized Environment for Transparent Virtualization of Graphics Processing Units

    Graphics Processing Units (GPUs) have been widely used for general-purpose computation acceleration. However, current programming models such as CUDA and OpenCL can support GPUs only on the local computing node, where the application execution is tightly coupled to the physical GPU hardware. In this paper, the authors propose a Virtual...

    Provided By Argonne National Laboratory
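
    The snippet above notes that current programming models bind applications to GPUs on the local node. The minimal OpenCL host sketch below shows that binding explicitly: the program enumerates only locally visible platforms and devices and builds its context on a physical, local GPU, which is the coupling a virtualization layer such as VOCL interposes on. Error handling is omitted, and this is generic OpenCL usage rather than code from the paper.

        #include <CL/cl.h>
        #include <stdio.h>

        int main(void)
        {
            cl_platform_id platform;
            cl_uint nplat = 0;
            clGetPlatformIDs(1, &platform, &nplat);   /* platforms on this node only */

            cl_device_id gpu;
            cl_uint ndev = 0;
            clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &gpu, &ndev);

            /* The context (and everything created from it) is tied to the local,
             * physical GPU; a GPU on another node is invisible to this API. */
            cl_int err;
            cl_context ctx = clCreateContext(NULL, 1, &gpu, NULL, NULL, &err);

            char name[128] = "";
            clGetDeviceInfo(gpu, CL_DEVICE_NAME, sizeof name, name, NULL);
            printf("Using local device: %s\n", name);

            clReleaseContext(ctx);
            return 0;
        }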

  • White Papers // May 2012

    DMA-Assisted, Intranode Communication in GPU Accelerated Systems

    Accelerator awareness has become a pressing issue in data movement models, such as MPI, because of the rapid deployment of systems that utilize accelerators. In the authors' previous work, they developed techniques to enhance MPI with accelerator awareness, thus allowing applications to easily and efficiently communicate data between accelerator memories....

    Provided By Argonne National Laboratory

  • White Papers // May 2012

    MPI-ACC: An Integrated and Extensible Approach to Data Movement in Accelerator-Based Systems

    Data movement in high-performance computing systems accelerated by Graphics Processing Units (GPUs) remains a challenging problem. Data communication in popular parallel programming models, such as the Message Passing Interface (MPI), is currently limited to the data stored in the CPU memory space. Auxiliary memory systems, such as GPU memory, are...

    Provided By Argonne National Laboratory
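
    The snippet above points out that MPI communication is limited to CPU-resident buffers, so applications must stage GPU data through host memory themselves. The C sketch below shows that conventional staging pattern using the CUDA runtime; it is the baseline that an accelerator-aware design such as MPI-ACC aims to integrate and optimize, not code from the paper.

        #include <mpi.h>
        #include <cuda_runtime.h>
        #include <stdlib.h>

        /* Conventional staging: copy device data to a host buffer, then hand
         * the host buffer to MPI.  The copy and the send are fully serialized. */
        static void send_gpu_buffer(const float *d_buf, int count, int dest,
                                    MPI_Comm comm)
        {
            float *h_buf = malloc(count * sizeof *h_buf);
            cudaMemcpy(h_buf, d_buf, count * sizeof *h_buf, cudaMemcpyDeviceToHost);
            MPI_Send(h_buf, count, MPI_FLOAT, dest, /*tag=*/0, comm);
            free(h_buf);
        }

        static void recv_gpu_buffer(float *d_buf, int count, int src, MPI_Comm comm)
        {
            float *h_buf = malloc(count * sizeof *h_buf);
            MPI_Recv(h_buf, count, MPI_FLOAT, src, 0, comm, MPI_STATUS_IGNORE);
            cudaMemcpy(d_buf, h_buf, count * sizeof *h_buf, cudaMemcpyHostToDevice);
            free(h_buf);
        }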

  • White Papers // Feb 2012

    Supporting the Global Arrays PGAS Model Using MPI One-Sided Communication

    The industry-standard Message Passing Interface (MPI) provides one-sided communication functionality and is available on virtually every parallel computing system. However, it is believed that MPI's one-sided model is not rich enough to support higher-level global address space parallel programming models. The authors present the first successful application of MPI one-sided...

    Provided By Argonne National Laboratory
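
    Because the entry above hinges on MPI's one-sided (RMA) interface, a minimal standard-MPI example may help fix ideas: each rank exposes a window of memory, and rank 0 updates rank 1's window with MPI_Put between fences, with no matching receive posted on the target. This is plain MPI RMA usage, not the Global Arrays runtime described in the paper.

        #include <mpi.h>
        #include <stdio.h>

        int main(int argc, char **argv)
        {
            MPI_Init(&argc, &argv);
            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            /* Each rank exposes one integer through an RMA window. */
            int cell = -1;
            MPI_Win win;
            MPI_Win_create(&cell, sizeof cell, sizeof cell, MPI_INFO_NULL,
                           MPI_COMM_WORLD, &win);

            MPI_Win_fence(0, win);
            if (rank == 0 && size > 1) {
                /* One-sided update of rank 1's cell: rank 1 posts no receive. */
                int value = 42;
                MPI_Put(&value, 1, MPI_INT, /*target rank*/ 1, /*displacement*/ 0,
                        1, MPI_INT, win);
            }
            MPI_Win_fence(0, win);   /* completes the put and makes it visible */

            if (rank == 1)
                printf("rank 1 cell = %d\n", cell);

            MPI_Win_free(&win);
            MPI_Finalize();
            return 0;
        }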

  • White Papers // Sep 2011

    Building Algorithmically Nonstop Fault Tolerant MPI Programs

    With the growing scale of High-Performance Computing (HPC) systems, today and more so tomorrow, faults are a norm rather than an exception. HPC applications typically tolerate fail-stop failures under the stop-and-wait scheme, where even if only one processor fails, the whole system has to stop and wait for the recovery...

    Provided By Argonne National Laboratory
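
    For background on the entry above: algorithm-based fault tolerance encodes redundancy (for example, a checksum block) into the data itself, so a computation can reconstruct lost data and keep going rather than stop and roll back. The short C sketch below shows the checksum idea for a single array; it is a generic illustration and not necessarily the authors' scheme.

        #include <stdio.h>

        #define NBLOCKS 4

        int main(void)
        {
            /* Data blocks plus one redundant checksum value (their sum). */
            double block[NBLOCKS] = {1.5, 2.0, 4.0, 8.5};
            double checksum = 0.0;
            for (int i = 0; i < NBLOCKS; i++)
                checksum += block[i];

            /* Suppose block 2 is lost (e.g., its owning process failed). */
            int lost = 2;
            double survivors = 0.0;
            for (int i = 0; i < NBLOCKS; i++)
                if (i != lost)
                    survivors += block[i];

            /* Reconstruct the lost block from the checksum; no rollback needed. */
            block[lost] = checksum - survivors;
            printf("recovered block %d = %g\n", lost, block[lost]);
            return 0;
        }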

  • White Papers // May 2011

    Porting the MG-RAST Metagenomic Data Analysis Pipeline to the Cloud

    Computational biology applications typically favor a local, cluster-based, integrated computational platform. The authors present a lessons-learned report for scaling up a metagenomics application that had outgrown the available local cluster hardware. In their example, removing a number of assumptions linked to tight integration allowed them to expand beyond one...

    Provided By Argonne National Laboratory

  • White Papers // Jan 2011

    RDMA Capable iWARP Over Datagrams

    iWARP is a state-of-the-art, high-speed, connection-based RDMA networking technology that provides InfiniBand-like zero-copy and one-sided communication capabilities over Ethernet. Despite the benefits offered by iWARP, many datacenter and web-based applications, such as stock-market trading and media-streaming applications, that rely on datagram-based semantics (mostly through...

    Provided By Argonne National Laboratory

  • White Papers // Jan 2011

    Fault-Tolerant Communication Runtime Support for Data-Centric Programming Models

    The largest supercomputers in the world today consist of hundreds of thousands of processing cores and many more hardware components. At such scales, hardware faults are commonplace, necessitating fault-resilient software systems. While different fault-resilient models are available, most focus on allowing the computational processes to survive faults. On...

    Provided By Argonne National Laboratory

  • White Papers // Jan 2011

    Power and Performance Characterization of Computational Kernels on the GPU

    Graphics Processing Units (GPUs) are gaining increasing popularity in High-Performance Computing (HPC). While modern GPUs can offer much more computational power than CPUs, they also consume much more power. Energy efficiency is one of the most important factors that will affect a broader adoption of GPUs in HPC....

    Provided By Argonne National Laboratory
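
    A hedged aside on the entry above: one common way to sample GPU board power on NVIDIA hardware is the NVML library, as sketched below. The snippet does not say whether the paper used NVML, external meters, or another method, so treat this only as an example of the kind of measurement involved.

        #include <nvml.h>
        #include <stdio.h>

        int main(void)
        {
            /* Sample the instantaneous board power draw of GPU 0 via NVML. */
            if (nvmlInit() != NVML_SUCCESS)
                return 1;

            nvmlDevice_t dev;
            nvmlDeviceGetHandleByIndex(0, &dev);

            unsigned int milliwatts = 0;
            if (nvmlDeviceGetPowerUsage(dev, &milliwatts) == NVML_SUCCESS)
                printf("GPU 0 power: %.1f W\n", milliwatts / 1000.0);

            nvmlShutdown();
            return 0;
        }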

  • White Papers // Jan 2011

    The Globus eXtensible Input/Output System (XIO): A Protocol Independent IO System for the Grid

    In distributed, heterogeneous Grid environments, the protocols used to exchange bits are crucial. As researchers work hard to discover the best new protocol for the Grid, application developers struggle with ways to use these new protocols. A stable, consistent, and intuitive framework is needed to aid in the implementation and...

    Provided By Argonne National Laboratory

  • White Papers // Aug 2010

    iWARP Redefined: Scalable Connectionless Communication Over High-Speed Ethernet

    iWARP represents the leading edge of high performance Ethernet technologies. By utilizing an asynchronous communication model, iWARP brings the advantages of OS bypass and RDMA technology to Ethernet. The current specification of iWARP is only defined over connection-oriented transports such as TCP. The memory requirements of many connections along with...

    Provided By Argonne National Laboratory

  • White Papers // Jul 2010

    LHC Databases on the Grid: Achievements and Open Issues

    To extract physics results from the recorded data, the LHC experiments are using Grid computing infrastructure. The event data processing on the Grid requires scalable access to non-event data (detector conditions, calibrations, etc.) stored in relational databases. The database-resident data are critical for the event data reconstruction processing steps and...

    Provided By Argonne National Laboratory

  • White Papers // Jan 2010

    A Study of Hardware Assisted IP Over InfiniBand and Its Impact on Enterprise Data Center Performance

    High-performance sockets implementations such as the Sockets Direct Protocol (SDP) have traditionally shown major performance advantages compared to the TCP/IP stack over InfiniBand (IPoIB). These stacks bypass the kernel-based TCP/IP stack and take advantage of network hardware features, providing enhanced performance. SDP has excellent performance but limited utility, as only applications...

    Provided By Argonne National Laboratory

  • White Papers // Oct 2009

    Understanding Network Saturation Behavior on Large-Scale Blue Gene/P Systems

    As researchers continue to architect massive-scale systems, it is becoming clear that these systems will utilize a significant amount of shared hardware between processing units. Systems such as the IBM Blue Gene (BG) and Cray XT have started utilizing flat (i.e., scalable) networks, which differ from switched fabrics in that...

    Provided By Argonne National Laboratory

  • White Papers // Sep 2009

    Evaluation of ConnectX Virtual Protocol Interconnect for Data Centers

    With the emergence of new technologies such as Virtual Protocol Interconnect (VPI) for the modern data center, the separation between commodity networking technology and high-performance interconnects is shrinking. With VPI, a single network adapter on a data center server can easily be configured to use one port to interface with...

    Provided By Argonne National Laboratory

  • White Papers // Aug 2009

    Topology-Aware I/O Caching for Shared Storage Systems

    The main contribution of this paper is a topology-aware storage caching scheme for parallel architectures. In a parallel system with multiple storage caches, these caches form a shared cache space, and effective management of this space is a critical issue. Of particular interest is data migration (i.e., moving data from...

    Provided By Argonne National Laboratory

  • White Papers // Aug 2009

    24/7 Characterization of Petascale I/O Workloads

    Developing and tuning computational science applications to run on extreme scale systems are increasingly complicated processes. Challenges such as managing memory access and tuning message-passing behavior are made easier by tools designed specifically to aid in these processes. Tools that can help users better understand the behavior of their application...

    Provided By Argonne National Laboratory

  • White Papers // Aug 2009

    Improving Resource Availability by Relaxing Network Allocation Constraints on Blue Gene/P

    High-End Computing (HEC) systems have passed the petaflop barrier and continue to move toward the next frontier of exascale computing. As companies and research institutes continue to work toward architecting these enormous systems, it is becoming increasingly clear that these systems will utilize a significant amount of shared hardware between...

    Provided By Argonne National Laboratory

  • White Papers // May 2009

    ProOnE: A General-Purpose Protocol Onload Engine for Multi- and Many-Core Architectures

    High-end computing systems have benefited from the use of specialized accelerators to improve their performance and scalability for many years. Network sub-systems were among the early adopters of such techniques, providing hardware-based solutions for intelligent communication offloading. GigaNet was one of the earliest network offload solutions for the Virtual Interface...

    Provided By Argonne National Laboratory

  • White Papers // May 2000

    Costs of Lithium-Ion Batteries for Vehicles

    One of the most promising battery types under development for use in both pure electric and hybrid electric vehicles is the lithium-ion battery. These batteries are well on their way to meeting the challenging technical goals that have been set for vehicle batteries. However, they are still far from achieving...

    Provided By Argonne National Laboratory
