The Ohio Society of CPAs

  • White Papers // Jan 2014

    Optimizing OpenSolaris NFS over RDMA

    Network File System (NFS) is widely deployed as one of the reliable means for file sharing. A trend in NFS development is the use of Remote Direct Memory Access (RDMA) as the data transport protocol. With its capability of offloaded data movement and direct data placement, RDMA is able to...

    Provided By The Ohio Society of CPAs

  • White Papers // Jan 2014

    Shared Receive Queue based Scalable MPI Design for InfiniBand Clusters

    Clusters of several thousand nodes interconnected with InfiniBand, an emerging high-performance interconnect, have already appeared in the Top 500 list. The next-generation InfiniBand clusters are expected to be even larger with tens-of-thousands of nodes. A high performance scalable MPI design is crucial for MPI applications in order to exploit the...

    Provided By The Ohio Society of CPAs

  • White Papers // Jan 2014

    Efficient SMP-Aware MPI-Level Broadcast over InfiniBand's Hardware Multicast

    Most of the high-end computing clusters found today feature multi-way SMP nodes interconnected by an ultra-low latency and high bandwidth network. InfiniBand is emerging as a high-speed network for such systems. InfiniBand provides a scalable and efficient hardware multicast primitive to efficiently implement many MPI collective operations. However, employing hardware...

    Provided By The Ohio Society of CPAs

  • White Papers // Jan 2014

    MPI Collectives on Modern Multicore Clusters: Performance Optimizations and Communication Characteristics

    Advances in multicore technology and modern interconnects are rapidly accelerating the number of cores deployed in today's commodity clusters. A majority of parallel applications written in MPI employ collective operations in their communication kernels. Optimization of these operations on multicore platforms is the key to obtaining good performance...

    Provided By The Ohio Society of CPAs

  • White Papers // Jan 2014

    Efficient Asynchronous Memory Copy Operations on Multi-Core Systems and I/OAT

    In recent years, there has been a rapid growth of compute-intensive as well as memory-intensive applications in the domains of medical informatics, genomics, satellite weather processing, etc. These applications not only demand large compute cycles but also higher memory performance. Emerging trends in processor technology have led to multi-core...

    Provided By The Ohio Society of CPAs

  • White Papers // Jan 2014

    Understanding the Impact of Multi-Core Architecture in Cluster Computing: A Case Study with Intel Dual-Core System

    Multi-core processors are a growing industry trend as single-core processors rapidly reach the physical limits of possible complexity and speed. In the new Top500 supercomputer list, more than 20% of processors belong to the multi-core processor family. However, without an in-depth study of application behaviors and trends on multi-core clusters, the...

    Provided By The Ohio Society of CPAs

  • White Papers // Jan 2014

    Benefits of I/O Acceleration Technology (I/OAT) in Clusters

    Packet processing in the TCP/IP stack at multi-Gigabit data rates accounts for a significant portion of the system overhead. Though there are several techniques to reduce the packet processing overhead on the sender side, the receiver side remains a bottleneck. I/O Acceleration Technology (I/OAT), developed by Intel, is a set...

    Provided By The Ohio Society of CPAs

  • White Papers // Jan 2014

    Improving Scalability of OpenMP Applications on MultiCore Systems Using Large Page Support

    Modern multicore architectures have become popular because of the limitations of deep pipelines and heating and power concerns. Some of these multicore architectures such as the Intel Xeon have the ability to run several threads on a single core. The OpenMP standard for compiler directive based shared memory programming allows...

    Provided By The Ohio Society of CPAs

  • White Papers // Jan 2014

    NemC: A Network Emulator for Cluster-of-Clusters

    A large number of clusters are being used across different organizations such as universities, laboratories, etc. These clusters are, however, usually independent from each other even in the same organization or building. To provide a single image of such clusters to users and utilize them in an integrated manner,...

    Provided By The Ohio Society of CPAs

  • White Papers // Aug 2013

    PLASMA-HD: Probing the LAttice Structure and MAkeup of High-Dimensional Data

    Rapidly making sense of, analyzing, and extracting useful information from large and complex data is a grand challenge. A user tasked with meeting this challenge is often befuddled with questions on where and how to begin to understand the relevant characteristics of such data. Real-world problem scenarios often involve scalability...

    Provided By The Ohio Society of CPAs

  • White Papers // Aug 2013

    The Yin and Yang of Processing Data Warehousing Queries on GPU Devices

    The database community has made significant research efforts to optimize query processing on GPUs in the past few years. However, the authors find little evidence that GPUs have been truly adopted in major warehousing production systems. In preparing to integrate GPUs into warehousing systems, they have identified and addressed several critical...

    Provided By The Ohio Society of CPAs

  • White Papers // Jul 2013

    S-CAVE: Effective SSD Caching to Improve Virtual Machine Storage Performance

    A unique challenge for SSD storage caching management in a Virtual Machine (VM) environment is to accomplish the dual objectives: maximizing utilization of shared SSD cache devices and ensuring performance isolation among VMs. In this paper, the authors present their design and implementation of S-CAVE, a hypervisor-based SSD caching facility,...

    Provided By The Ohio Society of CPAs

  • White Papers // Mar 2013

    SR-IOV Support for Virtualization on InfiniBand Clusters: Early Experience

    High Performance Computing (HPC) systems are becoming increasingly complex and are also associated with very high operational costs. The cloud computing paradigm, coupled with modern Virtual Machine (VM) technology offers attractive techniques to easily manage large scale systems, while significantly bringing down the cost of computation, memory and storage. However,...

    Provided By The Ohio Society of CPAs

  • White Papers // Jan 2013

    Proactive Source Coding

    A coding problem, over a slotted system, is introduced where the sender has to transmit one out of several packets to the receiver, but learns the request only at the beginning of each slot with prior statistical information about which packet is needed at the receiver. There is an associated...

    Provided By The Ohio Society of CPAs

  • White Papers // Dec 2012

    VMLab: Infrastructure to Support Desktop Virtualization Experiments for Research and Education

    In terms of convenience and cost-savings, user communities have benefited from transitioning to Virtual Desktop Clouds (VDCs) that are accessible via thin-clients, moving away from dedicated hardware and software in "Traditional desktops". Allocating and managing VDC resources in a scalable and cost-effective manner poses unique challenges to cloud service providers....

    Provided By The Ohio Society of CPAs

  • White Papers // Sep 2012

    On the Efficiency-Vs-Security Tradeoff in the Smart Grid

    The smart grid is envisioned to significantly enhance the efficiency of energy consumption, by utilizing two-way communication channels between consumers and operators. For example, operators can opportunistically leverage the delay tolerance of energy demands in order to balance the energy load over time, and hence, reduce the total operational cost....

    Provided By The Ohio Society of CPAs

  • White Papers // Aug 2012

    hStorage-DB: Heterogeneity-aware Data Management to Exploit the Full Capability of Hybrid Storage Systems

    As storage systems become increasingly heterogeneous and complex, they add burdens on DBAs, causing suboptimal performance even after considerable human effort. In addition, existing monitoring-based storage management through access pattern detection has difficulty handling workloads that are highly dynamic and concurrent. To achieve high...

    Provided By The Ohio Society of CPAs

  • White Papers // May 2012

    Multi-Rate Multi-Casting with Intra-Layer Network Coding

    Multi-rate multi-casting is a generalization of single-rate multi-casting to prevent destinations with good connections from being limited by the capacity of bottleneck connections. While multi-rate multi-casting has been traditionally performed over fixed trees, advances in network coding theory have enabled higher throughput and have helped the user move beyond the...

    Provided By The Ohio Society of CPAs

  • White Papers // Apr 2012

    Intra-MIC MPI Communication using MVAPICH2: Early Experience

    Knights Ferry (KNF) is the first instantiation of the Many Integrated Core (MIC) architecture from Intel. It is a development platform that is enabling scientific application and library developers to prepare for the upcoming products based on the MIC architecture. Intel MIC architecture, while providing the compute potential of a...

    Provided By The Ohio Society of CPAs

  • White Papers // Jan 2012

    CARPO: Correlation-Aware Power Optimization in Data Center Networks

    Power optimization has become a key challenge in the design of large-scale enterprise data centers. Existing research efforts focus mainly on computer servers to lower their energy consumption, while only a few studies have tried to address the energy consumption of Data Center Networks (DCNs), which can account for 20% of...

    Provided By The Ohio Society of CPAs

  • White Papers // Jan 2012

    StVEC: A Vector Instruction Extension for High Performance Stencil Computation

    Stencil computations comprise the compute-intensive core of many scientific applications. The data access pattern of stencil computations often requires several adjacent data elements of arrays to be accessed in innermost parallel loops. Although such loops are vectorized by current compilers like GCC and ICC that target short-vector SIMD instruction sets,...

    Provided By The Ohio Society of CPAs

  • White Papers // Jan 2012

    Booster: Reactive Core Acceleration for Mitigating the Effects of Process Variation and Application Imbalance in Low-Voltage Chips

    Lowering supply voltage is one of the most effective techniques for reducing microprocessor power consumption. Unfortunately, at low voltages, chips are very sensitive to process variation, which can lead to large differences in the maximum frequency achieved by individual cores. This paper presents Booster, a simple, low-overhead framework for dynamically...

    Provided By The Ohio Society of CPAs

  • White Papers // Oct 2011

    Can Checkpoint/Restart Mechanisms Benefit from Hierarchical Data Staging?

    Given the ever-increasing size of supercomputers, fault resilience and the ability to tolerate faults have become more of a necessity than an option. Checkpoint-Restart protocols have been widely adopted as a practical solution to provide reliability. However, traditional checkpointing mechanisms suffer from a heavy I/O bottleneck while dumping process snapshots to...

    Provided By The Ohio Society of CPAs

  • White Papers // Sep 2011

    Optimizing MPI One Sided Communication on Multi-core InfiniBand Clusters using Shared Memory Backed Windows

    The Message Passing Interface (MPI) has been very popular for programming parallel scientific applications. As multi-core architectures have become prevalent, a major question has emerged about the use of MPI within a compute node and its impact on communication costs. The one-sided communication interface in MPI provides a...

    Provided By The Ohio Society of CPAs

  • White Papers // Sep 2011

    Can a Decentralized Metadata Service Layer benefit Parallel Filesystems?

    The demand for scalable I/O continues to grow rapidly as computer clusters keep growing. Much of the research in storage systems has been focused on improving the scale and performance of I/O throughput. Scalable file systems do a good job of scaling large file access bandwidth by striping or sharing...

    Provided By The Ohio Society of CPAs

  • White Papers // Sep 2011

    CRFS: A Lightweight User-Level Filesystem for Generic Checkpoint/Restart

    Checkpoint/Restart (C/R) mechanisms have been widely adopted by many MPI libraries to achieve fault-tolerance. However, a major limitation of such mechanisms is the intensive IO bottleneck caused by the need to dump the snapshots of all processes into persistent storage. Several studies have been conducted to minimize this overhead, but...

    Provided By The Ohio Society of CPAs

  • White Papers // Jul 2011

    Design and Evaluation of Network Topology-/Speed-Aware Broadcast Algorithms for InfiniBand Clusters

    It is an established fact that the network topology can have an impact on the performance of scientific parallel applications. However, little work has been done to design an easy to use solution inside a communication library supporting a parallel programming model where the complexity of making the application performance...

    Provided By The Ohio Society of CPAs

  • White Papers // Jul 2011

    Assumption Hierarchy for a CHA Call Graph Construction Algorithm

    Method call graphs are integral components of many interprocedural static analyses which are widely used to aid in the development and maintenance of software. Unfortunately, the existence of certain dynamic features in modern programming languages, such as Java or C++, can lead to either unsoundness or imprecision in statically constructed...

    Provided By The Ohio Society of CPAs

  • White Papers // May 2011

    On the Degrees of Freedom of the Cognitive Broadcast Channel

    A cognitive broadcast channel, where two multi-antenna transmitters communicate with their respective receivers, is considered. One of the transmitters is said to be cognitive (secondary) as it is assumed to know the messages of the other (primary) transmitter non-causally. The goal is to design cooperative schemes between the two transmitters, which...

    Provided By The Ohio Society of CPAs

  • White Papers // May 2011

    High Performance Pipelined Process Migration with RDMA

    In this paper, the authors conduct extensive profiling on several process migration mechanisms, and reveal that inefficient I/O and network transfer are the principal factors responsible for the high overhead. They then propose a new approach, Pipelined Process Migration with RDMA (PPMR), to overcome these overheads. Their new protocol fully...

    Provided By The Ohio Society of CPAs

  • White Papers // Jan 2011

    Joint Interference Cancellation and Dirty Paper Coding for Cognitive Cellular Networks

    Downlink communication in a cellular network with a cognitive (secondary) cell is considered. In the authors' paper, the base station of the cognitive cell knows the messages of the other cell non-causally. They propose a new interference cancellation technique that zero forces the intra-cell interference in the primary cell by...

    Provided By The Ohio Society of CPAs

  • White Papers // Dec 2010

    CAFTL: A Content-Aware Flash Translation Layer Enhancing the Lifespan of Flash Memory based Solid State Drives

    Although Flash Memory based Solid State Drive (SSD) exhibits high performance and low power consumption, a critical concern is its limited lifespan along with the associated reliability issues. In this paper, the authors propose to build a Content-Aware Flash Translation Layer (CAFTL) to enhance the endurance of SSDs at the...

    Provided By The Ohio Society of CPAs

  • White Papers // Dec 2010

    Can High-Performance Interconnects Benefit Hadoop Distributed File System?

    During the past several years, the MapReduce computing model has emerged as a scalable model that is capable of processing petabytes of data. The Hadoop MapReduce framework has enabled large scale Internet applications and has been adopted by many organizations. The Hadoop Distributed File System (HDFS) lies at the heart...

    Provided By The Ohio Society of CPAs

  • White Papers // Dec 2010

    Essential Roles of Exploiting Internal Parallelism of Flash Memory based Solid State Drives in High-Speed Data Processing

    Flash memory based Solid State Drives (SSDs) have shown a great potential to change storage infrastructure fundamentally through their high performance and low power. Most recent studies have mainly focused on addressing the technical limitations caused by special requirements for writes in flash memory. However, a unique merit of an...

    Provided By The Ohio Society of CPAs

  • White Papers // Aug 2010

    Understanding Parallelism-Inhibiting Dependences in Sequential Java Programs

    Many existing sequential components, libraries, and applications will need to be re-engineered for parallelism. This paper proposes a dynamic analysis of sequential Java programs that helps a programmer to understand bottlenecks for parallelism. The analysis measures the parallelism available in the program by considering a hypothetical parallel execution in which...

    Provided By The Ohio Society of CPAs

  • White Papers // Jul 2010

    Enhancing Checkpoint Performance with Staging IO and SSD

    With the ever-growing size of computer clusters and applications, system failures are becoming inevitable. Checkpointing, a strategy to ensure fault tolerance, has become imperative in such an environment. However, the existing mechanism of checkpoint writing to parallel file systems doesn't perform well with increasing job size. Solid State Disk (SSD) is...

    Provided By The Ohio Society of CPAs

  • White Papers // Jul 2010

    Designing Power-Aware Collective Communication Algorithms for InfiniBand Clusters

    Modern supercomputing systems have witnessed phenomenal growth in recent history owing to the advent of multi-core architectures and high speed networks. However, the operational and maintenance costs of these systems have also grown rapidly. Several concepts such as Dynamic Voltage and Frequency Scaling (DVFS) and CPU Throttling have...

    Provided By The Ohio Society of CPAs

  • White Papers // May 2010

    Secrecy Games Over the Cognitive Channel

    A secure communication game is considered for the cognitive channel with a confidential primary message, where the primary user is interested in maximizing its secure rate with lowest possible power consumption and the utility of the cognitive user is a weighted sum of the primary secrecy rate and the cognitive...

    Provided By The Ohio Society of CPAs

  • White Papers // Mar 2010

    Polar Coding for Secure Transmission and Key Agreement

    Wyner's work on wiretap channels and the recent works on information theoretic security are based on random codes. Achieving information theoretical security with practical coding schemes is of definite interest. In this note, the attempt is to overcome this elusive task by employing the polar coding technique of Arikan. It...

    Provided By The Ohio Society of CPAs

  • White Papers // Sep 2009

    Accelerating Checkpoint Operation by Node-Level Write Aggregation on Multicore Systems

    Clusters and applications continue to grow in size while their Mean Time Between Failure (MTBF) is getting smaller. Checkpoint/Restart is becoming increasingly important for large scale parallel jobs. However, the performance of the Checkpoint/Restart mechanism does not scale well with increasing job size due to constraints within the file system....

    Provided By The Ohio Society of CPAs

  • White Papers // Jun 2007

    High Performance Block I/O for Global File System (GFS) with InfiniBand RDMA

    State-of-the-art network technology has scaled to 10Gbps. However, TCP's high processing overhead and redundant data copies remain a major bottleneck for applications to fully benefit from such high speed technology. Remote Direct Memory Access (RDMA), as an emerging communication protocol, provides an opportunity for efficient storage system design by virtue...

    Provided By The Ohio Society of CPAs

  • White Papers // Jan 2006

    Benefits of High Speed Interconnects to Cluster File Systems: A Case Study with Lustre

    While CPU clock cycle and memory bus speed are reaching the level of sub-nanoseconds and 10Gbytes/sec, disk access time and data transfer rate are still lingering around several milliseconds and 300Mbytes/sec, respectively. Since systems with ever-increasing speed are being deployed at the scale of thousands of nodes, IO speed needs...

    Provided By The Ohio Society of CPAs

  • White Papers // Feb 2006

    Adaptive Connection Management for Scalable MPI over InfiniBand

    Supporting scalable and efficient parallel programs is a major challenge in parallel computing with the widespread adoption of large-scale computer clusters and supercomputers. One of the pronounced scalability challenges is the management of connections between parallel processes, especially over connection-oriented interconnects such as VIA and InfiniBand. In this paper, the...

    Provided By The Ohio Society of CPAs

  • White Papers // Jan 2006

    RDMA Read Based Rendezvous Protocol for MPI over InfiniBand: Design Alternatives and Benefits

    Message Passing Interface (MPI) is a popular parallel programming model for scientific applications. Most high-performance MPI implementations use Rendezvous Protocol for efficient transfer of large messages. This protocol can be designed using either RDMA Write or RDMA Read. Usually, this protocol is implemented using RDMA Write. The RDMA write based...

    Provided By The Ohio Society of CPAs

  • White Papers // Sep 2011

    Optimizing MPI One Sided Communication on Multi-core InfiniBand Clusters using Shared Memory Backed Windows

    The Message Passing Interface (MPI) has been very popular for programming parallel scientific applications. As multi-core architectures have become prevalent, a major question has emerged about the use of MPI within a compute node and its impact on communication costs. The one-sided communication interface in MPI provides a...

    Provided By The Ohio Society of CPAs

  • White Papers // Jul 2011

    Design and Evaluation of Network Topology-/Speed-Aware Broadcast Algorithms for InfiniBand Clusters

    It is an established fact that the network topology can have an impact on the performance of scientific parallel applications. However, little work has been done to design an easy to use solution inside a communication library supporting a parallel programming model where the complexity of making the application performance...

    Provided By The Ohio Society of CPAs

  • White Papers // Sep 2011

    Can a Decentralized Metadata Service Layer benefit Parallel Filesystems?

    The demand for scalable I/O continues to grow rapidly as computer clusters keep growing. Much of the research in storage systems has been focused on improving the scale and performance of I/O throughput. Scalable file systems do a good job of scaling large file access bandwidth by striping or sharing...

    Provided By The Ohio Society of CPAs

  • White Papers // Apr 2012

    Intra-MIC MPI Communication using MVAPICH2: Early Experience

    Knights Ferry (KNF) is the first instantiation of the Many Integrated Core (MIC) architecture from Intel. It is a development platform that is enabling scientific application and library developers to prepare for the upcoming products based on the MIC architecture. Intel MIC architecture, while providing the compute potential of a...

    Provided By The Ohio Society of CPAs

  • White Papers // Jul 2010

    Designing Power-Aware Collective Communication Algorithms for InfiniBand Clusters

    Modern supercomputing systems have witnessed phenomenal growth in recent history owing to the advent of multi-core architectures and high-speed networks. However, the operational and maintenance costs of these systems have also grown rapidly. Several concepts such as Dynamic Voltage and Frequency Scaling (DVFS) and CPU Throttling have...

    Provided By The Ohio Society of CPAs

  • White Papers // Jul 2010

    Enhancing Checkpoint Performance with Staging IO and SSD

    With the ever-growing size of computer clusters and applications, system failures are becoming inevitable. Checkpointing, a strategy to ensure fault tolerance, has become imperative in such an environment. However, the existing mechanism of writing checkpoints to parallel file systems doesn't perform well with increasing job size. Solid State Disk (SSD) is...

    Provided By The Ohio Society of CPAs

  • White Papers // Sep 2009

    Accelerating Checkpoint Operation by Node-Level Write Aggregation on Multicore Systems

    Clusters and applications continue to grow in size while their Mean Time Between Failure (MTBF) is getting smaller. Checkpoint/Restart is becoming increasingly important for large scale parallel jobs. However, the performance of the Checkpoint/Restart mechanism does not scale well with increasing job size due to constraints within the file system....

    Provided By The Ohio Society of CPAs

  • White Papers // Jun 2008

    Performance of HPC Middleware over InfiniBand WAN

    High performance interconnects such as InfiniBand (IB) have enabled large scale deployments of High Performance Computing (HPC) systems. High performance communication and IO middleware such as MPI and NFS over RDMA have also been redesigned to leverage the performance of these modern interconnects. With the advent of long haul InfiniBand...

    Provided By The Ohio Society of CPAs

  • White Papers // Jan 2014

    MPI Collectives on Modern Multicore Clusters: Performance Optimizations and Communication Characteristics

    Advances in multicore technology and modern interconnects are rapidly increasing the number of cores deployed in today's commodity clusters. A majority of parallel applications written in MPI employ collective operations in their communication kernels. Optimization of these operations on the multicore platforms is the key to obtaining good performance...

    Provided By The Ohio Society of CPAs

  • White Papers // Jan 2008

    Scaling Alltoall Collective on Multi-core Systems

    In this paper, the authors show that various network interfaces implemented for the same interconnect exhibit different network characteristics. A single collective algorithm does not perform optimally for all network interfaces due to differing network characteristics. The paper proposes an optimized all-to-all collective algorithm for multi-core systems connected using modern...

    Provided By The Ohio Society of CPAs

  • White Papers // Jan 2014

    Efficient Asynchronous Memory Copy Operations on Multi-Core Systems and I/OAT

    In recent years, there has been a rapid growth of compute-intensive as well as memory-intensive applications in the domains of medical informatics, genomics, satellite weather processing, etc. These applications not only demand large compute cycles but also higher memory performance. Emerging trends in processor technology have led to multi-core...

    Provided By The Ohio Society of CPAs

  • White Papers // Feb 2007

    Reducing Connection Memory Requirements of MPI for InfiniBand Clusters: A Message Coalescing Approach

    Clusters in the area of high-performance computing have been growing in size at a considerable rate. In these clusters, the dominant programming model is the Message Passing Interface (MPI), so the MPI library has a key role in resource usage and performance. To obtain maximal performance, many clusters deploy a...

    Provided By The Ohio Society of CPAs

  • White Papers // Jan 2014

    Understanding the Impact of Multi-Core Architecture in Cluster Computing: A Case Study with Intel Dual-Core System

    Multi-core processors are a growing industry trend as single-core processors rapidly reach the physical limits of possible complexity and speed. In the new Top500 supercomputer list, more than 20% of processors belong to the multi-core processor family. However, without an in-depth study on application behaviors and trends on multi-core clusters, the...

    Provided By The Ohio Society of CPAs

  • White Papers // Feb 2007

    Hot-Spot Avoidance With Multi-Pathing Over InfiniBand: An MPI Perspective

    Large-scale InfiniBand clusters are becoming increasingly popular, as reflected by the TOP 500 supercomputer rankings. At the same time, fat tree has become a popular interconnection topology for these clusters, since it allows multiple paths to be available between a pair of nodes. However, even with fat tree,...

    Provided By The Ohio Society of CPAs

  • White Papers // Jan 2014

    Benefits of I/O Acceleration Technology (I/OAT) in Clusters

    Packet processing in the TCP/IP stack at multi-Gigabit data rates accounts for a significant portion of the system overhead. Though there are several techniques to reduce the packet processing overhead on the sender side, the receiver side remains a bottleneck. I/O Acceleration Technology (I/OAT), developed by Intel, is a set...

    Provided By The Ohio Society of CPAs

  • White Papers // Jan 2014

    Improving Scalability of OpenMP Applications on MultiCore Systems Using Large Page Support

    Modern multicore architectures have become popular because of the limitations of deep pipelines and heating and power concerns. Some of these multicore architectures, such as the Intel Xeon, have the ability to run several threads on a single core. The OpenMP standard for compiler directive based shared memory programming allows...

    Provided By The Ohio Society of CPAs