Virginia Systems

  • White Papers // May 2014

    KairosVM: Deterministic Introspection for Real-Time Virtual Machine Hierarchical Scheduling

    Consolidation and isolation are key technologies that drove the undisputed popularity of virtualization in most of the computer industry. This popularity has recently led to a growing interest in real-time virtualization, making this technology enter the real-time system industry. However, it has several issues due to the strict timing guarantees...

    Provided By Virginia Systems

  • White Papers // May 2014

    On the Latency of Erasure-Coded Cloud Storage Systems

    Distributed (Cloud) Storage Systems (DSS) exhibit heterogeneity in several dimensions such as the volume (size) of data, frequency of data access and the desired degree of reliability. Ultimately, the complex interplay between these dimensions impacts the latency performance of cloud storage systems. To this end, the authors propose and analyze...

    Provided By Virginia Systems

  • White Papers // Apr 2014

    On Cache-Aware Task Partitioning for Multicore Embedded Real-Time Systems

    One approach for real-time scheduling on multicore platforms involves task partitioning, which statically assigns tasks to cores, enabling subsequent core local scheduling. No past partitioning schemes explicitly consider cache effects. The authors present a partitioning scheme called LWFG, which minimizes cache misses by partitioning tasks that share memory onto the...

    Provided By Virginia Systems

  • White Papers // Mar 2014

    Towards Operating System Support for Heterogeneous-ISA Platforms

    Given an emerging trend towards OS-capable heterogeneous-ISA multi-core processors, the authors address the problem of how to redesign classic Symmetric Multi-Processing (SMP) Operating Systems (OS) to exploit this hardware. They propose an OS design that consists of multiple kernels, each one compiled for, and run on, a specific ISA of...

    Provided By Virginia Systems

  • White Papers // Feb 2014

    Integrating Transactionally Boosted Data Structures With STM Frameworks: A Case Study on Set

    Providing transactional collections of data structures with the same performance of highly concurrent data structures enables performance-competitive transactional composability. Although Software Transactional Memory (STM) is increasingly becoming a promising technology for designing and implementing transactional applications, concurrent data structures still do not exploit STM's advantages. Recently, Optimistic Transactional Boosting (OTB)...

    Provided By Virginia Systems

  • White Papers // Feb 2014

    Remote Invalidation: Optimizing the Critical Path of Memory Transactions

    Software Transactional Memory (STM) systems are increasingly emerging as a promising alternative to traditional locking algorithms for implementing generic concurrent applications. To achieve generality, STM systems incur overheads to the normal sequential execution path, including those due to spin locking, validation (or invalidation), and commit/abort routines. The authors propose a...

    Provided By Virginia Systems

  • White Papers // Feb 2014

    HiperTM: High Performance, Fault-Tolerant Transactional Memory

    The authors present HiperTM, a high performance active replication protocol for fault-tolerant distributed transactional memory. The active replication paradigm allows transactions to execute locally, costing them only a single network communication step during transaction execution. Shared objects are replicated across all sites, avoiding remote object accesses. Replica consistency is ensured...

    Provided By Virginia Systems

  • White Papers // Jan 2014

    Distributed Storage Systems with Secure and Exact Repair - New Results

    Distributed storage is the default technique for storing data in all new generation applications. The data from a file is stored in a decentralized manner on several commodity nodes/disks that when collectively used are capable of recovering the entire file. Replication-based schemes to ensure data reliability incur huge storage overhead...

    Provided By Virginia Systems

  • White Papers // Jan 2014

    Online Performance Projection for Clusters with Heterogeneous GPUs

    The authors present a fully automated approach to project the relative performance of an OpenCL program over different GPUs. Performance projections can be made within a small amount of time, and the projection overhead stays relatively constant with the input data size. As a result, the technique can help runtime...

    Provided By Virginia Systems

  • White Papers // Jan 2014

    Consolidating Applications for Energy Efficiency in Heterogeneous Computing Systems

    By scheduling multiple applications with complementary resource requirements on a smaller number of compute nodes, the authors aim to improve performance, resource utilization, energy consumption, and energy efficiency simultaneously. In addition to their naive consolidation approach, which already achieves the aforementioned goals, they propose a new Energy Efficiency-Aware (EEA) scheduling...

    Provided By Virginia Systems

  • White Papers // Jan 2014

    A Cross-Layer Approach for Power-Performance Optimization in Distributed Mobile Systems

    Current trends indicate that delivery of multimedia content to mobile systems operating in distributed environments will drive many future applications. The next generation of mobile systems with multimedia processing capabilities and wireless connectivity will be increasingly deployed in highly dynamic and distributed environments for multimedia playback and delivery. The challenge...

    Provided By Virginia Systems

  • White Papers // Dec 2013

    On High Performance Distributed Transactional Data Structures

    The authors present three protocols for developing high performance distributed transactional data structures. The first protocol, QR-ON, incorporates the open nesting transactional model into QR, a quorum-based protocol for managing concurrency on distributed transactions. The open nesting model allows nested transactions to commit independently of their parent transaction. This releases...

    Provided By Virginia Systems

  • White Papers // Dec 2013

    Wideband Channelization for Software-Defined Radio via Mobile Graphics Processors

    Wideband channelization is a computationally intensive task within Software-Defined Radio (SDR). To support this task, the underlying hardware should provide high performance and allow flexible implementations. Traditional solutions use Field-Programmable Gate Arrays (FPGAs) to satisfy these requirements. While FPGAs allow for flexible implementations, realizing an FPGA implementation is a difficult...

    Provided By Virginia Systems

  • White Papers // Dec 2013

    Characterizing the Challenges and Evaluating the Efficacy of a CUDA-to-OpenCL Translator

    Recent trends in processor architectures utilize available transistors to provide large numbers of execution cores, and hence threads, rather than attempting to speed-up the execution of a single thread or a small number of threads. The proliferation of heterogeneous computing systems has led to increased interest in parallel architectures and...

    Provided By Virginia Systems

  • White Papers // Oct 2013

    A Page Coherency Protocol for Popcorn Replicated-kernel Operating System

    Popcorn is a Linux based replicated-kernel Operating System (OS). Popcorn was conceived as a research OS for a wide class of future heterogeneous-ISA hardware. Because of the novelty of such hardware, in which diverse OS-capable CPUs are glued together, it is not clear what level of memory sharing will be...

    Provided By Virginia Systems

  • White Papers // Oct 2013

    On the Programmability and Performance of Heterogeneous Platforms

    Many application areas, including finance, life sciences, physics, and manufacturing, have begun to use computational co-processors such as Graphics Processing Units (GPUs), Field Programmable Gate Arrays (FPGAs), Digital Signal Processors (DSPs), and even customized Application-Specific Integrated Circuits (ASICs) to achieve substantial gains in performance per watt and performance per dollar...

    Provided By Virginia Systems

  • White Papers // Aug 2013

    EDR: An Energy-Aware Runtime Load Distribution System for Data-Intensive Applications in the Cloud

    Data centers account for a growing percentage of US power consumption. Energy efficiency is now a first-class design constraint for the data centers that support cloud services. Service providers must distribute their data efficiently across multiple data centers. This includes creation of data replicas that provide multiple copies of data...

    Provided By Virginia Systems

  • White Papers // Aug 2013

    On Transactional Memory Concurrency Control in Distributed Real-Time Programs

    The authors consider Distributed Transactional Memory (DTM) for concurrency control in distributed real-time programs, and present an algorithm called RT-TFA. RT-TFA transparently handles object relocation and versioning using an asynchronous clock-based validation technique, and resolves transactional contention using task time constraints. They implement the RT-TFA on top of JChronOS, a...

    Provided By Virginia Systems

  • White Papers // Jul 2013

    On Real-Time STM Concurrency Control for Embedded Software with Improved Schedulability

    Concurrency is intrinsic to embedded software, as it controls concurrent physical processes. Often, such concurrent computations need to read/write shared data objects. They must also satisfy time constraints. The authors consider Software Transactional Memory (STM) concurrency control for embedded multicore real-time software, and present a novel contention manager for resolving...

    Provided By Virginia Systems

  • White Papers // Jul 2013

    FBLT: A Real-Time Contention Manager with Improved Schedulability

    Embedded systems sense physical processes and control their behavior, typically through feedback loops. The authors consider Software Transactional Memory (STM) concurrency control for embedded multicore real-time software, and present a novel contention manager for resolving transactional conflicts, called FBLT. They upper bound transactional retries and task response times under FBLT,...

    Provided By Virginia Systems

  • White Papers // Jul 2013

    Energy-Architecture Tuning for ECC-Based RFID Tags

    The implementation of Elliptic Curve Cryptography (ECC) on small microcontrollers is challenging. Past research has therefore emphasized performance optimization: pick a target architecture, and minimize the cycle count and footprint of the ECC software. This paper addresses a different aspect of resource-constrained ECC implementation: given the application profile, identify the most...

    Provided By Virginia Systems

  • White Papers // Jul 2013

    pVOCL: Power-Aware Dynamic Placement and Migration in Virtualized GPU Environments

    Power-hungry Graphics Processing Unit (GPU) accelerators are ubiquitous in high performance computing data centers today. GPU virtualization frameworks introduce new opportunities for effective management of GPU resources by decoupling them from application execution. However, power management of GPU-enabled server clusters faces significant challenges. The underlying system infrastructure shows complex power...

    Provided By Virginia Systems

  • White Papers // Jul 2013

    Hyflow2: A High Performance Distributed Transactional Memory Framework in Scala

    Distributed Transactional Memory (DTM) is a recent but promising model for programming distributed systems. It aims to present programmers with a simple to use distributed concurrency control abstraction (transactions), while maintaining performance and scalability similar to distributed fine-grained locks. Any complications usually associated with such locks (e.g., distributed deadlocks) are...

    Provided By Virginia Systems

  • White Papers // Jun 2013

    On the Viability of Speculative Transactional Replication in Database Systems: a Case Study with PostgreSQL

    Active replication is a classical means for providing fault-tolerance and high availability. It is based on the enforcement of consensus among the replicas on a common total order for request processing. The authors investigate the feasibility of systematic speculative processing in the context of Optimistic Atomic Broadcast (OAB) based replication...

    Provided By Virginia Systems

  • White Papers // Jun 2013

    Synchronization and Ordering Semantics in Hybrid MPI+GPU Programming

    Graphics Processing Units (GPUs) have gained widespread use as general-purpose computational accelerators and have been studied extensively across a broad range of scientific applications. Despite the vast interest in accelerator-based systems, programming large multinode GPUs is still a complex task, particularly with respect to optimal data movement across the host-GPU...

    Provided By Virginia Systems

  • White Papers // May 2013

    HyflowCPP: A Distributed Transactional Memory Framework for C++

    The authors present the first ever Distributed Transactional Memory (DTM) framework for distributed concurrency control in C++, called HyflowCPP. HyflowCPP provides distributed atomic sections, and pluggable support for policies for concurrency control, directory lookup, contention management, and networking. While there exist other DTM frameworks, they mostly target VM-based languages (e.g.,...

    Provided By Virginia Systems

  • White Papers // May 2013

    Model-Based, Memory-Centric Performance and Power Optimization on NUMA Multiprocessors

    Non-Uniform Memory Access (NUMA) architectures are ubiquitous in HPC systems. NUMA along with other factors including socket layout, data placement, and memory contention significantly increase the search space to find an optimal mapping of applications to NUMA systems. This search space may be intractable for online optimization and challenging for...

    Provided By Virginia Systems

  • White Papers // May 2013

    Enhancing Concurrency in Distributed Transactional Memory Through Commutativity

    Distributed software transactional memory is an emerging, alternative concurrency control model for distributed systems promising to alleviate the difficulties of lock-based distributed synchronization. The authors consider the Multi-Versioning (MV) model to avoid unnecessary aborts. MV schemes inherently guarantee commits of read-only transactions, but limit the concurrency of write transactions. In...

    Provided By Virginia Systems

  • White Papers // May 2013

    Optimizing Burrows-Wheeler Transform-Based Sequence Alignment on Multicore Architectures

    Computational biology sequence alignment tools using the Burrows-Wheeler Transform (BWT) are widely used in Next-Generation Sequencing (NGS) analysis. However, despite extensive optimization efforts, the performance of these tools still cannot keep up with the explosive growth of sequencing data. Through an in-depth performance analysis of BWA, a popular BWT-based aligner...

    Provided By Virginia Systems

  • White Papers // Apr 2013

    Scheduling Transactions in Replicated Distributed Software Transactional Memory

    Distributed Software Transactional Memory (DTM) is an emerging, alternative concurrency control model for distributed systems that promises to alleviate the difficulties of lock-based distributed synchronization. Object replication can improve concurrency and achieve fault-tolerance in DTM, but may incur high communication overhead (in metric-space networks) to ensure one-copy serializability. The authors...

    Provided By Virginia Systems

  • White Papers // Apr 2013

    Low-Cost and Area-Efficient FPGA Implementations of Lattice-Based Cryptography

    Lattice-based cryptography relies on the hardness of lattice problems. Lattice-based cryptosystems are quantum resistant and are often provably secure based on worst-case hardness assumptions. The interest in lattice-based cryptography is increasing due to its quantum resistance and its provable security under some worst-case hardness assumptions. As this is a relatively...

    Provided By Virginia Systems

  • White Papers // Mar 2013

    Scheduling Open-Nested Transactions in Distributed Transactional Memory

    Distributed Transactional Memory (DTM) is a powerful concurrency control model for distributed systems sparing the programmer from the complexity of manual implementation of lock-based distributed synchronization. The authors consider Herlihy and Sun's data flow DTM model, where objects are migrated to invoking transactions, and the open nesting model of managing...

    Provided By Virginia Systems

  • White Papers // Mar 2013

    ByteSTM: Virtual Machine-level Java Software Transactional Memory

    The authors present ByteSTM, a virtual machine-level Java STM implementation that is built by extending the Jikes RVM. The authors modify Jikes RVM's optimizing compiler to transparently support implicit transactions. Being implemented at the VM-level, it accesses memory directly, avoids Java garbage collection overhead by manually managing memory for transactional...

    Provided By Virginia Systems

  • White Papers // Feb 2013

    A Simplified and Accurate Model of Power-Performance Efficiency on Emergent GPU Architectures

    Emergent heterogeneous systems must be optimized for both power and performance at exascale. Massive parallelism combined with complex memory hierarchies forms a barrier to efficient application and architecture design. These challenges are exacerbated with GPUs as parallelism increases by orders of magnitude and power consumption can easily double. Models have been...

    Provided By Virginia Systems

  • White Papers // Jan 2013

    An Efficient Interference Management Framework for Multi-Hop Wireless Networks

    Interference management is an important problem in wireless networks. In this paper, the authors focus on the Successive Interference Cancellation (SIC) technique, and aim to design an efficient cross-layer solution to increase throughput for multi-hop wireless networks with SIC. They realize that the challenge of this problem is its mixed...

    Provided By Virginia Systems

  • White Papers // Jan 2013

    On Closed Nesting and Checkpointing in Fault-Tolerant Distributed Transactional Memory

    The authors consider the closed nesting and checkpointing model for transactions in fault-tolerant Distributed Transactional Memory (DTM). The closed nested model allows inner-nested transactions to be aborted (in the event of a transactional conflict) without aborting the parent transaction, while checkpointing allows transactions to roll back to a previous execution state,...

    Provided By Virginia Systems

  • White Papers // Nov 2012

    On the Use of GPUs in Realizing Cost-Effective Distributed RAID

    The exponential growth in user and application data entails new means for providing fault tolerance and protection against data loss. High Performance Computing (HPC) storage systems, which are at the forefront of handling the data deluge, typically employ hardware RAID at the backend. However, such solutions are costly, do not...

    Provided By Virginia Systems

  • White Papers // Nov 2012

    Efficient Algorithms for Maximum Link Scheduling in Distributed Computing Models With SINR Constraints

    In this paper, the authors develop a set of fast distributed algorithms in the SINR model, providing constant approximation for the maximum link scheduling problem under uniform power assignment. They find that different aspects of available technology, such as full/half duplex communication, and non-adaptive/adaptive power control, have a significant impact...

    Provided By Virginia Systems

  • White Papers // Aug 2012

    Storage Power Optimizations for Client Devices and Data Centers

    Storage devices are essential to all computing systems that store user data, from desktops to notebooks and Ultrabooks to data centers. Hard Disk Drives (HDDs) or Solid State Drives (SSDs) are today's most popular storage solutions. Active power for storage devices has a significant impact on the battery life of client...

    Provided By Virginia Systems

  • White Papers // Aug 2012

    Critical Path-Based Thread Placement for NUMA Systems

    Multicore multiprocessors use a Non Uniform Memory Architecture (NUMA) to improve their scalability. However, NUMA introduces performance penalties due to remote memory accesses. Without efficiently managing data layout and thread mapping to cores, scientific applications may suffer performance loss, even if they are optimized for NUMA. In this paper, the...

    Provided By Virginia Systems

  • White Papers // Dec 2006

    Restoring End-to-End Resilience in the Presence of Middleboxes

    The philosophy upon which the Internet was built places the intelligence close to the edge. As the Internet has matured, intermediate devices or middleboxes, such as firewalls or application gateways, have been introduced, thereby weakening the end-to-end nature of the network. As a result, applications must often modify their behavior...

    Provided By Virginia Systems

  • White Papers // Feb 2012

    Scheduling Closed-Nested Transactions in Distributed Transactional Memory

    Distributed Software Transactional Memory (D-STM) is an emerging, alternative concurrency control model for distributed systems that promises to alleviate the difficulties of lock-based distributed synchronization, e.g., distributed deadlocks, livelocks, and lock convoying. The authors consider Herlihy and Sun's dataflow D-STM model, where objects are migrated to invoking transactions, and...

    Provided By Virginia Systems

  • White Papers // Jan 2012

    HydraVM: Extracting Parallelism from Legacy Sequential Code Using STM

    Many organizations with enterprise-class legacy software are increasingly faced with a hardware technology refresh challenge due to the ubiquity of Chip Multi-Processor (CMP) hardware. This problem is significant when legacy codebases run into several million LOC and are not significantly concurrent (often intentionally designed to be sequential to reduce development...

    Provided By Virginia Systems

  • White Papers // Dec 2011

    On Closed Nesting in Distributed Transactional Memory

    Distributed Software Transactional Memory (D-STM) is a recent but promising model for programming distributed systems. It aims to present programmers with a simple-to-use abstraction (transactions), while maintaining performance and scalability similar to distributed fine-grained locks. Any complications usually associated with such locks (e.g., distributed deadlocks) are avoided. Building upon...

    Provided By Virginia Systems

  • White Papers // Jun 2012

    Lost in Translation: Challenges in Automating CUDA-to-OpenCL Translation

    The use of accelerators in high-performance computing is increasing. The most commonly used accelerator is the Graphics Processing Unit (GPU) because of its low cost and massively parallel performance. The two most common programming environments for GPU accelerators are CUDA and OpenCL. While CUDA runs natively only on NVIDIA GPUs,...

    Provided By Virginia Systems

  • White Papers // Feb 2012

    Heterogeneous Task Scheduling for Accelerated OpenMP

    Heterogeneous systems with CPUs and computational accelerators such as GPUs, FPGAs or the upcoming Intel MIC are becoming mainstream. In these systems, peak performance includes the performance of not just the CPUs but also all available accelerators. In spite of this fact, the majority of programming models for heterogeneous computing...

    Provided By Virginia Systems

  • White Papers // Feb 2012

    Generalizing the Utility of GPUs in Large-Scale Heterogeneous Computing Systems

    Graphics Processing Units (GPUs) have been widely used as accelerators in large-scale heterogeneous computing systems. However, current programming models can only support the utilization of local GPUs. When using non-local GPUs, programmers need to explicitly call API functions for data communication across computing nodes. As such, programming GPUs in large-scale...

    Provided By Virginia Systems

  • White Papers // Dec 2011

    CU2CL: A CUDA-to-OpenCL Translator for Multi- and Many-core Architectures

    The use of Graphics Processing Units (GPUs) in high-performance parallel computing continues to become more prevalent, often as part of a heterogeneous system. For years, CUDA has been the de facto programming environment for nearly all General-Purpose GPU (GPGPU) applications. In spite of this, the framework is available only on...

    Provided By Virginia Systems

  • White Papers // Oct 2011

    StreamMR: An Optimized MapReduce Framework for AMD GPUs

    MapReduce is a programming model from Google that facilitates parallel processing on a cluster of thousands of commodity computers. The success of MapReduce in cluster environments has motivated several studies of implementing MapReduce on a Graphics Processing Unit (GPU), though these have generally focused on NVIDIA GPUs. The authors' investigation reveals...

    Provided By Virginia Systems

  • White Papers // Aug 2011

    Spectral Method Characterization on FPGA and GPU Accelerators

    The increasing demand for High Performance Computing (HPC) across myriad environments, ranging from traditional supercomputers to embedded devices, has outpaced the conventional processor's ability to deliver performance. As CPU clock frequencies plateau and the doubling of CPU cores per processor exacerbates the memory wall, hybrid core computing, utilizing CPUs augmented...

    Provided By Virginia Systems

  • White Papers // Jul 2011

    Performance Characterization and Optimization of Atomic Operations on AMD GPUs

    Atomic operations are important building blocks in supporting general-purpose computing on Graphics Processing Units (GPUs). For instance, they can be used to coordinate execution between concurrent threads, and in turn, assist in constructing complex data structures such as hash tables or implementing GPU-wide barrier synchronization. While the performance of atomic...

    Provided By Virginia Systems

  • White Papers // Jan 2011

    GPU-RMAP: Accelerating Short-Read Mapping on Graphics Processors

    Next-generation, high-throughput sequencers are now capable of producing hundreds of billions of short sequences (reads) in a single day. The task of accurately mapping the reads back to a reference genome is of particular importance because it is used in several other biological applications, e.g., genome re-sequencing, DNA methylation, and...

    Provided By Virginia Systems

  • White Papers // Feb 2013

    A Simplified and Accurate Model of Power-Performance Efficiency on Emergent GPU Architectures

    Emergent heterogeneous systems must be optimized for both power and performance at exascale. Massive parallelism combined with complex memory hierarchies forms a barrier to efficient application and architecture design. These challenges are exacerbated with GPUs as parallelism increases by orders of magnitude and power consumption can easily double. Models have been...

    Provided By Virginia Systems

  • White Papers // Aug 2012

    Storage Power Optimizations for Client Devices and Data Centers

    Storage devices are essential to all computing systems that store user data, from desktops to notebooks and Ultrabooks to data centers. Hard Disk Drives (HDDs) and Solid State Drives (SSDs) are today's most popular storage solutions. Active power for storage devices has a significant impact on the battery life of client...

    Provided By Virginia Systems

  • White Papers // Aug 2012

    Critical Path-Based Thread Placement for NUMA Systems

    Multicore multiprocessors use a Non-Uniform Memory Architecture (NUMA) to improve their scalability. However, NUMA introduces performance penalties due to remote memory accesses. Without efficiently managing data layout and thread mapping to cores, scientific applications may suffer performance loss, even if they are optimized for NUMA. In this paper, the...

    Provided By Virginia Systems

  • White Papers // May 2013

    Model-Based, Memory-Centric Performance and Power Optimization on NUMA Multiprocessors

    Non-Uniform Memory Access (NUMA) architectures are ubiquitous in HPC systems. NUMA along with other factors including socket layout, data placement, and memory contention significantly increase the search space to find an optimal mapping of applications to NUMA systems. This search space may be intractable for online optimization and challenging for...

    Provided By Virginia Systems

  • White Papers // Aug 2012

    An Iso-Energy-Efficient Approach to Scalable System Power-Performance Optimization

    The power consumption of a large-scale system ultimately limits its performance. Consuming less energy while preserving performance leads to better system utilization at scale. The iso-energy-efficiency model was proposed as a metric and methodology for explaining power and performance efficiency on scalable systems. For use in practice, the authors...

    Provided By Virginia Systems

  • White Papers // Nov 2008

    On the Impact of Disk Scrubbing on Energy Savings

    The increasing use of computers for saving valuable data imposes stringent reliability constraints on storage systems. Reliability improvement via use of redundancy is a common practice. As the disk capacity improves, advanced techniques such as disk scrubbing are being employed to proactively fix latent sector errors. These techniques utilize the...

    Provided By Virginia Systems

  • White Papers // Jan 2014

    A Cross-Layer Approach for Power-Performance Optimization in Distributed Mobile Systems

    Current trends indicate that delivery of multimedia content to mobile systems operating in distributed environments will drive many future applications. The next generation of mobile systems with multimedia processing capabilities and wireless connectivity will be increasingly deployed in highly dynamic and distributed environments for multimedia playback and delivery. The challenge...

    Provided By Virginia Systems

  • White Papers // Jan 2014

    Distributed Storage Systems with Secure and Exact Repair - New Results

    Distributed storage is the default technique for storing data in all new generation applications. The data from a file is stored in a decentralized manner on several commodity nodes/disks that when collectively used are capable of recovering the entire file. Replication-based schemes to ensure data reliability incur huge storage overhead...

    Provided By Virginia Systems

  • White Papers // Feb 2014

    Integrating Transactionally Boosted Data Structures With STM Frameworks: A Case Study on Set

    Providing transactional collections of data structures with the same performance as highly concurrent data structures enables performance-competitive transactional composability. Although Software Transactional Memory (STM) is increasingly becoming a promising technology for designing and implementing transactional applications, concurrent data structures still do not exploit STM's advantages. Recently, Optimistic Transactional Boosting (OTB)...

    Provided By Virginia Systems

  • White Papers // Feb 2014

    Remote Invalidation: Optimizing the Critical Path of Memory Transactions

    Software Transactional Memory (STM) systems are increasingly emerging as a promising alternative to traditional locking algorithms for implementing generic concurrent applications. To achieve generality, STM systems incur overheads to the normal sequential execution path, including those due to spin locking, validation (or invalidation), and commit/abort routines. The authors propose a...

    Provided By Virginia Systems

  • White Papers // Feb 2014

    HiperTM: High Performance, Fault-Tolerant Transactional Memory

    The authors present HiperTM, a high performance active replication protocol for fault-tolerant distributed transactional memory. The active replication paradigm allows transactions to execute locally, costing them only a single network communication step during transaction execution. Shared objects are replicated across all sites, avoiding remote object accesses. Replica consistency is ensured...

    Provided By Virginia Systems