North Carolina State University

  • White Papers // Sep 2014

    A Semantics-Oriented Storage Model for Big Heterogeneous RDF Data

    Increasing availability of RDF data covering different domains is enabling ad-hoc integration of different kinds of data to suit varying needs. This usually results in large collections of data such as the billion triple challenge datasets or SNOMED CT that are not just "Big" in the sense of volume but...

    Provided By North Carolina State University

  • White Papers // Jun 2014

    ScalaJack: Customized Scalable Tracing with in-situ Data Analysis

    Root cause diagnosis of large-scale HPC applications often fails because tools, specifically trace-based ones, can no longer record all metrics they measure. The authors address this problem by combining customized tracing with support for in-situ data analysis via ScalaJack, a framework with customizable instrumentation and pluggable extension capabilities for...

    Provided By North Carolina State University

  • White Papers // Jun 2014

    InVis: An EDM Tool for Graphical Rendering and Analysis of Student Interaction Data

    InVis is a novel visualization tool that was developed to explore, navigate and catalog student interaction data. In-Vis processes datasets collected from interactive educational systems such as intelligent tutoring systems and homework helpers and visualizes the student data as graphs. This visual representation of data provides an interactive environment with...

    Provided By North Carolina State University

  • White Papers // Mar 2014

    NoCMsg: Scalable NoC-Based Message Passing

    Current processor designs with ever more cores may ensure that theoretical compute performance still follows past increases (resting on Moore's law), but they also increasingly present a challenge to hardware and software alike. As the core count increases, the Network-on-Chip (NoC) topology has changed from buses over rings and fully...

    Provided By North Carolina State University

  • White Papers // Feb 2014

    Tools for Simulation and Benchmark Generation at Exascale

    The path to exascale High-Performance Computing (HPC) poses several challenges related to power, performance, resilience, productivity, programmability, data movement, and data management. Investigating the performance of parallel applications at scale on future architectures and the performance impact of different architecture choices is an important component of HPC hardware/software co-design. Simulations...

    Provided By North Carolina State University

  • White Papers // Feb 2014

    Automatic Identification of Application I/O Signatures from Noisy Server-Side Traces

    Competing workloads on a shared storage system cause I/O resource contention and application performance vagaries. This problem is already evident in today's HPC storage systems and is likely to become acute at exascale. The authors need more interaction between application I/O requirements and system software tools to help alleviate the...

    Provided By North Carolina State University

  • White Papers // Feb 2014

    Understanding the Tradeoffs Between Software-Managed Vs. Hardware-Managed Caches in GPUs

    On-chip caches are commonly used in computer systems to hide long off-chip memory access latencies. To manage on-chip caches, either software-managed or hardware-managed schemes can be employed. State-of-the-art accelerators, such as the NVIDIA Fermi or Kepler GPUs and Intel's forthcoming MIC "Knights Landing" (KNL), support both software-managed caches, a.k.a. shared...

    Provided By North Carolina State University

  • White Papers // Jan 2014

    Warp-Level Divergence in GPUs: Characterization, Impact, and Mitigation

    High-throughput architectures rely on high Thread-Level Parallelism (TLP) to hide execution latencies. In state-of-the-art Graphics Processing Units (GPUs), threads are organized in a grid of Thread Blocks (TBs) and each TB contains tens to hundreds of threads. With a TB-level resource management scheme, all the resources required by a...

    Provided By North Carolina State University

  • White Papers // Jan 2014

    Soft Error Protection via Fault-Resilient Data Representations

    Embedded systems are increasingly deployed in harsh environments that their components were not necessarily designed for. As a result, systems may have to sustain transient faults, i.e., both single-bit soft errors caused by radiation from space and transient errors caused by lower signal/noise ratio in smaller fabrication sizes. Hardware can...

    Provided By North Carolina State University

  • White Papers // Jan 2014

    Fair Caching in a Chip Multiprocessor Architecture

    In this paper, the authors present a detailed study of fairness in cache sharing between threads in a Chip Multi-Processor (CMP) architecture. Prior work in CMP architectures has only studied throughput optimization techniques for a shared cache. The issue of fairness in cache sharing, and its relation to throughput, has...

    Provided By North Carolina State University

  • White Papers // Jan 2014

    Communication Characteristics of Large-Scale Scientific Applications for Contemporary Cluster Architectures

    In this paper, the authors examine the explicit communication characteristics of several sophisticated scientific applications, which, by themselves, constitute a representative suite of publicly available benchmarks for large cluster architectures. By focusing on the Message Passing Interface (MPI) and by using hardware counters on the microprocessor, they observe each application's...

    Provided By North Carolina State University

  • White Papers // Jan 2014

    IPSec/VPN Security Policy: Correctness, Conflict Detection and Resolution

    IPSec (Internet Security Protocol suite) functions will be executed correctly only if its policies are correctly specified and configured. Manual IPSec policy configuration is inefficient and error-prone. An erroneous policy could lead to communication blockade or serious security breach. In addition, even if policies are specified correctly in each domain,...

    Provided By North Carolina State University
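
The idea above, that an erroneous IPSec policy can silently conflict with another, can be illustrated with a minimal sketch. The policy representation and the `policies_conflict` helper below are hypothetical, not the paper's model:

```python
def policies_conflict(p1, p2):
    """Toy conflict check for two policies on the same traffic class:
    they conflict if their destination-port ranges overlap but they
    prescribe different actions. A hypothetical representation, not the
    IPSec policy model from the paper."""
    lo = max(p1["ports"][0], p2["ports"][0])
    hi = min(p1["ports"][1], p2["ports"][1])
    return lo <= hi and p1["action"] != p2["action"]

# Port 80 falls inside 0-1023, but the two rules disagree on the action:
encrypt_all = {"ports": (0, 1023), "action": "encrypt"}
bypass_web = {"ports": (80, 80), "action": "bypass"}
print(policies_conflict(encrypt_all, bypass_web))  # True
```

Real conflict detection must also consider rule order, selectors beyond ports, and cross-domain tunnel interactions, which is what makes the problem hard.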

  • White Papers // Dec 2013

    Performance Assessment of A Multi-block Incompressible Navier-Stokes Solver using Directive-based GPU Programming in a Cluster Environment

    OpenACC, a directive-based GPU programming standard, is emerging as a promising technology for massively-parallel accelerators, such as General-Purpose computing on Graphics Processing Units (GPGPU), Accelerated Processing Unit (APU) and Many Integrated Core architecture (MIC). The heterogeneous nature of these accelerators calls for careful designs of parallel algorithms and data management,...

    Provided By North Carolina State University

  • White Papers // Dec 2013

    Exploiting Data Representation for Fault Tolerance

    The authors explore the link between data representation and soft errors in dot products. They present an analytic model for the absolute error introduced should a soft error corrupt a bit in an IEEE-754 floating-point number. They show how this finding relates to the fundamental linear algebra concepts of normalization...

    Provided By North Carolina State University
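
The core observation of the entry above, that the position of a corrupted bit in an IEEE-754 number determines the magnitude of the resulting absolute error, can be demonstrated with a short sketch. The `flip_bit` helper is an illustration only, not the authors' analytic model:

```python
import math
import struct

def flip_bit(x: float, bit: int) -> float:
    """Return x with one bit of its IEEE-754 double representation
    flipped (bit 0 = least significant mantissa bit)."""
    # Reinterpret the double as a 64-bit integer, flip one bit, convert back.
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    (y,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))
    return y

# Low-order mantissa bits barely perturb the value; high-order mantissa
# and exponent bits change it catastrophically.
print(abs(flip_bit(1.0, 0) - 1.0))    # 2**-52: negligible
print(abs(flip_bit(1.0, 51) - 1.0))   # 0.5: top mantissa bit
print(math.isinf(flip_bit(1.0, 62)))  # True: exponent bit overflows to inf
```

This asymmetry is why the choice of data representation matters for fault tolerance: representations that keep critical bits redundant or bounded limit the worst-case error a single upset can introduce.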

  • White Papers // Nov 2013

    PAQO: Preference-Aware Query Optimization for Decentralized Database Systems

    The declarative nature of SQL has traditionally been a major strength. Users simply state what information they are interested in, and the database management system determines the best plan for retrieving it. A consequence of this model is that should a user ever want to specify some aspect of how...

    Provided By North Carolina State University

  • White Papers // Sep 2013

    WHYPER: Towards Automating Risk Assessment of Mobile Applications

    In this paper, the authors present the first step in addressing this challenge. Specifically, they focus on permissions for a given application and examine whether the application description provides any indication for: why the application needs permission. They present WHYPER, a framework using Natural Language Processing (NLP) techniques to identify...

    Provided By North Carolina State University

  • White Papers // Jul 2013

    A Unified View of Non-monotonic Core Selection and Application Steering in Heterogeneous Chip Multiprocessors

    A single-ISA Heterogeneous Chip Multi-Processor (HCMP) is an attractive substrate to improve single-thread performance and energy efficiency in the dark silicon era. The authors consider HCMPs comprised of non-monotonic core types where each core type is performance-optimized to different instruction level behavior and hence cannot be ranked - different program...

    Provided By North Carolina State University

  • White Papers // Jun 2013

    MetaSymploit: Day-One Defense Against Script-Based Attacks with Security-Enhanced Symbolic Analysis

    In this paper, the authors propose MetaSymploit, the first system of fast attack script analysis and automatic signature generation for a network Intrusion Detection System (IDS). As soon as a new attack script is developed and distributed, MetaSymploit uses security-enhanced symbolic execution to quickly analyze the script and automatically generate...

    Provided By North Carolina State University

  • White Papers // Mar 2013

    Reasonableness Meets Requirements: Regulating Security and Privacy in Software

    Software security and privacy issues regularly grab headlines amid fears of identity theft, data breaches, and threats to security. Policymakers have responded with a variety of approaches to combat such risk. Suggested measures include promulgation of strict rules, enactment of open-ended standards, and, at times, abstention in favor of allowing...

    Provided By North Carolina State University

  • White Papers // Feb 2013

    Taming Hosted Hypervisors With (Mostly) Deprivileged Execution

    Recent years have witnessed increased adoption of hosted hypervisors in virtualized computer systems. By non-intrusively extending commodity OSs, hosted hypervisors can effectively take advantage of a variety of mature and stable features as well as the existing broad user base of commodity OSs. However, virtualizing a computer system is still...

    Provided By North Carolina State University

  • White Papers // Feb 2013

    Adaptive Cache Bypassing for Inclusive Last Level Caches

    Cache hierarchy designs, including bypassing, replacement, and the inclusion property, have significant performance impact. Recent works on high performance caches have shown that cache bypassing is an effective technique to enhance the Last Level Cache (LLC) performance. However, commonly used inclusive cache hierarchy cannot benefit from this technique because bypassing...

    Provided By North Carolina State University

  • White Papers // Feb 2013

    Directory-Oblivious Capacity Sharing in Tiled CMPs

    In bus-based CMPs with private caches, Capacity Sharing is applied by spilling victim cache blocks from over-utilized caches to under-utilized ones. If a spilled block is needed, it can be retrieved by posting a miss on the bus. Prior work in this domain focused on Capacity Sharing design and put...

    Provided By North Carolina State University

  • White Papers // Jan 2013

    Flexible Capacity Partitioning in Many-Core Tiled CMPs

    Chip Multi-Processors (CMP) have become a mainstream computing platform. As transistor density shrinks and the number of cores increases, more scalable CMP architectures will emerge. Recently, tiled architectures have shown such scalable characteristics and been used in many industry chips. The memory hierarchy in tiled architectures presents interesting design challenges....

    Provided By North Carolina State University

  • White Papers // Jan 2013

    QuickSense: Fast and Energy-Efficient Channel Sensing for Dynamic Spectrum Access Networks

    Spectrum sensing, the task of discovering spectrum usage at a given location, is a fundamental problem in dynamic spectrum access networks. While sensing in narrow spectrum bands is well studied in previous work, wideband spectrum sensing is challenging since a wideband radio is generally too expensive and power consuming for...

    Provided By North Carolina State University

  • White Papers // Jan 2013

    Characterizing Link Connectivity for Opportunistic Mobile Networking: Does Mobility Suffice?

    With recent drastic growth in the number of users carrying smart mobile devices, it is not hard to envision opportunistic ad-hoc communications taking place with such devices carried by humans. This leads to, however, a new challenge to the conventional link-level metrics, solely defined based on user mobility, such as...

    Provided By North Carolina State University

  • White Papers // Jan 2013

    Modeling Flexible Business Processes

    Current approaches of designing business processes rely on traditional workflow technologies and thus take a logically centralized view of processes. Processes designed in that manner assume the participants will act as invoked, thus limiting their flexibility or autonomy. Flexibility is in conflict with both reusability and compliance. The authors propose...

    Provided By North Carolina State University

  • White Papers // Jan 2013

    A Semantic Protocol-Based Approach for Developing Business Processes

    A (business) protocol is a modular, public specification of an interaction among different roles that achieves a desired purpose. The authors model protocols in terms of the commitments of the participating roles. Commitments enable reasoning about actions, thus allowing the participants to comply with protocols while acting flexibly to exploit...

    Provided By North Carolina State University

  • White Papers // Dec 2012

    Scheduling Cloud Capacity for Time-Varying Customer Demand

    As utility computing resources become more ubiquitous, service providers increasingly look to the cloud for an in-full or in-part infrastructure to serve utility computing customers on demand. Given the costs associated with cloud infrastructure, dynamic scheduling of cloud resources can significantly lower costs while providing an acceptable service level. The...

    Provided By North Carolina State University
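
The cost argument in the entry above can be made concrete with a toy comparison between static peak provisioning and per-period scheduling. The numbers and the `servers_needed` helper are made-up illustrations, not the paper's scheduling method:

```python
import math

def servers_needed(demand, per_server):
    """Servers required in each period to meet a time-varying demand,
    given each server's capacity (a hypothetical sketch)."""
    return [math.ceil(d / per_server) for d in demand]

# Dynamic scheduling provisions per period; static provisioning pays
# for the peak in every period.
demand = [10, 80, 35, 60]          # requests per period (made-up numbers)
plan = servers_needed(demand, per_server=20)
dynamic_cost = sum(plan)           # server-periods actually used
static_cost = max(plan) * len(demand)
print(plan, dynamic_cost, static_cost)  # [1, 4, 2, 3] 10 16
```

The gap between the two costs widens as demand becomes burstier, which is the economic case for scheduling cloud capacity dynamically.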

  • White Papers // Dec 2012

    Auto-Generation and Auto-Tuning of 3D Stencil Codes on Homogeneous and Heterogeneous GPU Clusters

    In this paper, the authors develop and evaluate search and optimization techniques for auto-tuning 3D stencil (nearest-neighbor) computations on GPUs. Observations indicate that parameter tuning is necessary for heterogeneous GPUs to achieve optimal performance with respect to a search space. Their proposed framework takes a concise specification of stencil...

    Provided By North Carolina State University

  • White Papers // Nov 2012

    On the Accurate Identification of Network Service Dependencies in Distributed Systems

    The automated identification of network service dependencies remains a challenging problem in the administration of large distributed systems. Advances in developing solutions for this problem have immediate and tangible benefits to operators in the field. When the dependencies of the services in a network are better-understood, planning for and responding...

    Provided By North Carolina State University

  • White Papers // Oct 2012

    HadISD: A Quality-Controlled Global Synoptic Report Database for Selected Variables at Long-Term Stations From 1973 - 2011

    In this paper, the authors describe the creation of HadISD: an automatically quality-controlled synoptic resolution dataset of temperature, dewpoint temperature, sea-level pressure, wind speed, wind direction and cloud cover from global weather stations for 1973 - 2011. The full dataset consists of over 6000 stations, with 3427 long-term stations deemed...

    Provided By North Carolina State University

  • White Papers // Sep 2012

    Collaborative Assessment of Functional Reliability in Wireless Networks

    Nodes that are part of a multi-hop wireless network, typically deployed in mission critical settings, are expected to perform specific functions. Establishing a notion of reliability of the nodes with respect to each function (referred to as Functional Reliability or FR) is essential for efficient operations and management of the...

    Provided By North Carolina State University

  • White Papers // Sep 2012

    Is Link Signature Dependable for Wireless Security?

    Link signature, which refers to the unique and reciprocal wireless channel between a pair of transceivers, has gained significant attention recently due to its effectiveness in signal authentication and shared secret construction for various wireless applications. A fundamental assumption of this technique is that the wireless signals received at two...

    Provided By North Carolina State University

  • White Papers // Sep 2012

    An Efficient Algorithm for Solving Traffic Grooming Problems in Optical Networks

    The authors consider the Virtual Topology and Traffic Routing (VTTR) problem, a sub-problem of traffic grooming that arises as a fundamental network design problem in optical networks. The objective of VTTR is to determine the minimum number of light-paths so as to satisfy a set of traffic demands, and does...

    Provided By North Carolina State University

  • White Papers // Sep 2012

    Scalable Optimal Traffic Grooming in WDM Rings Incorporating Fast RWA Formulation

    The authors present a scalable formulation for the traffic grooming problem in WDM ring networks. Specifically, they modify the ILP formulation to replace the constraints related to Routing and Wavelength Assignment (RWA), typically based on a link approach, with a new set of constraints based on the Maximal Independent Set...

    Provided By North Carolina State University

  • White Papers // Sep 2012

    Reducing Data Movement Costs Using Energy-Efficient, Active Computation on SSD

    Modern scientific discovery often involves running complex application simulations on supercomputers, followed by a sequence of data analysis tasks on smaller clusters. This offline approach suffers from significant data movement costs such as redundant I/O, storage bandwidth bottleneck, and wasted CPU cycles, all of which contribute to increased energy consumption...

    Provided By North Carolina State University

  • White Papers // Aug 2012

    A Physical Design Study of FabScalar-generated Superscalar Cores

    FabScalar is a recently published toolset for automatically composing synthesizable Register-Transfer-Level (RTL) designs of diverse superscalar cores of different pipeline widths, depths, and sizes. The output of FabScalar is a synthesizable RTL description of the desired core. While...

    Provided By North Carolina State University

  • White Papers // Aug 2012

    Network Virtualization: Technologies, Perspectives, and Frontiers

    Network virtualization refers to a broad set of technologies. Commercial solutions have been offered by the industry for years, while more recently the academic community has emphasized virtualization as an enabler for network architecture research, deployment, and experimentation. The authors review the entire spectrum of relevant approaches with the goal...

    Provided By North Carolina State University

  • White Papers // Aug 2012

    A Fast Path-Based ILP Formulation for Offline RWA in Mesh Optical Networks

    RWA is a fundamental problem in the design and control of optical networks. The authors introduce the concept of symmetric RWA solutions and present a new ILP formulation to construct such solutions optimally. The formulation scales to mesh topologies representative of backbone and regional networks. Numerical results demonstrate that the...

    Provided By North Carolina State University

  • White Papers // Mar 2009

    Feedback-Directed Page Placement for CcNUMA Via Hardware-Generated Memory Traces

    Non-Uniform Memory Architectures with cache coherence (ccNUMA) are becoming increasingly common, not just for large-scale high-performance platforms but also in the context of multi-core architectures. Under ccNUMA, data placement may influence overall application performance significantly as references resolved locally to a processor/core impose lower latencies than remote ones. This...

    Provided By North Carolina State University

  • White Papers // Mar 2009

    Selecting Trustworthy Service in Service-Oriented Environments

    Most current service selection approaches in service-oriented environments fail to capture the dynamic relationships between services or assume that complete knowledge of service composition is known a priori. In these cases, problems may arise when consumers are not aware of the underlying composition behind services. The authors propose...

    Provided By North Carolina State University

  • White Papers // Jun 2012

    Fixing Performance Bugs: An Empirical Study of Open-Source GPGPU Programs

    Given the extraordinary computational power of modern Graphics Processing Units (GPUs), general purpose computation on GPUs (GPGPU) has become an increasingly important platform for high performance computing. To better understand how well the GPU resource has been utilized by application developers and to help them develop high-performance...

    Provided By North Carolina State University

  • White Papers // Feb 2012

    Locality Principle Revisited: A Probability-Based Quantitative Approach

    This paper revisits the fundamental concept of the locality of references and proposes to quantify it as a conditional probability: in an address stream, given the condition that an address is accessed, how likely the same address (temporal locality) or an address within its neighborhood (spatial locality) will be accessed...

    Provided By North Carolina State University
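
The conditional-probability view of locality described above can be sketched directly over an address stream. The `window` and `neighborhood` parameters below are illustrative assumptions, not the paper's definitions:

```python
def locality_probabilities(trace, window=8, neighborhood=4):
    """Estimate locality of an address stream as conditional probabilities:
    given that an address is accessed, how likely is the same address
    (temporal) or a nearby one (spatial) to appear within the next
    `window` references?"""
    temporal = spatial = n = 0
    for i, addr in enumerate(trace):
        future = trace[i + 1:i + 1 + window]
        if not future:
            break
        n += 1
        temporal += addr in future
        spatial += any(0 < abs(a - addr) <= neighborhood for a in future)
    return temporal / n, spatial / n

# A strided stream shows pure spatial locality and no temporal reuse:
t, s = locality_probabilities(list(range(0, 64, 2)), window=4, neighborhood=4)
print(t, s)  # 0.0 1.0
```

Quantifying locality this way lets different streams, or the same stream at different granularities, be compared on a single probabilistic scale.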

  • White Papers // Feb 2012

    CPU-Assisted GPGPU on Fused CPU-GPU Architectures

    This paper presents a novel approach to utilize the CPU resource to facilitate the execution of GPGPU programs on fused CPU-GPU architectures. In the authors' model of fused architectures, the GPU and the CPU are integrated on the same die and share the on-chip L3 cache and off-chip memory, similar...

    Provided By North Carolina State University

  • White Papers // Jan 2011

    Time-Ordered Event Traces: A New Debugging Primitive for Concurrency Bugs

    Non-determinism makes concurrent bugs extremely difficult to reproduce and to debug. In this paper, the authors propose a new debugging primitive to facilitate the debugging process by exposing this non-deterministic behavior to the programmer. The key idea is to generate a time-ordered trace of events such as function calls/returns and...

    Provided By North Carolina State University

  • White Papers // Sep 2011

    A Tunable, Software-based DRAM Error Detection and Correction Library for HPC

    Proposed exascale systems will present a number of considerable resiliency challenges. In particular, DRAM soft-errors, or bit-flips, are expected to greatly increase due to the increased memory density of these systems. Current hardware-based fault-tolerance methods will be unsuitable for addressing the expected soft error frequency rate. As a result, additional...

    Provided By North Carolina State University

  • White Papers // Sep 2009

    A Programming Model for Massive Data Parallelism with Data Dependencies

    Accelerating processors can often be more cost and energy effective for a wide range of data-parallel computing problems than general-purpose processors. For Graphics Processor Units (GPUs), this is particularly the case when program development is aided by environments, such as NVIDIA's Compute Unified Device Architecture (CUDA), which dramatically reduces the...

    Provided By North Carolina State University

  • White Papers // Jan 2009

    PFetch: Software Prefetching Exploiting Temporal Predictability of Memory Access Streams

    CPU speeds have increased faster than the rate of improvement in memory access latencies in the recent past. As a result, with programs that suffer excessive cache misses, the CPU will increasingly be stalled waiting for the memory system to provide the requested memory line. Prefetching is a latency hiding...

    Provided By North Carolina State University
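
Prefetching of the kind described above hinges on spotting predictable patterns in a memory access stream. The following toy stride detector is a sketch of that idea under stated assumptions; it is not PFetch's actual algorithm:

```python
def predict_next(accesses, depth=2):
    """If the last few accesses advance by a constant stride, predict
    (i.e., prefetch) the next `depth` addresses; otherwise predict
    nothing. A hypothetical sketch of stride-based prefetching."""
    if len(accesses) < 3:
        return []
    # Strides between the last three accesses.
    strides = [b - a for a, b in zip(accesses[-3:], accesses[-2:])]
    if strides[0] != strides[1] or strides[0] == 0:
        return []  # no stable, nonzero stride detected
    last, stride = accesses[-1], strides[0]
    return [last + stride * k for k in range(1, depth + 1)]

print(predict_next([100, 108, 116, 124]))  # [132, 140]
print(predict_next([100, 108, 90, 7]))     # [] (irregular stream)
```

Issuing the predicted loads early hides memory latency exactly when the stream is regular, and costs nothing when it is not, since no prediction is made.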

  • White Papers // Jun 2011

    A Fault Observant Real-Time Embedded Design for Network-on-Chip Control Systems

    Performance and time-to-market requirements cause many real-time designers to consider Commercial Off-The-Shelf (COTS) components for real-time systems. Massive multi-core embedded processors with Network-on-Chip (NoC) designs to facilitate core-to-core communication are becoming common in COTS. These architectures benefit real-time scheduling, but they also pose predictability challenges. In this...

    Provided By North Carolina State University

  • White Papers // Aug 2006

    Assertion-Based Microarchitecture Design for Improved Fault Tolerance

    Protection against transient faults is an important constraint in high-performance processor design. One strategy for achieving efficient reliability is to apply targeted fault checking/masking techniques to different units within an overall reliability regimen. In this paper, the authors propose a novel class of targeted fault checks that verify the functioning...

    Provided By North Carolina State University

  • White Papers // Aug 2006

    The State of ZettaRAM

    Computer architectures are heavily influenced by parameters imposed by memory technologies. Memory hierarchies, virtual memory, prefetching, multithreading, and large-window processors are some well-known examples of architectural innovations influenced by memory constraints. This paper surveys ZettaRAM, a nascent memory technology based on molecular electronics. From patents and papers, the authors distill...

    Provided By North Carolina State University

  • White Papers // Jan 2006

    Non-Uniform Program Analysis & Repeatable Execution Constraints: Exploiting Out-of-Order Processors in Real-Time Systems

    In this paper the authors enable easy, tight, and safe timing analysis of contemporary complex processors. They exploit the fact that out-of-order processors can be analyzed via simulation in the absence of variable control-flow. In their first technique, Non-Uniform Program Analysis (NUPA), program segments with a single flow of control...

    Provided By North Carolina State University

  • White Papers // Jan 2014

    Fair Caching in a Chip Multiprocessor Architecture

    In this paper, the authors present a detailed study of fairness in cache sharing between threads in a Chip Multi-Processor (CMP) architecture. Prior work in CMP architectures has only studied throughput optimization techniques for a shared cache. The issue of fairness in cache sharing, and its relation to throughput, has...
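
The abstract above leaves the fairness metric implicit. As a minimal illustrative sketch (the LRU model, the traces, and the metric here are assumptions, not the paper's method), one can compare each thread's miss count when sharing a cache against running on a dedicated partition of the same total size:

```python
from collections import OrderedDict

def simulate_lru(trace, capacity):
    """Count per-thread misses for a (thread_id, address) trace on an LRU cache."""
    cache, misses = OrderedDict(), {}
    for tid, addr in trace:
        key = (tid, addr)
        if key in cache:
            cache.move_to_end(key)         # hit: refresh recency
        else:
            misses[tid] = misses.get(tid, 0) + 1
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used
        cache[key] = True
    return misses

# Thread A cycles over 8 blocks, thread B over 2.
trace_a = [('A', i % 8) for i in range(64)]
trace_b = [('B', i % 2) for i in range(64)]

# Interleaved on one shared 4-block cache vs. dedicated 2-block partitions.
shared = [ref for pair in zip(trace_a, trace_b) for ref in pair]
shared_misses = simulate_lru(shared, capacity=4)
alone_a = simulate_lru(trace_a, capacity=2)['A']
alone_b = simulate_lru(trace_b, capacity=2)['B']
```

A fairness policy would then constrain how far each thread's shared-cache miss count may degrade relative to its dedicated-partition baseline.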

    Provided By North Carolina State University

  • White Papers // Jul 2009

    Core-Selectability in Chip Multiprocessors

    The centralized structures necessary for the extraction of Instruction-Level Parallelism (ILP) are consuming progressively smaller portions of the total die area of Chip Multi-Processors (CMP). The reason for this is that scaling these structures does not enhance general performance as much as scaling the cache and interconnect. However, the fact...

    Provided By North Carolina State University

  • White Papers // Mar 2012

    Low Contention Mapping of Real-Time Tasks onto a TilePro 64 Core Processor

    Predictability of task execution is paramount for real-time systems so that upper bounds of execution times can be determined via static timing analysis. Static timing analysis on Network-on-Chip (NoC) processors may result in unsafe underestimations when the underlying communication paths are not considered. This stems from contention on the underlying...
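
As a hypothetical sketch of why the mapping matters (the mesh, flows, and mappings below are invented for illustration, not taken from the paper), one can count how many task-pair flows share any NoC link under dimension-ordered X-Y routing:

```python
def xy_route(src, dst):
    """Links traversed by dimension-ordered (X-then-Y) routing on a 2D mesh."""
    (x, y), (dx, dy) = src, dst
    links = []
    while x != dx:
        nx = x + (1 if dx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dy:
        ny = y + (1 if dy > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links

def contention(flows, mapping):
    """Maximum number of flows sharing any single link under a task->tile mapping."""
    use = {}
    for a, b in flows:
        for link in xy_route(mapping[a], mapping[b]):
            use[link] = use.get(link, 0) + 1
    return max(use.values())

flows = [('t0', 't1'), ('t2', 't3')]   # two communicating task pairs
bad  = {'t0': (0, 0), 't1': (2, 0), 't2': (1, 0), 't3': (3, 0)}  # paths overlap in one row
good = {'t0': (0, 0), 't1': (1, 0), 't2': (0, 1), 't3': (1, 1)}  # disjoint rows
```

A low-contention mapping drives the maximum per-link flow count toward one, which is what makes static timing bounds on the interconnect tractable.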

    Provided By North Carolina State University

  • White Papers // Nov 2009

    CHOP: Adaptive Filter-Based DRAM Caching for CMP Server Platforms

    As manycore architectures enable a large number of cores on the die, a key challenge that emerges is the availability of memory bandwidth with conventional DRAM solutions. To address this challenge, integration of large DRAM caches that provide as much as 5...

    Provided By North Carolina State University

  • White Papers // Jan 2011

    HAQu: Hardware-Accelerated Queueing for Fine-Grained Threading on a Chip Multiprocessor

    Queues are commonly used in multithreaded programs for synchronization and communication. However, because software queues tend to be too expensive to support fine-grained parallelism, hardware queues have been proposed to reduce the overhead of communication between cores. Hardware queues require modifications to the processor core and need a custom interconnect...
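
HAQu itself is a hardware design, which a code listing cannot reproduce; as a point of reference, the kind of software queue it aims to accelerate can be sketched as a lock-free single-producer/single-consumer ring (an illustrative sketch, not the paper's mechanism):

```python
class SPSCRing:
    """Single-producer/single-consumer ring buffer: no lock is needed because
    the producer only writes `tail` and the consumer only writes `head`."""

    def __init__(self, capacity):
        # One slot is left empty so a full ring can be told apart from an empty one.
        self.buf = [None] * (capacity + 1)
        self.head = 0   # next slot to read  (consumer-owned)
        self.tail = 0   # next slot to write (producer-owned)

    def enqueue(self, item):
        nxt = (self.tail + 1) % len(self.buf)
        if nxt == self.head:
            return False            # full
        self.buf[self.tail] = item
        self.tail = nxt
        return True

    def dequeue(self):
        if self.head == self.tail:
            return None             # empty
        item = self.buf[self.head]
        self.head = (self.head + 1) % len(self.buf)
        return item
```

Even this lock-free variant pays for index arithmetic and shared-memory traffic on every operation, which is the per-access overhead a hardware queue removes.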

    Provided By North Carolina State University

  • White Papers // Jun 2009

    Architecture Support for Improving Bulk Memory Copying and Initialization Performance

    Bulk memory copying and initialization is one of the most ubiquitous operations performed in current computer systems by both user applications and operating systems. While many current systems rely on a loop of loads and stores, there are proposals to introduce a single instruction to perform bulk memory copying. While...
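
The two alternatives the abstract contrasts can be mimicked in software, with slice assignment standing in for a hypothetical single bulk-copy instruction (an analogy only, not the proposed hardware):

```python
def copy_loop(dst, src, n):
    """The 'loop of loads and stores' that compilers commonly emit today."""
    for i in range(n):
        dst[i] = src[i]

def copy_bulk(dst, src, n):
    """The semantics of a single bulk-copy instruction, modeled as one slice store."""
    dst[:n] = src[:n]

src = bytearray(b"hello world")
a, b = bytearray(11), bytearray(11)
copy_loop(a, src, 11)
copy_bulk(b, src, 11)
```

Both produce the same result; the architectural question is how much instruction-fetch, address-generation, and cache traffic the single-instruction form can save.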

    Provided By North Carolina State University

  • White Papers // Jul 2008

    Dynamic Thread Assignment on Heterogeneous Multiprocessor Architectures

    In a multi-programmed computing environment, threads of execution exhibit different runtime characteristics and hardware resource requirements. Not only do the behaviors of distinct threads differ, but each thread may also present diversity in its performance and resource usage over time. A heterogeneous Chip Multi-Processor (CMP) architecture consists of processor cores...

    Provided By North Carolina State University

  • White Papers // Jul 2008

    Exploiting Locality to Ameliorate Packet Queue Contention and Serialization

    Packet processing systems maintain high throughput despite relatively high memory latencies by exploiting the coarse-grained parallelism available between packets. In particular, multiple processors are used to overlap the processing of multiple packets. Packet queuing - the fundamental mechanism enabling packet scheduling, differentiated services, and traffic isolation - requires a read-modify-write...
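
One standard way to exploit locality here (sketched with invented field names; the paper's actual scheme may differ) is to pin every packet of a flow to one worker, so the flow's queue state stays in that worker's cache and its read-modify-write update never contends across cores:

```python
def flow_to_worker(packet, n_workers):
    """Map a packet to a worker by hashing its flow identifier."""
    return hash((packet['src'], packet['dst'])) % n_workers

packets = [
    {'src': 10, 'dst': 20, 'len': 60},
    {'src': 10, 'dst': 20, 'len': 1500},  # same flow as the first packet
    {'src': 11, 'dst': 20, 'len': 60},    # a different flow
]
workers = [flow_to_worker(p, n_workers=4) for p in packets]
```

Because packets of one flow always land on the same worker, the flow's queue pointer update serializes naturally on that core instead of bouncing between caches.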

    Provided By North Carolina State University

  • White Papers // Dec 2012

    Auto-Generation and Auto-Tuning of 3D Stencil Codes on Homogeneous and Heterogeneous GPU Clusters

    In this paper, the authors develop and evaluate search and optimization techniques for auto-tuning 3D stencil (nearest-neighbor) computations on GPUs. Observations indicate that parameter tuning is necessary for heterogeneous GPUs to achieve optimal performance with respect to a search space. Their proposed framework takes a concise specification of stencil...
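
The computation being tuned is a nearest-neighbor sweep. A minimal 7-point 3D stencil in plain Python (a sketch for orientation only; the GPU versions tile this loop nest, and those tile and thread-block sizes are exactly the parameters an auto-tuner searches over):

```python
def stencil_step(grid):
    """One 7-point nearest-neighbor averaging sweep over a cubic 3D grid.
    Interior points are updated; boundary values are copied unchanged."""
    n = len(grid)
    out = [[[grid[i][j][k] for k in range(n)] for j in range(n)] for i in range(n)]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            for k in range(1, n - 1):
                out[i][j][k] = (grid[i][j][k]
                                + grid[i - 1][j][k] + grid[i + 1][j][k]
                                + grid[i][j - 1][k] + grid[i][j + 1][k]
                                + grid[i][j][k - 1] + grid[i][j][k + 1]) / 7.0
    return out

# A uniform field is a fixed point of the averaging stencil.
n = 3
grid = [[[1.0] * n for _ in range(n)] for _ in range(n)]
out = stencil_step(grid)
```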

    Provided By North Carolina State University

  • White Papers // Aug 2011

    Memory Trace Compression and Replay for SPMD Systems using Extended PRSDs

    Concurrency levels in large-scale supercomputers are rising exponentially, and shared-memory nodes with hundreds of cores and non-uniform memory access latencies are expected within the next decade. However, even current petascale systems with tens of cores per node suffer from memory bottlenecks. As core counts increase, memory issues will become critical...
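
The core idea behind RSD-based trace compression can be sketched for the flat case (the paper's extended PRSDs add nesting on top of this; the code below is an illustrative simplification, not the paper's implementation): runs of strided addresses collapse into (start, stride, count) descriptors.

```python
def compress_rsd(addresses):
    """Greedily collapse an address trace into (start, stride, count) descriptors."""
    rsds, i = [], 0
    while i < len(addresses):
        start, count = addresses[i], 1
        stride = addresses[i + 1] - start if i + 1 < len(addresses) else 0
        while (i + count < len(addresses)
               and addresses[i + count] == start + count * stride):
            count += 1
        rsds.append((start, stride, count))
        i += count
    return rsds

def expand(rsds):
    """Invert the compression: regenerate the original trace for replay."""
    return [start + k * stride for start, stride, count in rsds for k in range(count)]
```

Lossless round-tripping is what allows the compressed form to drive replay instead of the raw trace.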

    Provided By North Carolina State University

  • White Papers // Jan 2010

    Data-Intensive Document Clustering on GPU Clusters

    Document clustering is a central method to mine massive amounts of data. Due to the explosion of raw documents generated on the Internet and the necessity to analyze them efficiently in various intelligent information systems, clustering techniques have reached their limitations on single processors. Instead of single processors, general purpose...
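
The kernel that runs into single-processor limits is typically k-means over term vectors. A tiny sequential version (illustrative only; the paper's contribution is moving this onto GPU clusters) shows the distance and centroid-update steps that parallelize across documents:

```python
def kmeans(docs, centroids, iters=10):
    """Plain k-means over term-count vectors (lists of numbers)."""
    for _ in range(iters):
        # Assignment step: each document joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for d in docs:
            dists = [sum((a - b) ** 2 for a, b in zip(d, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(d)
        # Update step: recompute each centroid as the mean of its cluster.
        centroids = [[sum(col) / len(cl) for col in zip(*cl)] if cl else centroids[i]
                     for i, cl in enumerate(clusters)]
    return centroids, clusters

# Two obvious groups along each axis of a 2-term vocabulary.
docs = [[1, 0], [2, 0], [0, 1], [0, 2]]
centroids, clusters = kmeans(docs, [[1, 0], [0, 1]])
```

The assignment step is embarrassingly parallel over documents, which is what makes it a natural fit for GPUs.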

    Provided By North Carolina State University

  • White Papers // Jan 2014

    Communication Characteristics of Large-Scale Scientific Applications for Contemporary Cluster Architectures

    In this paper, the authors examine the explicit communication characteristics of several sophisticated scientific applications, which, by themselves, constitute a representative suite of publicly available benchmarks for large cluster architectures. By focusing on the Message Passing Interface (MPI) and by using hardware counters on the microprocessor, they observe each application's...
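
The kind of per-application characterization described can be sketched over a toy event trace (the trace format and power-of-two size bucketing are assumptions for illustration; the paper worked from MPI instrumentation and hardware counters):

```python
from collections import Counter

def characterize(trace):
    """Tally the MPI call mix and bucket point-to-point message sizes
    into power-of-two bins, a common way to summarize communication."""
    calls = Counter(ev['call'] for ev in trace)
    size_bins = Counter()
    for ev in trace:
        if ev['call'] in ('MPI_Send', 'MPI_Recv'):
            # Round the payload size up to the next power of two.
            size_bins[1 << (ev['bytes'] - 1).bit_length()] += 1
    return calls, size_bins

trace = [
    {'call': 'MPI_Send', 'bytes': 1024},
    {'call': 'MPI_Recv', 'bytes': 1024},
    {'call': 'MPI_Send', 'bytes': 1500},
    {'call': 'MPI_Allreduce', 'bytes': 8},
]
calls, size_bins = characterize(trace)
```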

    Provided By North Carolina State University

  • White Papers // Feb 2014

    Understanding the Tradeoffs between Software-Managed vs. Hardware-Managed Caches in GPUs

    On-chip caches are commonly used in computer systems to hide long off-chip memory access latencies. To manage on-chip caches, either software-managed or hardware-managed schemes can be employed. State-of-the-art accelerators, such as the NVIDIA Fermi or Kepler GPUs and Intel's forthcoming MIC "Knights Landing" (KNL), support both software-managed caches, a.k.a. shared...

    Provided By North Carolina State University

  • White Papers // Jun 2012

    CuNesl: Compiling Nested Data-Parallel Languages for SIMT Architectures

    Data-parallel languages feature fine-grained parallel primitives that can be supported by compilers targeting modern many-core architectures where data parallelism must be exploited to fully utilize the hardware. Previous research has focused on converting data-parallel languages for SIMD (Single Instruction Multiple Data) architectures. However, directly applying them to today's SIMT (Single...
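
The classic transformation for nested data parallelism, which this line of work builds on, is flattening: nested sequences become flat data plus a segment descriptor, and per-segment operations run over the flat form (a simplified sketch, not CuNesl's actual code generation):

```python
def flatten(nested):
    """Flatten a nested sequence into flat data plus a segment descriptor,
    the standard step for mapping nested data parallelism onto flat hardware."""
    data = [x for seg in nested for x in seg]
    seg_lens = [len(seg) for seg in nested]
    return data, seg_lens

def segmented_sum(data, seg_lens):
    """Per-segment sums over the flat representation (the reduction each
    SIMT thread block would perform on its segment)."""
    sums, i = [], 0
    for n in seg_lens:
        sums.append(sum(data[i:i + n]))
        i += n
    return sums

data, seg_lens = flatten([[1, 2, 3], [4], [], [5, 6]])
sums = segmented_sum(data, seg_lens)
```

Because the flat arrays have uniform structure regardless of how irregular the nesting was, they map directly onto SIMT warps.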

    Provided By North Carolina State University

  • White Papers // Jan 2012

    ScalaBenchGen: Auto-Generation of Communication Benchmarks Traces

    Benchmarks are essential for evaluating HPC hardware and software for petascale machines and beyond. But benchmark creation is a tedious manual process. As a result, benchmarks tend to lag behind the development of complex scientific codes. This work contributes an automated approach to the creation of communication benchmarks. Given an...

    Provided By North Carolina State University
