University of Tehran

  • White Papers // May 2014

    Model and Complexity Results for Tree Traversals on Hybrid Platform

    The authors study the complexity of traversing tree-shaped workflows whose tasks require large I/O files. They target a heterogeneous architecture with two resources of different types, each equipped with its own memory, such as a multicore node equipped with a dedicated accelerator (FPGA or GPU). Tasks in the work...

    Provided By University of Tehran

  • White Papers // Oct 2013

    An Improved Parallel Singular Value Algorithm and Its Implementation for Multicore Hardware

    The enormous gap between the high-performance capabilities of today's CPUs and off-chip communication poses extreme challenges to the development of numerical software that is scalable and achieves high performance. In this paper, the authors describe a successful methodology to address these challenges - starting with their algorithm design, through kernel...

    Provided By University of Tehran

  • White Papers // Oct 2013

    Designing LU-QR Hybrid Solvers for Performance and Stability

    In this paper, the authors introduce hybrid LU-QR algorithms for solving dense linear systems of the form Ax = b. Throughout a matrix factorization, these algorithms dynamically alternate LU with local pivoting and QR elimination steps, based upon some robustness criterion. LU elimination steps can be very efficiently parallelized, and...

    Provided By University of Tehran

  • White Papers // Sep 2013

    Implementing a Systolic Algorithm for QR Factorization on Multicore Clusters With PaRSEC

    In this paper, the authors introduce a new systolic algorithm for QR factorization, and its implementation on a supercomputing cluster of multicore nodes. The algorithm targets a virtual 3D-array and requires only local communications. The implementation of the algorithm uses threads at the node level, and MPI for inter-node communications....

    Provided By University of Tehran

  • White Papers // Sep 2013

    Scalable Dense Linear Algebra on Heterogeneous Hardware

    The design of systems exceeding 1 Pflop/s and the push toward 1 Eflop/s forced a dramatic shift in hardware design. Various physical and engineering constraints resulted in the introduction of massive parallelism and functional hybridization with the use of accelerator units. This paradigm change brings about a serious challenge for application developers,...

    Provided By University of Tehran

  • White Papers // Sep 2013

    Assessing the Impact of ABFT & Checkpoint Composite Strategies

    Algorithm-specific fault tolerant approaches promise unparalleled scalability and performance in failure-prone environments. With the advances in the theoretical and practical understanding of algorithmic traits enabling such approaches, a growing number of frequently used algorithms (including all widely used factorization kernels) have been proven capable of such properties. These algorithms provide...

    Provided By University of Tehran

  • White Papers // Jul 2013

    On Algorithmic Variants of Parallel Gaussian Elimination: Comparison of Implementations in Terms of Performance and Numerical Properties

    Gaussian elimination is a canonical linear algebra procedure for solving linear systems of equations. In the last few years, the algorithm has received a lot of attention in an attempt to improve its parallel performance. This paper surveys recent developments in parallel implementations of Gaussian elimination. Five different flavors are...

    Provided By University of Tehran

  • White Papers // Jun 2013

    BlackjackBench: Portable Hardware Characterization with Automated Results' Analysis

    Compilers, autotuners, numerical libraries and other performance sensitive software need information about the underlying hardware. If portable performance is a goal, automatic detection of hardware characteristics is necessary given the dramatic changes undergone by computer hardware. Several system benchmarks exist in the literature. However, as hardware becomes more complex, new...

    Provided By University of Tehran

  • White Papers // Jun 2013

    Transient Error Resilient Hessenberg Reduction on GPU-Based Hybrid Architectures

    Graphics Processing Units (GPUs) are gaining widespread usage in the field of scientific computing owing to the performance boost GPUs bring to computation-intensive applications. The typical configuration is to integrate GPUs and CPUs in the same system where the CPUs handle the control flow and part of the...

    Provided By University of Tehran

  • White Papers // Jun 2013

    On the Combination of Silent Error Detection and Checkpointing

    In this paper, the authors revisit traditional checkpointing and rollback recovery strategies, with a focus on silent data corruption errors. Unlike fail-stop failures, such latent errors cannot be detected immediately, and a mechanism to detect them must be provided. They consider two models: errors are detected after some delays...

    Provided By University of Tehran

  • White Papers // Apr 2013

    Dynamically Balanced Synchronization-Avoiding LU Factorization With Multicore and GPUs

    Graphics Processing Units (GPUs) brought huge performance improvements in the scientific and numerical fields. The authors present an efficient hybrid CPU/GPU computing approach that is portable, dynamically and efficiently balances the workload between the CPUs and the GPUs, and avoids data transfer bottlenecks that are frequently present in numerical algorithms....

    Provided By University of Tehran

  • White Papers // Apr 2013

    An Evaluation of User-Level Failure Mitigation Support in MPI

    As the scale of computing platforms becomes increasingly extreme, the requirements for application fault tolerance are increasing as well. Techniques to address this problem by improving the resilience of algorithms have been developed, but they currently receive no support from the programming model, and without such support, they are bound...

    Provided By University of Tehran

  • White Papers // Apr 2013

    Beyond the CPU: Hardware Performance Counter Monitoring on Blue Gene/Q

    The Blue Gene/Q (BG/Q) system is the third generation in the IBM Blue Gene line of massively parallel, energy efficient supercomputers that increases not only in size but also in complexity compared to its Blue Gene (BG) predecessors. Consequently, gaining insight into the intricate ways in which software and hardware...

    Provided By University of Tehran

  • White Papers // Apr 2013

    MuMMI: Multiple Metrics Modeling Infrastructure

    The MuMMI (Multiple Metrics Modeling Infrastructure) project is an infrastructure that facilitates systematic measurement, modeling, and prediction of performance, power consumption and performance-power tradeoffs for parallel systems. In this paper, the authors present the MuMMI framework, which consists of an Instrumentor, Databases and Analyzer. The MuMMI instrumentor provides for automatic...

    Provided By University of Tehran

  • White Papers // Feb 2013

    clMAGMA: High Performance Dense Linear Algebra with OpenCL

    This paper presents the design and implementation of several fundamental Dense Linear Algebra (DLA) algorithms in OpenCL. In particular, these are linear system solvers and eigenvalue problem solvers. Further, the authors give an overview of the clMAGMA library, an open source, high performance OpenCL library that incorporates the developments presented,...

    Provided By University of Tehran

  • White Papers // Feb 2013

    Multi-Criteria Checkpointing Strategies: Optimizing Response-Time Versus Resource Utilization

    Failures are increasingly threatening the efficiency of HPC systems, and current projections of exascale platforms indicate that rollback recovery, the most convenient method for providing fault tolerance to general-purpose applications, reaches its own limits at such scales. One of the reasons explaining this unnerving situation comes from the focus that...

    Provided By University of Tehran

  • White Papers // Feb 2013

    Non-Determinism and Overcount on Modern Hardware Performance Counter Implementations

    Ideal hardware performance counters provide exact deterministic results. Real-world Performance Monitoring Unit (PMU) implementations do not always live up to this ideal. Events that should be exact and deterministic (such as retired instructions) show run-to-run variation and overcount on x86_64 machines, even when run in strictly controlled environments. These...

    Provided By University of Tehran

  • White Papers // Oct 2012

    Virtual Systolic Array for QR Decomposition

    Systolic arrays offer a very attractive, data centric, execution model as an alternative to the von Neumann architecture. Hardware implementations of systolic arrays turned out not to be viable solutions in the past. This paper shows how the systolic design principles can be applied to a software solution to deliver...

    Provided By University of Tehran

  • White Papers // Oct 2012

    Performance Evaluation of LU Factorization Through Hardware Counter Measurements

    The growing demand for scalable and effective scientific and numerical libraries on multicore architectures forces hardware manufacturers to design solutions that improve both the processor speed and transfer rates between their memory hierarchies. Several studies show that these improvement factors are disproportionate and may vary widely from one architecture to...

    Provided By University of Tehran

  • White Papers // Sep 2012

    Implementing a Blocked Aasen's Algorithm With a Dynamic Scheduler on Multicore Architectures

    Factorization of a dense symmetric indefinite matrix is a key computational kernel in many scientific and engineering simulations. However, it is difficult to develop a scalable factorization algorithm that guarantees numerical stability through pivoting and takes advantage of the symmetry at the same time. This is because such an algorithm...

    Provided By University of Tehran

  • White Papers // Sep 2012

    Fidelity-Aware Utilization Control for Cyber-Physical Surveillance Systems

    Recent years have seen the growing deployment of Cyber-Physical Systems (CPSs) in many mission-critical applications such as security, civil infrastructure, and transportation. These applications often impose stringent requirements on system sensing fidelity and timeliness. However, existing approaches treat these two concerns in isolation and hence are not suitable for CPSs...

    Provided By University of Tehran

  • White Papers // Sep 2012

    Measuring Energy and Power With PAPI

    Energy and power consumption are becoming critical metrics in the design and usage of high performance systems. The authors have extended the Performance API (PAPI) analysis library to measure and report energy and power values. These values are reported using the existing PAPI API, allowing code previously instrumented for performance...

    Provided By University of Tehran

  • White Papers // Sep 2012

    Anatomy of a Globally Recursive Embedded LINPACK Benchmark

    The authors present a complete bottom-up implementation of an embedded LINPACK benchmark on the iPad 2. They use a novel formulation of a recursive LU factorization that is recursive and parallel at the global scope. They believe their new algorithm presents an alternative to existing linear algebra parallelization techniques such...

    Provided By University of Tehran

  • White Papers // Aug 2012

    Weighted Block-Asynchronous Iteration on GPU-Accelerated Systems

    In this paper, the authors analyze the potential of using weights for block-asynchronous relaxation methods on GPUs. For this purpose, they introduce different weighting techniques similar to those applied in block-smoothers for multigrid methods. For test matrices taken from the University of Florida Matrix Collection they report the convergence behavior...

    Provided By University of Tehran

  • White Papers // Aug 2012

    High Performance Dense Linear System Solver With Resilience to Multiple Soft Errors

    In the multi-petaflop era for supercomputers, the number of computing cores is growing exponentially. However, as integrated circuit technology scales below 65 nm, the critical charge required to flip a gate or a memory cell has been dangerously reduced, causing a higher cosmic-radiation-induced soft error rate. Soft error threatens computing system...

    Provided By University of Tehran

  • White Papers // Jul 2012

    Power Aware Computing on GPUs

    Energy and power density concerns in modern processors have led to significant computer architecture research efforts in power-aware and temperature-aware computing. With power dissipation becoming an increasingly vexing problem, power analysis of Graphical Processing Unit (GPU) and its components has become crucial for hardware and software system design. Here, the...

    Provided By University of Tehran

  • White Papers // Jul 2012

    Energy Footprint of Advanced Dense Numerical Linear Algebra Using Tile Algorithms on Multicore Architecture

    The authors propose to study the impact on the energy footprint of two advanced algorithmic strategies in the context of high performance dense linear algebra libraries: mixed precision algorithms with iterative refinement make it possible to run at the peak performance of single precision floating-point arithmetic while achieving double precision accuracy and...

    Provided By University of Tehran

  • White Papers // Jun 2012

    PAPI-V: Performance Monitoring for Virtual Machines

    Cloud computing involves use of a hosted computational environment that can provide elastic compute and storage services on demand. Virtualization is a technology that allows multiple Virtual Machines (VMs) to run on a single physical machine and share its resources. Virtualization is increasingly being used in cloud computing to provide...

    Provided By University of Tehran

  • White Papers // Jun 2012

    A Scalable Framework for Heterogeneous GPU-Based Clusters

    GPU-based heterogeneous clusters continue to draw attention from vendors and HPC users due to their high energy efficiency and much improved single-node computational performance; however, there is little parallel software available that can utilize all CPU cores and all GPUs on the heterogeneous system efficiently. On a heterogeneous cluster, the...

    Provided By University of Tehran

  • White Papers // Jun 2012

    Electricity Bill Capping for Cloud-Scale Data Centers that Impact the Power Markets

    Minimizing the energy consumption of data centers has been researched extensively. However, much less attention is given to a related but different research topic: minimizing the electricity bill of a network of data centers by leveraging different electricity prices in different geographical locations to distribute workloads among those locations. Initial...

    Provided By University of Tehran

  • White Papers // May 2012

    A Checkpoint-on-Failure Protocol for Algorithm-Based Recovery in Standard MPI

    Most predictions of Exa-scale machines picture billion-way parallelism, encompassing not only millions of cores, but also tens of thousands of nodes. Even considering extremely optimistic advances in hardware reliability, probabilistic amplification entails that failures will be unavoidable. Consequently, software fault tolerance is paramount to maintain future scientific productivity. Two...

    Provided By University of Tehran

  • White Papers // May 2012

    Behavior Dynamics in Cognitive Radio Networks: An Interacting Particle System Approach

    A key feature of a cognitive radio network is the intelligence of secondary users, which can collaborate to improve performance. The collaboration in terms of channel recommendation is studied. Via recommendations, the channel preferences of secondary users become dynamic. The corresponding behavior dynamics of secondary users are studied. Particularly, the...

    Provided By University of Tehran

  • White Papers // May 2012

    An Efficient Multiple Access Scheme for Voltage Control in Smart Grid Using WiMAX

    An efficient WiMAX-based multiple access scheme for the state updating in voltage control is introduced. The scheme, called sample-contention scheme, addresses the problem of optimally bringing deviated voltage to a predefined reference level with minimum communication resources by taking advantage of the sparseness of voltage disturbance. An n-sample interval is...

    Provided By University of Tehran

  • White Papers // May 2012

    An Efficient Distributed Randomized Solver With Application to Large Dense Linear Systems

    Randomized algorithms are gaining ground in high performance computing applications as they have the potential to outperform deterministic methods, while still providing accurate results. In this paper, the authors propose a randomized algorithm for distributed multicore architectures to efficiently solve large dense symmetric indefinite linear systems that are encountered, for...

    Provided By University of Tehran

  • White Papers // Mar 2012

    Channel Allocation Under Uncertain Primary Users for Delay Sensitive Secondary Users

    To reduce the channel switching frequency caused by uncertain and frequent primary users' interruptions, the authors design a nonparametric channel allocation algorithm which can effectively manage the spectrum resources in distributed cognitive radio networks and serve delay-sensitive data traffic. In the proposed algorithm, an infinite Gaussian mixture model is...

    Provided By University of Tehran

  • White Papers // Jan 2012

    Enabling Application Resilience With and Without the MPI Standard

    As recent research has demonstrated, it is becoming a necessity for large scale applications to have the ability to tolerate process failure during an execution. As the number of processes increases, checkpoint/restart fault tolerance approaches requiring large concurrent state checkpointing become untenable and radically new methods to address fault tolerance...

    Provided By University of Tehran

  • White Papers // Jan 2012

    Optimizing Memory-Bound Numerical Kernels on GPU Hardware Accelerators

    Hardware accelerators are becoming ubiquitous in high performance scientific computing. They are capable of delivering an unprecedented level of concurrent execution contexts. High-level programming languages (e.g., CUDA) and profiling tools (e.g., PAPI-CUDA, CUDA Profiler) are paramount to improve productivity, while effectively exploiting the underlying hardware. The authors present an optimized numerical...

    Provided By University of Tehran

  • White Papers // Dec 2011

    GPU-Accelerated Asynchronous Error Correction for Mixed Precision Iterative Refinement

    In hardware-aware high performance computing, block-asynchronous iteration and mixed precision iterative refinement are two techniques that are applied to leverage the computing power of SIMD accelerators like GPUs. Although they use a very different approach for this purpose, they share the basic idea of compensating the convergence behavior of an...

    Provided By University of Tehran

  • White Papers // Dec 2011

    Block-Asynchronous Multigrid Smoothers for GPU-Accelerated Systems

    This paper explores the need for asynchronous iteration algorithms as smoothers in multi-grid methods. The hardware target for the new algorithms is top-of-the-line, highly parallel hybrid architectures - multicore-based systems enhanced with GPGPUs. These architectures are the most likely candidates for future high-end supercomputers. To pave the road for their...

    Provided By University of Tehran

  • White Papers // Dec 2011

    GreenWare: Greening Cloud-Scale Data Centers to Maximize the Use of Renewable Energy

    To reduce the negative environmental implications (e.g., CO2 emission and global warming) caused by the rapidly increasing energy consumption, many Internet service operators have started taking various initiatives to operate their cloud-scale data centers with renewable energy. Unfortunately, due to the intermittent nature of renewable energy sources such as wind...

    Provided By University of Tehran

  • White Papers // Jul 2010

    Analysis of Dynamically Scheduled Tile Algorithms for Dense Linear Algebra on Multicore Architectures

    This paper analyzes the dynamic scheduling of dense linear algebra algorithms on shared-memory, multicore architectures. Current numerical libraries, e.g., LAPACK, show clear limitations on such emerging systems mainly due to their coarse granularity tasks. Thus, many numerical algorithms need to be redesigned to better fit the architectural design...

    Provided By University of Tehran

  • White Papers // Apr 2010

    DAGuE: A Generic Distributed DAG Engine for High Performance Computing

    The frenetic development of the current architectures places a strain on the current state-of-the-art programming environments. Harnessing the full potential of such architectures has been a tremendous task for the whole scientific computing community. The authors present DAGuE, a generic framework for architecture aware scheduling and management of micro-tasks on...

    Provided By University of Tehran

  • White Papers // Apr 2010

    A Comprehensive Study of Task Coalescing for Selecting Parallelism Granularity in a Two-Stage Bidiagonal Reduction

    The authors present new high performance numerical kernels combined with advanced optimization techniques that significantly increase the performance of parallel bi-diagonal reduction. Their approach is based on developing efficient fine-grained computational tasks as well as reducing overheads associated with their high-level scheduling during the so-called bulge chasing procedure that is...

    Provided By University of Tehran

  • White Papers // Oct 2011

    Hierarchical QR Factorization Algorithms for Multi-Core Cluster Systems

    This paper describes a new QR factorization algorithm which is especially designed for massively parallel platforms combining parallel distributed multi-core nodes. These platforms represent the present and the foreseeable future of high-performance computing. The authors' new QR factorization algorithm falls in the category of the tile algorithms which naturally enables...

    Provided By University of Tehran

  • White Papers // Aug 2010

    Analytical Modeling and Optimization for Affinity Based Thread Scheduling on Multicore Systems

    In this paper the authors propose an analytical model to estimate the cost of running an affinity-based thread schedule on multicore systems. The model consists of three sub-models to evaluate the cost of executing a thread schedule: an affinity-graph submodel, a memory hierarchy submodel, and a cost submodel that characterize...

    Provided By University of Tehran

  • White Papers // May 2007

    L2 Cache Modeling for Scientific Applications on Chip Multi-Processors

    It is critical to provide high performance for scientific applications running on Chip Multi-Processors (CMP). A CMP architecture often comprises a shared L2 cache and lower-level storage. The shared L2 cache can reduce the number of cache misses if the data are accessed in common by several threads, but it...

    Provided By University of Tehran

  • White Papers // Sep 2006

    Modeling of L2 Cache Behavior for Thread-Parallel Scientific Programs on Chip Multi-Processors

    It is critical to provide high performance for scientific programs running on a Chip Multi-Processor (CMP). A CMP architecture often has a shared L2 cache and lower storage hierarchy. The shared L2 cache can reduce the number of cache misses if the data are commonly shared by several threads, but...

    Provided By University of Tehran

  • White Papers // May 2011

    High Performance Bidiagonal Reduction using Tile Algorithms on Homogeneous Multicore Architectures

    In this paper the authors present a new high performance Bidiagonal ReDuction (BRD) on homogeneous multicore architectures. This paper is an extension of the high performance tridiagonal reduction implemented by the same authors (IPDPS 2011) to the BRD case. The BRD is the first step toward computing the singular value...

    Provided By University of Tehran

  • White Papers // Feb 2009

    A GIS-Based Traffic Control Strategy Planning at Urban Intersections

    To provide better and more up-to-date access to traffic information, spatio-temporal GIS for Transportation (TGIS-T) needs to interact with Intelligent Transport Systems (ITS). Advanced Traffic Management Systems (ATMS), one of the components of ITS, predict traffic congestion and provide real-time traffic information and optimal control...

    Provided By University of Tehran

  • White Papers // Mar 2009

    Online Network-on-Chip Switch Fault Detection and Diagnosis Using Functional Switch Faults

    This paper presents efficient methods for online fault detection and diagnosis of Network-on-Chip (NoC) switches. The fault model considered in this research is a system level fault model based on the generic properties of NoC switch functionality. The proposed method is evaluated by fault simulation in a platform using this...

    Provided By University of Tehran

  • White Papers // Mar 2009

    An Efficient Dynamic Multicast Routing Protocol for Distributing Traffic in NOCs

    Nowadays, in MPSoCs and NoCs, multicast protocols are widely used in many parallel applications, such as cache coherency in distributed shared-memory architectures, clock synchronization, replication, or barrier synchronization. Among the several multicast schemes proposed for on-chip interconnection networks, the path-based multicast scheme has been proven to be more efficient than the...

    Provided By University of Tehran

  • White Papers // Jun 2009

    A New Model for Discovering XML Association Rules From XML Documents

    The inherent flexibility of XML in both structure and semantics makes mining XML data a complex task with more challenges than traditional association rule mining in relational databases. This paper proposes a new model for the effective extraction of generalized association rules from an XML document collection. This...

    Provided By University of Tehran

  • White Papers // Jun 2009

    Approaches and Schemes for Storing DTD-Independent XML Data in Relational Databases

    The volume of XML data exchange is increasing explosively, and the need for efficient XML data management mechanisms is vital. Many XML storage models have been proposed for storing DTD-independent XML documents in relational database systems. Benchmarking is the best way to highlight the pros and cons of different approaches....

    Provided By University of Tehran

  • White Papers // Apr 2009

    Decision Making Under Uncertain And Risky Situations

    Decision making is certainly the most important task of a manager, and it is often a very difficult one. The domain of decision analysis models falls between two extreme cases, depending on the degree of knowledge we have about the outcomes of our actions. One "Pole" on this scale...

    Provided By University of Tehran

  • White Papers // Feb 2010

    New Method To Evaluate Financial Performance Of Companies By Fuzzy Logic: Case Study, Drug Industry Of Iran

    Nowadays, one of the main reasons that prevent investors from entering emerging markets is the uncertainty of these markets. There are two common techniques for choosing a suitable portfolio in the stock market: technical analysis and fundamental analysis. One of the most important parts of fundamental analysis is financial analysis. Different methods...

    Provided By University of Tehran

  • White Papers // Sep 2010

    Soft Real-Time Fuzzy Task Scheduling for Multiprocessor Systems

    All practical real-time scheduling algorithms in multiprocessor systems present a trade-off between their computational complexity and performance. In real-time systems, tasks have to be performed correctly and on time. Finding a minimal schedule in multiprocessor systems with real-time constraints is shown to be NP-hard. Although some optimal algorithms have been employed in...

    Provided By University of Tehran

  • White Papers // Jun 2009

    Anomaly Detection Using Neuro Fuzzy System

    As network-based technologies become omnipresent, the demand to secure networks and systems against threats increases. One effective way to achieve higher security is through the use of Intrusion Detection Systems (IDS), which are software tools that detect anomalies in a computer or network. In this paper, an IDS...

    Provided By University of Tehran

  • White Papers // Feb 2011

    Quasi-Optimal Network Utility Maximization for Scalable Video Streaming

    This paper addresses rate control for transmission of scalable video streams via Network Utility Maximization (NUM) formulation. Due to stringent QoS requirements of video streams and specific characterization of utility experienced by end-users, NUM formulation for these streams is nonconvex and even nonsmooth, hence making dual methods often incompetent to...

    Provided By University of Tehran

  • White Papers // Feb 2009

    Delay Performance Optimization for Multiuser Diversity Systems With Bursty-Traffic and Heterogeneous Wireless Links

    This paper presents a cross-layer approach for optimizing the delay performance of a multiuser diversity system with heterogeneous block-fading channels and delay-sensitive bursty traffic. The authors consider the downlink of a time-slotted multiuser system employing opportunistic scheduling with fair performance at the Medium ACcess (MAC) layer and Adaptive Modulation and...

    Provided By University of Tehran

  • White Papers // Dec 2010

    A Pattern Language for Software Debugging

    In spite of all the advances in software testing, debugging remains a labor-intensive, manual, time-consuming, and error-prone process. A candidate solution for enhancing the debugging process is to fuse it with the testing process. To achieve this integration, a possible solution may be categorizing common software tests and errors followed by...

    Provided By University of Tehran

  • White Papers // Jun 2009

    Using Automatic Ontology Learning Methods in Human Plausible Reasoning Based Systems

    Human Plausible Reasoning (HPR) theory is based on the way humans think and reason. Given a knowledge base, it describes which inference patterns humans plausibly use to reach an answer. This theory mainly consists of two parts. In the first part it describes the kind...

    Provided By University of Tehran

  • White Papers // Jun 2009

    Using Dempster-Shafer Theory in XML Information Retrieval

    XML is a markup language which is becoming the standard format for information representation and data exchange. A major purpose of XML is the explicit representation of the logical structure of a document. Much research has been performed to exploit logical structure of documents in information retrieval in order to...

    Provided By University of Tehran