2014 Program

SELSE-10

SELSE is an informal workshop. To encourage widespread participation, authors are given the option (but are not required) to make their papers or presentations available on this web site. We thank all of the authors for their participation.
 

Day 1 – April 1st, 2014 – Stanford University

08:00 – 08:45 Breakfast and Registration
08:45 – 09:00 Welcome Remarks: SELSE General and Program Chairs
09:00 – 10:00 Session I: Keynote Speech (Chair: Dimitris Gizopoulos (University of Athens))
Resilience for ExaScale
Bill Dally (Nvidia / Stanford University)
 
10:00 – 10:20 Coffee Break
10:20 – 12:00 Session II: Error Characterization (Chair: Helia Naeimi (Intel))
A Reference Design for Effective Characterization of Soft Error Vulnerability of Low Voltage Logic and Memory Circuits,
Robert Pawlowski (Oregon State University), Joseph Crop (Oregon State University), Minki Cho (Intel), James Tschanz (Intel), Vivek De (Intel), Shekhar Borkar (Intel), Thomas Fairbanks (Los Alamos National Laboratory), Heather Quinn (Los Alamos National Laboratory), Patrick Chiang (Oregon State University)
Effects of Single-Event Multiple-Transients on Logic-Sensitive Standard Cell Placement,
Bradley T. Kiddie (Vanderbilt University), William H. Robinson (Vanderbilt University)
An Automated Test Infrastructure for NBTI Effect Investigation and Calibration in digital integrated circuits,
Weiyi Qi (North Carolina State University), Eric J. Wyers (North Carolina State University), Zhuo Yan (North Carolina State University), Paul Franzon (North Carolina State University)
Single Event Effects in Muller C-Elements and Asynchronous Circuits Over a Wide Energy Spectrum,
Lorena Anghel (TIMA), Varadan Savulimedu Veeravalli (TU Vienna), Dan Alexandrescu (iRoC), Andreas Steininger (TU Vienna), Kerstin Schneider-Hornstein (TU Vienna), Enrico Costenaro (iRoC)
12:00 – 13:00 Lunch
13:00 – 14:15 Session III: Panel Discussion (Chair: Yanos Sazeides (University of Cyprus))
Topic: All the Reliability Issues Have Been Resolved for Late CMOS Technologies
Moderator: TBA
Panelists:
Norbert Seifert (Intel),
Charles Slayman (Cisco),
Vilas Sridharan (AMD),
Vikas Chandra (ARM)
14:15 – 15:30 Session IV: Poster Session 1 and Coffee Break (Chair: Alan Wood (Oracle))

Experiences in Developing and Evaluating a Low-Cost Soft-Error-Tolerant Multicore Processor,

John Ingalls (Duke University), Adam Jacobvitz (Duke University), Patrick Eibl (Duke University), Michael Ansel (Duke University), Daniel Sorin (Duke University)

An Energy-Efficient Radiation Hardened Register File Architecture for Reliable Microprocessors,

Yang Lin, Mark Zwolinski and Basel Halak (University of Southampton)

Reducing the Code Degree Of Parallelism to Increase GPUs Reliability,

Paolo Rech (UFRGS), Luigi Carro (UFRGS)

Evaluating Application Resilience with XRay,

Sui Chen (Louisiana State University), Greg Bronevetsky (Lawrence Livermore National Lab), Bin Li (Louisiana State University), Marc Casas (Barcelona Supercomputing Center), Lu Peng (Louisiana State University)

Evaluating Software-Based Fault Detection Techniques Applied at Different Programming Language Abstraction Levels,

Eduardo Chielle (UFRGS), José Rodrigo Azambuja (FURG), Fernanda Lima Kastensmidt (UFRGS)

Energy Efficiency Benefits of Reducing the Voltage Guardband on the Kepler GPU Architecture,

Jingwen Leng (University of Texas at Austin), Yazhou Zu (University of Texas at Austin), Vijay Janapa Reddi (University of Texas at Austin)

Controlled Approximation via Symptom Detection,

Daya S Khudia (University of Michigan), Scott Mahlke (University of Michigan)
15:30 – 16:45 Session V: Processor Systems (Chair: Vikas Chandra (ARM))
Soft Error Study of ARM SoC at 28 Nanometers,
Austin Lesea (Xilinx), Wojciech Koszek (Xilinx), Glenn Steiner (Xilinx), Gary Swift (Xilinx), Dagan White (Xilinx)
 
Performance Assessment of Data Prefetchers in High Error Rate Technologies,
Nikos Foutris (University of Athens), Dimitris Gizopoulos (University of Athens), Athanasios Chatzidimitriou (University of Athens), John Kalamatianos (AMD), Vilas Sridharan (AMD)
 
Resilience and Real-Time Constrained Energy Optimization in Embedded Processor Systems,
Liang Wang (IBM T. J. Watson Research Center and University of Virginia), Jude A. Rivers (IBM T. J. Watson Research Center), Meeta S. Gupta (IBM T. J. Watson Research Center), Augusto J. Vega (IBM T. J. Watson Research Center), Alper Buyuktosunoglu (IBM T. J. Watson Research Center), Pradip Bose (IBM T. J. Watson Research Center), and Kevin Skadron (University of Virginia)
 
16:45 – 17:05 Coffee Break
17:05 – 18:20 Session VI: Vulnerability Estimation (Chair: Dan Alexandrescu (IROC Technologies))
A Cross-Layer Approach for Highly Accurate Dynamic Vulnerability Estimation,
Bagus Wibowo (North Carolina State University), Abhinav Agrawal (North Carolina State University), James Tuck (North Carolina State University)
 
Effective Statistical Estimation of Soft Error Vulnerability of Complex Designs,
Shahrzad Mirkhani (University of Texas at Austin), Subhasish Mitra (Stanford University), Chen-Yong Cher (IBM T. J. Watson Center), Jacob A. Abraham (University of Texas at Austin)
 
SDC virus: An Application for SER Model Validation,
Tanima Dey (University of Virginia), Steven E. Raasch (Intel), Jon Stephan (Intel), Arijit Biswas (Intel)
18:30 Reception and Banquet


Day 2 – April 2nd, 2014 – Stanford University

08:00 – 08:30 Breakfast
08:30 – 09:30 Session VII: Keynote Speech (Chair: Mehdi Tahoori (Karlsruhe Institute of Technology))
Understanding the Impact of Silicon Errors on Functional Safety Standards Compliance
Karl Greb (Texas Instruments)
 
09:30 – 10:20 Session VIII: Error Mitigation (Chair: Nelson Tam (Marvell))
Algorithm for Fast Synthesis of Redundant Combinatorial Logic for Selective Fault Tolerance,
Hao Xie (University of Saskatchewan), Li Chen (University of Saskatchewan), Adrian Evans (iROC), Shi-Jie Wen (Cisco), Rick Wong (Cisco)
 
Error Resilience for Off-Chip High Bandwidth Die-Stacked DRAMs,
Xun Jian (University of Illinois at Urbana Champaign), Vilas Sridharan (AMD), Rakesh Kumar (University of Illinois at Urbana Champaign)
 
10:20 – 10:40 Coffee Break
10:40 – 11:55 Session IX: Radiation Effect Modeling and Analysis (Chair: Ishe Ibe (Hitachi))
IRT: A Modeling System for Single Event Upset Analysis that Captures Charge Sharing Effects,
Kerryann Foley (Intel), Norbert Seifert (Intel), Jyothi B. Velamala (Intel)
 
The ANITA neutron facility for SER testing: status and new developments,
Alexander V. Prokofiev (Uppsala University), Elke Passoth (Uppsala University), Anders Hjalmarsson (Uppsala University), Mitja Majerle (Nuclear Physics Institute of ASCR)

Attention: Please note that the list of authors in the USB proceedings was incomplete. The full list of authors is: Alexander V. Prokofiev, Elke Passoth, Anders Hjalmarsson, and Mitja Majerle. We apologize for the error in the proceedings.

 
A Cross-Layer Approach for Radiation-Induced Soft Error Analysis of SRAMs in SOI FinFET Technology,
Saman Kiamehr (Karlsruhe Institute of Technology), Mehdi Tahoori (Karlsruhe Institute of Technology)
11:55 – 13:00 Lunch
13:00 – 14:00 Session X: Keynote Speech (Chair: Sarah Michalak (LANL))
Memory Errors and Mitigation
Tom Pawlowski (Micron)
14:00 – 15:15 Session XI: Poster Session 2 and Coffee Break (Chair: Vilas Sridharan (AMD))
FUSED: A Low-Cost Online Soft-Error Detector,
Vishal Chandra Sharma (University of Utah), Zvonimir Rakamaric (University of Utah), Ganesh Gopalakrishnan (University of Utah)
 
A Synthesis Tool for Designing Noise-Immune Circuits via Selectively-Reinforced Logic,
Marco Donato (Brown University), Iris Bahar (Brown University), Alexander Zaslavsky (Brown University), William Patterson (Brown University), Joseph Mundy (Brown University)
 
Diagnosis of SET Propagation in Combinational Logic under Dynamic Operation,
Varadan Savulimedu Veeravalli (TU Vienna), Andreas Steininger (TU Vienna)
 
Measuring Timing Errors in FPGA-based circuits,
Joshua Levine (Imperial College London), Edward Stott (Imperial College London), Nachiket Kapre (Nanyang Technological University)
 
Who’s Using that Memory? A Subscriber Model for Mapping Errors to Tasks,
Andreas Heinig (TU Dortmund), Florian Schmoll (TU Dortmund), Peter Marwedel (TU Dortmund), Michael Engel (TU Dortmund)
 
Minimization of SER-Induced Costs through Linear Programming,
Dan Alexandrescu (iROC), Adrian Evans (iRoC), Enrico Costenaro (iRoC)
15:15 – 16:30 Session XII: Errors in Memories (Chair: Nachiket Kapre (Nanyang Technological University))
Extra Bits on SRAM and DRAM Errors – More Data from the Field,
Nathan Debardeleben (Los Alamos National Laboratory), Sean Blanchard (Los Alamos National Laboratory), Vilas Sridharan (AMD), Sudhanva Gurumurthi (AMD), Jon Stearley (Sandia National Laboratories), Kurt Ferreira (Sandia National Laboratories), John Shalf (Lawrence Berkeley National Laboratory)
 
Scan-Based Soft Error Mitigation of Configuration Memory in Xilinx 7 Series FPGA Devices,
Eric Crabill (Xilinx), Paula Chang (Xilinx)
 
Measuring the Radiation Reliability of SRAM Structures in GPUs Designed for HPC,
Paolo Rech (UFRGS), Luigi Carro (UFRGS), Nicholas Wang (NVIDIA), Timothy Tsai (NVIDIA), Siva Kumar Sastry Hari (NVIDIA), Stephen W. Keckler (NVIDIA)
16:30 Closing Remarks

Resilience for ExaScale

Abstract:

Large scientific computers and data centers share a need for resilience. Systems with 10^5 nodes and failure rates of 1000 FITs/node experience failures every 10 hours but must provide application MTTI of weeks. To provide resilience in such systems we can exploit the properties of different components. Memory and communications channels can be inexpensively checked using codes while arithmetic operations are relatively inexpensive to duplicate. Exposing the resilience mechanisms to the software enables the application to describe what state needs to be preserved (vs. being easily reconstructed) and what operations need to be checked in hardware (vs. software). This talk will sketch the resilience issues for ExaScale systems and point out some open challenges.
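The failure interval quoted in the abstract can be reproduced with a short calculation. The sketch below is illustrative only and not part of the talk: it assumes the standard definition of FIT (failures per 10^9 device-hours) and independent node failures with constant rates, and the function name `system_mtbf_hours` is hypothetical.

```python
# FIT = failures in time, i.e. failures per 10^9 device-hours (standard definition).
HOURS_PER_FIT_UNIT = 1e9


def system_mtbf_hours(nodes: int, fit_per_node: float) -> float:
    """Mean time between failures of the whole system, in hours.

    Assumes node failures are independent with constant rates,
    so the per-node FIT rates simply add up.
    """
    system_fit = nodes * fit_per_node
    return HOURS_PER_FIT_UNIT / system_fit


# Figures from the abstract: 10^5 nodes at 1000 FITs/node.
print(system_mtbf_hours(nodes=10**5, fit_per_node=1000))  # 10.0 hours
```

Under these assumptions the system sees 10^8 FITs in total, which matches the abstract's figure of one failure roughly every 10 hours.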

Bio:

Bill is Chief Scientist and Senior Vice President of Research at NVIDIA Corporation and a Professor (Research) and former chair of Computer Science at Stanford University. Bill and his group have developed system architecture, network architecture, signaling, routing, and synchronization technology that can be found in most large parallel computers today. While at Bell Labs Bill contributed to the BELLMAC32 microprocessor and designed the MARS hardware accelerator. At Caltech he designed the MOSSIM Simulation Engine and the Torus Routing Chip which pioneered wormhole routing and virtual-channel flow control. At the Massachusetts Institute of Technology his group built the J-Machine and the M-Machine, experimental parallel computer systems that pioneered the separation of mechanisms from programming models and demonstrated very low overhead synchronization and communication mechanisms. At Stanford University his group has developed the Imagine processor, which introduced the concepts of stream processing and partitioned register organizations, the Merrimac supercomputer, which led to GPU computing, and the ELM low-power processor. Bill is a Member of the National Academy of Engineering, a Fellow of the IEEE, a Fellow of the ACM, and a Fellow of the American Academy of Arts and Sciences. He has received the ACM Eckert-Mauchly Award, the IEEE Seymour Cray Award, and the ACM Maurice Wilkes award. He currently leads projects on computer architecture, network architecture, circuit design, and programming systems. He has published over 200 papers in these areas, holds over 90 issued patents, and is an author of the textbooks, Digital Design: A Systems Approach, Digital Systems Engineering, and Principles and Practices of Interconnection Networks.

Understanding the Impact of Silicon Errors on Functional Safety Standards Compliance

Abstract:

Recent years have seen increasing application of functional safety standards to semiconductor components. Unfortunately, semiconductor developers and functional safety engineers are rarely aligned in terminology or direction. This discussion will attempt to bridge the gap between the two domains by helping semiconductor developers understand how silicon errors are considered in current functional safety state-of-the-art.

Bio:

Karl Greb is the chief safety architect for the SafeTI initiative at Texas Instruments. He has responsibility for functional safety evangelization across safety critical TI products, including analog and digital hardware, software, and tools in multiple end equipment applications.

Karl also has development responsibility for the safety architectures of the Hercules MCU product line. He represents TI and the US in the ISO 26262 working group (Functional Safety – Road Vehicles) and is currently leading the ISO 26262 semiconductor subgroup in development of guidelines to support development of compliant semiconductors.

Karl has previously worked in a variety of architecture, applications, and product test positions on MCU, ASIC, and DSP products. He is a 1998 graduate of Texas A&M University (B.S. Computer Engineering), a licensed Texas Professional Engineer (PE), and a Certified Functional Safety Expert (CFSE) in safety critical hardware development.

 

Memory Errors and Mitigation

Abstract:

Memory silicon comprises the majority of silicon area of most high-performance systems, for example, exceeding 98% of total silicon area in the next generation of Supercomputers. This talk will explore some error sources found in the latest generation DRAM and NAND device subsystems. It will demonstrate the wildly different magnitudes of the different sources of errors and the need for decoupling the error detection and correction mechanisms which deal with the different error sources. The general principles of useful error mitigation methods will be discussed. Examples will be taken from some of the most advanced available products such as Micron’s Hybrid Memory Cube.

Bio:

J. Thomas Pawlowski is a Fellow and Chief Technologist with Micron’s Architecture Development Group. His responsibilities include evaluating new technologies/investments, exploring new memory and system architectures, and providing guidance to many technical teams, both internally and external to Micron.

Mr. Pawlowski’s experience includes the creation or co-creation of numerous groundbreaking memory architectures and concepts, such as: synchronous pipelined SRAM, hierarchical cache systems, Zero Bus Turnaround SRAM, the first double data rate memory (starting with SRAM and then its subsequent proliferation to DRAM and NAND technologies), PSRAM, high-speed NAND, the first double address rate memory, the first quad data rate memory, the first multi-channel memory, memories on SERDES buses, the first DRAM to exceed SRAM performance (RLDRAM), refresh and error correction schemes for memory subsystems, the architectural roots of Micron’s HMC device, the first dedicated hardware architecture of Micron’s newly announced nondeterministic Automata Processor, and other projects still in development.

Mr. Pawlowski earned a bachelor of applied science degree in electrical engineering, summa cum laude, from the University of Waterloo in Canada. He also holds approximately 150 U.S. and international patents and serves on several advisory boards, including the Exascale Grand Challenge EAB.

In his spare time, Mr. Pawlowski designs and builds loudspeakers and custom tools, and he has completed 60% of the design of a revolutionary electric car concept.