SC 2008: Austin, Texas, USA

Proceedings of the ACM/IEEE Conference on High Performance Computing, SC 2008, November 15-21, 2008, Austin, Texas, USA. IEEE/ACM 2008, ISBN 978-1-4244-2835-9

Papers

Kevin J. Barker, Kei Davis, Adolfy Hoisie, Darren J. Kerbyson, Michael Lang, Scott Pakin, José Carlos Sancho:
Entering the petaflop era: the architecture and performance of Roadrunner. 1
Naga K. Govindaraju, Brandon Lloyd, Yuri Dotsenko, Burton Smith, John Manferdelli:
High performance discrete Fourier transforms on graphics processors. 2
Wei-keng Liao, Alok N. Choudhary:
Dynamically adapting file domain partitioning methods for collective I/O based on underlying parallel file system locking protocols. 3
Kaushik Datta, Mark Murphy, Vasily Volkov, Samuel Williams, Jonathan Carter, Leonid Oliker, David A. Patterson, John Shalf, Katherine A. Yelick:
Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures. 4
Akira Nukada, Yasuhiko Ogata, Toshio Endo, Satoshi Matsuoka:
Bandwidth intensive 3-D FFT kernel for GPUs using CUDA. 5
Philip H. Carns, Bradley W. Settlemyer, Walter B. Ligon III:
Using server-to-server communication in parallel file systems to simplify consistency and improve performance. 6
Subhash Saini, Dale Talcott, Dennis C. Jespersen, M. Jahed Djomehri, Haoqiang Jin, Rupak Biswas:
Scientific application-based performance comparison of SGI Altix 4700, IBM POWER5+, and SGI ICE 8200 supercomputers. 7
James C. Phillips, John E. Stone, Klaus Schulten:
Adapting a message-driven parallel application to GPU-accelerated clusters. 8
Arifa Nisar, Wei-keng Liao, Alok N. Choudhary:
Scaling parallel I/O performance through I/O delegate and caching system. 9
Vlad Nae, Alexandru Iosup, Stefan Podlipnig, Radu Prodan, Dick H. J. Epema, Thomas Fahringer:
Efficient management of data center resources for massively multiplayer online games. 10
Takeshi Yoshino, Yutaka Sugawara, Katsushi Inagami, Junji Tamatsukuri, Mary Inaba, Kei Hiraki:
Performance optimization of TCP/IP over 10 gigabit ethernet by precise instrumentation. 11
Mathieu Luisier, Gerhard Klimeck:
A multi-level parallel simulation approach to electron transport in nano-scale transistors. 12
Sang-Min Park, Marty Humphrey:
Feedback-controlled resource sharing for predictable eScience. 13
Nageswara S. V. Rao, Weikuan Yu, William R. Wing, Stephen W. Poole, Jeffrey S. Vetter:
Wide-area performance profiling of 10GigE and InfiniBand technologies. 14
Philip Sternberg, Esmond G. Ng, Chao Yang, Pieter Maris, James P. Vary, Masha Sosonkina, Hung Viet Le:
Accelerating configuration interaction calculations for nuclear structure. 15
Andrew Mutz, Richard Wolski:
Efficient auction-based grid reservations using dynamic programming. 16
Thomas Scogland, Pavan Balaji, Wu-chun Feng, G. Narayanaswamy:
Asymmetric interactions in symmetric multi-core systems: analysis, enhancements and evaluation. 17
Rahul S. Sampath, Santi S. Adavani, Hari Sundar, Ilya Lashuk, George Biros:
Dendro: parallel algorithms for multigrid and AMR methods on 2:1 balanced octrees. 18
Kurt B. Ferreira, Patrick G. Bridges, Ron Brightwell:
Characterizing application sensitivity to OS interference using kernel-level noise injection. 19
Ryutaro Susukita, Hisashige Ando, Mutsumi Aoyagi, Hiroaki Honda, Yuichi Inadomi, Koji Inoue, Shigeru Ishizuki, Yasunori Kimura, Hidemi Komatsu, Motoyoshi Kurokawa, Kazuaki Murakami, Hidetomo Shibamura, Shuji Yamamura, Yunqing Yu:
Performance prediction of large-scale parallel system and application using macro-level simulation. 20
Jun Qin, Thomas Fahringer:
A novel domain oriented approach for scientific grid workflow composition. 21
Ioan Raicu, Zhao Zhang, Michael Wilde, Ian T. Foster, Peter H. Beckman, Kamil Iskra, Ben Clifford:
Toward loosely coupled programming on petascale systems. 22
Sadaf R. Alam, Richard F. Barrett, M. Bast, Mark R. Fahey, Jeffery A. Kuehn, Collin McCurdy, J. Rogers, Philip C. Roth, Ramanan Sankaran, Jeffrey S. Vetter, Patrick H. Worley, Weikuan Yu:
Early evaluation of IBM BlueGene/P. 23
David Abramson, Colin Enticott, Ilkay Altintas:
Nimrod/K: towards massively parallel dynamic grid workflows. 24
Ron Brightwell, Kevin T. Pedretti, Trammell Hudson:
SMARTMAP: operating system support for efficient data sharing among processes on a multi-core processor. 25
Gregory L. Lee, Dong H. Ahn, Dorian C. Arnold, Bronis R. de Supinski, Matthew Legendre, Barton P. Miller, Martin Schulz, Ben Liblit:
Lessons learned at 208K: towards debugging millions of cores. 26
Marek Wieczorek, Stefan Podlipnig, Radu Prodan, Thomas Fahringer:
Applying double auctions for scheduling of workflows on the Grid. 27
Mahmut T. Kandemir, Feihui Li, Mary Jane Irwin, Seung Woo Son:
A novel migration-based NUCA design for chip multiprocessors. 28
Laura Grigori, James Demmel, Hua Xiang:
Communication avoiding Gaussian elimination. 29
Liqun Cheng, John B. Carter:
Extending CC-NUMA systems to support write update optimizations. 30
Vasily Volkov, James Demmel:
Benchmarking GPUs to tune dense linear algebra. 31
Hans Eberle, Pedro Javier García, Jose Flich, José Duato, Robert Drost, Nils Gura, David Hopkins, Wladek Olesinski:
High-radix crossbar switches enabled by proximity communication. 32
Heshan Lin, Pavan Balaji, Ruth Poole, Carlos P. Sosa, Xiaosong Ma, Wu-chun Feng:
Massively parallel genomic sequence search on the Blue Gene/P architecture. 33
Lorin Hochstein, Forrest Shull, Lynn B. Reid:
The role of MPI in development time: a case study. 34
Changjun Wu, Ananth Kalyanaraman:
An efficient parallel approach for identifying protein families in large-scale metagenomic data sets. 35
Alejandro Duran, Julita Corbalán, Eduard Ayguadé:
An adaptive cut-off for task parallelism. 36
Christopher L. Barrett, Keith R. Bisset, Stephen Eubank, Xizhou Feng, Madhav V. Marathe:
EpiSimdemics: an efficient algorithm for simulating the spread of infectious disease over large realistic social networks. 37
Timothy G. Mattson, Rob F. Van der Wijngaart, Michael A. Frumkin:
Programming the Intel 80-core network-on-a-chip terascale processor. 38
Mohammad Banikazemi, Dan E. Poff, Bülent Abali:
PAM: a novel performance/power aware meta-scheduler for multi-core systems. 39
Yong Chen, Surendra Byna, Xian-He Sun, Rajeev Thakur, William Gropp:
Hiding I/O latency with pre-execution prefetching for parallel applications. 40
Carlos Boneti, Roberto Gioiosa, Francisco J. Cazorla, Mateo Valero:
A dynamic scheduler for balancing HPC applications. 41
Hongzhang Shan, Katie Antypas, John Shalf:
Characterizing and predicting the I/O performance of HPC applications using a parameterized synthetic benchmark. 42
Chao Wang, Frank Mueller, Christian Engelmann, Stephen L. Scott:
Proactive process-level live migration in HPC environments. 43
Surendra Byna, Yong Chen, Xian-He Sun, Rajeev Thakur, William Gropp:
Parallel I/O prefetching using MPI file caching and I/O signatures. 44
Gilles Fedak, Haiwu He, Franck Cappello:
BitDew: a programmable environment for large-scale data management and distribution. 45
Todd Gamblin, Bronis R. de Supinski, Martin Schulz, Robert J. Fowler, Daniel A. Reed:
Scalable load-balance measurement for SPMD codes. 46
Gaurav Khanna, Ümit V. Çatalyürek, Tahsin M. Kurç, Rajkumar Kettimuthu, P. Sadayappan, Ian T. Foster, Joel H. Saltz:
Using overlays for efficient data transfer over shared wide-area networks. 47
Hongfeng Yu, Chaoli Wang, Kwan-Liu Ma:
Massively parallel volume rendering using 2-3 swap image compositing. 48
Kevin A. Huck, Oscar Hernandez, Van Bui, Sunita Chandrasekaran, Barbara M. Chapman, Allen D. Malony, Lois C. McInnes, Boyana Norris:
Capturing performance knowledge for automated analysis. 49
Ewa Deelman, Gurmeet Singh, Miron Livny, G. Bruce Berriman, John Good:
The cost of doing science on the cloud: the Montage example. 50
Oliver Rübel, Prabhat, Kesheng Wu, Hank Childs, Jeremy Meredith, Cameron G. R. Geddes, Estelle Cormier-Michel, Sean Ahern, Gunther H. Weber, Peter Messmer, Hans Hagen, Bernd Hamann, E. Wes Bethel:
High performance multivariate visual data exploration for extremely large data. 51
Emma S. Buneci, Daniel A. Reed:
Analysis of application heartbeats: learning structural and temporal features in time series data for identification of performance problems. 52
Aameek Singh, Madhukar R. Korupolu, Dushmanta Mohapatra:
Server-storage virtualization: integration and load balancing in data centers. 53
Steven W. Schlosser, Michael P. Ryan, Ricardo Taborda-Rios, Julio López, David R. O'Hallaron, Jacobo Bielak:
Materialized community ground models for large-scale earthquake simulation. 54
Lakshminarayanan Renganarayanan, Sanjay V. Rajopadhye:
Positivity, posynomials and tile size selection. 55
Tiankai Tu, Charles A. Rendleman, David W. Borhani, Ron O. Dror, Justin Gullingsrud, Morten Ø. Jensen, John L. Klepeis, Paul Maragakis, Patrick Miller, Kate A. Stafford, David E. Shaw:
A scalable parallel framework for analyzing terascale molecular dynamics simulation trajectories. 56
D. Brian Larkins, James Dinan, Sriram Krishnamoorthy, Srinivasan Parthasarathy, Atanas Rountev, P. Sadayappan:
Global trees: a framework for linked data structures on distributed memory parallel systems. 57
Yinglong Xia, Viktor K. Prasanna:
Parallel exact inference on the cell broadband engine processor. 58
Ozcan Ozturk, Seung Woo Son, Mahmut T. Kandemir, Mustafa Karaköy:
Prefetch throttling and data pinning for improving performance of shared caches. 59

ACM Gordon Bell finalists

Laura Carrington, Dimitri Komatitsch, Michael Laurenzano, Mustafa M. Tikir, David Michéa, Nicolas Le Goff, Allan Snavely, Jeroen Tromp:
High-frequency simulations of global seismic wave propagation using SPECFEM3D_GLOBE on 62K processors. 60
Gonzalo Alvarez, Michael S. Summers, Don E. Maxwell, Markus Eisenbach, Jeremy S. Meredith, Jeffrey M. Larkin, John M. Levesque, Thomas A. Maier, Paul R. C. Kent, Eduardo F. D'Azevedo, Thomas C. Schulthess:
New algorithm to enable 400+ TFlop/s sustained performance in simulations of disorder effects in high-T_c superconductors. 61
Carsten Burstedde, Omar Ghattas, Michael Gurnis, Georg Stadler, Eh Tan, Tiankai Tu, Lucas C. Wilcox, Shijie Zhong:
Scalable adaptive mantle convection simulation on petascale supercomputers. 62
Kevin J. Bowers, Brian J. Albright, Benjamin Bergen, Lin Yin, Kevin J. Barker, Darren J. Kerbyson:
0.374 Pflop/s trillion-particle kinetic modeling of laser plasma interaction on Roadrunner. 63
Sriram Swaminarayan, Kai Kadau, Timothy C. Germann, Gordon C. Fossum:
369 Tflop/s molecular dynamics simulations on the Roadrunner general-purpose heterogeneous supercomputer. 64
Lin-Wang Wang, Byounghak Lee, Hongzhang Shan, Zhengji Zhao, Juan C. Meza, Erich Strohmaier, David H. Bailey:
Linearly scaling 3D fragment method for large-scale electronic structure calculations. 65