PRE-SYMPOSIUM TUTORIALS, Tuesday, August 3, 1999

You may select the all-day tutorial, or one tutorial from the morning session and/or one from the afternoon session. Each tutorial registration fee includes attendance at the tutorial session and materials. There are no student fees for the tutorials. Cancellations of tutorial registrations made after July 23, 1999 will be subject to the full fee. We reserve the right to cancel the tutorials due to insufficient participation or other unforeseeable problems, in which case tutorial fees will be refunded in full.

TUESDAY, AUGUST 3, 1999

Coral Foyer

7:00 a.m. - 6:00 p.m. Registration

Peninsula Room

9:00 a.m. - 4:30 p.m. - Full-Day Tutorial

Tutorial 1: Cryptography, Security and Privacy

Presenter:
Charlie Catlett, National Center for Supercomputing Applications
 
This tutorial provides an overview of the basic elements of computer security and privacy, including the building blocks, cryptographic technologies, and protocols used to construct secure and private services and systems. It surveys existing and emerging technologies and implementations of secure and private systems, covering both current practice and technologies (Java, Kerberos, PGP, SSL, etc.) and their real-world application to secure computing, as well as newer capabilities (digital cash, digital signatures) supporting commerce on the Internet.
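For a flavor of the building blocks covered, the short sketch below uses Python's standard hashlib and hmac modules to illustrate two such primitives: a message digest for integrity and a keyed message authentication code for authenticity. It is a minimal illustration of the concepts only, not part of the tutorial materials.

    import hashlib
    import hmac

    message = b"Wire $100 to account 42"
    secret_key = b"shared-secret-key"

    # A message digest provides integrity: any change to the message
    # changes the digest.
    digest = hashlib.sha256(message).hexdigest()
    print("SHA-256 digest:", digest)

    # A keyed MAC (here HMAC-SHA256) adds authenticity: only a holder
    # of the shared key can produce a matching tag.
    tag = hmac.new(secret_key, message, hashlib.sha256).hexdigest()

    # The receiver recomputes the tag over the received message and
    # compares in constant time to guard against timing attacks.
    recomputed = hmac.new(secret_key, message, hashlib.sha256).hexdigest()
    print("Authentic:", hmac.compare_digest(tag, recomputed))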

Prerequisite: Participants should be familiar with networked computing (the Internet, client/server applications, etc.) as well as basic mathematics and computer programming. This basic background information is essential to anyone involved in the Internet today, including technical staff as well as executives.

Tutorials 2, 3, 4, 5, and 6 (half-day each)

8:30 a.m. - 12:00 p.m.

Catalina Room

Tutorial 2: Performance Analysis and Prediction of Large-Scale Scientific Applications

Presenters:
Adolfy Hoisie, hoisie@lanl.gov, Los Alamos National Lab
Harvey Wasserman, hjw@lanl.gov, Los Alamos National Lab
 
Performance analysis techniques such as analytical modeling, simulation, and queuing theory can all offer significant insight into performance issues, beyond "religious" considerations related to their usefulness. However, very few of these techniques ever make it into the toolkit of the application developer, due to their complexity, cumbersome usage, and/or the limited direct value of the feedback they provide. This is why a methodical but simplified approach to performance analysis is in order. Offering such a methodology is our main goal in this tutorial.

We will begin with definitions (weak scalability, strong scalability, parallel efficiency, etc.) and a short overview of the performance analysis techniques mentioned above. We will then introduce rigorous metrics for performance, both serial and parallel. Performance expectations at a coarse level will be emphasized using examples. We will discuss, in detail, the single most important bottleneck in single-processor performance: the memory subsystem. We will demonstrate how users can obtain diagnostic information about memory performance of their codes and how such information can help predict achievable single-processor performance.
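To suggest the flavor of such diagnostics, the sketch below (ours, not the presenters' material) times strided sweeps over a large array in Python; the growing cost per element at larger strides reflects the cache behavior of the memory subsystem, and the effect is sharper still in a compiled language such as Fortran or C.

    import time

    N = 1 << 22            # 4M elements, larger than typical caches
    data = [0.0] * N

    def sweep(stride):
        """Touch every stride-th element; return seconds elapsed."""
        start = time.perf_counter()
        total = 0.0
        for i in range(0, N, stride):
            total += data[i]
        return time.perf_counter() - start

    for stride in (1, 4, 16, 64):
        elapsed = sweep(stride)
        touched = N // stride
        print(f"stride {stride:3d}: {elapsed / touched * 1e9:7.1f} ns per element")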

With performance goals properly defined, we will offer a discussion of commonly and not-so-commonly utilized techniques for performance optimization of Fortran codes. Serial and parallel performance optimization will be analyzed.

We will then discuss topics related to analytical modeling of performance and scalability of large-scale applications. We will adopt a top-down approach in which computation and communication components are analyzed separately and any overlap between them is considered. We will be careful to differentiate between algorithmic scalability and "real" scalability, where the latter takes into consideration constraints from a specific implementation. Codes from the ASCI workload will be utilized as examples throughout the lecture.
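As a concrete, if simplified, illustration of this top-down style (our sketch, with constants chosen purely for illustration, not the presenters' model), the script below models a hypothetical code as T(p) = T_comp/p + T_comm and reports the predicted speedup and parallel efficiency:

    # Toy analytical model: T(p) = T_comp / p + T_comm(p), no overlap.
    # All constants are hypothetical, chosen only for illustration.
    T_COMP = 10.0        # total computation time on one processor (s)
    LATENCY = 1e-4       # per-message latency (s)
    BANDWIDTH = 1e8      # link bandwidth (bytes/s)
    MSG_BYTES = 1e6      # boundary data exchanged per step (bytes)

    def predicted_time(p):
        t_comp = T_COMP / p
        t_comm = 0.0 if p == 1 else LATENCY + MSG_BYTES / BANDWIDTH
        return t_comp + t_comm

    t1 = predicted_time(1)
    for p in (1, 2, 4, 8, 16, 32, 64):
        tp = predicted_time(p)
        speedup = t1 / tp
        efficiency = speedup / p
        print(f"p={p:3d}  T={tp:8.4f} s  speedup={speedup:6.2f}  "
              f"efficiency={efficiency:5.2f}")

Note how the fixed communication term caps the "real" speedup well below the algorithmic ideal of p, which is precisely the distinction drawn above.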

The tutorial will not emphasize any particular machine; rather, it will generally address the performance of applications on RISC processors and on widely utilized parallel systems such as the SGI Origin 2000, IBM SP2, and Cray T3E.

The target audience is a mixture of computational scientists, computer scientists, and code developers interested in performance analysis of "real-life" applications. By carefully defining terms and metrics, we fully expect to overcome the "lingo" barrier associated with a diverse audience, while providing an in-depth understanding of the issues in a manner relevant to all backgrounds. The tutorial will also be useful to those trying to define future-generation, high-end computing needs.

Esplanade Room

Tutorial 3: The Globus Grid Programming Toolkit

Presenters:
Ian Foster, foster@mcs.anl.gov, Argonne National Laboratory
Carl Kesselman, carl@isi.edu, Information Sciences Institute/University of Southern California
Gregor von Laszewski, gregor@mcs.anl.gov, Argonne National Laboratory
Steven Fitzgerald, steve@isi.edu, Information Sciences Institute/University of Southern California
 
This tutorial is an introduction to the capabilities of the Globus grid programming toolkit. Computational grids promise to enable a wide range of emerging application concepts such as remote computing, distributed supercomputing, tele-immersion, smart instruments, and data mining. However, the development and use of such applications is in practice difficult and time consuming, because of the need to deal with complex and highly heterogeneous systems. The Globus grid programming toolkit is designed to help application developers and tool builders overcome these obstacles to the construction of "grid-enabled" scientific and engineering applications. It does this by providing a set of standard services for authentication, resource location, resource allocation, configuration, communication, file access, fault detection, and executable management. These services can be incorporated into applications and/or programming tools in a "mix-and-match" fashion to provide access to needed capabilities.
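To make the "mix-and-match" idea concrete, the runnable sketch below composes three such services in sequence. The function names are illustrative stand-ins of our own devising, not the Globus API:

    # Hypothetical sketch of composing grid services; these functions
    # are illustrative stand-ins, NOT the actual Globus toolkit API.

    def authenticate(user):
        """Stand-in for an authentication service."""
        print(f"authenticated {user}")
        return {"user": user}

    def locate_resources(credential, min_cpus):
        """Stand-in for a resource location/directory service."""
        return [{"host": "hpc.example.edu", "cpus": min_cpus}]

    def allocate_and_run(credential, resource, executable):
        """Stand-in for resource allocation plus executable management."""
        print(f"staging {executable} to {resource['host']}; job started")
        return "job-0001"

    cred = authenticate("alice")
    resources = locate_resources(cred, min_cpus=64)
    job_id = allocate_and_run(cred, resources[0], "./climate_model")
    print("submitted", job_id)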

Our goal in this tutorial is both to introduce the capabilities of the Globus toolkit and to help attendees apply Globus services to their own applications. Hence, we will structure the tutorial as a combination of Globus system description and application examples.

Marina Room

Tutorial 4: Cluster Computing: The Commodity Supercomputer

Presenter:
Rajkumar Buyya, rajkumar@dgs.monash.edu.au, Monash University
 
The availability of high-speed networks and increasingly powerful commodity microprocessors is making the use of clusters, or networks, of computers an appealing vehicle for cost-effective parallel computing. Clusters, built using commodity off-the-shelf (COTS) hardware components as well as free, or commonly used, software, are playing a major role in redefining the concept of supercomputing. In this tutorial, we discuss the motivation for the transition from using dedicated parallel supercomputers to COTS-based cluster supercomputers. We also describe the enabling technologies and then present a number of case studies of cluster-based projects to support our discussion. Finally, we summarise our findings and draw a number of conclusions relating to the usefulness and likely future of cluster computing.

Several questions naturally arise: How do clusters redefine the concepts of traditional supercomputing? How is this different from traditional supercomputing or MPP computing? Does this offer a completely different programming paradigm? How can one build a cluster-based supercomputer, and what are its implications? This tutorial offers answers to all of these questions and will also go beyond the hype.
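As a down-to-earth taste of the COTS approach (our illustration, not part of the tutorial materials), the Python sketch below farms a command out to a set of workstation nodes over ssh using only the standard library; the hostnames are placeholders for your own machines.

    import subprocess

    # Placeholder hostnames; substitute the nodes of your own cluster.
    NODES = ["node01", "node02", "node03", "node04"]

    def run_everywhere(command):
        """Start `command` on every node via ssh and collect the output."""
        procs = {
            node: subprocess.Popen(
                ["ssh", node, command],
                stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
            for node in NODES
        }
        for node, proc in procs.items():
            out, err = proc.communicate()
            print(f"{node}: {out.strip() or err.strip()}")

    # Example: check the load average on each commodity node.
    run_everywhere("uptime")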

1:30 - 5:00 p.m.

Catalina Room

Tutorial 5: Distributed Systems Performance Analysis Using NetLogger and Pablo

Presenters:
Brian L. Tierney, bltierney@lbl.gov, Lawrence Berkeley National Laboratory
Ruth A. Aydt, aydt@uiuc.edu, University of Illinois
 
As the computational environment for an application migrates from a single processor, to a group of machines on a local network, to the national or international grid, the issues impacting application performance become increasingly complex. Performance analysis tools in such distributed environments must provide integrated information about the dynamic state of the computing infrastructure, as well as feedback on application execution behavior.

In this tutorial, we will present the NetLogger and Pablo toolkits, both of which are targeted toward understanding and improving the performance of applications in distributed computing environments. The components of each toolkit will be covered, along with case studies showing their use with actual distributed applications.
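The instrumentation style behind such toolkits can be suggested in a few lines of Python (a sketch of the general approach, not of the NetLogger or Pablo APIs): the application emits precisely timestamped events at interesting points, and a later analysis step correlates the events within and across hosts.

    import socket
    import time

    def log_event(logfile, event, **fields):
        """Append one timestamped event record as key=value pairs."""
        record = {
            "ts": f"{time.time():.6f}",   # wall-clock timestamp
            "host": socket.gethostname(),
            "event": event,
            **{k: str(v) for k, v in fields.items()},
        }
        logfile.write(" ".join(f"{k}={v}" for k, v in record.items()) + "\n")

    with open("app.log", "w") as log:
        log_event(log, "transfer.start", nbytes=10000000)
        time.sleep(0.1)                   # stand-in for the real work
        log_event(log, "transfer.end", nbytes=10000000)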

Participants will gain an understanding of the approaches taken by the two toolkits, become familiar with the capabilities provided by each, and be equipped to assess how they might use the toolkits to improve performance in their own computing environments.

Marina Room

Tutorial 6: High-Performance Computing with Legion

Presenter:
Andrew S. Grimshaw, grimshaw@cs.virginia.edu, University of Virginia
 
Developed at the University of Virginia, Legion is an integrated software system for distributed parallel computation. While fully supporting existing codes written in MPI and PVM, Legion provides features and services that allow users to take advantage of much larger, more complex resource pools. With Legion, for example, a user can easily run a computation on a supercomputer at a national center while dynamically visualizing the results on a local machine. As another example, Legion makes it trivial to schedule and run a large parameter space study on several workstation farms simultaneously. Legion permits computational scientists to use cycles wherever they are, allowing bigger jobs to run in shorter times through higher degrees of parallelization.
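To picture what such a parameter space study looks like from the user's side, the sketch below (hypothetical pseudo-Python of our own, not the Legion interface) generates one independent task per parameter setting and hands each to a scheduler stand-in:

    import itertools

    def submit_job(executable, args):
        """Placeholder for a grid scheduler's submission call."""
        print(f"queued: {executable} {' '.join(args)}")

    temperatures = [250, 300, 350]
    pressures = [1.0, 2.0, 4.0]

    # One independent task per (temperature, pressure) combination; a
    # system like Legion is free to spread these over several farms.
    for t, p in itertools.product(temperatures, pressures):
        submit_job("./simulate", [f"--temp={t}", f"--pressure={p}"])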

Key capabilities include the following:

- Legion eliminates the need to move and install binaries manually on multiple platforms. After Legion schedules a set of tasks over multiple remote machines, it automatically transfers the appropriate binaries to each host. A single job can run on multiple heterogeneous architectures simultaneously; Legion will ensure that the right binaries go to each, and that it only schedules onto architectures for which it has binaries.

- Legion provides a virtual file system that spans all the machines in a Legion system. Input and output files can be seen by all the parts of a computation, even when the computation is split over multiple machines that don't share a common file system. Different users can also use the virtual file system to collaborate, sharing data files and even accessing the same running computations.

- Legion's object-based architecture dramatically simplifies building add-on tools for tasks such as visualization, application steering, load monitoring, and job migration.

- Legion provides optional privacy and integrity of communications for applications distributed over public networks. Multiple users in a Legion system are protected from one another.

These features also make Legion attractive to administrators looking for ways to increase and simplify the use of shared high-performance machines. The Legion implementation emphasizes extensibility, and multiple policies for resource use can be embedded in a single Legion system that spans multiple resources or even administrative domains.

This tutorial will provide background on the Legion system and teach attendees how to run existing parallel codes within the Legion environment. The target audience is supercomputing experts who help scientists and other users get their codes parallelized and running on high-performance systems.

6:00 - 7:30 p.m. Evening Reception, Seascape Room