The School is based on approximately 29 lectures and 24 hours of related practical exercises on PCs or workstations.
The programme of the School is organised around four themes:
The following lectures are now confirmed. Any additional lectures will be announced later.
“Grid” computing has emerged as an
important new field, distinguished from conventional distributed computing by
its focus on large-scale resource sharing and innovative applications.
In this track, we provide an in-depth introduction to Grid technologies
and applications. We review the “Grid problem,” which we define as
flexible, secure, coordinated resource sharing among dynamic collections of
individuals, institutions, and resources—what we refer to as virtual
organizations. In such settings, we encounter unique authentication,
authorization, resource access, resource discovery, and other challenges.
It is this class of problem that is addressed by Grid technologies.
We present an extensible and open Grid architecture, in which protocols,
services, application programming interfaces, and software development kits are
categorized according to their roles in enabling resource sharing.
We review major Grid projects worldwide and describe how they are
contributing to the realization of this architecture.
Then, we describe specific Grid technologies in considerable detail,
focusing in particular on the Globus Toolkit and on Data Grid technologies being
developed by the EU Data Grid, GriPhyN, and PPDG projects in Europe and the U.S.
The hands-on exercises will give participants practical experience with the Globus Toolkit for basic Grid activities.
Lecturers: I. Foster, Argonne National Laboratory; C. Kesselman, University of Southern California
Lectures:
Lecture GC.1/L: Introduction to Grids
Lecture GC.2/L: Overview of the Globus Toolkit
Lecture GC.3/L: Globus components
Lecture GC.4/L: Globus components
Lecture GC.5/L: Globus components
Lecture GC.6/L: Globus components
Lecture GC.7/L: Other issues and future
Lecture GC.8/L: Wrap-up & Feedback session
Exercises:
Ex. GC.1/E: basic job submission and monitoring. Getting to know the Globus Toolkit, simple job submission and monitoring, use of GSI (a sketch of this workflow follows the list below).
Ex. GC.2/E: basic job submission
Ex. GC.3/E: advanced jobs
Ex. GC.4/E: advanced jobs
Ex. GC.5/E: project work. Students work in groups on a mini-project using the Globus Toolkit and related software to solve a physics-related problem. The students should build on the knowledge gained in the lectures and previous exercises to develop an application capable of solving a given physics problem.
Ex. GC.6/E: project work
Ex. GC.7/E: project work
Ex. GC.8/E: project work
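As a minimal illustration of the basic workflow in these exercises, the sketch below drives two standard Globus Toolkit command-line tools (grid-proxy-init and globus-job-run) from Python; the gatekeeper host name is a placeholder and the wrapper functions are invented for this example.

    import subprocess

    GATEKEEPER = "testbed001.example.org"  # placeholder Globus gatekeeper host

    def create_proxy():
        """Create a short-lived GSI proxy certificate from the user's Grid credentials."""
        subprocess.run(["grid-proxy-init"], check=True)

    def submit_job(host, argv):
        """Run a command on a remote Grid resource and return its standard output."""
        result = subprocess.run(["globus-job-run", host] + argv,
                                capture_output=True, text=True, check=True)
        return result.stdout

    if __name__ == "__main__":
        create_proxy()                                     # authenticate once via GSI
        print(submit_job(GATEKEEPER, ["/bin/hostname"]))   # simple job: where did it run?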
The track addresses the Information Technology challenges encountered in the transformation of the raw data coming from the HEP experiments into physics results. Particular attention will be devoted to the problems and opportunities arising from the distributed environment in which both the development of the programs and the analysis of the data will take place. This novel situation calls for the application of innovative technologies at the levels of software engineering, computing infrastructure and data processing. Software engineering has to be rethought in the context of a highly dynamic environment, where the communication between the different actors is mainly remote. Computing techniques, such as the choice of programming language and the adoption of advanced software tools, have to support this approach. Management, planning and evaluation of the performance of a project have to be adapted to this situation of loosely coupled and dispersed human resources, which is not uncommon in other fields but extreme in HEP. Data processing also has to adapt to this situation of distributed computing resources, where the ultimate goal is to access them transparently without having to confront the underlying dynamics and complexity of the system. These lectures will address how the different leading-edge technologies, both in the field of software engineering and in that of distributed computing, can be successfully applied to a large collaboration of non-professional but demanding, computing-intensive users. Finally, different solutions to these problems adopted by other experiments will be presented.
Lecturer: F. Carminati, CERN
From data to analysis
DP.1.1/L: Introduction
The challenges of data processing in an LHC collaboration. Main parameters involved and orders of magnitude. Typical dataflow. Structure and organisation of an offline project. Management and sociological issues. Examples taken from LHC experiments will be presented. Problems and issues of technology transition, for example the transition from FORTRAN to C++.
DP.1.2/L: Software Development
Planning and organisation of the work in a distributed environment. New trends in software engineering and their application to HEP. Rapid prototyping and architectural design. Software tools.
DP.1.3/L: Offline Frameworks
Need for an offline framework. Definition of an offline framework. Framework components and structure, layers and modularity. Interfaces to external components and basic services. Evolution and maintenance of a framework. Practical examples from LHC.
DP.1.4/L: Practical Use of Grid Technology
Basic problem of transparent access to large data samples. Solutions offered by GRID technology. Use of GRID technology in HEP, from dreams to reality. Current testbeds and first results. Practical examples from LHC.
Lecturers: P. Buncic, F. Rademakers, CERN
Distributed data handling, processing and analysis
The problems and issues of handling distributed data in a typical HEP experiment. Access patterns. File catalogue vs file system. Generic API for data access. Logical, physical and transport file names. File catalogue implementation (AliEn). Distributed data processing and analysis. Introduction to the PROOF system, which provides for the distributed processing of very large collections of data. PROOF uses a parallel architecture to achieve (near) interactive performance. Introduction to the ROOT I/O system. Discussion of the PROOF three-tier parallel architecture. Description of the interface of PROOF to the GRID (especially AliEn, see lecture 5).
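To make the distinction between logical, physical and transport file names concrete, here is a toy catalogue lookup in Python; the site names, paths and hosts are invented for illustration, and this is not the actual AliEn API.

    # Toy file catalogue: one logical file name (LFN) maps to several
    # physical replicas (PFNs), each identified by a site and a path.
    catalogue = {
        "/alice/prod/run1234/event.root": [
            ("CERN", "/castor/cern.ch/alice/run1234/event.root"),
            ("Lyon", "/hpss/in2p3.fr/alice/run1234/event.root"),
        ],
    }

    # Invented site-to-host mapping, used to build transport names.
    hosts = {"CERN": "castor.cern.ch", "Lyon": "hpss.in2p3.fr"}

    def transport_name(site, path, protocol="rfio"):
        """Turn a physical file name into a transport file name (a URL) for one protocol."""
        return f"{protocol}://{hosts[site]}{path}"

    lfn = "/alice/prod/run1234/event.root"
    site, path = catalogue[lfn][0]        # pick the first available replica
    print(transport_name(site, path))     # rfio://castor.cern.ch/castor/cern.ch/...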
Lectures:
DP.2.1/L: Distributed data handling and analysis
DP.2.2/L: Distributed data processing and analysis
Exercises (4 hrs):
DP.2.1/E, DP.2.2/E, DP.2.3/E, DP.2.4/E
An introduction to the AliEn architecture. Using the AliEn API from C++. An introduction to PROOF and its use in the analysis of data created using the AliEn service.
Lecturer: R.G. Jacobsen, University of California
Current approaches
The computing systems of several
currently-running experiments are described, with emphasis on their experience
in building, commissioning and operating them.
Several approaches to the ongoing development of new experiments will be described, and the choices made will be discussed.
DP.3.1/L: Experience with current approaches
The development of modern distributed computing and complex data management systems, as exemplified by the GRID, relies increasingly on two areas in which specific advances are necessary to satisfy its stringent requirements: Computer Security and Network Performance. This track addresses each of them, in the form of two series of lectures, and via a selection of topics at the forefront of the technology. The security part starts with background knowledge and moves to specific technologies such as cryptography, authentication, and their use in the Grid context.
The Networking part focuses on two aspects that are of primary importance in a Grid context: TCP/IP enhancements and network monitoring. The aim is to present the fundamentals and the evolution of the TCP/IP stack and to explore advanced network measurement and analysis tools and services for end-to-end performance measurement and prediction.
Lecturer: R.D. Cowles, SLAC
SN.1.1/L: Your Workstation
Threats:
· Destruction
· Modification
· Embarrassment
Responsibilities:
· Backup & Virus protection
· Patching and configuration management
· Email security
SN.1.2/L: Cryptography and PKI
Symmetric and Asymmetric encryption
Public Key Infrastructure:
· X.509 Certificates
· Certificate Authorities
· Registration Authority
· Obtaining a certificate
· Protecting your private key
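As a minimal sketch of the asymmetric primitives behind X.509 certificates, the Python example below (using the third-party cryptography package, an assumption of this illustration rather than the toolset of the track) generates an RSA key pair, signs a message with the private key and verifies it with the public key.

    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import rsa, padding

    # Generate an RSA key pair: the private key is kept secret; the public
    # key is what a Certificate Authority would bind to your identity in
    # an X.509 certificate.
    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    public_key = private_key.public_key()

    message = b"I really sent this email"

    # Sign with the private key...
    signature = private_key.sign(
        message,
        padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                    salt_length=padding.PSS.MAX_LENGTH),
        hashes.SHA256(),
    )

    # ...and anyone holding the public key can verify it; verify() raises
    # an InvalidSignature exception if the message or signature was altered.
    public_key.verify(
        signature,
        message,
        padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                    salt_length=padding.PSS.MAX_LENGTH),
        hashes.SHA256(),
    )
    print("signature verified")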
SN.1.3/L: Grid Security
Registering your identity
Authentication models
Authorization to use resources
Proxy Certificates and delegation
MyProxy server
Community Access services
Threats
Vulnerabilities
How you can make the Grid more secure
Exercises (2 hrs):
SN.1.1/E: Generate a key pair; perform the steps necessary to send email that is signed and encrypted, using either PGP or X.509 certificates.
SN.1.2/E: Register with a MyProxy server and use a web Grid portal to submit a job for execution.
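A minimal sketch of the MyProxy part of this exercise, driving the standard MyProxy client commands from Python; the server name is a placeholder, and myproxy-logon is the retrieval command in recent MyProxy releases.

    import subprocess

    MYPROXY_SERVER = "myproxy.example.org"   # placeholder MyProxy server

    # Delegate a medium-term proxy credential to the MyProxy server
    # (prompts for your Grid pass phrase and a MyProxy pass phrase):
    subprocess.run(["myproxy-init", "-s", MYPROXY_SERVER], check=True)

    # Later, e.g. on behalf of a web Grid portal, retrieve a short-lived
    # proxy using only the MyProxy pass phrase:
    subprocess.run(["myproxy-logon", "-s", MYPROXY_SERVER], check=True)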
Lecturer: P. Primet, ENS
High performance Grid Networking
These lectures present the fundamentals of the TCP/IP stack and the limits of the protocols in meeting the network requirements of Grid applications and middleware. The evolution of the network layer and of the transport layer is examined in order to understand the trends in high performance networking. Emphasis is placed on the practices that permit end-to-end performance measurement and improvement.
SN.2.1/L: Grid network requirements. IP protocol. TCP protocol: main features, limits.
SN.2.2/L: IP Service Differentiation. Elevated services. Non-elevated services: ABE, EDS, QBSS.
SN.2.3/L: High Performance Transport protocols and TCP optimization.
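One concrete TCP optimization covered here is sizing socket buffers to the bandwidth-delay product of a long, fast path. The sketch below illustrates the idea in Python; the link figures are example numbers, not measurements.

    import socket

    bandwidth_bps = 622e6    # example: a 622 Mbit/s transatlantic link
    rtt_s = 0.120            # example: 120 ms round-trip time
    bdp_bytes = int(bandwidth_bps / 8 * rtt_s)   # ~9.3 MB must be in flight

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Request send/receive buffers that hold a full bandwidth-delay product;
    # with a default 64 kB buffer the same link is capped at roughly
    # 64 kB / 0.120 s, i.e. about 4 Mbit/s, regardless of its capacity.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bdp_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bdp_bytes)
    print("granted send buffer:",
          sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF), "bytes")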
Exercises (2 hrs):
SN.2.1/E: Configure and use tools and services for Grid status and network performance measurement.
SN.2.2/E: Measure and understand end-to-end performance of TCP connections over different types of links.
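A minimal sketch of the measurement idea in SN.2.2/E: time a bulk TCP transfer and report the achieved throughput. The host and port are placeholders for a cooperating discard-style test server.

    import socket
    import time

    HOST, PORT = "netmon.example.org", 5001   # placeholder measurement endpoint
    TOTAL = 50 * 1024 * 1024                  # send 50 MB
    chunk = b"x" * 65536

    sock = socket.create_connection((HOST, PORT))
    start = time.time()
    sent = 0
    while sent < TOTAL:
        sock.sendall(chunk)      # blocks when the TCP window is full
        sent += len(chunk)
    sock.close()
    elapsed = time.time() - start
    print(f"{sent * 8 / elapsed / 1e6:.1f} Mbit/s over {elapsed:.1f} s")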
This
track presents modern techniques for software design and modern tools for
understanding and improving existing software.
The emphasis will be placed on the large software projects and large
executables that are common in HEP. The
track will consist of lectures, exercises and discussions.
The first discussion session will occur after several hours of exercises
have been completed. The last discussion session will be held at the end of the
track.
The
first 3 lectures will cover software engineering, design, methodology and
testing. This is followed by three lectures on working with large software
systems, including methods for analysing their structure and improving it.
The final 2 lectures will focus on the tools that are commonly used in software design and testing.
In the exercise sessions, the students will have a
chance to use the tools that are described in the lectures.
They will work with CVS and configuration management tools.
They will be asked to use the test and debugging tools on some simple
examples. By showing how these
tools can locate known problems, students will learn how to use them on new
problems. Students will then be
given a functional program and a brief description of what it does. The goal is to extend the program to handle a larger problem
domain. It is expected that the
example programs and exercises will be primarily in C++.
Lecturers:
R.G. Jacobsen, University of California, R. Jones, CERN
Software Engineering
An
introduction to the principles of Software Engineering, with emphasis on what we
know about building large software systems for high-energy physics.
These lectures cover the principles of software engineering, design,
methodology and testing.
TM.1.1/L: Introduction to Software Engineering
TM.1.2/L: Software Design
TM.1.3/L: Long-term Issues of Software Building
TM.2.1/L: Static code analysis, slicing
Program slicing is a static analysis technique that extracts from a
program the statements relevant to a particular computation. Informally, a slice
provides the answer to the question "What program statements potentially
affect the computation of variable v at statement s?" Programmers are known
to formulate questions of this kind when performing activities such as program
understanding and debugging. In
this lecture, the basic notions of program dependences will be introduced, so as
to allow a formal definition of the program slicing problem. A program slicing
algorithm will then be described. Finally, some variants of slicing and the
available tools will be presented.
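To make the definition concrete, here is a toy backward slicer over straight-line code, written in Python for this illustration; each statement records the variable it defines and the variables it uses, and the slice is computed from data dependences only (control dependences, which real slicers also track, are omitted).

    # Toy program, one assignment per statement: (variable defined, variables used).
    program = [
        ("a", set()),        # 1: a = input()
        ("b", set()),        # 2: b = input()
        ("c", {"a"}),        # 3: c = a * 2
        ("d", {"b"}),        # 4: d = b + 1
        ("v", {"c", "d"}),   # 5: v = c + d
    ]

    def backward_slice(program, var, stmt):
        """Return the 1-based statement indices that affect var's value at stmt."""
        needed, in_slice = {var}, set()
        for i in range(stmt - 1, -1, -1):    # walk backwards from statement s
            defined, used = program[i]
            if defined in needed:            # this statement defines a needed variable
                in_slice.add(i + 1)
                needed.discard(defined)      # its definition is now accounted for...
                needed |= used               # ...but its operands become needed
        return sorted(in_slice)

    print(backward_slice(program, "v", 5))   # [1, 2, 3, 4, 5]: everything matters
    print(backward_slice(program, "c", 3))   # [1, 3]: statements 2 and 4 are irrelevant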
TM.2.2/L: Reverse Engineering
During software evolution, knowledge about the high-level organization of the system is important. In fact, it can help locate the focus of a change and hypothesize ripple effects. Often, available architectural views do not accurately reflect the existing system; their automatic extraction is thus desirable. In this lecture, reverse engineering techniques based on the specification of architectural patterns will be presented. The validation of the extracted model through the reflexion method is then described. Finally, dynamic approaches to the identification of functionalities within components will be considered.
TM.2.3/L: Restructuring
Exercises (2 hrs):
TM.2.1/E and TM.2.2/E: Introductory work on analysis and re-engineering
Automation in code analysis and restructuring is fundamental in making the techniques studied from a theoretical point of view usable in practice. Among the available tools, the exercises will use TXL (http://www.txl.ca/), which supports code transformation and analysis and comes with grammars for several widely used programming languages.
Exercises will focus on the implementation of some simple code transformations based on the restructuring techniques presented during the theoretical lectures. Basic refactoring operations for object oriented systems such as moving methods, replacing variables and renaming entities will be considered.
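As a small illustration of one such refactoring, renaming an entity, the sketch below uses Python's own ast module instead of TXL so the example stays self-contained; the variable names are invented.

    import ast

    class Rename(ast.NodeTransformer):
        """Rename every occurrence of one identifier in a parse tree."""
        def __init__(self, old, new):
            self.old, self.new = old, new

        def visit_Name(self, node):
            if node.id == self.old:
                node.id = self.new
            return node

    source = "tmp = price * qty\nprint(tmp)"
    tree = Rename("tmp", "total").visit(ast.parse(source))
    print(ast.unparse(tree))   # every use of tmp is now total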
Lecturer: Robert G. Jacobsen, University of California
Tools and Techniques
These lectures present tools and techniques that are valuable when developing software for high energy physics. We discuss how to work more efficiently while still creating a high quality product that your colleagues will be happy with. The exercises provide practice with each of the tools and techniques presented, and culminate in a small project.
Lectures:
TM.3.1/L: Tools
TM.3.2/L: Techniques
Exercises (4 hrs):
TM.3.1/E, TM.3.2/E, TM.3.3/E, TM.3.4/E, TM.3.5/E, TM.3.6/E
Project combining all three parts of the track
Predrag Buncic
Predrag Buncic obtained a degree in physics from Zagreb University in 1989. He then worked on tracking algorithms for the NA35 experiment and obtained a master's degree in particle physics from Belgrade University in 1994. In the period 1995-1999 he worked for the NA49 experiment on the development of a persistent, object-oriented I/O system and data manager (DSPACK) designed to handle data volumes on the 100 TB scale, and coordinated the NA49 computing efforts at CERN. At present he works for the Institut für Kernphysik, Frankfurt, in the ALICE experiment on the ALICE production environment (AliEn). He is leader of the database section in the ALICE Offline Team.
Federico Carminati
Federico Carminati obtained an Italian doctor’s degree in High Energy Physics at the University of Pavia in 1981. After working as an experimental physicist at CERN, Los Alamos and CalTech, he was hired at CERN, where he was responsible for the development and support of the CERN Program Library and the GEANT3 detector simulation Monte Carlo. From 1994 to 1998 he participated in the design of the Energy Amplifier under the guidance of Prof. C. Rubbia (1984 Nobel Physics Laureate), developing innovative Monte Carlo techniques for the simulation of accelerator-driven fission machines and of the related fuel cycle. In January 1998 he joined the ALICE collaboration at the LHC, assuming the leadership of the ALICE software and computing project. Since January 2001 he has held the position of Work Package Manager in the European DataGRID project. He is responsible for the High Energy Physics Application Work Package, whose aim is to deploy large-scale distributed HEP applications using GRID technology.
Robert Cowles
With more than 30 years of experience in computing, and as the Computer Security Officer at SLAC, the lecturer can ground the more abstract discussions with practical, real-world examples. In addition to seminars in the US and Europe, he has taught regular classes on Internet and web security for the University of California and Hong Kong University. Education: BS in Physics from the University of Kansas, 1969; MS in Computer Science from Cornell University, 1971.
Ian Foster
Dr.
Ian Foster is Senior Scientist and Associate Director of the Mathematics and
Computer Science Division at Argonne National Laboratory, Professor of Computer
Science at the University of Chicago, and Senior Fellow in the Argonne/U.Chicago
Computation Institute. He currently
co-leads the Globus project with Dr. Carl Kesselman of USC/ISI as well as a
number of other major Grid initiatives, including the DOE-funded Earth System
Grid and the NSF-funded GriPhyN and GRIDS Center projects.
He co-edited the book “The Grid: Blueprint for a New Computing Infrastructure”.
Bob Jacobsen
Bob
Jacobsen is an experimental high-energy physicist and a faculty member at the
University of California, Berkeley. He's
a member of the BaBar collaboration, where he led the effort to create the
reconstruction software and the offline system.
He has previously been a member of the ALEPH (LEP) and MarkII (SLC)
collaborations. His original academic training was in computer engineering, and
he worked in the computing industry before becoming a physicist.
Bob Jones
After studying computer science at university, Bob joined CERN and has been working on online systems for the LEP and LHC experiments. Databases, communication systems, graphical user interfaces and the application of these technologies to data acquisition systems formed the basis of his thesis. He is currently responsible for the control and configuration sub-system of the ATLAS data acquisition prototype project.
Carl Kesselman
Dr. Carl Kesselman is a Senior Project Leader at the University of Southern California's Information Sciences Institute and a Research Associate Professor of Computer Science, also at the University of Southern California. Prior to joining USC, Dr. Kesselman was a Member of the Beckman Institute and a Senior Research Fellow at the California Institute of Technology. He holds a Ph.D. in Computer Science from the University of California at Los Angeles. Dr. Kesselman's research interests are in high-performance distributed computing, or Grid Computing. He is the co-leader of the Globus project and, along with Dr. Ian Foster, edited a widely referenced text on Grid computing.
Pascale Primet
Pascale Primet is an assistant professor in Computer Science. She has been giving lectures on Advanced Networks, Quality of Service and Operating Systems for more than ten years, and is a member of the INRIA Reso project. She is Manager of the Network Work Package (WP7) of the EU DataGRID project and scientific coordinator of the French Grid project E-TOILE.
Fons Rademakers
Fons Rademakers received a Ph.D. in particle physics from the University of Amsterdam. Since 1990 he has been working on large-scale data analysis systems at CERN. He is one of the main authors of the PAW and ROOT data analysis frameworks, and since July 2000 he has worked in the offline computing group of the ALICE collaboration, where he is in charge of framework development.
Paolo Tonella
Paolo Tonella received his laurea degree cum laude in Electronic Engineering from the University of Padua, Italy, in 1992, and his PhD degree in Software Engineering from the same university, in 1999, with the thesis "Code Analysis in Support to Software Maintenance". Since 1994 he has been a full-time researcher in the Software Engineering group at IRST (Institute for Scientific and Technological Research), Trento, Italy. He has participated in several industrial and European Community projects on software analysis and testing. He is now the technical person responsible for a project with the ALICE, ATLAS and LHCb experiments at CERN on the automatic verification of coding standards and on the extraction of high-level UML views from the code. In 2000-2001 he gave a course on Software Engineering at the University of Brescia. He now teaches Software Analysis and Testing at the University of Trento. His current research interests include reverse engineering, object-oriented programming, web applications and static code analysis.