CERN School of Computing 2002

Scientific programme

The School is based on the presentation of approximately 29 lectures and on 24 hours of related practical exercises on PCs or workstations.

The programme of the School is organised around four themes:

Grid Computing
From Detectors to Physics Papers
Security and Networks
Tools and Methods

Preliminary Scientific Programme 2002

The following lectures are now confirmed. Any additional lectures will be announced later.


Grid Computing

“Grid” computing has emerged as an important new field, distinguished from conventional distributed computing by its focus on large-scale resource sharing and innovative applications.  In this track, we provide an in-depth introduction to Grid technologies and applications.  We review the “Grid problem,” which we define as flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources—what we refer to as virtual organizations. In such settings, we encounter unique authentication, authorization, resource access, resource discovery, and other challenges.  It is this class of problem that is addressed by Grid technologies.  We present an extensible and open Grid architecture, in which protocols, services, application programming interfaces, and software development kits are categorized according to their roles in enabling resource sharing.  We review major Grid projects worldwide and describe how they are contributing to the realization of this architecture.  Then, we describe specific Grid technologies in considerable detail, focusing in particular on the Globus Toolkit and on Data Grid technologies being developed by the EU Data Grid, GriPhyN, and PPDG projects in Europe and the U.S. The hands-on exercises will give participants practical experience of the Globus toolkit for basic Grid activities.  
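
As a flavour of the basic Grid activities covered in the hands-on exercises, the sketch below submits a trivial job through Globus. It assumes a Globus Toolkit 2 installation on the student machine; the gatekeeper contact "testbed.example.org" is a placeholder rather than a real School resource, and the exact globusrun options used in the exercises may differ.

#include <cstdlib>
#include <iostream>
#include <string>

int main() {
    // Create a GSI proxy credential from the user's certificate
    // (grid-proxy-init prompts for the private-key pass phrase).
    if (std::system("grid-proxy-init") != 0) {
        std::cerr << "could not create a proxy credential\n";
        return 1;
    }

    // Describe the job in RSL (Resource Specification Language) and submit
    // it through the GRAM gatekeeper of a (hypothetical) resource;
    // the -o option streams the job's standard output back to the user.
    std::string rsl = "&(executable=/bin/hostname)(count=1)";
    std::string cmd = "globusrun -o -r testbed.example.org \"" + rsl + "\"";
    return std::system(cmd.c_str());
}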

Lecturers: I. Foster, Argonne National Laboratory, C. Kesselman, University of Southern California

 

Grid Technologies and Applications

Lectures:

Lecture GC.1/L:            Introduction to Grids

Lecture GC.2/L:            Overview of the Globus Toolkit

Lecture GC.3/L:            Globus components

Lecture GC.4/L:            Globus components

Lecture GC.5/L:            Globus components

Lecture GC.6/L:            Globus components

Lecture GC.7/L:            Other issues and future

Lecture GC.8/L:            Wrap-up & Feedback session

Exercises:       

Ex. GC.1/E:      basic job submission and monitoring - getting to know the Globus Toolkit, simple job submission and monitoring, use of GSI

Ex. GC.2/E:      basic job submission

Ex. GC.3/E:      advanced jobs - exploring the MDS information service for SEs and CEs, submitting more complicated jobs making use of the replica catalog, MPI/MPICH-G, etc.

Ex. GC.4/E:      advanced jobs

Ex. GC.5/E:      project work - students work in groups on a mini-project, using the Globus Toolkit and related software to solve a physics-related problem. The students should build on the knowledge gained in the lectures and previous exercises to develop an application capable of solving a given physics problem.

Ex. GC.6/E:      project work

Ex. GC.7/E:      project work

Ex. GC.8/E:      project work


From Detectors to Physics Papers

The track addresses the Information Technology challenges encountered in the transformation of the raw data coming from the HEP experiments into physics results. Particular attention will be devoted to the problems and opportunities arising from the distributed environment in which both the development of the programs and the analysis of the data will take place. This novel situation calls for the application of innovative technologies at the level of software engineering, computing infrastructure and data processing. Software engineering has to be rethought in the context of a highly dynamic environment, where the communication between the different actors is mainly remote. Computing techniques, such as the choice of programming language and the adoption of advanced software tools, have to support this approach. Management, planning and evaluation of the performance of a project have to be adapted to this situation of loosely coupled and dispersed human resources, which is not uncommon in other fields but is extreme in HEP. Data processing also has to adapt to this situation of distributed computing resources, where the ultimate goal is to access them transparently without having to confront the underlying dynamics and complexity of the system. These lectures will address how the different leading-edge technologies, both in the field of software engineering and in that of distributed computing, can be successfully applied to a large collaboration of non-professional but demanding, computing-intensive users. Finally, different solutions to the problem adopted by other experiments will be presented.

 

Lecturer: F. Carminati, CERN

 

From data to analysis

DP.1.1/L: Introduction

The challenges of data processing in an LHC collaboration. Main parameters involved and orders of magnitude. Typical dataflow. Structure and organisation of an offline project. Management and sociological issues. Examples taken from LHC experiments will be presented. Problems and issues of technology transition, for example the transition from FORTRAN to C++.

DP.1.2/L: Software Development

Planning and organisation of the work in a distributed environment. New trends in software engineering and their application to HEP. Rapid prototyping and architectural design. Software tools.

DP.1.3/L: Offline Frameworks

Need for an offline framework. Definition of an offline framework. Framework components and structure, layers and modularity. Interfaces to external components and basic services. Evolution and maintenance of a framework. Practical examples from LHC.
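
As an illustration of the component and layer structure discussed in this lecture, the following is a minimal, hypothetical sketch of a framework module interface and event loop; the class and method names are invented for the example and do not correspond to any particular LHC framework.

#include <iostream>
#include <vector>

struct Event { long id; };               // stand-in for real detector data

class Module {                           // basic service seen by user code
public:
    virtual ~Module() {}
    virtual void beginJob() {}           // initialisation hook
    virtual void process(Event& e) = 0;  // called once per event
    virtual void endJob() {}             // termination hook
};

class HitFilter : public Module {        // example of a user-supplied module
public:
    void process(Event& e) { std::cout << "filtering event " << e.id << "\n"; }
};

int main() {                             // the framework owns the event loop
    std::vector<Module*> modules;
    modules.push_back(new HitFilter);

    for (unsigned i = 0; i < modules.size(); ++i) modules[i]->beginJob();
    for (long id = 0; id < 3; ++id) {
        Event e; e.id = id;
        for (unsigned i = 0; i < modules.size(); ++i) modules[i]->process(e);
    }
    for (unsigned i = 0; i < modules.size(); ++i) {
        modules[i]->endJob();
        delete modules[i];
    }
    return 0;
}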

DP.1.4/L: Practical Use of Grid Technology

Basic problem of transparent access to large data samples. Solutions offered by GRID technology. Use of GRID technology in HEP, from dreams to reality. Current testbeds and first results. Practical examples from LHC.

 

Lecturers: P. Buncic, F. Rademakers, CERN

 

Distributed data handling, processing and analysis

The problems and issues of handling distributed data in a typical HEP experiment. Access patterns. File catalogue vs. file system. Generic API for data access. Logical, physical and transport file names. File catalogue implementation (AliEn). Distributed data processing and analysis. Introduction to the PROOF system, which provides for the distributed processing of very large collections of data. PROOF uses a parallel architecture to achieve (near) interactive performance. Introduction to the ROOT I/O system. Discussion of the PROOF three-tier parallel architecture. Description of the interface of PROOF to the Grid (especially AliEn, see lecture 5).
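
The distinction between logical and physical file names can be sketched as follows. The catalogue entries and the logical file name below are invented, and a real catalogue such as AliEn would be queried through its own API rather than a local map; TFile::Open is the standard ROOT call, so the example needs a ROOT installation to compile.

#include <iostream>
#include <map>
#include <string>
#include "TFile.h"

int main() {
    // Logical file name (LFN) -> physical, site-specific file name (PFN).
    // Invented entries; a real catalogue service would be queried instead.
    std::map<std::string, std::string> catalogue;
    catalogue["/alice/prod/run001/hits.root"] =
        "root://castor.example.org//castor/cern.ch/run001/hits.root";

    std::string lfn = "/alice/prod/run001/hits.root";
    std::map<std::string, std::string>::const_iterator it = catalogue.find(lfn);
    if (it == catalogue.end()) {
        std::cerr << "LFN not found in the catalogue\n";
        return 1;
    }

    // ROOT picks the transport protocol (here rootd) from the PFN prefix.
    TFile* f = TFile::Open(it->second.c_str());
    if (f && !f->IsZombie())
        std::cout << "opened " << f->GetName() << "\n";
    return 0;
}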

Lectures:

DP.2.1/L:          Distributed data handling and analysis.

DP.2.2/L:          Distributed data processing and analysis

Exercises (4 hrs):

DP.2.1/E, DP.2.2/E, DP.2.3/E, DP.2.4/E

An introduction to the AliEn architecture. Using the AliEn API from C++. An introduction to PROOF and its use in the analysis of data created using the AliEn service.

 

Lecturer: R.G. Jacobsen, University of California

 

Current approaches  

The computing systems of several currently running experiments are described, with emphasis on the experience gained in building, commissioning and operating them. Several approaches to the ongoing development of new experiments will be described, and the choices made will be discussed.

DP.3.1/L: Experience with current approaches


Security and Networks

The development of modern distributed computing and complex data management systems, as exemplified by the Grid, relies increasingly on two components where specific advances are necessary to satisfy stringent requirements. These two areas are computer security and network performance. This track addresses each of them, in the form of two series of lectures, via a selection of topics at the forefront of the technology. The security part starts with background knowledge and moves on to specific technologies such as cryptography and authentication, and their use in the Grid context. The networking part focuses on two aspects that are of primary importance in a Grid context: TCP/IP enhancements and network monitoring. The aim is to present the fundamentals and evolution of the TCP/IP stack and to explore advanced network measurement and analysis tools and services for end-to-end performance measurement and prediction.

 

Lecturer:  R.D. Cowles, SLAC    

 
Computer Security

SN.1.1/L:          Your Workstation

Threats

·         Destruction

·         Modification

·         Embarrassment

Responsibilities

·         Backup & Virus protection

·         Patching and configuration management

·         Email security

 

SN.1.2/L:          Cryptography and PKI

Symmetric and Asymmetric encryption

Public Key Infrastructure

·         X.509 Certificates

·         Certificate Authorities

·         Registration Authority

·         Obtaining a certificate

·         Protecting your private key

 

SN.1.3/L:          Grid Security

Registering your identity

Authentication models

Authorization to use resources

Proxy Certificates and delegation

MyProxy server

Community Access services

Threats

Vulnerabilities

How you can make the Grid more secure

Exercises (2 hrs)

SN.1.1/E:         Generate a key pair and perform the steps necessary to send email that is signed and encrypted, either using PGP or using X.509 certificates (a key-generation sketch follows the exercise list).

SN.1.2/E:         Register with a MyProxy server and use a web Grid portal to submit a job for execution.
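
As a starting point for exercise SN.1.1/E, here is a minimal sketch of the key-pair generation step using the OpenSSL library; the exercise itself may rely on the openssl or PGP command-line tools instead, the file names are arbitrary, and the private key is written unencrypted only to keep the example short (in practice it must be protected with a pass phrase, as discussed in SN.1.2/L).

#include <cstdio>
#include <openssl/rsa.h>
#include <openssl/pem.h>

int main() {
    // Generate a 1024-bit RSA key pair with public exponent 65537.
    RSA* key = RSA_generate_key(1024, 65537, NULL, NULL);
    if (!key) return 1;

    // Write the private key in PEM format (unencrypted here; real use
    // should pass a cipher and pass phrase).
    FILE* priv = std::fopen("key.pem", "w");
    PEM_write_RSAPrivateKey(priv, key, NULL, NULL, 0, NULL, NULL);
    std::fclose(priv);

    // Write the corresponding public key.
    FILE* pub = std::fopen("pubkey.pem", "w");
    PEM_write_RSAPublicKey(pub, key);
    std::fclose(pub);

    RSA_free(key);
    return 0;
}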

 

Lecturer: P. Primet, ENS

 

High performance Grid Networking

These lectures present the fundamentals of the TCP/IP stack and the limits of its protocols in meeting the network requirements of Grid applications and middleware. The evolution of the network layer and of the transport layer is examined in order to understand the trends in high-performance networking. Emphasis is placed on the practices that permit end-to-end performance measurement and improvement.

SN.2.1/L:         Grid network requirements. IP protocol. TCP protocol: main features and limits

SN.2.2/L:         IP Service Differentiation: elevated services; non-elevated services (ABE, EDS, QBSS)

SN.2.3/L:         High-performance transport protocols and TCP optimization (see the sketch below)
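
A minimal sketch of one end-host optimization touched on in SN.2.3/L: enlarging the TCP socket buffers towards the bandwidth-delay product of the path. For a 100 Mbit/s path with a 100 ms round-trip time the product is roughly 1.25 MB; the figure below is illustrative only.

#include <iostream>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main() {
    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0) return 1;

    // Request send and receive buffers of about bandwidth * delay bytes.
    int bufsize = 1250000;
    setsockopt(s, SOL_SOCKET, SO_SNDBUF, &bufsize, sizeof(bufsize));
    setsockopt(s, SOL_SOCKET, SO_RCVBUF, &bufsize, sizeof(bufsize));

    // The kernel may clamp the request to its configured maximum.
    int actual = 0;
    socklen_t len = sizeof(actual);
    getsockopt(s, SOL_SOCKET, SO_SNDBUF, &actual, &len);
    std::cout << "send buffer granted: " << actual << " bytes\n";
    return 0;
}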

Exercises (2 hrs)

SN.2.1/E:         Configure and use tools and services for Grid status and networks performance measurement.

SN.2.2/E:         Measure and understand end-to-end performance of TCP connections over different types of links.


Tools and Methods

This track presents modern techniques for software design and modern tools for understanding and improving existing software.  The emphasis will be placed on the large software projects and large executables that are common in HEP.  The track will consist of lectures, exercises and discussions.  The first discussion session will occur after several hours of exercises have been completed.  The last discussion session will be held at the end of the track.  

The first three lectures will cover software engineering, design, methodology and testing. These are followed by three lectures on working with large software systems, including methods for analysing their structure and improving it. The final two lectures will focus on the tools that are commonly used in software design and testing.

In the exercise sessions, the students will have a chance to use the tools that are described in the lectures.  They will work with CVS and configuration management tools.  They will be asked to use the test and debugging tools on some simple examples.  By showing how these tools can locate known problems, students will learn how to use them on new problems.  Students will then be given a functional program and a brief description of what it does.  The goal is to extend the program to handle a larger problem domain.  It is expected that the example programs and exercises will be primarily in C++.  

 

Lecturers: R.G. Jacobsen, University of California, R. Jones, CERN  

 

Software Engineering

An introduction to the principles of Software Engineering, with emphasis on what we know about building large software systems for high-energy physics.  These lectures cover the principles of software engineering, design, methodology and testing.

TM.1.1/L:          Introduction to Software Engineering

TM.1.2/L:         Software Design

TM.1.3/L:         Long-term Issues of Software Building

 

Lecturer: P. Tonella, Istituto Trentino di Cultura  

 

Analysis of Software Systems

This lecture series aims at investigating the issues related to program understanding and evolution. Since most programming activities are conducted on existing systems, it is important to know how such systems can be understood effectively and how interventions can be made without undesired side effects. Moreover, the system under evolution may require a preliminary restructuring. During the lectures some fundamental notions about static program analysis will be given. Then program slicing will be presented as a support to program understanding and impact analysis. Reverse engineering techniques that extract an architectural description of existing systems will also be described. Finally, restructuring will be considered.

TM.2.1/L:         Static code analysis, slicing

                        Program slicing is a static analysis technique that extracts from a program the statements relevant to a particular computation. Informally, a slice provides the answer to the question "What program statements potentially affect the computation of variable v at statement s?" Programmers are known to formulate questions of this kind when performing activities such as program understanding and debugging. In this lecture, the basic notions of program dependences will be introduced, so as to allow a formal definition of the program slicing problem. A program slicing algorithm will then be described. Finally, some variants of slicing and the available tools will be presented.
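
A small concrete illustration, not taken from the lecture material: for the fragment below, the slicing criterion is the value of sum at the final output statement, and the comments mark which statements belong to the slice.

#include <iostream>

int main() {
    int n = 5;                       // in the slice: 'n' controls the loop
    int sum = 0;                     // in the slice
    int prod = 1;                    // NOT in the slice on 'sum'
    for (int i = 1; i <= n; ++i) {   // in the slice (control dependence)
        sum += i;                    // in the slice (data dependence)
        prod *= i;                   // NOT in the slice
    }
    std::cout << prod << "\n";       // NOT in the slice
    std::cout << sum << "\n";        // slicing criterion: <this statement, sum>
    return 0;
}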

TM.2.2/L:         Reverse Engineering

                        During software evolution, knowledge about the high-level organization of the system is important. In fact, it can help locate the focus of a change and hypothesize ripple effects. Often, available architectural views do not accurately reflect the existing system, so their automatic extraction is desirable. In this lecture, reverse engineering techniques based on the specification of architectural patterns will be presented. The validation of the extracted model through the reflection method is then described. Finally, dynamic approaches to the identification of functionalities within components will be considered.

TM.2.3/L:         Restructuring

Software systems are subject to a phenomenon called "architectural drift", consisting of a gradual deviation of the code implementing the system from its original design. One of its consequences is a progressive degradation of the code organization, making program maintenance harder. Refactoring is the process of modifying the code so that the external behaviour is not altered, while the internal structure is improved.  In this lecture, some examples of code refactoring will be presented with reference to the object oriented programming paradigm. They will be introduced on a small program, used throughout the lecture. Implications and approaches to use with large scale systems will be discussed.
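
A small, invented example of such a behaviour-preserving change, combining an "extract method" refactoring with a variable renaming; it is not the program used in the lecture.

#include <iostream>

// --- before: duplicated computation and a cryptic variable name ---------
class TrackOld {
public:
    double px, py;
    void print() {
        double t = px * px + py * py;                     // 't' says nothing
        std::cout << "pt2=" << t << "\n";
    }
    bool isSoft() { return (px * px + py * py) < 1.0; }   // duplicated code
};

// --- after: "extract method" plus renaming, same external behaviour -----
class Track {
public:
    double px, py;
    double pt2() const { return px * px + py * py; }      // extracted method
    void print() { std::cout << "pt2=" << pt2() << "\n"; }
    bool isSoft() const { return pt2() < 1.0; }           // duplication gone
};

int main() {
    Track tr;
    tr.px = 0.3; tr.py = 0.4;
    tr.print();                                           // prints pt2=0.25
    std::cout << std::boolalpha << tr.isSoft() << "\n";   // prints true
    return 0;
}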

Exercises: (2 hrs)

TM.2.1/E and TM.2.2/E:

          Introductory work on analysis and re-engineering

          Automation in code analysis and restructuring is fundamental in making the techniques studied from a theoretical point of view usable in practice. Among the available tools, TXL (http://www.txl.ca/) will be used during the exercises. It supports code transformation and analysis and comes with grammars for several widely used programming languages.

 

Exercises will focus on the implementation of some simple code transformations based on the restructuring techniques presented during the theoretical lectures. Basic refactoring operations for object oriented systems such as moving methods, replacing variables and renaming entities will be considered.

 

Lecturer: Robert G. Jacobsen, University of California  

 

Tools and Techniques

These lectures present tools and techniques that are valuable when developing software for high energy physics.  We discuss how to work more efficiently while still creating a high quality product that your colleagues will be happy with. The exercises provide practice with each of the tools and techniques presented, and culminate in a small project.   

Lectures:

TM.3.1/L:            Tools

TM.3.2/L:           Techniques

Exercises (4 hrs):

TM.3.1/E

TM.3.2/E

TM.3.3/E

TM.3.4/E

TM.3.5/E

TM.3.6/E              Project combining all three parts of the track  

 Lecturers

Predrag Buncic

Predrag Buncic obtained a degree in physics from Zagreb University in 1989. He then worked on tracking algorithms for the NA35 experiment and obtained a master's degree in particle physics from Belgrade University in 1994. In the period 1995-1999 he worked for the NA49 experiment on the development of a persistent, object-oriented I/O system and data manager (DSPACK), designed to handle data volumes on the 100 TB scale, and coordinated the NA49 computing efforts at CERN. At present he works for the Institut für Kernphysik, Frankfurt, in the ALICE experiment on the ALICE production environment (AliEn). He is section leader of the database section in the ALICE Offline Team.

Federico Carminati

Federico Carminati obtained an Italian doctor's degree in High Energy Physics at the University of Pavia in 1981. After working as an experimental physicist at CERN, Los Alamos and Caltech, he was hired at CERN, where he has been responsible for the development and support of the CERN Program Library and of the GEANT3 detector simulation Monte Carlo. From 1994 to 1998 he participated in the design of the Energy Amplifier under the guidance of Prof. C. Rubbia (1984 Nobel Physics Laureate) and in the development of innovative Monte Carlo techniques for the simulation of accelerator-driven fission machines and of the related fuel cycle. In January 1998 he joined the ALICE collaboration at the LHC, assuming the leadership of the ALICE software and computing project. Since January 2001 he has held the position of Work Package Manager in the European DataGRID project. He is responsible for the High Energy Physics Application Work Package, whose aim is to deploy large-scale distributed HEP applications using Grid technology.

Robert Cowles

With more than 30 years of experience in computing and as the Computer Security Officer at SLAC, the lecturer can ground the more abstract discussions with practical, real-world examples. In addition to seminars in the US and Europe, he has taught regular classes on Internet and web security for the University of California and Hong Kong University.  Education: BS Physics from University of Kansas, 1969; MS Computer Science from Cornell University, 1971.

Ian Foster

Dr. Ian Foster is Senior Scientist and Associate Director of the Mathematics and Computer Science Division at Argonne National Laboratory, Professor of Computer Science at the University of Chicago, and Senior Fellow in the Argonne/U.Chicago Computation Institute.  He currently co-leads the Globus project with Dr. Carl Kesselman of USC/ISI as well as a number of other major Grid initiatives, including the DOE-funded Earth System Grid and the NSF-funded GriPhyN and GRIDS Center projects.  He co-edited the book “The Grid: Blueprint for a New Computing Infrastructure".

Bob Jacobsen

Bob Jacobsen is an experimental high-energy physicist and a faculty member at the University of California, Berkeley.  He's a member of the BaBar collaboration, where he led the effort to create the reconstruction software and the offline system.  He has previously been a member of the ALEPH (LEP) and MarkII (SLC) collaborations. His original academic training was in computer engineering, and he worked in the computing industry before becoming a physicist.

Bob Jones  

After studying computer science at university, Bob joined CERN and has been working on online systems for the LEP and LHC experiments. Databases, communication systems, graphical user interfaces and the application of these technologies to data acquisition systems were the basis of his thesis. He is currently responsible for the control and configuration sub-system of the ATLAS data acquisition prototype project.

Carl Kesselman

Dr. Carl Kesselman is a Senior Project Leader at the University of Southern California's Information Sciences Institute and a Research Associate Professor of Computer Science, also at the University of Southern California.  Prior to joining USC, Dr. Kesselman was a Member of the Beckman Institute and a Senior Research Fellow at the California Institute of Technology.  He holds a Ph.D. in Computer Science from the University of California at Los Angeles.  Dr. Kesselman's research interests are in high-performance distributed computing, or Grid Computing.  He is the Co-leader of the Globus project, and along with Dr. Ian Foster, edited a widely referenced text on Grid computing.

Pascale Primet

Pascale Primet is an assistant professor in Computer Science. She has been giving lectures on Advanced Networks, Quality of Service and Operating Systems for more than ten years, and is a member of the INRIA RESO project. She is Manager of the Work Package Network (WP7) of the EU DataGRID project and scientific coordinator of the French Grid project E-TOILE.

Fons Rademakers

Fons Rademakers received a Ph.D. in particle physics from the University of Amsterdam. Since 1990 he has been working on large-scale data analysis systems at CERN. He is one of the main authors of the PAW and ROOT data analysis frameworks, and since July 2000 he has worked in the offline computing group of the ALICE collaboration, where he is in charge of framework development.

Paolo Tonella

Paolo Tonella received his laurea degree cum laude in Electronic Engineering from the University of Padua, Italy, in 1992, and his PhD degree in Software Engineering from the same university in 1999, with the thesis "Code Analysis in Support to Software Maintenance". Since 1994 he has been a full-time researcher in the Software Engineering group at IRST (Institute for Scientific and Technological Research), Trento, Italy. He has participated in several industrial and European Community projects on software analysis and testing. He is now the technical person responsible for a project with the ALICE, ATLAS and LHCb experiments at CERN on the automatic verification of coding standards and on the extraction of high-level UML views from the code. In 2000-2001 he gave a course on Software Engineering at the University of Brescia. He now teaches Software Analysis and Testing at the University of Trento. His current research interests include reverse engineering, object-oriented programming, web applications and static code analysis.

