Version 5 as of 11 December 2001

From Detectors to Physics papers (DP)

Track description: a 200-500 word text describing the objective of the track, briefly listing the breakdown into lecture series, as well as exercises as appropriate

The track addresses the Information Technology challenges encountered in the transformation of the raw data coming from the HEP experiments into physics results. Particular attention will be devoted to the problems and opportunities arising from the distributed environment in which both the development of the programs and the analysis of the data will take place.
This novel situation calls for the application of innovative technologies at the level of software engineering, computing infrastructure and data processing. Software engineering has to be rethought in the context of a highly dynamic environment, where the communication between the different actors is mainly remote. Computing techniques, such as the choice of programming language and the adoption of advanced software tools, have to support this approach.
Management, planning and evaluation of the performance of a project have to be adapted to this situation of loosely coupled and dispersed human resources, which is not uncommon in other fields but is taken to an extreme in HEP.
Data processing also has to adapt to this situation of distributed computing resources, where the ultimate goal is to access them transparently without having to confront the underlying dynamics and complexity of the system. These lectures will address how leading-edge technologies in the fields of software engineering and distributed computing can be successfully applied to a large collaboration of users who are not computing professionals but whose computing demands are intensive.
Finally, the different solutions to these problems adopted by other experiments will be presented.

Track coordinators

Wisla Carena, CERN Wisla.Carena@cern.ch; Robert Edgecock, RAL r.edgecock@rl.ac.uk

 

Series Ref

Title of the Lecture Series

Description of the Lecture Series: a ~50-100 word text describing the series of lectures

Lecturer(s) name, affiliation

Lecturer(s) data: email, telephone number

Lecturer(s) biography: a 100-200 word text

L / E

Total # of hours

Lecture Description

Lecture / Exercise reference

Lecture description (a title or a short text as appropriate)

DP

From data to analysis

Lecture #1:
The challenges of data processing in an LHC collaboration. Main parameters involved and orders of magnitude. Typical dataflow. Structure and organisation of an offline project. Management and sociological issues. Examples taken from LHC experiments will be presented. Problems and issues of technology transitions, for example the transition from FORTRAN to C++.
Lecture #2:
Planning and organisation of the work in a distributed environment. New trends in software engineering and their application to HEP. Rapid prototyping and architectural design. Software tools.
Lecture #3:
Need for an offline framework. Definition of an offline framework. Framework components and structure, layers and modularity. Interfaces to external components and basic services. Evolution and maintenance of a framework. Practical examples from LHC.
Lecture #4:
Basic problem of transparent access to large data samples. Solutions offered by GRID technology. Use of GRID technology in HEP, from dreams to reality. Current testbeds and first results. Practical examples from LHC.
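To make the transparent-access problem concrete, the sketch below shows in plain C++ how a file catalogue conceptually maps a logical file name onto one of several physical replicas, so that a job never needs to know where the data actually reside. The names, locations and protocols are invented for illustration and do not correspond to any real catalogue or GRID service.

    // Toy file catalogue: logical file name -> physical replicas.
    // Purely illustrative; not the interface of any real GRID catalogue.
    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>

    int main() {
       // Hypothetical entries: one logical name, two physical copies.
       std::map<std::string, std::vector<std::string> > catalogue;
       catalogue["/lhc/sim/run042/hits.root"] = {
          "castor://cern.ch/data/run042/hits.root",
          "gridftp://site2.example.org/lhc/run042/hits.root"
       };

       // A job asks only for the logical name; the catalogue hides
       // the physical location and the transport protocol.
       const std::string lfn = "/lhc/sim/run042/hits.root";
       for (const std::string& pfn : catalogue[lfn])
          std::cout << lfn << " -> " << pfn << "\n";
       return 0;
    }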

Federico Carminati CERN

federico.carminati@cern.ch

Federico Carminati obtained the Italian doctoral degree in High Energy Physics at the University of Pavia in 1981. After working as an experimental physicist at CERN, Los Alamos and Caltech, he was hired at CERN, where he has been responsible for the development and support of the CERN Program Library and of the GEANT3 detector simulation Monte Carlo. From 1994 to 1998 he participated in the design of the Energy Amplifier under the guidance of Prof. C. Rubbia (1984 Nobel Physics Laureate), developing innovative Monte Carlo techniques for the simulation of accelerator-driven fission machines and of the related fuel cycle. In January 1998 he joined the ALICE collaboration at the LHC, assuming the leadership of the ALICE software and computing project. Since January 2001 he has held the position of Work Package Manager in the European DataGRID project. He is responsible for the High Energy Physics Application Work Package, whose aim is to deploy large-scale distributed HEP applications using GRID technology.

Lectures

4

DP1.1/L

Introduction

DP1.2/L

Software development

DP1.3/L

Offline frameworks

DP1.4/L

Practical use of GRID technology.

Exercises

0

 

 

DP

Distributed data handling, processing and analysis

The problems and issues of handling distributed data in a typical HEP experiment. Access patterns. File catalogue vs file system. Generic API for data access. Logical, physical and transport file names. File catalogue implementation (AliEn). Distributed data processing and analysis: introduction to the PROOF system, which provides for the distributed processing of very large collections of data. PROOF uses a parallel architecture to achieve (near) interactive performance. Introduction to the ROOT I/O system. Discussion of the PROOF three-tier parallel architecture. Description of the interface of PROOF to the GRID (especially AliEn, see lecture 5).
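As an indication of what the ROOT I/O system looks like at the user level, the following minimal sketch writes a small toy TTree to a file; the branch names and contents are invented for illustration and are not part of the exercise material.

    // Minimal ROOT I/O sketch: write a toy TTree to a file.
    #include "TFile.h"
    #include "TTree.h"
    #include "TRandom.h"

    void write_toy_events()
    {
       TFile f("toy_events.root", "RECREATE");   // output file
       TTree tree("T", "toy event tree");        // in-memory tree

       Float_t px, py;
       tree.Branch("px", &px, "px/F");           // two simple float branches
       tree.Branch("py", &py, "py/F");

       for (Int_t i = 0; i < 1000; ++i) {        // fill with random values
          gRandom->Rannor(px, py);
          tree.Fill();
       }

       tree.Write();                             // write the tree...
       f.Close();                                // ...and close the file
    }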

Predrag Buncic; Fons Rademakers

predrag.buncic@cern.ch, fons.rademakers@cern.ch

Fons Rademakers received a Ph.D. in particle physics from the University of Amsterdam. Since 1990 he has been working on large-scale data analysis systems at CERN. He is one of the main authors of the PAW and ROOT data analysis frameworks, and since July 2000 he has worked in the offline computing group of the ALICE collaboration, where he is in charge of the framework development.

Predrag Buncic obtained a degree in physics from Zagreb University in 1989. He then worked on tracking algorithms for the NA35 experiment and obtained a master's degree in particle physics from Belgrade University in 1994. In the period 1995-1999 he worked for the NA49 experiment on the development of a persistent, object-oriented I/O system and data manager (DSPACK), designed to handle data volumes on the 100 TB scale, and coordinated the NA49 computing efforts at CERN. At present he works for the Institut für Kernphysik, Frankfurt, in the ALICE experiment on the ALICE production environment (AliEn). He is section leader of the database section in the ALICE Offline Team.

Lectures

2

DP2.1/L

Distributed data handling.

DP2.2/L

Distributed data processing and analysis

Exercises

4

DP2/E

An introduction to the AliEn architecture. Using the AliEn API from C++. An introduction to PROOF and its use in the analysis of data created using the AliEn service.
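For orientation only, a sketch of what the PROOF side of such an exercise might look like is given below. It assumes the TProof/TDSet interface of later ROOT releases (TProof::Open, TDSet::Add, TProof::Process), together with a hypothetical master host name, data file and selector MySelector.C; the calls actually used in the exercise may differ.

    // Sketch of a PROOF analysis session (ROOT macro).
    // Host name, data set and selector are hypothetical.
    #include "TProof.h"
    #include "TDSet.h"

    void run_proof_analysis()
    {
       // Connect to a PROOF cluster.
       TProof *proof = TProof::Open("proofmaster.example.org");

       // Describe the data set: trees named "T" in a list of files.
       TDSet *dataset = new TDSet("TTree", "T");
       dataset->Add("toy_events.root");

       // Process the data set in parallel with a user-supplied selector.
       proof->Process(dataset, "MySelector.C+");
    }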

DP

Current approaches

The computing systems of several currently running experiments are described, with emphasis on the experience gained in building, commissioning and operating them. Several approaches to the ongoing development of new experiments will also be described, and the choices made will be discussed.

Bob Jacobsen

Bob_Jacobsen@lbl.gov

 

Lectures

1

DP3.1/L

Experience with current approaches

Exercises

0

 
