CERN School of Computing 2001

Scientific programme

The School is based on the presentation of approximately 39 lectures and on 13 hours of related practical exercises on PCs or workstations.

The programme of the School is organised around four themes:

·         Computer Architecture - Software and Hardware

·         High Throughput Distributed Systems

·         Distributed Real-Time Systems

·         Principles of Distributed Databases



Preliminary Scientific Programme 2001

The following lectures are now confirmed. Any additional lectures will be announced later.


Computer Architecture - Software and Hardware

Title:  Computer Architecture – Software and Hardware

Lecturers:  D. Mosberger, Hewlett Packard Laboratories, Palo Alto, USA;  J. Sorensen, Linuxcare, Ottawa, Canada;  S. Jarp, CERN, Geneva, Switzerland

  Theme: The basic building blocks of modern computers.

This series of lectures starts from a relatively detailed hardware description of a modern computer, as represented by the new IA-64 architecture co-developed by HP and Intel. Building on an understanding of the instruction set architecture, the Itanium processor micro-architecture and the various components of the system architecture (e.g. I/O handling, multiprocessor support), it takes the audience through a detailed explanation of the required Linux kernel (both the system-dependent and system-independent parts) as well as the exploitation of the hardware features by the application-enabling software (e.g. compilers and libraries adapted for the architecture). Finally, it discusses the key features required for computing farms and compares the IA-64 with other farm computers, such as the IA-32 PC.
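As a hint of how application-enabling software can exploit IA-64 features, the sketch below shows a selection loop written in a branch-free style that an EPIC compiler can map onto predicated instructions; the function and data names are invented purely for illustration.

    #include <cstddef>
    #include <cstdio>

    // Illustration only: on IA-64 an EPIC compiler can evaluate the
    // condition into a predicate register and execute the addition under
    // that predicate, avoiding branch-misprediction penalties entirely.
    double sum_above_cut(const double* energy, std::size_t n, double cut)
    {
        double sum = 0.0;
        for (std::size_t i = 0; i < n; ++i)
            sum += (energy[i] > cut) ? energy[i] : 0.0;   // if-convertible
        return sum;
    }

    int main()
    {
        const double e[] = {12.5, 47.0, 3.2, 88.1};
        std::printf("sum above cut: %f\n", sum_above_cut(e, 4, 10.0));
        return 0;
    }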

a) HARDWARE

Lecture 1:

Internal processor architecture (instruction set, etc.) (S. Jarp)

Lecture 2:

System "Box" architecture (memory, I/O buses, etc.) (D. Mosberger) 

Lecture 3:

"Farm" architecture (interconnects, resulting throughput, reliability issues, etc.) (S. Jarp) 

Demonstration:  Practical demonstration of the instruction set architecture.

b) SOFTWARE

Lecture 1:

Hardware-dependent kernel software (interrupt handlers, etc.) (D. Mosberger)

Lecture 2:

Hardware independent kernel software (file systems, etc.) (J. Sorensen)

Lecture 3: 

Application enabling software (toolchain, compilers, libraries, etc.) (J. Sorensen)


High Throughput Distributed Systems 

Theme:

The increasingly large CPU demands of experimental high energy physics are met with dedicated farms of low-cost processors, typically workstations running some form of Linux. In this track, students will learn about the computing challenges facing HEP experiments in reconstruction, simulation and analysis, and about current technologies that have been adopted to meet these challenges. We also examine developing technologies, including ways of doing large-scale computing when both the processors and data are widely distributed.

  0)             Introduction to HEP Offline Computing (Reconstruction, Analysis and Simulation)

                 Lecturer: to be confirmed

                 (possible 1 hour lecture)

1)              Distributed Computing for HENP experiments

                 Lecturers:  S. Wolbers, Fermilab, Batavia, USA

                 H. Schellmann, Dept. of Physics and Astronomy, Northwestern University, Evanston, USA

                 3-4 hours of lectures

Goal:  Understand large-scale computing by studying the characteristics of different computing problems and how they are solved on large systems. 

Techniques and Exercises:  Many problems in HENP must be addressed with large computing systems. The bulk reconstruction of data, the creation of large simulation data-sets and the physics analysis of large data-sets are a few examples that will be addressed. The analysis of each problem involves a study of its CPU and I/O characteristics, a proposed computer hardware and software architecture to solve it, and an analysis of the appropriateness and effectiveness of the proposed solution. The solution will be found to be adequate or not, and changes will be made to address any shortcomings. Prototype systems and performance testing are required to prove the solution. Further modifications will be made as required.
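To make this kind of analysis concrete, the back-of-the-envelope sizing sketch below walks through the CPU side; every number in it is an assumption chosen for illustration, not a figure from any experiment.

    #include <cmath>
    #include <cstdio>

    // Hypothetical numbers, for illustration only.
    int main()
    {
        const double events_per_year   = 1e9;    // assumed raw data sample
        const double cpu_sec_per_event = 5.0;    // assumed reconstruction time
        const double node_efficiency   = 0.8;    // assumed duty cycle per node
        const double seconds_per_year  = 3.15e7;

        // Number of farm nodes needed to keep up with one year of data.
        double nodes = events_per_year * cpu_sec_per_event
                     / (node_efficiency * seconds_per_year);
        std::printf("required farm size: %.0f nodes\n", std::ceil(nodes));
        return 0;
    }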

The interplay of hardware and software will be studied.  This is especially important for error handling in large systems.  A large distributed system must be capable of handling hardware failures and bugs in software with little or no effort on the part of the users of the system.  

Data-flow is especially important in large systems. All systems have limitations, which occur both locally and across networked systems, and they must be understood to avoid serious bottlenecks. Examples include performance degradation of incorrectly tuned RAID arrays and limitations in the network switches used to connect large sets of computers.
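A similarly simple calculation exposes data-flow bottlenecks; again, all the throughput figures below are invented for illustration.

    #include <cstdio>

    // Hypothetical throughput figures, for illustration only.
    int main()
    {
        const double nodes           = 200;     // assumed farm size
        const double mb_per_event    = 1.0;     // assumed event size (MB)
        const double events_per_sec  = 2.0;     // assumed rate per node
        const double switch_capacity = 1000.0;  // assumed backplane (MB/s)

        double demand = nodes * mb_per_event * events_per_sec;  // aggregate MB/s
        std::printf("aggregate demand: %.0f MB/s, switch capacity: %.0f MB/s\n",
                    demand, switch_capacity);
        if (demand > switch_capacity)
            std::printf("the network switch is the bottleneck\n");
        return 0;
    }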

·         Farming

·         Analysis Clusters

·         Dealing with large data sets

·         Batch systems

·         Wide Area Distributed Computing

2)              Distributed Data Analysis

                 Lecturers:

                 A.S. Johnson, SLAC, Stanford, USA

                 M. Dönszelmann, CERN, Geneva, Switzerland 

                 2 hours of lectures and 6 hours of exercises

Goal: Get students to work together on a project that simulates a real distributed data analysis problem. The skills and techniques used should be applicable to real problems the students would face when doing analysis of large data samples.

Exercise: The students would perform a distributed data analysis, vaguely similar to a "mock-data" challenge. We would create a Monte Carlo data sample representing (simplified) data from some future experiment. The data sample would be distributed over the machines at the school and would include a physics "signal" which the students would need to find and identify.

Tools: As a front-end the students would use JAS/WIRED. This would be coupled to a distributed analysis system. The students would be provided with the analysis framework, so their task would be to customise the analysis to find the signal, and put together a simple event display (using WIRED) to display the interesting events.
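Customising the analysis essentially means applying selection cuts and histogramming the events that survive. The school's framework is Java-based (JAS); the sketch below shows the same idea in C++ purely for illustration, with an invented event record and cut value.

    #include <cstdio>
    #include <vector>

    // Illustrative event record; not the school's actual data model.
    struct Event { double mass; double pt; };

    int main()
    {
        std::vector<Event> events = {{91.2, 45.0}, {30.5, 12.0}, {90.8, 50.0}};
        const double pt_cut = 20.0;              // assumed selection cut
        int histogram[20] = {0};                 // 20 mass bins, 0-200 GeV

        for (const Event& e : events) {
            if (e.pt < pt_cut) continue;         // reject background-like events
            int bin = static_cast<int>(e.mass / 10.0);
            if (bin >= 0 && bin < 20) ++histogram[bin];
        }
        for (int i = 0; i < 20; ++i)
            std::printf("mass %3d-%3d GeV: %d\n", i * 10, i * 10 + 10,
                        histogram[i]);
        return 0;
    }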

The exercise would be best performed by organising the students into small groups (5-6 students) to work together on the problem. They would need to distribute the tasks to be performed amongst the group.

The distributed data analysis would ideally use GRID computing tools, installed on the computers at the school, and coupled to JAS via a simple Data Interface Module (DIM). If for any reason this proves too difficult to complete/install in time, we could use a simplified implementation using Java Aglets. In either case the basic concepts would be the same.

Lectures: To support the exercise there will be lectures introducing the tools to be used, including JAS and WIRED, and the concepts behind distributed data analysis.

Notes:  We plan to provide a Java tutorial that will be sent to the students prior to the school.  Students unfamiliar with Java are encouraged to read it prior to the school.

3)              Tools

                 Lecturer:  R.G. Jacobsen, University of California, Berkeley, USA

                 2 hours of lectures, 2 hours of exercises

Goal:  To present modern techniques for understanding and improving existing software, with emphasis on the large executables commonly seen in HEP.

Two different types of toolsets will be presented:

·         Performance: CPU and memory profiling tools (gprof, Insure); a profiling sketch follows this list

·         Test and debug: leak checkers and test suites (Purify, DejaGnu)
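As a flavour of the profiling workflow, the sketch below pairs a deliberately contrived hot spot with the standard gprof invocation (compile with -pg, run the program, then inspect gmon.out).

    // Build with profiling instrumentation, run, then inspect the profile:
    //   g++ -pg -O2 hotspot.cpp -o hotspot
    //   ./hotspot            # writes gmon.out in the current directory
    //   gprof hotspot gmon.out
    #include <cstdio>

    // Contrived hot spot: gprof's flat profile will attribute most time here.
    double burn_cycles(long n)
    {
        double s = 0.0;
        for (long i = 1; i <= n; ++i)
            s += 1.0 / static_cast<double>(i);
        return s;
    }

    int main()
    {
        std::printf("harmonic sum: %f\n", burn_cycles(100000000L));
        return 0;
    }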


Distributed Real-Time Systems

Title:  Event Selection and Data Acquisition in Modern HEP Experiments: the Real-Time Side.

Lecturer: L. Mapelli, CERN, Geneva, Switzerland


Theme:

At the LHC, the most demanding HEP experimental environment, 25 p-p interactions will occur simultaneously every 25 ns, at an energy one order of magnitude higher than the highest currently produced in any laboratory, generating hundreds of particles every bunch crossing and depositing thousands of signals in millions of electronics channels. One of the main goals is the search for a physics signal so tiny that it will require the capability of selecting the one interesting event out of ten thousand billion interactions. This sets exceptionally demanding requirements for the event selection (Trigger) and data collection (DAQ) systems to be designed and implemented for experiments at the LHC.
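The figures quoted above translate directly into trigger requirements. The short calculation below simply restates them (one bunch crossing every 25 ns, 25 interactions per crossing, one interesting event in ten thousand billion) and is not an official trigger specification.

    #include <cstdio>

    int main()
    {
        const double crossing_rate = 1.0 / 25e-9;  // one crossing per 25 ns = 40 MHz
        const double pileup        = 25.0;         // p-p interactions per crossing
        const double rejection     = 1e13;         // one interesting event in 10^13

        double interaction_rate = crossing_rate * pileup;        // ~1e9 per second
        double signal_rate      = interaction_rate / rejection;  // surviving events
        std::printf("interactions/s: %.2e\n", interaction_rate);
        std::printf("interesting events/s: %.2e (one every %.1f hours)\n",
                    signal_rate, 1.0 / signal_rate / 3600.0);
        return 0;
    }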


Using the LHC case, modern Real-Time system architectures being studied for HEP experiments will be explained.  Their innovative features and the technologies they are based upon, unused so far in HEP, will be described.


Lecture 1:

Introduce requirements, overview and main architectural aspects.


Lecture 2:

Describe details of the most crucial issues and elements of the architecture, such as the Timing distribution system, the multi-level Trigger and the hierarchical Data Acquisition and Event Building.


Lecture 3:

Taking as example one of the LHC experiments, describe recent and ongoing work aimed at reaching maturity for the final design and implementation. Such work is also addressing the question: "How far can we go in the use of commodity components?".


Title: Distributed Control Systems

Lecturers:  P. Burkimsher and W. Salter, CERN, Geneva, Switzerland


Theme:

The lectures will address the subject of distributed control systems and will show why such systems are important within the CERN environment. After a general introduction to the subject, the domain of experiment controls will be taken as an example to illustrate the use of distributed control systems at CERN. The lectures will cover many of the technologies employed in a typical distributed control system, both of a standard industrial and of a non-industrial nature. In particular, the lectures will address the use of commercial SCADA systems for the supervisory level.

The lectures will highlight the needs of an experiment control system which differ from those of typical industrial systems, and indicate how these influenced the choice of the SCADA system to be used for the LHC experiment control systems. This SCADA system will be used to highlight important features of such systems, and will serve as the basis for the three exercises. Finally, the lectures will also stress that the selection of tools, such as a SCADA system, can only be a first step in the development of an experiment control system. To illustrate this, the development of a set of LHC-experiment-specific components, tools and facilities, known as the JCOP Framework, will be discussed.



Lecture 1 (W. Salter):

After a brief introduction to distributed control systems, the first lecture will give an overview of the areas where such systems are employed within the CERN environment, to illustrate why they are important for the successful running of the CERN experimental programme. It will also highlight some of the major differences between their use in different areas of CERN, e.g. accelerators, technical services and experiments. The lecture will then describe the use of such systems for experiment controls, firstly in the LEP era, before moving on to the LHC experiments. The lecture will finish by showing a typical architecture for an experiment control system, to highlight and discuss the different technologies employed at different levels within the hierarchy.

Lecture 2 (W. Salter):

The second lecture will discuss SCADA, its general use in industry and the major differences between such industrial systems and an experiment control system. In particular, the required scalability and openness, as well as the development strategy for the LHC experiment control systems, will be described. The lecture will then go on to discuss the major criteria which were used as the basis for the selection of a SCADA system for the LHC experiments. Following on from this, an overview of the selected SCADA system, PVSS, will be given, highlighting in particular those features which are especially important from an experiment control point of view. Finally, the concept of the Joint Controls Project (JCOP) Framework (FW), as well as its intended purpose, will be described.


Lecture 3 (P. Burkimsher):

In the third lecture a number of aspects of PVSS and their use will be described in detail. These will be the facilities that are needed in order to perform the three exercises. The following will be covered:

·         The creation of a Data Point Type (DPT - device class); an object-oriented analogy is sketched after this list

·         The creation of Data Points (DP - device instances) and the parameterisation of these (highlighting alarm handling and archiving)

·         The creation of a simple panel and the method to animate this via scripts. This will include a look at the trend object and alarm panel
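For students who think in object-oriented terms, the following C++ analogy may help: a DPT plays the role of a class and DPs are its instances. PVSS itself is configured through its own editors and CTRL scripts, so the code below is an analogy only, with invented names.

    #include <cstdio>
    #include <string>

    // Analogy only: a Data Point Type (DPT) is like a class describing a device.
    struct HVChannel {              // "DPT": the device class
        std::string name;
        bool   on       = false;    // on/off status
        double setting  = 0.0;      // demanded voltage
        double readBack = 0.0;      // measured value (alarm/archive candidate)
    };

    int main()
    {
        // "DPs": concrete device instances created from the type.
        HVChannel ch1{"HV/crate1/ch01"};
        HVChannel ch2{"HV/crate1/ch02"};
        ch1.on = true; ch1.setting = 1500.0; ch1.readBack = 1498.7;

        // A panel script would compare read-back to setting and raise an alert.
        for (const HVChannel* ch : {&ch1, &ch2})
            if (ch->on && (ch->setting - ch->readBack > 10.0))
                std::printf("%s: read-back out of range\n", ch->name.c_str());
        return 0;
    }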


Exercise 1

·         Create a new instance of a Framework device (e.g. HV Crate or HV Subsystem - tbd).

·         View and manipulate it.

Exercise 2

·         Create a non-FW device:

·         Create a DPT (that has at least an on/off command and one setting, on/off status and read back value)

·         Create a DP (an instance of the above DPT)

·         Add an alert config to the DP (read back value)

·         Add an archive config to the DP (read back value)

·         Create a panel to operate the device:

·         Draw the device (some defined graphical representation of the physical device)

·         Add a button to switch it on/off

·         Add an input field for changing the setting value

·         Create a script to change the colour of the device based on the on/off state

·         Add a field for displaying the read back value

·         Add a trend chart to display the evolution of the read back value

·         Add a button to call up the standard PVSS alarm panel

·         Operate the device and view trend and alarms associated with it



Exercise 3

·         Open pre-configured project

·         Groups of 5 users should then connect to a single Event Manager (EV) (each User Interface Manager (UIM) would represent a subsystem of a sub-detector, and the combination of the five would represent the complete sub-detector)

·         Interact via pre-defined panels and see the impact of changes in one subsystem on the overall status of the sub-detector

·         Connect the EVs together using the PVSS distribution manager to build a pseudo complete detector control system

·         Show the impact of interactions at the detector level on individual sub-detectors and subsystems and vice-versa (this will probably be demonstrated by one of the instructors)

Prerequisites:

1)       Some knowledge of C would be useful but not essential

2)      It is suggested that students read the JCOP web pages as useful background information to these lectures (http://itcowww.cern.ch/JCOP/)


Keywords:

Supervisory Control And Data Acquisition (SCADA), Experimental Control System (ECS), Detector Control System (DCS), Programmable Logic Controller (PLC), OLE for Process Control (OPC), Finite State Machine (FSM), Run Control, Prozeß-Visualisierungs- und Steuerungssystem (PVSS)


Principles of Distributed Databases

Theme:

Networked databases are becoming an essential component of modern computing, particularly in the field of particle physics. The aims of this track are twofold:  to provide a solid background in the field of distributed databases, including object databases, and to illustrate the principles using the GRID system.  To achieve the first objective, students will be exposed to the basics of distributed databases, including data integrity management, transparency, replication and concurrency issues. Then, the theme will move to object technologies and take Objectivity as an example of a distributed object database.  The second part of the track will present the role played by distributed databases in the Grid architecture.

The theme is composed of 9 lectures: 3 on the Fundamentals of Distributed Databases, 3 on Object-oriented Distributed Databases and 3 on the Grid example. It is complemented by 4 hours of exercises on Object-oriented Distributed Databases and 3 hours of exercises on the Grid example.

The outline given below may be subject to slight adjustments.

Part 1: Fundamentals of Distributed Databases

Lecturer: E. Schikuta, University of Vienna, Vienna, Austria

Lecture 1:  Overview and essentials

A lecture providing an introduction to databases, including distributed architectures, data integrity, transparency and performance

Lecture 2:  Replication and concurrency

A lecture focusing on some specific aspects of distributed databases
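As a flavour of the replication and concurrency material, here is a minimal sketch of the classic two-phase commit protocol, which keeps replicas consistent by committing only when every site votes yes; real implementations add logging, timeouts and recovery, all omitted here.

    #include <cstdio>
    #include <vector>

    // Minimal two-phase commit sketch: a coordinator asks every replica to
    // prepare; the transaction commits only if all of them vote yes.
    struct Replica {
        const char* name;
        bool vote_prepare() const { return healthy; } // phase 1: can we commit?
        void commit() const { std::printf("%s: commit\n", name); }
        void abort()  const { std::printf("%s: abort\n", name); }
        bool healthy;
    };

    int main()
    {
        std::vector<Replica> replicas = {{"site-A", true}, {"site-B", true},
                                         {"site-C", false}};  // one site fails

        bool all_yes = true;
        for (const Replica& r : replicas)   // phase 1: collect votes
            all_yes = all_yes && r.vote_prepare();

        for (const Replica& r : replicas)   // phase 2: same outcome everywhere
            all_yes ? r.commit() : r.abort();
        return 0;
    }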

Lecture 3: Connectivity and security

A lecture on access technologies, including database APIs, object database connectivity, access control and security

Part 2: Object-oriented Distributed Databases

Lecturer: D. Duellmann, CERN, Geneva, Switzerland

Lecture 4: Object technology for Distributed Databases

Lecture 5: Example of Distributed Databases: Objectivity (part 1)

Lecture 6: Example of Distributed Databases: Objectivity (part 2)

Part 3: The Grid and Distributed Databases

Lecturer: D. Malon, Argonne National Laboratory, Argonne, USA

Lecture 7: Grid data services

A lecture presenting the grid data model and services, including grid-enabled file transfer and grid-based file replica management
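At its core, replica management relies on a catalogue mapping a logical file name to its physical copies. The sketch below, with invented site names and access costs, illustrates the lookup-and-select step such a grid data service performs; it is not GDMP or any real grid API.

    #include <cstdio>
    #include <map>
    #include <string>
    #include <vector>

    // Invented replica catalogue: one logical file name (LFN) maps to
    // several physical copies, each with an assumed access cost.
    struct PhysicalCopy { std::string url; double cost; };

    int main()
    {
        std::map<std::string, std::vector<PhysicalCopy>> catalogue = {
            {"lfn:/run42/events.db",
             {{"gsiftp://cern.ch/data/events.db", 1.0},
              {"gsiftp://fnal.gov/data/events.db", 3.5}}}};

        // Select the cheapest replica for a requested logical file.
        const auto& copies = catalogue["lfn:/run42/events.db"];
        const PhysicalCopy* best = &copies[0];
        for (const auto& c : copies)
            if (c.cost < best->cost) best = &c;
        std::printf("fetching %s\n", best->url.c_str());
        return 0;
    }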

Lecture 8: Distributed databases on grids:  early implementations

A lecture describing the use of grid services for database distribution and replication, and presenting a case study: the Grid Data Management Pilot (GDMP)

Lecture 9: Databases on emerging data grids: current research and trends

A lecture presenting future trends, in particular the integration of grid and database services, and the virtual data concept

Exercises:  Object-oriented distributed databases (4 hours of exercises): D. Duellmann

                   Grid (3 hours of exercises): D. Malon.


Text: Jackie Franco-Turner
Web: Pietro Paolo Martucci
Last update: 16 March 2001