CERN School of Computing 2009, 17-28 August 2009 - Göttingen, Germany

CSC2009 Physics Computing Theme

Coordinators:

Rudi Frühwirth, HEPHY Vienna
Ivica Puljak, University of Split (FESB)
Are Strandlie, Gjøvik University College

 

The track will first introduce the fundamental concepts of Physics Computing and will then address two specific aspects of scientific computing: ROOT Technologies and Data Analysis.

The first series of lectures gives an overview of the software and hardware components required for the processing of the experimental data, from the source - the detector - to the physics analysis. The emphasis is on the concepts, but some implementation details are discussed as well. The key concept is data reduction, both in terms of rate and in terms of information density. The various algorithms used for data reduction, both online and offline, are described. The flow of the real data is the main topic, but the need for and the production of simulated data is discussed as well.

 

The second series of lectures introduces the data analysis framework ROOT, covering all basic parts that are needed for a future LHC data analysis. The lectures will present by example how key requirements like performance, reliability, flexibility, platform independence, ease-of-use, and support for extensions are put into practice. Combined with the accompanying tutorials they will give an overview of the software techniques ROOT brings to life and hands-on experience of using ROOT.

 

The third lecture series concentrates on data analysis. The lectures contain many examples of data visualisation and analysis code, and the exercises are done with the ROOT data analysis toolkit.

 

A glossary of the acronyms used can be found at: http://www.gridpp.ac.uk/gas/

Overview

General Introduction to Physics Computing

Lectures

Series

General Introduction to Physics Computing

The two lectures give an overview of the software and hardware components required for the processing of the experimental data, from the source - the detector - to the physics analysis. The emphasis is on the concepts, but some implementation details are discussed as well. The key concept is data reduction, both in terms of rate and in terms of information density. The various algorithms used for data reduction, both online and offline, are described. The flow of the real data is the main topic, but the need for and the production of simulated data is discussed as well.

Rudi Frühwirth

Lecture 1

Event filtering

The first lecture deals with the multi-level event filters (triggers) that are used to select the physically interesting events and to bring the event rate down to an acceptable figure. Some examples of the hardware and software deployed by the LHC experiments are presented.
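To make the idea of staged rate reduction concrete, here is a minimal toy sketch in plain C++; it is not the code of any real experiment, and all names and thresholds are invented. A cheap first-level decision is followed by a more selective high-level decision, and only events passing both are kept.

    #include <iostream>
    #include <vector>

    // Toy two-level filter: a fast level-1 decision followed by a more
    // selective high-level trigger (HLT). Thresholds are illustrative only.
    struct ToyEvent {
        std::vector<double> objectPt;   // transverse momenta of reconstructed objects (GeV)
    };

    bool passLevel1(const ToyEvent& ev) {
        for (double pt : ev.objectPt)
            if (pt > 20.0) return true;        // any object above 20 GeV
        return false;
    }

    bool passHLT(const ToyEvent& ev) {
        int nHigh = 0;
        for (double pt : ev.objectPt)
            if (pt > 30.0) ++nHigh;
        return nHigh >= 2;                     // at least two objects above 30 GeV
    }

    int main() {
        std::vector<ToyEvent> events = {{{5., 12.}}, {{25., 35., 40.}}, {{22., 31.}}};
        int accepted = 0;
        for (const ToyEvent& ev : events)
            if (passLevel1(ev) && passHLT(ev)) ++accepted;   // rate reduced in two stages
        std::cout << accepted << " of " << events.size() << " events accepted\n";
        return 0;
    }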

Lecture 2

Reconstruction and simulation

The second lecture describes the various stages of event reconstruction, including calibration and alignment. The emphasis is on algorithms and data structures. The need for large amounts of simulated data is explained. The lecture concludes with a brief summary of the principles of physics analysis and the tools that are currently employed.


ROOT Technologies

Lectures

Lecture 1

Basics

To lay the foundation for the lectures of the coming days, we start by introducing the purpose of ROOT and its primary contexts of use. This will cover e.g. the C++ interpreter CINT and the just-in-time compiler ACLiC.

Axel Naumann

Bertrand Bellenot
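As a minimal sketch of the two execution modes mentioned above, the same macro can be run interpreted by CINT or compiled on the fly by ACLiC simply by appending a "+" to the macro name; the file and histogram names here are chosen purely for illustration.

    // hello.C -- a minimal ROOT macro
    #include "TH1F.h"
    #include "TRandom.h"

    void hello() {
       // book a histogram and fill it with Gaussian random numbers
       TH1F h("h", "ACLiC demo;x;entries", 100, -5., 5.);
       for (int i = 0; i < 10000; ++i)
          h.Fill(gRandom->Gaus(0., 1.));
       h.DrawCopy();   // DrawCopy keeps a clone on the canvas after the macro returns
    }

    // At the ROOT prompt:
    //   .x hello.C     runs the macro interpreted by CINT
    //   .x hello.C+    compiles it just-in-time with ACLiC and runs the compiled code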

Lecture 2

Tree

I/O: The exabytes of LHC data will be saved using ROOT's I/O. We will explain how ROOT persistency is integrated into C++ and the basics of ROOT's storage structure. As classes change over time, modified classes must be taken into account by a mechanism called schema evolution.

Trees 1: One of HEP's most powerful and most commonly used collections is ROOT's TTree. We will explain why in the HEP context they are superior to e.g. STL collections, and which efficiency optimizations they provide for processing data (splitting, data access without library).
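A minimal sketch of what this looks like in practice, with invented file, tree, and branch names: a small TTree with two branches is filled with toy values and written to a file.

    // writeTree.C -- write a small TTree to a ROOT file (names are illustrative)
    #include "TFile.h"
    #include "TTree.h"
    #include "TRandom.h"

    void writeTree() {
       TFile f("events.root", "RECREATE");
       TTree tree("events", "toy event data");
       double energy;
       int nTracks;
       tree.Branch("energy", &energy, "energy/D");    // one branch per variable
       tree.Branch("nTracks", &nTracks, "nTracks/I");
       for (int i = 0; i < 1000; ++i) {
          energy  = gRandom->Exp(50.);                 // toy values
          nTracks = gRandom->Poisson(10);
          tree.Fill();
       }
       tree.Write();                                   // ROOT I/O handles the persistency
       f.Close();
    }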

Lecture 3

Analysis

Trees 2: Two mechanisms for combining TTrees, friends and chains, will be introduced.

Data Analysis: ROOT is mainly used to analyze data. We will go through the steps of a realistic use case, calculating a trigger's efficiency from triggered data. Combining statistics, fitting, and ROOT we end up with a solution.

PROOF: Even though HEP's data is trivial to parallelize, most analysis code is written to run serially. PROOF allows you to run regular ROOT analysis code in a parallel environment. We will show you what it does and how to use it.
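As a hedged sketch of how these pieces fit together, with file, tree, and branch names that are assumptions made for illustration: a chain combines several files into one logical tree, a friend tree adds extra branches, and a simple trigger efficiency can then be estimated with TTree::Draw.

    // chainAnalysis.C -- chains, friends, and a toy trigger-efficiency estimate
    #include <cstdio>
    #include "TChain.h"

    void chainAnalysis() {
       TChain chain("events");              // name must match the TTree stored in the files
       chain.Add("run1.root");
       chain.Add("run2.root");              // many files behave like one large tree

       TChain trig("trigger");              // a second tree holding per-event trigger bits
       trig.Add("trigger.root");
       chain.AddFriend(&trig);              // friend branches become usable as if local

       // efficiency = (events above threshold that fired the trigger) / (all events above threshold)
       double num = chain.Draw("pt", "pt > 30 && trigger.passed", "goff");
       double den = chain.Draw("pt", "pt > 30", "goff");
       if (den > 0)
          printf("trigger efficiency above 30 GeV: %.3f\n", num / den);
    }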

Exercises

Exercise 1

We will play with a few example macros, to get a feeling for the pros and cons of compiled versus interpreted mode. We apply the knowledge from the C++ introduction in the lecture to understand how object oriented design can help.

Exercise 2

We will store objects of our own class in a ROOT file and read them back.
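A minimal sketch of the idea, with an invented class and file name: the ClassDef macro plus ACLiC compilation (".x myParticle.C+") give ROOT the dictionary it needs to stream the object.

    // myParticle.C -- store an object of a user-defined class and read it back
    #include <cstdio>
    #include "TObject.h"
    #include "TFile.h"

    class MyParticle : public TObject {
    public:
       MyParticle() : fPt(0), fEta(0) {}
       double fPt;
       double fEta;
       ClassDef(MyParticle, 1);    // class version; needed for I/O and schema evolution
    };

    void myParticle() {
       MyParticle p;
       p.fPt = 42.5;
       p.fEta = -1.3;

       TFile out("particle.root", "RECREATE");
       p.Write("myParticle");                         // store the object under a key
       out.Close();

       TFile in("particle.root");
       MyParticle* back = (MyParticle*)in.Get("myParticle");
       if (back) printf("read back pt = %.1f\n", back->fPt);
    }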

Exercise 3

We will create chains of trees; we will create friend trees. We will run a simple analysis on a chain of trees using PROOF to see interactive parallelism in action.

Pre-requisite Knowledge

Mandatory pre-requisite

Install ROOT if you don't have it; start it up.

Create a one-dimensional histogram with 10 bins spanning the range 0..5.

Fill it with the values 4., 4.2, 5.8, 3.8, 4.7, and 2.7.

Draw it.

Fit it with a Gaussian using the default options.

Check that the fitted mean is 4.0 - otherwise you've done something wrong. A minimal macro covering these steps is sketched below.
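The following sketch is one possible way to carry out the steps above; the macro and object names are arbitrary.

    // histoCheck.C -- the warm-up exercise as a macro
    #include <cstdio>
    #include "TH1F.h"
    #include "TF1.h"

    void histoCheck() {
       TH1F h("h", "warm-up;x;entries", 10, 0., 5.);      // 10 bins spanning 0..5
       double values[6] = {4., 4.2, 5.8, 3.8, 4.7, 2.7};
       for (int i = 0; i < 6; ++i)
          h.Fill(values[i]);                               // 5.8 ends up in the overflow bin
       h.DrawCopy();
       h.Fit("gaus");                                      // Gaussian fit with default options
       printf("fitted mean = %.2f\n", h.GetFunction("gaus")->GetParameter(1));
    }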

Desirable pre-requisites and references to further information

If you need to install ROOT, use the recommended version mentioned at:  http://root.cern.ch/root/Availability.html

To learn how to create, fill, draw, and fit histograms, see the chapters "Histograms" and "Fitting Histograms" in the User's Guide at: http://root.cern.ch/root/doc/RootDoc.html

To see examples of how to create, fill, draw, and fit histograms, look at the macros in $ROOTSYS/tutorials, especially hsimple.C and fit1.C.

The reference guide for ROOT's histogram base class TH1 is located at: http://root.cern.ch/root/html/TH1.html


Data Analysis

 

Lectures

Lecture 1

Data visualisation and random variables - Ivica Puljak

The first lecture in the Data Analysis series discusses graphical techniques used in exploratory data analysis, gives an introduction to the concept of probability, and covers descriptive statistics summarising the basic features of data gathered from experiments. Random variables and the Monte Carlo method are introduced.

Aatos Heikkinen

Ivica Puljak
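As a hedged sketch of the kind of example used in the first lecture, with all numbers and names being illustrative, the macro below generates random variables, visualises them in a histogram, prints simple descriptive statistics, and performs a small Monte Carlo integration.

    // randomDemo.C -- random variables, visualisation, and Monte Carlo integration
    #include <cstdio>
    #include "TH1D.h"
    #include "TRandom3.h"

    void randomDemo() {
       TRandom3 rng(4357);                                      // seeded pseudo-random generator

       // visualise a Gaussian random variable and print descriptive statistics
       TH1D h("h", "N(0,1) sample;x;entries", 50, -4., 4.);
       for (int i = 0; i < 100000; ++i)
          h.Fill(rng.Gaus(0., 1.));
       h.DrawCopy();
       printf("sample mean = %.3f, RMS = %.3f\n", h.GetMean(), h.GetRMS());

       // Monte Carlo integration: estimate pi from points thrown into the unit square
       const int n = 1000000;
       int inside = 0;
       for (int i = 0; i < n; ++i) {
          double x = rng.Uniform();
          double y = rng.Uniform();
          if (x * x + y * y < 1.) ++inside;
       }
       printf("Monte Carlo estimate of pi = %.4f\n", 4. * inside / n);
    }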

Lecture 2

Distributions and statistical tests - Ivica Puljak

Commonly used probability distributions are introduced. It is shown how statistical hypothesis testing is used to make statistical decisions based on experimental data. Least squares fitting, the chi-square goodness-of-fit test, and the maximum likelihood method are introduced.
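A minimal sketch on a toy Gaussian sample (assumed here, not taken from the lecture material) illustrating a least-squares fit, the chi-square goodness-of-fit probability, and the same fit repeated as a binned maximum-likelihood fit.

    // fitDemo.C -- least-squares vs. maximum-likelihood fits on a toy sample
    #include <cstdio>
    #include "TH1D.h"
    #include "TF1.h"
    #include "TRandom3.h"
    #include "TMath.h"

    void fitDemo() {
       TRandom3 rng(1);
       TH1D h("h", "toy sample;x;entries", 40, -4., 4.);
       for (int i = 0; i < 5000; ++i)
          h.Fill(rng.Gaus(0., 1.));

       h.Fit("gaus", "Q");                          // chi-square (least-squares) fit, quiet
       TF1* fit = h.GetFunction("gaus");
       double chi2 = fit->GetChisquare();
       int ndf = fit->GetNDF();
       printf("chi2/ndf = %.2f, goodness-of-fit p-value = %.3f\n",
              chi2 / ndf, TMath::Prob(chi2, ndf));

       h.Fit("gaus", "LQ");                         // "L": binned maximum-likelihood fit instead
       fit = h.GetFunction("gaus");
       printf("ML fit: mean = %.3f +- %.3f\n",
              fit->GetParameter(1), fit->GetParError(1));
    }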

Lecture 3

Bayesian data analysis and unfolding experimental data - Aatos Heikkinen

We study Bayesian probability, which interprets probability as a measure of a state of knowledge rather than a frequency as in orthodox statistics. In Bayesian statistics, new measurements are used to update or to newly infer the probability that a hypothesis may be true. We show how Bayesian estimators are used to estimate unknown parameters. Experimental errors are discussed, and unfolding methods to correct distortions in measurements are introduced.
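As a small, hedged illustration of the Bayesian update (toy numbers, not from the lecture): for an efficiency measured as k passing events out of n, a flat prior combined with the binomial likelihood gives a Beta posterior, whose mean and mode can be computed numerically with ROOT.

    // bayesDemo.C -- Bayesian estimate of an efficiency with a flat prior
    #include <cstdio>
    #include "TF1.h"
    #include "TMath.h"

    void bayesDemo() {
       double n = 20.;                      // toy data: 20 trials...
       double k = 17.;                      // ...of which 17 pass the selection

       // posterior p(eps | k, n) = Beta(eps; k+1, n-k+1) for a flat prior
       TF1 post("post", "TMath::BetaDist(x, [0]+1, [1]-[0]+1)", 0., 1.);
       post.SetParameters(k, n);

       printf("posterior mean = %.3f  (analytic (k+1)/(n+2) = %.3f)\n",
              post.Mean(0., 1.), (k + 1.) / (n + 2.));
       printf("posterior mode = %.3f  (maximum-likelihood estimate k/n = %.3f)\n",
              post.GetMaximumX(0., 1.), k / n);
       post.DrawCopy();                     // visualise the posterior density
    }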

Lecture 4

Multivariate analysis and neural networks - Aatos Heikkinen

Multivariate analysis (MVA) describes procedures which involve the observation and analysis of more than one statistical variable at a time. We study the different models available in the ROOT Multivariate Analysis Package TMVA. Artificial neural networks are discussed as an example of MVA classifiers. Finally, we study how neural networks can be used in regression analysis to estimate an unknown function, and in Higgs boson searches.
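The sketch below shows, in rough outline, how a multi-layer perceptron classifier might be booked with the ROOT 5-era TMVA Factory interface; the input file, tree names, and variable names are assumptions made purely for illustration.

    // tmvaDemo.C -- booking a multi-layer perceptron in TMVA (illustrative names)
    #include "TFile.h"
    #include "TTree.h"
    #include "TMVA/Factory.h"
    #include "TMVA/Types.h"

    void tmvaDemo() {
       TFile* input  = TFile::Open("toy_samples.root");     // assumed input file
       TTree* sig    = (TTree*)input->Get("signal");        // assumed signal tree
       TTree* bkg    = (TTree*)input->Get("background");    // assumed background tree

       TFile* output = TFile::Open("tmva_output.root", "RECREATE");
       TMVA::Factory factory("CSCDemo", output, "!V:!Silent");

       factory.AddVariable("pt",  'F');                     // discriminating variables
       factory.AddVariable("eta", 'F');
       factory.AddSignalTree(sig, 1.0);
       factory.AddBackgroundTree(bkg, 1.0);
       factory.PrepareTrainingAndTestTree("", "SplitMode=Random:NormMode=NumEvents");

       // artificial neural network: one hidden layer with 5 neurons
       factory.BookMethod(TMVA::Types::kMLP, "MLP", "HiddenLayers=5:NCycles=200");

       factory.TrainAllMethods();
       factory.TestAllMethods();
       factory.EvaluateAllMethods();
       output->Close();
    }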

Exercises

Exercise 1

Data visualisation and random variables

  • Visualisation of data with ROOT.

  • Generating random numbers.

  • Monte Carlo integration.

Exercise 2

Distributions and statistical tests

  • Smoothing data.

  • Fitting with ROOT packages (finding peaks).

  • Modeling signal and background.

Exercise 3

Bayesian data analysis and unfolding experimental data

  • Determination of experimental error.

  • Unfolding example.

Exercise 4

Multivariate analysis and neural networks

  • Data visualisation and pre-processing with TMVA.

  • Classifiers in TMVA.

  • Neural network (multi-layer perceptron) example.

Prerequisite Knowledge

Desirable pre-requisites and references to further information

Spend a few minutes to familiarize yourself with the following concepts:

  • Data analysis: http://en.wikipedia.org/wiki/Data_analysis

  • Monte Carlo method: http://en.wikipedia.org/wiki/Monte_Carlo_method

  • Least squares fitting: http://en.wikipedia.org/wiki/Least_squares

  • Experimental errors: http://en.wikipedia.org/wiki/Observational_error

  • Artificial neural networks: http://en.wikipedia.org/wiki/Artificial_neural_network

References

The outline was prepared based on the following references:

  • [Roo08] (ROOT manual and tutorials),

  • [Hoc07] (multivariate analysis and data visualization),

  • [Lyo92, Cow98, Siv00] (data analysis textbooks),

  • [D’A99, Jam00] (confidence limits vs. Bayesian).

In addition, some ideas are drawn from:

  • [HL08] (MC simulation and environment for exercises),

  • [Blo85] (unfolding).

Bibliography

  • V. Blobel

    • Unfolding methods in high energy physics experiments.

    • Technical report, In Proceedings of the 1984 CERN School of Computing,

    • CERN 85-09, 1985.

  • G. Cowan

    • Statistical Data Analysis.

    • Oxford University Press, 1998.

  • G. D’Agostini

    • Bayesian Reasoning in High-Energy Physics: Principles and Applications.

    • Technical report, CERN-99-03, 1999.

  • A. Heikkinen and M. Liendl

  • A. Hocker.

    • TMVA - Toolkit for Multivariate Data Analysis.

    • CERN-OPEN-2007-007, 2007.

    • A. Heikkinen: Data Analysis with ROOT C / [arXiv: physics/0703039].

  • F. James

    • Workshop on Confidence Limits.

    • Technical report, CERN-2000-005, 2000.

  • L. Lyons

    • Statistics for nuclear and particle physicists.

    • Cambridge University Press, 1992.

  • ROOT 5.21 Users Guide, October 2008.

  • D. S. Sivia

    • Data Analysis: a Bayesian Tutorial.

    • Oxford University Press, 2000.
