CSC2009
Physics Computing Theme
Coordinators:
Rudi Frühwirth,
HEPHY Vienna
Ivica Puljak,
University of Split (FESB)
Are Strandlie,
Gjøvik University College |
The track will first introduce the fundamental concepts of Physics
Computing and will then address two specific aspects of scientific
computing: the ROOT Technologies and Data Analysis.
The first series of lectures gives
an overview of the software and hardware components required for the
processing of the experimental data, from the source - the detector - to
the physics analysis. The emphasis is on the concepts, but some
implementation details are discussed as well. The key concept is data
reduction, both in terms of rate and in terms of information density.
The various algorithms used for data reduction, both online and offline,
are described. The flow of the real data is the main topic, but the need
for and the production of simulated data is discussed as well.
The
second series of lectures introduces the data analysis framework ROOT,
covering all basic parts that are needed for a future LHC data analysis.
The lectures will present by example how key requirements like
performance, reliability, flexibility, platform independence,
ease-of-use, and support for extensions are put into practice. Combined
with the accompanying tutorials they will give an overview of the
software techniques ROOT brings to life and hands-on experience of using
ROOT.
The third lecture series concentrates on Data Analysis aspects. The
lectures contain many examples of data visualisation and analysis
code, and the exercises are done with the ROOT data analysis toolkit.
A glossary of the acronyms used is available at:
http://www.gridpp.ac.uk/gas/ |
Overview
Series | Type | Lecture | Description | Lecturer |
General
Introduction to Physics Computing
|
Lectures |
Series |
General
Introduction to Physics Computing
The two
lectures give an overview of the software and hardware
components required for the processing of the experimental
data, from the source - the detector - to the physics
analysis. The emphasis is on the concepts, but some
implementation details are discussed as well. The key
concept is data reduction, both in terms of rate and in
terms of information density. The various algorithms used
for data reduction, both online and offline, are described.
The flow of the real data is the main topic, but the need
for and the production of simulated data is discussed as
well. |
Rudi Frühwirth |
Lecture 1 |
Event filtering
The
first lecture deals with the multi-level event filters
(triggers) that are used to select the physically
interesting events and to bring down the event rate to an
acceptable figure. Some examples of the hardware and
software deployed by the LHC experiments are
presented. |
Lecture 2 |
Reconstruction
and simulation
The
second lecture describes the various stages of event
reconstruction, including calibration and alignment. The
emphasis is on algorithms and data structures. The need for
large amounts of simulated data is explained. The lecture
concludes with a brief summary of the principles of physics
analysis and the tools that are currently employed. |
ROOT Technologies
|
Lectures |
Lecture 1 |
Basics
To lay the
foundation for the lectures of the coming days, we start by
introducing the purpose of ROOT and its primary contexts of
use. This will cover e.g. the C++ interpreter CINT and the
just-in-time compiler ACLiC.
|
Axel Naumann
Bertrand Bellenot |
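For illustration of the Basics lecture topics (a sketch of our own, not part of the course material; the macro name is arbitrary), a trivial macro can be run interpreted by CINT or compiled on the fly by ACLiC:

// hello.C -- illustrative macro; the name and contents are our own.
// At the ROOT prompt:
//   .x hello.C     runs it through the CINT interpreter
//   .x hello.C+    lets ACLiC compile it into a shared library first
#include <cstdio>

void hello(int n = 3)
{
   // A small loop, so that interpreted and compiled runs do the same work.
   for (int i = 0; i < n; ++i)
      printf("iteration %d\n", i);
}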
Lecture 2 |
Tree I/O: The exabytes of LHC data will be
saved using ROOT's I/O. We will explain how ROOT persistency
is integrated into C++ and the basics of ROOT's storage
structure. As experiments evolve, modified classes must be taken
into account by a mechanism called schema evolution.
Trees 1: One of HEP's most powerful and
most commonly used collections is ROOT's TTree. We will
explain why, in the HEP context, trees are superior to e.g. STL
collections, and which efficiency optimizations they provide
for processing data (splitting, data access without the class
library).
|
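As a hedged sketch of what basic tree I/O looks like in practice (the file, tree and branch names below are invented for illustration), a macro along these lines writes a simple TTree:

// writeTree.C -- sketch of basic tree I/O; file, tree and branch names are invented.
void writeTree()
{
   TFile f("events.root", "RECREATE");
   TTree tree("events", "toy event tree");

   double energy  = 0;
   int    ntracks = 0;
   // Each branch holds one value per entry.
   tree.Branch("energy", &energy, "energy/D");
   tree.Branch("ntracks", &ntracks, "ntracks/I");

   for (int i = 0; i < 1000; ++i) {
      energy  = gRandom->Exp(50.);      // toy data
      ntracks = gRandom->Poisson(10.);
      tree.Fill();
   }
   tree.Write();
   f.Close();
   // Read it back interactively, e.g.:  root events.root   then   events->Draw("energy")
}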
Lecture 3 |
Analysis
Trees 2: Two mechanisms for combining
TTrees, friends and chains, will be introduced.
Data Analysis: ROOT is mainly used to
analyze data. We will go through the steps of a realistic
use case, calculating a trigger's efficiency from triggered
data. Combining statistics, fitting, and ROOT, we arrive at
a solution.
PROOF: Even though HEP data processing is trivial
to parallelize, most analysis code is written serially. PROOF allows
you to run regular ROOT analysis code in a parallel
environment. We will show you what it does and how to use
it.
|
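The following sketch illustrates chains, friend trees and a crude trigger-efficiency estimate; all file, tree and branch names are assumptions, not taken from the course material:

// chainFriends.C -- illustrative only; file, tree and branch names are assumptions.
void chainFriends()
{
   // A chain behaves like one large TTree spread over several files.
   TChain chain("events");
   chain.Add("run1.root");
   chain.Add("run2.root");

   // A friend tree contributes extra branches that can be used as if
   // they belonged to the chain itself.
   chain.AddFriend("trigger", "trigger.root");

   // Crude trigger-efficiency estimate: fraction of selected events
   // that also have the (assumed) trigger flag set.
   double all  = chain.GetEntries("pt > 20");
   double trig = chain.GetEntries("pt > 20 && trigger.fired");
   if (all > 0)
      printf("trigger efficiency: %.3f\n", trig / all);
}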
Exercises |
Exercise 1 |
We will play with a few example macros, to
get a feeling for the pros and cons of compiled versus
interpreted mode. We apply the knowledge from the C++
introduction in the lecture to understand how
object-oriented design can help.
|
Exercise 2 |
We will store objects of our own class in
a ROOT file and read them back.
|
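A rough sketch of this exercise, assuming the macro is compiled with ACLiC so that ROOT can generate a dictionary for the user class (the class and file names are placeholders):

// myClassIO.C -- rough sketch; the class is a stand-in for "our own class".
// Run it via ACLiC so that ROOT can generate a dictionary:  .x myClassIO.C+
#include <cstdio>
#include "TFile.h"
#include "TObject.h"

class MyEvent : public TObject {
public:
   double fEnergy;
   MyEvent(double e = 0) : fEnergy(e) {}
   ClassDef(MyEvent, 1)   // enables ROOT I/O for this class
};

void myClassIO()
{
   // Write an object of our own class ...
   TFile out("myevent.root", "RECREATE");
   MyEvent ev(42.);
   ev.Write("event");
   out.Close();

   // ... and read it back.
   TFile in("myevent.root");
   MyEvent* back = (MyEvent*)in.Get("event");
   printf("energy read back: %f\n", back->fEnergy);
}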
Exercise 3 |
We will create chains of trees; we will
create friend trees. We will run a simple analysis on a
chain of trees using PROOF to see interactive parallelism in
action.
|
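A minimal sketch of this workflow, assuming a local PROOF-Lite session and an existing selector (the selector and file names are placeholders):

// runProof.C -- illustrative sketch; selector and file names are assumptions.
void runProof()
{
   // Start a local PROOF-Lite session (uses the cores of this machine).
   TProof::Open("");

   TChain chain("events");
   chain.Add("run1.root");
   chain.Add("run2.root");

   // Route the chain's processing through PROOF and run a TSelector
   // over its entries in parallel.
   chain.SetProof();
   chain.Process("MySelector.C+");
}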
Pre-requisite Knowledge |
Mandatory
pre-requisite |
Install ROOT if you don't have it; start it up.
Create a one-dimensional histogram with 10 bins spanning the
range 0..5.
Fill it with the values 4., 4.2, 5.8, 3.8, 4.7, and 2.7.
Draw it.
Fit it with a Gaussian using the default options.
Check that the mean of the fit is approximately 4.0 - otherwise
you've done something wrong. |
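For reference, a macro performing exactly these steps might look like the following sketch (the macro and histogram names are our own):

// prereq.C -- a sketch of the mandatory pre-requisite steps (names are our own).
// Run it from the ROOT prompt with:  .x prereq.C
void prereq()
{
   // One-dimensional histogram with 10 bins spanning the range 0..5.
   TH1F* h = new TH1F("h", "Pre-requisite check", 10, 0., 5.);

   // Fill it with the given values (5.8 ends up in the overflow bin).
   double values[6] = {4., 4.2, 5.8, 3.8, 4.7, 2.7};
   for (int i = 0; i < 6; ++i)
      h->Fill(values[i]);

   h->Draw();

   // Fit with a Gaussian using the default options; parameter 1 is the mean.
   h->Fit("gaus");
   printf("fitted mean: %f\n", h->GetFunction("gaus")->GetParameter(1));
}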
Desirable pre-requisite
and
references to further
information |
If you need to install ROOT, use the recommended version
mentioned at:
http://root.cern.ch/root/Availability.html
To learn how to create, fill, draw, and fit histograms look
at the User's Guide at:
http://root.cern.ch/root/doc/RootDoc.html
chapters "Histograms" and "Fitting Histograms".
To see examples on how to create, fill, draw, and fit
histograms look at the macros in $ROOTSYS/tutorials, esp.
hsimple.C and fit1.C.
The reference guide for ROOT's histogram base class TH1 is
located at:
http://root.cern.ch/root/html/TH1.html |
Data Analysis
|
Lectures |
Lecture 1 |
Data visualisation and random variables
Ivica Puljak
The first lecture in the Data Analysis series discusses graphical
techniques used in exploratory data analysis, gives an
introduction to the concept of probability, and covers descriptive
statistics summarising the basic features of the data
gathered from experiments. Random variables and the Monte Carlo
method are introduced.
|
Aatos Heikkinen
Ivica Puljak |
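As a small illustration of the Monte Carlo method (a toy example of our own, not from the lecture notes), ROOT's random number generator can be used to estimate pi:

// mcPi.C -- toy Monte Carlo estimate of pi; our own example, not from the lectures.
void mcPi(int n = 100000)
{
   int inside = 0;
   for (int i = 0; i < n; ++i) {
      // Two independent uniform random variables in [0,1).
      double x = gRandom->Rndm();
      double y = gRandom->Rndm();
      if (x*x + y*y < 1.) ++inside;    // point lies inside the quarter circle
   }
   printf("pi estimate from %d points: %f\n", n, 4. * inside / n);
}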
Lecture 2 |
Distributions and statistical tests
Ivica Puljak
Commonly used probability distributions are introduced. It
is shown how statistical hypothesis testing is used to
make statistical decisions based on experimental data. Least
squares fitting, the chi-square goodness-of-fit test, and the maximum
likelihood method are introduced.
|
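A small sketch of our own, using ROOT's histogram fitting, of how a default least-squares fit compares with a binned maximum-likelihood fit of the same toy data:

// fitCompare.C -- sketch comparing a default least-squares (chi-square) fit
// with a binned maximum-likelihood fit of the same toy histogram.
void fitCompare()
{
   TH1F h("h", "Gaussian sample", 50, -5., 5.);
   for (int i = 0; i < 500; ++i)
      h.Fill(gRandom->Gaus(0., 1.));

   // Default: chi-square (least-squares) fit.
   h.Fit("gaus", "Q");
   printf("chi-square fit:  mean = %f\n", h.GetFunction("gaus")->GetParameter(1));

   // Option "L": binned maximum-likelihood fit, preferable for low bin contents.
   h.Fit("gaus", "LQ");
   printf("likelihood fit:  mean = %f\n", h.GetFunction("gaus")->GetParameter(1));
}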
Lecture 3 |
Bayesian data analysis and unfolding experimental
data
Aatos Heikkinen
We study Bayesian probability, which interprets the concept
of probability as a measure of a state of knowledge and not
a frequency as in orthodox statistics. In Bayesian
statistics, new measurements are used to update, or to newly
infer, the probability that a hypothesis may be true. We show
how Bayesian estimators are used to estimate the unknown
parameters. Experimental errors are discussed and unfolding
methods to correct distortions in measurements are
introduced.
|
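As a toy illustration of a Bayesian update (our own example, not from the lecture material): the posterior of an efficiency after k passes in n trials, assuming a flat prior:

// bayesEff.C -- toy Bayesian update, our own example: posterior of an efficiency
// eps after observing k passes in n trials, assuming a flat prior.
void bayesEff(int k = 7, int n = 10)
{
   // With a flat prior the posterior is proportional to eps^k * (1-eps)^(n-k),
   // i.e. a Beta(k+1, n-k+1) distribution.
   TF1 post("post", "pow(x,[0])*pow(1-x,[1])", 0., 1.);
   post.SetParameters(k, n - k);
   post.Draw();   // visualise the state of knowledge about eps

   // Analytic summaries of that Beta posterior: mode k/n, mean (k+1)/(n+2).
   printf("posterior mode: %f   posterior mean: %f\n",
          double(k) / n, double(k + 1) / (n + 2));
}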
Lecture 4 |
Multivariate analysis and neural networks
Aatos Heikkinen
Multivariate analysis (MVA) describes procedures which
involve the observation and analysis of more than one
statistical variable at a time. We study the different models
available in the ROOT multivariate analysis package TMVA.
Artificial neural networks are discussed as an example of
MVA classifiers. Finally, we study how neural networks can
be useful in regression analysis to estimate an unknown
function, and in the search for the Higgs boson.
|
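A schematic sketch of a TMVA classification setup, with tree, variable and file names as placeholders and option strings left at their defaults:

// tmvaSketch.C -- schematic TMVA classification setup; tree, variable and file
// names are placeholders and option strings are left at their defaults.
#include "TFile.h"
#include "TTree.h"
#include "TMVA/Factory.h"
#include "TMVA/Types.h"

void tmvaSketch(TTree* sigTree, TTree* bkgTree)
{
   TFile* out = TFile::Open("tmva.root", "RECREATE");
   TMVA::Factory factory("TMVAClassification", out, "");

   // Discriminating variables, assumed to exist as branches in both trees.
   factory.AddVariable("pt",  'F');
   factory.AddVariable("eta", 'F');

   factory.AddSignalTree(sigTree, 1.0);
   factory.AddBackgroundTree(bkgTree, 1.0);
   factory.PrepareTrainingAndTestTree("", "");

   // Book an artificial neural network (multi-layer perceptron) classifier.
   factory.BookMethod(TMVA::Types::kMLP, "MLP", "");

   factory.TrainAllMethods();
   factory.TestAllMethods();
   factory.EvaluateAllMethods();
   out->Close();
}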
Exercises |
Exercise 1
|
Data visualisation and random variables
|
Exercise 2
|
Distributions and statistical tests
|
Exercise 3 |
Bayesian data analysis and unfolding experimental
data
|
Exercise 4 |
Multivariate analysis and neural networks
|
Prerequisite Knowledge |
Desirable prerequisite and references to further information |
Desirable pre-requisite
Spend a few minutes to familiarize yourself with the following concepts:
- Data analysis: http://en.wikipedia.org/wiki/Data_analysis
- Monte Carlo method: http://en.wikipedia.org/wiki/Monte_Carlo_method
- Least squares fitting: http://en.wikipedia.org/wiki/Least_squares
- Experimental errors: http://en.wikipedia.org/wiki/Observational_error
- Artificial neural networks: http://en.wikipedia.org/wiki/Artificial_neural_network
References
The outline was prepared based on the following references:
- [Roo08] (ROOT manual and tutorials),
- [Hoc07] (multivariate analysis and data visualization),
- [Lyo92, Cow98, Siv00] (data analysis textbooks),
- [D’A99, Jam00] (confidence limits vs. Bayesian).
In addition, some ideas are proposed to be taken from the Bibliography.
|