Coordinators: Rudi Frühwirth, HEPHY Vienna
The track will first introduce the fundamental concepts of Physics Computing and will then address three specific aspects of scientific computing: tools and techniques for physics computing, the ROOT technologies, and data analysis.

The second series of lectures presents modern techniques for software design, together with modern tools and technologies for understanding and improving existing software, all of which are relevant to Physics Computing. The emphasis is placed on the large software projects and large executables that are common in HEP. The series consists of lectures and exercises, and covers topics such as software engineering, design, methodology, and testing.

The third series of lectures introduces the data analysis framework ROOT, covering all the basic parts needed for a future LHC data analysis. The lectures present by example how key requirements such as performance, reliability, flexibility, platform independence, ease of use, and support for extensions are put into practice. Combined with the accompanying tutorials, they give an overview of the software techniques that ROOT brings to life, along with hands-on experience of using ROOT.

The fourth lecture series concentrates on data analysis. The lectures contain many examples of data visualisation and analysis code. Exercises are done with the ROOT data analysis toolkit.
General Introduction to Physics Computing (Lectures)

Series overview: The two lectures give an overview of the software and hardware components required for the processing of the experimental data, from the source (the detector) to the physics analysis. The emphasis is on the concepts, but some implementation details are discussed as well. The key concept is data reduction, both in terms of rate and in terms of information density. The various algorithms used for data reduction, both online and offline, are described. The flow of the real data is the main topic, but the need for and the production of simulated data is discussed as well.
Lecture 1: Event filtering. The first lecture deals with the multi-level event filters (triggers) that are used to select the physically interesting events and to bring the event rate down to an acceptable figure. Some examples of the hardware and software that are deployed by the LHC experiments are presented.
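As a purely illustrative sketch (the Event type, thresholds, and predicates below are hypothetical and are not taken from any LHC experiment), a multi-level filter can be thought of as a chain of increasingly expensive selections, where each level runs only on the events accepted by the previous one:

    // Hypothetical two-level event filter: a cheap "level-1" cut followed
    // by a more expensive "high-level trigger" (HLT) selection.
    struct Event {
        double maxMuonPt;     // highest muon transverse momentum, GeV
        int    nCaloClusters; // number of calorimeter clusters
    };

    bool passLevel1(const Event& e) { return e.maxMuonPt > 6.0; }     // fast, coarse cut
    bool passHLT(const Event& e)    { return e.nCaloClusters >= 2; }  // slower, refined cut

    bool keepEvent(const Event& e) {
        // Short-circuiting mirrors the real cascade: the HLT is only
        // evaluated for events that already passed level 1.
        return passLevel1(e) && passHLT(e);
    }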
Lecture 2: Reconstruction and simulation. The second lecture describes the various stages of event reconstruction, including calibration and alignment. The emphasis is on algorithms and data structures. The need for large amounts of simulated data is explained. The lecture concludes with a brief summary of the principles of physics analysis and the tools that are currently employed.
Tools and Techniques for Physics Computing (Lectures)

Lecture 1: Introduction to the Track. To start, we discuss some of the characteristics of software projects for high energy physics, and some of the issues that arise when people want to contribute to them. This forms the framework for the Software Technologies Track. We then continue with a brief introduction to software engineering from the perspective of the individual contributor, both as a formal process and as it actually affects what you do.

Lecture 2: Tools You Can Use. This lecture discusses several categories of tools and techniques you can use to make yourself more productive and effective. Continuous testing and documentation have proven to be important in producing high-quality work, but are often difficult to do; we discuss some available approaches (a minimal testing sketch follows the lecture list below). Many problems require specific tools and techniques to solve them effectively: we discuss the examples of performance tuning and memory access problems.
Lecture 3: Tools for Collaboration. HEP software is built by huge teams. How can this be done effectively, while still giving people satisfying tasks to perform?
Lecture 4: Software Engineering Across the Project. Now that we've covered both individual and group work, we go back to the software engineering topics of the first lecture to see how these fit together. How does our individual work affect the ability of the entire project to proceed? What are the tools and techniques that will improve both our individual work and our contributions to the whole?

Lecturer: Bob Jacobsen
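As a minimal sketch of the continuous-testing idea mentioned in Lecture 2, and assuming no particular test framework (the function under test is hypothetical), a plain assert-based check can already be run automatically on every build:

    // Hypothetical regression test: plain assert() checks that can be
    // wired into a nightly or per-commit build.
    #include <cassert>
    #include <cmath>

    // hypothetical function under test
    double invariantMassSquared(double energy, double momentum) {
        return energy * energy - momentum * momentum;
    }

    int main() {
        assert(std::fabs(invariantMassSquared(5.0, 3.0) - 16.0) < 1e-9);
        assert(std::fabs(invariantMassSquared(1.0, 1.0)) < 1e-9); // massless case
        return 0; // exit code 0 means all tests pass
    }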
Exercises

Exercises 1 and 2: The first two exercises provide some direct experience with the tools and techniques described in Lectures 1 and 2. In particular, pairs of students will work together to update existing applications, working through examples designed to show the strengths and weaknesses of several approaches. (Bob Jacobsen)
Exercises 3 and 4: After the two-person teams acquire some experience with the CMT release system, and CVS if needed, we will have groups of 5 teams work together to create a functional release from individual sub-projects at various stages of completion. Although a limited exercise, this is intended to demonstrate some of the real issues discussed in the lecture. (Bob Jacobsen)
Exercise 5: Wrap-up session. (Bob Jacobsen)
ROOT Technologies (Lectures)

Lecture 1: Basics. To lay the foundation for the lectures of the coming days, we start by introducing the purpose of ROOT and its primary contexts of use. This covers, for example, the C++ interpreter CINT and the just-in-time compiler ACLiC.
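To make the interpreter/compiler distinction concrete: the same macro can be run interpreted by CINT or compiled on the fly by ACLiC, selected by a trailing '+' at the ROOT prompt (the macro name here is just an example):

    // hello.C -- a trivial ROOT macro (file and function share the name)
    #include <cstdio>

    void hello() {
        printf("number of entries: %d\n", 42);  // placeholder payload
    }

At the ROOT prompt:

    root [0] .x hello.C    // interpreted by CINT
    root [1] .x hello.C+   // compiled into a shared library by ACLiC, then run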
Lecture 2: Tree I/O. The exabytes of LHC data will be saved using ROOT's I/O. We will explain how ROOT persistency is integrated into C++ and the basics of ROOT's storage structure. As things change, modified classes must be taken into account by a mechanism called schema evolution. Trees 1: One of HEP's most powerful and most commonly used collections is ROOT's TTree. We will explain why, in the HEP context, TTrees are superior to e.g. STL collections, and which efficiency optimizations they provide for processing data (splitting, data access without the defining library). A minimal sketch of writing a TTree follows below.
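As a minimal sketch of the TTree I/O described above (the file name, tree name, and branch are illustrative choices, not prescribed by the lectures):

    // writeTree.C -- write a small TTree to a ROOT file
    #include "TFile.h"
    #include "TTree.h"
    #include "TRandom.h"

    void writeTree() {
        TFile f("events.root", "RECREATE");        // illustrative file name
        TTree tree("events", "a toy event tree");  // illustrative tree name
        double energy = 0.;
        tree.Branch("energy", &energy, "energy/D"); // one double-valued branch
        for (int i = 0; i < 1000; ++i) {
            energy = gRandom->Gaus(100., 15.);      // toy data
            tree.Fill();
        }
        tree.Write();   // persist the tree ...
        f.Close();      // ... and close the file
    }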
Lecture 3: Analysis. Trees 2: Two mechanisms for combining TTrees, friends and chains, will be introduced (see the sketch below). Data Analysis: ROOT is mainly used to analyze data. We will go through the steps of a realistic use case, calculating a trigger's efficiency from triggered data. Combining statistics, fitting, and ROOT, we end up with a solution. PROOF: Even though HEP's data is trivial to parallelize, most analysis code is written to run serially. PROOF allows you to run regular ROOT analysis code in a parallel environment. We will show you what it does and how to use it.
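A minimal sketch of chains and friends, reusing the illustrative file and tree names from the previous example plus a hypothetical calibration tree:

    // chain.C -- combine trees from several files and attach a friend tree
    #include "TChain.h"

    void chain() {
        TChain events("events");        // name of the TTree inside each file
        events.Add("run1.root");        // hypothetical input files
        events.Add("run2.root");
        // A friend tree is read in step with the main chain, so its branches
        // can be used as if they belonged to "events" itself.
        events.AddFriend("calib", "calib.root");  // hypothetical friend tree
        events.Draw("energy");          // plot a branch across the whole chain
    }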
Exercises

Exercise 1: We will play with a few example macros to get a feeling for the pros and cons of compiled versus interpreted mode. We apply the knowledge from the C++ introduction in the lecture to understand how object-oriented design can help.
Exercise 2: We will store objects of our own class to a ROOT file and read them back.
Exercise 3: We will create chains of trees and friend trees. We will run a simple analysis on a chain of trees using PROOF to see interactive parallelism in action.
Pre-requisite Knowledge

Mandatory pre-requisite: Install ROOT if you don't have it; start it up. Create a one-dimensional histogram with 10 bins spanning the range 0..5. Fill it with the values 4., 4.2, 5.8, 3.8, 4.7, and 2.7. Draw it. Fit it with a Gaussian using the default options. Check that the mean of the fit is about 4.0; otherwise you've done something wrong.
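One possible way to do this, as a minimal sketch (the names are just one choice; the classic for loop keeps the macro CINT-friendly):

    // warmup.C -- the mandatory pre-requisite exercise in macro form
    #include "TH1F.h"

    void warmup() {
        TH1F *h = new TH1F("h", "warm-up histogram", 10, 0., 5.);
        double values[6] = {4., 4.2, 5.8, 3.8, 4.7, 2.7};
        for (int i = 0; i < 6; ++i)
            h->Fill(values[i]);   // 5.8 lands in the overflow bin (range is 0..5)
        h->Draw();
        h->Fit("gaus");           // Gaussian fit with default options
    }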
Desirable pre-requisite and references to further information: If you need to install ROOT, use the recommended version mentioned at http://root.cern.ch/drupal/content/downloading-root. To learn how to create, fill, draw, and fit histograms, see the User's Guide at http://root.cern.ch/drupal/content/users-guide, chapters "Histograms" and "Fitting Histograms". For examples of how to create, fill, draw, and fit histograms, look at the macros in $ROOTSYS/tutorials, especially hsimple.C and fit1.C. The reference guide for ROOT's histogram base class TH1 is located at http://root.cern.ch/root/html/TH1.html.
Data Analysis (Lectures)

Lecture 1: Data visualisation and random variables (Ivica Puljak). The first lecture in the Data Analysis series discusses graphical techniques used in exploratory data analysis, gives an introduction to the concept of probability, and covers descriptive statistics summarising the basic features of the data gathered from experiments. Random variables and the Monte Carlo method are introduced.
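As a minimal illustration of the Monte Carlo method in the ROOT context (a toy example, not taken from the lecture material): random variables drawn from a known distribution are used to populate a histogram whose shape then approximates that distribution:

    // toyMC.C -- sample a random variable and visualise its distribution
    #include "TH1F.h"
    #include "TRandom.h"

    void toyMC() {
        TH1F *h = new TH1F("h", "toy Monte Carlo;x;entries", 50, -5., 5.);
        for (int i = 0; i < 10000; ++i)
            h->Fill(gRandom->Gaus(0., 1.));  // standard normal random variable
        h->Draw();  // the histogram approximates the underlying Gaussian
    }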
Lecture 2: Distributions and statistical tests. Commonly used probability distributions are introduced. It is shown how statistical hypothesis testing is used to make statistical decisions based on experimental data. Least-squares fitting, the chi-square goodness-of-fit test, and the maximum likelihood method are introduced.
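In ROOT both fitting approaches are available through the same call; as a small sketch (the histogram and options are illustrative), the "L" option switches TH1::Fit from the default chi-square fit to a binned log-likelihood fit:

    // fitDemo.C -- chi-square versus log-likelihood fit of the same histogram
    #include "TH1F.h"
    #include "TRandom.h"

    void fitDemo() {
        TH1F *h = new TH1F("h", "fit demo", 40, -4., 4.);
        for (int i = 0; i < 500; ++i)
            h->Fill(gRandom->Gaus(0., 1.));
        h->Fit("gaus");        // default: chi-square (least-squares) fit
        h->Fit("gaus", "L");   // "L": binned log-likelihood fit instead
    }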
Lecture 3: Bayesian data analysis and unfolding experimental data. We study Bayesian probability, which interprets the concept of probability as a measure of a state of knowledge rather than as a frequency, as in frequentist statistics. In Bayesian statistics, new measurements are used to update, or to newly infer, the probability that a hypothesis may be true. We show how Bayesian estimators are used to approximate the unknown parameters. Experimental errors are discussed, and unfolding methods to correct distortions in measurements are introduced.
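The updating step rests on Bayes' theorem, P(H|D) = P(D|H) P(H) / P(D). As a toy numerical sketch (the prior and likelihoods are invented purely for illustration):

    // bayesToy.C -- update the probability of a hypothesis after one observation
    #include <cstdio>

    void bayesToy() {
        double priorH = 0.5;          // P(H): prior belief in hypothesis H
        double pDataGivenH = 0.8;     // P(D|H): likelihood of the data under H
        double pDataGivenNotH = 0.3;  // P(D|not H)
        // P(D) by the law of total probability
        double pData = pDataGivenH * priorH + pDataGivenNotH * (1. - priorH);
        double posteriorH = pDataGivenH * priorH / pData;  // Bayes' theorem
        printf("P(H|D) = %.3f\n", posteriorH);             // ~0.727
    }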
Lecture 4: Multivariate analysis and neural networks. Multivariate analysis (MVA) describes procedures that involve the observation and analysis of more than one statistical variable at a time. We study the different models available in ROOT's Multivariate Analysis package, TMVA. Artificial neural networks are discussed as an example of MVA classifiers. Finally, we study how neural networks can be used in regression analysis to estimate an unknown function, and in the search for the Higgs boson.
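A heavily abridged sketch of booking a neural-network classifier with the classic TMVA Factory interface (the variable names, trees, and option strings are placeholders; see the TMVA manual for realistic configurations):

    // tmvaSketch.C -- outline of training an MLP classifier with TMVA
    #include "TFile.h"
    #include "TTree.h"
    #include "TMVA/Factory.h"
    #include "TMVA/Types.h"

    void tmvaSketch(TTree *sigTree, TTree *bkgTree) {  // caller supplies the trees
        TFile *out = TFile::Open("TMVA.root", "RECREATE");
        TMVA::Factory factory("TMVAClassification", out, "");
        factory.AddVariable("var1", 'F');   // placeholder input variables
        factory.AddVariable("var2", 'F');
        factory.AddSignalTree(sigTree, 1.0);
        factory.AddBackgroundTree(bkgTree, 1.0);
        factory.PrepareTrainingAndTestTree("", "");
        factory.BookMethod(TMVA::Types::kMLP, "MLP", "HiddenLayers=N+1");
        factory.TrainAllMethods();          // train, ...
        factory.TestAllMethods();           // ... test, ...
        factory.EvaluateAllMethods();       // ... and compare the booked methods
        out->Close();
    }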
Exercises

Exercise 1: Data visualisation and random variables (Ivica Puljak)
Exercise 2: Distributions and statistical tests
Exercise 3: Bayesian data analysis and unfolding experimental data
Exercise 4: Multivariate analysis and neural networks
Prerequisite Knowledge

Desirable prerequisite and references to further information: Spend a few minutes to familiarize yourself with the following concepts: