CERN School of Computing 2012
13-24 August, Uppsala, Sweden
 

 CSC2012 Physics Computing Theme

Coordinators:

Arnulf Quadt, University of Göttingen
Are Strandlie, Gjøvik University College

 

The track will first introduce the fundamental concepts of Physics Computing and will then address two specific aspects of scientific computing: tools and techniques for scientific software and Data Analysis.

The first series of lectures gives an overview of the software and hardware components required for the processing of the experimental data, from the source - the detector - to the physics analysis. The emphasis is on the concepts, but some implementation details are discussed as well. The key concept is data reduction, both in terms of rate and in terms of information density. The various algorithms used for data reduction, both online and offline, are described. The flow of the real data is the main topic, but the need for and the production of simulated data is discussed as well.

 

The second series of lectures presents modern techniques for software design, as well as modern tools and technologies for understanding and improving existing software, that are relevant for Physics Computing. The emphasis is on the large software projects and large executables that are common in HEP. The series consists of lectures and exercises. The lectures cover topics such as software engineering, design, methodology and testing.

 

The third lecture series concentrates on Data Analysis. The lectures contain many examples of data visualisation and analysis code. The exercises are done with the ROOT data analysis toolkit.

 

 

 General Introduction to Physics Computing

Session

Description

Lecturer

Whole series

General Introduction to Physics Computing

The two lectures give an overview of the software and hardware components required for the processing of the experimental data, from the source - the detector - to the physics analysis. The emphasis is on the concepts, but some implementation details are discussed as well. The key concept is data reduction, both in terms of rate and in terms of information density. The various algorithms used for data reduction, both online and offline, are described. The flow of the real data is the main topic, but the need for and the production of simulated data is discussed as well.

Are Strandlie

Lecture 1

Event filtering

The first lecture deals with the multi-level event filters (triggers) that are used to select the physically interesting events and to bring the event rate down to an acceptable figure. Some examples of the hardware and software deployed by the LHC experiments are presented.

Are Strandlie

Lecture 2

Reconstruction and simulation

The second lecture describes the various stages of event reconstruction, including calibration and alignment. The emphasis is on algorithms and data structures. The need for large amounts of simulated data is explained. The lecture concludes with a brief summary of the principles of physics analysis and the tools that are currently employed.

Are Strandlie

Prerequisite and References

Desirable Prerequisite

Basic knowledge of experimental science

 

 

Tools and Techniques for Physics Computing

Session

Description

Lecturer

Lecture 1

Lecture 2

Introduction to the Track

To start, we discuss some of the characteristics of software projects for high energy physics, and some of the issues that arise when people want to contribute to them. This forms the framework for the Software Technologies Track. We then continue with a brief introduction to software engineering from the perspective of the individual contributor, both as a formal process and in terms of how it actually affects what you do.

 

Tools You Can Use

This lecture discusses several categories of tools and techniques you can use to make yourself more productive and effective. Continuous testing and documentation have proven to be important in producing high-quality work, but they are often difficult to do; we discuss some available approaches. Many problems require specific tools and techniques to solve them effectively; we discuss the examples of performance tuning and memory access problems.

 

Tools for Collaboration

HEP software is built by huge teams. How can this be done effectively, while still giving people satisfying tasks to perform?
This lecture discusses some of the technical approaches used. Source control (e.g. CVS) is becoming common, so we just skim over its advantages and disadvantages to get to the larger area of release control (e.g. CMT) and release testing and distribution. We'll focus on why this is considered a hard problem, and what the current techniques for dealing with it are.

 

Software Engineering Across the Project

Now that we've covered both individual and group work, we go back to the software engineering topics of the first lecture to see how these fit together. How does our individual work affect the ability of the entire project to proceed? What are the tools and techniques that will improve both our individual work and our contributions to the whole?
We close with a summary of observations.

Bob Jacobsen

Exercise  1

Exercise  2

Exercise  3

Exercises 1, 2 and 3

The first two exercises provide some direct experience with the tools and techniques described in Lectures 1 and 2. Teams of two students will work together to update existing applications, working through examples designed to show the strengths and weaknesses of various tools and approaches. This will be followed by small projects for additional development experience.

Bob Jacobsen

Exercise  4

Exercise  5

Exercise  6

 

Exercises 4, 5 and 6

After the two-person teams acquire some experience with the development and release tools, we will turn to group projects to demonstrate some of the real-world issues discussed in the lecture. Groups of two teams will first work together to create a functional release from individual sub-projects at various stages of completion, to show the strengths and weaknesses of test and release tools. This is followed by a larger-scale exercise with groups of five teams.

Bob Jacobsen

Prerequisite and References

Desirable Prerequisite

Basic programming and software engineering

 

 

Data Analysis (Introductory slides to Lecture 1, not in the booklet, are available here.)

Session

Description

Lecturer

Lecture 1

Introduction to data analysis
The first lecture in the Data Analysis series discusses graphical techniques used in exploratory data analysis, and gives an introduction to the concept of probability and to descriptive statistics, which summarize the basic features of the data gathered from experiments.

Ivica Puljak

Lecture 2

Monte Carlo method
The Monte Carlo method is introduced and explained with examples from engineering and high energy physics.

Ivica Puljak

Lecture 3

Distributions and estimators
In this lecture, commonly used probability distributions are introduced with their basic properties and a few examples. Parameter estimation with the maximum likelihood and least-squares methods is explained.

Ivica Puljak

Lecture 4

Confidence intervals
Determining the errors on the parameters, which is equivalent to confidence interval estimation, is shown with specific examples using the maximum likelihood and least-squares methods in one and in more than one dimension. Uncertainties in physics and error propagation are also discussed.

Ivica Puljak

Lecture 5

Statistical tests
Hypothesis testing is introduced with examples of goodness-of-fit tests and the most recent examples from high energy physics. Particular emphasis is placed on p-values and on when a discovery can be claimed.

Ivica Puljak

Exercise 1

Introduction to ROOT

  • Basic and advanced ROOT examples

  • Visualisation of Data with ROOT

Ivica Puljak

Exercise 2

Monte Carlo method

  • Generating random numbers

  • Monte-Carlo toy experiments
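
As an illustration of the two topics of this exercise, the following is a minimal sketch in plain Python (not ROOT, and not taken from the course material) of a hit-or-miss Monte Carlo toy experiment that estimates pi from uniform random numbers. The function name `estimate_pi`, the seed and the sample sizes are assumptions chosen for illustration only.

```python
import math
import random


def estimate_pi(n_points, seed=12345):
    """Estimate pi by hit-or-miss sampling of the unit square.

    Returns the estimate and its binomial statistical uncertainty.
    """
    rng = random.Random(seed)  # private generator, so the toy is reproducible
    hits = 0
    for _ in range(n_points):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # point falls inside the quarter circle
            hits += 1
    fraction = hits / n_points
    estimate = 4.0 * fraction
    # Binomial standard deviation of the hit fraction, scaled by 4
    uncertainty = 4.0 * math.sqrt(fraction * (1.0 - fraction) / n_points)
    return estimate, uncertainty


if __name__ == "__main__":
    for n in (1_000, 100_000):
        value, error = estimate_pi(n)
        print(f"n = {n:>7}: pi ~ {value:.4f} +/- {error:.4f}")
```

The statistical uncertainty of such a toy shrinks roughly as one over the square root of the number of generated points, which is why the larger sample gives the tighter estimate.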

 

Exercise 3

Fitting with ROOT

  • Modeling signal and background
  • Fitting with ROOT packages (finding peaks)
 

Exercise 4

Confidence interval

  • Finding errors on fit parameters

  • Extracting confidence intervals
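
To make the link between fit-parameter errors and confidence intervals concrete, here is a minimal sketch in plain Python (not ROOT, and not part of the course material) of a weighted least-squares straight-line fit. The helper names `fit_line` and `interval` are hypothetical, and the sketch assumes independent Gaussian measurement errors, so that the one-standard-deviation interval corresponds to roughly 68.3% confidence.

```python
import math


def fit_line(xs, ys, sigmas):
    """Weighted least-squares fit of y = intercept + slope * x.

    Returns (intercept, slope, intercept_error, slope_error).
    """
    weights = [1.0 / (s * s) for s in sigmas]
    s0 = sum(weights)
    sx = sum(w * x for w, x in zip(weights, xs))
    sy = sum(w * y for w, y in zip(weights, ys))
    sxx = sum(w * x * x for w, x in zip(weights, xs))
    sxy = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    delta = s0 * sxx - sx * sx  # determinant of the normal equations
    intercept = (sxx * sy - sx * sxy) / delta
    slope = (s0 * sxy - sx * sy) / delta
    # The diagonal of the covariance matrix gives the parameter variances
    return intercept, slope, math.sqrt(sxx / delta), math.sqrt(s0 / delta)


def interval(value, error, n_sigma=1.0):
    """Symmetric interval value +/- n_sigma * error (Gaussian assumption)."""
    return value - n_sigma * error, value + n_sigma * error


if __name__ == "__main__":
    a, b, err_a, err_b = fit_line([0, 1, 2, 3], [1.1, 2.9, 5.2, 6.8], [0.5] * 4)
    print("slope interval (1 sigma):", interval(b, err_b))
```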

 

Exercise 5

Hypothesis testing

  • Finding the p-value

  • Converting p-values to significance

  • Low count experiments and hypothesis testing
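
The conversion between p-values and significance can be sketched as follows, using only the Python standard library (not ROOT, and not part of the course material). The function names are hypothetical, and the sketch assumes the one-sided Gaussian convention, under which a significance of 5 standard deviations corresponds to a p-value of about 2.87e-7.

```python
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def significance_from_p(p_value):
    """One-sided Gaussian significance Z for a p-value in (0, 1)."""
    return -_STANDARD_NORMAL.inv_cdf(p_value)


def p_from_significance(z):
    """One-sided p-value for a given Gaussian significance Z."""
    return _STANDARD_NORMAL.cdf(-z)


if __name__ == "__main__":
    for z in (3.0, 5.0):
        print(f"Z = {z}: p = {p_from_significance(z):.3e}")
```

Evaluating the lower tail directly, rather than computing one minus the upper-tail probability, avoids loss of floating-point precision for the very small p-values involved.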

 

Prerequisite and References

Desirable Prerequisite

Spend a few minutes familiarizing yourself with the following concepts:
  • Data Analysis http://en.wikipedia.org/wiki/Data_analysis

  • Monte Carlo method http://en.wikipedia.org/wiki/Monte_Carlo_method

  • Least squares fitting http://en.wikipedia.org/wiki/Least_squares

  • Experimental errors http://en.wikipedia.org/wiki/Observational_error

References

The outline was prepared based on the following references:

  • [roo08] (ROOT manual and tutorials),

  • [Hoc07] (multivariate analysis and data visualization),

  • [Lyo92, Cow98, Siv00] (data analysis textbooks),

  • [D’A99, Jam00] (confidence limits vs. Bayesian).

In addition, some ideas are taken from:

  • [HL08] (MC simulation and environment for exercises).

Bibliography

  • G. Cowan

    • Statistical Data Analysis.

    • Oxford University Press, 1998.

  • G. D’Agostini

    • Bayesian Reasoning in High-Energy Physics: Principles and Applications.

    • Technical report, CERN-99-03, 1999.

  • A. Heikkinen and M. Liendl

  • A. Hocker.

    • TMVA - Toolkit for Multivariate Data Analysis.

    • CERN-OPEN-2007-007, 2007.

    • A. Heikkinen: Data Analysis with ROOT C / [arXiv: physics/0703039].

  • F. James

    • Workshop on Confidence Limits.

    • Technical report, CERN-2000-005, 2000.

  • L. Lyons

    • Statistics for nuclear and particle physicists.

    • Cambridge University Press, 1992.

  • ROOT 5.21 Users Guide, October 2008.

  • D. S. Sivia

    • Data Analysis: a Bayesian Tutorial.

    • Oxford University Press, 2000.

 

 

 

 Multivariate Analysis and Visualisation

Session

Description

Lecturer

Lecture 1

Multivariate Analysis and Visualisation

The aim of this lecture is to make the audience aware of multivariate analysis (MVA) methods and alternative visualisations available. It will describe the general MVA sequence before going into more detail on two classifiers commonly used in particle physics. Two visualisations are also described and the possible benefits that might arise from using MVA and multivariate visualisations are outlined.

B. Radburn Smith

Prerequisite and References

Desirable Prerequisite
This lecture can be followed by anyone who has some experience of data analysis and data visualisation.

 

 
 

Copyright CERN
