Data Analysis

Session

Description

Lecturer

Lecture 1

Introduction to data analysis
The first lecture in the Data Analysis series discusses graphical techniques used in exploratory data analysis and introduces the concept of probability and the descriptive statistics that summarize the basic features of data gathered from experiments.

Ivica Puljak

Lecture 2

Monte Carlo method
The Monte Carlo method is introduced and explained with examples from engineering and high energy physics.

Ivica Puljak

Lecture 3

Distributions and estimators
In this lecture, commonly used probability distributions are introduced, together with their basic properties and a few examples. Parameter estimation with the maximum likelihood and least-squares methods is explained.

Ivica Puljak

Lecture 4

Confidence intervals
Determining the errors on fitted parameters, which is equivalent to estimating confidence intervals, is shown with specific examples using the maximum likelihood and least-squares methods in one and more than one dimension. Uncertainties in physics and error propagation are also discussed.

Ivica Puljak

Lecture 5

Statistical tests
Hypothesis testing is introduced with examples of goodness-of-fit tests and recent examples from high energy physics. Particular emphasis is placed on p-values and on the criteria for claiming a discovery.

Ivica Puljak

Exercise 1

Introduction to ROOT

  • Basic and advanced ROOT examples

  • Visualisation of Data with ROOT

Ivica Puljak

Exercise 2

Monte Carlo method

  • Generating random numbers

  • Monte Carlo toy experiments
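As a rough illustration of the two items above, the following is a minimal sketch in plain Python rather than ROOT, which is an assumption made only to keep the example self-contained. Each "toy experiment" estimates pi with the hit-or-miss method, and repeating the toy many times shows the statistical spread of the estimate. The function names, seed, and sample sizes are hypothetical choices, not part of the course material.

```python
import random
import statistics


def estimate_pi(n_points, rng):
    """Hit-or-miss Monte Carlo: count uniform points that fall inside the unit quarter circle."""
    hits = 0
    for _ in range(n_points):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    # The quarter circle covers pi/4 of the unit square.
    return 4.0 * hits / n_points


def run_toys(n_toys, n_points, seed=12345):
    """Repeat the same pseudo-experiment n_toys times with one seeded generator."""
    rng = random.Random(seed)
    return [estimate_pi(n_points, rng) for _ in range(n_toys)]


toys = run_toys(n_toys=200, n_points=2000)
toy_mean = statistics.mean(toys)
toy_spread = statistics.stdev(toys)
print(f"mean of toys = {toy_mean:.4f}, spread of one toy = {toy_spread:.4f}")
```

The spread of the toy results gives an empirical estimate of the uncertainty of a single experiment, which is the basic idea behind toy Monte Carlo studies.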

 

Exercise 3

Fitting with ROOT

  • Modeling signal and background
  • Fitting with ROOT packages (finding peaks)
 

Exercise 4

Confidence intervals

  • Finding errors on fit parameters

  • Extracting confidence intervals
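The two items above can be sketched in plain Python (an assumption; the exercise itself uses ROOT). The example fits the mean lifetime of an exponential distribution by maximum likelihood and extracts an approximate 68% interval from the points where the log-likelihood drops by 0.5 from its maximum, a rule that holds in the large-sample limit. The data set, seed, and helper names are hypothetical.

```python
import math
import random


def log_likelihood(tau, times):
    """Log-likelihood of an exponential distribution with mean lifetime tau."""
    return -len(times) * math.log(tau) - sum(times) / tau


def bisect_root(func, lo, hi, tol=1e-9):
    """Find a root of func in [lo, hi], assuming func changes sign on the interval."""
    f_lo = func(lo)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Hypothetical data set: 500 decay times generated with a true lifetime of 2.0.
rng = random.Random(2024)
times = [rng.expovariate(1.0 / 2.0) for _ in range(500)]

# For an exponential distribution the maximum likelihood estimate is the sample mean.
tau_hat = sum(times) / len(times)
ln_l_max = log_likelihood(tau_hat, times)


def drop(tau):
    """Positive inside the interval, zero where ln L = ln L_max - 0.5."""
    return log_likelihood(tau, times) - (ln_l_max - 0.5)


tau_lo = bisect_root(drop, 0.5 * tau_hat, tau_hat)
tau_hi = bisect_root(drop, tau_hat, 2.0 * tau_hat)
print(f"tau = {tau_hat:.3f} -{tau_hat - tau_lo:.3f} +{tau_hi - tau_hat:.3f}")
```

The resulting interval is slightly asymmetric, and its half-width is close to the analytic estimate tau_hat / sqrt(n) for this model.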

 

Exercise 5

Hypothesis testing

  • Finding the p-value

  • Converting p-values to significance

  • Low count experiments and hypothesis testing
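A minimal sketch of the items above, written in plain Python instead of ROOT (an assumption made to keep the example self-contained). It computes the p-value of a counting experiment under a background-only Poisson hypothesis and converts between one-sided p-values and Gaussian significance. The function names and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist


def poisson_p_value(n_obs, background):
    """P(N >= n_obs) for N ~ Poisson(background), the background-only hypothesis."""
    term = math.exp(-background)  # P(N = 0)
    cdf = 0.0
    for k in range(n_obs):
        cdf += term                    # add P(N = k)
        term *= background / (k + 1)   # recurrence to P(N = k + 1)
    return 1.0 - cdf


def p_to_significance(p_value):
    """One-sided Gaussian significance Z with upper-tail probability p_value."""
    return -NormalDist().inv_cdf(p_value)


def significance_to_p(z):
    """Upper-tail probability of a standard normal beyond z."""
    return NormalDist().cdf(-z)


# Hypothetical low-count experiment: 8 events observed, 2.0 expected from background.
p_value = poisson_p_value(n_obs=8, background=2.0)
z_value = p_to_significance(p_value)
print(f"p = {p_value:.3e}, Z = {z_value:.2f} sigma")
print(f"p-value at 5 sigma = {significance_to_p(5.0):.3e}")
```

With this one-sided convention, the 5 sigma threshold commonly used for discovery claims in high energy physics corresponds to a p-value of about 2.87e-7.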

 

Prerequisite and References

Desirable Prerequisite

Spend a few minutes familiarizing yourself with the following concepts:
  • Data analysis http://en.wikipedia.org/wiki/Data_analysis

  • Monte Carlo method http://en.wikipedia.org/wiki/Monte_Carlo_method

  • Least squares fitting http://en.wikipedia.org/wiki/Least_squares

  • Experimental errors http://en.wikipedia.org/wiki/Observational_error

References

The outline was prepared based on the following references:

  • [roo08] (ROOT manual and tutorials),

  • [Hoc07] (multivariate analysis and data visualization),

  • [Lyo92, Cow98, Siv00] (data analysis textbooks),

  • [D’A99, Jam00] (confidence limits vs. Bayesian).

In addition, some ideas are taken from:

  • [HL08] (MC simulation and environment for exercises).

Bibliography

  • G. Cowan

    • Statistical Data Analysis.

    • Oxford University Press, 1998.

  • G. D’Agostini

    • Bayesian Reasoning in High-Energy Physics: Principles and Applications.

    • Technical report, CERN-99-03, 1999.

  • A. Heikkinen and M. Liendl

  • A. Hocker

    • TMVA - Toolkit for Multivariate Data Analysis.

    • CERN-OPEN-2007-007, 2007.

    • A. Heikkinen: Data Analysis with ROOT C / [arXiv: physics/0703039].

  • F. James

    • Workshop on Confidence Limits.

    • Technical report, CERN-2000-005, 2000.

  • L. Lyons

    • Statistics for Nuclear and Particle Physicists.

    • Cambridge University Press, 1992.

  • ROOT 5.21 Users Guide, October 2008.

  • D. S. Sivia

    • Data Analysis: a Bayesian Tutorial.

    • Oxford University Press, 2000.