Inverted CERN School of Computing 2008, 3-5 March 2008, CERN


Overview of advanced aspects of data analysis software and techniques

   

Wednesday 5 March

 
09:00 - 09:55 Lecture 1

Overview of advanced aspects of data analysis software and techniques

 

Alfio Lazzaro

Summary
In this lecture we give an overview of advanced data analysis techniques based on multivariate methods, which have recently come into use in many High Energy Physics data analyses. The topic is relevant to many Particle Physics analyses, as well as to several other fields. We will give an overview of the different techniques and their relative merits.

Audience
This lecture targets an audience with experience in data analysis, in particular those interested in techniques for signal/background discrimination.

Pre-requisite
This lecture can reasonably be followed without having attended the other lectures of this school.

Keywords

  • Data analysis

  • Parallel processing

  • Signal Background Separation

  • Maximum Likelihood

  • Artificial Neural Network

  • Decision Tree

Details

In recent years, many advanced statistical data analysis techniques have been used in High Energy Physics, such as maximum likelihood fits, Neural Networks, and Decision Trees. Previously, the most common technique was the simple cut-and-count analysis. It consists of the following steps: cuts are applied on several well-studied discriminating variables; the background is estimated using Monte Carlo simulation samples or events outside the signal region; and the final measurement is obtained by counting the events that pass the cuts and subtracting the estimated number of background events.
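
The cut-and-count steps can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the variable names (`mass`, `shape`), the signal and sideband windows, the cut value, and the flat background shape are all hypothetical and are not taken from any real analysis.

```python
import random

# Hypothetical toy setup: all names, ranges and numbers are illustrative assumptions.
SIGNAL_REGION = (5.27, 5.29)   # window in the "mass" variable
SIDEBAND = (5.20, 5.26)        # background-only region used for the estimate
SHAPE_CUT = 0.5                # keep events with shape < SHAPE_CUT


def make_toy_events(n_signal, n_background, seed=1):
    """Generate toy events as dicts with a 'mass' and a 'shape' variable."""
    rng = random.Random(seed)
    events = []
    for _ in range(n_signal):
        # Toy signal: peaks in mass and clusters at low shape values.
        events.append({"mass": rng.gauss(5.28, 0.003),
                       "shape": rng.uniform(0.0, 0.6)})
    for _ in range(n_background):
        # Toy background: flat in mass, spread over the full shape range.
        events.append({"mass": rng.uniform(5.20, 5.30),
                       "shape": rng.uniform(0.0, 1.0)})
    return events


def in_range(value, bounds):
    """True if value lies in the half-open interval [lo, hi)."""
    lo, hi = bounds
    return lo <= value < hi


def cut_and_count(events):
    """Return (observed count, estimated background, signal yield)."""
    # Step 1: apply the cut on the discriminating variable.
    selected = [e for e in events if e["shape"] < SHAPE_CUT]
    n_observed = sum(in_range(e["mass"], SIGNAL_REGION) for e in selected)
    # Step 2: estimate the background from events outside the signal region,
    # extrapolating the sideband count under the flat-background assumption.
    n_sideband = sum(in_range(e["mass"], SIDEBAND) for e in selected)
    scale = (SIGNAL_REGION[1] - SIGNAL_REGION[0]) / (SIDEBAND[1] - SIDEBAND[0])
    n_background = n_sideband * scale
    # Step 3: count the events after cuts minus the estimated background.
    return n_observed, n_background, n_observed - n_background


events = make_toy_events(n_signal=200, n_background=2000)
n_obs, n_bkg, n_sig = cut_and_count(events)
print(f"observed={n_obs}, background={n_bkg:.1f}, signal={n_sig:.1f}")
```

In this toy the cut keeps roughly five sixths of the generated signal, so the extracted yield is expected to fall below the 200 generated signal events, which illustrates the efficiency loss inherent in the method.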

 

This simple technique is hampered by its low efficiency (defined as the ratio between the number of events after and before the cuts) and does not provide good discrimination between signal and background events. For this reason it has been replaced by more sophisticated techniques, such as the multivariate maximum likelihood fits used for measurements at the BaBar experiment, running at the Stanford Linear Accelerator Center (SLAC) in California.
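
As a small worked example of the efficiency definition, the sketch below computes the ratio of events after and before the cuts. The function name `selection_efficiency` and the event counts are hypothetical, and the binomial uncertainty is an addition made here for illustration, not something stated in the lecture abstract.

```python
import math


def selection_efficiency(n_before, n_after):
    """Efficiency = events after cuts / events before cuts.

    Also returns the binomial uncertainty sqrt(eff * (1 - eff) / n_before),
    an assumption that holds when the events are independent.
    """
    if n_before <= 0 or not 0 <= n_after <= n_before:
        raise ValueError("need 0 <= n_after <= n_before and n_before > 0")
    eff = n_after / n_before
    err = math.sqrt(eff * (1.0 - eff) / n_before)
    return eff, err


# Hypothetical counts: 1000 events before the cuts, 250 after.
eff, err = selection_efficiency(1000, 250)
print(f"efficiency = {eff:.3f} +/- {err:.3f}")
```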

 

The maximum likelihood (ML) technique achieves higher efficiency, treats errors more precisely, and takes into account correlations between the discriminating variables used in the analysis. However, in future experiments, such as the LHC experiments at CERN, better discrimination between signal and background events may be crucial for discovering new phenomena, which suffer from higher backgrounds. Neural Networks and Decision Trees are good techniques for reaching this goal. Another important issue is that these techniques are in most cases very CPU-intensive. They can be sped up using concepts from High Performance Computing (HPC).
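
To make the ML idea concrete, here is a minimal sketch of an unbinned maximum likelihood fit for a signal fraction. It is deliberately simplified and rests on several assumptions: a single discriminating variable instead of a multivariate fit, a Gaussian signal shape and a flat background shape with fixed, known parameters, and a plain golden-section search as the minimizer. All names and numbers are hypothetical.

```python
import math
import random

# Hypothetical toy model: one variable, fixed and known shapes.
MASS_RANGE = (5.20, 5.30)
SIGNAL_MEAN, SIGNAL_SIGMA = 5.28, 0.003


def signal_pdf(x):
    """Gaussian signal shape (truncation at the range edges is negligible)."""
    z = (x - SIGNAL_MEAN) / SIGNAL_SIGMA
    return math.exp(-0.5 * z * z) / (SIGNAL_SIGMA * math.sqrt(2.0 * math.pi))


def background_pdf(x):
    """Flat background shape, normalised over MASS_RANGE."""
    return 1.0 / (MASS_RANGE[1] - MASS_RANGE[0])


def negative_log_likelihood(f, data):
    """-log L for the mixture f * signal + (1 - f) * background."""
    return -sum(math.log(f * signal_pdf(x) + (1.0 - f) * background_pdf(x))
                for x in data)


def fit_signal_fraction(data, tol=1e-6):
    """Minimise the NLL over f with a golden-section search.

    The NLL of a two-component mixture is convex in f, so it has a
    single minimum on the interval.
    """
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = 1e-6, 1.0 - 1e-6
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    fc = negative_log_likelihood(c, data)
    fd = negative_log_likelihood(d, data)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]: shrink from the right.
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = negative_log_likelihood(c, data)
        else:
            # Minimum lies in [c, b]: shrink from the left.
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = negative_log_likelihood(d, data)
    return 0.5 * (a + b)


def make_toy_data(n_signal, n_background, seed=7):
    """Draw toy masses from the assumed signal and background shapes."""
    rng = random.Random(seed)
    data = []
    while len(data) < n_signal:
        x = rng.gauss(SIGNAL_MEAN, SIGNAL_SIGMA)
        if MASS_RANGE[0] <= x <= MASS_RANGE[1]:
            data.append(x)
    data += [rng.uniform(*MASS_RANGE) for _ in range(n_background)]
    return data


data = make_toy_data(n_signal=300, n_background=1700)  # true fraction 0.15
f_hat = fit_signal_fraction(data)
print(f"fitted signal fraction = {f_hat:.3f}")
```

Unlike cut-and-count, every event contributes to the result, weighted by how signal-like it is, which is where the gain in efficiency comes from. The sum over events inside `negative_log_likelihood` is re-evaluated at every minimisation step; this is the CPU-heavy part, and because the per-event terms are independent it is a natural candidate for parallel evaluation.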

 

In this lecture we will give an overview of the advanced data analysis techniques mentioned above, introducing some software packages commonly used in HEP. This will be preceded by a short session at the end of the previous theme, briefly giving examples of possible HPC optimizations.

 


Feedback: Computing (dot) School (at) cern (dot) ch
Last update: Thursday, 14. November 2013 11:50

Copyright CERN