iCSC2006 Computational Intelligence for HEP Data Analysis

Details of all lectures

 

   

Monday 6 March

 

09:30 - 10:00

 

Introduction

This introduction will present a brief overview of the main challenges of data analysis in HEP and of how Computational Intelligence methods can help address them. It will provide the general background needed to put into context the specific material presented in the other lectures of the series.

 

The introduction, together with the whole lecture series, targets both particle physicists and computer scientists. The lectures are meant to inspire and encourage particle physicists to explore new algorithms, and to invite computer scientists to propose other powerful algorithms developed in their field.

 

Liliana Teodorescu

Feature Selection and Statistical Learning Basics

10:00 - 11:00 Lecture 1 Feature Selection and Statistical Learning Basics

 

Anselm Vossen

Selecting meaningful features is as important for the success of a classification or regression task as the intelligent system used for the actual work. This lecture presents important aspects of feature selection and introduces the basics of classification with Bayes' theorem, walking through the data analysis process from the initial features to the classification decision.


The lecture targets physicists and computer scientists alike and lays some of the groundwork for the following lectures.

 

Motivation
- The Data Analysis Process

- Why Attribute Selection and Data Preparation are important

 

Strategies for Feature Selection

- Reducing the Dimensionality of the Feature Space with Principal Component Analysis (Karhunen-Loève Transformation)

- Measuring the Importance of a Feature with Information Gain and Significance Measures
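The two feature-selection strategies listed above can be illustrated in a few lines of Python. This is a sketch under stated assumptions, not the lecture's own material: it assumes small in-memory datasets, estimates the first principal component by power iteration on the sample covariance matrix (one of several ways to compute PCA), and computes the information gain of a discrete feature. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length feature vectors."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_principal_component(rows, iterations=200):
    """Leading eigenvector of the covariance matrix, found by power iteration.

    Assumes the all-ones start vector is not orthogonal to the leading eigenvector.
    """
    cov = covariance_matrix(rows)
    d = len(cov)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in class entropy obtained by splitting on a discrete feature."""
    n = len(labels)
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder
```

For points lying on the line y = 2x, the first principal component points along (1, 2)/sqrt(5). A feature that perfectly separates two balanced classes has an information gain of 1 bit, while a feature independent of the class has zero gain.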

 

Statistical Learning Basics

- Optimal classification: Bayes' theorem

- Bayes Classifier

- Evaluating the performance of a classifier
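A Bayes classifier assigns each event to the class with the largest posterior probability P(class | x) = p(x | class) P(class) / p(x). The sketch below assumes one-dimensional Gaussian class-conditional densities with known priors, means and widths, for two hypothetical classes "signal" and "background"; in a real analysis these densities would have to be estimated from data. Accuracy is shown as just one simple performance measure.

```python
import math


def gaussian_pdf(x, mean, sigma):
    """Normal density, used here as the class-conditional likelihood p(x | class)."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def posteriors(x, classes):
    """Bayes' theorem: P(class | x) = p(x | class) P(class) / p(x).

    `classes` maps a class name to (prior, mean, sigma); the Gaussian
    likelihoods are an illustrative assumption.
    """
    joint = {name: prior * gaussian_pdf(x, mean, sigma)
             for name, (prior, mean, sigma) in classes.items()}
    evidence = sum(joint.values())  # p(x), the normalisation term
    return {name: value / evidence for name, value in joint.items()}


def bayes_classify(x, classes):
    """Bayes classifier: choose the class with the largest posterior probability."""
    post = posteriors(x, classes)
    return max(post, key=post.get)


def accuracy(predicted, true):
    """Fraction of correctly classified examples."""
    return sum(p == t for p, t in zip(predicted, true)) / len(true)
```

With equal priors the decision boundary lies halfway between the two means; making "signal" rare (prior 0.1) moves the boundary so that an event at x = 0.5 is assigned to "background", showing how the prior enters the optimal decision.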

 

Basic Machine Learning Algorithms

   

Monday 6 March

 
11:30 - 12:25 Lecture 2 Basic Machine Learning Algorithms

 

Jaroslaw Prybyszewski

This lecture will present a set of fundamental Machine Learning algorithms and software tools that make them easy to apply to data analysis. The lecture targets both physicists and computer scientists. It will rely on some of the background information presented in the Feature Selection and Statistical Learning Basics lecture of this series.

 

The lecture will address the following issues:

 

What are decision trees and how are they built?

Short description of the idea, the method for choosing the correct test at each node, and the advantages and disadvantages of decision trees.
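One common way to choose the test at a decision-tree node is to try every candidate threshold on a feature and keep the one that minimises the weighted impurity of the two resulting children. The sketch below assumes a single numeric feature and uses the Gini impurity; information gain is an equally common criterion, and the lecture may use either. Function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 for a pure node, larger for mixed nodes."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted impurity) of the best test "value <= threshold".

    Candidate thresholds are midpoints between consecutive distinct sorted values.
    Returns (None, inf) if all values are identical and no split is possible.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_score, best_t = float("inf"), None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_score, best_t = score, t
    return best_t, best_score
```

A full tree is grown by applying this search to every feature at each node, splitting on the best test, and recursing into the children until the nodes are pure or a stopping rule is met.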

 

Random forests as a variation of the decision tree algorithm

Why and when random forests are better than a single tree. What do we lose in comparison to decision trees?

 

Main ideas of "lazy learning"

The nearest neighbour algorithm and its generalization, kNN, as examples of lazy algorithms. They often give very good classification results with minimal effort.
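The k-nearest-neighbour idea can be sketched directly: "training" just stores the labelled examples, and all the work happens when a new point is classified. The sketch assumes Euclidean distance (`math.dist`, Python 3.8+) and a simple majority vote; the function name and toy data are hypothetical.

```python
import math
from collections import Counter


def knn_classify(training_set, query, k=3):
    """Classify `query` by a majority vote among its k nearest training points.

    `training_set` is a list of (feature_vector, label) pairs. There is no
    training phase: the data are only searched at query time, which is what
    makes the method "lazy".
    """
    neighbours = sorted(training_set, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

With k = 1 this reduces to the plain nearest neighbour rule; an odd k avoids ties in two-class problems.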


Grouping (clustering) algorithms - when to use them

Presentation of the COBWEB algorithm.

 

R language - an open-source environment for development and testing

Some of the algorithms described in the lecture will be demonstrated in the R environment.

 

Neural Networks

   

Monday 6 March

 
14:00 - 14:55 Lecture 3 Neural Networks

 

Liliana Teodorescu

This lecture will present the fundamentals of Artificial Neural Networks and examples of their applications in HEP data analysis. The examples will show the development cycle of these algorithms in HEP: a slow and late start, followed by gradually increasing presence and acceptance among physicists.

 

The lecture targets both physicists and computer scientists interested in algorithms for data analysis.

 

A minimal general background in particle physics data analysis techniques is sufficient for understanding the topic. No prior knowledge of Artificial Neural Networks is required.

 

Introduction 

- Biological Neural Networks

- Artificial Neural Networks (NN)

 

Basics of NN

- Artificial neuron

- Perceptron

- Classification of NN
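The artificial neuron and the perceptron listed above can be sketched as a weighted sum followed by a step activation, trained with the classic perceptron learning rule (weights move by learning_rate × error × input after each misclassification). This is an illustrative sketch, not the lecture's code; the function names and the logical-AND training data are hypothetical. A single perceptron can only learn linearly separable problems such as AND, not XOR.

```python
def predict(weights, bias, x):
    """Artificial neuron: weighted sum of the inputs plus bias, then a step function."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, learning_rate=1.0, epochs=50):
    """Perceptron learning rule on (input_vector, target) pairs with targets 0 or 1.

    Stops early once an entire pass over the data produces no errors.
    """
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in samples:
            error = target - predict(weights, bias, x)
            if error != 0:
                weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
                bias += learning_rate * error
                errors += 1
        if errors == 0:
            break
    return weights, bias
```

Because the problem is linearly separable, the perceptron convergence theorem guarantees that training on the AND samples stops after a finite number of passes with every sample classified correctly.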

 

Operation of NN

- Learning types and rules

- Learning and testing

 

Examples of NN

- Feed-forward NN

- Recurrent NN

- Functional NN

 

Performance Issues

- Performance factors and measures

- Analysis of performance

 

Examples of NN in HEP

- NN triggers

- NN for offline data analysis applications

 

Pros and cons of NN in HEP

 

Support Vector Machines

   

Monday 6 March

 
15:05 - 16:00 Lecture 4 Support Vector Machines

 

Anselm Vossen

Support Vector Machines (SVMs) are advanced algorithms for classification and regression that are nevertheless conceptually easy to understand. SVMs have recently gained popularity because they combine state-of-the-art performance with a sound mathematical foundation, which lets users choose arbitrarily complex classification or regression functions while keeping over-fitting under control, a technique known as structural risk minimization.

 

The lecture targets computer scientists interested in state-of-the-art pattern recognition algorithms. It will also be of interest to physicists who want to make these algorithms work for them.

 

For some of the intricacies, a basic knowledge of linear algebra and statistics will be helpful. Additionally, this lecture will use the vocabulary introduced in the preceding ones, especially "Feature Selection and Statistical Learning Basics".

 

The Linear Classifier

- Toy Example: Separating points on a plane

- Optimal Margin and Support Vectors

 

Structural Risk Minimization

- Short (and incomplete) Introduction to Vapnik-Chervonenkis (VC) Theory

- Finding a balance between fitting and overfitting the data

- How to incorporate this into the linear classifier

 

Kernel Methods

- The "Kernel Trick": mapping the data into a convenient higher-dimensional space with little computing overhead

- Using the "Kernel Trick" to extend linear algorithms to nonlinear ones
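The two kernel-trick points above can be checked numerically. First, the degree-2 kernel (x · z)^2 equals an ordinary dot product in an explicit three-dimensional feature space, without ever constructing that space. Second, any linear algorithm written purely in terms of dot products becomes nonlinear when the dot product is replaced by a kernel. To keep the sketch short it uses the kernel perceptron rather than an SVM (training a real SVM requires solving a quadratic optimisation problem, which SVM libraries do for you); the XOR data and all function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def explicit_map(x):
    """Explicit degree-2 feature map for 2-D inputs: (x1^2, sqrt(2) x1 x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def homogeneous_poly_kernel(x, z):
    """(x . z)^2, equal to dot(explicit_map(x), explicit_map(z)) without building the map."""
    return dot(x, z) ** 2


def poly_kernel(x, z):
    """Inhomogeneous degree-2 polynomial kernel (x . z + 1)^2."""
    return (dot(x, z) + 1) ** 2


def train_kernel_perceptron(samples, kernel, epochs=20):
    """Dual (kernel) perceptron on (x, y) pairs with y in {-1, +1}.

    alpha[i] counts the mistakes made on sample i; the decision function
    uses the data only through kernel evaluations.
    """
    alpha = [0] * len(samples)

    def decision(x):
        return sum(a * y * kernel(xi, x) for a, (xi, y) in zip(alpha, samples) if a)

    for _ in range(epochs):
        mistakes = 0
        for i, (x, y) in enumerate(samples):
            predicted = 1 if decision(x) > 0 else -1
            if predicted != y:
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:
            break
    return lambda x: 1 if decision(x) > 0 else -1
```

XOR is not linearly separable in the original plane, yet with the polynomial kernel the perceptron separates it, because in the implicit feature space the product x1 x2 becomes a linear coordinate.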

 

Support Vector Machines

- Putting everything together to build powerful classification and regression algorithms

- SVM Libraries: how to use SVMs in your code

 

Evolutionary Computation

   

Monday 6 March

 
16:30 - 17:25 Lecture 5 Evolutionary Computation

 

Liliana Teodorescu

This lecture will present the fundamentals of Evolutionary Computation and of the main types of evolutionary algorithms. A survey of the applications of these algorithms in HEP data analysis will also be presented, illustrating the early development phase of an emerging technique in the field.

 

A new evolutionary algorithm, Gene Expression Programming, and its first application to HEP data analysis will also be presented.

 

The lecture targets both physicists and computer scientists interested in algorithms for data analysis.

 

A minimal general background in particle physics data analysis techniques is sufficient for understanding the topic. No prior knowledge of Evolutionary Computation is required.

 

Introduction

- Natural evolution

- Simulation of natural evolution on a computer

- Specific terminology

 

Structure of an evolutionary algorithm

- Problem representation (encoding solutions)

- Fitness functions

- Genetic operators

- Termination conditions
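The four building blocks listed above can be seen together in a minimal genetic algorithm. The sketch assumes the toy "OneMax" problem (maximise the number of 1 bits), not an HEP application: solutions are encoded as bit strings, the fitness is the bit count, the genetic operators are tournament selection, one-point crossover and bit-flip mutation, and the run terminates when the optimum is found or a generation limit is reached. These are common textbook choices, and all names and parameter values are hypothetical.

```python
import random


def fitness(individual):
    """OneMax fitness: the number of 1 bits in the bit string."""
    return sum(individual)


def tournament(population, rng, size=3):
    """Selection: the fittest of `size` randomly chosen individuals."""
    return max(rng.sample(population, size), key=fitness)


def crossover(a, b, rng):
    """One-point crossover: head of parent a joined to tail of parent b."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(individual, rng, rate):
    """Bit-flip mutation applied independently to each gene."""
    return [1 - bit if rng.random() < rate else bit for bit in individual]


def run_ga(length=20, pop_size=30, max_generations=100, mutation_rate=0.05, seed=1):
    rng = random.Random(seed)  # seeded for reproducible runs
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = []  # best fitness in each generation
    for _ in range(max_generations):
        best = max(population, key=fitness)
        history.append(fitness(best))
        if fitness(best) == length:  # termination: optimum found
            break
        next_population = [best[:]]  # elitism: keep the current best unchanged
        while len(next_population) < pop_size:
            child = crossover(tournament(population, rng), tournament(population, rng), rng)
            next_population.append(mutate(child, rng, mutation_rate))
        population = next_population
    return max(population, key=fitness), history
```

Because the best individual is always copied into the next generation, the best fitness recorded in `history` can never decrease from one generation to the next.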

 

Types of evolutionary algorithms: Genetic Algorithms, Genetic Programming

- Problem representation for each type of algorithm

- Genetic variation in each type of algorithm

- Comparison of the different types of algorithms

- Applications in HEP data analysis

 

New development in Evolutionary Computation: Gene Expression Programming

- Problem representation

- Genetic variation

- First application of Gene Expression Programming to HEP data analysis

 

Recommendations on when to use Evolutionary Algorithms