CERN School of Computing 2012
13-24 August, Uppsala, Sweden

 Computer Architecture and Performance Tuning




Lecture 1

Understanding scalable hardware
The first part of this double lecture describes the hardware architecture of a modern PC server with processors based on the Intel Core micro-architecture. Other processor architectures, such as ARM, will also be mentioned. Acceleration opportunities (but also bottlenecks) in the architecture will be covered in detail, not just inside the processor, but also related to the memory hierarchy. The aim is to give each student a good understanding of what resources are available from a hardware viewpoint.

Sverre Jarp

Lecture 2

Software that scales with the hardware

In the second part of this double lecture we will discuss several strategies that allow software to scale to the maximum resource potential of a given architecture. These strategies are based on both data and task parallelism. We will stress the importance of Data-Oriented Design and also mention the issue of “performance portability” across platforms. Some important factors related to programming styles will be reviewed. To back everything up with evidence, several scalable examples from physics will be presented.

Sverre Jarp

Lecture 3

Key aspects of multi-threading
The vast majority of modern microprocessors come with two to several dozen computing cores, opening up new possibilities but also creating significant challenges. This major shift in hardware began many years ago, but the software world is still struggling to take full advantage of the new features. This lecture goes into the details of the key choices and compromises associated with threaded programming and scalability. New programming paradigms are demonstrated alongside real-world technologies that can be used for implementations.

Andrzej Nowak

Lecture 4

Performance Optimization
Considering the rise of many-core processors, performance tuning has become an even more important step in software development. Modern processor architectures often let us look inside the application from various angles; however, drawing high-level conclusions is not always straightforward. The objective of this lecture is to familiarize attendees with performance optimization “where it matters” and with common techniques used to define and improve application efficiency. Language-independent performance tools for Linux will be demonstrated as a way to obtain information about program characteristics and bottlenecks.

Andrzej Nowak

Exercise 1

Exercise 2

Exercise 3

The aim of the exercises in this series is to give the attendees a practical introduction to performance-oriented programming on Linux. Advanced tools will be used during the course, enabling the participants to discover how the interaction between the code and the hardware influences performance. The participants will also be given the task of correlating performance figures with specific programming decisions. In addition, the participants will learn the limits of performance optimization and how to establish where within those limits their workload lies. The exercises will be supported by demonstrations of real-world problems in production environments, including multi-threaded examples.

Sverre Jarp
Andrzej Nowak




Desirable Prerequisites

  • Basics of modern computer architecture

  • Basic knowledge about compilers

  • Familiarity with Linux and the C/C++ programming languages


Copyright CERN
