CSC

Computer Architecture and Performance Tuning
Session	Description	Lecturer
Lecture 1	Understanding scalable hardware: The first part of this double lecture describes the hardware architecture of a modern PC server with processors based on the Intel Core micro-architecture. Other processor architectures, such as ARM, will also be mentioned. Acceleration opportunities (but also bottlenecks) in the architecture will be covered in detail, not just inside the processor, but also in the memory hierarchy. The aim is to give each student a good understanding of what resources are available from a hardware viewpoint.	Sverre Jarp
Lecture 2	Software that scales with the hardware: In the second part of this double lecture we will discuss several strategies that allow software to scale to the maximum resource potential of a given architecture. These strategies are based on both data and task parallelism. We will stress the importance of Data-Oriented Design and also mention the issue of “performance portability” across platforms. Some important factors related to programming styles will be reviewed. To back everything up with evidence, several scalable examples from physics will be presented.	Sverre Jarp
Lecture 3	Key aspects of multi-threading: The vast majority of modern micro-processors come with two to several dozen computing cores, opening up new possibilities but also creating some significant challenges. This major shift in hardware has been underway for many years, but the software world is still struggling to take full advantage of the new features. This lecture goes into the details of key choices and compromises associated with threaded programming and scalability. New programming paradigms are demonstrated alongside real-world technologies that can be used for implementations.	Andrzej Nowak
Lecture 4	Performance Optimization: Considering the rise of many-core processors, performance tuning has become an even more important step in software development. Modern processor architectures often let us look inside the application from various angles; however, drawing high-level conclusions is not always straightforward. The objective of this lecture is to familiarize the attendees with the topic of performance optimization “where it matters” and with common techniques used to define and improve application efficiency. Language-independent performance tools for Linux will be demonstrated and used to obtain information about program characteristics and bottlenecks.	Andrzej Nowak
Exercise 1, Exercise 2, Exercise 3	The aim of the exercises in this series is to give the attendees a practical introduction to performance-oriented programming on Linux. Advanced tools will be used during the course, enabling the participants to discover how the interaction of the code and the hardware influences performance. The participants will also be given the task of correlating performance figures with certain programming decisions. In addition, the participants will learn the limits of performance optimization and how to establish where within those limits their workload sits. The exercises will be supported by demonstrations of real-world problems in production environments, including multi-threaded examples.	Sverre Jarp, Andrzej Nowak
Prerequisite and References	Desirable prerequisites: basics of modern computer architecture; basic knowledge of compilers; familiarity with Linux and the C/C++ programming languages