Data Technologies Theme
Coordinator: Alberto Pace, CERN |
This series of lectures addresses the broad domain of data storage and management technologies. It starts by setting the scene and surveying the various data storage media. Then, the series describes possible data storage architectures and the associated software solutions. Focusing on large data centres, it addresses the issues of heating and power consumption. This is followed by a description of storage models, data management issues, and their supporting techniques and tools. Finally, the series focuses on the reliability and performance of modern data storage systems. In the course of the series, elements of computer security and authentication that are relevant to data management are also presented. The series of lectures is complemented by 5 hours of practical exercises on aspects such as performance tuning and peer-to-peer storage. |
Data Technologies |
||
Session |
Description |
Lecturer |
Lecture 1 |
Setting the scene: Storage technologies
The lecture presents the various storage models and the supporting management techniques, including name servers and interfaces for data management.
Storage reliability and performance
The lecture discusses the various solutions to ensure long-term data preservation and reliability, together with their consequences on performance, including when using peer-to-peer storage and data transfers. |
|
Lecture 2 Lecture 3 |
Cryptography, authentication, authorization and accounting
These lectures give elements of computer security that are relevant to data management. They address the various technologies used in data storage systems to ensure data encryption, integrity, confidentiality and access control. |
|
Lecture 4 |
Additional components for data replication, caching, monitoring, alarms and quota
This lecture describes the various technologies used to implement data workflows and complex data transfer processes. It also discusses problems with data caching and garbage collection, and concludes with monitoring and quota enforcement. |
|
Lecture 5 |
Security in different phases of software development
The second lecture addresses the following question: how to create secure software? It introduces the main security principles (such as least privilege and defense in depth) and discusses security in the different phases of the software development cycle. The emphasis is put on the implementation phase: the most common pitfalls and security bugs are listed, followed by advice on best practices for secure development. |
|
Exercise 1 Exercise 2 Exercise 3 Exercise 4 Exercise 5 |
The first part of the hands-on exercises aims to improve understanding of basic parameters in IO systems:
- network and media latency
- access patterns
- OS caching
- bottlenecks and optimization strategies for local and remote data access.
A few essential Linux tools will be introduced to monitor and measure IO performance while avoiding the bias introduced by OS caching. Students will experience and measure the impact of latency and access patterns on IO performance.
The second part covers the concepts of parallelism and redundancy in storage systems. We will apply the technology of cloud storage systems to store and retrieve files in our local desktop cluster, using a distributed hash table to locate files or file fragments and a REST interface to perform GET, PUT or DELETE operations on them.
The exercises conclude with the implementation and performance tuning of a RAID verification algorithm. |
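As an illustration of what a RAID verification step can look like, the following minimal Python sketch checks a single stripe under the assumption of RAID-5-style XOR parity. The function names (`xor_parity`, `verify_stripe`, `rebuild_block`) and the in-memory block layout are hypothetical and chosen for this example only; they are not taken from the course material, and the exercise itself may use a different scheme.

```python
def xor_parity(blocks):
    """Return the byte-wise XOR of a list of equally sized blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks in a stripe must have the same size")
    parity = bytearray(size)
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)


def verify_stripe(data_blocks, parity_block):
    """True if the stored parity matches the parity recomputed from the data."""
    return xor_parity(data_blocks) == parity_block


def rebuild_block(surviving_blocks, parity_block):
    """Reconstruct one missing data block from the survivors and the parity."""
    return xor_parity(list(surviving_blocks) + [parity_block])


if __name__ == "__main__":
    stripe = [b"abcd", b"efgh", b"ijkl"]   # hypothetical data blocks
    parity = xor_parity(stripe)
    print(verify_stripe(stripe, parity))
    print(rebuild_block([stripe[0], stripe[2]], parity) == stripe[1])
```

The byte-by-byte loop is deliberately simple; processing larger chunks at once or verifying several stripes in parallel are the kinds of changes a performance tuning step would typically explore.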
|
Prerequisite and References |
Desirable Prerequisite
Ability to develop simple programs and a basic understanding of networking technologies. |