Abstract:
In this lecture, I present the course, its goals, the textbook, and
so on. Then I present the case-study we will be using during the
course. It is important to get familiar with it as early as
possible.
Download the source here.
Reading: Chapter 1 and Section 2.1.
Slides: Introduction to Parallel Computing, Model-Checker Case-Study, and Parallel Programming Platforms.
Exercises: During this first exercise session, you are expected to refresh your skills in C. Please look at these C introduction slides and this web page on Pointers and memory on your own before the lecture. Exercise: write a matrix multiplication function with the prototype
void matrix_mult(const int* a, const int* b, int* c, int n)
Remarks: Use int A[4] = { 1,0,0,1 }; for the declaration + initialization. If you insist on using something like int A[2][2] = { 1,0,0,1 }, then you have to use the cast (int*)A when calling your function with A as the argument. Compile with the -Wall option.
Abstract: I start the chapter on parallel programming platforms, which will serve mostly as an introduction to parallelism from the hardware side. I introduce the architecture of an ideal parallel computer, known as the PRAM model. I use extra material from Sven Skyum to present Brent's scheduling principle and go through examples on sorting and merging.
Reading: Sections 2.2, 2.3, and 2.4.1. Sven Skyum's lecture notes (typo p. 6: read "O(n/lg n) processors" instead of "O(n lg n) processors"; typo p. 9: read "Time O(log(n + m))" instead of "Time O(1)").
Slides: Parallel Programming Platforms and PRAM Model.
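As a pointer for the reading, Brent's scheduling principle in its standard form (check Skyum's notes for his exact formulation) says that a PRAM computation taking t parallel steps and performing m operations in total can be executed on p processors in time

```latex
T_p \;\le\; t + \frac{m - t}{p}
```

For example, summing n numbers takes m = n - 1 operations in t = \lceil \lg n \rceil parallel steps, so p = n/\lg n processors already suffice for O(\lg n) time; this is where bounds like the "O(n/lg n) processors" mentioned in the typo note come from.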
Exercises: 2.1 (design only, implementation is optional), 2.2, and 2.3. This completes assignment 1.
Remark: The exercises are not very precise and leave some degrees of freedom. It is very good that you notice these imprecisions. To resolve them, fill in the gaps with your own assumptions, e.g., assume a[i] is one word. You may come up with different answers depending on your assumptions, which is fine.
Assignment 1: Exercises of lecture 1 and lecture 2. All exercises are compulsory unless stated otherwise. Deadline February 27th.
Abstract: I treat the part on physical organization of parallel platforms and show you different ways to parallelize our little model-checker. From now on we'll alternate between theory and practice.
Reading: Section 2.4.
Slides: Physical Organization of Parallel Platforms and Parallelizing the Model-Checker.
Exercises:
Remarks: The exercises related to the case-study will be used for further assignments. Please do them carefully.
Abstract: I give you some help with the API of the pthread library and I treat Pthreads.
Reading: Chapter 7.
Slides: Help on the API and Programming Shared Address Space Platforms.
Exercises:
Assignment 2: Exercises of lecture 3 and lecture 4. Deadline March 10th.
Remarks: The exercises related to the case-study will be used for further assignments. Please do them carefully. In particular, you will have to use the library.
Abstract: I discuss different communication costs (message passing and shared address space), the impact of process-processor mapping, and start on parallel algorithm design (techniques for decomposing apparently sequential algorithms into parallel ones).
Reading: Sections 2.5, 2.6, 2.7, and 3.1.
Slides: Communication Costs, Process-Processor Mapping, and Principles of Parallel Algorithm Design.
Assignment 3: Incrementally implement 4 of the 5 solutions presented in lecture 3.
Abstract: I continue on parallel algorithm design, treating characteristics of tasks and their interactions and mapping techniques. The point is to use techniques to minimize interaction.
Reading: Chapter 3.
Slides: Principles of Parallel Algorithm Design and Principles of Parallel Algorithm Design (cont.).
Exercises: Continue on assignment 3.
Abstract: I start on MPI.
Reading: Sections 6.1, 6.2, 6.3, 6.4, and 6.5.
Slides: MPI.
Exercises: Continue on assignment 3. Extra time granted: Deadline March 25th. The 24th is a holiday so you get until the 25th.
Remarks: The API of the library (utils.pthread.h) offers everything you could possibly need for the assignments. That does not mean that you must use everything. In particular, you can do assignment 3 without using the local_queue_t type and its shared version. This type is provided for those of you who want a challenge and wish to implement a multi-threaded solution with non-blocking calls. Sorry for the confusion if you felt that you had to use this type.
Abstract: I finish MPI and I discuss problems and solutions related to parallelizing our model-checker using MPI. This is to prepare assignment 5. I'll also give you some help on the API of the MPI library.
Reading: Chapter 6 and Section 11.4.4.
Slides: MPI, MPI in our Model-Checker, and MPI API.
Exercises: 3.2, 3.7, and 3.21. [Optional: 2.24, 3.11, 3.12, 3.13]
Solutions: solutions for the optional exercises.
Abstract: I start chapter 4. It's about how to implement collective communication operations that we saw during the lectures on MPI. There are interesting algorithms for that.
Reading: Sections 4.1, 4.2, 4.3, and 4.4.
Slides: Basic Communication Operations.
Exercise: 4.5.
Assignment 4: Exercises of lectures 8 and 9. Deadline April 9th.
Abstract: I finish chapter 4 and I summarize and complete on thread synchronization.
Reading: Sections 4.5, 4.6, and 4.7. Tutorial page of LAM/MPI.
Slides: Basic Communication Operations and Thread Synchronization.
Assignment 5: This assignment is split over 3 exercise sessions (which do not necessarily map to the following 3 exercise sections). There is very little coding to do but, on the other hand, you need to read and analyse quite some code. Questions: see modelcheck.mpi.c.
Note: Progress is not implemented at this time; you should not add it. You can test MPI with the following simple hellompi program.
Abstract: I treat chapter 5. This chapter looks scary but it's not! It's actually very interesting.
Reading: Chapter 5.
Slides: Analytical Modeling.
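To preview the core definitions of the chapter (standard notation: T_S is the serial runtime, T_P the parallel runtime on p processors):

```latex
S = \frac{T_S}{T_P}
\qquad
E = \frac{S}{p}
\qquad
T_o = p\,T_P - T_S
```

Here S is the speedup, E the efficiency, and T_o the total overhead; most of the chapter's analysis (cost-optimality, isoefficiency) is built from these three quantities.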
Exercises: Work on assignment 5.
Abstract: I will discuss different issues in sorting on parallel computers, present some well-known (sequential) sorting algorithms and their parallel versions.
Reading: Chapter 9.
Slides: Sorting.
Exercises: Work on assignment 5.
Abstract: I go through dense matrix algorithms for matrix-vector and matrix-matrix multiplication. I give the idea behind parallel versions of Gaussian elimination.
Reading: Chapter 8.
Slides: Dense Matrix Algorithms.
Exercises: This exercise is not part of assignment 5 but you should try it anyway. Compile the program with:
gcc -o psort psort.c -O2 -Wall -lpthread
This is a Linux-only program, and it makes sense to run it only on a multi-core/multi-CPU machine. Work on p_merge or radix sort (or both if you want).
Abstract: I treat some parallel algorithms for dense graphs.
Reading: Sections 10.1 to 10.6.
Slides: Graph Algorithms.
Abstract: I will talk about the last 2 exercises during the first 30 min and then Gerd will take over. Gerd will talk about the GRID and his work on a distributed file server that is needed for the famous LHC.
Slides: To appear.
Exercises: Finish up any remaining exercises.