DAT2 / SW6
Aalborg University, Department of Computer Science and Department of Control Engineering

Schedule

See Calendar

All lectures run 12h30-16h15 in room A4-108.

Lecture  Date     Title
  1      7/2/06   Introduction to Parallel Computing; Parallel Programming Platforms
  2      14/2/06  Physical Organization of Parallel Platforms; The PRAM Model
  3      17/2/06  Communication Costs in Parallel Machines; Process-Processor Mapping
  4      21/2/06  Process-Processor Mapping (cont.); Preliminaries to Parallel Algorithm Design;
                  Decomposition Techniques and Tasks
  5      24/2/06  Mapping Techniques for Load Balancing; Methods for Containing Interaction Overheads
  6      28/2/06  Parallel Algorithm Models; Basic Communication Operations
  7      7/3/06   Basic Communication Operations (cont.)
  8      10/3/06  Analytical Modeling of Parallel Programs
  9      14/3/06  Programming Using the Message Passing Paradigm
 10      21/3/06  MPI (cont.); Programming Shared Address Space Platforms
 11      28/3/06  POSIX Threads (cont.)
 12      4/4/06   Dense Matrix Algorithms
 13      21/4/06  Sorting
 14      28/4/06  Graph Algorithms
 15      2/5/06   Search Algorithms for Discrete Optimization Problems

Lecture 1: Introduction to Parallel Computing, Parallel Programming Platforms.

Abstract: In the first lecture, I present the course, its goals, the textbook, etc. Then I start the course proper with an introduction to parallelism, motivating it and presenting its scope. In the second lecture, I dig into the chapter on parallel programming platforms, which serves mostly as an introduction to parallelism from the hardware point of view.

Reading: Chapter 1, Sections 2.1, 2.2, and 2.3.

Material: Introduction to Parallel Computing slides / notes and Parallel Programming Platforms slides / notes.

Exercises:

Lecture 2: Physical Organization of Parallel Platforms, The PRAM Model.

Abstract: In these lectures I introduce the architecture of an ideal parallel computer, known as the PRAM model. I use extra material from Sven Skyum to present Brent's scheduling principle and go through examples on sorting and merging. I finish with the interconnection networks of parallel computers.
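For reference, Brent's scheduling principle can be stated as follows (a standard formulation; Skyum's notation may differ): a PRAM computation that performs W operations in total and has depth T (the length of its critical path) can be simulated by p processors in time T_p ≤ W/p + T. For example, summing n numbers along a balanced binary tree performs W = n - 1 additions at depth ⌈log n⌉, so p processors can run it in O(n/p + log n) time.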

Reading: Skyum's lecture notes (see material) and Section 2.4.

Material: Introduction to Parallel Algorithms, Sven Skyum (BRICS Lecture Series).
Physical Organization of Parallel Platforms slides / notes.

Exercises: 2.2, 2.3, 2.1, 2.10.

Solutions: See bandwidth.c and cache.c as suggestions for 2.1. Answers to 2.2, 2.3, and 2.10.
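If you want an idea of what such a measurement can look like before opening bandwidth.c, here is a minimal sketch in the same spirit (the array size, repetition count, and use of clock() are my choices, not necessarily those of the linked file):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 23)   /* 8M doubles (64 MB): far larger than any cache */
#define REPS 10

int main(void)
{
    double *a = malloc(N * sizeof *a);
    if (a == NULL) return 1;
    for (size_t i = 0; i < N; i++) a[i] = 1.0;   /* touch every page first */

    volatile double sum = 0.0;
    clock_t start = clock();
    for (int r = 0; r < REPS; r++)
        for (size_t i = 0; i < N; i++)
            sum += a[i];                          /* one read per element */
    double secs = (double)(clock() - start) / CLOCKS_PER_SEC;

    double bytes = (double)N * sizeof(double) * REPS;
    printf("read bandwidth: %.1f MB/s (checksum %f)\n", bytes / secs / 1e6, sum);
    free(a);
    return 0;
}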

Lecture 3: Communication Costs in Parallel Machines, Process-Processor Mapping.

Abstract: I finish treating the different network topologies and briefly explain cache coherence protocols. I discuss the communication costs of the message-passing and shared-address-space models, and start discussing the impact of process-processor mapping.

Reading: Sections 2.5, 2.6, and 2.7.

Material: Communication costs in parallel machines slides / notes and process-processor mapping slides / notes.

Exercises: 2.16 and 2.17; also look at the surrounding exercises to see other kinds of topologies. You can try other exercises if you wish.

Solutions: Suggested answers.

Lecture 4: Process-Processor Mapping cont. Preliminaries to Parallel Algorithm Design, Decomposition Techniques and Tasks.

Abstract: I finish discussing the different techniques for mapping processes to processors. I then introduce parallel algorithm design, to start making you think in parallel, and present techniques for decomposing apparently sequential algorithms into parallel ones. I discuss characteristics of tasks and their interactions.

Reading: Sections 2.7, 3.1, and 3.2.

Material: Process-processor mapping slides / notes and Introduction to Parallel Algorithm Design slides / notes.

Exercises: 2.24, 3.2, 3.4, 3.6, 3.7, and 3.8.

Solutions: Suggested answers.

Lecture 5: Mapping Techniques for Load Balancing, Methods for Containing Interaction Overheads.

Abstract: In these lectures I present different techniques for load balancing, organized into the two families of load-balancing algorithms (static and dynamic). Then I will discuss how to contain the interaction overheads between parallel tasks.

Reading: Sections 3.3, 3.4, and 3.5.
The comments in algorithm 3.3 may be difficult to read. A[k+1:n,k] means all the elements A[i,k] with k+1 ≤ i ≤ n. Similarly, A[k,k:n] means all the elements A[k,j] with k ≤ j ≤ n.
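If the slice notation is still hard to parse, here is how such comments unroll into plain C loops. The sketch below is my own illustration of one Gaussian-elimination-style update step, using 0-based indexing where the book uses 1-based:

#include <stdio.h>

/* One elimination step, with the book's slice comments expanded:
 * A[k,k+1:n] is row k to the right of the pivot, and A[k+1:n,k] is
 * column k below the pivot. Indices here are 0-based. */
void eliminate_step(int n, double A[n][n], int k)
{
    for (int j = k + 1; j < n; j++)
        A[k][j] /= A[k][k];               /* scale A[k,k+1:n] by the pivot */
    A[k][k] = 1.0;
    for (int i = k + 1; i < n; i++) {     /* each row i indexed by A[k+1:n,k] */
        for (int j = k + 1; j < n; j++)
            A[i][j] -= A[i][k] * A[k][j]; /* update the trailing submatrix */
        A[i][k] = 0.0;                    /* zero the entry below the pivot */
    }
}

int main(void)
{
    double A[2][2] = { { 2.0, 4.0 }, { 1.0, 5.0 } };
    eliminate_step(2, A, 0);              /* expect rows (1 2) and (0 3) */
    printf("%.1f %.1f\n%.1f %.1f\n", A[0][0], A[0][1], A[1][0], A[1][1]);
    return 0;
}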

Material: slides / notes.

Exercises: 3.11, 3.12, 3.13, 3.15, 3.17, and 3.21.

Solutions: Suggested answers.

Lecture 6: Parallel Algorithm Models, Basic Communication Operations.

Abstract: In these lectures I present the different models of parallel algorithms, and I start presenting the basic communication operations (to be finished in the next lecture).

Reading: Sections 3.6, 4.1, 4.2, and 4.3.

Material: slides / notes.

Exercises: 4.1, 4.3, and 4.5. You can try other exercises, and also think about the differences between the prefix-sum of Section 4.3 and the one I presented in lecture 2.

Solutions: Suggested answers.

Lecture 7: Basic Communication Operations (cont.).

Abstract: I finish treating Chapter 4.

Reading: Sections 4.4, 4.5, 4.6, and 4.7.

Material: slides / notes.

Exercises: 4.9, 4.11, 4.14, 4.15, complete slide 32, 4.22, 4.25, and 4.26.

Solutions: Slide 32 has been updated and is now complete. Suggested answers to the exercises.

Lecture 8: Analytical Modeling of Parallel Programs.

Abstract: I cover Chapter 5, with some extra material on Nick's complexity class.

Reading: Sections 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, and 5.7.

Material: slides / notes. Wikipedia and slides on Nick's complexity class.

Exercises: 5.1, 5.2, 5.3, and 5.9. If you have time you can try 5.4 and 5.5.

Solutions: Suggested answers.

Lecture 9: Programming Using the Message Passing Paradigm.

Abstract: These lectures cover MPI (the Message Passing Interface), a widely used standard for message passing in parallel computing.

Reading: Sections 6.1, 6.2, 6.3, 6.4, and 6.5.

Material: slides / notes. Check the LAM/MPI homepage. The hello.c program example.
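If you cannot reach the linked hello.c, the shape of a first MPI program is roughly the following (a minimal sketch, not necessarily identical to the linked file):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);                /* must precede all other MPI calls */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */
    printf("Hello from process %d of %d\n", rank, size);
    MPI_Finalize();                        /* must be the last MPI call */
    return 0;
}

With LAM/MPI you would typically start the run-time with lamboot, compile with mpicc hello.c -o hello, and launch with mpirun -np 4 hello.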

Exercises: 6.2, 6.3, and programming exercises. You will need the mpi-sample.c C file. Note: to run MPI across different machines, make sure they are all running the same version of LAM and that you compiled your program with the matching version of mpicc.

Solutions: Suggested answers for exercises 6.2 and 6.3. As I mentioned during the exercise session and in lecture 10, there is a problem if you try to replace the send and receive calls by a single sendrecv call. If you do, you will obtain something like this. Apart from the fact that in this particular case you should use MPI_Sendrecv_replace instead of MPI_Sendrecv, there is a problem in the first iteration of the loop because you send garbage (shown as -1), which also causes problems later. The point was to find the problems and see that you cannot just replace send and recv; you have to think a little more. I said "use", I didn't say what would happen. :)
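To make the pitfall concrete, here is a sketch of a ring shift written with MPI_Sendrecv_replace (the variable names and structure are mine, not those of mpi-sample.c). With separate send and receive calls you can compute a fresh value before sending it; the combined call ships whatever the buffer holds at that moment, so forgetting to initialize it reproduces exactly the garbage-on-the-first-iteration symptom described above.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, value;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    value = rank;   /* omit this line and every process starts by sending garbage */
    for (int step = 0; step < size; step++) {
        /* send the current value to the right neighbour while
         * receiving a new one from the left neighbour, in place */
        MPI_Sendrecv_replace(&value, 1, MPI_INT,
                             (rank + 1) % size, 0,        /* destination, tag */
                             (rank + size - 1) % size, 0, /* source, tag */
                             MPI_COMM_WORLD, &status);
        printf("step %d: process %d holds %d\n", step, rank, value);
    }
    MPI_Finalize();
    return 0;
}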

Lecture 10: MPI (cont.), Programming Shared Address Space Platforms.

Abstract: I will first finish the chapter on MPI, covering collective communication, and then I will begin presenting threads, in particular POSIX threads (pthreads).

Reading: Sections 6.6, 6.7, 7.1, 7.2, 7.3, and 7.4.

Material: MPI slides / notes. Pthread slides / notes.
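As a warm-up for the pthreads part, a minimal creation-and-join sketch looks like this (my example, not taken from the slides); compile with -lpthread:

#include <stdio.h>
#include <pthread.h>

#define NUM_THREADS 4

void *worker(void *arg)
{
    long id = (long)arg;     /* thread id passed through the void* argument */
    printf("hello from thread %ld\n", id);
    return NULL;
}

int main(void)
{
    pthread_t threads[NUM_THREADS];
    for (long i = 0; i < NUM_THREADS; i++)
        pthread_create(&threads[i], NULL, worker, (void *)i);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);   /* wait for each thread to finish */
    return 0;
}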

Exercises: Since we switched the exercise and lecture sessions, you continue the programming exercises from lecture 9.

Lecture 11: POSIX Threads (cont.).

Abstract: I will finish chapter 7 on threads and talk a bit about OpenMP.

Reading: Sections 7.5, 7.6, 7.7, 7.8, 7.9, and 7.10.

Material: Pthreads slides / notes. Please ask the secretary for a copy of Chapter 8 of the book I mentioned.

Exercises: This time you choose what you would like to implement. You may continue with MPI or try any of the pthread implementation exercises in the book. Since implementing takes time, I prefer that you choose what you want to implement. You can also implement the well-known dining philosophers problem with pthreads; one possible sketch follows.
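For those who pick the dining philosophers, here is one minimal pthreads sketch (my own version, not a model solution). It avoids deadlock by always taking the lower-numbered fork first, which breaks the circular wait; compile with -lpthread.

#include <stdio.h>
#include <pthread.h>

#define N 5
#define MEALS 3

pthread_mutex_t fork_mutex[N];

void *philosopher(void *arg)
{
    long i = (long)arg;
    int left = i, right = (i + 1) % N;
    int first = left < right ? left : right;    /* global lock order */
    int second = left < right ? right : left;

    for (int meal = 0; meal < MEALS; meal++) {
        pthread_mutex_lock(&fork_mutex[first]);
        pthread_mutex_lock(&fork_mutex[second]);
        printf("philosopher %ld eats (meal %d)\n", i, meal + 1);
        pthread_mutex_unlock(&fork_mutex[second]);
        pthread_mutex_unlock(&fork_mutex[first]);
    }
    return NULL;
}

int main(void)
{
    pthread_t t[N];
    for (int i = 0; i < N; i++)
        pthread_mutex_init(&fork_mutex[i], NULL);
    for (long i = 0; i < N; i++)
        pthread_create(&t[i], NULL, philosopher, (void *)i);
    for (int i = 0; i < N; i++)
        pthread_join(t[i], NULL);
    return 0;
}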

Lecture 12: Dense Matrix Algorithms.

Abstract: In these lectures, I will present two simple parallel matrix algorithms (matrix-vector and matrix-matrix multiplication). I will talk about equation solving as well, but I will not cover everything in full detail.
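As a taste of the first of these algorithms, here is a row-partitioned matrix-vector multiplication sketched with pthreads (my illustration; the lecture follows the book's presentation). Each thread owns a contiguous block of rows of y = Ax, so the threads never write the same location and need no synchronization beyond the final join. Compile with -lpthread.

#include <stdio.h>
#include <pthread.h>

#define N 8
#define NUM_THREADS 4

double A[N][N], x[N], y[N];

void *rows(void *arg)
{
    long t = (long)arg;
    int lo = t * N / NUM_THREADS;        /* this thread's block of rows */
    int hi = (t + 1) * N / NUM_THREADS;
    for (int i = lo; i < hi; i++) {
        y[i] = 0.0;
        for (int j = 0; j < N; j++)
            y[i] += A[i][j] * x[j];      /* rows are disjoint: no data races */
    }
    return NULL;
}

int main(void)
{
    for (int i = 0; i < N; i++) {
        x[i] = 1.0;
        for (int j = 0; j < N; j++)
            A[i][j] = i == j ? 2.0 : 0.0;   /* A = 2I, so y should equal 2x */
    }
    pthread_t t[NUM_THREADS];
    for (long k = 0; k < NUM_THREADS; k++)
        pthread_create(&t[k], NULL, rows, (void *)k);
    for (int k = 0; k < NUM_THREADS; k++)
        pthread_join(t[k], NULL);
    for (int i = 0; i < N; i++)
        printf("y[%d] = %.1f\n", i, y[i]);
    return 0;
}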

Reading: Sections 8.1 and 8.2.

Material: Complement on monitors slides / notes. Dense matrix algorithms slides / notes.

Exercises: Continue on programming with pthreads (or MPI if you did so).

Lecture 13: Sorting.

Abstract: I will discuss different issues in sorting on parallel computers, and present some well-known (sequential) sorting algorithms and their parallel versions.
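One of the simplest sorts with a natural parallel version is odd-even transposition sort; here is a sequential sketch (my example). Each of the n phases performs independent compare-exchanges on disjoint pairs, which is exactly what the parallel formulation executes simultaneously.

#include <stdio.h>

void odd_even_sort(int a[], int n)
{
    for (int phase = 0; phase < n; phase++) {
        int start = phase % 2;                    /* even phase: pairs (0,1),(2,3),... */
        for (int i = start; i + 1 < n; i += 2) {  /* odd phase: pairs (1,2),(3,4),... */
            if (a[i] > a[i + 1]) {
                int tmp = a[i];                   /* compare-exchange */
                a[i] = a[i + 1];
                a[i + 1] = tmp;
            }
        }
    }
}

int main(void)
{
    int a[] = { 5, 3, 8, 1, 9, 2 };
    int n = sizeof a / sizeof a[0];
    odd_even_sort(a, n);
    for (int i = 0; i < n; i++)
        printf("%d ", a[i]);
    printf("\n");
    return 0;
}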

Reading: Chapter 9. Now is also a good time to re-read Introduction to Parallel Algorithms by Sven Skyum (BRICS Lecture Series), presented in lecture 2. Please check the slides in order not to be confused by some inconsistencies in the book: slide 27 has elements 3 and 5 swapped, but only after the first iteration (the initial array corresponds to the one in the book).

Material: Sorting slides / notes.

Exercises: Since there are only 3 lectures left, I think you should keep practicing to learn more about parallel programs. This time you should finish your previous programs, using MPI or pthreads as you began. If you have already finished, try the other paradigm (pthreads instead of MPI, or vice versa, to try both). Depending on what you have achieved, you should also try to implement prefix-sum with pthreads (see the sketch below), which will be useful if you want to implement parallel quicksort next time.
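A possible plan of attack for the pthreads prefix-sum (my block-wise scheme, not the book's formulation): each thread scans its own block, the main thread combines the block totals sequentially, and a second pass of threads adds each block's offset. The sketch assumes the thread count divides the array size; compile with -lpthread.

#include <stdio.h>
#include <pthread.h>

#define N 16
#define T 4                      /* number of threads; assume T divides N */

int a[N];                        /* input, overwritten by its prefix sums */
int block_sum[T];                /* total of each block after pass 1 */
int block_off[T];                /* amount to add to each block in pass 2 */

void *scan_block(void *arg)      /* pass 1: local prefix sums */
{
    long t = (long)arg;
    int lo = t * (N / T), hi = lo + N / T;
    for (int i = lo + 1; i < hi; i++)
        a[i] += a[i - 1];
    block_sum[t] = a[hi - 1];
    return NULL;
}

void *add_offset(void *arg)      /* pass 2: shift block by the sum of earlier blocks */
{
    long t = (long)arg;
    int lo = t * (N / T), hi = lo + N / T;
    for (int i = lo; i < hi; i++)
        a[i] += block_off[t];
    return NULL;
}

int main(void)
{
    pthread_t th[T];
    for (int i = 0; i < N; i++) a[i] = 1;      /* prefix sums should be 1,2,...,N */

    for (long t = 0; t < T; t++) pthread_create(&th[t], NULL, scan_block, (void *)t);
    for (int t = 0; t < T; t++) pthread_join(th[t], NULL);

    block_off[0] = 0;                          /* sequential scan over the T block totals */
    for (int t = 1; t < T; t++)
        block_off[t] = block_off[t - 1] + block_sum[t - 1];

    for (long t = 0; t < T; t++) pthread_create(&th[t], NULL, add_offset, (void *)t);
    for (int t = 0; t < T; t++) pthread_join(th[t], NULL);

    for (int i = 0; i < N; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}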

Lecture 14: Graph Algorithms.

Abstract: I will present some well-known graph algorithms and their parallelizations (Prim's, Dijkstra's, and Floyd's).
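As a reminder of the third of these, here is a sequential sketch of Floyd's all-pairs shortest-path algorithm (my example). The outer k-loop is inherently sequential, but for a fixed k all (i, j) updates are independent, which is what the parallel formulations in the chapter exploit by partitioning the matrix among processes.

#include <stdio.h>

#define V 4
#define INF 1000000             /* "no edge"; large enough to never win a min */

void floyd(int d[V][V])
{
    for (int k = 0; k < V; k++)
        for (int i = 0; i < V; i++)       /* these two loops parallelize */
            for (int j = 0; j < V; j++)
                if (d[i][k] + d[k][j] < d[i][j])
                    d[i][j] = d[i][k] + d[k][j];
}

int main(void)
{
    int d[V][V] = {
        { 0,   3,   INF, 7   },
        { 8,   0,   2,   INF },
        { 5,   INF, 0,   1   },
        { 2,   INF, INF, 0   },
    };
    floyd(d);
    for (int i = 0; i < V; i++) {
        for (int j = 0; j < V; j++)
            printf("%7d", d[i][j]);
        printf("\n");
    }
    return 0;
}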

Reading: Sections 10.1, 10.2, 10.3, 10.4, 10.5, and 10.6.

Material: Graph algorithms slides / notes.

Exercises: Continue your programming exercises and try to implement one of the algorithms presented in the previous lecture. Alternatively, here are some pthread examples that were available on the textbook's site. You can try them, fix them, and explain them to me.

Lecture 15: Search Algorithms for Discrete Optimization Problems.

Abstract: In these lectures I will recall sequential search algorithms and present parallel versions of them (depth-first and best-first search). I will not cover the whole chapter in detail.

Reading: Chapter 11.

Material: Search algorithms slides / notes.

Exercises: This is the last exercise session, so use it to ask questions about the course and to finish your exercises. You can experiment with the programs here (and fix some of them). The programs should compile, except readWriteLocks.c and sparseMatVec.c, which use non-standard timing functions.