Schedule
MVP6: 4.110 at Kroghstræde 3.
MVP11: 4.112 at Kroghstræde 3.
The schedule is preliminary and subject to change.
Since the exercises count towards your assignments, no solutions will be
provided until you have all finished them.
Lecture 1:
Introduction to Parallel Computing,
Parallel Programming Platforms.
Abstract:
In this lecture, I present the course, its goals, the textbook,
etc. Then I start the course with an introduction to
parallelism, motivating it and presenting its scope. I also
begin the chapter on parallel programming platforms, which serves
mostly as an introduction to parallelism from the hardware
perspective.
Reading:
Pointers and memory,
Chapter 1, Sections 2.1, 2.2, and 2.3.
Material:
C introduction,
introduction slides.
Exercises:
During this first exercise session, you are expected to refresh
your skills in C by looking at the C introduction slides on your
own and by doing the following exercises:
- Write a function that, given two matrices A and B of
dimension nxn, computes the result of the matrix
multiplication AxB and writes the result into a matrix C
(a minimal sketch is given after this list).
To test your function, implement Fibonacci numbers using the
simple (linear) algorithm and the matrix-based (logarithmic)
algorithm; see the
slides
for the algorithms and some implementations.
- Given an array of length n, implement different
functions to invert it. Benchmark your implementations and explain
the results.
- [Optional: by benchmarking read/write access times on arrays (this is a
hint), find out the size of your machine's cache.]
This is the first half of assignment 1.
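For reference, here is a minimal sketch of the naive multiplication itself,
assuming square matrices stored row-major in flat double arrays; the Fibonacci
test, the benchmarking, and the cache exercise are left entirely to you.

    #include <stdlib.h>

    /* Naive n x n matrix multiplication: C = A * B.
     * Matrices are stored row-major in flat arrays of length n*n. */
    void matmul(const double *A, const double *B, double *C, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            for (size_t j = 0; j < n; j++) {
                double sum = 0.0;
                for (size_t k = 0; k < n; k++)
                    sum += A[i*n + k] * B[k*n + j];
                C[i*n + j] = sum;
            }
        }
    }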
Lecture 2:
Introduction to the Case-Study,
The PRAM Model,
Physical Organization of Parallel Platforms.
Abstract:
I present the case-study that will be used later for a series of
assignments. You can download it here.
Then, I introduce the architecture of an ideal parallel computer,
known as the PRAM model. I use extra material from Sven Skyum to
present Brent's scheduling principle and go through examples of
sorting and merging. Time permitting, I finish with the
interconnection networks of parallel computers.
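As a reminder (the notes give the exact statement; the notation here may
differ slightly), a standard formulation of Brent's scheduling principle is:
if a computation takes time t using a total of q operations on a PRAM with
unboundedly many processors, then it can be executed on p processors in time

    T_p \;\le\; t + \frac{q - t}{p} \;\le\; \frac{q}{p} + t .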
Reading:
Skyum's lecture notes and Section 2.4.
Material:
Model-checker case-study,
Introduction to Parallel Algorithms by
Sven Skyum (BRICS Lecture Series) -- note the typo on p. 6: read "O(n/lg n)
processors" instead of "O(n lg n) processors" --,
PRAM/Organization slides.
Exercises:
Exercises 2.1 (design only, implementation is optional), 2.2, and
2.3. This completes assignment 1.
Assignment 1: Exercises of lecture 1 and
lecture 2. All exercises are compulsory unless stated otherwise.
Deadline March 7th.
Lecture 3:
Physical Organization of Parallel Platforms cont.,
Programming Shared Address Space Platforms
Abstract:
I finish the part on physical organization of parallel platforms
and start (in parallel) pthread programming. From now on we'll
have two threads in the course running in parallel, one on theory,
one on practice.
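As a first taste of the practical thread, here is a minimal pthread sketch
(create, join, and a mutex-protected counter). It only illustrates the API
and is not part of any assignment.

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4

    static long counter = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);     /* critical section */
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NTHREADS];
        for (int i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, worker, NULL);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);
        printf("counter = %ld\n", counter);   /* expect 4 * 100000 */
        return 0;
    }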
Reading:
Sections 2.4, 7.1, 7.2, 7.3, 7.4, and 7.5.
Material:
Pthread slides.
Exercises:
Exercise 2.10.
Questions on the case-study (see previous lecture):
- Work out the data-flow diagram of the model-checker.
- What are the data dependencies?
- Which computations can be done in parallel?
- What are the critical sections?
- Find two designs to parallelize the model-checker that are applicable to
the shared-address-space paradigm.
Lecture 4:
POSIX Threads cont.,
Hints for Assignment 3.
Abstract:
I finish Pthreads and give you help on how to parallelize the
model-checker. The slides are not provided here; you are expected to
find your own solution.
Reading: Chapter 7.
Material:
Slides from the previous lecture.
Exercises:
Exercise 2.17.
Assignment 2: Exercises of lecture 3 and
lecture 4. Please do the exercise on the case-study as soon as
possible! Note: The model-checker
has been updated to be more modular.
Deadline March 16th.
Lecture 5:
Communication Costs in Parallel Machines,
Process-Processor Mapping,
Preliminaries to Parallel Algorithm Design.
Abstract:
I discuss different communication costs (message passing and
shared address space), the impact of process-processor mapping, and
start on parallel algorithm design (techniques for decomposing
apparently sequential algorithms into parallel ones).
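As a preview of the communication cost model from the book (section 2.5),
the time to send a message of m words over l links is typically modelled as

    t_{comm} = t_s + l\,t_h + m\,t_w \;\approx\; t_s + m\,t_w ,

where t_s is the startup time, t_h the per-hop time (often negligible), and
t_w the per-word transfer time.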
Reading:
Sections 2.5, 2.6, 2.7, and 3.1.
Material:
Communication costs slides,
process-processor mapping slides,
algorithm design slides.
Assignment 3:
The model-checker has been updated with
a pthread utility support library to help you for this assignment.
- How does detection of termination work in the support library?
- Why is it not an issue for getting traces when a goal state
is found?
- Parallelize the model-checker with pthreads, following one of
your designs. Starting with one shared waiting queue and one shared
state-set is advised (a sketch of that starting point is given after
this assignment), but it is not an acceptable final solution for this
assignment.
Note: There is a
modelchecker.help.c file that you are
expected to use for the exercise.
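To get going on the shared-queue starting point mentioned above, here is a
minimal sketch of a mutex/condition-protected shared waiting queue. The names
are illustrative and are not taken from the support library; termination
detection is deliberately not handled.

    #include <pthread.h>

    /* Illustrative shared waiting queue of states (not the support library API). */
    typedef struct node { void *state; struct node *next; } node_t;

    typedef struct {
        node_t *head;
        pthread_mutex_t lock;
        pthread_cond_t  nonempty;
    } queue_t;

    void queue_init(queue_t *q)
    {
        q->head = NULL;
        pthread_mutex_init(&q->lock, NULL);
        pthread_cond_init(&q->nonempty, NULL);
    }

    void queue_push(queue_t *q, node_t *n)
    {
        pthread_mutex_lock(&q->lock);
        n->next = q->head;
        q->head = n;
        pthread_cond_signal(&q->nonempty);
        pthread_mutex_unlock(&q->lock);
    }

    /* Blocks until a state is available. */
    node_t *queue_pop(queue_t *q)
    {
        pthread_mutex_lock(&q->lock);
        while (q->head == NULL)
            pthread_cond_wait(&q->nonempty, &q->lock);
        node_t *n = q->head;
        q->head = n->next;
        pthread_mutex_unlock(&q->lock);
        return n;
    }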
Lecture 6:
Decomposition Techniques and Tasks.
Mapping Techniques for Load Balancing,
Methods for Containing Interaction Overheads,
Parallel Algorithm Models.
Abstract:
I continue on parallel algorithm design, covering the characteristics
of tasks and their interactions as well as mapping techniques. The point
is to use these techniques to minimize interaction overheads.
Reading:
Chapter 3.
Material:
Algorithm design slides (from last
time) and new slides (for today).
Exercises: Continue on assignment 3.
Lecture 7:
Help for Assignment 3,
Programming Using the Message Passing Paradigm.
Abstract:
I present the API of pthread_utils.h for those who do not yet have an
overview of it, and you have the opportunity to use part of the
lecture to ask questions and get any issues concerning the
assignment clarified. I then start on MPI.
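As a first MPI example (independent of the assignment), a minimal program
with a blocking point-to-point exchange might look like this:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (size >= 2) {
            int value;
            if (rank == 0) {
                value = 42;
                MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            } else if (rank == 1) {
                MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                printf("rank 1 received %d\n", value);
            }
        }
        MPI_Finalize();
        return 0;
    }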
Reading:
Sections 6.1, 6.2, 6.3, 6.4, and 6.5.
Material:
Help slides and
MPI slides.
Exercises: Continue on assignment 3.
Deadline March 28th.
Lecture 8:
MPI cont., MPI in our Model-checker.
Abstract:
I finish MPI and I discuss problems and solutions related to
parallelizing our model-checker using MPI. This is to prepare
assignment 5.
Reading:
Chapter 6 and section 11.4.4.
Material: previous MPI
slides and
the model-checker slides.
Exercises:
3.2, 3.7, and 3.21. [Optional: 2.24, 3.11, 3.12, 3.13]
Solutions:
solutions
for the optional exercises.
Lecture 9:
Basic Communication Operations.
Abstract:
I start chapter 4. It's about how to implement collective
communication operations that we saw during the lectures on
MPI. There are interesting algorithms for that.
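One such algorithm, a one-to-all broadcast from rank 0 by recursive doubling,
can be sketched over point-to-point calls as below; in each round the set of
processes holding the message doubles. This is purely illustrative --
MPI_Bcast already does this (and better) for you.

    #include <mpi.h>

    /* One-to-all broadcast from rank 0 by recursive doubling. */
    void my_bcast_from_0(int *buf, int count, MPI_Comm comm)
    {
        int rank, p;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &p);

        for (int mask = 1; mask < p; mask <<= 1) {
            if (rank < mask) {                    /* I already have the data */
                int dest = rank + mask;
                if (dest < p)
                    MPI_Send(buf, count, MPI_INT, dest, 0, comm);
            } else if (rank < 2 * mask) {         /* my turn to receive */
                MPI_Recv(buf, count, MPI_INT, rank - mask, 0, comm,
                         MPI_STATUS_IGNORE);
            }
        }
    }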
Reading:
Sections 4.1, 4.2, 4.3, and 4.4.
Material:
slides.
Exercises:
4.1 and 4.5.
Assignment 4: Exercises of lectures 8 and 9.
Deadline April 18th (2 weeks extra
because some of you are "traveling").
Lecture 10:
Basic Communication Operations cont.
Abstract:
I finish chapter 4.
Reading:
Sections 4.5, 4.6, and 4.7.
Material:
slides from last time.
You may be interested in the tutorial page
of LAM/MPI.
Assignment 5:
This assignment is split over 3 exercise sessions (which do not
necessarily map to the following 3 parts of the exercises).
There is very little coding to do but, on the other
hand, you need to read and analyse quite a lot of code. The
model-checker has been updated
for MPI. Questions:
- The model-checker explores states, receives states, sends
states, detects termination, receives stop or shutdown
notifications, and gets traces in a distributed manner.
Most of it is hidden in the MPI support library and works
automatically through a simple interface.
- Which kinds of messages and protocols are implemented?
- How are blocking and non-blocking communications used? (A small
polling sketch is given after this assignment.)
- How do all these protocols work together?
- During the lectures, I presented a termination
algorithm. A distributed leader-election protocol is also
implemented to avoid races when one or several processes want to
stop because they have found an interesting state and want to
initiate a trace. Of course, only one trace should be printed.
A special protocol is also implemented to print traces.
- How is the termination algorithm implemented? Give details on
where the parts of the algorithm are implemented.
- Give the leader election algorithm used here.
- The traces are printed in a distributed manner. How does this work?
- Finally, you want to make this work with MPI.
- In the comments of the MPI support library you will discover
that the local queue provided here is supposed to be used in a
particular way and to fit together with incoming messages from
other processes. This differs from the slides presented during the
lectures. Update the design from a single process's point of view and
use the local queue.
- Copy modelchecker.mpi-help.c to modelchecker.mpi.c and
complete the code.
- You will see that this MPI version of the modelchecker has
terrible performance. Why? Suggest some improvements to address one
of the major issues.
The debug versions of the model-checker are available on
my account on the system in MVP/.
Note: Progress is not implemented at this
time. You should not add it.
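Relating to the question above on blocking versus non-blocking
communications, a typical polling pattern (not necessarily the one used in
the support library) interleaves local work with non-blocking checks for
pending messages via MPI_Iprobe. A toy, self-contained illustration:

    #include <mpi.h>
    #include <stdio.h>

    /* Run with at least 2 processes: rank 1 polls while "working". */
    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            int state = 7;
            MPI_Send(&state, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            int flag = 0;
            MPI_Status status;
            while (!flag) {
                /* ... do some local work (expand states) ... */
                MPI_Iprobe(0, 0, MPI_COMM_WORLD, &flag, &status);
            }
            int state;
            MPI_Recv(&state, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 eventually received state %d\n", state);
        }
        MPI_Finalize();
        return 0;
    }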
Lecture 11:
Analytical Modeling of Parallel Programs.
Abstract:
I treat chapter 5. This chapter looks scary but it's not! It's
actually very interesting.
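As a head start on the notation (which may differ slightly from the book's),
the central quantities are the speedup S, the efficiency E, and the total
overhead T_o, defined from the sequential time T_S, the parallel time T_P,
and the number of processes p as

    S = \frac{T_S}{T_P}, \qquad E = \frac{S}{p}, \qquad T_o = p\,T_P - T_S .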
Reading:
Chapter 5.
Material:
slides.
Exercises:
Continue exercise from lecture 10. [Optional exercise 4.3].
Solution: exercise
4.3.
Lecture 12:
Sorting.
Abstract:
Sorry for swapping dense matrix algorithms and sorting at the last
minute, but I just realized that SW6 will need the lecture on
matrices. I will discuss different issues in sorting on parallel
computers and present some well-known (sequential) sorting algorithms
and their parallel versions.
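As one example of such an algorithm, here is a minimal sequential sketch of
odd-even transposition sort; in the parallel version each process holds a
block of elements and the compare-exchange steps become compare-split steps
between neighbouring processes.

    #include <stddef.h>

    static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

    /* Sequential odd-even transposition sort of a[0..n-1]:
     * n phases alternating between even- and odd-indexed pairs. */
    void odd_even_sort(int *a, size_t n)
    {
        for (size_t phase = 0; phase < n; phase++) {
            size_t start = (phase % 2 == 0) ? 0 : 1;
            for (size_t i = start; i + 1 < n; i += 2)
                if (a[i] > a[i + 1])
                    swap_int(&a[i], &a[i + 1]);
        }
    }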
Reading: Chapter 9.
Material:
slides.
Exercises:
Finish exercise from lecture 10.
Deadline for assignment 5: 27th April.
[Optional exercises 5.1 and 5.2].
Solutions: exercises
5.1 and 5.2.
Lecture 13:
Clusters & GRID
Abstract:
Henrik Thostrup will give this lecture.
Reading:
Paper
about GRID - get the basic concepts out of it.
Material: slides
part 1 and
part 2.
Exercises (Assignment 6, part 1):
Study the scalability of your implementations, in particular:
- run experiments (pthread + MPI) with different sizes of a
given problem (multiple readers, single writer is easily
configurable) or different numbers of threads/processes,
- explain your results,
- come up with a reasonable model (in terms of cost) for your
MPI implementation, which should not be scalable as it is -- the
improvement that you are supposed to suggest (previous assignment)
should improve the cost accordingly.
Lecture 14:
Dense Matrix Algorithms.
Abstract:
I go through dense matrix algorithms for matrix*vector and
matrix*matrix multiplications. I give the idea of parallel
versions of Gauss' elimination algorithm.
Reading:
Chapter 8.
Material:
slides.
Exercises (Cluster/GRID - Assignment 6, part 2):
- General
- The idea of a huge virtual supercomputer was rejected in the
lecture. Explain what problems could be encountered in such a
project.
- Describe problems that must be dealt with when sharing resources with
people outside an organization.
- Job Execution
- What are the advantages and disadvantages of centralized and
decentralized brokering?
Decentralized brokering: clients submit directly to resources.
Centralized brokering: clients submit to a central broker, which
decides where jobs are executed.
- Make a list of the main attributes that are needed to check if a job can
execute on a cluster.
- Information System
- What happens if the infrastructure for resource discovery crashes /
becomes unavailable?
- What are the implications of an information system delivering
information that does not accurately describe the state of a
resource?
- Security
- Explain why delegation is used in grids.
- What are the problems of reusing existing security infrastructures in
grids?
Deadline 11th of May for assignment 6.
Assignment 7 (SW6 only):
In this directory you will find
- gaussian_solver.c a matrix inversion program based on
Gauss' elimination algorithm,
- check_id.c a matrix multiplication program that checks
that A*B == Identity,
- test_allreduce.c an MPI program to test
MPI_Allreduce,
- matrices a directory that contains 3 matrices.
Your task is to make a Pthread *or* MPI version of
gaussian_solver.c, where you have to parallelize the solver
itself and its built-in identity checker. The standalone
check_id.c is provided for testing. To do so, use a row
partitioning with a cyclic distribution of rows across processes
to ensure some minimal load balancing (a small illustration of this
distribution is given at the end of this assignment). Since the solver
implements the Gaussian method with pivoting, it is more
difficult to implement the pipelined algorithm as described in the
book, so do not do it. Getting it right with pivoting is enough.
You may want to add a second
inv_index array to the solver
to get the inverse mapping of
index.
The MPI version is recommended, and you are encouraged to work in
groups. You may generate your own test matrices of different
sizes. Test your program and run experiments.
Deadline 31st of May.
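For illustration only (this is not a solution to the assignment), the cyclic
row distribution mentioned above assigns row i to process i mod p. A tiny
standalone program printing the ownership pattern, with hypothetical values
of n and p:

    #include <stdio.h>

    /* Cyclic (round-robin) row distribution: row i belongs to process i % p. */
    int main(void)
    {
        int n = 10;   /* number of rows (hypothetical) */
        int p = 4;    /* number of processes/threads (hypothetical) */
        for (int me = 0; me < p; me++) {
            printf("process %d owns rows:", me);
            for (int i = me; i < n; i += p)   /* i = me, me+p, me+2p, ... */
                printf(" %d", i);
            printf("\n");
        }
        return 0;
    }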
Lecture 15:
Graph Algorithms.
Abstract:
I cover some parallel algorithms for dense graphs.
Reading:
Sections 10.1 to 10.6.
Material:
slides.
Exercises:
Finish any exercises you have not yet completed.