Ph.D. course
Database Management on Modern Hardware

Aalborg University
May 11-12, 2009


Short Description

In the past decade, four (often correlated) factors have shifted the performance bottleneck of data-intensive commercial workloads from I/O to the processor and memory subsystem. First, storage systems are becoming faster and more intelligent (disks now come complete with their own processors and caches). Second, modern database storage managers aggressively improve locality through clustering, hide I/O latencies using prefetching, and parallelize disk accesses using data striping. Third, main memories have become much larger and often hold the application's working set. Fourth, the increasing memory/processor speed gap accentuates the importance of processor caches to database performance. In addition to deep memory hierarchies, the new multi-core chips make aggressive parallelism a first-class requirement for database system scalability and performance.

How is database technology coping with these changes? This course will first motivate the problem of database performance on modern hardware by discussing how database and computer microarchitecture technologies have evolved over the past three decades. We will discuss approaches and methodologies used to produce time breakdowns when executing database workloads on modern processors. Then, we will survey techniques proposed in the literature for building architecture-conscious database systems, along with their evaluation. We will emphasize the importance of determining the optimal data placement at all levels of the memory hierarchy, explain the challenges involved, and contrast data placement with other approaches such as prefetching data and instructions. Finally, we will discuss open problems and future directions that arise on the new multi-core chip platforms.

Prerequisites:
The course assumes that the participants are familiar with basic database access methods such as B-tree indexing, with query processing operators such as join and sort, as well as with general database topics as covered by typical undergraduate-level database textbooks. A certain familiarity with undergraduate computer architecture material is a definite plus.

The course will consist of a number of lectures and exercises. Active student participation is expected.

Instructor:
Anastasia Ailamaki
Director, Data Intensive Applications and Systems Laboratory (DIAS)
Professor, EPFL School of Informatics and Communications
Adjunct Professor, Carnegie Mellon School of Computer Science
e-mail: firstname dot lastname at epfl dot ch

Schedule

Click on a section name to see the required literature readings for each lecture. Note that slides (in a three-on-one handout format) are available only from inside the aau.dk network.

Day 1 (May 11, 2009)

Location: 0.2.13
Slides: part1, part2, part3.

9:00-10:00   Architectural Characterizations of DB workloads (1)
10:00-11:00   Architectural Characterizations of DB workloads (2)
11:00-12:00   Cache-conscious Database Algorithms (1)
12:00-13:00   Lunch break
13:00-14:00   Cache-conscious Database Algorithms (2)
14:00-15:00   Rethinking Transaction Processing (1)
15:00-16:00   Rethinking Transaction Processing (2)

Day 2 (May 12, 2009)

Location: 0.2.13
Slides: part4, part5, part6, part7.

9:00-10:00   Query Engines for Modern Hardware (1)
10:00-11:00   Query Engines for Modern Hardware (2)
11:00-12:00   Resource Sharing and Thread Scheduling on Multicore
12:00-13:00   Lunch break
13:00-14:00   Storage-aware query processing (1)
14:00-15:00   Storage-aware query processing (2)
15:00-16:00   Query co-processing on commodity hardware

Homework

As homework for this course, each student should prepare a thorough review of three of the papers in the list below. Make your selections from the papers listed under required reading. The review for each paper should be at most two pages long. Please be thorough and constructive in your reviews. To pass this course, you have to email your reviews in PDF to simas at cs dot aau dot dk with subject "PhD Course: Reviews by <your_name>" by May 18, 2009.

Suggested Literature

To prepare for the course, it is recommended to read at least the papers labeled "Required Reading". The rest, labeled "Additional Resources", would be great to read after the class to gain additional perspective on the topic.

Architectural Characterizations of DB workloads

Required Reading

Additional Resources

Cache-conscious Database Algorithms

Required Reading

Additional Resources

Rethinking Transaction Processing

Required Reading

Additional Resources

Query Engines for Modern Hardware

Required Reading

Additional Resources

Resource Sharing and Thread Scheduling on Multicore

Required Reading

Additional Resources

Storage-aware query processing

Required Reading

Additional Resources

Query co-processing on commodity hardware

Required Reading

Additional Resources


Last updated May 10, 2009