In the past decade, four (often correlated) factors have shifted the performance bottleneck of data-intensive commercial workloads from I/O to the processor and memory subsystem. First, storage systems have become faster and more intelligent (disks now come complete with their own processors and caches). Second, modern database storage managers aggressively improve locality through clustering, hide I/O latencies using prefetching, and parallelize disk accesses using data striping. Third, main memories have become much larger and often hold the application's working set. Fourth, the widening memory/processor speed gap accentuates the importance of processor caches to database performance. In addition to deep memory hierarchies, however, the new multi-core chips make aggressive parallelism a first-class requirement for database system scalability and performance.
How is database technology coping with these changes? This course will first motivate the problem of database performance on modern hardware by discussing how database and computer microarchitecture technologies have evolved over the past three decades. We will discuss approaches and methodologies used to produce time breakdowns when executing database workloads on modern processors. Then, we will survey techniques proposed in the literature towards architecture-conscious database systems, along with their evaluation. We will emphasize the importance, and explain the challenges, of determining the optimal data placement at all levels of the memory hierarchy, and contrast this with other approaches such as prefetching data and instructions. Finally, we will discuss open problems and future directions that arise on the new multi-core chip platforms.
Prerequisites:
The course assumes that the participants are familiar with basic database access methods such as B-tree indexing, with query processing operators such as join and sort, as well as with general database topics as covered by typical undergraduate-level database textbooks. A certain familiarity with undergraduate computer architecture material is a definite plus.
The course will consist of a number of lectures and exercises. Active student participation is expected.
Instructor:
Click on a section name below to see the required literature readings for each lecture. Note that slides (in a three-on-one handout format) are available only from inside the aau.dk network.
Location: 0.2.13
Slides: part1, part2, part3.
9:00 - 10:00  | Architectural Characterizations of DB Workloads (1)
10:00 - 11:00 | Architectural Characterizations of DB Workloads (2)
11:00 - 12:00 | Cache-Conscious Database Algorithms (1)
12:00 - 13:00 | Lunch break
13:00 - 14:00 | Cache-Conscious Database Algorithms (2)
14:00 - 15:00 | Rethinking Transaction Processing (1)
15:00 - 16:00 | Rethinking Transaction Processing (2)
Location: 0.2.13
Slides: part4, part5, part6, part7.
9:00 - 10:00  | Query Engines for Modern Hardware (1)
10:00 - 11:00 | Query Engines for Modern Hardware (2)
11:00 - 12:00 | Resource Sharing and Thread Scheduling on Multicore
12:00 - 13:00 | Lunch break
13:00 - 14:00 | Storage-Aware Query Processing (1)
14:00 - 15:00 | Storage-Aware Query Processing (2)
15:00 - 16:00 | Query Co-Processing on Commodity Hardware
To prepare for the course, we recommend reading at least the papers labeled "required reading". The rest, labeled "Additional resources", are best read after the class to gain additional perspective on each topic.