Solutions for Exercise no. 9

1 Cost Computations

Assume that r is stored in the data file df that contains 10.000 records. Since each block has 50 records, the maximum number of input operations for the evaluation of a query by scanning is 200.

1.1 No index, unordered file

All the 200 blocks of df have to be accessed, in the worst case. This applies to all the queries listed in the exercise. As for the average case for the first query SELECT * FROM r WHERE a = 8000, 100 blocks must be read

1.2 No index, ordered file

Binary search can be used in this case. It takes at most log2(N) input operations for the block that contains the desired record to be identified, where N is the total number of blocks. Since N = 200, the maximum number of input operations is log2(200) = 7,6.

1.3 Secondary B+-tree on r (A)

The maximum number of nodes that have to be accessed for the location of a particular block of df is given by: logn(K) where n is the number of records that can be stored in a leaf node. and K is the number of distinct search kay values. In this case, this formula evaluates to: log50(10000) = 2,4

The access of a single record requires loading into main memory three index nodes (blocks), one of which is a leaf node, then one table block.

For the remaining queries, it is necessary to reach the leaf nodes in the first place, which requires loading 2 blocks initially. At the leaf node level, there are 200 blocks. In the worst case, it will then be necessary to load a distinct table block in to main memory for each search key value.

In the solutions given below, the first term indicates the number of accessed nodes required for reaching the leaf node, the second, the number of leaf nodes that may have to be accessed thereafter, and the last one, the maximum number of table blocks that may have to be loaded.

1.4 Primary B+-tree on r (A)

The access procedure described above applies almost unchanged to the primary B+-tree case. The only differences is that, when the index is primary, any table block loaded into main memory will usually contain more than one record relevant for query processing, hence the number of times a table block has to be accessed is drastically reduced compared to using a secondary index We here assume that leaf nodes form a double linked list.

1.5 Hash index on r (A)

In an ideal situation, the chosen hash function should return the correct address of a table block where a searched tuple is stored. The remaining queries cannot be computed by using a hash index and we have to resort to full table scans

2 Empirical Performance Measurements

In general, we can expect queries with access structures to perform better than without access structures. Note that this is not the case for Q2 and Q5. (The reason why will be explain in the lecture on query processing).
% Difference  Explanation of Difference
Q1 0.01 0.08  700%  Hash look up close to optimal 
Q2 1.22 0.11  1.009%  Index bad due to selectivity 
Q3 0.01  0.07  600% Effective use of B+-tree: no table access needed 
Q4 0.20  1.60 700%  B+-tree lookup on r2; hash lookup on r1 
Q5 0.19 0.60  683%  Index bad due to selectivity 

Best regards,
Kristian Torp