Solutions for Exercise no. 9

Assume that r is stored in the data file df that contains 10.000 records. Since each block has 50 records, the maximum number of input operations for the evaluation of a query by scanning is 200.

All the 200 blocks of df have to be accessed, in the worst case. This applies to all the queries listed in the exercise. As for the average case for the first query SELECT * FROM r WHERE a = 8000, 100 blocks must be read

1.2 No index, ordered file

Binary search can be used in this case. It takes at most log₂(N) input operations for the block that contains the desired record to be identified, where N is the total number of blocks. Since N = 200, the maximum number of input operations is log₂(200) = 7,6.

SELECT * FROM r WHERE A = 8000 (8 input operations)
SELECT * FROM r WHERE A <> 8000 (200 input operations)
SELECT * FROM r WHERE A > 8000 (8 + 40 input operations)
SELECT * FROM r WHERE A BETWEEN 8000 AND 9000 (8 + 20 input operations)

1.3 Secondary B⁺-tree on r (A)

The maximum number of nodes that have to be accessed for the location of a particular block of df is given by: log_n(K) where n is the number of records that can be stored in a leaf node. and K is the number of distinct search kay values. In this case, this formula evaluates to: log₅₀(10000) = 2,4

The access of a single record requires loading into main memory three index nodes (blocks), one of which is a leaf node, then one table block.

SELECT * FROM r WHERE a = 8000 (2 + 1+ 1 input operations)

For the remaining queries, it is necessary to reach the leaf nodes in the first place, which requires loading 2 blocks initially. At the leaf node level, there are 200 blocks. In the worst case, it will then be necessary to load a distinct table block in to main memory for each search key value.

In the solutions given below, the first term indicates the number of accessed nodes required for reaching the leaf node, the second, the number of leaf nodes that may have to be accessed thereafter, and the last one, the maximum number of table blocks that may have to be loaded.

SELECT * FROM r WHERE A <> 8000 (2 + 200 + 10.000 input operations)
SELECT * FROM r WHERE A > 8000 (2 + 40 + 2.000 input operations)
SELECT * FROM r WHERE A BETWEEN 8000 AND 9000 (2 + 20 + 1.000 input operations)

1.4 Primary B⁺-tree on r (A)

The access procedure described above applies almost unchanged to the primary B⁺-tree case. The only differences is that, when the index is primary, any table block loaded into main memory will usually contain more than one record relevant for query processing, hence the number of times a table block has to be accessed is drastically reduced compared to using a secondary index

SELECT * FROM r WHERE A = 8000 (2 + 1+ 1 input operations)
SELECT * FROM r WHERE A <> 8000 (2 + 200 + 200 input operations)
SELECT * FROM r WHERE A > 8000 (2 + 40 + 40 input operations)
SELECT * FROM r WHERE A BETWEEN 8000 AND 9000 (2 + 20 + 20 input operations)

We here assume that leaf nodes form a double linked list.

1.5 Hash index on r (A)

In an ideal situation, the chosen hash function should return the correct address of a table block where a searched tuple is stored.

SELECT * FROM r WHERE A = 8000 (1 input operation)

The remaining queries cannot be computed by using a hash index and we have to resort to full table scans

SELECT * FROM r WHERE A <> 8000 (200 input operations)
SELECT * FROM r WHERE A > 8000 (200 input operations)
SELECT * FROM r WHERE A BETWEEN 8000 AND 9000 (200 input operations)

2 Empirical Performance Measurements

In general, we can expect queries with access structures to perform better than without access structures. Note that this is not the case for Q2 and Q5. (The reason why will be explain in the lecture on query processing).

Query	a)	b)	% Difference	Explanation of Difference
Q1	0.01	0.08	700%	Hash look up close to optimal
Q2	1.22	0.11	1.009%	Index bad due to selectivity
Q3	0.01	0.07	600%	Effective use of B⁺-tree: no table access needed
Q4	0.20	1.60	700%	B⁺-tree lookup on r2; hash lookup on r1
Q5	0.19	0.60	683%	Index bad due to selectivity

Best regards,
Kristian Torp

Solutions for Exercise no. 9

1 Cost Computations

1.1 No index, unordered file

1.2 No index, ordered file

1.3 Secondary B⁺-tree on r (A)

1.4 Primary B⁺-tree on r (A)

1.5 Hash index on r (A)

2 Empirical Performance Measurements

Solutions for Exercise no. 9

1 Cost Computations

1.1 No index, unordered file

1.2 No index, ordered file

1.3 Secondary B+-tree on r (A)

1.4 Primary B+-tree on r (A)

1.5 Hash index on r (A)

2 Empirical Performance Measurements

1.3 Secondary B⁺-tree on r (A)

1.4 Primary B⁺-tree on r (A)