Solutions for Exercise no. 9
1 Cost Computations
-
Assume that r is stored in the data file df that contains 10.000
records. Since each block has 50 records, the maximum number of input operations
for the evaluation of a query by scanning is 200.
1.1 No index, unordered file
All the 200 blocks of df have to be accessed, in the worst case.
This applies to all the queries listed in the exercise. As for the average
case for the first query SELECT * FROM r WHERE a = 8000, 100 blocks must
be read
1.2 No index, ordered file
Binary search can be used in this case. It takes at most log2(N)
input
operations for the block that contains the desired record to be identified,
where N is the total number of blocks. Since N = 200, the
maximum number of input operations is log2(200) = 7,6.
-
SELECT * FROM r WHERE A = 8000 (8 input operations)
-
SELECT * FROM r WHERE A <> 8000 (200 input operations)
-
SELECT * FROM r WHERE A > 8000 (8 + 40 input operations)
-
SELECT * FROM r WHERE A BETWEEN 8000 AND 9000 (8 + 20 input operations)
1.3 Secondary B+-tree on r (A)
The maximum number of nodes that have to be accessed for the location of
a particular block of df is given by: logn(K) where
n
is the number of records that can be stored in a leaf node. and
K is
the number of distinct search kay values. In this case, this formula evaluates
to: log50(10000) = 2,4
The access of a single record requires loading into main memory three
index nodes (blocks), one of which is a leaf node, then one table block.
-
SELECT * FROM r WHERE a = 8000 (2 + 1+ 1 input operations)
For the remaining queries, it is necessary to reach the leaf nodes in the
first place, which requires loading 2 blocks initially. At the leaf node
level, there are 200 blocks. In the worst case, it will then be necessary
to load a distinct table block in to main memory for each search
key value.
In the solutions given below, the first term indicates the number of
accessed nodes required for reaching the leaf node, the second, the number
of leaf nodes that may have to be accessed thereafter, and the last one,
the maximum number of table blocks that may have to be loaded.
-
SELECT * FROM r WHERE A <> 8000 (2 + 200 + 10.000 input operations)
-
SELECT * FROM r WHERE A > 8000 (2 + 40 + 2.000 input operations)
-
SELECT * FROM r WHERE A BETWEEN 8000 AND 9000 (2 + 20 + 1.000 input operations)
1.4 Primary B+-tree on r (A)
The access procedure described above applies almost unchanged to the primary
B+-tree case. The only differences is that, when the index is
primary, any table block loaded into main memory will usually contain more
than one record relevant for query processing, hence the number of times
a table block has to be accessed is drastically reduced compared to using
a secondary index
-
SELECT * FROM r WHERE A = 8000 (2 + 1+ 1 input operations)
-
SELECT * FROM r WHERE A <> 8000 (2 + 200 + 200 input operations)
-
SELECT * FROM r WHERE A > 8000 (2 + 40 + 40 input operations)
-
SELECT * FROM r WHERE A BETWEEN 8000 AND 9000 (2 + 20 + 20 input operations)
We here assume that leaf nodes form a double linked list.
1.5 Hash index on r (A)
In an ideal situation, the chosen hash function should return the correct
address of a table block where a searched tuple is stored.
-
SELECT * FROM r WHERE A = 8000 (1 input operation)
The remaining queries cannot be computed by using a hash index and we have
to resort to full table scans
-
SELECT * FROM r WHERE A <> 8000 (200 input operations)
-
SELECT * FROM r WHERE A > 8000 (200 input operations)
-
SELECT * FROM r WHERE A BETWEEN 8000 AND 9000 (200 input operations)
2 Empirical Performance Measurements
In general, we can expect queries with access structures to perform better
than without access structures. Note that this is not the case for Q2 and
Q5. (The reason why will be explain in the lecture on query processing).
Query
|
a)
|
b)
|
% Difference |
Explanation of Difference |
Q1 |
0.01 |
0.08 |
700% |
Hash look up close to optimal |
Q2 |
1.22 |
0.11 |
1.009% |
Index bad due to selectivity |
Q3 |
0.01 |
0.07 |
600% |
Effective use of B+-tree: no table access
needed |
Q4 |
0.20 |
1.60 |
700% |
B+-tree lookup on r2; hash lookup on r1 |
Q5 |
0.19 |
0.60 |
683% |
Index bad due to selectivity |
Best regards,
Kristian Torp