Exercise 9

1 Cost Computations

Assume a table r with 10000 tuples and candidate key A whose range is [1 - 10000]. Assume 50 (fixed size) records per block and no buffering. Given the queries

  SELECT * FROM r WHERE A = 8000;
  SELECT * FROM r WHERE A <> 8000;
  SELECT * FROM r WHERE A > 8000;
  SELECT * FROM r WHERE A BETWEEN 8000 AND 9000;

Determine the number of I/Os for the following situations:

no index, unordered file
no index, ordered file
secondary B⁺-tree index on r(A), assume a maximum fanout of 100
primary B⁺-tree index on r(A), assume a maximum fanout of 100
hash index on r(A)

2 Empirical Performance Measurements

The measurements shall be run on a mini-database that reflects the structure of the Wisconsin Benchmark.

2.1 Setup & Initialization

First, you have to create and populate two tables. To create the tables execute the DDL statements that are available via the world wide web (create_tables.sql). Tables r1 and r2 have the following characteristics.

|r1| = 8.000 tuples
Secondary B⁺-tree index on r1(unique1)
|r2| = 16.000 tuples
Secondary B⁺-tree index on r2(unique1)
Primary hash index on r2(unique1)
The values of unique1 and unique3 are identical for each tuple.

To load data use copy the files in the data directory (ctl are control files, dat are data files). Run the load_data script from a unix shell to load the data into the tables you have created.

2.2 Test Queries

Assume the following pairs of semantically equivalent statements (see queries.sql on the web):

Q1:
a) select unique2 from r2 s where s.unique1 = 8530;
b) select unique2 from r2 s where s.unique3 = 8530;
Q2:
a) select avg(unique2) from r2 s where s.unique1 > 1000;
b) select avg(unique2) from r2 s where s.unique3 > 1000;
Q3:
a) select avg(unique1) from r2 s where s.unique1 between 200 and 900;
b) select avg(unique1) from r2 s where s.unique3 between 200 and 900;
Q4:
a) select avg(r.four+s.four) from r1 r, r2 s

b) select avg(r.four+s.four) from r1 r, r2 s

Q5
a) select count(*) from r1 r, r2 s

b) select count(*) from r1 r, r2 s
where r.unique3=s.unique3 and r.unique3 > 700;
Fill out the following performance measurement table. Use set timing on; to measure the time.

Query	a)	b)	% Difference	Explanation of Difference
Q1
Q2
Q3
Q4
Q5

Best regards,
Kristian Torp