Solutions for Exercise no. 10

1 Query Processing

From "Database System Concepts"

12.2 See the text in the book
In general it is not desirable to force users to choose a query processing strategy because naive users might select an inefficient strategy. The reason users would make poor choices about processing queries is that they would not know how a relation is stored, nor about its indices. It is unreasonable to force users to be aware of these details since ease of use is a major object of database query languages. If users are aware of the costs of different strategies they could write queries efficiently, thus helping performance. This could happen if experts were using the system.

12.3 See the text in the book

pT.branch-name((pbrance-name, assets(rT(branch)))nT.assets > S.assets
(passets(s(branch-city = "Brocklyn")(rS(branch)))))
This expression performs the theta join on the smallest amount of data possible. It does this by restricting the right hand side operand of the join to only those branches in Brooklyn, and also eliminating the unneeded attributes from both the operands.

12.4 See the text in the book

1. The relation resulting from the join of r1, r2, and r3 will be the same no matter which way we join them, due to the associative and commutative properties of joins. So we will consider the size based on the strategy of ((r1 nr2)nr3). Joining r1 and r2 will yield a relation of at most 1000 tuples since C is a key of r2. Likewise, joining the result with r3 will yield a relation of at most 1000 tuples because E is a key for r3. Therefore the final relation will have at most 1000 tuples.
2. An efficient strategy for computing the join would be to create an index on attribute C of relation r2 and on E for r3. Then for each tuple in r1, we do the following:
1. Use the index for r2 to look up at most one tuple which matches the C value for r1
2. Use the created index on E to look up in r3 at most one tuple which matches the unique value for E in r2.

12.5 See the text in the book
The estimated size of the relation can be determined by calculating the average number of tuples which would be joined with each tuple of the second relation. In this case, for each tuple in r1, 1500/V(C, r2) = 15/11 tuples (on the average) of r2would join with it. The intermediate relation would have 15000/11 tuples. This relation is joined with r3 to yield a result of approximately 10,227 tuples (15000/11 x 750/100). A good strategy should join r1and r2 first, since the intermediate relation is about the same size as r1 or r2. Then r3 is joined to this result.

2 Plan Generation and Interpretation in Oracle

In the left column of the table below, the query plans for the database without statistics is given. In the right column the query plans for the same queries are given after the analyze table statements are executed.

As expected the cost-based query optimizer performs better with accurate statistics and makes close to optimal choices. In particular, note Q2a (full scan instead of an index range scan), Q4b (nested-loop instead of sort-merge) and Q5a (nested-loop instead of index range scans).

 Query Plans before ANALYZE Tables Query Plans After ANALYZE Tables SELECT STATEMENT   StmtId = Q1a Cost =   TABLE ACCESS BY INDEX ROWID R2     INDEX UNIQUE SCAN SYS_C0072395 SELECT STATEMENT   StmtId = Q1a Cost = 2   TABLE ACCESS BY INDEX ROWID R2     INDEX UNIQUE SCAN SYS_C0072395 SELECT STATEMENT   StmtId = Q1b Cost =   TABLE ACCESS FULL R2 SELECT STATEMENT   StmtId = Q1b Cost = 97   TABLE ACCESS FULL R2 SELECT STATEMENT   StmtId = Q2a Cost =   SORT AGGREGATE     TABLE ACCESS BY INDEX ROWID R2       INDEX RANGE SCAN SYS_C0072395 SELECT STATEMENT   StmtId = Q2a Cost = 97   SORT AGGREGATE     TABLE ACCESS FULL R2 SELECT STATEMENT   StmtId = Q2b Cost =   SORT AGGREGATE     TABLE ACCESS FULL R2 SELECT STATEMENT   StmtId = Q2b Cost = 97   SORT AGGREGATE     TABLE ACCESS FULL R2 SELECT STATEMENT   StmtId = Q3a Cost =   SORT AGGREGATE     INDEX RANGE SCAN SYS_C0072395 SELECT STATEMENT   StmtId = Q3a Cost = 8   SORT AGGREGATE     INDEX RANGE SCAN SYS_C0072395 SELECT STATEMENT   StmtId = Q3b Cost =   SORT AGGREGATE     TABLE ACCESS FULL R2 SELECT STATEMENT   StmtId = Q3b Cost = 97   SORT AGGREGATE     TABLE ACCESS FULL R2 SELECT STATEMENT   StmtId = Q4a Cost =   SORT AGGREGATE     NESTED LOOPS       TABLE ACCESS BY INDEX ROWID R2         INDEX UNIQUE SCAN SYS_C0072395       TABLE ACCESS BY INDEX ROWID R1         INDEX UNIQUE SCAN SYS_C0072394 SELECT STATEMENT   StmtId = Q4a Cost = 3   SORT AGGREGATE     NESTED LOOPS       TABLE ACCESS BY INDEX ROWID R2         INDEX UNIQUE SCAN SYS_C0072395       TABLE ACCESS BY INDEX ROWID R1         INDEX UNIQUE SCAN SYS_C0072394 SELECT STATEMENT   StmtId = Q4b Cost =   SORT AGGREGATE     MERGE JOIN       SORT JOIN         TABLE ACCESS FULL R2       SORT JOIN         TABLE ACCESS FULL R1 SELECT STATEMENT   StmtId = Q4b Cost = 145   SORT AGGREGATE     NESTED LOOPS       TABLE ACCESS FULL R1       TABLE ACCESS FULL R2 SELECT STATEMENT   StmtId = Q5a Cost =   SORT AGGREGATE     NESTED LOOPS       INDEX RANGE SCAN SYS_C0072394       INDEX UNIQUE SCAN SYS_C0072395 SELECT STATEMENT   StmtId = Q5a Cost = 48   SORT AGGREGATE     NESTED LOOPS       TABLE ACCESS FULL R1       INDEX UNIQUE SCAN SYS_C0072395 SELECT STATEMENT   StmtId = Q5b Cost =   SORT AGGREGATE     MERGE JOIN       SORT JOIN         TABLE ACCESS FULL R2       SORT JOIN         TABLE ACCESS FULL R1 SELECT STATEMENT   StmtId = Q5b Cost = 672   SORT AGGREGATE     MERGE JOIN       SORT JOIN         TABLE ACCESS FULL R2       SORT JOIN         TABLE ACCESS FULL R1

Best regards,
Kristian Torp