Solutions for Exercise no. 10

1 Query Processing

From "Database System Concepts"

12.2 See the text in the book
Answer:
In general it is not desirable to force users to choose a query processing strategy because naive users might select an inefficient strategy. The reason users would make poor choices about processing queries is that they would not know how a relation is stored, nor about its indices. It is unreasonable to force users to be aware of these details since ease of use is a major object of database query languages. If users are aware of the costs of different strategies they could write queries efficiently, thus helping performance. This could happen if experts were using the system.
 

12.3 See the text in the book
Answer:

pT.branch-name((pbrance-name, assets(rT(branch)))nT.assets > S.assets
(passets(s(branch-city = "Brocklyn")(rS(branch)))))
This expression performs the theta join on the smallest amount of data possible. It does this by restricting the right hand side operand of the join to only those branches in Brooklyn, and also eliminating the unneeded attributes from both the operands.
 

12.4 See the text in the book
Answer:

  1. The relation resulting from the join of r1, r2, and r3 will be the same no matter which way we join them, due to the associative and commutative properties of joins. So we will consider the size based on the strategy of ((r1 nr2)nr3). Joining r1 and r2 will yield a relation of at most 1000 tuples since C is a key of r2. Likewise, joining the result with r3 will yield a relation of at most 1000 tuples because E is a key for r3. Therefore the final relation will have at most 1000 tuples.
  2. An efficient strategy for computing the join would be to create an index on attribute C of relation r2 and on E for r3. Then for each tuple in r1, we do the following:
    1. Use the index for r2 to look up at most one tuple which matches the C value for r1
    2. Use the created index on E to look up in r3 at most one tuple which matches the unique value for E in r2.


12.5 See the text in the book
Answer:
The estimated size of the relation can be determined by calculating the average number of tuples which would be joined with each tuple of the second relation. In this case, for each tuple in r1, 1500/V(C, r2) = 15/11 tuples (on the average) of r2would join with it. The intermediate relation would have 15000/11 tuples. This relation is joined with r3 to yield a result of approximately 10,227 tuples (15000/11 x 750/100). A good strategy should join r1and r2 first, since the intermediate relation is about the same size as r1 or r2. Then r3 is joined to this result.
 

2 Plan Generation and Interpretation in Oracle

In the left column of the table below, the query plans for the database without statistics is given. In the right column the query plans for the same queries are given after the analyze table statements are executed.

As expected the cost-based query optimizer performs better with accurate statistics and makes close to optimal choices. In particular, note Q2a (full scan instead of an index range scan), Q4b (nested-loop instead of sort-merge) and Q5a (nested-loop instead of index range scans).
 
Query Plans before ANALYZE Tables
Query Plans After ANALYZE Tables
SELECT STATEMENT   StmtId = Q1a Cost =
  TABLE ACCESS BY INDEX ROWID R2
    INDEX UNIQUE SCAN SYS_C0072395
SELECT STATEMENT   StmtId = Q1a Cost = 2
  TABLE ACCESS BY INDEX ROWID R2
    INDEX UNIQUE SCAN SYS_C0072395
SELECT STATEMENT   StmtId = Q1b Cost =
  TABLE ACCESS FULL R2
SELECT STATEMENT   StmtId = Q1b Cost = 97
  TABLE ACCESS FULL R2
SELECT STATEMENT   StmtId = Q2a Cost =
  SORT AGGREGATE
    TABLE ACCESS BY INDEX ROWID R2
      INDEX RANGE SCAN SYS_C0072395
SELECT STATEMENT   StmtId = Q2a Cost = 97
  SORT AGGREGATE
    TABLE ACCESS FULL R2
SELECT STATEMENT   StmtId = Q2b Cost =
  SORT AGGREGATE
    TABLE ACCESS FULL R2
SELECT STATEMENT   StmtId = Q2b Cost = 97
  SORT AGGREGATE
    TABLE ACCESS FULL R2
SELECT STATEMENT   StmtId = Q3a Cost =
  SORT AGGREGATE
    INDEX RANGE SCAN SYS_C0072395
SELECT STATEMENT   StmtId = Q3a Cost = 8
  SORT AGGREGATE
    INDEX RANGE SCAN SYS_C0072395
SELECT STATEMENT   StmtId = Q3b Cost =
  SORT AGGREGATE
    TABLE ACCESS FULL R2
SELECT STATEMENT   StmtId = Q3b Cost = 97
  SORT AGGREGATE
    TABLE ACCESS FULL R2
SELECT STATEMENT   StmtId = Q4a Cost =
  SORT AGGREGATE
    NESTED LOOPS
      TABLE ACCESS BY INDEX ROWID R2
        INDEX UNIQUE SCAN SYS_C0072395
      TABLE ACCESS BY INDEX ROWID R1
        INDEX UNIQUE SCAN SYS_C0072394
SELECT STATEMENT   StmtId = Q4a Cost = 3
  SORT AGGREGATE
    NESTED LOOPS
      TABLE ACCESS BY INDEX ROWID R2
        INDEX UNIQUE SCAN SYS_C0072395
      TABLE ACCESS BY INDEX ROWID R1
        INDEX UNIQUE SCAN SYS_C0072394
SELECT STATEMENT   StmtId = Q4b Cost =
  SORT AGGREGATE
    MERGE JOIN
      SORT JOIN
        TABLE ACCESS FULL R2
      SORT JOIN
        TABLE ACCESS FULL R1
SELECT STATEMENT   StmtId = Q4b Cost = 145
  SORT AGGREGATE
    NESTED LOOPS
      TABLE ACCESS FULL R1
      TABLE ACCESS FULL R2
SELECT STATEMENT   StmtId = Q5a Cost =
  SORT AGGREGATE
    NESTED LOOPS
      INDEX RANGE SCAN SYS_C0072394
      INDEX UNIQUE SCAN SYS_C0072395
SELECT STATEMENT   StmtId = Q5a Cost = 48
  SORT AGGREGATE
    NESTED LOOPS
      TABLE ACCESS FULL R1
      INDEX UNIQUE SCAN SYS_C0072395
SELECT STATEMENT   StmtId = Q5b Cost =
  SORT AGGREGATE
    MERGE JOIN
      SORT JOIN
        TABLE ACCESS FULL R2
      SORT JOIN
        TABLE ACCESS FULL R1
SELECT STATEMENT   StmtId = Q5b Cost = 672
  SORT AGGREGATE
    MERGE JOIN
      SORT JOIN
        TABLE ACCESS FULL R2
      SORT JOIN
        TABLE ACCESS FULL R1


Best regards,
Kristian Torp