Consider a join (relation algebra) between relations \(r(R)\) and \(s(S)\)…
2014
Consider a join (relation algebra) between relations \(r(R)\) and \(s(S)\) using the nested loop method. There are 3 buffers each of size equal to disk block size, out of which one buffer is reserved for intermediate results. Assuming size \(\text{size}(r(R))<\text{size}(s(S)),\) , the join will have fewer number of disk block accesses if
- A.
relation
\(r(R)\)is in the outer loop. - B.
relation
\(s(S)\)is in the outer loop. - C.
join selection factor between
\(r(R)\)and\(s(S)\)is more than 0.5. - D.
join selection factor between
\(r(R)\)and\(s(S)\)is less than 0.5.
Attempted by 99 students.
Show answer & explanation
Correct answer: A
Key idea: choose the smaller relation as the outer relation to minimize nested-loop I/O when buffers are limited.
Reasoning:
We have 3 buffers and one is reserved for intermediate results, leaving two buffers available for the join.
In a block nested-loop join one buffer is typically used to stream the inner relation and one or more buffers are used to hold blocks of the outer relation. With the available two buffers, one is used for streaming the inner relation and only one remains to hold blocks of the outer relation, so you can effectively load just one block of the outer relation at a time.
Let B(r) and B(s) be the numbers of disk blocks of relations r and s respectively. If r is the outer relation, total disk reads ≈ B(r) + B(r) * B(s) (read r once, then for each outer block read the entire inner relation). If s is outer, cost ≈ B(s) + B(s) * B(r).
Because size(r) < size(s) implies B(r) < B(s), choosing r as the outer relation yields smaller total disk I/O: B(r) + B(r) * B(s) < B(s) + B(s) * B(r).
Conclusion: Put the smaller relation r in the outer loop to minimize disk block accesses.
A video solution is available for this question — log in and enroll to watch it.