In a query involving more than one table, join processing is involved to determine the most optimal plan to perform the join. This comprises:
- Join cardinality
- Enumerating join orders
- Evaluating the costs associated with each join path under each join method.
Join cardinality is the determination of the number of rows that will make up the joined relation. In the worst case, we have no join predicates and the cardinality is the Cartesian product of the two or more tables involved. But typically when two or more tables participate in a join, the join is based on values of some columns. Using the statistical descriptions on the columns and the number of rows in each table, we calculate the cardinality of the joined form.
A “Join Order” is a particular permutation of ordering the access to tables participating in the join to complete the joined relation. Join orders depend on the type of join or the join topology. Depending on the structuring of the join predicates, joins can be classified into various types viz chain, star, cycle, complete graph, etc. For example, consider the query with a three table join –
SELECT … FROM t1,t2,t3 WHERE
t1.col1 = t2.col1 AND
t2.col1 = t3.col1 ;
Tables t1, t2 and t3 are joined in a chain order.
Tables t1,t2,t3 can be joined in n! i.e n factorial way .
where n is number of tables .
So in This case it will be 3*2*1=6
Possible join orders:
t1->t2->t3, t2->t1->t3, t2->t3->t1, t3->t2->t1
Note the Join orders such as t1->t3->t2, t3->t1->t2 are evaluated using
Cartesian product as there is no join predicate specified between t1 and t3.
Each of the join orders are possible and need to be evaluated for resource usage. As number of tables increase, depending on the join topology, the search space or the possible permutations go up steeply. Several techniques are used to prune the search space and reduce the work involved in identifying the best join order.
Evaluating Join Path Costs
For each join order, the optimizer evaluates the cost of performing the join by breaking the order into pairs. The first part is made of the relations already joined and the second part of the next table to be joined. Costs are considered using both join methods currently implemented in the kernel, the sort-merge join method and the nested-loops join method.
In the sort-merge method, the tables participating in the join are first sorted by the join columns. The sorted relations are then joined. The cost of performing join by this method includes the sorting costs and the cost of retrieving sorted records to perform the join.
In the nested-loops method, we scan the outer table row one at a time and for each row retrieved, we access the inner table based on the join columns. The outer table will actually be a joined relation where more than two tables are joined. The cost of performing the join in this method includes the cost of accessing the outer table and for each row retrieved and the cost of fetching the group of rows from the inner table based on the join columns.
For each table in a particular join order, there is a large number of possible access paths, the table can be accessed using a ROWID or could be accessed by doing a table scan. Alternatively, it may be accessed using a single index or a cluster scan or a merge of several single column no unique indexes.
Several rules are used to narrow down or prune the search space to speed up access path selection. For example, if there is a lookup by ROWID, it is considered the best and we don’t bother to evaluate other access paths open to the table.
Source := US Education Website