Friday, April 8, 2011

Are all these SQL joins logically equivalent?

Hi,

I'm just wondering if all of the following joins are logically equivalent, and if not, why not?

SELECT t1.x, t2.y from t1, t2 where t1.a=t2.a and t1.b=t2.b and t1.c = t2.c;

SELECT t1.x, t2.y from t1 join t2 on t1.a=t2.a where t1.b=t2.b and t1.c = t2.c;

SELECT t1.x, t2.y from t1 join t2 on t1.a=t2.a and t1.b=t2.b where t1.c = t2.c;

SELECT t1.x, t2.y from t1 join t2 on t1.a=t2.a and t1.b=t2.b and t1.c = t2.c;

I guess my real question is: does combining "where" with "on" do something different from just ANDing multiple conditions together in "on"?

I work with MySQL, in case that makes a difference.

Thanks,

Ben
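To see the inner-join equivalence on concrete data, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, which is an assumption. The table contents are made up for illustration. The script runs the four queries above against the same hypothetical t1 and t2 and checks that they return the same rows. Matching output on one data set only illustrates the point; the underlying reason is that for an inner join, a condition in ON and the same condition in WHERE both filter the same set of row pairs.

```python
import sqlite3

# The four query forms from the question, copied verbatim.
QUERIES = [
    "SELECT t1.x, t2.y from t1, t2 where t1.a=t2.a and t1.b=t2.b and t1.c = t2.c",
    "SELECT t1.x, t2.y from t1 join t2 on t1.a=t2.a where t1.b=t2.b and t1.c = t2.c",
    "SELECT t1.x, t2.y from t1 join t2 on t1.a=t2.a and t1.b=t2.b where t1.c = t2.c",
    "SELECT t1.x, t2.y from t1 join t2 on t1.a=t2.a and t1.b=t2.b and t1.c = t2.c",
]


def make_sample_db():
    """Build hypothetical sample tables t1(a, b, c, x) and t2(a, b, c, y) in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (a INT, b INT, c INT, x TEXT)")
    conn.execute("CREATE TABLE t2 (a INT, b INT, c INT, y TEXT)")
    conn.executemany("INSERT INTO t1 VALUES (?, ?, ?, ?)", [
        (1, 1, 1, "x1"),  # matches a t2 row on a, b and c
        (1, 1, 2, "x2"),  # matches t2 rows on a and b only
        (2, 5, 5, "x3"),  # matches a t2 row on a only
        (3, 3, 3, "x4"),  # no t2 row with a = 3
    ])
    conn.executemany("INSERT INTO t2 VALUES (?, ?, ?, ?)", [
        (1, 1, 1, "y1"),
        (1, 1, 9, "y2"),
        (2, 6, 6, "y3"),
    ])
    return conn


def run_all(conn):
    """Run every query and return its rows, sorted so row order does not matter."""
    return [sorted(conn.execute(q).fetchall()) for q in QUERIES]


if __name__ == "__main__":
    results = run_all(make_sample_db())
    for q, rows in zip(QUERIES, results):
        print(rows, "<-", q)
    print("all equal:", all(r == results[0] for r in results))
```

Only the (x1, y1) pair satisfies all three equalities, so each form returns that single row.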

From Stack Overflow
  • They are logically equivalent and should produce the same result. However, the last one is preferable because it more accurately expresses the semantics of the query, i.e. "join tables t1 and t2".

    The WHERE clause should be used for "filtering" the results of the join, e.g.

    ... WHERE t2.some_col > 10
    

    Also, as Constantin has said in another answer, the four queries would not be equivalent if the join were an OUTER join.

    Gavin Miller : Would the first query fully execute the cross join, or are DBMSes smart enough to apply the WHERE clause first?
    Russ Cam : SQL Server 2000 and later will treat the first example as INNER JOINs. Test it and view the execution plan.
    Tony Andrews : The answer to that depends on the DBMS. I know that in Oracle the optimiser is smart enough to choose the same best execution plan in all 4 cases, but I don't know about other DBMSes like MySQL.
    Ben Blank : Most RDBMSes (MySQL included) will be able to optimize all four forms identically. However, the fourth form is still preferred, both because it's semantically clearer (it makes it more obvious that the conditions are part of the join) and because it provides "insurance" against naive RDBMSes.
  • For an INNER JOIN it makes no logical difference, and the optimizer should produce the same plans. But for OUTER joins it matters whether you put a condition in the WHERE clause or in the FROM ... JOIN ... ON clause, because the FROM and ON clauses are processed before the WHERE clause (see ANSI SQL logical query processing).

  • Yes, as others have stated, all these queries return the same result.

    FWIW, you can also use this shorthand syntax when you're doing an equi-join on column names that are the same in both tables:

    SELECT t1.x, t2.y from t1 join t2 using (a, b, c);
    

    As for optimization, all of them should be optimized the same way. That is, the RDBMS should be smart enough to analyze the WHERE syntax the same way and perform joins, instead of generating a huge intermediate cross-join result and then applying the filtering conditions to it. This is such a common type of query that it's also common for a given RDBMS implementation to recognize and optimize it.

    In the case of MySQL, join and where conditions are (kind of) evaluated together. Try using EXPLAIN to analyze your query. If the "type" column shows "eq_ref", MySQL reads one row from that table for each combination of rows from the previous tables, using a PRIMARY KEY or UNIQUE NOT NULL index. Apart from the system and const types, this is the best possible join type. If "type" is "ref", that's good too.

    You can get these join optimization types whether you put the condition in the JOIN...ON clause or the WHERE clause.

    IronGoofy : Is using 'using' standard SQL or is that specific to some vendor's SQL implementation?
    Bill Karwin : Yes, the USING syntax is standard SQL. I haven't encountered any brand of database that supports JOIN...ON, but does not support JOIN...USING. Note the parentheses are mandatory after USING, but optional after ON.
  • They are logically equivalent. However, in the logical processing model, where you put the join conditions affects how many records end up in the intermediate (temporary) table to which the where clause is applied. For example,

    If tables t1 and t2 had 10 records each, the statement,

    SELECT t1.x, t2.y from t1, t2 where t1.a=t2.a and t1.b=t2.b and t1.c = t2.c;
    

    results in 100 records (every combination of the two tables' rows, i.e. their Cartesian product), and then the where clause is applied.

    For

    SELECT t1.x, t2.y from t1 join t2 on t1.a=t2.a and t1.b=t2.b and t1.c = t2.c;
    

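    only the rows that satisfy the ON conditions (at most ten, if each t1 row matches at most one t2 row) are in the temporary table before any where clause (none in this case) is applied. By this model the second method should be much faster on large tables, although, as noted in the comments above, most optimizers (MySQL included) generate the same plan for both forms.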
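The outer-join caveat from the first two answers is easy to see on a small example. The sketch below again uses Python's sqlite3 module as a stand-in for MySQL. The tables t1(a, x) and t2(a, y, some_col) and their rows are hypothetical. The t2.some_col > 10 filter comes from the first answer. Each join type is run twice, once with the filter in ON and once with it in WHERE.

```python
import sqlite3


def make_sample_db():
    """Build hypothetical tables t1(a, x) and t2(a, y, some_col) in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (a INT, x TEXT)")
    conn.execute("CREATE TABLE t2 (a INT, y TEXT, some_col INT)")
    conn.executemany("INSERT INTO t1 VALUES (?, ?)",
                     [(1, "x1"), (2, "x2"), (3, "x3")])
    conn.executemany("INSERT INTO t2 VALUES (?, ?, ?)",
                     [(1, "y1", 20), (2, "y2", 5)])  # no t2 row for a = 3
    return conn


QUERIES = {
    # Inner join: the filter gives the same rows whether it sits in ON or WHERE.
    "inner, filter in ON":
        "SELECT t1.x, t2.y FROM t1 JOIN t2 ON t1.a = t2.a AND t2.some_col > 10",
    "inner, filter in WHERE":
        "SELECT t1.x, t2.y FROM t1 JOIN t2 ON t1.a = t2.a WHERE t2.some_col > 10",
    # Left join, filter in ON: applied while matching, so every t1 row is kept,
    # and rows without a qualifying match get NULL (None) for t2.y.
    "left, filter in ON":
        "SELECT t1.x, t2.y FROM t1 LEFT JOIN t2 ON t1.a = t2.a AND t2.some_col > 10",
    # Left join, filter in WHERE: applied after the join, so rows whose some_col
    # is not above 10 are dropped, including the NULL-extended row for a = 3
    # (NULL > 10 is not true). The result collapses to the inner join.
    "left, filter in WHERE":
        "SELECT t1.x, t2.y FROM t1 LEFT JOIN t2 ON t1.a = t2.a WHERE t2.some_col > 10",
}

conn = make_sample_db()
results = {name: sorted(conn.execute(sql).fetchall()) for name, sql in QUERIES.items()}
for name, rows in results.items():
    print(f"{name:24} {rows}")
```

With these rows, both inner-join forms and the left join with the filter in WHERE return only ('x1', 'y1'). The left join with the filter in ON also keeps x2 and x3 with None. This is why the placement of a condition matters only for outer joins.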
