Tuesday, May 3, 2011

SQL: efficiently look up many IDs coming from outside the DB

I need to look up thousands of rows by ID, but the IDs come from outside the DB, so I can't get them from other tables like so:

SELECT * FROM some_table WHERE ID IN ( SELECT KnownID FROM some_other_table WHERE someCondition )

Is there anything better than this:

SELECT * FROM some_table WHERE ID IN ('1001', '1002', '1003', ...)

Can I parametrize such a query where the number of id's isn't fixed? Or do I just end up concatenating a big string together each time?

(I'm using sqlite, btw, but I'm curious about other databases too.)

From stackoverflow
  • I would create a temp table with the variable values and do a "select were in".

    SquareCog : If you create a temp table you might as well use a join instead of an IN clause.
    Jegschemesch : I find it odd to think that would be faster. What would the creation query look like? Wouldn't that just result in a really big query string just the same?
    KM : select * FROM some_table s inner join temp_table t on s.key=t.key
    KM : also, on a simple query, it may run the same, but as you get more complex queries (derived tables, group by, etc), I'd bet that the join would be much faster than the IN
  • (I'm using sqlite, btw, but I'm curious about other db's too.)

    in sql server you could have a linked server and then just do a join between the two if it is on a different server. If it is on the same server but in different database then you would just do a regular join and add the dbname in from if the table and schema names

  • See my (accepted and implemented by the asker) answer here, to a very similar question.

  • what format are your input IDs arriving in?

    Are they manually typed in by a user? in a text/csv/xml file? pulled in from a service ?

    If you get them in bulk, you could insert them into a #temp table and then select against your #temp table... e.g.

    //PSEUDO CODE
    foreach(string s in ListOfIdsFromFile)
    {
        //INSERT INTO #TempTable (ID) Values (ConvertToInt32(s))
    }
    

    Then

    SELECT * FROM some_table a INNER JOIN #temp_table b ON a.ID = b.ID
    
    Jegschemesch : Doesn't the cost of creating the temp nullify the gain in the select?
    SquareCog : No, if you do a bulk load. Yes, if you actually do the foreach loop above.
  • SQL Server 2008 has a feature called Table Valued Parameters. See this article.

  • In SqlServer you can do something like this

    SELECT * FROM some_table WHERE ID IN (SELECT ID1 UNION ALL SELECT ID2  UNION ALL SELECT ID3  UNION ALL SELECT ID4  UNION ALL SELECT ID4 )
    
  • Read this article by Erland Sommarskog

0 comments:

Post a Comment