Monday, April 11, 2011

How can I implement a boolean tag search in SQL?

Given a table of items, a table of tags and a join table between them, what is a good and efficient way to implement queries of the form:

p1 AND p2 AND ... AND pn AND NOT n1 AND NOT n2 ... AND NOT nk

I am using SQL. In other words: how do I find all items that have every tag p1...pn and none of the tags n1...nk?

Is there a good "standard" solution for this?
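The query form above is set containment: an item qualifies when its set of tags includes all of p1...pn and shares nothing with n1...nk. Below is a minimal Python sketch of that predicate, which is handy as a reference when checking what a SQL query should return. The dict-based data layout and the function names are hypothetical stand-ins for the items/tags/join-table schema, not part of the question.

```python
def matches(item_tags, include, exclude):
    """True if the item has every tag in include and no tag in exclude."""
    tags = set(item_tags)
    return set(include) <= tags and tags.isdisjoint(exclude)


def boolean_tag_search(items, include, exclude):
    """Return the sorted ids of items matching the boolean tag query.

    items is a hypothetical in-memory stand-in for the three tables:
    a dict mapping item id -> iterable of tag names.
    """
    return sorted(i for i, tags in items.items() if matches(tags, include, exclude))


items = {
    1: ["p1", "p2"],
    2: ["p1", "p2", "n1"],
    3: ["p1"],
    4: ["p1", "p2", "x"],
}
print(boolean_tag_search(items, ["p1", "p2"], ["n1"]))  # [1, 4]
```

Item 2 is rejected because it carries n1, and item 3 because it lacks p2.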

From Stack Overflow
  • I think this is what you're looking for:

    SELECT * FROM TABLE_NAME WHERE COLUMN1 IN ('value1','value2','value3') AND COLUMN1 NOT IN ('value4','value5','value6')
    

    If not, let me know. I may have misunderstood your question.

  • It depends on how you're storing tags in the database, but you probably want the IN operator:

    SELECT tag FROM myTags WHERE tag IN ('p1','p2',...)
    SELECT tag FROM myTags WHERE tag NOT IN ('p1','p2',...)
    
  • SELECT DISTINCT itemID FROM ItemsTags it, Tags t 
    WHERE it.tagID = t.ID AND t.tag IN ('p1','p2','p3') AND t.tag NOT IN ('p4','p5','p6')
    
  • It's difficult to say without knowing your schema, but something like this should work:

    select articles.article_id from articles
    inner join tag t1 on t1.article_id=articles.article_id and t1.tag='included_tag'
    inner join tag t2 on t2.article_id=articles.article_id and t2.tag='another_included_tag'
    left outer join tag t3 on t3.article_id=articles.article_id and t3.tag='dont_include_tag'
    left outer join tag t4 on t4.article_id=articles.article_id and t4.tag='also_dont_include_tag'
    where t3.tag_id is null and t4.tag_id is null
    

    Inner join to the tags that must be included, and do an anti-join (an outer join plus a WHERE clause requiring a non-nullable column of the joined table to be NULL) against the tags that must be excluded.

    Ropstah: Is this implementation faster than combining the two `IN` / `NOT IN` operators as vartec described? http://stackoverflow.com/questions/602849/how-can-i-implement-a-boolean-tag-search-in-sql/602870#602870
    ʞɔıu: If an article can have more than one tag, vartec's solution would not even work.
  • SELECT i.title
      FROM items i
     WHERE EXISTS(SELECT * FROM join_table j JOIN tags t ON t.id = j.tag_id WHERE j.item_id = i.id AND t.name = 'tag1')
       AND NOT EXISTS(SELECT * FROM join_table j JOIN tags t ON t.id = j.tag_id WHERE j.item_id = i.id AND t.name = 'tag2')
    

    SQL Server handles this construct well, but Oracle may need some hints to get a good plan (at least it did 5 years ago).
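To make ʞɔıu's point concrete, the sketch below runs the single-row IN / NOT IN filter, an anti-join, and the EXISTS / NOT EXISTS form against a small in-memory SQLite database. Table and column names follow the EXISTS answer (`items`, `tags`, `join_table`, `t.name`); the sample rows and the helper names `in_not_in`, `anti_join`, and `exists_search` are hypothetical. The anti-join here joins a per-tag subquery instead of the denormalized `tag` table used in the answer above, so it is an adaptation of that technique rather than its exact query. Tag names are bound as `?` parameters, and one join or EXISTS clause is generated per tag.

```python
import sqlite3

# Hypothetical schema and sample data (items, tags, join table), in memory.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE join_table (item_id INTEGER, tag_id INTEGER,
                         PRIMARY KEY (item_id, tag_id));
INSERT INTO items VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO tags VALUES (1, 'p1'), (2, 'p2'), (3, 'n1');
-- item 1: p1, p2    item 2: p1, p2, n1    item 3: p1
INSERT INTO join_table VALUES (1, 1), (1, 2), (2, 1), (2, 2), (2, 3), (3, 1);
""")

# Ids of items carrying one named tag; the name is bound through "?".
TAGGED = ("SELECT j.item_id FROM join_table j "
          "JOIN tags t ON t.id = j.tag_id WHERE t.name = ?")


def ids(sql, params):
    return [row[0] for row in conn.execute(sql, params)]


def placeholders(n):
    return ",".join(["?"] * n)


def in_not_in(include, exclude):
    # Each row is one (item, tag) pair, so IN acts like OR: NOT an AND search.
    sql = ("SELECT DISTINCT j.item_id FROM join_table j "
           "JOIN tags t ON t.id = j.tag_id "
           f"WHERE t.name IN ({placeholders(len(include))}) "
           f"AND t.name NOT IN ({placeholders(len(exclude))}) "
           "ORDER BY j.item_id")
    return ids(sql, [*include, *exclude])


def anti_join(include, exclude):
    # Inner join per required tag; LEFT JOIN + IS NULL per forbidden tag.
    joins = [f"JOIN ({TAGGED}) p{k} ON p{k}.item_id = i.id"
             for k in range(len(include))]
    joins += [f"LEFT JOIN ({TAGGED}) n{k} ON n{k}.item_id = i.id"
              for k in range(len(exclude))]
    nulls = [f"n{k}.item_id IS NULL" for k in range(len(exclude))]
    sql = (f"SELECT i.id FROM items i {' '.join(joins)} "
           f"WHERE {' AND '.join(nulls) or '1 = 1'} ORDER BY i.id")
    return ids(sql, [*include, *exclude])


def exists_search(include, exclude):
    # One correlated EXISTS per required tag, one NOT EXISTS per forbidden tag.
    sub = TAGGED + " AND j.item_id = i.id"
    conds = ([f"EXISTS ({sub})"] * len(include)
             + [f"NOT EXISTS ({sub})"] * len(exclude))
    sql = ("SELECT i.id FROM items i "
           f"WHERE {' AND '.join(conds) or '1 = 1'} ORDER BY i.id")
    return ids(sql, [*include, *exclude])


print(in_not_in(["p1", "p2"], ["n1"]))      # [1, 2, 3]
print(anti_join(["p1", "p2"], ["n1"]))      # [1]
print(exists_search(["p1", "p2"], ["n1"]))  # [1]
```

Although item 2 carries n1 and item 3 lacks p2, the single IN / NOT IN filter still returns both, because it tests each (item, tag) row on its own. The anti-join and EXISTS forms return only item 1. Which of those two is faster depends on the database's optimizer, as the last answer notes.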
