Thursday, April 28, 2011

Django DB, finding Categories whose Items are all in a subset

I have a two models:

class Category(models.Model):
    pass

class Item(models.Model):
    cat = models.ForeignKey(Category)

I am trying to return all Categories for which all of that category's items belong to a given subset of item ids (fixed thanks). For example, all categories for which all of the items associated with that category have ids in the set [1,3,5].

How could this be done using Django's query syntax (as of 1.1 beta)? Ideally, all the work should be done in the database.

From stackoverflow
  • lets say you require all items to be in the following set:

    allowable_items = set([1,3,4])
    

    one bruteforce solution would be to check the item_set for every category as so:

    categories_with_allowable_items = [
        category for category in 
        Category.objects.all() if
        set([item.id for item in category.item_set.all()]) <= allowable_items
    ]
    

    but we don't really have to check all categories, as categories_with_allowable_items is always going to be a subset of the categories related to all items with ids in allowable_items... so that's all we have to check (and this should be faster):

    categories_with_allowable_items = set([
        item.category for item in
        Item.objects.select_related('category').filter(pk__in=allowable_items) if
        set([siblingitem.id for siblingitem in item.category.item_set.all()]) <= allowable_items
    ])
    

    if performance isn't really an issue, then the latter of these two (if not the former) should be fine. if these are very large tables, you might have to come up with a more sophisticated solution. also if you're using a particularly old version of python remember that you'll have to import the sets module

    Derek Dahmer : This would work, though ideally I am looking for a way to push all the work to the database.
  • I've played around with this a bit. If QuerySet.extra() accepted a "having" parameter I think it would be possible to do it in the ORM with a bit of raw SQL in the HAVING clause. But it doesn't, so I think you'd have to write the whole query in raw SQL if you want the database doing the work.

    EDIT:

    This is the query that gets you part way there:

    from django.db.models import Count
    Category.objects.annotate(num_items=Count('item')).filter(num_items=...)
    

    The problem is that for the query to work, "..." needs to be a correlated subquery that looks up, for each category, the number of its items in allowed_items. If .extra had a "having" argument, you'd do it like this:

    Category.objects.annotate(num_items=Count('item')).extra(having="num_items=(SELECT COUNT(*) FROM app_item WHERE app_item.id in % AND app_item.cat_id = app_category.id)", having_params=[allowed_item_ids])
    
    Derek Dahmer : In Django 1.1, they have added an annotate() function that will let you use the result of aggregation functions in filters. For example, Category.objects.annotate(num_items=Count('item')).filter(num_items=2) would return the categories that have 2 items. (just tested it)
    Carl Meyer : I'm well aware of the new annotate functionality (spent quite a while playing with that very query trying to answer this question). What it doesn't let you do is the correlated subquery to determine what should be in place of 2 (the number of items in allowed_items that link to each category).
  • Category.objects.filter(item__id__in=[1, 3, 5])
    

    Django creates the reverse relation ship on the model without the foreign key. You can filter on it by using its related name (usually just the model name lowercase but it can be manually overwritten), two underscores, and the field name you want to query on.

    AdamKG : True, but not the OP's question: he doesn't want Categories with *any* item ID in the set, but Categories for which *all* related items are in the item ID set.
    Jason Christa : In that case this might work: Category.objects.filter(item__id=1).filter(item__id=3).filter(item__id=5)

0 comments:

Post a Comment