Thursday, May 5, 2011

Recommend SQL data model for Semantic Network nodes?

We're building an RDBMS-based web site for a federal semantic network (RDF, Protege, etc). This is basically a large collection of nodes, each having a large and indefinite set of named relationships to (and from) other nodes.

My first thought is a single table for all the nodes (name, description, etc), plus one table per named relationship. Any better ideas out there?

From stackoverflow
  • No, that should be fine. Pay attention to the primary keys and indexes so that performance is good.

  • If you didn't have a single table for the nodes, you'd have to define a lot of relation tables. Each new node type would require a new relation table with every old node type. That could get out of hand quickly.

    So a single table sounds best. You can always use a 1:1 relation to extend it, if you need additional fields for certain node types.

  • On further reflection, two tables total might do: one for nodes (id, name, description), and another for relations (id, name, description, from, to), where from and to are ids in the nodes table (ints). Still on the right track?

    Lucero : I thought that you wanted two tables from the start, I misread your text as "plus one table for named relationships"... ;) Maybe make an edit, one table per relationship is not the way you want to go (except if you have few very specific relationship types with additional attributes).
    Brad Cox : Sorry about that. Thanks for the help!!
    Lucero : No need to be sorry, I was just reading too quickly...
  • If you're using SQL Server 2008, you might want to consider the new hierarchyid data type to store your hierarchy in. It's optimized for storage.

  • You could optimize the performance by creating 2 rows per relation.

    Let's say you have a table Items and a table Relations and that Person A has a relation with Person B. The Relations table has a left and right column, both referring to Items. Now, if you only have one row for this relation, and you want all relations for a certain Item, you would have a query looking like this:

    SELECT * FROM Relations WHERE LeftItemId = @ItemId OR RightItemId = @ItemId
    

    The OR in this query can ruin your performance, because the two conditions hit different columns and often cannot be answered with a single index seek. If you duplicate the row and switch the relation (left becomes right and vice versa), the query looks like this:

    SELECT * FROM Relations WHERE LeftItemId = @ItemId
    

    With an index on LeftItemId, this one will go blazingly fast.
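
The two-table layout and the mirrored-row idea from the answers above can be sketched together. This is only a minimal illustration using Python's built-in sqlite3 module, not the schema anyone in the thread actually used: the table and column names (nodes, relations, from_id, to_id), the helper functions, and the idea of storing an inverse relation name on the mirrored row are all assumptions made for the example.

```python
import sqlite3

# Hypothetical schema: all names here are illustrative, not taken from the thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (
        id          INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        description TEXT
    );
    CREATE TABLE relations (
        id          INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        from_id     INTEGER NOT NULL REFERENCES nodes(id),
        to_id       INTEGER NOT NULL REFERENCES nodes(id)
    );
    -- Index that serves lookups on the left-hand side of a relation.
    CREATE INDEX idx_relations_from ON relations (from_id);
""")


def add_relation(conn, name, inverse_name, a, b):
    """Store the relation twice: a -> b, plus the mirrored row b -> a.

    The inverse name is an assumption of this sketch; the thread only
    says to duplicate the row with left and right switched.
    """
    sql = "INSERT INTO relations (name, from_id, to_id) VALUES (?, ?, ?)"
    conn.execute(sql, (name, a, b))
    conn.execute(sql, (inverse_name, b, a))


def relations_of(conn, node_id):
    """All relations touching a node, using a single-column filter (no OR)."""
    rows = conn.execute(
        "SELECT name, to_id FROM relations WHERE from_id = ? ORDER BY id",
        (node_id,),
    )
    return rows.fetchall()


conn.executemany(
    "INSERT INTO nodes (id, name) VALUES (?, ?)",
    [(1, "Person A"), (2, "Person B"), (3, "Person C")],
)
add_relation(conn, "manages", "managed by", 1, 2)
add_relation(conn, "knows", "knows", 3, 1)

print(relations_of(conn, 1))
```

The price of the mirrored rows is twice the storage, and every insert, update, or delete has to touch both rows to keep them consistent, so it is worth wrapping those writes in a transaction.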
