Sunday, May 1, 2011

How to create autosuggest to get keywords as fast as google search or live search

I am creating auto suggest function in my website search box, every time the user press a new key, javascript call a webservice on the server side to get 10 most relevant keywords from the db and gives to the javascript again, and javascript populate the search autosuggest list.

My function is not too slow, but comparing to what live.com or google.com doing its veryyyy slow, i tested them and really i feel that they are getting the keywords from my PC not from their servers.

How they get keywords very fast like this, and sure they have million million times my keywords?
there is a famous style to do like this?
Also using my firebug, i found that they are not calling webservice "may be calling by a way i am not aware", but i found in the Net tab that a new get is happening.

From stackoverflow
  • The first thing I would suggest is to make sure your web service is caching they keywords in memory rather than hitting the DB each time - assuming of course that your data set is small enough to do this.

    Other than that, you would have to parallelize the query somehow across multiple servers, which is probably way more complicated than you want to get.

  • First, you should probably re-phrase your question. Any answer to "how can I be as fast as google" is bound to be "get used to disappointment."

    Given that, you can still make your solution better. It seems like every keypress, you do a round trip to a service and to a database. I doubt google is going that. Perhaps you should focus on doing less round trips (cahcing, bring back more when you do have to go to the DB, etc).

  • Not sure where you're looking, but certainly on live.com I get a request for each letter:

    Firbug Net Console - AutoComplete

    As you can see, there's very little coming back across the wire - 500B - that's what you're aiming for - a lean web service that returns the minimum you need to display that back to the user.

    Then on top of that, as the others have said, cache previous responses, etc.

    Not that the results often aren't alphabetical, and so if you don't display your ordering criteria, you can work on the principle of "Something now is better than completely accurate later".

  • Rather than making a request every keypress, what if you just make a request every certain amount of time if there has been a keypress in the interim? If you did something like 100ms intervals, it would still look "instant" but could potentially be much less load on your server. Also, are you having the client cache the keywords? If the user backspaces in the search field, it shouldn't have to re-contact the server to get the keywords. Also, you can immediately filter your current list of keywords on every keypress without contacting the server (you'll just end up with less than 10 of them because some/all of the ones you already have will not contain the letter just typed). That can fill in the "gaps" between actual requests for data to make it seem more instant.

    Amr ElGarhy : Can you explain "make a request every certain amount of time" idea more?, seams something smart, but i feel that i can't understand it perfect.
    rmeador : the idea is to limit requests to at most every 100ms. in pseudocode, instead of "on_keypress: make request", do something like this: "on_keypress: if timer is running, return. else set timer to make request 100ms from now". The request the timer makes should use whatever the user has typed at that point, not what the user had typed when the timer was started (not sure how to do this in JS, since it may be a threading thing).
  • There's no reason to request search terms with every keypress. Google does it (1) because they can, and (2) because they're presenting terms across the corpus of the Internet.

    In most web applications, there are a much smaller number of "common" search terms -- usually no more than a hundred or so, and as few as a dozen that are context appropriate.

    You can retrieve the entire set of relevant terms and build a prefix map on the client side at the time the page loads. By matching the current search term against this prefix map, you can give suggestions far faster that Google.

    The limitation is that, at some point, you'll run out of suggested terms. But again, that's really not an issue: even Google runs out of suggestions for "transnormative" (a made-up word, but there are 191 results from a full search).

  • As someone suggested above using ETags with REST api could mean some additional caching for repeated queries . Check out this article on Jeo Gregorio's blog for more info etags in REST context.

  • There are two main things that you can do:

    1. Use as much caching as possible in the server side. After all, the search queries follow the power-law. There are few queries with many requests and many-many queries with very few requests each. A perfect environment for caching
    2. You would need to minimize the amount of data that is transmitted over and one way to do this is through the use of the radix tree. If you need to transmit a list of 20 strings all sharing a common prefix then you don't need to transmit 20 separate strings. You can transmit the prefix once and then the 20 different parts.
  • Found this Blog post which talk in details in this point:

    Autocomplete Data Structures

0 comments:

Post a Comment