Creating a “did you mean” search function in Yii

Today we are going to look at how to create a very basic Google-style “did you mean” function for our search results. This tutorial uses the Yii framework, but the logic can be applied in any language or framework. As always, I’m happy to answer any questions you might have, so just mail me.

First up, there are a million different ways to do this kind of thing. We will be using PHP’s metaphone() function. In plain English, metaphone breaks a word down into its sounds: it drops the vowels (unless the word starts with one) and replaces certain letter patterns with single characters, for example “PH” becomes “F”.

Consider the following:

echo metaphone("accounting");

The result is AKKNTNK.
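Here is a small standalone check (plain PHP, no Yii needed) that shows why this helps catch misspellings. “fone” is just an illustrative misspelling of “phone”. Because metaphone turns “PH” into “F” and ignores vowels after the first letter, both spellings should reduce to the same key.

```php
<?php
// "fone" is an illustrative misspelling of "phone".
$correct = metaphone('phone');
$typo    = metaphone('fone');

echo $correct, ' ', $typo, "\n"; // FN FN
echo $correct === $typo ? "same key\n" : "different keys\n"; // same key
```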

The metaphone function accepts two arguments, the second of which is optional. The first is the string, in this case “accounting”. The second is the maximum number of phonemes to return, which in practice limits how many characters the returned key has. For example:

echo metaphone("accounting",4);

The result is AKKN.
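The fallback loop later in this tutorial relies on this length argument. A shorter key ignores the end of the word, so it matches more loosely. This sketch compares keys of length 7 down to 4 for “accounting” and “accountant” (the second word is only an illustration). Their full keys are AKKNTNK and AKKNTNT, so the two words differ at 7 characters but agree from 6 down.

```php
<?php
// Compare truncated metaphone keys of two related words, longest first.
foreach ([7, 6, 5, 4] as $length) {
    $a = metaphone('accounting', $length);
    $b = metaphone('accountant', $length);
    printf("%d: %s / %s %s\n", $length, $a, $b, $a === $b ? 'match' : 'no match');
}
```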

If the second argument is left out, the key for the whole word is returned. For this example, let’s pretend we are working with a product database with a structure something like:

  • id
  • product_name
  • product_description

We will be searching the product_description field for results. Next, we store every search query that users submit in a table called search_query, with the following structure:

  • id
  • search_term
  • metaphone_7
  • metaphone_6
  • metaphone_5
  • metaphone_4
  • frequency

Let me explain that structure. The id is self-explanatory, and the search_term field stores the “clean” version of the user’s query, like this:

 $query = strtolower(trim($query));

We use PHP’s trim() function to remove any spaces before and after the query, and strtolower() to make everything lower case. Next, we store the metaphone keys for 7, 6, 5 and 4 phonemes in their respective fields. The frequency is the number of times that query has been searched by users. Before creating a new row, we check whether a record for that query already exists, and if it does, we increment its frequency instead. The frequency value is very important, because we will use it later to make sure we only suggest terms that have been searched a minimum number of times. For our example, we will use 5 as the minimum frequency.
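Here is a minimal sketch of that bookkeeping in plain PHP. To keep it runnable on its own, it uses an array keyed by search term as a stand-in for the search_query table. It leaves out the id column, which the database would normally auto-increment. The function names buildSearchQueryRow() and recordSearch() are made up for illustration. In the Yii version you would do the same lookup and save through your ActiveRecord model.

```php
<?php
// In-memory stand-in for the search_query table, keyed by search_term.
// Function names are hypothetical; a real app would read and write the database.

/** Build a new search_query-style row for an already-cleaned search term. */
function buildSearchQueryRow(string $term): array
{
    $row = ['search_term' => $term];
    foreach ([7, 6, 5, 4] as $length) {
        $row['metaphone_' . $length] = metaphone($term, $length);
    }
    $row['frequency'] = 1; // first time this term has been searched
    return $row;
}

/** Record one search: bump the frequency if the term exists, else add a row. */
function recordSearch(array &$table, string $query): void
{
    $term = strtolower(trim($query)); // the "clean" version of the query
    if (isset($table[$term])) {
        $table[$term]['frequency']++;
    } else {
        $table[$term] = buildSearchQueryRow($term);
    }
}

$searchQueries = [];
recordSearch($searchQueries, '  Accounting ');
recordSearch($searchQueries, 'accounting');
print_r($searchQueries['accounting']);
```

Both calls end up on the same row because the query is cleaned before the lookup, so the second search only increments the frequency.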

So far, the flow goes like this:

  • The user enters a search term
  • We query the product table for rows whose product_description matches LIKE ‘%{search term}%’
  • If we find any matches, we show them to the user
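The sketch below is a rough, database-free illustration of the first step. It filters a hypothetical array of product rows the way LIKE '%term%' filters product_description. It uses stripos(), so matching is case-insensitive, which is how LIKE usually behaves under MySQL’s case-insensitive (_ci) collations. In Yii 1.1 the database version would typically build the condition with CDbCriteria::addSearchCondition('product_description', $query), which also escapes % and _ in the user’s input.

```php
<?php
// Hypothetical product rows standing in for the product table.
$products = [
    ['id' => 1, 'product_name' => 'Ledger', 'product_description' => 'Accounting software for small firms'],
    ['id' => 2, 'product_name' => 'Payday', 'product_description' => 'Payroll and tax tools'],
];

/** Rough in-memory equivalent of WHERE product_description LIKE '%term%'. */
function searchProducts(array $products, string $term): array
{
    return array_values(array_filter(
        $products,
        fn (array $p): bool => stripos($p['product_description'], $term) !== false
    ));
}

$matches = searchProducts($products, 'accounting');
echo count($matches), " match(es)\n"; // 1 match(es)
```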

If we don’t find a result, we do the following:

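$count = 7;
$result = null; // find() returns null when no row matches
while ($count >= 4 && $result === null) {
    // SearchQuery is assumed to be the ActiveRecord model for the
    // search_query table, which holds the metaphone_N columns.
    $result = SearchQuery::model()->find(array(
        // Bound parameter instead of string concatenation; only popular terms.
        'condition' => 'metaphone_' . $count . ' = :key AND frequency >= 5',
        'params'    => array(':key' => metaphone($query, $count)),
        'order'     => 'frequency DESC',
    ));
    $count--;
}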

This sets a counter that starts at 7 and is decremented by 1 on each pass of the loop. The loop stops as soon as a match is found or the counter drops below 4. On each pass we build a query that compares the metaphone({search query}, $count) result with the stored metaphone_$count value, and orders the matches by frequency (descending). Because we start with the longest key, the first match agrees with the query on the longest possible key, and ordering by frequency picks the most popular of those matches.
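Here is the same loop in plain PHP so the fallback is easier to experiment with. It runs over an in-memory array of search_query-style rows and uses the minimum frequency of 5 from earlier. sampleRow() and findSuggestion() are hypothetical helpers, and picking the highest frequency in a loop stands in for ORDER BY frequency DESC. The made-up query “accountin” has the key AKKNTN. It matches neither stored word at 7 characters but matches both at 6, where frequency decides.

```php
<?php
// In-memory sketch of the fallback loop (not Yii code). Each row mimics a
// search_query record; sampleRow() is a hypothetical helper for test data.
function sampleRow(string $term, int $frequency): array
{
    $row = ['search_term' => $term, 'frequency' => $frequency];
    foreach ([7, 6, 5, 4] as $n) {
        $row["metaphone_$n"] = metaphone($term, $n);
    }
    return $row;
}

/** Most frequent row whose key matches, trying key lengths 7 down to 4. */
function findSuggestion(array $rows, string $query, int $minFrequency = 5): ?array
{
    for ($count = 7; $count >= 4; $count--) {
        $key = metaphone($query, $count);
        $best = null;
        foreach ($rows as $row) {
            $matches = $row["metaphone_$count"] === $key
                && $row['frequency'] >= $minFrequency;
            if ($matches && ($best === null || $row['frequency'] > $best['frequency'])) {
                $best = $row;
            }
        }
        if ($best !== null) {
            return $best; // stop at the longest key length that matches
        }
    }
    return null;
}

$rows = [
    sampleRow('accounting', 12),
    sampleRow('accountant', 8),
];
$suggestion = findSuggestion($rows, 'accountin');
echo $suggestion['search_term'] ?? 'no suggestion', "\n"; // accounting
```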

But we are not finished yet. There is no point asking a user “did you mean…” if the suggestion returns no results either. So before showing the suggestion to the user, we query the product table again using the $result->search_term value we got from our query.

If that search does return products, we can ask the user “did you mean…”. One last thing to do is delete the search term the user just entered from the search_query table. It was most likely a misspelling, so there is no need to store it.
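The final decision can be sketched the same way. The hypothetical helper didYouMean() takes the row found by the fallback loop and only suggests it if searching for it actually returns products. It then deletes the user’s query from an in-memory stand-in for search_query. The tutorial doesn’t say whether to delete the query when no suggestion is shown, and this sketch only deletes it when it makes a suggestion. The rows omit the metaphone columns because this step doesn’t use them.

```php
<?php
// Hypothetical stand-ins for the product and search_query tables.
function searchProducts(array $products, string $term): array
{
    return array_values(array_filter(
        $products,
        fn (array $p): bool => stripos($p['product_description'], $term) !== false
    ));
}

/**
 * Decide whether to show "did you mean" for $query, given the row returned by
 * the metaphone fallback ($suggestion may be null). Returns the term to suggest
 * or null; when suggesting, the probably-misspelled query is removed from $rows.
 */
function didYouMean(array $products, array &$rows, string $query, ?array $suggestion): ?string
{
    if ($suggestion === null) {
        return null;
    }
    // Only suggest a term that actually finds products.
    if (!searchProducts($products, $suggestion['search_term'])) {
        return null;
    }
    unset($rows[$query]); // delete the stored misspelling
    return $suggestion['search_term'];
}

$products = [['id' => 1, 'product_description' => 'Accounting software for small firms']];
$rows = [
    'accounting'  => ['search_term' => 'accounting', 'frequency' => 12],
    'accountting' => ['search_term' => 'accountting', 'frequency' => 1],
];
$suggestion = $rows['accounting']; // what the fallback loop would return here

$term = didYouMean($products, $rows, 'accountting', $suggestion);
echo $term === null ? "No suggestion\n" : "Did you mean: $term?\n";
```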

That’s it, really. It’s a fairly complicated tutorial, so feel free to mail me if you get stuck.