August 19, 2021

Understanding How Search Scores Work in Craft CMS

A user on the Craft CMS Discord recently asked why their search results were being scored in a way that seemed unintuitive to them. It became apparent to me that a) I wasn't totally sure how scoring works either, and b) the search documentation doesn't go into any detail about how the scores are actually calculated, nor could I find any other resources on the topic. So why not take a closer look and figure out what's really going on here?

The first thing I did was take a peek at Craft's underlying search scoring code to get a better idea of how it actually works. In a nutshell, the database query will return one row for each match on an element title, slug, or other field (element is going to mean entry here, in most cases). It will determine a score for each of those matches based on a number of factors, and then sum all the scores that are related to the same element to come up with the final score. Essentially this means that a single entry might have a search score of 76 which is actually a combined score of 70 for a match in the entry title, 5 for a match on its slug, and 1 for a match somewhere in its body text field.
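To make the "sum the rows per element" idea concrete, here's a rough sketch in Python. It isn't Craft's code: the row tuples, element IDs, and the total_scores function are all made up for illustration, and the scores are simply the 70 + 5 + 1 figures from the paragraph above.

```python
from collections import defaultdict

# Hypothetical (element_id, attribute, row_score) tuples. Element 12 mirrors
# the 70 + 5 + 1 example above; these are not real query results.
rows = [
    (12, "title", 70),
    (12, "slug", 5),
    (12, "body", 1),
    (15, "slug", 5),
]

def total_scores(rows):
    """Add up the scores of all matching rows that belong to the same element."""
    totals = defaultdict(int)
    for element_id, _attribute, row_score in rows:
        totals[element_id] += row_score
    return dict(totals)

scores = total_scores(rows)
print(scores)  # {12: 76, 15: 5}
```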

Here's some example code to output some search results and their scores:

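{# Hardcoded for quick testing; normally this comes from a search form #}
{% set searchTerm = 'example' %}
{# Fetch the matching entries, ordered by their search score #}
{% set results = craft.entries()
  .search(searchTerm)
  .orderBy('score')
  .all() %}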
<table>
  <thead>
    <tr>
      <th>Title</th>
      <th>Score</th>
    </tr>
  </thead>
  <tbody>
    {% for result in results %}
    <tr>
      <td>{{ result.title }}</td>
      <td>{{ result.searchScore }}</td>
    </tr>
    {% endfor %}
  </tbody>
</table>

To generate these scores, as mentioned above, each row is scored against each individual search term. In the example above we've only got one search term ("example"), but there could be several terms, or more complex ones. Obviously the search term would normally be entered into a search field by a user, but here we're hardcoding it for some quick testing.

It's also important to be aware of Craft's defaultSearchTermOptions, which you can override in your project's /config/general.php file. These will have a major impact on what appears in your search results and how they are scored. Of particular interest are subLeft and subRight, which determine whether characters to the left or right of the match in the same word are ignored or not. There's some more info on this here.

The bottom line is: do you want searching for "story" to match "storytime" and/or "history"? By default, Craft sets subLeft to false and subRight to true, so "story" will match "storytime" but not "history". If you set subLeft to true in your config file, it will also match "history". If you set both to false, it will match neither. Personally, I usually set both to true for my projects, but from what I understand this will somewhat negatively impact the performance of searches, although I'm not sure to what extent as I've never really encountered any issues.
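Here's a small Python model of that matching behavior. It's a sketch under an assumption of mine: that turning subLeft or subRight off behaves like requiring a space (a word boundary) on that side of the term. The count_matches function and its parameter names are hypothetical, not part of Craft.

```python
def count_matches(term, keywords, sub_left=False, sub_right=True):
    """Count how often `term` occurs in `keywords`.

    Assumption: a False flag is modelled by requiring a space on that
    side of the term, so the term must start (or end) a word.
    """
    needle = term.lower()
    if not sub_left:
        needle = " " + needle
    if not sub_right:
        needle = needle + " "
    # Pad the haystack so that boundary checks also work at both ends.
    haystack = " " + " ".join(keywords.lower().split()) + " "
    return haystack.count(needle)

for word in ("storytime", "history"):
    default = count_matches("story", word)                     # Craft's defaults
    both_on = count_matches("story", word, sub_left=True)      # subLeft and subRight true
    both_off = count_matches("story", word, sub_right=False)   # subLeft and subRight false
    print(word, default, both_on, both_off)
```

With the defaults only "storytime" matches, with both flags on both words match, and with both off neither does.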

Back to scoring, let's take a look at how terms are scored against rows.

Row Scoring

See: _scoreRow(). For each database row returned by the search query:

  1. Each search term linked by an AND (which just refers to a multi-word search term like "story book", which is treated as two terms: "story" AND "book") is scored against the row individually. These terms each have a weight of 1.
  2. Each group of search terms linked by an OR has its terms scored against the row, and each term within the group is weighted less the more terms there are in the group. For example, if I've got the search term "novel OR book OR story", there are three terms in the group, so each term's score is weighted at 1/3 (roughly 0.333).
  3. The scores of the AND and OR terms above are added together to produce the final score for the row.
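The three steps above can be sketched in Python like this. It's a simplified model, not Craft's code: score_row, demo_score_term, and the unweighted term scores in raw are all hypothetical, chosen to represent a single title row for a query with one AND term ("story") and one OR group ("word OR time").

```python
def score_row(and_terms, or_groups, score_term):
    """Score one row: AND terms get weight 1, OR terms get 1 / group size."""
    score = 0.0
    for term in and_terms:
        score += score_term(term, 1.0)
    for group in or_groups:
        weight = 1 / len(group)  # more terms in the group means less weight each
        for term in group:
            score += score_term(term, weight)
    return score

# Hypothetical unweighted scores of each term against one title row.
raw = {"story": 50, "word": 0, "time": 50}

def demo_score_term(term, weight):
    """Stand-in for the real term scoring: just applies the weight."""
    return raw.get(term, 0) * weight

title_row = score_row(["story"], [["word", "time"]], demo_score_term)
print(title_row)  # 75.0
```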

Term Scoring

See: _scoreTerm(). Regardless of whether it's an AND or OR situation, when an individual term is being scored against a row, these are the rules followed:

  1. The term is normalized (meaning that special and accented characters are simplified, everything is lowercased, dashes are removed from slugs etc).
  2. If the search is looking for exact terms only, it will return a score of 0 immediately if there isn't an exact match between the term and the row.
  3. The number of matches is determined, meaning the number of times the term appears in the row (in simple cases, this is probably going to be 1).
  4. If there's an exact match between term and row, the modifier is set to 100.
  5. Otherwise, if subLeft or subRight is true, the modifier is set to 10. If both are false, it's set to 50.
  6. If the row is a title, the modifier is multiplied by 5.
  7. The term's score is calculated by dividing the number of matches in the row by the row word count and then multiplying by the modifier and weight (remember the weight will be 1 for AND terms, and (1 / number of terms) for terms in OR term groups). The final formula for the term ends up looking like this:
Score = ( Matches / Word Count) x Modifier x Weight
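Putting those rules together, here's a minimal Python model of the term scoring. It follows the steps as described above rather than reproducing Craft's PHP. Normalization is reduced to lowercasing and turning dashes into spaces, the "exact terms only" case is left out, and the space-based handling of subLeft and subRight is my assumption. The score_term function here is a hypothetical stand-in, not the real _scoreTerm().

```python
def score_term(term, keywords, attribute, weight=1.0,
               sub_left=False, sub_right=True):
    """Score = (matches / word count) x modifier x weight."""
    # Simplified normalization: lowercase, dashes become spaces.
    words = keywords.lower().replace("-", " ").split()
    text = " ".join(words)
    needle = term.lower()

    # Assumption: a False flag requires a word boundary on that side.
    padded = needle
    if not sub_left:
        padded = " " + padded
    if not sub_right:
        padded = padded + " "
    matches = (" " + text + " ").count(padded)
    if matches == 0:
        return 0.0

    if text == needle:              # exact match between term and row
        modifier = 100
    elif sub_left or sub_right:     # substring matching is allowed
        modifier = 10
    else:
        modifier = 50
    if attribute == "title":        # titles count five times as much
        modifier *= 5

    return (matches / len(words)) * modifier * weight

print(score_term("story", "Story", "title"))     # 500.0
print(score_term("story", "Our Story", "title")) # 25.0
print(score_term("story", "our-story", "slug"))  # 5.0
```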

Example 1

Let's try an example. Imagine we've got a search term of "story" where the only result is an entry titled "Our Story" with a slug of "our-story". We get two result rows, one for the title and one for the slug. Both are for the same element (the entry), so they'll be added together to find the final score for that entry.

For the title row ("Our Story"):

  1. The row value is normalized to "our story".
  2. The number of matches is 1 because the term "story" appears once in the row.
  3. There is only one term, so it has a weight of 1.
  4. The row has a word count of 2.
  5. The modifier is 10 because subRight is true, and is then multiplied by 5 because the row is an entry title, giving 50.
Score = ( Matches / Word Count) x Modifier x Weight
25    = ( 1       / 2         ) x 50       x 1

For the slug row ("our-story"):

  1. The row value is normalized to "our story".
  2. The number of matches is 1 because the term "story" appears once in the row.
  3. There is only one term, so it has a weight of 1.
  4. The row has a word count of 2.
  5. The modifier is 10 because subRight is true.
Score = ( Matches / Word Count) x Modifier x Weight
5     = ( 1       / 2         ) x 10       x 1

So the total score for all rows matching the element will be 25 + 5 = 30, which will be the final score for the "Our Story" entry.

If the slug was something else, like "about", then the final score for the entry would be 25 instead, because the slug result row wouldn't be present if there were no matches there. If there were other fields on the entry that had the "Use this field’s values as search keywords" checkbox checked and those fields' values also contained the word "story", we'd also get additional result rows for them, which would increase the entry's score even further.

[Screenshot: the searchable checkbox] Aside from title and slug, only fields with this checkbox checked will be searched.

Also of note is that if the entry was titled "Storytime" instead and had a slug of "storytime", then its score would be doubled (60). This is because the word count is just one which means we're not dividing the number of matches by 2.
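If you want to check the arithmetic for these three variations yourself, here's a tiny Python version of the formula. The row_score helper and the constant names are just for illustration. The modifiers assume a partial match with subRight enabled.

```python
def row_score(matches, word_count, modifier, weight=1):
    """Score = (matches / word count) x modifier x weight."""
    return matches / word_count * modifier * weight

TITLE_MODIFIER = 10 * 5   # partial match, multiplied by 5 for titles
SLUG_MODIFIER = 10        # partial match on a slug

# "Our Story" with slug "our-story": title row plus slug row.
our_story = row_score(1, 2, TITLE_MODIFIER) + row_score(1, 2, SLUG_MODIFIER)
# Same title, but the slug is "about": no slug row at all.
about_slug = row_score(1, 2, TITLE_MODIFIER)
# "Storytime" with slug "storytime": the word count drops to 1.
storytime = row_score(1, 1, TITLE_MODIFIER) + row_score(1, 1, SLUG_MODIFIER)

print(our_story, about_slug, storytime)  # 30.0 25.0 60.0
```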

Let's look at this search query for "story" when it has a few more results (the slugs are all auto-generated here and are identical in word count to the titles):

Entry Title                                               Score
Story                                                     600
Storytime                                                 60
History                                                   60
Our Story                                                 30
A timely story of history                                 24
A single story in a very long title with lots of words    4
An entry containing the word in a non-title field         1

The entry titled "Story" gets the best score because its title and slug are exact matches for the search term "story". The "Storytime" and "History" entries get the same score because they're single word titles (reminder: "History" only appears because I have subLeft set to true). "Our Story" gets 30 for the reasons outlined previously. "A timely story of history" scores a little lower due to the added word count despite containing two matches for "story". An entry with a longer title that contains the search term gets a much lower score of 4 because of the division by word count. The last item only has a match for "story" in one of its other fields - but note that if this field contains only one match and more than 10 words it will end up getting rounded down to a score of 0 (though it will still appear in search results).
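The title-based scores in that table can be reproduced with a short Python sketch. It rests on two assumptions: subLeft and subRight are both true (so a plain substring count is enough), and the fractional part of each row's score is dropped before the rows are added together. The second assumption is only inferred from the numbers above and isn't something I've confirmed in Craft's code. The function names are made up.

```python
def row_score(term, keywords, is_title):
    """Score one row with subLeft and subRight both true."""
    words = keywords.lower().split()
    text = " ".join(words)
    matches = text.count(term)  # plain substring count
    if matches == 0:
        return 0
    modifier = 100 if text == term else 10
    if is_title:
        modifier *= 5
    # Assumption: the fractional part of each row's score is dropped.
    return matches * modifier // len(words)

def entry_score(term, title):
    # The auto-generated slugs contain the same words as the titles here,
    # so the title text is reused for the slug row.
    return row_score(term, title, True) + row_score(term, title, False)

titles = [
    "Story",
    "Storytime",
    "History",
    "Our Story",
    "A timely story of history",
    "A single story in a very long title with lots of words",
]
scores = {title: entry_score("story", title) for title in titles}
for title, score in scores.items():
    print(f"{score:>4}  {title}")
```

The last row of the table is left out because its score depends on the word count of a field that isn't shown.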

Example 2

Now let's try a slightly more advanced search with the same set of entries as above. Our search term is now "story word OR time" which can be functionally read as "story AND (word OR time)". This is going to mean that "story" will have a weight of 1, and "word" and "time" will both have a weight of 0.5 since there are two terms in that OR group. Our results will end up being:

Entry Title                                               Score
Storytime                                                 90
A timely story of history                                 30
A single story in a very long title with lots of words    7

Let's examine how each of these three entries got their scores. We've got a lot of data here, so I'm going to throw the values into tables below. Just remember that this is the formula we're using:

Score = ( Matches / Word Count) x Modifier x Weight

Storytime

Slug: "storytime"

Row    Term   Score   = ( Matches / Word Count ) x Modifier x Weight
Title  story  50          1         1              50         1
Title  word   0           0         1              50         0.5
Title  time   25          1         1              50         0.5
Slug   story  10          1         1              10         1
Slug   word   0           0         1              10         0.5
Slug   time   5           1         1              10         0.5
Total Score: 90

A timely story of history

Slug: "a-timely-story-of-history"

Row    Term   Score   = ( Matches / Word Count ) x Modifier x Weight
Title  story  20          2         5              50         1
Title  word   0           0         5              50         0.5
Title  time   5           1         5              50         0.5
Slug   story  4           2         5              10         1
Slug   word   0           0         5              10         0.5
Slug   time   1           1         5              10         0.5
Total Score: 30

A single story in a very long title with lots of words

Slug: "a-single-story-in-a-very-long-title-with-lots-of-words"

Row    Term   Score   = ( Matches / Word Count ) x Modifier x Weight
Title  story  4.1667      1         12             50         1
Title  word   2.0833      1         12             50         0.5
Title  time   0           0         12             50         0.5
Slug   story  0.8333      1         12             10         1
Slug   word   0.4167      1         12             10         0.5
Slug   time   0           0         12             10         0.5
Total Score: 7
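As a cross-check, here's the same calculation in Python, using the match counts and word counts from the three tables. The dictionaries and the entry_total function are only illustrative. Note that the last entry comes out at 7.5 before any rounding. Dropping the fraction from each row's score before summing (6 for the title row plus 1 for the slug row) would give the 7 shown, but that's my assumption rather than something confirmed in Craft's code.

```python
WEIGHTS = {"story": 1.0, "word": 0.5, "time": 0.5}  # story AND (word OR time)
MODIFIERS = {"title": 10 * 5, "slug": 10}           # partial match; titles get 5x

# (word count, matches per term), copied from the tables above.
ENTRIES = {
    "Storytime": (1, {"story": 1, "word": 0, "time": 1}),
    "A timely story of history": (5, {"story": 2, "word": 0, "time": 1}),
    "A single story in a very long title with lots of words":
        (12, {"story": 1, "word": 1, "time": 0}),
}

def entry_total(word_count, matches):
    """Sum (matches / word count) x modifier x weight over both rows and all terms."""
    total = 0.0
    for modifier in MODIFIERS.values():
        for term, weight in WEIGHTS.items():
            total += matches[term] / word_count * modifier * weight
    return total

totals = {title: round(entry_total(*data), 4) for title, data in ENTRIES.items()}
for title, total in totals.items():
    print(f"{total:>6}  {title}")
```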

There are plenty of more complex search queries you can use in Craft, but chances are pretty good that your end users aren't going to use any of them, so I'm not going to dig into those.

Key Takeaways

  1. Titles are the most important field for search result scoring.
  2. Don't remove important keywords from slugs as this is another way to score highly.
  3. Titles, slugs, and field values with fewer words will always be stronger matches to individual search terms.
  4. Search keywords in body copy fields with high word counts will probably be rounded out of existence unless the keyword appears multiple times.
  5. You might consider adding a "Search Keywords" field to your entries that doesn't render on the front-end but gives you a place to put in some terms that you really want the entry to appear for when they're searched. Note that this won't help you with SEO as it's only for the internal Craft search. This method has been useful for me in the past, but one thing I've learned in writing up this post is that the more words you stuff into the field, the less impact it will have on the entry's score for any of the individual search terms.
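That last point is easy to see with a quick Python calculation. This sketch assumes a non-title field, a single partial match, and the modifier of 10 that applies when subRight is enabled. The keyword_field_score function is a hypothetical helper, and the result is the unrounded row score.

```python
def keyword_field_score(matches, word_count, modifier=10):
    """Unrounded score of one non-title row: (matches / word count) x modifier."""
    return matches / word_count * modifier

# One match for the search term, in keyword fields of increasing length.
dilution = {n: round(keyword_field_score(1, n), 2) for n in (2, 5, 10, 20, 40)}
for word_count, score in dilution.items():
    print(f"{word_count:>2} words -> {score}")
```

Every word added to the field lowers the score contributed by each individual keyword, so a short, focused list goes further than a long one.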

Hope that was helpful! As always, let me know if I've gotten any details wrong or you have any feedback - @GregorTerrill on Twitter!