Wednesday, 4 November 2020

Performing Phrase Searches with a FULLTEXT Index

 

Problem

You want to perform a FULLTEXT search for a phrase, that is, for words that occur adjacent to each other and in a specific order.

Solution

Use the FULLTEXT phrase search capability, or combine a non-phrase FULLTEXT search with regular pattern matching.

Discussion

To find records that contain a particular phrase, you can’t use a simple FULLTEXT search:

mysql> SELECT COUNT(*) FROM kjv
    -> WHERE MATCH(vtext) AGAINST('still small voice');
+----------+
| COUNT(*) |
+----------+
|      548 |
+----------+

The query returns a result, but it’s not the result you’re looking for. A FULLTEXT search computes a relevance ranking based on the presence of each word individually, no matter where it occurs within the vtext column, and the ranking will be nonzero as long as any of the words are present. Consequently, this kind of query tends to find too many records.

As of MySQL 4.0.2, FULLTEXT searching supports phrase searching in Boolean mode. To use it, just place the phrase within double quotes.

mysql> SELECT COUNT(*) FROM kjv
    -> WHERE MATCH(vtext) AGAINST('"still small voice"' IN BOOLEAN MODE);
+----------+
| COUNT(*) |
+----------+
|        1 |
+----------+

Prior to 4.0.2, a workaround is necessary. You could use an IN BOOLEAN MODE search to require each word to be present, but that doesn’t really solve the problem, because the words can still occur in any order:

mysql> SELECT COUNT(*) FROM kjv
    -> WHERE MATCH(vtext)
    -> AGAINST('+still +small +voice' IN BOOLEAN MODE);
+----------+
| COUNT(*) |
+----------+
|        3 |
+----------+

If you use a SQL pattern match instead, it returns the correct result:

mysql> SELECT COUNT(*) FROM kjv
    -> WHERE vtext LIKE '%still small voice%';
+----------+
| COUNT(*) |
+----------+
|        1 |
+----------+

However, using a SQL pattern match is likely to be slower than a FULLTEXT search. So it seems you have the unpleasant choice of using a method that is faster but doesn’t produce the desired results, or one that works properly but is slower. Fortunately, those aren’t your only options. You can combine both methods in the same query:

mysql> SELECT COUNT(*) FROM kjv
    -> WHERE MATCH(vtext) AGAINST('still small voice')
    -> AND vtext LIKE '%still small voice%';
+----------+
| COUNT(*) |
+----------+
|        1 |
+----------+

What this gains you is the best of both types of matching:

  • Using the MATCH( ) expression, MySQL can perform a FULLTEXT search to produce a set of candidate rows that contain words in the phrase. This narrows the search considerably.

  • Using the SQL pattern test, MySQL can search the candidate rows to produce only those records that have all the words arranged in the proper order.

This technique will fail if all the words are less than the indexing engine’s minimum word length or occur in more than half the records. In that case, the FULLTEXT search returns no records at all. You can find the records using a SQL pattern match.

No comments:

Post a Comment

Note: only a member of this blog may post a comment.

Blog Archive