Full-text search
Full-Text Search in MySQL server lets users run full-text queries against character-based data in MySQL tables. You must create a full-text index on the table before you run full-text queries on a table. The full-text index can include one or more character-based columns in the table.
- FULLTEXT is the index type of full-text index in MySQL.
- InnoDB or MyISAM tables use Full-text indexes.
- Full-text indexes can be created only for VARCHAR, CHAR or TEXT columns.
- A FULLTEXT index definition can be given in the CREATE TABLE statement or can be added later using ALTER TABLE or CREATE INDEX.
- Large data sets without FULLTEXT index is much faster to load data into a table than to load data into a table which has an existing FULLTEXT index. Therefore create the index after loading data.
MySQL Version: 5.6
Syntax:
MATCH (col1,col2,col3...) AGAINST (expr [search_modifier])
- col1, col2, col3 - Comma-separated list that names the columns to be searched
- AGAINST() takes a string to search, and an optional modifier that indicates what type of search to perform.
- The search string must be a string value. The value is constant during query evaluation.
There are three types of full-text searches :
- Natural Language Full-Text Searches
- Boolean Full-Text searches
- Query expansion searches
Note: Some words are ignored in full-text searches.
- The minimum length of the word for full-text searches as of follows :
- Three characters for InnoDB search indexes.
- Four characters for MyISAM search indexes.
- Stop words are words that are very common such as 'on', 'the' or 'it', appear in almost every document. These type of words are ignored during searching.
Natural Language Full-Text Searches
Natural language full-text search interprets the search string as a free text (natural human language) and no special operators are required. Full-text searches are natural language searches if the IN NATURAL LANGUAGE MODE modifier (see the following syntax) is given or not. MATCH() function searches a string against a text collection (A set of one or more columns included in a FULLTEXT index.). For each row in the table, MATCH() returns a relevance value; that is, a similarity measure between the search string (given as the argument to AGAINST() function) and the text in that row in the columns named in the MATCH() list.
The basic format of a natural Language null-text searches mode query is as follows:
Code:
Example
mysql> CREATE TABLE tutorial ( id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY, title VARCHAR(200), description TEXT, FULLTEXT(title,description) ) ENGINE=InnoDB; Query OK, 0 rows affected (2.40 sec)
Let insert some records in tutorial table:
mysql>INSERT INTO tutorial (title,description) VALUES ('SQL Joins','An SQL JOIN clause combines rows from two or more tables. It creates a set of rows in a temporary table.'), ('SQL Equi Join','SQL EQUI JOIN performs a JOIN against equality or matching column(s) values of the associated tables. An equal sign (=) is used as comparison operator in the where clause to refer equality.'), ('SQL Left Join','The SQL LEFT JOIN, joins two tables and fetches rows based on a condition, which is matching in both the tables and the unmatched rows will also be available from the table before the JOIN clause.'), ('SQL Cross Join','The SQL CROSS JOIN produces a result set which is the number of rows in the first table multiplied by the number of rows in the second table, if no WHERE clause is used along with CROSS JOIN.'), ('SQL Full Outer Join','In SQL the FULL OUTER JOIN combines the results of both left and right outer joins and returns all (matched or unmatched) rows from the tables on both sides of the join clause.'), ('SQL Self Join','A self join is a join in which a table is joined with itself (which is also called Unary relationships), especially when the table has a FOREIGN KEY which references its own PRIMARY KEY.');
Let search the string 'left right' in description field:
mysql> SELECT * FROM tutorial WHERE MATCH(title,description) AGAINST ('left right' IN NATURAL LANGUAGE MODE); +----+---------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | id | title | description | +----+---------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | 5 | SQL Full Outer Join | In SQL the FULL OUTER JOIN combines the results of both left and right outer joins and returns all (matched or unmatched) rows from the tables on both sides of the join clause. | | 3 | SQL Left Join | The SQL LEFT JOIN, joins two tables and fetches rows based on a condition, which are matching in both the tables, and the unmatched rows will also be available from the table before the JOIN clause. | +----+---------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 2 rows in set (0.00 sec)
By default, the search is case-insensitive. To perform a case-sensitive full-text search, use a binary collation for the indexed columns. For example, a column that uses the latin1 character set of can be assigned a collation of latin1_bin to make it case sensitive for full-text searches.
When MATCH() is used in a WHERE clause, as in the example shown earlier, the rows returned are automatically sorted with the highest relevance first.
- Relevance values are nonnegative floating-point numbers.
- Zero relevance means no similarity.
- Relevance is computed based on -
- the number of words in the row
- the number of unique words in that row
- the total number of words in the collection
- the number of documents (rows) that contain a particular word.
The following example shows how to retrieve the relevance values explicitly:
<mysql> SELECT id, MATCH(title,description) AGAINST ('left right' IN NATURAL LANGUAGE MODE) AS score FROM tutorial; +----+---------------------+ | id | score | +----+---------------------+ | 1 | 0 | | 2 | 0 | | 3 | 0.45528939366340637 | | 4 | 0 | | 5 | 0.8331640362739563 | | 6 | 0 | +----+---------------------+ 6 rows in set (0.00 sec)
Count matches
To count matches, you can use a query like this:
mysql> SELECT COUNT(*) FROM tutorial WHERE MATCH(title,description) AGAINST ('left right' IN NATURAL LANGUAGE MODE); +----------+ | COUNT(*) | +----------+ | 2 | +----------+ 1 row in set (0.03 sec)
Boolean Full-Text Searches
A boolean search interprets the search string using the rules of a special query language. The string contains the words to search for. It can also contain operators that specify requirements such that a word must be present or absent in matching rows, or that it should be weighted higher or lower than usual. Certain common words (stopwords) are omitted from the search index and do not match if present in the search string. MySQL can perform boolean full-text searches using the IN BOOLEAN MODE modifier. With this modifier, certain characters have special meaning at the beginning or end of words in the search string.
The basic format of a boolean mode query is as follows :
Code:
Characteristics of Boolean Full-Text searches :
- Do not use the 50% threshold that applies to MyISAM search indexes.
- Do not automatically sort rows in order of decreasing relevance.
- Boolean queries against a MyISAM search index can work even without a FULLTEXT index.
- The minimum and maximum word length full-text parameters apply :
- For InnoDB search indexes, innodb_ft_min_token_size and innodb_ft_max_token_size
- for MyISAM search indexes, ft_min_word_len and ft_max_word_len.
- InnoDB full-text search does not support the use of multiple operators on a single search word.
The boolean full-text search supports the following operators:
Operator | Description | Example |
---|---|---|
+
| A leading plus sign indicates that a word must be present in each row that is returned. |
'+join +union'
Find rows that contain both words. '+join union' Search rows that contain the word 'join', but rank rows higher if they also contain 'union' |
- | A leading minus sign indicates that a particular word must not be present in any of the rows that are returned. The - operator acts only to exclude rows that are otherwise matched by other search terms. | '+join -union' Find rows that contain the word 'join' but not 'union'. |
(no operator) | By default, the word is optional, but the rows that contain it are rated higher. | 'join -union' Search rows that contain at least one of the two words. |
> < | These two operators are used to change a word's contribution to the relevance value that is assigned to a row. The > operator increases the contribution and the < operator decreases it. | '+join +(>left <right)' Find rows that contain the words 'join' and 'left' or 'join' and 'right' (in any order), but rank 'join left' higher than 'join right'. |
( ) | Parentheses group words into subexpressions. Parenthesized groups can be nested. | |
~ | A leading tilde acts as a negation operator, causing the word's contribution to the row's relevance to be negative. | '+join ~left' Find rows that contain the word 'join', but if the row also contains the word 'left', rate it lower than if row does not. |
* | The asterisk serves as the truncation (or wildcard) operator. Unlike the other operators, it is appended to the word to be affected. Words match if they begin with the word preceding the * operator. | 'join*' Find rows that contain words such as 'join', 'joins', 'joining' etc. |
" | A phrase that is enclosed within double quote (“"”) characters matches only rows that contain the phrase literally, as it was typed. | '"left join"' Find rows that contain the exact phrase "let join". |
Example: Boolean Full-Text Searches
In the following query, the query retrieves all the rows that contain the word 'Joins' but not 'right'.
mysql> SELECT * FROM tutorial WHERE MATCH(title,description) AGAINST ('+Joins -right' IN BOOLEAN MODE); +----+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | id | title | description | +----+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | 1 | SQL Joins | An SQL JOIN clause combines rows from two or more tables. It creates a set of rows in a temporary table. | | 3 | SQL Left Join | The SQL LEFT JOIN, joins two tables and fetches rows based on a condition, which is matching in both the tables and the unmatched rows will also be available from the table before the JOIN clause. | +----+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 2 rows in set (0.02 sec)
No comments:
Post a Comment
Note: only a member of this blog may post a comment.