In MySQL, the
REGEXP_INSTR() function returns the starting index of a substring that matches the regular expression pattern.
The index starts at
1. If there’s no match, the result is 0.Syntax
The syntax goes like this:
REGEXP_INSTR(expr, pat[, pos[, occurrence[, return_option[, match_type]]]])
Where
expr is the input string and pat is the regular expression pattern for the substring.
The optional
pos argument allows you to specify a position within the string to start the search. If omitted, it starts at position 1.
The optional
occurrence argument allows you to specify which occurrence of the match to search for. If omitted, the first occurrence is used (occurrence 1).
The optional
return_option argument allows you to specify which type of position to return. If you use 0, it will return the position of the first character in the matching substring. If you use 1 it returns the position of the first character following the matching substring. If omitted, the value is 0.
The optional
match_type argument allows you to refine the regular expression. For example, you can use this argument to specify case-sensitive matching or not.Example 1 – Basic Usage
Here’s a basic example:
SELECT REGEXP_INSTR('Cat', 'at') Result;
Result:
+--------+ | Result | +--------+ | 2 | +--------+
In this case there’s a match, and the substring starts at position 2.
Example 2 – No Match
Here’s an example where there’s no match:
SELECT REGEXP_INSTR('Cat', '^at') Result;
Result:
+--------+ | Result | +--------+ | 0 | +--------+
There’s no match, so the result is
0. There’s no match because I specified that the string must start with the substring.
Let’s change it so that it does start with that substring:
SELECT REGEXP_INSTR('at', '^at') Result;
Result:
+--------+ | Result | +--------+ | 1 | +--------+
Example 3 – The pos Argument
Here’s an example of specifying a starting position:
SELECT REGEXP_INSTR('Cat Cat', 'Cat', 2) Result;
Result:
+--------+ | Result | +--------+ | 5 | +--------+
So we get the index of the second occurrence.
Note that the index still starts counting from position 1 regardless of where you specify the starting position.
The following example demonstrates this more clearly:
SELECT
REGEXP_INSTR('Cat Cat', 'Cat', 2) AS 'Pos 2',
REGEXP_INSTR('Cat Cat', 'Cat', 3) AS 'Pos 3',
REGEXP_INSTR('Cat Cat', 'Cat', 5) AS 'Pos 5';
Result:
+-------+-------+-------+ | Pos 2 | Pos 3 | Pos 5 | +-------+-------+-------+ | 5 | 5 | 5 | +-------+-------+-------+
Of course, depending on your regex pattern, this can return the index of completely different substrings. Example:
SELECT
REGEXP_INSTR('Cat City is SO Cute!', 'C.t', 1) 'Pos 1',
REGEXP_INSTR('Cat City is SO Cute!', 'C.t', 2) 'Pos 2',
REGEXP_INSTR('Cat City is SO Cute!', 'C.t', 6) 'Pos 6';
Result:
+-------+-------+-------+ | Pos 1 | Pos 2 | Pos 6 | +-------+-------+-------+ | 1 | 5 | 16 | +-------+-------+-------+
SELECT
REGEXP_SUBSTR('Cat City is SO Cute!', 'C.t', 1) 'Pos 1',
REGEXP_SUBSTR('Cat City is SO Cute!', 'C.t', 2) 'Pos 2',
REGEXP_SUBSTR('Cat City is SO Cute!', 'C.t', 6) 'Pos 6';
Result:
+-------+-------+-------+ | Pos 1 | Pos 2 | Pos 6 | +-------+-------+-------+ | Cat | Cit | Cut | +-------+-------+-------+
Example 4 – The occurrence Argument
Here’s an example of using the
occurrence argument. In all cases, we start at position 1:SELECT
REGEXP_INSTR('Cat City is SO Cute!', 'C.t', 1, 1) 'Occurrence 1',
REGEXP_INSTR('Cat City is SO Cute!', 'C.t', 1, 2) 'Occurrence 2',
REGEXP_INSTR('Cat City is SO Cute!', 'C.t', 1, 3) 'Occurrence 3';
Result:
+--------------+--------------+--------------+ | Occurrence 1 | Occurrence 2 | Occurrence 3 | +--------------+--------------+--------------+ | 1 | 5 | 16 | +--------------+--------------+--------------+
However, if we start at a different position, the result is different:
SELECT
REGEXP_INSTR('Cat City is SO Cute!', 'C.t', 2, 1) 'Occurrence 1',
REGEXP_INSTR('Cat City is SO Cute!', 'C.t', 2, 2) 'Occurrence 2',
REGEXP_INSTR('Cat City is SO Cute!', 'C.t', 2, 3) 'Occurrence 3';
Result:
+--------------+--------------+--------------+ | Occurrence 1 | Occurrence 2 | Occurrence 3 | +--------------+--------------+--------------+ | 5 | 16 | 0 | +--------------+--------------+--------------+
This happened because our starting position came after the first occurrence had started. Therefore, occurrence 2 became occurrence 1, and occurrence 3 became occurrence 2. And because there were no more occurrences, the result of occurrence 3 was negative (i.e. there was no occurrence 3).
Example 5 – The return_option Argument
Here’s an example of using the
return_option argument:SELECT
REGEXP_INSTR('Cat City is SO Cute!', 'C.t', 1, 1, 0) 'Option 0',
REGEXP_INSTR('Cat City is SO Cute!', 'C.t', 1, 1, 1) 'Option 1';
Result:
+----------+----------+ | Option 0 | Option 1 | +----------+----------+ | 1 | 4 | +----------+----------+
Option 0 returned the matched substring’s first character. Option 1 returned the position following the matched substring.
Here’s what it looks like if we apply it to the previous example:
SELECT
REGEXP_INSTR('Cat City is SO Cute!', 'C.t', 1, 1, 0) 'Occurrence 1',
REGEXP_INSTR('Cat City is SO Cute!', 'C.t', 1, 2, 0) 'Occurrence 2',
REGEXP_INSTR('Cat City is SO Cute!', 'C.t', 1, 3, 0) 'Occurrence 3'
UNION ALL
SELECT
REGEXP_INSTR('Cat City is SO Cute!', 'C.t', 1, 1, 1),
REGEXP_INSTR('Cat City is SO Cute!', 'C.t', 1, 2, 1),
REGEXP_INSTR('Cat City is SO Cute!', 'C.t', 1, 3, 1);
Result:
+--------------+--------------+--------------+ | Occurrence 1 | Occurrence 2 | Occurrence 3 | +--------------+--------------+--------------+ | 1 | 5 | 16 | | 4 | 8 | 19 | +--------------+--------------+--------------+
In this case we did one set using option 0, and another using option 1, then joined them together using
UNION ALL.
Example 6 – The match_type Argument
You can provide an additional argument to determine the match type. This allows you to specify things like whether or not the match is case-sensitive, whether or not to include line terminators, etc.
Here’s an example of specifying a case-sensitive match and a case-insensitive match:
SELECT
REGEXP_INSTR('Cat City is SO Cute!', 'c.t', 1, 1, 0, 'c') 'Case-Sensitive',
REGEXP_INSTR('Cat City is SO Cute!', 'c.t', 1, 1, 0, 'i') 'Case-Insensitive';
Result:
+----------------+------------------+ | Case-Sensitive | Case-Insensitive | +----------------+------------------+ | 0 | 1 | +----------------+------------------+
The
match_type argument can contain the following characters:c- Case sensitive matching.
i- Case insensitive matching.
m- Multiple-line mode. Recognize line terminators within the string. The default behavior is to match line terminators only at the start and end of the string expression.
n- The
.character matches line terminators. The default is for.matching to stop at the end of a line. u- Unix-only line endings. Only the newline character is recognized as a line ending by the
.,^, and$match operators.
No comments:
Post a Comment
Note: only a member of this blog may post a comment.