Matcher Parameters
You can parameterize any "clause" in the "matchers" object of an entity model. Matcher parameters ("params") are variables that let you pass arbitrary values from an attribute to its matcher. This gives you more flexibility and control over the matching process when you run a resolution job.
This tutorial will show how to define and override matcher parameters to modify the behavior of matchers at runtime.
Let's dive in.
1. Prepare for the tutorial
1.1 Open the Kibana Console UI
1.2 Delete any old tutorial indices
Let's start from scratch. Delete any tutorial indices you might have created from other tutorials.
DELETE zentity_tutorial_7_*
1.3 Create the tutorial index
Now create the index for this tutorial.
PUT zentity_tutorial_7_matcher_parameters
{
"settings": {
"index": {
"number_of_shards": 1,
"number_of_replicas": 0,
"analysis" : {
"filter" : {
"punct_white" : {
"pattern" : "\\p{Punct}",
"type" : "pattern_replace",
"replacement" : " "
},
"remove_non_digits" : {
"pattern" : "[^\\d]",
"type" : "pattern_replace",
"replacement" : ""
}
},
"analyzer" : {
"name_clean" : {
"filter" : [
"lowercase",
"punct_white"
],
"tokenizer" : "standard"
},
"phone_clean" : {
"filter" : [
"remove_non_digits"
],
"tokenizer" : "keyword"
}
}
}
}
},
"mappings": {
"properties": {
"id": {
"type": "keyword"
},
"first_name": {
"type": "text",
"fields": {
"clean": {
"type": "text",
"analyzer": "name_clean"
}
}
},
"last_name": {
"type": "text",
"fields": {
"clean": {
"type": "text",
"analyzer": "name_clean"
}
}
},
"phone": {
"type": "text",
"fields": {
"clean": {
"type": "text",
"analyzer": "phone_clean"
}
}
}
}
}
}
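The following minimal Python sketch approximates the two custom analyzers. It gives a rough idea of what ends up in the ".clean" subfields. It is not Elasticsearch's implementation: splitting on whitespace is an assumed, crude stand-in for the "standard" tokenizer, and the function names "name_clean" and "phone_clean" are hypothetical. They simply mirror the analyzer names.

```python
import re
import string

# Java's \p{Punct} (used by the "punct_white" filter) is the ASCII punctuation
# set, which is the same 32 characters as Python's string.punctuation.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def name_clean(text):
    """Approximate "name_clean": tokenize, lowercase, then replace punctuation with spaces."""
    # Assumption: whitespace splitting stands in for the "standard" tokenizer.
    return [token.lower().translate(PUNCT_TO_SPACE) for token in text.split()]


def phone_clean(text):
    """Approximate "phone_clean": one token for the whole input, non-digits removed."""
    # The "keyword" tokenizer emits the entire input as a single token, and the
    # "remove_non_digits" filter replaces every character outside 0-9 with "".
    return [re.sub(r"[^0-9]", "", text)]
```

For example, phone_clean("202-555-1234") returns ["2025551234"]. The phone numbers are therefore compared as ten-digit strings.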
1.4 Load the tutorial data
Add the tutorial data to the index.
POST _bulk?refresh
{"index": {"_id": "1", "_index": "zentity_tutorial_7_matcher_parameters"}}
{"first_name": "Allie", "id": "1", "last_name": "Jones", "phone": "202-555-1234"}
{"index": {"_id": "2", "_index": "zentity_tutorial_7_matcher_parameters"}}
{"first_name": "Alicia", "id": "2", "last_name": "Johnson", "phone": "202-123-4567"}
{"index": {"_id": "3", "_index": "zentity_tutorial_7_matcher_parameters"}}
{"first_name": "Allie", "id": "3", "last_name": "Joans", "phone": "202-555-1432"}
{"index": {"_id": "4", "_index": "zentity_tutorial_7_matcher_parameters"}}
{"first_name": "Ellie", "id": "4", "last_name": "Jones", "phone": "202-555-1234"}
{"index": {"_id": "5", "_index": "zentity_tutorial_7_matcher_parameters"}}
{"first_name": "Ali", "id": "5", "last_name": "Jones", "phone": "202-555-1234"}
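You might prefer to load the data from a script instead of the Console. The body of a "_bulk" request is newline-delimited JSON: one action line, then one document line, for each document, followed by a trailing newline. The minimal Python sketch below builds the same body. The helper name "bulk_body" is hypothetical, and sending the request with an HTTP client is left out.

```python
import json

INDEX = "zentity_tutorial_7_matcher_parameters"

# The five tutorial records, copied from the _bulk request above.
PEOPLE = [
    {"first_name": "Allie", "id": "1", "last_name": "Jones", "phone": "202-555-1234"},
    {"first_name": "Alicia", "id": "2", "last_name": "Johnson", "phone": "202-123-4567"},
    {"first_name": "Allie", "id": "3", "last_name": "Joans", "phone": "202-555-1432"},
    {"first_name": "Ellie", "id": "4", "last_name": "Jones", "phone": "202-555-1234"},
    {"first_name": "Ali", "id": "5", "last_name": "Jones", "phone": "202-555-1234"},
]


def bulk_body(index, docs):
    """Build an NDJSON _bulk body: an "index" action line, then the document source."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_id": doc["id"], "_index": index}}))
        lines.append(json.dumps(doc))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```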
Here's what the tutorial data looks like.
id | first_name | last_name | phone |
---|---|---|---|
1 | Allie | Jones | 202-555-1234 |
2 | Alicia | Johnson | 202-123-4567 |
3 | Allie | Joans | 202-555-1432 |
4 | Ellie | Jones | 202-555-1234 |
5 | Ali | Jones | 202-555-1234 |
2. Create the entity model
Request
PUT _zentity/models/zentity_tutorial_7_person
{
"attributes": {
"first_name": {
"type": "string"
},
"last_name": {
"type": "string"
},
"phone": {
"type": "string"
}
},
"resolvers": {
"name_phone": {
"attributes": [ "first_name", "last_name", "phone" ]
}
},
"matchers": {
"fuzzy": {
"clause":{
"match": {
"{{ field }}": {
"query": "{{ value }}",
"fuzziness": "auto"
}
}
}
},
"fuzzy_params": {
"clause":{
"match": {
"{{ field }}": {
"query": "{{ value }}",
"fuzziness": "{{ params.fuzziness }}"
}
}
},
"params": {
"fuzziness": "auto"
}
}
},
"indices": {
"zentity_tutorial_7_matcher_parameters": {
"fields": {
"first_name.clean": {
"attribute": "first_name",
"matcher": "fuzzy_params"
},
"last_name.clean": {
"attribute": "last_name",
"matcher": "fuzzy_params"
},
"phone.clean": {
"attribute": "phone",
"matcher": "fuzzy_params"
}
}
}
}
}
2.1 Review the matchers
We defined two matchers called "fuzzy" and "fuzzy_params", as shown in this section:
{
"matchers": {
"fuzzy": {
"clause":{
"match": {
"{{ field }}": {
"query": "{{ value }}",
"fuzziness": "auto"
}
}
}
},
"fuzzy_params": {
"clause":{
"match": {
"{{ field }}": {
"query": "{{ value }}",
"fuzziness": "{{ params.fuzziness }}"
}
}
},
"params": {
"fuzziness": "auto"
}
}
}
}
These matchers are nearly identical, and by default both perform the same fuzzy matching logic. The difference is that the "fuzzy_params" matcher uses "params" to turn the value of the "fuzziness" field into a variable.
Let's look at our "fuzzy" matcher first. This matcher uses no params:
{
"matchers": {
"fuzzy": {
"clause":{
"match": {
"{{ field }}": {
"query": "{{ value }}",
"fuzziness": "auto"
}
}
}
}
}
}
Additional variables, or "params", can be passed to a matcher using the syntax {{ params.PARAM_NAME }}, where PARAM_NAME is the name of your parameter. You must define a default value for each parameter in a "params" object adjacent to the "clause" object of the matcher.
Now let's look at our "fuzzy_params" matcher. This matcher uses "params" to allow the "fuzziness" field of the match clause to be changed at runtime:
{
"matchers": {
"fuzzy_params": {
"clause":{
"match": {
"{{ field }}": {
"query": "{{ value }}",
"fuzziness": "{{ params.fuzziness }}"
}
}
},
"params": {
"fuzziness": "auto"
}
}
}
}
Now that we have defined a matcher with parameters, let's see how we can override the default values of those parameters.
3. Resolve an entity
Let's use the Resolution API to resolve a person with a known first name, last name, and phone number.
Request
POST _zentity/resolution/zentity_tutorial_7_person?pretty&_source=false&_explanation=true
{
"attributes": {
"first_name": [ "Allie" ],
"last_name": [ "Jones" ],
"phone": [ "202-555-1234" ]
}
}
Response
{
"took" : 7,
"hits" : {
"total" : 2,
"hits" : [ {
"_index" : "zentity_tutorial_7_matcher_parameters",
"_type" : "_doc",
"_id" : "1",
"_hop" : 0,
"_query" : 0,
"_attributes" : {
"first_name" : [ "Allie" ],
"last_name" : [ "Jones" ],
"phone" : [ "202-555-1234" ]
},
"_explanation" : {
"resolvers" : {
"name_phone" : {
"attributes" : [ "first_name", "last_name", "phone" ]
}
},
"matches" : [ {
"attribute" : "first_name",
"target_field" : "first_name.clean",
"target_value" : "Allie",
"input_value" : "Allie",
"input_matcher" : "fuzzy_params",
"input_matcher_params" : { }
}, {
"attribute" : "last_name",
"target_field" : "last_name.clean",
"target_value" : "Jones",
"input_value" : "Jones",
"input_matcher" : "fuzzy_params",
"input_matcher_params" : { }
}, {
"attribute" : "phone",
"target_field" : "phone.clean",
"target_value" : "202-555-1234",
"input_value" : "202-555-1234",
"input_matcher" : "fuzzy_params",
"input_matcher_params" : { }
} ]
}
}, {
"_index" : "zentity_tutorial_7_matcher_parameters",
"_type" : "_doc",
"_id" : "4",
"_hop" : 0,
"_query" : 0,
"_attributes" : {
"first_name" : [ "Ellie" ],
"last_name" : [ "Jones" ],
"phone" : [ "202-555-1234" ]
},
"_explanation" : {
"resolvers" : {
"name_phone" : {
"attributes" : [ "first_name", "last_name", "phone" ]
}
},
"matches" : [ {
"attribute" : "first_name",
"target_field" : "first_name.clean",
"target_value" : "Ellie",
"input_value" : "Allie",
"input_matcher" : "fuzzy_params",
"input_matcher_params" : { }
}, {
"attribute" : "last_name",
"target_field" : "last_name.clean",
"target_value" : "Jones",
"input_value" : "Jones",
"input_matcher" : "fuzzy_params",
"input_matcher_params" : { }
}, {
"attribute" : "phone",
"target_field" : "phone.clean",
"target_value" : "202-555-1234",
"input_value" : "202-555-1234",
"input_matcher" : "fuzzy_params",
"input_matcher_params" : { }
} ]
}
} ]
}
}
Only two results were returned. Of the five documents in the index, four appear to be the entity we are searching for (documents 1, 3, 4, and 5). We need to allow more fuzziness to capture the other two.
Our "fuzzy_params" matcher uses a "fuzziness" value of "auto" by default. According to the Elasticsearch documentation, a value of "auto" matches terms that differ by one character if the term is 3-5 characters long, and by two characters if it is 6 or more characters long. Terms shorter than 3 characters must match exactly. The first name and last name of our entity both fall within the 3-5 character range, so only a one-character difference is allowed.
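Let's set the value of "fuzziness" to 2 for our "first_name" attribute. To do this, pass the attribute as an object with a "values" array and a "params" object instead of as a plain array of values.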
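The requests in this tutorial accept each attribute in one of two forms: a plain array of values, or an object with "values" and "params". The small Python sketch below builds a request body that mixes both forms. The helper name "resolution_body" is hypothetical. Note that the tutorial passes "fuzziness" as a string, for example "2".

```python
def resolution_body(values, params=None):
    """Build a Resolution API request body (hypothetical helper).

    Attributes named in params use the object form {"values": [...], "params": {...}};
    all other attributes use the plain array form.
    """
    params = params or {}
    attributes = {}
    for name, vals in values.items():
        if name in params:
            attributes[name] = {"values": list(vals), "params": dict(params[name])}
        else:
            attributes[name] = list(vals)
    return {"attributes": attributes}
```

Calling resolution_body({"first_name": ["Allie"], "last_name": ["Jones"], "phone": ["202-555-1234"]}, {"first_name": {"fuzziness": "2"}}) yields the same body as the request below.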
Request
POST _zentity/resolution/zentity_tutorial_7_person?pretty&_source=false&_explanation=true
{
"attributes": {
"first_name": {
"values": [ "Allie" ],
"params": {
"fuzziness": "2"
}
},
"last_name": [ "Jones" ],
"phone": [ "202-555-1234" ]
}
}
Response
{
"took" : 10,
"hits" : {
"total" : 3,
"hits" : [ {
"_index" : "zentity_tutorial_7_matcher_parameters",
"_type" : "_doc",
"_id" : "1",
"_hop" : 0,
"_query" : 0,
"_attributes" : {
"first_name" : [ "Allie" ],
"last_name" : [ "Jones" ],
"phone" : [ "202-555-1234" ]
},
"_explanation" : {
"resolvers" : {
"name_phone" : {
"attributes" : [ "first_name", "last_name", "phone" ]
}
},
"matches" : [ {
"attribute" : "first_name",
"target_field" : "first_name.clean",
"target_value" : "Allie",
"input_value" : "Allie",
"input_matcher" : "fuzzy_params",
"input_matcher_params" : {
"fuzziness" : "2"
}
}, {
"attribute" : "last_name",
"target_field" : "last_name.clean",
"target_value" : "Jones",
"input_value" : "Jones",
"input_matcher" : "fuzzy_params",
"input_matcher_params" : { }
}, {
"attribute" : "phone",
"target_field" : "phone.clean",
"target_value" : "202-555-1234",
"input_value" : "202-555-1234",
"input_matcher" : "fuzzy_params",
"input_matcher_params" : { }
} ]
}
}, {
"_index" : "zentity_tutorial_7_matcher_parameters",
"_type" : "_doc",
"_id" : "4",
"_hop" : 0,
"_query" : 0,
"_attributes" : {
"first_name" : [ "Ellie" ],
"last_name" : [ "Jones" ],
"phone" : [ "202-555-1234" ]
},
"_explanation" : {
"resolvers" : {
"name_phone" : {
"attributes" : [ "first_name", "last_name", "phone" ]
}
},
"matches" : [ {
"attribute" : "first_name",
"target_field" : "first_name.clean",
"target_value" : "Ellie",
"input_value" : "Allie",
"input_matcher" : "fuzzy_params",
"input_matcher_params" : {
"fuzziness" : "2"
}
}, {
"attribute" : "last_name",
"target_field" : "last_name.clean",
"target_value" : "Jones",
"input_value" : "Jones",
"input_matcher" : "fuzzy_params",
"input_matcher_params" : { }
}, {
"attribute" : "phone",
"target_field" : "phone.clean",
"target_value" : "202-555-1234",
"input_value" : "202-555-1234",
"input_matcher" : "fuzzy_params",
"input_matcher_params" : { }
} ]
}
}, {
"_index" : "zentity_tutorial_7_matcher_parameters",
"_type" : "_doc",
"_id" : "5",
"_hop" : 0,
"_query" : 0,
"_attributes" : {
"first_name" : [ "Ali" ],
"last_name" : [ "Jones" ],
"phone" : [ "202-555-1234" ]
},
"_explanation" : {
"resolvers" : {
"name_phone" : {
"attributes" : [ "first_name", "last_name", "phone" ]
}
},
"matches" : [ {
"attribute" : "first_name",
"target_field" : "first_name.clean",
"target_value" : "Ali",
"input_value" : "Allie",
"input_matcher" : "fuzzy_params",
"input_matcher_params" : {
"fuzziness" : "2"
}
}, {
"attribute" : "last_name",
"target_field" : "last_name.clean",
"target_value" : "Jones",
"input_value" : "Jones",
"input_matcher" : "fuzzy_params",
"input_matcher_params" : { }
}, {
"attribute" : "phone",
"target_field" : "phone.clean",
"target_value" : "202-555-1234",
"input_value" : "202-555-1234",
"input_matcher" : "fuzzy_params",
"input_matcher_params" : { }
} ]
}
} ]
}
}
This returned three of the four matching documents. The first names "Allie" and "Ali" now match because they differ by two characters, which the new fuzziness value allows.
Let's also set the value of "fuzziness" to 2 for our "last_name" attribute.
Request
POST _zentity/resolution/zentity_tutorial_7_person?pretty&_source=false&_explanation=true
{
"attributes": {
"first_name": {
"values": [ "Allie" ],
"params": {
"fuzziness": "2"
}
},
"last_name": {
"values": [ "Jones" ],
"params": {
"fuzziness": "2"
}
},
"phone": [ "202-555-1234" ]
}
}
Response
{
"took" : 15,
"hits" : {
"total" : 4,
"hits" : [ {
"_index" : "zentity_tutorial_7_matcher_parameters",
"_type" : "_doc",
"_id" : "1",
"_hop" : 0,
"_query" : 0,
"_attributes" : {
"first_name" : [ "Allie" ],
"last_name" : [ "Jones" ],
"phone" : [ "202-555-1234" ]
},
"_explanation" : {
"resolvers" : {
"name_phone" : {
"attributes" : [ "first_name", "last_name", "phone" ]
}
},
"matches" : [ {
"attribute" : "first_name",
"target_field" : "first_name.clean",
"target_value" : "Allie",
"input_value" : "Allie",
"input_matcher" : "fuzzy_params",
"input_matcher_params" : {
"fuzziness" : "2"
}
}, {
"attribute" : "last_name",
"target_field" : "last_name.clean",
"target_value" : "Jones",
"input_value" : "Jones",
"input_matcher" : "fuzzy_params",
"input_matcher_params" : {
"fuzziness" : "2"
}
}, {
"attribute" : "phone",
"target_field" : "phone.clean",
"target_value" : "202-555-1234",
"input_value" : "202-555-1234",
"input_matcher" : "fuzzy_params",
"input_matcher_params" : { }
} ]
}
}, {
"_index" : "zentity_tutorial_7_matcher_parameters",
"_type" : "_doc",
"_id" : "3",
"_hop" : 0,
"_query" : 0,
"_attributes" : {
"first_name" : [ "Allie" ],
"last_name" : [ "Joans" ],
"phone" : [ "202-555-1432" ]
},
"_explanation" : {
"resolvers" : {
"name_phone" : {
"attributes" : [ "first_name", "last_name", "phone" ]
}
},
"matches" : [ {
"attribute" : "first_name",
"target_field" : "first_name.clean",
"target_value" : "Allie",
"input_value" : "Allie",
"input_matcher" : "fuzzy_params",
"input_matcher_params" : {
"fuzziness" : "2"
}
}, {
"attribute" : "last_name",
"target_field" : "last_name.clean",
"target_value" : "Joans",
"input_value" : "Jones",
"input_matcher" : "fuzzy_params",
"input_matcher_params" : {
"fuzziness" : "2"
}
}, {
"attribute" : "phone",
"target_field" : "phone.clean",
"target_value" : "202-555-1432",
"input_value" : "202-555-1234",
"input_matcher" : "fuzzy_params",
"input_matcher_params" : { }
} ]
}
}, {
"_index" : "zentity_tutorial_7_matcher_parameters",
"_type" : "_doc",
"_id" : "4",
"_hop" : 0,
"_query" : 0,
"_attributes" : {
"first_name" : [ "Ellie" ],
"last_name" : [ "Jones" ],
"phone" : [ "202-555-1234" ]
},
"_explanation" : {
"resolvers" : {
"name_phone" : {
"attributes" : [ "first_name", "last_name", "phone" ]
}
},
"matches" : [ {
"attribute" : "first_name",
"target_field" : "first_name.clean",
"target_value" : "Ellie",
"input_value" : "Allie",
"input_matcher" : "fuzzy_params",
"input_matcher_params" : {
"fuzziness" : "2"
}
}, {
"attribute" : "last_name",
"target_field" : "last_name.clean",
"target_value" : "Jones",
"input_value" : "Jones",
"input_matcher" : "fuzzy_params",
"input_matcher_params" : {
"fuzziness" : "2"
}
}, {
"attribute" : "phone",
"target_field" : "phone.clean",
"target_value" : "202-555-1234",
"input_value" : "202-555-1234",
"input_matcher" : "fuzzy_params",
"input_matcher_params" : { }
} ]
}
}, {
"_index" : "zentity_tutorial_7_matcher_parameters",
"_type" : "_doc",
"_id" : "5",
"_hop" : 0,
"_query" : 0,
"_attributes" : {
"first_name" : [ "Ali" ],
"last_name" : [ "Jones" ],
"phone" : [ "202-555-1234" ]
},
"_explanation" : {
"resolvers" : {
"name_phone" : {
"attributes" : [ "first_name", "last_name", "phone" ]
}
},
"matches" : [ {
"attribute" : "first_name",
"target_field" : "first_name.clean",
"target_value" : "Ali",
"input_value" : "Allie",
"input_matcher" : "fuzzy_params",
"input_matcher_params" : {
"fuzziness" : "2"
}
}, {
"attribute" : "last_name",
"target_field" : "last_name.clean",
"target_value" : "Jones",
"input_value" : "Jones",
"input_matcher" : "fuzzy_params",
"input_matcher_params" : {
"fuzziness" : "2"
}
}, {
"attribute" : "phone",
"target_field" : "phone.clean",
"target_value" : "202-555-1234",
"input_value" : "202-555-1234",
"input_matcher" : "fuzzy_params",
"input_matcher_params" : { }
} ]
}
} ]
}
}
Now we have all four matching results. The last names "Jones" and "Joans" now match because they differ by two characters. The phone number of document "3" also differs from the input by two characters, yet it matched without an override. The "phone_clean" analyzer reduces each phone number to its ten digits, and a fuzziness value of "auto" allows two characters to differ when a string is six or more characters long.
What if we disable "fuzziness" on every attribute by setting it to "0"? Let's try it.
Request
POST _zentity/resolution/zentity_tutorial_7_person?pretty&_source=false&_explanation=true
{
"attributes": {
"first_name": {
"values": [ "Allie" ],
"params": {
"fuzziness": "0"
}
},
"last_name": {
"values": [ "Jones" ],
"params": {
"fuzziness": "0"
}
},
"phone": {
"values": [ "202-555-1234" ],
"params": {
"fuzziness": "0"
}
}
}
}
Response
{
"took" : 2,
"hits" : {
"total" : 1,
"hits" : [ {
"_index" : "zentity_tutorial_7_matcher_parameters",
"_type" : "_doc",
"_id" : "1",
"_hop" : 0,
"_query" : 0,
"_attributes" : {
"first_name" : [ "Allie" ],
"last_name" : [ "Jones" ],
"phone" : [ "202-555-1234" ]
},
"_explanation" : {
"resolvers" : {
"name_phone" : {
"attributes" : [ "first_name", "last_name", "phone" ]
}
},
"matches" : [ {
"attribute" : "first_name",
"target_field" : "first_name.clean",
"target_value" : "Allie",
"input_value" : "Allie",
"input_matcher" : "fuzzy_params",
"input_matcher_params" : {
"fuzziness" : "0"
}
}, {
"attribute" : "last_name",
"target_field" : "last_name.clean",
"target_value" : "Jones",
"input_value" : "Jones",
"input_matcher" : "fuzzy_params",
"input_matcher_params" : {
"fuzziness" : "0"
}
}, {
"attribute" : "phone",
"target_field" : "phone.clean",
"target_value" : "202-555-1234",
"input_value" : "202-555-1234",
"input_matcher" : "fuzzy_params",
"input_matcher_params" : {
"fuzziness" : "0"
}
} ]
}
} ]
}
}
Only one document matched, because with fuzziness disabled, every attribute must match our inputs exactly.
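To tie the four runs together, here is a toy Python model of the "name_phone" resolver over the five documents. It is not how Elasticsearch evaluates a match query, which works on analyzed tokens, scores results, and counts transpositions, and it is not zentity's code. It assumes every value is a single token, that all three attributes must match, and that a single hop is enough. All names in it are hypothetical.

```python
import re

# The five tutorial documents (from the _bulk request) and the resolution input.
DOCS = {
    "1": {"first_name": "Allie", "last_name": "Jones", "phone": "202-555-1234"},
    "2": {"first_name": "Alicia", "last_name": "Johnson", "phone": "202-123-4567"},
    "3": {"first_name": "Allie", "last_name": "Joans", "phone": "202-555-1432"},
    "4": {"first_name": "Ellie", "last_name": "Jones", "phone": "202-555-1234"},
    "5": {"first_name": "Ali", "last_name": "Jones", "phone": "202-555-1234"},
}
QUERY = {"first_name": "Allie", "last_name": "Jones", "phone": "202-555-1234"}


def normalize(attribute, text):
    """Rough stand-ins for the analyzers: digits only for phones, lowercase for names."""
    if attribute == "phone":
        return re.sub(r"[^0-9]", "", text)
    return text.lower()


def levenshtein(a, b):
    """Edit distance counting insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (char_a != char_b)))
        previous = current
    return previous[-1]


def max_edits(fuzziness, term):
    """Translate a "fuzziness" value ("auto" or a number as a string) into allowed edits."""
    if fuzziness == "auto":
        return 0 if len(term) < 3 else 1 if len(term) < 6 else 2
    return int(fuzziness)


def attribute_matches(attribute, doc, overrides):
    term = normalize(attribute, QUERY[attribute])
    candidate = normalize(attribute, doc[attribute])
    # Attributes without an override fall back to the matcher's default, "auto".
    allowed = max_edits(overrides.get(attribute, "auto"), term)
    return levenshtein(term, candidate) <= allowed


def resolve(overrides=None):
    """Ids of documents where all three attributes match."""
    overrides = overrides or {}
    return {doc_id for doc_id, doc in DOCS.items()
            if all(attribute_matches(a, doc, overrides) for a in QUERY)}
```

The results line up with the four requests in this tutorial. With no overrides, resolve() returns {"1", "4"}. Setting first_name to "2" adds "5". Setting last_name to "2" as well adds "3". Setting all three attributes to "0" leaves only {"1"}.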
Conclusion
You learned how to parameterize the clauses of matchers in entity models. This gives you the ability to modify the behavior of matchers at runtime.
The next tutorial will introduce date attributes, which require matcher parameters and can be used to match both points in time and ranges of time.