{"id":490,"date":"2020-03-29T17:37:33","date_gmt":"2020-03-29T17:37:33","guid":{"rendered":"https:\/\/www.samarthya.me\/wps\/?p=490"},"modified":"2020-03-29T17:51:30","modified_gmt":"2020-03-29T17:51:30","slug":"text-elastic","status":"publish","type":"post","link":"https:\/\/blog.samarthya.me\/wps\/2020\/03\/29\/text-elastic\/","title":{"rendered":"Text &#8211; Elastic"},"content":{"rendered":"\n<p>If you are not familiar with mappings, this might be a little confusing, but I will try my best to explain.<\/p>\n\n\n\n<p>Every document indexed in Elasticsearch has fields, and every field has an associated type in the mapping. The type determines how the value is indexed (for example, in an inverted index) so that Elasticsearch can retrieve it quickly.<\/p>\n\n\n\n<p>I created a sample index, <code>my_index<\/code>, with a field <code>country_name<\/code> that was mapped automatically (dynamic mapping) when I added the first value.<\/p>\n\n\n\n<div class=\"wp-block-group\"><div class=\"wp-block-group__inner-container is-layout-flow wp-block-group-is-layout-flow\">\n<pre class=\"wp-block-preformatted\">POST my_index\/_doc\/1\n {\n   \"country_name\": \"The United States of America\"\n }\n\n\nGET my_index\/_mapping\n\n{\n   \"my_index\" : {\n     \"mappings\" : {\n       \"properties\" : {\n         \"country_name\" : {\n           \"type\" : \"text\",\n           \"fields\" : {\n             \"keyword\" : {\n               \"type\" : \"keyword\",\n               \"ignore_above\" : 256\n             }\n           }\n         }\n       }\n     }\n   }\n }<\/pre>\n<\/div><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Text<\/h2>\n\n\n\n<p>As the mapping shows, the default type assigned to <code>country_name<\/code> is <code>text<\/code>, along with a <code>fields<\/code> entry (multi-fields, which I will touch upon later).<\/p>\n\n\n\n<p>From the official documentation:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>A field to index full-text values, such as the body of an email or the description of a product. 
These fields are <code>analyzed<\/code>, that is they are passed through an <a href=\"https:\/\/www.elastic.co\/guide\/en\/elasticsearch\/reference\/master\/analysis.html\">analyzer<\/a> to convert the string into a list of individual terms before being indexed.<\/p><cite>@elastic.co<\/cite><\/blockquote>\n\n\n\n<p>If you need to index structured content, such as IDs, email addresses, or tags, it is better to use <code>keyword<\/code> (the type of the sub-field under <code>fields<\/code> in the mapping above).<\/p>\n\n\n\n<p>A <code>keyword<\/code> field is typically used for filtering, for example to show all movies in the category comedy.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What is the difference?<\/h3>\n\n\n\n<p><code>text<\/code> is for full-text search; <code>keyword<\/code> is for exact-value filtering, aggregations, and sorting.<\/p>\n\n\n\n<div class=\"wp-block-group\"><div class=\"wp-block-group__inner-container is-layout-flow wp-block-group-is-layout-flow\">\n<p>A simple search request:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">GET my_index\/_search<\/pre>\n\n\n\n<p>Since there is currently only one document, it should return something like<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">{<br>   \"took\" : 1,<br>   \"timed_out\" : false,<br>   \"_shards\" : {<br>     \"total\" : 1,<br>     \"successful\" : 1,<br>     \"skipped\" : 0,<br>     \"failed\" : 0<br>   },<br>   \"hits\" : {<br>     \"total\" : {<br>       \"value\" : 1,<br>       \"relation\" : \"eq\"<br>     },<br>     \"max_score\" : 1.0,<br>     \"hits\" : [<br>       {<br>         \"_index\" : \"my_index\",<br>         \"_type\" : \"_doc\",<br>         \"_id\" : \"1\",<br>         \"_score\" : 1.0,<br>         \"_source\" : {<br>           \"country_name\" : \"The United States of America\"<br>         }<br>       }<br>     ]<br>   }<br> }<\/pre>\n\n\n\n<p>Let us run an aggregation to highlight the difference:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">GET my_index\/_search<br> {<br>   \"aggs\": {<br>     \"Top Country\": {<br>       \"terms\": {<br>         \"field\": \"country_name\",<br>         \"size\": 10<br>       }<br>     }<br>   }<br> 
}<\/pre>\n\n\n\n<p>What do you guess the response will be? I am running a <code>terms<\/code> aggregation on <code>country_name<\/code>.<\/p>\n\n\n\n<p>You should get an error like the one below<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">\"error\": {<br>     \"root_cause\": [<br>       {<br>         \"type\": \"illegal_argument_exception\",<br>         \"reason\": \"Fielddata is disabled on text fields by default. Set fielddata=true on [country_name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.\"<br>       }<br>     ],<\/pre>\n\n\n\n<p>A plain <code>text<\/code> field is broken down into individual terms when it is indexed, which makes the following query possible<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">GET my_index\/_search<br> {<br>   \"size\": 0, <br>   \"query\": {<br>     \"match\": {<br>       \"country_name\": \"America\"<br>     }<br>   }<br> }<\/pre>\n\n\n\n<p>It should respond as follows (<code>hits<\/code> is empty because the request sets <code>size<\/code> to 0, but <code>total<\/code> shows one match)<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">{<br>   \"took\" : 1,<br>   \"timed_out\" : false,<br>   \"_shards\" : {<br>     \"total\" : 1,<br>     \"successful\" : 1,<br>     \"skipped\" : 0,<br>     \"failed\" : 0<br>   },<br>   \"hits\" : {<br>     \"total\" : {<br>       \"value\" : 1,<br>       \"relation\" : \"eq\"<br>     },<br>     \"max_score\" : null,<br>     \"hits\" : [ ]<br>   }<br> }<\/pre>\n\n\n\n<p>But because the value is broken down into terms, aggregating on the <code>text<\/code> field does not work by default, hence the error seen above. 
To make that aggregation work, I simply point it at the <code>country_name.keyword<\/code> sub-field, as in the query below.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">GET my_index\/_search<br> {<br>   \"aggs\": {<br>     \"Top Country\": {<br>       \"terms\": {<br>         \"field\": \"country_name.keyword\",<br>         \"size\": 10<br>       }<br>     }<br>   }<br> }<\/pre>\n\n\n\n<p>When executed, it returns something like<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">{<br>\n  \"took\" : 1,<br>\n  \"timed_out\" : false,<br>\n  \"_shards\" : {<br>\n    \"total\" : 1,<br>\n    \"successful\" : 1,<br>\n    \"skipped\" : 0,<br>\n    \"failed\" : 0<br>\n  },<br>\n  \"hits\" : {<br>\n    \"total\" : {<br>\n      \"value\" : 1,<br>\n      \"relation\" : \"eq\"<br>\n    },<br>\n    \"max_score\" : 1.0,<br>\n    \"hits\" : [<br>\n      {<br>\n        \"_index\" : \"my_index\",<br>\n        \"_type\" : \"_doc\",<br>\n        \"_id\" : \"1\",<br>\n        \"_score\" : 1.0,<br>\n        \"_source\" : {<br>\n          \"country_name\" : \"The United States of America\"<br>\n        }<br>\n      }<br>\n    ]<br>\n  },<br>\n  \"aggregations\" : {<br>\n    \"Top Country\" : {<br>\n      \"doc_count_error_upper_bound\" : 0,<br>\n      \"sum_other_doc_count\" : 0,<br>\n      \"buckets\" : [<br>\n        {<br>\n          \"key\" : \"The United States of America\",<br>\n          \"doc_count\" : 1<br>\n        }<br>\n      ]<br>\n    }<br>\n  }<br>\n}<\/pre>\n<\/div><\/div>\n\n\n\n<p>Simple yet effective.<\/p>\n\n\n\n<p>Helpful links<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>https:\/\/www.elastic.co\/guide\/en\/elasticsearch\/reference\/master\/keyword.html<\/li><li>https:\/\/www.elastic.co\/guide\/en\/elasticsearch\/reference\/master\/text.html<\/li><li>https:\/\/www.elastic.co\/guide\/en\/elasticsearch\/reference\/master\/doc-values.html<\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Multi Fields<\/h3>\n\n\n\n<p>It is often useful to index the same field in different ways for different\npurposes. 
For instance, a <code>string<\/code> field could be mapped as\na <code>text<\/code> field for full-text search, and as a <code>keyword<\/code> field for\nsorting or aggregations.  Alternatively, you could index a text field with\nthe <a href=\"https:\/\/www.elastic.co\/guide\/en\/elasticsearch\/reference\/master\/analysis-standard-analyzer.html\"><code>standard<\/code> analyzer<\/a>, the\n<a href=\"https:\/\/www.elastic.co\/guide\/en\/elasticsearch\/reference\/master\/analysis-lang-analyzer.html#english-analyzer\"><code>english<\/code><\/a> analyzer, and the\n<a href=\"https:\/\/www.elastic.co\/guide\/en\/elasticsearch\/reference\/master\/analysis-lang-analyzer.html#french-analyzer\"><code>french<\/code> analyzer<\/a>.<\/p>\n\n\n\n<p>Most datatypes support multi-fields via the <a href=\"https:\/\/www.elastic.co\/guide\/en\/elasticsearch\/reference\/master\/multi-fields.html\"><code>fields<\/code><\/a> parameter.<\/p>\n\n\n\n<h2 class=\"has-text-align-center wp-block-heading\">&#8212; THE &#8211; END &#8212;<\/h2>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>If you are not aware of mapping, it might be a little confusing but I would try my best. Every document that gets indexed in elastic has fields, every field that is accessible has an associated type with it (Inverted Index) which helps elastic organize data for quick retrieval. 
I create a sample index my_index [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":335,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"footnotes":""},"categories":[34],"tags":[53],"class_list":["post-490","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technical","tag-elasticsearch"],"_links":{"self":[{"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/posts\/490","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/comments?post=490"}],"version-history":[{"count":0,"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/posts\/490\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/media\/335"}],"wp:attachment":[{"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/media?parent=490"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/categories?post=490"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/tags?post=490"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}