3.4 Using predefined fields

3.4.2 Identify your documents

To identify a document within the same index, Elasticsearch uses a combination of the document’s type and ID, stored in the _uid field. The _uid field is made up of the _id and _type fields, which you always get when searching for or retrieving documents:

% curl 'localhost:9200/get-together/group/1?fields&pretty'
{
  "_index" : "get-together",
  "_type" : "group",
  "_id" : "1",
  "_version" : 1,
  "exists" : true
}

At this point you might wonder, “Why does Elasticsearch store the same data in two places: you have _id, then _type, then _uid?”

Elasticsearch uses _uid internally for identification, and you don’t have any useful options around it. In contrast, _id and _type are special fields you can search on, and you can change their settings. To make them stored, set store to yes; to make them indexed or even analyzed, change the index option. Table 3.2 shows the default settings for _id and _type:

Table 3.2 Default settings for _id and _type fields

Field name | store value | index value | Observations
_id | no | no | It’s not indexed and not analyzed. You can search on it, but Elasticsearch uses _uid to give you the results.
_type | no | not_analyzed | It’s indexed, and it produces a single term. You can search on it, but you can’t get it as a single field.
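
To make the relationship between _uid, _type, and _id more concrete, here is a minimal sketch. It assumes the type#id layout that Elasticsearch releases of this era use for _uid values. The make_uid helper is hypothetical and only illustrates the combination; it isn't part of Elasticsearch.

```shell
# Hypothetical helper: build a _uid-style string from a type and an ID.
# Assumes the "type#id" layout; this is an illustration, not an Elasticsearch API.
make_uid() {
  printf '%s#%s\n' "$1" "$2"
}

# The group document with ID 1 from the get-together index
make_uid group 1
```

Because the type is part of the value, two documents in the same index can share an ID as long as their types differ.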

PROVIDING IDS FOR YOUR DOCUMENTS

You’ve seen here, and in chapter 2, that when you index a document, you need to tell Elasticsearch the type and the index it belongs to. The document also needs an ID to uniquely identify it within the type. That’s your _id field. There are three ways to specify IDs for your documents:

• Manually add the ID when you index the document.

So far, you’ve mostly provided IDs manually as part of the URI. For example, to index a document with ID 1st, you run something like this:

% curl -XPUT 'localhost:9200/get-together/manual_id/1st?pretty' -d '{
  "name": "Elasticsearch Denver"
}'

And you get back something like this:

{
  "_index" : "get-together",
  "_type" : "manual_id",
  "_id" : "1st",
  "_version" : 1
}

You can see in the reply that the _id field returns the value you provided.

• Configure Elasticsearch to take the ID from a field within your document.

The second way to get IDs for your documents is to have Elasticsearch pick the ID from a field within your document. This is useful if you already have a field with unique values, like a barcode for items in an online shop. If you use that as the _id as well, you’ll have a quick way of getting an item if you know the barcode, the index, and the type: you retrieve the document, and no search is required. Also, you have a reliable way of identifying items if you need to update their content. We’ll look at updating documents later in this chapter.

To get IDs from the barcode field, you first need to put that field name in the path option of your _id field. This makes Elasticsearch look for an ID in the barcode field:

% curl -XPUT localhost:9200/online-shop/barcode_id/_mapping -d '{
  "barcode_id": {
    "_id": {
      "path": "barcode"
    }
  }
}'

TIP For the command to work without an error, create the online-shop index first: curl -XPUT 'localhost:9200/online-shop/'

To index an item with the barcode as the ID, omit the ID from the URI, and use an HTTP POST request instead:

% curl -XPOST 'localhost:9200/online-shop/barcode_id/?pretty' -d '{
  "barcode": "abcd",
  "name": "Promotional T-Shirt"
}'

And you get back a reply like this:

{
  "_index" : "online-shop",
  "_type" : "barcode_id",
  "_id" : "abcd",
  "_version" : 1
}

As before, the reply shows the _id field, and its value is what you provided in the barcode field.
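
Since the barcode is now the document ID, getting an item by barcode becomes a plain GET on the document URI, with no search involved. The following is a rough sketch, assuming the online-shop index, the barcode_id type, and a node listening on localhost:9200 as in the examples above. The item_url helper is hypothetical.

```shell
# Hypothetical helper: build the URI for fetching one shop item by its barcode.
# Assumes the online-shop index and barcode_id type used in this section.
item_url() {
  printf 'localhost:9200/online-shop/barcode_id/%s?pretty\n' "$1"
}

# Print the curl command instead of running it, so the sketch works
# without a running cluster. Drop the leading "echo" to send the request.
echo curl "$(item_url abcd)"
```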

• Configure Elasticsearch to automatically generate a unique ID for you.

The final approach to creating document IDs is to rely on Elasticsearch to generate unique IDs for you. This is useful if you don’t have a unique ID already, or you don’t need to identify documents by a certain property. Typically, this is what you do when you index application logs: they don’t have a unique property to identify them, and they’re never updated.

To have Elasticsearch generate the ID, use HTTP POST and omit the ID, like you did with barcodes. The difference is that you don’t need to configure the path property for the _id field.

% curl -XPOST 'localhost:9200/logs/auto_id/?pretty' -d '{
  "message": "I have an automatic id"
}'

The reply should look similar to the following:

{
  "_index" : "logs",
  "_type" : "auto_id",
  "_id" : "RWdYVcU8Rjyy8sJPobVqDQ",
  "_version" : 1
}

As was the case with the other methods, you can see the ID that was generated in the JSON reply.
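
The three approaches differ mainly in the HTTP method and in whether the ID appears in the URI. The sketch below summarizes that choice. The index_request helper is hypothetical, and it only prints the method and URI rather than contacting a cluster.

```shell
# Hypothetical helper: pick the HTTP method and URI for an index request.
# With an explicit ID, use PUT and put the ID in the URI; without one,
# use POST and leave the ID off.
index_request() {
  index_name=$1
  doc_type=$2
  doc_id=${3:-}
  if [ -n "$doc_id" ]; then
    printf 'PUT localhost:9200/%s/%s/%s\n' "$index_name" "$doc_type" "$doc_id"
  else
    printf 'POST localhost:9200/%s/%s/\n' "$index_name" "$doc_type"
  fi
}

index_request get-together manual_id 1st   # ID supplied in the URI
index_request online-shop barcode_id       # ID taken from the barcode field
index_request logs auto_id                 # ID generated by Elasticsearch
```

The second and third requests look the same on the wire. What separates them is whether the type's mapping sets a path for the _id field.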

STORING THE INDEX NAME INSIDE THE DOCUMENT

To have Elasticsearch store the index name in the document, along with the ID and the type, use the _index field.

As with _id and _type, you can see _index in the results of a search or a GET request. But, again as with those two fields, what you see there doesn’t come from the field contents: _index is disabled by default.

Elasticsearch knows which index each result came from, so it can show an _index value there, but, by default, you can’t search for _index yourself. The following command shouldn’t find anything:

% curl 'localhost:9200/_search?q=_index:get-together'

To enable _index, set enabled to true. The mapping might look like this:

% curl -XPUT 'localhost:9200/get-together/with_index/_mapping' -d '{
  "with_index": {
    "_index": { "enabled": true }
  }
}'

Then, if you add documents to this type and rerun the previous search, you should find your new documents.

The _id, _type and _index fields help you search in properties that define your documents: each of them belongs to a type in an index and has an ID. Next, we’ll look at predefined fields that add new properties to your documents, such as their size.

Source: MEAP Edition, Manning Early Access Program, Elasticsearch in Action, Version 11 (pages 79-82)