Elasticsearch

Information on this page is taken from elastic.co.

Overview

The open source Elastic Stack includes a number of tightly coupled products:

  • Kibana: Elasticsearch data visualization tool.
  • Elasticsearch: Search engine based on Apache Lucene.
  • Beats: Lightweight data shipper.
  • Logstash: ETL tool used for data enrichment.

Elasticsearch is horizontally scalable and provides high availability.

Kibana

The Kibana Dev Tools console lets you interact with Elasticsearch. Navigate to Kibana and choose Dev Tools in the sidebar. Enter commands in the editor pane (left). Each command begins with an HTTP method such as POST or GET.

After typing a command, click the green arrow to execute it. You may also execute with ⌘-ENTER on a Mac. Results will appear in the response pane (right). You can also click the wrench next to the green arrow to copy as a cURL command.

CRUD

Elasticsearch operates through REST endpoints. Even the language-specific clients (JavaScript, Python, etc.) use REST behind the scenes. All commands in this section are written as if they were typed into Kibana.

Create

POST

Elasticsearch stores documents using the JSON format. Each value must be one of six types: string, number, object, array, boolean, and null.

POST /inspections/report
{
    "business_address": "660 Sacramento St",
    "business_city": "San Francisco",
    "business_location": {
        "type": "Point",
        "coordinates": [
            -122.585833,
            37.985355
        ]
    },
    "inspection_date": "2016-02-04T00:00:00.000",
    "inspection_score": 96
}

This command will index a document in Elasticsearch. inspections is the index name and report is the type. Since Elasticsearch 6.0, an index can have only one type, and later major versions remove types entirely.
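Since every console command is just an HTTP call, the same request can be built from any HTTP client. Here is a minimal Python sketch of the index command above, assuming a node reachable at http://localhost:9200 (an assumption; this page does not say where your cluster lives). The helper name build_request is hypothetical, and the document body is shortened. The sketch only constructs the request; sending it is left as a commented-out line.

```python
import json
import urllib.request

# Assumption: a local node on the default port; adjust for your cluster.
ES_URL = "http://localhost:9200"


def build_request(method, path, body=None):
    """Build (but do not send) the HTTP request behind a console command."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(ES_URL + path, data=data, method=method)
    if data is not None:
        # JSON bodies are sent with an explicit content type.
        req.add_header("Content-Type", "application/json")
    return req


# Equivalent of "POST /inspections/report" with a shortened document.
req = build_request("POST", "/inspections/report", {
    "business_address": "660 Sacramento St",
    "inspection_score": 96,
})
print(req.get_method(), req.full_url)

# To actually send it (requires a running cluster):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```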

Executing this command will automatically create the index for us, named "inspections". Instead of letting Elasticsearch create the index dynamically from the first document we add, we can create it beforehand to control its settings.

PUT /inspections
{
    "settings": {
        "index.number_of_shards": 1,
        "index.number_of_replicas": 0
    }
}

PUT

PUT lets you specify the ID of the document yourself. With POST, Elasticsearch generates the ID for us.

PUT /inspections/report/1234
{
    "business_address": "660 Sacramento St",
    "business_city": "San Francisco",
    "business_location": {
        "type": "Point",
        "coordinates": [
            -122.585833,
            37.985355
        ]
    },
    "inspection_date": "2016-02-04T00:00:00.000",
    "inspection_score": 96
}

Bulk Insert

When you need to index a large number of documents, use the bulk API (the _bulk endpoint). Batching many operations into one request is usually much faster than sending them one at a time.

POST /inspections/report/_bulk
{ "index": { "_id": 1 }}
{ "business_address": "315 California St", ... }
{ "index": { "_id": 2 }}
{ "business_address": "10 Mason St", ... }

Notice that we have one line for the operation type and a second line for the document we are going to index.
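The two-lines-per-document layout is newline-delimited JSON, normally sent with the Content-Type application/x-ndjson, and the body has to end with a newline. A small Python sketch of how such a body could be assembled is shown below. The helper name build_bulk_body is hypothetical, and the documents carry only the address field because the rest is elided on this page.

```python
import json


def build_bulk_body(docs):
    """Return an NDJSON bulk body: an action line, then a source line, per document.

    `docs` maps a document ID to its source dictionary.
    """
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Illustrative documents; only the address field is filled in.
body = build_bulk_body({
    1: {"business_address": "315 California St"},
    2: {"business_address": "10 Mason St"},
})
print(body, end="")
```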

Read

Find a single document by specifying the ID:

GET /inspections/report/1

Update

We can add fields by hitting the _update endpoint.

POST /inspections/report/5/_update
{
    "doc": {
        "flagged": true,
        "views": 0
    }
}

Here we're adding "flagged" and "views" fields to document 5. This will create a new version of the document.

We can also use a PUT. This will replace the entire document with the contents given in the request.
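The difference between the two approaches can be modeled locally. The Python sketch below is only an illustration of the semantics, not Elasticsearch's implementation: a partial update merges the "doc" fields into the stored source (nested objects merged key by key, other values overwritten), while a PUT discards the old source. The function names and the sample document are hypothetical.

```python
import copy


def apply_partial_update(existing, partial):
    """Model of an _update with "doc": merge `partial` into a copy of `existing`."""
    merged = copy.deepcopy(existing)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested objects are merged key by key.
            merged[key] = apply_partial_update(merged[key], value)
        else:
            # Scalars and arrays overwrite the old value.
            merged[key] = copy.deepcopy(value)
    return merged


def apply_full_replace(existing, new_source):
    """Model of a PUT to the document's URL: the old source is discarded."""
    return copy.deepcopy(new_source)


stored = {"business_city": "San Francisco", "inspection_score": 96}
changes = {"flagged": True, "views": 0}

print(apply_partial_update(stored, changes))  # old fields kept, new ones added
print(apply_full_replace(stored, changes))    # only the new fields remain
```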

Delete

To delete a document, we can just pass the document ID to the DELETE API.

DELETE /inspections/report/5

Delete an entire index with:

DELETE /inspections

Search

Documents in the results list will have a _score field. This indicates how well they match the query.

Find All

GET /inspections/report/_search

Match

Use match to find all documents that contain a specific string within a field.

GET /inspections/report/_search
{
    "query": {
        "match": {
            "business_name": "soup"
        }
    }
}

Every document in the result set will contain the term "soup" somewhere in its business name. To match an exact, unanalyzed value, use term instead.

Another way to match is with match_phrase.

GET /inspections/report/_search
{
    "query": {
        "match_phrase": {
            "business_name": "san francisco"
        }
    }
}

Match phrase requires that words exist in the exact order given.
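To make the difference concrete, here is a rough Python model of the two behaviors. It is a simplification under stated assumptions: the tokenizer just lowercases and splits on non-alphanumeric characters, which only approximates a real analyzer, and match is modeled with its default OR behavior. The function names and business names are hypothetical.

```python
import re


def tokenize(text):
    """Rough stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches(field_value, query):
    """match: at least one query token appears in the field (default OR)."""
    field_tokens = set(tokenize(field_value))
    return any(tok in field_tokens for tok in tokenize(query))


def matches_phrase(field_value, query):
    """match_phrase: the query tokens appear consecutively and in order."""
    field_tokens = tokenize(field_value)
    query_tokens = tokenize(query)
    n = len(query_tokens)
    if n == 0:
        return False
    return any(field_tokens[i:i + n] == query_tokens
               for i in range(len(field_tokens) - n + 1))


# Hypothetical business names.
for name in ("Soup of San Francisco", "Francisco Bakery San Jose"):
    print(name, matches(name, "san francisco"), matches_phrase(name, "san francisco"))
```

Both names satisfy the match model, but only the first satisfies the phrase model, because the second has the words in a different order.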

Range

Find documents whose field values fall within a certain range.

GET /inspections/report/_search
{
    "query": {
        "range": {
            "inspection_score": {
                "gte": 50,
                "lte": 90
            }
        }
    }
}

This query returns all documents where inspection_score is between 50 and 90, inclusive.

The range query accepts the parameters gte (greater-than or equal to), gt (greater-than), lte (less-than or equal to), and lt (less-than).

When applying these values to dates, Date Math may be useful. In addition, the default date format can be overridden with the format parameter.
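As a sketch of what such date queries might look like, the Python helper below builds two request bodies for the inspection_date field: one using a date math expression and one using explicit bounds with a custom format. The helper name date_range_query and the specific bounds are hypothetical, chosen only for illustration.

```python
import json


def date_range_query(field, gte=None, lte=None, fmt=None):
    """Build a range query body, optionally with a custom date format."""
    bounds = {}
    if gte is not None:
        bounds["gte"] = gte
    if lte is not None:
        bounds["lte"] = lte
    if fmt is not None:
        bounds["format"] = fmt
    return {"query": {"range": {field: bounds}}}


# Date math: one year before now, rounded down to the start of that day.
print(json.dumps(date_range_query("inspection_date", gte="now-1y/d"), indent=4))

# Explicit bounds written in a non-default format.
print(json.dumps(
    date_range_query("inspection_date", gte="01/01/2016", lte="31/12/2016",
                     fmt="dd/MM/yyyy"),
    indent=4))
```

Either body can be pasted into the console after GET /inspections/report/_search.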

Boolean Fields

We can also do boolean combinations of queries. The bool fields can be must (similar to AND), should (similar to OR), must_not, and filter.

GET /inspections/report/_search
{
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "business_name": "soup"
                    }
                },
                {
                    "match": {
                        "business_state": "CA"
                    }
                }
            ]
        }
    }
}

This command will find all documents that have "soup" somewhere in their name, and "CA" somewhere in their state field.
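The other clauses follow the same shape. The Python sketch below builds a bool query that combines must with filter and must_not; the helper name bool_query and the chosen clauses are hypothetical. A filter clause has to match but does not contribute to _score, and must_not excludes any document that matches it.

```python
import json


def bool_query(must=None, should=None, must_not=None, filter_=None):
    """Build a bool query, including only the clauses that were supplied."""
    clauses = {
        "must": must,
        "should": should,
        "must_not": must_not,
        "filter": filter_,
    }
    return {"query": {"bool": {k: v for k, v in clauses.items() if v}}}


body = bool_query(
    must=[{"match": {"business_name": "soup"}}],
    # filter clauses must match but do not affect the score.
    filter_=[{"range": {"inspection_score": {"gte": 90}}}],
    # must_not excludes any document matching the clause.
    must_not=[{"match": {"business_state": "NV"}}],
)
print(json.dumps(body, indent=4))
```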

Sort

sort is another top-level key, like query.

GET /inspections/report/_search
{
    "query": {
        "range": {
            "inspection_score": {
                "gte": 80
            }
        }
    },
    "sort": [
        { "inspection_score": "desc" }
    ]
}

Aggregations

Use the top-level aggregations key to bucket results.

GET /inspections/report/_search
{
    "query": {
        "match": {
            "business_name": "soup"
        }
    },
    "aggregations": {
        "healthscore": {
            "range": {
                "field": "inspection_score",
                "ranges": [
                    {
                        "key": "0-80",
                        "from": 0,
                        "to": 80
                    },
                    {
                        "key": "81-90",
                        "from": 81,
                        "to": 90
                    },
                    {
                        "key": "91-100",
                        "from": 91,
                        "to": 100
                    }
                ]
            }
        }
    }
}
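One detail worth knowing when choosing bounds: in a range aggregation, from is inclusive and to is exclusive. The Python sketch below models that rule locally with the three ranges from the request above; it is an illustration, not Elasticsearch code, and the function name and sample scores are hypothetical. Under this rule a score of exactly 80, 90, or 100 lands in none of these buckets, so contiguous bounds (for example a "to" of 81 paired with a "from" of 81) may be preferable.

```python
def bucket_for(score, ranges):
    """Return the key of the first range containing `score`, or None.

    Models range-aggregation bounds: "from" is inclusive, "to" is exclusive.
    (These ranges do not overlap, so the first match is the only match.)
    """
    for r in ranges:
        if r["from"] <= score < r["to"]:
            return r["key"]
    return None


ranges = [
    {"key": "0-80", "from": 0, "to": 80},
    {"key": "81-90", "from": 81, "to": 90},
    {"key": "91-100", "from": 91, "to": 100},
]

# Illustrative scores, not real inspection data.
for score in (75, 80, 85, 90, 96, 100):
    print(score, bucket_for(score, ranges))
```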

Field Types

Elasticsearch will automatically determine field types. You can see these with:

GET /inspections/_mapping/report

You can add new fields to this mapping with a PUT to the same endpoint. The type of an existing field generally cannot be changed without reindexing.