Elasticsearch
Information on this page is taken from elastic.co.
Overview
The open source Elastic Stack includes a number of tightly coupled products:
- Kibana: Elasticsearch data visualization tool.
- Elasticsearch: Search engine based on Lucene.
- Beats: Lightweight data shipper.
- Logstash: ETL tool used for data enrichment.
Elasticsearch is horizontally scalable and provides high availability.
Kibana
The Kibana Dev Tools console lets you interact with Elasticsearch. Navigate
to Kibana and choose Dev Tools in the sidebar. Enter commands in the editor
pane (left). Each command begins with an HTTP method such as POST or GET,
followed by the endpoint path.
After typing a command, click the green arrow to execute it. You may also execute with ⌘-ENTER on a Mac. Results will appear in the response pane (right). You can also click the wrench next to the green arrow to copy as a cURL command.
CRUD
Elasticsearch operates through REST endpoints. Even the language-specific clients (JavaScript, Python, etc.) use REST behind the scenes. All commands in this section are written as they would be typed into the Kibana console.
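Since every console command is ultimately an HTTP request, it can help to see one spelled out. The following is a minimal sketch using only the Python standard library; the node address `http://localhost:9200` and the helper name `build_request` are assumptions for illustration, not part of the original notes. The request is built but not sent.

```python
import json
import urllib.request

# Assumption: a local, unsecured Elasticsearch node at this address.
ES_URL = "http://localhost:9200"

def build_request(method, path, body=None):
    """Build (but do not send) the HTTP request behind a console command."""
    data = None
    headers = {}
    if body is not None:
        # Console bodies are JSON, so serialize and label them as such.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        ES_URL + path, data=data, headers=headers, method=method
    )

# Equivalent of typing "GET /inspections/report/_search" into the console.
req = build_request("GET", "/inspections/report/_search")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would send it to a running node.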
Create
POST
Elasticsearch stores documents using the JSON format. Each value must be one of six types: string, number, object, array, boolean, and null.
POST /inspections/report
{
"business_address": "660 Sacramento St",
"business_city": "San Francisco",
"business_location": {
"type": "Point",
"coordinates": [
-122.585833,
37.985355
]
},
"inspection_date": "2016-02-04T00:00:00.000",
"inspection_score": 96
}
This command will index a document in Elasticsearch. inspections is the
index name. report is the type. You can only have one type per index in the
latest version of Elasticsearch.
Executing this command automatically creates the index for us, named "inspections". Instead of letting Elasticsearch create the index dynamically from the first document we add, we can create it beforehand and choose its settings.
PUT /inspections
{
"settings": {
"index.number_of_shards": 1,
"index.number_of_replicas": 0
}
}
PUT
PUT lets you specify the ID of the document. With POST, Elasticsearch
generates the document's ID for us.
PUT /inspections/report/1234
{
"business_address": "660 Sacramento St",
"business_city": "San Francisco",
"business_location": {
"type": "Point",
"coordinates": [
-122.585833,
37.985355
]
},
"inspection_date": "2016-02-04T00:00:00.000",
"inspection_score": 96
}
Bulk Insert
When you need to index a large number of documents, you should use the bulk
API (at the _bulk endpoint). You may see significant performance benefits.
POST /inspections/report/_bulk
{ "index": { "_id": 1 }}
{ "business_address": "315 California St", ... }
{ "index": { "_id": 2 }}
{ "business_address": "10 Mason St", ... }
Notice that each document takes two lines: one for the operation type and a second for the document we are going to index.
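The two-lines-per-document layout is easy to generate programmatically. Here is a small sketch of a hypothetical helper, `build_bulk_body`, that turns a mapping of IDs to documents into a bulk request body. Only the addresses from the example above are used; the other fields were elided there and are left out here.

```python
import json

def build_bulk_body(docs):
    """Return a newline-delimited bulk body: an action line, then a source line.

    `docs` maps document IDs to document bodies (hypothetical input shape).
    """
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_id": doc_id}}))  # operation line
        lines.append(json.dumps(source))                      # document line
    # The bulk API expects every line, including the last, to end with a newline.
    return "\n".join(lines) + "\n"

body = build_bulk_body({
    1: {"business_address": "315 California St"},
    2: {"business_address": "10 Mason St"},
})
print(body, end="")
```

Each JSON object must stay on a single line, which is why the documents are serialized with `json.dumps` rather than pretty-printed.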
Read
Find a single document by specifying the ID:
GET /inspections/report/1
Update
We can add fields by hitting the _update endpoint.
POST /inspections/report/5/_update
{
"doc": {
"flagged": true,
"views": 0
}
}
Here we're adding "flagged" and "views" fields to document 5. This will
create a new version of the document.
We can also use a PUT. This will replace the entire document with the
contents given in the request.
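The difference between the two approaches can be modeled with plain dictionaries. This is only an illustration of the semantics described above, not Elasticsearch code: the function names are hypothetical, and only top-level fields are modeled (how nested objects are handled is not covered).

```python
def partial_update(existing, doc):
    """Model POST .../_update: fields in `doc` are merged into the stored document."""
    merged = dict(existing)
    merged.update(doc)
    return merged

def full_replace(existing, new_source):
    """Model PUT on an existing ID: the stored document becomes exactly `new_source`."""
    return dict(new_source)

stored = {"business_city": "San Francisco", "inspection_score": 96}
changes = {"flagged": True, "views": 0}

print(partial_update(stored, changes))  # original fields kept, new ones added
print(full_replace(stored, changes))    # original fields are gone
```

With the update, `business_city` and `inspection_score` survive; with the PUT, only `flagged` and `views` remain.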
Delete
To delete a document, we can just pass the document ID to the DELETE API.
DELETE /inspections/report/5
Delete an entire index with:
DELETE /inspections
Search
Documents in the results list will have a _score field. This indicates how
well they match the query.
Find All
GET /inspections/report/_search
Match
Use match to find all documents that contain a specific string within a
field.
GET /inspections/report/_search
{
"query": {
"match": {
"business_name": "soup"
}
}
}
The documents in the return set will all have the string "soup" somewhere in
their business name. You may also use term to match an exact string.
Another way to match is with match_phrase.
GET /inspections/report/_search
{
"query": {
"match_phrase": {
"business_name": "san francisco"
}
}
}
Match phrase requires that words exist in the exact order given.
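To make the difference between match and match_phrase concrete, here is a deliberately simplified model. It assumes text is analyzed by lowercasing and splitting on whitespace, which is much cruder than a real analyzer, and it assumes match succeeds when any query word is present. The function names and the sample business name are made up for illustration.

```python
def tokenize(text):
    """Very rough stand-in for text analysis: lowercase and split on whitespace."""
    return text.lower().split()

def match(field_value, query):
    """True if the field shares at least one token with the query."""
    field_tokens = set(tokenize(field_value))
    return any(tok in field_tokens for tok in tokenize(query))

def match_phrase(field_value, query):
    """True if the query tokens appear consecutively, in order, in the field."""
    field_tokens = tokenize(field_value)
    query_tokens = tokenize(query)
    n = len(query_tokens)
    return any(
        field_tokens[i:i + n] == query_tokens
        for i in range(len(field_tokens) - n + 1)
    )

name = "Soup of San Francisco"  # hypothetical business name
print(match(name, "francisco san"))         # word order does not matter
print(match_phrase(name, "francisco san"))  # wrong order, so no phrase match
print(match_phrase(name, "san francisco"))  # exact order, so it matches
```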
Range
Find documents with fields that have terms within a certain range.
GET /inspections/report/_search
{
"query": {
"range": {
"inspection_score": {
"gte": 50,
"lte": 90
}
}
}
}
This query returns all documents where inspection_score is between 50 and
90, inclusive. The range query accepts the parameters gte (greater-than or
equal to), gt (greater-than), lte (less-than or equal to), and lt
(less-than).
When applying these values to dates, Date Math may be useful. In addition,
the default date format can be overridden by the format parameter.
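The four range parameters map directly onto ordinary comparison operators. The sketch below shows that mapping for numeric values only; `in_range` is a hypothetical helper, and dates and Date Math are not modeled.

```python
import operator

# Each range parameter corresponds to one comparison operator.
RANGE_OPS = {
    "gte": operator.ge,  # greater-than or equal to
    "gt": operator.gt,   # greater-than
    "lte": operator.le,  # less-than or equal to
    "lt": operator.lt,   # less-than
}

def in_range(value, **bounds):
    """True if `value` satisfies every bound, e.g. in_range(75, gte=50, lte=90)."""
    return all(RANGE_OPS[name](value, bound) for name, bound in bounds.items())

print(in_range(90, gte=50, lte=90))  # lte includes the upper bound
print(in_range(90, gte=50, lt=90))   # lt excludes it
```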
Boolean Fields
We can also do boolean combinations of queries. The bool fields can be
must (similar to AND), should (similar to OR), must_not (similar to NOT),
and filter (like must, but without contributing to the score).
GET /inspections/report/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"business_name": "soup"
}
},
{
"match": {
"business_state": "CA"
}
}
]
}
}
}
This command will find all documents that have "soup" somewhere in their name, and "CA" somewhere in their state field.
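When bool queries are assembled in code, a small builder keeps the nesting manageable. This is a sketch of one possible helper, not part of any Elasticsearch client; the name `bool_query` and its keyword arguments are assumptions. It only produces the request body as a dictionary.

```python
import json

def bool_query(must=(), should=(), must_not=(), filter_=()):
    """Build a search body with a bool query, omitting empty clause lists."""
    clauses = {
        "must": list(must),
        "should": list(should),
        "must_not": list(must_not),
        "filter": list(filter_),
    }
    return {"query": {"bool": {k: v for k, v in clauses.items() if v}}}

# Rebuild the request body from the example above.
body = bool_query(must=[
    {"match": {"business_name": "soup"}},
    {"match": {"business_state": "CA"}},
])
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into the console under a `GET /inspections/report/_search` line.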
Sort
sort is another top-level key in the request body, alongside query.
GET /inspections/report/_search
{
"query": {
"range": {
"inspection_score": {
"gte": 80
}
}
},
"sort": [
{ "inspection_score": "desc" }
]
}
Aggregations
Use the top-level aggregations key to bucket results.
GET /inspections/report/_search
{
"query": {
"match": {
"business_name": "soup"
}
},
"aggregations": {
"healthscore": {
"range": {
"field": "inspection_score",
"ranges": [
{
"key": "0-80",
"from": 0,
"to": 80
},
{
"key": "81-90",
"from": 81,
"to": 90
},
{
"key": "91-100",
"from": 91,
"to": 100
}
]
}
}
}
}
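One detail worth checking in the ranges above: in a range aggregation, each bucket includes its from value and excludes its to value. The sketch below applies that rule to a handful of made-up scores (the sample data and the function name `bucket_counts` are hypothetical) to show which values land in which bucket.

```python
def bucket_counts(scores, ranges):
    """Count scores per bucket, treating `from` as inclusive and `to` as exclusive."""
    counts = {r["key"]: 0 for r in ranges}
    for score in scores:
        for r in ranges:
            if r["from"] <= score < r["to"]:
                counts[r["key"]] += 1
    return counts

ranges = [
    {"key": "0-80", "from": 0, "to": 80},
    {"key": "81-90", "from": 81, "to": 90},
    {"key": "91-100", "from": 91, "to": 100},
]
scores = [75, 80, 85, 90, 96, 100]  # made-up sample scores

print(bucket_counts(scores, ranges))
```

With these boundaries, scores of exactly 80, 90, and 100 fall into no bucket. If they should be counted, one option is to set the to values to 81, 91, and 101.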
Field Types
Elasticsearch will automatically determine field types. You can see these with:
GET /inspections/_mapping/report
You can edit this mapping with a PUT to the same endpoint.