@mrtdeh
Forked from nirev/elasticsearch.md
Created August 23, 2022 10:39

Revisions

  1. @nirev nirev created this gist Sep 15, 2020.
    # Elasticsearch Storage

    The idea is to use an Elasticsearch *data stream*:
    https://www.elastic.co/guide/en/elasticsearch/reference/current/data-streams.html

    ## Important concepts

    ### Data Streams

    A data stream is a way to handle append-only time-series data (such as
    webhook logs) that is periodically rolled over into new indices.

    A data stream consists of:
    - a single name used for writing/searching, much like an alias (e.g. "webhooks_logs")
    - a set of hidden backing indices that store the data
    - an index template that defines the mappings and settings used in each backing index
    - a rollover configuration (an ILM policy) that can create new indices, delete old
      ones, and change which index is the active write index

    ### Mapping

    Mappings specify the schema for indexed documents.

    Important: make sure that the `_source` metadata field is not disabled.
    https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-source-field.html

    Important: mapping types are deprecated since ES 7.x (and removed in 8.0), so mappings are defined without a type name.
    https://www.elastic.co/guide/en/elasticsearch/reference/current/removal-of-types.html
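
As a minimal sketch of the `_source` check, the Python helper below inspects a mapping body (as a plain dict) and reports whether `_source` is left enabled. The function name `source_enabled` and both sample mappings are hypothetical, for illustration only.

```python
def source_enabled(mappings: dict) -> bool:
    """Return True unless the mapping explicitly disables `_source`."""
    # `_source` is enabled by default; it is only switched off by
    # an explicit {"_source": {"enabled": false}} in the mapping.
    return mappings.get("_source", {}).get("enabled", True) is not False


# Hypothetical sample mappings (typeless, as in ES 7.x and later).
ok_mapping = {"properties": {"@timestamp": {"type": "date"}}}
bad_mapping = {"_source": {"enabled": False}, "properties": {}}

print(source_enabled(ok_mapping))   # True
print(source_enabled(bad_mapping))  # False
```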


    ## Data Stream how to

    Prerequisites:
    - Elasticsearch data streams are intended for time series data only. Each document indexed to a data stream must contain the `@timestamp` field. This field must be mapped as a `date` or `date_nanos` field data type.
    - Data streams are best suited for time-based, append-only use cases. If you frequently need to update or delete existing documents, we recommend using an index alias and an index template instead.
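
The two prerequisites can be checked before anything is sent to the cluster. The sketch below is one possible pre-flight check in Python. The helper names are hypothetical, and it treats a missing `@timestamp` mapping as a failure, following the prerequisite as stated here.

```python
ALLOWED_TIMESTAMP_TYPES = {"date", "date_nanos"}


def timestamp_mapping_ok(mappings: dict) -> bool:
    """True if `@timestamp` is mapped as `date` or `date_nanos`."""
    field = mappings.get("properties", {}).get("@timestamp", {})
    return field.get("type") in ALLOWED_TIMESTAMP_TYPES


def document_ok(doc: dict) -> bool:
    """True if the document carries a `@timestamp` field."""
    return "@timestamp" in doc


mappings = {"properties": {"@timestamp": {"type": "date_nanos"}}}
doc = {"@timestamp": "2020-12-07T11:06:07.000Z", "message": "Login successful"}

print(timestamp_mapping_ok(mappings))  # True
print(document_ok(doc))                # True
```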

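    ### 1. Create an Index Lifecycle Management (ILM) policy

    ILM can be used to automatically manage a data stream’s backing indices,
    for example by rolling over to a new index based on size or age.
    The policy below rolls over the write index once it is one day old or
    reaches 100GB, and deletes each backing index 30 days after its rollover.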

    https://www.elastic.co/guide/en/elasticsearch/reference/current/ilm-put-lifecycle.html
    https://www.elastic.co/guide/en/elasticsearch/reference/current/ilm-index-lifecycle.html

    ```
    PUT /_ilm/policy/my-data-stream-policy
    {
      "policy": {
        "phases": {
          "hot": {
            "actions": {
              "rollover": {
                "max_age": "1d",
                "max_size": "100GB"
              }
            }
          },
          "delete": {
            "min_age": "30d",
            "actions": {
              "delete": {}
            }
          }
        }
      }
    }
    ```

    response:
    ```
    {"acknowledged": true}
    ```
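
The same request can be issued from a script. The following is a minimal sketch using only the Python standard library. The helper names `build_ilm_policy` and `put_json` are hypothetical, and the base URL in the usage note is an assumption (an unauthenticated local node), so adjust it for a real cluster.

```python
import json
import urllib.request


def build_ilm_policy(max_age: str = "1d", max_size: str = "100GB",
                     delete_after: str = "30d") -> dict:
    """Build the ILM policy body: roll over in `hot`, then delete."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {"max_age": max_age, "max_size": max_size}
                    }
                },
                "delete": {
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }


def put_json(base_url: str, path: str, body: dict) -> dict:
    """Send `body` as JSON with HTTP PUT and return the decoded response."""
    request = urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Usage, assuming a node at `http://localhost:9200`: `put_json("http://localhost:9200", "/_ilm/policy/my-data-stream-policy", build_ilm_policy())`.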

    ### 2. Create an index template for the data stream

    A data stream uses an index template to configure its backing indices.
    A template for a data stream must specify:

    - One or more index patterns that match the name of the stream.
    - The mappings and settings for the stream’s backing indices.
    - That the template is used exclusively for data streams.
    - A priority for the template.

    ```
    PUT /_index_template/my-data-stream-template
    {
      "index_patterns": [ "my-data-stream*" ],
      "data_stream": { },
      "priority": 200,
      "template": {
        "mappings": {
          "properties": {
            "@timestamp": { "type": "date_nanos" }
          }
        },
        "settings": {
          "index.lifecycle.name": "my-data-stream-policy"
        }
      },
      "version": 1,
      "_meta": { "whatever": "you-want" }
    }
    ```
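
A template body like this can also be generated in code, which makes it easier to keep the pattern and policy name in sync. The sketch below is illustrative only: `build_data_stream_template` is a hypothetical helper, and it rejects a non-integer `version` because the index template API expects an integer there.

```python
def build_data_stream_template(index_pattern: str, ilm_policy: str,
                               version: int = 1, priority: int = 200) -> dict:
    """Build an index template body that is only usable for data streams."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(version, bool) or not isinstance(version, int):
        raise TypeError("version must be an integer")
    return {
        "index_patterns": [index_pattern],
        # The presence of this (empty) object marks the template as
        # a data stream template.
        "data_stream": {},
        "priority": priority,
        "template": {
            "mappings": {
                "properties": {"@timestamp": {"type": "date_nanos"}}
            },
            "settings": {"index.lifecycle.name": ilm_policy},
        },
        "version": version,
    }


template = build_data_stream_template("my-data-stream*", "my-data-stream-policy")
```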

    ### 3. Create the Data Stream

    ```
    PUT /_data_stream/my-data-stream
    ```

    After it's created, you can query the data stream's parameters. The example
    response shown after the request is for a stream that has already rolled over
    once (generation 2, two backing indices):
    ```
    GET /_data_stream/my-data-stream

    {
      "data_streams": [
        {
          "name": "my-data-stream",
          "timestamp_field": {
            "name": "@timestamp"
          },
          "indices": [
            {
              "index_name": ".ds-my-data-stream-000001",
              "index_uuid": "krR78LfvTOe6gr5dj2_1xQ"
            },
            {
              "index_name": ".ds-my-data-stream-000002",
              "index_uuid": "C6LWyNJHQWmA08aQGvqRkA"
            }
          ],
          "generation": 2,
          "status": "GREEN",
          "template": "my-data-stream-template",
          "ilm_policy": "my-data-stream-policy"
        }
      ]
    }
    ```
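
In this response, the last entry of `indices` is the stream's current write index. A small Python sketch for reading it out of a decoded response follows. The `write_index` helper is hypothetical, and the sample dict is a trimmed copy of the response shown here.

```python
def write_index(data_stream: dict) -> str:
    """Return the name of the current write index of one data stream entry."""
    # Backing indices are listed oldest first; the last one receives writes.
    return data_stream["indices"][-1]["index_name"]


# Trimmed copy of the example response.
response = {
    "data_streams": [
        {
            "name": "my-data-stream",
            "indices": [
                {"index_name": ".ds-my-data-stream-000001"},
                {"index_name": ".ds-my-data-stream-000002"},
            ],
            "generation": 2,
        }
    ]
}

print(write_index(response["data_streams"][0]))  # .ds-my-data-stream-000002
```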

    ### 4. Index documents to the Data Stream

    You can add documents to a data stream using two types of indexing requests:

    - Individual indexing requests
    - Bulk indexing requests

    #### Individual

    ```
    PUT /my-data-stream/_create/{id}
    {
      "@timestamp": "2020-12-07T11:06:07.000Z",
      "user": {
        "id": "8a4f500d"
      },
      "message": "Login successful"
    }
    ```
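
When documents are produced by an application, the request path and the `@timestamp` value usually have to be built in code. The sketch below shows one way to do that in Python. `create_request` is a hypothetical helper that only assembles the method, path and body; it does not talk to a cluster. It fills in `@timestamp` with the current UTC time when the document lacks one.

```python
from datetime import datetime, timezone
from urllib.parse import quote


def create_request(stream: str, doc_id: str, doc: dict, now: datetime = None):
    """Return (method, path, body) for a single `_create` request."""
    body = dict(doc)  # copy, so the caller's dict is not modified
    if "@timestamp" not in body:
        moment = now or datetime.now(timezone.utc)
        # Format as e.g. 2020-12-07T11:06:07.000Z (UTC, millisecond precision).
        # A naive `now` is interpreted as local time by astimezone().
        body["@timestamp"] = (
            moment.astimezone(timezone.utc)
            .isoformat(timespec="milliseconds")
            .replace("+00:00", "Z")
        )
    # Percent-encode the id so characters such as "/" do not alter the path.
    path = f"/{stream}/_create/{quote(doc_id, safe='')}"
    return "PUT", path, body


method, path, body = create_request(
    "my-data-stream",
    "1",
    {"user": {"id": "8a4f500d"}, "message": "Login successful"},
    now=datetime(2020, 12, 7, 11, 6, 7, tzinfo=timezone.utc),
)
print(method, path)        # PUT /my-data-stream/_create/1
print(body["@timestamp"])  # 2020-12-07T11:06:07.000Z
```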

    #### Bulk

    ```
    PUT /my-data-stream/_bulk?refresh
    {"create":{ }}
    { "@timestamp": "2020-12-08T11:04:05.000Z", "user": { "id": "vlb44hny" }, "message": "Login attempt failed" }
    {"create":{"_id": "3"}}
    { "@timestamp": "2020-12-08T11:06:07.000Z", "user": { "id": "8a4f500d" }, "message": "Login successful" }
    {"create":{ }}
    { "@timestamp": "2020-12-09T11:07:08.000Z", "user": { "id": "l7gk7f82" }, "message": "Logout successful" }
    ```
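
The bulk body is newline-delimited JSON: one action line, then one document line, with a trailing newline after the last line. The sketch below builds such a body in Python. `bulk_body` is a hypothetical helper that only uses the `create` action, as in the request shown here.

```python
import json


def bulk_body(docs) -> str:
    """Build an NDJSON `_bulk` body from (doc_id or None, document) pairs."""
    lines = []
    for doc_id, doc in docs:
        # An empty action lets Elasticsearch generate the document id.
        action = {"create": {} if doc_id is None else {"_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_body([
    (None, {"@timestamp": "2020-12-08T11:04:05.000Z", "message": "Login attempt failed"}),
    ("3", {"@timestamp": "2020-12-08T11:06:07.000Z", "message": "Login successful"}),
])
```

The resulting string would be sent as the request body with the `Content-Type: application/x-ndjson` header.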