Skip to content

Instantly share code, notes, and snippets.

@warborn
Last active August 15, 2018 18:53

Revisions

  1. warborn revised this gist Aug 15, 2018. 1 changed file with 14 additions and 5 deletions.
    19 changes: 14 additions & 5 deletions insights_builder.md
    Original file line number Diff line number Diff line change
    @@ -16,7 +16,7 @@ In order to extract the information, the following steps are followed by the mod
    The following are the available configurations that can be set:
    - Source
    - X Axis
    - Period
    - Inverval
    - X Segment
    - Y Axis
    - Y Operation
    @@ -180,6 +180,7 @@ The **X Segment** can be any of the following depending on the **X Axis** select
    - **internal**
    - **crime_category**: `String`
    - **location (municipality, state)**: `String`
    - **loss_amount**: `String`
    - **source_name**: `String`
    - **reporter_name**: `String`
    - **secretariat**
    @@ -195,8 +196,8 @@ The **Y Axis** can be any of the following depending on the following:
    - **internal**
    - **crime_category**: `String`
    - **loss_amount**: `Number`
    - **source_name**: `String`
    - **reporter_name**: `String`
    - **source_name?**: `String`
    - **reporter_name?**: `String`
    - **secretariat**
    - **crime_category**: `String`
    - **crime_type**: `String`
    @@ -259,7 +260,7 @@ For **date** fields a **range query** is used in order to apply the filter

    #### Discrete Locations

    In order to be able to generate charts that contain results with more than one of the different types of locations (i.e. municipalities, states, countries) it's necessary to specified this information.
    In order to be able to generate charts that contain results with more than one of the different types of locations (i.e. municipalities, states, countries) is necessary to specify this information.

    Discrete locations are a way to get the data of different types of locations in the same chart and also filter the data by the locations selected.

    @@ -279,7 +280,15 @@ Behind the scene the module will convert the list of locations into a dictionary
    }
    ```

    For each type of location a query will be built, executed and the final results will be merged resulting in the desired dataset.
    #### Discrete Crimes

    In order to be able to generate charts that contain results with more than one of the different types of crime levels (i.e. crime_category, crime_type, modality) is necessary to specify this information.

    Discrete crimes are a way to get the data of different levels of crimes in the same chart and also filter the data by the crimes selected.

    For example, you want to generate a chart that contains the categories **Homicidio y Robo**, the types **Homicidio Doloso** along with the modality **Con arma de fuego**. In order to get this information a list of crimes must be sent.

    For each crime level a query will be built, executed and the final results will be merged resulting in the desired dataset.

    ## DSL

  2. warborn revised this gist Aug 10, 2018. 1 changed file with 24 additions and 0 deletions.
    24 changes: 24 additions & 0 deletions insights_builder.md
    Original file line number Diff line number Diff line change
    @@ -257,6 +257,30 @@ For **date** fields a **range query** is used in order to apply the filter
    }
    ```

    #### Discrete Locations

    In order to be able to generate charts that contain results with more than one of the different types of locations (i.e. municipalities, states, countries) it's necessary to specified this information.

    Discrete locations are a way to get the data of different types of locations in the same chart and also filter the data by the locations selected.

    For example, you want to generate a chart that contains **two municipalities from Aguascalientes**, **the state of Coahuila** along with the **National** rate of **homicides**. In order to get this information a list of **location IDs** must be sent.

    ```py
    locs: ['ags_municipality1_id', 'ags_municipality2_id', 'coahuila_state_id', 'mexico_country_id']
    ```

    Behind the scene the module will convert the list of locations into a dictionary where the keys are the different types of locations and the values are the list of **IDs** of those types, like the following:

    ```py
    {
    'country_id': ['mexico_country_id'],
    'state_id': ['coahuila_state_id'],
    'municipality_id': ['ags_municipality1_id', 'ags_municipality2_id']
    }
    ```

    For each type of location a query will be built, executed and the final results will be merged resulting in the desired dataset.

    ## DSL

    #### Complex Cases
  3. warborn created this gist Aug 10, 2018.
    308 changes: 308 additions & 0 deletions insights_builder.md
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,308 @@
    # Insights Builder

    Insights Builder is a module that allows to query different source of data in order to extract information and transform it in a format which can be used to generate dynamic charts.

    ## Summary

    In order to extract the information, the following steps are followed by the module:

    1. Set configuration
    2. Build query
    3. Get data (execute query)
    4. Transform data

    #### Setting the configuration

    The following are the available configurations that can be set:
    - Source
    - X Axis
    - Period
    - X Segment
    - Y Axis
    - Y Operation
    - Filters

    | Configuration Name | Required |
    | ------------------ |:--------:|
    | Source | true |
    | X Axis | true |
    | X Segment | false |
    | Y Axis | true |
    | Y Operation | true |
    | Filters | false |

    #### Building the query

    The query is built depending on the configurations set. The following are the different ways in which a query is built:

    ##### X Axis + Y Axis + Y Operation

    ```js
    {
    aggs: {
    by_main_grouping: {
    terms: {
    field: value
    },
    aggs: {
    by_operation {
    operation: {
    field: value
    }
    }
    }
    }
    }
    }
    ```

    ##### X Axis + X Segment + Y Axis + Y Operation + Filters
    ```js
    {
    query: {
    bool: {
    must: [
    {terms: {field: value}}
    ]
    }
    },
    aggs: {
    by_main_grouping: {
    terms: {
    field: value
    },
    aggs: {
    by_segment: {
    terms: {
    field: value
    },
    aggs: {
    by_operation: {
    operation: {
    field: value
    }
    }
    }
    }
    }
    }
    }
    }
    ```

    #### Executing the query

    A query built with the following configurations results in the following responses:

    ##### X Axis + Y Axis + Y Operation

    Results in:
    ```js
    {
    by_main_grouping: [
    {
    by_operation: {
    value: value
    }
    }
    ]
    }
    ```

    ##### X Axis + X Segment + Y Axis + Y Operation + Filters

    Results in:
    ```js
    {
    by_main_grouping: [
    {
    by_segment: [
    {
    by_operation: {
    value: value
    }
    }
    ]
    }
    ]
    }
    ```

    #### Transforming the data

    After getting a result, a series of transformations can be executed, the following are the supported transformations:

    - Resolve location names **(X Axis)**
    - Convert result values to the total percentage **(Y Axis)**
    - Convert result values to percentage change **(Y Axis)**
    - Convert result values to the corresponding value by 100k inhabitants **(Y Axis)**

    ## Rules & Restrictions

    The module has a set of rules and restrictions that applied to each configuration depending on the value which are listed below:

    #### Source

    There are only two supported sources which are: **internal** and **secretariat**

    #### X Axis

    The **X Axis** can be any of the following depending on the **source** selected:

    - **internal**
    - **crime_category**: `String`
    - **loss_amount**: `Number`
    - **location (municipality, state)**: `String`
    - **source_name**: `String`
    - **reporter_name**: `String`
    - **incident_date**: `Date`
    - **secretariat**
    - **crime_category**: `String`
    - **crime_type**: `String`
    - **modality**: `String`
    - **location (municipality, state, country)**: `String`
    - **date**: `Date`
    - **value**: `Number`
    - **value_100k**: `Number`

    When a field of type **date** is selected, an **interval** configuration can be set in order to group the resulting data by the interval specified, the following values are supported:

    | interval `String` |
    | -------- |
    | month |
    | trimester |
    | year |

    #### X Segment

    The **X Segment** can be any of the following depending on the **X Axis** selected (with the restriction that the selected **X Axis** cannot be set as **X Segment**):

    - **internal**
    - **crime_category**: `String`
    - **location (municipality, state)**: `String`
    - **source_name**: `String`
    - **reporter_name**: `String`
    - **secretariat**
    - **crime_category**: `String`
    - **crime_type**: `String`
    - **modality**: `String`
    - **location (municipality, state, country)**: `String`

    #### Y Axis

    The **Y Axis** can be any of the following depending on the following:

    - **internal**
    - **crime_category**: `String`
    - **loss_amount**: `Number`
    - **source_name**: `String`
    - **reporter_name**: `String`
    - **secretariat**
    - **crime_category**: `String`
    - **crime_type**: `String`
    - **modality**: `String`
    - **value**: `Number`
    - **value_100k**: `Number`

    #### Y Operation

    The **Y Operation** can be any of the following depending on the selected **source** and **data type** of the **Y Axis**:

    - **internal**
    - **string**
    - count
    - **number**
    - sum
    - average
    - **secretariat**
    - **string**
    - count (resolves to the sum of the **value** field)
    - by 100k inhabitants (resolves to the sum of the **value_100k** field)
    - total percentage (resolves to the sum of the **value** field)
    - percentage change (resolves to the sum of the **value** field)

    #### Filters

    The filters are a way to get only the data you want, these filters depends on the **data type** of the selected field used to filter, the following are the available fields:

    - **internal**
    - **crime_category**: `String`
    - **location (municipality, state)**: `String`
    - **incident_date**: `Date`
    - **secretariat**
    - **crime_category**: `String`
    - **crime_type**: `String`
    - **modality**: `String`
    - **location (municipality, state, country)**: `String`
    - **date**: `Date`

    For **string** fields a **terms query** is used in order to apply the filter
    ```js
    {
    terms: {
    field: [values]
    }
    }
    ```

    For **date** fields a **range query** is used in order to apply the filter
    ```js
    {
    range: {
    field: {
    gte: start_date,
    lte: end_date
    }
    }
    }
    ```

    ## DSL

    #### Complex Cases

    ```py
    # Compare the anual rate of "robo de coche de 4 ruedas sin violencia"
    # of 5 municipalities from Aguascalientes with the rate of the
    # Aguascalientes state and the National one.

    gb = GraphicBuilderService(owner, GraphicBuilderService.SECRETARIAT)
    gb.set_x_axis('location')
    gb.set_y_axis('modality', 'count')
    gb.set_date_filter('date', '2017-01:01 00:00:00', '2017-12-31 00:00:00')
    gb.set_filter('modality', ['robo de coche de 4 ruedas sin violencia'])
    gb.set_discrete_field('location', ['0924850f-abe4-407c-80d8-175416b169bb', 'c9b46887-24f2-414a-8f39-07f273a1de2a', '4aa2137a-c19a-4db8-8f4d-1409c0625b70', 'c68c4d2a-b34e-4a22-a2da-d71bceaa1944'])
    data = gb.execute()
    ```

    ```py
    # Compare the percentage change of the first trimester in 2018
    # with the same trimester of the former year of
    # "robo comun con violencia a transeuntes"
    # in five municipalities of Chihuahua with the state of Chihuahua,
    # three municipalities of Nuevo León and the neighbor state of Nuevo León

    gb = GraphicBuilderService(owner, GraphicBuilderService.SECRETARIAT)
    gb.set_x_axis_date('date', 'trimester')
    gb.set_x_segment('location')
    gb.set_y_axis('modality', 'percentual change')
    gb.set_date_filter('date', '2017', '2018', 'year')
    gb.set_filter('modality', ['robo comun con violencia a transeuntes'])
    gb.set_discrete_field('location', ['0924850f-abe4-407c-80d8-175416b169bb', 'c9b46887-24f2-414a-8f39-07f273a1de2a', '4aa2137a-c19a-4db8-8f4d-1409c0625b70', 'c68c4d2a-b34e-4a22-a2da-d71bceaa1944'])
    data = gb.execute()
    ```

    ```py
    # Compare the 100k inhabitants rate from January of 2017 to February of 2017
    # of "lesiones dolosas con arma de fuego" in one municipality of Aguascalientes,
    # two of Zacatecas, one of Guanajuato, one of SLP and one of Tlaxcala

    gb = GraphicBuilderService(owner, GraphicBuilderService.SECRETARIAT)
    gb.set_x_axis_date('date', 'month')
    gb.set_x_segment('location')
    gb.set_y_axis('modality', '100k')
    gb.set_date_filter('date', '2016-01:01 00:00:00', '2017-01-28 00:00:00')
    gb.set_filter('modality', ['lesiones dolosas con arma de fuego'])
    gb.set_discrete_field('location', ['0924850f-abe4-407c-80d8-175416b169bb', 'c9b46887-24f2-414a-8f39-07f273a1de2a', '4aa2137a-c19a-4db8-8f4d-1409c0625b70', 'c68c4d2a-b34e-4a22-a2da-d71bceaa1944'])
    data = gb.execute()
    ```