Skip to content

Instantly share code, notes, and snippets.

@rasmi
Created November 6, 2018 19:52

Revisions

  1. rasmi created this gist Nov 6, 2018.
    41 changes: 41 additions & 0 deletions upload_to_bq.sh
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,41 @@
    #!/bin/sh
    # Copyright 2018 Google LLC.
    # SPDX-License-Identifier: Apache-2.0

    DATA_FILENAME="incarceration_trends.csv"
    DATA_URL="https://github.com/vera-institute/incarceration_trends/raw/master/incarceration_trends.csv"
    BQ_DATASET_NAME="vera_data"
    BQ_DATASET_DESCRIPTION="County-level jail data (1970-2015) and prison data (1983-2015)"
    BQ_SCHEMA_FILENAME="schema.json"
    BQ_SCHEMA_URL="https://gist.githubusercontent.com/rasmi/b46101a1d20d3da46bafbd77299126c9/raw/c7f12bbdff7adc61e763ff1be22bc21215dc3ff5/schema.json"
    BQ_TABLE_NAME="incarceration_trends"

    # Download data file locally.
    wget $DATA_URL

    # Download schema file locally.
    wget $BQ_SCHEMA_URL

    # Create a BQ dataset to store the table in if it does not exist.
    bq --location="US" mk --dataset \
    --description "$BQ_DATASET_DESCRIPTION" \
    $GOOGLE_CLOUD_PROJECT:$BQ_DATASET_NAME

    #################################################
    # MAKE DATASET PUBLICLY AVAILABLE AS READ-ONLY. #
    #################################################
    # Download the permissions file to edit read permissions.
    bq show --format=prettyjson $BQ_DATASET_NAME > permissions.json
    # Patch the permissions file to allow allAuthenticatedUsers to read the dataset.
    sed -i 's/projectReaders/allAuthenticatedUsers/' permissions.json
    # Uploaded patched permissions file to make changes take effect.
    bq update --source permissions.json $BQ_DATASET_NAME
    # Clean up.
    rm permissions.json
    #################################################

    # Load the local CSV data into the BQ table.
    bq load --source_format=CSV --skip_leading_rows=1 $BQ_DATASET_NAME.$BQ_TABLE_NAME $DATA_FILENAME $BQ_SCHEMA_FILENAME

    # Clean up.
    rm $DATA_FILENAME $BQ_SCHEMA_FILENAME