Skip to content

Instantly share code, notes, and snippets.

@afeld
Last active March 19, 2020 14:44
Show Gist options
  • Save afeld/0feb015e329c20b2f5682ca028e3374d to your computer and use it in GitHub Desktop.
Save afeld/0feb015e329c20b2f5682ca028e3374d to your computer and use it in GitHub Desktop.
running SQL against Google Sheets

Google Sheets has a QUERY function that can be used for writing SQL queries. It uses the Google Visualization API Query Language, which is unfortunately a pretty limited SQL dialect. A few options for writing more complex SQL queries:

  • Export to CSV and load into a database somewhere
    • Pros: Use whatever database you want
    • Cons: Have to re-export every time the data is updated
  • Load into BigTable
    • Pros: Automatically updates
    • Cons: Requires some Google Cloud setup
  • Load into Google Colaboratory ("Colab")
    • Pros: Very flexible, as you can use SQL, Pandas, or any other Python code/packages
    • Cons: Jupyter Notebooks (which Colab is built on) can take a little getting used to

The Notebook above shows the quickest path to getting that data into SQLite on Google Colab, which can be done in a matter of minutes. To get started, open the Notebook below in Google Colab.

Display the source blob
Display the rendered blob
Raw
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "google_sheets_to_sqlite.ipynb",
"provenance": [],
"collapsed_sections": [],
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.1"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "Q8Y0mE8rwbYt"
},
"source": [
"Followed [this tutorial](https://colab.research.google.com/notebooks/io.ipynb#scrollTo=sOm9PFrT8mGG). You will need to replace the spreadsheet URL below."
]
},
{
"cell_type": "code",
"metadata": {
"id": "3y3PQs90OCDI",
"colab_type": "code",
"colab": {}
},
"source": [
"spreadsheet = 'https://docs.google.com/spreadsheets/…'\n",
"sheet = 0"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "Kysn5pEnOCDO",
"colab_type": "text"
},
"source": [
"## Setup"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "v6eA4_CXPEqZ",
"colab_type": "text"
},
"source": [
"Install dependencies."
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "U1DAe2IwyoiB",
"colab": {}
},
"source": [
"!pip install --upgrade --quiet 'gspread~=3.3.0' 'ipython-sql==0.3.9'\n",
"%load_ext sql"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "xuXDUY_KPH22",
"colab_type": "text"
},
"source": [
"Authorize access through Google OAuth."
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "jLTP2ii9CK-f",
"colab": {}
},
"source": [
"from google.colab import auth\n",
"auth.authenticate_user()\n",
"\n",
"import gspread\n",
"from oauth2client.client import GoogleCredentials\n",
"\n",
"gc = gspread.authorize(GoogleCredentials.get_application_default())"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "doX0iFCRwgHj"
},
"source": [
"# Load data"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "gzKUeA7OPMpO",
"colab_type": "text"
},
"source": [
"Read into a Pandas DataFrame."
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "YmF-U7tCCQuh",
"colab": {}
},
"source": [
"import pandas as pd\n",
"\n",
"worksheet = gc.open_by_url(spreadsheet).get_worksheet(sheet)\n",
"rows = worksheet.get_all_values()\n",
"\n",
"df = pd.DataFrame.from_records(rows[1:])\n",
"df.columns = rows[0]\n",
"\n",
"df.head()"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "H6G4IOB3uJKb"
},
"source": [
"Load into an in-memory SQLite database."
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "ggTRMLnlGgGv",
"colab": {}
},
"source": [
"%sql sqlite://\n",
"%sql DROP TABLE IF EXISTS df\n",
"%sql PERSIST df"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "qgjEU6QcwkFL"
},
"source": [
"# Analysis"
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "xVd5zZ5-t_9Q",
"colab": {}
},
"source": [
"%%sql\n",
"\n",
"SELECT COUNT(*) FROM df"
],
"execution_count": 0,
"outputs": []
}
]
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment