Last active
July 5, 2022 16:08
-
-
Save evan-burke/15ab3418c4239a81f463b2b7d1080034 to your computer and use it in GitHub Desktop.
Fastest way to generate a SQL schema from raw data and/or insert data to the table
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# Use Pandas & SQLAlchemy. | |
# https://stackoverflow.com/questions/23103962/how-to-write-dataframe-to-postgres-table | |
# Note this will create a new table; see the 'faster option' at the above link for a method using 'copy' to an existing table. | |
# However, 'copy' cannot do an upsert; that requires inserting to a temp table, then upserting form temp table to destination table. | |
# This will lack PKs and FKs and indexes, of course, so if used naively you may see data duplication. | |
# Also the df.to_sql command can do an append (but not upsert), using the if_exists param: | |
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html | |
# If using 'append' with an existing table with a PK, you'll get a key error on insert. | |
import pandas as pd | |
import sqlalchemy as sa | |
# DB URL follows format defined here | |
# https://docs.sqlalchemy.org/en/13/core/engines.html#database-urls | |
db_url = 'postgresql://user:pass@localhost:5432/mydatabase' | |
engine = sa.create_engine(db_url) | |
df = pd.DataFrame(mydata) | |
# or | |
df = pd.read_csv(mycsv) | |
# do quick and dirty date conversion if needed | |
object_cols = [col for col, col_type in df.dtypes.iteritems() if col_type == 'object'] | |
df[object_cols] = df[object_cols].apply(pd.to_datetime, errors='ignore') | |
# write to the table | |
df.to_sql('new_tablename', engine, method='multi') | |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment