Skip to content

Instantly share code, notes, and snippets.

@pizofreude
Last active April 25, 2025 12:54
Show Gist options
  • Save pizofreude/ff4d0601f1eb353683d8af8f4b5aac27 to your computer and use it in GitHub Desktop.
Save pizofreude/ff4d0601f1eb353683d8af8f4b5aac27 to your computer and use it in GitHub Desktop.
All things dbt

dbt Core

set up dbt (Data Build Tool) for using Postgres locally

To set up dbt (Data Build Tool) for using Postgres locally, you need to:

  1. Install dbt on your local machine.
  2. Create a dbt project.
  3. Configure the dbt project to connect to your local Postgres database.
  4. Run dbt commands to test the setup.

Here are the detailed steps:

Step 1: Install dbt

You can install dbt using pip:

pip install dbt-postgres

This will install both dbt-core and its plugin dbt-postgres:

$ dbt --version
installed version: 1.0.0
   latest version: 1.0.0

Up to date!

Plugins:
  - postgres: 1.0.0

Step 2: Create a dbt Project

Navigate to the directory where you want to create your dbt project and run the following command:

dbt init my_dbt_project

Replace my_dbt_project with the name of your project.

Step 3: Configure the dbt Project

Navigate to the project directory:

cd my_dbt_project

Open the profiles.yml file located in the ~/.dbt directory (create it if it doesn't exist) and add the following configuration:

my_dbt_project:
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      user: postgres
      password: postgres
      port: 5433  # This should match the port mapping in your docker-compose file for the postgres_dbt container
      dbname: nyc_taxi
      schema: public

Step 4: Run dbt Commands

You can now run dbt commands to test the setup. For example, to run the dbt debug command to test the connection:

dbt debug

If the connection is successful, you can start developing your dbt models. Here are some common dbt commands:

  • dbt run: Compile and run the models in your project.
  • dbt test: Run tests on your data.
  • dbt docs generate: Generate documentation for your project.
  • dbt docs serve: Serve the documentation locally.

Example dbt Model

Create a simple dbt model to test the setup. Create a file named example_model.sql in the models directory of your dbt project with the following content:

-- models/example_model.sql

SELECT *
FROM {{ ref('my_table') }}

Replace my_table with the name of a table in your Postgres database.

Running the Model

Run the dbt model:

dbt run

This will execute the SQL in example_model.sql and create a table/view in the Postgres database.

By following these steps, you will have dbt set up to work with your local Postgres database using Docker.

dbt Cloud

@pizofreude
Copy link
Author

pizofreude commented Feb 26, 2025

Error 1:

$ dbt debug
09:31:56  Running with dbt=1.9.2
09:31:56  dbt version: 1.9.2
09:31:56  python version: 3.12.3
09:31:56  python path: C:\workspace\de-zoomcamp\04_analytics_engineering\homework\venv\Scripts\python.exe
09:31:56  os info: Windows-10-10.0.19045-SP0
09:31:56  Using profiles dir at C:\Users\AbdulHafeez\.dbt
09:31:56  Using profiles.yml file at C:\Users\AbdulHafeez\.dbt\profiles.yml
09:31:56  Using dbt_project.yml file at C:\workspace\de-zoomcamp\04_analytics_engineering\homework\taxi_rides_ny\dbt_project.yml
09:31:56  Configuration:
09:31:56    profiles.yml file [ERROR invalid]

Solution:

You do not need to pip install os for dbt to access the profiles.yml file. The issue might be due to the way the paths are being interpreted on your Windows system, or there might be a permissions issue.

Here are a few steps you can take to troubleshoot and resolve the issue:

1. Verify the Path to profiles.yml

Ensure that the profiles.yml file is located in the correct directory and that the path is correct. On Windows, this should be:

C:\Users\AbdulHafeez\.dbt\profiles.yml

2. Check File Permissions

Ensure that the profiles.yml file has the correct permissions and is accessible by your user. You can do this by right-clicking the file, selecting "Properties," and ensuring that the file is not read-only and that your user has full control.

3. Use an Absolute Path for the Profiles Directory

You can explicitly specify the profiles directory when running dbt commands to ensure that it is using the correct path. Use the --profiles-dir option to specify the directory: !PAY ATTENTION TO / OR \\

dbt debug --profiles-dir C:\\Users\\AbdulHafeez\\.dbt

OR

dbt debug --profiles-dir C:/Users/AbdulHafeez/.dbt

4. Verify the Content of profiles.yml

Ensure that the content of your profiles.yml file matches the expected format. Here is the complete content again for reference:

pg-dbt-workshop: # profile name must be the same as the name of the `profile:` target in your dbt_project.yml file
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      user: postgres
      password: postgres
      port: 5433 # This should match the port mapping in your docker-compose file for the postgres_dbt container
      dbname: nyc_taxi # or database instead of dbname
      schema: public
      threads: 4
      keepalives_idle: 0 # default 0, indicating the system default

5. Check Environment Variables

Ensure that there is no environment variable overriding the default path for the profiles directory. The DBT_PROFILES_DIR environment variable can be used to specify a custom path. If this is set, dbt will use this path instead of the default:

echo %DBT_PROFILES_DIR%

If it is set and you want to use the default location, you can unset it:

set DBT_PROFILES_DIR=

6. Run dbt debug

After verifying the above steps, run dbt debug again:

dbt debug --profiles-dir C:\Users\AbdulHafeez\.dbt

This should help dbt find and access the profiles.yml file correctly.

@pizofreude
Copy link
Author

what are the differences between dbt run vs dbt build commands?

The dbt run and dbt build commands are both used to execute models in a dbt project, but they have different scopes and functionalities. Here are the key differences:

dbt run

  • Purpose: The dbt run command is used to compile and run the models in your dbt project. It executes the SQL in the models and materializes them as tables or views in your data warehouse.
  • Scope: It only runs the models (i.e., models directory).
  • Usage: Use dbt run when you want to build or refresh your dbt models.
dbt run

dbt build

  • Purpose: The dbt build command is a more comprehensive command that not only runs the models but also executes tests, seeds, and snapshots.
  • Scope: It runs models, tests, seeds, and snapshots. This command is essentially a combination of dbt run, dbt test, dbt seed, and dbt snapshot.
  • Usage: Use dbt build when you want to build the entire project, including running tests, seeds, and snapshots, in addition to running the models.
dbt build

Summary of Differences

Feature dbt run dbt build
Scope Models only Models, tests, seeds, and snapshots
Commands included Only runs models Combination of run, test, seed, snapshot
Typical use case Building or refreshing models Building the entire project including tests, seeds, and snapshots

Example Commands

Running Models Only

dbt run

Building the Entire Project

dbt build

By understanding these differences, you can choose the appropriate command based on your needs, whether it's to build just the models or to build and test the entire project.

@pizofreude
Copy link
Author

as best practice when should one run dbt deps command?

Running the dbt deps command is essential to ensure that all dependencies specified in your packages.yml file are correctly installed and up-to-date. Here are some best practices for when to run dbt deps:

1. After Updating packages.yml

Whenever you add, remove, or update a package in your packages.yml file, you should run dbt deps to install the new dependencies or update the existing ones.

2. After Cloning a Repository

If you clone a dbt project from a repository, you should run dbt deps to install all the dependencies specified in the packages.yml file. This ensures that your local environment has all the necessary packages to run the project.

3. Before Running dbt Commands in CI/CD Pipelines

In Continuous Integration/Continuous Deployment (CI/CD) pipelines, it's a good practice to run dbt deps before executing any dbt commands like dbt run, dbt test, or dbt build. This ensures that all dependencies are installed and up-to-date in the CI/CD environment.

4. Periodically to Check for Updates

Periodically running dbt deps can help ensure that your project is using the latest versions of the dependencies specified in your packages.yml. This can be part of your regular maintenance routine.

5. After Switching Branches

If you switch branches in your version control system (e.g., Git), and the new branch has different dependencies specified in the packages.yml file, you should run dbt deps to ensure that the correct dependencies are installed for the branch you are working on.

Summary

Running dbt deps is a crucial step in maintaining a dbt project. Here’s a summary of when to run the command:

  • After updating the packages.yml file
  • After cloning a repository
  • Before running dbt commands in CI/CD pipelines
  • Periodically to check for updates
  • After switching branches

By following these best practices, you can ensure that your dbt project has all the necessary dependencies installed and up-to-date.

@pizofreude
Copy link
Author

pizofreude commented Apr 25, 2025

Commonly used dbt commands:

  • dbt init: Initialize a new dbt project.
  • dbt defer: Defers compilation of a model, allowing you to break up long-running models.
  • dbt run: Runs all compiled SQL scripts in the project.
  • dbt test: Tests all models in the project.
  • dbt build: Runs dbt seed, dbt run, dbt test, and dbt snapshot. together. Builds and tests all selected resources (models, seeds, snapshots, tests).
  • dbt seed: Loads data from CSV files into the database.
  • dbt docs generate: Generates documentation for the project.
  • dbt docs serve: Serves the generated documentation on a webserver.

For more information, refer to the dbt documentation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment