By default, dbt has functionality that auto-expands similar column types (e.g. varchar(3) to varchar(4)) if the incoming data is too wide for the existing column (https://docs.getdbt.com/faqs/Snapshots/snapshot-schema-changes). We can see this happening like so:
-- models/foo.sql
{{ config(materialized='incremental') }}
select 'foo' as c
Debug logs below are truncated to show only the things of interest.
Build foo for the first time:
$ dbt --debug run -s foo --full-refresh
...
21:07:52 On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.6.9", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
create or replace transient table development_jyeo.dbt_jyeo.foo
as
(
select 'foo' as c
);
...
And if we describe that table:
jyeo_integration#TRANSFORMING@DEVELOPMENT_JYEO.DBT_JYEO>describe table foo;
+------+------------+--------+-------+---------+-------------+------------+-------+------------+---------+-------------+----------------+
| name | type | kind | null? | default | primary key | unique key | check | expression | comment | policy name | privacy domain |
|------+------------+--------+-------+---------+-------------+------------+-------+------------+---------+-------------+----------------|
| C | VARCHAR(3) | COLUMN | Y | NULL | N | N | NULL | NULL | NULL | NULL | NULL |
+------+------------+--------+-------+---------+-------------+------------+-------+------------+---------+-------------+----------------+
Modify foo by making column c double in size:
-- models/foo.sql
{{ config(materialized='incremental') }}
select 'foofoo' as c
Do an incremental (i.e. subsequent) run:
$ dbt --debug run -s foo
...
21:13:51 On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.6.9", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
create or replace temporary view development_jyeo.dbt_jyeo.foo__dbt_tmp
as (
select 'foofoo' as c
);
21:13:51 Opening a new connection, currently in state closed
21:13:52 SQL status: SUCCESS 1 in 1.0 seconds
21:13:52 Using snowflake connection "model.my_dbt_project.foo"
21:13:52 On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.6.9", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
describe table development_jyeo.dbt_jyeo.foo__dbt_tmp
21:13:53 SQL status: SUCCESS 1 in 0.0 seconds
21:13:53 Using snowflake connection "model.my_dbt_project.foo"
21:13:53 On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.6.9", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
describe table development_jyeo.dbt_jyeo.foo
21:13:53 SQL status: SUCCESS 1 in 0.0 seconds
21:13:53 Changing col type from character varying(3) to character varying(6) in table database: "development_jyeo"
schema: "dbt_jyeo"
identifier: "foo"
21:13:53 Using snowflake connection "model.my_dbt_project.foo"
21:13:53 On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.6.9", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
alter table development_jyeo.dbt_jyeo.foo alter "C" set data type character varying(6);
21:13:53 SQL status: SUCCESS 1 in 0.0 seconds
21:13:53 Using snowflake connection "model.my_dbt_project.foo"
21:13:53 On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.6.9", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
describe table "DEVELOPMENT_JYEO"."DBT_JYEO"."FOO"
21:13:53 SQL status: SUCCESS 1 in 0.0 seconds
21:13:53 Writing runtime sql for node "model.my_dbt_project.foo"
21:13:53 Using snowflake connection "model.my_dbt_project.foo"
21:13:53 On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.6.9", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
-- back compat for old kwarg name
begin;
21:13:54 SQL status: SUCCESS 1 in 0.0 seconds
21:13:54 Using snowflake connection "model.my_dbt_project.foo"
21:13:54 On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.6.9", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
insert into development_jyeo.dbt_jyeo.foo ("C")
(
select "C"
from development_jyeo.dbt_jyeo.foo__dbt_tmp
);
21:13:55 SQL status: SUCCESS 1 in 1.0 seconds
21:13:55 Using snowflake connection "model.my_dbt_project.foo"
21:13:55 On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.6.9", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
COMMIT
21:13:55 SQL status: SUCCESS 1 in 0.0 seconds
21:13:55 Using snowflake connection "model.my_dbt_project.foo"
21:13:55 On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.6.9", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
drop view if exists development_jyeo.dbt_jyeo.foo__dbt_tmp cascade
21:13:55 SQL status: SUCCESS 1 in 0.0 seconds
21:13:55 Timing info for model.my_dbt_project.foo (execute): 10:13:51.371870 => 10:13:55.819302
21:13:55 On model.my_dbt_project.foo: Close
...
And if we now describe that table again:
jyeo_integration#TRANSFORMING@DEVELOPMENT_JYEO.DBT_JYEO>describe table foo;
+------+------------+--------+-------+---------+-------------+------------+-------+------------+---------+-------------+----------------+
| name | type | kind | null? | default | primary key | unique key | check | expression | comment | policy name | privacy domain |
|------+------------+--------+-------+---------+-------------+------------+-------+------------+---------+-------------+----------------|
| C | VARCHAR(6) | COLUMN | Y | NULL | N | N | NULL | NULL | NULL | NULL | NULL |
+------+------------+--------+-------+---------+-------------+------------+-------+------------+---------+-------------+----------------+
1 Row(s) produced. Time Elapsed: 0.942s
We can see, to no one's surprise, that we've gone from varchar(3) to varchar(6).
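Under the hood, the incremental materialization has the adapter reconcile column types between the temp relation and the target before inserting. Paraphrased from dbt-core's global incremental materialization (the exact code varies by version), the call looks roughly like:
{# Inside dbt-core's incremental materialization (paraphrased): widen target columns to fit the incoming data. #}
{% do adapter.expand_target_column_types(
    from_relation=tmp_relation,
    to_relation=target_relation
) %}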
expand_target_column_types is an adapter (Python) method, so it is not user-overridable. What can be overridden, however, is the alter_column_type macro that it ends up calling to issue the DDL. If we add a macro to our project like so:
-- macros/alter_column_type.sql
{% macro alter_column_type(relation, column_name, new_column_type) -%}
{% do exceptions.raise_compiler_error(relation ~ '.' ~ column_name ~ ' type has changed to ' ~ new_column_type) %}
{% endmacro %}
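As an aside, if you would rather be told about the expansion than have the run fail outright, a softer variant is possible. A minimal sketch (assuming you still want the expansion to go ahead) is to emit a warning and then hand off to the adapter's stock implementation via dispatch:
-- macros/alter_column_type.sql
{% macro alter_column_type(relation, column_name, new_column_type) -%}
{# Surface the change as a warning - this becomes an error under --warn-error. #}
{% do exceptions.warn(relation ~ '.' ~ column_name ~ ' is being expanded to ' ~ new_column_type) %}
{# Dispatch resolves to the adapter's default (snowflake__alter_column_type on Snowflake), so the expansion still happens. #}
{{ return(adapter.dispatch('alter_column_type', 'dbt')(relation, column_name, new_column_type)) }}
{% endmacro %}
Because project macros shadow dbt's global alter_column_type, dispatching with the 'dbt' namespace here hands control back to the adapter's prefixed implementation rather than recursing into our own override.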
Modify foo to increase the column size yet again:
-- models/foo.sql
{{ config(materialized='incremental') }}
select 'foofoofoo' as c
And then a subsequent run will throw an error preventing any auto-expansion:
21:40:31 On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.6.9", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
describe table development_jyeo.dbt_jyeo.foo__dbt_tmp
21:40:31 SQL status: SUCCESS 1 in 0.0 seconds
21:40:31 Using snowflake connection "model.my_dbt_project.foo"
21:40:31 On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.6.9", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
describe table development_jyeo.dbt_jyeo.foo
21:40:32 SQL status: SUCCESS 1 in 0.0 seconds
21:40:32 Changing col type from character varying(6) to character varying(9) in table database: "development_jyeo"
schema: "dbt_jyeo"
identifier: "foo"
21:40:32 Snowflake adapter: Error running SQL: macro alter_column_type
21:40:32 Snowflake adapter: Rolling back transaction.
21:40:32 Timing info for model.my_dbt_project.foo (execute): 10:40:30.415283 => 10:40:32.154167
21:40:32 On model.my_dbt_project.foo: Close
21:40:32 Compilation Error in macro alter_column_type (macros/alter_column_type.sql)
development_jyeo.dbt_jyeo.foo.C type has changed to character varying(9)
> in macro alter_column_type (macros/alter_column_type.sql)
> called by macro materialization_incremental_snowflake (macros/materializations/incremental.sql)
> called by macro alter_column_type (macros/alter_column_type.sql)
21:40:32 Sending event: {'category': 'dbt', 'action': 'run_model', 'label': 'd4427960-c1a5-4a49-9447-51986b8fd3ec', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x118e9dcd0>]}
21:40:32 1 of 1 ERROR creating sql incremental model dbt_jyeo.foo ....................... [ERROR in 2.17s]
21:40:32 Finished running node model.my_dbt_project.foo
21:40:32 Connection 'master' was properly closed.
21:40:32 Connection 'model.my_dbt_project.foo' was properly closed.
21:40:32
21:40:32 Finished running 1 incremental model in 0 hours 0 minutes and 5.79 seconds (5.79s).
21:40:32 Command end result
21:40:32
21:40:32 Completed with 1 error and 0 warnings:
21:40:32
21:40:32 Compilation Error in macro alter_column_type (macros/alter_column_type.sql)
development_jyeo.dbt_jyeo.foo.C type has changed to character varying(9)
> in macro alter_column_type (macros/alter_column_type.sql)
> called by macro materialization_incremental_snowflake (macros/materializations/incremental.sql)
> called by macro alter_column_type (macros/alter_column_type.sql)
21:40:32
21:40:32 Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
21:40:32 Command `dbt run` failed at 10:40:32.577146 after 7.76 seconds
21:40:32 Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x102ef7d90>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x117e7aca0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x117d7fdc0>]}
21:40:32 Flushing usage events
Note that I have not tested this extensively, so there may be other parts of dbt that do this column expansion for other reasons - but if you really need dbt to never auto-expand columns, this approach should be fine. Just be sure to do more testing of your own.
Finally, in the grand scheme of "analytics" things, one should not be overly concerned with whether a column is varchar(10) or varchar(9) - most assuredly, such a small difference will not be what decides whether a startup becomes a unicorn or crashes and burns to the ground :P
Additionally, some folks may want to "trap" the incoming data that was responsible (those __dbt_tmp relations are temporary, so they're gone once your session has ended and you can't go back and debug). To do that, we can modify the alter_column_type macro just a little bit more:
-- macros/alter_column_type.sql
{% macro alter_column_type(relation, column_name, new_column_type) -%}
{# Copy the incoming temp relation into a permanent, timestamped backup table before raising. #}
{% set incoming_backup_name = relation ~ '__dbt_tmp_backup_' ~ run_started_at.strftime("%Y%m%d_%H%M%S") %}
{% set query %}
create or replace table {{ incoming_backup_name }} as (select * from {{ relation }}__dbt_tmp);
{% endset %}
{% do run_query(query) %}
{% do exceptions.raise_compiler_error(relation ~ '.' ~ column_name ~ ' type has changed to ' ~ new_column_type ~ '. Check incoming data type at ' ~ incoming_backup_name) %}
{% endmacro %}
In this way, we will create a table that we can use for debugging:
...
10:07:22 Compilation Error in macro alter_column_type (macros/alter_column_type.sql)
development_jyeo.dbt_jyeo.foo.C type has changed to character varying(9). Check incoming data type at development_jyeo.dbt_jyeo.foo__dbt_tmp_backup_20241107_100718
> in macro alter_column_type (macros/alter_column_type.sql)
> called by macro materialization_incremental_snowflake (macros/materializations/incremental.sql)
> called by macro alter_column_type (macros/alter_column_type.sql)
We can then describe the __dbt_tmp_backup table and check what the data type was at the time:
jyeo_integration#TRANSFORMING@DEVELOPMENT_JYEO.DBT_JYEO>describe table foo__dbt_tmp_backup_20241107_100718;
+------+------------+--------+-------+---------+-------------+------------+-------+------------+---------+-------------+----------------+
| name | type | kind | null? | default | primary key | unique key | check | expression | comment | policy name | privacy domain |
|------+------------+--------+-------+---------+-------------+------------+-------+------------+---------+-------------+----------------|
| C | VARCHAR(9) | COLUMN | Y | NULL | N | N | NULL | NULL | NULL | NULL | NULL |
+------+------------+--------+-------+---------+-------------+------------+-------+------------+---------+-------------+----------------+
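And since the backup holds the actual rows, we can also pull out the offending records themselves. A quick sketch (using the backup name from the run above, and knowing the target column was previously varchar(6)) - and remember the backup is a permanent table, so drop it once you're done:
-- Find the rows that would not have fit in the old varchar(6) column.
select c, length(c) as c_length
from foo__dbt_tmp_backup_20241107_100718
where length(c) > 6;

-- Clean up the backup once debugging is done.
drop table if exists foo__dbt_tmp_backup_20241107_100718;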