We need to write two queries, that will return:
- The weighted rate of growth for graduates who studied certain fields
- The weighted rate of growth for jobs in those fields
We need these rates in order to calculate the "spread": the difference between the rate of growth for graduates and jobs per job type.
spread = weighted_job_growth_rate - weighted_job_growth_rate
For this technical document I've included the SQL commands alongside a copy command that will create CSVs on your desktop so that you can see the output along each step of the way.
One half of our equation is the weighted job growth. To do this, we first have to get the total number of jobs in a given region for the selected fields.
The first subquery for this half of the equation looks like this:
\copy (
SELECT SUM(employment_2019) as jobs, region
FROM labor.data
WHERE job_category_label IN ('Computer and Mathematical', 'Military Specific')
GROUP BY region
) TO '~/Desktop/all_jobs.csv' With CSV DELIMITER ',' HEADER;
This query just returns the total number of jobs, grouped by region, for the fields selected.
Our next subquery will, for each field, multiply the number of jobs by the weighted job growth value, and group the results by region and job category:
\copy (
SELECT
(employment_2019 * employment_five_year_growth_rate) AS weighted_value,
SUM(employment_2019) AS total_2019,
employment_five_year_growth_rate,
region,
job_category_label
FROM labor.data
WHERE job_category_label IN ('Computer and Mathematical', 'Military Specific')
GROUP BY employment_five_year_growth_rate, employment_2019, region, job_category_label
) TO weighted_job_growth.csv With CSV DELIMITER ',' HEADER;
For each row in the labor data that matches the fields we're interested in, we multiply the number of employees in 2019 by the five year growth rate, which returns a weighted value.
We then add up those weighted values, grouping by region and job type, and divide them by the total number of jobs in the region, which we get from our first subquery. We multiply that number by 100 to get a percentage. That gives us the weighted growth rate for a given region per job type.
Here's the full query:
\copy (
WITH total_jobs AS (
SELECT SUM(employment_2019) as jobs, region
FROM labor.data
WHERE job_category_label IN ('Computer and Mathematical', 'Military Specific')
GROUP BY region
),
jobs_with_weight AS (
SELECT
(employment_2019 * employment_five_year_growth_rate) AS weighted_value,
SUM(employment_2019) AS total_2019,
employment_five_year_growth_rate,
region,
job_category_label
FROM labor.data
WHERE job_category_label IN ('Computer and Mathematical', 'Military Specific')
GROUP BY employment_five_year_growth_rate, employment_2019, region, job_category_label
)
SELECT (SUM(weighted_value) / SUM(total_jobs.jobs)) * 100 AS weighted_job_growth_rate,
jobs_with_weight.region,
jobs_with_weight.job_category_label
FROM jobs_with_weight
JOIN total_jobs
ON total_jobs.region = jobs_with_weight.region
GROUP BY jobs_with_weight.region, jobs_with_weight.job_category_label
) TO '~/Desktop/weighted_job_growth_per_region.csv' With CSV DELIMITER ',' HEADER;
Here's a quick peek at what the table produced by this query looks like:
┌─────────────────────────────┬───────────────────────────────────────────────────────────────────────────────────────┬───────────────────────────┐
│ weighted_job_growth_rate │ region │ job_category_label │
├─────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────┼───────────────────────────┤
│ (NULL) │ The Villages, FL │ Military Specific │
│ 8.08320185764483243600 │ Hot Springs, AR │ Computer and Mathematical │
│ 0.64561702837486221600 │ Hickory-Lenoir-Morganton, NC │ Computer and Mathematical │
│ 2.53242205250346038000 │ Crestview-Fort Walton Beach-Destin, FL │ Military Specific │
│ 0.15466346703692049000 │ Worcester, MA-CT │ Military Specific │
│ (NULL) │ St. George, UT │ Military Specific │
│ 0.44609576940604977900 │ Portland-Vancouver-Hillsboro, OR-WA │ Computer and Mathematical │
│ 15.84707572911962775800 │ White, Jackson, Prairie & Woodruff Counties │ Computer and Mathematical │
│ 33.51955307262569832400 │ St. Lawrence County │ Military Specific │
│ 16.48772053669391746400 │ Las Cruces, NM │ Computer and Mathematical │
│ 7.26077642020796139400 │ Jonesboro, AR │ Computer and Mathematical │
│ 11.76641183993670956000 │ Isabella, Gratiot & Clare Counties │ Computer and Mathematical │
│ 7.44545324010031289900 │ Beaumont-Port Arthur, TX │ Computer and Mathematical │
│ 15.91293232866185552100 │ Highland, Clinton & Adams Counties │ Computer and Mathematical │
│ 5.12390292877144782900 │ Des Moines, Lee, Henry & Louisa Counties │ Computer and Mathematical │
│ 3.50914150154451035500 │ Jacksonville, NC │ Military Specific │
│ 3.06273062730627306300 │ Lewiston, ID-WA │ Military Specific │
│ (NULL) │ San German, PR │ Military Specific │
│ 0.63090987525921571400 │ Greensboro-High Point, NC │ Military Specific │
│ 0.01181614084839891300 │ Green Bay, WI │ Military Specific │
│ 2.98367951498535026000 │ Springfield, MO │ Military Specific │
│ 12.10965522702029689500 │ Moore & Montgomery Counties │ Military Specific │
We now need a similar query to get the weighted graduate growth, the other half of our equation.
We still need the other half of our equation: the weighted graduate growth.
First, we'll get the total number of gradutes in each region for the supplied job types. This query is slightly more complicated than before because we have to use a crosswalk.
We need to use a crosswalk table to get the correct degrees, and then query all of our graduate data using that subquery. It looks like this:
\copy (
WITH cip_codes AS (
SELECT DISTINCT cip_4, job_category_label
FROM c2c.crosswalk
WHERE job_category_label IN ('Computer and Mathematical', 'Management')
)
SELECT
SUM(ctotalt_2019) as graduates,
region
FROM c2c.field
JOIN cip_codes ON cip_codes.cip_4 = field.cipcode_rf
JOIN c2c.institution ON institution.unitid = field.unitid
GROUP BY region
) TO '~/Desktop/total_graduates.csv' With CSV DELIMITER ',' HEADER;
Next, we can use the same crosswalk to get the weighted graduate value of graduates in each region, for each job type.
\copy (
WITH cip_codes AS (
SELECT DISTINCT cip_4, job_category_label
FROM c2c.crosswalk
WHERE job_category_label IN ('Computer and Mathematical', 'Management')
)
SELECT
(ctotalt_2019 * tot_five_grw_rt) AS weighted_value,
region,
cip_codes.job_category_label
FROM c2c.field
JOIN cip_codes ON cip_codes.cip_4 = field.cipcode_rf
JOIN c2c.institution ON institution.unitid = field.unitid
WHERE institution.region = 'Mount Vernon-Anacortes, WA'
GROUP BY tot_five_grw_rt, region, ctotalt_2019, cip_codes.job_category_label
) TO '~/Desktop/weighted_grad_growth.csv' With CSV DELIMITER ',' HEADER;
Finally, we want to aggregate all of these values into a single percentage per region and job type. To do this, according to the spreadsheets that Michael supplied, we should add up all the weighted values, grouping by region and job category label, and divide them by the total that we calculated earlier.
\copy (
WITH cip_codes AS (
SELECT DISTINCT cip_4, job_category_label
FROM c2c.crosswalk
WHERE job_category_label IN ('Computer and Mathematical', 'Management')
),
total_grads AS (
SELECT SUM(ctotalt_2019) as grads, region
FROM c2c.field
JOIN cip_codes ON cip_codes.cip_4 = field.cipcode_rf
JOIN c2c.institution ON institution.unitid = field.unitid
GROUP BY institution.region
),
grads_with_weight AS (
SELECT
(ctotalt_2019 * tot_five_grw_rt) AS weighted_value,
region,
cip_codes.job_category_label
FROM c2c.field
JOIN cip_codes ON cip_codes.cip_4 = field.cipcode_rf
JOIN c2c.institution ON institution.unitid = field.unitid
GROUP BY tot_five_grw_rt, region, ctotalt_2019, cip_codes.job_category_label
)
SELECT
(SUM(weighted_value) / SUM(total_grads.grads)) * 100 AS weighted_grad_growth_rate,
grads_with_weight.region,
grads_with_weight.job_category_label
FROM grads_with_weight
JOIN total_grads
ON total_grads.region = grads_with_weight.region
GROUP BY grads_with_weight.region, grads_with_weight.job_category_label
) TO '~/Desktop/weighted_grad_growth.csv' WITH CSV DELIMITER ',' HEADER;
With the previous two queries, we've compiled rows of data that contain:
- The weighted job growth rate, per region, per job type
- The weighted graduate growth rate, per region, per job type
We can now aggregate all of this data together to produce the various numbers we need for our visualization.
\copy (
WITH total_jobs AS (
SELECT SUM(employment_2019) as jobs, region
FROM labor.data
WHERE job_category_label IN ('Computer and Mathematical', 'Management')
GROUP BY region
),
jobs_with_weight AS (
SELECT
(employment_2019 * employment_five_year_growth_rate) AS weighted_value,
SUM(employment_2019) AS total_2019,
employment_five_year_growth_rate,
region,
job_category_label
FROM labor.data
WHERE job_category_label IN ('Computer and Mathematical', 'Management')
GROUP BY employment_five_year_growth_rate, employment_2019, region, job_category_label
),
job_rate AS (
SELECT (SUM(weighted_value) / SUM(total_jobs.jobs)) * 100 AS weighted_job_growth_rate,
jobs_with_weight.region,
jobs_with_weight.job_category_label
FROM total_jobs
JOIN jobs_with_weight
ON total_jobs.region = jobs_with_weight.region
GROUP BY jobs_with_weight.region, jobs_with_weight.job_category_label
),
cip_codes AS (
SELECT DISTINCT cip_4, job_category_label
FROM c2c.crosswalk
WHERE job_category_label IN ('Computer and Mathematical', 'Management')
),
total_grads AS (
SELECT SUM(ctotalt_2019) as grads, region
FROM c2c.field
JOIN cip_codes ON cip_codes.cip_4 = field.cipcode_rf
JOIN c2c.institution ON institution.unitid = field.unitid
GROUP BY institution.region
),
grads_with_weight AS (
SELECT
(ctotalt_2019 * tot_five_grw_rt) AS weighted_value,
region,
cip_codes.job_category_label
FROM c2c.field
JOIN cip_codes ON cip_codes.cip_4 = field.cipcode_rf
JOIN c2c.institution ON institution.unitid = field.unitid
GROUP BY tot_five_grw_rt, region, ctotalt_2019, cip_codes.job_category_label
),
grad_rate AS (
SELECT
(SUM(weighted_value) / SUM(total_grads.grads)) * 100 AS weighted_grad_growth_rate,
grads_with_weight.region,
grads_with_weight.job_category_label
FROM grads_with_weight
JOIN total_grads
ON total_grads.region = grads_with_weight.region
GROUP BY grads_with_weight.region, grads_with_weight.job_category_label
)
SELECT
(grad_rate.weighted_grad_growth_rate - job_rate.weighted_job_growth_rate) AS spread,
job_rate.*,
grad_rate.weighted_grad_growth_rate
FROM job_rate
FULL JOIN grad_rate
ON grad_rate.region = job_rate.region
AND job_rate.job_category_label = grad_rate.job_category_label;
) TO '~/Desktop/weighted_final.csv' WITH CSV DELIMITER ',' HEADER;