Common Scripts
I created the "Common Scripts" folder so that the same code can be reused across different years, without having to copy each small edit across separate source files. The files in "Common Scripts" are called after the respective file in the year folder has loaded the required dataset and set the global macros.
Master
Read "LMDSA 2018/Scripts/Master earnings analysis.bat" to see the workflow for our project. The contents of that file follow. I used R and Stata interchangeably (R more for graphics and wrangling; Stata more for cleaning and imputation). The .bat file shows how the earnings analysis scripts integrate (for 2018) to produce updated output files, ready for the main LaTeX document.
"QLFS 2020\Scripts\Trends_graphs_national_accounts.R" uses the updated PALMS data. R is better for time series analysis (and graphs). This creates "Product metric.xlsx". Regression results can now be pulled into LaTeX with \input.
"LMDSA 2018\Scripts\imputation.do" merges PALMS imputed earnings with the regular PALMS dataset, and imputes the UIF variable (for employers) with probit regressions. It creates "PALMS2018imputedUIF.dta". The table reporting on the imputation success rate can now be pulled into LaTeX with \input.
"LMDSA 2018\Scripts\Kdensities.do" creates individual kernel density plots in Stata, for industries, on the UIF-contributors subsample.
"LMDSA 2018\Scripts\simulation.do" in Stata uses "QLFS 2020/DataOUT/Product metric.xlsx"—i.e. the forecast error for 2020Q2 for total compensation of employees. This exports "ImputedUIF_coverage_rates.dta" (coverage rates).
"LMDSA 2018\Scripts\Earn_densities_sectors_TERScoverage.R" creates faceted density plots by industry, using the coverage rates. R is better for faceted graphs.
"Writing\main.tex" can then be compiled.
simulation.do
The algorithms in this script simulate, in Stata, the cost to government of different social insurance schemes. This is the main focus of this website, and these simulations are reproduced interactively in the Shiny web app, to make the work easier to use for non-Stata users. The script starts off by simulating the increase in revenue from adjusting the UIF contribution threshold.
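To make the threshold idea concrete, here is a minimal sketch of the arithmetic, not the Stata code itself. It assumes contributions are a flat rate on earnings up to a threshold, so raising the threshold only adds revenue from workers earning above the old threshold. The function name `uif_revenue`, the 2% combined rate, the two thresholds and the earnings values are all illustrative assumptions, and survey weights are left out.

```python
def uif_revenue(monthly_earnings, threshold, rate=0.02):
    """Total monthly contributions when contributory earnings are capped at `threshold`.

    `rate` is an assumed combined employer-plus-employee rate.
    """
    return sum(rate * min(earn, threshold) for earn in monthly_earnings)


# Made-up monthly earnings (rand) and hypothetical thresholds.
earnings = [3_500, 8_000, 15_000, 40_000]
old_threshold, new_threshold = 15_000, 20_000

gain = uif_revenue(earnings, new_threshold) - uif_revenue(earnings, old_threshold)
print(round(gain, 2))
```

In this toy example only the worker earning 40,000 pays more, since everyone else is already at or below the old threshold.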
Earn_densities_sectors_TERScoverage.R
This script compiles earnings density graphs by industry. "Graphs/PDF/propdens_earn_industry.pdf" is used in our Part 3 TERS Outcomes paper and shows where TERS benefits were concentrated on the earnings distribution. It needs "simulation.do" to be run first, as it uses the dataset "DataOUT/ImputedUIF_coverage_rates.dta". "Graphs/PNG/dens_earn_industry.png" is used in our Part 1 labour market statistics paper and shows the effect of uprating earnings levels.
This script is run by "Tables_v2.do" in the year folder and uses loops. It outputs Excel tables in "DataOUT/Earnings quantiles (Excel)" which show the deciles of the earnings distribution, as the total, and disaggregated by public/private sector, formal/informal sector, industry, UIF-contributors, and employers versus employees.
This script is run by "Tables (means of tenths).do" in the year folder. It is similar to the previous script, but it shows the mean earnings level within each tenth of the distribution. It outputs Excel tables in "DataOUT/Mean earnings of tenths (Excel)".