Earnings distribution tables


Labour market statistics and uprating scripts

The following scripts are in both the "LMDSA 2018" and "QLFS 2010" folders. They provide summary statistics on the earnings distribution, disaggregated by sector and industry, and on employment levels in the population (estimated using the person weight).


This script produces output tables ("DataOUT/distribution_earnings_*.html") of labour market earnings quantiles and means, straight from the PALMS dataset. I originally intended to link the output tables to a LaTeX document, but Andrew wanted to use Word for the first paper, so I switched to producing HTML tables, which can be opened in Word.

Perhaps Excel output would have been fine, although I thought the tables were going to be for final publication, not further analysis.

This script is called after loading the dataset in earnings_industry_<year>_base.R, earnings_industry_<year>_linear.R or earnings_industry_<year>_loglinear.R. The latter two scripts make adjustments that we consider appropriate for Social Accounting Matrix (SAM) analysis.


This script loads the normal PALMS dataset from one year, with imputed earnings values for outliers and refused responses. It splits up the services sector into public and private, and sets the base period of real earnings to 2020.
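As a rough illustration of what changing the base period involves, here is a minimal Python sketch (not the project's R code). It assumes real earnings are rebased by the ratio of the price index in the new base year to the index in the old base year. The function name and the index values are hypothetical and are not taken from PALMS.

```python
def rebase_real_earnings(real_earnings, cpi_old_base, cpi_new_base):
    """Re-express earnings measured in old-base-year prices in new-base-year prices."""
    factor = cpi_new_base / cpi_old_base
    return [e * factor for e in real_earnings]


# Made-up index values: 100 in the old base year, 125 in 2020.
rebased = rebase_real_earnings([1000.0, 2500.0], cpi_old_base=100.0, cpi_new_base=125.0)
```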


This script does the same as the base script above, and also:

  • Scales earnings levels upwards by industry-specific multipliers, as per Donaldson's calculations, explained in Donaldson and Horn's labour market statistics paper.

  • Increases the weight for mining, as the mining sector is underrepresented in the QLFS survey. Also, prior to 2015 the person weight for agriculture should be about 20% higher, as there is a discontinuity in the series.
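The two adjustments in this list can be sketched in a few lines of Python (an illustration only, not the R script itself). All numbers below are placeholders except the 20% agriculture factor: the actual earnings multipliers come from Donaldson's calculations, and this README does not state the mining weight factor. The record layout and names are also assumptions.

```python
# Placeholder multipliers; the real values come from Donaldson's calculations.
EARNINGS_MULTIPLIER = {"mining": 1.10, "agriculture": 1.05}
MINING_WEIGHT_FACTOR = 1.5   # hypothetical; no figure is given in this README
AGRIC_WEIGHT_FACTOR = 1.2    # "about 20% higher" prior to 2015


def uprate_linear(record, year):
    """Return a copy of one person record with uprated earnings and weight."""
    industry = record["industry"]
    earnings = record["earnings"] * EARNINGS_MULTIPLIER.get(industry, 1.0)
    weight = record["weight"]
    if industry == "mining":
        weight *= MINING_WEIGHT_FACTOR
    elif industry == "agriculture" and year < 2015:
        weight *= AGRIC_WEIGHT_FACTOR
    return {**record, "earnings": earnings, "weight": weight}


farm_2012 = uprate_linear({"industry": "agriculture", "earnings": 1000.0, "weight": 100.0}, 2012)
farm_2016 = uprate_linear({"industry": "agriculture", "earnings": 1000.0, "weight": 100.0}, 2016)
```

In this sketch the earnings multiplier applies in every year, while the agriculture weight adjustment applies only before 2015.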


This script also increases earnings levels by industry, but on a log-linear scale (by exponentiating the earnings levels). The parameters are optimized so that mean earnings in each industry match those produced by the linear uprating in the previous script.

The weight for mining is increased in the same way as in the previous script, and the weight for agriculture is also made 20% higher than in the normal PALMS dataset for years prior to 2015.
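One possible reading of the log-linear uprating, sketched in Python rather than R: assume each industry's earnings are raised to a power theta, with theta chosen so that the weighted mean equals the mean from the linear uprating. The functional form, the bisection search, and all names and numbers here are assumptions for illustration. The actual script may parameterize and optimize this differently.

```python
def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def fit_exponent(earnings, weights, target_mean, lo=1.0, hi=2.0, tol=1e-10):
    """Find theta such that weighted_mean(e ** theta) is close to target_mean.

    Uses bisection. Assumes every earnings value exceeds 1 (so the mean is
    increasing in theta) and that the solution lies between lo and hi.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if weighted_mean([e ** mid for e in earnings], weights) < target_mean:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Made-up data for one industry, with a hypothetical linear multiplier of 1.1.
earnings = [1000.0, 4000.0, 10000.0]
weights = [2.0, 1.0, 1.0]
target = 1.1 * weighted_mean(earnings, weights)
theta = fit_exponent(earnings, weights, target)
uprated = [e ** theta for e in earnings]
```

Under this assumed form, both methods give the same industry mean, but an exponent above 1 raises high earnings proportionally more than low earnings, whereas the linear multiplier scales everyone equally.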


This script runs "Common Scripts/Excel_earnings_deciles_sectors.do". It outputs Excel tables in "DataOUT/Earnings quantiles (Excel)" which show the deciles of the earnings distribution, both in total and disaggregated by public/private sector, formal/informal sector, industry, UIF-contributors, and employers versus employees.
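For readers unfamiliar with weighted deciles, here is a minimal Python sketch of the idea (not the Stata code). It assumes the k-th decile is the smallest earnings value at which the cumulative person weight reaches k tenths of the total. Stata's own percentile routines may treat ties and exact boundaries slightly differently, so small discrepancies against the Excel tables are possible.

```python
def weighted_deciles(values, weights):
    """Return the nine deciles of a weighted distribution.

    Assumed definition: decile k is the smallest value whose cumulative
    weight is at least k/10 of the total weight.
    """
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    deciles = []
    cumulative = 0
    k = 1
    for value, weight in pairs:
        cumulative += weight
        # One heavily weighted observation can cover several deciles at once.
        while k <= 9 and cumulative * 10 >= k * total:
            deciles.append(value)
            k += 1
    return deciles


# Made-up earnings with person weights 5, 3 and 2.
example_deciles = weighted_deciles([300, 100, 200], [2, 5, 3])
```

The same function can be applied to any subgroup (for example one industry, or UIF contributors only) by filtering the records first.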

Tables (means of tenths).do

This script runs "Common Scripts/Excel_earnings_tenths_sectors.do". It is similar to the previous script, but it shows the mean earnings level within each tenth of the distribution. It outputs Excel tables in "DataOUT/Mean earnings of tenths (Excel)". This is what we base our Excel simulations on, and these summary distributions can be used to check the means from the earnings_industry R scripts.
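The "mean within each tenth" calculation can be sketched as follows in Python (an illustration, not the Stata script). It assumes each tenth contains exactly one tenth of the total person weight, and that an observation straddling a boundary is split between the two neighbouring tenths. The Stata script may instead assign whole observations to groups.

```python
def means_of_tenths(values, weights):
    """Mean earnings within each tenth of the weighted distribution."""
    pairs = sorted(zip(values, weights))
    share = sum(w for _, w in pairs) / 10   # weight held by each tenth
    means = []
    group_sum = 0.0      # weighted earnings accumulated in the current tenth
    group_weight = 0.0   # weight accumulated in the current tenth
    for value, weight in pairs:
        remaining = weight
        while remaining > 0:
            room = share - group_weight
            take = min(remaining, room)
            group_sum += value * take
            group_weight += take
            remaining -= take
            if take == room:   # the current tenth is full
                means.append(group_sum / share)
                group_sum, group_weight = 0.0, 0.0
    # Floating-point round-off can leave the last tenth marginally short.
    if group_weight > 0 and len(means) < 10:
        means.append(group_sum / group_weight)
    return means[:10]


# Made-up data: earnings 1..20, each with person weight 1.
tenth_means = means_of_tenths(list(range(1, 21)), [1] * 20)
overall_mean = sum(tenth_means) / 10
```

Because every tenth carries the same weight, the simple average of the ten means equals the overall weighted mean, which is what makes these tables usable as a check on the means from the earnings_industry R scripts.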

LMDSA 2018/Scripts/social-insurance-simulation

app.R is the source code for the web app.


The main output files are described above, but two more important data files are worth mentioning here.

PALMS with imputed earnings and UIF.dta

This is the dataset we use for the Excel earnings distribution files, in the Stata scripts described above. It is the result of "imputation.do" before the following two lines are run:

drop if realearnings==0

keep if isuif==1 & realearnings<.

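Here isuif is our UIF dummy with imputations, and the condition realearnings<. keeps only non-missing earnings (Stata treats missing values as larger than any number). The next data file is then saved: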


This focuses on the distribution of earnings for those who contribute to the UIF. We use this file (with only a few variables selected) in the Shiny social insurance web app.