Data

Post-Apartheid Labour Market Series (PALMS)

For the data analysis, Aidan Horn updated the Post-Apartheid Labour Market Series (PALMS) on his own computer, using the latest (2018) Labour Market Dynamics in South Africa (LMDSA) dataset and the most recent quarters of the Quarterly Labour Force Survey (QLFS). PALMS renames the QLFS variables, so all the code in the TERS folder is written for the PALMS variable names. The code to compile PALMS is openly available on DataFirst's website, and the dataset is meant to be updated regularly, although in practice it is not. We believe that publicly sharing our modified PALMS syntax files or our own palms2020.dta dataset would breach DataFirst's protocol. To use the code in the TERS project folder or in the Shiny web app backend, you will therefore need to update PALMS yourself, which may take a few days. A quicker option is to email us at hrnaid001@myuct.ac.za to ask for the raw data files; the cleaned data files are in the Dropbox folder.

Quarterly Labour Force Survey (QLFS) data

The Quarterly Labour Force Survey (QLFS) collects earnings data in the household questionnaire. However, the earnings data are released only after a delay of a few years, in the Labour Market Dynamics in South Africa (LMDSA) dataset, whilst the other variables are released in the QLFS dataset after only a few months. We need information on the distribution of earnings in order to run simulations on microdata. The following paper shows that the LMDSA dataset underestimates average earnings levels when compared with the System of National Accounts (SNA) aggregate earnings levels.

Donaldson, A.R. & Horn, A.J. 2021. Employment and earnings by industry before Covid-19. (SALDRU Working Paper 277). Cape Town: Southern Africa Labour and Development Research Unit, School of Economics, University of Cape Town. Available: http://opensaldru.uct.ac.za/handle/11090/1005

Abstract

Employment levels and the distribution of earnings by industry in 2010 and 2019/20 are examined in this paper, illustrating trends over this decade before the impact of Covid-19 and the accompanying economic downturn. Drawing on both the Quarterly Labour Force Survey (QLFS) and the Quarterly Employment Statistics (QES) aggregates, we provide estimates of the distribution of earnings consistent with the System of National Accounts (SNA) income and production aggregates.

We draw attention to similarities and differences between the QLFS, QES and SNA data sources, and note differences in the implicit trends over the 2010-2020 decade. We provide distributions of gross earnings within eleven employment and industry sectors, consistent with the national accounts compensation of employees’ aggregates adjusted to include earned income attributable to employers and the self-employed in unincorporated enterprises.

We find evidence that the national accounts have under-estimated growth in earnings since 2010, and that the levels of both nominal and real GDP in recent years are understated. Nonetheless we find that QLFS estimates of earnings have to be raised by about 50 per cent in order to generate earnings levels consistent with the national production accounts. The adjustments required vary considerably by industry. We compile uprated earnings distributions by industry in two ways: aligned with industry-specific SNA aggregate earnings, and uniformly uprated to align with aggregate SNA earnings.

Both employment and earnings were severely disrupted by the 2020 Covid-19 economic shock. At the time of writing (early 2021) the economic recovery path is far from clear. This paper provides sectoral benchmark data from official sources against which the recovery might be assessed, but also indicates that there are substantial discrepancies between the available measures of earnings by industry.


We therefore uprate the earnings variable by industry, using either a linear or a log-linear multiplicative adjustment, to create new datasets to work with.
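The exact uprating formulas are not spelled out here, so the following is only a minimal Python sketch under stated assumptions. It treats "linear" uprating as one multiplicative factor per industry, chosen so that weighted industry earnings match an SNA target, and "log-linear" uprating as ln(y_new) = a_k + b * ln(y), where a_k is an industry intercept solved to hit the same target and b is a hypothetical common elasticity (b = 1 reproduces the linear case). The function names, record layout and all numbers are illustrative, not taken from the project's scripts or from Andrew's Excel file.

```python
from collections import defaultdict


def uprate(records, sna_totals, elasticity=1.0):
    """Uprate earnings so that weighted industry totals match SNA targets.

    records: list of (industry, earnings, weight) tuples, earnings > 0.
    sna_totals: dict mapping industry -> target weighted total earnings.
    Uprated earnings satisfy ln(y_new) = a_k + elasticity * ln(y), i.e.
    y_new = exp(a_k) * y ** elasticity, with one intercept a_k per industry.
    elasticity = 1.0 is the linear case (a single factor per industry);
    elasticity > 1.0 raises higher earnings proportionally more.
    This functional form is an assumption made for illustration.
    """
    shaped = defaultdict(float)  # weighted sum of y ** elasticity per industry
    for industry, earnings, weight in records:
        shaped[industry] += weight * earnings ** elasticity
    # exp(a_k) rescales each industry's shaped total to its SNA target.
    scale = {k: sna_totals[k] / total for k, total in shaped.items()}
    return [
        (industry, scale[industry] * earnings ** elasticity, weight)
        for industry, earnings, weight in records
    ]


def weighted_total(records, industry):
    """Weighted sum of earnings for one industry."""
    return sum(e * w for k, e, w in records if k == industry)


# Made-up microdata and SNA targets, for illustration only.
records = [
    ("mining", 10_000.0, 2.0),
    ("mining", 30_000.0, 1.0),
    ("retail", 4_000.0, 3.0),
    ("retail", 8_000.0, 1.0),
]
sna_totals = {"mining": 75_000.0, "retail": 24_000.0}

linear = uprate(records, sna_totals)                      # factors 1.5 and 1.2
log_linear = uprate(records, sna_totals, elasticity=1.1)  # hypothetical elasticity
```

Both versions reproduce each industry's SNA total; they differ only in how the adjustment is spread across the earnings distribution within an industry.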

Future development: allow users of the web app to choose which version of the LMDSA dataset to use when simulating total costs.

Adjusted QLFS/PALMS earnings data

Andrew's Excel file, which calculates the uprating factors.

We provide three sets of datasets: a 'base' set, a linearly uprated set and a log-linearly uprated set. All three include the upward-revised weights for mining, and for agriculture prior to 2015. We recommend the log-linearly uprated datasets. The microdata are cleaned in imputation.do, as described in the Master workflow on the Common Scripts page. The scripts in LMDSA 2018 save the cleaned datasets, and the scripts highlighted below give the uprating and weight adjustment factors.
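The weight revision can be pictured as multiplying survey weights by industry-specific factors, with the agricultural adjustment limited to years before 2015. The minimal Python sketch below reads the sentence above as "mining in all years, agriculture before 2015"; that reading, the `WeightRule` and `adjust_weights` names, and the factor values are all hypothetical. The actual factors are produced by the project's scripts.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Obs:
    year: int
    industry: str
    weight: float


@dataclass(frozen=True)
class WeightRule:
    factor: float               # multiplier applied to the survey weight
    before_year: Optional[int]  # apply only to earlier years; None = all years


def adjust_weights(observations, rules):
    """Return new observations with revised weights; inputs are unchanged."""
    adjusted = []
    for obs in observations:
        rule = rules.get(obs.industry)
        applies = rule is not None and (
            rule.before_year is None or obs.year < rule.before_year
        )
        adjusted.append(replace(obs, weight=obs.weight * rule.factor) if applies else obs)
    return adjusted


# Hypothetical factors for illustration only.
rules = {
    "mining": WeightRule(factor=1.10, before_year=None),
    "agriculture": WeightRule(factor=1.05, before_year=2015),
}
sample = [
    Obs(2012, "agriculture", 100.0),
    Obs(2016, "agriculture", 100.0),
    Obs(2016, "mining", 50.0),
    Obs(2016, "retail", 80.0),
]
adjusted = adjust_weights(sample, rules)
```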

Excel files

The simulations can be run on the mean earnings within each tenth (decile) of the earnings distributions.

These distributions are disaggregated by industry, public/private sector, formal/informal sector, UIF contribution status, and employers versus employees.
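Decile means of this kind could be computed along the following lines. This minimal Python sketch sorts each group's workers by earnings, places each worker wholly in one tenth according to the midpoint of their cumulative weight share, and averages earnings within each tenth using the survey weights. The grouping key, function names and sample data are assumptions; the project's own Stata or R scripts may treat weights at decile boundaries differently.

```python
from collections import defaultdict


def decile_means(values, weights):
    """Weighted mean earnings within each tenth of one distribution.

    Observations are sorted by earnings; each is placed wholly in the
    decile containing the midpoint of its cumulative weight share.
    Returns 10 means (NaN for a decile that receives no observations).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    total = sum(weights)
    sums = [0.0] * 10
    wsums = [0.0] * 10
    cum = 0.0
    for i in order:
        midpoint = (cum + weights[i] / 2) / total
        d = min(int(midpoint * 10), 9)
        sums[d] += values[i] * weights[i]
        wsums[d] += weights[i]
        cum += weights[i]
    return [s / w if w > 0 else float("nan") for s, w in zip(sums, wsums)]


def decile_means_by_group(records):
    """records: iterable of (group_key, earnings, weight) tuples.

    group_key is a hypothetical tuple combining industry, public/private,
    formal/informal, UIF contribution status and employer/employee.
    """
    groups = defaultdict(lambda: ([], []))
    for key, earnings, weight in records:
        groups[key][0].append(earnings)
        groups[key][1].append(weight)
    return {key: decile_means(v, w) for key, (v, w) in groups.items()}


# Made-up example: two groups of 20 equally weighted workers each.
group_a = ("mining", "private", "formal", "UIF contributor", "employee")
group_b = ("retail", "private", "informal", "non-contributor", "employee")
records = [(group_a, float(e), 1.0) for e in range(1, 21)]
records += [(group_b, float(e) * 100, 1.0) for e in range(1, 21)]

means = decile_means_by_group(records)
print(means[group_a][0], means[group_a][-1])  # 1.5 19.5
```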

.dta files

Stata scripts

You can use these Stata scripts to create the 2018 microdatasets yourself.

R scripts

You can use these R scripts to create the 2018 microdatasets yourself.