
Which Model for Poverty Prediction

(Working Paper by Paolo Verme 2020)

Philipp Kollenda

Vrije Universiteit Amsterdam

18 June 2021 (last updated: 17 June 2021)

1 / 15

Why Targeting

  • Determine eligibility for a program.
  • Which measure? Which cut-off?

Consumption (Brown et al.; Verme), Rankings (Martin), ... Absolute poverty (line) or relative poverty (rate)

  • The relevant measure may not be available for the entire sample 😢
  • The relation between poverty line and poverty rate may be unknown 😢
2 / 15
  • Income distribution of the Uganda LSMS-ISA Training dataset (random subset of 50%).
  • If we know the income distribution we can set the poverty rate such that it exactly encompasses the chosen poverty line. But, if we do not know the income distribution we need to fix one of the two and the predicted distribution may be off (next slide).
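The note above can be sketched numerically: with the full income distribution in hand, fixing the poverty rate pins down the poverty line via the empirical quantile, and fixing the line pins down the rate via the empirical CDF. A minimal sketch with simulated incomes (the lognormal draw is an illustrative stand-in, not the LSMS data):

```python
import numpy as np

rng = np.random.default_rng(0)
income = rng.lognormal(mean=9.6, sigma=0.8, size=1000)  # stand-in for observed incomes

# Fix the poverty rate H = 0.2: the implied line is the 20th percentile.
z_from_rate = np.quantile(income, 0.2)

# Fix a poverty line z: the implied rate is the share of incomes at or below z.
rate_from_line = np.mean(income <= z_from_rate)
print(rate_from_line)  # ≈ 0.2 by construction
```

Without the testing-data distribution, only one of the two can be fixed, and the other must be predicted, which is where the discrepancy on the next slide comes from.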

How it started

How it's going

  • Fixing the poverty line at 14,925 is too low to reach the targeted 20 percent in the testing data.
  • Fixing the poverty rate at 20 percent implies a poverty line of 16,063, which is too high.
3 / 15
  • Left is observed training data income distribution. Right is predicted income distribution for testing data, but the true underlying income distribution is not observed.
  • Again LSMS-ISA Uganda. OLS model like in Brown et al.
  • Brown et al. look at both and recommend in practice to fix the poverty rate (focus on exclusion error), Verme fixes poverty line and compares model performance when poverty line changes.

A combined framework

Brown et al. & Verme

  1. Modelling: $y_i = \alpha + \beta x_i + \varepsilon_i$ (continuous) or $\mathbb{1}(y_i \le z) = \alpha + \beta x_i + \varepsilon_i$ (binary). Choice of outcome (consumption or poverty) and model (Verme: OLS/Logit + Random Forest and LASSO)[1]: 6 models

  2. Prediction (out of sample): $\hat{y}_i = \hat{\alpha} + \hat{\beta} x_i$ (continuous) or $\hat{p}_i = P[y_i \le z \mid x_i] = \hat{\alpha} + \hat{\beta} x_i$ (binary)

[1] This is the part where Brown et al. compare OLS, quantile regression, "poverty-weighted least-square" for basic and extended set of covariates.
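The modelling and prediction steps can be sketched on synthetic data; the covariates, coefficients, and split are illustrative assumptions, not taken from either paper:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))                                    # covariates x_i
y = 10 + X @ np.array([1.0, -0.5, 2.0]) + rng.normal(size=500)   # consumption y_i
z = np.quantile(y, 0.2)                                          # poverty line

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

# Continuous outcome: OLS on consumption, predict y_hat out of sample.
ols = LinearRegression().fit(X_tr, y_tr)
y_hat = ols.predict(X_te)

# Binary outcome: logit on the poverty indicator 1(y <= z), predict p_hat.
logit = LogisticRegression(max_iter=1000).fit(X_tr, (y_tr <= z).astype(int))
p_hat = logit.predict_proba(X_te)[:, 1]
```

Swapping `LinearRegression`/`LogisticRegression` for random forest or LASSO estimators gives the remaining four of Verme's six models.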

4 / 15
  • Point for discussion: how important is training and testing versus simply estimating the model.
  • Smaller point, Verme splits error term into a random error and a modelling error, but those are not separately identifiable.

A combined framework (cont.)

  1. Modelling: $y_i = \alpha + \beta x_i + \varepsilon_i$

  2. Prediction: $\hat{y}_i = \hat{\alpha} + \hat{\beta} x_i$


  3. Classification: Use the predictions to classify into poor or non-poor: poor if $\hat{y}_i \le z$ (continuous) or poor if $\hat{p}_i \ge \tau$ (binary), where $\tau$ is a pre-specified probability cut-off (unclear what Verme uses).
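The classification step is a simple thresholding of the predictions; the values below are made-up stand-ins for predicted consumption and predicted poverty probabilities:

```python
import numpy as np

y_hat = np.array([12.0, 9.5, 11.2, 8.7])  # predicted consumption (continuous model)
p_hat = np.array([0.1, 0.7, 0.3, 0.9])    # predicted poverty probability (binary model)
z, tau = 10.0, 0.5                        # poverty line and probability cut-off

poor_continuous = y_hat <= z   # poor if predicted consumption at or below the line
poor_binary = p_hat >= tau     # poor if predicted probability at or above the cut-off
print(poor_continuous)  # [False  True False  True]
print(poor_binary)      # [False  True False  True]
```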

5 / 15

All classifications will have errors

Poverty Confusion Matrix

Predicted Non-Poor Predicted Poor
Real Non-Poor True Negative (TN) False Positive (FP)
Real Poor False Negative (FN) True Positive (TP)

To evaluate different models we need a targeting measure (Verme: "objective function").

Verme: "coverage rate" (= 1 − Exclusion Error Rate, $1 - \frac{\sum_i \mathbb{1}(\hat{y}_i > z \mid y_i \le z)}{\sum_i \mathbb{1}(y_i \le z)}$)
Verme: "leakage rate" (= Inclusion Error Rate, $\frac{\sum_i \mathbb{1}(y_i > z \mid \hat{y}_i \le z)}{\sum_i \mathbb{1}(\hat{y}_i \le z)}$)

And some additional measures

6 / 15

We need a way to evaluate the different predictions. Typically we use the mean squared error: $\frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2$. But here we do not care about all errors equally. Instead, we formulate the targeting measure in terms of program coverage.

The Exclusion Error Rate is the share of the true poor who are misclassified as non-poor: true poor are wrongly excluded. The Inclusion Error Rate is the share of the predicted poor (Ravallion; Verme uses the true poor as base) who are in fact non-poor: true non-poor are wrongly included.
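Both rates fall out of the confusion matrix above. A small sketch with made-up labels (illustrative only):

```python
import numpy as np

true_poor = np.array([1, 1, 1, 0, 0, 0, 0, 1])  # 1 = actually poor
pred_poor = np.array([1, 0, 1, 1, 0, 0, 0, 0])  # 1 = classified as poor

tp = np.sum((true_poor == 1) & (pred_poor == 1))  # true positives
fn = np.sum((true_poor == 1) & (pred_poor == 0))  # poor wrongly excluded
fp = np.sum((true_poor == 0) & (pred_poor == 1))  # non-poor wrongly included

coverage = tp / (tp + fn)   # = 1 - EER: share of true poor who are reached
leakage = fp / (tp + fp)    # = IER with the predicted poor as base (Ravallion)
print(coverage, leakage)
```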

Verme: Data and Results

  • 7062 households (3482 in testing data).
  • 6 models (OLS, Logit, Random Forest x2 , LASSO x2) at "simplest Stata specification."
  • Simplest consumption model: gender, age, marital status and skills of the head of the household, household size and urban-rural location
Measure               Continuous              Binary
                      OLS    RF     LASSO    Logit  RF     LASSO
Undercoverage (EER)   0.4    0.33   0.4      0.3    0.35   0.3
Leakage (IER)         0.24   0.34   0.24     0.31   0.36   0.31

(rows 1–2 of 4 shown)
7 / 15
  • You can tell that it is a working paper...
  • Undisclosed middle-income-country
  • Poverty line set at the median value, but it is unclear how exactly; probably the median of the training data (or of the entire data?).
  • Points to discuss: with so few variables, using regularization techniques like LASSO and Random Forest makes little sense.
  • Brown et al. had IER of 0.2-0.35 and EER of 0.25-0.45 for poverty line at 40 percent.

Verme: Coverage Curves

8 / 15
  • Verme plots the coverage (1 − EER) and leakage (IER) rates for different values of the poverty line for all models. The results are unclear; the clearest case is the binary outcome, where random forest is dominated by logit and LASSO.
  • Question: is this the right comparison? The poverty line does not vary that much, so there are other parameters for which we would like to know how the dominance of models varies. Most notably, we would first want a measure of precision, no?

Application: Uganda LSMS

  • 2,744 observations, split equally into training and testing data.

  • For a fixed poverty line at $z = F^{-1}(0.2)$ and a fixed poverty rate at $H = 0.2$, I calculate headcount, IER and EER (or TER) using the basic and extended OLS models as in Brown et al., and add LASSO and Random Forest models.

  • Make 500 repeated testing–training splits and re-estimate the models to get bootstrapped precisions for the headcount, IER, EER and TER prediction errors.

  • Many limitations and to-do's

9 / 15

I was not convinced by the comparisons in the Verme paper. So, I did it again for the Uganda LSMS-ISA dataset which is used in the Brown et al. paper. The dataset has 2,744 observations after cleaning everything, attempting to do it exactly as in the Brown et al. paper. There are a couple of limitations (most importantly survey weights) but it's a start.

  • Addition to Brown et al.: add machine learning techniques (LASSO and Random Forest) and divide into testing and training data.
  • Addition to Verme: use a more sophisticated consumption model where the ML techniques may have an advantage, although it would still be better to have interaction terms and polynomials.
  • And then, I also get estimates of the precision of the predictions with the competing algorithms.
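The repeated split-and-re-estimate idea can be sketched as follows, using 50 splits instead of 500 to keep it fast; the data are synthetic stand-ins for the LSMS sample:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 3))
y = 10 + X @ np.array([1.0, -0.5, 2.0]) + rng.normal(size=400)

headcounts = []
for b in range(50):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=b)
    z = np.quantile(y_tr, 0.2)                        # poverty line fixed in training data
    y_hat = LinearRegression().fit(X_tr, y_tr).predict(X_te)
    headcounts.append(np.mean(y_hat <= z))            # predicted headcount in testing data

# The spread across splits gives a bootstrapped precision for the headcount;
# IER, EER and TER can be collected the same way inside the loop.
print(np.mean(headcounts), np.std(headcounts))
```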

My results

Results for z=0.4

10 / 15

Brown et al. had IER = 0.403, EER = 0.619 and TER 0.344 for poverty line at 20 percent.

Poverty rate at predicted poverty lines

11 / 15

Questions

  • In-sample versus out-of-sample testing. Is this crucial? Verme says:

    The choice of the optimal model depends on the location of the poverty line, the choice of objective function and the particular income distribution at hand. Unlike current practices, it is essential to test alternative models and perform stochastic dominance analysis before selecting the optimal model

How can we do this if we don't have the data and need to do out-of-sample validation?

  • How important is it to consider the precision of targeting models? Bootstrap reflects sampling uncertainty, but what if we have data of whole population?
12 / 15

More Targeting Measures

  • Specificity Rate = 1 - Leakage Rate
  • Precision = TP/(TP+FP) share of all predicted poor that are poor
  • Accuracy = (TP+TN)/(TP+TN+FP+FN) share of all observations correctly classified
  • F2 = $\frac{5\,TP}{5\,TP + 4\,FN + FP}$ 😕
  • Chi2 = $\sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$ 😮
  • Chi2 likelihood ratio


Back to presentation

13 / 15

Limitations of Uganda Application

  • Survey weights are not used (not trivial with LASSO and Random Forest)
  • Clustering at region (ok) and PSU level (where in data?)
  • Correct bootstraps? Some people train on the full sample and validate on a sub-sample; I randomly split the data, but could also sample random subsets. For the point estimate: first draw or average over bootstraps?

To results of application
14 / 15

My results

Back to main results

15 / 15

Brown et al. had IER = 0.28, EER = 0.486 and TER 0.326 for poverty line at 40 percent.
