class: center, middle, inverse, title-slide

# Which Model for Poverty Prediction
## (Working Paper by Paolo Verme 2020)
### Philipp Kollenda
### Vrije Universiteit Amsterdam
### 18 June 2021 (last updated: 17 June 2021)

---

# Why Targeting

.pull-left[
![](Slides_files/figure-html/plot_povertylines-1.png)<!-- -->
]

.pull-right[
- Determine eligibility for a program.
- Which measure? Which cut-off?
> Consumption (Brown et al.; Verme), Rankings (Martin), ...
> Absolute poverty (line) or relative poverty (rate)
- The relevant measure may not be available for the entire sample 😢
- The relation between poverty line and poverty rate may be unknown 😢
]

???

- Income distribution of the Uganda LSMS-ISA training dataset (random subset of 50%).
- If we know the income distribution, we can set the poverty rate such that it exactly matches the chosen poverty line. But if we do not know the income distribution, we need to fix one of the two, and the predicted distribution may be off (next slide).

---

.pull-left[
## How it started
![](Slides_files/figure-html/plot_povertylines2-1.png)<!-- -->
]

.pull-right[
## How it's going
![](Slides_files/figure-html/poverty_prediction-1.png)<!-- -->
]

- Fixing the poverty line at 14,925 is too low to reach the targeted 20 percent in the testing data.
- Fixing the poverty rate at 20 percent implies a poverty line of 16,063, which is too high.

???

- Left is the observed training-data income distribution. Right is the predicted income distribution for the testing data, for which the true underlying income distribution is not observed.
- Again LSMS-ISA Uganda. OLS model as in Brown et al.
- Brown et al. look at both and recommend in practice to fix the poverty rate (focus on exclusion error); Verme fixes the poverty line and compares model performance when the poverty line changes.

---
name: modelling

# A combined framework

### Brown et al. & Verme

1. **Modelling**:
$$ \begin{align} y_i &= \alpha + \beta x_i + \varepsilon_i \\\\ 1(y_i \leq z) &= \alpha' + \beta' x_i + \epsilon_i \end{align} $$
Choice of outcome (consumption or poverty) and model (Verme: OLS/Logit + Random Forest and LASSO)<sup>1</sup>: 6 models

2. **Prediction** (out of sample):
$$ \begin{align} \hat{y}_i &= \hat{\alpha} + \hat{\beta} x_i \\\\ \hat{p}_i &= P[y_i \leq z | x_i] = \hat{\alpha}' + \hat{\beta}' x_i \end{align} $$

.footnote[
[1] This is the part where Brown et al. compare OLS, quantile regression and *"poverty-weighted least squares"* for a basic and an extended set of covariates.
]

???

- Point for discussion: how important is training and testing versus simply estimating the model?
- Smaller point: Verme splits the error term into a random error and a modelling error, but those are not separately identifiable.

---

## A combined framework (cont.)

1. Modelling: `\(y_i = \alpha + \beta x_i + \varepsilon_i\)`
2. Prediction: `\(\hat{y}_i = \hat{\alpha} + \hat{\beta} x_i\)`

<hr>

3. **Classification**: Use the predictions to classify into poor or non-poor
$$ \begin{align} y_i \rightarrow \text{poor if }&\enspace \hat{y}_i \leq z \\\\ 1(y_i \leq z) \rightarrow \text{poor if }&\enspace \hat{p}_i \geq \tau \end{align} $$
`\(\tau\)` is a pre-specified probability cut-off (unclear what Verme uses).
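
A minimal R sketch of this classification step (my own illustration, not Verme's code; `y_train`, `y_hat` and `p_hat` are assumed objects holding training consumption and the out-of-sample predictions):

```r
# y_train: observed consumption in the training data
# y_hat:   predicted consumption for the testing data (consumption model)
# p_hat:   predicted probability of being poor for the testing data (binary model)
z   <- quantile(y_train, 0.2)  # fixed poverty line, e.g. 20th percentile of training consumption
tau <- 0.5                     # pre-specified probability cut-off (Verme's choice is unclear)

poor_from_consumption <- y_hat <= z    # poor if predicted consumption is below the line
poor_from_probability <- p_hat >= tau  # poor if predicted probability exceeds the cut-off

# Alternative with a fixed poverty rate: take the poorest 20 percent of predictions.
poor_from_rate <- y_hat <= quantile(y_hat, 0.2)
```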
---
name: classificationerrors

<script>
function setColor(color){
  document.getElementById("TN").style.backgroundColor='white';
  document.getElementById("FP").style.backgroundColor='white';
  document.getElementById("TP").style.backgroundColor='green';
  document.getElementById("FN").style.backgroundColor='red';
};
</script>

<script>
function setColor2(color){
  document.getElementById("TP").style.backgroundColor='white';
  document.getElementById("FN").style.backgroundColor='white';
  document.getElementById("TN").style.backgroundColor='green';
  document.getElementById("FP").style.backgroundColor='red';
};
</script>

# All classifications will have errors

Poverty Confusion Matrix

<table>
  <tr> <td></td> <th>Predicted Non-Poor</th> <th>Predicted Poor</th> </tr>
  <tr> <th>Real Non-Poor</th> <td id="TN">True Negative (TN)</td> <td id="FP">False Positive (FP)</td> </tr>
  <tr> <th>Real Poor</th> <td id="FN">False Negative (FN)</td> <td id="TP">True Positive (TP)</td> </tr>
</table>

To evaluate different models we need a targeting measure (Verme: "*objective function*").

Verme: "coverage rate" (= 1 - Exclusion Error Rate, `\(1-\frac{\sum 1(\hat{y}_i > z | y_i \leq z)}{\sum 1(y_i\leq z)}\)`) <input id="button1" type="button" value="show" onclick="setColor('red')">

Verme: "leakage rate" (= Inclusion Error Rate, `\(\frac{\sum 1(y_i > z | \hat{y}_i\leq z)}{\sum 1(\hat{y}_i \leq z)}\)`) <input id="button2" type="button" value="show" onclick="setColor2('red')">

And some [additional measures](#measures)

???

We need a way to evaluate the different predictions. Typically we use the mean squared error: `\(\frac{1}{N}\sum_{i=1}^N(y_i - \hat{y}_i)^2\)`. But here we do not care about all errors equally. Instead, formulate the targeting measure in terms of coverage of the program.

The Exclusion Error Rate is the share of true poor who are misclassified as non-poor: true poor are wrongly excluded. The Inclusion Error Rate is the share of predicted poor (Ravallion; Verme uses the true poor as denominator) who are actually non-poor: true non-poor are wrongly included.

---

# Verme: Data and Results

- 7,062 households (3,482 in the testing data).
- 6 models (OLS, Logit, Random Forest x2, LASSO x2) at the *"simplest Stata specification."*
- Simplest consumption model: gender, age, marital status and skills of the head of the household, household size and urban-rural location.
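
To compare these models, the coverage and leakage rates from the previous slide serve as the objective functions. A minimal R sketch of how they can be computed on the testing data (my reading of the definitions, not Verme's code; `poor_true` and `poor_pred` are assumed logical vectors):

```r
# poor_true: household is truly poor in the testing data (y_i <= z)
# poor_pred: household is classified as poor by a given model
coverage_rate <- sum(poor_pred & poor_true) / sum(poor_true)   # = 1 - exclusion error rate
leakage_rate  <- sum(poor_pred & !poor_true) / sum(poor_pred)  # inclusion error rate (predicted-poor denominator; Verme may use the true poor instead)
exclusion_error_rate <- 1 - coverage_rate
```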
???

- You can tell that it is a working paper...
- Undisclosed middle-income country.
- Poverty line set at the median value. But unclear how exactly; probably the median value of the training data (or of the entire data?).
- Points to discuss: with so few variables, using regularization techniques like LASSO and Random Forest makes little sense.
- Brown et al. had IER of 0.2-0.35 and EER of 0.25-0.45 for a poverty line at 40 percent.

---

# Verme: Coverage Curves

<img src="Verme_CoverageCurves.png" width="504" style="display: block; margin: auto;" />

???

- Verme plots the coverage (1 - EER) and leakage (IER) rates for different values of the poverty line for all three models. Results are unclear; clearest for the binary outcome, where the random forest is dominated by logit and LASSO.
- Question: Is this the right comparison? The poverty line does not vary that much, so it seems there are other parameters for which we would like to know how the dominance of models varies. Most notably, we would first want a measure of precision, no?

---

# Application: Uganda LSMS

- 2,744 observations, split equally into training and testing data.
- For a fixed poverty line at `\(z=F^{-1}(0.2)\)` and a fixed poverty rate at H = 0.2, I calculate the Headcount, IER, EER and TER using the basic and extended OLS models as in Brown et al. and add LASSO and Random Forest models.
- Make 500 repeated training-testing splits and re-estimate the models to get bootstrapped precisions for the Headcount, IER, EER and TER prediction errors.
- [Many limitations and to-do's](#limitations)

???

I was not convinced by the comparisons in the Verme paper. So I did it again for the Uganda LSMS-ISA dataset which is used in the Brown et al. paper. The dataset has 2,744 observations after cleaning everything, attempting to do it exactly like the Brown et al. paper. There are a couple of limitations (most importantly survey weights) but it's a start.

- Addition to Brown et al.: add machine learning techniques (LASSO and Random Forest) and divide into testing and training data.
- Addition to Verme: use a more sophisticated consumption model where the ML techniques may have an advantage. Although it would still be better to have interaction terms and polynomials.
- And then, I also get estimates of the precision of the predictions with the competing algorithms.

---
name: myresults

## My results

<img src="Slides_files/figure-html/uganda_results-1.png" style="display: block; margin: auto;" />

[Results for z=0.4](#ownz4results)

???

Brown et al. had IER = 0.403, EER = 0.619 and TER = 0.344 for a poverty line at 20 percent.

---

### Poverty rate at predicted poverty lines

<img src="Slides_files/figure-html/hresults-1.png" style="display: block; margin: auto;" />

---

# Questions

- In-sample versus out-of-sample testing. Is this crucial? Verme says:

> The choice of the optimal model depends on the location of the poverty line, the choice of objective function and the particular income distribution at hand. Unlike current practices, it is essential to test alternative models and perform stochastic dominance analysis before selecting the optimal model

How can we do this if we don't have the data and need to do out-of-sample validation?

- How important is it to consider the precision of targeting models? The bootstrap reflects sampling uncertainty, but what if we have data on the whole population?
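
On the precision point, a sketch of the repeated-split exercise behind the bootstrapped precisions (the data frame `uganda` and the helper `fit_and_evaluate()` are placeholders, not the actual code behind these slides):

```r
set.seed(1234)  # illustrative seed

# fit_and_evaluate() should fit the models on the training half and return
# the Headcount, IER, EER and TER computed on the testing half.
results <- replicate(500, {
  idx <- sample(nrow(uganda), size = floor(nrow(uganda) / 2))
  fit_and_evaluate(train = uganda[idx, ], test = uganda[-idx, ])
})

# The spread across the 500 splits gives the bootstrapped precision of each measure.
apply(results, 1, sd)
```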
---
name: measures

### More Targeting Measures

- Specificity Rate = 1 - Leakage Rate
- Precision = TP/(TP+FP) *share of all predicted poor that are poor*
- Accuracy = (TP+TN)/(TP+TN+FP+FN) *share of all observations correctly classified*
- F2 = `\(5*TP/(5*TP+4*FN+FP)\)` 😕
- Chi2 = `\(\sum\frac{(O_{ij}-E_{ij})^2}{E_{ij}}\)` 😮
- Chi2 likelihood ratio

<hr>

[Back to presentation](#classificationerrors)

---
name: limitations

### Limitations of Uganda Application

- Survey weights are not used (not trivial with LASSO and Random Forest).
- Clustering at region (ok) and PSU level (where in the data?).
- Correct bootstraps? Some people train on the full sample and validate on a sub-sample; I randomly split the data but could also sample random subsets. Point estimate: first draw or average of the bootstraps?

<hr>

[To results of application](#myresults)

---
name: ownz4results

## My results

<img src="Slides_files/figure-html/uganda_results4-1.png" style="display: block; margin: auto;" />

[Back to main results](#myresults)

???

Brown et al. had IER = 0.28, EER = 0.486 and TER = 0.326 for a poverty line at 40 percent.
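
---
name: measurescode

### Targeting Measures from the Confusion Matrix (sketch)

A minimal R sketch (my own illustration, with made-up counts) of how the additional measures follow from the confusion-matrix cells:

```r
# Hypothetical confusion-matrix counts (illustrative numbers only)
TP <- 250; FN <- 100   # true poor: correctly classified / wrongly excluded
FP <- 120; TN <- 900   # true non-poor: wrongly included / correctly classified

precision <- TP / (TP + FP)                   # share of all predicted poor that are poor
accuracy  <- (TP + TN) / (TP + TN + FP + FN)  # share of all observations correctly classified
f2        <- 5 * TP / (5 * TP + 4 * FN + FP)  # F2 score (beta = 2): recall weighted more heavily than precision
```

[Back to targeting measures](#measures)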