Estimating the Uncertainty of a Small Area Estimator Based on a Microsimulation Approach

Spatial microsimulation encompasses a range of alternative methodological approaches for the small area estimation (SAE) of target population parameters from sample survey data down to target small areas in contexts where such data are desired but not otherwise available. Although widely used, an enduring limitation of spatial microsimulation SAE approaches is their current inability to deliver reliable measures of uncertainty—and hence confidence intervals—around the small area estimates produced. In this article, we overcome this key limitation via the development of a measure of uncertainty that takes into account both variance and bias, that is, the mean squared error. This new approach is evaluated via a simulation study and demonstrated in a practical application using European Union Statistics on Income and Living Conditions data to explore income levels across Italian municipalities. Evaluations show that the approach proposed delivers accurate estimates of uncertainty and is robust to nonnormal distributions. The approach provides a significant development to widely used spatial microsimulation SAE techniques.

Large-scale surveys are designed to obtain reliable estimates at national level or, in some instances, for large subnational scales such as regions.These can be considered to be the planned domains of the survey sampling design (Benavent and Morales 2016).However, there is a growing demand from both research and policy communities for various local estimates at more detailed spatial resolutions such as municipalities or neighborhoods due to the absence of data at such small area scales from existing census or administrative sources.However, this small area desire frequently encounters a problem of unplanned domains, given that for cost reasons such small areas typically have small or zero sample sizes in the survey sampling design.In these circumstances, commonly used direct estimators such as the Horvitz-Thompson estimator (Horvitz and Thompson 1952) that only use sample survey information either cannot be used (in the case of zero sample size domains) or provide unacceptably large variability in the estimates to be practically useful (in the case of small sample size domains).
In such scenarios, indirect small area estimation (SAE) of target population parameters has become a relatively widely used and increasingly demanded methodological technique via a range of SAE approaches.We refer to Rao and Molina (2015), Whitworth (2013), Rahman and Hardin (2017), and Marshall (2010) for useful methodological reviews on both regression-based and microsimulation-based SAE methods.Spatial microsimulation approaches, sometimes referred to as survey calibration approaches (Espuny-Pujol, Morrissey, and Williamson 2018), represent a family of reweighting approaches to SAE in which the challenge is to reweight the survey units such that they optimally fit the demographic and socioeconomic profile of each small area according to a selected set of benchmark constraints.Part of the appeal of spatial microsimulation approaches to SAE for both researchers and policy users is their intuitive and accessible appeal without much of the complex statistical expertise required within many regression-based SAE methods, particularly as assumptions fail or more complex outcomes are desired.In those circumstances, one advantage of spatial microsimulation approaches over regression-based SAE estimators is that they tend to be more robust to failures in model assumptions, given that as model-assisted estimators it is only necessary that the population be reasonably well described by an assumed model for that model to be valid (Sa ¨rndal, Swensson, and Wretman 1992).
Spatial microsimulation SAE approaches have been used to produce small area estimates across a range of policy areas including child malnutrition (Johnson et al. 2012), obesity (Edwards et al. 2010), fuel poverty (Office for National Statistics 2019), income and poverty (Bell, Basel, and Maples 2016;Pratesi 2016; World Bank 2018), regional planning (Clarke and Holm 1987), participation in sport (Ipsos MORI 2018), and transport (Lovelace, 2016;Ravulaparthy and Goulias 2011;Tribby and Zandbergen 2012).Spatial microsimulation SAE approaches have been well validated against known external data and against alternative regression-based SAE techniques (Moretti and Whitworth 2019;Tanton, Williamson, and Harding 2014;Whitworth and Carter 2015).Spatial microsimulation SAE has also been used effectively to assess the spatial impacts of differing "what if" policy scenarios (Chin and Harding 2006;Cullinan, Hynes, and O'Donoghue 2006;Tanton and Edwards 2013;Williamson, Birkin, and Rees 1998).For example, Campbell and Ballas (2013) introduce the SimAlba spatial microsimulation model Scotland in order to estimate the simulated impact of various policy scenarios on individual's health outcomes.In an Australian context, Tanton and Edwards (2013) use spatial microsimulation SAE to assess the geographical impacts on expected elderly poverty levels from changes to the state pension (Tanton et al. 2009).
However, despite these widespread applications and contributions, an important long-standing limitation of spatial microsimulation SAE approaches is their continued inability to deliver estimates of uncertainty around central point estimates at small area level.More specifically, one faces something of a trade-off with any SAE approach in seeking via the indirect SAE estimates to reduce variance compared to the direct estimates while acknowledging that those direct estimates are unbiased compared to the indirect SAE estimates that are inherently biased.As such, it is essential when calculating the uncertainty of any SAE estimator that the mean squared error (MSE) is used, given that this takes into account both variance and bias.However, calculating MSE is often challenging in an SAE content and particularly so in spatial microsimulation approaches.In the case of design-based estimators, the design weights are known before the sample selection and are therefore nonrandom meaning that analytical approximations of MSE are available (Sa ¨rndal et al. 1992).However, this is no longer the case in spatial microsimulation approaches to SAE when reweighing algorithms are used as the weights become random variables themselves.In these scenarios, analytical approximation of MSE becomes highly challenging, given that bias and variance cannot be computed in closed form such that empirically based resampling techniques are instead required to estimate their uncertainty (Chen and Shen 2015).Some analytical approximations have been suggested in the literature (D'Arrigo and Skinner 2010; Deville and Sa ¨rndal 1992), but Chen and Shen (2015) highlight important practical challenges around the requirement for joint selection probabilities that are rarely computed or known in practice.
Empirical attempts to estimate the uncertainty around spatial microsimulation SAE estimates have been proposed in recent years (Chen and Shen 2015;Nagle et al. 2014;Whitworth et al. 2017), though none entirely successful.In response, this article develops a novel modified parametric bootstrap technique in order to estimate the uncertainty of spatial microsimulation small area estimates based on the MSE such that it captures both bias and variance components in the uncertainty estimate, unlike previous attempts.Our approach benefits from clear statistical properties under a linear model and can be used flexibly across alternative spatial microsimulation SAE techniques.
The remainder of this article is structured as follows.In the second section, the general problem of SAE of the population mean using iterative proportional fitting (IPF) is outlined.In the third section, the challenge of uncertainty estimation in SAE contexts is further described, and our approach to MSE estimation via bootstrap is detailed.In the fourth section, the bootstrap approach is evaluated via a simulation study, and in the fifth section, a practical data application focusing on Italian Statistics on Income and Living Conditions (SILC) data is presented to illustrate the approach.The sixth section provides a concluding discussion focusing on wider implications for spatial microsimulation SAE and potential next steps for research.

SAE With IPF
This section sets out the SAE problem of the population mean and formally introducing the IPF spatial microsimulation approach used in the later development of our proposed uncertainty estimator.

The General SAE Problem for a Small Area Mean
Let us consider a sample s & O of size n drawn from the target finite population O of size N. Let d ¼ 1; . . .; D denote the small areas for which we want to compute the small area estimates.N À n are the nonsampled units and these are denoted by i2s d y di w di , where w di denotes the design weight for ith unit from dth area in s d .The direct survey estimate comes from a standard direct estimator and is based on sample survey information only.They are weighted averages where the weights are the design weight based on the complex survey design.As a consequence, there is a need in such circumstances to consider indirect SAE estimation techniques using auxiliary information if one wishes small area estimates either at all (in the case of zero survey sample sizes) or with reduced uncertainty (in the case of low small area survey sample sizes).There is a bias-variance trade-off in operation when doing so such that any reductions in variability must naturally be balanced with acknowledgment of increased bias in the indirect SAE estimates compared to the unbiased direct estimates (Rahman and Harding 2017;Rao and Molina 2015).

Reweighing Using the IPF Algorithm
IPF is one of three main spatial microsimulation approaches to SAE-IPF, generalized regression reweighting and combinatorial optimization.To demonstrate our approach to the estimation of uncertainty in spatial microsimulation SAE, we focus on the IPF algorithm, given that this is both widely used and a constructively challenging test of our MSE estimator, given that its iterative nature renders the survey weights random variables themselves (Chen and Shen 2015;Rahman et al. 2013;Simpson and Tranmer 2005).
Like all spatial microsimulation SAE approaches, IPF can be understood as a reweighting optimization problem where the aim is to reweight the survey units (e.g., individuals or households) such that they optimally fit the demographic and socioeconomic profile of each small area according to a selected set of benchmark constraints (e.g., age-sex, employment status, ethnicity, health, education).Deville and Sa ¨rndal (1992) provide a statistical theory of these reweighting techniques and alternative approaches.For each local area, the result is a tailored set of reweighted survey cases that fit to the benchmark characteristics of that small area in terms both of total population and the profile of that population across the benchmark constraints.A key data set created during IPF is a weights matrix giving new weights for each survey unit in each separate small area.For each small area, those final IPF weights show how representative each survey unit is of each area given their respective characteristics across the benchmarks.Across all survey individuals, these reweighted units sum to the small area population total and map onto its population profile across the benchmarks.As such, these reweighted data provide a valuable synthetic micropopulation for each small area that can be employed in further local analyses (e.g., "what if" policy simulations) as desired (Anderson 2013;Lovelace and Dumont 2016).
Formally, IPF can be understood as follows.Let w i be the initial weight (usually the survey design weight) for i 2 s.The calibration problem is areaspecific and therefore generates new weights denoted by w Ã i for i 2 s d for area d that satisfy the calibration equation given by , where x i is a vector of auxiliary variables.Here, w Ã i minimizes a given distance function between w Ã i ; i 2 s d È É and w i ; i 2 s d f g .IPF is the exponential case within a wider family of synthetic reweighting algorithms (Deville and Sa ¨rndal 1992).Its constrained optimization problem is given as follows, where a i denotes the initial weights, usually the design weights (Chen and Shen 2015; Deville and Sa ¨rndal 1992): min : As noted above, (1) unfortunately does not have a closed-form solution such that solution via analytical approximation is required.This is however highly challenging.The IPF algorithm is therefore employed iteratively across the benchmark constraints in order to estimate the final weights for each survey unit in order to derive a solution empirically.To describe the IPF method more fully, the terminology and notation used by Kolenikov (2014) is followed: 1. Initialize the iteration counter t 0 and the weights as w 0; p i w i .2. Increment the iteration counter t t þ 1, thus updating the weights as w t;0 i w tÀ1; p i .

1790
Sociological Methods & Research 52(4) 3. Update the weights through each of the benchmark constraint variables in turn, v ¼ 1; . . .; p: 4. If the discrepancies between P i2s w t; p i x v (i.e., the sample totals with the new weights) and T X v ð Þ are within a priori defined tolerance for all v ¼ 1; . . .; p, then declare convergence and go to step 5, otherwise return to step 2. 5.The weights w t; p i are the final calibrated weights and are denoted by The benchmark constraints used for the survey reweighting are usually categorical variables in real applications.Therefore, where l ¼ 1; . . .; p denotes the lth benchmark constraint and d is the number of categories of the lth benchmark constraint.Anderson (2007) suggests that R ¼ 20 is sufficient as a conservative guide to the number of loops through the benchmark constraints in order to optimize the calibration to the set of benchmarks.The IPF algorithm is area-specific, and the IPF reweighting therefore needs to be iterated for each small area d ¼; 1 . . .; D.
The IPF estimator can therefore be defined as follows: where w Ã di denotes the IPF-calibrated survey weight for unit ith from area dth.It can be noted that y i appears for i ¼ 1; . . .; n, which means that Y IPF d belongs to the class of small area synthetic estimators (Rao and Molina 2015).Of course, in order for in terms of its combination of variance and bias as measured by the MSE, the auxiliary variables used in the calibration problem need to be related sufficiently to the target variable Y, as with all such model-based or model-assisted small area estimators (Fuller 2002).

Measuring the Uncertainty: the MSE Estimator of Y IPF d
The quality of an estimate is assessed with reference both to its accuracy (bias) and to its precision (variability).It is therefore important to capture both aspects in any measure of uncertainty (Dodge and Commenges 2006;Statistics Canada 2009).The bias of an estimate can be defined as its degree to which it describes the measured phenomena correctly, in other words its difference from the true (though often unobserved) population value.In contrast, the variability of an estimate relates to how closely repeated observations confirm themselves (e.g., under random sampling).Figure 1 reproduces a visual summary of these two considerations from Ferrante and Cameriere (2009).
The MSE is the second moment about the origin of the error and thus takes into account both bias and variance.It is therefore an appropriate measure for our proposed estimation of uncertainty around spatial microsimulation SAE estimator.For the design unbiased direct survey estimates, the MSE is equal to the variance, while for the indirect SAE estimator, the MSE is equal to the bias squared plus the variance.As such, the attractiveness of any indirect SAE estimator compared to the unbiased direct estimator is dependent upon the reductions in variance in any SAE approach exceeding its increases to bias such that the MSE of the indirect SAE estimator is smaller than the MSE of the direct estimator.This is possible in SAE contexts where small areas have low or no survey sample sizes such that direct small area estimates are either nonviable or come with large variance.
Capturing the MSE via resampling techniques such as the bootstrap is common within regression-based SAE approaches (González-Manteiga et al. 2008b;Marchetti et al. 2018;Moretti, Shlomo, and Sakshaug 2018).Indeed, González-Manteiga et al. (2008b) point out that even when analytical approximations are available, bootstrap resampling might provide more accurate estimates due to its second-order accuracy, a property discussed further in Efron and Tibshirani (1993).However, no similar MSE measures have yet been considered in the spatial microsimulation SAE context where MSE expressions are not available in closed form and where empirical approaches are therefore necessary to explore (Chen and Shen 2015;D'Arrigo and Skinner 2010).This is particularly relevant since the estimation of bias in particular has proven elusive in previous attempts (Chen and Shen 2015;Nagle et al. 2014;Whitworth et al. 2017).
This article responds to this gap through its modification of bootstrap ideas for regression-based small area estimators set out in González-Manteiga et al. (2008b), so that they become suited to the differing technical processes and requirements of spatial microsimulation approaches.The use of models in the estimation of MSE of model-assisted estimators can be found in the regression estimator context where a model unbiased conditional MSE estimator is proposed (Kott 2009).This provides initial motivation for this article to develop and adapt such an approach in the context of where u d and e di are the area random effect and the residual error term, respectively.As with all SAE methods, these are assumed to be independent (Rao and Molina 2015).The model assumes that the population has a twolevel structure where units are nested in areas.This is reasonable in the SAE context given the aim to estimate target parameters of small domains in the population.By doing so, the approach recognizes explicitly that the MSE will be based on a multilevel (two-level) structure and that the intraclass correlation (ICC) will therefore have relevance.The ICC describes the extent to which units (e.g., individuals) within the same higher level unit (e.g.areas) are similar to one another (R.
3. Generate the bootstrap residual error term e d , for every unit i in the sample in area d, for the sample units, i 2 s d ; 4. Calculate the true population means for each small area of the bootstrap population as follows: where x d; pop denotes the means of the known population auxiliary variables for each area d.These may be taken, for instance, from the census or administrative data.

Simulation Study
This section presents the findings from a simulation study to examine the performance of our proposed MSE bootstrap estimator for spatial microsimulation SAE.For a classificatory work on simulation studies in SAE and further theoretical details, we refer to Mu ¨nnich ( 2014).In this model-based simulation, S ¼ 1; 000 populations are generated from model (3), given that estimators such as the MSE estimator depend on model assumptions and hence that it is important to evaluate the statistical properties of our proposed approach under the model.There are no significant computational barriers to the approach, and this is an important practical consideration.Using a standard modern machine, the simulation study took around 10 hours to perform and the application of municipality income in Tuscany in the fifth section took around 20 minutes to perform.The parameters for the simulation are selected from the LANDSAT data that are widely used in SAE simulation settings.These are survey and satellite data for corn and soybeans in 12 Iowa counties obtained from the 1978 June survey of the U.S. Department of Agriculture and from land observatory satellites (see Battese et al. 1988;Datta, Day, and Basawa 1999;Moretti et al. 2018).The small area problem arises in these data since small area sample sizes are small.The simulations are computationally intensive in large population dimensions and are therefore controlled for the purposes of this simulation.All analyses are conducted in R, and details on the code and functions used for the bootstrap are provided in the Online Appendix (which can be found at http://smr.sagepub.com/supplemental/).

Generating the Population
The population is generated using the following parameters: N ¼ 20; 000, D ¼ 80, and 130 N d 420.N d , d ¼ 1; . . .; D is generated from the discrete uniform distribution, N d *dUnif 130; 420 ð Þ , with P D d¼1 N d ¼ 20; 000.y di observations are generated according to the following model: where F 2 Normal; Gumbel; Logistic f g .The rationale for sensitivity testing the performance of our bootstrap estimator across these three distribution types is that the MSE is based on a normality assumption.In line with good practice in previous SAE literature (González-Manteiga et al. 2008b), distributions are chosen deliberately in order to sensitivity test how the estimators perform when the error term e di is not normal but is instead skewed (Gumbel) or symmetric with heavy tails (Logistic).All three distribution types are common in real data applications.
The auxiliary variables are defined as follows: and the regression coefficients are given by the following vector: As noted above, since the data are assumed to take a multilevel structure (units inside areas), the ICC will have relevance and requires sensitivity testing.The ICC varies across applications dependent upon the extent to which the variability in the data observes a hierarchal structure with, for example, values ranging across 0.005, 0.05, and 0.2 with respect to mortality (Ambugo and Hegn, 2015), fear of crime (Whitworth 2012), and well-being (Moretti et al. 2019), respectively.Given that the ICC plays a role in the performance of the model-based MSE estimator, it is important that sensitivity tests are performed around its value within the simulation (Molina, Nandram, and Rao 2014;Moretti et al. 2018).We use the following relationships to explore the role of the ICC in the case of the Normal distribution: s 2 e with s 2 e ¼ 297:71 and r 2 0:01; 0:03; 0:05; 0:08; 0:10; 0:15; 0:20; 0:50 f g : Due to space constraints, in the cases of the Gumbel and Logistic distributions, the ICC is set at a realistic value of 0.05 only (see, e.g., Moretti et al. 2019).
In order to produce the small area IPF estimates, we create the following classes identifying the benchmark constraints related to the covariates x 1 and x 2 : ).These quality measures are evaluated and compared across the areas using the median as a robust central tendency measure (Chambers, Chandra, and Tzavidis 2011;Giusti et al. 2013).

Results
Performance of the bootstrap MSE estimator under different distributional assumptions of e di (r ¼ 0:05).Figure 2 presents the empirical root MSE (ERMSE) of the IPF estimator and the direct estimator.Since the performance is very similar across all three distributions, Figure 2 presents the findings for the Normal case only, ordered by increasing small area sample size.
It can be seen that the IPF synthetic estimator provides estimates with lower MSE than the direct estimator due to the use of related auxiliary variables in reducing variance.This is particularly true for small areas with smaller survey sample sizes where the performance gains of IPF are large relative to direct estimator.This occurs because the ERMSE of the direct estimator naturally depends on the survey sample size: When the survey sample size in the small area d is smaller, the ERMSE tends to increase due to the larger variance around such estimates.As the small area survey sample size increases, the performance gains of the synthetic IPF estimator decline relative to the direct estimator until a point where its performance converges with that of the direct estimator.For reference, Figure 2 looks identical when ordered by sampling fraction rather than sample size.
For each distribution, Figure 3 shows the IPF point estimates across the small areas plotted against the true values observed in the population.under the different distributional scenarios.The true value Y d is shown in the first row, while the direct and IPF central estimates are shown in rows 2 and 3, respectively.The relative bias of those direct and IPF estimates across all the small areas are then shown in the penultimate two rows.It can be seen that the IPF small area estimator returns only negligible biases across the small area even in cases of Gumbel and Logistic distributions of the error term.
Figure 4 moves on to focus on the performance of the bootstrap MSE estimator to calculate the uncertainty around those central small area point estimates in the three Normal, Gumbel, and Logistic distributions, respectively.It can be seen that our proposed bootstrap approach provides nearly unbiased estimates of the true MSE with relative bias centered on and close to zero across the small areas.No association is found between estimate bias and sampling fraction across Figure 4.  Table 2 provides further details of the performance of the bootstrap MSE estimator across the three distributions.In particular, the true MSE (i.e., empirical MSE) in each distribution is compared to our bootstrap MSE estimate and coverage rates (of 95 percent confidence intervals) are also presented.It can be seen that the relative bias values are close to zero and that the MSE bootstrap estimator is nearly unbiased across the small areas in each of the three distributions.
On the role of the ICC in the case of the Normal distribution.On the basis of these analyses, the performance of our proposed MSE estimator is encouraging in terms both of the small relative bias of the MSE estimates and the quality of  the coverage.However, as noted above, it may be that the magnitude of the ICC affects the performance of the MSE estimator.
Table 3 shows the findings of ICC sensitivity analyses with a focus on the Normal distribution where the ICC is varied at several points from 0.01 to 0.50.The results for r ¼ 0:05 shown above are repeated to aid comparison.It can be seen that the relative bias of Y IPF d increases slightly when the ICC increases beyond around 0.15, ranging from À0.001 when r ¼ 0.01-0.006when r ¼ 0:2 and 0.021 when r ¼ 0:50.In terms of the bootstrap MSE estimator, the penultimate row shows that this estimator delivers consistently small relative bias in the MSE though with somewhat weaker performance in coverage at very low levels of ICC as displayed in the final row.When the ICC is small, the MSE is slightly underestimated as seen by looking the relative bias and by comparing the empirical MSE (line two) with our bootstrap MSE (line 3).However, these relative bias estimates remain acceptable.

Application to Small Area Income Estimation in Italian Municipalities
This section provides a real-world application of an IPF small area estimator and, more centrally for this article, of our proposed bootstrap estimator of its MSE.The application used is the estimation of mean equivalized annual household disposable income (in Euros) for the municipalities of Tuscany region (D ¼ 287).The survey data used are provided from the 2009 European Union SILC (EU-SILC).These EU-SILC data contain a sample of 1,448 households for Tuscany.EU-SILC is designed to deliver estimates at the national and also regional (NUTS-2) level (Giusti, Masserini, and Pratesi 2015).Therefore, this situation is typical of most survey situations in that EU-SILC cannot be used to derive usable income estimates at smaller subregional geographies such as municipalities due to low or zero survey sample sizes.The household income variable of interest is given in the EU-SILC data and equivalized using Eurostat's official modified Organization for Economic Cooperation and Development equivalence scale (Haagenars, de Vos, and Zaidi 1994;Marchetti et al. 2018).The auxiliary variables for the Tuscan municipalities come from the Population Census of Italy.

Model Fitting and Internal Validation
The explanatory variables used in this application are working status, years of education, gender, and age of the survey identified head of household.These have been informed by preliminary model investigations and findings from previous studies (Giusti et al. 2013).Model diagnostics identified some skewness and outliers in the distribution of the income outcome variable, as is common with such distributions, and this variable was therefore log transformed.No evidence of leverage was found.Table 4 presents the results from the log-linear linear model in EU-SILC based on (3).
Validation is an important step in any SAE study.SAE models can be validated internally in terms of the underlying model and externally against some known other external data of the target outcome variable.In terms of the internal validation, Figure 5 shows the fitted values versus the residuals as well as the Q-Q plots of the residuals from the log-linear model used to produce the MSE of the IPF estimates.These show good behavior with respect to the normality assumption.External validation is discussed below.

Estimating Municipality Income in Tuscany
This section discusses the results of the IPF SAE of the mean equivalized annual household disposable income across Tuscan municipalities along  with their uncertainty estimates.In terms of external validation of these IPF estimates, a frequent inherent challenge, as here, is the typical lack of any such existing small area data against which to validate (hence the motivation for the SAE).External validation of these IPF estimates is provided in two ways.Firstly, the spatial patterns in Figure 6 are in line with known geographical patterns of similar indicators across Tuscany seen in previously published research (Giusti et al. 2015;Moretti et al. 2019).Secondly, no identical income indicators exist at this municipality scale, and no direct survey estimates to municipality level are viable from the EU-SILC survey data.However, it is viable to produce direct survey estimates from the EU-SILC survey data to Tuscany's 10 larger provinces and to compare these with indirect IPF estimates also to province level.The Spearman's rank correlation between these two sets of estimates is.93, and this is statistically significant at below the 1 percent level, although acknowledging the limited sample size involved.
Table 5 presents summary statistics of the uncertainty of the direct survey estimates compared with the IPF estimates.In particular, it shows the root mean squared error (column 1) and, expressed as a percentage of the estimates, the relative root mean squared error (RRMSE%; column three) of the small area estimates.Given that the direct estimates are unbiased, the standard deviation (SD; column 2) and coefficient of variation (CV%; column 4) of the direct estimates enable a comparison of uncertainty of the direct estimates with the bootstrap MSE estimates of the IPF estimator.The coefficient of variation (CV) is a standardized measure of the dispersion of a distribution and is calculated as a ratio of its SD to its mean.In the present analyses, it is obtained as the ratio between the SD of the direct survey estimate and the direct survey estimate for every area.Since direct estimates are unbiased, their CVs represent measures of uncertainty (Rao and Molina  D'Agostino, and Salvati 2018;Statistics Canada 2009).
Table 5 shows that the IPF estimates are more reliable than the direct estimates across all points of the income distribution, as depicted by the lower values of the RRMSE IPF (column 1) and RRMSE IPF (column 3) compared to SD direct (column 2) and CV direct (column 4), respectively.The IPF small area estimates can also be considered reliable in absolute terms.Values of RRMSE below a threshold of 20 percent are often taken by statistical agencies as acceptable (Commonwealth Department of Social Services 2015), and almost all of this municipality distribution of small area estimates is below this level.The final column of Table 5 summarizes the gains in efficiency of the IPF estimates over the direct survey estimates by comparing the MSE for the IPF estimator with the variance of the unbiased direct estimator.Technically, these are calculated by .Equation ( 13) denotes a measure of gain in efficiency of using an estimator with higher precision compared to the direct estimator.We refer to Sa ¨rndal et al (1992) for measures of efficiency in survey statistics and to Moretti et al. (2018) and González-Manteiga et al. (2008a) for some examples of their use.Results for n d ¼ 0 (municipalities with zero sample size) and n d > 0 (municipalities with some households in the survey) have been separated because it is not possible to compute direct estimates (and as consequently the gains) for areas with zero sample size.Table 5 shows that the small area estimates from the indirect IPF estimator provide significant performance gains compared to the direct estimator at all points of the municipality income distribution.Figure 7 drills down to focus on the extent to which these performance gains vary according to the size of the municipality sample size in the EU-SILC survey, a key driver of the variance of the direct estimator and key limiter of the viability of using the direct estimator to produce reliable survey estimates for small areas.To aid comparison, Figure 7 is ordered from left to right by municipality survey sample size in the EU-SILC.RRMSE is shown for the IPF estimates, while CV is shown for the direct estimates.Figure 7 illustrates that the RRMSE of the indirect IPF estimates does not depend on the sample size in contrast to the direct estimator.As such, Figure 7 highlights that while performance gains from the IPF estimator are seen across the whole distribution, they increase as the municipality sample size in the survey decreases.Among those municipalities with the smallest survey sample sizes, there is a marked increase in the performance gains available from the IPF estimator compared to the direct survey estimator.Figure 7 is naturally only able to display comparative results for municipalities with nonzero sample sizes, given that direct estimates cannot be produced for small areas with zero sample size.Estimates for these municipalities do of course become viable with synthetic IPF estimator.For reference, Figure 7 looks identical when ordered by sampling fraction rather than sample size.

Discussion
The combination of high costs of survey data collection and increasing demands for ever more spatially detailed data from policy makers and scholars alike mean growing demands for SAE techniques.Spatial microsimulation approaches to SAE continue to be widely used across diverse domains including transport, health, physical activity, and income.However, their continuing inability to produce reliable estimates of uncertainty alongside their central point estimates remains a pressing limitation to their utility for  practitioners and scholars alike.This is understandable in part given that the estimation of MSE is difficult in a spatial microsimulation context since it cannot be estimated in a closed form and analytical approximations are highly challenging.
Widely discussed in the SAE literature is the importance of providing measures of uncertainty such as MSE or confidence intervals alongside the central point estimates in order to assess the reliability of the small area estimates (Pratesi 2016).This is particularly important where policy decisions are taken on the basis of the small area estimates since the consequences of real-world decision making without a clear sense of the uncertainty around the point estimates can be misleading and potentially harmful (Goedemé et al. 2013).Klevmarken (2002:264) argues that "[T]he credibility of [microsimulation models] with the research community as well as with users will in the long run depend on the application of sound principles of inference in the estimation, testing and validation of these models." This article provides a significant development in this context by presenting a novel parametric bootstrap approach for the estimation of uncertainty in spatial microsimulation SAE techniques.Importantly, the measure of uncertainty estimated is the MSE that contains both the variance and bias of the estimate.Simulation results demonstrate that under model assumptions, our proposed MSE estimator is relatively unbiased and displays good coverage properties against known true population values.The approach delivers substantial performance gains compared to the direct estimator across all portions of the distribution.In doing so, our approach enables researchers and policy makers alike to quantify both the performance gains potentially available through the use of spatial microsimulation approaches to SAE compared to direct survey estimates and to quantify the extent of uncertainty around those small area estimates.The simulation results show that those performance gains exist irrespective of the target small area sample size but are especially large at low sample sizes (below 10 in this simulation) and, naturally, when small areas have zero sample size such that direct estimates are nonviable but synthetic small area estimates are possible.Sensitivity tests confirm that these performance gains are maintained both across nonnormal Gumbel and Logistic distributions as well as across differing values of ICC, though coverage performance falls slightly at very low levels of the ICC in our simulation.A practical data application using EU-SILC data to municipality income across Tuscany is presented and validated in order to demonstrate the applicability and similar performance of the MSE bootstrap estimator in a real-world setting.
While the performance and sensitivity analyses confirm that our proposed approach evaluates well and marks a significant contribution to the field, it serves also to open up opportunities for further more advanced enquiry as a result.First, we focus here only on a linear model.Future work will need to explore the performance of other nonlinear models in this context.However, the bootstrap approach in these scenarios will follow the same steps as those proposed in our approach.Second, a clear next step is to extend the framework to different types of outcome variables beyond the scalar target variable assessed here.Third, in this study, the common case of nonnormal error terms is assessed, and our simulation study shows good performance of the MSE estimator in distributions with mild skew and heavy tails as well as in the normal case.However, future work could take into account more fully the implications of and potential responses to failures in model assumptions and of model failure.Our hope therefore is that this article's contributions will not only make a significant advance to research and policy practice in spatial microsimulation SAE, but that it will also stimulate further scholarly attention to these and other areas.

Figure 1 .
Figure 1.Precision and accuracy in estimates.

Figure 2 .
Figure 2. Empirical root mean squared error comparisons: Direct versus iterative proportional fitting estimator for the Normal case.

Figure 3 .
Figure 3. Comparisons of iterative proportional fitting estimates versus true means.
Figure 4. RB MSE boot Y IPF d , Normal, Gumbel, and Logistic cases.

Figure 5 .
Figure 5. Fitted values versus residuals (left) and Q-Q plot of the residuals from the model used to produce the mean squared error of the iterative proportional fitting estimates.
Figure 6 maps the mean IPF estimates across the 287 Tuscan municipalities.The map displays municipalities in four quartiles and shows a range in 2009 municipal income estimates from a low of just over 16,000 Euros per annum to a high of just under 20,000 Euros per annum.Municipalities located in the provinces of Massa Carrara (North West), Grosseto (South), and Prato and Pistoia (North) show the lowest estimated municipal income levels.On the contrary, municipalities around Florence, Arezzo, Pisa, and Livorno show the highest estimated municipal income levels.

Figure 6 .
Figure 6.Iterative proportional fitting income estimates for Tuscan municipalities.

Figure 7 .
Figure 7. Performance comparison sorted by small area sample size.
r, hence s d ¼ s \ O d is the subsample from the small area d of size n d , n ¼ P D d¼1 n d , and s ¼ [ d s d .r d denotes the nonsampled units in small area d with N d À n d dimension.Here, the target parameter is the population mean Y d ¼ N À1 d P N d i¼1 y di of the variable Y for area d, with y di denoting the value of variable Y for ith unit from dth area.
Koch 2008).To sensitivity test this issue, the simulation study below explicitly tests the effect of differing ICCs on the MSE estimator.

Table 1 .
Point Estimates Comparisons and Relative Biases Across the Small Area, Median Values Where Intraclass Correlation ¼ 0.05.

Table 2 .
Performance Measures of the Bootstrap MSE Estimates.

Table 3 .
Performance Measures of Our Bootstrap MSE Estimator at Varying Levels of Intraclass Correlation in the Normal Distribution.

Table 5 .
Summary Statistics of the Performance Gains from the Synthetic IPF Estimator Compared to the Direct Estimator for Small Area Income Estimates Across Tuscan Municipalities.