Top Incomes and Inequality Measurement: A Comparative Analysis of Correction Methods Using the EU SILC Data

It is sometimes observed and frequently assumed that top incomes in household surveys worldwide are poorly measured and that this problem biases the measurement of income inequality. This paper tests this assumption and compares the performance of reweighting and replacing methods designed to correct inequality measures for top income biases generated by data issues such as unit or item nonresponse. Results for the European Union’s Statistics on Income and Living Conditions survey indicate that survey response probabilities are negatively associated with income and bias the measurement of inequality downward. Correcting for this bias with reweighting, the Gini coefficient for Europe is revised upwards by 3.7 percentage points. Similar results are reached with replacing of top incomes using values from the Pareto distribution when the cut point for the analysis is below the 95th percentile. For higher cut points, results with replacing are inconsistent, suggesting that popular parametric distributions do not mimic real data well at the very top of the income distribution.


Introduction
Thanks to the wide public attention that top incomes have received in the aftermath of the global financial crisis, it is now acknowledged that top incomes have grown disproportionately faster than other incomes in industrialized countries over the past several decades. These top incomes are difficult to capture in household surveys, which potentially biases the estimation of income inequality through the representation and precision of reported top incomes, even though the direction of the bias is not a priori clear (Deaton 2005:11). The sources of bias range from issues related to sampling to issues related to data collection, data preparation or data analysis. The European Union Statistics on Income and Living Conditions (SILC) survey, for example, suffers from data issues such as under-representation of the highest incomes (Bartels and Metzing 2017; Törmälehto 2017). Most countries in Europe suffer from very high nonresponse rates, reaching up to 50 percent of the sample. Income measurement issues including surveying, interview methods and post-survey treatment also explain differences in inequality measurements across data sources (Frick and Krell 2010).
Two types of in-survey methods have been proposed to correct inequality measures in the presence of top income biases while relying on survey microdata only. The first method, which we call reweighting, attempts to correct the sampling weights of existing observations using information on unit or item nonresponse rates across demographic cells such as geographical areas (Mistiaen and Ravallion 2003; Korinek et al. 2006, 2007). The approach exploits the relationship between response rates and shapes of income distributions across national regions to estimate the gradient of households' response probability by income level. It then uses the estimated response probabilities to reweight the observed incomes by the mass of nonresponding households in order to correct the measure of inequality. The second method, which we call replacing, attempts to replace top income observations with observations generated from known theoretical distributions. This method can be used to correct for issues such as top coding, trimming or censoring but can also mitigate the problem of unit or item nonresponse if nonresponse is concentrated among top incomes (Cowell and Victoria-Feser 2007; Jenkins et al. 2011). Several distributions have been suggested as candidates, including Pareto type I or type II, or generalized beta. Hlasny and Verme (2018) have combined the reweighting and replacing methods, and studied the contribution of each method to the composite correction of an inequality index.
Both the reweighting and replacing methods have advantages and disadvantages, as the information available within surveys has its limits even if used creatively to correct for top income problems. Proper reweighting and replacing depend on the appropriateness of parametric assumptions imposed on the particular national distribution of incomes at hand. Alternative methods that use out-of-survey information such as tax records or national accounts data to inform the measurement of top incomes have their own measurement problems. Good tax or macro data are only available in a few countries and may not be comparable across countries, whereas household survey data of reasonable quality are now available in most countries worldwide. This paper compares the reweighting and replacing methods using the European Union's Statistics on Income and Living Conditions (SILC) survey data, taking into account heterogeneity of income distributions, differences in sampling designs and definitions of nonresponse rates across EU member states. We find survey response probabilities to be negatively and significantly associated with income, indicating that measures of inequality are downward biased. Correcting for this bias with reweighting, the Gini coefficient for Europe is revised upwards by 3.7 percentage points. Similar results are reached with replacing of top incomes using values from the Pareto distribution when the cut point for replacing is set below the 95th percentile. For higher cut points, results with replacing are inconsistent, suggesting that popular parametric distributions do not mimic real data well at the very top of the income distribution.
The paper is organized as follows. The next section discusses measurement issues related to top incomes. The following section outlines the main methods used to correct for top income biases related to unit nonresponse. Section four describes the data. Section five presents main results and section six concludes.

Materials and Methods
Problems related to top-income data may be due to sample design, data collection, data preparation or data analysis. We introduce these four types of errors in turn, clarifying which types we address in this paper.
Sample design issues emerge when the sampling is designed in such a way that top incomes cannot be captured. This can occur, for example, when the sampling is done poorly, when the population census is old, or when the master sample has not been updated to capture newly constructed wealthy areas. If detected, some of these issues can be corrected post-survey by reweighting the sample, but neither detecting nor correcting these problems post-survey is simple. It is important to note here that we should not expect exceptionally high incomes to be captured in household sample surveys. Billionaires are very rare in any population. There are fewer than 3,000 billionaires worldwide and most countries have only one or two at the most. If one wishes to study billionaires, sample surveys are not the right instrument. It would also be unwise to add billionaires to survey income statistics, partly because they are billionaires in wealth, not income, and partly because most of their wealth is generated globally rather than in a particular country. Including billionaires in income statistics would simply bias survey population statistics. Therefore, when we consider the very top income earners in this paper we are considering millionaires in wealth whose income is counted in the hundreds of thousands of euros annually. This is the class of people we want properly represented in household sample surveys at the top of the distribution.
Data collection issues mostly arise from respondents' or interviewers' non-compliance with survey instructions and may result in unit nonresponse, item nonresponse, item underreporting or generic measurement errors.
Unit nonresponse. Unit nonresponse refers to households that were selected into the sample but did not participate in the survey. The reasons for non-participation can be many, such as a change of address or lack of interest on the part of the household. Interviewers generally have lists of addresses that can be used to replace the missing household, but this practice is not always sufficient to complete the survey with the full expected sample. Most of the available household survey data suffer from unit nonresponse. In some surveys, the reason for nonresponse is recorded but in others it is not. Unit nonresponse bias results if nonresponse is not random but systematically driven by specific factors. This paper will address unit nonresponse issues using reweighting.
Item nonresponse. Item nonresponse occurs when households participating in the survey do not reply to an item of interest (income or expenditure in our case). Item nonresponse biases results if it is non-random and related to specific factors. Nonresponse may be related to households' characteristics such as wealth or education, and this may bias statistics constructed with income or expenditure variables. Unlike unit nonresponse, item nonresponse can be corrected using information on the reasons for nonresponse (when available) or by means of imputation using household and individual socio-economic characteristics to predict income. The reweighting method proposed in this paper also corrects for item nonresponse.
Item underreporting. Consistent underreporting of variables on the part of respondents can lead to poor estimates of inequality. For example, if the degree of underreporting rises with income, the measurement of inequality could be affected. Even if underreporting applies equally across respondents, the measurement of inequality may change if the income inequality measure used is not scale invariant. Over-reporting is also possible although extremely rare with income and expenditure data, particularly at the top end of the distribution. The replacing method used in this paper helps to correct for item underreporting.
Generic measurement errors. Any variable including income or expenditure can be subject to measurement error. This error is typically expected to be random, normally distributed and with zero mean. For example, extreme observations in an income distribution can result from data input errors, but if they are very large they bias sample statistics significantly. Statistical agencies are usually quite thorough on this issue and clear data of errors before providing the data to researchers. This issue will not be treated in this paper explicitly but these errors are implicitly treated when replacing observations.
Data preparation issues are mostly a consequence of statistical agencies' compliance with rules and regulations governing data confidentiality and data use, and may result in top coding, sample trimming, or the provision of limited subsamples to researchers.
Top coding. Top coding is the practice adopted by some statistical agencies such as the US Census Bureau to intentionally modify the values of some variables to prevent identification of households or individuals. It can take various forms, from replacing values above a certain threshold with means or medians of top cells to swapping incomes across top observations. In some cases and for research purposes, statistical agencies provide restricted access to the original values. But in most cases researchers are left with the problem of having to correct sample statistics for top coding. In this paper, we use EU-SILC data which are not subject to top coding on the part of Eurostat, although it is possible that some countries apply some form of top coding to their data before transmitting these data to Eurostat. Replacing corrects for top coding but only for the segment of data replaced, whereas reweighting is unlikely to correct for top coding.
Trimming. Trimming is the practice of cutting off some observations from the sample. This may be done for confidentiality reasons or for observations that appear unreliable. Researchers may not be informed whether statistical agencies have trimmed data, why trimming was performed, or both. A related issue is that of trimming through sampling weights. Statistical agencies sometimes trim sampling weights to bring them within a narrow range of values or to limit their influence if their variable values may have been mismeasured. The overarching objective is to control the influence of units that are rare in the sampling frame. Trimming observations or weights biases statistical measurement and should be corrected for.
Trimming is similar to unit or item non-response in that we are missing income observations. Reweighting can help to address this issue if trimmed income observations come from within the support in the observed sample.
Provision of subsamples. Some statistical agencies cannot provide the entire data sets to researchers for confidentiality or national-security reasons or simply to prevent others from replicating official statistics. In many countries, statistical agencies provide 20% to 50% of their samples to researchers. These subsamples are usually extracted randomly so that statistics produced from these subsamples may be reasonably accurate. As we know from sampling theory, random extraction is the best option for extracting a subsample in the absence of any information on the underlying population. However, only one subsample is typically extracted from the full sample and given to researchers, and this implies that a particularly "unlucky" random extraction can potentially provide skewed estimates of the statistics of interest. Hlasny and Verme (2018) have tested the margins of error in inequality measurement that can arise from the provision of subsamples instead of full samples and found significant margins of error. This issue is not treated in this paper because EU-SILC data are provided in full.
Data analysis issues may arise from an inadvertently wrong choice of statistical estimators on the part of researchers. Some estimators are more sensitive than others to the issues listed above, so that one choice of estimator may lead to greater errors than others. For example, Cowell and Victoria-Feser (1996) have found that the Gini index is more robust to contamination by extreme values than two members of the generalized entropy family, a finding later confirmed by Cowell and Flachaire (2007). Based on these findings, we will focus on the Gini index and leave the discussion of alternative inequality estimators aside. It is also important to note that many researchers routinely trim outliers or problematic observations or apply top coding with little consideration of the implications for the measurement of inequality.
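Since the analysis centers on the Gini index, a minimal sketch of a weighted Gini computation may help fix ideas. The function name `gini` and the trapezoid-based Lorenz-curve formula are illustrative choices, not taken from the paper, and the weights are assumed to behave like frequency weights.

```python
def gini(incomes, weights=None):
    """Weighted Gini index: one minus twice the area under the Lorenz curve.

    Assumption: weights act as frequency weights (a weight of 3 is
    equivalent to three identical observations).
    """
    if weights is None:
        weights = [1.0] * len(incomes)
    pairs = sorted(zip(incomes, weights))
    total_w = sum(w for _, w in pairs)
    total_y = sum(y * w for y, w in pairs)
    if total_w <= 0 or total_y <= 0:
        raise ValueError("weights and incomes must have positive totals")

    cum_y = 0.0
    twice_area = 0.0
    for y, w in pairs:
        prev_y = cum_y
        cum_y += y * w
        # Trapezoid between consecutive Lorenz-curve points (times two).
        twice_area += (w / total_w) * (prev_y + cum_y) / total_y
    return 1.0 - twice_area
```

Passing corrected weights (for example, inverse response probabilities) as `weights` is how a reweighting correction would feed into the index in this sketch.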

Reweighting
Unlike the case of item nonresponse, unit nonresponse cannot be dealt with by inferring households' unreported income from their other reported characteristics, because we do not observe any information for the non-responding households. In an effort to address this problem, Atkinson and Micklewright (1983) used information on nonresponse rates across regions to uniformly 'gross up' the mass of respondents in a region by the regional nonresponse rate. This is the approach taken by several national statistical agencies in adjusting sampling weights for regional unit nonresponse. This approach is inadequate, as it accounts only for inter-regional differences in nonresponse rates, and not for systematic differences in response probability across units within individual regions. Mistiaen and Ravallion (2003) and Korinek et al. (2006, 2007) proposed a probabilistic model that uses information on nonresponse rates across geographic regions as well as information about the distribution within regions. They estimated the response probability of each household, and used the inverse of this estimate to adjust each household's weight. Each household's weight is thus 'grossed up' non-uniformly to match the mass of all respondents to the size of the underlying population.
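To illustrate why uniform grossing-up cannot correct within-region bias, the sketch below scales each respondent's weight by the inverse of the regional response rate. The function names and the input layout (parallel lists plus a dictionary of regional nonresponse rates) are hypothetical and chosen only for illustration.

```python
def gross_up_uniform(weights, regions, nonresponse_rate):
    """Scale each weight by 1 / (1 - nonresponse rate of the unit's region)."""
    return [w / (1.0 - nonresponse_rate[r]) for w, r in zip(weights, regions)]


def weight_shares_within_region(weights, regions, region):
    """Each respondent's share of the total weight in one region."""
    selected = [w for w, r in zip(weights, regions) if r == region]
    total = sum(selected)
    return [w / total for w in selected]
```

Because every respondent in a region is multiplied by the same factor, the within-region weight shares are unchanged. Only the relative mass of regions moves, which is the limitation the probabilistic model is meant to address.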
The central tenet of the method is that the probability of a household $i$ in a region $j$ to respond to the survey, $P_{ij}$, is a deterministic function of its arguments. Logistic functional form is used for its simplicity and its robustness properties:

$$P(x_{ij},\theta) = \frac{e^{g(x_{ij},\theta)}}{1 + e^{g(x_{ij},\theta)}} \qquad (1)$$

Here $g(x_{ij},\theta)$ is a stable function of $x_{ij}$, the observable demographic characteristics of responding households that are used in estimations, and of $\theta$, the corresponding vector of parameters. Variable-specific subscripts are omitted for conciseness. $g(x_{ij},\theta)$ is assumed to be twice continuously differentiable. Equation 1 thus imposes several restrictions on the modeled behavioral relationship between households' characteristics $x_{ij}$ and their response probability: the relationship is deterministic and dictated by the logistic functional form and the functional form of $g(x_{ij},\theta)$, differentiable at all levels of $x_{ij}$, and identical across all households and regions. These restrictions are strong, but several facts help to justify them. One, the logistic function is well-accepted as a robust form to model probabilistic relations. Two, Korinek et al. (2006, 2007) and Hlasny and Verme (2018) have evaluated alternative forms of $g(x_{ij},\theta)$ including non-monotonic functions on US and Egyptian data, and have concluded that some of the most parsimonious functions provide very good fit, compared to both uncorrected income distributions and compared to external information on the true degree of inequality in those countries. Three, nonlinear forms of $P(x_{ij},\theta)$ and $g(x_{ij},\theta)$ allow for response differences between poorer and richer households in a realistic way. Four, a comparative study of US, EU and Egyptian data led to similar estimation results across countries, suggesting that the behavioral tendencies exhibit a high degree of consistency across regions (Hlasny and Verme 2015). Five, supplementing $g(x_{ij},\theta)$ with indicators for subsets of regions helps to attenuate any systematic behavioral differences across parts of the country.
The number of households in each region ($\hat{m}_j$) is imputed as the sum of inverted estimated response probabilities of responding households in the region ($\hat{P}_{ij}$):

$$\hat{m}_j = \sum_{i=1}^{N_j} \hat{P}_{ij}^{-1} \qquad (2)$$

where the summation is over all $N_j$ responding households.
The parameters $\theta$ can be estimated by fitting the estimated and actual number of households in each region using the generalized method of moments estimator:

$$\hat{\theta} = \arg\min_{\theta} \sum_j \left[ (\hat{m}_j - m_j)\, w_j^{-1}\, (\hat{m}_j - m_j) \right] \qquad (3)$$

where $m_j$ is the number of households in region $j$ according to sample design, and $w_j$ is a region-specific analytical weight proportional to $m_j$ (see footnote 3). The asymptotic variance of $\hat{\theta}$ can be estimated as the ratio of the model objective value (the weighted sum of squared region-level residuals), and the squared partial derivative of this objective value with respect to $\hat{\theta}$ (equal to [...] under the assumed logistic functional form), both weighted by region-specific analytical weights $w_j$ (equations 11-14 in Korinek et al. 2007).
Under the assumptions of random sampling within and across regions, representativeness of the sample for the underlying population in each region, and stable functional form of $g(x_{ij},\theta)$ for all households and all regions, the estimator $\hat{\theta}$ is consistent for the true $\theta$. Estimated values of $\hat{\theta}$ that are significantly different from zero would serve as an indication of a systematic relationship between household demographics and household response probability, and of a nonresponse bias in the observed distribution of the demographic variable. In that case, we could reweight observations using the inverted estimated household response probabilities to correct for the bias.
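The mechanics of the three estimation steps can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: $g$ is assumed to be `theta[0] + theta[1] * ln(income)`, the analytical weight $w_j$ is set equal to $m_j$, and a coarse grid search stands in for a proper numerical optimizer. All function names are hypothetical.

```python
import math


def response_probability(x, theta):
    """Logistic response probability with the assumed g = theta0 + theta1 * ln(x)."""
    g = theta[0] + theta[1] * math.log(x)
    return 1.0 / (1.0 + math.exp(-g))  # same as exp(g) / (1 + exp(g))


def imputed_region_size(incomes, theta):
    """Sum of inverse response probabilities over a region's respondents."""
    return sum(1.0 / response_probability(x, theta) for x in incomes)


def gmm_objective(theta, regions):
    """Sum over regions of squared residuals divided by w_j (assumed w_j = m_j).

    regions: list of (respondent_incomes, m_j) pairs.
    """
    total = 0.0
    for incomes, m_j in regions:
        residual = imputed_region_size(incomes, theta) - m_j
        total += residual ** 2 / m_j
    return total


def grid_search(regions, grid0, grid1):
    """Return (theta, objective) minimizing the objective over a parameter grid."""
    best_theta, best_value = None, float("inf")
    for t0 in grid0:
        for t1 in grid1:
            value = gmm_objective((t0, t1), regions)
            if value < best_value:
                best_theta, best_value = (t0, t1), value
    return best_theta, best_value


def corrected_weights(incomes, theta):
    """Inverse estimated response probabilities, used as correction weights."""
    return [1.0 / response_probability(x, theta) for x in incomes]
```

With a negative slope on log income, `corrected_weights` assigns larger weights to richer respondents, which is the non-uniform grossing-up described in the text.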
Applying the model in equations 1-3 involves making several decisions regarding the delineation of regions, and choosing parametric forms for the functions $P(x_{ij},\theta)$ and $g(x_{ij},\theta)$. The choice of regional delineation involves a trade-off between the number of $j$ data points for the model loss function (equation [...]) [...]. Properties of the data at hand thus call for different degrees of data aggregation, but there is presently little guidance for arbitrary national surveys. For the United States CPS, Korinek et al. (2006, 2007) used state-level aggregation, because geographic identifiers are consistently reported only at that level whereas county or metropolitan statistical area identifiers are missing for some responding as well as non-responding households. Hlasny and Verme (2017) [...] households' response probabilities, logarithmic specification of $g(x_{ij},\theta)$, and country indicators are used in $g(x_{ij},\theta)$. On the margins, we will report how the addition of regional indicators affects the correction for the unit nonresponse bias. For the covariates in $x_{ij}$, Korinek et al. (2006, 2007) [...]

3 An illustration is in order. Suppose there are two income groups residing in two national regions. Region 1 has a higher share of the richer income group, and correspondingly a higher unit nonresponse rate, as the richer households are less likely to participate. As a result, mean income and income inequality index may or may not differ across the two regions. To correct the mean incomes and inequality indexes in each region as well as nationally, we wish to give more weight to each richer household until the sum of weights equals the underlying regional population, because behind each responding rich household there are more non-responding rich households. Equation 2 'blows up' the weight of each responding household systematically, under the household-level behavioral rules specified in equation 1, to fit the joint weighted mass of the responding households to the underlying regional population (equation 3). In one region the weighted mass of the responding households may exceed the underlying population, while in the other region it may fall short (because of the restrictions imposed in equation 1), but the nationwide sum of the weighted masses equals the underlying national population.

Finally worth noting, SILC surveys already provide a limited correction for unit nonresponse through sampling weights. This method accounts for differences in response rates across regions but not for systematic differences across demographic groups within regions. Unfortunately, these sampling weights cannot be decomposed into weights for unit nonresponse and weights for other issues with unit representativeness. We could either double-correct for unit nonresponse by using the available sampling weights, or ignore other sample representativeness issues by not using the weights. In the United States CPS (Korinek et al. 2006, 2007; Hlasny and Verme 2017) and the Egyptian HIECS (Hlasny and Verme 2018), the correction for nonresponse (through $\hat{P}_{ij}^{-1}$) affected inequality estimates substantially more than the corrections for other sample representativeness issues (through sampling weights), and so the nonresponse correction weights should be used with or without the survey sampling weights. These findings may not apply to surveys with less prevalent or less systematic nonresponse, and with graver sampling design issues. In the case of the SILC, the great heterogeneity in sample representativeness across EU member states, and the modest role of nonresponse correction in the available sampling weights are thought to favor the usage of the nonresponse correction weights ($\hat{P}_{ij}^{-1}$) in tandem with the sampling weights.
To accommodate all these options, alternative estimates of inequality are produced: on uncorrected data, data corrected with nonresponse-bias weights, data corrected with statistical agency weights, and data corrected with both sets of weights simultaneously. Estimates obtained without sampling weights are reported on the margins.

Replacing
An alternative approach to correct for poorly reported top incomes is to remove the top end of the [...] (2007) we correct the Gini coefficient by replacing highest-income observations with values drawn from a parametric distribution and combining the corresponding parametric inequality measure for these incomes with a non-parametric measure for lower incomes. The following sections discuss the mechanics of fitting the alternative parametric forms to the data at hand.

Pareto Distribution
For the past century, the Pareto distribution has been applied to various socio-economic phenomena and is thought to be suitable to model the distribution of upper incomes. The Pareto distribution can be described by the following cumulative distribution function:

$$F(x) = 1 - \left(\frac{L}{x}\right)^{\alpha}, \quad x \ge L \qquad (4)$$

where $\alpha$ is a fixed parameter called the Pareto coefficient, $x$ is the variable of interest (income in our case) and $L$ is the lowest value allowed for in the case of left censoring. The corresponding probability density function, allowing for right-censoring at $H$ (separating potentially contaminated top income [...]), is

$$f(x) = \frac{\alpha L^{\alpha} x^{-\alpha-1}}{1 - (L/H)^{\alpha}}, \quad L \le x \le H \qquad (5)$$

This density function is decreasing, tending to zero as $x$ tends to infinity, and has a mode equal to the minimum value, $L$. As income becomes larger, the number of observations declines following a law dictated by the constant parameter $\alpha$. Clearly, this distribution function does not perfectly suit all incomes under all income distributions, but it should be thought of as one alternative in modeling the right-hand tail of a general income distribution. Parameter $\alpha$ in equations 4-5 can be estimated using maximum likelihood from a right-truncated Pareto distribution, which also provides robust standard errors (Jenkins and Van Kerm 2007).
The Gini among the top $k$ households can be derived from the expression of the corresponding Lorenz curve as follows

$$Gini = \frac{1}{2\alpha - 1} \qquad (6)$$

with a standard error composed of a sampling error in the estimation of the Pareto distribution, and an error in the estimation of the Gini coefficient. The sampling standard error under the Pareto distribution is equal to [...] ([...] and Gastwirth 2006), where $n$ is the estimation sample size ($L \le x < H$). The estimation error due to the potentially imprecise estimates of $\alpha$ is equal to [...], where $\sigma_{\hat{\alpha}}$ is the standard error of $\hat{\alpha}$.
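The estimation of the Pareto coefficient and the resulting top-income Gini can be sketched as below. The sketch assumes the standard right-truncated Pareto likelihood on $[L, H)$ and solves its first-order condition by bisection. The function names are hypothetical. The delta-method expression in `gini_se_delta` is the textbook approximation for the error transmitted from $\hat{\alpha}$ to the Gini, offered as an assumption rather than as the paper's own formula.

```python
import math


def truncated_pareto_alpha(incomes, lower, upper, tol=1e-10):
    """ML estimate of alpha for a Pareto tail observed on [lower, upper)."""
    logs = [math.log(x / lower) for x in incomes if lower <= x < upper]
    if not logs:
        raise ValueError("no observations between lower and upper")
    mean_log = sum(logs) / len(logs)
    log_r = math.log(lower / upper)  # negative, since lower < upper
    # An interior solution requires the data to be skewed toward `lower`.
    if not 0.0 < mean_log < -0.5 * log_r:
        raise ValueError("no interior ML solution for this sample")

    def score(alpha):
        # Per-observation derivative of the truncated log-likelihood.
        r_alpha = math.exp(alpha * log_r)
        one_minus_r_alpha = -math.expm1(alpha * log_r)
        return 1.0 / alpha - mean_log + (r_alpha * log_r) / one_minus_r_alpha

    lo, hi = 1e-8, 1.0
    while score(hi) > 0.0:  # the score is decreasing in alpha
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def pareto_gini(alpha):
    """Gini of a Pareto type I distribution; requires a finite mean (alpha > 1)."""
    if alpha <= 1.0:
        raise ValueError("alpha must exceed 1 for the Gini to be defined")
    return 1.0 / (2.0 * alpha - 1.0)


def gini_se_delta(alpha, se_alpha):
    """Delta-method approximation: |d Gini / d alpha| * se(alpha)."""
    return 2.0 * se_alpha / (2.0 * alpha - 1.0) ** 2
```

When `upper` is very large the truncation term vanishes and the estimate approaches the familiar closed form, the sample size divided by the sum of log income ratios.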

Generalized Beta Type 2 Distribution
Because the Pareto distribution is not representative of incomes in the middle or bottom of the income distribution, and because even among top incomes in some countries it may not follow the dispersion of incomes accurately, more flexible parametric distributions have been considered in recent literature. The four-parameter Generalized Beta distribution type 2 (GB2) has been suggested as providing better and more consistent fit for the distribution in various EU and US income surveys (Jenkins et al. 2011). It has the cumulative distribution function

$$F(x) = I(p, q, y), \qquad y = \frac{(x/b)^{a}}{1 + (x/b)^{a}} \qquad (7)$$

In this equation, $I(p,q,y)$ is the regularized incomplete beta function, in which the last argument, $y$, is income normalized to be in the unit interval. Parameters $a$, $b$, $p$, and $q$ are estimable with their standard errors by maximum likelihood. Because the right tail may be contaminated by top income issues, right-truncation may be applied in the calculation of the GB2 density and model likelihood functions.
Moreover, like the Pareto distribution, the GB2 distribution itself may not approximate well the bottom-

Corrected Gini for EU States and EU-wide
Replacing of observed top incomes with fixed Pareto or GB2 fitted values has the problem that it does not account for parameter-estimation error and sampling error in the available sample. The resulting Gini carries an artificially low standard error. An and Little (2007) [...] In the case of the EU SILC, we derive a corrected Gini coefficient across all EU member states as follows.
The cumulative parametric distributions in equations 4 and 7 are estimated at the level of each member state, and top incomes observed in each member state are replaced with random draws from the corresponding state-specific parametric distribution, as proposed by An and Little (2007), and Jenkins et al. (2011). Combining the observed lower-income values and the imputed top incomes across all EU member states allows us to derive a non-parametric estimate of the aggregate EU-wide Gini. Finally, repeating the exercise (bootstrapping) we obtain a quasi-nonparametric EU-wide Gini with its standard error (Reiter 2003).
As compared to the semi-parametric approach conventionally used in countries with homogeneous populations, this procedure allows the EU-wide distribution to include observations from both tails of state-level distributions, and to preserve the original number of observations for each country. It also allows modalities such as custom truncation of state samples used for parametric estimation and for inequality measurement. Estimating the parametric distributions at the level of EU member states and replacing top incomes according to the estimated country-specific distributions ensures that each state will have true lower incomes as well as replacement top incomes in the EU-wide data. The random draws of incomes ($x > H$) from the parametric distributions (estimated on incomes between $L$ and $H$) can be combined with true lower incomes (up to $H$) as well as with incomes across EU states. Such flexible estimation of the EU-wide Gini and its standard error would generally not be possible with parametric estimates of the top-income Ginis.
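A minimal sketch of the replace-and-pool step follows. It takes each state's cut point and Pareto coefficient as given inputs, ignores sampling weights, and uses hypothetical function names. The spread it reports reflects only the variability of the random draws, so it is a simplification of the bootstrap described above, which would also need to account for parameter-estimation and sampling error.

```python
import random


def gini(values):
    """Unweighted Gini index from the rank-weighted sum of sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    total = sum(ordered)
    ranked = sum((i + 1) * y for i, y in enumerate(ordered))
    return 2.0 * ranked / (n * total) - (n + 1.0) / n


def replace_top_incomes(incomes, cut_point, alpha, rng):
    """Keep incomes up to cut_point; replace higher ones with Pareto draws.

    A Pareto variable conditional on exceeding cut_point is again Pareto
    with scale cut_point, so inverse-CDF sampling is used.
    """
    replaced = []
    for y in incomes:
        if y > cut_point:
            u = 1.0 - rng.random()  # uniform on (0, 1]
            replaced.append(cut_point * u ** (-1.0 / alpha))
        else:
            replaced.append(y)
    return replaced


def pooled_gini_over_draws(states, draws=200, seed=0):
    """states: list of (incomes, cut_point, alpha), one entry per member state.

    Returns the mean pooled Gini and its standard deviation across draws.
    """
    rng = random.Random(seed)
    ginis = []
    for _ in range(draws):
        pooled = []
        for incomes, cut_point, alpha in states:
            pooled.extend(replace_top_incomes(incomes, cut_point, alpha, rng))
        ginis.append(gini(pooled))
    mean = sum(ginis) / draws
    variance = sum((g - mean) ** 2 for g in ginis) / (draws - 1)
    return mean, variance ** 0.5
```

Each state keeps its original number of observations, and observed incomes at or below the cut point pass through unchanged.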
Comparing the corrected state Ginis from the replacing analysis with the observed non-parametric Ginis would indicate whether the observed high incomes have been generated by Pareto- or GB2-like statistical processes, or whether the observed Gini is affected by top-income issues such as missing or non-representative values. A quasi-nonparametric Gini that is lower than the nonparametric Gini can be interpreted as evidence that some top incomes are extreme compared to those predicted under the parametric distribution. A higher quasi-nonparametric Gini would indicate that the observed top incomes are distributed more narrowly than would be predicted parametrically, potentially implying underrepresentation, censoring, or measurement errors in relation to high-income units in the sample.
An important decision in applying the replacing method relates to the range of incomes that should be replaced as potentially non-representative or contaminated. Cowell and Flachaire (2007) [...]
In conclusion, the reweighting and replacing methods differ in several respects and address different types of problems related to top incomes. Reweighting considers the entire income support and reweights all observations throughout the support according to the probability of nonresponse estimated with real data. Replacing keeps all observations up to the cut point unaltered while replacing all observations above the cut point with observations drawn from a theoretical distribution. Reweighting uses a probabilistic model drawing information from within and between regions' nonresponse rates to estimate the probability of nonresponse. Replacing does not make use of nonresponse rates or probabilistic models and uses instead estimated parameters from theoretical distributions to replace observations at the top. Reweighting is suited to address issues related to unit and item nonresponse and trimming, whereas replacing is suited to address issues related to item underreporting, generic measurement errors, top coding, and undue sensitivity of inequality measurement to the inclusion of rare extreme income observations.

Data
The methodologies outlined in the above section are evaluated using the set of national household surveys included in the 2011 round of the EU Statistics on Income and Living Conditions (SILC). This is a challenging set of surveys with different types of problems related to measurement issues that affect top incomes and inequality estimates. The SILC surveys, coordinated by Eurostat, a Directorate-General of the European Commission, cover one of the largest and most heterogeneous common markets, including some of the world's most affluent nations as well as former socialist economies. All European Union member states as well as Iceland, Norway and Switzerland are included. The data include relatively large sample sizes for each state but suffer from very different nonresponse rates across member states, and from limited potential for regional disaggregation. Average national nonresponse rates range from 3.3 to 50.7 percent across member states in the 2011 wave, and from 3.5 to 48.1 percent in 2009 (Tables A1-A2 in supplementary materials). These features allow for a limited number of model specifications to be used to reevaluate inequality under various measurement issues.
SILC data are rarely used as one dataset for cross-country analysis in the same fashion as one would do cross-region analysis in a specific country. That is because SILC data are derived from country-specific surveys which take different forms in different countries. However, in our case, they are an interesting set of data in that they are characterized by substantial diversity compared to other national surveys (Hlasny and Verme 2015). They are therefore a good benchmark to test how different top income correction methodologies perform under such diversity, provided that systematic cross-country differences are controlled for.
10 One challenge is that incomes exhibit substantial cross-nation inequality, but relatively less inequality within nations, as evidenced by the difference between state-specific and EU-wide Gini indexes (refer to tables A1-A2). In fact, decomposition of the EU-wide Gini reveals that 67 percent of inequality arises solely from income differences between EU member states, and only 4 percent arises solely from within-state inequality, while 29 percent is due to an overlap of the between and within state inequality (2009 SILC shows analogous results).
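A decomposition of this kind can be sketched as follows. The text does not state which decomposition formula was applied, so the sketch assumes the standard three-way split of the Gini into a between-group term (every household assigned its group mean), a within-group term (group Ginis weighted by population share times income share), and an overlap residual. The group labels and incomes are made up.

```python
def gini(values):
    """Unweighted Gini coefficient via the rank-based formula."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    ranked = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * ranked / (n * total) - (n + 1.0) / n

def decompose_gini(groups):
    """groups maps a (hypothetical) state label to a list of incomes."""
    pooled = [x for incomes in groups.values() for x in incomes]
    n, total = len(pooled), sum(pooled)
    overall = gini(pooled)
    # Within term: population share * income share * group Gini, summed over groups.
    within = sum((len(v) / n) * (sum(v) / total) * gini(v) for v in groups.values())
    # Between term: Gini when each household receives its group's mean income.
    between = gini([sum(v) / len(v) for v in groups.values() for _ in v])
    # Overlap term: residual, zero when group income ranges do not overlap.
    overlap = overall - between - within
    return overall, between, within, overlap

# Toy data: non-overlapping groups versus interleaved groups.
separate = decompose_gini({"poor": [1, 2, 3], "rich": [10, 20, 30]})
mixed = decompose_gini({"a": [1, 5, 9], "b": [2, 6, 10]})
for name, (g, b, w, o) in (("separate", separate), ("mixed", mixed)):
    print(name, [round(100 * part / g) for part in (b, w, o)])
```

Dividing each component by the overall Gini gives percentage shares comparable in form to the 67, 4 and 29 percent reported above.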
With little overlap between income distributions in the richest and the poorest member states, when the reweighting correction method is run at the level of states (rather than within-state regions), it would effectively adjust the mass of entire member states in the calculation of the Gini. The vast majority of households in rich states would be assigned higher weights, and the majority of households in poor states would be assigned lower weights. This suggests that an analysis performed at a more geographically disaggregated level is warranted. To that end, we have collected unit nonresponse rates for NUTS-1 regions, that is, geographic divisions, provinces or states of EU member countries (refer to tables A2-A3 in supplementary materials). In what follows, we will primarily make use of the 2011 round of the SILC, and we will report on the 2009 round only on the margins. When not noted explicitly, the discussion refers to the 2011 round.
Household nonresponse rates (NRh) in SILC surveys are computed using Eurostat notation as:
$$NR_h = 1 - \underbrace{\frac{\sum 1(\mathit{db120} = 11)}{\sum 1(\mathit{db120} \neq \emptyset) - \sum 1(\mathit{db120} = 23)}}_{\text{Address contact rate}} \times \underbrace{\frac{\sum 1(\mathit{db135} = 1)}{\sum 1(\mathit{db130} \neq \emptyset)}}_{\text{Rate of complete interviews accepted}}$$
where 1(•) is a binary indicator function, db120 is the record of contact at the address, db130 is the household questionnaire result and db135 is the household interview acceptance result. Addresses that could not be located or accessed (db120 ≤ 22) are accounted for in the address contact rate, while nonexisting, non-residential, non-occupied and non-principal residence addresses (db120 = 23) are omitted.
The rate of complete interviews accepted is the share of accepted interviews (i.e., at least one personal interview in the household accepted) among all households completing, refusing to cooperate, temporarily absent, or unable to respond due to illness, incapacity, language or other problems.
Finally, many of the EU statistical agencies combine survey and administrative information such as tax and social security records to estimate income (refer to individual chapters of Jäntti et al. 2013). This may result in a more accurate estimation of incomes as compared to countries that do not adopt this strategy. If this is the case, both the reweighting and replacing methods should show (correctly) a lower bias, as they would for any survey with better-quality data. However, these techniques vary across countries and can play a role when comparing estimated biases across countries. Considering the fact that the original survey instruments differ and that the income aggregates are not identical in their composition, estimations presented in this paper are not strictly comparable across countries. Moreover, the influence of each country in the overall estimation for the EU Gini is also affected by these factors.
There are two editions of the EU-SILC survey produced by Eurostat. The Production Data Base (PDB) includes all available variables for responding and nonresponding households, while a Users Data Base (UDB) excludes nonresponding units and variables that could potentially allow identification of households. Related to our analysis, the PDB includes variables DB120, DB130 and DB135, defining responding and non-responding households, DB060-DB062, identifying primary sampling units, and DB075, separating the traditional non-response rate (households interviewed for the first time) from the attrition rate (households from the 2nd to the 4th interview). Unfortunately, the PDB is not shared with users for confidentiality reasons, so in this study we rely on the UDB datasets.
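For readers with access to the contact variables, the household nonresponse rate described in this section could be computed along the following lines. This is a sketch under stated assumptions: the rate is taken to be one minus the product of the address contact rate and the rate of complete interviews accepted, db135 = 1 is assumed to flag an accepted interview, and a missing value is represented by `None`. The record layout and the toy data are hypothetical.

```python
def household_nonresponse_rate(records):
    """records: list of dicts with keys 'db120', 'db130', 'db135' (None = missing)."""
    # Address contact rate: contacted addresses over valid selected addresses.
    contacted = sum(1 for r in records if r["db120"] == 11)
    selected = sum(1 for r in records if r["db120"] is not None)
    invalid = sum(1 for r in records if r["db120"] == 23)
    address_contact_rate = contacted / (selected - invalid)

    # Rate of complete interviews accepted (assumption: db135 == 1 means accepted).
    accepted = sum(1 for r in records if r["db135"] == 1)
    eligible = sum(1 for r in records if r["db130"] is not None)
    interview_rate = accepted / eligible

    return 1.0 - address_contact_rate * interview_rate

# Toy sample of 10 addresses: 1 invalid, 1 not located, 8 contacted (6 accepted).
sample = (
    [{"db120": 23, "db130": None, "db135": None}]
    + [{"db120": 21, "db130": None, "db135": None}]
    + [{"db120": 11, "db130": 11, "db135": 1} for _ in range(6)]
    + [{"db120": 11, "db130": 21, "db135": None} for _ in range(2)]
)
rate = household_nonresponse_rate(sample)
print(round(rate, 4))
```

In the toy sample the address contact rate is 8/9 and the interview rate is 6/8, so one third of valid addresses end up as nonresponse.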

Results
Reweighting
Table 1 presents the benchmark results for the reweighting correction method described in equations 1-3.
Equivalized disposable income is used as the outcome variable whose inequality is being measured, as well as the main element of $x_{ij}$ (in logarithmic form). Binary indicators for European countries are also included as elements of $x_{ij}$ in light of the high heterogeneity in incomes, inequalities and nonresponse rates across countries. The main finding is that households' survey response probability is related negatively to disposable income.
The estimated coefficient on log income ($\hat{\theta}_2$) is negative and significantly different from zero, an indication that unit nonresponse is related to incomes and is therefore expected to bias our measurement of inequality.
As a consequence, the corrected Ginis are consistently higher than the non-corrected Ginis. The unweighted corrected Gini coefficient is 48.34. This is 3.25 percentage points higher than the uncorrected and unweighted Gini, a difference that is statistically highly significant. Making use of the sampling weights provided by national statistical agencies does not affect these findings. The correction for unit nonresponse in this case amounts to 3.70 percentage points of the Gini. To the extent that applying the statistical agency weights amounts to some double-correcting for nonresponse and these corrections interact with each other arbitrarily, we can estimate a quasi difference-in-difference type of effect of weighting. The stand-alone correction for nonresponse is estimated at 3.25 percentage points of the Gini (48.34-45.10). The stand-alone correction for non-representative sampling is estimated at -6.19 percentage points of the Gini (38.91-45.10). Adding these effects to the uncorrected Gini, we conclude that the robust Gini is 42.15. This figure is slightly lower than the original estimate of 42.61, suggesting that the double-correction of nonresponse is responsible for a 0.46 percentage-point inflation of the Gini. In conclusion, reweighting is consistent in finding an upward correction of the Gini of between 3.25 and 3.70 percentage points.
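The quasi difference-in-difference arithmetic can be retraced with the four Ginis quoted in the text. The variable names are illustrative, and 42.61 is read here as the Gini with both the nonresponse correction and the sampling weights applied. Because the inputs are rounded Ginis, the computed differences may differ slightly from the reported figures.

```python
# Ginis (multiplied by 100) as quoted in the text.
uncorrected_unweighted = 45.10  # no nonresponse correction, no sampling weights
corrected_unweighted = 48.34    # nonresponse correction only
uncorrected_weighted = 38.91    # sampling weights only
corrected_weighted = 42.61      # both corrections applied together

# Stand-alone effects relative to the uncorrected, unweighted Gini.
nonresponse_effect = corrected_unweighted - uncorrected_unweighted
weighting_effect = uncorrected_weighted - uncorrected_unweighted

# "Robust" Gini: add both stand-alone effects to the uncorrected Gini.
robust_gini = uncorrected_unweighted + nonresponse_effect + weighting_effect

# Gap attributed to double-correcting for nonresponse.
double_correction = corrected_weighted - robust_gini

# Nonresponse correction when sampling weights are used.
weighted_correction = corrected_weighted - uncorrected_weighted

print(round(nonresponse_effect, 2), round(weighting_effect, 2))
print(round(robust_gini, 2), round(double_correction, 2), round(weighted_correction, 2))
```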
Across the 29 EU member states (excluding the two outliers, and without accounting for states' population or sample sizes), the estimated Gini correction is strongly positively associated with states' mean income (correl. +0.541), mean nonresponse rate (correl. +0.219) and the count of regions used for sub-national disaggregation (correl. +0.488). Finally, refer to the discussion on the survey instruments, income aggregates and combination with administrative data to understand other potential sources of cross-country differences in estimated biases.

Replacing
Next, we use a methodology first proposed by Cowell and Victoria-Feser (2007) to test the sensitivity of the Gini coefficients to extreme or non-representative observations on the right-hand side of the distribution.
We correct for the influence of potentially contaminated top incomes using an estimated Pareto or generalized beta distribution as discussed in the methodological section. The analysis is performed at the level of individual EU member states, so that the replaced income values would come from all states rather than just from a handful of the richest states. Results are shown in Table 3 (see also Tables A4-A9 in supplementary materials).

Discussion
This study has evaluated two methods, reweighting and replacing, for correcting top income biases generated by known data issues, including unit and item nonresponse and, more generally, representativeness issues of top income observations. The joint use of two distinct statistical methods for correcting top income biases, sensitivity analysis of their technical specifications, and analysis of their performance on a challenging heterogeneous household survey were methodological contributions of this study.
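Using the reweighting approach and the 2011 wave of the SILC, the paper finds a significant upward correction of the Gini of between 3.25 and 3.70 percentage points. Replacing top incomes with values drawn from the Pareto distribution yields similar results when the cut point is below the 95th percentile, but inconsistent results for higher cut points. This may be due to the limited flexibility offered by the one-parameter Pareto distribution.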
Repeating the replacing exercise with the four-parameter GB2 distribution does not improve our findings.
Our estimates of inequality fall by 0.2-3.3 percentage points of the Europe-wide Gini, while the estimated Ginis for individual member states vary very widely and are often unreasonably low or high. We conclude that the popular 1-4 parameter distributions such as the Pareto and the GB2 distributions are not well suited to model the topmost incomes across a heterogeneous sample of distributions, and that alternative distributions should be sought to model the very top ends. The fact that these distributions were proposed and initially tested in the 20th century, combined with the sharp growth of incomes at the very top of the distribution in the 21st century in Europe and elsewhere, may help explain this shortcoming.
Another problem with the replacing methods, similarly to the traditional treatments for item nonresponse, is that they rely on an assumption that other income observations are valid and accurate. Replacing methods assume away measurement issues below the cutoff point. At the same time, the parametric distributions proposed yield a wide range of empirical results (in tables 3 and 4), indicating that parameters calibrated with the lower parts of the income distributions do not offer insights of any accuracy about the very top.
In light of the findings from the reweighting and parametric replacing exercises, we also conclude that the systematic under-representation of top income households due to unit nonresponse is a more worrying problem than other potential contaminations of the top-income distribution for inequality measurement. Unit nonresponse leads to a systematic downward bias in the measurement of the Gini coefficient by 3-4 percentage points, while the balance of other top income biases remains unclear, and has been estimated in this study to range widely, from a -3 to a +4 percentage point adjustment to the Gini.
Notes: The model is estimated on an unweighted sample, and the uncorrected or corrected weights are only applied in the calculation of the Ginis. Only incomes ≥1 are retained. Benchmark region is the Netherlands. Standard errors are in parentheses. Ginis and their bootstrap standard errors are multiplied by 100.
Notes: Pareto coefficients are estimated on non-contaminated income observations (H is the income corresponding to the (100-h)th percentile) using maximum likelihood, and are then used to impute values for the k top-income observations. Parametric replacement is done at the national level. Europe-wide Ginis and their standard errors are computed across all national quasi-nonparametric income distributions, and are bootstrapped. For clarity, Ginis and their standard errors are multiplied by 100. Sampling weights are adopted from Eurostat. (i) Right-truncation here is higher than in the models below. Any lower right-truncation point than this leads to overly large and erratic Gini estimates due to small national estimation samples (i.e., range of income quantiles on which Pareto distribution is fit) and comparatively large national prediction samples (i.e., quantiles for which Pareto estimates are drawn). Refer to table A5.
Notes: GB2 coefficients are estimated on non-contaminated income observations (L is the income corresponding to the 30th percentile; H is the income corresponding to the (100-h)th percentile) using maximum likelihood. Quasi-nonparametric Ginis and their standard errors are bootstrap estimates, and are multiplied by 100.
Notes: Croatia was omitted from the SILC survey until the 2010 wave. Incomes less than 1 are omitted. Mean incomes may not be representative of those for the entire states, as they omit non-responding households. For clarity of presentation, Ginis are multiplied by 100.
Notes: GB2 coefficients are estimated using maximum-likelihood methods on the SILC sample for each state, sample size N, left-truncated at the 30th percentile and right-truncated at the 95th percentile (non-weighted data). Quasi-nonparametric Ginis are calculated by replacing the top 5 percent of incomes with random draws from the GB2 distribution, and are bootstrapped (Reiter 2003). For clarity, Ginis and their standard errors are multiplied by 100.
Notes: GB2 coefficients are estimated using maximum-likelihood methods on the SILC sample for each state, sample size N, left-truncated at the 30th percentile and right-truncated at the 95th percentile (sampling-weighted data). Quasi-nonparametric Ginis are calculated by replacing the top 5 percent of incomes with random draws from the GB2 distribution, and are bootstrapped (Reiter 2003). For clarity, Ginis and their standard errors are multiplied by 100.
Notes: GB2 coefficients are estimated using maximum-likelihood methods on the SILC sample for each state, sample size N, left-truncated at the 30th percentile and right-truncated at the 90th percentile (non-weighted data). Quasi-nonparametric Ginis are calculated by replacing the top 10 percent of incomes with random draws from the GB2 distribution, and are bootstrapped (Reiter 2003). For clarity, Ginis and their standard errors are multiplied by 100.
Notes: GB2 coefficients are estimated using maximum-likelihood methods on the SILC sample for each state, sample size N, left-truncated at the 30th percentile and right-truncated at the 90th percentile (sampling-weighted data). Quasi-nonparametric Ginis are calculated by replacing the top 10 percent of incomes with random draws from the GB2 distribution, and are bootstrapped (Reiter 2003). For clarity, Ginis and their standard errors are multiplied by 100.