Abstracts
Jean-François Beaumont*, Statistics
Canada
The Analysis of Survey Data Using the Bootstrap
The bootstrap is a convenient tool for estimating design variances
of finite population parameters or model parameters. It is typically
implemented by producing design bootstrap weights that are made available
to survey analysts. When analysts are interested in making inferences
about model parameters, two sources of variability are normally taken
into account: the model that generates data of the finite population
and the sampling design. When the overall sampling fraction is negligible,
the model variability can be ignored and standard bootstrap techniques
that account for the sampling design variability can be used (e.g.,
Rao and Wu, 1988; Rao, Wu and Yue, 1992). However, there are many
practical cases where the model variability cannot be ignored, as
evidenced by an empirical study. We show how to modify design bootstrap
weights in a simple way to account for the model variability. The
analyst may also be interested in testing hypotheses about model parameters.
This can be achieved by replicating a simple weighted model-based
test statistic using the bootstrap weights. Our approach can be viewed
as a bootstrap version of the Rao-Scott test (Rao and Scott, 1981). We
illustrate through a simulation study that both methods perform better
than the standard Wald or Bonferroni test.
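For readers who want to see how design bootstrap weights of the kind referenced above are typically produced, here is a minimal sketch of the Rao, Wu and Yue (1992) rescaling bootstrap for a single-stage stratified design; the toy data, array names and the weighted-mean estimator are illustrative assumptions, and the modification for model variability proposed in the talk is not shown.

```python
# Illustrative sketch (not the authors' code): Rao-Wu-Yue rescaling bootstrap
# weights for a single-stage stratified design, used to estimate the design
# variance of a weighted estimator by replicating it over bootstrap weights.
import numpy as np

rng = np.random.default_rng(2013)

def bootstrap_weights(strata, weights, B=500):
    """Return a (B, n) array of design bootstrap weights."""
    strata = np.asarray(strata)
    weights = np.asarray(weights, dtype=float)
    n = len(weights)
    boot = np.empty((B, n))
    for b in range(B):
        w_star = weights.copy()
        for h in np.unique(strata):
            idx = np.flatnonzero(strata == h)
            n_h = len(idx)                     # requires n_h >= 2
            # resample n_h - 1 units with replacement within the stratum
            draws = rng.choice(idx, size=n_h - 1, replace=True)
            m = np.bincount(draws, minlength=n)[idx]
            w_star[idx] = weights[idx] * (n_h / (n_h - 1)) * m
        boot[b] = w_star
    return boot

def weighted_mean(y, w):
    return np.sum(w * y) / np.sum(w)

# toy data: one study variable, two strata of 100 units each
y = rng.normal(size=200)
w = rng.uniform(1, 5, size=200)
strata = np.repeat([1, 2], 100)

wb = bootstrap_weights(strata, w)
theta_hat = weighted_mean(y, w)
theta_star = np.array([weighted_mean(y, wb[b]) for b in range(wb.shape[0])])
v_design = np.mean((theta_star - theta_hat) ** 2)   # bootstrap design variance
```

Analysts receiving such replicate weights simply re-run their weighted estimator once per replicate; the proposal in the talk amounts to a further simple modification of these weights so that the same replication also reflects model variability.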
David Bellhouse*, University of
Western Ontario
The Teaching of a First Course in Survey Sampling Motivated by Survey
Data Analysis
Survey sampling is often seen to be out of sync with many of the
statistics courses that students take over their undergraduate or
early graduate programs. Historically, most topics in sampling courses
were motivated by the production of estimates from surveys run by
government agencies. Compared to the time when I studied from William
Cochran's Sampling Techniques, the situation in survey sampling
has changed substantially, and for the better, with the current
use of Sharon Lohr's book Sampling: Design and Analysis. In one
sense, she has followed a traditional approach: a general discussion
of survey design, followed by estimation techniques for means, totals
and proportions and then some topics in survey data analysis late
in the book. This approach differs from many courses in statistics
taught today, for example regression and experimental design, where
the statistical theory is often motivated by problems in data analysis
constrained by the study design. Over the past three years I have
experimented with an approach to teaching a first course in survey
sampling by motivating survey estimation through survey data analysis.
Essentially, I asked the question: What if I begin the course with
some of the analysis topics that appear late in Lohr's book, thus
bringing the course more in line with other statistics courses?
My talk will focus on the techniques that I used and the results
of this experiment.
Gauri S. Datta*, University of Georgia
Benchmarking Small Area Estimators
In this talk, we consider benchmarking issues in the context of
small area estimation. We find optimal estimators within the class
of benchmarked linear estimators under either external or internal
benchmark constraints. This extends existing results for both external
and internal benchmarking, and also provides some links between
the two. In addition, necessary and sufficient conditions for self-benchmarking
are found for an augmented model. Most of our results are found
using ideas of orthogonal projection. To illustrate the results
of this paper, we present an example using a model and data from
the Census Bureau's Small Area Income and Poverty Estimates (SAIPE)
program.
This is joint work with W.R. Bell of the U.S. Census Bureau, Washington,
D.C. 20233, U.S.A., and Malay Ghosh, Department of Statistics, University
of Florida, Gainesville, Florida 32611, U.S.A.
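For readers less familiar with the terminology, a generic external benchmarking constraint on small area estimators (my notation, not necessarily the formulation used in the talk) requires a weighted combination of the benchmarked estimators to match a prespecified aggregate,

\[
\sum_{i=1}^{m} w_i \,\hat{\theta}_i^{\mathrm{bench}} = t ,
\]

where the w_i are fixed weights and t comes from an external source; internal benchmarking instead takes t to be the corresponding aggregate of the direct estimates from the same survey. The optimal benchmarked linear estimators in the talk are sought within the class of linear estimators satisfying such constraints.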
Patrick J. Farrell*, Carleton University,
Brenda MacGibbon*, Université du Québec à
Montréal, Gillian Bartlett, McGill University,
Thomas J. Tomberlin, Carleton University
The Estimating Function Jackknife Variance Estimator in a Marginal
Logistic Regression Model with Repeated Measures and a Complex Sample
Design: Its Small Sample Properties and an Application
One of the most important aspects of modeling binary data with repeated
measures under a complex sampling design is to obtain efficient
variance estimators in order to test for covariates, and to perform
overall goodness-of-fit tests of the model to the data. Influenced
by the work of Rao (Rao 1998, Rao and Tausi 2004, Roberts et al.
2009), we use his adaptation of estimating functions and the estimating
function bootstrap to the marginal logistic model with repeated
measures and complex survey data in order to obtain estimating function
jackknife variance estimators. In particular, we conduct Monte Carlo
simulations in order to study the level and power of tests using
this estimator proposed by Rao. The method is illustrated on an
interesting data set based on questionnaires concerning the willingness
of patients to allow their e-health data to be used for research.
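For orientation, the sketch below shows a generic design-based delete-one-cluster jackknife variance calculation; it is hypothetical code meant only to illustrate the replication idea behind the estimating function jackknife studied in the talk, not the authors' estimator.

```python
# Hypothetical sketch: stratified delete-one-cluster jackknife variance for a
# weighted estimator. Each replicate drops one cluster and reweights the
# remaining clusters in its stratum; replicates are combined into a variance.
import numpy as np

def jackknife_variance(y, w, stratum, cluster, estimator):
    """estimator(y, w) -> scalar point estimate from unit-level data."""
    y, w = np.asarray(y, float), np.asarray(w, float)
    stratum, cluster = np.asarray(stratum), np.asarray(cluster)
    theta_full = estimator(y, w)
    v = 0.0
    for h in np.unique(stratum):
        in_h = stratum == h
        clusters_h = np.unique(cluster[in_h])
        n_h = len(clusters_h)                      # requires n_h >= 2
        for c in clusters_h:
            w_rep = w.copy()
            w_rep[in_h & (cluster == c)] = 0.0                # drop cluster c
            w_rep[in_h & (cluster != c)] *= n_h / (n_h - 1)   # reweight the rest
            v += (n_h - 1) / n_h * (estimator(y, w_rep) - theta_full) ** 2
    return v

# example: variance of a weighted mean
# v = jackknife_variance(y, w, stratum, cluster,
#                        lambda y, w: np.sum(w * y) / np.sum(w))
```

Roughly speaking, the estimating function version replicates a weighted estimating function evaluated at the full-sample estimate instead of re-solving the estimating equations for each replicate, which keeps the computation light.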
Robert E. Fay*, Westat
The Multiple Facets of Small Area Estimation: A Case Study
Prof. Rao's many contributions to the literature on small area estimation
are widely recognized and acknowledged. This talk selects just one
example from his work as an illustration. The Rao-Yu model and subsequent
variants provide methods for small area estimation that incorporate
time-series aspects. The models are suitable for a set of sample
observations for a characteristic observed at multiple time points.
Unlike some other proposals, the Rao-Yu model accommodates sample
observations correlated across time, as would be typical of panel
or longitudinal surveys. This talk describes an application and
some extensions of this model to the National Crime Victimization
Survey (NCVS) in the U.S. Until now, the almost exclusive focus
of the NCVS has been to produce national estimates of victimizations
by type of crime annually. The talk describes the attraction, but
also some of the challenges, of applying the Rao-Yu model to produce
annual state estimates of crime from the NCVS. The talk will note
how small area applications are typically multi-faceted, in the
sense that often elements of science, technology, engineering, and
mathematics (STEM) must be brought together for an effective result.
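For context, the Rao-Yu model referred to above is usually written (in generic notation) as

\[
\hat\theta_{it} = \theta_{it} + e_{it}, \qquad
\theta_{it} = x_{it}'\beta + v_i + u_{it}, \qquad
u_{it} = \rho\, u_{i,t-1} + \epsilon_{it}, \quad |\rho| < 1,
\]

where \hat\theta_{it} is the direct estimate for area i at time t, the sampling errors e_{it} are allowed to be correlated over time within an area (the panel-survey feature noted above), v_i is an area-level random effect, and u_{it} is a stationary AR(1) area-by-time effect.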
Moshe Feder*, University of Southampton
State-Space Modelling of U.K. Labour Force Survey Rolling Quarterly
Wave Data
We propose a multivariate state space model for the U.K. Labour
Force Survey rolling quarterly wave data. The proposed model is
based upon a basic structural model with seasonality, an auto-regressive
survey error model and a mode effect (person-to-person vs. telephone
interviews). The proposed approach takes advantage of the temporal
structure of the data to improve estimates and to extract its unobserved
components. Two alternatives for modelling the seasonal component
will also be discussed. Finally, we'll present some simulation results.
Wayne A. Fuller*, Iowa State University
Small Area Prediction: Some Comments
Small area prediction using complex sample survey data is reviewed,
emphasizing those aspects of estimation impacted by the survey design.
Variance models, nonlinear models, benchmarking, and parameter estimation
are discussed.
Malay Ghosh*, University of Florida,
Rebecca Steorts, University of Florida
Two-Stage Bayesian Benchmarking as Applied to Small Area Estimation
The paper considers two-stage benchmarking. We consider a single
weighted squared error loss that combines the loss at the domain-level
and the area-level. We benchmark the weighted means at each level
or both the weighted means and the weighted variability, the latter
only at the domain-level. We also provide multivariate versions
of these results. Finally, we illustrate our methods using a study
from the National Health Interview Survey (NHIS) in the year 2000.
The goal was to estimate the proportion of people without health
insurance for many domains of the Asian subpopulation.
David Haziza*, Université de
Montréal, Audrey Béliveau, Simon Fraser University,
Jean-François Beaumont, Statistics Canada
Simplified Variance Estimation in Two-Phase Sampling
Two-phase sampling is often used in practice when the sampling frame
contains little or no information. In two-phase sampling, the total
variance of an estimator can be expressed as the sum of two terms:
the first-phase variance and the second-phase variance. Estimating
the total variance can be achieved by estimating each term separately.
However, the resulting variance estimator may not be easy to compute
in practice because it requires the second-order inclusion probabilities
for the second-phase sampling design, which may not be tractable.
Also, it requires specialized software for variance estimation for
two-phase sampling designs. In this presentation, we consider a
simplified variance estimator that does not depend on the second-order
inclusion probabilities of the second-phase sampling design and
that can be computed using software designed for variance estimation
in single-phase sampling designs. The simplified variance estimator
is design-biased, in general. We present strategies under which
the bias is small, where a strategy consists of the choice of a
sampling design and a point estimator. Results of a limited simulation
study that investigates the performance of the proposed simplified
estimator in terms of relative bias will be shown.
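To fix notation, the decomposition mentioned above is the usual conditioning identity for two-phase sampling,

\[
V(\hat\theta) = V_1\!\left[E_2\big(\hat\theta \mid s_1\big)\right]
              + E_1\!\left[V_2\big(\hat\theta \mid s_1\big)\right],
\]

where the subscripts 1 and 2 refer to the first-phase and second-phase designs and s_1 is the first-phase sample. Estimating the second term is what typically requires the second-order inclusion probabilities of the second-phase design, and it is this requirement that the simplified estimator is designed to avoid.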
M.A. Hidiroglou*, Statistics Canada,
V. Estevao, Statistics Canada, Y. You, Statistics
Canada
Unit Level Small Area Estimation for Business Survey Data
Direct estimators of parameters for small areas of interest use regression
estimators that rely on well-correlated auxiliary data. As the domains
get smaller, such estimators will become unreliable, and one option
is to use small-area procedures. The one that we will illustrate
in this presentation extends the unit level procedure originally
proposed by Battese, Harter, and Fuller (1988) to include the following
additional requirements. Firstly, the parameter that we estimate
is a weighted mean of observations; secondly, the errors associated
with the nested error regression model are heteroskedastic; and
lastly, the survey weights are included in the estimation.
Three point estimators and their associated estimated mean squared
errors are given in this presentation. The first one does not use
the survey weights (EBLUP) but the last two (Pseudo-EBLUP) do make
use of the survey weights. One of the Pseudo-EBLUP estimators was
developed originally by Rubin-Bleuer et al. (2007). These three
estimators are all implemented in the small-area prototype being
developed at Statistics Canada (Estevao, Hidiroglou and You 2012).
We illustrate the application of the proposed models and methods
to real business survey data.
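As background, the unit level model of Battese, Harter and Fuller (1988) underlying this work is the nested error regression model, written here in generic form with a heteroskedastic error term (the precise variance structure and weighting used in the talk may differ):

\[
y_{ij} = x_{ij}'\beta + v_i + e_{ij}, \qquad
v_i \sim N(0, \sigma_v^2), \qquad
e_{ij} \sim N(0, \sigma_e^2\, k_{ij}),
\]

for unit j in area i, where the k_{ij} are known constants. The target parameter is then a weighted mean of the y_{ij} in each small area, and the survey weights enter the pseudo-EBLUP estimators.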
Jiming Jiang*, University of California,
Davis
The EMAF and E-MS Algorithms: Model Selection with Incomplete
Data
In this talk, I will present two computer-intensive strategies of
model selection with incomplete data that we recently developed.
The E-M algorithm is well-known for parameter estimation in the
presence of missing data. On the other hand, model selection, as
another key component of model identification, may also be viewed
as parameter estimation, with the parameter being [the identification
(ID) number of] the model and the parameter space being the (ID
numbers of the) model space. From this point of view, it is intuitive
to consider extension of the E-M to model selection in the presence
of missing data. Our first strategy, called the EMAF algorithm,
is motivated by a recently developed procedure for model selection,
known as the adaptive fence (AF), which is combined with the
E-M algorithm to handle model selection in the missing-data situation. Our second
strategy, called the E-MS algorithm, is a more direct extension
of the E-M algorithm to model selection problems with missing data.
This work is joint with Thuan Nguyen of the Oregon Health and Science
University and J. Sunil Rao of the University of Miami.
Graham Kalton*, Westat
Design and Analysis of Venue-Based Samples
Venue-based sampling, also known as location sampling, center sampling,
or intercept sampling, samples a population by collecting data from
individuals contacted at a sample of locations and time periods.
This method, which is often used for sampling hard-to-reach populations,
involves multiple frames. This paper describes the 2008 venue-based
survey of men who have sex with men (MSM) in the U.S. Centers for
Disease Control and Prevention's National HIV Behavioral Surveillance
(NHBS) system. The venues were mostly places such as bars, clubs,
and social organizations where MSM congregate and the time periods
were parts of days when the venues were open, sampled on a monthly
basis over a six month period. The paper discusses the calculation
of survey weights for such samples, using the 2008 NHBS survey as
an example. Implications for the design of venue-based samples are
also discussed.
Jae-Kwang Kim*, Iowa State University,
Sixia Chen, Iowa State University
Two-Phase Sampling Approach for Propensity Score Estimation in
Voluntary Samples
Voluntary sampling is a non-probability sampling design whose sample
inclusion probabilities are unknown. When the sample inclusion probability
depends on the study variables being observed, the popular approach
of the propensity score adjustment using the auxiliary information
available for the population may lead to biased estimation. In this
paper, we propose a novel application of the two-phase sampling
idea to estimate the parameters in the propensity model. To apply
the proposed method, we make a second attempt at data collection
for the original sample and obtain a subset of the original sample,
called the second-phase sample. Under this two-phase sampling
experiment, we can estimate the parameters in the propensity score
model using calibration, and the propensity score adjustment can
then be used to estimate the population parameters from the original
voluntary sample. Once the propensity scores are estimated, we can
incorporate additional auxiliary variables from the reference distribution
by a calibration method. Results from some simulation studies are
also presented.
Phillip Kott*, Research Triangle Institute
One Step or Two? Calibration Weighting when Sampling from a Complete
List Frame
When a random sample drawn from a complete list frame suffers from
unit nonresponse, calibration weighting can be used to remove nonresponse
bias under either an assumed response or an assumed prediction model.
Not only can this provide double protection against nonresponse
bias, it can also decrease variance. By employing a simple trick
one can simultaneously estimate the variance under the assumed prediction
model and the mean squared error under the combination of an assumed
response model and the probability-sampling mechanism in a relatively
simple manner. Unfortunately, there is a practical limitation on
what response model can be assumed when calibrating in a single
step. In particular, the response function cannot always be logistic.
This limitation does not hinder calibration weighting when performed
in two steps: one to remove the response bias and one to decrease
variance. There are efficiency advantages from using the two-step
approach as well. Simultaneous linearized mean-squared-error estimation,
although still possible, is not as straightforward.
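As a point of reference for the calibration step, here is a minimal sketch of linear (chi-square distance) calibration weighting; the variable names are illustrative, and the nonlinear response-model calibration and the two-step procedure discussed in the talk are not shown.

```python
# Minimal sketch of linear calibration weighting (GREG-type): adjust design
# weights d so that the calibrated weights w reproduce known totals T_x of
# the auxiliary vector x. Names are illustrative, not from the paper.
import numpy as np

def linear_calibration(d, X, T_x):
    """d: (n,) design weights; X: (n, p) auxiliary data; T_x: (p,) known totals."""
    d = np.asarray(d, float)
    X = np.asarray(X, float)
    T_x = np.asarray(T_x, float)
    # solve for lambda in  (sum_i d_i x_i x_i') lambda = T_x - sum_i d_i x_i
    M = X.T @ (d[:, None] * X)
    lam = np.linalg.solve(M, T_x - X.T @ d)
    return d * (1.0 + X @ lam)    # calibrated weights w_i = d_i (1 + x_i' lambda)

# usage: w = linear_calibration(d, X, T_x); then np.sum(w * y) estimates the
# total of y, and X.T @ w reproduces T_x (up to numerical error).
```

Replacing the linear adjustment 1 + x_i'lambda by another monotone function of x_i'lambda gives the more general calibration functions at issue in the talk; the point made above is that not every response function, the logistic in particular, can always be accommodated in a single calibration step.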
Snigdhansu Chatterjee, University
of Minnesota, Partha Lahiri*, University of Maryland, College
Park
Parametric Bootstrap Methods in Small Area Estimation Problems
In small area estimation, empirical best prediction (EBP) methods
are routinely used in combining information from various data sources.
One major challenge for this approach is the estimation of an accurate
mean squared error (MSE) of the EBP that captures all sources of variation.
But, the basic requirements of second-order unbiasedness and non-negativity
of the MSE estimator of an EBP have led to different complex analytical
adjustments in different MSE estimation techniques. We suggest a
parametric bootstrap method to replace laborious analytical calculations
by computer-oriented simple techniques, without sacrificing the
basic requirements in an MSE estimator. The method works for a general
class of mixed models and different techniques of parameter estimation.
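To make the idea concrete, here is a deliberately minimal sketch of a single-level parametric bootstrap MSE estimator for the EBLUP under the Fay-Herriot area-level model. It is my own illustrative code, uses a simple moment estimator of the model variance, and omits the bias corrections needed for second-order unbiasedness, which is exactly the refinement the proposed method is designed to deliver without laborious analytical adjustments.

```python
# Sketch: naive parametric bootstrap MSE of the Fay-Herriot EBLUP.
# y: direct estimates, X: area-level covariates, D: known sampling variances.
import numpy as np

rng = np.random.default_rng(1)

def fit_fay_herriot(y, X, D):
    """Moment (Prasad-Rao type) estimate of A and the corresponding WLS beta."""
    m, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)            # hat matrix of the OLS fit
    resid = y - H @ y
    A = max(0.0, (resid @ resid - np.sum(D * (1 - np.diag(H)))) / (m - p))
    V_inv = 1.0 / (A + D)
    beta = np.linalg.solve(X.T @ (V_inv[:, None] * X), X.T @ (V_inv * y))
    return A, beta

def eblup(y, X, D, A, beta):
    gamma = A / (A + D)
    return gamma * y + (1 - gamma) * (X @ beta)

def boot_mse(y, X, D, B=1000):
    A_hat, beta_hat = fit_fay_herriot(y, X, D)
    m = len(y)
    mse = np.zeros(m)
    for _ in range(B):
        theta_b = X @ beta_hat + rng.normal(0, np.sqrt(A_hat), m)  # true means
        y_b = theta_b + rng.normal(0, np.sqrt(D), m)               # direct estimates
        A_b, beta_b = fit_fay_herriot(y_b, X, D)
        mse += (eblup(y_b, X, D, A_b, beta_b) - theta_b) ** 2
    return mse / B                                   # bootstrap MSE per area
```

A single-level bootstrap of this kind is generally not second-order unbiased; the methods discussed in the talk aim to retain non-negativity and second-order accuracy while keeping the computation at this level of simplicity.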
Isabel Molina*, Universidad Carlos III
de Madrid, Balgobin Nandram, Worcester Polytechnic Institute,
J.N.K. Rao, Carleton University
Hierarchical Bayes Estimation of Poverty Indicators in Small
Areas
A new methodology is proposed for estimation of general nonlinear
parameters in small areas, based on the Hierarchical Bayes approach.
Only non-informative priors are considered and, with these priors,
Markov chain Monte Carlo procedures and the convergence problems
therein are avoided. At the same time, less computational effort
is required. Results are compared in simulations with those obtained
using the empirical Bayes methodology under a frequentist point
of view. This methodology is illustrated through the problem of
estimation of poverty indicators as particular cases of nonlinear
parameters. Finally, an application to poverty mapping in Spanish
provinces by gender is carried out.
Esther López-Vizcaíno, María
José Lombardía, Domingo Morales*, Universidad
Miguel Hernandez de Elche
Small Area Estimation of Labour Force Indicators under Multinomial
Mixed Models with Time and Area Effects
The aim of this paper is the estimation of small area labour force
indicators like totals of employed and unemployed people and unemployment
rates. Small area estimators of these quantities are derived from
a multinomial logit mixed model with time and area random effects.
Mean squared errors are used to measure the accuracy of the proposed
estimators and they are estimated by explicit formulas and bootstrap
methods. The behavior of the introduced estimators is empirically
investigated in simulation experiments. The proposed methodology
is applied to real data from the Spanish Labour Force Survey of
Galicia.
Ralf T. Münnich*, Ekkehard
W. Sachs, Matthias Wagner, University of Trier, Germany
Calibration Benchmarking for Small Area Estimates: An Application
to the German Census 2011
In the 2010/11 Census round in Europe, several countries introduced
new methodologies. Countries like Germany and Switzerland decided
to apply a register-assisted method. In addition to using the population
register data, a sample is drawn in order to allow for estimating
values which are not available in the register. Generally, these
estimates have to be considered on different (hierarchical) aggregation
levels. Additionally, several cross classifications in different
tables with overlapping marginal distributions are of interest.
On higher aggregation levels classical design-based methods may
be preferable whereas in lower aggregation levels small area techniques
are more appropriate. The variety of aggregation levels in connection
with different estimation methods may then lead to severe coherence
problems. The present paper focuses on a specialized calibration
problem which takes into account the different kinds of estimates
for areas and domains while using penalization procedures. The procedure
helps identify potentially problematic constraints, enabling
the end-user to relax certain boundary conditions and achieve
better overall results. An application to the German Census 2011
will be given.
Balgobin Nandram*, Worcester Polytechnic
Institute
A Bayesian Analysis of a Two-Fold Small Area Model for Binary
Data
We construct a hierarchical Bayesian model for binary data, obtained
from a number of small areas, and we use it to make inference for
the finite population proportion of each area. Within each area
there is a two-stage cluster sampling design and a two-fold model
incorporates both an intracluster correlation (between two units
in the same cluster) and an intercluster correlation (between two
units in different clusters). The intracluster correlation is important
because it is used to accommodate the increased variability due
to the clustering effect, and we study it in detail. Using two
Bayesian goodness-of-fit procedures, we compare our two-fold model with a
standard one-fold model which does not include the intracluster
correlation. Although the Gibbs sampler is the natural way to fit
the two-fold model, we show that random samples can be used, thereby
providing a faster and more efficient algorithm. We describe an
example on the Third International Mathematics and Science Study
and a simulation study to compare the two models. While the one-fold
model gives estimates of the proportions with smaller posterior
standard deviations, our goodness of fit procedures show that the
two-fold model is to be preferred and the simulation study shows
that the two-fold model has much better frequentist properties than
the one-fold model.
Danny Pfeffermann*, Hebrew University
of Jerusalem and Southampton Statistical Sciences Research Institute
Model Selection and Diagnostics for Model-Based Small Area Estimation
(Joint paper with Dr. Natalie Shlomo and Mr. Yahia El-Horbaty)
Model selection and diagnostics are among the difficult aspects
of model-based small area estimation (SAE) because the models usually
contain random effects at one or more levels of the model, which
are not observable. Careful model testing is required under both
the frequentist approach and the Bayesian approach. It is important
to emphasize also that misspecification of the distribution of the
random effects may affect the specification of the functional (fixed)
part of the model and conversely, misspecification of the functional
relationship may affect the specification of the distribution of
the random effects.
In the first part of my talk I shall review recent developments
in model checking for model-based SAE under the Bayesian and frequentist
approaches and propose a couple of new model testing procedures.
A common feature of articles studying model specification is that
they usually only report the performance of the newly developed
methods. In the second part of my talk I shall compare empirically
several frequentist procedures of model testing in terms of the
computations involved and the powers achieved in rejecting mis-specified
models.
Shu Jing Gu, University of Alberta, N.G.N.
Prasad*, University of Alberta
Estimation of Median for Successive Sampling
Survey practitioners widely use successive sampling methods to estimate
changes in characteristics over time. The main concern in such methods
is nonresponse, which arises because the same individuals are
sampled repeatedly to observe responses. To overcome this problem,
partial replacement sampling (rotation sampling) schemes are adopted.
However, most of the methods available in the literature focus on
the estimation of linear parameters such as a mean or a total.
Recently, some attempts have been made at estimating population
quantiles under repeated sampling schemes. However, these methods
are restricted to simple random sampling and also require estimation
of the density function of the underlying characteristic. The present
work uses an estimating-function approach to obtain estimates
of population quantiles under repeated sampling when units are
selected under an unequal probability sampling scheme. Performance
of the proposed method to estimate a finite population median is
examined through a simulation study as well as using real data sets.
J. Sunil Rao*, University of Miami
Fence Methods for Mixed Model Selection
This talk reviews a body of work related to some ideas for mixed
model selection. It's meant to provide a bridge to some new work
that will also be presented at the symposium on model selection
with missing data.
Many model search strategies involve trading off model fit with
model complexity in a penalized goodness of fit measure. Asymptotic
properties for these types of procedures in settings like linear
regression and ARMA time series have been studied, but these do
not naturally extend to non-standard situations such as mixed effects
models. I will detail a class of strategies known as fence methods
which can be used quite generally including for linear and generalized
linear mixed model selection. The idea involves a procedure to isolate
a subgroup of what are known as correct models (of which the optimal
model is a member). This is accomplished by constructing a statistical
fence, or barrier, to carefully eliminate incorrect models. Once
the fence is constructed, the optimal model is selected from amongst
those within the fence according to a criterion which can be made
flexible. A variety of fence methods can be constructed, based on
the same principle but applied to different situations, including
clustered and non-clustered data, linear or generalized linear mixed
models, and Gaussian or non-Gaussian random effects. I will illustrate
some via simulations and real data analyses. In addition, I will
also show how we used an adaptive version of the fence method for
fitting non-parametric small area estimation models and, quite differently,
how we developed an invisible fence for gene set analysis in genomic
problems.
This is joint work with Jiming Jiang of UC-Davis and Thuan Nguyen
of Oregon Health and Science University.
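For reference, the fence idea sketched above can be written compactly: a candidate model M is retained (it is "inside the fence") when its measured lack of fit is within a margin of the best-fitting model \tilde{M},

\[
\hat{Q}(M) \;\le\; \hat{Q}(\tilde{M}) + c\,\hat{\sigma}_{M,\tilde{M}},
\]

where \hat{Q} is a lack-of-fit measure, \hat{\sigma}_{M,\tilde{M}} estimates the variability of \hat{Q}(M) - \hat{Q}(\tilde{M}), and c is a tuning constant. The optimal model is then chosen from within the fence by a flexible criterion such as parsimony; the adaptive fence selects c data-adaptively, which is the version mentioned above for the non-parametric small area models.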
Louis-Paul Rivest*, Université
Laval
Copula-Based Small Area Predictions for Unit Level Models
Small area predictions are typically constructed with multivariate
normal models. We suggest a decomposition of the standard normal
unit level small area model in terms of a normal copula for the
dependency within a small area and marginal normal distributions
for the unit observations. It turns out that the copula is a key
ingredient for the construction of small area predictions. This
presentation introduces copula based small area predictions. They
are semi-parametric in the sense that they do not make any assumption
on the marginal distribution of the data. They are valid as long
as the assumed copula captures the residual dependency between the
unit observations within a small area. Besides the normal copula,
multivariate Archimedean copulas can also be used to construct small
area predictions. This provides a new perspective on small area
predictions when the normality assumption fails as studied by Sinha
& Rao (2009). This is joint work with François Verret
of Statistics Canada.
A.K.Md. Ehsanes Saleh*, Carleton University
Model Assisted Improved Estimation of Population Totals
Consider a finite population P of size N partitioned into k strata
P_1, P_2, ..., P_k, with sizes N_1, N_2, ..., N_k, respectively.
Each stratum is then subdivided into p sub-strata. We estimate the
population total, T = T_1 + T_2 + ... + T_k, by estimating the h-th
stratum total, T_h, using model-assisted methodology based on the
regression models for each of the sub-strata P_{hj}, namely
y_{hj} = θ_{hj} 1_{N_{hj}} + β_{hj} x_{hj} + e_{hj}, with
e_{hj} ~ N_{N_{hj}}(0, σ² I_{N_{hj}}), when it is suspected that
β_{h1} = β_{h2} = ... = β_{hp} = β_0 (unknown).
Let S_{hj} be a random sample of size n_{hj} from P_{hj} so that
n_{h1} + n_{h2} + ... + n_{hp} = n_h. Accordingly, we define five
estimators, namely, (i) the unrestricted estimator (UE), (ii) the
restricted estimator (RE), (iii) the preliminary test estimator (PTE),
(iv) the James-Stein-type estimator (SE), and (v) the positive-rule
Stein-type estimator (PRSE), and compare their dominance properties.
It is shown that certain dominance relations among these estimators
hold uniformly for p ≥ 3, while others hold under the equality of
the slopes. The dominance of the PTE depends on the level of
significance of the preliminary test. For p = 2, we provide
alternative estimators.
Fritz Scheuren*, Human Rights Data
Analysis Group
Indigenous Population: Small Domain Issues
Small domain issues exist in nearly all settings. Some, as in
the work of J.N.K. Rao, can be addressed by statistical models and
sound inferences obtained. Some require that additional issues,
such as potential misclassifications and record linkage errors,
be addressed.
For many years there has been a small group of researchers focused
on aboriginal issues working internationally (in Canada, Australia,
New Zealand, and the United States). Now in all these countries
a modest fraction of the Indigenous peoples still live in close
proximity in relatively homogeneous (usually rural) communities.
However, many of the Indigenous, maybe half or more, depending on
the country, live widely dispersed among the general population.
Many Indigenous people suffer from lifestyle-related conditions, like diabetes, that
were not native to them. Many, too, despite racial prejudice, have
intermarried and in some cases are indistinguishable from the general
population. Given these diverse circumstances, measuring differential
indigenous mortality and morbidity is extremely difficult. Still,
it is to these latter concerns that the beginnings of an international
plan of action are addressed in this paper.
Junheng Ma, National Institute of Statistical
Sciences, J. Sedransk*, Case Western Reserve University
Bayesian Predictive Inference for Finite Population Quantities
under Informative Sampling
We investigate Bayesian predictive inference for finite population
quantities when there are unequal probabilities of selection. Only
limited information about the sample design is available, i.e.,
only the first-order selection probabilities corresponding to the
sample units are known. Our probabilistic specification is similar
to that of Chambers, Dorfman and Wang (1998). We make inference
for finite population quantities such as the mean and quantiles
and provide credible intervals. Our methodology, using Markov chain
Monte Carlo methods, avoids the necessity of using asymptotic approximations.
A set of simulated examples shows that the informative model provides
improved precision over a standard ignorable model, and corrects
for the selection bias.
Jeroen Pannekoek, Statistics Netherlands,
Natalie Shlomo*, Southampton Statistical Sciences Research
Institute, Ton de Waal, Statistics Netherlands
Calibrated Imputation of Numerical Data under Linear Edit Restrictions
A common problem faced by statistical offices is that data may be
missing from collected datasets. The typical way to overcome this
problem is to impute the missing data. The problem of imputing missing
data is complicated by the fact that statistical data often have
to satisfy certain edit rules and that values of variables sometimes
have to sum up to known totals. The edit rules are most often formulated
as linear restrictions on the variables that have to be satisfied
by the imputed data. For example, for data on enterprises edit rules
could be that the profit and costs of an enterprise should sum up
to its turnover and that that the turnover should be at least zero.
The totals of some variables may already be known from administrative
data (turnover from a tax register) or estimated from other sources.
Standard imputation methods for numerical data as described in the
literature generally do not take such edit rules and totals into
account. We describe algorithms for imputation of missing numerical
data that take edit restrictions into account and ensure that sums
are calibrated to known totals. These algorithms are based on a
sequential regression approach that uses regression predictions
to impute the variables one by one. For each missing value to be
imputed we first derive a feasible interval in which the imputed
value must lie in order to make it possible to impute the remaining
missing values in the same unit in such a way that the imputed data
for that unit satisfy the edit rules and sum constraints. To assess
the performance of the imputation methods a simulation study is
carried out as well as an evaluation study based on a real dataset.
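As a toy illustration of the feasible-interval idea (my own simplified example, not the paper's algorithm), suppose each record must satisfy the balance edit profit + costs = turnover together with costs >= 0, that turnover is always observed, and that profit and costs are missing together for some records. A sequential regression imputation that respects these edits could look like the sketch below; the calibration of imputed values to known column totals, which the paper also handles, is omitted.

```python
# Toy sketch: sequential regression imputation under the edits
#   profit + costs = turnover  and  costs >= 0.
# Assumes 'turnover' is always observed and 'profit'/'costs' are missing together.
import numpy as np
import pandas as pd

def impute_with_edits(df):
    out = df.copy()
    obs = out["profit"].notna()
    # simple regression of profit on turnover, fitted on complete records
    b1, b0 = np.polyfit(out.loc[obs, "turnover"], out.loc[obs, "profit"], 1)
    for i in out.index[~obs]:
        t = out.at[i, "turnover"]
        pred = b0 + b1 * t
        # feasible interval for profit implied by costs = turnover - profit >= 0
        lower, upper = -np.inf, t
        out.at[i, "profit"] = min(max(pred, lower), upper)
        out.at[i, "costs"] = t - out.at[i, "profit"]  # balance edit holds exactly
    return out
```

The paper's algorithms work variable by variable in the same spirit, but derive the feasible interval from the full set of linear edits and sum constraints so that every remaining missing value in the record can still be imputed consistently.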
Chris Skinner*, London School of Economics
and Political Science
Extending Missing Data Methods to a Measurement Error Setting
We consider a measurement error setting, where y denotes the true
value of a variable of interest, y* denotes the value of y measured
in a survey and z denotes an observed ordinal indicator of the accuracy
with which y* measures y. This generalizes the classical missing
data setting where z is binary, z = 1 denotes fully accurate measurement
with y* = y and z = 0 denotes missing data with the value of y*
set equal to a missing value code. The more general setting is motivated
by an application where y is gross pay rate and z is an interviewer
assessment of respondent accuracy or an indicator of whether the
respondent consulted a pay slip when responding. We discuss possible
approaches to inference about the finite population distribution
function of y. We focus on a parametric modelling approach and the
use of pseudo maximum likelihood. Modelling assumptions are discussed
in the context of data from the British Household Panel Survey and
an associated validation study of earnings data. We indicate a possible
extension using parametric fractional imputation. This paper is
joint work with Damiao da Silva (Southampton) and Jae-Kwang Kim
(Iowa State).
Mary Thompson*, University of Waterloo
Bootstrap Methods in Complex Surveys
This talk presents a review of the bootstrap in the survey sampling
context, with emphasis on the Rao and Wu (1988) paper, which is still
the main reference on this topic, and more recent work of Rao and
others on bootstrapping with estimating functions. Some interesting
problems, addressable in part by bootstrapping, arise in inference
for the parameters of multilevel models when the sampling design
is a multistage design and the inclusion probabilities are possibly
informative, and in inference for network parameters under network
sampling.
Mahmoud Torabi*, University of Manitoba
Spatio-temporal Modeling of Small Area Rare Events
In this talk, we use generalized Poisson mixed models for the analysis
of geographical and temporal variability of small area rare events.
In this class of models, spatially correlated random effects and
temporal components are adopted. Spatio-temporal models that use
conditionally autoregressive smoothing across the spatial dimension
and B-spline smoothing over the temporal dimension are considered.
Our main focus is to make inference for smooth estimates of spatio-temporal
small areas. We use data cloning, which yields maximum likelihood,
to conduct frequentist analysis of spatio-temporal modeling of small
area rare events. The performance of the proposed approach is evaluated
by applying it to a real dataset and also through a simulation study.
Lola Ugarte*, University of Navarre
Deriving Small Area Estimates from the Information Technology Survey
Knowledge of the current state of the art in information technology
(IT) of businesses in small areas is very important for Central
and Local Governments, markets, and policy-makers because information
technology allows businesses to collect information, to improve access
to information, and to be more competitive. Information about IT
is obtained through the Information Technology Survey which is a
common survey in Western countries. In this talk, we focus on developing
small area estimators based on a double logistic model with categorical
explanatory variables to obtain information about the penetration
of IT in the Basque Country establishments in 2010. Auxiliary information
for population totals is taken from a Business Register. A model-based
bootstrap procedure is also given to provide the prediction MSE.
Changbao Wu*, University of Waterloo, Jiahua
Chen, University of British Columbia and Jae-Kwang Kim,
Iowa State University
Semiparametric Fractional Imputation for Complex Survey Data
Item nonresponse is commonly encountered in complex surveys. However,
it is also common that certain baseline auxiliary variables can
be observed for all units in the sample. We propose a semiparametric
fractional imputation method for handling item nonresponse. Our
proposed strategy combines the strengths of conventional single
imputation and multiple imputation methods, and is easy to implement
even with a large number of auxiliary variables available, which
is typically the case for large scale complex surveys. A general
theoretical framework will be presented and results from a comprehensive
simulation study will be reported. This is joint work with Jiahua
Chen of University of British Columbia and Jae-Kwang Kim of Iowa
State University.
Yong You*, Statistics Canada, Mike Hidiroglou,
Statistics Canada
Sampling Variance Smoothing Methods for Small Area Proportions
Sampling variance smoothing is an important topic in small area
estimation. In this paper, we study sampling variance smoothing
methods for small area estimators for proportions. We propose two
methods to smooth the direct sampling variance estimators for proportions,
namely, the design effects method and generalized variance function
(GVF) method. In particular, we will show the proposed smoothing
methods based on the design effects and GVF are equivalent. The
smoothed sampling variance estimates will then be treated as known
in the area level models for small area estimation. We evaluate
and compare the smoothed variance estimates based on the proposed
methods through the analysis of different survey data from Statistics
Canada including LFS, CCHS, and PALS. The proposed sampling variance
smoothing methods can also be applied and extended to more general
estimation problems including proportions and counts estimation.
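A minimal sketch of the design effects route (illustrative only; the exact smoothing, and the GVF specification, used in the paper may differ): compute an area-level design effect for each direct variance estimate, pool the design effects, and recompute the variances from the pooled value.

```python
# Illustrative sketch of design-effect smoothing of sampling variances for
# proportions. p_hat, n, v_direct are arrays over small areas (assumed names);
# assumes 0 < p_hat < 1 so the SRS variance is positive.
import numpy as np

def smooth_variances(p_hat, n, v_direct):
    p_hat, n, v_direct = (np.asarray(a, float) for a in (p_hat, n, v_direct))
    srs_var = p_hat * (1 - p_hat) / n     # variance under simple random sampling
    deff = v_direct / srs_var             # area-level design effects
    deff_bar = np.mean(deff)              # pooled (smoothed) design effect
    return deff_bar * srs_var             # smoothed sampling variances
```

Specifying a GVF of the form v = c · p(1 - p)/n and estimating c by pooling across areas yields the same smoothed variances, which is one way to see how the two methods described above can coincide.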
Susana Rubin-Bleuer*, Statistics Canada,
Wesley Yung*, Statistics Canada, Sébastien Landry*,
Statistics Canada
Variance Component Estimation through the Adjusted Maximum Likelihood
Method
Estimation of variance components is a fundamental part of small
area estimation. Unfortunately, standard estimation methods frequently
produce negative estimates of the strictly positive model variances.
As a result, the Empirical Best Linear Unbiased Predictor
(EBLUP) could be subject to significant bias. Adjusted maximum likelihood
estimators, which always yield positive variance estimates, have
been recently studied by Li and Lahiri (2010) for the classical
Fay-Herriot (1979) small area model. In this presentation, we propose
an extension of Li and Lahiri to the time series and cross-sectional
small area model proposed by Rao and Yu (1994). Theoretical properties
are discussed, along with some empirical results.
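For orientation, the adjustment studied by Li and Lahiri (2010) for the Fay-Herriot model multiplies the likelihood of the model variance A by a factor that vanishes at zero, for example

\[
L_{\mathrm{adj}}(A) = A \times L(A), \qquad A \ge 0,
\]

so that the maximizer is strictly positive even when the unadjusted likelihood is maximized at A = 0. The presentation extends this device to the variance components of the Rao-Yu time series and cross-sectional model; the exact adjustment factor used in the extension may differ from the simple form shown here.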
Rebecca Steorts*, University of Florida
and Malay Ghosh, University of Florida
On Estimation of Mean Squared Errors of Benchmarked Empirical
Bayes Estimators
We consider benchmarked empirical Bayes (EB) estimators under the
basic area-level model of Fay and Herriot while requiring the standard
benchmarking constraint. In this paper we determine how much mean
squared error (MSE) is lost by constraining the estimates through
benchmarking. We show that the increase due to benchmarking is O(m^{-1}),
where m is the number of small areas. Furthermore, we find an asymptotically
unbiased estimator of this MSE and compare it to the second-order
approximation of the MSE of the EB estimator or equivalently of
the MSE of the empirical best linear unbiased predictor (EBLUP),
which was derived by Prasad and Rao (1990). Moreover, using methods
similar to those of Butar and Lahiri (2003), we compute a parametric
bootstrap estimate of the MSE of the benchmarked EB estimate under
the Fay-Herriot model and compare it to the MSE of the benchmarked
EB estimate found by a second-order approximation. Finally, we illustrate
our methods using SAIPE data from the U.S. Census Bureau.