Rudy Beran, University of California-Davis
Estimating Many Means: A Mosaic of Recent Methodologies
A fundamental data structure is the k-way layout of observations,
complete or incomplete, balanced or unbalanced. The cells of the
layout are indexed by all k-fold combinations of the levels of
the k covariates (or factors). Replication of observations within
cells may be rare or nonexistent. Observations may be available
for only a subset of the cells. The problem is to estimate the
mean observation, or mean potential observable, for each cell
in the k-way layout. Equivalently, the problem is to estimate
an unknown regression function that depends on k covariates.
This talk unifies a mosaic of recent methodologies for estimating
means in the general k-way layout or k-covariate regression problem.
Included are penalized least squares with multiple quadratic penalties,
associated Bayes estimators, associated submodel fits, multiple
Stein shrinkage, and functional data analysis as a limit scenario.
The focus is on the choice of tuning parameters to minimize estimated
quadratic risk under a minimally restrictive data model; and on
the asymptotic risk of the chosen estimator as the number of observed
cells in the k-way layout tends to infinity.
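For orientation, a minimal sketch (generic notation, not taken from the talk) of penalized least squares with multiple quadratic penalties is

\[
\hat{\mu}(\lambda_1,\ldots,\lambda_k) \;=\; \arg\min_{\mu}\; \|y - \mu\|^2 \;+\; \sum_{j=1}^{k} \lambda_j\, \mu^{\top} Q_j\, \mu ,
\]

where each $Q_j$ is a positive semidefinite matrix penalizing roughness across the levels of one factor, and the tuning parameters $\lambda_j \ge 0$ are the quantities the talk proposes to choose by minimizing estimated quadratic risk.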
========================
Jiahua Chen, University of British Columbia
Advances in EM-test for Finite Mixture Models
Making valid and effective inferences for finite mixture models
is known to be technically challenging. Due to the non-regularity,
the likelihood ratio test was found to diverge to infinity if
the parameter space is not artificially confined to a compact
space. Even under a compactness assumption, the limiting distribution
is often a function of the supremum of some Gaussian processes.
Recently, many new tests have been proposed to address this problem.
The EM-test has been found superior in many respects. For many
classes of finite mixture models, we have tailor-designed EM-tests
that have easy-to-use limiting distributions. Simulations indicate
that the limiting distributions approximate the finite-sample
distributions well in the examples investigated.
A general procedure for choosing the tuning parameter has also
been developed.
========================
Xihong Lin, Harvard University
Hypothesis Testing and Variable Selection for Studying
Rare Variants in Sequencing Association Studies
Sequencing studies are increasingly being conducted to
identify rare variants associated with complex traits. The limited
power of classical single marker association analysis for rare
variants poses a central challenge in such studies. We propose
the sequence kernel association test (SKAT), a supervised, flexible,
computationally efficient regression method to test for association
between genetic variants (common and rare) in a region and a continuous
or dichotomous trait, while easily adjusting for covariates. As
a score-based variance component test, SKAT can quickly calculate
p-values analytically by fitting the null model containing only
the covariates, and so can easily be applied to genome-wide data.
Using SKAT to analyze a genome-wide sequencing study of 1000 individuals,
by segmenting the whole genome into 30kb regions, requires only
7 hours on a laptop. Through analysis of simulated data across
a wide range of practical scenarios and triglyceride data from
the Dallas Heart Study, we show that SKAT can substantially outperform
several alternative rare-variant association tests. We also provide
analytic power and sample size calculations to help design candidate
gene, whole exome, and whole genome sequence association studies.
We also discuss variable selection methods to select causal variants.
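As a rough illustration of the flavor of such a score-based variance component statistic for a continuous trait, the sketch below (not the authors' implementation; the weighted linear kernel, the ordinary least squares null fit, and all variable names are illustrative assumptions) computes $Q = (y - \hat{\mu}_0)^{\top} K (y - \hat{\mu}_0)$ with $K$ built from weighted genotypes:

    import numpy as np

    def skat_style_score_statistic(y, X, G, weights):
        """Kernel (variance-component) score statistic for a continuous trait.

        y: (n,) trait; X: (n, q) covariates including an intercept;
        G: (n, m) genotypes in a region; weights: (m,) per-variant weights.
        Sketch only: the null model is ordinary least squares on X, and
        Q = (y - mu0)' K (y - mu0) with the weighted linear kernel K = G W W G'.
        """
        beta0, *_ = np.linalg.lstsq(X, y, rcond=None)   # null model: covariates only
        resid = y - X @ beta0
        Gw = G * weights                                # apply weights variant-wise
        s = Gw.T @ resid
        return float(s @ s)                             # Q = resid' G W W G' resid

    # Illustrative use with simulated data (the p-value step is omitted: it
    # needs the eigenvalues of the projected kernel, as in the analytic
    # calculation mentioned in the abstract).
    rng = np.random.default_rng(0)
    n, q, m = 200, 3, 25
    X = np.column_stack([np.ones(n), rng.normal(size=(n, q - 1))])
    G = rng.binomial(2, 0.05, size=(n, m)).astype(float)
    y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)
    print(skat_style_score_statistic(y, X, G, weights=np.ones(m)))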
========================
Ejaz Ahmed, University of Windsor
System/Machine Bias versus Human Bias: Generalized Linear
Models
Penalized and shrinkage regression methods have been widely used in
high-dimensional data analysis. Much of the recent work has focused
on penalized least squares methods in linear models. In this talk,
I consider estimation in generalized linear models when there
are many potential predictor variables and some of them may not
have influence on the response of interest. In the context of
two competing models where one model includes all predictors and
the other restricts variable coefficients to a candidate linear
subspace based on prior knowledge, we investigate the relative
performance of the absolute penalty estimator (APE) and shrinkage
estimators in the direction of the subspace. We develop a large-sample
asymptotic analysis for the shrinkage estimators. The asymptotics
and a Monte Carlo simulation study show that the shrinkage estimator
performs better than benchmark estimators. Further, it performs
better than the APE when the dimension of the restricted parameter
space is large. The estimation strategies considered in this talk
are also applied to a real-life data set for illustrative purposes.
========================
Pierre Alquier, Université Paris 7 and CREST
Bayesian estimators in high dimension: PAC bounds and Monte
Carlo methods
Coauthors: Karim Lounici (Georgia Institute of Technology), Gérard
Biau (Université Paris 6)
The problem of sparse estimation in high dimensions has received a
lot of attention in the last ten years. However, finding an estimator
with both satisfactory statistical and computational properties
is still an open problem. For example, the LASSO can be computed
efficiently, but its statistical properties require strong assumptions
on the observations. On the other hand, BIC does not require such
hypotheses but cannot be computed efficiently in very high dimensions.
We propose here the so-called PAC-Bayesian method (McAllester
1998, Catoni 2004, Dalalyan and Tsybakov 2008) as an alternative
approach. We build a Bayesian estimator that satisfies a tight
PAC bound, and compute it using reversible jump Markov Chain Monte
Carlo methods. A first version, proposed in a joint work with
Karim Lounici, deals with the linear regression problem while
the work with Gérard Biau extends these results to the
single index model.
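For background (not from the talk itself), PAC-Bayesian estimators of the Catoni / Dalalyan-Tsybakov type are typically built from an exponential-weights pseudo-posterior,

\[
\hat{\rho}_{\lambda}(d\theta) \;\propto\; \exp\{-\lambda\, r_n(\theta)\}\, \pi(d\theta),
\qquad \hat{\theta} \;=\; \int \theta \,\hat{\rho}_{\lambda}(d\theta),
\]

where $r_n$ is the empirical risk, $\lambda > 0$ a temperature parameter, and $\pi$ a sparsity-favoring prior; reversible jump MCMC is a natural way to sample from $\hat{\rho}_{\lambda}$ when $\pi$ charges supports of different sizes.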
========================
Shojaeddin Chenouri, University of Waterloo
Coauthors: Sam Behseta (California State University, Fullerton)
Comparison of Two Populations of Curves with an Application
in Neuronal Data Analysis
Often in neurophysiological studies, scientists are interested
in testing hypotheses regarding the equality of the overall intensity
functions of a group of neurons when recorded under two different
experimental conditions. In this talk, we consider such a hypothesis
testing problem. We propose two test statistics: a parametric
test based on the Hotelling's $T^2$ statistic, as well as a nonparametric
one based on the spatial signed-rank test statistic of M\"{o}tt\"{o}nen
and Oja (1995). We implement these tests on smooth curves obtained
via fitting Bayesian Adaptive Regression Splines (BARS) to the
intensity functions of neuronal Peri-Stimulus Time Histograms
(PSTH).
Through simulation, we show that the powers of our proposed tests
are extremely high even when the number of sampled neurons and
the number of trials per neuron are small. Finally, we apply our
methods to a group of motor cortex neurons recorded during a reaching
task.
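For reference (illustrative notation, not taken from the talk), when the smoothed curves are summarized as $p$-dimensional vectors with group means $\bar X$ and $\bar Y$, sample sizes $n_1$ and $n_2$, and pooled covariance $S_p$, the two-sample Hotelling statistic is

\[
T^2 \;=\; \frac{n_1 n_2}{n_1 + n_2}\, (\bar X - \bar Y)^{\top} S_p^{-1} (\bar X - \bar Y);
\]

the nonparametric test replaces the mean difference by the spatial signed-rank statistic of M\"{o}tt\"{o}nen and Oja (1995).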
========================
Kjell Doksum, University of Wisconsin-Madison
Coauthors: Fan Yang, Kam Tsui
Biomedical large scale inference
I will describe methods used by population and medical geneticists
to analyse associations between disease and genetic markers. These
methods are able to handle data with hundreds of thousands of variables
by using dual principal component analysis. I will compare these
methods to frequentist and Bayesian methods from the field of
statistics.
This is joint work with Fan Yang and Kam Tsui
========================
Yang Feng, Columbia University
Coauthors: Tengfei Li, Wen Yu, Zhiliang Ying, Hong Zhang
Loss Adaptive Modified Penalty in Variable Selection
For variable selection, balancing sparsity and stability is a
very important task. In this work, we propose the Loss Adaptive
Modified Penalty (LAMP) where the penalty function is adaptively
changed with the type of the loss function. For generalized linear
models, we provide a unified form of the penalty corresponding
to the specific exponential family. We show that LAMP can have
asymptotic stability while achieving oracle properties. In addition,
LAMP could be seen as a special functional of a conjugate prior.
An efficient coordinate-descent algorithm is proposed and a balancing
method is introduced. Simulation results show that LAMP has competitive
performance compared with several well-known penalties.
========================
D. A. S. Fraser, University of Toronto
High-Dimensional: The Barrier and Bayes and Bias
We all aspire to breach the barrier and we do; and yet it
always reforms as more formidable. In the context of a statistical
model and data, two familiar approaches are: slicing, which
uses only a data slice of the model, namely the likelihood function,
perhaps with a calibrating weight function or prior; and bridging,
which uses derivatives at infinity to cantilever back over the
barrier to first, second, and third order. Both have had remarkable
successes and both involve risks that can be serious.
We all have had confrontations with the boundary, and I'll start
with a comment on my first impact. The slicing I refer to is the
use of the data slice, the likelihood function, as the sole or
primary model summary. This can be examined in units data-standardized
and free from model curvature, and the related gradient of the
log-prior then gives a primary calibration of the prior; the initiative
in this direction is due to Welch and Peers (1963) but its prescience
was largely overlooked. The bridging is the Taylor expansion about
infinity with analysis from asymptotics. From these we obtain
an order of magnitude calibration of the effect of a prior on
the basic slice information; this leads to the direction and the
magnitude of the bias that derives from the use of a prior to
do a statistical analysis.
========================
Xin Gao, York University
Coauthors: Peter Song, Yuehua Wu
Model selection for high-dimensional data with applications
in feature selection and network building
For high-dimensional data sets with complicated dependency
structures, the full likelihood approach often leads to intractable
computational complexity. This imposes difficulty on model selection
as most of the traditionally used information criteria require
the evaluation of the full likelihood. We propose a composite
likelihood version of the Bayesian information criterion (BIC)
and establish its consistency property for the selection of
the true underlying marginal model. Under some mild regularity
conditions, the proposed BIC is shown to be selection consistent,
where the number of potential model parameters is allowed to
increase to infinity at a certain rate relative to the sample size. In
this talk, we will also discuss the result that using a modified
Bayesian information criterion (BIC) to select the tuning parameter
in penalized likelihood estimation of Gaussian graphical models
can lead to consistent network model selection even when $P$
increases with $N$, as long as all the network edges are contained
in a bounded subset.
========================
Xiaoli Gao, Oakland University
LAD Fused Lasso Signal Approximation
The fused lasso penalty is commonly used in signal processing
when the hidden true signals are sparse and blocky. The $\ell_1$
loss has robustness properties when the additive noise is
contaminated by outliers. In this manuscript, we study the asymptotic
properties of an LAD-fused-lasso model used as a signal approximation
(LAD-FLSA). We first investigate the estimation consistency
properties of an LAD-FLSA estimator. Then we provide some conditions
under which an LAD-FLSA estimator can be both block selection
consistent and sign consistent. We also provide an unbiased
estimate for the generalized degrees of freedom (GDF) of the
LAD-FLSA modeling procedure for any given tuning parameters.
The effect of the unbiased estimate is demonstrated using simulation
studies.
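For orientation (notation illustrative, not taken from the talk), an LAD fused lasso signal approximator for an observed signal $y_1,\ldots,y_n$ solves

\[
\hat{\beta} \;=\; \arg\min_{\beta \in \mathbb{R}^n}\;
\sum_{i=1}^{n} |y_i - \beta_i|
\;+\; \lambda_1 \sum_{i=1}^{n} |\beta_i|
\;+\; \lambda_2 \sum_{i=2}^{n} |\beta_i - \beta_{i-1}|,
\]

so the $\ell_1$ loss protects against outliers while the two penalties, with tuning parameters $\lambda_1, \lambda_2 \ge 0$, induce sparsity and blockiness respectively.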
========================
Yulia Gel, University of Waterloo
Coauthors: Peter Bickel, University of California, Berkeley
Banded regularization of autocovariance matrices in application
to parameter estimation and forecasting of time series
This talk addresses a "large p-small n" problem
in a time series framework and considers properties of banded
regularization of an empirical autocovariance matrix of a time
series process. Utilizing the banded autocovariance matrix enables
us to fit a much longer model to the observed data than typically
suggested by AIC, while controlling how many parameters are to
be estimated precisely and the level of accuracy. We present results
on asymptotic consistency of banded autocovariance matrices under
the Frobenius norm and provide a theoretical justification for optimal
band selection using cross-validation. Remarkably, the cross-validation
loss function for banded prediction is related to the conditional
mean square prediction error (MSPE) and, thus, may be viewed as
an alternative model selection criterion. The proposed procedure
is illustrated by simulations and application to predicting sea
surface temperature (SST) index in the Nino 3.4 region.
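A minimal sketch of the banding idea follows (not the authors' code; the lag-window estimator and the fixed band below are illustrative, and the cross-validated band selection discussed in the talk is not implemented):

    import numpy as np

    def banded_autocovariance(x, band):
        """Toeplitz sample autocovariance matrix of a series, banded at lag `band`.

        Sketch only: estimate gamma(0), ..., gamma(n-1), build the Toeplitz
        matrix, and zero every entry whose lag exceeds the band.
        """
        x = np.asarray(x, dtype=float)
        n = len(x)
        xc = x - x.mean()
        gamma = np.array([xc[:n - h] @ xc[h:] / n for h in range(n)])
        lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
        sigma = gamma[lags]               # full (unregularized) Toeplitz estimate
        sigma[lags > band] = 0.0          # banding: keep only |i - j| <= band
        return sigma

    # Illustrative use on a simulated AR(1) series.
    rng = np.random.default_rng(1)
    x = np.zeros(300)
    for t in range(1, 300):
        x[t] = 0.6 * x[t - 1] + rng.normal()
    print(banded_autocovariance(x, band=5)[:4, :4])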
========================
Jiashun Jin, Carnegie Mellon University
Coauthors: Pengsheng Ji
UPS delivers optimal phase diagram in high dimensional
variable selection
We consider a linear regression model where both $p$ and $n$ are
large but $p > n$. The vector of coefficients is unknown but
is sparse in the sense that only a small proportion of its coordinates
is nonzero, and we are interested in identifying these nonzero
ones. We propose a two-stage variable selection procedure which
we call the {\it UPS}. This is a Screen and Clean procedure, in
which we screen with the Univariate thresholding, and clean with
the Penalized MLE.
In many situations, the UPS possesses two important properties:
Sure Screening and Separable After Screening (SAS). These properties
enable us to reduce the original regression problem to many small-size
regression problems that can be fitted separately. As a result,
the UPS is effective both in theory and in computation. The lasso
and the subset selection are well-known approaches to variable
selection. However, somewhat surprisingly, there are regions where
neither the lasso nor the subset selection is rate optimal, even
for very simple design matrices. The lasso is non-optimal because
it is too loose in filtering out fake signals (i.e., noise that
is highly correlated with a signal), and the subset selection
is non-optimal because it tends to kill one or more signals in
correlated pairs, triplets, etc.
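A caricature of a Screen-and-Clean procedure in this spirit is sketched below; the lasso clean step is only a stand-in for the penalized MLE of the UPS, and the threshold and penalty level are illustrative placeholders:

    import numpy as np
    from sklearn.linear_model import Lasso

    def screen_and_clean(X, y, threshold, alpha):
        """Caricature of a Screen-and-Clean procedure.

        Screen: keep variables whose standardized marginal regression
        coefficient exceeds `threshold` in absolute value (univariate
        thresholding).  Clean: refit the survivors with an L1-penalized
        regression, here a stand-in for the penalized MLE of the UPS.
        """
        n, p = X.shape
        Xs = (X - X.mean(axis=0)) / X.std(axis=0)
        marginal = Xs.T @ (y - y.mean()) / n                  # univariate statistics
        survivors = np.flatnonzero(np.abs(marginal) > threshold)
        clean = Lasso(alpha=alpha).fit(X[:, survivors], y)
        return survivors[np.abs(clean.coef_) > 1e-8]

    # Illustrative use: p > n with 5 true signals.
    rng = np.random.default_rng(2)
    n, p = 100, 500
    X = rng.normal(size=(n, p))
    beta = np.zeros(p)
    beta[:5] = 2.0
    y = X @ beta + rng.normal(size=n)
    print(sorted(screen_and_clean(X, y, threshold=1.0, alpha=0.1)))

The screening step reduces the $p$-dimensional problem to the survivors, so the cleaning fit remains cheap even when $p \gg n$.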
========================
Timothy D. Johnson, University of Michigan, Department of Biostatistics
Computational Speedup in Spatial Bayesian Image Modeling
via GPU Computing
Spatial modeling is a computationally complex endeavor due to
the spatial correlation structure in the data that must be taken
into account in the modeling. This endeavor is even more computationally
complex for 3D data/images---curse of dimensionality---and within
the Bayesian framework due to posterior distributions that are
not analytically tractable and thus must be approximated via
MCMC simulation. For point reference data, dimension reduction
techniques, such as Gaussian predictive process models, have
alleviated some of the computational burden; however, for image
data and point pattern data, these dimension reduction techniques
may not be applicable. Two examples are a population-level fMRI
hierarchical model where image correlation is accounted for
in the weights of a finite mixture model and a log-Gaussian
Cox process model of lesion location in patients with Multiple
Sclerosis. Both of these models are extremely computationally
intense due to the complex nature of the likelihoods and the
size of the 3D images. However, both likelihoods are amenable
to parallelization. Although the MCMC simulation cannot be parallelized,
by small, rather straightforward changes to the code and porting
the likelihood computation to a graphics processing unit (GPU),
I have achieved an increase of over two orders of magnitude in
computational efficiency in these two problems.
========================
Abbas Khalili, McGill University
Coauthors: Shili Lin; Dept. of Statistics, The Ohio State University
Regularization in finite mixture of regression models with
diverging number of parameters
Feature (variable) selection has become a fundamentally
important problem in recent statistical
literature. Often, in applications many variables are introduced
to reduce possible modeling biases.
The number of introduced variables thus depends on the sample
size, which reflects the estimability
of the parametric model. In this paper, we consider the problem
of feature selection in finite mixture of
regression models when the number of parameters in the model can
increase with the sample size.
We propose a penalized likelihood approach for feature selection
in these models. Under certain
regularity conditions, our approach leads to consistent variable
selection. We carry out a simulation
study to evaluate the performance of the proposed approach under
controlled settings. A real data set on
Parkinson's disease is also analyzed. The data concerns whether
dysphonic features extracted from
the patients' speech signals recorded at home can be used as surrogates
to study PD severity and
progression. Our analysis of the PD data yields interpretable
results that can be of important clinical value.
The stratification of dysphonic features for patients with mild
and severe symptoms leads to novel insights
beyond the current literature.
========================
Peter Kim, University of Guelph
Testing Quantum States for Purity
The simplest states of finite quantum systems are the pure states.
This paper is motivated by the need to test whether or not a given
state is pure. Because the pure states lie in the boundary of
the set of all states, the usual regularity conditions that justify
the standard large-sample approximations to the null distributions
of the deviance and the score statistic are not satisfied. For
a large class of quantum experiments that produce Poisson count
data, this paper uses an enlargement of the parameter space of
all states to develop likelihood ratio and score tests of purity.
The asymptotic null distributions of the corresponding statistics
are chi-squared. The tests are illustrated by the analysis of
some quantum experiments involving unitarily correctable codes.
========================
Samuel Kou, Harvard University
Coauthors: Benjamin Olding
Multi-resolution inference of stochastic models from partially
observed data
Stochastic models, diffusion models in particular, are widely
used in science, engineering and economics. Inferring the parameter
values from data is often complicated by the fact that the underlying
stochastic processes are only partially observed. Examples include
inference of discretely observed diffusion processes, stochastic
volatility models, and double stochastic Poisson (Cox) processes.
Likelihood-based inference faces the difficulty that the likelihood
is usually not available, even numerically. The conventional approach
discretizes the stochastic model to approximate the likelihood.
To achieve desirable accuracy, one has to use a highly dense
discretization. However, dense discretization usually imposes
an unbearable computational burden. In this talk we will introduce
the framework of Bayesian multi-resolution inference to address
this difficulty. By working on different resolution (discretization)
levels simultaneously and by letting the resolutions talk to each
other, we substantially improve not only the computational efficiency,
but also the estimation accuracy. We will illustrate the strength
of the multi-resolution approach by examples.
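For concreteness (standard notation, not taken from the talk), the discretization referred to above is typically an Euler-type scheme: for a diffusion $dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dW_t$ and step size $\Delta$,

\[
X_{t+\Delta} \;\approx\; X_t + \mu(X_t)\,\Delta + \sigma(X_t)\,\sqrt{\Delta}\;\epsilon_t,
\qquad \epsilon_t \sim N(0,1),
\]

whose approximation error vanishes only as $\Delta \to 0$, which is why accuracy demands a dense, and hence costly, discretization.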
========================
Hua Liang, University of Rochester Medical Center
Coauthors: Hansheng Wang and Chih-Ling Tsai
Profiled Forward Regression for Ultrahigh Dimensional Variable
Screening in Semiparametric Partially Linear Models
For partially linear models, we develop a profiled forward
regression (PFR) algorithm for ultrahigh dimensional variable
screening. The PFR algorithm effectively combines the ideas of
nonparametric profiling and forward regression. This allows us
to obtain a uniform bound for the absolute difference between
the profiled and original predictors. Based on this important
finding, we are able to show that the PFR algorithm discovers
all relevant variables within a few fairly short steps. Numerical
studies are presented to illustrate the performance of the proposed
method.
========================
Yufeng Liu, University of North Carolina at Chapel Hill
Coauthors: Helen Hao Zhang (NCSU) and Guang Cheng (Purdue)
Automatic Structure Selection for Partially Linear Models
Partially linear models provide good compromises between linear
and nonparametric models. However, given a large number of covariates,
it is often difficult to objectively decide which covariates are
linear and which are nonlinear. Common approaches include hypothesis
testing methods and screening procedures based on univariate scatter
plots. These methods are useful in practice; however, testing
the linearity of multiple functions for large dimensional data
is both theoretically and practically challenging, and visual
screening methods are often ad hoc. In this work, we tackle this
structure selection problem in partially linear models from the
perspective of model selection. A unified estimation and selection
framework is proposed and studied. The new estimator can automatically
determine the linearity or nonlinearity for all covariates and
at the same time consistently estimate the underlying regression
functions. Both theoretical and numerical properties of the resulting
estimators are presented.
========================
Jinchi Lv, University of Southern California
Non-Concave Penalized Likelihood with NP-Dimensionality
Coauthors: Jianqing Fan (Princeton University)
Penalized likelihood methods are fundamental to ultra-high dimensional
variable selection. How high a dimensionality such methods can handle
remains largely unknown. In this paper, we show that in the context
of generalized linear models, such methods possess model selection
consistency with oracle properties even for dimensionality of
Non-Polynomial (NP) order of sample size, for a class of penalized
likelihood approaches using folded-concave penalty functions,
which were introduced to ameliorate the bias problems of convex
penalty functions. This fills a long-standing gap in the literature
where the dimensionality is allowed to grow slowly with the sample
size. Our results are also applicable to penalized likelihood
with the L1-penalty, which is a convex function at the boundary
of the class of folded-concave penalty functions under consideration.
Coordinate optimization is implemented to find the solution paths;
its performance is evaluated by a few simulation examples and a
real data analysis.
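For reference (generic notation, not taken from the talk), a folded-concave penalized likelihood estimator in a generalized linear model with log-likelihood $\ell_n(\beta)$ is

\[
\hat{\beta} \;=\; \arg\max_{\beta}\; \Big\{ \ell_n(\beta) \;-\; n \sum_{j=1}^{p} p_{\lambda}(|\beta_j|) \Big\},
\]

where $p_{\lambda}$ is concave on $[0,\infty)$ (SCAD and MCP are standard examples), which reduces the bias that the convex $L_1$ penalty places on large coefficients; the $L_1$ penalty itself sits at the convex boundary of this class.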
========================
Bin Nan, University of Michigan
Coauthors: Xuejing Wang, Ji Zhu, Robert Koeppe
Sparse 3D Functional Regression via Haar Wavelets
PET imaging has great potential to aid diagnosis of neurodegenerative
diseases, such as Alzheimer's disease or mild cognitive impairment.
Commonly used region-of-interest analysis loses detailed voxel-level
information. Here we propose a three-dimensional functional linear
regression model, treating the PET images as three-dimensional
functional covariates. Both the images and the functional regression
coefficient are expanded using the same set of Haar wavelet basis
functions. The functional
regression model is then reduced to a linear regression model.
We find that sparsity of the original functional regression coefficient
can be achieved through sparsity of the regression coefficients
in the reduced model after the wavelet transformation. The lasso
procedure can then be implemented with the level of the Haar wavelet
expansion as an additional tuning parameter.
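In schematic form (illustrative notation, not taken from the talk), with images $X_i(v)$ and coefficient function $\beta(v)$ expanded in a common orthonormal Haar basis $\{\phi_j\}$,

\[
y_i = \int X_i(v)\,\beta(v)\,dv + \varepsilon_i,
\qquad X_i(v) = \sum_j c_{ij}\,\phi_j(v),
\quad \beta(v) = \sum_j b_j\,\phi_j(v),
\]

orthonormality gives $y_i = \sum_j c_{ij} b_j + \varepsilon_i$, a linear regression in the wavelet coefficients to which the lasso can be applied, with the expansion level acting as a further tuning parameter.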
========================
Annie Qu, Department of Statistics, University of Illinois at Urbana-Champaign
Coauthors: Peng Wang, University of Illinois at Urbana-Champaign;
Guei-feng Tsai, Center for Drug Evaluation of Taiwan
Conditional Inference Functions for Mixed-Effects Models
with Unspecified Random-Effects Distribution
In longitudinal studies, mixed-effects models are important for
addressing subject-specific effects. However, most existing approaches
assume a normal distribution for the random effects, and this
could affect the bias and efficiency of the fixed-effects estimator.
Even in cases where the estimation of the fixed effects is robust
with a misspecified distribution of the random effects, the estimation
of the random effects could be invalid. We propose a new approach
to estimate fixed and random effects using conditional quadratic
inference functions. The new approach does not require the specification
of likelihood functions or a normality assumption for random effects.
It can also accommodate serial correlation between observations
within the same cluster, in addition to mixed-effects modeling.
Other advantages include not requiring the estimation of the unknown
variance components associated with the random effects, or the
nuisance parameters associated with the working correlations.
Real data examples and simulations are used to compare the new
approach with the penalized quasi-likelihood approach, and SAS
GLIMMIX and nonlinear mixed effects model (NLMIXED) procedures.
========================
Sunil Rao, University of Miami, Division of Biostatistics
Coauthors: Hemant Ishwaran, Cleveland Clinic
Mixing Generalized Ridge Regressions
Hoerl and Kennard proposed generalized ridge regression (GRR)
almost forty years ago as a means to overcome the deficiency of
least squares in multicollinear problems. Because high-dimensional
regression problems naturally involve correlated predictors, in
part due to the nature of the data and in part as an artifact
of the dimensionality, it is reasonable to consider GRR for addressing
these problems. We study GRR in problems in which the number of
predictors exceeds the sample size. We describe a novel geometric
interpretation of GRR in terms of a uniquely defined least squares
estimator. However, the GRR estimator is constrained to lie in a
low-dimensional subspace, which limits its effectiveness. To overcome this, we
introduce a mixing GRR procedure using easily constructed exponential
weights and establish a finite sample minimax bound for this procedure.
The bound involves a dimensionality effect that poses a problem
in ultra-high dimensions, which we address by using a mixing GRR
to filter variables. We study the performance of this procedure
as well as a hybrid method using a range of examples.
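For reference (one common parameterization, not necessarily the authors' notation), generalized ridge regression assigns each predictor its own ridge parameter:

\[
\hat{\beta}_{\mathrm{GRR}} \;=\; (X^{\top}X + \Lambda)^{-1} X^{\top} y,
\qquad \Lambda = \operatorname{diag}(\lambda_1,\ldots,\lambda_p),\ \lambda_j \ge 0,
\]

reducing to ordinary ridge regression when all $\lambda_j$ coincide; the mixing procedure of the talk aggregates such estimators using exponential weights.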
========================
Enayetur Raheem, University of Windsor/Windsor-Essex County
Health Unit
Coauthors: Kjell Doksum, S. E. Ahmed
Absolute Penalty and B-spline-based Shrinkage Estimation
in Partially Linear Models
In the context of a partially linear regression model (PLM),
we utilize shrinkage and absolute penalty estimation techniques
for simultaneous model selection and parameter estimation. Ahmed
et al. (2007), in a similar setup, considered a kernel-based estimate
of the nonparametric component, while B-splines are considered in
our setup. We develop shrinkage semiparametric estimators that
improve upon the classical estimators when there are nuisance
covariates present in the model. In comparing two models, with
and without the nuisance covariates, the shrinkage estimators
take an adaptive approach in a way that the information contained
in the nuisance variable is utilized if it is tested to be useful
for overall fit of the model. Bias expressions and risk properties
of the estimators are obtained. Application of the proposed methods
to a real data set is provided.
Since the B-spline can be incorporated in a regression model
easily, we attempted to numerically compare the performance of
our proposed method with the lasso. While both shrinkage and lasso
outperform classical estimators, shrinkage estimators perform
better than lasso in terms of prediction errors when there are
many nuisance variables and the sample size is moderately large.
========================
Xiaotong Shen, School of Statistics, University of Minnesota
Coauthors: Hsin-Cheng Huang
On simultaneous supervised clustering and feature selection
In network analysis, genes are known to work in groups according to
their biological functionality, and distinct groups reveal
different gene functionalities. In such a situation, identifying
grouping structures as well as informative genes becomes critical
in understanding the progression of a disease. Motivated by gene
network analysis, we investigate, in a regression context, simultaneous
supervised clustering and feature selection over an arbitrary
undirected graph, where each predictor corresponds to one node
in the graph and existence of a connecting path between two nodes
indicates possible grouping between the two predictors. In this
talk, I will discuss methods for simultaneous supervised clustering
and feature selection over a graph, and argue that supervised
clustering and feature selection are complementary for identifying
a simpler model with higher predictive performance. Numerical
examples will be given in addition to theory.
========================
Christopher G. Small, University of Waterloo
Multivariate analysis of data in curved shape spaces
We consider some statistical methods for the analysis of images
and objects whose shapes are encoded as points in Kendall shape
spaces. Standard multivariate methods, applicable to data in Euclidean
spaces, do not directly apply to such contexts. The talk highlights
the necessity for methods which respect the essentially non-Euclidean
nature of shape spaces. An application to data from anthropology
will be given.
========================
Hao Helen Zhang, North Carolina State University
Coauthors: Wenbin Lu and Hansheng Wang
On Sparse Estimation for Semiparametric Linear Transformation
Models
Semiparametric linear transformation models have received
much attention due to their high flexibility in modeling survival
data. A useful estimating equation procedure was recently proposed
by Chen et al. (2002) for linear transformation models to jointly
estimate parametric and nonparametric terms. They showed that
this procedure can yield a consistent and robust estimator. However,
the problem of variable selection for linear transformation models
is less studied, partially because a convenient loss function
is not readily available in this context. We propose a simple
yet powerful approach to achieve both sparse and consistent estimation
for linear transformation models. The main idea is to derive a
profiled score from the estimating equation of Chen et al. (2002),
construct a loss function based on the profiled score and its
variance, and then minimize the loss subject to some shrinkage
penalty. We show that the resulting estimator is consistent for
both model estimation and variable selection. Furthermore, the
estimated parametric terms are asymptotically normal and can achieve
higher efficiency than that yielded by the estimating equations.
We suggest a one-step approximation algorithm which can take advantage
of the LARS path algorithm. Performance of the new procedure is
illustrated through numerous simulations and real examples including
a microarray data set.
========================
Hongtu Zhu, Department of Biostatistics and Biomedical Research
Imaging Center, UNC-Chapel Hill
Smoothing Imaging Data in Population Studies
Coauthors: Yimei Li, Yuan Ying, Runze Li, Steven Marron, Ja-an
Lin, Jianqing Fan, John H. Gilmore, Martin Styner, Dinggang Shen,
Weili Lin
Motivated by recent work studying massive imaging data in large
neuroimaging studies, we propose various multiscale adaptive smoothing
models (MARM) for spatially modeling the relation between high-dimensional
imaging measures on a three-dimensional (3D) volume or a two-dimensional
(2D) surface and a set of covariates. Statistically, MARM can be regarded
as a novel generalization of functional principal component analysis
(fPCA) and varying coefficient models (VCM) in higher dimensional
space compared to the standard fPCA and VCM. We develop novel
estimation procedures for MARMs and systematically study their
theoretical properties. We conduct Monte Carlo simulation and
real data analyses to examine the finite-sample performance of
the proposed procedures.
========================
S. Ejaz Ahmed and Saber Fallahpour, Department of Mathematics
and Statistics, University of Windsor
L1 Penalty and Shrinkage Estimation in Partially Linear
Models with Random Coefficient Autoregressive Errors
In partially linear models (PLM) we consider methodology
for simultaneous model selection and parameter estimation
with random coefficient autoregressive errors using lasso
and shrinkage strategies. The current work is an extension
of Ahmed et al. (2007), who considered a PLM with random
errors. We provide natural adaptive estimators that significantly
improve upon the classical procedures in the situation where
some of the predictors are nuisance variables that may or
may not affect the association between the response and the
main predictors. In the context of two competing partially
linear regression models (full and sub-models), we consider
an adaptive shrinkage estimation strategy. We develop the
properties of these estimators using the notion of asymptotic
distributional risk. The shrinkage estimators (SE) are shown
to have a higher efficiency than the classical
estimators for a wide class of models. For the lasso-type
estimation strategy, we devise efficient algorithms to obtain
numerical results. We compare the relative performance of
lasso with the shrinkage and other estimators. Monte Carlo
simulation experiments are conducted for various combinations
of the nuisance
parameters and sample size, and the performance of each method
is evaluated in terms of simulated mean squared error. The
comparison reveals that the lasso and shrinkage strategies
outperform the classical procedure. The SE performs better
than the lasso strategy in the effective part of the parameter
space when, and only when, there are many nuisance variables
in the model. A data example is showcased to illustrate the
usefulness of suggested methods.
Reference:
Ahmed, S. E., Doksum, K. A., Hossain, S. and You,
J. (2007). Shrinkage, pretest and absolute penalty estimators
in partially linear models. Australian & New Zealand Journal of Statistics, 49, 435-454.
========================
Billy Chang, Ph.D. Candidate (Biostatistics), Dalla Lana School
of Public Health, University of Toronto
Authors: Billy Chang and Rafal Kustra
Regularization for Nonlinear Dimension
Reduction by Subspace Constraint
Sparked by the introduction of Isomap and Locally-Linear-Embedding
in 2000, nonlinear approaches to dimension reduction have
received unprecedented attention during the past decade. Although
the flexibility of such methods has provided scientists with powerful
tools for feature extraction and visualization, their applications
have focused mainly on large-sample, low-noise settings. In
small-sample, high-noise settings, model regularization is necessary
to avoid over-fitting. Yet, over-fitting issues for nonlinear
dimension reduction have not been widely explored, even for
earlier methods such as kernel PCA and multi-dimensional scaling.
Regularization for nonlinear dimension reduction is a non-trivial
task; while an overly-complex model will over-fit, an overly-simple
model cannot detect highly nonlinear signals. To overcome this
problem, I propose performing nonlinear dimension reduction
within a lower-dimensional subspace. As such, one can increase
the model complexity for nonlinear pattern search, while over-fitting
is avoided as the model is not allowed to traverse through all
possible dimensions. The crux of the problem lies in finding
the subspace containing the nonlinear signal, and I will discuss
a Kernel PCA approach for the subspace search, and a principal
curve approach for nonlinear basis construction.
========================
Abdulkadir Hussein, Ejaz Ahmed and Marwan Al-Momani, University of Windsor
To homogenize or not to homogenize: The case of linear
mixed models
The problem of whether a given data set supports heterogeneous
or homogeneous models has a long history and perhaps its major
manifestation is in the form of generalized linear mixed models.
By heterogeneous models we mean models where diversity among
possible subpopulations is accommodated by using variance
components. Among other areas, this problem arises in economics,
finance, and biostatistics under various names such as panel,
longitudinal, or cluster-correlated data. Homogeneity is a
desired property, while heterogeneity is often a fact of life.
In order to reconcile these two types of models and seek unity
in diversity, we propose and explore several shrinkage-type
estimators for regression coefficient parameters as well as
for the variance components. We examine the merits of the
different estimators by using asymptotic risk assessment measures
and by using Monte Carlo simulations. We apply the proposed
methods to income panel data.
========================
Azadeh Shohoudi, McGill University
Variable Selection in Multipath Change-point
Problems
Follow-up studies are frequently carried out to
study the evolution of one or several measurements taken on subjects
through time. When a stimulus is administered to subjects, it
is of interest to study the reaction times, or change-points. One
may want to select the covariates that accelerate reaction to
the stimulus. Selecting effective covariates in this setting poses
a challenge when the number of covariates is large. We develop
such a methodology and study the large-sample behavior of the method.
Small-sample behavior is studied by means of simulation. The
method is applied to a Parkinson's disease data set.
========================
Xin Tong, Princeton University
Coauthors: Philippe Rigollet (Princeton University)
Neyman-Pearson classification, convexity and stochastic
constraints
Motivated by problems of anomaly detection, this paper implements
the Neyman-Pearson paradigm to deal with asymmetric errors in
binary classification with a convex loss. Given a finite collection
of classifiers, we combine them and obtain a new classifier
that simultaneously satisfies the following two properties with
high probability: (i) its probability of type~I error is below
a pre-specified level, and (ii) its probability of type~II
error is close to the minimum possible. The proposed classifier
is obtained by solving an optimization problem with an empirical
objective and an empirical constraint. New techniques to handle
such problems are developed and have consequences for chance-constrained
programming.
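In schematic form (standard notation, not taken from the talk), the Neyman-Pearson paradigm for classification seeks, over a class $\mathcal{H}$ and level $\alpha$,

\[
\min_{h \in \mathcal{H}} \; R_1(h)
\quad \text{subject to} \quad R_0(h) \le \alpha,
\]

where $R_0$ and $R_1$ are the type~I and type~II error probabilities; replacing both by empirical (convexified) counterparts yields exactly the kind of optimization with an empirical objective and an empirical, i.e. stochastic, constraint described above.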
========================
Chen Xu (Dept. of Statistics, UBC), Song Cai (Dept. of Statistics, UBC)
Soft Thresholding-Based Screening for Ultra-High
Dimensional Feature Spaces
Variable selection and feature extraction are fundamental
for knowledge discovery and statistical modeling with high-dimensional data.
To reduce the computational burden, variable screening techniques,
such as the Sure Independence Screening (SIS; Fan and Lv, 2008),
are often used before the formal analysis. In this work, we propose
another computationally efficient procedure for variable screening
through a soft thresholding-based iteration (namely, soft
thresholding screening, STS). STS can efficiently screen
out most of the irrelevant features (covariates) while keeping the
important ones in the model with high probability. With dimensionality
reduced from high to low, the refined model after STS then serves
as a good starting point for further selection. The excellent
performance of STS is supported by various numerical studies.
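The abstract does not spell out the iteration, but as a rough illustration of soft-thresholding-based screening, the sketch below runs an ISTA-style soft-thresholding pass and keeps the covariates with nonzero coefficients; the step size, penalty level, and stopping rule are placeholders rather than the authors' STS:

    import numpy as np

    def soft_threshold(z, t):
        """Soft-thresholding operator S_t(z) = sign(z) * max(|z| - t, 0)."""
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    def soft_threshold_screen(X, y, lam, n_iter=200):
        """ISTA-style illustration of soft-thresholding-based screening.

        Iterate beta <- S_{step*lam}(beta + step * X'(y - X beta)/n) and
        return the indices of covariates with nonzero coefficients.
        Tuning constants are placeholders, not the authors' STS choices.
        """
        n, p = X.shape
        step = n / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the gradient
        beta = np.zeros(p)
        for _ in range(n_iter):
            grad = X.T @ (y - X @ beta) / n
            beta = soft_threshold(beta + step * grad, step * lam)
        return np.flatnonzero(beta)

    # Illustrative use: a toy ultra-high-dimensional problem with 3 true signals.
    rng = np.random.default_rng(3)
    n, p = 100, 2000
    X = rng.normal(size=(n, p))
    beta_true = np.zeros(p)
    beta_true[[10, 200, 1500]] = 3.0
    y = X @ beta_true + rng.normal(size=n)
    print(soft_threshold_screen(X, y, lam=0.5))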