Invited
Speakers
Mohamed Amezziane, Central Michigan
University
Semiparametric Smoothing through Preliminary Test Estimation
Coauthors: S. Ejaz Ahmed
Pre-test estimation is implemented to develop a semiparametric
function estimator by shrinking a nonparametric function estimator
towards a fully known parametric function. We demonstrate that
this semiparametric estimator outperforms the nonparametric
estimator under certain conditions. We also derive the asymptotic
properties of the estimator and discuss the smoothing parameter
selection.
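As a rough sketch of the pre-test idea (illustrative only, not the authors' construction): fit a kernel estimator, test its L2 distance to the fully known parametric function, and fall back on the parametric function when the test does not reject. The bandwidth, the form of the test statistic, and the critical value below are all assumptions.

```python
import numpy as np

def nadaraya_watson(x, y, grid, h):
    """Gaussian-kernel nonparametric regression estimate on `grid`."""
    K = np.exp(-0.5 * ((grid[:, None] - x[None, :]) / h) ** 2)
    return (K @ y) / K.sum(axis=1)

def pretest_estimator(x, y, grid, h, g0, crit):
    """Pre-test estimate: keep the parametric function g0 when an
    L2-distance statistic does not exceed `crit`; otherwise use the
    nonparametric fit.  (Hypothetical statistic and threshold.)"""
    fhat = nadaraya_watson(x, y, grid, h)
    g = g0(grid)
    stat = len(x) * np.mean((fhat - g) ** 2)
    return g if stat <= crit else fhat
```

A fuller version would shrink smoothly between the two fits rather than switch, as the abstract's shrinkage language suggests.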
Oskar Maria Baksalary (Adam Mickiewicz)
On certain subspace distances determined by the Frobenius norm
Coauthors: Goetz Trenkler, Faculty of Statistics, Dortmund University
of Technology, Dortmund, Germany
From among functions introduced so far to characterize a "separation"
of two vector subspaces, a distinguished role is played by the
notions of an angle and minimal angle, which were originally
defined in a Hilbert space. Inspired by known characterizations
of the angles, we specify two new measures of separation between
two subspaces of a finite-dimensional complex vector space,
say M, N ⊆ C^{n,1}. The measures are based on the Frobenius norm
of certain functions of orthogonal projectors onto subspaces
determined by M and N. Several properties of the measures are
identified and discussed, mostly by exploiting partitioned representations
of matrices.
With this talk we are pleased to celebrate the 70th birthday
of Professor Goetz Trenkler on 14 July 2013.
Somnath Datta (Louisville)
Robust Regression Analysis of Longitudinal Data Under Censoring
We consider regression analysis of longitudinal data when the
temporal correlation is modeled by an autoregressive process.
Robust R estimators of the regression and autoregressive parameters
are obtained. Our estimators are valid under censoring caused by
detection limits. Theoretical and simulation studies of the
estimators are presented. We analyze a real data set on air
pollution using our methodology.
Susmita Datta (University of Louisville)
Rank Aggregation in Bioinformatics Problems
High-throughput technologies in genomics and proteomics have
created the need to develop novel statistical methods for handling
and analyzing the enormous amounts of high-dimensional data being
produced on a daily basis in laboratories around the world.
In this work, we propose novel methodology to summarize the
information in the data in terms of clustering techniques. In
particular, we find the optimal clustering algorithm for a given
data set, amongst a collection of algorithms, in terms of multiple
performance criteria. To achieve this, we use the stochastic
optimization technique of cross-entropy to rank-aggregate the
distance lists of multiple ordered lists. We illustrate the
methodologies through simulated and real-life microarray and
mass spectrometry data.
Kai-Tai Fang (United International
College, Zhuhai)
The Magic Square - Historical Review and Recent Development
Coauthors: Yanxun Zheng
An n×n matrix of numbers in which the sums of the entries
along each row, each column, the main diagonal and the cross
diagonal are all equal to the same constant is called a magic
square. If the elements of the magic square are the consecutive
integers from 1 through n², it is called a classical magic square.
Magic squares were known to Chinese mathematicians as early
as 650 B.C. Many mystical properties of magic matrices have
been reported in the literature. In this talk a historical review
is given and some recent developments are mentioned. Some
applications of the magic square are also discussed.
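The defining property is easy to state as code; a minimal check (using the classical 3×3 Lo Shu square as an example) might look like:

```python
import numpy as np

def is_magic(M):
    """Check the defining property: equal sums along every row, every
    column, the main diagonal and the cross (anti-)diagonal."""
    M = np.asarray(M)
    s = M[0].sum()  # candidate magic constant
    return bool(
        np.all(M.sum(axis=1) == s)
        and np.all(M.sum(axis=0) == s)
        and M.trace() == s
        and np.fliplr(M).trace() == s
    )

# The classical 3x3 Lo Shu square uses 1..9; its magic constant is 15.
lo_shu = [[2, 7, 6],
          [9, 5, 1],
          [4, 3, 8]]
```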
Ali Ghodsi, University of Waterloo
Nonnegative Matrix Factorization via Rank-One Downdate
Nonnegative matrix factorization (NMF) was popularized as a
tool for data mining by Lee and Seung in 1999. NMF attempts
to approximate a matrix with nonnegative entries by a product
of two low-rank matrices, also with nonnegative entries. In
this talk, I introduce an algorithm called rank-one downdate
(R1D) for computing an NMF that is partly motivated by the singular
value decomposition. This algorithm computes the dominant singular
values and vectors of adaptively determined submatrices of a
matrix. On each iteration, R1D extracts a rank-one submatrix
from the original matrix according to an objective function.
I establish a theoretical result that maximizing this objective
function corresponds to correctly classifying articles in a
nearly separable corpus. I also provide computational experiments
showing the success of this method in identifying features in
realistic datasets. The method is also much faster than other
NMF routines.
This is joint work with Michael Biggs and Stephen Vavasis.
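R1D's actual submatrix-selection objective is not reproduced here, but the underlying deflation idea — peel off one approximately rank-one nonnegative piece at a time, guided by the dominant singular triple — can be caricatured as follows; the clipping used in place of adaptive submatrix selection is purely an illustrative stand-in.

```python
import numpy as np

def dominant_triple(A, iters=100):
    """Approximate the dominant singular triple of A by power iteration."""
    v = np.ones(A.shape[1]) / np.sqrt(A.shape[1])
    for _ in range(iters):
        u = A @ v
        u /= np.linalg.norm(u)
        v = A.T @ u
        v /= np.linalg.norm(v)
    return u @ A @ v, u, v

def nmf_by_deflation(A, r):
    """Greedy NMF caricature: extract one nonnegative rank-one factor
    per iteration and subtract it.  R1D instead restricts each step to
    an adaptively chosen submatrix via its objective function; the
    clipping below is a crude illustrative substitute."""
    R = np.asarray(A, dtype=float).copy()
    W = np.zeros((R.shape[0], r))
    H = np.zeros((r, R.shape[1]))
    for k in range(r):
        sigma, u, v = dominant_triple(R)
        W[:, k] = np.clip(np.sqrt(sigma) * u, 0, None)
        H[k] = np.clip(np.sqrt(sigma) * v, 0, None)
        R = np.clip(R - np.outer(W[:, k], H[k]), 0, None)
    return W, H
```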
Karl E. Gustafson (Colorado at
Boulder)
A New Financial Risk Ratio
Randomness in Financial Markets has been recognized for over
a century: Bachelier (1900), Cowles (1932), Kendall (1953),
Samuelson (1959). Risk thus enters into efficient Portfolio design:
Fisher (1906), Williams (1936), Working (1948), Markowitz (1952).
Reward versus Risk decisions then depend upon Utility to the
Investor: Bernoulli (1738), Kelly (1956), Sharpe (1964),
Modigliani (1997). Returns of a Portfolio adjusted to Risk are
measured by a number of Ratios: Treynor, Sharpe, Sortino, M2,
among others. I will propose a refinement of such ratios. This
possibility was mentioned in my recent book: Antieigenvalue
Analysis, World Scientific (2011).
Abdulkadir Hussein (Windsor)
Efficient estimation in high dimensional spatial regression
models
We consider some spatial regression models and develop an array
of shrinkage and absolute penalty estimators for the regression
coefficients. We compare the estimators analytically and by
means of Monte Carlo simulations. We illustrate the usefulness
of the proposed estimation methods by using data sets on crime
distribution and housing prices.
Keywords: Spatial regression, penalty estimation, shrinkage
Tõnu Kollo (University of Tartu)
Matrix star-product and skew elliptical distributions
Multivariate skew elliptical distributions have usually three
parameters: vectors of location and shape and a positive definite
matrix as the scale parameter. Often distributions have some
additional matrix parameters. In Kollo, Selart and Visk (2013)
an estimation method for three-parameter skew elliptical
distributions is suggested, based on expressions for the moments.
The construction of point estimates uses the star product of
matrices. In the talk we examine the possibility of deriving
confidence regions for these estimates using the matrix derivative
technique. Reference: Kollo, T., Selart, A. and Visk, H. (2013).
From multivariate skewed distributions to copulas. In:
Combinatorial Matrix Theory and Generalized Inverses of Matrices.
Eds.: R. B. Bapat et al. Springer, 63-72.
Steven N. MacEachern (Ohio State
University)
Efficient Quantile Regression for Linear Heterogenous Models
Coauthors: Yoonsuh Jung (University of Waikato) and Yoonkyung
Lee (Ohio State University)
Quantile regression provides estimates of a range of conditional
quantiles. This stands in contrast to traditional regression
techniques which focus on a single conditional mean function.
Lee et al. (2012) modified quantile regression by combining
notions from least squares regression and quantile regression.
The combination of methods results in a modified loss function
where the sharp corner of the quantile-regression loss is rounded.
The main modification involves an asymmetric l2 adjustment of
the loss function around zero. We extend the idea of l2 adjusted
quantile regression to linear heterogeneous models. The l2 adjustment
is constructed to diminish as sample size grows. The modified
method is evaluated both empirically and theoretically.
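Lee et al.'s exact adjustment is not given in the abstract, but one standard way to round the corner of the check loss with an asymmetric ℓ2 piece is to splice a quadratic onto the two linear arms so that values and slopes match at asymmetric join points τc and -(1-τ)c. The construction below is a hedged sketch in that spirit, not necessarily the authors' exact form.

```python
import numpy as np

def check_loss(r, tau):
    """Standard quantile-regression check loss, with its sharp corner at 0."""
    r = np.asarray(r, dtype=float)
    return np.where(r >= 0, tau * r, (tau - 1) * r)

def rounded_check_loss(r, tau, c):
    """Check loss with an asymmetric l2 (quadratic) adjustment around 0:
    quadratic r^2/(2c) on [-(1-tau)c, tau*c], linear outside, with the
    constants chosen so value and slope match at the join points."""
    r = np.asarray(r, dtype=float)
    a, b = tau * c, (1 - tau) * c  # asymmetric join points
    return np.where(
        r > a, tau * r - tau ** 2 * c / 2,
        np.where(r < -b, (tau - 1) * r - (1 - tau) ** 2 * c / 2,
                 r ** 2 / (2 * c)))
```

Letting c shrink as the sample size grows recovers the ordinary check loss in the limit, matching the abstract's requirement that the adjustment diminish.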
Ingram Olkin (Stanford)
A Linear Algebra Biography
This talk is a review of my travels through linear algebra
-- how it started and how it continued from 1948 to the present.
Jianxin Pan (University of Manchester)
Covariance matrix modeling: recent advances
When analyzing longitudinal/clustered data, misspecification
of covariance matrix structures may lead to very inefficient
estimates of parameters in the mean. In some circumstances,
for example, when missing data are present, it may yield very
biased estimates of the mean parameters. Hence, correct modeling
of covariance matrix structures plays a very important role in
statistical inference. Like the mean, covariance matrix structures
can be modeled using linear or nonlinear regression models.
Various estimation methods have recently been proposed to model
the mean and covariance structures simultaneously. In this talk,
I will review these methods on joint modeling of the mean and
covariance structures for longitudinal or clustered data, including
linear, nonparametric regression models and semiparametric models.
Missing data and variable selection will be addressed too. Real
examples and simulation studies will be provided for illustration.
Serge B. Provost (Western University)
On Improving Density Estimates by Means of Polynomial Adjustments
A moment-based methodology is proposed for obtaining accurate
density estimates and approximants. This technique, which involves
applying a polynomial adjustment to an initial functional
representation of a target density, relies on the inversion of a
matrix that is often ill-conditioned. This approach will be applied
to certain saddlepoint density approximations and extended to
density estimates by making use of empirical cumulant-generating
functions. The bivariate case, which is tackled via a standardizing
transformation, relies on inverting a high-dimensional matrix.
The resulting representation of the joint density functions gives
rise to a very flexible copula family. Several illustrative
examples will be presented.
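To make the moment-matching step concrete: with a standard normal base density, matching the first d+1 raw moments of a target reduces to solving a Hankel system in the base density's moments — the kind of frequently ill-conditioned matrix the abstract alludes to. The choice of base density and polynomial degree here are illustrative assumptions.

```python
import numpy as np

def normal_moment(k):
    """Raw k-th moment of the standard normal: 0 for odd k, (k-1)!! for even k."""
    if k % 2:
        return 0.0
    m = 1.0
    for j in range(1, k, 2):
        m *= j
    return m

def polynomial_adjustment(target_moments, d):
    """Coefficients c_0..c_d so that f0(x) * sum_j c_j x^j matches the
    target's first d+1 raw moments, with f0 the standard normal density.
    G is a Hankel matrix of base-density moments and is often
    ill-conditioned as d grows."""
    G = np.array([[normal_moment(i + j) for j in range(d + 1)]
                  for i in range(d + 1)])
    return np.linalg.solve(G, np.asarray(target_moments, dtype=float))
```

For example, targeting moments (1, 0, 2, 0) yields the adjusted density f0(x)(0.5 + 0.5x²), which integrates to 1 and has second moment 2.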
Shuangge Ma (Yale University)
Contrasted Penalized Integrative Analysis
Single-dataset analysis of high-throughput omics data often
leads to unsatisfactory results. The integrative analysis of
heterogeneous raw data from multiple independent studies provides
an effective way to increase sample size and improve marker
selection results. In integrative analysis, the regression coefficient
matrix has certain structures. In our study, we use group penalization
for one- or two-dimensional marker selection and introduce contrast
penalties to accommodate the subtle coefficient structures.
Simulations show that the proposed methods have significantly
improved marker selection properties. In the analysis of cancer
genomic data, important markers missed by the existing methods
are identified.
Fuzhen Zhang (Nova Southeastern University)
Integer Partition, Young Diagram, and Majorization
Coauthors: Geir Dahl (University of Oslo, Norway)
We relate integer partitions to Young diagrams and majorization;
that is, we describe the integer partition problem via Young
diagrams and in terms of integral vector majorization. In the
setting of majorization, we study the polytope of integral vectors.
We present several properties of the cardinality function of
the integral vector majorization polytopes. This is joint work
with G. Dahl (University of Oslo, Norway).
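The correspondence the talk builds on can be made concrete in a few lines: conjugating a partition transposes its Young diagram, and majorization between integral vectors is a prefix-sum dominance check. (A small illustration, not the authors' code.)

```python
def conjugate(p):
    """Conjugate partition: transpose the Young diagram of a partition p,
    given as a weakly decreasing list of positive integers."""
    return [sum(1 for part in p if part > i) for i in range(p[0])] if p else []

def majorizes(p, q):
    """Integral-vector majorization: p majorizes q iff the sums agree and
    every prefix sum of p (sorted decreasingly) dominates that of q."""
    p, q = sorted(p, reverse=True), sorted(q, reverse=True)
    n = max(len(p), len(q))
    p, q = p + [0] * (n - len(p)), q + [0] * (n - len(q))
    if sum(p) != sum(q):
        return False
    sp = sq = 0
    for a, b in zip(p, q):
        sp, sq = sp + a, sq + b
        if sp < sq:
            return False
    return True
```

A classical consequence of the Young-diagram view: p majorizes q exactly when the conjugate of q majorizes the conjugate of p.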
Invited
Special Sessions
Special Session to
celebrate Lynn Roy LaMotte's 70th Birthday
Speakers:
David A. Harville (IBM Thomas
J. Watson Research Center )
Prediction: a "Nondenominational" Model-Based
Approach
Prediction problems are ubiquitous. In a model-based
approach to predictive inference, the values of
random variables that are presently observable are
used to make inferences about the values of random
variables that will become observable in the future,
and the joint distribution of the random variables,
or various of its characteristics, is assumed to
be known up to the value of a vector of unknown
parameters. Such an approach has proved to be highly
effective in many important applications.
It is argued that the performance of a prediction
procedure in repeated application is important and
should play a significant role in its evaluation.
A ``nondenominational'' model-based approach to
predictive inference is described and discussed;
what in a Bayesian approach would be regarded as
a prior distribution is simply regarded as part
of a model that is hierarchical in nature. Some
specifics are given for mixed-effects linear models,
and an application to the prediction of the outcomes
of basketball or football games (and to the ranking
and rating of basketball or football teams) is included
for purposes of illustration.
Lynn Roy LaMotte, LSUHSC
School of Public Health
On Formulation of Models for Factor Effects
In linear models, effects of combinations of levels
of categorical factors are modeled in several ways:
dummy variables (also called GLM coding), reference-level
coding, effect coding, and others. The conventional
way to parse multi-factor effects is in terms of
sets of main-effect and interaction-effect contrasts
among the cell means, e.g., A effects, B effects,
AB interaction effects. Call these ANOVA effects.
Depending on the formulation, the full-model vs.
restricted-model computation of the numerator sum
of squares may not provide a test statistic that
tests the ANOVA effects in question. In some, the
hypothesis tested depends on the within-cell sample
sizes and empty cells. In others, it has been asserted
that the test statistic tests the target hypothesis
when all its degrees of freedom are estimable. There
does not seem to be a general description of the
relation between the test statistic and the target
hypothesis.
A framework is described in this paper by which
some order and clarification of such questions can
be had. Building on the cell-means approach to models
of effects, models are formulated in terms of sums
of orthogonal projection matrices for ANOVA effects.
Other formulations can be expressed equivalently
in these terms, providing a common bridge among
different model formulations.
Using this framework, several questions and widely-held
beliefs will be addressed.
J. N. K. Rao, Carleton University
Robust estimation of variance components in linear mixed
and semi-parametric mixed models
Maximum likelihood (ML) method is often used for
estimating variance components in a linear mixed
model. However, ML estimators are sensitive to outliers
in the random effects or the unit errors in the
model. We propose a fixed-point approach to
robust ML estimation and show that it overcomes
convergence problems with the Newton-Raphson method.
We also consider semi-parametric mixed models using
penalized splines to approximate the mean function.
In this case, the robust ML approach runs into problems,
and we propose a method similar to that of Fellner
(1986) to estimate the random effects and model
parameters simultaneously.
Special Session
on Applied Probability
(Organized by Jeffrey Hunter)
Speakers:
Iddo Ben-Ari, University
of Connecticut
A Probabilistic Approach to Generalized Zeckendorf
Decompositions
Coauthors: Steven J. Miller
Zeckendorf's Theorem states that every positive
integer can be written uniquely as a sum of non-adjacent
Fibonacci numbers, if we start the sequence 1, 2,
3, 5,... This result has been generalized to decompositions
arising from other recurrence relations, and to
properties of the decompositions, most notably
Lekkerkerker's Theorem, which gives the mean number
of summands. The theorem was originally proved using
continued fraction techniques, but recently a more
combinatorial approach has had great success in
attacking this and related problems, such as the
distribution of gaps between summands. We introduce
a unified probabilistic framework and show how this
machinery allows us to reprove and generalize all
existing results and obtain new ones. The main idea is
that the digits appearing in the decomposition are
obtained by a simple change of measure for some
Markov chain.
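Zeckendorf's theorem itself is constructive: the greedy algorithm — repeatedly subtract the largest Fibonacci number that fits — always produces the unique non-adjacent decomposition. A minimal sketch:

```python
def zeckendorf(n):
    """Greedy Zeckendorf decomposition of a positive integer n, using the
    Fibonacci sequence started as 1, 2, 3, 5, 8, ...  The greedy choice
    automatically yields the unique decomposition into non-adjacent
    Fibonacci numbers."""
    fibs = [1, 2]
    while fibs[-1] < n:
        fibs.append(fibs[-1] + fibs[-2])
    parts = []
    for f in reversed(fibs):
        if f <= n:
            parts.append(f)
            n -= f
    return parts
```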
Minerva Catral, Xavier
University
The Kemeny constant for a Markov chain on a given
directed graph
Let T be the transition matrix of an n-state homogeneous
ergodic Markov chain. The Kemeny constant K(T) gives
a measure of the expected time to mixing or time
to stationarity of the chain, and has representations
in terms of the group generalized inverse of A=I-T
and the inverses of the (n-1)×(n-1) principal
submatrices of A. We give an overview of these representations
and present several perturbation results. Finally,
we consider the effect of the directed graph structure
of T on the value of K(T).
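For concreteness, one of the standard representations mentioned above can be coded directly: with the fundamental matrix Z = (I - T + 1π^T)^{-1}, Kemeny's constant is trace(Z) - 1 under the convention K = sum over j ≠ i of π_j m_ij. A sketch:

```python
import numpy as np

def kemeny_constant(T):
    """Kemeny's constant K(T) = sum_{j != i} pi_j * m_ij (independent
    of i), computed via the fundamental matrix
    Z = (I - T + 1 pi^T)^{-1}, for which K(T) = trace(Z) - 1
    under this convention."""
    n = T.shape[0]
    # stationary distribution: left eigenvector of T for eigenvalue 1
    w, V = np.linalg.eig(T.T)
    pi = np.real(V[:, np.argmax(np.real(w))])
    pi = pi / pi.sum()
    Z = np.linalg.inv(np.eye(n) - T + np.outer(np.ones(n), pi))
    return np.trace(Z) - 1.0
```

Equivalently, K(T) equals the sum of 1/(1 - λ) over the non-unit eigenvalues λ of T, which makes the "expected time to mixing" reading transparent.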
Sophie Hautphenne,
University of Melbourne
An Expectation-Maximization algorithm for the model
fitting of Markovian binary trees
Coauthors: Mark Fackrell
In this paper, we consider the parameter estimation
of Markovian binary trees, a class of branching
processes where the lifetime and reproduction epochs
of individuals are controlled by an underlying Markov
process. We develop an Expectation-Maximization
(EM) algorithm to estimate the parameters of the
Markov process from the continuous observation of
some populations, first with information about which
individuals reproduce or die, and second without
this information.
Jeffrey Hunter, Auckland
University of Technology
Generalized inverses of Markovian kernels in terms
of properties of the Markov chain
All one-condition generalized inverses of the
Markovian kernel I - P, where P is the transition
matrix of a finite irreducible Markov chain, can
be uniquely specified in terms of the stationary
probabilities and the mean first passage times of
the underlying Markov chain. Special sub-families
include the group inverse of I - P, Kemeny and Snell's
fundamental matrix of the Markov chain, and the
Moore-Penrose g-inverse. The elements of some sub-families
of the generalized inverses can also be re-expressed
in terms of the second moments of the recurrence time
variables. Some applications to Kemeny's constant
and perturbations of Markov chains are also considered.
*Jianhong Xu, Southern Illinois
University Carbondale
An Iterative Algorithm for Computing Mean First
Passage Matrices of Markov Chains
For an ergodic Markov chain, the mean first passage
matrix and the stationary distribution vector are
among its most important characteristics. There
are various iterative algorithms in the literature
for computing the stationary distribution vector
without resorting to any explicit matrix inversion
(in either the regular or the generalized form).
This, however, is not the case in general when it
comes to computing the mean first passage matrix.
In particular, various formulations of this matrix
involve one or multiple explicit matrix inversions
if implemented directly. In this talk, we present
an iterative algorithm that computes the mean first
passage matrix and, moreover, that can be readily
implemented free of explicit matrix inversions.
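As a baseline for what the talk's algorithm avoids, the classical inversion-based route goes through the fundamental matrix Z: m_ij = (z_jj - z_ij)/π_j off the diagonal, with mean recurrence times 1/π_i on the diagonal. A sketch of that direct computation:

```python
import numpy as np

def mean_first_passage(T):
    """Mean first passage matrix via the classical inversion-based route
    through the fundamental matrix Z = (I - T + 1 pi^T)^{-1}:
    m_ij = (z_jj - z_ij) / pi_j for i != j, with the mean recurrence
    times 1/pi_i on the diagonal.  The talk's iterative algorithm avoids
    exactly the explicit inversion performed here."""
    n = T.shape[0]
    w, V = np.linalg.eig(T.T)
    pi = np.real(V[:, np.argmax(np.real(w))])
    pi = pi / pi.sum()
    Z = np.linalg.inv(np.eye(n) - T + np.outer(np.ones(n), pi))
    M = (np.diag(Z)[None, :] - Z) / pi[None, :]
    np.fill_diagonal(M, 1.0 / pi)
    return M
```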
Special Session
of Statistical Inference on GLM
(Organized by Krishna Saha, Central CT State University)
Speakers:
Dianliang Deng, University
of Regina
Goodness of fit of product multinomial regression
models to sparse data
Tests of goodness of fit of sparse multinomial models
with non-canonical links are proposed, using approximations
to the first three moments of the conditional distribution
of a modified Pearson Chi-square statistic. The modified
Pearson statistic is obtained using a supplementary
estimating equation approach. Approximations to
its first three conditional moments are derived. A simulation
study is conducted to compare, in terms of empirical size
and power, the usual Pearson Chi-square statistic,
the standardized modified Pearson Chi-square statistic
using the first two conditional moments, a method
using an Edgeworth approximation of the p-values based
on the first three conditional moments, and a score
test statistic. There does not seem to be any qualitative
difference in the size of the four methods. However,
the standardized modified Pearson Chi-square statistic
and the Edgeworth approximation method of obtaining
p-values using the first three conditional moments
show power advantages compared to the usual Pearson
Chi-square statistic and the score test statistic.
In some situations, for example for small nominal
levels, the standardized modified Pearson Chi-square
statistic shows some power advantage over the method
using the Edgeworth approximation of the p-values; the former
is also easier to use and so is preferable. Two data
sets are analyzed and a discussion is given.
Severien Nkurunziza,
Windsor
Optimal inference in linear model with multiple change-points
Krishna Saha, Central CT
State University
Inference concerning a common dispersion of several
treatment groups in the analysis of count response
data from clinical trials
Samiran Sinha, Texas A&M
University
Semiparametric analysis of linear transformation
model in the presence of errors-in-covariates
*Jihnhee Yu, University at
Buffalo
A maximum likelihood approach to analyzing incomplete
longitudinal data in mammary tumor development experiments
with mice
Longitudinal mammary tumor development studies
using mice as experimental units are affected by
i) missing data towards the end of the study due to
natural death or euthanasia, and ii) the presence
of censored data caused by the detection limits
of instrumental sensitivity. To accommodate these
characteristics, we investigate a test to carry
out K-group comparisons based on maximum likelihood
methodology. We derive a relevant likelihood ratio
test based on general distributions, investigate
its properties based on theoretical propositions,
and evaluate the performance of the test via a simulation
study. We apply the results to data extracted from
a study designed to investigate the development
of breast cancer in mice.
Memorial Session
to honor Shayle R. Searle
(Organized by Jeffrey J. Hunter)
Speakers:
David A. Harville
Jeffrey J. Hunter
J. N. K. Rao
Robert Rodriguez
Susan Searle
Heather Selvaggio
Special
Session on Perspectives on High
Dimensional Data Analysis
(Organized by: Muni Srivastava)
Speakers:
Shota Katayama, Osaka University
Model Selection in High-Dimensional Multivariate Linear Regression
Analysis with Sparse Inverse Covariance Structure
Abbas Khalili, McGill University
Sparse mixture of regression models in high dimensional spaces
*J. S. Marron, University of North Carolina
Object Oriented Data Analysis: HDLSS Asymptotics
Takahiro Nishiyama, Senshu University
Multivariate multiple comparison procedure among mean vectors
in high-dimensional settings
Martin Singull, Linköping University
Test for the mean in a Growth Curve model in high dimension
Muni Srivastava, University of Toronto
Test for Covariance Matrices in High Dimension with Less
Sample Size
Anand Vidyashankar, George Mason University
Inference for high-dimensional data accounting for model
selection variability
Takayuki Yamada, Nihon University
Test for assessing multivariate normality available for
high-dimensional data
Hirokazu Yanagihara, Hiroshima University
Conditions for Consistency of a Log-likelihood-Based Information
Criterion in High-Dimensional Multivariate Linear Regression
Models under the Violation of Normality Assumption
Invited Special
Session on Open-Source Statistical Computing
(Organized by Reijo Sund)
Speakers:
Antti Liski, National
Institute for Health and Welfare, Finland
The effect of data constraints on the normalized
maximum likelihood criterion with numerical examples
using Survo R
The stochastic complexity for the data, relative
to a suggested model, serves as a criterion for
model selection. The normalized maximum likelihood
(NML) formulation of the stochastic complexity contains
two components: the maximized log likelihood and
a component that may be interpreted as the parametric
complexity of the model. In Gaussian linear regression
the use of the normalized maximum likelihood criterion
is problematic because the parametric complexity
is not finite. Rissanen has proposed an elegant
solution to constrain the data space. Liski and
Liski (2009) proposed alternative constraints for
the data space and Giurcaneanu, Razavi and Liski
(2011) investigated the use of these constraints
further. In this paper we study and illustrate the
effects of data constraints on the proposed model
selection criterion using Survo R software. We focus
especially on the case when there is multicollinearity
present in the data.
Reijo Sund, National Institute
for Health and Welfare (THL), Finland
Survo R for open-source statistical computing
For centuries, a core principle of scientific research
has been intersubjective verifiability. A structured
version of this principle demands that a thorough
description of methodology required for the replication
of findings is made publicly available. For a statistician
this means publication of data, theoretic-mathematical
justification of the statistical methods and code
required for the replication of the actual analyses.
Statistical software packages have revolutionized
the use of statistical methods in empirical research:
nowadays extremely complicated methods can easily be
applied by any skilled researcher, at the cost that
many important computational details may remain
hidden inside the black box of potentially expensive
statistical software. To overcome this problem,
open-source statistical software is becoming a cornerstone
of scientific inference, and is an important element
of the modern scientific method. The open-source
software development process also makes the method
development global in the sense that software and
source codes are freely available to anyone and
the development is open to collaborative efforts
of scientists worldwide.
R is the most common open-source software environment
for statistical computing and graphics. It runs
on a wide variety of platforms and is highly extensible,
with thousands of user-contributed packages available.
One extensive package is Survo R. Survo is an integrated
environment for statistical computing and related
areas, developed since the early 1960s by Professor
Seppo Mustonen. The first version was for the Elliott
803; later generations include versions for the Wang
2200, PC and Windows. Survo R is a sophisticated
mixture of Mustonen's original C sources and
rewritten I/O functions that utilize R and Tcl/Tk
extensively to provide multiplatform support.
Features of Survo include file-based data operations,
flexible data preprocessing and manipulation tools,
various plotting and printing facilities, a teaching-friendly
matrix interpreter, so-called editorial arithmetic
for instant calculations, a powerful macro language,
plenty of statistical modules and an innovative
text-editor-based GUI (originally invented in 1979)
that allows the user to freely mix commands, data and
natural text, encouraging reproducible research with
ideas similar to literate programming. In addition,
several properties have been developed to make the
interplay with R from the GUI seamless. By giving
every R user direct access to these additional useful
features of Survo, the options available for data
processing and analysis, as well as for teaching
within R, are significantly expanded.
The text-editor-based user interface suits interactive
use well and offers flexible tools for dealing
with issues that may be challenging to approach
using standard R programming. This new version of
Survo also concretely shows the power and possibilities
of open-source software: the functionality of the
full-featured statistical software Survo has been fully
incorporated into the open-source statistical software R.
Kimmo Vehkalahti,
University of Helsinki
Teaching matrix computations using SURVO R
Coauthors: Reijo Sund (National Institute for Health
and Welfare, Helsinki)
We demonstrate the possibilities of SURVO R in
teaching matrix computations. SURVO R (Sund et al
2012) is an open-source implementation representing
the newest generation of the Survo computing environment,
the lifework of prof. Seppo Mustonen since the early
1960s (Mustonen 1992).
Survo binds together a selection of useful tools
through its unique editorial approach, invented
by Mustonen in 1979 (Mustonen 1981). This approach,
inherited by all subsequent generations of Survo,
lets the user freely create mixtures of work schemes
and documentation (Mustonen 1992, Mustonen 1999,
Vehkalahti 2005). Our focus is on Survo puzzles
(Mustonen 2006) and an exciting method for solving
them with matrices, involving restricted integer
partitions and Khatri-Rao products, for example.
We employ the matrix interpreter and other tools
of SURVO R to demonstrate the power of the editorial
approach in teaching matrix computations.
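The Khatri-Rao product used in the matrix solution of Survo puzzles is the column-wise Kronecker product; for readers outside Survo, a NumPy equivalent (illustrative, not part of SURVO R) is:

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Khatri-Rao product: the k-th column of the result is
    the Kronecker product of the k-th columns of A and B."""
    assert A.shape[1] == B.shape[1], "A and B need the same number of columns"
    return np.einsum('ik,jk->ijk', A, B).reshape(-1, A.shape[1])
```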
References
Mustonen, S. (1981). On Interactive Statistical
Data Processing. Scandinavian Journal of Statistics,
8, 129-136.
Mustonen, S. (1992). Survo - An Integrated Environment
for Statistical Computing and Related Areas. Survo
Systems, Helsinki. http://www.survo.fi/books/1992/Survo_Book_1992_with_comments.pdf
Mustonen, S. (1999). Matrix computations in Survo.
Proceedings of the 8th IWMS, Department of Mathematical
Sciences, University of Tampere, Finland. http://www.helsinki.fi/survo/matrix99.html
Mustonen, S. (2006). On certain cross sum puzzles.
http://www.survo.fi/papers/puzzles.pdf
Sund, R., Vehkalahti, K. and Mustonen, S. (2012).
Muste - editorial computing environment within R.
Proceedings of COMPSTAT 2012, 20th International
Conference on Computational Statistics, 27-31 August
2012, Limassol, Cyprus. pp. 777-788. http://www.survo.fi/muste/publications/sundetal2012.pdf
Vehkalahti, K. (2005). Leaving useful traces when
working with matrices. Proceedings of the 14th IWMS,
ed. by Paul S. P. Cowpertwait, Massey University,
Auckland, New Zealand, pp. 143-154.
*awaiting confirmation