ABSTRACTS
Vincent J. Carey, Harvard Medical School
Genomic EDA and Modeling with R/Bioconductor
Bioconductor (www.bioconductor.org)
pursues the creation of flexible and portable tools for statistical
analysis of genomic data. I will describe Bioconductor facilities for
exploratory data analysis and flexible statistical inference with microarray
data. Particular examples will include classification of gene expression
densities, visualization and inference on genomic network structures,
and flexible methods for testing hypotheses about the roles of pathways
and pathway components in gene expression studies.
Christopher Field, Dalhousie University
Robustness Issues in Phylogeny
To estimate the tree structure for a set of taxa, we typically use a
statistical model for evolution and compute the maximum likelihood estimate.
Molecular biologists recognize that the model is a rough approximation
to reality and there is considerable literature on the effects of model
deviations. In this talk, I will examine some of these deviations, paying
particular attention to the type of robustness methodology needed to
successfully estimate the tree and make reliable inference.
Greg Gloor, University of Western Ontario
Co-evolution and mutual information of amino acid positions in protein
families
Proteins are extremely complicated molecular machines that have evolved
to perform a particular cellular function. While knowing the structure
of a given protein often gives valuable insights into its function,
there are also many unanswered questions. This is because each structure
is a snapshot of one particular conformation of a protein isolated from
one individual species. In many instances, functionally important amino
acid positions are conserved, but mutation studies show that many
non-conserved positions are equally important. We
are using mutual information to find these important, yet variable,
amino acid positions in protein families. I will describe our progress
on this project, and present some strengths and limitations of the current
generation of tools used to show the correspondence between structure
and sequence.
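As a rough illustration of the mutual information idea in the abstract above, the following sketch scores the covariation between two columns of a multiple sequence alignment. The toy alignment, the function names, and the choice to ignore gaps and sampling corrections are assumptions made for this example; it is not the speaker's implementation.

```python
from collections import Counter
from math import log2


def column(alignment, idx):
    """Return the residues found at position idx across all sequences."""
    return [seq[idx] for seq in alignment]


def mutual_information(col_x, col_y):
    """Mutual information (in bits) between two alignment columns.

    Uses plain observed frequencies; no gap handling and no correction
    for small sample size (both are simplifying assumptions).
    """
    n = len(col_x)
    count_x = Counter(col_x)
    count_y = Counter(col_y)
    count_xy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with counts
        mi += (c / n) * log2(c * n / (count_x[a] * count_y[b]))
    return mi


# Hypothetical toy alignment: position 0 is conserved,
# positions 1 and 2 vary but always change together.
toy_alignment = ["AKL", "AKL", "ARI", "ARI"]

mi_conserved = mutual_information(column(toy_alignment, 0), column(toy_alignment, 1))
mi_covarying = mutual_information(column(toy_alignment, 1), column(toy_alignment, 2))
print(mi_conserved, mi_covarying)
```

In this toy case the conserved position carries no mutual information with any other column, while the two variable positions that change together score highest, which is the kind of "important, yet variable" signal the abstract refers to.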
David Sankoff, University of Ottawa
Far-reaching effects of missing map data and local shuffling on the
inference of genome rearrangement history
Joint work with Phil Trinh. Until recently, algorithms for studying
the evolution of gene order could only be applied to small genomes (mitochondria,
chloroplasts, prokaryotes), the difficulty with mammalian and other
larger eukaryotic nuclear genomes lying not so much in their much greater
length but rather in the absence of comprehensive lists of genes and
their orthologs. Pavel Pevzner and Glen Tesler (PNAS 2003) have suggested
a way to bypass gene finding and ortholog identification by using the
order of syntenic blocks constructed solely from sequence data as input
to a genome rearrangement algorithm. The method focuses on major evolutionary
events by glossing over small block-internal rearrangements, and neglecting
intervening blocks smaller than a threshold length. This use of large
"sanitized" blocks and the neglect of short blocks may, however,
blur important parts of the historical derivation of the genomes. We
model the effects of eliminating and amalgamating short blocks, concentrating
on the summary statistic of "breakpoint re-use" introduced
by Pevzner and Tesler. They did not conceive of this as an evolutionary
distance, but in the context of their protocol it effectively measures
to what extent genomes have diverged in becoming random permutations
of blocks with respect to each other. We use analytic and simulation
methods to investigate breakpoint re-use as a function of threshold
size and of rearrangement parameters. We discuss the implication of
our findings for the comparison of mammalian genomes and suggest a number
of mathematical, algorithmic and statistical lines for further developing
the Pevzner-Tesler approach.
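The simulation side of the abstract above can be sketched roughly as follows. The sketch assumes that breakpoint re-use is summarized as the ratio 2d/b, with b the number of breakpoints between two genomes and d the number of rearrangements. Two further simplifications are assumptions of this example only: genomes are single signed permutations of blocks, and d is taken to be the number of random reversals applied rather than an inferred minimal rearrangement distance. Block thresholds and amalgamation are not modeled.

```python
import random


def count_breakpoints(perm):
    """Breakpoints of a signed permutation of 1..n relative to the identity."""
    n = len(perm)
    extended = [0] + list(perm) + [n + 1]
    # An adjacency is preserved only when the next element is exactly +1.
    return sum(1 for a, b in zip(extended, extended[1:]) if b - a != 1)


def apply_reversal(perm, i, j):
    """Reverse the segment perm[i..j] (inclusive) and flip its signs."""
    return perm[:i] + [-x for x in reversed(perm[i:j + 1])] + perm[j + 1:]


def simulated_reuse(n_blocks, n_reversals, seed=0):
    """Apply random reversals to the identity and return 2d/b.

    Here d is the number of reversals applied (an assumption; an inferred
    minimal distance could be smaller).
    """
    rng = random.Random(seed)
    perm = list(range(1, n_blocks + 1))
    for _ in range(n_reversals):
        i = rng.randrange(n_blocks)
        j = rng.randrange(i, n_blocks)
        perm = apply_reversal(perm, i, j)
    b = count_breakpoints(perm)
    return 2 * n_reversals / b if b else float("nan")


one_step = apply_reversal([1, 2, 3, 4], 1, 2)   # one reversal of blocks 2..3
print(one_step, count_breakpoints(one_step))
print(simulated_reuse(n_blocks=200, n_reversals=20))
print(simulated_reuse(n_blocks=200, n_reversals=400))
```

With few reversals on many blocks, each reversal tends to create two fresh breakpoints and the ratio stays near 1. As reversals accumulate, breakpoints are hit repeatedly and the ratio grows.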
David Tritchler, University of Toronto
A Spectral Clustering Method for Microarray Data
Joint work with Shafagh Fallah and Joseph Beyene. Cluster analysis is
a commonly used dimension reduction technique. This talk introduces
a clustering method motivated by a multivariate analysis of variance
model and computationally based on eigenanalysis (thus the term "spectral"
in the title). Our focus is on large problems, and we present the method
in the context of clustering genes and arrays using microarray expression
data. The computational algorithm for the method has complexity linear
in the number of genes.
Of the numerous methods for constructing clusters
from microarray data, many require that the number of clusters believed
present in the data be specified a priori, and in general judgements
about the appropriate number of clusters are problematic. We also introduce
a method for assessing the number of clusters exhibited in microarray
data based on the eigenvalues of a particular matrix.
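To give a flavor of eigenanalysis-based clustering, here is a generic spectral bipartition of genes. It is not the method of the talk: the centering step, the small arrays-by-arrays cross-product matrix, the power iteration, and the sign-based split are all assumptions chosen for a compact example. It does show one way the cost can stay linear in the number of genes, since the matrix that is decomposed has one row and column per array, not per gene.

```python
import random


def center_rows(X):
    """Subtract each gene's mean across arrays."""
    return [[v - sum(row) / len(row) for v in row] for row in X]


def cross_product(X):
    """Arrays-by-arrays matrix X^T X; cost is linear in the number of genes."""
    p = len(X[0])
    S = [[0.0] * p for _ in range(p)]
    for row in X:
        for i in range(p):
            for j in range(p):
                S[i][j] += row[i] * row[j]
    return S


def leading_eigenvector(S, iters=200, seed=0):
    """Power iteration for the dominant eigenvector of a symmetric matrix."""
    rng = random.Random(seed)
    p = len(S)
    v = [rng.gauss(0.0, 1.0) for _ in range(p)]
    for _ in range(iters):
        w = [sum(S[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def spectral_bipartition(X):
    """Split genes into two groups by the sign of their leading-component score."""
    Xc = center_rows(X)
    v = leading_eigenvector(cross_product(Xc))
    scores = [sum(a * b for a, b in zip(row, v)) for row in Xc]
    return [1 if s > 0 else 0 for s in scores]


# Hypothetical expression values: 6 genes (rows) by 4 arrays (columns).
toy_expression = [
    [5.0, 5.2, 1.0, 1.1],
    [4.8, 5.0, 0.9, 1.0],
    [5.1, 4.9, 1.2, 1.0],
    [1.0, 1.1, 5.0, 5.2],
    [0.9, 1.0, 4.8, 5.0],
    [1.2, 1.0, 5.1, 4.9],
]
labels = spectral_bipartition(toy_expression)
print(labels)
```

Which group receives label 1 depends on the arbitrary sign of the eigenvector, so only the grouping itself is meaningful.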
Jean Yee Hwa Yang, University of California, San Francisco
Statistical Issues in the Design of Microarray Experiments
Microarray experiments performed in many areas of biological sciences
generate large and complex multivariate datasets. This talk addresses
statistical design and analysis issues, which are essential to improve
the efficiency and reliability of cDNA microarray experiments. We discuss
various considerations unique to the design of cDNA microarrays, and
examine how different types of replication affect design decisions.
We calculate variances of two classes of estimates of differential gene
expression based on log ratios of fluorescence intensities from cDNA
microarray experiments: direct estimates, using measurements from the
same slide, and indirect estimates, using measurements from different
slides. These variances are compared and numerical estimates are obtained
from a small experiment. Some qualitative and quantitative conclusions
are drawn which have potential relevance to the design of cDNA microarray
experiments.
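The comparison of direct and indirect estimates in the abstract above can be illustrated under a deliberately simple model. Assume every slide yields a log ratio with the same variance sigma2 and that slides are independent; these assumptions, the parameter values, and the function names are specific to this sketch and are not taken from the talk. A direct estimate averages log(A/B) over slides that carry both samples, while an indirect estimate subtracts log(B/R) from log(A/R), each measured on separate slides against a common reference R.

```python
import random
import statistics


def var_direct(sigma2, n_slides):
    """Variance of the mean of n_slides independent direct log ratios."""
    return sigma2 / n_slides


def var_indirect(sigma2, n_slides_per_arm):
    """Variance of mean log(A/R) minus mean log(B/R), with equal arms."""
    return 2.0 * sigma2 / n_slides_per_arm


def simulate(sigma2=0.25, true_log_ratio=1.0, reps=20000, seed=1):
    """Monte Carlo check using two slides in total for each design."""
    rng = random.Random(seed)
    sd = sigma2 ** 0.5
    direct, indirect = [], []
    for _ in range(reps):
        # Direct design: both slides hybridize A against B.
        d = (rng.gauss(true_log_ratio, sd) + rng.gauss(true_log_ratio, sd)) / 2
        # Indirect design: one slide A vs R, one slide B vs R
        # (log(B/R) is set to 0 so that log(A/B) equals true_log_ratio).
        a_vs_r = rng.gauss(true_log_ratio, sd)
        b_vs_r = rng.gauss(0.0, sd)
        direct.append(d)
        indirect.append(a_vs_r - b_vs_r)
    return statistics.pvariance(direct), statistics.pvariance(indirect)


sim_direct, sim_indirect = simulate()
print(var_direct(0.25, 2), var_indirect(0.25, 1))
print(sim_direct, sim_indirect)
```

Under these assumptions, and with the same two slides, the indirect estimate has four times the variance of the direct one. Real experiments add dye effects and correlated errors that this sketch leaves out.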
Kenny Q Ye and Anil Dhundale, SUNY at Stony Brook
Pooling or not pooling in microarray experiments - an experimental
design point of view
Microarray experiments are often used to detect differences in gene
expression between two populations of cells: a test population versus
a control population. However, in many cases, such as individuals in
a population, biological variability can produce changes that are
irrelevant to the question of interest, and it then becomes important
to assay many individual samples to collect statistically meaningful
results. Unfortunately, the cost of performing some types of microarray
experiments can be prohibitive. A potentially effective but not well
publicized alternative is to pool individual RNA samples together for
hybridization on a single microarray. This method can dramatically reduce
the experimental costs while maintaining high power in detecting the
changes in expression levels that relate to the specific question of
interest. In this talk, we will discuss why this technique works and
the optimal design strategy for pooling. This idea will also be illustrated
by a synthetic experiment and a real experiment that studies Afib (cardiac
atrial fibrillation), a serious health condition that affects a large
percentage of the population but whose mechanism remains poorly
understood.
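One way to see why pooling can preserve power is a simple variance calculation. The sketch below assumes that the expression measured on a pooled array behaves like the average of the individuals in the pool, with independent biological variance sigma2_bio per individual and technical variance sigma2_tech per array. The model, the variance values, and the function name are assumptions for illustration, not the design results of the talk.

```python
def var_group_mean(sigma2_bio, sigma2_tech, n_arrays, pool_size):
    """Variance of a group's mean expression estimate.

    Each array measures a pool of pool_size individuals: pooling averages
    out biological variation, technical noise is added once per array, and
    the estimate averages over n_arrays independent arrays.
    """
    return (sigma2_bio / pool_size + sigma2_tech) / n_arrays


# Hypothetical variance components: biological 0.4, technical 0.1.
no_pooling = var_group_mean(0.4, 0.1, n_arrays=12, pool_size=1)   # 12 individuals, 12 arrays
small_pools = var_group_mean(0.4, 0.1, n_arrays=4, pool_size=3)   # 12 individuals, 4 arrays
large_pools = var_group_mean(0.4, 0.1, n_arrays=4, pool_size=6)   # 24 individuals, 4 arrays
print(no_pooling, small_pools, large_pools)
```

With these made-up numbers, four arrays of six pooled individuals reach the same precision as twelve unpooled arrays. The trade is more individuals for fewer arrays, which pays off when arrays cost more than samples and biological variance dominates technical variance.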