Run by the New Zealand Statistical Association
NZSA 2012 Conference
29 – 30 November 2012, Dunedin, New Zealand

Thomas Lumley

University of Auckland

Two million t-tests: issues in genome-wide association

Genome-wide association studies measure hundreds of thousands of genetic markers and use them to find small regions of the genome where genetic variation is associated with disease or with other interesting biological variables. The typical analysis uses statistical methods from Stage 1 and Stage 2 introductory stats courses, but still provides interesting statistical challenges in asymptotics, model choice, sample spaces, and other issues.
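As a rough illustration of the scale involved, the sketch below runs one two-sample t-test per marker and compares the results against a Bonferroni-adjusted threshold. It is not taken from the talk: the function names, the simulated genotype matrix, and the use of a Normal approximation for the p-values are all assumptions made for the example, and the Normal approximation is itself one of the asymptotic shortcuts whose accuracy in the extreme tails is open to question.

```python
import math
import numpy as np


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def welch_t(genotypes, is_case):
    """Welch t statistic for every marker at once.

    genotypes: float array of shape (n_subjects, n_markers)
    is_case:   boolean array of shape (n_subjects,)
    """
    cases = genotypes[is_case]
    controls = genotypes[~is_case]
    diff = cases.mean(axis=0) - controls.mean(axis=0)
    se = np.sqrt(
        cases.var(axis=0, ddof=1) / cases.shape[0]
        + controls.var(axis=0, ddof=1) / controls.shape[0]
    )
    return diff / se


def normal_two_sided_p(t):
    """Two-sided p-values from a Normal approximation (an assumption, not exact t)."""
    return np.array([math.erfc(abs(x) / math.sqrt(2.0)) for x in t])


# Hypothetical data: 200 subjects, 1000 markers coded 0/1/2, no true associations.
rng = np.random.default_rng(0)
genotypes = rng.binomial(2, 0.3, size=(200, 1000)).astype(float)
is_case = np.arange(200) < 100

p_values = normal_two_sided_p(welch_t(genotypes, is_case))
threshold = bonferroni_threshold(0.05, genotypes.shape[1])
print("markers below threshold:", int((p_values < threshold).sum()))
```

With two million tests and a family-wise level of 0.05, the same calculation gives a per-test threshold of 0.05 / 2,000,000 = 2.5e-8, which is why tail behaviour matters so much here.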

Roger Payne

VSN International, UK

Hierarchical generalized linear models - theory and practice

Hierarchical generalized linear models (HGLMs) extend the familiar generalized linear models (GLMs) by allowing additional random terms in the linear predictor. Unlike generalized linear mixed models, however, they do not constrain these terms to follow a Normal distribution or to have an identity link. They therefore provide a richer class of models that may be more intuitively appealing. The methodology improves estimation and reduces bias by using the exact likelihood or extended Laplace approximations. In particular, the Laplace approximations seem to avoid the biases that are often found when binary data are analysed by generalized linear mixed models.

The algorithm involves fitting two (or more) interlinked GLMs: the first estimates the fixed and random effects in the model that describes the mean, and the second models the dispersion of the random terms. Because each component is a GLM, all the familiar model-checking techniques are available. We can also exploit other GLM extensions such as prediction and the inclusion of nonlinear parameters in the linear predictor.

The theory will be explained, with examples using GenStat to illustrate its usefulness in practical data analysis.

Alastair Scott

University of Auckland

Fitting models with response-dependent samples

We are interested in fitting regression models to data from samples when we do not have complete information on all members of the sample. In particular, we look at situations where the probability of missing data for a unit depends, at least in part, on the value of the response of that unit. Case-control studies, where the selection probabilities depend directly on the outcome, are simple examples. We look at examples of related studies, as well as studies where the dependence on the response is more subtle.

When the chance of missing data depends on the response, the likelihood involves the distribution of the explanatory variables as well as the regression parameters. We certainly do not want to have to model this covariate distribution in general, so we look for semi-parametric methods that avoid the need for such modelling. We develop fully efficient semi-parametric methods for some situations and good, practical procedures for situations where full efficiency is not feasible.
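To make the effect of response-dependent selection concrete, the sketch below simulates a population, keeps every case but only a fraction of the controls, and fits a logistic regression with and without inverse-probability weights. Weighting by the inverse of the selection probability is a simple, well-known device that avoids modelling the covariate distribution; it is not the fully efficient semi-parametric approach described in the abstract. All names, parameter values, and the simulated data are assumptions made for the illustration.

```python
import numpy as np


def fit_logistic(X, y, weights=None, n_iter=25):
    """Weighted logistic regression fitted by Newton-Raphson."""
    if weights is None:
        weights = np.ones(len(y))
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        score = X.T @ (weights * (y - mu))
        info = X.T @ (X * (weights * mu * (1.0 - mu))[:, None])
        beta = beta + np.linalg.solve(info, score)
    return beta


def simulate_and_fit(seed=0, n_pop=20000, true_beta=(-3.0, 1.0), p_control=0.1):
    """Return (weighted, unweighted) estimates from a response-dependent sample.

    Hypothetical design: every case is kept, each control is kept
    with probability p_control.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n_pop)
    X = np.column_stack([np.ones(n_pop), x])
    p = 1.0 / (1.0 + np.exp(-(X @ np.array(true_beta))))
    y = (rng.random(n_pop) < p).astype(float)

    # Selection depends only on the response.
    select_prob = np.where(y == 1, 1.0, p_control)
    keep = rng.random(n_pop) < select_prob

    Xs, ys = X[keep], y[keep]
    w = 1.0 / select_prob[keep]  # inverse-probability weights
    return fit_logistic(Xs, ys, w), fit_logistic(Xs, ys)


weighted, unweighted = simulate_and_fit()
print("weighted   (intercept, slope):", weighted)
print("unweighted (intercept, slope):", unweighted)
```

In this simple case-control setting the unweighted fit is expected to recover the slope but not the intercept, which is offset by roughly the log of the ratio of selection probabilities; the weighted fit targets both population parameters, at some cost in efficiency.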