### Plenary abstracts

## Thomas Lumley

University of Auckland

#### Two million t-tests: issues in genome-wide association

Genome-wide association studies measure hundreds of thousands of genetic markers and use them to find small regions of the genome where genetic variation is associated with disease or with other interesting biological variables. The typical analysis uses statistical methods from Stage 1 and Stage 2 introductory statistics courses, but it still raises interesting statistical challenges in asymptotics, model choice, sample spaces and other areas.
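As a rough illustration of the "two million t-tests" in the title, the sketch below runs one test per marker under an additive model (trait regressed on allele count 0/1/2) and applies a Bonferroni threshold across markers. The function names, the simulated data and the use of a normal approximation in place of the t distribution are assumptions made for illustration; this is not the speaker's analysis.

```python
import math
import random
from typing import Sequence

def marker_test(genotypes: Sequence[int], trait: Sequence[float]) -> tuple[float, float]:
    """Test one marker under an additive model: regress trait on allele count.

    Returns the slope's t statistic and a two-sided p-value from the standard
    normal approximation (an assumption that is only sensible for large samples).
    """
    n = len(trait)
    gbar = sum(genotypes) / n
    ybar = sum(trait) / n
    sxx = sum((g - gbar) ** 2 for g in genotypes)
    if sxx == 0:  # monomorphic marker: nothing to test
        return 0.0, 1.0
    sxy = sum((g - gbar) * (y - ybar) for g, y in zip(genotypes, trait))
    slope = sxy / sxx
    intercept = ybar - slope * gbar
    rss = sum((y - intercept - slope * g) ** 2 for g, y in zip(genotypes, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:  # exact fit: no residual noise to scale the slope by
        return (0.0, 1.0) if slope == 0 else (math.copysign(math.inf, slope), 0.0)
    t = slope / se
    p = math.erfc(abs(t) / math.sqrt(2))  # two-sided tail area of N(0, 1)
    return t, p

def genome_wide_scan(markers, trait, alpha=0.05):
    """Return the Bonferroni threshold and the markers whose p-value falls below it."""
    threshold = alpha / len(markers)
    hits = []
    for idx, genotypes in enumerate(markers):
        t, p = marker_test(genotypes, trait)
        if p < threshold:
            hits.append((idx, t, p))
    return threshold, hits

# Hypothetical simulated data: 200 people, 500 markers, only marker 0 affects the trait.
rng = random.Random(1)
n_people, n_markers = 200, 500
markers = [[sum(rng.random() < 0.3 for _ in range(2)) for _ in range(n_people)]
           for _ in range(n_markers)]
trait = [2.0 * g + rng.gauss(0, 1) for g in markers[0]]

threshold, hits = genome_wide_scan(markers, trait)
print(threshold, [idx for idx, _, _ in hits])
```

Bonferroni simply divides the overall error rate by the number of tests; with about a million independent tests, 0.05 divided by 10^6 gives the widely used genome-wide threshold of 5 × 10^-8.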

## Roger Payne

VSN International, UK

#### Hierarchical generalized linear models: theory and practice

Hierarchical generalized linear models (HGLMs) extend the familiar generalized linear models (GLMs) by allowing you to include additional random terms in the linear predictor. However, unlike generalized linear mixed models, they do not constrain these terms to follow a Normal distribution or to have an identity link. They therefore provide a richer class of models, which may be more intuitively appealing. The methodology also provides improved estimation methods that reduce bias by using the exact likelihood or extended Laplace approximations. In particular, the Laplace approximations seem to avoid the biases that are often found when binary data are analysed by generalized linear mixed models.

The algorithm involves fitting two (or more) interlinked GLMs: first to estimate the fixed and random effects in the model that describes the mean, and second to model the dispersion of the random terms. Consequently, all the familiar GLM model-checking techniques are available. We can also exploit other GLM extensions, such as prediction and the inclusion of nonlinear parameters in the linear predictor.

The theory will be explained, with examples using GenStat to illustrate its usefulness in practical data analysis.
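The interlinked-GLM structure can be seen in the simplest possible case. The sketch below assumes a balanced one-way Normal random-intercept model, y_ij = μ + u_i + e_ij with u_i ~ N(0, λ) and a known residual variance σ². That is far simpler than the non-Normal models the talk is about, and it is not GenStat's implementation. It alternates a mean step (shrunken estimates of u_i) with a dispersion step in which an intercept-only gamma GLM, with responses û_i²/(1 - h_i) and prior weights 1 - h_i, reduces to a weighted mean. Here h_i is the leverage of the pseudo-observation for u_i, ignoring the small extra contribution from estimating μ. The function name and data are hypothetical.

```python
from statistics import fmean

def fit_one_way(groups, sigma2, lam=1.0, tol=1e-10, max_iter=500):
    """Alternate mean and dispersion steps for a balanced one-way Normal random-intercept model.

    groups: equal-length lists of responses; sigma2: residual variance, assumed known;
    lam: positive starting value for the random-effect variance.
    Returns (mu, lam, u), where u holds the estimated random effects.
    """
    n = len(groups[0])
    group_means = [fmean(g) for g in groups]
    mu = fmean(group_means)  # balanced design: the GLS estimate of mu for any lam
    u = [0.0] * len(groups)
    for _ in range(max_iter):
        # Mean step: each u_i is its group deviation shrunk by k = lam / (lam + sigma2 / n).
        k = lam / (lam + sigma2 / n)  # k also equals 1 - h_i in this simplified setting
        if k == 0:  # random-effect variance has collapsed to zero
            break
        u = [k * (gm - mu) for gm in group_means]
        # Dispersion step: intercept-only gamma GLM on d_i = u_i^2 / (1 - h_i) with
        # prior weights (1 - h_i); its fitted mean is the weighted mean of the d_i.
        weights = [k] * len(u)
        d = [ui ** 2 / k for ui in u]
        new_lam = sum(w * di for w, di in zip(weights, d)) / sum(weights)
        converged = abs(new_lam - lam) < tol
        lam = new_lam
        if converged:
            break
    return mu, lam, u

mu, lam, u = fit_one_way([[1, 3], [5, 7], [-1, 1], [3, 5]], sigma2=1.0)
print(mu, lam, u)
```

At convergence λ equals the mean squared deviation of the group means from μ̂ minus σ²/n (when that difference is positive), which is the marginal maximum-likelihood value of λ with μ held at μ̂. In the example the group means are 2, 6, 0 and 4, so μ̂ = 3, the mean squared deviation is 5, σ²/n = 0.5 and the iteration settles at λ = 4.5.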

## Alastair Scott

University of Auckland

#### Fitting models with response-dependent samples

We are interested in fitting regression models to data from samples when we do not have complete information on all members of the sample. In particular, we look at situations where the probability of missing data for a unit depends, at least in part, on the value of the response of that unit. Case-control studies, where the selection probabilities depend directly on the outcome, are simple examples. We look at examples of related studies, as well as studies where the dependence on the response is more subtle.

When the chance of missing data depends on the response, the likelihood involves the distribution of the explanatory variables as well as the regression parameters. We certainly do not want to have to model this covariate distribution in general, so we look for semi-parametric methods that avoid the need for such modelling. We develop fully efficient semi-parametric methods for some situations and good, practical procedures for situations where full efficiency is not feasible.
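The simplest response-dependent design is worth seeing numerically. For logistic regression it is a standard result that selecting cases and controls with probabilities that depend only on the outcome leaves the slope (the log odds ratio) unchanged and shifts the intercept by log(f_case / f_control), so the covariate distribution never has to be modelled. The sketch below checks this on expected counts for a hypothetical 2×2 table; the function names and numbers are illustrative assumptions, and the talk itself concerns more general designs where the dependence on the response can be more subtle.

```python
import math

def logistic_2x2(counts):
    """Saturated logistic regression for binary exposure x and binary outcome y.

    counts[x][y] is the number of units with exposure x (0/1) and outcome y (0/1).
    Returns (intercept, slope): the log-odds of y = 1 when x = 0, and the log odds ratio.
    """
    intercept = math.log(counts[0][1] / counts[0][0])
    slope = math.log(counts[1][1] / counts[1][0]) - intercept
    return intercept, slope

def case_control(counts, f_case, f_control):
    """Expected counts when a fraction f_case of cases and f_control of controls are
    sampled, whatever their exposure (selection depends only on the response)."""
    return [[row[0] * f_control, row[1] * f_case] for row in counts]

population = [[9000, 100],   # unexposed: controls, cases (hypothetical)
              [800, 100]]    # exposed:   controls, cases
sample = case_control(population, f_case=1.0, f_control=0.02)

b0_pop, b1_pop = logistic_2x2(population)
b0_cc, b1_cc = logistic_2x2(sample)
print(b1_pop, b1_cc)                          # slopes agree up to rounding
print(b0_cc - b0_pop, math.log(1.0 / 0.02))   # intercept shift vs log(f_case / f_control)
```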

## Patty Solomon

University of Adelaide

#### Statistical analysis of hospital performance: understanding the uncertainty

Critical care is expensive, with costs increasing inexorably as our populations age. Understandably, governments at all levels want accurate measures of hospital performance to provide a basis for planning, for accountability and to inform public debate. However, provider comparisons via league tables have proven to be methodologically challenging as well as politically controversial, as witnessed by the recent inquiry into Australia's Bundaberg Base Hospital. Furthermore, analyses purporting to measure hospital performance often suffer from a number of serious deficiencies, ranging from inadequate adjustment for patient case-mix to no allowance for multiple comparisons. In this talk, I will discuss these issues and present our recent work on comparing hospital performance using the Australian and New Zealand Intensive Care Adult Patient Database, one of the largest databases of its kind in the world.
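To make the multiple-comparisons point concrete, here is a minimal sketch, not the speaker's method. It assumes each patient already has a predicted risk of death from some case-mix model, computes each hospital's standardised mortality ratio (observed over expected deaths) with a normal-approximation z-score, and contrasts flagging with and without a Bonferroni correction across hospitals. The data, function names and the choice of Bonferroni are illustrative assumptions.

```python
import math
from statistics import NormalDist

def flag_hospitals(hospitals, alpha=0.05):
    """Compare observed with case-mix-expected deaths for each hospital.

    hospitals maps a name to a list of (died, predicted_risk) pairs, where died is
    0 or 1 and predicted_risk comes from a patient-level case-mix model (assumed given).
    """
    m = len(hospitals)
    z_single = NormalDist().inv_cdf(1 - alpha / 2)      # one hospital tested alone
    z_bonf = NormalDist().inv_cdf(1 - alpha / (2 * m))  # m hospitals tested together
    results = {}
    for name, patients in hospitals.items():
        observed = sum(died for died, _ in patients)
        expected = sum(risk for _, risk in patients)
        variance = sum(risk * (1 - risk) for _, risk in patients)  # sum of Bernoulli variances
        z = (observed - expected) / math.sqrt(variance)
        results[name] = {
            "smr": observed / expected,
            "z": z,
            "flag_unadjusted": abs(z) > z_single,
            "flag_bonferroni": abs(z) > z_bonf,
        }
    return results

def cohort(deaths, size=100, risk=0.1):
    """Hypothetical hospital: `size` patients, all with the same predicted risk."""
    return [(1, risk)] * deaths + [(0, risk)] * (size - deaths)

hospitals = {f"H{i}": cohort(10) for i in range(8)}
hospitals["A"] = cohort(16)
hospitals["B"] = cohort(22)
results = flag_hospitals(hospitals)
for name, r in results.items():
    print(name, round(r["smr"], 2), round(r["z"], 2), r["flag_unadjusted"], r["flag_bonferroni"])
```

In this made-up example, hospital A (16 deaths against about 10 expected) crosses the unadjusted limit but not the Bonferroni-adjusted one, while hospital B (22 deaths) crosses both.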