From the Executive Editor

Statistical tests for non-normally distributed data

In the last issue, I discussed the analysis of normally distributed data and control for clustering. In this editorial, we will look at outcome (dependent) variables that are dichotomous: “yes or no” variables.

Statistical tests are applied to data to determine whether the differences we see between two groups are larger than we would expect by chance alone. The P-value tells us the likelihood that we would see that difference, or a bigger one, by chance. A P-value of .03 indicates that there is only a 3% chance that we would see a difference at least as big as the one we measured by chance alone. Traditionally, a P-value of < .05 enables us to conclude that the difference is statistically significant. If the P-value is > .05, we cannot conclude that the two groups are the same. Statistical tests do not test for equality. They test only for differences. Although it may seem logical that if the test does not find a difference, it has found similarity, that conclusion is incorrect.

One game that I have played to understand this assumption of expected distributions involves two people, each flipping a coin at the same time. As you flip the coins, you complete a two-by-two table (Figure 1). If both people get heads, you put a mark in the “A” cell. If Person One gets heads and Person Two gets tails, you put a mark in the “C” cell. Likewise, tails for Person One and heads for Person Two goes in the “B” cell, and two tails go in the “D” cell. Together, flip the coins 20 times and see what the cell values become (Figure 2). I think you will agree that the flip of a coin is random and that the outcome of the coin flipped by Person One will not be related to (associated with) the outcome of the coin flipped by Person Two. However, you will not end up with a “5” in each box. The statistical question is this: How different do the numbers in the boxes have to be before we can say that differences this large would be seen by chance alone less than 5% of the time? That is what the P-value is telling us. In this example, Fisher’s exact test gives a P-value of .37: a distribution at least this uneven would occur by chance alone 37% of the time. We use a Fisher’s exact test because at least one of the cells has an expected value of < 5 (here, 4.5).

Figure 1: Two-by-two table constructed to determine the distribution of results of two people flipping coins.

                      Person One
                      Heads   Tails
Person Two   Heads    A       B
             Tails    C       D

Figure 2: Example of a two-by-two table used to describe the distribution of results of two people flipping coins 20 times.

                      Person One
                      Heads   Tails
Person Two   Heads    4       6
             Tails    7       3
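
The P-value of .37 quoted for Figure 2 can be checked with a few lines of code. What follows is a minimal sketch in Python using only the standard library, not the output of any particular statistics package. The function names are illustrative choices. The two-sided P-value follows one common convention: add up the probabilities of every table with the same row and column totals that is no more likely than the observed table.

```python
from math import comb

def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    return [[r * k / n for k in col_totals] for r in row_totals]

def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact P-value for a 2x2 table of counts."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / total_tables

    smallest_k = max(0, col1 - row2)
    largest_k = min(row1, col1)
    p_observed = prob(a)
    # Sum every table that is no more likely than the observed one
    # (the small tolerance guards against floating-point ties).
    return sum(
        prob(k)
        for k in range(smallest_k, largest_k + 1)
        if prob(k) <= p_observed * (1 + 1e-9)
    )

# Figure 2: rows are Person Two (heads, tails), columns are Person One.
coin_flips = [[4, 6], [7, 3]]
print(expected_counts(coin_flips))                    # [[5.5, 4.5], [5.5, 4.5]]
print(round(fisher_exact_two_sided(coin_flips), 2))   # 0.37
```

The expected counts show why Fisher's exact test is chosen here: two of the four cells have an expected value of 4.5.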

What we have described above is a dichotomous outcome. Mortality, morbidity, and farrowing are all examples of this type of data from the pig barn. Did the pig live or die? Did the sow farrow or not farrow from that first service? We can set up a two-by-two table to describe these data (Figure 3). Please note, the table includes fictitious numbers.

Figure 3: Example of a two-by-two table describing the number of sows and gilts that did or did not farrow from the first service.

                               Female farrowed to first service
                               Yes     No
Female is a sow   Yes          80      20
                  No (gilt)    65      35

The statistical test selected for a given analysis depends on the distribution of the dependent or outcome variable. In this case, the outcome is dichotomous (whether or not the female farrowed from the first service). A chi-square test can be applied to these data to determine how likely it is that the numbers observed in each cell would have occurred by chance alone. Using these numbers, the Pearson’s chi-square is 5.64 and the P-value is .017. Hence, this distribution, or one more extreme than this, would occur by chance alone 1.7% of the time. We can conclude that there is a significant association between farrowing from the first service and parity (sow versus gilt). As above, if any of the expected cell values were < 5, then a Fisher’s exact test would have been the more appropriate test to use.
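
The chi-square calculation for the Figure 3 numbers can be sketched as follows. This is a minimal illustration in standard-library Python, assuming no continuity correction is applied (some software applies one by default for two-by-two tables, which would give a different value). The function name is an illustrative choice. With 1 degree of freedom, the upper-tail probability of the chi-square distribution can be written with the complementary error function, which avoids the need for a statistics library.

```python
from math import erfc, sqrt

def pearson_chi_square_2x2(table):
    """Return (chi-square, P-value, smallest expected count) for a 2x2 table.

    No continuity correction is applied. With 1 degree of freedom the
    upper-tail probability of the chi-square distribution is erfc(sqrt(x / 2)).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    smallest_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            smallest_expected = min(smallest_expected, expected)
            chi2 += (observed - expected) ** 2 / expected

    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value, smallest_expected

# Figure 3 (fictitious numbers): rows are sow and gilt, columns are farrowed yes/no.
farrowing = [[80, 20], [65, 35]]
chi2, p_value, smallest_expected = pearson_chi_square_2x2(farrowing)
print(round(chi2, 2))       # 5.64
print(p_value)              # about 0.0175
print(smallest_expected)    # 27.5
```

The smallest expected count is 27.5, well above 5, so the chi-square test rather than Fisher's exact test is appropriate for this table.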

We know that production parameters are influenced by a large number of independent variables. It is unlikely that you would be satisfied to determine only this simple association between parity (sow versus gilt) and farrowing success. Logistic regression is the statistical modeling analysis that is used to model a dichotomous outcome with multiple independent variables. Let us assume that you are working with a herd where the farrowing rate is low. Perhaps you are interested in determining the association between farrowing rate and artificial insemination using semen collected on the first day of mating versus semen collected 1 to 3 days before the first day of mating. Perhaps you also wish to control for parity (measured as gilt versus sow), the number of days that the female was bred during that service, and the wean-to-breed interval. Because the purpose of your analysis is to determine the association between farrowing rate and the age of the semen (0 versus > 0 days old), we refer to the age of the semen as the independent variable of interest. The other variables (parity, days bred, and wean-to-breed interval) are often called covariates. We apply a Wald’s test to determine whether or not there is a significant association between age of semen and farrowing rate, after controlling for the covariates in the model.
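
The mechanics of a logistic regression and its Wald test can be sketched in a few lines. The editorial gives no data for the semen-age example, so the sketch below is an assumption-laden stand-in: it fits a model with a single predictor (sow versus gilt) to the fictitious Figure 3 numbers, using Newton-Raphson iteration written in standard-library Python. The function names are hypothetical, and a real analysis with several covariates would normally be run in statistical software rather than hand-coded.

```python
from math import erfc, exp, sqrt

def fit_logistic_one_predictor(groups, tol=1e-10, max_iter=50):
    """Fit logit(p) = b0 + b1 * x by Newton-Raphson.

    groups: list of (x, successes, trials) tuples, one per value of x.
    Returns (b0, b1, standard error of b1).
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        u0 = u1 = 0.0           # score vector (gradient of the log-likelihood)
        i00 = i01 = i11 = 0.0   # information matrix
        for x, successes, trials in groups:
            p = 1.0 / (1.0 + exp(-(b0 + b1 * x)))
            weight = trials * p * (1.0 - p)
            residual = successes - trials * p
            u0 += residual
            u1 += x * residual
            i00 += weight
            i01 += x * weight
            i11 += x * x * weight
        det = i00 * i11 - i01 * i01
        step0 = (i11 * u0 - i01 * u1) / det
        step1 = (i00 * u1 - i01 * u0) / det
        b0 += step0
        b1 += step1
        if abs(step0) < tol and abs(step1) < tol:
            break
    # The variance of b1 is the matching diagonal entry of the inverse
    # information matrix, taken from the final (converged) iteration.
    se_b1 = sqrt(i00 / det)
    return b0, b1, se_b1

def wald_test(estimate, standard_error):
    """Two-sided Wald test: z = estimate / SE, P-value from the standard normal."""
    z = estimate / standard_error
    return z, erfc(abs(z) / sqrt(2))

# Figure 3 (fictitious numbers): x = 1 for sows, x = 0 for gilts.
groups = [(1, 80, 100), (0, 65, 100)]
b0, b1, se_b1 = fit_logistic_one_predictor(groups)
z, p_value = wald_test(b1, se_b1)
print(b1, se_b1)     # coefficient for "sow" and its standard error
print(z, p_value)    # Wald statistic and its P-value
```

For these numbers the coefficient is about 0.77 with a standard error of about 0.33, giving a Wald P-value of roughly .019. That is close to, but not identical to, the chi-square P-value for the same table, because the two tests are calculated differently. Adding covariates would extend the same idea to more coefficients, each with its own Wald test.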

The strength of the association is given by the odds ratio. The odds ratio should be interpreted only if the association is significant at P < .05. If there is a significant association, the odds ratio compares the odds of the outcome when the factor of interest is present with the odds of the outcome when it is absent. In this example, what are the odds of the sow farrowing from the first service, if the semen used on the first day of mating was collected on that day, compared to the odds when the semen was collected before the day of mating, after controlling for the covariates in the model?
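
To make the idea of an odds ratio concrete, here is a small worked sketch in Python. Because the editorial gives no data for the semen-age example, the sketch uses the fictitious Figure 3 numbers instead, so the result is an unadjusted odds ratio for sows versus gilts with no covariates controlled for. The function names are illustrative.

```python
def odds(events, non_events):
    """Odds = number with the outcome / number without the outcome."""
    return events / non_events

def odds_ratio(table):
    """Odds ratio for a 2x2 table laid out as [[a, b], [c, d]].

    Row 1 is the group with the factor present, row 2 the group without it;
    column 1 counts outcome "yes", column 2 outcome "no".
    """
    (a, b), (c, d) = table
    return odds(a, b) / odds(c, d)

# Figure 3 (fictitious numbers): sows 80 farrowed / 20 did not,
# gilts 65 farrowed / 35 did not.
farrowing = [[80, 20], [65, 35]]
print(odds(80, 20))                       # 4.0
print(round(odds(65, 35), 2))             # 1.86
print(round(odds_ratio(farrowing), 2))    # 2.15
```

In words, the odds of farrowing from the first service were about 2.15 times higher for sows than for gilts in this made-up table. An odds ratio taken from a logistic regression model is read the same way, except that it is adjusted for the covariates in the model.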

Like all statistical tests, logistic regression has assumptions that must be tested before the researcher can be convinced that it was the correct analysis to conduct. Logistic regression assumes that the observations are independent. If the data come from multiple farms, then farm must be included in the model as a random variable. Similarly, sows will cluster by the week they are bred, the person doing the breeding, and the boar. These can all be controlled for by adding random variables to the model.

Goodness-of-fit tests are used to determine how closely the predicted or fitted responses fit the observed data. For example, given the age of the semen, parity, wean-to-breed interval, and number of days the sow was bred, did the model predict that a given sow would farrow or not farrow? For each sow, the predicted outcome is compared to the actual outcome. A Pearson’s chi-square test is used to compare the predicted and the observed values. If the chi-square is a large number (significant), we conclude that the model does not fit the data well.

The second goodness-of-fit test is a likelihood ratio criterion calculated as a deviance. In this test, the model of interest is compared to the full or saturated model. The differences in the deviance are used to evaluate the model with additional variables.¹
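
For readers who like to see the two goodness-of-fit statistics written out, one common textbook formulation for grouped binary data is sketched below. The notation is an assumption of this sketch and is not defined in the editorial: the females are grouped into m sets that share the same covariate values, n_i is the number of females in set i, y_i is the number of those that farrowed, and \hat{p}_i is the probability of farrowing predicted by the model. Terms in the deviance with a zero count are taken to be zero.

```latex
% Pearson chi-square: observed versus fitted counts
X^2 = \sum_{i=1}^{m} \frac{(y_i - n_i \hat{p}_i)^2}{n_i \hat{p}_i (1 - \hat{p}_i)}

% Deviance: twice the log-likelihood gap between the saturated and fitted models
D = 2 \sum_{i=1}^{m} \left[ y_i \ln\frac{y_i}{n_i \hat{p}_i}
      + (n_i - y_i) \ln\frac{n_i - y_i}{n_i (1 - \hat{p}_i)} \right]

% Comparing a reduced model with a larger model that has q additional variables
D_{\text{reduced}} - D_{\text{larger}} \sim \chi^2_{q} \quad \text{(approximately)}
```

A large value of either statistic relative to its degrees of freedom points to a poor fit, and a large drop in deviance when variables are added suggests that those variables improve the model.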

If you have been following this series, you will have the information to answer this question: assume that you have a group of pigs that are randomly assigned to receive either a vaccine or a placebo. The two outcome variables of interest are average daily gain (ADG) and the presence or absence of pneumonia lesions at slaughter. How do you anticipate that these variables will be distributed? Given the anticipated distributions, what statistical test will you use to determine the association between these outcome variables and the vaccination status of the pigs?

I assume that ADG is normally distributed and therefore I would use a Student’s t-test to determine the association between ADG and vaccination status. I also assume that the presence or absence of pneumonia lesions at slaughter is a binary variable (either there are lesions or there are not) which follows a binomial distribution. Therefore, the appropriate test to determine the association between vaccination status and lesions is a chi-square test, or a Fisher’s exact test if the sample size is very small.

However, you will likely want to control for the weight of each pig at the start of the study. What model will you select to allow you to control for initial pig weight in the analysis? For ADG, you will select a linear regression model, and for the presence of lesions, you will use a logistic regression model.

Now there is a concern about independence. In this study, we have both vaccinated and nonvaccinated pigs in the same pens and we included more than one barn. Pigs will be clustered in pens within barn. How will you control for the non-independence? Both the barn and the pen can be considered random variables. Pens are nested within barn. This means that Pen 1 in Barn One is not the same as Pen 1 in Barn Two, and each barn has a group of pens within it. This makes logical sense, but from a statistical analysis point of view, this is called nesting. Another term used for this phenomenon is hierarchical data.

If the vaccine is randomly assigned to individual pigs, and pigs are randomly assigned to pen so that there are vaccinated and nonvaccinated pigs in each pen, there are actually three levels of data. The first level is the pig, the second level is the pen, and the third level is the barn. This is what is meant by the term “hierarchy of data.” The models will include vaccination status and starting weight of the pig as fixed effects, and barn and pen (nested within barn) as random effects. The term used for such a model is a multilevel model because of the different hierarchies or levels of data. It is also a mixed model because it has both fixed and random effects or variables. Therefore, the model for ADG is described as a multilevel linear mixed model.² The coefficient for the linear regression model will describe the association between ADG and vaccination after controlling for the starting weight of the pig and the pen and barn effects.
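
Written as an equation, the multilevel linear mixed model for ADG might look like the sketch below. The notation is assumed for illustration and does not come from the editorial or from the cited study: i indexes the pig, j the pen, and k the barn; Vaccine is 1 for vaccinated pigs and 0 for placebo pigs; Weight is the starting weight; u_k is the random effect of barn k; and v_{jk} is the random effect of pen j within barn k.

```latex
\mathrm{ADG}_{ijk} = \beta_0 + \beta_1\,\mathrm{Vaccine}_{ijk} + \beta_2\,\mathrm{Weight}_{ijk}
                     + u_k + v_{jk} + \varepsilon_{ijk}

u_k \sim N(0, \sigma^2_{\text{barn}}), \quad
v_{jk} \sim N(0, \sigma^2_{\text{pen}}), \quad
\varepsilon_{ijk} \sim N(0, \sigma^2)
```

In this form, \beta_1 is the coefficient of interest: the difference in ADG between vaccinated and nonvaccinated pigs of the same starting weight, after the barn and pen effects are accounted for.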

The odds ratio of the logistic regression model will describe the association between the presence of lesions and vaccination status after controlling for the starting weight of the pig and the pen and barn effects. The P-value of the Wald’s test is used to determine whether or not there is a “significant” association between the lesions and vaccination status after controlling for starting weight and the random effects of the clustering of pigs within pen and pen within barn. The odds ratio provides a measure of the strength of the association. How would you describe the model for the presence or absence of lesions due to pneumonia at slaughter? It is a multilevel logistic mixed model.
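
The multilevel logistic mixed model can be sketched as an equation in the same way. The notation is again an assumption made for illustration: \pi_{ijk} is the probability that pig i in pen j of barn k has pneumonia lesions at slaughter, Vaccine is 1 for vaccinated pigs and 0 for placebo pigs, Weight is the starting weight, and u_k and v_{jk} are the random effects of barn and of pen within barn.

```latex
\ln\!\left(\frac{\pi_{ijk}}{1 - \pi_{ijk}}\right)
  = \beta_0 + \beta_1\,\mathrm{Vaccine}_{ijk} + \beta_2\,\mathrm{Weight}_{ijk} + u_k + v_{jk}

\mathrm{OR}_{\text{vaccine}} = e^{\beta_1}, \qquad
z_{\text{Wald}} = \frac{\hat{\beta}_1}{\mathrm{SE}(\hat{\beta}_1)}
```

The left-hand side is the log odds of lesions, which is why exponentiating the vaccination coefficient gives the odds ratio, and the Wald statistic is simply that estimated coefficient divided by its standard error.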

References

1. Shoukri MM, Pause CM. Logistic regression. In: Statistical Methods for Health Sciences. 2nd ed. Washington, DC: CRC Press; 1999:141–200.

2. Jirawattanapong P, Stockhofe-Zurwieden N, van Leengoed L, Binnendijk G, Wisselink H, Raymakers R, Cruijsen T, van der Peet-Schwering C, van Nes A, Nielen M. Efficacy of a subunit vaccine against Actinobacillus pleuropneumoniae in an endemically infected swine herd. J Swine Health Prod. 2008;16:193–199.

-- Cate Dewey, DVM, MSc, PhD