Chi square

Type: Non-parametric
Data: categorical (counts)

Chi-square has many uses in statistics, but its basic role is in the analysis of cross-tabulations, also called contingency tables.
The following hypothetical contingency table cross-tabulates mode of study against end-of-subject performance. The cells contain counts.

               HD   D    C    P    NN
FT - Internal  10   15   18   33   8
PT Internal    3    4    8    15   10
External       4    3    12   15   5

The educational question we have is "Is mode of study related to performance in the subject?"
The research hypothesis would be "Mode of study is related to performance".
The null hypothesis would then be "There is no relationship between mode of study and performance".
How does Chi-square allow us to test these hypotheses? The first thing you do when calculating chi-square is to create column and row margin totals:

               HD   D    C    P    NN   Total
FT - Internal  10   15   18   33   8    84
PT Internal    3    4    8    15   10   40
External       4    3    12   15   5    39
Total          17   22   38   63   23   163
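For readers who want to check the arithmetic, here is a minimal Python sketch of the margin totals. The list-of-lists layout and the names `observed`, `row_totals`, `col_totals` and `grand_total` are illustrative choices, not part of the original method. It uses 5 for the External/NN cell, the value consistent with the row and column totals.

```python
# Observed counts: rows are modes of study, columns are the grade bands
# HD, D, C, P, NN (hypothetical data from the table above).
observed = [
    [10, 15, 18, 33, 8],   # FT - Internal
    [3, 4, 8, 15, 10],     # PT Internal
    [4, 3, 12, 15, 5],     # External
]

# Row totals: sum across each row.
row_totals = [sum(row) for row in observed]
# Column totals: zip(*observed) transposes the table, so each item is a column.
col_totals = [sum(col) for col in zip(*observed)]
# Grand total: the sum of all cells (equivalently, of the row totals).
grand_total = sum(row_totals)

print(row_totals)   # [84, 40, 39]
print(col_totals)   # [17, 22, 38, 63, 23]
print(grand_total)  # 163
```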

In the bottom right-hand corner is the grand total (163).
Using the marginal totals, we can calculate a theoretical frequency for each cell. That is, the row and column totals can be used to calculate the frequencies you would expect in each cell if mode of study and performance were unrelated, using only these marginal totals and setting aside the actual cell counts that produced them.
Bear with me.
We calculate these expected frequencies as:

(column total/grand total)*row total

So the top left-hand cell (FT - Internal, HD) is (17/163)*84 = 8.8
The middle C x PT Internal cell is (38/163)*40 = 9.3
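The same formula can be applied to every cell at once. This is a short Python sketch under the same illustrative layout (rows FT - Internal, PT Internal, External; columns HD, D, C, P, NN); printing the values rounded to one decimal place reproduces the expected frequencies quoted in this section.

```python
# Observed counts (rows: FT - Internal, PT Internal, External;
# columns: HD, D, C, P, NN).
observed = [
    [10, 15, 18, 33, 8],
    [3, 4, 8, 15, 10],
    [4, 3, 12, 15, 5],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency for each cell: (column total / grand total) * row total.
expected = [[(c / grand_total) * r for c in col_totals] for r in row_totals]

for row in expected:
    print([round(e, 1) for e in row])
# [8.8, 11.3, 19.6, 32.5, 11.9]
# [4.2, 5.4, 9.3, 15.5, 5.6]
# [4.1, 5.3, 9.1, 15.1, 5.5]
```

Because every expected value is built from the margins alone, the expected table has exactly the same row, column and grand totals as the observed table.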
All these values can now be added to the table -

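Each cell shows the observed count followed by the expected frequency in brackets.

               HD          D           C           P           NN          Total
FT - Internal  10 (8.8)    15 (11.3)   18 (19.6)   33 (32.5)   8 (11.9)    84
PT Internal    3 (4.2)     4 (5.4)     8 (9.3)     15 (15.5)   10 (5.6)    40
External       4 (4.1)     3 (5.3)     12 (9.1)    15 (15.1)   5 (5.5)     39
Total          17          22          38          63          23          163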

What we now have is both the observed values (O) and the expected values (E). The expected values assume that there is no structure in the cells beyond what the marginal totals alone would produce: each row is spread across the columns in the same proportions as the column totals.
Chi-square asks the question: do the observed values deviate significantly from these expected values? We find this out by calculating the chi-square component for each cell:

((E-O)**2)/E
and then summing them all.
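Putting the steps together, a minimal Python sketch of the statistic follows. The function name `chi_square` is hypothetical. The expected values are left unrounded here; rounding each one to a single decimal place before summing shifts the total slightly.

```python
def chi_square(observed):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    stat = 0.0
    for i, row_total in enumerate(row_totals):
        for j, col_total in enumerate(col_totals):
            e = (col_total / grand_total) * row_total  # expected frequency
            o = observed[i][j]                         # observed frequency
            stat += (e - o) ** 2 / e                   # this cell's component
    return stat


observed = [
    [10, 15, 18, 33, 8],   # FT - Internal
    [3, 4, 8, 15, 10],     # PT Internal
    [4, 3, 12, 15, 5],     # External
]
print(round(chi_square(observed), 2))  # 8.95
```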
In this case chi-square = 8.95 (to two decimal places, using unrounded expected values).
The degrees of freedom (df) for chi-square are based on:

(No. rows - 1)*(No. columns - 1)
in this case df = (3-1)*(5-1) = 8
Now we look up the chi-square table for 8 df at the 0.05 significance level.
The tabled (critical) value is 15.51.
As our calculated value of 8.95 is less than this value, we cannot reject the null hypothesis of no relationship between mode of study and performance.
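The decision rule can be written as a small Python sketch. The function `chi_square_decision` is hypothetical; it takes the critical value as an argument rather than looking it up, so the value passed in must be the tabled value for the df it computes. Here 8.95 is the statistic for this table and 15.51 is the tabled value for 8 df at 0.05.

```python
def chi_square_decision(chi_sq, n_rows, n_cols, critical_value):
    """Return (df, reject) for a chi-square test on an n_rows x n_cols table."""
    df = (n_rows - 1) * (n_cols - 1)
    # Reject the null hypothesis when the statistic reaches the tabled value.
    reject = chi_sq >= critical_value
    return df, reject


df, reject = chi_square_decision(8.95, n_rows=3, n_cols=5, critical_value=15.51)
print(df, reject)  # 8 False
```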
Think about it this way -

Because we are looking at the difference between the pattern of expected values (representing data with no structure beyond the margins) and the pattern in our observed values, the greater the difference between expected and observed, the more patterning there is in our observed data. If chi-square is greater than the tabled value, that patterning is too large to be plausibly put down to chance alone at the chosen significance level.
For a given df, the bigger the value of chi-square, the more likely it is that you can reject your null hypothesis.
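To see this numerically, the sketch below computes the upper-tail probability (p-value) of the chi-square distribution using only the Python standard library. It relies on a closed form that is valid only for even df (which covers df = 8 here); the function name is hypothetical, and in practice a statistics library routine such as SciPy's `scipy.stats.chi2.sf` would normally be used instead.

```python
import math


def chi_square_p_value_even_df(chi_sq, df):
    """Upper-tail probability P(X >= chi_sq) for a chi-square variable X.

    Uses the closed form that holds only for even df:
        P = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)**i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = chi_sq / 2.0
    term = 1.0           # (x/2)**0 / 0!
    total = term
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)**i / i! from the previous term
        total += term
    return math.exp(-half) * total


# Larger chi-square values give smaller p-values for the same df.
for x in (4.0, 8.95, 15.51, 20.0):
    print(x, round(chi_square_p_value_even_df(x, 8), 3))
```

At the tabled value 15.51 the p-value comes out at about 0.05, which is what the critical value in the table means.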

Special Case - Fitting to Expected Values
Sometimes you will have data that reflect the established, normal pattern in an array of responses.
Following the above example, you may have data from a number of years of student performance. You could create a contingency table based on this.
This year you look at the student performance and it appears to be different from the trend from past years. You are asking yourself if this group does show a significantly different pattern. To test this question -

you take the values from the tabling of past performance as your expected values (E), rescaled so that they add up to the same total as the current year's data
you use the current year's counts as the observed values (O)

Then you calculate ((E-O)**2)/E for each cell and add them up.
The result is the chi-square showing the fit between past and current performance.
Your research hypothesis is that "There is a difference between past and current year performance"
Your null hypothesis is that "There is no difference between past and current years"
As before, if your calculated chi-square is greater than or equal to the tabled chi-square value, you can reject the null hypothesis that there is no difference. Otherwise, you cannot reject the null hypothesis.
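A minimal Python sketch of this goodness-of-fit comparison is shown below. All counts are hypothetical, invented only for illustration, and the function name `goodness_of_fit` is an assumption. For simplicity it compares a single row of grade counts (HD, D, C, P, NN); the past counts are turned into proportions and rescaled to this year's total before being used as expected values, and every past category is assumed to have a non-zero count. With one row rescaled in this way, df is the number of categories minus 1, and 9.49 is the usual tabled value for 4 df at the 0.05 level.

```python
def goodness_of_fit(past_counts, current_counts):
    """Chi-square comparing this year's counts with the pattern from past years.

    Past counts are rescaled so the expected values sum to this year's total.
    Assumes every past count is greater than zero.
    """
    past_total = sum(past_counts)
    current_total = sum(current_counts)
    expected = [p * current_total / past_total for p in past_counts]
    stat = sum((e - o) ** 2 / e for e, o in zip(expected, current_counts))
    df = len(current_counts) - 1  # one row, expected total fixed to the observed total
    return stat, df, expected


# Hypothetical grade counts for HD, D, C, P, NN (not real data).
past = [60, 90, 150, 240, 60]      # pooled over several past years, total 600
current = [12, 15, 30, 33, 10]     # this year, total 100

stat, df, expected = goodness_of_fit(past, current)
critical_value = 9.49  # tabled chi-square value for df = 4 at the 0.05 level
print(expected)            # [10.0, 15.0, 25.0, 40.0, 10.0]
print(round(stat, 3), df)  # 2.625 4
print("reject H0" if stat >= critical_value else "cannot reject H0")
```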