

Chi-square

Type: Non-parametric
Data: Categorical (counts)

Chi-square has many uses in statistics, but its basic role is in the analysis of cross-tabulations, or contingency tables.

The following hypothetical contingency table shows a cross-tabulation of mode of study against end-of-subject performance. The entries are counts for each combination.

                    HD    D    C    P   NN
    FT Internal     10   15   18   33    8
    PT Internal      3    4    8   15   10
    External         4    3   12   15    5

The educational question we have is "Is mode of study related to performance in the subject?"

The research hypothesis would be: "Mode of study is related to performance."

The null hypothesis would then be: "There is no relationship between mode of study and performance."

How does Chi-square allow us to test these hypotheses? The first thing you do when calculating chi-square is to create column and row margin totals:

                    HD    D    C    P   NN   Total
    FT Internal     10   15   18   33    8      84
    PT Internal      3    4    8   15   10      40
    External         4    3   12   15    5      39
    Total           17   22   38   63   23     163

In the bottom right hand corner we have the Grand Total.
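If you want to check the margin totals by machine, the minimal Python sketch below does it. It assumes the table is held as a list of rows in the order shown (FT Internal, PT Internal, External; HD through NN); the names `observed` and `margin_totals` are illustrative only.

```python
# Observed counts from the contingency table above (hypothetical data from the text).
# Rows: FT Internal, PT Internal, External. Columns: HD, D, C, P, NN.
observed = [
    [10, 15, 18, 33, 8],
    [3, 4, 8, 15, 10],
    [4, 3, 12, 15, 5],
]


def margin_totals(table):
    """Return (row totals, column totals, grand total) for a list of rows."""
    row_totals = [sum(row) for row in table]
    # zip(*table) walks the table column by column.
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return row_totals, col_totals, grand_total


rows, cols, grand = margin_totals(observed)
print(rows, cols, grand)  # [84, 40, 39] [17, 22, 38, 63, 23] 163
```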

Using the marginal totals, we can calculate a theoretical frequency for each cell. That is, the row and column totals can be used to calculate the set of frequencies you would expect if mode of study and performance were unrelated, given only these marginal totals and ignoring the actual cell counts that produced them.

Bear with me.

We calculate these expected frequencies by:

    (column total/grand total)*row total

So the expected frequency for the top left-hand cell (HD x FT Internal) is (17/163)*84 = 8.8

The middle C x PT Internal cell is (38/163)*40 = 9.3
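The same formula can be applied to every cell at once. This is a minimal sketch assuming the list-of-rows layout used above; `expected_frequencies` is a hypothetical helper name, not a library function. Rounded to one decimal place, its output matches the expected values in the table that follows.

```python
# Observed counts (rows: FT Internal, PT Internal, External; columns: HD, D, C, P, NN).
observed = [
    [10, 15, 18, 33, 8],
    [3, 4, 8, 15, 10],
    [4, 3, 12, 15, 5],
]


def expected_frequencies(table):
    """Expected count for each cell: (column total / grand total) * row total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [
        [(col_total / grand_total) * row_total for col_total in col_totals]
        for row_total in row_totals
    ]


expected = expected_frequencies(observed)
for row in expected:
    print([round(e, 1) for e in row])
# [8.8, 11.3, 19.6, 32.5, 11.9]
# [4.2, 5.4, 9.3, 15.5, 5.6]
# [4.1, 5.3, 9.1, 15.1, 5.5]
```

Because every expected value is built from the margins alone, the expected table has exactly the same row, column and grand totals as the observed one.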

All these values can now be added to the table -

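                    HD          D           C           P           NN          Total
    FT Internal     10 (8.8)    15 (11.3)   18 (19.6)   33 (32.5)   8 (11.9)    84
    PT Internal     3 (4.2)     4 (5.4)     8 (9.3)     15 (15.5)   10 (5.6)    40
    External        4 (4.1)     3 (5.3)     12 (9.1)    15 (15.1)   5 (5.5)     39
    Total           17          22          38          63          23          163

(In each cell, the first number is the observed count and the number in brackets is the expected frequency.)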

We now have both the observed values (O) and the expected values (E). The expected values assume there is no structure in the table beyond what the margin totals alone would produce, that is, no relationship between mode of study and performance.

Chi-square asks the question: do the observed values deviate significantly from these expected values? We find this out by calculating the chi-square component for each cell -

    ((E-O)**2)/E

and then summing them all.
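A minimal sketch of the whole calculation, summing ((E-O)**2)/E over every cell, is given below. It assumes the same list-of-rows layout and uses unrounded expected frequencies; `chi_square_statistic` is an illustrative name. With these counts it gives a statistic of about 8.95.

```python
observed = [
    [10, 15, 18, 33, 8],   # FT Internal: HD, D, C, P, NN
    [3, 4, 8, 15, 10],     # PT Internal
    [4, 3, 12, 15, 5],     # External
]


def chi_square_statistic(table):
    """Sum ((E - O)**2) / E over all cells, with E taken from the margin totals."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = (col_totals[j] / grand_total) * row_totals[i]  # expected frequency
            statistic += (e - o) ** 2 / e  # this cell's chi-square component
    return statistic


chi_sq = chi_square_statistic(observed)
print(round(chi_sq, 2))  # 8.95
```

Working from the one-decimal expected values in the table instead gives a slightly different figure, which is why the sketch keeps full precision until the end.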

In this case chi-square = 8.95 (calculated from the unrounded expected frequencies).

The Degrees of Freedom (df) for Chi-square are based on -

    (No.Rows-1)*(No.columns-1)

in this case df = (3-1)*(5-1) = 8

Now we look up the chi-square table for 8 df at the 0.05 level of significance.

The tabled (critical) value is 15.51.

As our calculated value of 8.95 is less than this value, we cannot reject the null hypothesis of no relationship between mode of study and performance.
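The degrees of freedom and the comparison with the tabled value can also be sketched in code. The sketch additionally computes the p-value (the probability of a chi-square at least this large if the null hypothesis is true) with a closed form that is valid only for an even number of degrees of freedom, which happens to cover df = 8. The critical value 15.51 is taken from the text rather than computed, and the statistic of about 8.95 is the one obtained from unrounded expected frequencies. For odd df, or to compute critical values, you would normally use a statistics library (for example `scipy.stats.chi2.sf` and `scipy.stats.chi2.ppf`). The function names here are illustrative.

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    """df for a contingency table: (rows - 1) * (columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)


def chi_square_sf_even_df(x, df):
    """P(chi-square >= x) for a positive EVEN number of degrees of freedom.

    Uses the closed form exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)**k / k!,
    which is exact only when df is even.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term = 1.0  # (x/2)**k / k! for k = 0
    total = 0.0
    for k in range(df // 2):
        if k > 0:
            term *= half / k  # build (x/2)**k / k! from the previous term
        total += term
    return math.exp(-half) * total


df = degrees_of_freedom(3, 5)  # 3 rows (modes of study), 5 grade columns
critical_value = 15.51  # tabled value for 8 df at the 0.05 level (from the text)
chi_sq = 8.95  # statistic computed from the observed table
p_value = chi_square_sf_even_df(chi_sq, df)
reject_null = chi_sq >= critical_value
print(df, round(p_value, 2), reject_null)  # 8 0.35 False
```

As a sanity check, `chi_square_sf_even_df(15.51, 8)` comes out at about 0.05, which is what the tabled critical value means.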

Think about it this way -

    Because we are looking at the difference between the pattern of expected values (representing an unstructured set of data) and the pattern in our observed values, the greater the difference between expected and observed, the more patterning there is in our observed data. If that patterning is large enough not to have been a result of chance factors, then our difference measure (chi-square) will be greater than the tabled value.

    The bigger the value of chi-square the more likely it is that you can reject your null hypothesis.

Special Case - Fitting to Expected Values

Sometimes you will have data that can be taken to represent the normal, or expected, pattern of responses.

Following the above example, you may have data from a number of years of student performance. You could create a contingency table based on this.

This year you look at the student performance and it appears to be different from the trend from past years. You are asking yourself if this group does show a significantly different pattern. To test this question -

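    you take the values from the tabling of past performance as your expected values (E), after converting them to proportions and scaling them to this year's total so that E and O sum to the same number

    you use the current year as the observed values (O)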

Then you calculate ((E-O)**2)/E for each cell and add them up.

The result is the chi-square showing the fit between past and current performance.
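A minimal sketch of this special case follows. It assumes the past-years counts are first converted to proportions and scaled to the current year's total, so that expected and observed values sum to the same number, and it uses df = number of cells minus 1, the usual choice when the expected proportions come from outside the current data. The counts are hypothetical, invented purely for illustration, and use a single row of grades to keep the example short; the same calculation applies cell by cell to a full contingency table. `goodness_of_fit` is an illustrative name.

```python
def goodness_of_fit(observed, past_counts):
    """Chi-square fit of this year's counts to the pattern of past counts.

    Past counts are turned into proportions and scaled to this year's total,
    so expected (E) and observed (O) values sum to the same number.
    Returns (statistic, degrees of freedom = number of cells - 1).
    """
    if len(observed) != len(past_counts):
        raise ValueError("observed and past_counts must cover the same cells")
    n_now = sum(observed)
    n_past = sum(past_counts)
    expected = [n_now * count / n_past for count in past_counts]
    statistic = sum((e - o) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1


# HYPOTHETICAL counts for the grades HD, D, C, P, NN, invented for illustration.
past_years = [40, 60, 120, 200, 80]  # several past years pooled (500 students)
this_year = [6, 10, 12, 20, 12]  # current cohort (60 students)

chi_sq, df = goodness_of_fit(this_year, past_years)
print(round(chi_sq, 2), df)  # 3.06 4
```

With these made-up numbers the statistic of about 3.06 on 4 df is below the tabled value of 9.49 for 4 df at the 0.05 level, so the null hypothesis of no difference would not be rejected.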

Your research hypothesis is that "There is a difference between past and current year performance"

Your null hypothesis is that "There is no difference between past and current years"

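As before, if your calculated chi-square is greater than or equal to the tabled chi-square value, you can reject the null hypothesis that there is no difference. Otherwise, you cannot reject it. Note that the degrees of freedom here are the number of cells minus 1, not (No.Rows-1)*(No.columns-1), because the expected values come from past data rather than from this year's margin totals.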