The Chi-square statistic, the sum of (O - E)²/E over all cells of the table, is a measure of the degree of deviation between the Observed (O) and Expected (E) frequencies. If there is no relationship between the row variable and the column variable, this measure will be very close to zero. Under the null hypothesis that there is no relationship between the rows and the columns, this quantity has a Chi-square distribution with degrees of freedom equal to the number of rows minus 1, multiplied by the number of columns minus 1.
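As a minimal sketch of how this measure is computed, the Python function below derives the expected frequencies from the row and column totals and sums (O - E)²/E over the cells. The function name and the 2x3 table of counts are hypothetical, chosen only for illustration; they are not the data of the numerical example in the text.

```python
def chi_square_statistic(observed):
    """Return (chi2, dof) for a crosstable given as a list of rows of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected count under no relationship between rows and columns.
            e = row_totals[i] * col_totals[j] / n
            chi2 += (o - e) ** 2 / e

    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, dof


# Hypothetical 2x3 table of counts (an assumption, not the source data).
table = [[10, 6, 4],
         [5, 7, 8]]
chi2, dof = chi_square_statistic(table)
print(chi2, dof)
```

The function assumes every expected count is positive, that is, no row or column total is zero.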
For this numerical example we have:
with d.f. = (2-1)(3-1) = 2, which has a p-value of 0.14, suggesting little or no real evidence against the null hypothesis.
The main question is how large this measure is. The maximum value of this measure is:
N(A - 1),
where N is the total sample size and A is the number of rows or columns, whichever is smaller. For our numerical example it is 40(2-1) = 40.
The coefficient of determination, which has a range of [0, 1], provides the relative strength of the relationship, computed as the ratio of the Chi-square statistic to its maximum value.
Therefore we conclude that the degree of association is only 11%, which is fairly weak.
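The relative strength described above can be sketched in a few lines of Python. The sample size of 40 and the 2x3 table shape come from the numerical example; the function names are hypothetical, and the Chi-square value passed in the last line is an assumed input for illustration, not the value from the text.

```python
def max_chi_square(n, rows, cols):
    """Largest attainable Chi-square: n times (smaller table dimension - 1)."""
    return n * (min(rows, cols) - 1)


def relative_strength(chi2, n, rows, cols):
    """Chi-square as a fraction of its maximum, a value between 0 and 1."""
    return chi2 / max_chi_square(n, rows, cols)


# Sample size 40 and a 2x3 table, as in the numerical example.
chi2_max = max_chi_square(40, 2, 3)
print(chi2_max)

# Assumed Chi-square value, for illustration only.
print(relative_strength(4.0, 40, 2, 3))
```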
Alternatively, you could also look at the contingency coefficient φ statistic, which is:
This statistic ranges between 0 and 1 and can be interpreted like the correlation coefficient. This measure also indicates that the curriculum chosen by students is related to the occupation of their parents.
You might like to use the Chi-square Test for Crosstable Relationship JavaScript in performing this test, and the P-values for the Popular Distributions JavaScript to find the p-value of the Chi-square statistic.
Further Readings:
Agresti A., Categorical Data Analysis, Wiley, 2002.
Fleiss J., Statistical Methods for Rates and Proportions, Wiley, 1981.
Using Chi-square in a 2x2 table requires Yates's correction. One first subtracts 0.5 from the absolute difference between the observed and expected frequencies in each of the four cells before squaring, dividing by the expected frequency, and summing. The formula for the Chi-square value in a 2x2 table can be derived from the Normal Theory comparison of the two proportions in the table, using the total incidence to produce the standard errors. The rationale of the correction is a better equivalence between the area under the normal curve and the probabilities obtained from the discrete frequencies. In other words, the simplest correction is to move the cut-off point for the continuous distribution from the observed value of the discrete distribution to midway between that and the next value in the direction of the null hypothesis expectation. Therefore, the correction essentially applies only to one d.f. tests, where the "square root" of the Chi-square looks like a "normal/t-test" and where a direction can be attached to the 0.5 adjustment.
Chi-square distribution is used as an approximation of the binomial distribution. By applying a continuity correction, we get a better approximation of the binomial distribution for the purposes of calculating tail probabilities.
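The correction described above might be sketched as follows for a 2x2 table with cells a, b, c, d. The function name and the counts are hypothetical. Clamping the corrected difference at zero is an added safeguard, not part of the description above, so that a difference smaller than 0.5 is not over-corrected.

```python
def yates_chi_square(a, b, c, d):
    """Chi-square for a 2x2 table with Yates's continuity correction."""
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            e = row_totals[i] * col_totals[j] / n
            # Subtract 0.5 from |O - E| before squaring; never go below zero
            # (the clamp is an assumption added for this sketch).
            diff = max(abs(observed[i][j] - e) - 0.5, 0.0)
            chi2 += diff ** 2 / e
    return chi2


# Hypothetical counts: every expected frequency is 15 and |O - E| is 5.
corrected = yates_chi_square(10, 20, 20, 10)
print(corrected)
```

For these counts the uncorrected statistic would be 4 x 25/15, so the correction lowers the value, as expected for a continuity adjustment.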
Given the following 2x2 table, one may compute some relative risk measures:
| a | b |
| c | d |
The most usual measures are:
Rate-difference: a/(a+c) - b/(b+d)
Rate-ratio: (a/(a+c))/(b/(b+d))
Odds-ratio: ad/bc
The rate difference and rate ratio are appropriate when you are contrasting two groups whose sizes (a+c and b+d) are given. The odds ratio is for when the issue is association rather than difference.
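A short Python sketch of the three measures listed above, using the same cell labels a, b, c, d. The function names and the counts are hypothetical and serve only to show the arithmetic.

```python
def rate_difference(a, b, c, d):
    """a/(a+c) - b/(b+d): difference between the two column rates."""
    return a / (a + c) - b / (b + d)


def rate_ratio(a, b, c, d):
    """(a/(a+c)) / (b/(b+d)): ratio of the two column rates."""
    return (a / (a + c)) / (b / (b + d))


def odds_ratio(a, b, c, d):
    """ad/bc: the cross-product ratio of the 2x2 table."""
    return (a * d) / (b * c)


# Hypothetical counts: two groups of 100 each (a + c = 100, b + d = 100).
a, b, c, d = 20, 10, 80, 90
print(rate_difference(a, b, c, d))
print(rate_ratio(a, b, c, d))
print(odds_ratio(a, b, c, d))
```

The sketch assumes no cell that appears in a denominator is zero.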
The risk-ratio (RR) is the ratio of the proportion a/(a+b) to the proportion c/(c+d):
RR = (a/(a+b)) / (c/(c+d))
RR is thus a measure of how much larger the proportion in the first row is compared to the second. An RR value of < 1.00 indicates a 'negative' association [a/(a+b) < c/(c+d)], 1.00 indicates no association [a/(a+b) = c/(c+d)], and > 1.00 indicates a 'positive' association [a/(a+b) > c/(c+d)]. The further from 1.00 the RR is, the stronger the association.
Notice that the odds ratio (OR) is equal to the simple crossproduct ratio of a 2×2 table.
The OR can be written as: (a/b)/(c/d) which is the ratio of these two odds -- hence its name, the odds ratio. Both the numerator and denominator are odds. For example, the numerator, a/b, gives the odds of a positive versus negative rating by Rater 2 given that Rater 1's rating is positive. The denominator c/d gives the odds of a positive versus negative rating by Rater 2 given that Rater 1's rating is negative.
Since the distribution of the odds ratio is skewed, we cannot easily compute a standard error for the odds ratio itself. We can, however, find a standard error for the natural logarithm of the odds ratio. It is simply:
SE(ln OR) = sqrt(1/a + 1/b + 1/c + 1/d)
Notice that you need to compute the confidence interval on the log scale and then transform the results back to the original scale of measurement.
We see that as any or all of the counts in the two by two table increase, the confidence interval for the log odds ratio shrinks. Also, it turns out that the smallest count in the 2 by 2 table plays the largest role in determining the size of the standard error.
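The log-scale procedure can be sketched in Python as below. The function name, the counts, and the 95% confidence level are assumptions made for illustration; the standard error used is the usual large-sample one, the square root of 1/a + 1/b + 1/c + 1/d.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.95):
    """Return (odds ratio, SE of log OR, lower limit, upper limit)."""
    odds_ratio = (a * d) / (b * c)
    # Large-sample standard error of the natural log of the odds ratio.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    # Two-sided normal critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # Build the interval on the log scale, then transform back.
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log)
    upper = math.exp(log_or + z * se_log)
    return odds_ratio, se_log, lower, upper


# Hypothetical counts, for illustration only.
or_hat, se_log, lower, upper = odds_ratio_ci(20, 10, 80, 90)
print(or_hat, se_log, lower, upper)
```

Because the interval is symmetric only on the log scale, the back-transformed limits are not equidistant from the odds ratio; the smallest count (here b = 10) contributes the largest term to the standard error.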
The two tests differ, however, in the following respect. The Test for Crosstable Relationship is made on data drawn from a single population (with fixed total) where one is concerned with whether one set of attributes is independent of another set. The test for homogeneity, on the other hand, is designed to test the null hypothesis that two or more random samples are drawn from the same population or from different populations, according to some criterion of classification applied to the samples.
The homogeneity test is concerned with the question: Are the samples drawn from populations that are homogeneous (i.e., the same) with respect to some criterion of classification?
In the crosstable for this test, either the row or the column categories may represent the populations from which the samples are drawn.
An Application: Suppose a board of directors of a labor union wishes to survey the opinion of its members regarding a change in its constitution. The following table shows the result of the survey sent to three union locals:
| Reactions of a Sample of Three Locals Group Members, by Union Local |
The problem is not to determine whether or not the union members are in favor of the change. The question is to test if there is a significant difference in the proportions of opinion of the three populations' members concerning the proposed change.
The Chi-square statistic is 9.58 with d.f. = (3-1)(3-1) = 4. The p-value is equal to 0.048, indicating that there is moderate evidence against the null hypothesis that the three union locals are the same.
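The reported p-value can be checked with a few lines of Python. When the degrees of freedom are even, the upper-tail probability of the Chi-square distribution has a closed form, exp(-x/2) multiplied by the sum of (x/2)^k / k! for k from 0 to d.f./2 - 1. The function name is hypothetical, and the sketch covers only even degrees of freedom.

```python
import math


def chi_square_p_value_even_dof(x, dof):
    """Upper-tail Chi-square probability, valid only for even positive dof."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this closed form needs even, positive d.f.")
    half = x / 2.0
    term = 1.0   # (x/2)^k / k! for k = 0
    total = 1.0
    for k in range(1, dof // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Statistic and d.f. reported for the union locals example.
p = chi_square_p_value_even_dof(9.58, 4)
print(round(p, 3))  # 0.048
```

For d.f. = 2 the sum has a single term, so the p-value reduces to exp(-x/2).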
You might like to use Populations Homogeneity Test