Yule's Q contingency coefficient (Yule (1900)) is a measure of correlation which can be calculated for 2 × 2 contingency tables:
$$Q = \frac{O_{11}O_{22} - O_{12}O_{21}}{O_{11}O_{22} + O_{12}O_{21}},$$
where $O_{11}$, $O_{12}$, $O_{21}$, $O_{22}$ are the observed frequencies in the contingency table.
The Q coefficient takes values in the range [−1, 1]. The closer Q is to 0, the weaker the dependence between the analysed features; the closer it is to −1 or +1, the stronger the dependence. The coefficient has one drawback: it is not robust to small observed frequencies (if one of them is 0, the coefficient may wrongly indicate total dependence of the features).
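The behaviour described above can be illustrated with a minimal Python sketch. It assumes the standard definition of Yule's Q as the difference of the diagonal cell products divided by their sum; the function name `yule_q` and the three tables are hypothetical and serve only as an illustration.

```python
def yule_q(o11, o12, o21, o22):
    """Yule's Q for a 2 x 2 table of observed frequencies."""
    main_diagonal = o11 * o22  # product of the main-diagonal cells
    off_diagonal = o12 * o21   # product of the off-diagonal cells
    if main_diagonal + off_diagonal == 0:
        raise ValueError("Q is undefined when both products are zero")
    return (main_diagonal - off_diagonal) / (main_diagonal + off_diagonal)


# Hypothetical tables, chosen only to illustrate the behaviour of Q.
print(yule_q(20, 10, 10, 20))  # 0.6: moderate positive dependence
print(yule_q(15, 15, 15, 15))  # 0.0: no dependence
print(yule_q(3, 0, 2, 40))     # 1.0: a single empty cell forces Q to +1
```

The last call shows the drawback mentioned above: with only three observations in the first row, one empty cell is enough to push Q to its maximum.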
The statistical significance of Yule's Q coefficient is determined by the Z test.
Hypotheses:
H₀: Q = 0,
H₁: Q ≠ 0.
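The text does not give the Z test statistic, so the following Python sketch rests on an assumption: it uses the commonly cited large-sample standard error of Q, SE = ½(1 − Q²)·√(1/O11 + 1/O12 + 1/O21 + 1/O22), and Z = Q / SE. The function name `yule_q_z_test` and the sample table are hypothetical, and the sketch requires a table with no empty cells.

```python
import math


def yule_q_z_test(o11, o12, o21, o22):
    """Return (Q, Z, two-sided p) for a 2 x 2 table with no empty cells."""
    q = (o11 * o22 - o12 * o21) / (o11 * o22 + o12 * o21)
    # Assumed large-sample standard error of Q (not stated in the text).
    se = 0.5 * (1 - q ** 2) * math.sqrt(1 / o11 + 1 / o12 + 1 / o21 + 1 / o22)
    z = q / se
    # Two-sided p value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return q, z, p


# Hypothetical table used only for illustration.
q, z, p = yule_q_z_test(20, 10, 10, 20)
print(f"Q = {q:.2f}, Z = {z:.2f}, p = {p:.5f}")
```

H₀ is rejected when p is equal to or less than the chosen significance level α.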
The φ contingency coefficient
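The φ contingency coefficient is a measure of correlation which can be calculated for 2 × 2 contingency tables:
$$\varphi = \sqrt{\frac{\chi^2}{n}},$$
where χ² is the value of the χ² test statistic and n is the total frequency in the contingency table.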
The coefficient takes values in the range [0, 1]. The closer φ is to 0, the weaker the dependence between the analysed features; the closer it is to 1, the stronger the dependence.
The φ contingency coefficient is considered statistically significant if the p value calculated on the basis of the χ² test (designated for this table) is equal to or less than the significance level α.
The Cramer’s V contingency coefficient
Cramer's V contingency coefficient (Cramer (1946)) is an extension of the φ coefficient to r × c contingency tables:
$$V = \sqrt{\frac{\chi^2}{n(w-1)}},$$
where χ² is the value of the χ² test statistic, n is the total frequency in the contingency table, and w is the smaller of r and c.
The V coefficient takes values in the range [0, 1]. The closer V is to 0, the weaker the dependence between the analysed features; the closer it is to 1, the stronger the dependence. The value of V also depends on the table size, so this coefficient should not be used to compare contingency tables of different sizes.
The V contingency coefficient is considered statistically significant if the p value calculated on the basis of the χ² test (designated for this table) is equal to or less than the significance level α.
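As a rough illustration, V can be computed directly from a table of observed frequencies. The sketch below assumes the usual definition V = √(χ² / (n(w − 1))) with w the smaller of r and c, and Pearson's χ² statistic without a continuity correction; the helper names and the two tables are hypothetical.

```python
import math


def chi_square_statistic(table):
    """Pearson's chi-square statistic (no continuity correction) and n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def cramers_v(table):
    """Cramer's V for an r x c table given as a list of rows."""
    chi2, n = chi_square_statistic(table)
    w = min(len(table), len(table[0]))  # smaller of r and c
    return math.sqrt(chi2 / (n * (w - 1)))


# Hypothetical tables used only for illustration.
print(round(cramers_v([[20, 10], [10, 20]]), 3))            # 2 x 2: V equals phi
print(round(cramers_v([[10, 20], [20, 10], [15, 15]]), 3))  # 3 x 2 table
```

For a 2 × 2 table w − 1 = 1, so V reduces to φ.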
The Pearson’s C contingency coefficient
Pearson's C contingency coefficient is a measure of correlation which can be calculated for r × c contingency tables:
$$C = \sqrt{\frac{\chi^2}{\chi^2 + n}},$$
where χ² is the value of the χ² test statistic and n is the total frequency in the contingency table.
The C coefficient takes values in the range [0, 1). The closer C is to 0, the weaker the dependence between the analysed features; the farther it is from 0, the stronger the dependence. The value of C also depends on the table size (the bigger the table, the closer to 1 the value of C can be). For this reason the upper limit which C can reach for a particular table size should be calculated, $C_{max} = \sqrt{(w-1)/w}$, where w is the smaller of r and c, and the coefficient adjusted for the table size, $C_{adj} = C / C_{max}$, should be reported.
The C contingency coefficient is considered statistically significant if the p value calculated on the basis of the χ² test (designated for this table) is equal to or less than the significance level α.
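The dependence of C on the table size can be sketched in Python. The sketch assumes the commonly used upper limit C_max = √((w − 1) / w), with w the smaller of r and c, and the adjusted coefficient C_adj = C / C_max; the function names are hypothetical.

```python
import math


def pearson_c(chi2, n):
    """Pearson's C from the chi-square statistic and the total frequency."""
    return math.sqrt(chi2 / (chi2 + n))


def c_max(r, c):
    """Assumed upper limit of C for an r x c table."""
    w = min(r, c)
    return math.sqrt((w - 1) / w)


def pearson_c_adjusted(chi2, n, r, c):
    """C divided by its assumed upper limit for the given table size."""
    return pearson_c(chi2, n) / c_max(r, c)


# The upper limit approaches 1 as the table grows.
for size in (2, 3, 5, 10):
    print(size, round(c_max(size, size), 3))
```

For a 2 × 2 table the limit is about 0.707, so an unadjusted C can never exceed that value, however strong the dependence.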
A sample of 170 persons (n = 170) is analysed with respect to 2 features (X = sex, Y = passing the exam). Each feature occurs in 2 categories (X1 = f, X2 = m, Y1 = yes, Y2 = no). On the basis of the sample, we would like to find out whether there is any dependence between sex and passing the exam in the analysed population. The data distribution is presented in a contingency table:
The value of the chi-square test statistic is 16.33 and the p value calculated for it is p = 0.00005. The result indicates that there is a statistically significant dependence between sex and passing the exam in the analysed population. The coefficients which describe the strength of the correlation between the analysed features are:
Pearson's Cadj = 0.42,
Cramer's V = φ = 0.31,
Yule's Q = 0.58; the p value of the Z test (like that of the chi-square test) indicates a statistically significant dependence between the analysed features.
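The chi-square-based values in this example can be checked from the reported statistic alone. The short Python sketch below takes χ² = 16.33 and n = 170 from the text and assumes the standard formulas for φ, V and C, together with the adjustment C_adj = C / √((w − 1) / w). Yule's Q is not recomputed, because it requires the individual cell frequencies.

```python
import math

chi2, n = 16.33, 170  # values reported in the example
r, c = 2, 2           # table dimensions
w = min(r, c)

phi = math.sqrt(chi2 / n)
v = math.sqrt(chi2 / (n * (w - 1)))
c_coef = math.sqrt(chi2 / (chi2 + n))
c_adj = c_coef / math.sqrt((w - 1) / w)  # assumed adjustment for table size

# For 1 degree of freedom the chi-square tail probability equals
# the two-sided normal tail at sqrt(chi2).
p = math.erfc(math.sqrt(chi2 / 2))

print(round(phi, 2), round(v, 2), round(c_adj, 2))  # 0.31 0.31 0.42
print(f"p = {p:.5f}")
```

The rounded results agree with the coefficients and the p value reported above.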