Chi-Square Test
Last updated: 05/07/2025
Key Points
- Chi-square tests are used to determine whether categorical variables are significantly associated with one another, or whether the observed frequencies of a categorical variable differ significantly from an expected distribution.
- The choice of chi-square test depends on how the categorical data are structured and on the goals of the particular research project.
- Assumptions, including minimum expected frequency counts, must be met to ensure that chi-square results are valid.
Introduction to Chi-Square Test
- The chi-square test is used for analyzing categorical data and has important applications in clinical research. Unlike parametric tests that compare group means (for example, Student’s t test), the chi-square test evaluates the statistical significance of observed differences in frequencies or proportions across the categories of variables such as gender (male or female) or age group (geriatric or young patients). The test is designed to determine whether there is an association between two categorical variables, or whether the observed distribution of a single categorical variable matches an expected distribution.
- Before applying a chi-square test to a dataset, three assumptions must first be satisfied:
1. The variables being analyzed must be categorical (nominal or ordinal).1-3
2. The expected frequency counts (in the contingency table, or in each category for a single variable) must be at least 5 for the chi-square distribution to be a valid approximation.1-3
3. The data points must be independent of one another (for example, no patient is counted in more than one category).1-3
- If the above assumptions are met or controlled for, the chi-square test can be used to draw conclusions about the relationships between categorical variables in a dataset. Note, however, that the chi-square test cannot determine the magnitude or the cause of an association between categorical variables. Further confirmatory tests are therefore required after a significant chi-square result, especially if a study lacks robust control measures.1-4
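Assumption 2 can be checked before the test is run. The Python sketch below uses only the standard library. It computes expected counts for a contingency table as (row total × column total) / grand total and lists every cell whose expected count is below 5. It applies the threshold to every cell, which is the strictest reading of the rule. The function names `expected_counts` and `cells_below_minimum` are hypothetical, and the table of counts is made up for illustration.

```python
def expected_counts(observed):
    """Expected count for each cell of a contingency table under independence:
    (row total * column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def cells_below_minimum(expected, minimum=5.0):
    """Return (row, column, expected count) for every cell below `minimum`.
    Assumption: the rule is applied to every cell (strictest reading)."""
    return [(i, j, e)
            for i, row in enumerate(expected)
            for j, e in enumerate(row)
            if e < minimum]


# Hypothetical 2x2 table of counts, invented for illustration only.
observed = [[3, 7],
            [6, 4]]
expected = expected_counts(observed)   # [[4.5, 5.5], [4.5, 5.5]]
print(cells_below_minimum(expected))   # [(0, 0, 4.5), (1, 0, 4.5)]
```

Here two cells have an expected count of 4.5, so this table would not satisfy assumption 2 as stated. Assumption 3 (independent observations) cannot be verified from the counts alone and has to be ensured by the study design.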
- The chi-square test calculates a chi-square statistic (χ²) by summing, over all categories or cells, the squared difference between the observed and expected frequencies divided by the expected frequency: χ² = Σ (O − E)² / E. The p-value is then obtained from the chi-square distribution with the appropriate degrees of freedom. For a given number of degrees of freedom, a larger χ² value corresponds to a smaller p-value and therefore to stronger evidence against the null hypothesis. If the p-value falls at or below a predetermined significance threshold (commonly p ≤ 0.05), the null hypothesis is rejected, and the relationship between the analyzed categorical variables is considered statistically significant.1-3
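The calculation described above can be sketched in Python with only the standard library. The first function sums (O − E)² / E over all categories or cells. The second converts the result to a p-value using the closed-form upper-tail probability of the chi-square distribution, which exists for whole-number degrees of freedom. The helper names `chi_square_statistic` and `chi2_sf` are hypothetical. In practice, a statistics package would normally supply the p-value (for example, `scipy.stats.chi2.sf` in SciPy).

```python
import math


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over matching categories or cells (flat lists)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with a
    positive whole-number df, using the closed-form series for integer df."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive whole number")
    df = int(df)
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + 2 * phi(sqrt(x)) * sum_{j=1}^{(df-1)/2} x^(j - 1/2) / (1 * 3 * ... * (2j - 1)),
    # where phi is the standard normal density.
    phi = math.exp(-half) / math.sqrt(2.0 * math.pi)
    term, series = math.sqrt(x), 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        series += term
    return math.erfc(math.sqrt(half)) + 2.0 * phi * series


# Example: three categories, so 2 degrees of freedom.
stat = chi_square_statistic([10, 20, 30], [20, 20, 20])
p = chi2_sf(stat, df=2)
print(f"chi-square = {stat}, p = {p:.4f}")
```

In this example, observed counts of 10, 20, and 30 are compared against an expected count of 20 in each category. The result is χ² = 10 on 2 degrees of freedom, with a p-value of e⁻⁵ ≈ 0.0067, below the common 0.05 threshold.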
- There are two main types of chi-square tests, each suited to a different kind of categorical analysis: the chi-square goodness of fit test and the chi-square test of independence. Both are described in detail in the following sections.
Chi-Square Goodness of Fit Test
- The purpose of a goodness of fit test is to determine whether the observed frequency distribution of a single categorical variable differs significantly from an expected or previously calculated distribution. This test compares the observed frequencies across two or more categories of a single variable with theoretical frequencies derived from a known or hypothesized distribution. Examples include a distribution assumed before the study to be equal across categories, or a previously calculated population distribution. The null hypothesis of this test states that the observed frequencies follow the expected distribution. A statistically significant result indicates that the observed frequency distribution deviates from the hypothesized or previously calculated distribution.1,3
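In a goodness of fit test, each expected count is the total number of observations multiplied by the hypothesized proportion for that category. With k categories and no parameters estimated from the data, the statistic has k − 1 degrees of freedom. The Python sketch below shows this. The function name `goodness_of_fit` is hypothetical, and the counts are invented for a trait assumed to follow a 1:2:1 ratio.

```python
def goodness_of_fit(observed, proportions):
    """Chi-square goodness of fit against hypothesized category proportions.

    Returns (chi-square statistic, degrees of freedom, expected counts).
    Expected count for category i = total observations * hypothesized proportion i.
    """
    if len(observed) != len(proportions):
        raise ValueError("one hypothesized proportion per category is required")
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("hypothesized proportions must sum to 1")
    n = sum(observed)
    expected = [n * p for p in proportions]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # k categories -> k - 1 degrees of freedom (no parameters estimated from the data)
    df = len(observed) - 1
    return stat, df, expected


# Hypothetical counts for a trait expected in a 1:2:1 ratio (invented for illustration).
stat, df, expected = goodness_of_fit([30, 50, 20], [0.25, 0.50, 0.25])
print(stat, df, expected)  # 2.0 2 [25.0, 50.0, 25.0]
```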
Chi-Square Test of Independence
- The purpose of a test of independence is to evaluate whether there is a statistically significant association between two categorical variables in a dataset. This test is commonly applied to data organized in a contingency table (most frequently 2×2, but larger tables are possible). Each cell of the table represents how often a particular combination of categories of the two variables occurs. The null hypothesis of this test states that the variables are independent, so the distribution of one variable does not depend on the category of the other. A statistically significant result indicates that the two categorical variables are associated. This test is commonly used in clinical trials and healthcare research.2,3
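In a test of independence, the expected count in each cell is (row total × column total) / grand total, and an r × c table has (r − 1)(c − 1) degrees of freedom. The Python function below computes the expected counts, the degrees of freedom, and χ² for a table of any size. The function name `independence_test` is hypothetical, and the 2×3 table in the example is invented for illustration.

```python
def independence_test(observed):
    """Chi-square test of independence statistic for an r x c contingency table.

    Returns (chi-square statistic, degrees of freedom, expected counts).
    Expected count for cell (i, j) = row_total_i * column_total_j / grand_total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected


# Hypothetical 2x3 table (invented for illustration): 2 rows, 3 columns -> df = 2.
stat, df, expected = independence_test([[10, 20, 30],
                                        [20, 20, 20]])
print(round(stat, 3), df)  # 5.333 2
```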
Interpretation of Chi-Square Test
The following section summarizes how to interpret the results of the two main types of chi-square tests:
Chi-Square Goodness of Fit Test
- Determines whether the distribution of observed frequency counts across the mutually exclusive categories of a single variable aligns with or deviates from an expected distribution. This test is useful when verifying population distributions, validating trial models, or examining genetic inheritance patterns. A statistically significant χ² value suggests that the measured categorical distribution does not align with the expected or predicted distribution, and the null hypothesis is rejected.1,3
Chi-Square Test of Independence
- Determines whether two categorical variables in a dataset are significantly associated, that is, whether the distribution of one variable differs across the categories of the other. A statistically significant χ² value suggests that the categorical variables are not independent. It does not show that one variable causes the other. This test is used to evaluate associations between disease risk factors and treatment outcomes, behavioral tendencies across demographic groups, or differences between treatment groups in clinical trials.2,3
Application of Chi-Square Test
The following section illustrates example study designs that could benefit from the use of chi-square tests.
Study 1
- A research team surveys 300 patients to determine their preferred type of pain management following minor surgery. The patients are presented with three options: acetaminophen, ibuprofen, and oxycodone. The research team hypothesizes that preferences will be equally distributed among the three options. After collecting the survey responses, they observe that preferences after minor surgery are skewed toward acetaminophen and ibuprofen rather than oxycodone.
Which chi-square test is most appropriate?
- In study 1, a goodness of fit test would be most appropriate because the study analyzes a single categorical variable (preferred pain medication) across three categories. This test will determine if the observed distribution of preferences significantly deviates from the hypothesized equal distribution.1,3
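The study 1 calculation can be sketched in Python as follows. The total of 300 patients and the equal-preference null hypothesis (100 patients per option) come from the study description. The description only states the direction of the skew, however, so the individual counts below are invented for illustration.

```python
import math

# Hypothetical survey counts, invented for illustration only; the study
# description gives the total (300) and the direction of the skew, not the counts.
observed = {"acetaminophen": 120, "ibuprofen": 115, "oxycodone": 65}

n = sum(observed.values())                 # 300 patients
expected = n / len(observed)               # equal-preference null: 100 per option
chi_sq = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1                     # 3 categories -> 2 degrees of freedom

# For exactly 2 degrees of freedom, the chi-square upper tail is exp(-x / 2).
p_value = math.exp(-chi_sq / 2)

print(f"chi-square = {chi_sq:.2f}, df = {df}, p = {p_value:.2e}")
```

With these made-up counts, χ² = 18.5 on 2 degrees of freedom and p < 0.001. The hypothesis of equal preferences would therefore be rejected at the 0.05 level.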
Study 2
- A clinical trial is investigating whether the incidence of adverse effects differs between male and female patients who are receiving a new pain medication. All patients in the trial received the new medication, and adverse effects were reported to the trial through patient surveys. The data are arranged in a 2×2 contingency table showing the number of male patients who did and did not experience side effects and the number of female patients who did and did not experience side effects.
Which chi-square test is most appropriate?
- In study 2, a test of independence would be most appropriate because the researchers want to determine whether there is an association between gender and the incidence of side effects with the new medication. This test will determine whether the incidence of adverse events has a statistically significant relationship with patient gender in the trial population, or whether these two categorical variables are independent of one another.2,3
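A comparable sketch for study 2 arranges the data as a 2×2 table. The rows are male and female patients, and the columns are patients with and without side effects. The counts are invented for illustration, since the study description gives only the layout. The code computes the uncorrected Pearson χ². Some software applies Yates' continuity correction to 2×2 tables by default, which yields a somewhat smaller χ² and a larger p-value.

```python
import math

# Hypothetical 2x2 table (rows: male, female; columns: side effects, no side effects).
# The counts are invented for illustration; the study description gives only the layout.
observed = [[15, 85],   # male patients
            [30, 70]]   # female patients

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected counts if gender and side-effect incidence were independent.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_sq = sum((o - e) ** 2 / e
             for obs_row, exp_row in zip(observed, expected)
             for o, e in zip(obs_row, exp_row))
df = (len(row_totals) - 1) * (len(col_totals) - 1)   # (2 - 1) * (2 - 1) = 1

# For exactly 1 degree of freedom, the chi-square upper tail is erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_sq / 2))

print(f"chi-square = {chi_sq:.3f}, df = {df}, p = {p_value:.4f}")
```

With these made-up counts, every expected count is at least 22.5, so the minimum expected frequency assumption holds. The result is χ² ≈ 6.45 on 1 degree of freedom, giving p ≈ 0.011. This would indicate an association between gender and side-effect incidence at the 0.05 level.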
References
1. Egbuchulem KI. The Karl Pearson’s Chi-square: a medical research libero, and a versatile test statistic: an editorial. Ann Ib Postgrad Med. 2024 Aug 30;22(2):5-8.
2. McHugh ML. The chi-square test of independence. Biochem Med (Zagreb). 2013;23(2):143-149.
3. Kim HY. Statistical notes for clinical researchers: Chi-squared test and Fisher’s exact test. Restor Dent Endod. 2017;42(2):152-155.
4. Schober P, Vetter TR. Chi-square tests in medical research. Anesth Analg. 2019;129(5):1193.
Copyright Information
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.