Chi-square Test for 2 Categories of Factor in Case-Control Status

1. Functionalities

  • To determine if there is an association between the case-control status (rows) and factor categories (columns)
  • To determine if the proportions are the same in the 2 independent samples
  • To test the homogeneity of the proportions
  • To get the percentage table and plot and the expected value of each cell

2. About your count data, 4-cell 2 by 2 contingency table

  • You have 2 categories for case-control status (shown as row names)
  • You have 2 categories for factor status (shown as column names)
  • Observations are independent, and the counts in every cell are moderately large

Case Example

Suppose we wanted to know the relation between oral contraceptive (OC) use and myocardial infarction (MI). In one study, we investigated data from 5000 OC-users and 10000 non-OC-users, and categorized them into MI and non-MI groups. Among the 5000 OC-users, 13 developed MI; among the 10000 non-OC-users, 7 developed MI. We wanted to determine if OC use was significantly associated with higher MI incidence.

Please follow the Steps, and Outputs will give real-time analytical results.


Step 1. Data Preparation

1. Give 2 names to each category of factor shown as column names

2. Give 2 names to case-control shown as row names


3. Input 4 values in row-order

Data points can be separated by , ; /Enter /Tab

Note: No Missing Value
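As a rough illustration of the input format described above, the following Python sketch splits a string on commas, semicolons, tabs, or line breaks and arranges the values in row order. The function name `parse_counts` and the handling of spaces as separators are assumptions made for this sketch; it is not the tool's actual parser.

```python
import re

def parse_counts(text, n_rows, n_cols):
    """Split text on , ; Tab, Enter (and spaces), then fill a table row by row."""
    tokens = [t for t in re.split(r"[,;\s]+", text.strip()) if t]
    if len(tokens) != n_rows * n_cols:
        # A missing value leaves too few tokens, so it is rejected here
        raise ValueError("expected %d values, got %d" % (n_rows * n_cols, len(tokens)))
    values = [int(t) for t in tokens]  # int() raises ValueError on non-numeric entries
    return [values[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]

# OC example: 13 of 5000 OC-users and 7 of 10000 non-OC-users developed MI
table = parse_counts("13, 4987; 7\t9993", 2, 2)
print(table)
```

The four values are read in row order, so the first two values form the first row of the 2 by 2 table.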

The case-control groups were OC-users and non-OC-users. The factor categories were whether or not the person developed MI.

Among 5000 OC-users, 13 developed MI; among 10000 non-OC-users, 7 developed MI.


Hypothesis

Null hypothesis

Case-Control (Row) has no significant association with Grouped Factors (Column)

Alternative hypothesis

Case-Control (Row) has a significant association with Grouped Factors (Column)

In this example, we wanted to determine if OC use was significantly associated with higher MI incidence.


Step 2. Decide the P Value method

Output 1. Contingency Table



2 x 2 Contingency Table with Total Number

Expected Value


Cell/Total %

Cell/Row-Total %

Cell/Column-Total %


Percentages in the rows

Percentages in the columns
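The three percentage tables listed above can be sketched in a few lines of Python. This is a minimal illustration using the OC example counts; the function name `percentage_tables` is hypothetical and the tool's own output layout may differ.

```python
def percentage_tables(table):
    """Return cell/total, cell/row-total, and cell/column-total percentages."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    of_total = [[100.0 * x / grand for x in row] for row in table]
    of_row = [[100.0 * x / rt for x in row] for row, rt in zip(table, row_totals)]
    of_col = [[100.0 * x / ct for x, ct in zip(row, col_totals)] for row in table]
    return of_total, of_row, of_col

# Rows: OC-users, non-OC-users; columns: MI, no MI
oc_table = [[13, 4987], [7, 9993]]
of_total, of_row, of_col = percentage_tables(oc_table)
print("MI among OC-users: %.2f%%" % of_row[0][0])
print("MI among non-OC-users: %.2f%%" % of_row[1][0])
```

The row percentages are the ones that compare MI incidence between the two groups.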


Output 2. Test Results


Explanations
  • P Value < 0.05, then Case-Control (Row) is significantly associated with Grouped Factors (Column). (Reject the null hypothesis)
  • P Value >= 0.05, then no significant association between Case-Control (Row) and Grouped Factors (Column) was found. (Fail to reject the null hypothesis)

In this default setting, we concluded that OC use and MI development were significantly associated (P = 0.01). Because the minimum expected value was 6.67, the Yates correction was applied to the P Value.
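To make the numbers in this example easier to check, here is a minimal Python sketch of the expected values and the Yates-corrected chi-square test for a 2 by 2 table. The function name `chi_square_2x2` is hypothetical, and the code is an illustration of the textbook formula rather than the tool's own implementation.

```python
import math

def chi_square_2x2(table, yates=True):
    """Expected values, chi-square statistic, and P value for a 2 x 2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    # Expected count of a cell = row total * column total / grand total
    expected = [[r * k / n for k in cols] for r in rows]
    diff = abs(a * d - b * c)
    if yates:
        # Continuity correction: shrink |ad - bc| by n/2, never below zero
        diff = max(0.0, diff - n / 2.0)
    stat = n * diff ** 2 / (rows[0] * rows[1] * cols[0] * cols[1])
    # Upper tail of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return expected, stat, p_value

# Rows: OC-users, non-OC-users; columns: MI, no MI
expected, stat, p_value = chi_square_2x2([[13, 4987], [7, 9993]])
min_expected = min(min(row) for row in expected)
print("minimum expected value: %.2f" % min_expected)
print("chi-square: %.2f, P value: %.3f" % (stat, p_value))
```

With the example counts, the minimum expected value is 5000 x 20 / 15000, which is the 6.67 quoted above.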


Fisher Exact Test for 2 Categories of Factor with Small Expected Counts in Case-Control Status

1. Functionalities

  • To determine if there is an association between the case-control status (rows) and factor status (columns)
  • To determine if the proportions are the same in the 2 independent samples
  • To test the homogeneity of the proportions
  • To get the percentage table and plot and the expected value of each cell

2. About your count data, 2 by 2 contingency table

  • You have 2 categories for case-control status (shown as row names)
  • You have 2 categories for factor status (shown as column names)
  • Every cell is independent
  • Some expected values from your data are small (for example, below 5)

Case Example

Suppose we wanted to know the relation between CVD and a high salt diet. In one study, we investigated data of 35 CVD patients and 25 non-CVD patients and categorized them into high salt diet and low salt diet. Among 35 CVD patients, 5 had a high-salt diet; among 25 non-CVD patients, 2 had a high-salt diet. We wanted to determine if CVD was significantly associated with a high salt diet.

Please follow the Steps, and Outputs will give real-time analytical results.


Step 1. Data Preparation

1. Give 2 names to each category of factor shown as column names

2. Give 2 names to case-control shown as row names


3. Input 4 values in row-order

Data points can be separated by , ; /Enter /Tab

Note: No Missing Value

The case-control groups were CVD patients and non-CVD patients. The factor categories were a high-salt diet or not.

Of 35 people who died from CVD, 5 were on a high-salt diet before they died; of 25 people who died from other causes, 2 were on a high-salt diet.


Step 2. Choose Hypothesis

Null hypothesis

Case-Control (Row) has no significant association with Grouped Factors (Column)

In this example, we wanted to determine if there was an association between the cause of death and a high-salt diet.

Output 1. Contingency Table



2 x 2 Contingency Table with Total Number

Expected Value


Cell/Total %

Cell/Row-Total %

Cell/Column-Total %


Percentages in the rows

Percentages in the columns


Output 2. Test Results


Explanations
  • P Value < 0.05, then Case-Control (Row) is significantly associated with Grouped Factors (Column). (Reject the null hypothesis)
  • P Value >= 0.05, then no significant association between Case-Control (Row) and Grouped Factors (Column) was found. (Fail to reject the null hypothesis)

In this default setting, two expected values were < 5, so we used the Fisher exact test. From the test result, we concluded that no significant association was found between the cause of death and a high-salt diet.
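The following Python sketch shows one common way to compute a two-sided Fisher exact P value: sum the hypergeometric probabilities of all tables that are no more likely than the observed one. The function name `fisher_exact_2x2` and the small relative tolerance are assumptions for this illustration; the tool may use a different two-sided convention.

```python
from math import comb

def fisher_exact_2x2(table):
    """Expected values and two-sided Fisher exact P value for a 2 x 2 table."""
    (a, b), (c, d) = table
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    n = row1 + row2
    expected = [[r * k / n for k in (col1, col2)] for r in (row1, row2)]
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Two-sided P value: all tables at most as probable as the observed table
    p_value = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                  if p <= p_obs * (1 + 1e-7))
    return expected, min(1.0, p_value)

# Rows: CVD, non-CVD; columns: high-salt diet, low-salt diet
expected, p_value = fisher_exact_2x2([[5, 30], [2, 23]])
n_small = sum(1 for row in expected for e in row if e < 5)
print("expected values below 5:", n_small)
print("two-sided P value: %.3f" % p_value)
```

For the example data, two expected values fall below 5 and the P value is well above 0.05, in line with the conclusion above.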


McNemar Test for 2 Categories of Factor with Matched Counts in Case-Control Status

1. Functionalities

  • To determine if the two factors on the matched samples are significantly different.
  • To get the percentage table and plot and the expected value of each cell

2. About your count data, 2 by 2 contingency table with paired counts

  • You have 2 categories for the case-control outcomes (shown in row and column names)
  • You have 2 categories for factor status (shown in row and column names)
  • Samples from your data are matched/paired data
  • You know the concordant pairs: matched pairs in which the outcome is the same for both members of the pair
  • You know the discordant pairs: matched pairs in which the outcome differs between the members of the pair

3. Paired counts in 2 by 2 contingency table

  • Patients were matched into pairs with similar age and clinical conditions. In each pair, one member underwent treatment A and the other underwent treatment B, and we recorded how many people became better and how many people became worse.
  • A concordant pair is a matched pair in which both members had the same outcome (both became better or both became worse)
  • A discordant pair is a matched pair in which the two members had different outcomes (only one became better)

Case Example

Suppose we wanted to compare the effects of two treatments. We investigated two groups of patients: one group received treatment A, and the other received treatment B. The two groups were matched into 621 pairs, so that in each pair one patient was under treatment A and the other was under treatment B. Among the 621 pairs, both members became better in 510 pairs, and neither member changed in 90 pairs. In 16 pairs, only the patient under treatment A became better; in 5 pairs, only the patient under treatment B became better.

Please follow the Steps, and Outputs will give real-time analytical results.


Step 1. Data Preparation

1. Give 2 names to the shared outcome categories shown in both the column names and row names

2. Give 2 factor/treatment names shown in row name and column name


3. Input 4 values in row-order

Data points can be separated by , ; /Enter /Tab

Note: No Missing Value

The example shown here was 621 pairs of patients, one group underwent treatment A, and the other underwent treatment B. Patients were paired with similar age and clinical conditions.

Among 621 pairs, both members became better in 510 pairs, and neither member changed in 90 pairs. (Concordant Pairs)

In 16 pairs, only the patient under treatment A became better; in 5 pairs, only the patient under treatment B became better. (Discordant Pairs)


Hypothesis

Null hypothesis

The factors have no significant differences

Alternative hypothesis

The factors have significant differences in the paired samples

In this example, we wanted to determine whether the treatments had significantly different effects in the matched pairs.


Step 2. Decide the P Value method

Output 1. Contingency Table



2 x 2 Contingency Table with Total Number

Expected Value


Cell/Total %

Cell/Row-Total %

Cell/Column-Total %



Output 2. Test Results


Explanations
  • P Value < 0.05, then the factors have significant differences in the paired samples. (Reject the null hypothesis)
  • P Value >= 0.05, then no significant difference between the factors was found. (Fail to reject the null hypothesis)

In this default setting, we concluded that the two treatments had significantly different effects on the paired patients. (P = 0.03)
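The McNemar test uses only the discordant pairs, so the result above can be checked with a short Python sketch. The function name `mcnemar_test` is hypothetical, and the continuity correction is an assumption that happens to reproduce the P value reported in this example.

```python
import math

def mcnemar_test(only_a, only_b, correct=True):
    """McNemar chi-square statistic and P value from the two discordant counts."""
    diff = abs(only_a - only_b)
    if correct:
        # Continuity correction: subtract 1 from |difference|, never below zero
        diff = max(0, diff - 1)
    stat = diff ** 2 / (only_a + only_b)
    # Upper tail of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# 16 pairs: only the treatment A member improved; 5 pairs: only the treatment B member
stat, p_value = mcnemar_test(16, 5)
print("chi-square: %.2f, P value: %.3f" % (stat, p_value))
```

The 510 and 90 concordant pairs do not enter the statistic; they only affect the percentage tables.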


Chi-square Test for >2 Categories of Factor in Case-Control Status

1. Functionalities

  • To determine if there is an association between the case-control status (rows) and factor status (columns)
  • To determine if the population rate/proportion behind your multiple Groups data are significantly different
  • To get the percentage table and plot and the expected value of each cell

2. About your count data, 2 by C contingency table

  • You have 2 categories for the case-control status (shown as row names)
  • You have >2 categories for factor status (shown as column names)
  • Your group data come from binomial distribution (the proportion of success)
  • You know the whole sample and the number of specified events (the proportion of sub-group) from each group
  • The multiple groups are independent observations

Case Example

Suppose we wanted to study the relationship between age at first birth and the development of breast cancer. Thus, we investigated 3220 breast cancer cases and 10254 women without breast cancer. Then, we categorized the women into different age groups. We wanted to know if the probability of having cancer differed among the age groups, that is, whether age was related to breast cancer.

Please follow the Steps, and Outputs will give real-time analytical results.


Step 1. Data Preparation

1. Give names to each category of factor shown as column names

2. Give 2 names to case-control shown as row names


3. How many Cases in every Group

Data points can be separated by , ; /Enter /Tab

4. How many Controls in every Group

Data points can be separated by , ; /Enter /Tab

Note: No Missing Value

In this example, we had 5 age groups, and we recorded the number of people who had cancer and who did not have cancer in each group.


Hypothesis

Null hypothesis

Case-Control (Row) has no significant association with Grouped Factors (Column)

Alternative hypothesis

Case-Control (Row) has a significant association with Grouped Factors (Column)

In this setting, we wanted to know if there was any relation between cancer and ages.

Output 1. Contingency Table



2 x C Contingency Table with Total Number

Expected Value


Cell/Total %

Cell/Row-Total %

Cell/Column-Total %



Output 2. Test Results


Explanations
  • P Value < 0.05, then Case-Control (Row) is significantly associated with Grouped Factors (Column). (Reject the null hypothesis)
  • P Value >= 0.05, then no significant association between Case-Control (Row) and Grouped Factors (Column) was found. (Fail to reject the null hypothesis)

In this default setting, we concluded that there was a significant relation between cancer and age. (P < 0.001)


Chi-square Test for >2 Categories of Factor in >2 Status

1. Functionalities

  • To determine if there is an association between the case-control status (rows) and factor status (columns)
  • To determine if the population rate/proportion behind your multiple Groups data are significantly different
  • To get the percentage table and plot and the expected value of each cell

2. About your count data, R by C contingency table

  • Your group data come from binomial distribution (the proportion of success)
  • You know the whole sample and the number of specified events (the proportion of sub-group) from each group
  • The multiple groups are independent observations

Case Example

Suppose we wanted to know the relation between 3 types of treatments (Penicillin, Spectinomycin-low, and Spectinomycin-high) and patients' response. In one study, we enrolled 400 patients: 200 used Penicillin, 100 used Spectinomycin in a low dose, and 100 used Spectinomycin in a high dose. Among 200 Penicillin users, 40 got Smear+, 30 got Smear-Culture+ and 130 were Smear-Culture-. Among 100 Spectinomycin-low users, 10 got Smear+, 20 got Smear-Culture+ and 70 were Smear-Culture-. Among 100 Spectinomycin-high users, 15 got Smear+, 40 got Smear-Culture+ and 45 were Smear-Culture-. We wanted to know if the treatments had a significant association with the response.

Please follow the Steps, and Outputs will give real-time analytical results.


Step 1. Data Preparation

1. Give names to each category of factor1 shown as column names

2. Give names to each category of factor2 shown as row names


3. Input R*C values in row-order

Data points can be separated by , ; /Enter /Tab

Note: No Missing Value

Rows were different drug treatments, and columns were different responses

Among 200 Penicillin users, 40 got Smear+, 30 got Smear-Culture+, and others were Smear-Culture-.

Among 100 Spectinomycin-low users, 10 got Smear+, 20 got Smear-Culture+ and others were Smear-Culture-.

Among 100 Spectinomycin-high users, 15 got Smear+, 40 got Smear-Culture+ and others were Smear-Culture-.


Hypothesis

Null hypothesis

Case-Control (Row) has no significant association with Grouped Factors (Column)

Alternative hypothesis

Case-Control (Row) is significantly associated with Grouped Factors (Column)

In this setting, we wanted to know if there was a relationship between drug treatment and response.

Output 1. Contingency Table



R x C Contingency Table with Total Number

Expected Value


Cell/Total %

Cell/Row-Total %

Cell/Column-Total %



Output 2. Test Results


Explanations
  • P Value < 0.05, then Case-Control (Row) is significantly associated with Grouped Factors (Column). (Reject the null hypothesis)
  • P Value >= 0.05, then no significant association between Case-Control (Row) and Grouped Factors (Column) was found. (Fail to reject the null hypothesis)

In this default setting, we concluded that there was a significant relationship between drug treatment and response. (P < 0.001)
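For an R by C table, the same idea of comparing observed and expected counts applies, with (R - 1) x (C - 1) degrees of freedom. The Python sketch below is a minimal illustration using the treatment data from this example; the function names `chi2_sf` and `chi_square_test` are hypothetical, and the P value is obtained from a simple series for the incomplete gamma function rather than from a statistics library.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (series expansion)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= z / (a + n)
        total += term
        n += 1
    # Regularized lower incomplete gamma P(a, z); the upper tail is 1 - P
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def chi_square_test(table):
    """Expected values, Pearson chi-square, degrees of freedom, and P value."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, stat, df, chi2_sf(stat, df)

# Rows: Penicillin, Spectinomycin-low, Spectinomycin-high
# Columns: Smear+, Smear-Culture+, Smear-Culture-
treatment_table = [[40, 30, 130], [10, 20, 70], [15, 40, 45]]
expected, stat, df, p_value = chi_square_test(treatment_table)
print("chi-square: %.2f, df: %d, P < 0.001: %s" % (stat, df, p_value < 0.001))
```

The same function also handles a 2 by C table, since nothing in it depends on the number of rows.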


Kappa Statistic for Reproducibility/Agreement of Two Raters

1. Functionalities

  • To quantify the agreement from two raters or two rankings
  • To get the percentage table and the expected value of each cell

2. About your count data, 2 by K contingency table

  • You have the outcomes (e.g., Y/N answers, rankings, categories) from two raters or two measurements

Case Example

Suppose we wanted to check the agreement of answers from two surveys. In one survey, the ranking scores were given from 1 to 9, while in the other, the ranking scores were not. We wanted to check if the two answers were reproducible or whether the two surveys had agreements.

Please follow the Steps, and Outputs will give real-time analytical results.


Step 1. Data Preparation

1. Give 2 related raters/ranking names shown in the column names


2. Input K values in the 1st rater

Data points can be separated by , ; /Enter /Tab


3. Input K values in the 2nd rater

Data points can be separated by , ; /Enter /Tab

Note: No Missing Value, two groups have equal length

The example here shows Survey1 and Survey2. In this setting, we wanted to know the agreement between the two rankings.

Output 1. Contingency Table



2 x K Contingency Table with Total Number




Output 2. Test Results


Explanations and Guidelines for Evaluating Kappa
  • Cohen's Kappa Statistic > 0.75: excellent reproducibility
  • 0.4 <= Cohen's Kappa Statistic <= 0.75: good reproducibility
  • 0 <= Cohen's Kappa Statistic < 0.4: marginal reproducibility
  • Cohen’s kappa takes into account disagreement between the two raters, but not the degree of disagreement.
  • The weighted kappa is calculated using a predefined table of weights that measure the degree of disagreement between the two raters. The higher the disagreement, the higher the weight.

In this default setting, we concluded that the responses from Survey1 and Survey2 did not have good reproducibility.
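To illustrate the difference between Cohen's kappa and the weighted kappa described in the guidelines, here is a minimal Python sketch that cross-tabulates two equal-length lists of ratings. The ratings below are made-up values for illustration only (they are not the tool's example data), the function name `kappa_from_ratings` is hypothetical, and linear disagreement weights are an assumption; other weighting schemes exist.

```python
def kappa_from_ratings(ratings1, ratings2, categories, weighted=False):
    """Cohen's kappa, or linearly weighted kappa, from two lists of ratings."""
    if len(ratings1) != len(ratings2):
        raise ValueError("the two raters must have the same number of ratings")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings1)
    observed = [[0] * k for _ in range(k)]
    for r1, r2 in zip(ratings1, ratings2):
        observed[index[r1]][index[r2]] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            if weighted:
                w = abs(i - j) / (k - 1)      # larger disagreement, larger weight
            else:
                w = 0.0 if i == j else 1.0    # every disagreement counts the same
            num += w * observed[i][j]                     # observed disagreement
            den += w * row_totals[i] * col_totals[j] / n  # disagreement expected by chance
    # den is zero only if both raters use a single category throughout
    return 1.0 - num / den

# Hypothetical ratings on a 1-3 scale, for illustration only
survey1 = [1, 1, 2, 2, 3, 3]
survey2 = [1, 2, 2, 3, 3, 3]
kappa = kappa_from_ratings(survey1, survey2, [1, 2, 3])
kappa_w = kappa_from_ratings(survey1, survey2, [1, 2, 3], weighted=True)
print("kappa: %.3f, weighted kappa: %.3f" % (kappa, kappa_w))
```

In this made-up data every disagreement is only one step apart, so the weighted kappa is higher than the unweighted one.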


Kappa Statistic for Reproducibility of Repeated/Related Measurements

This method uses a different type of data. It uses counts of concordant and discordant responses shown in a K by K table.

1. Functionalities

  • To quantify the reproducibility of the same variables measured more than once
  • To quantify the association between 2 measurements with the same outcomes
  • To get the percentage table and the expected value of each cell

2. About your count data, K by K contingency table

  • You know the concordant responses: repeated-measured responses in which the outcome is the same for every measurement
  • You know the discordant responses: repeated-measured responses in which the outcome differs between measurements

Case Example

Suppose in one study, we did two surveys reflecting the same problems for a group of patients. We wanted to know the percentage of concordant responses in the two surveys. The final results were that 136 patients replied YES to both surveys, and 240 patients replied NO to both surveys. 69 patients replied NO in survey1 and YES in survey2, and 92 patients replied YES in survey1 and NO in survey2. We wanted to know whether the two surveys gave reproducible responses.

Please follow the Steps, and Outputs will give real-time analytical results.


Step 1. Data Preparation

1. Give K rater/measurement names shown in the column names and row names

2. Give 2 related experiment/repeated measurement names shown in the column names and row names


3. Input K*K values in row-order

Data points can be separated by , ; /Enter /Tab

Note: No Missing Value

The example shown here was the response from Survey 1 and Survey 2.


Hypothesis

Null hypothesis

Case-Control (Row) has no significant association with Grouped Factors (Column)

Alternative hypothesis

Case-Control (Row) has a significant association with Grouped Factors (Column)

In this setting, we wanted to know the reproducibility of the surveys.

Output 1. Contingency Table



K x K Contingency Table with Total Number




Output 2. Test Results


Explanations and Guidelines for Evaluating Kappa
  • Cohen's Kappa Statistic > 0.75: excellent reproducibility
  • 0.4 <= Cohen's Kappa Statistic <= 0.75: good reproducibility
  • 0 <= Cohen's Kappa Statistic < 0.4: marginal reproducibility
  • Cohen’s kappa takes into account disagreement between the two raters, but not the degree of disagreement.
  • The weighted kappa is calculated using a predefined table of weights that measure the degree of disagreement between the two raters. The higher the disagreement, the higher the weight.

In this default setting, we concluded that the responses from Survey1 and Survey2 did not have good reproducibility; they were only marginally reproducible.
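The kappa statistic for this example can be checked with a minimal Python sketch: it compares the observed proportion of concordant responses with the proportion expected by chance from the row and column totals. The function name `cohen_kappa` is hypothetical, and the row/column orientation (rows for survey1, columns for survey2) is an assumption for illustration.

```python
def cohen_kappa(table):
    """Observed agreement, chance agreement, and Cohen's kappa for a K x K table."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Concordant responses lie on the diagonal of the table
    p_observed = sum(table[i][i] for i in range(len(table))) / n
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return p_observed, p_expected, kappa

# Rows: survey1 (YES, NO); columns: survey2 (YES, NO)
survey_table = [[136, 92], [69, 240]]
p_observed, p_expected, kappa = cohen_kappa(survey_table)
print("concordant: %.1f%%, kappa: %.3f" % (100 * p_observed, kappa))
```

About 70% of responses are concordant, but roughly 52% would be expected by chance alone, which is why kappa falls below 0.4.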


Mantel-Haenszel Test for 2 Categories of Factor in Case-Control Status under K Confounding Strata

1. Functionalities

  • To determine, after controlling for the stratum/confounding, if there is an association between the case-control status (rows) and factor status (columns)
  • To test if two nominal variables are conditionally independent in K strata
  • To get the percentage table and plot and the expected value of each cell

2. About your count data, 2 x 2 contingency table under K strata

  • You have counts for several 2 x 2 contingency tables
  • Each 2 x 2 contingency table corresponds to one stratum of the confounding factor

Case Example

Suppose we wanted to see the effect of passive smoking on cancer risk. One potential confounder was smoking by the participants themselves, because personal smoking is related to both cancer risk and spouse smoking. Thus, we controlled for personal active smoking before looking at the relationship between passive smoking and cancer risk. We got two 2 x 2 tables, as shown in the input data: one was from the active smoking group, including 466 people, and the other was from the non-active smoking group, with 532 people. We wanted to know if passive smoking was significantly related to cancer risk after controlling for active smoking, that is, whether the common odds ratio was significantly different from 1.

Please follow the Steps, and Outputs will give real-time analytical results.


Step 1. Data Preparation

1. Give 2 names to each category of factor shown as column names

2. Give 2 names to case-control shown as row names

3. Give names to each category confounding shown as row names


4. Input 2*2*K values in row-order

Data points can be separated by , ; /Enter /Tab

Note: No Missing Value

The example here was 2 sets of 2 by 2 tables. One is the case-control table for active smokers; the other is the case-control table for non-active smokers.


Step 2. Choose Hypothesis

Null hypothesis

Case-Control (Row) has no significant association with Grouped Factors (Column) in each stratum / confounding group


Step 3. Decide the P Value method

In this setting, we wanted to know if the odds of lung cancer (case) in passive smokers were different from those in non-passive-smokers, controlling for personal active smoking.

Output 1. Contingency Table


K layers 2 x 2 Contingency Table

The first 2 rows indicate the 2 x 2 contingency table in the first stratum, followed by the 2 x 2 table from the second stratum.


Output 2. Test Results


Explanations
  • P Value < 0.05, then after controlling for personal smoking, passive smoking and cancer risk have a significant relation; the common odds ratio is significantly different from 1. (Reject the null hypothesis)
  • P Value >= 0.05, then after controlling for personal smoking, no significant relation between passive smoking and cancer risk was found. (Fail to reject the null hypothesis)

In this default setting, we concluded that there was a significant relationship between cancer risk and passive smoking after controlling for personal active smoking. (P < 0.001)
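As a rough sketch of how the Mantel-Haenszel test pools several 2 x 2 tables, the Python code below sums the observed count, its expected value, and its variance over the strata, and also computes the Mantel-Haenszel common odds ratio. The counts are hypothetical values chosen for illustration (they are not the smoking data of this example), and the function name `mantel_haenszel` and the continuity correction are assumptions of this sketch.

```python
import math

def mantel_haenszel(strata, correct=True):
    """Common odds ratio, chi-square statistic, and P value over K 2 x 2 tables."""
    sum_a = sum_e = sum_v = or_num = or_den = 0.0
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        r1, r2, c1, c2 = a + b, c + d, a + c, b + d
        sum_a += a                                    # observed top-left count
        sum_e += r1 * c1 / n                          # its expected value
        sum_v += r1 * r2 * c1 * c2 / (n ** 2 * (n - 1))  # its variance
        or_num += a * d / n
        or_den += b * c / n
    diff = abs(sum_a - sum_e)
    if correct:
        diff = max(0.0, diff - 0.5)   # continuity correction
    stat = diff ** 2 / sum_v
    # Upper tail of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return or_num / or_den, stat, p_value

# Hypothetical counts for two strata; rows: case, control; columns: exposed, unexposed
strata = [
    [[10, 20], [5, 25]],
    [[20, 10], [10, 20]],
]
odds_ratio, stat, p_value = mantel_haenszel(strata)
print("common odds ratio: %.2f, chi-square: %.2f" % (odds_ratio, stat))
```

Each stratum contributes its own expected value and variance, so the comparison is always made within a stratum and never across strata.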


Cochran-Mantel-Haenszel for >2 Categories of Factor in >2 Status under K Strata

1. Functionalities

  • To determine, after controlling for the stratum/confounding, if there is an association between the case-control status (rows) and factor status (columns)
  • To test if two nominal variables are conditionally independent in K strata
  • To get the percentage table and plot and the expected value of each cell

2. About your count data, R x C contingency table under K strata

  • You have counts for several R x C contingency tables
  • Each R x C contingency table corresponds to one stratum of the confounding factor

Case Example

Suppose we wanted to know the relation between snoring and age. A survey was done on 3513 individuals 30-60 years old, with 1843 women and 1670 men. Considering that gender might be a confounding variable in this study, we created a 3 x 2 table for the women stratum and another for the men stratum. We wanted to know if age was significantly related to snoring after controlling for gender.

Please follow the Steps, and Outputs will give real-time analytical results.


Step 1. Data Preparation

1. Give 2 names to each category of factor shown as column names

2. Give 2 names to case-control shown as row names

3. Give names to each category of factor shown as row names


4. Input R*C*K values in row order

Data points can be separated by , ; /Enter /Tab

Note: No Missing Value

The example shown here was the prevalence of habitual snoring by age and sex group.


Hypothesis

Null hypothesis

Case-Control (Row) has no significant association with Grouped Factors (Column) in each stratum/confounding group

Alternative hypothesis

Case-Control (Row) has a significant association with Grouped Factors (Column); the odds ratio is significantly different in each stratum

In this setting, we wanted to know if the prevalence of habitual snoring has a relation with age, controlling for gender.

Output 1. Contingency Table


K layers R x C Contingency Table

The first R rows indicate the R x C contingency table in the first stratum, followed by the R x C table from the second stratum.


Output 2. Test Results


Explanations
  • P Value < 0.05, then after controlling for gender, the prevalence of habitual snoring and age have a significant relation. (Reject the null hypothesis)
  • P Value >= 0.05, then after controlling for gender, no significant relation between the prevalence of habitual snoring and age was found. (Fail to reject the null hypothesis)

In this default setting, we concluded that there was a significant relationship between the prevalence of habitual snoring and age after controlling for gender. (P < 0.001)