# Chi-square Test for 2 Categories of Factor in Case-Control Status

#### 1. Functionalities

• To determine whether there is an association between the case-control status (rows) and the factor categories (columns)
• To determine whether the proportions are the same in the 2 independent samples
• To determine whether the proportions are homogeneous
• To get the percentage tables and plots, and the expected value of each cell

#### 2. About your count data, 4-cell 2 by 2 contingency table

• You have 2 categories for case-control status (shown as row names)
• You have 2 categories for factor status (shown as column names)
• Observations are independent, and the expected counts are moderately large (commonly at least 5 in every cell)

#### Case Example

Suppose we wanted to know the relationship between oral contraceptive (OC) use and myocardial infarction (MI). In one study, we investigated data from 5000 OC users and 10000 non-users and categorized them into MI and non-MI groups. Among the 5000 OC users, 13 developed MI; among the 10000 non-users, 7 developed MI. We wanted to determine whether OC use was significantly associated with a higher MI incidence.

#### Output 1. Contingency Table

The output includes:

• 2 x 2 Contingency Table with Total Number
• Expected Value
• Cell/Total %
• Cell/Row-Total %
• Cell/Column-Total %
• Percentages in the rows
• Percentages in the columns

#### Output 2. Test Results

Explanations
• P Value < 0.05: Case-Control (Row) is significantly associated with Grouped Factors (Column) (reject the null hypothesis in favor of the alternative)
• P Value >= 0.05: there is no significant evidence of an association between Case-Control (Row) and Grouped Factors (Column) (do not reject the null hypothesis)

In this default setting, we concluded that OC use and MI development were significantly associated (P = 0.01). Because the minimum expected value was 6.67, Yates' continuity correction was applied to the P value.
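
The calculation behind this result can be sketched in a few lines of Python. This is a minimal sketch, not the tool's own implementation. It assumes the table is entered with rows MI/non-MI and columns OC user/non-user, and it applies Yates' correction to every cell. It also relies on the fact that the upper tail of a chi-square distribution with 1 degree of freedom equals `erfc(sqrt(x / 2))`. The function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(table, yates=True):
    """Chi-square test of association for a 2 x 2 table of counts.

    table: [[a, b], [c, d]], rows = case-control status, columns = factor.
    Returns (statistic, p_value, expected_counts).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count of each cell under independence: row total * column total / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = 0.0
    for obs_row, exp_row in zip(table, expected):
        for o, e in zip(obs_row, exp_row):
            diff = abs(o - e)
            if yates:
                # Yates' continuity correction shrinks each |O - E| by 0.5
                diff = max(diff - 0.5, 0.0)
            stat += diff ** 2 / e
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value, expected


# Assumed layout: rows = MI, non-MI; columns = OC users, non-users
oc_mi = [[13, 7], [4987, 9993]]
stat, p, expected = chi_square_2x2(oc_mi)
print(f"chi-square = {stat:.2f}, P = {p:.4f}, "
      f"min expected = {min(min(row) for row in expected):.2f}")
```

With these counts, the corrected statistic is about 7.67 and P is about 0.006, which rounds to the P = 0.01 reported above. The smallest expected value is 6.67.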

# Fisher Exact Test for 2 Categories of Factor with Small Expected Counts in Case-Control Status

#### 1. Functionalities

• To determine whether there is an association between the case-control status (rows) and the factor status (columns)
• To determine whether the proportions are the same in the 2 independent samples
• To determine whether the proportions are homogeneous
• To get the percentage tables and plots, and the expected value of each cell

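#### 2. About your count data, 2 by 2 contingency table with small expected counts

• You have 2 categories for case-control status (shown as row names)
• You have 2 categories for factor status (shown as column names)
• Observations are independent
• Some expected values from your data are small (less than 5)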

#### Case Example

Suppose we wanted to know the relationship between cardiovascular disease (CVD) and a high-salt diet. In one study, we investigated data from 35 CVD patients and 25 non-CVD patients and categorized them by high-salt or low-salt diet. Among the 35 CVD patients, 5 had a high-salt diet; among the 25 non-CVD patients, 2 had a high-salt diet. We wanted to determine whether CVD was significantly associated with a high-salt diet.

#### Output 1. Contingency Table

The output includes:

• 2 x 2 Contingency Table with Total Number
• Expected Value
• Cell/Total %
• Cell/Row-Total %
• Cell/Column-Total %
• Percentages in the rows
• Percentages in the columns

#### Output 2. Test Results

Explanations
• P Value < 0.05: Case-Control (Row) is significantly associated with Grouped Factors (Column) (reject the null hypothesis in favor of the alternative)
• P Value >= 0.05: there is no significant evidence of an association between Case-Control (Row) and Grouped Factors (Column) (do not reject the null hypothesis)

In this default setting, two expected values were < 5, so we used the Fisher exact test. From the test result, we concluded that there was no significant association between CVD and a high-salt diet.
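
Here is a minimal sketch of the two-sided Fisher exact test, assuming rows CVD/non-CVD and columns high-salt/low-salt diet. It follows a common convention, also used by R's `fisher.test`: the P value is the sum of the probabilities of all tables that are no more likely than the observed one. The tool may differ in details. `fisher_exact_2x2` and `expected_counts` are hypothetical names.

```python
from math import comb


def fisher_exact_2x2(table):
    """Two-sided Fisher exact test for a 2 x 2 table [[a, b], [c, d]].

    With all margins fixed, the top-left count follows a hypergeometric
    distribution; the two-sided P value sums the probabilities of every
    table that is no more likely than the observed one.
    """
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)
    # Hypergeometric probability that the top-left cell equals x
    probs = {x: comb(row1, x) * comb(n - row1, col1 - x) / denom
             for x in range(max(0, row1 + col1 - n), min(row1, col1) + 1)}
    p_obs = probs[a]
    # Small relative tolerance so floating-point ties count as "as extreme"
    p_value = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))
    return min(1.0, p_value)


def expected_counts(table):
    """Expected count of each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Assumed layout: rows = CVD, non-CVD; columns = high-salt, low-salt diet
salt_cvd = [[5, 30], [2, 23]]
print("expected:", [[round(e, 2) for e in row] for row in expected_counts(salt_cvd)])
print(f"Fisher exact P = {fisher_exact_2x2(salt_cvd):.3f}")
```

In the example, two of the four expected counts (about 4.08 and 2.92) are below 5. P is about 0.69, so no significant association is detected.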

# McNemar Test for 2 Categories of Factor with Matched Counts in Case-Control Status

#### 1. Functionalities

• To determine whether the two factors (e.g., two treatments) have significantly different effects in matched samples
• To get the percentage tables and plots, and the expected value of each cell

#### 2. About your count data, 2 by 2 contingency table with paired counts

• You have 2 categories for the case-control outcomes (shown in row and column names)
• You have 2 categories for factor status (shown in row and column names)
• Samples from your data are matched/paired
• You know the number of concordant pairs, i.e., matched pairs in which the outcome is the same for both members
• You know the number of discordant pairs, i.e., matched pairs in which the outcome differs between the two members

#### 3. Paired counts in 2 by 2 contingency table

• Patients were matched into pairs with similar age and clinical conditions. In each pair, one member underwent treatment A and the other underwent treatment B, and we recorded whether each member became better or worse.
• Concordant pair: a matched pair in which both members became better, or both became worse
• Discordant pair: a matched pair in which only one member became better

#### Case Example

Suppose we wanted to compare the effects of two treatments. We investigated two groups of patients: one group received treatment A, and the other received treatment B. Patients from the two groups were matched into 621 pairs, so that in each pair one member received treatment A and the other received treatment B. In 510 pairs, both members became better; in 90 pairs, neither member became better. In 16 pairs, only the member receiving treatment A became better; in 5 pairs, only the member receiving treatment B became better.

#### Output 1. Contingency Table

The output includes:

• 2 x 2 Contingency Table with Total Number
• Expected Value
• Cell/Total %
• Cell/Row-Total %
• Cell/Column-Total %

#### Output 2. Test Results

Explanations
• P Value < 0.05: the factors have significantly different effects in the paired samples (reject the null hypothesis in favor of the alternative)
• P Value >= 0.05: there is no significant evidence that the factors differ (do not reject the null hypothesis)

In this default setting, we concluded that the two treatments had significantly different effects in the paired patients (P = 0.03).
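
McNemar's statistic uses only the discordant pairs. The sketch below assumes rows give the outcome under treatment A and columns the outcome under treatment B. It applies the continuity correction (|b - c| - 1)^2 / (b + c), which is also the default in R's `mcnemar.test`. `mcnemar` is a hypothetical name.

```python
import math


def mcnemar(table, correct=True):
    """McNemar test for a 2 x 2 table of matched pairs [[a, b], [c, d]].

    Rows = outcome under treatment A, columns = outcome under treatment B.
    Only the discordant cells b and c carry information about a difference.
    """
    b, c = table[0][1], table[1][0]
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs: no evidence of a difference
    diff = abs(b - c)
    if correct:
        diff = max(diff - 1, 0)  # continuity correction
    stat = diff ** 2 / (b + c)
    # Chi-square with 1 degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Assumed layout: rows = A better / A not better, columns = B better / B not better
pairs = [[510, 16], [5, 90]]
stat, p = mcnemar(pairs)
print(f"McNemar chi-square = {stat:.3f}, P = {p:.3f}")
```

Here (|16 - 5| - 1)^2 / 21 ≈ 4.76 and P ≈ 0.029, consistent with the P = 0.03 above. The 510 and 90 concordant pairs do not enter the statistic.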

# Chi-square Test for >2 Categories of Factor in Case-Control Status

#### 1. Functionalities

• To determine whether there is an association between the case-control status (rows) and the factor status (columns)
• To determine whether the population proportions underlying your multiple groups differ significantly
• To get the percentage tables and plots, and the expected value of each cell

#### 2. About your count data, 2 x C contingency table

• You have 2 categories for the case-control outcomes (shown as row names)
• You have >2 categories for factor status (shown as column names)
• Within each group, the data follow a binomial distribution (the proportion of successes)
• For each group, you know the total sample size and the number of specified events (the proportion in each subgroup)
• The groups are independent observations

#### Case Example

Suppose we wanted to study the relationship between age at first birth and the development of breast cancer. We investigated 3220 women with breast cancer (cases) and 10254 women without breast cancer (controls), and then categorized the women into groups by age at first birth. We wanted to know whether the probability of having breast cancer differed among the age groups, that is, whether age at first birth was related to breast cancer.

#### Output 1. Contingency Table

The output includes:

• 2 x C Contingency Table with Total Number
• Expected Value
• Cell/Total %
• Cell/Row-Total %
• Cell/Column-Total %

#### Output 2. Test Results

Explanations
• P Value < 0.05: Case-Control (Row) is significantly associated with Grouped Factors (Column) (reject the null hypothesis in favor of the alternative)
• P Value >= 0.05: there is no significant evidence of an association between Case-Control (Row) and Grouped Factors (Column) (do not reject the null hypothesis)

In this default setting, we concluded that there was a significant relationship between breast cancer and age at first birth (P < 0.001).

# Chi-square Test for >2 Categories of Factor in >2 Status

#### 1. Functionalities

• To determine whether there is an association between the status (rows) and the factor status (columns)
• To determine whether the population proportions underlying your multiple groups differ significantly
• To get the percentage tables and plots, and the expected value of each cell

#### 2. About your count data, R x C contingency table

• Within each group, the counts across the >2 outcome categories follow a multinomial distribution (the proportion in each category)
• For each group, you know the total sample size and the number in each outcome category
• The groups are independent observations

#### Case Example

Suppose we wanted to know the relationship between 3 types of treatment (Penicillin, Spectinomycin-low, and Spectinomycin-high) and patients' response. In one study, we enrolled 400 patients: 200 received Penicillin, 100 received Spectinomycin at a low dose, and 100 received Spectinomycin at a high dose. Among the 200 Penicillin users, 40 were Smear+, 30 were Smear-Culture+, and 130 were Smear-Culture-. Among the 100 Spectinomycin-low users, 10 were Smear+, 20 were Smear-Culture+, and 70 were Smear-Culture-. Among the 100 Spectinomycin-high users, 15 were Smear+, 40 were Smear-Culture+, and 45 were Smear-Culture-. We wanted to know whether the treatment was significantly associated with the response.

#### Output 1. Contingency Table

The output includes:

• R x C Contingency Table with Total Number
• Expected Value
• Cell/Total %
• Cell/Row-Total %
• Cell/Column-Total %

#### Output 2. Test Results

Explanations
• P Value < 0.05: the status (Row) is significantly associated with the Grouped Factors (Column) (reject the null hypothesis in favor of the alternative)
• P Value >= 0.05: there is no significant evidence of an association between the status (Row) and the Grouped Factors (Column) (do not reject the null hypothesis)

In this default setting, we conclude that there was a significant relationship between drug treatment and response. (P < 0.001)
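
For an R x C table, the statistic has (R - 1)(C - 1) degrees of freedom, so the 1-df shortcut no longer applies. The sketch below is an illustration, not the tool's code. It computes the Pearson statistic and a closed-form chi-square upper tail that is valid for integer degrees of freedom. It assumes treatments are rows and responses are columns. `chi2_sf` and `chi_square_rxc` are hypothetical names.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series
    p = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for i in range(1, (df - 1) // 2 + 1):
        p += term
        term *= x / (2 * i + 1)
    return p


def chi_square_rxc(table):
    """Pearson chi-square test of association for an R x C table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df), expected


# Assumed layout: rows = Penicillin, Spectinomycin-low, Spectinomycin-high
# Columns = Smear+, Smear-Culture+, Smear-Culture-
responses = [[40, 30, 130], [10, 20, 70], [15, 40, 45]]
stat, df, p, expected = chi_square_rxc(responses)
print(f"chi-square = {stat:.2f}, df = {df}, P = {p:.2g}")
```

For the treatment data, this gives a statistic of about 29.14 on 4 degrees of freedom, with P well below 0.001. Writing the tail probability out avoids a SciPy dependency. Where SciPy is available, `scipy.stats.chi2_contingency` performs the same Pearson test.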

# Kappa Statistic for Reproducibility/Agreement of Two Raters

#### 1. Functionalities

• To quantify the agreement between two raters or two rankings
• To get the percentage table and the expected value of each cell

#### 2. About your data

• You have the outcomes (e.g., Y/N answers, rankings, categories) from two raters or two measurements on the same subjects

#### Case Example

Suppose we wanted to check the agreement of answers from two surveys. In one survey, the ranking scores were given from 1 to 9, while in the other, the ranking scores were not. We wanted to check whether the two answers were reproducible, that is, whether the two surveys agreed.

#### Output 1. Contingency Table

2 x K Contingency Table with Total Number

#### Output 2. Test Results

Explanations and Guidelines for Evaluating Kappa
• Cohen's Kappa Statistic > 0.75: excellent reproducibility
• 0.4 <= Cohen's Kappa Statistic <= 0.75: good reproducibility
• 0 <= Cohen's Kappa Statistic < 0.4: marginal reproducibility
• Cohen’s kappa takes into account disagreement between the two raters, but not the degree of disagreement.
• The weighted kappa is calculated using a predefined table of weights that measure the degree of disagreement between the two raters. The higher the disagreement, the higher the weight.

In this default setting, we concluded that the responses from Survey1 and Survey2 did not show good reproducibility.
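
The weight table is not specified above. The sketch below therefore assumes linear disagreement weights |i - j| / (K - 1), one common choice, and computes kappa directly from each subject's pair of ratings. With `weighted=False`, the weights are 0 on the diagonal and 1 elsewhere, which gives ordinary Cohen's kappa. `cohen_kappa` and the example ratings are hypothetical.

```python
def cohen_kappa(ratings1, ratings2, categories=None, weighted=False):
    """Cohen's kappa for two raters, optionally with linear disagreement weights.

    ratings1, ratings2: one rating per subject from each rater, in the same order.
    categories: ordered list of possible ratings (e.g., list(range(1, 10)) for a
    1-9 scale); defaults to the sorted set of ratings that were observed.
    """
    if len(ratings1) != len(ratings2):
        raise ValueError("both raters must rate the same subjects")
    if categories is None:
        categories = sorted(set(ratings1) | set(ratings2))
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two rating categories")
    index = {cat: i for i, cat in enumerate(categories)}
    n = len(ratings1)
    # K x K contingency table: rows = rater 1, columns = rater 2
    table = [[0] * k for _ in range(k)]
    for r1, r2 in zip(ratings1, ratings2):
        table[index[r1]][index[r2]] += 1
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    def weight(i, j):
        # Disagreement weight: 0 on the diagonal, larger for bigger disagreements
        if weighted:
            return abs(i - j) / (k - 1)
        return 0.0 if i == j else 1.0

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(weight(i, j) * table[i][j] for i, j in cells) / n
    expected = sum(weight(i, j) * row_totals[i] * col_totals[j] for i, j in cells) / n ** 2
    return 1 - observed / expected


# Made-up ratings for illustration only
survey1 = [1, 2, 3, 1, 2, 3]
survey2 = [1, 2, 3, 2, 3, 1]
print(cohen_kappa(survey1, survey2), cohen_kappa(survey1, survey2, weighted=True))
```

Passing `categories=list(range(1, 10))` keeps all nine levels of a 1-9 scale in the table even if some are never used, which also changes the linear weights.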

# Kappa Statistic for Reproducibility of Repeated/Related Measurements

This method uses a different type of data: counts of concordant and discordant responses shown in a K by K table.

#### 1. Functionalities

• To quantify the reproducibility of the same variable measured more than once
• To quantify the association between 2 measurements with the same outcome categories
• To get the percentage table and the expected value of each cell

#### 2. About your count data, K x K contingency table

• You know the concordant responses, i.e., repeated measurements in which the outcome is the same every time
• You know the discordant responses, i.e., repeated measurements in which the outcome differs between measurements

#### Case Example

Suppose that in one study, we gave a group of patients two surveys addressing the same problems. We wanted to know the percentage of concordant responses between the two surveys. In the final results, 136 patients replied YES to both surveys and 240 replied NO to both surveys; 69 replied NO in Survey1 and YES in Survey2, and 92 replied YES in Survey1 and NO in Survey2. We wanted to know whether the two surveys showed good concordance.

#### Output 1. Contingency Table

K x K Contingency Table with Total Number

#### Output 2. Test Results

Explanations and Guidelines for Evaluating Kappa
• Cohen's Kappa Statistic > 0.75: excellent reproducibility
• 0.4 <= Cohen's Kappa Statistic <= 0.75: good reproducibility
• 0 <= Cohen's Kappa Statistic < 0.4: marginal reproducibility
• Cohen’s kappa takes into account disagreement between the two raters, but not the degree of disagreement.
• The weighted kappa is calculated using a predefined table of weights that measure the degree of disagreement between the two raters. The higher the disagreement, the higher the weight.

In this default setting, we concluded that the responses from Survey1 and Survey2 did not show good reproducibility; they were only marginally reproducible.
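
This method starts from the K x K table of counts, so kappa follows directly from two quantities. The observed agreement p_o is the diagonal proportion, and the chance agreement p_e comes from the margins: kappa = (p_o - p_e) / (1 - p_e). Here is a minimal sketch, assuming rows are Survey1 and columns are Survey2. The function names are hypothetical, and the labels follow the guidelines listed above.

```python
def kappa_from_table(table):
    """Cohen's kappa from a K x K table of counts (rows = survey 1, columns = survey 2)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Observed agreement: proportion of concordant responses on the diagonal
    p_observed = sum(table[i][i] for i in range(k)) / n
    # Agreement expected by chance from the marginal totals
    p_expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)


def reproducibility(kappa):
    """Label kappa using the guidelines listed above (None if kappa < 0)."""
    if kappa > 0.75:
        return "excellent"
    if kappa >= 0.4:
        return "good"
    if kappa >= 0:
        return "marginal"
    return None


# Assumed layout: rows = Survey1 YES/NO, columns = Survey2 YES/NO
surveys = [[136, 92], [69, 240]]
kappa = kappa_from_table(surveys)
print(f"kappa = {kappa:.3f} ({reproducibility(kappa)} reproducibility)")
```

Here p_o = 376/537 ≈ 0.70, p_e ≈ 0.52, and kappa ≈ 0.38. This falls in the marginal band and matches the conclusion above.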

# Mantel-Haenszel Test for 2 Categories of Factor in Case-Control Status under K Confounding Strata

#### 1. Functionalities

• To determine whether there is an association between the case-control status (rows) and the factor status (columns) after controlling for the stratifying (confounding) variable
• To test whether the two nominal variables are conditionally independent in the K strata
• To get the percentage tables and plots, and the expected value of each cell

#### 2. About your count data, 2 x 2 contingency table under K strata

• You have counts for several 2 x 2 contingency tables
• Each 2 x 2 contingency table corresponds to one stratum of the confounding factor

#### Case Example

Suppose we wanted to see the effect of passive smoking on cancer risk. One potential confounder was active smoking by the participants themselves, because personal smoking is related to both cancer risk and spouse smoking. Thus, we controlled for personal active smoking before looking at the relationship between passive smoking and cancer risk. We obtained two 2 x 2 tables, as shown in the input data: one from the active smoking group (466 people) and the other from the non-active smoking group (532 people). We wanted to know whether passive smoking was significantly related to cancer risk after controlling for active smoking, that is, whether the common odds ratio was significantly different from 1.

#### Output 1. Contingency Table

K-layer 2 x 2 Contingency Table

The first 2 rows show the 2 x 2 contingency table for the first stratum, followed by the 2 x 2 table for the second stratum.

#### Output 2. Test Results

Explanations
• P Value < 0.05: after controlling for personal smoking, passive smoking and cancer risk have a significant relationship; the common odds ratio is significantly different from 1 (reject the null hypothesis in favor of the alternative)
• P Value >= 0.05: after controlling for personal smoking, there is no significant evidence of a relationship between passive smoking and cancer risk (do not reject the null hypothesis)

In this default setting, we concluded that there was a significant relationship between cancer risk and passive smoking after controlling for personal active smoking (P < 0.001).
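
The Mantel-Haenszel statistic pools, across strata, the difference between the observed and expected counts in the top-left cell. It then divides the squared difference by the pooled hypergeometric variance. The common odds ratio is estimated as sum(ad/n) / sum(bc/n). The sketch below assumes a continuity correction of 0.5, as R's `mantelhaen.test` applies by default for 2 x 2 x K tables. The smoking counts are not reproduced here, so it runs on made-up strata. `mantel_haenszel` is a hypothetical name.

```python
import math


def mantel_haenszel(strata, correct=True):
    """Mantel-Haenszel test and common odds ratio for K 2 x 2 tables.

    strata: list of [[a, b], [c, d]] tables, one per stratum (rows = case-control
    status, columns = factor). Returns (statistic, p_value, common_odds_ratio).
    """
    sum_a = sum_expected = sum_var = 0.0
    or_num = or_den = 0.0
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        r1, r2, c1, c2 = a + b, c + d, a + c, b + d
        sum_a += a
        # Mean and variance of a under conditional independence (hypergeometric)
        sum_expected += r1 * c1 / n
        sum_var += r1 * r2 * c1 * c2 / (n ** 2 * (n - 1))
        # Mantel-Haenszel estimate of the common odds ratio
        or_num += a * d / n
        or_den += b * c / n
    delta = abs(sum_a - sum_expected)
    if correct and delta >= 0.5:
        delta -= 0.5  # continuity correction
    stat = delta ** 2 / sum_var
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square with 1 df
    common_or = or_num / or_den if or_den > 0 else math.inf
    return stat, p_value, common_or


# Made-up counts for two strata, for illustration only (not the smoking data)
stratum_active = [[20, 5], [5, 20]]
stratum_nonactive = [[15, 10], [10, 15]]
stat, p, common_or = mantel_haenszel([stratum_active, stratum_nonactive])
print(f"MH chi-square = {stat:.2f}, P = {p:.2g}, common OR = {common_or:.2f}")
```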

# Cochran-Mantel-Haenszel Test for >2 Categories of Factor in >2 Status under K Strata

#### 1. Functionalities

• To determine whether there is an association between the status (rows) and the factor status (columns) after controlling for the stratifying (confounding) variable
• To test whether the two nominal variables are conditionally independent in the K strata
• To get the percentage tables and plots, and the expected value of each cell

#### 2. About your count data, R x C contingency table under K strata

• You have counts for several R x C contingency tables
• Each R x C contingency table corresponds to one stratum of the confounding factor

#### Case Example

Suppose we wanted to know the relationship between snoring and age. A survey was done on 3513 individuals aged 30-60 years: 1843 women and 1670 men. Because gender might be a confounding variable in this study, we created one 3 x 2 table for the women stratum and another for the men stratum. We wanted to know whether age was significantly related to snoring after controlling for gender.

#### Output 1. Contingency Table

K-layer R x C Contingency Table

The first R rows show the R x C contingency table for the first stratum, followed by the R x C table for the second stratum.

#### Output 2. Test Results

Explanations
• P Value < 0.05: after controlling for gender, the prevalence of habitual snoring is significantly related to age (reject the null hypothesis in favor of the alternative)
• P Value >= 0.05: after controlling for gender, there is no significant evidence of a relationship between the prevalence of habitual snoring and age (do not reject the null hypothesis)

In this default setting, we concluded that there was a significant relationship between the prevalence of habitual snoring and age after controlling for gender (P < 0.001).
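
For R x C tables, the generalized CMH statistic works as follows. It sums, over strata, the observed minus expected counts in the first (R - 1)(C - 1) cells and their covariance matrix V. It then forms (O - E)^T V^-1 (O - E) and refers it to a chi-square distribution with (R - 1)(C - 1) degrees of freedom. The sketch below follows that construction, which is also the one used by R's `mantelhaen.test` for larger tables. A small Gaussian-elimination solver means it needs only the standard library. It is not the tool's code, and the function names are hypothetical. The example strata are made up because the snoring counts are not reproduced here.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    p = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for i in range(1, (df - 1) // 2 + 1):
        p += term
        term *= x / (2 * i + 1)
    return p


def solve(matrix, vector):
    """Solve matrix @ x = vector by Gaussian elimination with partial pivoting."""
    m = len(vector)
    aug = [list(row) + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for cc in range(col, m + 1):
                aug[r][cc] -= factor * aug[col][cc]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        tail = sum(aug[r][cc] * x[cc] for cc in range(r + 1, m))
        x[r] = (aug[r][m] - tail) / aug[r][r]
    return x


def cmh_general(strata):
    """Generalized CMH statistic for K strata of R x C tables (no continuity correction).

    Returns (statistic, df, p_value).
    """
    n_rows, n_cols = len(strata[0]), len(strata[0][0])
    cells = [(i, j) for i in range(n_rows - 1) for j in range(n_cols - 1)]
    diff = [0.0] * len(cells)
    cov = [[0.0] * len(cells) for _ in cells]
    for table in strata:
        rows = [sum(r) for r in table]
        cols = [sum(c) for c in zip(*table)]
        n = sum(rows)
        for p, (i, j) in enumerate(cells):
            # Observed minus expected count under conditional independence
            diff[p] += table[i][j] - rows[i] * cols[j] / n
            for q, (i2, j2) in enumerate(cells):
                # Hypergeometric covariance between cells (i, j) and (i2, j2)
                cov[p][q] += (rows[i] * ((i == i2) * n - rows[i2])
                              * cols[j] * ((j == j2) * n - cols[j2])) / (n ** 2 * (n - 1))
    weights = solve(cov, diff)  # V^-1 (O - E)
    stat = sum(d * w for d, w in zip(diff, weights))
    df = len(cells)
    return stat, df, chi2_sf(stat, df)


# Made-up 3 x 2 strata for illustration only (not the snoring survey data)
women = [[60, 240], [90, 210], [120, 180]]
men = [[80, 220], [110, 190], [140, 160]]
stat, df, p = cmh_general([women, men])
print(f"CMH chi-square = {stat:.2f}, df = {df}, P = {p:.2g}")
```

Two useful checks: for 2 x 2 strata, the statistic reduces to the uncorrected Mantel-Haenszel statistic, and with a single stratum it equals (n - 1)/n times the Pearson chi-square.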