# Chi-square Test and Exact Binomial Method for One Proportion

#### 1. Functionalities

• To determine whether the population rate/proportion underlying your data differs significantly from a specified rate/proportion
• To determine how compatible the sample rate/proportion is with a specified population rate/proportion
• To estimate the probability of success in a Bernoulli experiment

• Your data come from a binomial distribution (the proportion of success)
• You know the total sample size and the number of events of interest (the proportion of a sub-group)
• You have a specified proportion (p0)

#### Case Example

Suppose that 20% of women in the general population are infertile, and that a treatment may affect infertility. Forty women who were trying to get pregnant received the treatment, and 10 of them were still infertile. We wanted to know whether the rate of infertility among treated women differed significantly from the general infertility rate of 20%.

#### Step 1. Data Preparation

In the example, the number of events was 10 and the total sample size was 40.

#### Step 2. Specify Parameter

The specified proportion was the general infertility rate, p0 = 20% (0.2).

#### Step 3. Choose Hypothesis

Null hypothesis

p = p0: the probability/proportion is p0

In this example, we wanted to test whether the rate of infertility among treated women differed significantly from the general infertility rate of 20%, so we used the first alternative hypothesis, the two-sided p ≠ p0.

#### Output 2. Test Results

1. Normal Theory Method with Yates' Continuity Correction, when np0(1-p0) >= 5

2. Exact Binomial Method, when np0(1-p0) < 5
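The two methods and the rule for choosing between them can be sketched in Python with only the standard library. This is a minimal sketch, not the app's implementation: the function names are hypothetical, both p-values are two-sided, and the exact p-value uses the convention of doubling the smaller tail probability (capped at 1), which other software may define differently.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability P(X = k), computed on the log scale to avoid overflow."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
    )
    return math.exp(log_pmf)


def yates_one_proportion(x: int, n: int, p0: float) -> float:
    """Two-sided p-value from the normal approximation with Yates' correction."""
    sd0 = math.sqrt(n * p0 * (1 - p0))  # standard deviation of X under H0
    z = max(abs(x - n * p0) - 0.5, 0.0) / sd0  # continuity-corrected z score
    return math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))


def exact_binomial_one_proportion(x: int, n: int, p0: float) -> float:
    """Two-sided exact p-value: twice the smaller tail probability, capped at 1."""
    lower = sum(binom_pmf(k, n, p0) for k in range(0, x + 1))  # P(X <= x)
    upper = sum(binom_pmf(k, n, p0) for k in range(x, n + 1))  # P(X >= x)
    return min(1.0, 2 * min(lower, upper))


def one_proportion_test(x: int, n: int, p0: float) -> tuple[str, float]:
    """Choose the method with the n*p0*(1-p0) >= 5 rule; return (method, p)."""
    if not (0 <= x <= n and n > 0 and 0 < p0 < 1):
        raise ValueError("need 0 <= x <= n, n > 0 and 0 < p0 < 1")
    if n * p0 * (1 - p0) >= 5:
        return "normal (Yates)", yates_one_proportion(x, n, p0)
    return "exact binomial", exact_binomial_one_proportion(x, n, p0)


# Infertility example: 10 events out of 40, p0 = 0.2.
method, p = one_proportion_test(10, 40, 0.2)
print(method, round(p, 2))
```

For the infertility example, 40 × 0.2 × 0.8 = 6.4 ≥ 5, so the sketch picks the normal-theory method and prints `normal (Yates) 0.55`. Working on the log scale in `binom_pmf` keeps the exact method usable when n is large but p0 is extreme.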

Explanations
• P Value < 0.05, then the population proportion/rate IS significantly different from the specified proportion/rate. (Accept the alternative hypothesis)
• P Value >= 0.05, then the population proportion/rate IS NOT significantly different from the specified proportion/rate. (Do not reject the null hypothesis)

With the default settings, we concluded that there was no significant difference in the rate of infertility between treated women and the general population (P = 0.55). In this case, np0(1-p0) = 40*0.2*0.8 = 6.4 >= 5, so the Normal Theory Method was preferable.

# Chi-square Test for Two Independent Proportions

#### 1. Functionalities

• To determine whether the population rates/proportions underlying your two groups differ significantly

• The data in each of the two groups come from a binomial distribution (the proportion of success)
• You know the total sample size and the number of events of interest (the proportion of a sub-group) in each group
• The two groups are independent

#### Case Example

Suppose all women in the study had at least one birth. We studied 3220 women with breast cancer as cases; of them, 683 had their first birth after age 30. We also studied 10245 women without breast cancer as controls; of them, 1498 had their first birth after age 30. We wanted to know whether the underlying probability of having a first birth after age 30 differed between the breast cancer and non-breast cancer groups.

#### Step 1. Data Preparation

Give names to the sample groups

Give names to the successes/events

Group 1 (Case)

Group 1 consisted of 3220 women with breast cancer, of whom 683 had their first birth after age 30.

Group 2 (Control)

Group 2 consisted of 10245 women without breast cancer, of whom 1498 had their first birth after age 30.

#### Step 2. Choose Hypothesis

Null hypothesis

p1 = p2: the probabilities/proportions of events are equal in Group 1 and Group 2.

In this example, we wanted to know whether the underlying probability of having a first birth after age 30 differed between the two groups, so we used the two-sided alternative hypothesis, p1 ≠ p2.

#### Output 1. Data Preview

Data table and percentage plot of the two groups (1. Case, 2. Control)

#### Output 2. Test Results

Explanations
• P Value < 0.05, then the population proportions/rates are significantly different in the two groups. (Accept the alternative hypothesis)
• P Value >= 0.05, then the population proportions/rates are NOT significantly different in the two groups. (Do not reject the null hypothesis)

With the default settings, we concluded that women with breast cancer were significantly more likely to have had their first child after age 30 than women without breast cancer (683/3220 = 21.2% vs 1498/10245 = 14.6%; P < 0.001).
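The two-proportion chi-square test can be sketched by arranging the data as a 2 × 2 table (groups by event/no event). This is a minimal sketch, not the app's code: `two_proportion_chi2` is a hypothetical name, and applying Yates' continuity correction by default is an assumption that mirrors the one-sample normal-theory method; pass `yates=False` for the uncorrected statistic.

```python
import math


def two_proportion_chi2(
    x1: int, n1: int, x2: int, n2: int, yates: bool = True
) -> tuple[float, float]:
    """Chi-square test of H0: p1 = p2 on the 2x2 table; returns (chi2, p)."""
    # 2x2 table: rows = groups, columns = event / no event.
    a, b = x1, n1 - x1
    c, d = x2, n2 - x2
    n = n1 + n2
    diff = abs(a * d - b * c)
    if yates:
        # Yates' continuity correction, never pushing the difference below 0.
        diff = max(diff - n / 2, 0.0)
    denom = n1 * n2 * (a + c) * (b + d)  # product of row and column totals
    if denom == 0:
        raise ValueError("every row and column total must be positive")
    chi2 = n * diff**2 / denom
    # With 1 degree of freedom, P(chi2_1 > X^2) = P(|Z| > sqrt(X^2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


chi2, p = two_proportion_chi2(683, 3220, 1498, 10245)
print(f"p1 = {683 / 3220:.3f}, p2 = {1498 / 10245:.3f}")
print(f"X^2 = {chi2:.1f}, P = {p:.1e}")
```

For the breast cancer example the sample proportions are about 0.212 and 0.146, and X^2 is about 77.9 with 1 degree of freedom, so P is far below 0.001.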

# Chi-square Test for More than Two Independent Proportions

#### 1. Functionalities

• To determine whether the population rates/proportions underlying your multiple groups differ significantly

• The data in each group come from a binomial distribution (the proportion of success)
• You know the total sample size and the number of events of interest (the proportion of a sub-group) in each group
• The groups are independent

#### Case Example

Suppose we wanted to study the relationship between age at first birth and the development of breast cancer. We studied 3220 women with breast cancer (cases) and 10245 women without breast cancer (controls), and then categorized the women into age groups. We wanted to know whether the probability of having breast cancer differed among the age groups, that is, whether age was related to breast cancer.

#### Step 1. Data Preparation

You can change the group names

You can change the success/event names

x: the number of successes/events in each group

n: the number of trials/samples in each group (n > x)

Note: missing values are not allowed

In this example, we had 5 age groups, with the number of women in each group entered in n and the number of women who had breast cancer entered in x.

#### Hypothesis

Null hypothesis

The probabilities/proportions are equal across the groups

Alternative hypothesis

The probabilities/proportions are not all equal

In this example, we wanted to know whether the probability of having breast cancer differed among the age groups.
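The chi-square test of these hypotheses compares each group's event count with the count expected under a common pooled proportion, giving a statistic with k − 1 degrees of freedom for k groups. The sketch below is a standard-library illustration, not the app's code; `chi2_sf` and `k_proportions_chi2` are hypothetical names, and the test is computed without a continuity correction.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    # Closed forms for df = 1 (normal tail) and df = 2 (exponential tail) ...
    if df % 2 == 1:
        q, nu = math.erfc(math.sqrt(x / 2)), 1
    else:
        q, nu = math.exp(-x / 2), 2
    # ... then Q(x; nu + 2) = Q(x; nu) + (x/2)^(nu/2) * e^(-x/2) / Gamma(nu/2 + 1).
    while nu < df:
        q += math.exp((nu / 2) * math.log(x / 2) - x / 2 - math.lgamma(nu / 2 + 1))
        nu += 2
    return min(q, 1.0)


def k_proportions_chi2(x: list[int], n: list[int]) -> tuple[float, int, float]:
    """Chi-square test of H0: p1 = ... = pk; returns (chi2, df, p)."""
    if len(x) != len(n) or len(x) < 2:
        raise ValueError("x and n need the same length, with at least 2 groups")
    if any(ni <= 0 or not 0 <= xi <= ni for xi, ni in zip(x, n)):
        raise ValueError("each group needs n >= 1 and 0 <= x <= n")
    p_bar = sum(x) / sum(n)  # pooled proportion under H0
    if p_bar in (0.0, 1.0):
        raise ValueError("pooled proportion must be strictly between 0 and 1")
    # Each group contributes (x_i - n_i * p_bar)^2 / (n_i * p_bar * (1 - p_bar)).
    chi2 = sum(
        (xi - ni * p_bar) ** 2 / (ni * p_bar * (1 - p_bar)) for xi, ni in zip(x, n)
    )
    df = len(x) - 1
    return chi2, df, chi2_sf(chi2, df)
```

With the five age groups entered in Step 1 the test has 4 degrees of freedom. With only two groups it reduces to the uncorrected two-proportion chi-square; for example, `k_proportions_chi2([683, 1498], [3220, 10245])` gives X^2 of about 78.4 with 1 degree of freedom.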

Data Table

#### Output 2. Test Results

Explanations
• P Value < 0.05, then the population proportions/rates are significantly different. (Accept the alternative hypothesis)
• P Value >= 0.05, then the population proportions/rates are NOT significantly different. (Do not reject the null hypothesis)

With the default settings, we concluded that the probability of having breast cancer differed significantly among the age groups (P < 0.001).

# Chi-square Test for Trend in Multiple Independent Samples

#### 1. Functionalities

• To determine whether the population rates/proportions of multiple ordered groups vary with the group order (a trend)

• The data in each group come from a binomial distribution (the proportion of success)
• You know the total sample size and the number of events of interest (the proportion of a sub-group) in each group
• The groups are independent

#### Case Example

Suppose we wanted to study the relationship between age at first birth and the development of breast cancer. We studied 3220 women with breast cancer (cases) and 10245 women without breast cancer (controls), and then categorized the women into age groups. In this example, we wanted to know whether the rate of breast cancer showed a trend from the youngest to the oldest age group.

#### Step 1. Data Preparation

1. Give names to the group samples

2. Give names to the success/event

Data points can be separated by a comma (,), a semicolon (;), Enter, or Tab

3. The number of successes/events in each group (x)

4. The total number of trials/samples in each group (n > x)

Note: missing values are not allowed

In this example, we had 5 age groups, with the number of women in each group entered in n and the number of women who had breast cancer entered in x.
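The input rules in Step 1 (the allowed separators, no missing values, and n > x) can be checked before running the test. This is a hypothetical sketch of such validation, not the app's parser; `parse_counts` and `check_groups` are illustrative names, and the sketch treats an empty field (for example a doubled or trailing separator) as a missing value.

```python
import re


def parse_counts(text: str) -> list[int]:
    """Split on commas, semicolons, newlines (Enter) or tabs; reject empty fields."""
    values = []
    for field in re.split(r"[,;\n\t]", text.strip()):
        field = field.strip()  # also drops a stray "\r" from Windows line endings
        if not field:
            raise ValueError("missing value: empty field in input")
        values.append(int(field))
    return values


def check_groups(x: list[int], n: list[int]) -> None:
    """Require one x and one n per group, with n > x >= 0 in every group."""
    if len(x) != len(n):
        raise ValueError("x and n must have the same number of groups")
    for xi, ni in zip(x, n):
        if not 0 <= xi < ni:
            raise ValueError(f"need n > x >= 0, got x={xi}, n={ni}")
```

For instance, `parse_counts("1, 2;3\n4\t5")` returns `[1, 2, 3, 4, 5]`, while `parse_counts("1,,2")` raises `ValueError`.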

#### Step 2. Specify the Order of Your Samples to Test

Order of the columns (one value per group, the same length as your data)

In this example, the age groups were in increasing order.

#### Hypothesis

Null hypothesis

There is no variation in the proportions across the groups

Alternative hypothesis

The proportions/rates/probabilities vary with the score
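The test for trend assigns each group a score S_i and asks whether the proportions change linearly with it. This sketch uses the common chi-square test for linear trend in binomial proportions (often called the Cochran-Armitage trend test): X^2 = A^2 / B with 1 degree of freedom, where A = Σ x_i S_i − x·S̄ and B = p̄(1 − p̄)[Σ n_i S_i^2 − (Σ n_i S_i)^2 / n]. It is an illustration, not the app's code; `chi2_trend_test` is a hypothetical name, and using the Step 2 column order (for example 1, 2, 3, 4, 5) as the scores is an assumption.

```python
import math


def chi2_trend_test(
    x: list[int], n: list[int], scores: list[float]
) -> tuple[float, float]:
    """Chi-square test for linear trend in proportions (1 df); returns (chi2, p)."""
    if not len(x) == len(n) == len(scores) or len(x) < 2:
        raise ValueError("x, n and scores need the same length, with at least 2 groups")
    if any(ni <= 0 or not 0 <= xi <= ni for xi, ni in zip(x, n)):
        raise ValueError("each group needs n >= 1 and 0 <= x <= n")
    x_tot, n_tot = sum(x), sum(n)
    p_bar = x_tot / n_tot  # pooled proportion under H0
    if p_bar in (0.0, 1.0):
        raise ValueError("pooled proportion must be strictly between 0 and 1")
    s_bar = sum(ni * si for ni, si in zip(n, scores)) / n_tot  # weighted mean score
    # A: observed minus expected score-weighted event count under H0.
    a = sum(xi * si for xi, si in zip(x, scores)) - x_tot * s_bar
    # B: variance of A under H0.
    b = p_bar * (1 - p_bar) * (
        sum(ni * si**2 for ni, si in zip(n, scores)) - n_tot * s_bar**2
    )
    if b <= 0:
        raise ValueError("scores must not all be equal")
    chi2 = a * a / b
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # 1-df chi-square upper tail
```

Because rescaling the scores linearly (for example 1 to 5 versus 10 to 50, or reversing their order) multiplies A by a constant and B by its square, the statistic does not change; only the spacing of the scores matters.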

Data Table

Cell-Column %

#### Output 2. Test Results

Explanations
• P Value < 0.05, then Case-Control (Row) is significantly associated with the grouped factors (Column). (Accept the alternative hypothesis)
• P Value >= 0.05, then Case-Control (Row) is not significantly associated with the grouped factors (Column). (Do not reject the null hypothesis)

With the default settings, we concluded that the proportion of breast cancer varied significantly across the age groups (P = 0.01).