# One-Sample T-Test

#### 1. Functionalities

• To determine whether the mean of your data is statistically significantly different from a specified mean, based on the T-test results
• To read the descriptive plots of your data (box plot, mean-SD plot, Q–Q plot, histogram, and density plot) and judge whether the data are close to a normal distribution

• Your data contain only 1 group of values (or a numeric vector)
• The values are independent observations and approximately normally distributed

#### Case Example

Suppose we collected the ages of 50 independent lymph node-positive patients and wanted to know whether the mean age of lymph node-positive patients in general was 50 years

#### Step 1. Data Preparation

1. Give a name to your data (Required)

2. Input data

The example data are the AGE of 144 independent lymph node-positive patients

Data points can be separated by a comma (,), semicolon (;), Enter, Tab, or Space

Data can be copied from a CSV file (one column) and pasted into the box

Missing values should be entered as NA
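The input rules above can be illustrated with a minimal Python sketch. It is not the tool's own parser; the function name `parse_values` and the sample string are hypothetical, and it assumes missing values are written literally as `NA`.

```python
import re

def parse_values(text):
    """Split pasted text on commas, semicolons, or any whitespace (Enter, Tab, Space).

    Tokens equal to 'NA' are returned as None so they can be dropped later.
    """
    tokens = [tok for tok in re.split(r"[,;\s]+", text.strip()) if tok]
    return [None if tok.upper() == "NA" else float(tok) for tok in tokens]

raw = "45, 52;NA\n61\t38 49"          # hypothetical pasted input mixing all separators
values = parse_values(raw)
observed = [v for v in values if v is not None]  # missing values are excluded from the test
print(values)
print(len(observed), "observed values")
```

Dropping the `None` entries before analysis mirrors the usual treatment of NA in a t-test: only observed values contribute to the mean and standard deviation.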

Uploaded data will replace the example data

2. Show 1st row as column names?

3. Use 1st column as row names? (No duplicates)

Choosing the correct separator and quote character ensures that the data are read in successfully

Find some example data here

#### Step 2. Specify Parameter

The specified mean (μ₀) is the general age, 50 years

#### Step 3. Choose Hypothesis

Null hypothesis

μ = μ₀: the population mean (μ) of your data is μ₀

We wanted to know whether the mean age was 50 or not (a two-sided question, μ ≠ μ₀), so we chose the first alternative hypothesis

#### Output 1. Descriptive Results

Explanations
• The band inside the box is the median
• The height of the box is the interquartile range: the difference between the 75th and 25th percentiles
• Outliers, if any, are shown in red
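The quantities behind a box plot can be computed directly, as in the sketch below. The page does not say how the tool defines outliers or quartiles; the 1.5 × IQR fences and the "inclusive" quartile method used here are assumptions, and `box_summary` is a hypothetical helper name.

```python
import statistics

def box_summary(values):
    """Median, quartiles, interquartile range, and outliers for one group of values."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1                                   # height of the box
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # assumed outlier fences
    outliers = [v for v in values if v < lower or v > upper]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "outliers": outliers}

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # hypothetical data with one extreme value
print(box_summary(sample))
```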

Explanations
• Normal Q–Q Plot: to compare the quantiles of your data on the vertical axis to the quantiles of a standard normal population on the horizontal axis. If the points fall approximately on a straight line, the data are consistent with a normal distribution.
• Histogram: to roughly assess the probability distribution of a given variable by depicting the frequencies of observations occurring in certain ranges of values
• Density Plot: to estimate the probability density function of the data
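To make the Q–Q plot concrete, the sketch below computes the point pairs such a plot would draw. The plotting positions (i − 0.5)/n are one common convention and an assumption here; the tool may use a different one. `qq_points` is a hypothetical name.

```python
from statistics import NormalDist

def qq_points(values):
    """Return (theoretical quantile, sample quantile) pairs for a normal Q-Q plot."""
    n = len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(values)))

for x, y in qq_points([44, 51, 39, 47, 42]):  # hypothetical ages
    print(f"{x:6.3f}  {y}")
```

If the data are roughly normal, the second column grows approximately linearly with the first.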

Normal Q–Q plot

Histogram

When the number of bins is set to 0, the plot uses the default number of bins

Density plot

#### Output 2. Test Results

Explanations
• P Value < 0.05: the population mean of the data IS significantly different from the specified mean. (Reject the null hypothesis in favor of the alternative hypothesis)
• P Value >= 0.05: the population mean of the data IS NOT significantly different from the specified mean. (Fail to reject the null hypothesis)

Because P < 0.05, we concluded that the mean age of the lymph node-positive population was significantly different from 50 years; that is, the general age was not 50. If we reset the specified mean to 44, we could get P > 0.05
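For readers who want to see where the P value comes from, here is a minimal standard-library sketch of a two-sided one-sample t-test. It is an illustration, not the tool's implementation: the ages are hypothetical (not the example data), the helper names are invented, and the p-value is obtained by numerically integrating the t density. In practice a library routine such as `scipy.stats.ttest_1samp` would normally be used.

```python
import math
import statistics

def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value for a t statistic, via Simpson integration of the t density."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps                      # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central_area = 2 * total * h / 3   # P(-|t| < T < |t|)
    return min(1.0, max(0.0, 1.0 - central_area))

def one_sample_t(values, mu0):
    """Return (t statistic, degrees of freedom, two-sided p-value) for H0: mu = mu0."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)   # standard error of the mean
    t = (statistics.mean(values) - mu0) / se
    df = n - 1
    return t, df, t_two_sided_p(t, df)

ages = [48, 52, 39, 45, 41, 50, 44, 38, 47, 43]    # hypothetical ages
t, df, p = one_sample_t(ages, mu0=50)
verdict = "reject H0" if p < 0.05 else "fail to reject H0"
print(f"t = {t:.3f}, df = {df}, p = {p:.4f}: {verdict}")
```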

# Independent Two-Sample T-Test

#### 1. Functionalities

• To determine whether the means of two sets of your data are significantly different from each other, based on the T-test results
• To read the descriptive plots of your data (box plot, mean-SD plot, Q–Q plot, histogram, and density plot) and judge whether the data are close to a normal distribution

• Your data contain 2 separate groups/sets (or 2 numeric vectors)
• The 2 separate groups/sets are independent of each other, and each is approximately normally distributed

#### Case Example

Suppose we collected the ages of 50 independent lymph node-positive patients. Among them, 25 were estrogen receptor (ER) positive and 25 were ER negative. We wanted to know whether the ages of ER-positive patients were significantly different from those of ER-negative patients in general; in other words, whether ER status is related to age.

#### Step 1. Data Preparation

1. Give names to your groups (Required)

2. Input data

The example data are the AGE of 27 lymph node-positive patients who were estrogen receptor (ER) positive (Group.1-Age.positive) and 117 patients who were ER negative (Group.2-Age.negative)

Data points can be separated by a comma (,), semicolon (;), Enter, Tab, or Space

Data can be copied from a CSV file (one column) and pasted into the box

Group 1

Group 2

Missing values should be entered as NA so that the 2 sets have equal length; otherwise, an error will occur
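When the two groups have different sizes, the shorter one has to be padded before it is pasted in. A small sketch of that padding, assuming the literal string `NA` is the filler (the helper name `pad_with_na` and the values are hypothetical):

```python
from itertools import zip_longest

def pad_with_na(group1, group2):
    """Pad the shorter group with 'NA' so both columns have the same length."""
    rows = list(zip_longest(group1, group2, fillvalue="NA"))
    return [r[0] for r in rows], [r[1] for r in rows]

positive = [47, 52, 61]                 # hypothetical Group 1 values
negative = [39, 44, 58, 63, 41]         # hypothetical Group 2 values
col1, col2 = pad_with_na(positive, negative)
print(col1)
print(col2)
```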

Uploaded data will replace the example data

2. Show 1st row as column names?

3. Use 1st column as row names? (No duplicates)

Choosing the correct separator and quote character ensures that the data are read in successfully

Find some example data here

#### Step 2. Equivalence of Variance

Before doing the T test, we need to check whether the two variances are equal and then decide which T test to use

Null hypothesis

v1 = v2: Group 1 and Group 2 have equal population variances
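The statistic behind this null hypothesis can be sketched as a ratio of sample variances. The sketch assumes the classical variance-ratio F test; it computes only the statistic and its degrees of freedom, and the p-value would then be read from an F distribution with those degrees of freedom. The function name and data are hypothetical.

```python
import statistics

def variance_ratio(group1, group2):
    """Return (F statistic, df1, df2) for H0: the two population variances are equal."""
    f_stat = statistics.variance(group1) / statistics.variance(group2)
    return f_stat, len(group1) - 1, len(group2) - 1

g1 = [47, 52, 61, 39, 55]   # hypothetical ages, Group 1
g2 = [39, 44, 58, 63, 41]   # hypothetical ages, Group 2
f_stat, df1, df2 = variance_ratio(g1, g2)
print(f"F = {f_stat:.3f} on ({df1}, {df2}) degrees of freedom")
```

An F statistic close to 1 is what equal population variances would tend to produce.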

#### Step 3. T Test

Null hypothesis

μ₁ = μ₂: Group 1 and Group 2 have equal population means

With the default settings, we wanted to know whether the ages of ER-positive patients were significantly different from those of ER-negative patients
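The two variants of the test differ only in how the standard error and the degrees of freedom are computed. The following sketch shows both, assuming the textbook pooled-variance and Welch (Satterthwaite) formulas; function names and data are hypothetical. Library equivalents exist, for example `scipy.stats.ttest_ind` with `equal_var=True` or `equal_var=False`.

```python
import math
import statistics

def pooled_t(a, b):
    """Two-sample t statistic and df, assuming equal population variances."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)   # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / se, n1 + n2 - 2

def welch_t(a, b):
    """Welch t statistic and approximate df, without assuming equal variances."""
    n1, n2 = len(a), len(b)
    q1, q2 = statistics.variance(a) / n1, statistics.variance(b) / n2
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(q1 + q2)
    df = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))
    return t, df

g1 = [47, 52, 61, 39, 55]   # hypothetical ages, Group 1
g2 = [39, 44, 58, 63, 41]   # hypothetical ages, Group 2
print("pooled:", pooled_t(g1, g2))
print("welch: ", welch_t(g1, g2))
```

With equal group sizes the two t statistics coincide and only the degrees of freedom differ; with unequal sizes and unequal variances they can diverge noticeably.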

#### Output 1. Descriptive Results

Explanations
• The band inside the box is the median
• The height of the box is the interquartile range: the difference between the 75th and 25th percentiles
• Outliers, if any, are shown in red

Explanations
• Normal Q–Q Plot: to compare the quantiles of your data on the vertical axis to the quantiles of a standard normal population on the horizontal axis. If the points fall approximately on a straight line, the data are consistent with a normal distribution.
• Histogram: to roughly assess the probability distribution of a given variable by depicting the frequencies of observations occurring in certain ranges of values
• Density Plot: to estimate the probability density function of the data

Normal Q-Q plot

Histogram

When the number of bins is set to 0, the plot uses the default number of bins

Density plot

#### Output 2. Test Result 1

Check the equivalence of 2 variances
Explanations
• P Value < 0.05: refer to the Welch Two-Sample t-test
• P Value >= 0.05: refer to the Two-Sample t-test

In this example, the P value of the F test was about 0.15 (> 0.05), so there was no evidence that the two variances differ. Thus, we should refer to the results from the 'Two-Sample t-test'

#### Output 3. Test Result 2

Decide the T Test

Explanations
• P Value < 0.05: the population mean of Group 1 IS significantly different from that of Group 2. (Reject the null hypothesis in favor of the alternative hypothesis)
• P Value >= 0.05: there is NO significant difference between the means of Group 1 and Group 2. (Fail to reject the null hypothesis)

In this example, we concluded that the age of the lymph node-positive population with ER positive was not significantly different from that with ER negative (P=0.55, from the 'Two-Sample t-test')

# Dependent T-Test for Paired Samples

In the paired case, we compare the differences between the 2 groups to zero. Thus, it becomes a one-sample test problem.
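The reduction to a one-sample problem can be sketched in a few lines: take the pairwise differences, then test their mean against 0. The direction of the difference (after minus before) is an assumption, and the data and function name are hypothetical. `scipy.stats.ttest_rel` is a library routine for the same test.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t statistic, df) for H0: the mean of the paired differences is 0."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for b, a in zip(before, after)]       # assumed: after minus before
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se, len(diffs) - 1

before = [6.1, 7.0, 5.4, 6.8, 7.2]   # hypothetical sleeping hours before the drug
after = [6.9, 7.4, 5.2, 7.5, 7.9]    # hypothetical sleeping hours after the drug
t, df = paired_t(before, after)
print(f"t = {t:.3f} on {df} degrees of freedom")
```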

#### 1. Functionalities

• To determine whether the mean difference between the 2 paired samples is equal to 0
• To read the descriptive plots of your data (box plot, mean-SD plot, Q–Q plot, histogram, and density plot) and judge whether the data are close to a normal distribution

• Your data contain 2 separate groups/sets (or 2 numeric vectors)
• The 2 samples have been matched or paired
• The differences of paired samples are approximately normally distributed

#### 3. Examples for Matched or Paired Data

• One person's pre-test and post-test scores
• Two samples whose observations have been matched or paired

#### Case Example

Suppose we wanted to know whether a certain drug had an effect on people's sleeping hours. We recruited 10 people and collected their sleeping hours before and after taking the drug. This was a paired case. We wanted to know whether the sleeping hours before and after the drug were significantly different; or, equivalently, whether the differences between before and after were significantly different from 0

#### Step 1. Data Preparation

1. Give names to your groups (Required)

2. Input data

The example data are the HOUR of sleep affected by a certain drug. Sleeping hours before and after taking the drug were recorded

Data points can be separated by a comma (,), semicolon (;), Enter, Tab, or Space

Data can be copied from a CSV file (one column) and pasted into the box

Before

After

Missing values should be entered as NA so that the 2 sets have equal length; otherwise, an error will occur

Uploaded data will replace the example data

2. Show 1st row as column names?

3. Use 1st column as row names? (No duplicates)

Choosing the correct separator and quote character ensures that the data are read in successfully

Find some example data here

#### Step 2. Choose Hypothesis

Null hypothesis

Δ = 0: the mean difference between Group 1 (Before) and Group 2 (After) is 0, i.e. the two have equal effect

With the default settings, we wanted to know whether the drug had an effect; that is, whether the sleep HOUR changed after taking the drug.

#### Output 1. Descriptive Results

Basic Descriptives of the Difference

Explanations
• The band inside the box is the median
• The height of the box is the interquartile range: the difference between the 75th and 25th percentiles
• Outliers, if any, are shown in red

Explanations
• Normal Q–Q Plot: to compare the quantiles of the differences on the vertical axis to the quantiles of a standard normal population on the horizontal axis. If the points fall approximately on a straight line, the differences are consistent with a normal distribution.
• Histogram: to roughly assess the probability distribution of a given variable by depicting the frequencies of observations occurring in certain ranges of values
• Density Plot: to estimate the probability density function of the difference

Normal Q-Q plot

Histogram

When the number of bins is set to 0, the plot uses the default number of bins

Density plot

#### Output 2. Test Results

Explanations
• P Value < 0.05: Group 1 (Before) and Group 2 (After) have a significantly unequal effect. (Reject the null hypothesis in favor of the alternative hypothesis)
• P Value >= 0.05: there is NO significant difference between the 2 groups. (Fail to reject the null hypothesis)

With the default settings, we concluded that the drug had no significant effect on the sleep hours (P=0.2)