To upload data files, preview the data set, and check that the data were entered correctly

To pre-process some variables (when necessary) for building the model

To obtain basic descriptive statistics and plots of the variables

2. About your data

Your data need to have more rows than columns

Your data need to be all numeric
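
The two data requirements above can be checked before uploading. The following is only a minimal sketch in Python; the function name check_data and the list-of-rows input are assumptions made for illustration and are not part of the app.

```python
def check_data(rows):
    """Return a list of problems found in a table given as a list of rows.

    Hypothetical helper: it only mirrors the two requirements stated above
    (more rows than columns, all values numeric).
    """
    problems = []
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    if any(len(row) != n_cols for row in rows):
        problems.append("rows have unequal lengths")
    if n_rows <= n_cols:
        problems.append("need more rows than columns")
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            # bool is a subclass of int, so it is rejected explicitly
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"non-numeric value at row {i}, column {j}")
    return problems


if __name__ == "__main__":
    print(check_data([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]))
    print(check_data([[1.0, "a"], [3.0, 4.0]]))
```

An empty list means the table passes both checks.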

Case Example 1: Mouse gene expression data

These data measure the gene expression of 20 mice in a diet experiment. Some mice shared the same genotype, and some gene variables were correlated.
We wanted to compute linearly uncorrelated principal components from the gene expression data.

Case Example 2: Chemical data

Suppose that in one study, researchers measured 9 chemical attributes of 7 types of drugs. Some chemicals had a latent association.
We wanted to explore the latent relational structure among the chemical variables and narrow them down to a smaller number of variables.

Please follow the Steps; Outputs will show the analytical results in real time. Once the data are ready, find the model in the next tabs.

Linear fitting plot: to roughly show the linear relation between any two numeric variables.
The grey area is the 95% confidence interval.
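
As a rough illustration of what such a plot computes, the sketch below fits a least-squares line and returns a confidence band for the fitted mean at one point. It is written under stated assumptions: the function name is hypothetical, and the critical value 1.96 is a normal approximation (a t quantile with n - 2 degrees of freedom would be more exact for small samples). It does not claim to reproduce the app's plotting code.

```python
import math


def fit_line_with_band(x, y, x0, crit=1.96):
    """Least-squares line through (x, y) and an approximate 95% band at x0.

    crit=1.96 is an assumed normal-approximation critical value.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # residual standard deviation with n - 2 degrees of freedom
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    fitted = intercept + slope * x0
    # standard error of the fitted mean grows with distance from mean_x
    half_width = crit * s * math.sqrt(1 / n + (x0 - mean_x) ** 2 / sxx)
    return slope, intercept, (fitted - half_width, fitted + half_width)


if __name__ == "__main__":
    print(fit_line_with_band([0, 1, 2, 3], [1.0, 2.5, 5.5, 7.0], x0=1.5))
```

The band is narrowest at the mean of x and widens toward the ends of the range, which is why the grey area usually looks like an hourglass.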

3. Change the labels of the X and Y axes

Histogram: to roughly show the probability distribution of a variable by depicting the frequencies of observations occurring in certain ranges of values.

Density plot: to show the distribution of a variable

Histogram and Density Plot

When the number of bins is 0, the plot uses the default number of bins
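
The "0 means default" convention can be sketched as follows. The default actually used by the app is not stated here, so Sturges' rule is used purely as an assumed placeholder, and histogram_counts is a hypothetical name.

```python
import math


def histogram_counts(values, bins=0):
    """Count observations per equal-width bin; bins=0 falls back to a default.

    The fallback (Sturges' rule) is an assumption for illustration only.
    """
    if bins <= 0:
        bins = math.ceil(math.log2(len(values))) + 1
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * bins
    for v in values:
        # the maximum value is placed in the last bin
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts


if __name__ == "__main__":
    print(histogram_counts([1, 2, 3, 4], bins=2))
    print(histogram_counts(list(range(20))))
```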

Density plot

Principal Component Analysis

Principal components analysis (PCA) is a data reduction technique that transforms a larger number of correlated variables into a much smaller set of uncorrelated variables called principal components.
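
To make the definition concrete, here is a minimal sketch of PCA using NumPy's singular value decomposition. It assumes the variables are centered and scaled to unit variance first (PCA on the correlation matrix); whether the app scales the data is not stated, and the function name pca is illustrative.

```python
import numpy as np


def pca(X, n_components):
    """Return (scores, weights, explained variance proportions).

    Assumption: columns are standardized before the decomposition.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # rows of Vt are the component directions, ordered by singular value
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    eigenvalues = s ** 2 / (len(Z) - 1)
    explained = eigenvalues / eigenvalues.sum()
    return scores, Vt[:n_components].T, explained[:n_components]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 5))  # synthetic data: 20 rows, 5 variables
    scores, weights, explained = pca(X, 2)
    # the component scores are uncorrelated with each other
    print(np.round(np.corrcoef(scores.T), 6))
    print(explained)
```

The printed correlation matrix of the scores is the identity up to rounding, which is the "uncorrelated" property named in the definition.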

1. Functionalities

To estimate the number of components from parallel analysis

To obtain a correlation matrix and draw plots

To obtain the principal components and loadings result tables

To obtain the principal components and loadings distribution plots in 2D and 3D

2. About your data

All the data for analysis are numeric

The sample size is larger than the number of independent variables; that is, the number of rows is greater than the number of columns

Please follow the Steps to build the model, and click Outputs to get analytical results.

This plot graphs the relation between two components. You can use the score plot to assess the data structure and detect clusters, outliers, and trends

Groupings of data on the plot may indicate two or more separate distributions in the data

If the data follow a normal distribution and no outliers are present, the points are randomly distributed around zero

2. When A >= 2, choose 2 components to show in the component and loading 2D plot

In the plot of PC1 and PC2 (without group circles), we can find some outliers, for example, points 11 and 23.
If we choose diet and add group circles based on Euclidean distance, we can see that diet type sun is separated from the others.

Explanations

This plot shows the contributions of the variables to the PCs (choose the PC in the left panel)

Red indicates negative and blue indicates positive effects

Use the cumulative proportion of variance (in the variance table) to determine the amount of variance that the components explain.

For descriptive purposes, you may need only 80% (0.8) of the variance explained.

If you want to perform other analyses on the data, you may want to have at least 90% of the variance explained by the components.
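
The 80% and 90% guidelines above amount to a cumulative sum over the variance table. A minimal sketch, with a hypothetical function name and made-up proportions used only for illustration:

```python
def components_needed(proportions, target):
    """Smallest number of leading components whose cumulative proportion
    of variance reaches the target (e.g. 0.8 or 0.9)."""
    cumulative = 0.0
    for k, p in enumerate(proportions, start=1):
        cumulative += p
        if cumulative >= target - 1e-12:  # small tolerance for rounding error
            return k
    return len(proportions)


if __name__ == "__main__":
    # illustrative proportions, not results from the example data
    proportions = [0.55, 0.30, 0.10, 0.05]
    print(components_needed(proportions, 0.8))
    print(components_needed(proportions, 0.9))
```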

Loadings

Variance table

Explanations

This plot (a biplot) overlays the components and the loadings (choose the PC in the left panel)

If the data follow a normal distribution and no outliers are present, the points are randomly distributed around zero

Loadings identify which variables have the largest effect on each component.

Loadings can range from -1 to 1. Loadings close to -1 or 1 indicate that the variable strongly influences the component. Loadings close to 0 indicate that the variable has a weak influence on the component.
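
One common convention that yields loadings in the range -1 to 1 is to scale each eigenvector of the correlation matrix by the square root of its eigenvalue, so that each loading is the correlation between a variable and a component. The sketch below assumes that convention; it is not confirmed to be the one used by the app, and pca_loadings is a hypothetical name.

```python
import numpy as np


def pca_loadings(X, n_components):
    """Loadings as eigenvector * sqrt(eigenvalue) of the correlation matrix.

    Assumed convention: each entry is then the correlation between a
    variable (row) and a component (column).
    """
    X = np.asarray(X, dtype=float)
    R = np.corrcoef(X, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(R)
    # eigh returns ascending order; keep the largest eigenvalues first
    order = np.argsort(eigenvalues)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigenvalues[order], 0.0, None))
    return eigenvectors[:, order] * scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 4))  # synthetic data
    L = pca_loadings(X, 4)
    print(np.round(L, 3))
    # with all components kept, squared loadings per variable sum to 1
    print(np.round((L ** 2).sum(axis=1), 6))
```

The sign of a whole column is arbitrary, so only the relative signs of the variables within one component carry meaning.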

When A >= 2, choose 2 components to show in the component and loading 2D plot

In the plot of PC1 and PC2, we can see that ACAT2 has a comparatively strong negative effect on PC1, and PKD4 has a strong positive effect on PC1.
For PC2, THIOL has a strong positive effect and VDR has a strong negative effect.
These results correspond to the loading plot

Explanations

This is an extension of the 2D plot. This plot overlays the components and the loadings for 3 PCs (choose the PCs and the length of the lines in the left panel)

We can find the outliers in the plot.

If the data follow a normal distribution and no outliers are present, the points are randomly distributed around zero

Loadings identify which variables have the largest effect on each component

Loadings can range from -1 to 1. Loadings close to -1 or 1 indicate that the variable strongly influences the component. Loadings close to 0 indicate that the variable has a weak influence on the component.

This plot needs some time to load the first time

When A >= 3, choose 3 components to show in the component and loading 3D plot

The default is to show the first 3 PCs in the 3D plot

Trace legend

Exploratory Factor Analysis

Exploratory factor analysis (EFA) is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors.

1. Functionalities

To estimate the number of factors from parallel analysis

To obtain a correlation matrix and plots

To obtain the factors and loadings result tables

To obtain the factors and loadings distribution plots in 2D and 3D
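
Parallel analysis, named in the first functionality above, compares the eigenvalues of the observed correlation matrix with those obtained from random data of the same shape and keeps the leading eigenvalues that exceed the random ones. The sketch below is a simplified version based on the full correlation matrix with an assumed 95th-percentile threshold; the app's own procedure may differ in detail, and all names are illustrative.

```python
import numpy as np


def parallel_analysis(X, n_sim=200, quantile=0.95, seed=0):
    """Suggest how many components/factors to retain.

    Assumptions: eigenvalues of the full correlation matrix, normal random
    reference data, and a quantile threshold over n_sim simulations.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    rng = np.random.default_rng(seed)
    simulated = np.empty((n_sim, p))
    for i in range(n_sim):
        noise = rng.normal(size=(n, p))
        eig = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))
        simulated[i] = np.sort(eig)[::-1]
    threshold = np.quantile(simulated, quantile, axis=0)
    above = observed > threshold
    # count leading eigenvalues above the threshold, stopping at the first failure
    return p if above.all() else int(np.argmin(above))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    factors = rng.normal(size=(200, 2))  # two synthetic latent factors
    pattern = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]])
    X = factors @ pattern.T + 0.3 * rng.normal(size=(200, 6))
    print(parallel_analysis(X))
```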

2. About your data

All the data for analysis are numeric

The sample size is larger than the number of independent variables; that is, the number of rows is greater than the number of columns

Please follow the Steps to build the model, and click Outputs to get analytical results.

This plot graphs the relations between the factors and the variables

Results in the window show the statistical test for the sufficiency of factors.

Explanations

This plot graphs the relation between two factors. You can use the score plot to assess the data structure and detect clusters, outliers, and trends

Groupings of data on the plot may indicate two or more separate distributions in the data

If the data follow a normal distribution and no outliers are present, the points are randomly distributed around zero

2. When A >= 2, choose 2 factors to show in the factor and loading 2D plot

In the plot of ML1 and ML2, we can find some outliers, for example, points 169 and 208. We can remove these points in the Data tab.
If we choose type and add group circles based on Euclidean distance, we can see that group B is somewhat different. Not all groups have circles because some contain too few points.

Explanations

This plot shows the contributions of the variables to the PCs (choose the PC in the left panel)

Red indicates negative and blue indicates positive effects

Use the proportion of variance (in the variance table) to determine the amount of variance that the factors explain.

For descriptive purposes, you may need only 80% (0.8) of the variance explained.

If you want to perform other analyses on the data, you may want to have at least 90% of the variance explained by the factors.

Loadings

Variance table

Explanations

This plot (a biplot) overlays the factors and the loadings (choose the PC in the left panel)

If the data follow a normal distribution and no outliers are present, the points are randomly distributed around zero

Loadings identify which variables have the largest effect on each component

Loadings can range from -1 to 1. Loadings close to -1 or 1 indicate that the variable strongly influences the component. Loadings close to 0 indicate that the variable has a weak influence on the component.

When A >= 2, choose 2 factors to show in the factor and loading 2D plot

After removing points 169 and 208, we can see that chem2 has a comparatively strong relation to ML2.

Explanations

This is an extension of the 2D plot. This plot overlays the factors and the loadings for 3 PCs (choose the PCs and the length of the lines in the left panel)

We can find the outliers in the plot.

If the data follow a normal distribution and no outliers are present, the points are randomly distributed around zero

Loadings identify which variables have the largest effect on each component

Loadings can range from -1 to 1. Loadings close to -1 or 1 indicate that the variable strongly influences the component. Loadings close to 0 indicate that the variable has a weak influence on the component.

This plot needs some time to load the first time

When A >= 3, choose 3 factors to show in the factor and loading 3D plot

The default is to show the first 3 factors in the 3D plot