To upload data files, preview the data set, and check that the data were entered correctly
To pre-process some variables (when necessary) for building the model
To obtain basic descriptive statistics and plots of the variables
2. About your data
Your data need to have more rows than columns
Your data need to be all numeric
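The two requirements above can be checked before uploading. The following is a minimal sketch, assuming the table has already been read into a list of rows; the function name check_data is hypothetical and is not part of the app.

```python
def check_data(rows):
    """Return a list of problems found in a table given as a list of rows."""
    problems = []
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    if any(len(row) != n_cols for row in rows):
        problems.append("rows have different lengths")
    # Requirement 1: more rows (samples) than columns (variables)
    if n_rows <= n_cols:
        problems.append("need more rows than columns")
    # Requirement 2: every cell must be numeric
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            # bool is a subclass of int in Python, so exclude it explicitly
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"non-numeric value at row {i}, column {j}")
    return problems


good = [[1.0, 2.0], [3.0, 4.5], [5.0, 6.1]]
bad = [[1.0, "a"], [3.0, 4.5]]
print(check_data(good))
print(check_data(bad))
```

An empty list means both requirements hold; otherwise each entry names one violation.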
Case Example 1: Mouse gene expression data
This data set measured the gene expression of 20 mice in a diet experiment. Some mice shared the same genotype, and some gene variables were correlated.
We wanted to compute linearly uncorrelated principal components from the gene expression data.
Case Example 2: Chemical data
Suppose that in one study, researchers measured 9 chemical attributes of 7 types of drugs. Some chemicals had a latent association.
We wanted to explore the latent relational structure among the chemical variables and reduce them to a smaller number of variables.
Please follow the Steps, and Outputs will give real-time analytical results. After getting the data ready, please find the model in the next tabs.
Linear fitting plot: to roughly show the linear relation between any two numeric variables.
The grey area is the 95% confidence interval.
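As an illustration of what such a plot contains, the sketch below fits a least-squares line and computes a confidence band for the fitted mean. It is only a sketch under stated assumptions: the function name fit_line_with_band is hypothetical, the data are made up, and the critical value t_crit has to be supplied by the caller (3.182 is the two-sided 95% t value for 3 degrees of freedom). How the app itself computes its grey area is not stated in this text.

```python
import math


def fit_line_with_band(x, y, t_crit):
    """Least-squares line plus the half-width of a confidence band at each x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * xi for xi in x]
    # Residual standard deviation with n - 2 degrees of freedom
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    s = math.sqrt(sse / (n - 2))
    # Standard error of the fitted mean grows with distance from mean_x
    half_width = [
        t_crit * s * math.sqrt(1 / n + (xi - mean_x) ** 2 / sxx) for xi in x
    ]
    return slope, intercept, fitted, half_width


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, fitted, half_width = fit_line_with_band(x, y, t_crit=3.182)
print(f"y = {intercept:.2f} + {slope:.2f} * x")
```

The band is narrowest at the mean of x and widens toward both ends, which is the typical shape of the shaded region around a fitted line.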
3. Change the labels of X and Y axes
Histogram: to roughly show the probability distribution of a variable by depicting the frequencies of observations occurring in certain ranges of values.
Density plot: to show the distribution of a variable
Histogram and Density Plot
When the number of bins is 0, the plot uses the default number of bins
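The "0 means default" convention can be sketched as follows. This assumes NumPy is available and uses NumPy's "auto" binning rule as the default, which is an assumption made for illustration and may differ from the default the app uses; histogram_counts is a hypothetical helper name.

```python
import numpy as np


def histogram_counts(values, n_bins=0):
    """Count observations per bin; n_bins == 0 falls back to a default rule."""
    # Assumption: the fallback is NumPy's "auto" rule, not the app's own default
    bins = "auto" if n_bins == 0 else n_bins
    counts, edges = np.histogram(values, bins=bins)
    return counts, edges


rng = np.random.default_rng(0)
values = rng.normal(size=200)
counts_default, edges_default = histogram_counts(values)
counts_ten, edges_ten = histogram_counts(values, n_bins=10)
print(len(counts_default), len(counts_ten))
```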
Density plot
Principal Component Analysis
Principal components analysis (PCA) is a data reduction technique that transforms a larger number of correlated variables into a much smaller set of uncorrelated variables called principal components.
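To make the idea concrete, here is a minimal sketch of PCA computed from a singular value decomposition of the standardized data, assuming NumPy is available. The function name pca and the simulated data are illustrative assumptions; the app's own computation may differ, for example in whether it standardizes the variables.

```python
import numpy as np


def pca(data):
    """Return component scores, loadings, and component variances."""
    x = np.asarray(data, dtype=float)
    # Standardize each column so the analysis is based on correlations
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = u * s                     # one row per sample, one column per component
    loadings = vt.T                    # one column per component
    variances = s ** 2 / (len(z) - 1)  # variance of each component
    return scores, loadings, variances


# Simulated data: three strongly correlated variables plus one unrelated variable
rng = np.random.default_rng(1)
base = rng.normal(size=(50, 1))
columns = [base + 0.1 * rng.normal(size=(50, 1)) for _ in range(3)]
columns.append(rng.normal(size=(50, 1)))
data = np.hstack(columns)

scores, loadings, variances = pca(data)
print(np.round(variances / variances.sum(), 3))
```

In this example the first component absorbs most of the variance shared by the three correlated variables, and the resulting component scores are uncorrelated with each other.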
1. Functionalities
To estimate the number of components from parallel analysis
To obtain a correlation matrix and draw plots
To obtain the principal components and loadings result tables
To obtain the principal components and loadings distribution plots in 2D and 3D
2. About your data
All the data for analysis are numeric
The sample size is larger than the number of independent variables; that is, the number of rows is greater than the number of columns
Please follow the Steps to build the model, and click Outputs to get analytical results.
This plot graphs the relation between two components. You can use the score plot to assess the data structure and detect clusters, outliers, and trends
Groupings of data on the plot may indicate two or more separate distributions in the data
If the data follow a normal distribution and no outliers are present, the points are randomly distributed around zero
2. When A >= 2, choose 2 components to show the component and loading 2D plot
In the plot of PC1 and PC2 (without group circles), we could find some outliers, for example, points 11 and 23.
If we chose diet as the group and added group circles based on Euclidean distance, we could see that diet type sun was separated from the others.
Explanations
This plot shows the contributions of the variables to the PCs (choose the PC in the left panel)
Red indicates negative and blue indicates positive effects
Use the cumulative proportion of variance (in the variance table) to determine the amount of variance that the components explain.
For descriptive purposes, you may need only 80% (0.8) of the variance explained.
If you want to perform other analyses on the data, you may want to have at least 90% of the variance explained by the components.
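The rule of thumb above amounts to finding the smallest number of components whose cumulative proportion of variance reaches a target. A minimal sketch, with a hypothetical helper name and made-up variances:

```python
def components_needed(variances, target):
    """Smallest number of leading components whose cumulative share >= target."""
    total = sum(variances)
    running = 0.0
    for k, variance in enumerate(variances, start=1):
        running += variance
        if running / total >= target:
            return k
    return len(variances)


# Made-up component variances, sorted from largest to smallest
variances = [5.0, 2.0, 2.0, 1.0]
print(components_needed(variances, 0.80))  # 3
print(components_needed(variances, 0.95))  # 4
```

Here the cumulative proportions are 0.5, 0.7, 0.9, and 1.0, so three components pass the 80% target and all four are needed for 95%.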
Loadings
Variance table
Explanations
This plot (a biplot) overlays the components and the loadings (choose the PC in the left panel)
If the data follow a normal distribution and no outliers are present, the points are randomly distributed around zero
Loadings identify which variables have the largest effect on each component.
Loadings can range from -1 to 1. Loadings close to -1 or 1 indicate that the variable strongly influences the component. Loadings close to 0 indicate that the variable has a weak influence on the component.
When A >= 2, choose 2 components to show the component and loading 2D plot
In the plot of PC1 and PC2, we could see that ACAT2 has a comparatively strong negative effect on PC1, and PKD4 has a strong positive effect on PC1.
For PC2, THIOL has a strong positive effect and VDR has a strong negative effect.
These results correspond to the loading plot
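Reading the loadings in this way can be sketched in code: compute the loadings as eigenvectors of the correlation matrix and, for each component, report the variable with the largest absolute loading. This assumes NumPy is available; the variable names v1 to v4, the simulated data, and the function name strongest_variables are hypothetical. The sign of an eigenvector is arbitrary, so only the magnitude is used to rank variables.

```python
import numpy as np


def strongest_variables(data, names):
    """Return the loadings and the most influential variable per component."""
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    # eigh returns ascending eigenvalues; reorder so PC1 comes first
    order = np.argsort(eigenvalues)[::-1]
    loadings = eigenvectors[:, order]
    strongest = [
        names[int(np.argmax(np.abs(loadings[:, k])))]
        for k in range(loadings.shape[1])
    ]
    return loadings, strongest


# Simulated data: v1 to v3 share a common signal, v4 is unrelated noise
rng = np.random.default_rng(1)
base = rng.normal(size=(50, 1))
columns = [base + 0.1 * rng.normal(size=(50, 1)) for _ in range(3)]
columns.append(rng.normal(size=(50, 1)))
data = np.hstack(columns)
names = ["v1", "v2", "v3", "v4"]

loadings, strongest = strongest_variables(data, names)
print(np.round(loadings[:, :2], 2))
print(strongest[:2])
```

Each column of loadings has unit length, so every entry lies between -1 and 1.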
Explanations
This is an extension of the 2D plot. This plot overlays the components and the loadings for 3 PCs (choose the PCs and the length of the lines in the left panel)
We can find the outliers in the plot.
If the data follow a normal distribution and no outliers are present, the points are randomly distributed around zero
Loadings identify which variables have the largest effect on each component
Loadings can range from -1 to 1. Loadings close to -1 or 1 indicate that the variable strongly influences the component. Loadings close to 0 indicate that the variable has a weak influence on the component.
This plot needs some time to load for the first time
When A >= 3, choose 3 components to show the component and loading 3D plot
The default is to show the first 3 PCs in the 3D plot
Trace legend
Exploratory Factor Analysis
Exploratory factor analysis (EFA) is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors.
1. Functionalities
To estimate the number of factors from parallel analysis
To obtain a correlation matrix and plots
To obtain the factors and loadings result tables
To obtain the factors and loadings distribution plots in 2D and 3D
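Parallel analysis, mentioned in the list above, compares the eigenvalues of the observed correlation matrix with eigenvalues obtained from random data of the same size, and keeps the leading components or factors whose eigenvalues exceed the random ones. The following is a minimal sketch assuming NumPy is available. The function name, the number of simulations, and the use of the 95th percentile as the threshold are assumptions chosen for illustration; the app may use a different variant.

```python
import numpy as np


def sorted_eigenvalues(x):
    """Eigenvalues of the correlation matrix of x, largest first."""
    corr = np.corrcoef(x, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]


def parallel_analysis(data, n_sim=100, percentile=95, seed=0):
    """Count leading eigenvalues that exceed those of random normal data."""
    x = np.asarray(data, dtype=float)
    n, p = x.shape
    observed = sorted_eigenvalues(x)
    rng = np.random.default_rng(seed)
    simulated = np.empty((n_sim, p))
    for i in range(n_sim):
        simulated[i] = sorted_eigenvalues(rng.normal(size=(n, p)))
    threshold = np.percentile(simulated, percentile, axis=0)
    n_keep = 0
    for obs, thr in zip(observed, threshold):
        if obs <= thr:
            break  # stop at the first eigenvalue not above its threshold
        n_keep += 1
    return n_keep, observed, threshold


# Simulated data: 6 variables driven by 2 independent latent factors
rng = np.random.default_rng(42)
factors = rng.normal(size=(200, 2))
pattern = np.array([[0.9, 0.9, 0.9, 0.0, 0.0, 0.0],
                    [0.0, 0.0, 0.0, 0.9, 0.9, 0.9]])
data = factors @ pattern + 0.3 * rng.normal(size=(200, 6))

n_keep, observed, threshold = parallel_analysis(data)
print(n_keep)
print(np.round(observed, 2))
print(np.round(threshold, 2))
```

With two clearly separated latent factors in the simulated data, only the first two observed eigenvalues stand above the random-data thresholds.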
2. About your data
All the data for analysis are numeric
The sample size is larger than the number of independent variables; that is, the number of rows is greater than the number of columns
Please follow the Steps to build the model, and click Outputs to get analytical results.
This plot graphs the relations between the factors and the variables
Results in the window show the statistical test for the sufficiency of factors.
Explanations
This plot graphs the relation between two factors. You can use the score plot to assess the data structure and detect clusters, outliers, and trends
Groupings of data on the plot may indicate two or more separate distributions in the data
If the data follow a normal distribution and no outliers are present, the points are randomly distributed around zero
2. When A >= 2, choose 2 factors to show the factor and loading 2D plot
In the plot of ML1 and ML2, we could find some outliers, for example, points 169 and 208. We can remove these points in the Data tab.
If we chose type as the group and added group circles based on Euclidean distance, we could see that group B was somewhat different. Not all groups had circles because some groups had too few points.
Explanations
This plot shows the contributions of the variables to the factors (choose PC in the left panel)
Red indicates negative and blue indicates positive effects
Use the proportion of variance (in the variance table) to determine the amount of variance that the factors explain.
For descriptive purposes, you may need only 80% (0.8) of the variance explained.
If you want to perform other analyses on the data, you may want to have at least 90% of the variance explained by the factors.
Loadings
Variance table
Explanations
This plot (a biplot) overlays the factors and the loadings (choose PC in the left panel)
If the data follow a normal distribution and no outliers are present, the points are randomly distributed around zero
Loadings identify which variables have the largest effect on each component
Loadings can range from -1 to 1. Loadings close to -1 or 1 indicate that the variable strongly influences the component. Loadings close to 0 indicate that the variable has a weak influence on the component.
When A >= 2, choose 2 factors to show the factor and loading 2D plot
After removing points 169 and 208, we could see that chem2 has a comparatively strong relation to ML2.
Explanations
This is an extension of the 2D plot. This plot overlays the factors and the loadings for 3 factors (choose PCs and the length of the lines in the left panel)
We can find the outliers in the plot.
If the data follow a normal distribution and no outliers are present, the points are randomly distributed around zero
Loadings identify which variables have the largest effect on each component
Loadings can range from -1 to 1. Loadings close to -1 or 1 indicate that the variable strongly influences the component. Loadings close to 0 indicate that the variable has a weak influence on the component.
This plot needs some time to load for the first time
When A >= 3, choose 3 factors to show the factor and loading 3D plot
The default is to show the first 3 factors in the 3D plot