Data Preparation

1. Functionalities

  • To upload data files, preview the data set, and check that the data were read correctly
  • To pre-process some variables (when necessary) before building the model
  • To obtain basic descriptive statistics and plots of the variables

2. About your data

  • Your data need to have more rows than columns
  • Your data need to be all numeric
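
The two requirements above can be checked before uploading. The following is a minimal sketch, assuming the data are already held as a list of rows; the function name `check_data` is hypothetical and is not part of this tool.

```python
def check_data(rows):
    """Check the two requirements: more rows than columns, all values numeric.

    `rows` is a list of equal-length lists, one inner list per sample.
    Returns a list of problems; an empty list means the data pass.
    """
    problems = []
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    if n_rows <= n_cols:
        problems.append(f"need more rows than columns, got {n_rows} x {n_cols}")
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            # bool is a subclass of int in Python, so exclude it explicitly
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"non-numeric value at row {i}, column {j}: {value!r}")
    return problems
```

An empty result means both conditions hold; otherwise each entry names one violation.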

Case Example 1: Mouse gene expression data

These data record the gene expression of 20 mice in a diet experiment. Some mice shared the same genotype, and some gene variables were correlated. We wanted to compute linearly uncorrelated principal components from the gene expression data.

Case Example 2: Chemical data

Suppose that, in one study, researchers measured 9 chemical attributes of 7 types of drugs. Some chemicals had a latent association. We wanted to explore the latent relational structure among the chemical variables and reduce them to a smaller number of variables.

Please follow the Steps; the Outputs will show analytical results in real time. Once the data are ready, find the model in the following tabs.


Data Preparation




Uploaded data will replace the example data

Please follow the example data format when uploading new data

2. Show 1st row as column names?

3. Use 1st column as row names? (No duplicates)

The correct separator and quote character ensure that the data are read successfully
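
To illustrate why the header, row-name, separator, and quote settings matter, here is a minimal sketch using Python's standard `csv` module. The function `read_table` and its parameters are hypothetical and only mirror the options above; it assumes that the header row also carries a label for the row-name column.

```python
import csv
import io

def read_table(text, sep=",", quote='"', header=True, row_names=True):
    """Parse delimited text into (column names, row names, rows of strings)."""
    reader = csv.reader(io.StringIO(text), delimiter=sep, quotechar=quote)
    rows = [r for r in reader if r]          # skip empty lines
    col_names = rows.pop(0) if header else None
    names = None
    if row_names:
        names = [r[0] for r in rows]         # 1st column becomes the row names
        if len(set(names)) != len(names):
            raise ValueError("row names must not contain duplicates")
        rows = [r[1:] for r in rows]
        if col_names is not None:
            col_names = col_names[1:]        # drop the label of the row-name column
    return col_names, names, rows            # values are still strings here

cols, names, rows = read_table('id;a;b\n"m1";1;2\n"m2";3;4\n', sep=";")
```

With the wrong separator, each line is read as a single field, which is usually the first sign that the settings need to be changed.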

Find some example data here


Change the types of some variables?







Output 1. Data Information

Data Preview

1. Numeric variable information list


            

2. Categorical variable information list


            

Output 2. Descriptive Results


1. For numeric variables

2. For categorical variables


                  
                    
                    Download Results (Categorical variable)
                  
                


Linear fitting plot: to roughly show the linear relation between any two numeric variables. The grey area is the 95% confidence interval.


3. Change the labels of X and Y axes


Histogram: to roughly show the probability distribution of a variable by depicting the frequencies of observations occurring in certain ranges of values.

Density plot: to show the distribution of a variable


Histogram and Density Plot

When the number of bins is 0, the plot uses the default number of bins
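
The following sketch shows how a histogram counts observations per range and how a bin setting of 0 can fall back to a default. The default used by this tool is not stated here, so Sturges' rule is used purely as a stand-in assumption; `histogram_counts` is a hypothetical name.

```python
import math

def histogram_counts(values, bins=0):
    """Count observations in equal-width bins; bins <= 0 selects a default.

    Assumes `values` is a non-empty list of numbers.
    """
    if bins <= 0:
        # Assumption: Sturges' rule stands in for the tool's unstated default
        bins = math.ceil(math.log2(len(values))) + 1
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0          # avoid zero width if all values are equal
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)   # the maximum goes in the last bin
        counts[idx] += 1
    return counts
```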

Density plot






                

Principal Component Analysis

Principal components analysis (PCA) is a data reduction technique that transforms a larger number of correlated variables into a much smaller set of uncorrelated variables called principal components.
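
As an illustration of this transformation, the sketch below computes principal components with NumPy through a singular value decomposition. It assumes that the variables are standardised first, which may differ from what this tool does; the function name `pca` is hypothetical.

```python
import numpy as np

def pca(X, n_components):
    """Return (scores, directions, explained variance ratio) for the first components."""
    X = np.asarray(X, dtype=float)
    # Assumption: centre and scale each variable to unit variance
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # sample coordinates on the PCs
    explained = s**2 / np.sum(s**2)                   # share of variance per component
    return scores, Vt[:n_components].T, explained[:n_components]
```

The score columns are uncorrelated with each other by construction, which is the property described above.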

1. Functionalities

  • To estimate the number of components from parallel analysis
  • To obtain a correlation matrix and draw plots
  • To obtain the principal components and loadings result tables
  • To obtain the principal components and loadings distribution plots in 2D and 3D
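
One common way to estimate the number of components is Horn's parallel analysis, which keeps the leading components whose eigenvalues exceed those obtained from random data of the same size. The sketch below is a simplified version under stated assumptions (normal random data, a 95% quantile threshold); it is not necessarily the procedure this tool runs, and `parallel_analysis` is a hypothetical name.

```python
import numpy as np

def parallel_analysis(X, n_sim=200, quantile=0.95, seed=0):
    """Suggest how many components to keep by comparison with random data."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    # Eigenvalues of the observed correlation matrix, largest first
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    rng = np.random.default_rng(seed)
    simulated = np.empty((n_sim, p))
    for i in range(n_sim):
        R = np.corrcoef(rng.standard_normal((n, p)), rowvar=False)
        simulated[i] = np.sort(np.linalg.eigvalsh(R))[::-1]
    threshold = np.quantile(simulated, quantile, axis=0)
    exceeds = observed > threshold
    # Count the leading components that beat the random threshold
    return p if exceeds.all() else int(np.argmin(exceeds))
```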

2. About your data

  • All the data for analysis are numeric
  • The sample size is larger than the number of independent variables, that is, the number of rows is greater than the number of columns

Please follow the Steps to build the model, and click Outputs to get analytical results.


Build the Model

Prepare the data in the Data tab

The number of variables (columns) should be < the number of samples (rows)

The example data used here are the Chemical data


Step 1. Choose parameters to build the model


Step 2. If data and model are ready, click the blue button to generate model results.





Output 1. Data Exploration

Part of Data

Please edit the data in the Data tab


Output 2. Model Results


Explanations
  • This plot graphs the scores of two components against each other. You can use the score plot to assess the data structure and detect clusters, outliers, and trends
  • Groupings of data on the plot may indicate two or more separate distributions in the data
  • If the data follow a normal distribution and no outliers are present, the points are randomly distributed around zero

2. When A >= 2, choose 2 components to show in the component and loading 2D plot

In the plot of PC1 and PC2 (without group circles), we can see some outliers, for example, points 11 and 23. If we choose diet as the group variable and add group circles based on Euclidean distance, we can see that diet type sun is separated from the others.


Explanations
  • This plot shows the contributions of the variables to the PCs (choose the PC in the left panel)
  • Red indicates negative effects and blue indicates positive effects
  • Use the cumulative proportion of variance (in the variance table) to determine the amount of variance that the components explain
  • For descriptive purposes, you may need only 80% (0.8) of the variance explained
  • If you want to perform other analyses on the data, you may want at least 90% of the variance to be explained by the components
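
The 80% and 90% guidelines above amount to finding the smallest number of components whose cumulative proportion of variance reaches the target. A minimal sketch, assuming the eigenvalues (variances) from the variance table are available as a list; `components_needed` is a hypothetical helper.

```python
def components_needed(eigenvalues, target=0.8):
    """Smallest number of components whose cumulative variance share reaches `target`."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, ev in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += ev / total
        if cumulative >= target - 1e-12:     # small tolerance for rounding error
            return k
    return len(eigenvalues)
```

For eigenvalues 4, 2, 1, 0.5, 0.5 the cumulative proportions are 0.5, 0.75, 0.875, 0.9375, and 1, so three components reach 80% and four reach 90%.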

Loadings

Variance table


Explanations
  • This plot (a biplot) overlays the components and the loadings (choose the PCs in the left panel)
  • If the data follow a normal distribution and no outliers are present, the points are randomly distributed around zero
  • Loadings identify which variables have the largest effect on each component.
  • Loadings can range from -1 to 1. Loadings close to -1 or 1 indicate that the variable strongly influences the component. Loadings close to 0 indicate that the variable has a weak influence on the component.
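
The range of -1 to 1 follows when loadings are expressed as correlations between each variable and each component. The sketch below computes loadings this way from the correlation matrix; this scaling is an assumption and may differ from the one used in the tool's tables, and `pca_loadings` is a hypothetical name.

```python
import numpy as np

def pca_loadings(X):
    """Loadings as variable-component correlations (rows: variables, columns: PCs)."""
    X = np.asarray(X, dtype=float)
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]        # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Scale each eigenvector by the square root of its eigenvalue
    return eigvecs * np.sqrt(np.clip(eigvals, 0, None))
```

With this scaling, the squared loadings of each variable sum to 1 across all components, so a loading near -1 or 1 means that a single component accounts for almost all of that variable's variance.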

When A >= 2, choose 2 components to show in the component and loading 2D plot

In the plot of PC1 and PC2, we can see that ACAT2 has a comparatively strong negative effect on PC1, and PKD4 has a strong positive effect on PC1. For PC2, THIOL has a strong positive effect and VDR has a strong negative effect. These results agree with the loading plot.


Explanations
  • This is the 3D extension of the 2D plot. It overlays the components and the loadings for 3 PCs (choose the PCs and the length of the lines in the left panel)
  • Outliers can be identified in the plot
  • If the data follow a normal distribution and no outliers are present, the points are randomly distributed around zero
  • Loadings identify which variables have the largest effect on each component
  • Loadings can range from -1 to 1. Loadings close to -1 or 1 indicate that the variable strongly influences the component. Loadings close to 0 indicate that the variable has a weak influence on the component.

This plot may take some time to load the first time

When A >= 3, choose 3 components to show in the component and loading 3D plot

By default, the first 3 PCs are shown in the 3D plot

Trace legend


                

Exploratory Factor Analysis

Exploratory factor analysis (EFA) is a statistical method used to describe the variability among observed, correlated variables in terms of a potentially smaller number of unobserved variables called factors.
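
To make the idea of unobserved factors concrete, the sketch below extracts factor loadings from a correlation matrix by iterated principal-axis factoring. This is a simpler stand-in chosen for illustration and is not necessarily the estimation method used by this tool; `principal_axis` is a hypothetical name.

```python
import numpy as np

def principal_axis(R, n_factors, n_iter=100):
    """Iterated principal-axis factoring of a correlation matrix R.

    Returns (loadings, uniquenesses). Loadings are unrotated and each
    column is determined only up to sign.
    """
    R = np.asarray(R, dtype=float)
    # Start from squared multiple correlations as communality estimates
    h2 = 1 - 1 / np.diag(np.linalg.inv(R))
    for _ in range(n_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)        # replace the 1s by the communalities
        vals, vecs = np.linalg.eigh(reduced)
        idx = np.argsort(vals)[::-1][:n_factors]
        loadings = vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0, None))
        h2 = (loadings**2).sum(axis=1)       # updated communalities
    return loadings, 1 - h2
```

Each variable's variance is split into a part shared through the factors (the communality) and a part unique to that variable.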

1. Functionalities

  • To estimate the number of factors from parallel analysis
  • To obtain a correlation matrix and plots
  • To obtain the factors and loadings result tables
  • To obtain the factors and loadings distribution plots in 2D and 3D

2. About your data

  • All the data for analysis are numeric
  • The sample size is larger than the number of independent variables, that is, the number of rows is greater than the number of columns

Please follow the Steps to build the model, and click Outputs to get analytical results.


Build the Model

Prepare the data in the Data tab

The number of variables (columns) should be < the number of samples (rows)

The example data used here are the Mouse data


Step 1. Choose parameters to build the model

Following the results suggested by the parallel analysis, we chose to generate 3 factors from the data


Step 2. If data and model are ready, click the blue button to generate model results.





Output 1. Data Exploration

Part of Data

Please edit the data in the Data tab


Output 2. Model Results


Explanations
  • This plot graphs the relations between the factors and the variables
  • The results in the window show the statistical test of whether the number of factors is sufficient


                


Explanations
  • This plot graphs the scores of two factors against each other. You can use the score plot to assess the data structure and detect clusters, outliers, and trends
  • Groupings of data on the plot may indicate two or more separate distributions in the data
  • If the data follow a normal distribution and no outliers are present, the points are randomly distributed around zero

2. When A >= 2, choose 2 factors to show in the factor and loading 2D plot

In the plot of ML1 and ML2, we can see some outliers, for example, points 169 and 208. We can remove these points in the Data tab. If we choose type as the group variable and add group circles based on Euclidean distance, we can see that group B is somewhat different. Not all groups have circles because some groups contain too few points.


Explanations
  • This plot shows the contributions of the variables to the factors (choose the factor in the left panel)
  • Red indicates negative effects and blue indicates positive effects
  • Use the proportion of variance (in the variance table) to determine the amount of variance that the factors explain
  • For descriptive purposes, you may need only 80% (0.8) of the variance explained
  • If you want to perform other analyses on the data, you may want at least 90% of the variance to be explained by the factors

Loadings

Variance table


Explanations
  • This plot (a biplot) overlays the factors and the loadings (choose the factors in the left panel)
  • If the data follow a normal distribution and no outliers are present, the points are randomly distributed around zero
  • Loadings identify which variables have the largest effect on each factor
  • Loadings can range from -1 to 1. Loadings close to -1 or 1 indicate that the variable strongly influences the factor. Loadings close to 0 indicate that the variable has a weak influence on the factor.

When A >= 2, choose 2 factors to show in the factor and loading 2D plot

After removing points 169 and 208, we can see that chem2 has a comparatively strong relation to ML2.


Explanations
  • This is the 3D extension of the 2D plot. It overlays the factors and the loadings for 3 factors (choose the factors and the length of the lines in the left panel)
  • Outliers can be identified in the plot
  • If the data follow a normal distribution and no outliers are present, the points are randomly distributed around zero
  • Loadings identify which variables have the largest effect on each factor
  • Loadings can range from -1 to 1. Loadings close to -1 or 1 indicate that the variable strongly influences the factor. Loadings close to 0 indicate that the variable has a weak influence on the factor.

This plot may take some time to load the first time

When A >= 3, choose 3 factors to show in the factor and loading 3D plot

By default, the first 3 factors are shown in the 3D plot

Trace legend