- To upload data files, preview the data set, and check that the data were input correctly
- To pre-process some variables (when necessary) for building the model
- To obtain basic descriptive statistics and draw plots of the variables

- Your data need to include **one binary dependent variable (denoted as Y)** and **at least one independent variable (denoted as X)**
- Your data need to have more rows than columns
- Do not mix characters and numbers in the same column
- The data used to build a model is called a **training set**
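The data requirements above can be checked before uploading. The following is a minimal sketch, assuming the table is held as a list of dicts (one dict per row); the function name `check_training_data` and its arguments are hypothetical and are not part of the app, which performs its own checks.

```python
def check_training_data(rows, y_col):
    """Check a table (list of dicts, one per row) against the data requirements.

    Hypothetical helper for illustration only. Returns a list of problem
    descriptions; an empty list means no problem was found.
    """
    if not rows:
        return ["the data set is empty"]
    problems = []
    columns = list(rows[0].keys())

    # Y must be binary: exactly two distinct values.
    y_values = {row[y_col] for row in rows}
    if len(y_values) != 2:
        problems.append(f"{y_col} has {len(y_values)} distinct values, expected 2")

    # There must be at least one X besides Y.
    if len(columns) < 2:
        problems.append("no independent variable (X) found")

    # There must be more rows than columns.
    if len(rows) <= len(columns):
        problems.append(f"{len(rows)} rows but {len(columns)} columns")

    # No column may mix numbers and characters.
    for col in columns:
        kinds = {isinstance(row[col], (int, float)) for row in rows}
        if len(kinds) > 1:
            problems.append(f"column {col} mixes numbers and characters")
    return problems


data = [
    {"Y": 0, "X1": 1.2, "X2": "a"},
    {"Y": 1, "X1": 3.4, "X2": "b"},
    {"Y": 1, "X1": "n/a", "X2": "a"},
]
print(check_training_data(data, "Y"))
# → ['3 rows but 3 columns', 'column X1 mixes numbers and characters']
```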

**Data Preview**

**Variable types**

**1. For numeric variables**

**2. For categorical variables**

**Logit plot**: to roughly show the relation between any two numeric variables.

**3. Change the labels of X and Y axes**

**Histogram**: to roughly show the probability distribution of a variable by depicting the frequencies of observations occurring in certain ranges of values.

**Density plot**: to show a smoothed estimate of the distribution of a variable
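A density plot is usually drawn from a kernel density estimate: a small bell-shaped curve is placed on every observation and the curves are averaged. The following is a minimal sketch with a Gaussian kernel. The default bandwidth here is the normal-reference rule of thumb, 1.06 × sd × n^(-1/5), which is an assumption for illustration; the app's plotting library may use a different bandwidth rule.

```python
import math


def gaussian_kde(values, grid, bandwidth=None):
    """Gaussian kernel density estimate of `values`, evaluated at each point of `grid`.

    Requires at least two observations when the bandwidth is not given.
    """
    n = len(values)
    if bandwidth is None:
        # Assumed default: normal-reference rule of thumb.
        mean = sum(values) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
        bandwidth = 1.06 * sd * n ** (-1 / 5)
    norm = 1 / (n * bandwidth * math.sqrt(2 * math.pi))
    # Sum one Gaussian bump per observation at every grid point.
    return [
        norm * sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
        for g in grid
    ]


values = [1.0, 2.0, 2.0, 3.0, 7.0]
grid = [-10 + 0.01 * i for i in range(3001)]
density = gaussian_kde(values, grid)
```

A larger bandwidth gives a smoother curve; a smaller one follows individual observations more closely.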

**Histogram**

When the number of bins is set to 0, the plot uses the default number of bins
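The histogram described above counts how many observations fall into each of several equal-width ranges (bins). The following is a minimal sketch of that counting, including a fallback when `bins` is 0. The fallback used here, Sturges' rule ceil(log2(n)) + 1, is an assumption for illustration and is not necessarily the default the app uses.

```python
import math


def histogram_counts(values, bins=0):
    """Count observations in equal-width bins spanning min(values)..max(values).

    bins=0 falls back to Sturges' rule (an assumed default for this sketch).
    Returns (bin edges, counts).
    """
    n = len(values)
    if bins == 0:
        bins = math.ceil(math.log2(n)) + 1
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * bins
    for v in values:
        # Bin index; the maximum value is placed in the last bin.
        i = min(int((v - lo) / width), bins - 1)
        counts[i] += 1
    edges = [lo + k * width for k in range(bins + 1)]
    return edges, counts


edges, counts = histogram_counts([1, 2, 2, 3, 3, 3, 4, 4, 4, 4])
print(counts)  # → [1, 2, 0, 3, 4]
```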

**Density plot**

- To build a simple or multiple logistic regression model
- To obtain the regression estimates, including (1) coefficient estimates with the t test, p value, and 95% CI, (2) the number of observations, and (3) the AIC
- To obtain additional information: (1) the predicted dependent variable and residuals, (2) AIC-based variable selection, (3) the ROC plot, and (4) a sensitivity and specificity table for the ROC plot
- To upload new data and obtain predictions
- To evaluate the predictions on new data that contain the dependent variable
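A simple logistic regression models P(Y = 1) = 1 / (1 + exp(-(b0 + b1·x))) and estimates b0 and b1 by maximum likelihood. The following is a minimal sketch for one predictor using Newton-Raphson iterations; the names `fit_logistic_1d` and `predict_probability` are hypothetical, the app uses its own fitting routine (which also handles multiple predictors), and the sketch assumes the two classes are not perfectly separated by x (otherwise the estimates diverge).

```python
import math


def fit_logistic_1d(x, y, iterations=25):
    """Fit P(Y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1 - pi) for pi in p]
        # Gradient of the log-likelihood (score vector).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Information matrix [[h00, h01], [h01, h11]] = X^T W X.
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: beta += H^{-1} g, using the closed-form 2x2 inverse.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1


def predict_probability(b0, b1, x_new):
    """Predicted P(Y=1) for new values of the independent variable."""
    return [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x_new]


x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic_1d(x, y)
print(predict_probability(b0, b1, [2.5, 5.5]))
```

At the maximum-likelihood solution the score vector is zero, so the predicted probabilities on the training set sum to the number of observed 1s.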

- The dependent variable is binary
- Please prepare the training set data in the previous **Data** tab
- New data (test set) should include all the independent variables used in the model.

Check the full data in the **Data** tab

- The output on the left shows the estimated coefficients (with 95% confidence intervals), the T statistic (t = ) for the significance of each single variable, and the P value (p = )
- The output on the right shows the odds ratio = exp(b) and the standard errors of the original coefficients
- For the T test of each variable, P < 0.05 indicates that the variable is statistically significant in the model
- Observations: the number of samples
- Akaike Inf. Crit. = AIC = -2 (log likelihood) + 2k, where k is the number of variables plus the constant
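The quantities listed above can be reproduced from a coefficient b, its standard error, and the fitted probabilities. The following is a minimal sketch; the function names are hypothetical. The confidence interval shown is the Wald interval exp(b ± 1.96 × SE) on the odds-ratio scale, which is an assumption for illustration: the source does not state whether the app uses Wald or profile-likelihood intervals.

```python
import math


def odds_ratio_with_ci(b, se, z=1.96):
    """Odds ratio exp(b) with a Wald 95% CI, exp(b - z*se) to exp(b + z*se)."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


def binary_log_likelihood(y, p):
    """Log likelihood of binary outcomes y given predicted P(Y=1) values p."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p))


def aic(log_likelihood, k):
    """AIC = -2 * log likelihood + 2k, with k = number of variables + constant."""
    return -2 * log_likelihood + 2 * k


# A model with one variable plus the constant (k = 2) and log likelihood -10.
print(aic(-10.0, 2))  # → 24.0
```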

- The Akaike Information Criterion (AIC) is used to perform stepwise model selection.
- Model fits are ranked according to their AIC values, and the model with the lowest AIC value is sometimes considered the 'best'.
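AIC-based stepwise selection repeatedly compares the AIC of the current model with the AIC of smaller models. The following is a minimal backward-only sketch: at each step it drops the variable whose removal lowers AIC the most, and stops when no removal lowers it. The callback `fit_loglik` is hypothetical (it stands for fitting the model with the given variables plus a constant), and the log-likelihood values in the example are made-up illustration inputs, not results. The app's stepwise procedure may also consider adding variables back.

```python
def backward_stepwise_aic(variables, fit_loglik):
    """Backward elimination by AIC.

    fit_loglik(vars) is a hypothetical callback returning the maximized
    log likelihood of the model with those variables plus a constant.
    """
    def aic_of(vars_):
        k = len(vars_) + 1  # variables + constant
        return -2 * fit_loglik(vars_) + 2 * k

    current = list(variables)
    current_aic = aic_of(current)
    while current:
        # AIC of every model with one variable dropped.
        candidates = [
            (aic_of([v for v in current if v != drop]), drop) for drop in current
        ]
        best_aic, best_drop = min(candidates)
        if best_aic >= current_aic:
            break  # no single removal improves AIC
        current.remove(best_drop)
        current_aic = best_aic
    return current, current_aic


# Made-up log likelihoods for illustration only.
loglik = {
    frozenset({"X1", "X2", "X3"}): -50.0,
    frozenset({"X1", "X2"}): -50.5,
    frozenset({"X1", "X3"}): -53.0,
    frozenset({"X2", "X3"}): -60.0,
    frozenset({"X1"}): -52.0,
    frozenset({"X2"}): -61.0,
}
print(backward_stepwise_aic(["X1", "X2", "X3"], lambda vs: loglik[frozenset(vs)]))
# → (['X1', 'X2'], 107.0)
```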

**Model selection suggested by AIC**

- ROC curve (receiver operating characteristic curve): a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied
- The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings
- Sensitivity (also called the true positive rate) measures the proportion of actual positives that are correctly identified as such
- Specificity (also called the true negative rate) measures the proportion of actual negatives that are correctly identified as such
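The definitions above translate directly into counts of true/false positives and negatives. The following is a minimal sketch that computes sensitivity and specificity at one threshold (predicted value ≥ threshold is classified as 1) and then collects one (FPR, TPR) point per distinct threshold, which is what an ROC curve and its sensitivity/specificity table contain. The function names are hypothetical, and the sketch assumes both classes are present.

```python
def sensitivity_specificity(y_true, scores, threshold):
    """Sensitivity (TPR) and specificity (TNR) when score >= threshold means 1."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(y_true, scores):
    """(FPR, TPR) pairs, from the strictest threshold to the most lenient."""
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        sens, spec = sensitivity_specificity(y_true, scores, t)
        points.append((1 - spec, sens))  # FPR = 1 - specificity
    return points


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_points(y_true, scores))
# → [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```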

The predicted dependent variable is shown in the first column

This plot is shown only when the test data include the dependent variable.

This plot shows the ROC curve comparing the predicted values with the true values, based on the new data that were not used to build the model.
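A common one-number summary of an ROC curve on new data is the area under it (AUC): the probability that a randomly chosen case with Y = 1 receives a higher predicted value than a randomly chosen case with Y = 0, with ties counted as one half. The following is a minimal sketch of that pairwise definition; the function name `auc` is hypothetical, and the sketch assumes the test data contain both classes. It is O(n_pos × n_neg), which is fine for small test sets.

```python
def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison (ties count 1/2)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# True Y of the new data and the model's predicted values for the same rows.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation of the two classes.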

**Sensitivity and specificity table**