# 《machine-learning-mindmap》 4 Concepts （Daniel Martinez）

## Motivation

### Prediction

#### When we are interested mainly in the predicted variable as a result of the inputs, but not in how each input affects the prediction. In a real estate example, prediction would answer the question: Is my house over- or undervalued? Non-linear models are often very good at this sort of prediction, but not great for inference because the models are much less interpretable.

### Inference

#### When we are interested in the way each one of the inputs affects the prediction. In a real estate example, inference would answer the question: How much would my house cost if it had a view of the sea? Linear models are more suited for inference because the models themselves are easier to understand than their non-linear counterparts.

## Performance Analysis

### Confusion Matrix
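
For a binary classifier, the confusion matrix tabulates true positives, false positives, false negatives and true negatives. The sketch below counts the four cells in plain Python; the function name `confusion_counts` and the toy labels are illustrative assumptions, not part of the original mind map.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1          # predicted positive, actually positive
        elif p == positive:
            fp += 1          # predicted positive, actually negative
        elif t == positive:
            fn += 1          # predicted negative, actually positive
        else:
            tn += 1          # predicted negative, actually negative
    return tp, fp, fn, tn


# Hypothetical labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)  # 3 1 1 3
```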

### Accuracy

#### Fraction of correct predictions. Not reliable when the data set is unbalanced (that is, when the number of samples in different classes varies greatly), because a classifier can score highly just by always predicting the majority class.
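
A small sketch of why accuracy misleads on unbalanced data: with a made-up data set of 95 negatives and 5 positives, a classifier that always predicts the negative class still reaches 95% accuracy while finding no positives at all. The data and names here are assumptions chosen for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical unbalanced data set: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
always_negative = [0] * 100  # a "classifier" that ignores its input

acc = accuracy(y_true, always_negative)
print(acc)  # 0.95, even though every positive example was missed
```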

### F1 Score

#### Precision

#### Out of all the examples the classifier labeled as positive, what fraction were correct?

#### Recall

#### Out of all the positive examples there were, what fraction did the classifier pick up?

#### The F1 score is the harmonic mean of precision (p) and recall (r): 2 * p * r / (p + r)
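
The three quantities above can be computed directly from confusion-matrix counts. This is a minimal sketch; the counts passed in are invented, and returning 0.0 when a denominator is zero is a convention assumed here rather than something the source specifies.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # correct among predicted positives
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # found among actual positives
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 8 true positives, 2 false positives, 8 false negatives.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print(p, r, round(f1, 4))  # 0.8 0.5 0.6154
```

Because the harmonic mean is pulled toward the smaller of the two values, the F1 here sits closer to the recall of 0.5 than the arithmetic mean of 0.65 would.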

### ROC Curve - Receiver Operating Characteristics

#### True Positive Rate (Recall / Sensitivity) vs False Positive Rate (1-Specificity)
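
An ROC curve is traced by sweeping the decision threshold over the classifier's scores and recording the (false positive rate, true positive rate) pair at each step. The sketch below assumes labels are 0/1, that both classes are present, and that a higher score means "more likely positive"; the scores themselves are made up.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) points, one per distinct threshold, starting at (0, 0)."""
    pos = sum(y_true)            # number of actual positives (labels are 0/1)
    neg = len(y_true) - pos      # number of actual negatives
    points = [(0.0, 0.0)]        # threshold above every score: nothing is predicted positive
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / neg, tp / pos))
    return points


# Hypothetical labels and classifier scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(y_true, scores)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

The curve always ends at (1, 1), where the threshold is low enough that every example is predicted positive.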

### Bias-Variance Tradeoff

#### Bias refers to the amount of error that is introduced by approximating a real-life problem, which may be extremely complicated, by a simple model. If Bias is high, and/or if the algorithm performs poorly even on your training data, try adding more features, or a more flexible model.

#### Variance is the amount our model’s prediction would change when using a different training data set. If variance is high, try removing features, or obtain more data.

### Goodness of Fit = R^2

#### 1.0 - sum_of_squared_errors / total_sum_of_squares(y)
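
As a worked example of the formula above, the following sketch computes R^2 for a handful of invented target values and predictions. Here `ss_res` plays the role of `sum_of_squared_errors` and `ss_tot` that of `total_sum_of_squares(y)`; both variable names are assumptions.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - (sum of squared errors) / (total sum of squares of y around its mean)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical targets and predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.0]
r2 = r_squared(y_true, y_pred)
print(r2)  # 0.975
```

A model that always predicts the mean of `y_true` gets an R^2 of 0, and a perfect model gets 1.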

### Mean Squared Error (MSE)

#### The mean squared error (MSE) or mean squared deviation (MSD) of an estimator (of a procedure for estimating an unobserved quantity) measures the average of the squares of the errors or deviations, where an error is the difference between the estimator and what is estimated.
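
In the common supervised-learning setting, the "estimator" is the model's prediction and "what is estimated" is the observed target, so the MSE reduces to the average squared difference between the two. A minimal sketch with invented numbers:

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical targets and predictions.
mse = mean_squared_error([3.0, 5.0, 7.0, 9.0], [2.5, 5.5, 7.0, 9.0])
print(mse)  # 0.125
```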

### Error Rate

#### The proportion of mistakes made if we apply our estimated model function to the training observations in a classification setting.

## Tuning

### Cross-validation

#### One round of cross-validation involves partitioning a sample of data into complementary subsets, performing the analysis on one subset (called the training set), and validating the analysis on the other subset (called the validation set or testing set). To reduce variability, multiple rounds of cross-validation are performed using different partitions, and the validation results are averaged over the rounds.
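
The procedure described above can be sketched as k-fold cross-validation in plain Python. Everything here is illustrative: `k_fold_indices`, `cross_validate`, the mean-predicting toy "model" and the data are assumptions, and the folds are contiguous without shuffling, whereas in practice the data is usually shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous, non-overlapping folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


def cross_validate(xs, ys, k, fit, score):
    """Train on each training split, score on the matching validation split, average the scores."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model, [xs[i] for i in val_idx], [ys[i] for i in val_idx]))
    return sum(scores) / len(scores)


# Toy "model" (an assumption for illustration): always predict the training mean.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)


def mse_score(model, xs, ys):
    return sum((y - model) ** 2 for y in ys) / len(ys)


xs = list(range(6))
ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
avg_mse = cross_validate(xs, ys, k=3, fit=fit_mean, score=mse_score)
print(avg_mse)  # 6.25
```

Setting `k` equal to the number of samples turns this into leave-one-out cross-validation, since each validation fold then contains exactly one observation.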

### Methods

#### Leave-p-out cross-validation

#### Leave-one-out cross-validation

#### k-fold cross-validation

#### Holdout method

#### Repeated random sub-sampling validation