Learning from Data

types of Problems - Models

Dinary Classification (T/F)
Multi-class Classification (종류 구분)
Regression / Prediction

Supervised Learning

Label을 부여하여 정답을 아는 상태로 학습을 진행
ML을 통해 함수를 학습하는 과정 Hypothesis

Model Generalization

Generalization : 일반화를 통해 경험하지 못한 Data에서도 작동해야 한다.
- Generalization Error : E_gen 를 최소화 하도록 학습해야 함
  - Squared Error : e(h(x),y) = (h(x) - y)^2
  - Binary Error e(h(x), y) = 1?[h(x) != y] : h(x)와 y의 값이 다르면 1
Error Function / Loss Function
\[E [(h(x)-y)^2]\]
- E_train: 학습의 과정에서 발견되는 Error, 이후 학습에 활용
Objective 1

E_test = E_train
- Reasons of Failure
  - overfitting → high variance train Dataset에 너무 맞게 만듦
Objective 2

E_train = 0
- Reasons of Failure
  - underfitting → high bias train dataset이 너무 편향적인 경우 다른 training dataset을 구하거나, 최적화 실행

How to avoid over-fitting

Regularization to penalize complex models → make model to unsensitive to noise
Using Ensemble
- Ensemble : adding average to model
Cross-validation (CV)
- Use one of traing set as a validation data : more complexity

Linear Regression

Linear Models

Use H as a set of lines

\[h_w(x) = theta_0 + theta_1x_1 + ... + theata_dx_d = theta^Tx\]

advantages :
- Simplicity
- Generalization
- Solving regression and classification

Feature Organization

Extract features from x to use as model parameter

How to optimize a parameter?

minimize the Loss Function
Normal Equation(Least Square)
- problems.
  - if Vector is too high, requiring large amount of calculation
  - matrix is not invertible : cannot use
- solutions.
  - Using iterative algorithm ( gradient descent )

Gradient Descent

개념: 값의 변화율이 점점 작아질수록 특정 지점에 수렴하는 개념
Induced by numerically
미분을 통해 연산을 하여 특정한 값에 수렴할 때 까지 반복한다.
\[\theta_{new} = \theta_{old} - \alpha \frac\partial{\partial\theta} J(\theta)\]

alpha: velocity of gradient descent

알파값이 너무 작다면 수렴이 너무 느림
알파값이 너무 크다면 발산하거나 수렴하지 않을 수 있음
Stochasitc gradient descent (SGD)
기존의 GD에선 모든 점에서의 gradient 들을 계산하여 사용하지만 SGD에서는 임의의 한 Data에서의 gradient descent 를 수행하여 비교적 빠르게 계산
단점:
- local minima에 가까이 있는 Data를 추출하였다면 global minima에 수렴하지 못하는 문제가 발생한다.
  Momentum
개념: 이전에 사용한 Gradient를 응용
과거의 속도를 반영하여 gradient 계산
AdaGrad

순간의 gradient 값에 따라 learning rate를 바꾸어 주며 학습한다

RMSProp

Adagrad에서 반영 비율을 조정할 수 있도록 값을 추가함

Adam

RMSProp + momentum

Learning rate scheduling

학습 속도를 조절하여 효율적인 학습을 만듦: 처음에는 빠른 속도로 접근하되 시간이 지날수록 천천히 접근해야 정확도를 높일 수 있을 것이다.

Linear Classification

: 2차원 평면에서의 YES/No 판단을 할 때 필요하다 \(h(x) = w_{0}+w_1x_1+w_2x_2 = 0\) \(h(x) = sign(w^Tx)\)

Zero-one Loss

margin 부분이 0보다 작거나 같으면 1

Hingle Loss

\(Loss_{hinge}(x,y,w) = max(1 - (w\phi(x))y, 0)\)

Cross-entrophy Loss

: 확률값을 통해 표현하는 법 → Sigmoid Function

Multiclass classification

: Data에서 2진으로 나누는 것이 아닌 각각의 종류를 찾아주는 것 여러개의 liner classification을 통하여 각각을 분류한다.

Pros

Simple
Interpretability

Advanced Classification Model

Linear Classification의 문제점: H를 어디에 잡느냐가 중요하다
SVM (Support Vector Machine)
경계가 될 수 있는 두 Data point에게 margin을 부여하여 잡음
Margin
경계점으로 지정한 Data point들의 사이의 거리
Lemma = \(\frac{|h_{wb}(x)|}{||w||}\)
Optimization
Hard margin SVM: linear 하게 구분할 수 있는 문제
Soft margin SVM
Hard margin SVM

margin을 작게 하는 것이 문제임 \(||w||^2\) 을 작게 하는 값을 구해야 한다.

Soft Margin SVM

Problem of SVM

linearly separable 하지 않을 경우 문제가 발생한다.

Kernel Trick

차원을 늘려서 둘을 분리할 수 있는 평면을 만든다.

Polynomial
Gaussian radial basis function (RBF)
Hyperbolic tangent
Artificial neural network (ANN)

non linear classification model : 뇌신경을 모사하여 제작함 input을 처리하는 부와 그 결과를 가지고 전달하는 activation function으로 나뉨

Sort of Activation Functions

ReLU: max(0, x)
Leaky ReLU: max(0.1x, x)

Multilayer Perceptron (MLP)

ANN구조를 여러 계층으로 쌓아 연산 ### Problems 계층을 많이 쌓았을 때 오류가 발생하는 이유는? → gradient가 점점 작아지는 문제가 발생.

### Breakthrough Back Propagation

Pre-training + fine tunning

## Convolutional Neural Network

# Ensemble : 닮다 New Data들을 prediction을 통해 예측하는 것 (by voting)

다양한 모델들의 결과를 종합하여 표현하는 방식

## Bagging

Bootstrapping + Aggregating
Training Data를 random하게 선택하여 학습하도록 하는 것.
병렬적으로 진행됨
분산이 작음

### Bootstraping

Dataset 안에 작은 sub Dataset을 만들어 학습하는 것
Aggregating
학습한 모델들의 결과를 종합하는 것
Boosting
Classifier들에게 다른 weight을 부여하여 분류되지 못한 Data를 다음 분류에서는 더 강하게 만든다
Adaboost

Dataset에게 weight를 부여하여 다음번엔 무조건 분류될 수 있게 한다.

Learning from Data

types of Problems - Models

Supervised Learning

Model Generalization

How to avoid over-fitting

Linear Regression

Linear Models

Use H as a set of lines

Feature Organization

How to optimize a parameter?

Gradient Descent

Stochasitc gradient descent (SGD)

Momentum

AdaGrad

RMSProp

Adam

Learning rate scheduling

Linear Classification

Zero-one Loss

Hingle Loss

Cross-entrophy Loss

Multiclass classification

Pros

Advanced Classification Model

SVM (Support Vector Machine)

Margin

Optimization

Hard margin SVM

Soft Margin SVM

Problem of SVM

Kernel Trick

Artificial neural network (ANN)

Sort of Activation Functions

Multilayer Perceptron (MLP)

Aggregating

Boosting

Adaboost