
[Machine Learning] Anomaly Detection #02 Gaussian Density Estimation


This post summarizes material from the lectures of Professor Pilsung Kang (강필성) at Korea University, together with things I studied on my own.

Density-Based Novelty Detection

Purpose

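Estimate the data-driven density function (the key idea)
If a new instance has a low probability according to the trained density function, it will be identified as novel
This puts a little more focus on the second of the definitions of novelty
For one-dimensional data we do not know whether the true distribution really is Gaussian, but we assume that it is and estimate a normal distribution. A new test instance is then judged normal or abnormal depending on where it falls under that distribution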
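
The one-dimensional case above can be sketched with the Python standard library alone. The training values and the "two standard deviations" cutoff are assumptions made for illustration; they are not taken from the lecture.

```python
from statistics import NormalDist, fmean, pstdev

# Hypothetical 1-D training set containing only normal observations.
train = [9.8, 10.1, 10.0, 9.9, 10.2, 10.3, 9.7, 10.0]

# Fit a single Gaussian; pstdev is the population (1/n) standard deviation.
model = NormalDist(mu=fmean(train), sigma=pstdev(train))

# Assumed rule: flag a point when its density is lower than the density
# found two standard deviations away from the mean.
threshold = model.pdf(model.mean + 2 * model.stdev)

def is_novel(x):
    """Return True when x lies in a low-density region of the fitted Gaussian."""
    return model.pdf(x) < threshold

print(is_novel(10.05))  # a point close to the bulk of the training data
print(is_novel(12.0))   # a point far away from the training data
```

Any other cutoff on the density would work the same way; only the rejection rate on normal data changes.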

Gaussian Density Estimation(GDE)

We assume that the entire data set was generated from one single Gaussian distribution. Estimating the mean vector and covariance matrix of that Gaussian is the training step.

Assume that the observed data are drawn from a Gaussian distribution
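
For reference, the standard multivariate Gaussian density is written out below. The symbol d (the number of variables) is an assumption of this sketch, since the original figure did not survive; the form itself is the textbook one.

```latex
p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\,|\boldsymbol{\Sigma}|^{1/2}}
\exp\!\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\top}
\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right]
```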

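p(x): the probability density function of the Gaussian distribution
x: the given variable (the observation)
μ: the value to be found (an unknown parameter)

x+: indicates that only normal data are used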

Advantage

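Insensitive to scaling of the data (the result does not depend on the range of each variable)
- Because the inverse of the covariance matrix is used, the units of the variables do not affect the result.
- In other words, normalization is not required.
Possible to compute analytically the optimal threshold
- A Type I error rate can be fixed in advance (for example, about 5% rejection of normal data) ⇒ the decision boundary can then be computed from it.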
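
Both advantages can be checked numerically. The sketch below assumes NumPy, hypothetical two-variable data, and a full covariance matrix; the threshold uses the fact that, if the data really are Gaussian with d = 2, the squared Mahalanobis distance follows a chi-square distribution with 2 degrees of freedom, whose upper quantile has the closed form -2 ln(alpha). Estimation error in the mean and covariance is ignored.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical normal-only training data: two variables on very different scales.
X = rng.normal(loc=[0.0, 100.0], scale=[1.0, 50.0], size=(500, 2))

def mahalanobis_sq(X_train, x):
    """Squared Mahalanobis distance of x from the Gaussian fitted to X_train."""
    mu = X_train.mean(axis=0)
    cov = np.cov(X_train, rowvar=False, bias=True)  # 1/n (maximum likelihood) estimate
    diff = x - mu
    return float(diff @ np.linalg.solve(cov, diff))

x_new = np.array([2.5, 30.0])
scale = np.array([1000.0, 0.01])  # arbitrary change of units for each variable

d_original = mahalanobis_sq(X, x_new)
d_rescaled = mahalanobis_sq(X * scale, x_new * scale)

# Analytic threshold for a 5% Type I error when d = 2 (chi-square, 2 dof).
alpha = 0.05
threshold = -2.0 * np.log(alpha)

print(d_original, d_rescaled)  # the two distances agree up to rounding
print("reject as novel:", d_original > threshold)
```

Thresholding the squared distance is equivalent to thresholding the density, because the density is a decreasing function of that distance.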

Parameter estimation: μ, σ^2

likelihood
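Given an observed value, the likelihood measures how plausible it is that the value came from a particular probability distribution.

To obtain the derivative (the function obtained by differentiating ƒ(x)), we go through the following steps.

1. Take the log of the likelihood, which turns the product into a sum and removes the exp terms.

2. There are two unknowns, so both first-order partial derivatives must be zero (to satisfy the optimality condition).
3. Solving these two equations gives the maximum likelihood estimates.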
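
The three steps can be written out for the one-dimensional case. This is the standard derivation, reconstructed here because the original equation images are missing; n denotes the number of training instances.

```latex
\begin{aligned}
L(\mu,\sigma^2) &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
  \exp\!\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right) \\
\log L(\mu,\sigma^2) &= -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2
  - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2 \\
\frac{\partial \log L}{\partial \mu} &= \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu) = 0
  \;\Rightarrow\; \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i \\
\frac{\partial \log L}{\partial \sigma^2} &= -\frac{n}{2\sigma^2}
  + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i-\mu)^2 = 0
  \;\Rightarrow\; \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\hat{\mu})^2
\end{aligned}
```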

In general
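
In the multivariate case the same argument gives the sample mean vector and the (1/n) sample covariance matrix, computed over the normal training instances only. The notation X^+ for that set follows the x+ convention used earlier and is otherwise an assumption of this sketch.

```latex
\hat{\boldsymbol{\mu}} = \frac{1}{n}\sum_{\mathbf{x}_i \in X^{+}} \mathbf{x}_i,
\qquad
\hat{\boldsymbol{\Sigma}} = \frac{1}{n}\sum_{\mathbf{x}_i \in X^{+}}
(\mathbf{x}_i-\hat{\boldsymbol{\mu}})(\mathbf{x}_i-\hat{\boldsymbol{\mu}})^{\top}
```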

The shape of Gaussian distribution according to the Covariance Matrix type

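Spherical: the lower and upper triangles of the covariance matrix are zero (all off-diagonal entries are 0)
All variables are assumed to have the same variance (as if the data had been normalized once) and to be independent of each other

Diagonal: the off-diagonal entries are still zero, but the variances differ across variables

Full: every pair of variables may be correlated to some degree
The major and minor axes of the density contours are no longer aligned with the coordinate axes

The full covariance matrix is the most flexible, but it often fails to fit well because of variability in the data
In practice the diagonal form is used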
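
The three covariance structures can be compared on the same data. The function name `estimate_covariance` and the sample data are hypothetical; each branch returns the maximum likelihood estimate under its constraint (for the spherical case, the average of the per-variable variances).

```python
import numpy as np

def estimate_covariance(X, kind="full"):
    """Maximum likelihood covariance estimate under a structural constraint."""
    centered = X - X.mean(axis=0)
    full = centered.T @ centered / len(X)   # unconstrained 1/n estimate
    if kind == "full":
        return full
    if kind == "diagonal":
        # Keep the per-variable variances, drop all covariances.
        return np.diag(np.diag(full))
    if kind == "spherical":
        # One shared variance: the mean of the diagonal entries.
        d = X.shape[1]
        return np.eye(d) * np.trace(full) / d
    raise ValueError(f"unknown covariance type: {kind}")

# Hypothetical correlated 2-D data.
rng = np.random.default_rng(1)
X = rng.multivariate_normal([0.0, 0.0], [[4.0, 1.5], [1.5, 1.0]], size=1000)

for kind in ("spherical", "diagonal", "full"):
    print(kind)
    print(estimate_covariance(X, kind))
```

The number of free parameters grows from 1 (spherical) to d (diagonal) to d(d+1)/2 (full), which is one reason the full form needs more data to estimate reliably.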

Mixture of Gaussian Density Estimation

Gaussian Density Estimation
- assumes a very strong model of the data: unimodal and convex
MoG
- an extension of the Gaussian that allows multi-modal distributions
- a linear combination of normal distributions
  - that is, a linear combination of several Gaussian models
- has a smaller bias than the single Gaussian distribution, but requires more data for training
  - a more accurate estimate, but a large amount of data is needed
Anomaly score ⇒ the value of the estimated probability density function at a newly arrived data point
The lower this score, the more likely the point is an anomaly.

Single Gaussian: μ, σ ⇒ 2 unknowns to estimate
MoG: for each of the K components, μ, σ, ω must be computed ⇒ 3*K unknowns

components of MoG

Probability of an instance belonging to the normal class

λ: the set of unknown parameters we have to estimate
M: the number of Gaussians

Compute the likelihood under each individual Gaussian distribution,
multiply each by its weight, and add all of the results together.
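
Written as a formula, the weighted sum described above takes the standard mixture form below. The symbols g for a single Gaussian density and the constraint on the weights are the usual conventions, stated here as assumptions because the original equation image is missing.

```latex
p(\mathbf{x}\mid\lambda) = \sum_{m=1}^{M} w_m\, g(\mathbf{x}\mid\boldsymbol{\mu}_m,\boldsymbol{\Sigma}_m),
\qquad
\lambda = \{w_m, \boldsymbol{\mu}_m, \boldsymbol{\Sigma}_m\}_{m=1}^{M},
\qquad
\sum_{m=1}^{M} w_m = 1,\; w_m \ge 0
```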

Distribution of each Gaussian model

Each individual Gaussian is simply the familiar single Gaussian distribution
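
A minimal one-dimensional sketch of this idea, using only the standard library: the mixture density is a weighted sum of ordinary single-Gaussian densities. The weights, means, and standard deviations are made-up values for illustration.

```python
from statistics import NormalDist

# Hypothetical two-component 1-D mixture as (weight, component) pairs.
components = [
    (0.7, NormalDist(mu=0.0, sigma=1.0)),
    (0.3, NormalDist(mu=5.0, sigma=0.5)),
]

def mixture_pdf(x):
    """Mixture density: each single-Gaussian density times its weight, summed."""
    return sum(w * g.pdf(x) for w, g in components)

# The density is high near either mode and low in the gap between them.
for x in (0.0, 2.5, 5.0):
    print(x, mixture_pdf(x))
```

A single Gaussian fitted to the same data would place its mean in the gap between the two modes, exactly where this mixture assigns a low density.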

Expectation-Maximization

Together with gradient descent, this is one of the most widely used basic optimization methods

E-step: Given the current estimate of the parameters, compute the conditional probabilities.
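M-step: Update the parameters to maximize the expected likelihood found in the E-step.
- The conditional probabilities of each instance are held fixed while the Gaussian parameters are updated.

There are two families of unknowns, A and B, that cannot be optimized at the same time.
Fix A → optimize B, fix B → optimize A … (repeat) ⇒ stop when A and B no longer change (convergence). That point is taken as the solution, although it is only guaranteed to be a local optimum.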
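
The alternation described above can be sketched for a one-dimensional mixture with NumPy. The function names, the quantile-based initialization, and the fixed iteration count (instead of a convergence test) are simplifying assumptions of this sketch, not part of the lecture.

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def fit_mog_1d(x, n_components=2, n_iter=100):
    """EM for a 1-D mixture of Gaussians; returns (weights, means, variances)."""
    w = np.full(n_components, 1.0 / n_components)
    mu = np.quantile(x, np.linspace(0.1, 0.9, n_components))  # assumed initialization
    var = np.full(n_components, x.var())
    for _ in range(n_iter):
        # E-step: parameters fixed, compute each component's conditional
        # probability for every instance (array of shape (n, M)).
        dens = w * gaussian_pdf(x[:, None], mu, var)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: conditional probabilities fixed, update the parameters.
        nk = resp.sum(axis=0)
        w = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var

# Hypothetical bimodal data: 70% around 0 and 30% around 6.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 700), rng.normal(6.0, 0.5, 300)])

w, mu, var = fit_mog_1d(x)
print("weights:", w)
print("means:", mu)
print("variances:", var)
```

With poorly separated components or a bad initialization, the same loop can settle in a different local optimum, so in practice it is often restarted from several initial values.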

EM algorithm for MoG

Expectation

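Based on the density values p, compute the conditional probability of each Gaussian for each instance
w_m, mu_m, Sigma_m: held fixed at their current (initially arbitrary) values; they are not optimized in this step.
Numerator: the probability that the instance was generated by the m-th Gaussian distribution
Denominator: the probability that it was generated by any of the distributions 1 through M (the sum over all components)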
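
The numerator and denominator described above correspond to the standard E-step formula below, reconstructed because the original image is missing; g denotes a single Gaussian density and k is a summation index introduced for this sketch.

```latex
p(m \mid \mathbf{x}_i, \lambda) =
\frac{w_m\, g(\mathbf{x}_i \mid \boldsymbol{\mu}_m, \boldsymbol{\Sigma}_m)}
     {\sum_{k=1}^{M} w_k\, g(\mathbf{x}_i \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}
```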

Maximization
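
The update equations for this step did not survive in the text, so the standard M-step updates for a mixture of Gaussians are given here as a reference. They use the conditional probabilities p(m | x_i, λ) from the Expectation step, with n the number of training instances; the notation is an assumption of this sketch.

```latex
\begin{aligned}
w_m^{\text{new}} &= \frac{1}{n}\sum_{i=1}^{n} p(m \mid \mathbf{x}_i, \lambda) \\
\boldsymbol{\mu}_m^{\text{new}} &=
  \frac{\sum_{i=1}^{n} p(m \mid \mathbf{x}_i, \lambda)\,\mathbf{x}_i}
       {\sum_{i=1}^{n} p(m \mid \mathbf{x}_i, \lambda)} \\
\boldsymbol{\Sigma}_m^{\text{new}} &=
  \frac{\sum_{i=1}^{n} p(m \mid \mathbf{x}_i, \lambda)\,
        (\mathbf{x}_i-\boldsymbol{\mu}_m^{\text{new}})(\mathbf{x}_i-\boldsymbol{\mu}_m^{\text{new}})^{\top}}
       {\sum_{i=1}^{n} p(m \mid \mathbf{x}_i, \lambda)}
\end{aligned}
```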

The shape of MoG according to the covariance matrix

References

03-2 Anomaly Detection - Gauss & MoG