XGBoost for classification


XGBoost for classification works almost the same way as XGBoost for regression.

If you already know how XGBoost for regression and Gradient Boosting for classification work,

XGBoost for classification is much easier to understand.

Gradient & Hessian
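
As a minimal sketch, assuming binary classification with log loss and a model that predicts log(odds): the gradient is then p - y (the negative of the residual) and the Hessian is p * (1 - p), where p is the previously predicted probability. The function names here are hypothetical and are not part of the XGBoost API.

```python
import math

def sigmoid(log_odds):
    # Convert a log(odds) prediction into a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))

def grad_hess(y, log_odds):
    # First and second derivatives of the log loss
    # with respect to the log(odds) prediction.
    p = sigmoid(log_odds)
    g = p - y            # gradient: negative residual
    h = p * (1.0 - p)    # Hessian: always in (0, 0.25]
    return g, h

# One positive sample with an initial prediction of log(odds) = 0, i.e. p = 0.5.
g, h = grad_hess(1, 0.0)
print(g, h)
```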

Similarity Score
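
A rough sketch of the similarity score for classification, assuming the commonly used form (sum of residuals)^2 / (sum of p * (1 - p) + lambda), where lambda is the L2 regularization term. The function name and the sample values are made up for illustration.

```python
def similarity_score(residuals, prev_probs, reg_lambda):
    # Numerator: the residuals are summed first, then squared.
    numerator = sum(residuals) ** 2
    # Denominator: sum of p * (1 - p) over the node, plus regularization.
    denominator = sum(p * (1.0 - p) for p in prev_probs) + reg_lambda
    return numerator / denominator

# Hypothetical node: three samples, each previously predicted with p = 0.5.
residuals = [0.5, 0.5, -0.5]     # observed label minus predicted probability
prev_probs = [0.5, 0.5, 0.5]
score = similarity_score(residuals, prev_probs, reg_lambda=0.0)
print(score)
```

Setting reg_lambda above 0 makes the denominator larger and therefore lowers the score, which makes splits harder to justify.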

Output Value

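In regression, the sum of the residuals is divided by n, the number of data points assigned to the current node (plus the regularization term lambda).

In classification, just like the formula for gamma in Gradient Boosting for classification,

it is divided by the sum of the p * (1 - p) values instead.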
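
The difference between the two denominators can be sketched as follows. This is an illustration under the assumptions above (log loss for classification, lambda set to 0 so that only the denominators differ); the function names and sample values are hypothetical.

```python
def output_value_regression(residuals, reg_lambda):
    # Regression: divide by the number of data points in the node.
    return sum(residuals) / (len(residuals) + reg_lambda)

def output_value_classification(residuals, prev_probs, reg_lambda):
    # Classification: divide by the sum of p * (1 - p) in the node.
    hessian_sum = sum(p * (1.0 - p) for p in prev_probs)
    return sum(residuals) / (hessian_sum + reg_lambda)

# The same hypothetical node evaluated both ways.
residuals = [0.5, 0.5, -0.5]
prev_probs = [0.5, 0.5, 0.5]
reg_out = output_value_regression(residuals, reg_lambda=0.0)
clf_out = output_value_classification(residuals, prev_probs, reg_lambda=0.0)
print(reg_out, clf_out)
```

In the classification case the output value is on the log(odds) scale, so it has to go through the logistic function before it can be read as a probability.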

Cover (classification)

The sum of all the Hessian values in a node

is called its "Cover".

XGBoost lets you set a minimum value for cover (min_child_weight),

and the default is 1 for both regression and classification.

If the cover of a node is smaller than min_child_weight,

XGBoost does not allow that node to be created.

In regression, cover = n (the number of data points in the node),

and every node contains at least one residual,

so with min_child_weight = 1 (the default)

the cover of every node is at least 1.

The default therefore has no effect on how the tree grows.

In classification, however,

cover can be smaller than 1, because each p * (1 - p) term is at most 0.25,

so min_child_weight can affect how the tree grows.
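
A small sketch of the cover check, assuming unweighted samples, squared error for regression (Hessian of 1 per sample) and log loss for classification. The helper names are hypothetical; only min_child_weight is an actual XGBoost parameter name.

```python
def cover_regression(n_samples):
    # Each sample contributes a Hessian of 1, so cover equals n.
    return float(n_samples)

def cover_classification(prev_probs):
    # Each sample contributes p * (1 - p), which is at most 0.25.
    return sum(p * (1.0 - p) for p in prev_probs)

def node_allowed(cover, min_child_weight=1.0):
    # A node is rejected when its cover falls below min_child_weight.
    return cover >= min_child_weight

# Regression: even a single-sample node passes the default threshold.
print(node_allowed(cover_regression(1)))

# Classification: three samples at p = 0.5 give cover = 0.75, which is rejected.
small_leaf = cover_classification([0.5, 0.5, 0.5])
print(small_leaf, node_allowed(small_leaf))

# Four such samples give cover = 1.0, which is allowed.
larger_leaf = cover_classification([0.5, 0.5, 0.5, 0.5])
print(larger_leaf, node_allowed(larger_leaf))
```

With small datasets, lowering min_child_weight (for example to 0) is one way to keep such nodes from being rejected.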
