Random Forest and Ensembles

Voting

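Hard voting: select the class predicted by the majority of the models.
Soft voting: average the class probabilities predicted by the models and select the class with the highest average.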
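The two rules can disagree on the same sample. The sketch below uses made-up probabilities from three hypothetical models to show how; the numbers and the helper names `hard_vote` and `soft_vote` are illustrative assumptions, not part of any library.

```python
from collections import Counter

# Hypothetical class probabilities from three models for one sample
# (classes "A" and "B"). The numbers are made up for illustration.
probas = [
    {"A": 0.45, "B": 0.55},  # model 1 leans toward B
    {"A": 0.40, "B": 0.60},  # model 2 leans toward B
    {"A": 0.90, "B": 0.10},  # model 3 is very confident in A
]

def hard_vote(probas):
    """Each model casts one vote for its most likely class; the majority wins."""
    votes = [max(p, key=p.get) for p in probas]
    return Counter(votes).most_common(1)[0][0]

def soft_vote(probas):
    """Average the probabilities per class; the highest average wins."""
    classes = probas[0].keys()
    avg = {c: sum(p[c] for p in probas) / len(probas) for c in classes}
    return max(avg, key=avg.get)

hard_result = hard_vote(probas)
soft_result = soft_vote(probas)
print("hard:", hard_result)
print("soft:", soft_result)
```

Two of the three models vote for B, so hard voting picks B, while the one very confident model pulls the averaged probability toward A under soft voting.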

Bagging

Sampling that allows duplicates (bootstrap sample): each drawn row is put back before the next random draw (random sampling with replacement).
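A minimal sketch of drawing one bootstrap sample, using only the standard library. The ten-row `data` list is a stand-in for a real training set, and the rows that are never drawn are what is usually called the out-of-bag set.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

data = list(range(10))  # stand-in for 10 training rows

# Draw len(data) rows with replacement: the same row can appear more than once.
bootstrap = random.choices(data, k=len(data))

# Rows that were never drawn are "out-of-bag" for this sample.
out_of_bag = sorted(set(data) - set(bootstrap))

print("bootstrap :", bootstrap)
print("out-of-bag:", out_of_bag)
```

In bagging, each model in the ensemble would be trained on its own bootstrap sample drawn this way.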

Uses models built on the same type of algorithm. Random Forest is the typical example.

n_estimators : number of Decision Trees

max_depth : maximum depth of each tree

Boosting

Assigns larger weights to the data that were predicted incorrectly, so that the next model focuses on them.
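One concrete way to do this reweighting is the AdaBoost update, sketched below. The notes do not say which boosting algorithm is meant, so using AdaBoost here is an assumption, and the labels and predictions are made up for illustration.

```python
import math

# Hypothetical labels (+1 / -1) and one weak model's predictions.
y_true = [1, 1, -1, -1, 1]
y_pred = [1, -1, -1, -1, 1]  # the second sample is misclassified

n = len(y_true)
weights = [1 / n] * n  # start with equal weights

# Weighted error rate of the weak model.
err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)

# Model weight in AdaBoost: larger when the error is smaller.
alpha = 0.5 * math.log((1 - err) / err)

# t * p is +1 for a correct prediction and -1 for a wrong one, so wrong
# samples are scaled up by exp(alpha) and correct ones down by exp(-alpha).
new_weights = [w * math.exp(-alpha * t * p)
               for w, t, p in zip(weights, y_true, y_pred)]

# Renormalize so the weights sum to 1 again.
total = sum(new_weights)
new_weights = [w / total for w in new_weights]

print([round(w, 3) for w in new_weights])
```

After the update the single misclassified sample carries half of the total weight, so the next weak model is pushed to get it right.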

Training is slow and overfitting can occur, so use it with caution.

XGBoost, LightGBM

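Gradient Boost : each new model is fit to reduce the remaining error, and the models' outputs are added together to produce the final prediction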
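The idea can be sketched for regression with squared error, where "the remaining error" is simply the residual. This is a toy version under stated assumptions: one-dimensional made-up data, depth-1 trees (stumps), and a hypothetical helper `fit_stump`. It is not how XGBoost or LightGBM are implemented.

```python
def fit_stump(x, r):
    """Find the 1-D threshold split that minimizes squared error on residuals r."""
    best = None
    for thr in sorted(set(x))[:-1]:  # skip the largest value so both sides are non-empty
        left = [ri for xi, ri in zip(x, r) if xi <= thr]
        right = [ri for xi, ri in zip(x, r) if xi > thr]
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = (sum((ri - lmean) ** 2 for ri in left)
               + sum((ri - rmean) ** 2 for ri in right))
        if best is None or sse < best[0]:
            best = (sse, thr, lmean, rmean)
    _, thr, lmean, rmean = best
    return lambda xi: lmean if xi <= thr else rmean

# Made-up data for illustration.
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]

learning_rate = 0.5
pred = [sum(y) / len(y)] * len(y)  # start from the mean of y
mse_history = []

for _ in range(20):
    residuals = [yi - pi for yi, pi in zip(y, pred)]
    mse_history.append(sum(r * r for r in residuals) / len(y))
    stump = fit_stump(x, residuals)  # new model targets the remaining error
    # Add the new model's (shrunken) output to the running prediction.
    pred = [pi + learning_rate * stump(xi) for xi, pi in zip(x, pred)]

final_mse = sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y)
print("first MSE:", mse_history[0])
print("final MSE:", final_mse)
```

Each round fits a stump to the current residuals and adds a fraction of its output, so the training error shrinks round by round.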

XGBoost : faster than plain GBM, but training is still relatively slow.

#!pip install xgboost
#!pip install lightgbm

Stacking

A method in which the predictions of several models are used as the training data for a final model, which then makes the prediction.
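A minimal sketch with scikit-learn's `StackingClassifier`. The synthetic dataset and the particular choice of base models and final model are assumptions made only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data standing in for a real dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[  # base models whose predictions become the new features
        ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=50, max_depth=5, random_state=0)),
    ],
    final_estimator=LogisticRegression(),  # final model trained on those predictions
    cv=5,  # base-model predictions for training are produced out-of-fold
)
stack.fit(x_train, y_train)

accuracy = stack.score(x_test, y_test)
print("test accuracy:", accuracy)
```

The `cv` setting matters: the final model is trained on predictions the base models made for rows they had not seen, which reduces the risk of the final model simply trusting overfit base models.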

Random Forest

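.feature_importances_ shows the feature importances averaged over all trees in the forest (100 trees with the default n_estimators=100).

To look at a single tree, such as the first one, put .estimators_[0] in front: model.estimators_[0].feature_importances_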

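from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, classification_report
import matplotlib.pyplot as plt

# x_train, x_test, y_train, y_test are assumed to come from an earlier train/test split
model = RandomForestClassifier(max_depth=5)

model.fit(x_train, y_train)
y_pred = model.predict(x_test)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

# Check feature importance of the first tree only
# (use model.feature_importances_ for the average over all trees)
# list(x) gives the column names, assuming x is the feature DataFrame
plt.barh(list(x), model.estimators_[0].feature_importances_)
plt.show()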
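The relationship between the forest-level importance and the per-tree importances can be checked directly. The sketch below uses a synthetic dataset (an assumption, not the data from these notes) and compares the mean over `model.estimators_` with `model.feature_importances_`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data standing in for a real dataset.
X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           random_state=0)

model = RandomForestClassifier(max_depth=5, random_state=0)  # n_estimators defaults to 100
model.fit(X, y)

# One row of importances per tree, then the mean over trees.
per_tree = np.array([tree.feature_importances_ for tree in model.estimators_])
mean_importance = per_tree.mean(axis=0)

print("number of trees:", len(model.estimators_))
print("first tree     :", np.round(per_tree[0], 3))
print("forest average :", np.round(model.feature_importances_, 3))
print("mean matches   :", np.allclose(mean_importance, model.feature_importances_))
```

A single tree's importances can differ noticeably from the forest average, which is why the averaged value is usually the one to report.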

XGBoost

from xgboost import XGBClassifier
model = XGBClassifier(max_depth=5)

model.fit(x_train, y_train)
y_pred = model.predict(x_test)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

- Tree visualization

#!pip install graphviz
from xgboost import plot_tree
plot_tree(model, num_trees=0)
plt.show()

LightGBM

from lightgbm import LGBMClassifier
model = LGBMClassifier(max_depth=5, verbose=-1, importance_type='gain')  # the default is importance_type='split'

model.fit(x_train, y_train)
y_pred = model.predict(x_test)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

split : how many times the feature was used to split a node

gain : how much the feature contributed (the total reduction in loss from the splits that used it)
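The two definitions can rank features differently. The sketch below computes both from a made-up list of split records; the feature names and gain values are hypothetical, and this is not how LightGBM stores its trees internally.

```python
from collections import defaultdict

# Hypothetical record of every split in a small boosted model:
# (feature used, loss reduction achieved by that split).
splits = [
    ("age", 0.5), ("age", 0.25), ("age", 0.25),
    ("income", 4.0),
    ("city", 1.0),
]

split_importance = defaultdict(int)   # 'split': number of times the feature is used
gain_importance = defaultdict(float)  # 'gain': total loss reduction from the feature

for feature, gain in splits:
    split_importance[feature] += 1
    gain_importance[feature] += gain

top_by_split = max(split_importance, key=split_importance.get)
top_by_gain = max(gain_importance, key=gain_importance.get)

print("split:", dict(split_importance), "top:", top_by_split)
print("gain :", dict(gain_importance), "top:", top_by_gain)
```

Here `age` is used most often but each of its splits helps only a little, while `income` is used once and removes most of the loss, so the two importance types pick different top features.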

- Expressed as a share of the total

import numpy as np

tmp = model.feature_importances_
tmp2 = tmp / np.sum(tmp)  # divide by the total so the values sum to 1

plt.barh(list(x), tmp2)
plt.show()

Other) Topics to study further: EBM, CatBoost
