Bagging Ensemble: Random Forest

This post summarizes the Random Forest algorithm, a representative example of bagging among ensemble methods.

Random Forest

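The name comes from the idea that many decision trees together form a forest. The method rests on the observation that a single tree can predict reasonably well but tends to overfit part of the data. Because each tree is trained on different data, the trees overfit in different ways, so building many trees and averaging their results reduces overfitting.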
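To make the averaging concrete: for scikit-learn's RandomForestRegressor, the forest's prediction is the mean of the predictions of the individual fitted trees, which are stored in `estimators_`. The sketch below checks this on synthetic data; the `make_regression` dataset and `n_estimators=50` are arbitrary choices for illustration only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data (illustrative assumption, not from the post)
X, y = make_regression(n_samples=300, n_features=5, noise=20.0, random_state=0)

forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Predictions of each individual tree: shape (n_trees, n_samples)
per_tree = np.stack([tree.predict(X) for tree in forest.estimators_])

# The forest's prediction equals the mean of its trees' predictions
averaged = per_tree.mean(axis=0)
print(np.allclose(forest.predict(X), averaged))  # True

# Individual trees disagree with one another; averaging smooths this out
print("std of tree predictions (first 3 samples):", per_tree.std(axis=0)[:3])
```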

Training procedure

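Bootstrap
- Randomly sample the given dataset with replacement (duplicates allowed) to create a separate training subset for each decision tree
Decision Tree
- Build one decision tree on each bootstrapped dataset; at each split, only a randomly chosen subset of the features is considered, which makes the trees differ further from one another
Ensemble
- Combine the trees' predictions into the final prediction: a majority vote for classification, an average for regression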
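The three steps above can be sketched from scratch with NumPy and scikit-learn's `DecisionTreeClassifier`. This is a minimal illustration, not how scikit-learn implements `RandomForestClassifier` internally; the helper names `fit_forest` and `predict_forest` are hypothetical, and the iris dataset and `n_trees=25` are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_forest(X, y, n_trees=25, seed=0):
    """Bootstrap + Decision Tree steps (hypothetical helper): one tree per bootstrap sample."""
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        # Bootstrap: draw n row indices with replacement (duplicates allowed)
        idx = rng.integers(0, n, size=n)
        # max_features="sqrt": each split considers only a random subset of features
        tree = DecisionTreeClassifier(max_features="sqrt",
                                      random_state=int(rng.integers(1_000_000)))
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees


def predict_forest(trees, X):
    """Ensemble step (hypothetical helper): majority vote over the trees' predicted labels."""
    votes = np.stack([t.predict(X) for t in trees])  # shape (n_trees, n_samples)
    # For each sample (column), pick the most frequently predicted label
    return np.array([np.bincount(col).argmax() for col in votes.T])


X, y = load_iris(return_X_y=True)
X_trn, X_tst, y_trn, y_tst = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

trees = fit_forest(X_trn, y_trn)
y_pred = predict_forest(trees, X_tst)
print("test accuracy:", (y_pred == y_tst).mean())
```

Hard majority voting is used here to mirror the Ensemble step as described; scikit-learn's `RandomForestClassifier` instead averages the trees' predicted class probabilities.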

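Advantages

- Can be used for both classification and regression problems
- Handles missing values relatively easily
- Works efficiently on large datasets
- Reduces overfitting (the model fitting noise in the training data) and improves model accuracy
- Can select and rank the relatively important variables (features) in a classification model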
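As an illustration of the last point, a fitted scikit-learn forest exposes `feature_importances_` (impurity-based importances, normalized to sum to 1), which can be sorted to rank the features. The iris dataset and `n_estimators=200` below are arbitrary choices; note that impurity-based importances can be biased toward features with many distinct values.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(iris.data, iris.target)

importances = clf.feature_importances_   # impurity-based, normalized to sum to 1
order = np.argsort(importances)[::-1]    # feature indices, most to least important
ranking = [(iris.feature_names[i], importances[i]) for i in order]

for name, score in ranking:
    print(f"{name:20s} {score:.3f}")
```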

Python code

In Python, Random Forest is available in the scikit-learn library as sklearn.ensemble.RandomForestClassifier (classification) and sklearn.ensemble.RandomForestRegressor (regression).

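# library load
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# example data (synthetic, for illustration only)
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_trn, X_tst, y_trn, y_tst = train_test_split(X, y, test_size=0.2, random_state=0)

# build model
mdl = RandomForestRegressor(n_estimators=100, random_state=0)

# fit (training)
mdl.fit(X_trn, y_trn)

# predict (testing): predict() takes only the features, not the targets
y_pred = mdl.predict(X_tst)

# evaluate: R^2 on the test set
print(mdl.score(X_tst, y_tst))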
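The classification counterpart works the same way with `RandomForestClassifier`. The sketch below uses scikit-learn's bundled breast cancer dataset (an arbitrary choice) and sets a few real constructor parameters explicitly; `oob_score=True` scores each training sample using only the trees whose bootstrap sample did not include it, which ties back to the Bootstrap step.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_trn, X_tst, y_trn, y_tst = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

clf = RandomForestClassifier(
    n_estimators=200,      # number of trees in the forest
    max_features="sqrt",   # number of features considered at each split
    oob_score=True,        # out-of-bag estimate from samples left out of each bootstrap
    n_jobs=-1,             # build trees in parallel on all CPU cores
    random_state=0,
)
clf.fit(X_trn, y_trn)

y_pred = clf.predict(X_tst)         # predicted class labels
proba = clf.predict_proba(X_tst)    # class probabilities averaged over the trees
print("test accuracy:", accuracy_score(y_tst, y_pred))
print("out-of-bag accuracy:", clf.oob_score_)
```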

Reference

https://tyami.github.io/machine%20learning/ensemble-2-bagging-random-forest/
http://www.incodom.kr/Random_Forest
Müller, Andreas C., and Sarah Guido. Introduction to Machine Learning with Python: A Guide for Data Scientists. O'Reilly Media, Inc., 2016.

열쩡강쥐
