Hi Xin,

Mar 13, 2022

sklearn’s Random Forest classifier has a hyperparameter called class_weight that can help with the unbalanced data set problem. It lets you adjust how heavily each class is weighted during training: "balanced" computes the weights from the class frequencies in the full training set, while "balanced_subsample" recomputes them from the bootstrap sample used for each tree. See the link below. Regarding the statement “larger class will get a low error rate…”: that sentence describes the problem that arises when you feed an unbalanced data set into a model. We want to avoid that scenario, so we use various methods to try to overcome the problems that unbalanced data sets cause.

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
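
In case it helps, here is a minimal sketch of how you might try the different class_weight settings. The synthetic data set (about 95% / 5% class split), the choice of minority-class recall as the metric, and all variable names are my own assumptions for illustration, not something from your project, so treat the numbers you get as a rough comparison only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Assumption: a synthetic, heavily unbalanced binary data set (~95% vs ~5%).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Compare no weighting against the two built-in weighting modes.
results = {}
for cw in (None, "balanced", "balanced_subsample"):
    clf = RandomForestClassifier(
        n_estimators=100, class_weight=cw, random_state=0
    )
    clf.fit(X_train, y_train)
    # Recall on the minority class (label 1) shows how many rare cases are caught.
    results[cw] = recall_score(y_test, clf.predict(X_test), pos_label=1)
    print(f"class_weight={cw!r}: minority recall = {results[cw]:.3f}")
```

You can also pass a dict such as {0: 1, 1: 10} to class_weight if you want to set the weights by hand. Whether any of these settings actually helps depends on your data, so it is worth checking on a held-out set.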

Written by Julia Kho
