Julia Kho
Mar 13, 2022

Hi Xin,

In sklearn’s Random Forest classifier, there is a hyperparameter called class_weight that can help with the imbalanced data set problem. You can use it to adjust how the classes are weighted, either based on the whole training set or within each tree’s bootstrap sample. See the link below. The statement “larger class will get a low error rate…” describes the problem that arises when you feed an imbalanced data set into a model: the model tends to favor the larger class. We want to avoid that scenario, which is why we use various methods to overcome the problems of imbalanced data sets.

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
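As a rough illustration of what the weighting does, here is a minimal pure-Python sketch. It assumes the formula the scikit-learn documentation gives for class_weight="balanced", n_samples / (n_classes * count_of_class). The helper balanced_class_weights and the 90/10 label list are hypothetical, for illustration only. The second part mimics the idea behind "balanced_subsample", where the weights are recomputed on each tree's bootstrap sample instead of on the full training set.

```python
from collections import Counter
import random


def balanced_class_weights(y):
    """Hypothetical helper: weight each class by
    n_samples / (n_classes * count_of_class), the formula documented
    for class_weight="balanced" in scikit-learn."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical imbalanced labels: 90 samples of class 0, 10 of class 1.
y = [0] * 90 + [1] * 10

# "balanced": weights computed once from the full set of training labels.
full_weights = balanced_class_weights(y)
print(full_weights)  # {0: 0.5555555555555556, 1: 5.0}

# "balanced_subsample" idea: weights recomputed on a bootstrap sample,
# drawn with replacement (seeded here so the result is repeatable).
rng = random.Random(0)
bootstrap = rng.choices(y, k=len(y))
subsample_weights = balanced_class_weights(bootstrap)
print(subsample_weights)
```

With these weights, both classes contribute the same total weight (90 × 0.556 ≈ 10 × 5.0 = 50), so the minority class is no longer drowned out. In scikit-learn you would not compute this yourself: passing class_weight="balanced" or class_weight="balanced_subsample" to RandomForestClassifier does it for you, and you can also pass an explicit dict such as {0: 1, 1: 9}.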

Written by Julia Kho

Julia is an analytics professional who loves to write easy-to-understand Python and data science articles for beginners.
