Saturday 13 September 2014

Random Forests

"Random Forest" is a trademarked term for an ensemble of decision trees.

Unlike single decision trees, which are likely to suffer from high variance or high bias (depending on how they are tuned), Random Forests use averaging to find a natural balance between the two extremes.

[Error due to bias: the difference between the expected (or average) prediction of our model and the correct value we are trying to predict.

Error due to variance: the variability of a model's prediction at a given data point.]
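
As a rough illustration of "variance at a given data point", the sketch below (assuming scikit-learn and NumPy, neither of which this post names) refits a deep tree on fresh noisy samples of the same underlying function and compares how much its prediction at one point moves around against an average of 25 such trees. The function name is made up for this example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def sample_fit_predict(x0=0.5, n=200):
    # Draw a fresh noisy dataset from the same true function, fit a deep
    # (unpruned) tree, and return its prediction at the point x0.
    X = rng.uniform(0, 1, size=(n, 1))
    y = np.sin(4 * X[:, 0]) + rng.normal(scale=0.3, size=n)
    tree = DecisionTreeRegressor().fit(X, y)
    return tree.predict([[x0]])[0]

# Spread of a single deep tree's prediction at x0 across refits (high variance)
single = [sample_fit_predict() for _ in range(200)]
# Spread of an average of 25 trees' predictions (variance shrinks with averaging)
averaged = [np.mean([sample_fit_predict() for _ in range(25)]) for _ in range(50)]

print("variance of a single tree's prediction:", np.var(single))
print("variance of an average of 25 trees:   ", np.var(averaged))
```
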

Bagging (bootstrap aggregation) is a technique for reducing the variance of an estimated prediction function.
Bagging works especially well for high-variance, low-bias procedures, such as trees.
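
A minimal sketch of the bagging recipe itself, assuming scikit-learn is available; `bagged_trees` and `bagged_predict` are illustrative names for this post, not a library API, and `X` and `y` are assumed to be NumPy arrays.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagged_trees(X, y, n_trees=100, seed=0):
    """Fit one deep tree per bootstrap resample of (X, y)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)  # bootstrap: draw n rows with replacement
        trees.append(DecisionTreeRegressor().fit(X[idx], y[idx]))
    return trees

def bagged_predict(trees, X):
    # Aggregation step: the bagged prediction is the average over all trees.
    return np.mean([t.predict(X) for t in trees], axis=0)
```

Because every tree sees a slightly different resample of the data, their individual errors partly cancel when averaged, which is where the variance reduction comes from.
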

Random Forest is a substantial modification of bagging that builds a large collection of de-correlated trees and then averages them (see the sketch after the list below).
Pros:
  • Accuracy
Cons:
  • Speed
  • Interpretability
  • Overfitting
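
To make the recipe concrete, here is a hedged sketch using scikit-learn's RandomForestClassifier on one of its bundled datasets (the post itself names no library or data). The `max_features="sqrt"` setting is the random feature subsampling at each split that de-correlates the trees, on top of the bootstrap resampling that bagging already does.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

forest = RandomForestClassifier(
    n_estimators=500,     # a large collection of trees
    max_features="sqrt",  # random feature subset per split -> de-correlated trees
    random_state=0,
)

# Averaging the de-correlated trees typically gives strong accuracy out of the box.
print("5-fold CV accuracy:", cross_val_score(forest, X, y, cv=5).mean())
```
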
Random forests are one of the two top-performing algorithms in prediction contests, along with boosting.

Random forests are difficult to interpret but often very accurate.

Care should be taken to avoid overfitting.
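
One way (not prescribed by this post) to keep an eye on overfitting without holding out a separate validation set is the forest's out-of-bag estimate: each tree is scored on the training rows its bootstrap sample left out. A sketch, again assuming scikit-learn:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

forest = RandomForestClassifier(
    n_estimators=300, oob_score=True, random_state=0
).fit(X, y)

print("train accuracy:", forest.score(X, y))  # near 1.0: memorizes the training set
print("OOB accuracy:  ", forest.oob_score_)   # more honest estimate of test accuracy
```

A large gap between the two numbers is the overfitting to watch for.
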


