Pierre Geurts, Cristina Olaru, Louis Wehenkel
One of the main difficulties with standard top-down induction of decision trees comes from the high variance of these methods. High variance means that, for a given problem and sample size, the resulting tree depends strongly on the random nature of the particular sample used for training. Consequently, these algorithms tend to be suboptimal in terms of both accuracy and interpretability. This paper analyses this problem in depth and proposes a new method, relying on threshold softening, that significantly improves the bias/variance tradeoff of decision trees. The algorithm is validated on a number of benchmark problems, and its relationship with fuzzy decision tree induction is discussed. This sheds some light on the success of fuzzy decision tree induction and improves our understanding of machine learning in general.

Keywords: decision trees, variance, threshold softening
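To give a rough intuition for threshold softening, the sketch below contrasts a crisp decision-tree test with a softened one in which an example's weight is shared between the two branches according to its distance from the threshold. This is only an illustrative sketch of the general idea, not the paper's algorithm; the sigmoid form and the `width` parameter are assumptions made here for illustration.

```python
import math

def hard_split(x, threshold):
    """Classical crisp test: the example's full weight goes to one branch.
    Returns (left_weight, right_weight)."""
    return (1.0, 0.0) if x < threshold else (0.0, 1.0)

def soft_split(x, threshold, width):
    """Softened test (illustrative): weight is shared between branches via a
    sigmoid whose slope is set by the hypothetical `width` parameter.
    A larger width gives a softer transition; width -> 0 recovers the
    crisp split. Returns (left_weight, right_weight)."""
    p_right = 1.0 / (1.0 + math.exp(-(x - threshold) / width))
    return (1.0 - p_right, p_right)
```

With a crisp split, an example just below the threshold is routed entirely left, so a small perturbation of the training sample (and hence of the learned threshold) can flip its routing completely; with the softened split, an example exactly at the threshold receives weight 0.5 in each branch, and routing changes smoothly with the threshold, which is the mechanism by which softening reduces variance.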