Data Mining

This project started out as the thesis I wrote to complete my master's degree in statistics at KUL in Leuven, Belgium. The thesis compared free data mining tools with their commercial alternatives, with a focus on decision tree (classification tree) methods. The end goal was to identify free data mining tools that are suitable for teaching as well as for practical and research purposes.

Although other software is reviewed in this project, the focus is on the Weka data mining system. Weka is built by people who donate their time and effort freely, and it is offered freely to the public as well. It is in this spirit that I would like to contribute by continuing this documentation project and making it available to the public.

The information currently online is focused on classification tree methods, as this was the focus of my thesis. However, Weka has a very simple interface, and the steps a user follows are much the same for most classification methods. This makes the information in this project applicable to most classification tasks.

Here you will find documentation about the different interfaces of Weka and how they apply to classification tasks. Moreover, since a picture is worth a thousand words, I also plan to add a list of video sequences that show Weka in action.

I should also mention that a very good starting point for Weka, and for data mining in general, is the book Data Mining: Practical Machine Learning Tools and Techniques by Ian H. Witten and Eibe Frank. This book offers a wealth of information. More importantly, it presents the many complex concepts in data mining in an understandable and accessible way. I highly recommend it to anyone who wants to learn more about Weka and data mining.