Conclusion
Many free data mining software packages are available, and they are continually updated with new features, algorithms and functionality. In many areas, free software has proven itself to be a viable (if not superior) alternative to commercial software. Data mining software is no different: the free packages covered in this project are very good alternatives, if not superior ones, to the commercial packages SAS and SPLUS.
The two software packages, Shih and Weka, were chosen because they are easy enough to use for instructional purposes and are well documented. Weka is also suitable for practical data mining, as it implements many of the most current data mining methods.
Shih Data Miner
Shih is a very easy-to-use data mining tool. It specializes in decision tree methods, and its interactive tree building method is both easy to use and very informative. In general, it has good visualization tools, and the user will have little difficulty in understanding its outputs. The documentation is extensive and clear, and it is a good source of help.
For these reasons, Shih is well suited for instructional purposes such as a classroom setting. Trees can be built automatically or interactively to highlight the whole tree building process, which makes Shih better suited for instructional use than SAS or SPLUS.
Weka Learning Environment
Compared with Shih, Weka is a lot more complex, with its three interfaces and its many options, methods and functions. Much of this complexity, however, is hidden and revealed only when needed. The KnowledgeFlow interface is quite similar to SAS in how experiments are defined, and it is quite simple to use. Having three interfaces also shields the user from unnecessary complexity. For example, a user who only wants to work on one model at a time can use the Explorer interface and never has to look at anything else.
Instructional Use:
The Explorer interface is best suited for this purpose. It is very easy to use but still quite powerful, as all the learning methods and filters of Weka are accessible from it. As long as the goal is to work with one learning model at a time, there is no need to consider the other interfaces of Weka.
When the goal is to compare different models, the Experimenter is the appropriate interface, and for simple experiments it is not difficult to use. If experiments are to be defined as a network flow model, as in SAS, the KnowledgeFlow interface is an excellent solution.
Practical/Research Use:
With its very current learning methods, Weka offers powerful data mining solutions and is well suited for real-world data mining problems. Weka is an open source project, so its source code is available to the public. This makes it excellent for research purposes: one can obtain the source and is free to modify it, not only to implement new methods but also to modify or extend existing ones.
Documentation:
The documentation is quite extensive and is even available within the software. Another source of help is the book Data Mining: Practical Machine Learning Tools and Techniques by Ian H. Witten and Eibe Frank. It is based on Weka and is an excellent guide to both the software and data mining in general. Lastly, there is a very active community of users and software developers who are a good source of help. There is also a mailing list archive that contains questions users have asked about Weka and the answers they received.