Getting Started
Getting Weka & Licensing:
Weka is available for download from the University of Waikato website at the following URL:
http://www.cs.waikato.ac.nz/~ml/index.html
Weka is free, open-source software released under the GNU General Public License (GPL). This means that Weka is not only free to download and use, but that its source code is also available to be freely modified and used. This makes Weka well suited to the university environment for research and teaching purposes.
The only restrictions concern how the software may be distributed to the public. In short, anyone who distributes GPL software, whether free of charge or commercially, must pass on the same rights they received: the software, together with any modifications made to it, must be distributed under the GPL with its source code available. A full description of the license can be obtained at this website:
http://www.gnu.org/copyleft/gpl.html
Hardware and Software Requirements:
Since Weka is written in pure Java, it is able to run on most hardware and operating systems. Naturally, the hardware requirements vary with the nature and size of the problem, but the basic requirements to run Weka are quite small, and it should run even on older hardware. For the purposes of this project, we will be using Weka in a Windows environment.
For large problems, the data can be analyzed on a remote machine or by several machines in parallel. An alternate version of Weka called Grid Weka makes it easy to use the resources of several computers at once. This is done by setting up Weka servers, among which the work is distributed transparently. Multiple servers can be run on separate machines and, where a machine has multiple processors, several servers can be set up on the same machine.
This version can be obtained along with documentation on the website below:
http://smi.ucd.ie/~rinat/weka/
Installation and Launching the Program:
Installation is easy in Windows, as Weka comes with its own self-installer. The program can be launched by double-clicking “runweka.bat” in the program directory or, more easily, by using the Start menu shortcut.
Program Launching:
When the program is first launched, the user is shown two windows, the Log Window and the GUI (Graphical User Interface) Chooser. These two windows are shown next.
The Log Window:
The purpose of the log window (right) is to provide a detailed report of any errors or operations taking place. For our purposes this window will not be useful and can be minimized out of view.
The GUI Chooser:
The GUI chooser is used to start different interfaces of the Weka Environment. These interfaces can be considered different programs and they vary in form, function and purpose. Which interface is most appropriate depends on the specific need, whether it be simple data exploration, detailed experimentation or tackling very large problems.
This project will mainly focus on the Explorer, Experimenter and KnowledgeFlow interfaces of the Weka Environment. They can be launched by clicking the corresponding buttons at the bottom of the window.
The other interface, Simple CLI, is also available and will be examined briefly.
Simple CLI (Command Line Interface):
The CLI is a text-based interface to the Weka Environment.
It is the most memory efficient interface available in Weka.
The interface is simple only in the sense that it is quite barren. There are no buttons or menus to navigate.
However, this simplicity comes at the cost of ease of use. Commands are entered in the white box at the bottom of the window and program responses are shown in the grey box.
This interface, while quite powerful, is also the least intuitive one to use. However, the commands themselves are not difficult, as Weka is organized in a very natural hierarchical way. For example, to use the J48 classifier on the dataset iris.arff, one would enter the command:
java weka.classifiers.trees.J48 -t ./Data/iris.arff
All Weka commands follow a similar structure: “java”, followed by a Weka environment object name, and lastly the options for that object. An object represents a certain function of Weka. In the above example, we wanted to use the J48 classifier, so the object name is “weka.classifiers.trees.J48”, and the option “-t” names the training file. All tree-based classifiers have object names that begin with “weka.classifiers.trees”. Similarly, all rule-based classifiers have names that begin with “weka.classifiers.rules”.
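The command structure described above can be sketched in a few lines of Java. This is only an illustration of the naming scheme, not part of Weka: the class WekaCommand and its build method are hypothetical helpers invented for this example, and the only names taken from the text are the “weka.classifiers” prefix, the “trees” and “rules” families, J48 and the iris.arff path.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of Weka) that assembles Simple CLI style commands. */
final class WekaCommand {
    // Package prefix shared by the classifier object names mentioned in the text.
    static final String CLASSIFIER_PREFIX = "weka.classifiers";

    /**
     * Builds "java <object name> <options...>".
     * family is the sub-package, such as "trees" or "rules";
     * name is the classifier itself, such as "J48".
     */
    static String build(String family, String name, String... options) {
        List<String> parts = new ArrayList<>();
        parts.add("java");
        parts.add(CLASSIFIER_PREFIX + "." + family + "." + name);
        parts.addAll(Arrays.asList(options));
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        // Reproduces the command from the text: J48 with iris.arff as training file (-t).
        System.out.println(build("trees", "J48", "-t", "./Data/iris.arff"));
    }
}
```

Running the sketch prints the same command that was entered by hand above. Under this scheme, switching to a rule-based classifier would change only the family and name arguments, while the options keep their position at the end.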
Using the CLI is the best way to learn the intricate details of how Weka is actually designed and how it works. This is important if one intends to use Weka internally in an application being developed.
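As a rough illustration of what using Weka internally might look like, the sketch below runs the same object name and options from inside a Java program instead of typing them into the CLI. It is a minimal sketch under stated assumptions, not the recommended way to embed Weka: the class WekaLauncher and its run method are hypothetical, the Weka jar is assumed to be on the classpath, and the class is looked up by name through reflection so that the example still compiles and runs (reporting that Weka is missing) when it is not.

```java
import java.lang.reflect.Method;

/** Hypothetical launcher (not part of Weka) that runs a Weka class by its object name. */
final class WekaLauncher {
    /**
     * Looks up the named class and calls its static main(String[]) with the options.
     * Returns false if the class is not on the classpath, true once main has returned.
     */
    static boolean run(String objectName, String[] options) throws ReflectiveOperationException {
        Class<?> cls;
        try {
            cls = Class.forName(objectName);
        } catch (ClassNotFoundException e) {
            return false; // assumption not met: Weka is not on the classpath
        }
        Method main = cls.getMethod("main", String[].class);
        // The cast passes the whole array as the single argument of main.
        main.invoke(null, (Object) options);
        return true;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        // Same object name and options as the CLI command shown earlier.
        boolean found = run("weka.classifiers.trees.J48",
                            new String[] {"-t", "./Data/iris.arff"});
        if (!found) {
            System.out.println("Weka classes not found; add the Weka jar to the classpath.");
        }
    }
}
```

The point of the sketch is that the object name typed into the CLI is an ordinary Java class name, so an application can reach the same functionality directly. A real application would more likely call the Weka classes themselves rather than going through main.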