Setting Up the Experiment

To demonstrate the KnowledgeFlow interface, we will repeat the experiment previously run in the Experimenter interface. Unlike that experiment, however, we will start with the raw CSV file and preprocess it as we did in the Explorer interface.

Basic Experiment Setup

The first step of the basic experiment setup is to define how the data will be brought in. The second step is to define what (if any) preprocessing needs to take place. The third is to define how the model will be trained and tested and which learning scheme will be used to build it. The final step is to define how the results of the experiment will be displayed and/or saved.

The Birth-weight Example

For our experiment, we first require a Datasource object to bring in the data. Since we are getting the data from a CSV file, we will add a CSV Loader object to the layout and configure it to load the ‘birth.csv’ file, which contains the birth-weight data.

 

When we examined this data file using the Explorer, we had to drop some unnecessary attributes and also convert some nominal attributes that Weka did not recognize as such. We will deal with the second problem first by adding the unsupervised Discretize filter and configuring it to discretize attributes 4 to 8 and 10 (smoke, ptl, ht, ui, ftv and weight). To drop the unnecessary attributes, we will use the Remove filter and configure it to remove attributes 1 and 9 (bwt and id).
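To make the index arithmetic concrete, the following is a minimal sketch in plain Java (it does not use Weka) of how a 1-based range such as "4-8,10" expands and which attributes remain after attributes 1 and 9 are removed. The class and method names are hypothetical, the parser is an illustration rather than Weka's own range handling, and the total of 10 attributes is an assumption inferred from weight being attribute 10 and the last attribute.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not part of Weka.
final class AttributeRangeSketch {

    // Expands a 1-based range string such as "4-8,10" into sorted indices.
    static List<Integer> expand(String spec) {
        TreeSet<Integer> indices = new TreeSet<>();
        for (String part : spec.split(",")) {
            String[] bounds = part.trim().split("-");
            int first = Integer.parseInt(bounds[0].trim());
            int last = Integer.parseInt(bounds[bounds.length - 1].trim());
            for (int i = first; i <= last; i++) {
                indices.add(i);
            }
        }
        return new ArrayList<>(indices);
    }

    // Returns the original 1-based indices that survive removal, in order.
    static List<Integer> remaining(int attributeCount, String removeSpec) {
        List<Integer> removed = expand(removeSpec);
        List<Integer> kept = new ArrayList<>();
        for (int i = 1; i <= attributeCount; i++) {
            if (!removed.contains(i)) {
                kept.add(i);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Attributes selected for the Discretize step
        System.out.println(expand("4-8,10"));
        // Attributes left after the Remove step (assuming 10 attributes in total)
        List<Integer> kept = remaining(10, "1,9");
        System.out.println(kept);
        // New 1-based position of original attribute 10 (weight)
        System.out.println((kept.indexOf(10) + 1) + " of " + kept.size());
    }
}
```

Under these assumptions, the original attribute 10 (weight) ends up in the final position after the removal, which is why it can later be selected as the last attribute.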

 

Both filters require a dataset as input and produce a dataset as output. We will first connect the CSV Loader object to the Discretize object and then connect the Discretize object to the Remove object. We also need to ensure that weight is specified as the class variable.

 

The way to do this is to add the Class Assigner object and set it to use the last attribute (weight) as the class variable. This object is located in the Evaluator tab in the object selection area. It requires a dataset as input and produces a dataset as output, so we connect it accordingly. The experiment layout at this point should look something like the picture above.

 


The next step is to add our classifiers and connect the training and testing datasets to them. For this experiment, we will use the J48 (with binary splits) and REPTree algorithms. They are found in the Classifiers tab in the object selection area. After adding them to the layout, we will connect them to the CrossValidation FoldMaker object. This is shown in the picture on the right.
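The idea behind producing a training set and a testing set for each fold can be sketched in plain Java as follows. This is only an illustration of k-fold splitting over instance indices, with hypothetical class and method names; it uses contiguous blocks with no shuffling or stratification and is not a description of how Weka's fold maker is implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Weka.
final class FoldMakerSketch {

    // Test indices for one fold: a contiguous block; fold sizes differ by at most one.
    static List<Integer> testFold(int n, int k, int fold) {
        int base = n / k;
        int extra = n % k; // the first `extra` folds receive one additional instance
        int start = fold * base + Math.min(fold, extra);
        int size = base + (fold < extra ? 1 : 0);
        List<Integer> test = new ArrayList<>();
        for (int i = start; i < start + size; i++) {
            test.add(i);
        }
        return test;
    }

    // Training indices for one fold: every instance outside that fold's test block.
    static List<Integer> trainFold(int n, int k, int fold) {
        List<Integer> test = testFold(n, k, fold);
        List<Integer> train = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            if (!test.contains(i)) {
                train.add(i);
            }
        }
        return train;
    }

    public static void main(String[] args) {
        int n = 10; // illustrative instance count, not the size of birth.csv
        int k = 3;  // illustrative number of folds
        for (int fold = 0; fold < k; fold++) {
            System.out.println("fold " + fold
                    + " train=" + trainFold(n, k, fold)
                    + " test=" + testFold(n, k, fold));
        }
    }
}
```

Each classifier is trained on the training indices of a fold and evaluated on that fold's test indices, so every instance is used for testing exactly once across the k folds.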

We also need a way to measure the performance of these classifiers, so we will add a Classifier PerformanceEvaluator object to each classifier. Finally, we need a way to visualize the performance of both models, so we will add a graphical visualizer and a text visualizer.

 

The Model PerformanceChart object is a graphical visualizer that takes threshold data as input and produces ROC-style curves. It can accept input from more than one classifier, so both curves can be viewed on the same graph.
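As a rough illustration of what threshold data means, the sketch below computes ROC points in plain Java by lowering a decision threshold through the predicted scores one instance at a time. The class name, method name and sample values are hypothetical, and the code assumes that both classes are present and that no two scores are tied; it is not the routine Weka uses to build its curves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Weka.
final class RocSketch {

    // Returns ROC points as {falsePositiveRate, truePositiveRate}, starting at (0, 0).
    static List<double[]> rocPoints(double[] scores, boolean[] positive) {
        Integer[] order = new Integer[scores.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Visit instances from the highest score to the lowest.
        Arrays.sort(order, (a, b) -> Double.compare(scores[b], scores[a]));

        int totalPos = 0;
        for (boolean p : positive) {
            if (p) {
                totalPos++;
            }
        }
        int totalNeg = positive.length - totalPos;

        List<double[]> points = new ArrayList<>();
        points.add(new double[] {0.0, 0.0});
        int tp = 0;
        int fp = 0;
        for (int idx : order) {
            // Lowering the threshold past this instance classifies it as positive.
            if (positive[idx]) {
                tp++;
            } else {
                fp++;
            }
            points.add(new double[] {(double) fp / totalNeg, (double) tp / totalPos});
        }
        return points;
    }

    public static void main(String[] args) {
        double[] scores = {0.9, 0.8, 0.4, 0.3};       // made-up predicted scores
        boolean[] positive = {true, false, true, false}; // made-up true labels
        for (double[] p : rocPoints(scores, positive)) {
            System.out.println("FPR=" + p[0] + " TPR=" + p[1]);
        }
    }
}
```

Plotting the point lists for two classifiers on the same axes gives the kind of side-by-side comparison described above.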

 

The graph displayed is identical to the threshold curve we looked at in the Explorer interface, except that it can display multiple curves. Similar functionality is planned for the cost and margin curve functions but has not yet been implemented. The TextViewer object is a general-purpose text viewer that takes any text as input. Here it will display a summary of the performance of the two models.

 

The experiment setup is now complete. We can start it by right-clicking on the first object, the CSV Loader, and selecting ‘Start Loading’ from the menu that pops up. The running state of the experiment should be displayed in the status area as well as in the log window, but neither works in this version of Weka. To look at the results, we can right-click on the visualization objects and select ‘Show Plot’ or ‘Show Results’. Both displays keep the results of past experiment runs, which can be recalled and viewed again.