Comp578                                                                                                                     Susan Portugal Fall 2008

Assignment 5                                                                           October 2, 2008

Consider a binary classification problem with the following set of attribute and attribute values:

                     • Air Conditioner = {Working, Broken}
                     • Engine = { Good, Bad}
                     • Mileage = {High, Medium, Low}
                     • Rust = {Yes, No}
Suppose a rule-based classifier produces the following rule set:
                                        Mileage = High Value = Low
                                        Milage = Low  Value = High
                                        Air Conditioner = Working, Engine = Good  Value = High
                                        Air Conditioner = Working, Engine = Bad Value = Low
                                        Air Conditioner = Broken  Value = Low


(a) Are the rules mutually exclustive?

The rules are not mutually exclusive.  A rule set is mutually exclusive if no two rules in the set are triggered by the same record.  The condition of a car with a broken air condition and has low miles is not present within the rule set; a broken air condition leads to a low value car and car with low miles leads to a high value car.

(b) Is the rule set exhaustive?

The rule set is not exhausive due to the rule set has more than one attribute for one category.  There are three conditions for airconditioner and two for milleage.  The rule set is also missing the rust attribute and how it affects the value of the car.

(c) Is ordering needed for this set of rules?

If order was implemented into the rule set then the user would have an easier time classifying the value of a car.  Ordering the rule set would also help make sure that the set would be mutually exclusive.  Another attribute that the rule set can take into account would be rust.  High mileage is one attribute that can not be fixed where the the other attibutes can be fix though they do cost money on the end user.  The other attributes can classify if the car is truly high or low.  For instance, if a car has a working air condition, good engine, no rust, but high miles then this car can be classified as an average car value.

(d) Do you need a default class for the rule set?

“A default rule has an empty anecedent and is triggered when all other rules have failed.”  A default rule would help in the area of the rule set not being mutually exclusive and order would help in cover all situation giving the default high or low value based on mileage.  The other attributes would classify if the value is truly low or average.

The RIPPER algorithm is an extension of an earlier algorithm called IREP.   Both algorithms apply the reduced-error pruning method to determine whether a rule needs to be pruned,  The reduced error pruning method uses a validation set to estimate the generalization error of a classifier.  Consider the following pair of rules:

                                      R1: A C
                                      R2: A Λ B C

R2 is obtained by adding a new conjunct, B, to the left-hand side of R1.  For this question, you will be asked to determine whether R2 is preferred over R1 from the perspectives of rule-growing and rule-pruning.  To determine whether a rule should be pruned, IREP computer the following measure:
                                     VIREP [p + (N – n)] / [P + N]

where P is the total number of positive examples in the validation set, N is the total number of negative examples in the validation set, p is the number of positive examples in the validation set covered by the rule, and n is the number of negative examples in the validation set covered by the rule.  VIREP is actually similar to the classification accuracy for the validation set.  IREP favors rules that have higher values of vIREP.  On the other hand, RIPPER applies the following measure to determine whether a rule should be pruned:

                                     VRIPPER =  [p-n]/[ p+n]

(a)             Suppose R1 is covered by 350 positive examples and 150 negative examples, while R2 is covered by 300 positive examples and 50 negative examples.  Compute the FOIL’s information gain for the rule R2 with respect to R1.

(b)              Consider a validation set that contains 500 positive examples and 500 negative examples.  For R1, suppose the number of positive examples covered by the rule is 200, and the number of negative examples covered by the rule is 50.  For R2, suppose the number of positive examples covered by the rule is 100 and the number of negative examples is 5.  Compute the vIREP for both rules.   While rule does IREP prefer?

(c)               Compute vRIPPER for the previous problem.  Which rule does RIPPER prefer?

Consider the data set shown in Table 5.10

(a)             Estimate the conditional probabilities for P(A|+), P(B|+), P(C|+), P(A|-), P(B|-) and     P(C|-).

(b)            Use the estimate of conditional probabilities given in the previous question to predict the class label for a test sample (A = 0, B = 1, C = 0) using the naïve Bayes approach.

(c)             Estimate the conditional probabilities using the m-estimate approach, with p = 1.2 and   m = 4.

(d)            Repeat part (b) using the conditional probabilities given in part (c).

(e)            Compare the 2 methods for estimating probabilities.  Which method is better and why?

When comparing the two methods for estimating probabilities, the m-estimate approach is a better method. Due to the small sample size this m-estimated approach is preferred even though additional information (p and m) is needed to find one's results.