spam7 | R Documentation |
The data consist of 4601 email items, of which 1813 were identified as spam. This is a subset of the full dataset, with only six of the 57 explanatory variables in the complete dataset.
spam7
Columns included are:
crl.tot: total length of uninterrupted sequences of capitals
dollar: occurrences of the dollar sign, as a percent of the total number of characters
bang: occurrences of ‘!’, as a percent of the total number of characters
money: occurrences of ‘money’, as a percent of the total number of words
n000: occurrences of the string ‘000’, as a percent of the total number of words
make: occurrences of ‘make’, as a percent of the total number of words
yesno: outcome variable, a factor with levels n (not spam) and y (spam)
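The column layout can be checked directly against the data. The following is a minimal sketch, assuming that spam7 is supplied by the DAAG package (the package is not named on this page, so treat that as an assumption). According to the description, the data frame should have 4601 rows, of which 1813 have the outcome level y.

```r
# Assumption: spam7 is provided by the DAAG package.
library(DAAG)

dim(spam7)            # number of rows and columns
sapply(spam7, class)  # six numeric predictors plus one factor outcome
table(spam7$yesno)    # counts of not spam (n) and spam (y)
```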
George Forman, Hewlett-Packard Laboratories
The complete dataset and its documentation are available from the Spam database.
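library(rpart)  # recursive partitioning (classification trees)
# Fit a classification tree for spam status using all six predictors
spam.rpart <- rpart(formula = yesno ~ crl.tot + dollar + bang +
                    money + n000 + make, data = spam7)
plot(spam.rpart)  # draw the tree skeleton
text(spam.rpart)  # label the splits and leaves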
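A plotted tree gives little sense of how well the fit separates the two classes. As one possible follow-up, the sketch below refits the same model, prints the complexity parameter table, and tabulates observed against fitted classes on the training data. The names pred, conf_mat and train_acc are illustrative, and loading spam7 from DAAG is again an assumption.

```r
library(rpart)
library(DAAG)  # assumption: supplies spam7

# Same model as above; method = "class" makes the tree type explicit
spam.rpart <- rpart(yesno ~ crl.tot + dollar + bang + money + n000 + make,
                    data = spam7, method = "class")

printcp(spam.rpart)  # cross-validated error for each tree size

# Fitted classes for the training data (no newdata supplied)
pred <- predict(spam.rpart, type = "class")
conf_mat <- table(observed = spam7$yesno, predicted = pred)
conf_mat

# Proportion of training items classified correctly
train_acc <- sum(diag(conf_mat)) / sum(conf_mat)
train_acc
```

Training accuracy is optimistic because the same items are used for fitting and for assessment; the xerror column printed by printcp is the more cautious guide when choosing a tree size.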