spark.fpGrowth {SparkR} | R Documentation |
A parallel FP-growth algorithm to mine frequent itemsets.
spark.fpGrowth fits an FP-growth model on a SparkDataFrame. Users can call spark.freqItemsets to get frequent itemsets, spark.associationRules to get association rules, predict to make predictions on new data based on the generated association rules, and write.ml/read.ml to save/load fitted models.
For more details, see FP-growth.
spark.fpGrowth(data, ...)

spark.freqItemsets(object)

spark.associationRules(object)

## S4 method for signature 'SparkDataFrame'
spark.fpGrowth(data, minSupport = 0.3, minConfidence = 0.8,
  itemsCol = "items", numPartitions = NULL)

## S4 method for signature 'FPGrowthModel'
spark.freqItemsets(object)

## S4 method for signature 'FPGrowthModel'
spark.associationRules(object)

## S4 method for signature 'FPGrowthModel'
predict(object, newData)

## S4 method for signature 'FPGrowthModel,character'
write.ml(object, path, overwrite = FALSE)
data | A SparkDataFrame for training.
... | Additional argument(s) passed to the method.
object | A fitted FPGrowth model.
minSupport | Minimum support level: the fraction of transactions that must contain an itemset for it to count as frequent. Default is 0.3.
minConfidence | Minimum confidence level for generating association rules. Default is 0.8.
itemsCol | Name of the column that holds the items (features). Default is "items".
numPartitions | Number of partitions used for fitting. Default is NULL.
newData | A SparkDataFrame for testing.
path | The directory where the model is saved.
overwrite | Logical value indicating whether to overwrite if the output path already exists. Default is FALSE, which means an exception is thrown if the output path exists.
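To make minSupport and minConfidence concrete, the following base R sketch computes support and confidence by hand on a toy set of transactions. It does not use SparkR: the transactions and the helper functions support and confidence are hypothetical names introduced only for illustration, and the sketch follows the standard definitions (support as a fraction of transactions, confidence as a ratio of supports) rather than Spark's implementation.

```r
# Toy transactions (hypothetical data, for illustration only).
transactions <- list(
  c("a", "b", "c"),
  c("a", "b"),
  c("a", "c"),
  c("b", "c"),
  c("a", "b", "d")
)

# Support: fraction of transactions that contain every item in `itemset`.
support <- function(itemset, transactions) {
  hits <- vapply(transactions, function(t) all(itemset %in% t), logical(1))
  mean(hits)
}

# Confidence of the rule antecedent => consequent:
# support(antecedent and consequent together) / support(antecedent).
confidence <- function(antecedent, consequent, transactions) {
  support(union(antecedent, consequent), transactions) /
    support(antecedent, transactions)
}

sup_ab   <- support(c("a", "b"), transactions)   # 3 of 5 transactions
conf_a_b <- confidence("a", "b", transactions)   # (3/5) / (4/5)
```

Here the itemset {a, b} has support 0.6, so it clears the default minSupport of 0.3. The rule a => b has confidence of about 0.75, which falls below the default minConfidence of 0.8, so a rule like this would only appear after lowering minConfidence.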
spark.fpGrowth returns a fitted FPGrowth model.
spark.freqItemsets returns a SparkDataFrame with frequent itemsets. The SparkDataFrame contains two columns: items (an array of the same type as the input column) and freq (frequency of the itemset).
spark.associationRules returns a SparkDataFrame with association rules. The SparkDataFrame contains three columns: antecedent (an array of the same type as the input column), consequent (an array of the same type as the input column), and confidence (confidence).
predict returns a SparkDataFrame containing predicted values.
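As a rough picture of how predictions follow from association rules, the base R sketch below applies a rule whenever all of its antecedent items are present in an itemset, and collects the consequents that the itemset does not already contain. This assumes the behaviour described in the Spark ML FP-growth guide. The rules list and the predict_items helper are hypothetical and are not part of SparkR.

```r
# Hypothetical rules, each an antecedent/consequent pair (illustration only).
rules <- list(
  list(antecedent = c("t"),      consequent = c("s")),
  list(antecedent = c("t", "s"), consequent = c("y")),
  list(antecedent = c("x"),      consequent = c("z"))
)

# Collect consequents of every rule whose antecedent is fully contained
# in `items`, skipping items that are already present.
predict_items <- function(items, rules) {
  predicted <- character(0)
  for (rule in rules) {
    if (all(rule$antecedent %in% items)) {
      predicted <- union(predicted, setdiff(rule$consequent, items))
    }
  }
  predicted
}

pred_t  <- predict_items(c("t"), rules)        # only the first rule applies
pred_ts <- predict_items(c("t", "s"), rules)   # first two rules apply
```

With these toy rules, the itemset {t} yields the prediction s, while {t, s} yields y, because s is already in the itemset and is therefore not repeated.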
spark.fpGrowth since 2.2.0
spark.freqItemsets(FPGrowthModel) since 2.2.0
spark.associationRules(FPGrowthModel) since 2.2.0
predict(FPGrowthModel) since 2.2.0
write.ml(FPGrowthModel, character) since 2.2.0
## Not run: 
# Read one transaction per line; items are separated by spaces
raw_data <- read.df(
  "data/mllib/sample_fpgrowth.txt",
  source = "csv",
  schema = structType(structField("raw_items", "string")))

# Split each line into an array column named "items" (the default itemsCol)
data <- selectExpr(raw_data, "split(raw_items, ' ') as items")
model <- spark.fpGrowth(data)

# Show frequent itemsets
frequent_itemsets <- spark.freqItemsets(model)
showDF(frequent_itemsets)

# Show association rules
association_rules <- spark.associationRules(model)
showDF(association_rules)

# Predict on new data
new_itemsets <- data.frame(items = c("t", "t,s"))
new_data <- selectExpr(createDataFrame(new_itemsets), "split(items, ',') as items")
predict(model, new_data)

# Save and load model
path <- "/path/to/model"
write.ml(model, path)
read.ml(path)

# Optional arguments: fit on a differently named items column
baskets_data <- selectExpr(createDataFrame(new_itemsets), "split(items, ',') as baskets")
another_model <- spark.fpGrowth(baskets_data, minSupport = 0.1, minConfidence = 0.5,
                                itemsCol = "baskets", numPartitions = 10)
## End(Not run)