group_by {SparkR}    R Documentation

GroupBy

Description

Groups the SparkDataFrame by the specified columns so that aggregations can be run on the resulting groups.

Usage

group_by(x, ...)

groupBy(x, ...)

## S4 method for signature 'SparkDataFrame'
groupBy(x, ...)

## S4 method for signature 'SparkDataFrame'
group_by(x, ...)

Arguments

x

a SparkDataFrame.

...

variable(s) (character name(s) or Column(s)) to group on.
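
As a rough illustration of the two argument forms, the sketch below groups the same data once by character names and once by Column objects. It assumes a local Spark installation reachable through sparkR.session(); the sample data and the variable names (local_df, by_name, by_column) are hypothetical and not part of this page.

```r
library(SparkR)

# Assumption: Spark is installed locally and can be started in local mode.
sparkR.session(master = "local[1]")

# Hypothetical sample data, used only for illustration.
local_df <- data.frame(
  department = c("eng", "eng", "ops"),
  gender = c("F", "M", "F"),
  salary = c(100, 120, 90),
  stringsAsFactors = FALSE
)
df <- createDataFrame(local_df)

# Form 1: grouping columns given as character names.
by_name <- groupBy(df, "department", "gender")

# Form 2: the same grouping columns given as Column objects.
by_column <- groupBy(df, df$department, df$gender)

# Count the rows in each group and bring the results back to local R.
counts_by_name <- collect(count(by_name))
counts_by_column <- collect(count(by_column))

sparkR.session.stop()
```

Both calls should describe the same grouping. Names are convenient for interactive use, while Column objects fit code that already works with column expressions.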

Value

A GroupedData.
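
The returned GroupedData is an intermediate object rather than a result: it is typically passed to an aggregation such as agg, which produces a new SparkDataFrame. The following is a minimal sketch of that flow, assuming a local Spark installation; the sample data and the names local_df, gd, and avg_salary are hypothetical.

```r
library(SparkR)

# Assumption: Spark is installed locally and can be started in local mode.
sparkR.session(master = "local[1]")

# Hypothetical sample data, used only for illustration.
local_df <- data.frame(
  department = c("eng", "eng", "ops"),
  salary = c(100, 120, 90),
  stringsAsFactors = FALSE
)
df <- createDataFrame(local_df)

# groupBy returns a GroupedData object; no aggregation has been computed yet.
gd <- groupBy(df, "department")
is_grouped <- is(gd, "GroupedData")

# Aggregating the GroupedData yields a SparkDataFrame. Naming the argument
# gives the aggregated column an alias (here, avg_salary).
aggregated <- agg(gd, avg_salary = avg(df$salary))

# Collect to a local data.frame and sort in R so the row order is stable.
result <- collect(aggregated)
result <- result[order(result$department), ]

sparkR.session.stop()
```

Sorting after collect is a deliberate choice here, since the order of groups returned by Spark is not guaranteed.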

Note

groupBy since 1.4.0

group_by since 1.4.0

See Also

Other SparkDataFrame functions: SparkDataFrame-class, agg, arrange, as.data.frame, attach,SparkDataFrame-method, cache, checkpoint, coalesce, collect, colnames, coltypes, createOrReplaceTempView, crossJoin, dapplyCollect, dapply, describe, dim, distinct, dropDuplicates, dropna, drop, dtypes, except, explain, filter, first, gapplyCollect, gapply, getNumPartitions, head, hint, histogram, insertInto, intersect, isLocal, isStreaming, join, limit, merge, mutate, ncol, nrow, persist, printSchema, randomSplit, rbind, registerTempTable, rename, repartition, sample, saveAsTable, schema, selectExpr, select, showDF, show, storageLevel, str, subset, take, toJSON, union, unpersist, withColumn, with, write.df, write.jdbc, write.json, write.orc, write.parquet, write.stream, write.text

Examples

## Not run: 
# Compute the average for all numeric columns grouped by department.
avg(groupBy(df, "department"))

# Compute the max age and average salary, grouped by department and gender.
agg(groupBy(df, "department", "gender"), salary = "avg", age = "max")
## End(Not run)

[Package SparkR version 2.2.1 Index]