subset {SparkR}    R Documentation
Return subsets of a SparkDataFrame according to the given conditions.
Usage

subset(x, ...)

## S4 method for signature 'SparkDataFrame,numericOrcharacter'
x[[i]]

## S4 replacement method for signature 'SparkDataFrame,numericOrcharacter'
x[[i]] <- value

## S4 method for signature 'SparkDataFrame'
x[i, j, ..., drop = F]

## S4 method for signature 'SparkDataFrame'
subset(x, subset, select, drop = F, ...)
Arguments

x: a SparkDataFrame.

...: currently not used.

i, subset: (optional) a logical expression to filter on rows. For the extract operator [[ and the replacement operator [[<-, the indexing parameter for a single Column.
value: a Column, or an atomic vector of length 1 used as a literal value, or NULL. If NULL, the specified Column is dropped.
j, select: an expression for a single Column or a list of columns to select from the SparkDataFrame.

drop: if TRUE, a Column will be returned if the resulting dataset has only one column. Otherwise, a SparkDataFrame will always be returned.
Value

A new SparkDataFrame containing only the rows that meet the condition, with the selected columns.
Note

[[ since 1.4.0

[[<- since 2.1.1

[ since 1.4.0

subset since 1.5.0
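A minimal, self-contained sketch of these operators (assuming a local Spark installation; the data and column names come from R's built-in faithful dataset, not from this page):

library(SparkR)
sparkR.session()

# faithful is a base R data set with numeric columns eruptions and waiting
df <- createDataFrame(faithful)

# Filter rows and project columns in one call
longWaits <- subset(df, df$waiting > 80, select = c("eruptions", "waiting"))
head(longWaits)

# Per the drop argument above, a single selected column comes back as a
# Column rather than a SparkDataFrame when drop = TRUE
waitingCol <- df[, "waiting", drop = TRUE]

# [[<- accepts a Column expression or a length-1 literal; assigning NULL drops the column
df[["waiting_hours"]] <- df$waiting / 60
df[["waiting_hours"]] <- NULL
printSchema(df)

sparkR.session.stop()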
See Also

Other SparkDataFrame functions: SparkDataFrame-class, agg, alias, arrange, as.data.frame, attach,SparkDataFrame-method, broadcast, cache, checkpoint, coalesce, collect, colnames, coltypes, createOrReplaceTempView, crossJoin, cube, dapplyCollect, dapply, describe, dim, distinct, dropDuplicates, dropna, drop, dtypes, exceptAll, except, explain, filter, first, gapplyCollect, gapply, getNumPartitions, group_by, head, hint, histogram, insertInto, intersectAll, intersect, isLocal, isStreaming, join, limit, localCheckpoint, merge, mutate, ncol, nrow, persist, printSchema, randomSplit, rbind, rename, repartitionByRange, repartition, rollup, sample, saveAsTable, schema, selectExpr, select, showDF, show, storageLevel, str, summary, take, toJSON, unionByName, union, unpersist, withColumn, withWatermark, with, write.df, write.jdbc, write.json, write.orc, write.parquet, write.stream, write.text

Other subsetting functions: filter, select
Examples

## Not run:
# Columns can be selected using [[ and [
df[[2]] == df[["age"]]
df[,2] == df[,"age"]
df[,c("name", "age")]
# Or to filter rows
df[df$age > 20,]
# SparkDataFrame can be subset on both rows and Columns
df[df$name == "Smith", c(1,2)]
df[df$age %in% c(19, 30), 1:2]
subset(df, df$age %in% c(19, 30), 1:2)
subset(df, df$age %in% c(19), select = c(1,2))
subset(df, select = c(1,2))
# Columns can be selected and set
df[["age"]] <- 23
df[[1]] <- df$age
df[[2]] <- NULL # drop column
## End(Not run)
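The examples above assume a SparkDataFrame df with name and age columns already exists. A sketch of one way to build such a frame locally (the sample rows here are made up for illustration):

library(SparkR)
sparkR.session()

# Hypothetical rows; only the column names (name, age) come from the examples
people <- data.frame(name = c("Smith", "Jones", "Li"),
                     age = c(19L, 30L, 45L))
df <- createDataFrame(people)

collect(df[df$age %in% c(19, 30), 1:2])
collect(subset(df, df$name == "Smith", select = c("name", "age")))

sparkR.session.stop()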