R: Generalized Linear Models (R-compliant)

glm {SparkR}

R Documentation

Generalized Linear Models (R-compliant)

Description

Fits a generalized linear model, similarly to R's glm().

Usage

glm(formula, family = gaussian, data, weights, subset, na.action,
  start = NULL, etastart, mustart, offset, control = list(...),
  model = TRUE, method = "glm.fit", x = FALSE, y = TRUE,
  contrasts = NULL, ...)

## S4 method for signature 'formula,ANY,SparkDataFrame'
glm(formula, family = gaussian, data,
  epsilon = 1e-06, maxit = 25, weightCol = NULL)

Arguments

`formula`	a symbolic description of the model to be fitted. Currently only a few formula operators are supported, including '~', '.', ':', '+', and '-'.
`family`	a description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function, or the result of a call to a family function. See R's family documentation at https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html.
`data`	a SparkDataFrame for training or, for R's glm, the data containing the variables in the model.
`weights`	an optional vector of ‘prior weights’ to be used in the fitting process. Should be `NULL` or a numeric vector.
`subset`	an optional vector specifying a subset of observations to be used in the fitting process.
`na.action`	a function which indicates what should happen when the data contain `NA`s. The default is set by the `na.action` setting of `options`, and is `na.fail` if that is unset. The ‘factory-fresh’ default is `na.omit`. Another possible value is `NULL`, no action. Value `na.exclude` can be useful.
`start`	starting values for the parameters in the linear predictor.
`etastart`	starting values for the linear predictor.
`mustart`	starting values for the vector of means.
`offset`	this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be `NULL` or a numeric vector of length equal to the number of cases. One or more `offset` terms can be included in the formula instead or as well, and if more than one is specified their sum is used. See `model.offset`.
`control`	a list of parameters for controlling the fitting process. For `glm.fit` this is passed to `glm.control`.
`model`	a logical value indicating whether model frame should be included as a component of the returned value.
`method`	the method to be used in fitting the model. The default method `"glm.fit"` uses iteratively reweighted least squares (IWLS): the alternative `"model.frame"` returns the model frame and does no fitting. User-supplied fitting functions can be supplied either as a function or a character string naming a function, with a function which takes the same arguments as `glm.fit`. If specified as a character string it is looked up from within the stats namespace.
`x,y`	For `glm`: logical values indicating whether the response vector and model matrix used in the fitting process should be returned as components of the returned value.
`contrasts`	an optional list. See the `contrasts.arg` of `model.matrix.default`.
`...`	For `glm`: arguments to be used to form the default `control` argument if it is not supplied directly. For `weights`: further arguments passed to or from other methods.
`epsilon`	positive convergence tolerance of iterations.
`maxit`	integer giving the maximal number of iteratively reweighted least squares (IRLS) iterations.
`weightCol`	the weight column name. If this is not set or `NULL`, all instance weights are treated as 1.0.
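
The three accepted forms of `family` described above can be compared with a minimal sketch. It uses base R's `stats::glm` on the built-in `mtcars` data frame so that it runs without a Spark session; the choice of `am ~ wt` and of the binomial family is only an illustrative assumption, not something this page prescribes.

```r
# Sketch: the same model fitted with the three accepted forms of `family`.
# Uses stats::glm on a local data.frame; the model am ~ wt is illustrative only.

# 1. A character string naming a family function
fit_chr <- stats::glm(am ~ wt, family = "binomial", data = mtcars)

# 2. The family function itself
fit_fun <- stats::glm(am ~ wt, family = binomial, data = mtcars)

# 3. The result of a call to a family function (here with an explicit link)
fit_call <- stats::glm(am ~ wt, family = binomial(link = "logit"), data = mtcars)

# All three specifications describe the same family and link,
# so the fitted coefficients should agree.
same_coef <- isTRUE(all.equal(coef(fit_chr), coef(fit_fun))) &&
  isTRUE(all.equal(coef(fit_fun), coef(fit_call)))
print(same_coef)
```

The third form is the one to use when a non-default link is wanted, since the link is an argument of the family function rather than of `glm`.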

Value

glm returns a fitted generalized linear model.

Note

glm since 1.5.0

Examples

## Not run: 
##D sparkR.session()
##D data(iris)
##D df <- createDataFrame(iris)
##D model <- glm(Sepal_Length ~ Sepal_Width, df, family = "gaussian")
##D summary(model)
## End(Not run)
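
A further sketch, assuming a working Spark installation, shows the SparkDataFrame-specific arguments `epsilon`, `maxit`, and `weightCol` from the S4 signature above. The weight column name `w` and its constant value are hypothetical and chosen only for illustration; with every weight equal to 1.0 the fit should match the unweighted one.

```r
# Sketch only: requires SparkR and a running Spark installation.
library(SparkR)
sparkR.session()

# createDataFrame replaces the dots in the iris column names with underscores
df <- createDataFrame(iris)

# Hypothetical weight column "w", constant 1.0 for every row
df <- withColumn(df, "w", lit(1.0))

# Tighter tolerance and a higher iteration cap than the defaults
# (epsilon = 1e-06, maxit = 25), with instance weights read from column "w"
model <- glm(Sepal_Length ~ Sepal_Width + Species,
             family = gaussian, data = df,
             epsilon = 1e-08, maxit = 50, weightCol = "w")
summary(model)
```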

[Package SparkR version 2.1.0 Index]
