Generation creates a simulated distribution from specify().
In the context of confidence intervals, this is a bootstrap distribution
based on the result of specify(). In the context of hypothesis testing,
this is a null distribution based on the result of specify() and
hypothesize().
Learn more in vignette("infer").
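The two settings described above can be sketched in base R without infer. This is only an illustration under stated assumptions: the `hours` vector is made-up data, and the null distribution is built by recentring the sample at the hypothesized mean before resampling, which is one common approach and is not confirmed here as the package's internal method.

```r
set.seed(1)

# Hypothetical sample standing in for a numeric response (not the gss data)
hours <- c(40, 45, 38, 50, 20, 60, 40, 35, 42, 40)
reps <- 1000

# Confidence-interval setting: bootstrap distribution of the sample mean
boot_means <- replicate(reps, mean(sample(hours, replace = TRUE)))
ci <- quantile(boot_means, c(0.025, 0.975))  # percentile interval

# Hypothesis-testing setting: null distribution under H0: mu = 40.
# Assumption: recentre the sample so its mean equals mu, then bootstrap.
mu <- 40
shifted <- hours - mean(hours) + mu
null_means <- replicate(reps, mean(sample(shifted, replace = TRUE)))

# Two-sided p-value: share of null means at least as far from mu
# as the observed mean is
obs <- mean(hours)
p_value <- mean(abs(null_means - mu) >= abs(obs - mu))
```

The only difference between the two distributions in this sketch is the recentring step: the bootstrap distribution is centred near the observed statistic, while the null distribution is centred at the hypothesized value.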
generate(x, reps = 1, type = NULL, variables = !!response_expr(x), ...)
| Argument | Description |
|---|---|
| x | A data frame that can be coerced into a tibble. |
| reps | The number of resamples to generate. |
| type | The method used to generate resamples of the observed data reflecting the null hypothesis. Currently one of "bootstrap", "permute", or "draw". |
| variables | If |
| ... | Currently ignored. |
A tibble containing reps generated datasets, indicated by the
replicate column.
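The shape of the returned value can be mimicked in base R. The following is a minimal sketch, assuming bootstrap resampling of a single numeric column; the `hours` vector and the `resamples` name are hypothetical, and the real function returns a grouped tibble, not a plain data frame.

```r
set.seed(1)

hours <- c(40, 45, 38, 50, 20)  # hypothetical sample of size n = 5
reps <- 3

# Stack one resampled dataset per replicate, labelled by a replicate column
resamples <- do.call(rbind, lapply(seq_len(reps), function(i) {
  data.frame(replicate = i, hours = sample(hours, replace = TRUE))
}))

nrow(resamples)             # n * reps rows in total
table(resamples$replicate)  # n rows in each replicate
```

The same arithmetic accounts for the 100,000 rows in the examples on this page: 200 replicates, each the size of the input sample.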
The type argument determines the method used to create the null
distribution.
bootstrap: A bootstrap sample will be drawn for each replicate,
where a sample of size equal to the input sample size is drawn (with
replacement) from the input sample data.
permute: For each replicate, each input value will be randomly
reassigned (without replacement) to a new output value in the sample.
draw: A value will be sampled from a theoretical distribution
with parameters specified in hypothesize() for each replicate. This
option is currently only applicable for testing point estimates. This
generation type was previously called "simulate", which has been
superseded.
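For a single replicate, the three generation types can be approximated in base R as follows. This is a sketch under assumptions, not the package's implementation: the data frame `df` is invented, permutation is applied to one column only, and the theoretical distribution for `draw` is taken to be a binary outcome with success probability `p`.

```r
set.seed(1)

# Hypothetical data: one grouping column and one numeric column
df <- data.frame(
  group = c("a", "a", "a", "b", "b", "b"),
  value = c(1.2, 3.4, 2.2, 5.1, 4.8, 6.0)
)
n <- nrow(df)

# bootstrap: draw n rows with replacement from the sample
boot <- df[sample(n, size = n, replace = TRUE), ]

# permute: shuffle one column without replacement, leaving the other fixed,
# which breaks any association between the two
perm <- df
perm$value <- sample(df$value)

# draw: sample n values from a theoretical distribution, here a binary
# outcome with hypothesized success probability p (an assumed example)
p <- 0.5
drawn <- sample(c("success", "failure"), size = n,
                replace = TRUE, prob = c(p, 1 - p))
```

Note that `perm` contains exactly the original values in a new order, whereas `boot` may repeat some rows and omit others, and `drawn` does not reuse the observed values at all.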
Other core functions:
calculate(),
hypothesize(),
specify()
# generate a null distribution by taking 200 bootstrap samples
gss %>%
  specify(response = hours) %>%
  hypothesize(null = "point", mu = 40) %>%
  generate(reps = 200, type = "bootstrap")
#> Response: hours (numeric)
#> Null Hypothesis: point
#> # A tibble: 100,000 × 2
#> # Groups: replicate [200]
#> replicate hours
#> <int> <dbl>
#> 1 1 48.6
#> 2 1 38.6
#> 3 1 38.6
#> 4 1 8.62
#> 5 1 38.6
#> 6 1 38.6
#> 7 1 18.6
#> 8 1 38.6
#> 9 1 38.6
#> 10 1 58.6
#> # … with 99,990 more rows
# generate a null distribution for the independence of
# two variables by permuting their values 200 times
gss %>%
  specify(partyid ~ age) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 200, type = "permute")
#> Dropping unused factor levels DK from the supplied response variable 'partyid'.
#> Response: partyid (factor)
#> Explanatory: age (numeric)
#> Null Hypothesis: independence
#> # A tibble: 100,000 × 3
#> # Groups: replicate [200]
#> partyid age replicate
#> <fct> <dbl> <int>
#> 1 rep 36 1
#> 2 ind 34 1
#> 3 dem 24 1
#> 4 dem 42 1
#> 5 ind 31 1
#> 6 dem 32 1
#> 7 ind 48 1
#> 8 rep 36 1
#> 9 ind 30 1
#> 10 ind 33 1
#> # … with 99,990 more rows
# more in-depth explanation of how to use the infer package
if (FALSE) {
vignette("infer")
}