Sets up a control object for linear or nonlinear modeling of a response variable on a large panel of
textual sentiment measures (and potentially other variables). See sento_model for details on the
estimation and calibration procedure.
ctr_model(
model = c("gaussian", "binomial", "multinomial"),
type = c("BIC", "AIC", "Cp", "cv"),
do.intercept = TRUE,
do.iter = FALSE,
h = 0,
oos = 0,
do.difference = FALSE,
alphas = seq(0, 1, by = 0.2),
lambdas = NULL,
nSample = NULL,
trainWindow = NULL,
testWindow = NULL,
start = 1,
do.shrinkage.x = FALSE,
do.progress = TRUE,
nCore = 1
)
Arguments
| model |
a character vector with one of the following: "gaussian" (linear regression), "binomial"
(binomial logistic regression), or "multinomial" (multinomial logistic regression). |
| type |
a character vector indicating which model calibration approach to use. Supports "BIC",
"AIC" and "Cp" (Mallows's Cp) as sparse regression adapted information criteria (Tibshirani and Taylor,
2012; Zou, Hastie and Tibshirani, 2007), and "cv" (cross-validation based on the train
function from the caret package). The adapted information criteria are only available for a linear regression. |
| do.intercept |
a logical; if TRUE (the default), an intercept is fitted. |
| do.iter |
a logical, TRUE induces an iterative estimation of models at the given nSample size and
performs the associated out-of-sample prediction exercise through time. |
| h |
an integer value that shifts the time series to obtain the desired prediction setup; h = 0 means
no change to the input data (nowcasting, assuming the data is aligned properly), h > 0 shifts the dependent variable by
h periods (i.e., rows) further in time (forecasting), and h < 0 shifts the independent variables by abs(h)
periods. |
| oos |
a non-negative integer to indicate the number of periods to skip from the end of the training sample
up to the out-of-sample prediction(s). This is either used in the cross-validation based calibration approach
(if type = "cv"), or for the iterative out-of-sample prediction analysis (if do.iter = TRUE). For
instance, given \(t\), the (first) out-of-sample prediction is computed at \(t +\) oos \(+ 1\). |
| do.difference |
a logical, TRUE will difference the target variable y supplied in the
sento_model function, using the absolute value of the h argument as the lag; this requires
abs(h) > 0. For example, if h = 2, and assuming the y variable is properly aligned
date-wise with the explanatory variables denoted by \(X\) (the sentiment measures and the other variables in x), the regression
will be of \(y_{t + 2} - y_t\) on \(X_t\). If h = -2, the regression fitted is \(y_{t + 2} - y_t\) on
\(X_{t+2}\). The argument is always kept at FALSE if the model argument is one of
c("binomial", "multinomial"). |
| alphas |
a numeric vector of the alphas to test for during calibration, between 0 and 1. A value of
0 pertains to Ridge regression, a value of 1 to LASSO regression; values in between are pure elastic net. |
| lambdas |
a numeric vector of the lambdas to test for during calibration, \(\geq 0\).
A value of zero means no regularization, and thus requires care when the data is fat (more regressors than observations). By default set to
NULL, such that the lambdas sequence is generated by the glmnet function,
or set to 10^seq(2, -2, length.out = 100) in case of cross-validation. |
| nSample |
a positive integer as the size of the sample for model estimation at every iteration (ignored if
do.iter = FALSE). |
| trainWindow |
a positive integer as the size of the training sample for cross-validation (ignored if
type != "cv"). |
| testWindow |
a positive integer as the size of the test sample for cross-validation (ignored if type !=
"cv"). |
| start |
a positive integer to indicate at which point the iteration has to start (ignored if
do.iter = FALSE). For example, given 100 possible iterations, start = 70 leads to model estimations
only for the last 31 samples. |
| do.shrinkage.x |
a logical vector to indicate which of the other regressors provided through the x
argument of the sento_model function should be subject to shrinkage (TRUE). If the argument is of
length one, it applies to all external regressors. |
| do.progress |
a logical, if TRUE progress statements are displayed during model calibration. |
| nCore |
a positive integer to indicate the number of cores to use for a parallel iterative model
estimation (do.iter = TRUE). We use the %dopar% construct from the foreach package. By default,
nCore = 1, which implies no parallelization. No progress statements are displayed whatsoever when nCore > 1.
For cross-validation models, parallelization can also be carried out for a single-shot model (do.iter = FALSE),
whenever a parallel backend is set up. See the examples in sento_model. |
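The alignment implied by h and do.difference, the default cross-validation lambda grid, and the start arithmetic can be sketched in base R. This is only an illustration of the descriptions above on hypothetical toy vectors (y, X, and the helper names are assumptions), not the package's internal implementation.

```r
# Toy data: y and X are assumed to be aligned row by row (same dates).
n <- 10
y <- (1:n)^2   # hypothetical response, y_t = t^2
X <- 1:n       # hypothetical single regressor, X_t = t
h <- 2

# h > 0: pair X_t with y_{t + h}; the last h rows of X have no target.
yLead <- y[(1 + h):n]
xLead <- X[1:(n - h)]

# do.difference = TRUE with h = 2: the target becomes y_{t + 2} - y_t.
yDiff <- diff(y, lag = abs(h))   # element t equals y[t + 2] - y[t]

# Lambda grid stated as the default in case of cross-validation.
lambdasCV <- 10^seq(2, -2, length.out = 100)

# start = 70 with 100 possible iterations keeps iterations 70, ..., 100.
nIterations <- length(70:100)
```

The differenced target yDiff has the same length as xLead, so both can be paired row by row for the regression of \(y_{t + 2} - y_t\) on \(X_t\).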
Value
A list encapsulating the control parameters.
References
Tibshirani and Taylor (2012). Degrees of freedom in LASSO problems.
The Annals of Statistics 40, 1198-1232, doi: 10.1214/12-AOS1003.
Zou, Hastie and Tibshirani (2007). On the degrees of freedom of the LASSO.
The Annals of Statistics 35, 2173-2192, doi: 10.1214/009053607000000127.
Author
Samuel Borms, Keven Bluteau
Examples
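A minimal sketch of two control objects, one calibrated by an information criterion and one by cross-validation. It assumes the sentometrics package is installed (the block is skipped otherwise); the argument values, such as trainWindow = 250, are illustrative choices rather than recommendations.

```r
# Assumes the sentometrics package is installed; skipped otherwise.
if (requireNamespace("sentometrics", quietly = TRUE)) {
  library("sentometrics")

  # Information criterion based calibration (linear regression only).
  ctrIC <- ctr_model(model = "gaussian", type = "BIC", do.iter = FALSE, h = 0,
                     alphas = seq(0, 1, by = 0.10))

  # Cross-validation based calibration; window sizes are illustrative.
  ctrCV <- ctr_model(model = "gaussian", type = "cv", do.iter = FALSE, h = 0,
                     trainWindow = 250, testWindow = 4, oos = 0, do.progress = TRUE)
}
```

Either object would then be passed on to sento_model, which performs the actual estimation and calibration.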