Train an xgboost classification model with k-fold cross-validation

Usage

.mldpEHR.cv_train_outcome(
  target,
  features,
  folds,
  required_conditions = "id==id",
  xgboost_params = list(booster = "gbtree", objective = "binary:logistic", subsample =
    0.7, max_depth = 3, colsample_bytree = 1, eta = 0.05, min_child_weight = 1, gamma =
    0, eval_metric = "auc"),
  nrounds = 1000
)

Arguments

target
  • data.frame containing the patient id, sex, target_class (0/1) and fold (number used to assign each patient to a cross-validation fold)

features
  • data.frame containing the patient id along with all the features to be used by the classification model

folds
  • number of cross-validation folds

required_conditions
  • filter expression applied to the features to exclude training/testing samples (e.g. those with missing data); the default "id==id" keeps all rows

xgboost_params
  • parameters used for xgboost model training

nrounds
  • number of training rounds

Value

a predictor, a list with the following elements

  • model - list of xgboost models, one per fold

  • train - data.frame containing the patient id, fold, target class and predicted value in training (each id was used for training in folds-1 of the models)

  • test - data.frame containing the patient id, fold, target class and predicted value in testing (each id was tested exactly once, by the model of the fold it was held out from)

  • xgboost_params - the set of parameters used in xgboost

  • nrounds - number of training iterations conducted
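A minimal usage sketch, assuming the package providing .mldpEHR.cv_train_outcome (and its xgboost dependency) is installed; the toy target and features tables below are illustrative and not part of the package:

```r
set.seed(1)
n <- 1000

# target: one row per patient, with the 0/1 outcome and a fold assignment
target <- data.frame(
    id = 1:n,
    sex = sample(1:2, n, replace = TRUE),
    target_class = rbinom(n, 1, 0.3),
    fold = sample(1:5, n, replace = TRUE)
)

# features: patient id plus all predictors (hypothetical example columns)
features <- data.frame(
    id = 1:n,
    age = runif(n, 40, 80),
    bmi = rnorm(n, 26, 4)
)

predictor <- .mldpEHR.cv_train_outcome(
    target = target,
    features = features,
    folds = 5
)

length(predictor$model)  # one xgboost model per fold
head(predictor$test)     # out-of-fold predictions: each id scored once
```

Because the test predictions are out-of-fold, predictor$test can be used directly to estimate generalization performance (e.g. the AUC named in eval_metric) without a separate holdout set.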