Train an xgboost classification model with k-fold cross-validation

Usage

.mldpEHR.cv_train_outcome(
  target,
  features,
  folds,
  required_conditions = "id==id",
  xgboost_params = list(booster = "gbtree", objective = "binary:logistic", subsample =
    0.7, max_depth = 3, colsample_bytree = 1, eta = 0.05, min_child_weight = 1, gamma =
    0, eval_metric = "auc"),
  nrounds = 1000
)

Arguments

target
  • data.frame containing the patient id, sex, target_class (0/1) and fold (number used to assign each patient to a cross-validation fold)

features
  • data.frame containing the patient id along with all the features to be used by the classification model

folds
  • number of cross-validation folds

required_conditions
  • filter expression applied to the features to exclude training/testing samples (e.g. those with missing data); the default "id==id" keeps all rows

xgboost_params
  • parameters used for xgboost model training

nrounds
  • number of training rounds

Value

a predictor, a list with the following elements

  • model - list of xgboost models, one per fold

  • train - data.frame containing the patient id, fold, target class and predicted value in training (each id was used for training in folds-1 of the models)

  • test - data.frame containing the patient id, fold, target class and predicted value in testing (each id was tested exactly once, by the model of the fold it was held out from)

  • xgboost_params - the set of parameters used in xgboost

  • nrounds - number of training iterations conducted
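A minimal usage sketch, assuming the package providing .mldpEHR.cv_train_outcome (and its xgboost dependency) is installed; the toy target and features tables below are illustrative and not part of the package:

```r
set.seed(1)
n <- 1000

# target: one row per patient, with the 0/1 outcome and a fold assignment
target <- data.frame(
    id = 1:n,
    sex = sample(1:2, n, replace = TRUE),
    target_class = rbinom(n, 1, 0.3),
    fold = sample(1:5, n, replace = TRUE)
)

# features: patient id plus all predictors (hypothetical example columns)
features <- data.frame(
    id = 1:n,
    age = runif(n, 40, 80),
    bmi = rnorm(n, 26, 4)
)

predictor <- .mldpEHR.cv_train_outcome(
    target = target,
    features = features,
    folds = 5
)

length(predictor$model)  # one xgboost model per fold
head(predictor$test)     # out-of-fold predictions: each id scored once
```

Because the test predictions are out-of-fold, predictor$test can be used directly to estimate generalization performance (e.g. the AUC named in eval_metric) without a separate holdout set.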