This page describes the parameter specification of ThunderGBM. The parameters used in ThunderGBM are identical to those of XGBoost (except for some newly introduced parameters), so existing XGBoost users can easily get used to ThunderGBM.
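ThunderGBM's parameters can also be set from its Python interface. The snippet below is a minimal sketch of passing a few of the parameters documented on this page through the scikit-learn-style wrapper; it assumes the `thundergbm` package exposes them as constructor keyword arguments of `TGBMRegressor` (scikit-learn is used only to generate synthetic data), so verify the exact signature against your installed version.

```python
# Minimal sketch: setting ThunderGBM parameters via the Python
# (scikit-learn style) wrapper. The parameter names below are assumed
# to be accepted as constructor keyword arguments; check your install.
from sklearn.datasets import make_regression
from thundergbm import TGBMRegressor

X, y = make_regression(n_samples=1000, n_features=20, random_state=0)

model = TGBMRegressor(
    depth=6,            # maximum depth of the decision trees
    n_trees=40,         # number of training iterations
    learning_rate=0.1,  # shrinkage applied to each newly trained tree (eta < 1)
    verbose=1,          # 0 = silence, 1 = key information, 2 = more information
)
model.fit(X, y)
predictions = model.predict(X)
```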
- `verbose` [default=1]
  - Printing information: 0 for silence, 1 for key information, and 2 for more information.
- `depth` [default=6]
  - The maximum depth of the decision trees. Shallow trees tend to generalize better, while deep trees are more likely to overfit the training data.
- `n_trees` [default=40]
  - The number of training iterations. `n_trees` equals the number of trees in GBDTs.
- `n_gpus` [default=1]
  - The number of GPUs to be used in the training.
- `max_num_bin` [default=255]
  - The maximum number of bins in a histogram.
- `column_sampling_rate` [default=1]
  - The sampling ratio for subsampling columns (i.e., features).
- `bagging` [default=0]
  - This option is for training random forests. Set it to 1 to perform bagging (see the random-forest sketch after this list).
- `n_parallel_trees` [default=1]
  - This option is used for random forests to specify how many trees are trained per iteration.
- `learning_rate` [default=1, alias (only for C++): `eta`]
  - Valid domain: [0, 1]. This option sets the weight of each newly trained tree. Use `eta < 1` to mitigate overfitting.
- `objective` [default="reg:linear"]
  - Valid options include `reg:linear`, `reg:logistic`, `multi:softprob`, `multi:softmax`, `rank:pairwise`, and `rank:ndcg`. `reg:linear` is for regression; `reg:logistic` and `binary:logistic` are for binary classification; `multi:softprob` and `multi:softmax` are for multi-class classification, where `multi:softprob` outputs a probability for each class and `multi:softmax` outputs the label only (see the multi-class sketch after this list); `rank:pairwise` and `rank:ndcg` are for ranking problems.
- `num_class` [default=1]
  - The number of classes in multi-class classification. This option is not compulsory.
- `min_child_weight` [default=1]
  - The minimum sum of instance weight (measured by the second-order derivative) needed in a child node.
- `lambda_tgbm` [default=1, alias (only for C++): `lambda` or `reg_lambda`]
  - L2 regularization term on weights.
- `gamma` [default=1, alias (only for C++): `min_split_loss`]
  - The minimum loss reduction required to make a further split on a leaf node of the tree. `gamma` is used in the pruning stage.
- `tree_method` [default="auto"]
  - "auto": select the approach for finding the best splits using the built-in heuristics.
  - "exact": find the best split by enumerating all possible feature values.
  - "hist": find the best split using a histogram-based approach.
- `data` [default="../dataset/test_dataset.txt"]
  - The path to the training data set.
- `model_out` [default="tgbm.model"]
  - The file name of the output model. This option is used in training.
- `model_in` [default="tgbm.model"]
  - The file name of the input model. This option is used in prediction.
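The random-forest sketch referenced at the `bagging` parameter follows. It assumes, as above, that the scikit-learn-style wrapper `TGBMClassifier` accepts `bagging`, `n_parallel_trees`, `n_trees`, and `depth` as keyword arguments; scikit-learn is used only for synthetic data.

```python
# Sketch: training a random forest instead of a boosted ensemble.
# bagging=1 enables bagging; n_parallel_trees sets how many trees are
# grown per iteration. Parameter names are assumed to match this page;
# verify against your installed version.
from sklearn.datasets import make_classification
from thundergbm import TGBMClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

rf = TGBMClassifier(
    bagging=1,            # enable bagging (random forest mode)
    n_parallel_trees=50,  # trees trained per iteration
    n_trees=1,            # a single iteration of 50 parallel trees
    depth=10,
)
rf.fit(X, y)
```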
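The multi-class sketch referenced at the `objective` parameter follows, with the same caveats about assumed keyword-argument names. `multi:softmax` returns labels directly; switching to `multi:softprob` would return per-class probabilities instead.

```python
# Sketch: multi-class classification with objective=multi:softmax.
# num_class is passed explicitly here, although the docs above mark it
# as not compulsory. Parameter names are assumed to match this page.
from sklearn.datasets import make_classification
from thundergbm import TGBMClassifier

X, y = make_classification(n_samples=1500, n_features=20,
                           n_informative=10, n_classes=3, random_state=0)

clf = TGBMClassifier(
    objective="multi:softmax",  # output class labels directly
    num_class=3,                # number of classes
    depth=6,
    n_trees=40,
)
clf.fit(X, y)
labels = clf.predict(X)
```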