🐛 Bug
To train the model on a large dataset, I split the data into several parts and call `fit` on each part in turn. The "bypassing sigterm" message appears while training on the second part.
To Reproduce
```python
import pytorch_lightning as pl
import transformers as tfs
from torch.utils.data import DataLoader
from datasets import load_dataset
# DataCollatorForWholeWordMask is used below; presumably imported from transformers
from transformers import DataCollatorForWholeWordMask

from src.data.collators import DataCollatorForMacBert
from src.data.tokenize_funcs import SearchDataTokenizeFunc
from src.modeling.modeling_berts import BertsForPretraining
from src.tools.bases import args_parse
from src.utils import get_abs_path


def get_loader_for_text(tokenize_function, data_collator, data_files):
    extension = 'text'
    raw_datasets = load_dataset(extension, data_files=data_files)
    column_names = raw_datasets["train"].column_names
    text_column_name = "text" if "text" in column_names else column_names[0]
    tokenized_datasets = raw_datasets.map(
        tokenize_function,
        batched=True,
        # num_proc=cfg.DATALOADER.NUM_WORKERS,
        num_proc=4,
        remove_columns=[text_column_name],
        # keep_in_memory=True,
        load_from_cache_file=True,
    )
    train_dataset = tokenized_datasets["train"]
    # Log a few random samples from the training set:
    # for index in random.sample(range(len(train_dataset)), 3):
    #     logger.info(f"Sample {index} of the training set: {train_dataset[index]}.")
    # Data collator: this one will take care of randomly masking the tokens.
    # DataLoaders creation:
    train_dataloader = DataLoader(
        train_dataset,
        shuffle=True,
        collate_fn=data_collator,
        batch_size=4,
        num_workers=2,
        # persistent_workers=True,
    )
    return train_dataloader, None, None


def run():
    tokenizer = tfs.AutoTokenizer.from_pretrained('bert-base-chinese')
    collator = DataCollatorForWholeWordMask(tokenizer=tokenizer, mlm_probability=0.15,
                                            pad_to_multiple_of=8)
    tokenize_func = SearchDataTokenizeFunc(tokenizer)

    # First part of the data: this fit call completes normally.
    data_files = [get_abs_path('datasets/search/part--100')]
    loader, _, _ = get_loader_for_text(tokenize_func, collator, data_files)
    model = BertsForPretraining('bert-base-chinese')
    trainer = pl.Trainer(precision=16, max_epochs=1, gpus=4, strategy='ddp')
    trainer.fit(model, loader)

    # Second part of the data: "bypassing sigterm" appears here.
    data_files = [get_abs_path('datasets/search/part-00000')]
    loader, _, _ = get_loader_for_text(tokenize_func, collator, data_files)
    trainer.fit(model, loader)


if __name__ == '__main__':
    run()
```
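The script above depends on project-specific modules (`src.data.*`, `src.modeling.*`), so here is a stripped-down sketch of the same pattern, two consecutive `trainer.fit` calls under DDP, using a toy model and random data. All names in this sketch are illustrative and not from the original script; the `gpus`/`strategy` settings mirror the report and can be reduced.

```python
import torch
import torch.nn.functional as F
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset


class ToyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return F.cross_entropy(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


def make_loader(seed):
    # Each "part" of the dataset is simulated with freshly generated tensors.
    g = torch.Generator().manual_seed(seed)
    ds = TensorDataset(torch.randn(256, 32, generator=g),
                       torch.randint(0, 2, (256,), generator=g))
    return DataLoader(ds, batch_size=4, shuffle=True, num_workers=2)


if __name__ == "__main__":
    model = ToyModel()
    trainer = pl.Trainer(precision=16, max_epochs=1, gpus=4, strategy="ddp")
    trainer.fit(model, make_loader(0))  # first part trains normally
    trainer.fit(model, make_loader(1))  # "bypassing sigterm" reported on the second part
```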
Expected behavior
Environment
PyTorch Lightning Version (e.g., 1.3.0): 1.5.0rc1
PyTorch Version (e.g., 1.8): 1.10.0+cu113
Python version: 3.8.10
OS (e.g., Linux): ubuntu20.04
CUDA/cuDNN version: 11.4
GPU models and configuration:
How you installed PyTorch (conda, pip, source): pip
If compiling from source, the output of torch.__config__.show():
Any other relevant information: built from the NGC pytorch:21.09 container image
@gitabtion thanks for checking out 1.5.0rc1!
The message "bypassing sigterm" should only appear if a SIGTERM is sent to the process. Do you know anything about that? Do you run your script like a regular Python command, `python train.py`, or in some different way?
I don't know anything about the SIGTERM signal; I just run the Lightning script with `python train.py`.
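In case it helps narrow this down: one way to check whether a SIGTERM is really being delivered is to register a plain Python handler at the top of the training script, before the Trainer is created. This is only a debugging sketch using the standard library, not something from Lightning itself, and Lightning may install its own handlers later on.

```python
import os
import signal


def report_sigterm(signum, frame):
    # Print the pid so it is clear which DDP process received the signal.
    print(f"[pid {os.getpid()}] got SIGTERM", flush=True)


# Register early, before building the Trainer.
signal.signal(signal.SIGTERM, report_sigterm)
```

If this print shows up right before the "bypassing sigterm" message, something external to the script (launcher, cluster scheduler, container runtime) is likely sending the signal between the two `fit` calls.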