TensorBoard, logs either don't appear or have prepended 'epoch_' names #3758

Closed
chrismaliszewski opened this issue Sep 30, 2020 · 6 comments
Labels
question Further information is requested

Comments

@chrismaliszewski

I have two kinds of problems with Tensorboard.

  1. Either the logs don't appear when I create them inside training_step.
    Code:
    def training_step(self, batch, batch_idx):
        type = "train"
        loss, acc, y_true, y_pred, name = self.step(batch)

        result = pl.TrainResult(minimize=loss)
        result.log(type + "_loss", loss, prog_bar=True, on_step=True, on_epoch=False)
        result.log(type + "_acc", acc, prog_bar=True, on_step=True, on_epoch=False)

        return result

Screenshot from TensorBoard:
[image]
2. Or 'epoch_' is prepended to the log names for an unknown reason, even though the validation code does the same thing.
Code:

    def training_step(self, batch, batch_idx):
        type = "train"
        loss, acc, y_true, y_pred, name = self.step(batch)

        result = pl.TrainResult(minimize=loss)
        result.log(type + "_loss", loss, logger=False, prog_bar=True, on_step=True, on_epoch=False)
        result.log(type + "_acc", acc, logger=False, prog_bar=True, on_step=True, on_epoch=False)

        return result

    def training_epoch_end(self, outputs):
        type = "train"

        avg_loss = torch.stack([x for x in outputs[type + "_loss"]]).mean()
        avg_acc = torch.stack([x for x in outputs[type + "_acc"]]).mean()

        result = pl.TrainResult()
        result.log(type + "_loss", avg_loss, prog_bar=False, on_epoch=True)
        result.log(type + "_acc", avg_acc, prog_bar=False, on_epoch=True)

        return result

    def validation_step(self, batch, batch_idx):
        type = "val"
        loss, acc, y_true, y_pred, name = self.step(batch)

        result = pl.EvalResult()
        result.log(type + "_loss", loss, logger=False, prog_bar=True, on_step=True, on_epoch=False)
        result.log(type + "_acc", acc, logger=False, prog_bar=True, on_step=True, on_epoch=False)

        return result

    def validation_epoch_end(self, outputs):
        type = "val"

        avg_loss = torch.stack([x for x in outputs[type + "_loss"]]).mean()
        avg_acc = torch.stack([x for x in outputs[type + "_acc"]]).mean()

        result = pl.EvalResult(checkpoint_on=avg_loss, early_stop_on=avg_loss)
        result.log(type + "_loss", avg_loss, prog_bar=False, on_epoch=True)
        result.log(type + "_acc", avg_acc, prog_bar=False, on_epoch=True)

        return result

Screenshot from TensorBoard:
[image]

Why is that happening? Why do the validation logs not have the epoch_ prefix? The only difference is using TrainResult vs. EvalResult.

@ydcjeff
Contributor

ydcjeff commented Oct 1, 2020

If you enable on_epoch=True in TrainResult while on_step=True is also set, epoch_ is prepended by default, since otherwise you couldn't distinguish whether a value was logged at the batch level or the epoch level.

In EvalResult the defaults are on_step=False and on_epoch=True, so you know those values come from the epoch level.

It's mentioned in the docs.
https://pytorch-lightning.readthedocs.io/en/stable/results.html#logging
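
For example, a quick sketch using the same Result API as your code (self.step and the metric names are just your own placeholders), with both flags enabled on one metric:

    def training_step(self, batch, batch_idx):
        loss, acc, y_true, y_pred, name = self.step(batch)

        result = pl.TrainResult(minimize=loss)
        # on_step=True AND on_epoch=True -> the logger shows two charts,
        # "step_train_loss" and "epoch_train_loss", which is where the epoch_ prefix comes from
        result.log("train_loss", loss, prog_bar=True, on_step=True, on_epoch=True)
        # on_step=True only -> a single plain "train_acc" chart, no prefix
        result.log("train_acc", acc, prog_bar=True, on_step=True, on_epoch=False)

        return result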

@chrismaliszewski
Author

chrismaliszewski commented Oct 1, 2020

@ydcjeff, great, thank you.

The docs don't say that clearly enough, at least to me. The line:

# the logger will show 2 charts: step_train_loss, epoch_train_loss

is the only indicator, but it's easy to miss.

It might be a good idea to call it out more clearly with a NOTE container.

@ydcjeff
Contributor

ydcjeff commented Oct 2, 2020

Good point! Mind sending a PR?

@chrismaliszewski
Author

chrismaliszewski commented Oct 2, 2020

Sorry, @ydcjeff, I'm not familiar with pull requests, nor do I know what I should do now.
If you tell me, I'll do my best to help.
If by PR you just mean a written proposal, confirm and I'll send mine.

@ydcjeff
Contributor

ydcjeff commented Oct 2, 2020

@chrismaliszewski Yes, I agree that some parts of the docs are just ordinary comments and sometimes can't be found quickly.

Maybe we can use a Sphinx admonition (like warning, tip, or note) for that. For the docs fix, you can try to express your understanding of the API usage, or rework the docs in whatever way you find clear enough. We will help you validate and finish the PR.

@williamFalcon
Contributor

Metrics logged on both step and epoch now have suffixes.
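
For example, a rough sketch with the newer self.log API (compute_loss here is just a placeholder helper):

    def training_step(self, batch, batch_idx):
        loss = self.compute_loss(batch)  # placeholder helper, not part of the issue's code
        # With both flags on, the logger now receives "train_loss_step" and
        # "train_loss_epoch" (suffixes instead of the old epoch_ prefix).
        self.log("train_loss", loss, prog_bar=True, on_step=True, on_epoch=True)
        return loss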
