TensorBoard, logs either don't appear or have prepended 'epoch_' names #3758

Closed
chrismaliszewski opened this issue Sep 30, 2020 · 6 comments
Labels
question Further information is requested

Comments

@chrismaliszewski

I have two kinds of problems with Tensorboard.

  1. Either the logs don't appear when I create them inside training_step.
    Code:
    def training_step(self, batch, batch_idx):
        type = "train"
        loss, acc, y_true, y_pred, name = self.step(batch)

        result = pl.TrainResult(minimize=loss)
        result.log(type + "_loss", loss, prog_bar=True, on_step=True, on_epoch=False)
        result.log(type + "_acc", acc, prog_bar=True, on_step=True, on_epoch=False)

        return result

Screenshot from TensorBoard:
[image]
2. Or 'epoch_' is prepended to the log names for an unknown reason, even though the validation code does the same thing.
Code:

    def training_step(self, batch, batch_idx):
        type = "train"
        loss, acc, y_true, y_pred, name = self.step(batch)

        result = pl.TrainResult(minimize=loss)
        result.log(type + "_loss", loss, logger=False, prog_bar=True, on_step=True, on_epoch=False)
        result.log(type + "_acc", acc, logger=False, prog_bar=True, on_step=True, on_epoch=False)

        return result

    def training_epoch_end(self, outputs):
        type = "train"

        avg_loss = torch.stack([x for x in outputs[type + "_loss"]]).mean()
        avg_acc = torch.stack([x for x in outputs[type + "_acc"]]).mean()

        result = pl.TrainResult()
        result.log(type + "_loss", avg_loss, prog_bar=False, on_epoch=True)
        result.log(type + "_acc", avg_acc, prog_bar=False, on_epoch=True)

        return result

    def validation_step(self, batch, batch_idx):
        type = "val"
        loss, acc, y_true, y_pred, name = self.step(batch)

        result = pl.EvalResult()
        result.log(type + "_loss", loss, logger=False, prog_bar=True, on_step=True, on_epoch=False)
        result.log(type + "_acc", acc, logger=False, prog_bar=True, on_step=True, on_epoch=False)

        return result

    def validation_epoch_end(self, outputs):
        type = "val"

        avg_loss = torch.stack([x for x in outputs[type + "_loss"]]).mean()
        avg_acc = torch.stack([x for x in outputs[type + "_acc"]]).mean()

        result = pl.EvalResult(checkpoint_on=avg_loss, early_stop_on=avg_loss)
        result.log(type + "_loss", avg_loss, prog_bar=False, on_epoch=True)
        result.log(type + "_acc", avg_acc, prog_bar=False, on_epoch=True)

        return result

Screenshot from TensorBoard:
[image]

Why is that happening? Why do the validation logs not have the epoch_ prefix? The only difference is using TrainResult vs. EvalResult.

@ydcjeff
Contributor

ydcjeff commented Oct 1, 2020

If you enable on_epoch=True in TrainResult while on_step=True is also set, epoch_ is prepended by default, since otherwise you couldn't distinguish whether a value was logged at the batch level or the epoch level.

In EvalResult the defaults are on_step=False and on_epoch=True, so you know those values come from the epoch level.

It's mentioned in the docs.
https://pytorch-lightning.readthedocs.io/en/stable/results.html#logging
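
For example, a quick sketch using the same Result API as your code (self.step and the metric names are just your own placeholders), with both flags enabled on one metric:

    def training_step(self, batch, batch_idx):
        loss, acc, y_true, y_pred, name = self.step(batch)

        result = pl.TrainResult(minimize=loss)
        # on_step=True AND on_epoch=True -> the logger shows two charts,
        # "step_train_loss" and "epoch_train_loss", which is where the epoch_ prefix comes from
        result.log("train_loss", loss, prog_bar=True, on_step=True, on_epoch=True)
        # on_step=True only -> a single plain "train_acc" chart, no prefix
        result.log("train_acc", acc, prog_bar=True, on_step=True, on_epoch=False)

        return result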

@chrismaliszewski
Author

chrismaliszewski commented Oct 1, 2020

@ydcjeff, great, thank you.

The docs don't say that clearly enough, at least to me. The line:

# the logger will show 2 charts: step_train_loss, epoch_train_loss

is the only indicator, but it's easy to miss.

It might be a good idea to call it out more clearly with a NOTE container.

@ydcjeff
Contributor

ydcjeff commented Oct 2, 2020

Good point! Mind sending a PR?

@chrismaliszewski
Author

chrismaliszewski commented Oct 2, 2020

Sorry, @ydcjeff, I'm not familiar with pull requests, nor do I know what I should do now.
If you tell me, I'll do my best to help.
If by PR you just mean a written proposal, confirm and I'll send mine.

@ydcjeff
Contributor

ydcjeff commented Oct 2, 2020

@chrismaliszewski Yes, I agree that some parts of the docs are just ordinary comments and sometimes can't be found quickly.

Maybe we can use a Sphinx admonition (like warning, tip, or note) for that. For the docs fix, you can try to express your understanding of the API usage, or rework the docs in whatever way you find clear enough. We will help you validate and finish the PR.

@williamFalcon
Contributor

Metrics logged on both step and epoch now have suffixes.
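
For example, a rough sketch with the newer self.log API (compute_loss here is just a placeholder helper):

    def training_step(self, batch, batch_idx):
        loss = self.compute_loss(batch)  # placeholder helper, not part of the issue's code
        # With both flags on, the logger now receives "train_loss_step" and
        # "train_loss_epoch" (suffixes instead of the old epoch_ prefix).
        self.log("train_loss", loss, prog_bar=True, on_step=True, on_epoch=True)
        return loss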
