
Add pb method for PR curves #633

Merged
merged 9 commits into from
Oct 13, 2017
Changes from 2 commits
tensorboard/plugins/pr_curve/BUILD: 4 changes (1 addition, 3 deletions)
@@ -59,6 +59,7 @@ py_library(
    visibility = ["//visibility:public"],
    deps = [
        ":metadata",
        "//tensorboard:expect_numpy_installed",
        "//tensorboard:expect_tensorflow_installed",
    ],
)
@@ -69,12 +70,9 @@ py_test(
    srcs = ["summary_test.py"],
    srcs_version = "PY2AND3",
    deps = [
        ":pr_curve_demo",
        ":summary",
        "//tensorboard:expect_numpy_installed",
        "//tensorboard:expect_tensorflow_installed",
        "//tensorboard/backend:application",
        "//tensorboard/backend/event_processing:event_multiplexer",
        "//tensorboard/plugins:base_plugin",
        "@org_pocoo_werkzeug",
        "@org_pythonhosted_six",
tensorboard/plugins/pr_curve/summary.py: 78 changes (75 additions, 3 deletions)
@@ -22,13 +22,17 @@
from __future__ import division
from __future__ import print_function

import numpy as np
import tensorflow as tf

from tensorboard.plugins.pr_curve import metadata

# A value that we use as the minimum value during division of counts to prevent
- # division by 0. 1 suffices because counts of course must be whole numbers.
- _MINIMUM_COUNT = 1.0
+ # division by 0.
+ _MINIMUM_COUNT = 1e-7
Contributor

I don't understand the reason for this change. Why does 1 no longer suffice?

Member Author

Ah, I added a comment.
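The added comment is not visible in this two-commit diff view. One plausible reading, given that `weights` may be a fractional float: weighted counts need not be whole numbers, so clamping the denominator at 1 would understate precision and recall whenever `tp + fp` (or `tp + fn`) is below 1. The plain-Python sketch below uses a hypothetical helper, `safe_ratio`, that mimics `tp / np.maximum(_MINIMUM_COUNT, tp + fp)` for a single threshold; it is an illustration, not code from the PR.

```python
def safe_ratio(numerator, denominator, minimum_count):
    """Divide, clamping the denominator from below to avoid division by 0.

    Hypothetical helper mirroring `tp / np.maximum(_MINIMUM_COUNT, tp + fp)`.
    """
    return numerator / max(minimum_count, denominator)


# Weighted counts: three true positives and one false positive,
# each carrying a weight of 0.1.
tp, fp = 0.3, 0.1

old = safe_ratio(tp, tp + fp, 1.0)   # denominator 0.4 is clamped up to 1.0
new = safe_ratio(tp, tp + fp, 1e-7)  # denominator stays at 0.4

print(old)  # 0.3, which understates the true precision
print(new)  # approximately 0.75

# With no positive predictions at all, both floors still avoid dividing by 0.
print(safe_ratio(0.0, 0.0, 1e-7))  # 0.0
```

A tiny floor such as `1e-7` only matters when the denominator is exactly 0, so it guards against division by zero without distorting fractional counts.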


# The default number of thresholds.
_DEFAULT_NUM_THRESHOLDS = 200

def op(
    tag,
@@ -78,7 +82,7 @@ def op(

  """
  if num_thresholds is None:
-   num_thresholds = 200
+   num_thresholds = _DEFAULT_NUM_THRESHOLDS

  if weights is None:
    weights = 1.0
@@ -164,6 +168,74 @@ def op(
description,
collections)

def pb(tag,
       labels,
       predictions,
       num_thresholds=None,
       weights=None,
       display_name=None,
       description=None):
  """Creates a PR curves summary protobuf
Contributor

For consistency with other pb functions: "Create a PR curves summary protobuf." (imperative mood, full stop at end)

Member Author

Done.


  Arguments:
Contributor

Mentions of "python ints" and "constant strs" in the pb function seem potentially confusing; this is not in a TensorFlow context, so the distinction that you intend doesn't exist, and instead folks might wonder why they can't pass a string from a variable or something (they of course can). If you look at another summary's pb function, you'll see that the documentation is changed appropriately.

Suggested changes:

  • num_thresholds: Optional […] metrics for. When provided, should be an int of value at least 2. Defaults to 200.
  • weights: Optional float or float32 numpy array. […] This value must be […].
  • display_name: […] as a str. […]
  • description: […] as a str. […]

Member Author

Done. Indeed, that seems much clearer; mentioning Python types could be confusing here because pb methods don't rely on TensorFlow.

    tag: A name for the generated node. Will also serve as a series name in
Contributor

Looks like other summaries call this `name` instead of `tag`. Can we stay consistent with them?

Member Author

I renamed the `tag` parameter to `name` for summaries.

      TensorBoard.
    labels: The ground truth values. A bool numpy array.
    predictions: A float32 numpy array whose values are in the range `[0, 1]`.
      Dimensions must match those of `labels`.
    num_thresholds: Optional number of thresholds, evenly distributed in
      `[0, 1]`, to compute PR metrics for. Should be `>= 2`. This value should
      be a python int. Defaults to 200.
    weights: Optional python float or float32 numpy array. Individual counts are
      multiplied by this value. This tensor must be either the same shape as
      or broadcastable to the `labels` numpy array.
    display_name: Optional name for this summary in TensorBoard, as a
      constant `str`. Defaults to `name`.
    description: Optional long-form description for this summary, as a
      constant `str`. Markdown is supported. Defaults to empty.
  """
  if num_thresholds is None:
    num_thresholds = _DEFAULT_NUM_THRESHOLDS

  if weights is None:
    weights = 1.0

  # Compute bins of true positives and false positives.
  bucket_indices = np.int32(np.floor(predictions * (num_thresholds - 1)))
  float_labels = labels.astype(np.float)
  histogram_range = (0, num_thresholds - 1)
  tp_buckets, _ = np.histogram(
      bucket_indices,
      bins=num_thresholds,
      range=histogram_range,
      weights=float_labels * weights)
  fp_buckets, _ = np.histogram(
      bucket_indices,
      bins=num_thresholds,
      range=histogram_range,
      weights=(1.0 - float_labels) * weights)

  # Obtain the reverse cumulative sum.
  tp = np.cumsum(tp_buckets[::-1])[::-1]
  fp = np.cumsum(fp_buckets[::-1])[::-1]
  tn = fp[0] - fp
  fn = tp[0] - tp
  precision = tp / np.maximum(_MINIMUM_COUNT, tp + fp)
  recall = tp / np.maximum(_MINIMUM_COUNT, tp + fn)

  if display_name is None:
    display_name = tag
  summary_metadata = metadata.create_summary_metadata(
      display_name=display_name if display_name is not None else tag,
      description=description or '',
      num_thresholds=num_thresholds)
  summary = tf.Summary()
  data = np.stack((tp, fp, tn, fn, precision, recall))
  tensor = tf.make_tensor_proto(data, dtype=tf.float32)
  summary.value.add(tag='%s/pr_curves' % tag,
                    metadata=summary_metadata,
                    tensor=tensor)
  return summary

def streaming_op(tag,
                 labels,
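To make the counting in `pb` easier to follow without NumPy or TensorFlow, here is a plain-Python sketch of the same steps: assign each example to bucket `floor(prediction * (num_thresholds - 1))`, accumulate weighted true-positive and false-positive counts per bucket, then take a reverse cumulative sum so that entry `i` counts the examples predicted positive at threshold `i / (num_thresholds - 1)`. The function name `pr_curve_counts` is hypothetical, `weights` is accepted as a scalar or a per-example list as a stand-in for NumPy broadcasting, and the sketch returns the six rows that `pb` stacks without building a summary protobuf.

```python
import math


def pr_curve_counts(labels, predictions, num_thresholds=200, weights=1.0):
    """Hypothetical plain-Python sketch of the counting done in `pb`.

    labels: sequence of bools. predictions: floats in [0, 1].
    weights: a scalar or a per-example sequence (stand-in for broadcasting).
    Returns lists tp, fp, tn, fn, precision, recall, one entry per threshold.
    """
    if not isinstance(weights, (list, tuple)):
        weights = [weights] * len(labels)

    # Per-bucket weighted counts of positive and negative examples.
    tp_buckets = [0.0] * num_thresholds
    fp_buckets = [0.0] * num_thresholds
    for label, prediction, weight in zip(labels, predictions, weights):
        index = int(math.floor(prediction * (num_thresholds - 1)))
        if label:
            tp_buckets[index] += weight
        else:
            fp_buckets[index] += weight

    # Reverse cumulative sum: entry i sums buckets i..end, i.e. the examples
    # whose prediction is at or above threshold i.
    tp = [0.0] * num_thresholds
    fp = [0.0] * num_thresholds
    running_tp = running_fp = 0.0
    for i in reversed(range(num_thresholds)):
        running_tp += tp_buckets[i]
        running_fp += fp_buckets[i]
        tp[i], fp[i] = running_tp, running_fp

    # At threshold 0 every example is predicted positive, so tp[0] and fp[0]
    # are the totals of positive and negative examples.
    tn = [fp[0] - x for x in fp]
    fn = [tp[0] - x for x in tp]

    minimum_count = 1e-7  # mirrors _MINIMUM_COUNT in the diff
    precision = [t / max(minimum_count, t + f) for t, f in zip(tp, fp)]
    recall = [t / max(minimum_count, t + n) for t, n in zip(tp, fn)]
    return tp, fp, tn, fn, precision, recall


# Thresholds are 0.0, 0.5 and 1.0 when num_thresholds=3.
tp, fp, tn, fn, precision, recall = pr_curve_counts(
    [True, False, True, False], [0.9, 0.8, 0.4, 0.1], num_thresholds=3)
print(tp)         # [2.0, 1.0, 0.0]
print(precision)  # [0.5, 0.5, 0.0]
print(recall)     # [1.0, 0.5, 0.0]
```

Bucketing first and then taking one reverse cumulative sum visits each example once, rather than rescanning all examples for every threshold.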