HParam table view sort not working #3041


Open
patzm opened this issue Dec 17, 2019 · 22 comments

@patzm

patzm commented Dec 17, 2019

Environment information

Diagnostics

Diagnostics output
--- check: autoidentify
INFO: diagnose_tensorboard.py version 66d35fe98ca66dc3a5ae600631a8aa6bce785bc5

--- check: general                                                                                     
INFO: sys.version_info: sys.version_info(major=3, minor=6, micro=9, releaselevel='final', serial=0)    
INFO: os.name: posix                                                                                   
INFO: os.uname(): posix.uname_result(sysname='Linux', nodename='tensorboard', release='5.0.0-1026-gcp', version='#27~18.04.1-Ubuntu SMP Fri Nov 15 07:40:39 UTC 2019', machine='x86_64')
INFO: sys.getwindowsversion(): N/A                                                                     
                                                                                                       
--- check: package_management                                                                          
INFO: has conda-meta: False                                                                            
INFO: $VIRTUAL_ENV: None                                                                               
                                                                                                       
--- check: installed_packages
INFO: installed: tensorboard==2.0.0
INFO: installed: tensorflow==1.13.1
INFO: installed: tensorflow-estimator==1.13.0

--- check: tensorboard_python_version
INFO: tensorboard.version.VERSION: '2.0.0'

--- check: tensorflow_python_version
/home/patzm/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/patzm/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/patzm/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/patzm/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/patzm/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/patzm/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
INFO: tensorflow.__version__: '1.13.1'
INFO: tensorflow.__git_version__: "b'v1.13.1-0-g6612da8951'"

--- check: tensorboard_binary_path
INFO: which tensorboard: b'/home/patzm/.local/bin/tensorboard\n'

--- check: addrinfos
socket.has_ipv6 = True
socket.AF_UNSPEC = <AddressFamily.AF_UNSPEC: 0>
socket.SOCK_STREAM = <SocketKind.SOCK_STREAM: 1>
socket.AI_ADDRCONFIG = <AddressInfo.AI_ADDRCONFIG: 32>
socket.AI_PASSIVE = <AddressInfo.AI_PASSIVE: 1>
Loopback flags: <AddressInfo.AI_ADDRCONFIG: 32>
Loopback infos: [(<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('::1', 0, 0, 0)), (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('127.0.0.1', 0))]
Wildcard flags: <AddressInfo.AI_PASSIVE: 1>                                                            
Wildcard infos: [(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('0.0.0.0', 0)), (<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('::', 0, 0, 0))]

--- check: readable_fqdn
INFO: socket.getfqdn(): 'tensorboard.c.tensorflow-training-176320.internal'

--- check: stat_tensorboardinfo
INFO: directory: /tmp/.tensorboard-info                                                                
INFO: os.stat(...): os.stat_result(st_mode=16895, st_ino=145168, st_dev=2049, st_nlink=2, st_uid=1005, st_gid=1006, st_size=4096, st_atime=1576537836, st_mtime=1576576519, st_ctime=1576576519)
INFO: mode: 0o40777

--- check: source_trees_without_genfiles
INFO: tensorboard_roots (1): ['/home/patzm/.local/lib/python3.6/site-packages']; bad_roots (0): []

--- check: full_pip_freeze

netifaces==0.10.4
numpy==1.17.2
oauthlib==2.0.6
PAM==0.4.2
parso==0.5.1
pbr==5.1.1
pip==9.0.1
pluggy==0.12.0
protobuf==3.10.0
pyasn1==0.4.2
pyasn1-modules==0.2.1
pycrypto==2.6.1
pygobject==3.26.1
PyJWT==1.5.3
pynvim==0.3.2
pyOpenSSL==17.5.0
pyserial==3.4
python-apt==1.6.4
python-debian==0.1.32
python-jsonrpc-server==0.2.0
python-language-server==0.28.1
pyxdg==0.25
PyYAML==3.12
requests==2.18.4
requests-unixsocket==0.1.5
SecretStorage==2.3.1
service-identity==16.0.0
setuptools==41.4.0
six==1.12.0
ssh-import-id==5.7
stevedore==1.30.0
systemd-python==234
tensorboard==2.0.0
tensorflow==1.13.1
tensorflow-estimator==1.13.0
termcolor==1.1.0
Twisted==17.9.0
ufw==0.36
unattended-upgrades==0.1
urllib3==1.22
virtualenv==16.1.0
virtualenv-clone==0.4.0
virtualenvwrapper==4.8.2
Werkzeug==0.16.0
wheel==0.33.6
zipp==0.5.2
zope.interface==4.3.2

For browser-related issues, please additionally specify:

  • Firefox 71.0 on Manjaro Linux 64 bit
  • Google Chrome Version 79.0.3945.79 (Official Build) (64-bit)

Issue description

In the Table View
[screenshot of the table view]
sorting only works sporadically. Most of the time it doesn't work at all; other times, sorting is applied to the previously selected column. This seems to be independent of whether you first select the sorting direction and then the "sort by" drop-down, or the other way round. It also seems to be independent of the number of trials; currently I observe this with around 50 runs.
On the server side that hosts the TensorBoard, I don't observe high CPU core utilization or RAM saturation.

@rmothukuru

@patzm,
Can you please let us know whether you have set the Direction (Ascending or Descending) as well? If the issue still persists, could you please share a screenshot that demonstrates the issue? Thanks!

@patzm
Author

patzm commented Dec 18, 2019

@rmothukuru, yes, I am using the Direction drop-down menu to select one of the two sorting directions.

Selection of sorting column and direction:
[screenshot]

The very same column in the table view:
[screenshot]

As you can see, these values are clearly not sorted. Sadly, I can't share the whole screenshot. If this is sufficient, I would like to avoid the effort of writing a reproducible script. Hopefully, this already helps you to narrow down the issue.

I refreshed the site with both the browser refresh button and the TensorBoard refresh trigger (upper right); neither helped. I also tried various versions of TensorBoard: 1.15.0, 2.0.0, 2.0.2, 2.1.0. It is also noteworthy that the model_dirs (of each trial) are quite large: TensorBoard uses ~20 GB of RAM for those ~50 trials.

@wchargin
Contributor

wchargin commented Dec 18, 2019

Hi @patzm! Thanks for the report and detailed background information.
I’m trying to reproduce this on a dataset with 50 runs, and having
trouble; the sorting seems to be working fine for me in both Chrome and
Firefox, on 64-bit Debian-based Linux.

I’m assuming from your screenshot that for each trial you have runs
named eval and train (or something), each with accuracy/top_1
summaries. Here’s the script that I’m using to generate test data:

# Context: <https://github.com/tensorflow/tensorboard/issues/3041>

import os
import random

import tensorflow as tf
from tensorboard.plugins.hparams import api as hp

HP_LR = hp.HParam("learning_rate", hp.RealInterval(0.01, 0.1))
HP_OPTIMIZER = hp.HParam("optimizer", hp.Discrete(["adam", "sgd"]))

SESSIONS = ("train", "eval")

METRIC_LOSS = "loss"
METRIC_ACC1 = "accuracy/top_1"
METRIC_ACC5 = "accuracy/top_5"

NUM_RUNS = 50
NUM_STEPS = 10
BASE_LOGDIR = "logs"


def main():
    rng = random.Random(0)

    for i in range(NUM_RUNS):
        session_dir = os.path.join(BASE_LOGDIR, "%03d" % i)
        with tf.summary.create_file_writer(session_dir).as_default():
            hp.hparams(
                {h: h.domain.sample_uniform(rng) for h in (HP_LR, HP_OPTIMIZER)}
            )
        for session in SESSIONS:
            logdir = os.path.join(session_dir, session)
            with tf.summary.create_file_writer(logdir).as_default():
                for step in range(NUM_STEPS):
                    tf.summary.scalar(METRIC_LOSS, rng.random(), step=step)
                    tf.summary.scalar(METRIC_ACC1, rng.random(), step=step)
                    tf.summary.scalar(METRIC_ACC5, rng.random(), step=step)


if __name__ == "__main__":
    main()

Changing the sort key and direction always seems to work fine for me.

Can you check and see if the data generated by that script works in your
environment? If not, we can try to track down the environmental
difference. If so, we’ll probably want to get your example down to a
minimal repro that you can share.

FWIW, I notice from your diagnostics that you’re running TensorBoard 2.0
with TensorFlow 1.13. This isn’t an officially supported configuration
(TensorBoard and TensorFlow need to have the same minor version), so you
could try upgrading TensorFlow (1.15 should be compatible with 1.13, and
then you can test in TensorBoard 1.15), but I would be a bit surprised
if that were the root cause here.

@wchargin
Contributor

(NB: I just edited the script in the above comment.)

@patzm
Author

patzm commented Dec 20, 2019

you’re running TensorBoard 2.0 with TensorFlow 1.13

I deleted the whole TensorBoard setup after submitting the issue and started from scratch. I created 3 virtual environments:

  • TensorFlow 1.15.0 and TensorBoard 1.15.0
  • TensorFlow 2.0.0 and TensorBoard 2.0.2
  • TensorFlow 2.0.0 with TensorBoard 2.1.0

This didn't solve the sorting problem, so your intuition was right 😉.

@patzm
Author

patzm commented Dec 20, 2019

I ran your script with a minor change. I inserted the following and replaced all tf.* usages with tf_v2.*:

import tensorflow as tf

tf.compat.v1.enable_eager_execution()
tf_v2 = tf.compat.v2

This allowed me to run your script under both TensorFlow 1.15 and TensorFlow 2.0. In both cases, TensorBoard (HParams) with the matching version worked as expected, i.e. I can reproduce your results.

Then I ran the same script with BASE_LOGDIR set to a Google Cloud Storage bucket, e.g. gs://bucket-name/logs. Here, I tested with

  • TensorFlow 1.15.0 and TensorBoard 1.15.0
  • TensorFlow 2.0.0 with TensorBoard 2.0.2

and it also worked in both cases.

@patzm
Author

patzm commented Dec 20, 2019

I just created a debug repo and pushed the slightly modified demo script in patzm/tensorboard-3041@14e5a74. I will write a minimal example for my use case there. I am still working with TensorFlow 1.15 and relying on the non-eager, graph-based APIs. I wrote a custom HParamWriter that uses TensorFlow 1.x summary writers. I will post here again once it is done.

@patzm
Author

patzm commented Dec 20, 2019

for each trial you have runs named eval and train (or something)

Yes, almost: the eval runs are in a sub-folder of the train runs, and the train runs are stored in the model_dir. Could that be a problem?
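For reference, the nested layout described here would look roughly like this (a hedged sketch; the directory and trial names are made up for illustration, not taken from the actual setup):

```python
import os

def trial_layout(base_logdir, trial_name):
    """Sketch of the described layout: train events live directly in the
    trial's model_dir, and eval events live in a subdirectory of that
    same model_dir, rather than in a sibling directory."""
    model_dir = os.path.join(base_logdir, trial_name)  # train events here
    eval_dir = os.path.join(model_dir, "eval")         # eval events nested inside
    return model_dir, eval_dir

model_dir, eval_dir = trial_layout("logs", "trial_000")
```

With this layout the train run's directory is also an ancestor of the eval run's directory, which is the condition under which the multiple-graph warnings below were observed.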

@patzm
Author

patzm commented Dec 20, 2019

Ok @wchargin, can you try running the two scripts in my repo? I added small # TODO(wchargin) markers for you:

  1. tb_debug.py is essentially your example from above
  2. custom_debug.py contains my implementation of the TensorFlow v1 compatible HParamWriter. Note that I am not using the _write_v2 implementation that it provides.

I would suggest that you do 4 runs. Run your script twice, once for each of the two MODES alternatives. Do the same for my script. I tested this with TensorFlow 1.15.

I think I have tracked down the issue to two things:

  1. I am using TensorFlow v1 summary writers / API
  2. My train model dir is the parent of the eval model dir.

If those two things are in place, I observe the following:

  • sometimes none, or only a few, of the runs appear in the HParams tab.
  • TensorBoard is complaining that it found multiple graphs and meta graphs:
    W1220 14:42:34.325282 140550538139392 plugin_event_accumulator.py:294] Found more than one graph event per run, or there was a metagraph containing a graph_def, as well as one or more graph events.  Overwriting the graph with the newest event.
    W1220 14:42:34.325443 140550538139392 plugin_event_accumulator.py:302] Found more than one metagraph event per run. Overwriting the metagraph with the newest event.
    

I am curious to see if you can reproduce this. Sadly, this isn't exactly what this issue reported initially, but I have a strong feeling that this is related.

@patzm
Author

patzm commented Dec 20, 2019

I just ran the same thing with the BASE_LOGDIR as a Google bucket. Now, all combinations work and TensorBoard did not complain about multiple (meta) graphs. 🤔 How would you continue debugging this?

@phemmer

phemmer commented Jan 13, 2020

So I'm experiencing this behavior as well. However, for me the sorting is applied to the field that sequentially follows the one I selected.
For example, let's say the sorting drop-down shows 3 options: "A", "B", and "C". If I tell TensorBoard to sort by "A", it'll sort by "B". If I tell it to sort by "B", it'll sort by "C". If I tell it to sort by "C", it actually doesn't do anything; the table stays sorted by whatever was previously selected.
Note that I have many more than 3 columns (55 if I counted correctly), but the behavior is consistent with the pattern described.
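The shift described above is consistent with a simple off-by-one in how the selected sort column is mapped to the underlying data. A minimal illustration of that shift for interior columns (hypothetical code, not TensorBoard's actual implementation):

```python
def sort_rows(rows, requested_col, offset=1):
    # Hypothetical bug: the sort key is built from the column *after*
    # the one the user requested. With offset=0 this would be correct.
    return sorted(rows, key=lambda row: row[requested_col + offset])

rows = [(3, "b", 0.2), (1, "a", 0.9), (2, "c", 0.5)]
buggy = sort_rows(rows, 0)              # user asked for column 0, gets column 1
correct = sort_rows(rows, 0, offset=0)  # what the user expected
```

Here `buggy` comes back ordered by the strings in column 1 rather than the numbers in column 0, matching the "sort by A, get B" symptom.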

This is using TensorBoard 2.1.0 and TensorFlow 2.1.0.

@patzm
Author

patzm commented Jan 13, 2020

I experienced that as well sometimes. My observation was that it seemed to select the previously selected column. Maybe it is a different issue though.

@tqfjo

tqfjo commented Jan 15, 2020

I'm experiencing similar behavior. So far, sorting never changes no matter what I do; the table always remains sorted by Trial ID, ascending. The issue remains even with only e.g. 5 trials, regardless of the sorting direction I choose, the sorting column I choose, and the browser (Firefox 71.0 or Chrome 68.0.3440).

I am using torch.utils.tensorboard.

Diagnostics

Diagnostics output
--- check: autoidentify
INFO: diagnose_tensorboard.py version d515ab103e2b1cfcea2b096187741a0eeb8822ef

--- check: general
INFO: sys.version_info: sys.version_info(major=3, minor=7, micro=6, releaselevel='final', serial=0)
INFO: os.name: posix
INFO: os.uname(): posix.uname_result(sysname='Linux', nodename=<private>, release='4.15.0-72-generic', version='#81-Ubuntu SMP Tue Nov 26 12:20:02 UTC 2019', machine='x86_64')
INFO: sys.getwindowsversion(): N/A

--- check: package_management
INFO: has conda-meta: True
INFO: $VIRTUAL_ENV: None

--- check: installed_packages
INFO: installed: tensorboard==2.1.0
INFO: installed: tensorflow==2.1.0
INFO: installed: tensorflow-estimator==2.1.0

--- check: tensorboard_python_version
INFO: tensorboard.version.VERSION: '2.1.0'

--- check: tensorflow_python_version
2020-01-15 13:57:37.249308: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-9.0/lib64:/usr/local/cuda/extras/CUPTI/lib64
2020-01-15 13:57:37.249453: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-9.0/lib64:/usr/local/cuda/extras/CUPTI/lib64
2020-01-15 13:57:37.249463: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
INFO: tensorflow.__version__: '2.1.0'
INFO: tensorflow.__git_version__: 'v2.1.0-rc2-17-ge5bf8de'

--- check: tensorboard_binary_path
INFO: which tensorboard: b'/home/<private>/miniconda3/envs/<private>/bin/tensorboard\n'

--- check: addrinfos
socket.has_ipv6 = True
socket.AF_UNSPEC = <AddressFamily.AF_UNSPEC: 0>
socket.SOCK_STREAM = <SocketKind.SOCK_STREAM: 1>
socket.AI_ADDRCONFIG = <AddressInfo.AI_ADDRCONFIG: 32>
socket.AI_PASSIVE = <AddressInfo.AI_PASSIVE: 1>
Loopback flags: <AddressInfo.AI_ADDRCONFIG: 32>
Loopback infos: [(<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('::1', 0, 0, 0)), (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('127.0.0.1', 0))]
Wildcard flags: <AddressInfo.AI_PASSIVE: 1>
Wildcard infos: [(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('0.0.0.0', 0)), (<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('::', 0, 0, 0))]

--- check: readable_fqdn
INFO: socket.getfqdn(): 'localhost.localdomain'

--- check: stat_tensorboardinfo
INFO: directory: /tmp/.tensorboard-info
INFO: os.stat(...): os.stat_result(st_mode=16895, st_ino=15212576, st_dev=66305, st_nlink=2, st_uid=1000, st_gid=1000, st_size=4096, st_atime=1576874874, st_mtime=1579116229, st_ctime=1579116229)
INFO: mode: 0o40777

--- check: source_trees_without_genfiles
INFO: tensorboard_roots (1): ['/home/<private>/miniconda3/envs/<private>/lib/python3.7/site-packages']; bad_roots (0): []

--- check: full_pip_freeze
INFO: pip freeze --all:
absl-py==0.8.1
ansiwrap==0.8.4
appdirs==1.4.3
astor==0.8.0
attrs==19.3.0
ax-platform==0.1.8
backcall==0.1.0
black==19.10b0
bleach==3.1.0
botorch==0.2.0
Bottleneck==1.2.1
cachetools==3.1.1
-e git+https://github.com/pytorch/captum.git@b46536b2f92dc609972e5a5cf125aec5c7f81e79#egg=captum
certifi==2019.11.28
cffi==1.13.2
chardet==3.0.4
Click==7.0
cycler==0.10.0
decorator==4.4.1
defusedxml==0.6.0
docutils==0.15.2
entrypoints==0.3
Flask==1.1.1
future==0.18.2
gast==0.2.2
google-auth==1.7.0
google-auth-oauthlib==0.4.1
google-pasta==0.1.8
gpytorch==1.0.0
grpcio==1.25.0
h5py==2.10.0
humanize==0.5.1
idna==2.8
imageio==2.6.1
importlib-metadata==1.3.0
ipdb==0.12.3
ipykernel==5.1.3
ipython==7.10.1
ipython-genutils==0.2.0
ipywidgets==7.5.1
itsdangerous==1.1.0
jedi==0.15.2
Jinja2==2.10.3
joblib==0.14.1
json5==0.8.5
jsonschema==3.2.0
jupyter==1.0.0
jupyter-client==5.3.3
jupyter-console==6.0.0
jupyter-core==4.6.1
jupyterlab==1.2.1
jupyterlab-black==0.2.1
jupyterlab-code-formatter==0.7.0
jupyterlab-server==1.0.6
jupytext==1.2.4
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
kiwisolver==1.1.0
line-profiler==2.1.2
Markdown==3.1.1
MarkupSafe==1.1.1
matplotlib==3.1.2
mistune==0.8.4
more-itertools==8.0.2
mypy-extensions==0.4.3
nbconvert==5.6.1
nbformat==4.4.0
notebook==6.0.1
numpy==1.17.3
oauthlib==3.1.0
olefile==0.46
opt-einsum==3.1.0
packaging==19.2
pandas==0.25.3
pandocfilters==1.4.2
parso==0.5.2
pathspec==0.7.0
patsy==0.5.1
-e git+https://github.com/SauceCat/PDPbox/@73c69665f1663b53984e187c7bc8996e25fea18e#egg=PDPbox
pexpect==4.7.0
pickleshare==0.7.5
Pillow==6.2.1
pip==19.3.1
pkginfo==1.5.0.1
plotly==4.4.1
prometheus-client==0.7.1
prompt-toolkit==3.0.2
protobuf==3.10.0
psutil==5.6.7
ptyprocess==0.6.0
pyasn1==0.4.7
pyasn1-modules==0.2.7
pycparser==2.19
Pygments==2.5.2
pynvml==8.0.3
pyparsing==2.4.6
PyQt5==5.12.3
PyQt5-sip==4.19.18
PyQtWebEngine==5.12.1
pyrsistent==0.15.6
python-dateutil==2.8.1
-e git+https://github.com/pytorch/ignite.git@f85c6483c00e4e3b125c7f21d953c0bf4f34de7e#egg=pytorch_ignite
pytz==2019.3
PyYAML==5.2
pyzmq==18.1.1
qtconsole==4.6.0
readme-renderer==24.0
regex==2019.12.20
requests==2.22.0
requests-oauthlib==1.3.0
requests-toolbelt==0.9.1
retrying==1.3.3
rsa==4.0
ruamel.yaml==0.16.5
ruamel.yaml.clib==0.2.0
scikit-learn==0.21.0
scipy==1.4.1
seaborn==0.9.0
Send2Trash==1.5.0
setuptools==44.0.0.post20200102
six==1.13.0
sklearn==0.0
skorch==0.7.0
statsmodels==0.10.2
tabulate==0.8.6
tenacity==6.0.0
tensorboard==2.1.0
tensorflow==2.1.0
tensorflow-estimator==2.1.0
termcolor==1.1.0
terminado==0.8.3
test-tube==0.7.3
testpath==0.4.4
textwrap3==0.9.2
toml==0.10.0
torch==1.3.1
torchvision==0.4.2
tornado==6.0.3
tqdm==4.35.0
traitlets==4.3.3
twine==1.13.0
typed-ast==1.4.0
typing-extensions==3.7.4.1
urllib3==1.25.6
wcwidth==0.1.7
webencodings==0.5.1
Werkzeug==0.16.0
wheel==0.33.6
widgetsnbextension==3.5.1
wrapt==1.11.2
xarray==0.14.0
xlrd==1.2.0
zipp==0.6.0

@patzm
Author

patzm commented Feb 10, 2020

@wchargin, how do you want to proceed on this? The issue still persists in all deployment cases I am dealing with. I would love to solve this / see it solved.

@wchargin
Contributor

Hi @patzm—if you’re still experiencing this issue, could you check
whether there are any pending or failed network requests when you change
the sort order?

Changing the sort order actually triggers a network request (because the
frontend only shows the top k runs, so changing the ordering criterion
can cause the set of runs displayed to change). I didn’t realize this
before because I generally run TensorBoard as a web server on my local
machine, but if you’re connecting to your TensorBoard instance over a
remote network then it’s possible that that factors into the problem.
This could also be consistent with your observation that sometimes it
seems to select the previously selected column: perhaps network
requests are resolving out of order.

Could you share a few details about your network topology? Where your
TensorBoard instance is located with respect to your browser, how many
other people are accessing it, etc. If you don’t generally connect to
localhost, could you please try seeing if you can reproduce the issue
when connected to localhost?
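One common frontend pattern that guards against exactly this failure mode is to tag each outgoing request with a token and drop any response that does not belong to the latest request. A rough sketch (illustrative Python, not TensorBoard's actual frontend code):

```python
class LatestOnly:
    """Accept only the response belonging to the most recent request."""

    def __init__(self):
        self._latest_token = 0
        self.state = None  # last accepted payload

    def issue_request(self):
        # Each outgoing request gets a strictly increasing token.
        self._latest_token += 1
        return self._latest_token

    def handle_response(self, token, payload):
        # Responses carrying a stale token are discarded, so an earlier
        # request that resolves late cannot clobber a newer sort order.
        if token != self._latest_token:
            return False
        self.state = payload
        return True

guard = LatestOnly()
t1 = guard.issue_request()                # user sorts by column A
t2 = guard.issue_request()                # user quickly switches to column B
guard.handle_response(t2, "sorted by B")  # accepted
guard.handle_response(t1, "sorted by A")  # rejected: stale token
```

Without such a guard, the response for "A" arriving after the response for "B" would leave the table showing the previously selected sort, which matches the out-of-order behavior hypothesized above.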

@JulianFerry

JulianFerry commented May 6, 2020

I'm having the same issue: sorting mostly doesn't work at all, or else behaves very randomly. I tested this on runs with 2 trials and 1 epoch (stored locally) as well as 8 trials and 5 epochs (on Google Cloud).

I'm using Chrome on MacOS 10.15.4 with package versions:

tensorboard==2.1.1
tensorflow==2.1.0

Running TensorBoard on localhost.

@wchargin
Contributor

wchargin commented May 6, 2020

Running TensorBoard on localhost.

Hmm, interesting. If you’re up for a bit of debugging, would you mind
performing the following steps?

  1. Launch TensorBoard pointing at the “2 trials, 1 epoch, stored
    locally” logdir.
  2. Navigate to the new TensorBoard instance in Firefox or Chrome.
  3. Open your browser’s dev tools and select the “Network” panel.
  4. Use the hparams controls pane to change the sort column and sort
    direction.
  5. Observe in the network panel whether a new request is fired to the
    session_groups endpoint, as in this screenshot.
  6. Observe whether that request finishes quickly, finishes slowly, or
    fails.
  7. Confirm whether the UI properly updates the sort state as you expect
    or not.

My suspicion is that it’s this session_groups request that’s slow or
flaky and causing the problem. Sorry to have to ask you to poke around,
but we just haven’t been able to get a consistent repro for this. If
it’s easy for you to take a video/screencast, that’d be excellent; if
not, any screenshots or descriptions would be helpful. Thank you!

@georgeadam

georgeadam commented May 11, 2020

I am having a very similar problem, and it seems to affect only some columns, namely the metrics columns. However, this depends on the total number of columns.

[screenshot]

The requests to the server are fine in terms of completion.

I will note that this problem also occurs when trying to filter values based on a min-max range for a column whose sorting doesn't work. Interestingly enough, if you filter values before trying to sort, it works; if you try afterwards, it no longer works, likely due to the client-side bugs that are messing with the table.

@tshadley

tshadley commented Aug 14, 2021

I observe this issue, but only when I have at least one hparam that is a string. That string creates an off-by-one miscalculation in the session_groups request payload, judging by the position of the "order:" key.

@PavelBezzub

I also observe a similar problem when the number of string hparams becomes greater than a certain count. When adding additional parameters, the sorting is shifted two columns to the right: when sorting by the third column, the fifth is sorted.

@TangJiakai

I still have this problem...

@bmd3k
Contributor

bmd3k commented Dec 19, 2022

Hi @TangJiakai ,
Thanks for bringing this back to our attention. I hadn't personally noticed this report before.

TensorBoard 2.11.0 should contain #5971 which may address some of the issues here.

Anybody who is still following along or comes here in the future, could you:

  1. Try TensorBoard 2.11.0 or later.
  2. If your problem persists, please post some more details about your problem or consider opening an entirely new issue.
