Why does TensorBoard's memory usage keep increasing #3747

Closed
b675987273 opened this issue Jun 19, 2020 · 5 comments
Assignees
Labels
core:backend stat:awaiting tensorflower theme:performance Performance, scalability, large data sizes, slowness, etc.

Comments

@b675987273

b675987273 commented Jun 19, 2020

To report a problem with TensorBoard itself, please fill out the
remainder of this template.

Environment information (required)

Please run diagnose_tensorboard.py (link below) in the same
environment from which you normally run TensorFlow/TensorBoard, and
paste the output here:

### Diagnostics

<details>
<summary>Diagnostics output</summary>

--- check: autoidentify
INFO: diagnose_tensorboard.py version 724b56cee52e7d8eb89bbeec1f0d5ce3e38c9682

--- check: general
INFO: sys.version_info: sys.version_info(major=3, minor=6, micro=9, releaselevel='final', serial=0)
INFO: os.name: posix
INFO: os.uname(): posix.uname_result(sysname='Linux', nodename='web-tb-2-v1-5589568997-lmbdd', release='3.10.0-1062.1.2.el7.x86_64', version='#1 SMP Mon Sep 30 14:19:46 UTC 2019', machine='x86_64')
INFO: sys.getwindowsversion(): N/A

--- check: package_management
INFO: has conda-meta: False
INFO: $VIRTUAL_ENV: None

--- check: installed_packages
INFO: installed: tensorboard==2.1.1
INFO: installed: tensorflow==2.1.0
INFO: installed: tensorflow-estimator==2.1.0

--- check: tensorboard_python_version
INFO: tensorboard.version.VERSION: '2.1.1'

--- check: tensorflow_python_version
2020-06-19 03:56:52.987020: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
2020-06-19 03:56:52.987128: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
2020-06-19 03:56:52.987147: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
INFO: tensorflow.__version__: '2.1.0'
INFO: tensorflow.__git_version__: 'v2.1.0-rc2-17-ge5bf8de'

--- check: tensorboard_binary_path
INFO: which tensorboard: b'/usr/local/bin/tensorboard\n'

--- check: addrinfos
socket.has_ipv6 = True
socket.AF_UNSPEC = <AddressFamily.AF_UNSPEC: 0>
socket.SOCK_STREAM = <SocketKind.SOCK_STREAM: 1>
socket.AI_ADDRCONFIG = <AddressInfo.AI_ADDRCONFIG: 32>
socket.AI_PASSIVE = <AddressInfo.AI_PASSIVE: 1>
Loopback flags: <AddressInfo.AI_ADDRCONFIG: 32>
Loopback infos: [(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('127.0.0.1', 0))]
Wildcard flags: <AddressInfo.AI_PASSIVE: 1>
Wildcard infos: [(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('0.0.0.0', 0)), (<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('::', 0, 0, 0))]

--- check: readable_fqdn
INFO: socket.getfqdn(): 'web-tb-2-v1-5589568997-lmbdd'

--- check: stat_tensorboardinfo
INFO: directory: /tmp/.tensorboard-info
INFO: os.stat(...): os.stat_result(st_mode=16895, st_ino=4879944232, st_dev=3145810, st_nlink=1, st_uid=0, st_gid=0, st_size=25, st_atime=1592535469, st_mtime=1592535447, st_ctime=1592535447)
INFO: mode: 0o40777

--- check: source_trees_without_genfiles
INFO: tensorboard_roots (1): ['/usr/local/lib/python3.6/dist-packages']; bad_roots (0): []

For browser-related issues, please additionally specify:

  • Browser type and version (e.g., Chrome 64.0.3282.140):
  • Screenshot, if it’s a visual issue:

Issue description

Please describe the bug as clearly as possible. How can we reproduce the
problem without additional resources (including external data files and
proprietary Python modules)?

image

My TensorBoard runs as a Docker container. Strangely, its memory usage keeps increasing until it runs out of memory (OOM). The command is tensorboard --logdir /tmp/data/ --bind_all.

image

There are lots of JPG images in my tfevents file. Does that matter?

@b675987273
Author

b675987273 commented Jun 19, 2020

Oh, I think it is just because I saved my event file in the dataset directory. Maybe TensorBoard spends too much memory classifying the dataset files? After I removed the huge dataset, TensorBoard's memory usage became stable.
image

@rmothukuru rmothukuru self-assigned this Jun 19, 2020
@rmothukuru

@b675987273,
This is probably because TensorBoard considered the entire dataset folder, which may have resulted in the OOM. Can we close this issue as it is resolved? Thanks!
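One way to check this explanation is to compare how many files sit under the logdir with how many of them look like event files. The following is a minimal sketch, not TensorBoard's own code: `summarize_logdir` is a hypothetical helper, and the "tfevents" substring test simply mirrors the filename check quoted later in this thread.

```python
import os


def summarize_logdir(root):
    """Return (total_files, event_like_files) for everything under root.

    Hypothetical diagnostic helper. A file counts as "event-like" when
    its base name contains the substring "tfevents".
    """
    total = 0
    event_like = 0
    # os.walk visits every directory below root, so a large dataset
    # stored next to the event files is enumerated in full.
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += 1
            if "tfevents" in name:
                event_like += 1
    return total, event_like
```

Calling `summarize_logdir("/tmp/data/")` on the directory from the original command would show whether event files are a tiny fraction of what lives under the logdir. A large gap is consistent with the explanation above, although it does not by itself prove where the memory goes.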

@rmothukuru rmothukuru added stat:awaiting response theme:performance Performance, scalability, large data sizes, slowness, etc. core:backend labels Jun 19, 2020
@b675987273
Author

b675987273 commented Jun 19, 2020

But TensorBoard has a function to figure out which files are event files. I am still not sure why the images cost so much memory. Could you tell me why?

def IsTensorFlowEventsFile(path):
    """Check the path name to see if it is probably a TF Events file.

    Args:
      path: A file path to check if it is an event file.

    Raises:
      ValueError: If the path is an empty string.

    Returns:
      If path is formatted like a TensorFlowEventsFile. Dummy files such as
        those created with the '.profile-empty' suffixes and meant to hold
        no `Summary` protos are treated as true TensorFlowEventsFiles. For
        background, see: https://github.com/tensorflow/tensorboard/issues/2084.
    """
    if not path:
        raise ValueError("Path must be a nonempty string")
    return "tfevents" in tf.compat.as_str_any(os.path.basename(path))
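To see what this check does and does not do, here is a small stand-alone sketch. `is_events_file` is a pure-Python stand-in for the function above (it drops the `tf.compat.as_str_any` conversion and assumes `str` paths), and the example paths are made up for illustration.

```python
import os


def is_events_file(path):
    """Stand-in for the quoted check: only the base name is inspected."""
    if not path:
        raise ValueError("Path must be a nonempty string")
    return "tfevents" in os.path.basename(path)


# Hypothetical paths, chosen only to exercise the check.
candidates = [
    "/tmp/data/run1/events.out.tfevents.1592535447.web-tb-2",
    "/tmp/data/run1/events.out.tfevents.1592535447.web-tb-2.profile-empty",
    "/tmp/data/images/cat_0001.jpg",
    "/tmp/data/tfevents_backup/cat_0002.jpg",
]
results = {p: is_events_file(p) for p in candidates}
```

The check is a per-path filter: it decides whether a given file name looks like an event file, but something still has to list every file under the logdir and pass each name in. So it does not by itself keep a large dataset directory from being enumerated.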

@b675987273
Author

I found a related issue: #766. Hmm, very interesting.

@b675987273
Author

b675987273 commented Jun 19, 2020

It seems the memory cost comes from the dataset containing too many files. Once I deleted the images, the memory was released.
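If deleting the dataset is not an option, an alternative is to give TensorBoard a logdir that holds only event files. The sketch below copies event-like files into a separate directory while keeping their relative paths. `collect_event_files` and both directory arguments are hypothetical names, and this has not been verified against the setup in this issue.

```python
import os
import shutil


def collect_event_files(src_root, dst_root):
    """Copy files whose name contains "tfevents" from src_root to dst_root.

    Relative subdirectories are preserved so separate runs stay separate.
    dst_root is assumed to live outside src_root; otherwise the walk could
    revisit the copies it just made.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            if "tfevents" not in name:
                continue  # skip dataset images and other unrelated files
            rel_dir = os.path.relpath(dirpath, src_root)
            target_dir = os.path.normpath(os.path.join(dst_root, rel_dir))
            os.makedirs(target_dir, exist_ok=True)
            target = os.path.join(target_dir, name)
            shutil.copy2(os.path.join(dirpath, name), target)
            copied.append(target)
    return copied
```

After copying, `--logdir` would point at the new directory instead of the dataset directory. This is a one-time snapshot. For a training job that is still writing, it is likely simpler to have the summary writer log to a dedicated directory from the start.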
image

3 participants