-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Fix azure logging #6860
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Fix azure logging #6860
Changes from 2 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -19,7 +19,7 @@ | |
TensorBoard. This allows running TensorBoard without depending on | ||
TensorFlow for file operations. | ||
""" | ||
|
||
from adlfs import AzureBlobFileSystem | ||
import dataclasses | ||
import glob as py_glob | ||
import io | ||
|
@@ -639,6 +639,11 @@ def stat(self, filename): | |
|
||
_FSSPEC_FILESYSTEM = FSSpecFileSystem() | ||
|
||
def _get_fs_class(filename): | ||
filename = compat.as_str_any(filename) | ||
segment = filename.partition(FSSpecFileSystem.CHAIN_SEPARATOR)[0] | ||
protocol = segment.partition(FSSpecFileSystem.SEPARATOR)[0] | ||
return fsspec.get_filesystem_class(protocol) | ||
|
||
def _get_fsspec_filesystem(filename): | ||
""" | ||
|
@@ -648,12 +653,7 @@ def _get_fsspec_filesystem(filename): | |
if not FSSPEC_ENABLED: | ||
return None | ||
|
||
segment = filename.partition(FSSpecFileSystem.CHAIN_SEPARATOR)[0] | ||
protocol = segment.partition(FSSpecFileSystem.SEPARATOR)[0] | ||
if fsspec.get_filesystem_class(protocol): | ||
return _FSSPEC_FILESYSTEM | ||
else: | ||
return None | ||
return _FSSPEC_FILESYSTEM if _get_fs_class(filename) else None | ||
|
||
|
||
register_filesystem("", LocalFileSystem()) | ||
|
@@ -671,7 +671,10 @@ def __init__(self, filename, mode): | |
) | ||
self.filename = compat.as_bytes(filename) | ||
self.fs = get_filesystem(self.filename) | ||
self.fs_supports_append = hasattr(self.fs, "append") | ||
if _get_fs_class(self.filename) == AzureBlobFileSystem: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's a bit odd that we would call get_filesystem, which calls fsspec.get_filesystem_class transitively, and then we call fsspec.get_filesystem_class again. Can we remove the duplication? Perhaps get_filesystem could return a tuple, where first entry is the filesystem (like _FSSPEC_FILESYSTEM) and the second entry is the filesystem class (if any, like AzureBlobFileSystem). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I wanted to avoid that approach because it required a bit more refactoring, but I implemented it now, please check it out. |
||
self.fs_supports_append = False | ||
else: | ||
self.fs_supports_append = hasattr(self.fs, "append") | ||
self.buff = None | ||
# The buffer offset and the buffer chunk size are measured in the | ||
# natural units of the underlying stream, i.e. bytes for binary mode, | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have adlfs libraries installed on development or ci machines so this is not going to build. You can see this in the failures for the build checks:
https://github.com/tensorflow/tensorboard/actions/runs/9352527415/job/25755552731?pr=6860
What would be a good way to check for this without importing AzureBlobFileSystem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, if a file is initially created by opening it in append mode, this file will allow append operations later on. However, as shown in the code snippet above, the file is always created by opening it in write mode:
One other option, besides checking what filesystem we are dealing with, is to always open the file in append mode initially, when the fs supports that. However, this means the contents of the file, if it exists, will not be cleared.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option, which I am not very fond of, is checking the fs_class's string representation for the substring "Azure". You can see this approach in the latest commit.
However, I believe the approach I suggested above would be better. I don't have all the context behind why you would to want clear the file though, and whether always creating the file in append mode would be appropriate. In my context, where I use lightning's TensorboardLogger, a unique experiment directory gets created at the beginning of the run, so the events file will always be in a unique location, unless we explicitly want to continue an experiment, in which case appending to an existing events file should not be a problem.