-
Notifications
You must be signed in to change notification settings - Fork 45
ExtractMetadata becomes non-responsive after individual error #52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Hi @jnathanh ! Since providing a folder makes the library bug, I think that the first thing to do is to reject folder paths. I'm not convinced that handling folders is usefull. As a CLI it is (for exiftool), but as a library, if it does not increase performance, I think that keeping the library as simple as possible is better. What to you think ? |
@barasher I agree it's best to stick with the simplest fix, and I don't personally need to pass the directory so I think your first instinct to leave out directory support for now is probably the right call. I did a little more digging and found the root issue, the buffer is getting exhausted. In ExtractMetadata, when the I started on a fix, but it gets messy without breaking changes to the api. The obvious change is to report actual scan errors first before the "no content" error. What is not as straight-forward is how to address the bad scanner state. Here are the different options I have explored:
I suggest option 1 for the more reliable design, or option 3 for the simplest solution (and least api disruption). |
Thanks for the analysis ! Backward compatibility is very important to me. I hate when libraries I use have breaking changes in their API :p Here is the PR : #55 What do you think ? |
@barasher the directory check seems like a sensible feature given this library's lack of support for directories. However, it only addresses a symptom of the issue I was trying to raise. You could also trigger the same issue with a single file, if it is big enough to fill the scanner's buffer (see test below). This is likely an edge case since metadata files are usually small (at least the photo metadata I'm used to working with). So, you may want to leave it unresolved for now, however, I want to make sure you're aware that is the decision you are making by only addressing the issue with the directory check. func TestLargeMetadata(t *testing.T) {
t.Parallel()
bigFile, cleanup := createFileLargerThan(t, bufio.MaxScanTokenSize)
defer cleanup()
e, err := NewExiftool()
require.NoError(t, err)
defer e.Close()
// big file should not error, but if it does, it should output the actual source of the error
bigResults := e.ExtractMetadata(bigFile)
require.Len(t, bigResults, 1)
assert.NoError(t, bigResults[0].Err)
// subsequent small file should also not be affected by previous file reads
smallResults := e.ExtractMetadata("testdata/20190404_131804.jpg")
require.Len(t, smallResults, 1)
assert.NoError(t, smallResults[0].Err)
}
func createFileLargerThan(t *testing.T, byteCount int) (path string, cleanup func()) {
t.Helper()
dirPath := t.TempDir()
cleanup = func() { os.RemoveAll(dirPath) }
path = filepath.Join(dirPath, "20190404_131804.jpg")
require.Nil(t, copyFile("testdata/20190404_131804.jpg", path))
e, err := NewExiftool()
require.NoError(t, err)
defer e.Close()
mds := e.ExtractMetadata(path)
require.Len(t, mds, 1)
require.NoError(t, mds[0].Err)
valueLargerThanBuffer := ""
for i := 0; i < byteCount+1; i++ {
valueLargerThanBuffer += "x"
}
mds[0].SetString("Comment", valueLargerThanBuffer)
mds[0].Err = nil
e.WriteMetadata(mds)
require.NoError(t, mds[0].Err)
return
} |
I totally agree. Nevertheless, currently, the The only advantage I see using a |
Ya, I think using a larger buffer is an acceptable solution. The problem is that a new user, using the program defaults, would never find out that is the issue without extensive debugging. Ideally the program should report that the buffer filled up so a user know they need to use a larger buffer. Even worse, the output actually reports that the metadata was empty, so without investigating further, a user would think that there was just not metadata in the file. Based on your feedback so far, I suggest adding a panic. That would make it clear what the issue is and not report back incorrect metadata results. |
* Error ErrNotFile returned when extracting metadata from folder (#52 fix) * Test simplification
I created #56 to migrate from Scanner to Reader (and to return a cleaner response). |
Hi, thanks for sharing this package. I ran into an error case trying to do a filepath.Walk over my photos directory. Manually skipping directories does avoid this issue, but it seems like the wrong behavior for ExtractMetadata to silently start giving incorrect results as a result of a previous call. I'd like to incorporate this package into some of my workflows, so I'm raising the issue to help improve the package's usability/reliability. Here is a test that reproduces the behavior:
I suspect the issue is either a race (scanning output before it's finished), or improper buffer usage (overflow maybe?). I ran the same test with exiftool directly and got output right away without issue.
One other note, it looks like exiftool supports paths to a directory as an input, or even a list of multiple files. It might be worth reworking the design to let exiftool handle the work of managing multiple paths (directories or files), and add support for parsing the multi-file output.
I'll try to work on a PR for this, but posting the issue now as a heads up, and to see if you have any suggestions on the right design to fix the issue.
The text was updated successfully, but these errors were encountered: