Conversation
```swift
import Foundation
import TensorFlow

public struct COCODataset<Entropy: RandomNumberGenerator> {
```
Previously the dataset exposed access to the underlying array of [ObjectDetectionExample], which was useful for doing custom preprocessing before it was converted to the batcher/epochs. I wonder if this PR can preserve that functionality.
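For context, a rough sketch of the kind of pre-batching access being described, using simplified stand-in types and a hypothetical loader rather than the actual dataset API:

```swift
// Simplified stand-ins for illustration; not the real swift-models types.
struct LabeledObject {
    var x: Float, y: Float, width: Float, height: Float
    var classId: Int
}

struct ObjectDetectionExample {
    var imagePath: String
    var objects: [LabeledObject]
}

// Hypothetical loader standing in for the dataset's original example array.
func loadExamples() -> [ObjectDetectionExample] {
    [ObjectDetectionExample(
        imagePath: "images/000000000001.jpg",
        objects: [LabeledObject(x: 10, y: 20, width: 0.4, height: 0.6, classId: 3)])]
}

// The previous pattern: take the raw [ObjectDetectionExample], preprocess it,
// then hand the result to the batcher.
let preprocessed = loadExamples().map { example -> ObjectDetectionExample in
    var copy = example
    // Example of custom preprocessing: drop degenerate boxes.
    copy.objects = copy.objects.filter { $0.width * $0.height > 1.0 }
    return copy
}
print("kept \(preprocessed.first?.objects.count ?? 0) objects in the first example")
```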
For custom preprocessing before delivery to anything downstream, makeBatch() is the intended customization point. In fact, I think we'll eventually want to shift from using the LazyImage type to just providing a URL and having makeBatch() perform all necessary processing at time of use. This is done in the Imagenette dataset, for example, with lazy loading of images coming from input URLs at the point of makeBatch().

What if I added a settable mapping function of ObjectDetectionExample -> ObjectDetectionExample that was called within makeBatch(), where any custom preprocessing could be specified for a given instance of the dataset? Or makeBatch() itself could be a user-provided function.
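A minimal sketch of the Imagenette-style approach mentioned above, assuming hypothetical simplified types (`LazyDetectionSample`, a bare `makeBatch(samples:)` free function) in place of the real dataset code; image bytes are only read when a batch is built:

```swift
import Foundation

// Hypothetical sample type that stores only a URL plus its annotations.
struct LazyDetectionSample {
    var imageURL: URL
    var annotationIDs: [Int]
}

// Sketch of a makeBatch() that loads image data at the point of use,
// rather than holding decoded images in memory up front.
func makeBatch(samples: ArraySlice<LazyDetectionSample>) -> [(imageBytes: Data, annotationIDs: [Int])] {
    samples.compactMap { sample -> (imageBytes: Data, annotationIDs: [Int])? in
        // Lazy loading: the file is only touched when this batch is assembled.
        guard let bytes = try? Data(contentsOf: sample.imageURL) else { return nil }
        return (imageBytes: bytes, annotationIDs: sample.annotationIDs)
    }
}

// Usage: the dataset can hold thousands of URLs cheaply and defer I/O to here.
let samples = [
    LazyDetectionSample(imageURL: URL(fileURLWithPath: "/tmp/img0.jpg"), annotationIDs: [1, 2]),
    LazyDetectionSample(imageURL: URL(fileURLWithPath: "/tmp/img1.jpg"), annotationIDs: [3]),
]
let batch = makeBatch(samples: samples[0...])
print("loaded \(batch.count) images")
```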
I added a new transform parameter to the dataset creation that allows a custom ObjectDetectionExample -> ObjectDetectionExample mapping to be applied to each example. This occurs within makeBatch(), and by default it is an identity mapping. Again, this provides a starting point for working with the dataset, but I think we'll want to do a more thorough reorganization later to take full advantage of the Epochs design. That will be motivated by the examples you're working on.

For now, I'd like to make sure we've preserved functionality while cleaning up the last deprecation warnings before another stable branch cut.
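A minimal sketch of what that could look like, using simplified stand-in types and a hypothetical `COCODatasetSketch` wrapper rather than the actual implementation; the `transform` closure defaults to the identity and runs inside `makeBatch()`:

```swift
// Simplified stand-in type; not the real swift-models definition.
struct ObjectDetectionExample {
    var imagePath: String
    var labels: [Int]
}

// Hypothetical dataset wrapper with the transform parameter described above.
struct COCODatasetSketch {
    var examples: [ObjectDetectionExample]
    // Identity mapping by default; callers can inject custom preprocessing.
    var transform: (ObjectDetectionExample) -> ObjectDetectionExample

    init(examples: [ObjectDetectionExample],
         transform: @escaping (ObjectDetectionExample) -> ObjectDetectionExample = { $0 }) {
        self.examples = examples
        self.transform = transform
    }

    // The transform is applied per example inside batch creation.
    func makeBatch(from slice: ArraySlice<ObjectDetectionExample>) -> [ObjectDetectionExample] {
        slice.map(transform)
    }
}

// Usage: pass custom preprocessing at dataset creation time.
let dataset = COCODatasetSketch(
    examples: [
        ObjectDetectionExample(imagePath: "img0.jpg", labels: [1, 2, 3]),
        ObjectDetectionExample(imagePath: "img1.jpg", labels: [3]),
    ],
    transform: { example in
        var copy = example
        copy.labels = copy.labels.filter { $0 != 3 }  // example preprocessing
        return copy
    })
let batch = dataset.makeBatch(from: dataset.examples[0...])
print(batch.map { $0.labels })
```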
In response to part of issue #592, this migrates the COCO dataset to the Epochs API. Further cleanup might be required to better align the lazy object detection pipeline with Epochs and its new capabilities, but this provides the same functionality as before and removes deprecation warnings.
As this was the last use of TensorPair's _Collatable functionality, that has been removed. Batcher dependencies have also been removed from the Datasets module.
Additionally, copyright headers were missing from several files and have been added.
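For reference, a plain-Swift sketch of the shuffled-epoch, lazy-batch pattern the Epochs migration is aiming at; the real code goes through the Epochs API types, so the names and the manual batching below are simplified stand-ins:

```swift
// Stand-in example type; real code collates ObjectDetectionExamples into tensors.
struct Example { var id: Int }

func makeBatch(_ samples: ArraySlice<Example>) -> [Example] {
    Array(samples)  // a real makeBatch() would collate into a tensor batch here
}

let samples = (0..<10).map(Example.init)
let batchSize = 4
var generator = SystemRandomNumberGenerator()

// One training epoch: shuffle, slice into batches, build each batch on demand.
let shuffled = samples.shuffled(using: &generator)
let batches = stride(from: 0, to: shuffled.count, by: batchSize).map { start in
    makeBatch(shuffled[start..<min(start + batchSize, shuffled.count)])
}
for batch in batches {
    print("batch of \(batch.count) examples")
}
```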