Add BootstrapSamplingTransform to DataOperationsCatalog #2384

Closed
rogancarr opened this issue Feb 2, 2019 · 2 comments
Assignees
Labels
enhancement New feature or request

Comments

@rogancarr
Contributor

As noted in #933, the BootstrapSamplingTransform isn't something that we want recorded in pipelines (e.g. having it executed on a test set). However, it is super useful to have in our library, so it would be nice to have a way to use it in user-facing code.

@rogancarr rogancarr self-assigned this Feb 2, 2019
@rogancarr
Contributor Author

One solution that @TomFinley and I spoke of is to add it to the DataOperationsCatalog as an IDataView to IDataView operation.

@rogancarr rogancarr added the enhancement New feature or request label Feb 2, 2019
@TomFinley
Contributor

Right, thanks Rogan. As mentioned in the issue, it strikes me as being kind of "filter-like" (it is a very special sort of probabilistic filter), so it probably belongs alongside the other filters in DataOperationsCatalog.

Two arguments? int? seed=null (if null, the seed is taken from, I guess, the catalog's environment?), and a bool to control whether we want the complement of the sample or not, so you could get a training set and a test set. I think that's pretty much all we'd need for v1, and it would be pretty easy to add.
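
To make the two-argument idea concrete, here is a minimal conceptual sketch in Python. It is not the ML.NET implementation or API. The function names `bootstrap_sample` and `_poisson1` are hypothetical, and the per-row Poisson(1) draw is an assumption, chosen because it is a common streaming approximation to sampling with replacement. A null seed here falls back to OS entropy rather than to a catalog environment.

```python
import math
import random
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def _poisson1(rng: random.Random) -> int:
    """Draw from Poisson(lambda=1) using Knuth's multiplication method."""
    threshold = math.exp(-1.0)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def bootstrap_sample(
    rows: Iterable[T],
    seed: Optional[int] = None,
    complement: bool = False,
) -> Iterator[T]:
    """Hypothetical sketch of a seedable, 'filter-like' bootstrap sampler.

    Each row is assigned an independent Poisson(1) copy count.
    complement=False yields each row that many times (the bootstrap sample);
    complement=True yields only rows whose count is zero (the held-out rows).
    """
    rng = random.Random(seed)  # seed=None: Python seeds from OS entropy
    for row in rows:
        copies = _poisson1(rng)
        if complement:
            if copies == 0:
                yield row
        else:
            for _ in range(copies):
                yield row


if __name__ == "__main__":
    data = list(range(10))
    train = list(bootstrap_sample(data, seed=1))
    test = list(bootstrap_sample(data, seed=1, complement=True))
    print("train:", train)
    print("test: ", test)
```

Because both calls consume the same random stream when given the same seed, the sample and its complement never share a row and together cover the input, which is what makes the train/test split described above possible.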

@ghost ghost locked as resolved and limited conversation to collaborators Mar 24, 2022

2 participants