Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Discretization serialization when num_bins is used. #20971

Merged
merged 1 commit into from
Mar 4, 2025

Conversation

hertschuh
Copy link
Collaborator

Previously, serialization / deserialization would fail if:

  • the layer was saved / restored before adapt was called
  • the layer was saved / restored after adapt was called, but the dataset was such that the number of bins learned was fewer than num_bins

The fix consists in not serializing bin_boundaries if init was called with num_bins. This is consistent with how tf_keras works.

Tightened the error checking:

  • never allow num_bins and bin_boundaries to be specified at the same time, even if they match (same as tf_keras)
  • don't allow num_bins and bin_boundaries to be None at the same time
  • verify that adapt has been called in call

Also removed init_bin_boundaries as the value was never used and its presence can be inferred.

Copy link

Important

The terms of service for this installation has not been accepted. Please ask the Organization owners to visit the Gemini Code Assist Admin Console to sign it.

@codecov-commenter
Copy link

codecov-commenter commented Feb 26, 2025

Codecov Report

Attention: Patch coverage is 83.33333% with 3 lines in your changes missing coverage. Please review.

Project coverage is 82.45%. Comparing base (19b1418) to head (1e06886).

Files with missing lines Patch % Lines
keras/src/layers/preprocessing/discretization.py 83.33% 0 Missing and 3 partials ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master   #20971   +/-   ##
=======================================
  Coverage   82.44%   82.45%           
=======================================
  Files         562      562           
  Lines       53292    53303   +11     
  Branches     8255     8257    +2     
=======================================
+ Hits        43936    43949   +13     
+ Misses       7339     7338    -1     
+ Partials     2017     2016    -1     
Flag Coverage Δ
keras 82.26% <83.33%> (+<0.01%) ⬆️
keras-jax 64.03% <83.33%> (+0.01%) ⬆️
keras-numpy 58.85% <83.33%> (+0.01%) ⬆️
keras-openvino 32.69% <11.11%> (-0.01%) ⬇️
keras-tensorflow 64.48% <83.33%> (+0.01%) ⬆️
keras-torch 64.09% <83.33%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@hertschuh hertschuh marked this pull request as draft February 26, 2025 22:31
@hertschuh hertschuh force-pushed the discretization_config branch from 1aee0a0 to ca2e1f1 Compare February 27, 2025 18:44
Copy link
Member

@mattdangerw mattdangerw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@google-ml-butler google-ml-butler bot added kokoro:force-run ready to pull Ready to be merged into the codebase labels Mar 4, 2025
Previously, serialization / deserialization would fail if:
- the layer was saved / restored before `adapt` was called
- the layer was saved / restored after `adapt` was called, but the dataset was such that the number of bins learned was fewer than `num_bins`

The fix consists in adding a `from_config` to handle `bin_boundaries` separately. This is because at initial creation, `bin_boundaries` and `num_bins` cannot be both set, but when restoring the layer after `adapt`, they are both set.

Tightened the error checking:
- never allow `num_bins` and `bin_boundaries` to be specified at the same time, even if they match (same as `tf_keras`)
- don't allow `num_bins` and `bin_boundaries` to be `None` at the same time
- verify that `adapt` has been called in `call`

Also removed `init_bin_boundaries` as the value was never used and its presence can be inferred.
@hertschuh hertschuh force-pushed the discretization_config branch from ca2e1f1 to 1e06886 Compare March 4, 2025 20:08
@google-ml-butler google-ml-butler bot removed the ready to pull Ready to be merged into the codebase label Mar 4, 2025
@hertschuh hertschuh marked this pull request as ready for review March 4, 2025 21:03
@hertschuh hertschuh merged commit eb1f844 into keras-team:master Mar 4, 2025
7 checks passed
@hertschuh hertschuh deleted the discretization_config branch March 4, 2025 21:03
SaifMohammed22 pushed a commit to SaifMohammed22/keras that referenced this pull request Mar 6, 2025
…am#20971)

Previously, serialization / deserialization would fail if:
- the layer was saved / restored before `adapt` was called
- the layer was saved / restored after `adapt` was called, but the dataset was such that the number of bins learned was fewer than `num_bins`

The fix consists in adding a `from_config` to handle `bin_boundaries` separately. This is because at initial creation, `bin_boundaries` and `num_bins` cannot be both set, but when restoring the layer after `adapt`, they are both set.

Tightened the error checking:
- never allow `num_bins` and `bin_boundaries` to be specified at the same time, even if they match (same as `tf_keras`)
- don't allow `num_bins` and `bin_boundaries` to be `None` at the same time
- verify that `adapt` has been called in `call`

Also removed `init_bin_boundaries` as the value was never used and its presence can be inferred.
11happy pushed a commit to 11happy/keras that referenced this pull request Mar 9, 2025
…am#20971)

Previously, serialization / deserialization would fail if:
- the layer was saved / restored before `adapt` was called
- the layer was saved / restored after `adapt` was called, but the dataset was such that the number of bins learned was fewer than `num_bins`

The fix consists in adding a `from_config` to handle `bin_boundaries` separately. This is because at initial creation, `bin_boundaries` and `num_bins` cannot be both set, but when restoring the layer after `adapt`, they are both set.

Tightened the error checking:
- never allow `num_bins` and `bin_boundaries` to be specified at the same time, even if they match (same as `tf_keras`)
- don't allow `num_bins` and `bin_boundaries` to be `None` at the same time
- verify that `adapt` has been called in `call`

Also removed `init_bin_boundaries` as the value was never used and its presence can be inferred.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants