Releases · tensorflow/data-validation
# Version 0.23.0
Major Features and Improvements
- Data validation is now able to handle arbitrarily nested arrow List/LargeList types. Schema entries for features with multiple nest levels describe the value count at each level in the `value_counts` field. (See the sketch after this list.)
- Add combiner stats generator to estimate top-K and uniques using Misra-Gries and K-Minimum Values sketches.
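The note above does not show what such a schema entry looks like. Below is a minimal, hedged sketch of declaring per-nest-level value counts for a hypothetical two-level feature; the `value_counts`/`value_count` field layout is assumed from the `ValueCountList` message in tensorflow-metadata 0.23, and the feature name is a placeholder.

```python
from tensorflow_metadata.proto.v0 import schema_pb2

# Hedged sketch: a feature whose values are lists of lists of ints
# (two nest levels), with a value count declared for each level.
# Field names assume the ValueCountList message in tensorflow-metadata >= 0.23.
feature = schema_pb2.Feature(name="token_ids", type=schema_pb2.INT)  # placeholder name
feature.value_counts.value_count.add(min=1, max=1)    # outer nest level
feature.value_counts.value_count.add(min=1, max=128)  # inner nest level
```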
Bug Fixes and Other Changes
- Validate that enough supported images are present (if `image_domain.minimum_supported_image_fraction` is provided).
- Stopped requiring avro-python3.
- Depends on `apache-beam[gcp]>=2.23,<3`.
- Depends on `pyarrow>=0.17,<0.18`.
- Depends on `tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,<3`.
- Depends on `tensorflow-metadata>=0.23,<0.24`.
- Depends on `tensorflow-transform>=0.23,<0.24`.
- Depends on `tfx-bsl>=0.23,<0.24`.
Known Issues
- N/A
Breaking Changes
- N/A
Deprecations
- N/A
# TFDV 0.22.2 Release
Major Features and Improvements
Bug Fixes and Other Changes
- Fixed a bug that prevented tfx 0.22.0 from working with TFDV 0.22.1.
- Depends on `avro-python3>=1.8.1,<1.9.2` on Python 3.5 + MacOS.
Known Issues
Breaking Changes
Deprecations
# TFDV 0.22.1 Release
Major Features and Improvements
- Statistics generation is now able to handle arbitrarily nested arrow List/LargeList types. Stats about the list elements' presence and valency are computed at each nest level and stored in a newly added field, `valency_and_presence_stats`, in `CommonStatistics`. (See the sketch after this list.)
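As an illustration of the nested-list support, here is a minimal, hedged sketch that feeds a two-level `pa.list_` array through `tfdv.GenerateStatistics` in a Beam pipeline; the feature name and output path are placeholders.

```python
import apache_beam as beam
import pyarrow as pa
import tensorflow_data_validation as tfdv

# A feature whose values are lists of lists of floats (two nest levels).
nested = pa.array(
    [[[1.0, 2.0], [3.0]], None, [[4.0]]],
    type=pa.list_(pa.list_(pa.float32())),
)
record_batch = pa.RecordBatch.from_arrays([nested], ["nested_feature"])  # placeholder name

with beam.Pipeline() as p:
    _ = (
        p
        | "CreateBatches" >> beam.Create([record_batch])
        # Presence and valency stats for each nest level are recorded in the
        # new valency_and_presence_stats field of CommonStatistics.
        | "GenerateStats" >> tfdv.GenerateStatistics()
        | "WriteStats" >> tfdv.WriteStatisticsToTFRecord("stats.tfrecord")  # placeholder path
    )
```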
Bug Fixes and Other Changes
- Trigger `DATASET_HIGH_NUM_EXAMPLES` when a dataset has more than the specified limit on number of examples.
- Fix bug in `display_anomalies` that prevented dataset-level anomalies from being displayed.
- Trigger anomalies when a feature has a number of unique values that does not conform to the specified minimum/maximum.
- Depends on `pandas>=0.24,<2`.
- Depends on `tensorflow-metadata>=0.22.2,<0.23.0`.
- Depends on `tfx-bsl>=0.22.1,<0.23.0`.
Known Issues
Breaking Changes
Deprecations
# Version 0.22.0
Major Features and Improvements
Bug Fixes and Other Changes
- Crop values in natural language stats generator.
- Switch to using PyBind11 instead of SWIG for wrapping C++ libraries.
- CSV decoder support for multivalent columns by using tfx_bsl's decoder.
- When inferring a schema entry for a feature, do not add a shape with dim = 0 when min_num_values = 0.
- Add utility methods `tfdv.get_slice_stats` to get statistics for a slice and `tfdv.compare_slices` to compare statistics of two slices using Facets. (See the sketch after this list.)
- Make `tfdv.load_stats_text` and `tfdv.write_stats_text` public.
- Add PTransforms `tfdv.WriteStatisticsToText` and `tfdv.WriteStatisticsToTFRecord` to write statistics proto to text and tfrecord files respectively.
- Modify `tfdv.load_statistics` to handle reading statistics from TFRecord and text files.
- Added an extra requirement group `mutual-information`. As a result, barebone TFDV does not require `scikit-learn` any more.
- Added an extra requirement group `visualization`. As a result, barebone TFDV does not require `ipython` any more.
- Added an extra requirement group `all` that specifies all the extra dependencies TFDV needs. Use `pip install tensorflow-data-validation[all]` to pull in those dependencies.
- Depends on `pyarrow>=0.16,<0.17`.
- Depends on `apache-beam[gcp]>=2.20,<3`.
- Depends on `ipython>=7,<8;python_version>="3"`.
- Depends on `scikit-learn>=0.18,<0.24`.
- Depends on `tensorflow>=1.15,!=2.0.*,<3`.
- Depends on `tensorflow-metadata>=0.22.0,<0.23`.
- Depends on `tensorflow-transform>=0.22,<0.23`.
- Depends on `tfx-bsl>=0.22,<0.23`.
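A hedged sketch of how the new slice utilities and the now-public text I/O might be used together; the stats path, slice keys, and keyword argument names are illustrative assumptions.

```python
import tensorflow_data_validation as tfdv

# Load a DatasetFeatureStatisticsList written in text format.
stats = tfdv.load_stats_text("stats.pbtxt")  # placeholder path

# Pull out the statistics for a single slice (slice key is a placeholder)...
us_stats = tfdv.get_slice_stats(stats, slice_key="country_US")

# ...or compare two slices side by side in a Facets visualization.
tfdv.compare_slices(stats, lhs_slice_key="country_US", rhs_slice_key="country_CA")
```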
Known Issues
- (Known issue resolution) It is no longer necessary to use Apache Beam 2.17
when running TFDV on Windows. The current release of Apache Beam will work.
Breaking Changes
- `tfdv.GenerateStatistics` now accepts a PCollection of `pa.RecordBatch` instead of `pa.Table`.
- All the TFDV coders now output a PCollection of `pa.RecordBatch` instead of a PCollection of `pa.Table`.
- `tfdv.validate_instances` and `tfdv.api.validation_api.IdentifyAnomalousExamples` now take `pa.RecordBatch` as input instead of `pa.Table`.
- The `StatsGenerator` interface (and all its sub-classes) now takes `pa.RecordBatch` as the input data instead of `pa.Table`.
- Custom slicing functions now accept a `pa.RecordBatch` instead of `pa.Table` as input and should output a tuple `(slice_key, record_batch)`. (See the sketch after this list.)
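For example, a custom slicing function under the new contract might look like the following sketch; the slicing logic is hypothetical and the `slice_functions` option name is an assumption about how such functions are wired into `StatsOptions`.

```python
import pyarrow as pa
import tensorflow_data_validation as tfdv

def slice_whole_batch(record_batch: pa.RecordBatch):
    # Under the new contract, a slicing function receives a pa.RecordBatch
    # (not a pa.Table) and yields (slice_key, record_batch) tuples.
    # Hypothetical logic: put every example under a single slice key.
    yield ("all_examples", record_batch)

# Assumed wiring: slicing functions are passed via StatsOptions.
options = tfdv.StatsOptions(slice_functions=[slice_whole_batch])
```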
Deprecations
- Deprecating Py2 support.
# Release 0.21.5
Major Features and Improvements
- Add `label_feature` to `StatsOptions` and enable `LiftStatsGenerator` when `label_feature` and `schema` are provided.
- Add JSON serialization support for `StatsOptions`. (See the sketch after this list.)
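A minimal sketch of both additions; the `to_json`/`from_json` method names are assumptions (the note only says JSON serialization was added), and the schema path and label feature name are placeholders.

```python
import tensorflow_data_validation as tfdv

schema = tfdv.load_schema_text("schema.pbtxt")  # placeholder path

# Providing both label_feature and schema enables LiftStatsGenerator.
options = tfdv.StatsOptions(label_feature="label", schema=schema)

# Round-trip StatsOptions through JSON (method names assumed).
options_json = options.to_json()
restored_options = tfdv.StatsOptions.from_json(options_json)
```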
Bug Fixes and Other Changes
- Only requires `avro-python3>=1.8.1,!=1.9.2.*,<2.0.0` on Python 3.5 + MacOS.
Breaking Changes
Deprecations
# Release 0.21.4
Major Features and Improvements
- Support visualizing feature value lift in facets visualization.
Bug Fixes and Other Changes
- Fix issue writing out string feature values in LiftStatsGenerator.
- Requires `apache-beam[gcp]>=2.17,<3`.
- Requires `tensorflow-transform>=0.21.1,<0.22`.
- Requires `tfx-bsl>=0.21.3,<0.22`.
Breaking Changes
Deprecations
# Release 0.21.2
Major Features and Improvements
Bug Fixes and Other Changes
- Fix facets visualization.
Breaking Changes
Deprecations
- `tfdv.TFExampleDecoder` has been removed. This legacy decoder converts serialized `tf.Example` to a dict of numpy arrays, which is the legacy input format (prior to Apache Arrow). TFDV has stopped accepting that format since 0.14. Use `tfdv.DecodeTFExample` instead. (See the sketch below.)
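For reference, a minimal sketch of the recommended replacement in a Beam pipeline; the input path is a placeholder.

```python
import apache_beam as beam
import tensorflow_data_validation as tfdv

with beam.Pipeline() as p:
    _ = (
        p
        | "ReadExamples" >> beam.io.ReadFromTFRecord("examples.tfrecord")  # placeholder path
        # DecodeTFExample parses serialized tf.Example records into the
        # Arrow-based representation that GenerateStatistics expects.
        | "DecodeExamples" >> tfdv.DecodeTFExample()
        | "GenerateStats" >> tfdv.GenerateStatistics()
    )
```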
# Release 0.21.1
Major Features and Improvements
Bug Fixes and Other Changes
- Do validation on weighted feature stats.
- During schema inference, skip features which are missing common stats. This makes schema inference work when the input stats are generated from some pre-existing, unknown schema.
- Fix facets visualization in Chrome >=M80.
Known Issues
- Running TFDV with Apache Beam 2.18 or 2.19 does not work on Windows. If you
are using TFDV on Windows, use Apache Beam 2.17.
Breaking Changes
Deprecations
# Release 0.21.0
Major Features and Improvements
- Started depending on the CSV parsing / type inferring utilities provided by `tfx-bsl` (since tfx-bsl 0.15.2). This also brings performance improvements to the CSV decoder (~2x faster in decoding; type inferring performance is not affected).
- Compute bytes statistics for features of BYTES type. Avoid computing topk and uniques for such features.
- Added LiftStatsGenerator which computes lift between one feature (typically a label) and all other categorical features.
Bug Fixes and Other Changes
- Exclude examples in which the entire sparse feature is missing when calculating sparse feature statistics.
- Validate min_examples_count dataset constraint.
- Document the schema fields, statistics fields, and detection condition for each anomaly type that TFDV detects.
- Handle null array in cross feature stats generator, top-k & uniques combiner stats generator, and sklearn mutual information generator.
- Handle infinity in basic stats generator.
- Set num_missing and num_examples correctly in the presence of sparse features.
- Compute weighted feature stats for all weighted features declared in schema.
- Depends on `tensorflow-metadata>=0.21.0,<0.22`.
- Depends on `pyarrow>=0.15` (removed the upper bound as it is determined by `tfx-bsl`).
- Depends on `tfx-bsl>=0.21.0,<0.22`.
- Depends on `apache-beam>=2.17,<3`.
Breaking Changes
- Changed the behavior regarding statistics over CSV data:
  - Previously, if a CSV column was mixed with integers and empty strings, FLOAT statistics would be collected for that column. A change was made so that INT statistics are collected instead.
- Removed `csv_decoder.DecodeCSVToDict` as `Dict[str, np.ndarray]` has not been the internal data representation since 0.14.
Deprecations
# Release 0.15.0
Major Features and Improvements
- Generate statistics for sparse features.
- Directly convert a batch of tf.Examples to Arrow tables. Avoids conversion of
tf.Example to intermediate Dict representation.
Bug Fixes and Other Changes
- Generate statistics for the weight feature.
- Support validation and schema inference from sliced statistics that include the default slice (validation/inference will be done using the default slice statistics).
- Avoid flattening null arrays.
- Set `weighted_num_examples` field in the statistics proto if a weight feature is specified. (See the sketch after this list.)
- Replace DecodedExamplesToTable with a Python implementation.
- Building TFDV from source does not need pyarrow anymore.
- Depends on `apache-beam[gcp]>=2.16,<3`.
- Depends on `six>=1.12,<2`.
- Depends on `scikit-learn>=0.18,<0.22`.
- Depends on `tfx-bsl>=0.15,<0.16`.
- Depends on `tensorflow-metadata>=0.15,<0.16`.
- Depends on `tensorflow-transform>=0.15,<0.16`.
- Depends on `tensorflow>=1.15,<3`.
  - Starting from 1.15, package `tensorflow` comes with GPU support. Users won't need to choose between `tensorflow` and `tensorflow-gpu`.
  - Caveat: `tensorflow` 2.0.0 is an exception and does not have GPU support. If `tensorflow-gpu` 2.0.0 is installed before installing `tensorflow-data-validation`, it will be replaced with `tensorflow` 2.0.0. Re-install `tensorflow-gpu` 2.0.0 if needed.
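As an illustration of the weight-feature support above, a minimal hedged sketch; the weight feature name and data location are placeholders.

```python
import tensorflow_data_validation as tfdv

# When a weight feature is specified, weighted_num_examples is populated
# in the resulting statistics proto.
options = tfdv.StatsOptions(weight_feature="example_weight")  # placeholder name
stats = tfdv.generate_statistics_from_tfrecord(
    data_location="examples.tfrecord",  # placeholder path
    stats_options=options,
)
```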