Releases: sdv-dev/RDT

v1.11.0 - 2024-04-10

11 Apr 02:26

This release adds support for Python 3.12! It also fixes a bug that kept certain functions from being used on the AnonymizedFaker when locales were provided.

Maintenance

Bugs Fixed

  • Providing locales to AnonymizedFaker with a function that uses the BaseProvider crashes - Issue #774 by @frances-h
  • Fix minimum version workflow when pointing to github branch - Issue #783 by @R-Palazzo

New Features

  • Move out sdtype validations from multi-column transformers - Issue #778 by @R-Palazzo

v1.10.1 - 2024-03-21

21 Mar 19:17

This release fixes a bug with loading saved AnonymizedFaker transformers from previous versions of RDT.

Bugs Fixed

  • Add enforce_uniqueness attribute to AnonymizedFaker - PR #771 by @fealho
  • Fix backwards compatibility for cardinality_rule - PR #772 by @frances-h

v1.10.0 - 2024-03-13

13 Mar 14:34

The AnonymizedFaker now supports more options for the cardinality of the generated data. Previously, you could either make the generated data all unique or not take uniqueness into consideration. Now you can use the cardinality_rule parameter to match the cardinality of the original data.
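To make "matching the cardinality" concrete, here is a minimal standard-library sketch of the idea. It is not RDT's implementation: the function generate_matching_cardinality and the fake_code generator are hypothetical names used only for illustration.

```python
import random


def generate_matching_cardinality(original, make_value, seed=0):
    """Return len(original) fake values with as many distinct values as `original`.

    Hypothetical helper, not part of RDT. `make_value` takes a random.Random
    instance and returns one fake value.
    """
    rng = random.Random(seed)
    n_unique = len(set(original))

    # Draw distinct fake values until there are as many as in the original column.
    pool, seen = [], set()
    while len(pool) < n_unique:
        candidate = make_value(rng)
        if candidate not in seen:
            seen.add(candidate)
            pool.append(candidate)

    # Use every pooled value once, then fill the remaining rows by resampling.
    values = pool + [rng.choice(pool) for _ in range(len(original) - n_unique)]
    rng.shuffle(values)
    return values


def fake_code(rng):
    """Stand-in for a Faker function; returns strings such as 'user-0042'."""
    return f"user-{rng.randrange(10_000):04d}"


original = ["a", "b", "a", "c", "b", "a"]
synthetic = generate_matching_cardinality(original, fake_code)
```

The synthetic column has the same number of rows and the same number of distinct values as the original, while none of the original values are reused.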

New Features

  • Allow AnonymizedFaker to learn cardinality from the real data - Issue #756 by @fealho

Deprecations

The enforce_uniqueness parameter of the AnonymizedFaker is deprecated in favor of the cardinality_rule parameter.

Maintenance

  • Transition from using setup.py to pyproject.toml to specify project metadata - Issue #763 by @R-Palazzo
  • Remove bumpversion and use bump-my-version - Issue #764 by @R-Palazzo
  • Add build to dev requirements - Issue #768 by @amontanez24

v1.9.2 - 2024-02-13

13 Feb 21:25

This release makes a couple of improvements to the RegexGenerator. Error messaging is improved, and it can now generate an unlimited number of rows even when the enforce_uniqueness flag is True. It does this by adding suffixes once the maximum number of combinations for the provided regex has been reached.

Additionally, this release resolves a few bugs. The OneHotEncoder should no longer crash on the categorical dtype and the UniformEncoder was improved to support more dtypes.
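The suffix behavior described for the RegexGenerator can be sketched with plain Python. This is an illustration only: the helper name unique_values and the "(n)" suffix format are assumptions, not the format RDT necessarily uses.

```python
import itertools


def unique_values(base_values, num_rows):
    """Return num_rows unique strings, adding numeric suffixes once base_values run out.

    Hypothetical helper. `base_values` stands in for every string a regex can
    produce; the "(n)" suffix format is an assumption for illustration.
    """
    base_values = list(dict.fromkeys(base_values))  # drop duplicates, keep order
    if not base_values and num_rows > 0:
        raise ValueError("At least one base value is required.")

    result = []
    for cycle in itertools.count():
        for value in base_values:
            if len(result) == num_rows:
                return result
            # The first pass uses the values as-is; later passes add a suffix.
            result.append(value if cycle == 0 else f"{value}({cycle})")


rows = unique_values(["A0", "A1", "B0", "B1"], 6)
# rows == ["A0", "A1", "B0", "B1", "A0(1)", "A1(1)"]
```

The sketch assumes no base value already ends in a "(n)" suffix; otherwise a suffixed value could collide with an original one.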

Bugs Fixed

  • Categorical reverse transform may crash with ValueError for certain dtypes (int64) - Issue #747 by @R-Palazzo
  • RegexGenerator gives a confusing message: # of possibilities are shown as an imaginary number - Issue #748 by @R-Palazzo
  • OneHotEncoder doesn't support dtype 'category' - Issue #751 by @fealho

New Features

  • RegexGenerator should create unlimited regexes, even if unique enforcement is on - Issue #749 by @fealho
  • Add a _update_multi_column_transformer method - Issue #757 by @R-Palazzo

Internal

  • Move the _learn_rounding_digits of the FloatFormatter into a helper - Issue #750 by @fealho

v1.9.1 - 2024-01-10

10 Jan 22:45

This release fixes a bug that caused the AnonymizedFaker to crash with provider/function combinations that return tuples.

Bugs Fixed

  • AnonymizedFaker crashes with ValueError for specific provider/function pairs (eg. currency) - Issue #743 by @R-Palazzo

v1.9.0 - 2023-11-14

14 Nov 22:52

This release adds a parameter to the UnixTimestampEncoder and OptimizedTimestampEncoder, called enforce_min_max_values. When this is set to True, it clips all values in the reverse transformed data to the min and max datetimes seen in the fitted data.

This release also internally adds support for multi-column transformers!
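The clipping that enforce_min_max_values describes can be illustrated without RDT. The sketch below is a rough approximation using only the standard library; clip_to_fitted_range is a hypothetical helper, not the transformer's code.

```python
from datetime import datetime


def clip_to_fitted_range(values, fitted_values):
    """Clip each datetime to the [min, max] range seen in the fitted data.

    Hypothetical helper illustrating the behavior described above.
    """
    low, high = min(fitted_values), max(fitted_values)
    return [min(max(value, low), high) for value in values]


fitted = [datetime(2023, 1, 1), datetime(2023, 6, 15), datetime(2023, 12, 31)]
reversed_data = [datetime(2022, 11, 5), datetime(2023, 3, 1), datetime(2024, 2, 1)]
clipped = clip_to_fitted_range(reversed_data, fitted)
```

Values that fall before the earliest fitted datetime or after the latest one are moved to the nearest bound; values inside the range are left unchanged.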

New Features

  • Support multi-column transformers - Issue #683 by @R-Palazzo
  • Improve user warnings and logic for update_sdtype - Issue #684 by @R-Palazzo
  • Improve user warnings and logic for update_transformers and update_transformers_by_sdtype - Issue #685 by @R-Palazzo
  • Improve user warnings and logic for remove_transformers and remove_transformers_by_sdtype - Issue #686 by @R-Palazzo
  • Add enforce_min_max_values to datetime transformers - Issue #740 by @R-Palazzo

Internal

Bugs Fixed

  • Multi column transformers crash when assigned to single column - Issue #734 by @R-Palazzo

v1.8.0 - 2023-10-31

31 Oct 16:05

This release adds the 'random' missing value replacement strategy, which fills in missing values with random values from the dataset.
Additionally, users can now use the UniformUnivariate distribution within the GaussianNormalizer.

This release also fixes a ClusterBasedNormalizer crash in the reverse transform that was caused by out-of-bounds values,
and patches a randomization issue that produced different values after applying reset_randomization.

Anonymization has been moved from SDV into the RDT library, since it was found to be a self-contained module for RDT and the move reduces the dependencies needed in SDV.
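As a rough picture of the 'random' replacement strategy, the sketch below fills each missing entry with a value drawn from the observed entries of the same column. This is one reading of "random values of the dataset" and is an assumption: the library may instead sample within the observed range. The helper names are hypothetical.

```python
import math
import random


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def fill_missing_random(values, seed=0):
    """Replace missing entries with randomly chosen observed entries.

    Hypothetical helper, not RDT's implementation.
    """
    rng = random.Random(seed)
    observed = [value for value in values if not is_missing(value)]
    if not observed:
        # Nothing to sample from, so return the data unchanged.
        return list(values)
    return [rng.choice(observed) if is_missing(value) else value for value in values]


column = [3.5, None, 1.0, float("nan"), 2.25]
filled = fill_missing_random(column)
```

Unlike a constant fill such as the mean, this keeps the filled values spread across the values that were actually observed.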

Features

  • Make the default missing value imputation 'mean' - Issue #730 by @R-Palazzo
  • When no rounding scheme is detected, log the info instead of showing a warning - Issue #709 by @frances-h
  • The GaussianNormalizer should accept distribution names that are consistent with scipy - Issue #656 by @fealho
  • The GaussianNormalizer should accept uniform distributions - Issue #655 by @fealho
  • Remove psutil - Issue #615 by @fealho
  • Consider deprecating the FrequencyEncoder - Issue #614 by @fealho
  • Replace missing values with variable (random) values from the dataset - Issue #606

Bugs

  • RDT Uniform Encoder creates nan Value bug - Issue #719 by @lajohn4747
  • HyperTransformer transforms while fitting and messes up the random seed - Issue #716 by @pvk-developer
  • Resolve locales warning for specific sdtype/locale combos (eg. en_US with postcode) - Issue #701 by @pvk-developer
  • The OrderedLabelEncoder should not accept duplicate categories - Issue #673 by @frances-h
  • ClusterBasedNormalizer crashes on reverse transform (IndexError) - Issue #672 by @fealho
  • Unnecessary warning in OneHotEncoder when there are nan values - Issue #616 by @fealho

Maintenance

Deprecations

  • The FrequencyEncoder transformer will no longer be supported in future versions of RDT. Please use the UniformEncoder transformer instead.
  • GaussianNormalizer distribution option names have been updated to be consistent with scipy: gaussian -> norm, student_t -> t, and truncated_gaussian -> truncnorm.
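For code that still passes the old distribution names, a small lookup along these lines could ease the migration. The three renames come from the note above; the helper to_scipy_name is hypothetical and not part of RDT.

```python
# Old GaussianNormalizer option names mapped to their scipy-consistent replacements.
RENAMED_DISTRIBUTIONS = {
    "gaussian": "norm",
    "student_t": "t",
    "truncated_gaussian": "truncnorm",
}


def to_scipy_name(name):
    """Return the new name for a renamed option; pass other names through unchanged."""
    return RENAMED_DISTRIBUTIONS.get(name, name)


new_name = to_scipy_name("truncated_gaussian")
```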

v1.7.0 - 2023-08-22

22 Aug 19:33

This release adds 3 new transformers:

  1. UniformEncoder - A categorical and boolean transformer that converts the column into a uniform distribution.
  2. OrderedUniformEncoder - The same as above, but the order for the categories can be specified, changing which range in the uniform distribution each category belongs to.
  3. IDGenerator - A text transformer that drops the input column during transform and returns IDs during reverse transform. The IDs all take the form <prefix><number><suffix> and can be configured with a custom prefix, suffix and starting point.
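The idea behind the two uniform encoders in the list above can be sketched as follows: each category gets a slice of [0, 1] proportional to its frequency, a value is encoded as a random point inside its slice, and decoding looks up which slice a number falls in. This is a simplified standard-library illustration under those assumptions, with hypothetical function names, not RDT's implementation.

```python
import random
from collections import Counter


def fit_intervals(values, order=None):
    """Assign each category a slice of [0, 1] proportional to its frequency."""
    if not values:
        raise ValueError("Cannot fit on an empty column.")
    counts = Counter(values)
    if order is None:
        # Most frequent first; ties broken by the category's string form.
        order = sorted(counts, key=lambda c: (-counts[c], str(c)))
    intervals = {}
    start = 0.0
    for category in order:
        end = start + counts[category] / len(values)
        intervals[category] = (start, end)
        start = end
    return intervals


def transform(values, intervals, seed=0):
    """Encode each category as a random number inside its interval."""
    rng = random.Random(seed)
    return [rng.uniform(*intervals[value]) for value in values]


def reverse_transform(numbers, intervals):
    """Decode each number to the category whose interval contains it."""
    categories = list(intervals)
    decoded = []
    for number in numbers:
        # First category whose upper bound exceeds the number; else the last one.
        match = next((c for c in categories if number < intervals[c][1]), categories[-1])
        decoded.append(match)
    return decoded


column = ["a", "b", "a", "c", "a", "b"]
intervals = fit_intervals(column)
encoded = transform(column, intervals)
decoded = reverse_transform(encoded, intervals)
```

Passing an explicit order, for example fit_intervals(column, order=["c", "b", "a"]), changes which part of [0, 1] each category occupies, which mirrors the distinction the list draws between the unordered and ordered variants.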

Additionally, the AnonymizedFaker is enhanced to support the text sdtype.
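The <prefix><number><suffix> format described for the IDGenerator is simple enough to show in a few lines. The function make_ids and its parameter names are hypothetical and chosen only to mirror the description.

```python
def make_ids(count, prefix="", suffix="", starting_value=0):
    """Build `count` sequential IDs of the form <prefix><number><suffix>."""
    return [f"{prefix}{number}{suffix}" for number in range(starting_value, starting_value + count)]


ids = make_ids(3, prefix="ID_", suffix="_X", starting_value=100)
# ids == ["ID_100_X", "ID_101_X", "ID_102_X"]
```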

Deprecations

  • The get_input_sdtype method is being deprecated in favor of get_supported_sdtypes.

New Features

Maintenance

v1.6.1 - 2023-08-02

02 Aug 20:56

This release updates the default transformers used for certain sdtypes. It also enables the AnonymizedFaker and PseudoAnonymizedFaker to work with any sdtype besides boolean, categorical, datetime, numerical or text.

Bugs

  • [Enterprise Usage] Unable to assign generic PII transformers (eg. AnonymizedFaker) - Issue #674 by @amontanez24

New Features

  • Update the default transformers that HyperTransformer assigns to each sdtype - Issue #664 by @amontanez24

v1.6.0 - 2023-07-12

12 Jul 20:36

This release adds the ability to generate missing values to the AnonymizedFaker. Users can now provide the missing_value_generation parameter during initialization. They can set it to None to not generate any missing values, or 'random' to generate random missing values in the same proportion as the fitted data.

Additionally, this release improves the NullTransformer by allowing nulls to be replaced on the forward transform even if missing_value_generation is set to None. It also fixes a bug that was causing the UnixTimestampEncoder to return a different dtype than the input on reverse_transform. This was particularly problematic when datetime columns are represented as ints.
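Generating missing values "in the same proportion as the fitted data" can be pictured with the sketch below: measure the share of missing entries in the fitted column, then blank out each generated value with that probability. This is an illustration with a hypothetical helper, not the AnonymizedFaker's code.

```python
import random


def add_missing_values(generated, fitted, seed=0):
    """Replace generated values with None at the missing-value rate of `fitted`."""
    rng = random.Random(seed)
    # Share of missing entries observed when fitting (0.0 for an empty column).
    null_ratio = sum(value is None for value in fitted) / len(fitted) if fitted else 0.0
    return [None if rng.random() < null_ratio else value for value in generated]


fitted = ["x", None, "y", None]  # half of the fitted values are missing
generated = [f"fake-{i}" for i in range(1000)]
with_nulls = add_missing_values(generated, fitted)
```

With a ratio of 0.0 no value is ever replaced, which corresponds to not generating any missing values; with a ratio of 0.5 roughly half of the generated values become None.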

New Features

  • AnonymizedFaker should be able to model and generate missing values - Issue #660 by @R-Palazzo

Bugs

  • The datetime transformers don't give me back the same dtype sometimes - Issue #657 by @frances-h
  • RDT NullTransformer doesn't replace nulls if missing_value_generation is None - Issue #658 by @amontanez24

Maintenance

Internal