Releases · sdv-dev/RDT

11 Apr 02:26 · v1.11.0 · commit a5a5e5c

v1.11.0 - 2024-04-10

This release adds support for Python 3.12! It also fixes a bug that kept certain functions from being used on the AnonymizedFaker when locales were provided.

Maintenance

Support Python 3.12 - Issue #744 by @fealho
Add dependency checker - Issue #777 by @lajohn4747
Add bandit workflow - Issue #781 by @R-Palazzo

Bugs Fixed

Providing locales to AnonymizedFaker with a function that uses the BaseProvider crashes - Issue #774 by @frances-h
Fix minimum version workflow when pointing to github branch - Issue #783 by @R-Palazzo

New Features

Move out sdtype validations from multi-column transformers - Issue #778 by @R-Palazzo

Contributors

lajohn4747, frances-h, and 2 other contributors

21 Mar 19:17 · frances-h · v1.10.1 · commit 903722c

v1.10.1 - 2024-03-21

This release fixes a bug with loading saved AnonymizedFaker transformers from previous versions of RDT.

Bugs Fixed

Add enforce_uniqueness attribute to AnonymizedFaker - PR #771 by @fealho
Fix backwards compatibility for cardinality_rule - PR #772 by @frances-h

Contributors

frances-h and fealho

13 Mar 14:34 · amontanez24 · v1.10.0 · commit a8608f6

v1.10.0 - 2024-03-13

The AnonymizedFaker now supports more options for the cardinality of the generated data. Previously, you could either make the generated data all unique or ignore uniqueness entirely. Now you can use the cardinality_rule parameter to match the cardinality of the original data.
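To illustrate what matching cardinality means, here is a minimal plain-Python sketch. It is not RDT's implementation: generate_matching_cardinality and make_fake are hypothetical names, make_fake stands in for a Faker function and is assumed to return a distinct value for each index, and the sampling scheme (use every value once, then fill the remaining rows at random) is an assumption.

```python
import random
from typing import Callable, List, Sequence


def generate_matching_cardinality(
    real_values: Sequence[object],
    num_rows: int,
    make_fake: Callable[[int], str],
    seed: int = 0,
) -> List[str]:
    """Hypothetical helper: fake values with as many distinct entries as the real data."""
    rng = random.Random(seed)
    cardinality = len(set(real_values))  # number of distinct values seen during fitting
    if cardinality == 0:
        raise ValueError("real_values must not be empty")
    pool = [make_fake(i) for i in range(cardinality)]  # assumed to be pairwise distinct
    if num_rows <= cardinality:
        # Too few rows to use every value: return distinct values only.
        return pool[:num_rows]
    # Use each pooled value once, then fill the remaining rows by sampling from the pool.
    values = pool + [rng.choice(pool) for _ in range(num_rows - cardinality)]
    rng.shuffle(values)
    return values


fake = generate_matching_cardinality(["a", "b", "a", "c"], 10, lambda i: f"user_{i}")
print(len(fake), len(set(fake)))  # 10 3
```

The real data has three distinct values, so the ten generated rows also contain exactly three distinct values.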

New Features

Allow AnonymizedFaker to learn cardinality from the real data - Issue #756 by @fealho

Deprecations

The enforce_uniqueness parameter of the AnonymizedFaker is deprecated in favor of the cardinality_rule parameter.

Maintenance

Transition from using setup.py to pyproject.toml to specify project metadata - Issue #763 by @R-Palazzo
Remove bumpversion and use bump-my-version - Issue #764 by @R-Palazzo
Add build to dev requirements - Issue #768 by @amontanez24

Contributors

amontanez24, fealho, and R-Palazzo

13 Feb 21:25 · amontanez24 · v1.9.2 · commit 563e722

v1.9.2 - 2024-02-13

This release makes a couple of improvements to the RegexGenerator. Error messaging is clearer, and it can now generate an unlimited number of rows even when the enforce_uniqueness flag is True. It does this by adding suffixes to the generated values once every possible combination for the provided regex has been used.
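The suffixing idea can be sketched without any regex engine by starting from the list of strings a regex can produce. The suffix format RDT actually uses is not given in these notes; the `_<n>` suffix here is an assumption, unique_values and generate_unique are hypothetical helpers, and uniqueness assumes the base values are distinct and never already end in such a suffix.

```python
import itertools
from typing import Iterator, List, Sequence


def unique_values(base: Sequence[str]) -> Iterator[str]:
    """Yield each base value once, then cycle through them again with a numeric suffix."""
    yield from base  # first pass: the combinations the regex itself can produce
    for suffix in itertools.count(1):  # later passes: _1, _2, ... keep values unique
        for value in base:
            yield f"{value}_{suffix}"  # assumed suffix format, not necessarily RDT's


def generate_unique(base: Sequence[str], num_rows: int) -> List[str]:
    if not base:
        raise ValueError("the regex must produce at least one value")
    return list(itertools.islice(unique_values(base), num_rows))


# Stand-in for the four strings a regex such as '[ab][xy]' can produce.
combinations = ["ax", "ay", "bx", "by"]
print(generate_unique(combinations, 6))
# ['ax', 'ay', 'bx', 'by', 'ax_1', 'ay_1']
```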

Additionally, this release resolves a few bugs: the OneHotEncoder should no longer crash on the categorical dtype, and the UniformEncoder was improved to support more dtypes.

Bugs Fixed

Categorical reverse transform may crash with ValueError for certain dtypes (int64) - Issue #747 by @R-Palazzo
RegexGenerator gives a confusing message: # of possibilities are shown as an imaginary number - Issue #748 by @R-Palazzo
OneHotEncoder doesn't support dtype 'category' - Issue #751 by @fealho

New Features

RegexGenerator should create unlimited regexes, even if unique enforcement is on - Issue #749 by @fealho
Add a _update_multi_column_transformer method - Issue #757 by @R-Palazzo

Internal

Move the _learn_rounding_digits of the FloatFormatter into a helper - Issue #750 by @fealho

Contributors

fealho and R-Palazzo

10 Jan 22:45 · amontanez24 · v1.9.1 · commit 862fe05

v1.9.1 - 2024-01-10

This release fixes a bug that caused the AnonymizedFaker to crash with provider/function combinations that return tuples.

Bugs Fixed

AnonymizedFaker crashes with ValueError for specific provider/function pairs (e.g. currency) - Issue #743 by @R-Palazzo

14 Nov 22:52 · amontanez24 · v1.9.0 · commit c03d9ac

v1.9.0 - 2023-11-14

This release adds a parameter called enforce_min_max_values to the UnixTimestampEncoder and OptimizedTimestampEncoder. When it is set to True, all values in the reverse transformed data are clipped to the minimum and maximum datetimes seen in the fitted data.
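The clipping step can be pictured with the standard library alone. This is a sketch of the described behavior, not RDT code; clip_to_fitted_range is a hypothetical name, and in RDT the behavior is switched on by passing enforce_min_max_values=True to the encoder.

```python
from datetime import datetime
from typing import List, Sequence


def clip_to_fitted_range(values: Sequence[datetime], fitted: Sequence[datetime]) -> List[datetime]:
    """Hypothetical helper: clamp each value into [min(fitted), max(fitted)]."""
    low, high = min(fitted), max(fitted)  # bounds learned from the fitted data
    return [min(max(value, low), high) for value in values]


fitted = [datetime(2020, 1, 1), datetime(2020, 12, 31)]
generated = [datetime(2019, 6, 1), datetime(2020, 6, 15), datetime(2021, 3, 1)]
print(clip_to_fitted_range(generated, fitted))
```

The first and last generated datetimes fall outside the fitted range and are pulled to its edges; the middle one is left unchanged.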

This release also internally adds support for multi-column transformers!

New Features

Support multi-column transformers - Issue #683 by @R-Palazzo
Improve user warnings and logic for update_sdtype - Issue #684 by @R-Palazzo
Improve user warnings and logic for update_transformers and update_transformers_by_sdtype - Issue #685 by @R-Palazzo
Improve user warnings and logic for remove_transformers and remove_transformers_by_sdtype - Issue #686 by @R-Palazzo
Add enforce_min_max_values to datetime transformers - Issue #740 by @R-Palazzo

Internal

Support multi-column transformers - Issue #683 by @R-Palazzo

Bugs Fixed

Multi column transformers crash when assigned to single column - Issue #734 by @R-Palazzo

Contributors

R-Palazzo

31 Oct 16:05 · lajohn4747 · v1.8.0 · commit 519c217

v1.8.0 - 2023-10-31

This release adds the 'random' missing value replacement strategy, which uses random values from the dataset to fill in missing values.
Additionally, users can now use the UniformUnivariate distribution within the GaussianNormalizer.
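The difference between filling every gap with one statistic, such as the mean, and filling gaps with random values from the dataset can be shown in a few lines. The notes do not describe exactly how RDT draws the random values; this sketch assumes each gap receives a value sampled from the column's observed entries, and fill_missing is a hypothetical helper.

```python
import random
from typing import List, Optional, Sequence


def fill_missing(values: Sequence[Optional[float]], strategy: str = "mean", seed: int = 0) -> List[float]:
    """Hypothetical helper: replace None entries using the 'mean' or 'random' strategy."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    if strategy not in ("mean", "random"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    rng = random.Random(seed)
    mean = sum(observed) / len(observed)
    filled = []
    for v in values:
        if v is not None:
            filled.append(v)
        elif strategy == "mean":
            filled.append(mean)  # every gap gets the same value
        else:
            filled.append(rng.choice(observed))  # assumed: each gap gets its own draw from the data
    return filled


column = [1.0, None, 3.0, None]
print(fill_missing(column))  # [1.0, 2.0, 3.0, 2.0]
print(fill_missing(column, strategy="random"))
```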

This release also fixes a crash in the ClusterBasedNormalizer's reverse transform caused by out-of-bounds values,
and patches a randomization issue where values still differed after applying reset_randomization.

Anonymization has been moved from SDV into the RDT library, since it was found to be a self-contained module that fits within RDT, and the move reduces the dependencies needed in SDV.

Features

Make the default missing value imputation 'mean' - Issue #730 by @R-Palazzo
When no rounding scheme is detected, log the info instead of showing a warning - Issue #709 by @frances-h
The GaussianNormalizer should accept distribution names that are consistent with scipy - Issue #656 by @fealho
The GaussianNormalizer should accept uniform distributions - Issue #655 by @fealho
Remove psutil - Issue #615 by @fealho
Consider deprecating the FrequencyEncoder - Issue #614 by @fealho
Replace missing values with variable (random) values from the dataset - Issue #606

Bugs

RDT Uniform Encoder creates nan Value bug - Issue #719 by @lajohn4747
HyperTransformer transforms while fitting and messes up the random seed - Issue #716 by @pvk-developer
Resolve locales warning for specific sdtype/locale combos (e.g. en_US with postcode) - Issue #701 by @pvk-developer
The OrderedLabelEncoder should not accept duplicate categories - Issue #673 by @frances-h
ClusterBasedNormalizer crashes on reverse transform (IndexError) - Issue #672 by @fealho
Unnecessary warning in OneHotEncoder when there are nan values - Issue #616 by @fealho

Maintenance

Remove performance tests - Issue #707 by @fealho
ClusterBasedNormalizer code cleanup - Issue #696 by @fealho
Switch default branch from master to main - Issue #687 by @amontanez24

Deprecations

The FrequencyEncoder transformer will no longer be supported in future versions of RDT. Please use the UniformEncoder transformer instead.
GaussianNormalizer distribution option names have been updated to be consistent with scipy: gaussian -> norm, student_t -> t, and truncated_gaussian -> truncnorm.
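For code that still passes the old distribution names, a small translation table can ease migration. The helper below is hypothetical and not part of RDT; only the mapping itself comes from the deprecation note above.

```python
import warnings

# Old GaussianNormalizer distribution names and their scipy-consistent replacements,
# as listed in the v1.8.0 deprecation note.
LEGACY_DISTRIBUTION_NAMES = {
    "gaussian": "norm",
    "student_t": "t",
    "truncated_gaussian": "truncnorm",
}


def to_scipy_name(name: str) -> str:
    """Hypothetical helper: translate a deprecated name, warning when a translation happens."""
    new_name = LEGACY_DISTRIBUTION_NAMES.get(name)
    if new_name is None:
        return name  # already a current name such as 'norm' or 't'
    warnings.warn(f"'{name}' is deprecated; use '{new_name}' instead.", FutureWarning, stacklevel=2)
    return new_name


print(to_scipy_name("student_t"))  # t (and a FutureWarning is emitted)
```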

Contributors

amontanez24, lajohn4747, and 4 other contributors

22 Aug 19:33 · amontanez24 · v1.7.0 · commit 7616a41

v1.7.0 - 2023-08-22

This release adds 3 new transformers:

UniformEncoder - A categorical and boolean transformer that converts the column into a uniform distribution.
OrderedUniformEncoder - The same as above, but the order for the categories can be specified, changing which range in the uniform distribution each category belongs to.
IDGenerator - A text transformer that drops the input column during transform and returns IDs during reverse transform. The IDs all take the form <prefix><number><suffix> and can be configured with a custom prefix, suffix, and starting point.
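A rough, library-free sketch of the ideas behind the uniform encoders and the IDGenerator follows. For the encoded column to be uniform over [0, 1), each category's range must be as wide as the category's frequency, so that is how the ranges are sized here; the alphabetical default order is arbitrary, and every function and parameter name (fit_intervals, transform, reverse_transform, generate_ids, starting_value) is illustrative rather than RDT's API.

```python
import bisect
import random
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Interval = Tuple[float, float]


def fit_intervals(data: Sequence[str], order: Optional[Sequence[str]] = None) -> Dict[str, Interval]:
    """Hypothetical: give each category a slice of [0, 1) whose width is its frequency."""
    counts = Counter(data)
    categories = list(order) if order is not None else sorted(counts)  # order moves the slices
    intervals: Dict[str, Interval] = {}
    start = 0.0
    for category in categories:
        width = counts[category] / len(data)
        intervals[category] = (start, start + width)
        start += width
    return intervals


def transform(data: Sequence[str], intervals: Dict[str, Interval], rng: random.Random) -> List[float]:
    # Each value becomes a uniform draw from its category's slice.
    return [rng.uniform(*intervals[value]) for value in data]


def reverse_transform(values: Sequence[float], intervals: Dict[str, Interval]) -> List[str]:
    # Map each number back to the category whose (non-empty) slice contains it.
    bounds = sorted((low, cat) for cat, (low, high) in intervals.items() if high > low)
    starts = [low for low, _ in bounds]
    return [bounds[max(bisect.bisect_right(starts, v) - 1, 0)][1] for v in values]


def generate_ids(num_rows: int, prefix: str = "", starting_value: int = 0, suffix: str = "") -> List[str]:
    # Hypothetical IDGenerator-style output: <prefix><number><suffix>, counting up.
    return [f"{prefix}{starting_value + i}{suffix}" for i in range(num_rows)]


data = ["b", "a", "b", "c"]
rng = random.Random(0)
encoded = transform(data, fit_intervals(data), rng)
print(reverse_transform(encoded, fit_intervals(data)))  # ['b', 'a', 'b', 'c']
print(fit_intervals(data, order=["c", "b", "a"]))  # c now owns [0.0, 0.25)
print(generate_ids(3, prefix="ID_", starting_value=100))  # ['ID_100', 'ID_101', 'ID_102']
```

Passing an explicit order, as the OrderedUniformEncoder allows, only changes where each slice sits; the slice widths still follow the frequencies.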

Additionally, the AnonymizedFaker is enhanced to support the text sdtype.

Deprecations

The get_input_sdtype method is being deprecated in favor of get_supported_sdtypes.

New Features

Create IDGenerator transformer - Issue #675 by @R-Palazzo
Add UniformEncoder (and its ordered version) - Issue #678 by @R-Palazzo
Allow me to use AnonymizedFaker with sdtype text columns - Issue #688 by @amontanez24

Maintenance

Deprecate get_input_sdtype - Issue #682 by @R-Palazzo

Contributors

amontanez24 and R-Palazzo

02 Aug 20:56 · amontanez24 · v1.6.1 · commit c2d3ac7

v1.6.1 - 2023-08-02

This release updates the default transformers used for certain sdtypes. It also enables the AnonymizedFaker and PseudoAnonymizedFaker to work with any sdtype other than boolean, categorical, datetime, numerical, or text.

Bugs

[Enterprise Usage] Unable to assign generic PII transformers (eg. AnonymizedFaker) - Issue #674 by @amontanez24

New Features

Update the default transformers that HyperTransformer assigns to each sdtype - Issue #664 by @amontanez24

Contributors

amontanez24

12 Jul 20:36 · amontanez24 · v1.6.0 · commit eb1ad13

v1.6.0 - 2023-07-12

This release gives the AnonymizedFaker the ability to generate missing values. Users can now provide the missing_value_generation parameter during initialization, setting it to None to not generate any missing values, or to 'random' to generate random missing values in the same proportion as the fitted data.
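What generating missing values "in the same proportion as the fitted data" amounts to can be sketched in a few lines. learn_null_ratio and inject_missing are hypothetical helpers, and the per-row coin flip is an assumed mechanism that matches the proportion on average rather than exactly; RDT may implement this differently.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def learn_null_ratio(data: Sequence[Optional[T]]) -> float:
    """Hypothetical helper: fraction of missing (None) entries in the fitted column."""
    return sum(value is None for value in data) / len(data)


def inject_missing(values: Sequence[T], null_ratio: float, rng: random.Random) -> List[Optional[T]]:
    # Assumed mechanism: each generated value is independently blanked with
    # probability null_ratio, so the proportion matches the fitted data on average.
    return [None if rng.random() < null_ratio else value for value in values]


ratio = learn_null_ratio(["a", None, "b", None])  # 0.5
fake = inject_missing([f"name_{i}" for i in range(1000)], ratio, random.Random(0))
print(ratio, sum(v is None for v in fake) / len(fake))  # 0.5, then a value close to 0.5
```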

Additionally, this release improves the NullTransformer by allowing nulls to be replaced on the forward transform even if missing_value_generation is set to None. It also fixes a bug that caused the UnixTimestampEncoder to return a different dtype than the input on reverse_transform, which was particularly problematic when datetime columns were represented as ints.

New Features

AnonymizedFaker should be able to model and generate missing values - Issue #660 by @R-Palazzo

Bugs

The datetime transformers don't give me back the same dtype sometimes - Issue #657 by @frances-h
RDT NullTransformer doesn't replace nulls if missing_value_generation is None - Issue #658 by @amontanez24

Maintenance

Remove python 3.7 builds - Issue #663 by @amontanez24
Drop support for Python 3.7 - Issue #666 by @amontanez24

Internal

Add add-on modules to sys.modules - Issue #653 by @amontanez24

Contributors

amontanez24, frances-h, and R-Palazzo
