Releases: sdv-dev/RDT
v1.11.0 - 2024-04-10
This release adds support for Python 3.12! It also fixes a bug that kept certain functions from being used on the AnonymizedFaker when locales were provided.
Maintenance
- Support Python 3.12 - Issue #744 by @fealho
- Add dependency checker - Issue #777 by @lajohn4747
- Add bandit workflow - Issue #781 by @R-Palazzo
Bugs Fixed
- Providing locales to AnonymizedFaker with a function that uses the BaseProvider crashes - Issue #774 by @frances-h
- Fix minimum version workflow when pointing to github branch - Issue #783 by @R-Palazzo
New Features
- Move out sdtype validations from multi-column transformers - Issue #778 by @R-Palazzo
v1.10.1 - 2024-03-21
This release fixes a bug with loading saved AnonymizedFaker transformers from previous versions of RDT.
Bugs Fixed
- Add enforce_uniqueness attribute to AnonymizedFaker - PR #771 by @fealho
- Fix backwards compatibility for cardinality_rule - PR #772 by @frances-h
v1.10.0 - 2024-03-13
The AnonymizedFaker now supports more options for the cardinality of the generated data. Previously, you could either make the generated data all unique or not take uniqueness into consideration. Now you can use the cardinality_rule parameter to match the cardinality of the original data.
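To make the three behaviors concrete, here is a minimal sketch in plain Python. It does not use RDT: the function name `anonymize`, the placeholder tokens, and the rule values 'unique' and 'match' as used here are assumptions made for illustration, and the real transformer draws its values from Faker.

```python
import random

def anonymize(original, rule=None, seed=0):
    """Replace each value in `original` with a placeholder token.

    rule=None     -> uniqueness is ignored, so tokens may repeat
    rule='unique' -> every output token is distinct
    rule='match'  -> the output has as many distinct tokens as the input
    (Hypothetical helper, not part of RDT.)
    """
    rng = random.Random(seed)
    n = len(original)
    if rule == 'unique':
        return [f'token_{i}' for i in range(n)]
    if rule == 'match':
        k = len(set(original))
        pool = [f'token_{i}' for i in range(k)]
        # Use every token once, then fill the remaining rows at random.
        out = pool + [rng.choice(pool) for _ in range(n - k)]
        rng.shuffle(out)
        return out
    # No rule: sample freely, so repeats are possible.
    pool = [f'token_{i}' for i in range(n)]
    return [rng.choice(pool) for _ in range(n)]

original = ['a', 'b', 'a', 'c', 'a']
unique_out = anonymize(original, rule='unique')   # 5 distinct tokens
matched_out = anonymize(original, rule='match')   # 3 distinct tokens
```

Note that in this sketch 'match' only preserves the number of distinct values; it does not map each original value to a fixed token.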
New Features
Deprecations
The enforce_uniqueness parameter of the AnonymizedFaker is deprecated in favor of the cardinality_rule parameter.
Maintenance
- Transition from using setup.py to pyproject.toml to specify project metadata - Issue #763 by @R-Palazzo
- Remove bumpversion and use bump-my-version - Issue #764 by @R-Palazzo
- Add build to dev requirements - Issue #768 by @amontanez24
v1.9.2 - 2024-02-13
This release makes a couple of improvements to the RegexGenerator. Error messaging is improved, and it can now generate an unlimited number of rows even when the enforce_uniqueness flag is True. It does this by adding suffixes once the maximum number of combinations for the provided regex is reached.
Additionally, this release resolves a few bugs. The OneHotEncoder should no longer crash on the categorical dtype, and the UniformEncoder was improved to support more dtypes.
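The suffix idea described for the RegexGenerator can be sketched in plain Python as follows. This is not RDT's implementation: the helper `unique_values` and the `_<n>` suffix format are assumptions for illustration, and the sketch assumes no value in the pool already ends in such a suffix.

```python
import itertools

def unique_values(pool, num_rows):
    """Yield `num_rows` distinct strings from a finite `pool`.

    Once every value in the pool has been used, start another pass and
    append the pass number as a suffix so the results stay distinct.
    (Hypothetical helper, not part of RDT.)
    """
    values = list(dict.fromkeys(pool))  # drop duplicates, keep order
    if not values:
        raise ValueError('pool must contain at least one value')
    produced = 0
    for cycle in itertools.count():
        for value in values:
            if produced == num_rows:
                return
            yield value if cycle == 0 else f'{value}_{cycle}'
            produced += 1

# All strings matched by the regex [AB][01], written out by hand.
pool = ['A0', 'A1', 'B0', 'B1']
rows = list(unique_values(pool, 6))
# rows == ['A0', 'A1', 'B0', 'B1', 'A0_1', 'A1_1']
```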
Bugs Fixed
- Categorical reverse transform may crash with ValueError for certain dtypes (int64) - Issue #747 by @R-Palazzo
- RegexGenerator gives a confusing message: # of possibilities are shown as an imaginary number - Issue #748 by @R-Palazzo
- OneHotEncoder doesn't support dtype 'category' - Issue #751 by @fealho
New Features
- RegexGenerator should create unlimited regexes, even if unique enforcement is on - Issue #749 by @fealho
- Add a _update_multi_column_transformer method - Issue #757 by @R-Palazzo
Internal
v1.9.1 - 2024-01-10
This release fixes a bug that caused the AnonymizedFaker to crash with provider/function combinations that return tuples.
Bugs Fixed
- AnonymizedFaker crashes with ValueError for specific provider/function pairs (eg. currency) - Issue #743 by @R-Palazzo
v1.9.0 - 2023-11-14
This release adds a parameter to the UnixTimestampEncoder and OptimizedTimestampEncoder, called enforce_min_max_values. When this is set to True, it clips all values in the reverse transformed data to the min and max datetimes seen in the fitted data.
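As a rough illustration of the clipping that enforce_min_max_values describes, the sketch below uses only the standard library. The helper `clip_datetimes` is hypothetical and is not how the encoders are implemented internally.

```python
from datetime import datetime

def clip_datetimes(values, fitted):
    """Clip each datetime in `values` to the range seen in `fitted`.

    (Hypothetical helper, not part of RDT.)
    """
    low, high = min(fitted), max(fitted)
    return [min(max(value, low), high) for value in values]

fitted = [datetime(2023, 1, 1), datetime(2023, 6, 30)]
reversed_data = [
    datetime(2022, 12, 25),  # earlier than anything in the fitted data
    datetime(2023, 3, 15),   # already inside the fitted range
    datetime(2023, 8, 1),    # later than anything in the fitted data
]
clipped = clip_datetimes(reversed_data, fitted)
```

Values outside the fitted range are moved to the nearest bound, while values inside it are left unchanged.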
This release also internally adds support for multi-column transformers!
New Features
- Support multi-column transformers - Issue #683 by @R-Palazzo
- Improve user warnings and logic for update_sdtype - Issue #684 by @R-Palazzo
- Improve user warnings and logic for update_transformers and update_transformers_by_sdtype - Issue #685 by @R-Palazzo
- Improve user warnings and logic for remove_transformers and remove_transformers_by_sdtype - Issue #686 by @R-Palazzo
- Add enforce_min_max_values to datetime transformers - Issue #740 by @R-Palazzo
Internal
- Support multi-column transformers - Issue #683 by @R-Palazzo
Bugs Fixed
- Multi column transformers crash when assigned to single column - Issue #734 by @R-Palazzo
v1.8.0 - 2023-10-31
This release adds the 'random' missing value replacement strategy, which fills in missing values with random values from the dataset.
Additionally, users can now use the UniformUnivariate distribution within the GaussianNormalizer.
This release fixes a crash in the ClusterBasedNormalizer reverse transform that was caused by out-of-bounds values. It also patches a randomization issue that produced different values after applying reset_randomization.
Anonymization has been moved from SDV into the RDT library, as it was found to be a self-contained module for RDT, and moving it reduces the dependencies needed in SDV.
Features
- Make the default missing value imputation 'mean' - Issue #730 by @R-Palazzo
- When no rounding scheme is detected, log the info instead of showing a warning - Issue #709 by @frances-h
- The GaussianNormalizer should accept distribution names that are consistent with scipy - Issue #656 by @fealho
- The GaussianNormalizer should accept uniform distributions - Issue #655 by @fealho
- Remove psutil - Issue #615 by @fealho
- Consider deprecating the FrequencyEncoder - Issue #614 by @fealho
- Replace missing values with variable (random) values from the dataset - Issue #606
Bugs
- RDT Uniform Encoder creates nan Value bug - Issue #719 by @lajohn4747
- HyperTransformer transforms while fitting and messes up the random seed - Issue #716 by @pvk-developer
- Resolve locales warning for specific sdtype/locale combos (eg. en_US with postcode) - Issue #701 by @pvk-developer
- The OrderedLabelEncoder should not accept duplicate categories - Issue #673 by @frances-h
- ClusterBasedNormalizer crashes on reverse transform (IndexError) - Issue #672 by @fealho
- Unnecessary warning in OneHotEncoder when there are nan values - Issue #616 by @fealho
Maintenance
- Remove performance tests - Issue #707 by @fealho
- ClusterBasedNormalizer code cleanup - Issue #696 by @fealho
- Switch default branch from master to main - Issue #687 by @amontanez24
Deprecations
- The FrequencyEncoder transformer will no longer be supported in future versions of RDT. Please use the UniformEncoder transformer instead.
- GaussianNormalizer distribution option names have been updated to be consistent with scipy: gaussian -> norm, student_t -> t, and truncated_gaussian -> truncnorm
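For code that still passes the old distribution names, the renames listed above can be captured in a small lookup table. This is only a sketch: the helper `to_scipy_name` is hypothetical and not part of RDT.

```python
# Old GaussianNormalizer option names mapped to the scipy-style names
# listed in the deprecation note.
RENAMED_DISTRIBUTIONS = {
    'gaussian': 'norm',
    'student_t': 't',
    'truncated_gaussian': 'truncnorm',
}

def to_scipy_name(name):
    """Return the scipy-style name, leaving unlisted names unchanged.

    (Hypothetical helper, not part of RDT.)
    """
    return RENAMED_DISTRIBUTIONS.get(name, name)

new_name = to_scipy_name('truncated_gaussian')  # 'truncnorm'
```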
v1.7.0 - 2023-08-22
This release adds 3 new transformers:
- UniformEncoder - A categorical and boolean transformer that converts the column into a uniform distribution.
- OrderedUniformEncoder - The same as above, but the order for the categories can be specified, changing which range in the uniform distribution each category belongs to.
- IDGenerator - A text transformer that drops the input column during transform and returns IDs during reverse transform. The IDs all take the form <prefix><number><suffix> and can be configured with a custom prefix, suffix and starting point.
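The <prefix><number><suffix> form used by the IDGenerator can be illustrated with a few lines of plain Python. The function `generate_ids` and its parameter names are assumptions for this sketch and should not be read as the transformer's actual signature.

```python
def generate_ids(count, prefix='', suffix='', starting_value=0):
    """Build `count` IDs of the form <prefix><number><suffix>.

    (Hypothetical helper, not part of RDT.)
    """
    return [f'{prefix}{starting_value + i}{suffix}' for i in range(count)]

ids = generate_ids(3, prefix='ID_', suffix='_X', starting_value=100)
# ids == ['ID_100_X', 'ID_101_X', 'ID_102_X']
```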
Additionally, the AnonymizedFaker is enhanced to support the text sdtype.
Deprecations
- The get_input_sdtype method is being deprecated in favor of get_supported_sdtypes.
New Features
- Create IDGenerator transformer - Issue #675 by @R-Palazzo
- Add UniformEncoder (and its ordered version) - Issue #678 by @R-Palazzo
- Allow me to use AnonymizedFaker with sdtype text columns - Issue #688 by @amontanez24
Maintenance
- Deprecate get_input_sdtype - Issue #682 by @R-Palazzo
v1.6.1 - 2023-08-02
This release updates the default transformers used for certain sdtypes. It also enables the AnonymizedFaker and PseudoAnonymizedFaker to work with any sdtype besides boolean, categorical, datetime, numerical or text.
Bugs
- [Enterprise Usage] Unable to assign generic PII transformers (eg. AnonymizedFaker) - Issue #674 by @amontanez24
New Features
- Update the default transformers that HyperTransformer assigns to each sdtype - Issue #664 by @amontanez24
v1.6.0 - 2023-07-12
This release adds the ability to generate missing values to the AnonymizedFaker. Users can now provide the missing_value_generation parameter during initialization. They can set it to None to not generate any missing values, or 'random' to generate random missing values in the same proportion as the fitted data.
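What "the same proportion as the fitted data" means for the 'random' option can be sketched with the standard library alone. The helper `add_missing` below is hypothetical and only approximates the idea; it is not the transformer's implementation.

```python
import random

def add_missing(values, fitted, seed=0):
    """Blank out entries of `values` at the rate nulls appear in `fitted`.

    (Hypothetical helper, not part of RDT.)
    """
    rng = random.Random(seed)
    null_rate = sum(value is None for value in fitted) / len(fitted)
    num_missing = round(null_rate * len(values))
    # Pick which positions become missing, without repeats.
    missing_idx = set(rng.sample(range(len(values)), num_missing))
    return [None if i in missing_idx else value
            for i, value in enumerate(values)]

fitted = ['a', None, 'b', None]            # half of the fitted data is null
generated = [f'fake_{i}' for i in range(10)]
with_nulls = add_missing(generated, fitted)
```

Here half of the fitted values are missing, so half of the ten generated values are replaced with None at randomly chosen positions.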
Additionally, this release improves the NullTransformer by allowing nulls to be replaced on the forward transform even if missing_value_generation is set to None. It also fixes a bug that was causing the UnixTimestampEncoder to return a different dtype than the input on reverse_transform. This was particularly problematic when datetime columns are represented as ints.
New Features
- AnonymizedFaker should be able to model and generate missing values - Issue #660 by @R-Palazzo
Bugs
- The datetime transformers don't give me back the same dtype sometimes - Issue #657 by @frances-h
- RDT NullTransformer doesn't replace nulls if missing_value_generation is None - Issue #658 by @amontanez24
Maintenance
- Remove python 3.7 builds - Issue #663 by @amontanez24
- Drop support for Python 3.7 - Issue #666 by @amontanez24
Internal
- Add add-on modules to sys.modules - Issue #653 by @amontanez24