It would be useful to have a reset_generator parameter for the transformers that generate data when reverse transforming (AnonymizedFaker and RegexGenerator). This way, if we are using enforce_uniqueness, we would be able to start over again with new values.
This change should propagate to create_anonymized_columns, since that is where these transformers are mainly used.
Expected behavior
import numpy as np
import pandas as pd
from rdt.transformers import AnonymizedFaker

anonymizer = AnonymizedFaker(provider_name='job', function_name='job', enforce_uniqueness=True)

# There are 639 unique jobs in English
data = pd.DataFrame({'job': np.arange(639)})
tr_data = anonymizer.fit_transform(data, 'job')

# With this run we empty the unique values from the current faker instance
rev_data = anonymizer.reverse_transform(tr_data)

# This will not generate new data and will fail. There is no way to reset it currently.
anonymizer.reverse_transform(tr_data)
File ~/.virtualenvs/RDT/lib/python3.8/site-packages/faker/proxy.py:320, in UniqueProxy._wrap.<locals>.wrapper(*args, **kwargs)
    319 else:
--> 320     raise UniquenessException(f"Got duplicated values after {_UNIQUE_ATTEMPTS:,} iterations.")
    322 generated.add(retval)

UniquenessException: Got duplicated values after 1,000 iterations.

The above exception was the direct cause of the following exception:

Error                                     Traceback (most recent call last)
Cell In [1], line 12
      9 rev_data = anonymizer.reverse_transform(tr_data)
     11 # This will not generate new data and will fail. There is no way to reset it currently.
---> 12 anonymizer.reverse_transform(tr_data)

File ~/Projects/sdv-dev/RDT/rdt/transformers/base.py:362, in BaseTransformer.reverse_transform(self, data)
    359 data = data.copy()
    361 columns_data = self._get_columns_data(data, self.output_columns)
--> 362 reversed_data = self._reverse_transform(columns_data)
    363 data = data.drop(self.output_columns, axis=1)
    364 data = self._add_columns_to_data(data, reversed_data, self.columns)

File ~/Projects/sdv-dev/RDT/rdt/transformers/pii/anonymizer.py:149, in AnonymizedFaker._reverse_transform(self, data)
    144     reverse_transformed = np.array([
    145         self._function()
    146         for _ in range(sample_size)
    147     ], dtype=object)
    148 except faker.exceptions.UniquenessException as exception:
--> 149     raise Error(
    150         f'The Faker function you specified is not able to generate {sample_size} unique '
    151         'values. Please use a different Faker function for column '
    152         f"('{self.get_input_column()}')."
    153     ) from exception
    155 return reverse_transformed

Error: The Faker function you specified is not able to generate 639 unique values. Please use a different Faker function for column ('job').
Additional context
If we need to reuse the same transformer to generate data again, there is currently no way to reset its internal state.
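To illustrate why the second reverse_transform call fails, here is a minimal standard-library sketch. It assumes the uniqueness check works by remembering every value already handed out, which is what the traceback suggests. `UniquePool` and its `clear()` method are hypothetical stand-ins written for this example; they are not Faker's or RDT's actual classes.

```python
class UniquePool:
    """Toy model of a uniqueness-enforcing generator (hypothetical, not Faker's class)."""

    def __init__(self, values):
        self._values = list(values)
        # Values already handed out. This set survives between calls,
        # which is why a second full-size request cannot be satisfied.
        self._seen = set()

    def sample(self, n):
        remaining = [v for v in self._values if v not in self._seen]
        if n > len(remaining):
            raise ValueError(
                f'Cannot generate {n} unique values; only {len(remaining)} left.'
            )
        out = remaining[:n]
        self._seen.update(out)
        return out

    def clear(self):
        # Forget everything handed out so far; the requested reset would do this.
        self._seen.clear()


pool = UniquePool(f'job_{i}' for i in range(639))
first = pool.sample(639)  # uses up every available value

try:
    pool.sample(639)  # nothing is left, so this raises
    second_call_failed = False
except ValueError:
    second_call_failed = True

pool.clear()
after_reset = pool.sample(639)  # works again once the state is cleared
```

In this toy model the reset is just clearing the set of seen values; the real transformers would need to expose an equivalent hook on whatever state their generator keeps.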
Each relevant transformer (AnonymizedFaker, PseudoAnonymizedFaker, RegexGenerator) should have a reset_anonymization() method that resets the generation. The next time you call reverse_transform, it will start from the beginning.
HyperTransformer should have a reset_anonymization() method that calls reset_anonymization() on each relevant transformer in step 3.
SDV Usage Example: SDV single table models have a randomize_samples parameter (docs). When set to False, the HyperTransformer should reset anonymization before calling sample.
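The proposal above could be sketched roughly as follows. This is a standard-library illustration under stated assumptions, not RDT's implementation: the classes `AnonymizerSketch`, `PlainTransformerSketch`, and `HyperTransformerSketch` are hypothetical, a seeded `random.Random` stands in for the real generator, and `reverse_transform` takes a row count rather than a DataFrame to keep the example short.

```python
import random


class AnonymizerSketch:
    """Hypothetical transformer that generates unique values on reverse transform."""

    def __init__(self, pool, seed=0):
        self._pool = list(pool)
        self._seed = seed
        self.reset_anonymization()

    def reset_anonymization(self):
        # Re-seed and forget used values so generation starts from the beginning.
        self._rng = random.Random(self._seed)
        self._used = set()

    def reverse_transform(self, num_rows):
        available = [v for v in self._pool if v not in self._used]
        if num_rows > len(available):
            raise ValueError(f'Cannot generate {num_rows} unique values.')
        values = self._rng.sample(available, num_rows)
        self._used.update(values)
        return values


class PlainTransformerSketch:
    """Hypothetical transformer that has no generation state to reset."""

    def reverse_transform(self, num_rows):
        return [0.0] * num_rows


class HyperTransformerSketch:
    """Hypothetical container that forwards the reset to transformers supporting it."""

    def __init__(self, field_transformers):
        self.field_transformers = field_transformers

    def reset_anonymization(self):
        for transformer in self.field_transformers.values():
            reset = getattr(transformer, 'reset_anonymization', None)
            if callable(reset):
                reset()

    def reverse_transform(self, num_rows):
        return {
            column: transformer.reverse_transform(num_rows)
            for column, transformer in self.field_transformers.items()
        }


jobs = [f'job_{i}' for i in range(10)]
ht = HyperTransformerSketch({
    'job': AnonymizerSketch(jobs),
    'age': PlainTransformerSketch(),
})

first = ht.reverse_transform(6)
ht.reset_anonymization()  # without this, a second request for 6 of 10 would fail
second = ht.reverse_transform(6)
```

Because the reset restores both the seed and the set of used values in this sketch, `first` and `second` are identical, which matches the behavior described for randomize_samples set to False. Whether the real reset should also restore the random state is a design decision the sketch merely assumes.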
npatki changed the title from "Add a reset_generator parameter" to "Add a reset_anonymization method" on Oct 5, 2022