Problem Description
Currently, the RDT library uses many of the same sdtypes as the SDV library; however, the two differ in how they handle IDs.
In the SDV library, we use the sdtype id to refer to an identifier column that may have a structured format. By marking a column's sdtype as id, I am able to supply a regex format that controls what the structure looks like. By contrast, the sdtype text is a PII type that is used to create random phrases or sentences.
In the RDT library, there is no such thing as sdtype id. Instead, for historical reasons, we use sdtype text to refer to ID columns that can take a regex format. (For creating random text, we just use the catch-all sdtype pii and point the AnonymizedFaker at the correct method.)
This difference is especially confusing for those who are using SDV's synthesizers and want to update the transformers assigned to the columns. To assign transformers within an SDV synthesizer, you need to consult the RDT docs, which have different sdtype requirements than SDV.
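To make the mismatch concrete, here is a minimal sketch of the mental translation a user currently has to do when moving from SDV metadata to the RDT docs. The table and the helper name rdt_sdtype_for are hypothetical and exist in neither library; the two entries come from the description above, and passing every other sdtype through unchanged is an assumption.

```python
# Hypothetical lookup table, based only on the description in this issue.
# It is not part of SDV or RDT.
SDV_TO_RDT_SDTYPE = {
    "id": "text",   # SDV id columns (regex format) are sdtype text in RDT
    "text": "pii",  # SDV random text falls under RDT's catch-all sdtype pii
}


def rdt_sdtype_for(sdv_sdtype: str) -> str:
    """Return the RDT sdtype to look up for a given SDV sdtype.

    Assumption: any sdtype not listed above is shared by both libraries.
    """
    return SDV_TO_RDT_SDTYPE.get(sdv_sdtype, sdv_sdtype)


print(rdt_sdtype_for("id"))
print(rdt_sdtype_for("text"))
```

If RDT adopted sdtype id as proposed below, the first entry in this table would no longer be needed.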
Expected behavior
The expected behavior would be to add the sdtype id to the RDT library and phase out* the sdtype text. This means that the import path should change to: from rdt.transformers.id import ...
* For backwards compatibility, we should still allow users to submit sdtype text as an option, but show a notice that they should be using id instead. These two sdtypes should do the same thing in RDT.
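One possible shape for the backwards-compatible path is sketched below. This is only an illustration of the request, not RDT's implementation: the names DEPRECATED_SDTYPE_ALIASES and resolve_sdtype are hypothetical, and the choice of FutureWarning for the notice is an assumption.

```python
import warnings

# Hypothetical alias table: old sdtype name -> preferred sdtype name.
DEPRECATED_SDTYPE_ALIASES = {"text": "id"}


def resolve_sdtype(sdtype: str) -> str:
    """Map a deprecated sdtype to its replacement, warning the caller.

    Sdtypes without an alias are returned unchanged and produce no warning.
    """
    replacement = DEPRECATED_SDTYPE_ALIASES.get(sdtype)
    if replacement is None:
        return sdtype
    warnings.warn(
        f"The sdtype '{sdtype}' is deprecated; please use '{replacement}' instead.",
        FutureWarning,
        stacklevel=2,
    )
    return replacement
```

Resolving the alias in a single place would mean both sdtypes reach the same transformer logic, which matches the requirement that they do the same thing in RDT.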