Align text/id sdtypes to the SDV library #880

Closed
npatki opened this issue Sep 9, 2024 · 0 comments · Fixed by #881
Labels
feature request Request for a new feature

npatki commented Sep 9, 2024

Problem Description

Currently, the RDT library uses many of the same sdtypes as the SDV library; however, the two libraries differ in how they handle IDs.

  • In the SDV library, we use the sdtype id to refer to an identifier column that may have a structured format. By marking a column's sdtype as id, I can supply a regex format to control what the structure looks like. By contrast, the sdtype text is a PII type that is used to create random phrases or sentences.
  • In the RDT library, there is no such thing as sdtype id. Instead, for historical reasons, we use sdtype text to refer to ID columns that can get a regex format. (For creating random text, we just use the catch-all sdtype pii and assign the AnonymizedFaker to the correct method.)
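To make the mismatch concrete, here is a minimal sketch that restates the two bullets above as a lookup table. The names SDV_TO_RDT_SDTYPE and rdt_sdtype_for are hypothetical and exist in neither library; the fall-through for other sdtypes is an assumption based on the statement that the libraries share many sdtypes.

```python
# Hypothetical mapping (not part of SDV or RDT): the RDT sdtype that
# currently plays the same role as each SDV sdtype, per the bullets above.
SDV_TO_RDT_SDTYPE = {
    'id': 'text',   # SDV id (regex-formatted identifier) is RDT text today
    'text': 'pii',  # SDV text (random phrases) is handled via RDT pii
}


def rdt_sdtype_for(sdv_sdtype):
    """Return the RDT sdtype with the same role as the given SDV sdtype.

    Assumption: any sdtype not listed above is named the same in both libraries.
    """
    return SDV_TO_RDT_SDTYPE.get(sdv_sdtype, sdv_sdtype)


print(rdt_sdtype_for('id'))
print(rdt_sdtype_for('text'))
```

Someone reading the RDT docs with an SDV column marked id would therefore have to look under text, which is the source of the confusion.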

This difference is especially confusing for those who use SDV's synthesizers and want to update the transformers assigned to their columns. To assign transformers within an SDV synthesizer, you need to consult the RDT docs, which have different sdtype requirements than SDV.

Expected behavior

The expected behavior is to add the sdtype id to the RDT library and phase out* the sdtype text. This means that the import path should change to: from rdt.transformers.id import ...

* For backwards compatibility, we should still allow users to supply sdtype text as an option, but show a notice telling them to use id instead. The two sdtypes should behave identically in RDT.
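One possible shape for the backwards-compatible behavior is sketched below. The helper normalize_sdtype is hypothetical and is not RDT's actual implementation; the choice of FutureWarning and the wording of the message are also assumptions.

```python
import warnings


def normalize_sdtype(sdtype):
    """Treat the legacy sdtype 'text' as 'id', notifying the caller.

    Hypothetical helper for illustration only; not part of RDT.
    """
    if sdtype == 'text':
        warnings.warn(
            "The sdtype 'text' is being phased out. Please use 'id' instead.",
            FutureWarning,
            stacklevel=2,
        )
        return 'id'
    # Every other sdtype, including 'id', passes through unchanged.
    return sdtype
```

With a helper like this, both sdtypes would resolve to the same value, so any code downstream of it behaves identically for text and id.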
