Python type of Timestamp and Boolean schema fields #42

Open
abhi19gupta opened this issue Aug 30, 2022 · 8 comments

@abhi19gupta
Contributor

My Pinot table schema has fields of type Timestamp and Boolean (I believe these types were introduced after Pinot v0.7.1).
When querying them with this Python client, the returned values for both are of the Python type str. IMO, more intuitive Python types for them would be datetime and bool. Can this client support that?
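
To illustrate the request, here is a minimal sketch of the kind of conversion being asked for, not the client's actual implementation. It assumes that BOOLEAN values arrive as the strings "true"/"false" and TIMESTAMP values as ISO 8601-like strings, and that the column data types are available alongside the rows; the names `CONVERTERS`, `to_bool`, `to_datetime`, and `convert_rows` are hypothetical.

```python
from datetime import datetime


def to_bool(value):
    # Assumption: booleans arrive as the strings "true" / "false".
    return str(value).strip().lower() == "true"


def to_datetime(value):
    # Assumption: timestamps arrive as ISO 8601-like strings.
    return datetime.fromisoformat(str(value).strip())


# Hypothetical mapping from Pinot data type names to converter functions.
CONVERTERS = {"BOOLEAN": to_bool, "TIMESTAMP": to_datetime}


def convert_rows(column_types, rows):
    """Apply a per-column converter; unknown types and None pass through."""
    funcs = [CONVERTERS.get(t) for t in column_types]
    return [
        [f(v) if f is not None and v is not None else v
         for f, v in zip(funcs, row)]
        for row in rows
    ]


# Column order follows the example fields: id, is_approved, createdAt.
column_types = ["STRING", "BOOLEAN", "TIMESTAMP"]
rows = [["a1", "true", "2022-08-30 10:15:00.123"]]
converted = convert_rows(column_types, rows)
```

Columns with no registered converter are left as returned, so existing STRING and numeric handling would not change.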

@walterddr
Collaborator

Could you share the schema YAML or table config YAML of your Pinot table? I can quickly try to reproduce.

@abhi19gupta
Contributor Author

Something like this:

{
  "schemaName": "my_table",
  "dimensionFieldSpecs": [
    {
      "name": "id",
      "dataType": "STRING"
    },
    {
      "name": "is_approved",
      "dataType": "BOOLEAN"
    }
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "createdAt",
      "dataType": "TIMESTAMP",
      "format": "1:MILLISECONDS:TIMESTAMP",
      "granularity": "1:MILLISECONDS"
    }
  ],
  "primaryKeyColumns": [
    "id"
  ]
}

@walterddr
Collaborator

Thanks, I will try to reproduce and create a fix.

@abhi19gupta
Contributor Author

Thanks @walterddr. Any estimate on the timeline? It will help me plan better.

@walterddr
Collaborator

It seems a bit tricky. I had to restructure the data type system so that it is reusable in both the SQLAlchemy and the regular connection. I will try to get this in by next week.

@abhi19gupta
Contributor Author

@walterddr In the PR, I see datetime.strptime being used. It might be way faster (50x) to use ciso8601.parse_datetime instead. Could you consider that? It will be essential when fetching thousands of records.
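
A rough sketch of what the swap could look like, assuming the timestamp strings are ISO 8601 (not confirmed in this thread). `ciso8601` is a third-party package, so the sketch falls back to the standard library's `datetime.fromisoformat` when it is not installed; the names `fast_parse` and `parse_with_strptime` and the format string are illustrative assumptions, not what the PR uses.

```python
import timeit
from datetime import datetime

try:
    # Third-party package; only used if it is installed.
    from ciso8601 import parse_datetime as fast_parse
except ImportError:
    # Standard-library fallback that avoids strptime's format parsing.
    fast_parse = datetime.fromisoformat


def parse_with_strptime(value):
    # Assumed format string for illustration only.
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f")


sample = "2022-08-30T10:15:00.123"

# Both approaches should yield the same naive datetime for this input.
assert parse_with_strptime(sample) == fast_parse(sample)

# Rough timing comparison; the actual speed-up depends on the environment.
slow = timeit.timeit(lambda: parse_with_strptime(sample), number=10000)
fast = timeit.timeit(lambda: fast_parse(sample), number=10000)
print(f"strptime: {slow:.4f}s, alternative: {fast:.4f}s")
```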

@walterddr
Collaborator

@xiangfu0 do we always return datetime columns in ISO 8601 format (with the 2 extra double quotes)?
If so, we can consider blindly removing the head/tail and using the faster conversion ^
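
A minimal sketch of the head/tail removal, assuming the value may or may not be wrapped in one pair of double quotes (the thread leaves this open). The helper names `unquote` and `parse_timestamp` are hypothetical, and `datetime.fromisoformat` stands in for whichever faster parser is chosen.

```python
from datetime import datetime


def unquote(value):
    """Drop one pair of surrounding double quotes, if both are present."""
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        return value[1:-1]
    return value


def parse_timestamp(value):
    # Strip the optional quotes first, then parse the ISO 8601 string.
    return datetime.fromisoformat(unquote(value))


parsed = parse_timestamp('"2022-08-30T10:15:00"')
```

Checking for the quotes rather than slicing blindly costs little and avoids cutting off real characters if a value ever arrives unquoted.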

@abhi19gupta
Contributor Author

Bumping ^ @walterddr @xiangfu0
