Flink 1.20: Support Avro and Parquet timestamp(9), unknown, and defaults #12470

Open
wants to merge 3 commits into
base: main

@rdblue rdblue commented Mar 6, 2025

This updates Flink's Avro and Parquet readers to support new timestamp(9) and unknown types.

While enabling the DataTest cases, I found that supportsDefaultValues was not enabled, so the default value tests were not running for Avro. After enabling those tests, I also needed to update the RowData assertions and to convert values in the readers to match Flink's object model by calling RowDataUtil.convertConstant.

@github-actions github-actions bot added the flink label Mar 6, 2025

rdblue commented Mar 6, 2025

I'll also follow up with a PR for Parquet readers, but that depends on changes in #12463.

@Fokko Fokko self-requested a review March 6, 2025 19:56
@rdblue rdblue force-pushed the flink-readers-writers branch from 5f505e7 to 0af3f01 Compare March 6, 2025 20:53
@github-actions github-actions bot added the API label Mar 6, 2025

rdblue commented Mar 6, 2025

#12463 was merged and the changes for Parquet were small, so I included them here.

@rdblue rdblue changed the title from "Flink: Support Avro timestamp(9), unknown, and defaults" to "Flink: Support Avro and Parquet timestamp(9), unknown, and defaults" Mar 6, 2025
-        } else {
-          return Optional.of(new MicrosToTimestampReader(desc));
-        }
+        return Optional.of(new MicrosToTimestampReader(desc));
Contributor Author

Previously, the readers converted values to LocalDateTime or OffsetDateTime, and Flink then converted those values back to a (millis, nanosOfMilli) pair. This involved a lot of unnecessary date/time logic in both Iceberg and Flink, and it required separate readers to produce each of the two types.

Now, the conversion to Flink is direct and doesn't go through Java date/time classes. That avoids all time zone calculations and should be quicker.
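
For illustration only, here is a minimal sketch of the kind of direct conversion described above. It assumes the stored value is a count of microseconds (or, for timestamp(9), nanoseconds) since the epoch. `TimestampParts`, `fromMicros`, and `fromNanos` are hypothetical names and are not taken from this PR; in Flink, the resulting pair would presumably be handed to `TimestampData.fromEpochMillis(millis, nanosOfMilli)`.

```java
public class DirectTimestampConversion {
  /** Hypothetical stand-in for Flink's (millis, nanosOfMilli) timestamp pair. */
  static final class TimestampParts {
    final long millis;
    final int nanosOfMilli;

    TimestampParts(long millis, int nanosOfMilli) {
      this.millis = millis;
      this.nanosOfMilli = nanosOfMilli;
    }

    @Override
    public String toString() {
      return "(" + millis + ", " + nanosOfMilli + ")";
    }
  }

  /** Splits microseconds since the epoch without using java.time classes. */
  static TimestampParts fromMicros(long micros) {
    // Floor division keeps nanosOfMilli non-negative for pre-epoch values.
    long millis = Math.floorDiv(micros, 1_000L);
    int nanosOfMilli = (int) Math.floorMod(micros, 1_000L) * 1_000;
    return new TimestampParts(millis, nanosOfMilli);
  }

  /** Splits nanoseconds since the epoch, as a timestamp(9) column would store. */
  static TimestampParts fromNanos(long nanos) {
    long millis = Math.floorDiv(nanos, 1_000_000L);
    int nanosOfMilli = (int) Math.floorMod(nanos, 1_000_000L);
    return new TimestampParts(millis, nanosOfMilli);
  }

  public static void main(String[] args) {
    System.out.println(fromMicros(1_500_123L));
    System.out.println(fromMicros(-1L));
    System.out.println(fromNanos(1_500_123_456L));
  }
}
```

The sketch uses `Math.floorDiv` and `Math.floorMod` rather than `/` and `%` so that values before the epoch still yield a nanosecond remainder in the range 0 to 999,999.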

LogicalTypeAnnotation annotation = primitive.getLogicalTypeAnnotation();
if (annotation != null) {
  Optional<ParquetValueWriter<?>> writer =
      annotation.accept(new LogicalTypeWriterBuilder(fType, desc));
Contributor Author

Updated this to use the logical annotation visitor.
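
As a rough illustration of the visitor approach mentioned above, the following self-contained sketch mimics the shape of Parquet's `LogicalTypeAnnotation.accept(...)`, which returns an `Optional` that is empty when the visitor does not handle the annotation. Every type here (`Annotation`, `Visitor`, `WriterBuilder`, and the writer names returned as strings) is a hypothetical stand-in and not the code from this PR.

```java
import java.util.Optional;

public class AnnotationVisitorSketch {
  enum TimeUnit { MICROS, NANOS }

  /** Stand-in visitor: each method returns empty unless a subclass overrides it. */
  interface Visitor<T> {
    default Optional<T> visit(StringAnnotation string) { return Optional.empty(); }
    default Optional<T> visit(TimestampAnnotation timestamp) { return Optional.empty(); }
    default Optional<T> visit(UuidAnnotation uuid) { return Optional.empty(); }
  }

  /** Stand-in for a logical type annotation that dispatches to the visitor. */
  interface Annotation {
    <T> Optional<T> accept(Visitor<T> visitor);
  }

  static final class StringAnnotation implements Annotation {
    @Override
    public <T> Optional<T> accept(Visitor<T> visitor) { return visitor.visit(this); }
  }

  static final class TimestampAnnotation implements Annotation {
    final TimeUnit unit;

    TimestampAnnotation(TimeUnit unit) { this.unit = unit; }

    @Override
    public <T> Optional<T> accept(Visitor<T> visitor) { return visitor.visit(this); }
  }

  static final class UuidAnnotation implements Annotation {
    @Override
    public <T> Optional<T> accept(Visitor<T> visitor) { return visitor.visit(this); }
  }

  /** Picks a writer (represented by a name here) for the annotations it knows about. */
  static final class WriterBuilder implements Visitor<String> {
    @Override
    public Optional<String> visit(StringAnnotation string) {
      return Optional.of("string-writer");
    }

    @Override
    public Optional<String> visit(TimestampAnnotation timestamp) {
      return Optional.of(
          timestamp.unit == TimeUnit.NANOS ? "timestamp-nanos-writer" : "timestamp-micros-writer");
    }
  }

  /** Uses the visitor result when present, otherwise falls back to a primitive writer. */
  static String selectWriter(Annotation annotation) {
    if (annotation != null) {
      Optional<String> writer = annotation.accept(new WriterBuilder());
      if (writer.isPresent()) {
        return writer.get();
      }
    }
    return "primitive-writer";
  }

  public static void main(String[] args) {
    System.out.println(selectWriter(new TimestampAnnotation(TimeUnit.NANOS)));
    System.out.println(selectWriter(new UuidAnnotation()));
    System.out.println(selectWriter(null));
  }
}
```

One appeal of this shape is that supporting a new annotation, such as a nanosecond timestamp, only requires one more `visit` override, while unhandled annotations fall through to the primitive-type path.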

@rdblue rdblue changed the title from "Flink: Support Avro and Parquet timestamp(9), unknown, and defaults" to "Flink 1.20: Support Avro and Parquet timestamp(9), unknown, and defaults" Mar 6, 2025