-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Core: Use InternalData with avro and common DataIterable for readers. #12476
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -34,6 +34,23 @@ | |
import org.slf4j.LoggerFactory; | ||
|
||
public class InternalData { | ||
/** | ||
* An iterable returned by readers that also exposes access to the file metadata. | ||
* | ||
* @param <D> internal data model for records | ||
*/ | ||
public interface DataIterable<D> extends CloseableIterable<D> { | ||
|
||
/** | ||
* Returns key/value metadata of file being read | ||
* | ||
* @return metadata | ||
*/ | ||
default Map<String, String> metadata() { | ||
throw new UnsupportedOperationException("File metadata not supported"); | ||
} | ||
} | ||
|
||
private InternalData() {} | ||
|
||
private static final Logger LOG = LoggerFactory.getLogger(InternalData.class); | ||
|
@@ -163,6 +180,11 @@ public interface ReadBuilder { | |
/** Set a custom class for in-memory objects at the given field ID. */ | ||
ReadBuilder setCustomType(int fieldId, Class<? extends StructLike> structClass); | ||
|
||
/** Set the classloader used for custom types. */ | ||
default ReadBuilder classLoader(ClassLoader classLoader) { | ||
throw new UnsupportedOperationException("Classloader not supported"); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think that this is needed. I originally added it in the InternalData commit, but because we are passing the classes themselves (rather than loading them dynamically by name) they are already loaded. |
||
} | ||
|
||
/** Build the configured reader. */ | ||
<D> CloseableIterable<D> build(); | ||
} | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -25,8 +25,6 @@ | |
import java.util.List; | ||
import java.util.Map; | ||
import java.util.Set; | ||
import org.apache.iceberg.avro.Avro; | ||
import org.apache.iceberg.avro.AvroIterable; | ||
import org.apache.iceberg.exceptions.RuntimeIOException; | ||
import org.apache.iceberg.expressions.Evaluator; | ||
import org.apache.iceberg.expressions.Expression; | ||
|
@@ -133,12 +131,18 @@ private <T extends ContentFile<T>> PartitionSpec readPartitionSpec(InputFile inp | |
private static <T extends ContentFile<T>> Map<String, String> readMetadata(InputFile inputFile) { | ||
Map<String, String> metadata; | ||
try { | ||
try (AvroIterable<ManifestEntry<T>> headerReader = | ||
Avro.read(inputFile) | ||
try (CloseableIterable<ManifestEntry<T>> headerReader = | ||
InternalData.read(FileFormat.AVRO, inputFile) | ||
.project(ManifestEntry.getSchema(Types.StructType.of()).select("status")) | ||
.classLoader(GenericManifestEntry.class.getClassLoader()) | ||
.build()) { | ||
metadata = headerReader.getMetadata(); | ||
|
||
if (headerReader instanceof InternalData.DataIterable) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a little bit of a workaround since we can't switch fully over to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure that this is needed and I'd prefer not to add the interface if possible. We should not really be using the file metadata to recover things like the partition spec. I think that this is only used by older code paths that we didn't migrate to pass the id to spec map through. At this point, I think we should go see where those methods are used and try to remove them, instead of adding this. |
||
metadata = ((InternalData.DataIterable<?>) headerReader).metadata(); | ||
} else { | ||
throw new RuntimeException( | ||
"Reader does not support metadata reading: " + headerReader.getClass().getName()); | ||
} | ||
} | ||
} catch (IOException e) { | ||
throw new RuntimeIOException(e); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just using the default, right?