-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding Types.py for wrapping DH column types and table operations around them. #1088
Conversation
@_passThrough | ||
def asType(type_str): | ||
return _table_tools_.typeFromName(type_str) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, returning a class is not a proper way to get the type information into jpy. You need to use jpy.get_type(...)
, as it will then be recognized as a proper python "type".
To demonstrate difference in return types between get_type and something that returns Class:
>>> Integer = jpy.get_type('java.lang.Integer')
>>> type(Integer)
<class 'type'>
>>>
>>> type(Integer(1).getClass())
<class 'java.lang.Class'>
So:
int_ = jpy.get_type('int')
int_array = jpy.get_type('[I')
stringset = jpy.get_type('io.deephaven.db.tables.libs.StringSet')
etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Must be proper python type, like np.
>>> import numpy as np
>>> type(np.int32)
<class 'type'>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, take a peek hopefully now makes more sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, take a peek, hopefully now makes more sense.
def asType(type_str): | ||
return _table_tools_.typeFromName(type_str) | ||
|
||
bool_ = asType('java.lang.Boolean') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, numba and pandas represent bool_ as a byte; and I don't think they have the concept of a "null" boolean. Does this make our booleans incompatible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without providing some kind of conversion, probably. Do we need a dh.bool_from_byte method?
raise Exception("when no column definitions are provided in the 'columns' argument, " + | ||
"only an empty table can be created, and no data can be specified; instead " + | ||
"got a non-empty 'data' argument with " + str(data)) | ||
return _table_tools_.emptyTable() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
empty tables still need a size for number of rows.
return _col_def_.fromGenericType(col_name, data_type, component_type) | ||
|
||
@_passThrough | ||
def cols(ts): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method seems unused?
# None until the first _defineSymbols() call | ||
_table_tools_ = None | ||
_col_def_ = None | ||
_python_tools_ = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems unused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still relevant comment.
col_header = col_header.header(t[0], t[1]) | ||
except Exception as e: | ||
raise Exception("Could not create column definition from " + str(t)) from e | ||
return _table_.of(col_header) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is valid, can't call of
on a col_header. To create a table with zero rows from a ColumnHeader(s) you can do:
_table_.of(_qst_newtable_.empty(col_header.tableHeader()))
, or something to that effect.
Alternatively, it's perfectly fine to not handle this as a special case, and just fall through to the general case and create Columns with no data in it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could make the NewTable.empty
method a bit more generic, and accept Iterable<ColumnHeader>
instead of TableHeader
, in which case we can get rid of one layer of adapting:
_table_.of(_qst_newtable_.empty(col_header))
We added support for creating tables by column, passing in a first argument of dict, like:
|
So, I think there may be value in trying to directly reference For example, instead of having to call
And then also, I question if we actually need python support for creating/returning ColumnDefinitions? Or, would sticking w/ ColumnHeader serve us better? I'm not sure where |
Oh, I see the cols() calls from kafka tools now. I wonder if we should try to use |
@@ -211,7 +215,7 @@ def json(col_defs, mapping:dict = None): | |||
raise Exception("'col_defs' argument needs to be a sequence of tuples, instead got " + | |||
str(col_defs) + " of type " + type(col_defs).__name__) | |||
try: | |||
col_defs = _tuplesListToColDefsList(col_defs) | |||
col_defs = dh.cols(col_defs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the place where we need cols. The Java side API for Kafka ingestion takes a ColumnDefinition[]
for JSON.
except Exception as e: | ||
raise Exception("Could not get java class type from " + str(data_type)) from e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you seen this happen? I think qst.type.Type#clazz
should always return successfully.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type annotations help linters but they don't guarantee the type at runtime. For user accessible functions that call here, if the user passes the wrong object in, that's the message they will receive.
# For more involved types, you can always use the string representation | ||
# of the Java class (Class.getName()) to get a python type for it. | ||
@_passThrough | ||
def typeFromJavaClassName(name : str): | ||
""" | ||
Get the column data type for the corresponding Java type string reprensentation | ||
The string provided should match the output in Java for Class.getName() | ||
for a class visible to the main ClassLoader in the Deephaven engine in use. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These comments are not correct. String[].class.getName
is not "java.lang.String[]". name
here is a Deephaven-specific "pretty" version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
stringset = typeFromJavaClassName('io.deephaven.db.tables.libs.StringSet') | ||
datetime = DataType(_qst_type_.instantType()) | ||
|
||
byte_array = typeFromJavaClassName('byte[]') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can go from a non-array type to an array type like byte.arrayType()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
# For more involved types, you can always use the string representation | ||
# of the Java class (Class.getName()) to get a python type for it. | ||
@_passThrough | ||
def typeFromJavaClassName(name : str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to allow python user to define arbitrary types? I might suggest "no", in which case, this can become internal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
# None until the first _defineSymbols() call | ||
_table_tools_ = None | ||
_col_def_ = None | ||
_python_tools_ = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still relevant comment.
if _isPrimitive(col_type): | ||
return _qst_column_.ofUnsafe(col_name, jvalues) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will break for bool_
, as written.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
public static <T> Column<T> of(String name, Type<T> type, T... values) { | ||
return of(name, Array.of(type, values)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice addition.
Initial version contains enough to (1) rewrite Kafka column specifications (in subsequent PR, not here) and (2) provide a way to create table from an array of rows, with a similar API to Panda's DataFrame ctor.