Adding Types.py for wrapping DH column types and table operations around them. #1088

jcferretti · 2021-08-19T18:59:37Z

The initial version contains enough to (1) rewrite Kafka column specifications (in a subsequent PR, not here) and (2) provide a way to create a table from an array of rows, with an API similar to the pandas DataFrame constructor.

import deephaven.Types as dh

data = [ [ "Ashley", 92, 94, 93, 3.9 ],
         [ "Jeff",   78, 88, 93, 2.9 ],
         [ "Rita",   87, 81, 84, 3.0 ],
         [ "Zach",   74, 70, 72, 1.8 ] ]
columns = [ ( "Name",    dh.string ),
            ( "Test1",   dh.int_ ),
            ( "Test2",   dh.int_ ),
            ( "Average", dh.int_ ),
            ( "GPA",     dh.double ) ]
grades = dh.table_of(data, columns)

devinrsmith · 2021-08-20T15:00:52Z

Integrations/python/deephaven/Types.py

+@_passThrough
+def asType(type_str):
+    return _table_tools_.typeFromName(type_str)


So, returning a Class is not a proper way to get the type information into jpy. You need to use jpy.get_type(...), as the result will then be recognized as a proper Python "type".

To demonstrate the difference in return types between get_type and something that returns a Class:

>>> Integer = jpy.get_type('java.lang.Integer')
>>> type(Integer)
<class 'type'>
>>>
>>> type(Integer(1).getClass())
<class 'java.lang.Class'>

So:

int_ = jpy.get_type('int')
int_array = jpy.get_type('[I')
stringset = jpy.get_type('io.deephaven.db.tables.libs.StringSet')

etc.

It must be a proper Python type, like the NumPy types:

>>> import numpy as np
>>> type(np.int32)
<class 'type'>

Done. Take a peek; hopefully it now makes more sense.

devinrsmith · 2021-08-20T15:02:20Z

Integrations/python/deephaven/Types.py

+def asType(type_str):
+    return _table_tools_.typeFromName(type_str)
+
+bool_ = asType('java.lang.Boolean')


So, numba and pandas represent bool_ as a byte, and I don't think they have the concept of a "null" boolean. Does this make our booleans incompatible?

Without providing some kind of conversion, probably. Do we need a dh.bool_from_byte method?
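For illustration only, here is a minimal sketch of what such a conversion could look like: a nullable boolean packed into a signed byte, with one value reserved for null. The function names and the sentinel value -1 are assumptions made for this example; they are not taken from this PR.

```python
from typing import Optional

# Assumed encoding for this sketch: 1 = True, 0 = False, -1 = null.
NULL_BOOL_AS_BYTE = -1


def bool_to_byte(value: Optional[bool]) -> int:
    """Encode a nullable boolean as a signed byte value."""
    if value is None:
        return NULL_BOOL_AS_BYTE
    return 1 if value else 0


def bool_from_byte(value: int) -> Optional[bool]:
    """Decode a signed byte back into True, False, or None."""
    if value == NULL_BOOL_AS_BYTE:
        return None
    if value in (0, 1):
        return bool(value)
    raise ValueError("not a valid boolean byte: " + str(value))


# Round trip, including the null case that a plain byte-as-bool cannot express.
encoded = [bool_to_byte(v) for v in (True, False, None)]
decoded = [bool_from_byte(b) for b in encoded]
```

The point of the sentinel is that the round trip keeps null distinct from False, which a plain 0/1 byte cannot do.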

devinrsmith · 2021-08-20T15:14:34Z

Integrations/python/deephaven/Types.py

+            raise Exception("when no column definitions are provided in the 'columns' argument, " +
+                            "only an empty table can be created, and no data can be specified; instead " +
+                            "got a non-empty 'data' argument with " + str(data))
+        return _table_tools_.emptyTable()


Empty tables still need a size, i.e. the number of rows.

devinrsmith · 2021-08-20T15:16:57Z

Integrations/python/deephaven/Types.py

+    return _col_def_.fromGenericType(col_name, data_type, component_type)
+
+@_passThrough
+def cols(ts):


This method seems unused?

devinrsmith · 2021-08-20T15:17:47Z

Integrations/python/deephaven/Types.py

+# None until the first _defineSymbols() call
+_table_tools_ = None
+_col_def_ = None
+_python_tools_ = None


This seems unused.

Still relevant comment.

devinrsmith · 2021-08-20T15:20:03Z

Integrations/python/deephaven/Types.py

+                    col_header = col_header.header(t[0], t[1])
+            except Exception as e:
+                raise Exception("Could not create column definition from " + str(t)) from e
+        return _table_.of(col_header)


I don't think this is valid; you can't call of on a col_header. To create a table with zero rows from one or more ColumnHeaders, you can do:

_table_.of(_qst_newtable_.empty(col_header.tableHeader())), or something to that effect.

Alternatively, it's perfectly fine to not handle this as a special case, and just fall through to the general case and create Columns with no data in them.

We could make the NewTable.empty method a bit more generic, and accept Iterable<ColumnHeader> instead of TableHeader, in which case we can get rid of one layer of adapting:

_table_.of(_qst_newtable_.empty(col_header))

jcferretti · 2021-08-21T03:22:57Z

We added support for creating tables by column, passing in a first argument of dict, like:

#
# By column
#

import deephaven.Types as dh

data_by_col = {
    "Name"    : ( dh.string, [ "Ashley", "Jeff", "Rita", "Zach"  ] ),
    "Test1"   : ( dh.int_,   [    92,      78,    87,      74,   ] ),
    "Test2"   : ( dh.int_,   [    94,      88,    81,      70,   ] ),
    "Average" : ( dh.int_,   [    93,      93,    84,      72,   ] ),
    "GPA"     : ( dh.double, [     3.9,     2.9,   3.0,     1.8  ] )
}

grades2 = dh.table_of(data_by_col)


--

#
# No data.
#

import deephaven.Types as dh

column_defs = {
    "Name"    : dh.string,
    "Test1"   : dh.int_,
    "Test2"   : dh.int_,
    "Average" : dh.int_,
    "GPA"     : dh.double
}

grades3 = dh.table_of(column_defs)
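To make the three call shapes easier to compare (rows plus column tuples, a dict of (type, values) pairs, and a dict of types only), here is a rough sketch of how they could be reduced to a single by-column form. The helper normalize_table_input is hypothetical and is not part of deephaven.Types; plain strings stand in for dh.string, dh.int_ and dh.double so the example runs without a JVM.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

Column = Tuple[Any, List[Any]]  # (column type, column values)


def normalize_table_input(
    data: Any,
    columns: Optional[Sequence[Tuple[str, Any]]] = None,
) -> Dict[str, Column]:
    """Reduce the three call shapes to one {name: (type, values)} mapping."""
    if isinstance(data, dict):
        result: Dict[str, Column] = {}
        for name, spec in data.items():
            if isinstance(spec, tuple):
                # By column: each entry is (type, values).
                col_type, values = spec
                result[name] = (col_type, list(values))
            else:
                # Definitions only: the entry is just a type, with no data.
                result[name] = (spec, [])
        return result

    # By row: transpose the rows using the column definitions.
    if not columns:
        raise ValueError("row data needs column definitions")
    for row in data:
        if len(row) != len(columns):
            raise ValueError("row " + str(row) + " does not match the column count")
    return {
        name: (col_type, [row[i] for row in data])
        for i, (name, col_type) in enumerate(columns)
    }


rows = [["Ashley", 92, 3.9], ["Jeff", 78, 2.9]]
defs = [("Name", "string"), ("Test1", "int"), ("GPA", "double")]
by_row = normalize_table_input(rows, defs)
by_col = normalize_table_input({"Name": ("string", ["Ashley", "Jeff"])})
no_data = normalize_table_input({"Name": "string", "GPA": "double"})
```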

devinrsmith · 2021-08-21T17:14:17Z

So, I think there may be value in trying to directly reference io.deephaven.qst.type.Type as a field in a class that is a python Type. I know I led you down the jpy.get_type(...) route, but I think the adapting between different objects would be easier without having to do the map/class lookups.

For example, instead of having to call fromGenericType (which is incorrect for primitive types), we want to rely on the exact conversions from the QST when applicable:

public static ColumnDefinition<?> from(ColumnHeader<?> header)

And then also, I question whether we actually need Python support for creating/returning ColumnDefinitions. Or would sticking with ColumnHeader serve us better? I'm not sure where cols(...) is used from.

devinrsmith · 2021-08-21T17:18:31Z

Oh, I see the cols() calls from kafka tools now. I wonder if we should try to use ColumnHeader<?> in KafkaTools instead of ColumnDefinition?

jcferretti · 2021-08-21T17:19:01Z

Generators/src/main/java/io/deephaven/pythonPreambles/KafkaToolsPreamble.txt

@@ -211,7 +215,7 @@ def json(col_defs, mapping:dict = None):
        raise Exception("'col_defs' argument needs to be a sequence of tuples, instead got " +
                        str(col_defs) + " of type " + type(col_defs).__name__)
    try:
-        col_defs = _tuplesListToColDefsList(col_defs)
+        col_defs = dh.cols(col_defs)


This is the place where we need cols. The Java side API for Kafka ingestion takes a ColumnDefinition[] for JSON.

jcferretti · 2021-08-23T04:27:28Z

This latest version has python DataType rebuilt on top of qst.type.Type.
Ready for review followup.

devinrsmith · 2021-08-23T12:21:02Z

Integrations/python/deephaven/Types.py

+    except Exception as e:
+        raise Exception("Could not get java class type from " + str(data_type)) from e


Have you seen this happen? I think qst.type.Type#clazz should always return successfully.

Type annotations help linters but they don't guarantee the type at runtime. For user accessible functions that call here, if the user passes the wrong object in, that's the message they will receive.
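A small, self-contained illustration of that point: the annotation below does not stop a caller from passing a str, so the except branch is what the user actually sees. DataType here is a hypothetical stand-in with a single name attribute, not the class from Types.py.

```python
class DataType:
    """Hypothetical stand-in for the wrapper type discussed above."""

    def __init__(self, name: str):
        self.name = name


def java_class_name(data_type: DataType) -> str:
    # The annotation documents intent, but nothing enforces it at runtime.
    try:
        return data_type.name
    except Exception as e:
        # "from e" keeps the original error attached as __cause__.
        raise Exception("Could not get java class type from " + str(data_type)) from e


ok = java_class_name(DataType("int"))

try:
    java_class_name("int")  # a str, not a DataType; linters flag it, Python runs it
except Exception as err:
    message = str(err)
    cause = err.__cause__
```

Here the wrapped exception carries the user-facing message, while the AttributeError raised by the bad attribute access stays reachable through __cause__.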

devinrsmith · 2021-08-23T12:24:16Z

Integrations/python/deephaven/Types.py

+# For more involved types, you can always use the string representation
+# of the Java class (Class.getName()) to get a python type for it.
+@_passThrough
+def typeFromJavaClassName(name : str):
+    """
+    Get the column data type for the corresponding Java type string representation
+    The string provided should match the output in Java for Class.getName()
+    for a class visible to the main ClassLoader in the Deephaven engine in use.
+    """


These comments are not correct. String[].class.getName is not "java.lang.String[]". name here is a Deephaven-specific "pretty" version.
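To make the difference concrete: in Java, String[].class.getName() returns "[Ljava.lang.String;" and byte[].class.getName() returns "[B". The sketch below converts a "pretty" name with trailing [] into that JVM form. The function jvm_class_name is hypothetical and written only for this illustration; it is not how Deephaven resolves names.

```python
# JVM type codes used by Class.getName() for primitive array elements.
_PRIMITIVE_CODES = {
    "boolean": "Z", "byte": "B", "char": "C", "short": "S",
    "int": "I", "long": "J", "float": "F", "double": "D",
}


def jvm_class_name(pretty: str) -> str:
    """Convert a pretty name such as 'java.lang.String[]' to Class.getName() form."""
    depth = 0
    while pretty.endswith("[]"):
        pretty = pretty[:-2]
        depth += 1
    if depth == 0:
        # Non-array names are already in Class.getName() form.
        return pretty
    # Array names are one '[' per dimension, then a primitive code or 'L<class>;'.
    element = _PRIMITIVE_CODES.get(pretty, "L" + pretty + ";")
    return "[" * depth + element


string_array = jvm_class_name("java.lang.String[]")
byte_array = jvm_class_name("byte[]")
int_matrix = jvm_class_name("int[][]")
plain = jvm_class_name("java.lang.String")
```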

devinrsmith · 2021-08-23T12:25:14Z

Integrations/python/deephaven/Types.py

+stringset =  typeFromJavaClassName('io.deephaven.db.tables.libs.StringSet')
+datetime = DataType(_qst_type_.instantType())
+
+byte_array = typeFromJavaClassName('byte[]')


We can go from a non-array type to an array type like byte.arrayType()

devinrsmith · 2021-08-23T12:28:42Z

Integrations/python/deephaven/Types.py

+# For more involved types, you can always use the string representation
+# of the Java class (Class.getName()) to get a python type for it.
+@_passThrough
+def typeFromJavaClassName(name : str):


Do we want to allow Python users to define arbitrary types? I might suggest "no", in which case this can become internal.

devinrsmith · 2021-08-23T12:29:08Z

Integrations/python/deephaven/Types.py

+# None until the first _defineSymbols() call
+_table_tools_ = None
+_col_def_ = None
+_python_tools_ = None


Still relevant comment.

devinrsmith · 2021-08-23T12:31:04Z

Integrations/python/deephaven/Types.py

+    if _isPrimitive(col_type):
+        return _qst_column_.ofUnsafe(col_name, jvalues)


This will break for bool_, as written.

devinrsmith · 2021-08-23T12:32:50Z

qst/src/main/java/io/deephaven/qst/column/Column.java

+    public static <T> Column<T> of(String name, Type<T> type, T... values) {
+        return of(name, Array.of(type, values));
+    }


Nice addition.

jcferretti added 4 commits August 17, 2021 12:34

Intent but not working yet.

482e87a

Move away from using **kwargs in python. Thanks to Jianfeng for pointing out.

3463c3e

Merge branch 'main' into cfs-python-types-0

df6a2d0

Adding Types.py for wrapping DH column types and table operations around them.

023adf3

jcferretti added python NoDocumentationNeeded labels Aug 19, 2021

jcferretti requested a review from devinrsmith August 19, 2021 18:59

jcferretti self-assigned this Aug 19, 2021

Merge branch 'main' into cfs-python-types-0

35e7e6f

devinrsmith requested changes Aug 20, 2021

jcferretti added 3 commits August 20, 2021 21:08

Followup to review comments.

599ffc7

More small fixes and tidying.

56554b2

Support creating tables by columns.

55d96f5

jcferretti commented Aug 21, 2021

jcferretti added 4 commits August 22, 2021 23:42

getting closer.

124aec6

Merge branch 'main' into cfs-python-types-0

df638a3

Everything working again, top of DataType = qst.type.Type.

19df410

Fixed an issue.

21d8a44

Added generated json doc, fixed issue with kafka tools preamble.

646d2b7

devinrsmith reviewed Aug 23, 2021

jcferretti added 2 commits August 23, 2021 11:13

Followup to review comments.

cfb38b8

Followup to review comment.

471997c

devinrsmith self-requested a review August 23, 2021 15:21

devinrsmith previously approved these changes Aug 23, 2021

Fixed failing test :Integrations:test-py-37

c6bd3cb

jcferretti dismissed devinrsmith’s stale review via c6bd3cb August 23, 2021 17:19

cpwright previously approved these changes Aug 23, 2021

Merge branch 'main' into cfs-python-types-0

e2ffa1a

jcferretti dismissed cpwright’s stale review via e2ffa1a August 23, 2021 17:47

cpwright approved these changes Aug 23, 2021

jcferretti merged commit 8e64658 into deephaven:main Aug 23, 2021

jcferretti deleted the cfs-python-types-0 branch August 23, 2021 17:57

jcferretti mentioned this pull request Aug 23, 2021

Add deephaven types submodule for convenience when using DynamicTable… #893

Closed

chipkent mentioned this pull request Oct 13, 2021

Add new deephaven.java submodule #1450

Closed
