Refactor internals #110

cbourjau · 2025-03-10T17:05:10Z

This PR is a large refactor of the internal workings of ndonnx. The newly introduced concepts are described in an explicit file.

Breaking changes are limited to functions not found in the array-api standard and can likely be reduced further.

Closes #75.
Closes #74.

adityagoel4512

Thanks 🚀. We've agreed on the general design a few times already, so I go in various other directions here. I'm mainly concerned about the conflation of some subtle breaking changes (and I'm sure I cannot catch every single one given the size of this PR), obvious user API breakages, and internal refactoring. Many (though not all) of the breaking changes feel unrelated to "Refactor internals"; at a minimum, I'm hoping to get an exhaustive enumeration for a change log and as a reviewing aid for me.

tests/schemas/Bool.json

adityagoel4512 · 2025-03-12T14:07:15Z

tests/bugs/test_upstream_bugs.py

+# Copyright (c) QuantCo 2023-2024
+# SPDX-License-Identifier: BSD-3-Clause
+
+import numpy as np


What is the intention behind these tests? If it is to maintain precise understanding of upstream dependencies (Spox, onnx shape inference or ort) then it seems quite dubious to run this as part of the ndonnx test suite. Changes to ndonnx have no bearing on these tests so this will always do the same thing until an external change.

A better strategy would be to run this in CI weekly, or on a pixi update that changes Spox, ONNX or onnxruntime. We can then immediately exploit any fixes/improvements or mitigate anything that is broken.
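
One way to keep such checks out of the regular test run is an opt-in switch that only a scheduled CI job sets. The following is a minimal sketch, not something this PR contains: the environment variable name `NDONNX_RUN_UPSTREAM_TESTS` and the helper `upstream_tests_enabled` are hypothetical.

```python
import os
from collections.abc import Mapping

# Hypothetical variable name; a weekly CI job would export it.
UPSTREAM_ENV_VAR = "NDONNX_RUN_UPSTREAM_TESTS"


def upstream_tests_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Return True only when the opt-in variable holds a truthy value."""
    value = environ.get(UPSTREAM_ENV_VAR, "")
    return value.strip().lower() in {"1", "true", "yes"}


# A test module could then gate itself at import time, for example:
#   pytestmark = pytest.mark.skipif(
#       not upstream_tests_enabled(),
#       reason="upstream checks only run in the scheduled job",
#   )
```

With this shape, the default `pytest` invocation skips the upstream checks, and only the job that exports the variable runs them.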

adityagoel4512 · 2025-03-12T14:10:01Z

tests/test_build_utils.py

+    # Only set to `True` temporarily and only if there was a
+    # deliberate update to the schema.


I don't think we need schema updating code (and if we did, it would be an independent script and not in the test suite). Updating the schema breaks backwards compatibility. Any evolutions should go in a "v2" schema so as to not break users' input/output wrangling code. Do you have a use case in mind for editing existing schemas?

cbourjau · 2025-03-13

I backported this code to #111. You are right that the schemas should not be updated. It might be that we add more schemas, though, or that one may want to regenerate the files for some other reason. If nothing else, it is good to see how the files were generated.
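
The "frozen schema, new version for any evolution" idea discussed above can be sketched roughly as follows. This is an illustration under assumptions, not the code from #111: the function names and the `<name>.v<version>.json` file naming are hypothetical, and the schema is treated as an opaque JSON-serializable dict.

```python
import json
from pathlib import Path


def schema_matches_reference(current: dict, reference_file: Path) -> bool:
    """Compare a freshly generated schema against the frozen JSON file."""
    frozen = json.loads(reference_file.read_text(encoding="utf-8"))
    return current == frozen


def write_new_schema_version(
    schema: dict, directory: Path, name: str, version: int
) -> Path:
    """Write an evolved schema next to the old one instead of overwriting it."""
    target = directory / f"{name}.v{version}.json"  # hypothetical naming scheme
    if target.exists():
        # Existing files are treated as immutable; callers must bump the version.
        raise FileExistsError(f"{target} is frozen; bump the version instead")
    target.write_text(
        json.dumps(schema, indent=2, sort_keys=True), encoding="utf-8"
    )
    return target
```

A stability test would call only `schema_matches_reference`; the writer would live in a standalone script, so the test suite never needs an "update" flag.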

adityagoel4512 · 2025-03-12T18:43:03Z

README.md

@@ -5,7 +5,7 @@
 [![conda-forge](https://img.shields.io/conda/vn/conda-forge/ndonnx?style=flat-square&logoColor=white&logo=conda-forge)](https://anaconda.org/conda-forge/ndonnx)
 [![pypi](https://img.shields.io/pypi/v/ndonnx.svg?logo=pypi&logoColor=white)](https://pypi.org/project/ndonnx)

-An ONNX-backed array library that is compliant with the [Array API](https://data-apis.org/array-api/) standard.
+An ONNX-backed implementation of the [Array API](https://data-apis.org/array-api/) standard.


I don't agree with the rewrite of the README. Implementing the Array API is a very nice feature worth mentioning, but it is not the entire point of the library, nor its biggest utility at the present moment! If that were the only goal, the library wouldn't have an extension module, datetime/timedelta/nullable/user-defined data types, Spox interoperability and probably other features.

In any case, it might be easier if you move the README change proposal to a new pull request as the edits are mostly independent of the refactoring exercise (except the Array).

docs/datatypes/datatypes.rst

adityagoel4512 · 2025-03-12T21:05:10Z

ndonnx/_typed_array/masked_onnx.py

+#
+# Union types are exhaustive and don't create ambiguities with respect to user-defined subtypes.
+# TODO: Rename
+NCoreIntegerDTypes = (


NPrimitiveIntegerDTypes?

adityagoel4512 · 2025-03-12T21:10:22Z

ndonnx/_typed_array/masked_onnx.py

+            )
+        self._data = data
+
+    def disassemble(self) -> dict[str, Var]:


The optional mask creates a subtle but important breaking change: inference handling code that previously assumed a one-to-one, static relationship between data type and "fields" sees this invariant violated (I believe). This means you can read the metadata from an ONNX model, but that does NOT tell you everything about how to split up your inputs for the ONNX backend being used. For example, inspecting the metadata, I have no idea whether the "null" field is present or not; I also have to check all the inputs to the model.

Do I have this correct?

I think there are two aspects to address before merging this:

This is a very subtle but undocumented breaking change. Is it truly so necessary that it gets incorporated along with a substantial refactoring of internal behaviour? I understand that there will be perf wins.

If it is urgent, can we continue to have this very nice property that the dtype schema/metadata is sufficient to know exactly how to pass things to your ONNX model?
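
The concern can be made concrete with a small sketch. Everything here is hypothetical and only illustrates the invariant under discussion: the dtype names, the `data`/`null` field layout table, and the `<argument>_<field>` input naming are assumptions, not ndonnx's actual metadata format.

```python
# Hypothetical static layout: dtype name -> field names.
STATIC_FIELDS = {
    "int64": ("data",),
    "nullable_int64": ("data", "null"),
}


def expected_inputs(arg: str, dtype: str) -> list[str]:
    """Input names implied by the metadata alone (naming scheme is assumed)."""
    return [f"{arg}_{field}" for field in STATIC_FIELDS[dtype]]


def inputs_to_feed(arg: str, dtype: str, model_inputs: set[str]) -> list[str]:
    """With an optional mask, metadata must be cross-checked against the model.

    Fields listed in the metadata but absent from the model's actual inputs
    (for example an omitted "null" mask) are dropped.
    """
    return [name for name in expected_inputs(arg, dtype) if name in model_inputs]
```

Under the old invariant, `expected_inputs` alone would be enough for inference code. If the mask can be omitted, callers need something like `inputs_to_feed`, which also inspects the model's declared inputs.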

ndonnx/_typed_array/object_dtype.py

adityagoel4512 · 2025-03-12T21:17:28Z

ndonnx/_array.py

+            [f"{k}: {v}" for k, v in self._tyarray.__ndx_value_repr__().items()]
+        )
+        shape = self._tyarray.shape
+        return f"array({value_repr}, shape={shape}, dtype={self.dtype})"


The new print formatting gets a little long for constant arrays. Can we remove the shape for these?
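
A rough sketch of what that could look like, written as a standalone function rather than the actual `Array.__repr__`. The `is_constant` flag and the function name `format_array` are hypothetical, and how the PR distinguishes constant from lazy arrays is not shown in the diff above.

```python
def format_array(
    value_repr: dict[str, str],
    shape: tuple,
    dtype: str,
    is_constant: bool,
) -> str:
    """Build the repr string, dropping the shape for constant arrays."""
    values = ", ".join(f"{k}: {v}" for k, v in value_repr.items())
    if is_constant:
        # The printed values already reveal the shape of a constant array.
        return f"array({values}, dtype={dtype})"
    return f"array({values}, shape={shape}, dtype={dtype})"


print(format_array({"data": "[1, 2]"}, (2,), "int64", is_constant=True))
print(format_array({"data": "*lazy*"}, (None,), "int64", is_constant=False))
```

Lazy arrays keep the shape, since it is the main information their repr carries.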

adityagoel4512 · 2025-03-12T21:20:19Z

tests/test_dtypes.py

-        return x.endpoints.shape[:-1]
-
-
-class List(StructType):


This test was very valuable for testing the limits of the dtype system. I'm sure it's implementable again. Can you please bring back the List test and/or add some custom dtype tests that illustrate how dtypes get defined from outside of ndonnx? Right now, everything (categoricals, object dtype) has been inlined into the library, which won't be the long-term way to support niche dtypes without a clear numpy counterpart.

(it might also be a nice one for the docs in the user-defined dtypes example that's currently got a ...!)

adityagoel4512 · 2025-03-12T21:37:18Z

Breaking changes are limited to functions not found in the array-api standard and can likely be reduced further.

I also think enumerating the breaking changes (some of which I've commented on already) will help with this.

adityagoel4512 · 2025-03-12T21:39:23Z

docs/experimental/experimental.rst

@@ -1,52 +1,27 @@
 Experimental
 ============


Thanks for already taking care of parts of the docs :). The Spox Integration page looks unedited.

.github/workflows/array-api.yml

adityagoel4512 · 2025-03-12T21:42:33Z

I also think we should indeed ship datetime64, timedelta64 but not the categorical or "object" dtype. It would be good to remove the pandas dependency added to the test environment and leave that to downstream libraries.

cbourjau and others added 30 commits December 11, 2024 14:22

Typed array refactoring

cfda716

Lessons from pdonnx and review feedback

e90a09c

Add categorical data type

4914402

Cluster abstract methods together

fd5b696

Improvements for categorical dtype/arrays

d1284f6

Use NumPy semantics for isin

274cdee

Add apply_mapping member function to TyArrayBase

788117e

Add further implementations for date times

d00fba2

Shuffle imports to maybe make the dependencies more obvious

36bdbbe

Improve masked, categorical, and time data types

d64baf2

Masked and time improvments

156c50a

Skip clip test if min and max is None

8c7e78f

Update array-api-tests

45c4e9d

Fix np1x

067123a

Fix (some) datetime NaT issues

b0675af

Remove TyPyScalarArray

bf14800

Fix isnan for ONNX dtypes, implement __matmul__, extend indexing

f2ffc1c

Fix truediv for time deltas

dfd3073

Headers

3643b45

Allow str scalars in more places

3927d95

Make categorical constructor arg 'ordered' mandatory

63cb15b

Add Device class

adfc299

Fix regression Boolean-bool promotion

9b8a11d

Implement put

d041523

broadcast_arrays works in lazy cases

d431758

make drop_unused=False the new default

b591132

Fix issues around isnan and apply_mapping

cae8a38

Improve naming of dtype constructor functions

c674040

DType._argument -> DType.__ndx_argument__

98a0e1b

Merge branch 'broadcast_arrays' into typed-array

788e716

cbourjau added 6 commits March 10, 2025 13:01

Make datetime and timedelta layout private

64d2acd

Make layouts of ONNX and categorical dtypes private

05bf748

Merge remote-tracking branch 'origin/main' into typed-array

353085e

Merge fixes

1ff03ae

Run ci on typed-array branch

7984536

Remove unused file

c8a8b7a

cbourjau requested a review from adityagoel4512 as a code owner March 10, 2025 17:05

cbourjau added 7 commits March 10, 2025 18:32

Mitigate breaking changes

c678e80

Add Array.dynamic_size

9fcf2f4

Datetime/timedelta fixes and other QoL things

a5fcef9

Numpy 1.x fix

05d1045

Rename some dtypes

ddf2aeb

Keep schema type names stable

c943e90

Use int16 instead of uint16 for categorical codes

8744e8d

adityagoel4512 requested changes Mar 12, 2025

adityagoel4512 reviewed Mar 12, 2025

cbourjau and others added 4 commits March 13, 2025 09:16

Fix equality for categoricals

a5617cb

Fix filename casing

dae4965

Apply suggestions from code review

df8698e

Co-authored-by: Aditya Goel <[email protected]>

Minor clean-ups

e4a59d1

cbourjau added a commit that referenced this pull request Mar 13, 2025

Add tests for schema stability

69f6bdb

This test is back-ported from #110 since it is important that that PR does not break the current schemas.

cbourjau mentioned this pull request Mar 13, 2025

Add tests for schema stability #111

Open

cbourjau added a commit that referenced this pull request Mar 13, 2025

Backport ort_compat module from #110

dc2782c

cbourjau mentioned this pull request Mar 13, 2025

Trigger github actions on branches and minor clean-ups #112

Merged

cbourjau added 4 commits March 13, 2025 14:27

Use latest array-api-tests

b8075c6

Remove unused xfails.txt

f2caec7

Remove object dtype

10722da

Code review

66a7174
