
Commit e13babb

[Docs] MessagePack IDL, Pydantic Support, and Attribute Access (flyteorg#6022)
* [Docs] MessagePack IDL, Pydantic Support and Attribute Access Signed-off-by: Future-Outlier <[email protected]>
* support Signed-off-by: Future-Outlier <[email protected]>
* update Signed-off-by: Future-Outlier <[email protected]>
* lint Signed-off-by: Future-Outlier <[email protected]>
* Trigger CI Signed-off-by: Future-Outlier <[email protected]>
* Trigger CI Signed-off-by: Future-Outlier <[email protected]>
* lint Signed-off-by: Future-Outlier <[email protected]>
* Update docs/user_guide/data_types_and_io/dataclass.md Co-authored-by: David Espejo <[email protected]> Signed-off-by: Han-Ru Chen (Future-Outlier) <[email protected]>
* Update docs/user_guide/data_types_and_io/pydantic_basemodel.md Co-authored-by: David Espejo <[email protected]> Signed-off-by: Han-Ru Chen (Future-Outlier) <[email protected]>
* Update docs/user_guide/data_types_and_io/pydantic_basemodel.md Co-authored-by: David Espejo <[email protected]> Signed-off-by: Han-Ru Chen (Future-Outlier) <[email protected]>
* Update docs/user_guide/data_types_and_io/pydantic_basemodel.md Co-authored-by: David Espejo <[email protected]> Signed-off-by: Han-Ru Chen (Future-Outlier) <[email protected]>
* Update docs/user_guide/data_types_and_io/pydantic_basemodel.md Co-authored-by: David Espejo <[email protected]> Signed-off-by: Han-Ru Chen (Future-Outlier) <[email protected]>
* nit Signed-off-by: Future-Outlier <[email protected]>
* nit Signed-off-by: Future-Outlier <[email protected]>
* Update docs/user_guide/data_types_and_io/dataclass.md Co-authored-by: David Espejo <[email protected]> Signed-off-by: Han-Ru Chen (Future-Outlier) <[email protected]>
* Update docs/user_guide/data_types_and_io/dataclass.md Co-authored-by: David Espejo <[email protected]> Signed-off-by: Han-Ru Chen (Future-Outlier) <[email protected]>
* Update docs/user_guide/data_types_and_io/pydantic_basemodel.md Co-authored-by: David Espejo <[email protected]> Signed-off-by: Han-Ru Chen (Future-Outlier) <[email protected]>
* Update docs/user_guide/data_types_and_io/pydantic_basemodel.md Co-authored-by: David Espejo <[email protected]> Signed-off-by: Han-Ru Chen (Future-Outlier) <[email protected]>
* format Signed-off-by: Future-Outlier <[email protected]>

---------

Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Han-Ru Chen (Future-Outlier) <[email protected]>
Co-authored-by: David Espejo <[email protected]>
1 parent 09a6fb8 commit e13babb

4 files changed: +132 -8 lines changed

docs/user_guide/data_types_and_io/accessing_attributes.md

+10-6
Original file line number | Diff line number | Diff line change
@@ -11,6 +11,10 @@ Note that while this functionality may appear to be the normal behavior of Pytho
1111
Consequently, accessing attributes in this manner is, in fact, a specially implemented feature.
1212
This functionality facilitates the direct passing of output attributes within workflows, enhancing the convenience of working with complex data structures.
1313

14+
```{important}
15+
Flytekit version >= v1.14.0 supports Pydantic BaseModel V2, so attribute access also works on Pydantic BaseModel V2 objects.
16+
```
17+
1418
```{note}
1519
To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks].
1620
```
@@ -19,7 +23,7 @@ To begin, import the required dependencies and define a common task for subseque
1923

2024
```{literalinclude} /examples/data_types_and_io/data_types_and_io/attribute_access.py
2125
:caption: data_types_and_io/attribute_access.py
22-
:lines: 1-10
26+
:lines: 1-9
2327
```
2428

2529
## List
@@ -31,38 +35,38 @@ Flyte currently does not support output promise access through list slicing.
3135

3236
```{literalinclude} /examples/data_types_and_io/data_types_and_io/attribute_access.py
3337
:caption: data_types_and_io/attribute_access.py
34-
:lines: 14-23
38+
:lines: 13-22
3539
```
3640

3741
## Dictionary
3842
Access the output dictionary by specifying the key.
3943

4044
```{literalinclude} /examples/data_types_and_io/data_types_and_io/attribute_access.py
4145
:caption: data_types_and_io/attribute_access.py
42-
:lines: 27-35
46+
:lines: 26-34
4347
```
4448

4549
## Data class
4650
Directly access an attribute of a dataclass.
4751

4852
```{literalinclude} /examples/data_types_and_io/data_types_and_io/attribute_access.py
4953
:caption: data_types_and_io/attribute_access.py
50-
:lines: 39-53
54+
:lines: 38-51
5155
```
5256

5357
## Complex type
5458
Combinations of list, dict and dataclass also work effectively.
5559

5660
```{literalinclude} /examples/data_types_and_io/data_types_and_io/attribute_access.py
5761
:caption: data_types_and_io/attribute_access.py
58-
:lines: 57-80
62+
:lines: 55-78
5963
```
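
As a rough illustration of the access patterns covered above, the following plain-Python sketch indexes into a list, a dictionary, and a dataclass, and then chains them. It runs outside Flyte, so it only shows the expressions themselves; inside a workflow, the same syntax is applied to task outputs rather than to ordinary values. The `Fruit` and `Basket` classes and their fields are hypothetical and are not taken from the Flytesnacks example.

```python
from dataclasses import dataclass, field


@dataclass
class Fruit:
    name: str


@dataclass
class Basket:
    # Hypothetical container mixing a list and a dict of dataclasses.
    fruits: list[Fruit] = field(default_factory=list)
    by_color: dict[str, Fruit] = field(default_factory=dict)


basket = Basket(
    fruits=[Fruit(name="banana"), Fruit(name="apple")],
    by_color={"red": Fruit(name="cherry")},
)

# List index, dictionary key, and attribute access can be chained.
first_name = basket.fruits[0].name
red_name = basket.by_color["red"].name
```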
6064

6165
You can run all the workflows locally as follows:
6266

6367
```{literalinclude} /examples/data_types_and_io/data_types_and_io/attribute_access.py
6468
:caption: data_types_and_io/attribute_access.py
65-
:lines: 84-88
69+
:lines: 82-86
6670
```
6771

6872
## Failure scenario

docs/user_guide/data_types_and_io/dataclass.md

+17-1
@@ -11,8 +11,24 @@ When you've multiple values that you want to send across Flyte entities, you can
1111
Flytekit uses the [Mashumaro library](https://github.com/Fatal1ty/mashumaro)
1212
to serialize and deserialize dataclasses.
1313

14+
With the 1.14 release, `flytekit` adopted `MessagePack` as the
15+
serialization format for dataclasses, overcoming a major limitation of previous versions, which serialized dataclasses into a JSON string within a Protobuf `struct` datatype:
16+
17+
to store `int` types, Protobuf's `struct` converts them to `float`, forcing users to write boilerplate code to work around this issue.
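
The `int`-to-`float` issue can be sketched without Flyte. The snippet below is only an emulation, assuming (as stated above) that a Protobuf `struct` keeps every number as a floating-point value. `via_struct_like` is a hypothetical stand-in, not a flytekit or Protobuf function, and the `Datum` fields are chosen for illustration. It also shows the kind of boilerplate cast that users previously had to write.

```python
from dataclasses import dataclass


def via_struct_like(values: dict) -> dict:
    """Emulate a round trip in which every number comes back as a float."""
    out = {}
    for key, value in values.items():
        # bool is a subclass of int, so leave it untouched.
        if isinstance(value, int) and not isinstance(value, bool):
            out[key] = float(value)
        else:
            out[key] = value
    return out


@dataclass
class Datum:
    x: int
    y: str

    def __post_init__(self):
        # Boilerplate workaround: cast the float back to the annotated int.
        self.x = int(self.x)


round_tripped = via_struct_like({"x": 1, "y": "a"})
restored = Datum(**round_tripped)
```

A format that keeps integers and floats distinct, as MessagePack does, makes the `__post_init__` cast unnecessary.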
18+
19+
:::{important}
20+
If you're using Flytekit version < v1.11.1, you will need to add `from dataclasses_json import dataclass_json` to your imports and decorate your dataclass with `@dataclass_json`.
21+
:::
22+
1423
:::{important}
15-
If you're using Flytekit version below v1.11.1, you will need to add `from dataclasses_json import dataclass_json` to your imports and decorate your dataclass with `@dataclass_json`.
24+
Flytekit versions < v1.14.0 produce a Protobuf `struct` literal for dataclasses.
25+
26+
Flytekit versions >= v1.14.0 produce a msgpack bytes literal for dataclasses.
27+
28+
If you're using Flytekit version >= v1.14.0 and want to produce a Protobuf `struct` literal for dataclasses, you can
29+
set the environment variable `FLYTE_USE_OLD_DC_FORMAT` to `true`.
30+
31+
For more details, refer to the MessagePack IDL RFC: https://github.com/flyteorg/flyte/blob/master/rfc/system/5741-binary-idl-with-message-pack.md
1632
:::
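
One way to set the variable from Python is through `os.environ`, as sketched below. This assumes the variable is set before flytekit serializes the dataclass; exactly when flytekit reads it is not stated here. `use_old_dc_format` is a hypothetical helper that only shows how such a `true`/`false` flag might be interpreted, and it is not part of flytekit.

```python
import os

# Opt back in to the Protobuf struct literal for dataclasses.
os.environ["FLYTE_USE_OLD_DC_FORMAT"] = "true"


def use_old_dc_format() -> bool:
    """Hypothetical check showing how a 'true'/'false' flag could be read."""
    return os.environ.get("FLYTE_USE_OLD_DC_FORMAT", "false").lower() == "true"
```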
1733

1834
```{note}

docs/user_guide/data_types_and_io/index.md

+2-1
@@ -114,7 +114,7 @@ Here's a breakdown of these mappings:
114114
- Use ``pyspark.DataFrame`` as a type hint.
115115
* - ``pydantic.BaseModel``
116116
- ``Map``
117-
- To utilize the type, install the ``flytekitplugins-pydantic`` plugin.
117+
- To utilize the type, install the ``pydantic>2`` module.
118118
- Use ``pydantic.BaseModel`` as a type hint.
119119
* - ``torch.Tensor`` / ``torch.nn.Module``
120120
- File
@@ -144,6 +144,7 @@ flytefile
144144
flytedirectory
145145
structureddataset
146146
dataclass
147+
pydantic_basemodel
147148
accessing_attributes
148149
pytorch_type
149150
enum_type
docs/user_guide/data_types_and_io/pydantic_basemodel.md

+103-0
@@ -0,0 +1,103 @@
1+
(pydantic_basemodel)=
2+
3+
# Pydantic BaseModel
4+
5+
```{eval-rst}
6+
.. tags:: Basic
7+
```
8+
9+
`flytekit` version >=1.14 natively supports the `JSON` format that Pydantic `BaseModel` produces, enhancing the
10+
interoperability of Pydantic BaseModels with the Flyte type system.
11+
12+
:::{important}
13+
Pydantic BaseModel V2 only works when you are using flytekit version >= v1.14.0.
14+
:::
15+
16+
With the 1.14 release, `flytekit` adopted `MessagePack` as the serialization format for Pydantic `BaseModel`,
17+
overcoming a major limitation of previous versions, which serialized models into a JSON string within a Protobuf `struct` datatype:
18+
19+
to store `int` types, Protobuf's `struct` converts them to `float`, forcing users to write boilerplate code to work around this issue.
20+
21+
:::{important}
22+
By default, `flytekit >= 1.14` will produce `msgpack` bytes literals when serializing, preserving the types defined in your `BaseModel` class.
23+
If you're serializing `BaseModel` using `flytekit` version >= v1.14.0 and you want to produce a Protobuf `struct` literal instead, you can set the environment variable `FLYTE_USE_OLD_DC_FORMAT` to `true`.
24+
25+
For more details, refer to the MessagePack IDL RFC: https://github.com/flyteorg/flyte/blob/master/rfc/system/5741-binary-idl-with-message-pack.md
26+
:::
27+
28+
```{note}
29+
You can use dataclasses and Flyte types (FlyteFile, FlyteDirectory, FlyteSchema, and StructuredDataset) as fields in a Pydantic BaseModel.
30+
```
31+
32+
```{note}
33+
To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks].
34+
```
35+
36+
To begin, import the necessary dependencies:
37+
38+
```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
39+
:caption: data_types_and_io/pydantic_basemodel.py
40+
:lines: 1-9
41+
```
42+
43+
Build your custom image with ImageSpec:
44+
```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
45+
:caption: data_types_and_io/pydantic_basemodel.py
46+
:lines: 11-14
47+
```
48+
49+
## Python types
50+
We define a Pydantic `BaseModel` with `int`, `str` and `dict` as the data types.
51+
52+
```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
53+
:caption: data_types_and_io/pydantic_basemodel.py
54+
:pyobject: Datum
55+
```
56+
57+
You can send a Pydantic `BaseModel` between different tasks written in various languages, and input it through the Flyte console as raw JSON.
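
As a sketch of what such raw JSON might look like, the snippet below defines a model with `int`, `str` and `dict` fields and round-trips it through Pydantic V2's JSON methods. The field names `x`, `y` and `z` are assumptions for illustration and may not match the `Datum` class in the Flytesnacks example. The snippet needs `pydantic>2` installed and does not involve flytekit.

```python
from pydantic import BaseModel


class Datum(BaseModel):
    # Assumed field names; every field carries a type annotation.
    x: int
    y: str
    z: dict[int, str]


datum = Datum(x=1, y="one", z={1: "one"})

# JSON object keys are always strings, so the int key is written as "1".
raw_json = datum.model_dump_json()

# Validation converts the string key back to the annotated int type.
restored = Datum.model_validate_json(raw_json)
```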
58+
59+
:::{note}
60+
All fields in a `BaseModel` should be **annotated with their type**. Failure to do so will result in an error.
61+
:::
62+
63+
Once declared, a `BaseModel` can be returned as an output or accepted as an input.
64+
65+
```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
66+
:caption: data_types_and_io/pydantic_basemodel.py
67+
:lines: 26-41
68+
```
69+
70+
## Flyte types
71+
We also define a `BaseModel` that accepts {std:ref}`StructuredDataset <structured_dataset>`,
72+
{std:ref}`FlyteFile <files>` and {std:ref}`FlyteDirectory <folder>`.
73+
74+
```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
75+
:caption: data_types_and_io/pydantic_basemodel.py
76+
:lines: 45-86
77+
```
78+
79+
A `BaseModel` supports fields of Python types, data classes,
80+
FlyteFile, FlyteDirectory and StructuredDataset.
81+
82+
We define a workflow that calls the tasks created above.
83+
84+
```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
85+
:caption: data_types_and_io/pydantic_basemodel.py
86+
:pyobject: basemodel_wf
87+
```
88+
89+
You can run the workflow locally as follows:
90+
91+
```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
92+
:caption: data_types_and_io/pydantic_basemodel.py
93+
:lines: 99-100
94+
```
95+
96+
To trigger the workflow with `pyflyte run`, pass its inputs as command-line arguments:
97+
```
98+
pyflyte run \
99+
https://raw.githubusercontent.com/flyteorg/flytesnacks/b71e01d45037cea883883f33d8d93f258b9a5023/examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py \
100+
basemodel_wf --x 1 --y 2
101+
```
102+
103+
[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/data_types_and_io/
