feat(ingestion) Adding vertexAI ingestion source (v2 - experiment and experiment run) #12836

Open
wants to merge 166 commits into
base: master
45ce05e
feat(ingestion) Adding vertexAI ingestion source
ryota-cloud Feb 13, 2025
9a1355d
lintfix
ryota-cloud Feb 13, 2025
04315d4
minor comment change
ryota-cloud Feb 13, 2025
e3a17b5
minor
ryota-cloud Feb 13, 2025
2a5ea58
minor change in unit test
ryota-cloud Feb 13, 2025
3739c20
Adding sources and documents
ryota-cloud Feb 18, 2025
520eda6
delete unnecessary file
ryota-cloud Feb 18, 2025
c320a6c
fetch list of training jobs
ryota-cloud Feb 22, 2025
bc9e451
adding comments
ryota-cloud Feb 23, 2025
960129b
feat(ingest): add vertex AI sample data ingestion
ryota-cloud Feb 12, 2025
95712f5
Update vertexai.py
ryota-cloud Feb 24, 2025
78d184b
added endopint workunit creation and refactored
ryota-cloud Feb 24, 2025
d746a4c
commit temporarily
ryota-cloud Feb 24, 2025
5fbe0e5
lintfix
ryota-cloud Feb 24, 2025
9f8e8a3
removing unnecesary commits
ryota-cloud Feb 24, 2025
85d1830
cleanup recipe
ryota-cloud Feb 24, 2025
aae6893
minor change in config
ryota-cloud Feb 24, 2025
764f8fd
fixing dataset
ryota-cloud Feb 24, 2025
29ddcff
adding comments for dataset
ryota-cloud Feb 24, 2025
437e7d2
minor fix
ryota-cloud Feb 24, 2025
a2a1f0a
adding vertex to dev requirements in setup.py
ryota-cloud Feb 24, 2025
bf869da
minor fix
ryota-cloud Feb 24, 2025
c1f24b7
caching dataset list acquisitions
ryota-cloud Feb 25, 2025
453688d
review comment on dataset
ryota-cloud Feb 25, 2025
be03cf5
minor chagne
ryota-cloud Feb 25, 2025
8c76435
change name
ryota-cloud Feb 25, 2025
33a19c9
lint fix
ryota-cloud Feb 25, 2025
b76ec25
Refactor code to use auto_workunit
ryota-cloud Feb 25, 2025
c7d5165
flattern make_vertexai_name
ryota-cloud Feb 25, 2025
482c159
lint type error is fixed
ryota-cloud Feb 25, 2025
1032630
adding credentail config
ryota-cloud Feb 26, 2025
616b76a
refactor and changed GCP credential to pass project_id
ryota-cloud Feb 26, 2025
1dcfce1
Adding more unit test case coverage, fixed lint and test case
ryota-cloud Feb 26, 2025
f16c8f5
fix platform name
ryota-cloud Feb 26, 2025
1de43a0
fixed _get_data_process_input_workunit test case
ryota-cloud Feb 26, 2025
ea577cb
Adding subtype and container to dataset and training job
ryota-cloud Feb 27, 2025
46ff526
fix UI issue on timestamp and refactor
ryota-cloud Feb 27, 2025
9b6c01e
Merge remote-tracking branch 'oss-datahub/master' into vertex_src_temp
ryota-cloud Feb 27, 2025
7b0fb70
removed token
ryota-cloud Feb 27, 2025
cf9c242
Adding integration test for VertexAI
ryota-cloud Feb 28, 2025
398c380
Adding unit test cases
ryota-cloud Feb 28, 2025
4703cd9
increasing unit test coverage
ryota-cloud Feb 28, 2025
63e8e8e
Merge remote-tracking branch 'oss-datahub/master' into vertex_src_temp
ryota-cloud Feb 28, 2025
ba26abb
adding more unit tests
ryota-cloud Feb 28, 2025
3a85d8a
Merge remote-tracking branch 'oss-datahub/master' into vertex_src_temp
ryota-cloud Feb 28, 2025
84ebae0
fixed review comments
ryota-cloud Mar 3, 2025
0b6b7db
Merge remote-tracking branch 'oss-datahub/master' into vertex_src_temp
ryota-cloud Mar 3, 2025
5472929
fixed review comments, adding unit test cases
ryota-cloud Mar 3, 2025
0eeeb72
minor change
ryota-cloud Mar 3, 2025
6c43ecc
Change BigQueryCredentail to common function: GCPCredential
ryota-cloud Mar 3, 2025
d381b9e
Merge remote-tracking branch 'oss-datahub/master' into vertex_src_temp
ryota-cloud Mar 3, 2025
1f64a95
fixed one unit test case failure, and naming chagne
ryota-cloud Mar 3, 2025
b559286
Added Enum and refactoring
ryota-cloud Mar 3, 2025
4edd575
add comment
ryota-cloud Mar 3, 2025
5765025
fixed review comments
ryota-cloud Mar 4, 2025
4b09365
delete test case using real model
ryota-cloud Mar 4, 2025
eb261c3
delete commented out code
ryota-cloud Mar 4, 2025
e6feb8a
consolidate use of auto_workunit and change func output to mcps
ryota-cloud Mar 4, 2025
a8d7980
Merge remote-tracking branch 'oss-datahub/master' into vertex_src_temp
ryota-cloud Mar 4, 2025
b31d0f6
fix comment
ryota-cloud Mar 4, 2025
99269aa
Add POJO for model and change logic of model extraction and mcps crea…
ryota-cloud Mar 5, 2025
a517173
Merge remote-tracking branch 'oss-datahub/master' into vertex_src_temp
ryota-cloud Mar 5, 2025
f900f6d
use datetime_to_ts_millis helper
ryota-cloud Mar 5, 2025
5c46c59
refactored unit test case for better assertion
ryota-cloud Mar 5, 2025
1772b7e
Modified integration test to cover relationship between job to datase…
ryota-cloud Mar 5, 2025
8e40b7c
fix import error in test case
ryota-cloud Mar 5, 2025
2a91e6d
Merge remote-tracking branch 'oss-datahub/master' into vertex_src_temp
ryota-cloud Mar 5, 2025
defbc31
feat(ingestion) Adding vertexAI ingestion source
ryota-cloud Feb 13, 2025
1b0d945
lintfix
ryota-cloud Feb 13, 2025
ab722c5
minor comment change
ryota-cloud Feb 13, 2025
c16a787
minor
ryota-cloud Feb 13, 2025
20dec5c
minor change in unit test
ryota-cloud Feb 13, 2025
300f11f
Adding sources and documents
ryota-cloud Feb 18, 2025
1b750fa
delete unnecessary file
ryota-cloud Feb 18, 2025
0f7bbea
fetch list of training jobs
ryota-cloud Feb 22, 2025
ad3f7fb
adding comments
ryota-cloud Feb 23, 2025
9af098f
feat(ingest): add vertex AI sample data ingestion
ryota-cloud Feb 12, 2025
a2594d4
Update vertexai.py
ryota-cloud Feb 24, 2025
8c090d0
added endopint workunit creation and refactored
ryota-cloud Feb 24, 2025
6675964
commit temporarily
ryota-cloud Feb 24, 2025
ec8c7fc
lintfix
ryota-cloud Feb 24, 2025
704c605
removing unnecesary commits
ryota-cloud Feb 24, 2025
7d3a580
cleanup recipe
ryota-cloud Feb 24, 2025
40bdcd3
minor change in config
ryota-cloud Feb 24, 2025
ed47fb7
fixing dataset
ryota-cloud Feb 24, 2025
3546835
adding comments for dataset
ryota-cloud Feb 24, 2025
cc32302
minor fix
ryota-cloud Feb 24, 2025
12408cc
adding vertex to dev requirements in setup.py
ryota-cloud Feb 24, 2025
309b2da
minor fix
ryota-cloud Feb 24, 2025
9921bbd
caching dataset list acquisitions
ryota-cloud Feb 25, 2025
635e0ff
review comment on dataset
ryota-cloud Feb 25, 2025
8788650
minor chagne
ryota-cloud Feb 25, 2025
7e75027
change name
ryota-cloud Feb 25, 2025
e776885
lint fix
ryota-cloud Feb 25, 2025
80d1ab0
Refactor code to use auto_workunit
ryota-cloud Feb 25, 2025
3f8c9ba
flattern make_vertexai_name
ryota-cloud Feb 25, 2025
760749c
lint type error is fixed
ryota-cloud Feb 25, 2025
5aebfc3
adding credentail config
ryota-cloud Feb 26, 2025
2a95584
refactor and changed GCP credential to pass project_id
ryota-cloud Feb 26, 2025
8b922aa
Adding more unit test case coverage, fixed lint and test case
ryota-cloud Feb 26, 2025
ef15b6d
fix platform name
ryota-cloud Feb 26, 2025
27305b3
fixed _get_data_process_input_workunit test case
ryota-cloud Feb 26, 2025
c9818fa
Adding subtype and container to dataset and training job
ryota-cloud Feb 27, 2025
07e641c
fix UI issue on timestamp and refactor
ryota-cloud Feb 27, 2025
34af7e8
removed token
ryota-cloud Feb 27, 2025
ae8a0b9
Adding integration test for VertexAI
ryota-cloud Feb 28, 2025
8c54c77
Adding unit test cases
ryota-cloud Feb 28, 2025
1b24f2f
increasing unit test coverage
ryota-cloud Feb 28, 2025
372afb6
adding more unit tests
ryota-cloud Feb 28, 2025
57528d8
fixed review comments
ryota-cloud Mar 3, 2025
2598e7e
fixed review comments, adding unit test cases
ryota-cloud Mar 3, 2025
581af26
minor change
ryota-cloud Mar 3, 2025
54ada0f
Change BigQueryCredentail to common function: GCPCredential
ryota-cloud Mar 3, 2025
7ee9248
fixed one unit test case failure, and naming chagne
ryota-cloud Mar 3, 2025
95e56a5
Added Enum and refactoring
ryota-cloud Mar 3, 2025
93f57b1
add comment
ryota-cloud Mar 3, 2025
4aa1822
fixed review comments
ryota-cloud Mar 4, 2025
6f5e7e9
delete test case using real model
ryota-cloud Mar 4, 2025
4f1da5d
delete commented out code
ryota-cloud Mar 4, 2025
93cc061
consolidate use of auto_workunit and change func output to mcps
ryota-cloud Mar 4, 2025
987fb82
fix comment
ryota-cloud Mar 4, 2025
753eb2c
Add POJO for model and change logic of model extraction and mcps crea…
ryota-cloud Mar 5, 2025
a086d32
use datetime_to_ts_millis helper
ryota-cloud Mar 5, 2025
837edf6
refactored unit test case for better assertion
ryota-cloud Mar 5, 2025
d699269
Modified integration test to cover relationship between job to datase…
ryota-cloud Mar 5, 2025
3ae94ce
fix import error in test case
ryota-cloud Mar 5, 2025
46b4158
Adding Experiment and Experiment Run
ryota-cloud Mar 6, 2025
79ee661
Added unit tests and integration tests
ryota-cloud Mar 8, 2025
27d11e8
minor wording fix
ryota-cloud Mar 8, 2025
807dff1
Fix minor issue in test case
ryota-cloud Mar 8, 2025
706a123
Fixed review comments, refactored unit and integration test case, com…
ryota-cloud Mar 9, 2025
55e9cfc
Merge remote-tracking branch 'oss-datahub/master' into vertex_src_temp
ryota-cloud Mar 9, 2025
35f0081
renamed mock data file
ryota-cloud Mar 9, 2025
a4d9c5b
changed function name to _make_training_job_urn
ryota-cloud Mar 9, 2025
8887bd0
Merge branch 'vertex_src_temp' into vertex_exp_3
ryota-cloud Mar 10, 2025
ea0b2a4
Change golden files for integration test and fixed lint
ryota-cloud Mar 10, 2025
4a99f38
changed unnecessary change in tests/conftest.py
ryota-cloud Mar 10, 2025
2fef2de
Adding parent project key to experiment container
ryota-cloud Mar 10, 2025
85561af
Fixed CI/CD error
ryota-cloud Mar 10, 2025
dedab91
pushed CI/CD fix
ryota-cloud Mar 10, 2025
fbff05d
adding SubType and Container Class to ML entities
ryota-cloud Mar 11, 2025
523af99
revert back conftest.py
ryota-cloud Mar 11, 2025
ae4d7ff
fix import
ryota-cloud Mar 11, 2025
ec05cc1
lint fix
ryota-cloud Mar 11, 2025
6b3a761
fixed golden file for integration test
ryota-cloud Mar 11, 2025
3d98b1f
Merge branch 'vertex_src_temp' into vertex_exp_3
ryota-cloud Mar 11, 2025
1e5239e
fix lint
ryota-cloud Mar 11, 2025
a1f1885
Merge remote-tracking branch 'oss-datahub/master' into vertex_exp_4
ryota-cloud Mar 13, 2025
4d65ac3
adding missin MLTypes
ryota-cloud Mar 13, 2025
b5e5101
lint fix
ryota-cloud Mar 13, 2025
1ed361d
Adding version set properly
ryota-cloud Mar 13, 2025
9e77d8f
removing token
ryota-cloud Mar 13, 2025
85bf9f3
removed unnecssary comment out
ryota-cloud Mar 13, 2025
5a7ddc8
added missing aspect and updated unit/integration test
ryota-cloud Mar 14, 2025
4e1ff8d
Merge remote-tracking branch 'oss-datahub/master' into vertex_exp_4
ryota-cloud Mar 14, 2025
3347258
Adding DataProcessInputOutput Class to fix DPI lineage
ryota-cloud Mar 14, 2025
ccc4956
Modify experiment run logic
ryota-cloud Mar 14, 2025
b235bbb
Adding DPI for experiment run execution
ryota-cloud Mar 15, 2025
57b3add
Refactoring code to multiple files
ryota-cloud Mar 15, 2025
bb59dc7
missing file
ryota-cloud Mar 15, 2025
3341d28
fixing errors
ryota-cloud Mar 15, 2025
115c503
change import logic
ryota-cloud Mar 15, 2025
6e77726
refactored vertexai.py to multiple files.
ryota-cloud Mar 17, 2025
82d97d3
deleted
ryota-cloud Mar 17, 2025
a6cf015
Fixed type error for duration in experiment_run mcp
ryota-cloud Mar 17, 2025
3389c32
adding run status logic
ryota-cloud Mar 17, 2025
@@ -13,4 +13,4 @@ source:
 sink:
   type: "datahub-rest"
   config:
-    server: "http://localhost:8080"
+    server: "http://localhost:8080"
10 changes: 10 additions & 0 deletions metadata-ingestion/src/datahub/ingestion/source/common/subtypes.py
@@ -97,3 +97,13 @@ class BIAssetSubTypes(StrEnum):
class MLAssetSubTypes(StrEnum):
    MLFLOW_TRAINING_RUN = "ML Training Run"
    MLFLOW_EXPERIMENT = "ML Experiment"
    VERTEX_EXPERIMENT = "Experiment"
    VERTEX_EXPERIMENT_RUN = "Experiment Run"
    VERTEX_EXECUTION = "Execution"

    VERTEX_MODEL = "ML Model"
    VERTEX_MODEL_GROUP = "ML Model Group"
    VERTEX_TRAINING_JOB = "Training Job"
    VERTEX_ENDPOINT = "Endpoint"
    VERTEX_DATASET = "Dataset"
    VERTEX_PROJECT = "Project"
@@ -0,0 +1,7 @@
from datahub.ingestion.source.vertexai.config import VertexAIConfig
from datahub.ingestion.source.vertexai.vertexai import (
    ContainerKeyWithId,
    ModelMetadata,
    TrainingJobMetadata,
    VertexAISource,
)
33 changes: 33 additions & 0 deletions metadata-ingestion/src/datahub/ingestion/source/vertexai/config.py
@@ -0,0 +1,33 @@
from typing import Any, Optional

from pydantic import Field, PrivateAttr

from datahub.configuration.source_common import EnvConfigMixin
from datahub.ingestion.source.common.gcp_credentials_config import GCPCredential


class VertexAIConfig(EnvConfigMixin):
    credential: Optional[GCPCredential] = Field(
        default=None, description="GCP credential information"
    )
    project_id: str = Field(description=("Project ID in Google Cloud Platform"))
    region: str = Field(
        description=("Region of your project in Google Cloud Platform"),
    )
    bucket_uri: Optional[str] = Field(
        default=None,
        description=("Bucket URI used in your project"),
    )
    vertexai_url: Optional[str] = Field(
        default="https://console.cloud.google.com/vertex-ai",
        description=("VertexUI URI"),
    )

    _credentials_path: Optional[str] = PrivateAttr(None)

    def __init__(self, **data: Any):
        super().__init__(**data)
        if self.credential:
            self._credentials_path = self.credential.create_credential_temp_file(
                project_id=self.project_id
            )
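
Reviewer-style note on config.py: the constructor materializes the credential into a temporary file and remembers its path in a private attribute. The sketch below illustrates that pattern with the standard library only. `SketchConfig` is a hypothetical stand-in (a dataclass, not the pydantic model in the diff), and writing the credential as JSON with the project ID merged in is an assumption about what `create_credential_temp_file` might do, not a description of it.

```python
import json
import tempfile
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SketchConfig:
    """Hypothetical stand-in for VertexAIConfig; not the real class."""

    project_id: str
    region: str
    credential: Optional[Dict[str, str]] = None
    # Not a constructor argument: filled in only when a credential is given.
    _credentials_path: Optional[str] = field(default=None, init=False, repr=False)

    def __post_init__(self) -> None:
        if self.credential:
            # Assumption: the credential is serialized as JSON together
            # with the project ID, and the file outlives this call.
            payload = {**self.credential, "project_id": self.project_id}
            with tempfile.NamedTemporaryFile(
                "w", suffix=".json", delete=False
            ) as fp:
                json.dump(payload, fp)
                self._credentials_path = fp.name
```

Because the file is created with `delete=False`, whoever owns the config would also be responsible for removing it once ingestion finishes.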
@@ -0,0 +1,87 @@
from typing import Union

from google.cloud.aiplatform.base import VertexAiResourceNoun
from google.cloud.aiplatform.jobs import _RunnableJob
from google.cloud.aiplatform.training_jobs import _TrainingJob
from google.cloud.aiplatform_v1.types import JobState, PipelineState

from datahub.metadata.schema_classes import RunResultTypeClass


def get_automl_job_result_type(job: _TrainingJob) -> Union[str, RunResultTypeClass]:
    state = job.state
    if state == PipelineState.PIPELINE_STATE_SUCCEEDED:
        return RunResultTypeClass.SUCCESS
    elif state == PipelineState.PIPELINE_STATE_FAILED:
        return RunResultTypeClass.FAILURE
    elif state == PipelineState.PIPELINE_STATE_CANCELLED:
        return "Cancelled"
    elif state == PipelineState.PIPELINE_STATE_PAUSED:
        return "Paused"
    elif state == PipelineState.PIPELINE_STATE_QUEUED:
        return "Queued"
    elif state == PipelineState.PIPELINE_STATE_RUNNING:
        return "Running"
    elif state == PipelineState.PIPELINE_STATE_UNSPECIFIED:
        return "Unspecific"
    else:
        return "UNKNOWN"


def get_custom_job_result_type(job: _RunnableJob) -> Union[str, RunResultTypeClass]:
    state = job.state
    if state == JobState.JOB_STATE_SUCCEEDED:
        return RunResultTypeClass.SUCCESS
    elif state == JobState.JOB_STATE_FAILED:
        return RunResultTypeClass.FAILURE
    elif state == JobState.JOB_STATE_CANCELLED:
        return "Cancelled"
    elif state == JobState.JOB_STATE_PAUSED:
        return "Paused"
    elif state == JobState.JOB_STATE_QUEUED:
        return "Queued"
    elif state == JobState.JOB_STATE_RUNNING:
        return "Running"
    elif state == JobState.JOB_STATE_CANCELLING:
        return "Cancelling"
    elif state == JobState.JOB_STATE_EXPIRED:
        return "Expired"
    elif state == JobState.JOB_STATE_UPDATING:
        return "Updating"
    else:
        return "UNKNOWN"


def get_job_result_status(job: VertexAiResourceNoun) -> Union[str, RunResultTypeClass]:
    if isinstance(job, _TrainingJob):
        return get_automl_job_result_type(job)
    elif isinstance(job, _RunnableJob):
        return get_custom_job_result_type(job)
    return "UNKNOWN"


def get_execution_result_status(status: int) -> Union[str, RunResultTypeClass]:
    """
    State of the execution.
    STATE_UNSPECIFIED = 0
    PENDING = 1
    RUNNING = 2
    SUCCEEDED = 3
    FAILED = 4
    """
    if status == 3:
        return RunResultTypeClass.SUCCESS
    elif status == 4:
        return RunResultTypeClass.FAILURE
    elif status == 2:
        return "RUNNING"
    elif status == 1:
        return "PENDING"
    elif status == 0:
        return "STATE_UNSPECIFIED"
    else:
        return "UNKNOWN"


def is_status_for_run_event_class(status: Union[str, RunResultTypeClass]) -> bool:
    return status in [RunResultTypeClass.SUCCESS, RunResultTypeClass.FAILURE]
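
Reviewer-style note on the status helpers: the if/elif chains above could arguably be written as lookup tables, which keeps the state-to-result mapping in one place. The sketch below shows the idea for the execution-status case only. `RunResult` and `ExecutionState` are hypothetical stand-ins defined here so the snippet runs without datahub or the Google SDK; the integer codes are copied from the docstring of `get_execution_result_status`, and the function names are illustrative, not part of the PR.

```python
from enum import IntEnum
from typing import Dict


class RunResult:
    """Hypothetical stand-in for RunResultTypeClass (string constants)."""

    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"


class ExecutionState(IntEnum):
    """Stand-in using the integer codes listed in the docstring above."""

    STATE_UNSPECIFIED = 0
    PENDING = 1
    RUNNING = 2
    SUCCEEDED = 3
    FAILED = 4


# One table instead of an if/elif chain; IntEnum members hash and compare
# like plain ints, so a raw integer status can be used as the lookup key.
_EXECUTION_STATUS: Dict[int, str] = {
    ExecutionState.SUCCEEDED: RunResult.SUCCESS,
    ExecutionState.FAILED: RunResult.FAILURE,
    ExecutionState.RUNNING: "RUNNING",
    ExecutionState.PENDING: "PENDING",
    ExecutionState.STATE_UNSPECIFIED: "STATE_UNSPECIFIED",
}


def execution_result_status(status: int) -> str:
    """Map an execution state code to a result label, defaulting to UNKNOWN."""
    return _EXECUTION_STATUS.get(status, "UNKNOWN")


def is_terminal_result(status: str) -> bool:
    """True only for the two results that represent a finished run."""
    return status in (RunResult.SUCCESS, RunResult.FAILURE)
```

The same shape would apply to the `PipelineState` and `JobState` mappings, with the caveat that the labels returned for non-terminal states would need to match whatever the downstream consumers expect.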