Mismatched name referencing preventing tear down of Cloud instances. #754

Closed
dacbd opened this issue Sep 27, 2021 · 6 comments
Labels: bug (Something isn't working), cloud-gcp (Google Cloud), cml-runner (Subcommand), p0-critical (Max priority, ASAP)

Comments


dacbd commented Sep 27, 2021

Still investigating, but this looks related to #742 and #738.

In the GCP logs I can see a failed Delete VM request that returns a 404 for iterative-2r77ew2wn9qkm, whereas the actual instance name uses a cml prefix instead.
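
To make the suspected failure mode concrete, here is a minimal sketch. It is not CML's or the Terraform provider's actual code. It assumes the instance is created under a `cml-` prefixed name while teardown asks to delete an `iterative-` prefixed name; the `FakeCloud` class and the `abc123` identifier are hypothetical stand-ins for the GCP API and a real instance ID.

```javascript
// Hypothetical in-memory stand-in for a cloud API; not the real GCP client.
class FakeCloud {
  constructor() {
    this.instances = new Map();
  }

  create(name) {
    this.instances.set(name, { name });
  }

  // Mimics a 404: deleting a name that was never created reports "not found".
  delete(name) {
    const existed = this.instances.delete(name);
    return { status: existed ? 200 : 404, name };
  }
}

const cloud = new FakeCloud();

// Assumption: the instance is created under one naming scheme...
cloud.create('cml-abc123');

// ...and teardown targets a name built with a different prefix.
const result = cloud.delete('iterative-abc123');

// The delete fails and the instance is left running.
console.log(result.status, cloud.instances.size);
```

If this is what happens, the delete request itself is well formed; it simply targets a name that does not exist, so the real instance is never torn down.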


dacbd commented Sep 27, 2021

cml-runner command:

  cml-runner \
    --single \
    --name=cli-name-opt \
    --log=debug \
    --idle-timeout=1800 \
    --token=*** \
    --cloud=gcp \
    --cloud-region=us-west \
    --cloud-type=m \
    --cloud-hdd-size=10

Service logs:

~$ journalctl --unit cml --no-pager
-- Logs begin at Mon 2021-09-27 17:51:44 UTC, end at Mon 2021-09-27 18:17:01 UTC. --
Sep 27 17:54:53 cli-name-opt systemd[1]: Started cml.service.
Sep 27 17:55:02 cli-name-opt cml.sh[22569]: {"level":"info","message":"Preparing workdir /tmp/tmp.uAzCgwV9Td/.cml/cli-name-opt..."}
Sep 27 17:55:02 cli-name-opt cml.sh[22569]: {"level":"info","message":"Launching github runner"}
Sep 27 17:55:11 cli-name-opt cml.sh[22569]: {"level":"warn","message":"SpotNotifier can not be started."}
Sep 27 17:55:12 cli-name-opt cml.sh[22569]: {"date":"2021-09-27T17:55:12.622Z","level":"info","message":"runner status","repo":"xxx"}
Sep 27 17:55:12 cli-name-opt cml.sh[22569]: {"date":"2021-09-27T17:55:12.624Z","level":"info","message":"runner status √ Connected to GitHub","repo":"xxx"}
Sep 27 17:55:13 cli-name-opt cml.sh[22569]: {"date":"2021-09-27T17:55:13.085Z","level":"info","message":"runner status Listening for Jobs","repo":"xxx","status":"ready"}
Sep 27 17:55:27 cli-name-opt cml.sh[22569]: {"date":"2021-09-27T17:55:27.393Z","job":3723710456,"level":"info","message":"runner status Running job: train-model","repo":"xxx","status":"job_started"}
Sep 27 18:00:24 cli-name-opt cml.sh[22569]: {"date":"2021-09-27T18:00:24.822Z","job":"","level":"info","message":"runner status Job train-model completed with result: Succeeded","repo":"xxx","status":"job_ended","success":true}
Sep 27 18:00:25 cli-name-opt cml.sh[22569]: {"date":"2021-09-27T18:00:25.285Z","level":"info","message":"runner status √ Removed .credentials","repo":"xxx"}
Sep 27 18:00:25 cli-name-opt cml.sh[22569]: {"date":"2021-09-27T18:00:25.286Z","level":"info","message":"runner status √ Removed .runner","repo":"xxx"}
Sep 27 18:00:25 cli-name-opt cml.sh[22569]: {"level":"info","message":"runner status","reason":"proc_exit","status":"terminated"}
Sep 27 18:00:25 cli-name-opt cml.sh[22569]: {"level":"info","message":"waiting 20 seconds before exiting..."}
Sep 27 18:00:45 cli-name-opt cml.sh[22569]: {"level":"info","message":"Unregistering runner cli-name-opt..."}
Sep 27 18:00:45 cli-name-opt cml.sh[22569]: {"level":"error","message":"\tFailed: Cannot destructure property 'id' of '(intermediate value)' as it is undefined."}
Sep 27 18:00:47 cli-name-opt systemd[1]: cml.service: Succeeded.
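
The error on the "Unregistering runner" step is the message JavaScript produces when code destructures `id` from a value that is `undefined`. The sketch below shows one way that can happen, assuming (this is a guess, not confirmed by the logs) that the runner is looked up by a name that does not match any registered runner. The runner list and the function names are hypothetical and are not CML's actual implementation.

```javascript
// Hypothetical list of registered runners; the names are illustrative only.
const registeredRunners = [{ id: 101, name: 'cml-abc123' }];

// Returns the matching runner object, or undefined when no name matches.
function findRunnerByName(name) {
  return registeredRunners.find((runner) => runner.name === name);
}

// Unguarded: destructuring the result of a failed lookup throws a TypeError.
function unregisterUnsafe(name) {
  const { id } = findRunnerByName(name);
  return id;
}

// Guarded: check the lookup result before reading `id`.
function unregisterSafe(name) {
  const runner = findRunnerByName(name);
  if (!runner) return null;
  return runner.id;
}

let caught = null;
try {
  // The name from --name does not match the registered name.
  unregisterUnsafe('cli-name-opt');
} catch (err) {
  caught = err;
}

console.log(caught instanceof TypeError, caught.message);
console.log(unregisterSafe('cli-name-opt'));
console.log(unregisterSafe('cml-abc123'));
```

The guarded variant only avoids the crash; if the lookup name and the registered name disagree, the runner still would not be unregistered, so the naming mismatch itself is what needs fixing.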


dacbd commented Sep 27, 2021

Setting the name through the CML_RUNNER_NAME environment variable instead of --name results in the same behavior:

  CML_RUNNER_NAME=cml-test-name cml-runner \
    --single \
    --log=debug \
    --idle-timeout=1800 \
    --token=*** \
    --cloud=gcp \
    --cloud-region=us-west \
    --cloud-type=m \
    --cloud-hdd-size=10


DavidGOrtega added the bug, cml-runner, p0-critical, and cloud-gcp labels on Sep 30, 2021
DavidGOrtega commented

Related to #678.


dacbd commented Oct 7, 2021

Closed with the upstream release of https://github.com/iterative/terraform-provider-iterative
