[ISSUE] Auto-purged clusters still create a faulty terraform plan #1197

Closed
dugernierg opened this issue Mar 16, 2022 · 10 comments · Fixed by #1252
Labels
Cannot reproduce: Indicates insufficient information to reproduce or solve the problem.
platform bug: this issue cannot be fixed or worked around in scope of this plugin. Please create a support case.

Comments

@dugernierg

Hi there,

Follow up to #1177 and #1178, as the problem doesn't seem to be resolved by 0.5.3.

As instructed, you can find the debug output below. If anything relevant is missing let me know.

Steps to Reproduce

See #1177.

Terraform and provider versions

Terraform v1.0.1
Provider v0.5.3

Debug Output

2022-03-16T09:18:23.820Z [DEBUG] provider.terraform-provider-databricks_v0.5.3: 400 Bad Request {
  "error_code": "INVALID_STATE",
  "message": "Cannot access cluster ####-######-####### that was terminated or unpinned more than 30 days ago... (1 more bytes)"
}: timestamp=2022-03-16T09:18:23.820Z
2022-03-16T09:18:23.820Z [WARN]  provider.terraform-provider-databricks_v0.5.3: /api/2.0/permissions/clusters/####-######-#######:400 - Cannot access cluster ####-######-####### that was terminated or unpinned more than 30 days ago.: timestamp=2022-03-16T09:18:23.820Z
2022-03-16T09:18:23.820Z [WARN]  provider.terraform-provider-databricks_v0.5.3: /api/2.0/permissions/clusters/####-######-#######:400 - Cannot access cluster ####-######-####### that was terminated or unpinned more than 30 days ago.: timestamp=2022-03-16T09:18:23.820Z

(...)

2022-03-16T09:18:24.565Z [DEBUG] provider.terraform-provider-databricks_v0.5.3: 400 Bad Request {
  "error_code": "INVALID_STATE",
  "message": "Cannot access cluster ####-######-####### that was terminated or unpinned more than 30 days ago... (1 more bytes)"
}: timestamp=2022-03-16T09:18:24.565Z
2022-03-16T09:18:24.565Z [WARN]  provider.terraform-provider-databricks_v0.5.3: /api/2.0/clusters/get:400 - Cannot access cluster ####-######-####### that was terminated or unpinned more than 30 days ago. https://docs.databricks.com/dev-tools/api/latest/clusters.html#get: timestamp=2022-03-16T09:18:24.565Z
2022-03-16T09:18:24.565Z [WARN]  provider.terraform-provider-databricks_v0.5.3: /api/2.0/clusters/get:400 - Cannot access cluster ####-######-####### that was terminated or unpinned more than 30 days ago. https://docs.databricks.com/dev-tools/api/latest/clusters.html#get: timestamp=2022-03-16T09:18:24.565Z
2022-03-16T09:18:24.565Z [WARN]  provider.terraform-provider-databricks_v0.5.3: assuming that cluster is removed on backend: Cannot access cluster ####-######-####### that was terminated or unpinned more than 30 days ago.: timestamp=2022-03-16T09:18:24.565Z
2022-03-16T09:18:24.565Z [INFO]  provider.terraform-provider-databricks_v0.5.3: cluster[id=####-######-#######] is removed on backend: timestamp=2022-03-16T09:18:24.565Z
2022-03-16T09:18:24.565Z [WARN]  Provider "registry.terraform.io/databrickslabs/databricks" produced an unexpected new value for databricks_cluster.[MASKED]_etl_cluster_rd during refresh.
      - Root resource was present, but now absent
2022-03-16T09:18:24.572Z [WARN]  Provider "registry.terraform.io/databrickslabs/databricks" produced an invalid plan for databricks_cluster.[MASKED]_etl_cluster_rd, but we are tolerating it because it is using the legacy plugin SDK.
    The following problems may be the cause of any confusing errors from downstream operations:
      - .num_workers: planned value cty.NumberIntVal(0) for a non-computed attribute

(...)

2022-03-16T09:18:19.491Z [ERROR] AttachSchemaTransformer: No resource schema available for databricks_permissions.etl_rd_usage

(...)

2022-03-16T09:18:28.160Z [INFO]  backend/local: plan operation completed
╷
│ Error: Cannot access cluster ####-######-####### that was terminated or unpinned more than 30 days ago.
│ 
│ 
╵
@nfx
Contributor

nfx commented Mar 16, 2022

@dugernierg thanks for the log! "assuming that cluster is removed on backend: Cannot access cluster ####-######-#######" is a very important line. It means the fix in 15bca2c was triggered but didn't have the intended effect.

Are you sure that the message is "Error: Cannot access cluster" and not "Error: cannot read cluster: Cannot access cluster"? Is it for a cluster resource, or for mount or SQL permissions?

Can you build the provider code locally? If so, we can try a couple of things over a call. Otherwise it may take me 30 days to reproduce the issue.

There's a manual mitigation as a last resort, https://www.terraform.io/cli/commands/state/rm, but I'm looking to figure out a permanent fix.

@dugernierg
Author

dugernierg commented Mar 16, 2022

I confirm that the error is indeed "Error: Cannot access cluster"; I've double-checked in the logs.

Running any custom terraform command has proven... complicated. I'm deploying the project via a GitLab CI/CD pipeline that follows a company-level template. I've been looking to run terraform state rm, but even that is tricky because I don't have a way to access the state directly.

I've sent an email to the company DevOps team to see if someone would be available to join a call so we can modify the pipeline on the fly for investigation purposes.

Just in case it may be relevant: there was a databricks_permissions resource also linked to that cluster and present in the project.

@nfx
Contributor

nfx commented Mar 16, 2022

@dugernierg I'm looking more for someone who can rapidly replace TF binaries with every fix attempted. databricks_permissions has nothing to do with the recently rolled out update of the cluster manager API.

@nfx
Contributor

nfx commented Mar 16, 2022

@dugernierg Is it on databricks_cluster? Or is it on databricks_mount or databricks_sql_permissions, which use the clusters API behind the scenes? I've just reproduced the error and it works as expected.

[Screenshot: main.go, terraform-provider-databricks]

nfx added the "Cannot reproduce" label on Mar 16, 2022
@dugernierg
Author

It's on a databricks_cluster. I might take you up on your offer for a call; it might be faster to investigate that way.

@nfx
Contributor

nfx commented Mar 29, 2022

#1227 actually gives a very important detail about the issue: the HTTP 400 error is returned by the permissions API, not just the clusters API.

The fix for this should involve copying "wrapMissingError" from the clusters Get API to the API call that gets the list of permissions. I'm away until the second half of April and will be able to release a fix only then.

nfx added the "platform bug" label on Mar 29, 2022
@dugernierg
Author

dugernierg commented Mar 30, 2022

We must have misunderstood each other here. I pointed out the databricks_permissions resource, and the first three lines of the debug output are about calls to the permissions API. I should have been clearer, my bad.

In any case I'm glad the issue was identified. Thanks for keeping me updated, and thanks for the amazing work you're doing with this provider!

@nfx
Contributor

nfx commented Mar 30, 2022

@dugernierg This is definitely new behavior for the permissions API 🤷🏻‍♂️ Please report it to our support.

@dugernierg
Author

I sent an email to support with links to both this ticket and #1227 explaining the situation. I hope it will help resolve it. I'll pass along any information I get from them, though I imagine they will also communicate it to you internally.

nfx linked a pull request on Apr 21, 2022 that will close this issue
@nfx
Contributor

nfx commented Apr 21, 2022

@dugernierg added a fix in #1252

nfx closed this as completed in #1252 on Apr 22, 2022