[ISSUE] Auto-purged clusters still create a faulty terraform plan #1197

dugernierg · 2022-03-16T09:47:42Z

Hi there,

Follow up to #1177 and #1178, as the problem doesn't seem to be resolved by 0.5.3.

As instructed, you can find the debug output below. If anything relevant is missing let me know.

Steps to Reproduce

see #1177

Terraform and provider versions

Terraform v1.0.1
Provider v0.5.3

Debug Output

2022-03-16T09:18:23.820Z [DEBUG] provider.terraform-provider-databricks_v0.5.3: 400 Bad Request {
  "error_code": "INVALID_STATE",
  "message": "Cannot access cluster ####-######-####### that was terminated or unpinned more than 30 days ago... (1 more bytes)"
}: timestamp=2022-03-16T09:18:23.820Z
2022-03-16T09:18:23.820Z [WARN]  provider.terraform-provider-databricks_v0.5.3: /api/2.0/permissions/clusters/####-######-#######:400 - Cannot access cluster ####-######-####### that was terminated or unpinned more than 30 days ago.: timestamp=2022-03-16T09:18:23.820Z
2022-03-16T09:18:23.820Z [WARN]  provider.terraform-provider-databricks_v0.5.3: /api/2.0/permissions/clusters/####-######-#######:400 - Cannot access cluster ####-######-####### that was terminated or unpinned more than 30 days ago.: timestamp=2022-03-16T09:18:23.820Z

(...)

2022-03-16T09:18:24.565Z [DEBUG] provider.terraform-provider-databricks_v0.5.3: 400 Bad Request {
  "error_code": "INVALID_STATE",
  "message": "Cannot access cluster ####-######-####### that was terminated or unpinned more than 30 days ago... (1 more bytes)"
}: timestamp=2022-03-16T09:18:24.565Z
2022-03-16T09:18:24.565Z [WARN]  provider.terraform-provider-databricks_v0.5.3: /api/2.0/clusters/get:400 - Cannot access cluster ####-######-####### that was terminated or unpinned more than 30 days ago. https://docs.databricks.com/dev-tools/api/latest/clusters.html#get: timestamp=2022-03-16T09:18:24.565Z
2022-03-16T09:18:24.565Z [WARN]  provider.terraform-provider-databricks_v0.5.3: /api/2.0/clusters/get:400 - Cannot access cluster ####-######-####### that was terminated or unpinned more than 30 days ago. https://docs.databricks.com/dev-tools/api/latest/clusters.html#get: timestamp=2022-03-16T09:18:24.565Z
2022-03-16T09:18:24.565Z [WARN]  provider.terraform-provider-databricks_v0.5.3: assuming that cluster is removed on backend: Cannot access cluster ####-######-####### that was terminated or unpinned more than 30 days ago.: timestamp=2022-03-16T09:18:24.565Z
2022-03-16T09:18:24.565Z [INFO]  provider.terraform-provider-databricks_v0.5.3: cluster[id=####-######-#######] is removed on backend: timestamp=2022-03-16T09:18:24.565Z
2022-03-16T09:18:24.565Z [WARN]  Provider "registry.terraform.io/databrickslabs/databricks" produced an unexpected new value for databricks_cluster.[MASKED]_etl_cluster_rd during refresh.
      - Root resource was present, but now absent
2022-03-16T09:18:24.572Z [WARN]  Provider "registry.terraform.io/databrickslabs/databricks" produced an invalid plan for databricks_cluster.[MASKED]_etl_cluster_rd, but we are tolerating it because it is using the legacy plugin SDK.
    The following problems may be the cause of any confusing errors from downstream operations:
      - .num_workers: planned value cty.NumberIntVal(0) for a non-computed attribute

(...)

2022-03-16T09:18:19.491Z [ERROR] AttachSchemaTransformer: No resource schema available for databricks_permissions.etl_rd_usage

(...)

2022-03-16T09:18:28.160Z [INFO]  backend/local: plan operation completed
╷
│ Error: Cannot access cluster ####-######-####### that was terminated or unpinned more than 30 days ago.
│ 
│ 
╵


nfx · 2022-03-16T11:49:17Z

@dugernierg thanks for the log! The line "assuming that cluster is removed on backend: Cannot access cluster ####-######-#######" is very important: it means the fix in 15bca2c triggered, but didn't have the intended effect.

Are you sure that the message is "Error: Cannot access cluster" and not "Error: cannot read cluster: Cannot access cluster"? Is it for a cluster resource, or for mount or SQL permissions?

Can you build the provider code locally? If so, we can try a couple of things over a call. Otherwise it may take me 30 days to reproduce the issue.

There's a manual mitigation as a last resort (https://www.terraform.io/cli/commands/state/rm), but I'm looking for a permanent fix.

dugernierg · 2022-03-16T13:17:58Z

I confirm that the error is indeed "Error: Cannot access cluster"; I've double-checked in the logs.

Running any custom terraform command is proving... complicated. I'm deploying the project via a GitLab CI/CD pipeline that follows a company-level template. I've been looking into running terraform state rm, but even that is tricky because I don't have a way to access the state directly.

I've sent an email to the company DevOps team to see if someone would be available to join a call so we can modify the pipeline on the fly for investigation purposes.

Just in case it's relevant: there was a databricks_permissions resource also linked to that cluster and present in the project.

nfx · 2022-03-16T14:18:04Z

@dugernierg I'm looking more for someone who can rapidly replace TF binaries with every fix attempted. ~~databricks_permissions has nothing to do with this recently rolled out update of the cluster manager API.~~

nfx · 2022-03-16T17:35:14Z

@dugernierg is it on databricks_cluster? Or is it on databricks_mount or databricks_sql_permissions, which use the clusters API behind the scenes? I've just reproduced the error and it works as expected.

dugernierg · 2022-03-17T08:45:08Z

It's on a databricks_cluster. I might take you up on your offer of a call; it might be faster to investigate that way.

nfx · 2022-03-29T17:14:08Z

#1227 actually gives a very important detail about the issue: the HTTP 400 error is returned by the permissions API, not just the clusters API.

The fix for this should involve copying "wrapMissingError" from the clusters Get API to the API call that gets the list of permissions. I'm away until the second half of April and will only be able to release a fix then.
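For readers following along, here is a minimal Go sketch of the idea behind that fix. It is not the provider's actual code: the APIError type, the IsMissing helper, the RESOURCE_DOES_NOT_EXIST error code, and the wrapMissingClusterError name and signature are all hypothetical stand-ins chosen for illustration. The only details taken from this thread are the HTTP 400 status, the INVALID_STATE error code, and the "Cannot access cluster" message. The sketch rewrites that specific 400 into a "missing" (404-style) error, so that a caller which already drops missing resources from state can do the same for a purged cluster, whether the response came from the clusters API or the permissions API.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"strings"
)

// APIError is a hypothetical stand-in for the provider's HTTP error type.
type APIError struct {
	StatusCode int
	ErrorCode  string
	Message    string
}

func (e *APIError) Error() string { return e.Message }

// IsMissing reports whether err means "the resource no longer exists".
// Hypothetical helper: a Read function would drop the resource from state
// when this returns true instead of failing the plan.
func IsMissing(err error) bool {
	var apiErr *APIError
	return errors.As(err, &apiErr) && apiErr.StatusCode == http.StatusNotFound
}

// wrapMissingClusterError converts the 400 response returned for an
// auto-purged cluster into a 404-style error. Every other error, including
// other 400s, is passed through unchanged.
func wrapMissingClusterError(err error, clusterID string) error {
	if err == nil {
		return nil
	}
	var apiErr *APIError
	if !errors.As(err, &apiErr) {
		return err
	}
	if apiErr.StatusCode == http.StatusBadRequest &&
		strings.Contains(apiErr.Message, "Cannot access cluster") {
		return &APIError{
			StatusCode: http.StatusNotFound,
			ErrorCode:  "RESOURCE_DOES_NOT_EXIST", // assumed code, for illustration only
			Message: fmt.Sprintf("cluster %s is removed on backend: %s",
				clusterID, apiErr.Message),
		}
	}
	return err
}

func main() {
	// Simulated response for a purged cluster; the ID is a placeholder.
	purged := &APIError{
		StatusCode: http.StatusBadRequest,
		ErrorCode:  "INVALID_STATE",
		Message:    "Cannot access cluster example-id that was terminated or unpinned more than 30 days ago.",
	}
	fmt.Println("raw error treated as missing:", IsMissing(purged))
	wrapped := wrapMissingClusterError(purged, "example-id")
	fmt.Println("wrapped error treated as missing:", IsMissing(wrapped))
}
```

Applying the same wrapper to both the cluster read and the permissions read is the point of the proposal above: in the debug output, the clusters call was already handled ("assuming that cluster is removed on backend"), while the permissions call surfaced the raw 400.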

dugernierg · 2022-03-30T06:26:23Z

We must have misunderstood each other here. I pointed out the databricks_permissions resource, and the first three lines of the debug output are about calls to the permissions API. I should have been clearer, my bad.

In any case I'm glad the issue was identified. Thanks for keeping me updated, and thanks for the amazing work you're doing with this provider!

nfx · 2022-03-30T08:27:07Z

@dugernierg This is definitely new behavior for the permissions API 🤷🏻‍♂️ Please report it to our support.

dugernierg · 2022-03-30T12:35:31Z

I sent an email to support with links to both this ticket and #1227, explaining the situation. I hope it helps resolve things. I'll pass on any information I get from them, though I imagine they will probably also communicate it to you internally.

nfx · 2022-04-21T19:01:05Z

@dugernierg added a fix in #1252

nfx added the "Cannot reproduce" label (Indicates insufficient information to reproduce or solve the problem.) Mar 16, 2022

nfx added the "platform bug" label (this issue cannot be fixed or worked around in scope of this plugin. Please create a support case.) Mar 29, 2022

nfx linked a pull request Apr 21, 2022 that will close this issue

Delete permissions resource for auto-purged cluster #1252

Merged

nfx closed this as completed in #1252 Apr 22, 2022
