Error: Missing lock for job X failed #13702

Open
Fank opened this issue Mar 5, 2025 · 8 comments
Labels
in linear Issue or PR has been created in Linear for internal review Needs Feedback Waiting for further input or clarification.

Comments

Fank commented Mar 5, 2025

Bug Description

When using a worker to run task seperated by main and an execution fails, the log always looks like this:

Worker started execution 902 (job 155)
Worker finished execution 902 (job 155)
Queue errored
Queue errored
Error: Missing lock for job 155 failed
    at Object.finishedErrors (/usr/local/lib/node_modules/n8n/node_modules/bull/lib/scripts.js:225:16)
    at Job.moveToFailed (/usr/local/lib/node_modules/n8n/node_modules/bull/lib/job.js:342:19)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)

docker-compose.yml of the worker:

volumes:
  n8n_test_storage:
  n8n_test_data:

x-shared: &shared
  image: docker.n8n.io/n8nio/n8n:1.81.4
  restart: unless-stopped
  environment:
    - DB_TYPE=postgresdb
    - DB_POSTGRESDB_HOST=1.2.3.7
    - DB_POSTGRESDB_PORT=5432
    - DB_POSTGRESDB_DATABASE=n8n-test
    - DB_POSTGRESDB_USER=n8n-test
    - DB_POSTGRESDB_PASSWORD=DDDDDDDDDDDDDDDD
    - EXECUTIONS_MODE=queue
    - QUEUE_BULL_PREFIX=n8n-test
    - QUEUE_BULL_REDIS_CLUSTER_NODES=1.2.3.4:6379,1.2.3.5:6379,1.2.3.6:6379
    - QUEUE_HEALTH_CHECK_ACTIVE=true
    - N8N_ENCRYPTION_KEY=CCCCCCCCCCCCCCCCCCCCC
    # License
#    - N8N_LICENSE_TENANT_ID=1001
#    - N8N_LICENSE_ACTIVATION_KEY=${LICENSE_ACTIVATION_KEY}
    # Binary Data
    - N8N_AVAILABLE_BINARY_DATA_MODES=s3
    - N8N_DEFAULT_BINARY_DATA_MODE=s3
    # S3
    - N8N_EXTERNAL_STORAGE_S3_HOST=storage.googleapis.com
    - N8N_EXTERNAL_STORAGE_S3_BUCKET_NAME=n8n-test-bucket
    - N8N_EXTERNAL_STORAGE_S3_BUCKET_REGION=europe-west3
    - N8N_EXTERNAL_STORAGE_S3_ACCESS_KEY=AAAAAAAAAAAAAAAAAAAAAAA
    - N8N_EXTERNAL_STORAGE_S3_ACCESS_SECRET=BBBBBBBBBBBBBBBBBBBBBBB
    # security
    - N8N_BLOCK_ENV_ACCESS_IN_NODE=true
  volumes:
    - n8n_test_storage:/home/node/.n8n
    - n8n_test_data:/home/node/data:ro

services:
  n8n-worker:
    <<: *shared
    command: worker

#  n8n-worker-2:
#    <<: *shared
#    command: worker

Debug info

core

  • n8nVersion: 1.81.4
  • platform: docker (self-hosted)
  • nodeJsVersion: 20.18.3
  • database: postgres
  • executionMode: scaling
  • concurrency: -1
  • license: enterprise (production)
  • consumerId: 03176d77-824f-4ef2-83db-39f7bc022c2f

storage

  • success: all
  • error: all
  • progress: false
  • manual: true
  • binaryMode: s3

pruning

  • enabled: true
  • maxAge: 336 hours
  • maxCount: 10000 executions

client

  • userAgent: mozilla/5.0 (windows nt 10.0; win64; x64) applewebkit/537.36 (khtml, like gecko) chrome/133.0.0.0 safari/537.36
  • isTouchDevice: false

Generated at: 2025-03-05T11:49:03.933Z

To Reproduce

  1. Set up a Redis cluster and a PostgreSQL cluster.
  2. Set up 1+x main instances and 1 worker.
  3. Run an execution in queue mode (e.g. via a webhook) that is forced to fail.
  4. Observe the error log.

Expected behavior

I think this error should not be shown. Also, "Queue errored" should only be shown once.

Operating System

docker (self-hosted)

n8n Version

1.81.4

Node.js Version

20.18.3

Database

PostgreSQL

Execution mode

queue

Member

Joffcom commented Mar 5, 2025

Hey @Fank,

We have created an internal ticket to look into this which we will be tracking as "GHC-1061"

@Joffcom Joffcom added the in linear Issue or PR has been created in Linear for internal review label Mar 5, 2025
Member

Joffcom commented Mar 5, 2025

Hey @Fank,

What do you mean by "task seperated by main and an execution"?

@Joffcom Joffcom added the Needs Feedback Waiting for further input or clarification. label Mar 5, 2025
Author

Fank commented Mar 5, 2025

I mean a dedicated worker that runs the executions, while main only queues them.
Sorry for the confusion.

@Joffcom Joffcom removed the Needs Feedback Waiting for further input or clarification. label Mar 5, 2025
Member

Joffcom commented Mar 5, 2025

Hey @Fank,

I am not able to reproduce this in my environment. What error are you seeing in the UI? Looking at the log, the "Queue errored" message only appears twice, and it looks like it might be an issue connecting to Redis rather than something caused by a workflow failing.

@Joffcom Joffcom added the Needs Feedback Waiting for further input or clarification. label Mar 5, 2025
Author

Fank commented Mar 5, 2025

So far I have not been able to debug this in the UI: because of the size of the items, I hit out-of-memory errors in the browser nearly all the time.
I am also hitting issues where the worker gets disconnected because of load.

Worker during execution:

Image

worker near the end of execution:

Image

@Joffcom Joffcom removed the Needs Feedback Waiting for further input or clarification. label Mar 5, 2025
Member

Joffcom commented Mar 5, 2025

Hey @Fank,

Looking at the resource usage there, this feels less like a bug and more like a deployment issue. The browser can hit out-of-memory without the worker hitting the same limit, because of the extra data the front end works with. The 100% CPU is likely going to cause issues, though, so you may need to tweak your workflows or increase the available resources.

@Joffcom Joffcom added the Needs Feedback Waiting for further input or clarification. label Mar 5, 2025
Author

Fank commented Mar 5, 2025

Thanks for the tip. I think kapa.ai in Discord gave me a good idea: parsing the CSV in steps instead of all at once.
But this still does not solve the possible timeouts, which may be the cause of these issues.
Even if the worker is under load, it should still respond instead of going stale, or am I wrong?
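
For illustration, the "parse the CSV in steps" idea could look roughly like the sketch below. This is not n8n code. The helper name `processCsvInBatches` and its parameters are hypothetical, and it assumes a header line, comma-separated fields, and no quoted fields containing commas or line breaks. It still holds the raw CSV text in memory; only the parsed rows are handed over batch by batch.

```typescript
// Hypothetical helper (not part of n8n): hands parsed CSV rows to a callback
// in fixed-size batches instead of building one large array of all rows.
// Assumes a header line and simple comma-separated fields without quoting.
function processCsvInBatches(
  csv: string,
  batchSize: number,
  onBatch: (rows: string[][]) => void,
): number {
  if (batchSize < 1) throw new RangeError("batchSize must be at least 1");

  // Split into non-empty lines; accepts both "\n" and "\r\n" line endings.
  const lines = csv.split(/\r?\n/).filter((line) => line.length > 0);

  let batches = 0;
  // Start at index 1 to skip the header line.
  for (let i = 1; i < lines.length; i += batchSize) {
    const rows = lines.slice(i, i + batchSize).map((line) => line.split(","));
    onBatch(rows); // only this batch of parsed rows is alive at a time
    batches += 1;
  }
  return batches;
}
```

With a batch size of 2, a file with three data rows would be delivered as one batch of two rows followed by one batch of one row.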

@Joffcom Joffcom removed the Needs Feedback Waiting for further input or clarification. label Mar 5, 2025
Member

Joffcom commented Mar 5, 2025

Hey @Fank,

If the worker can't connect to Redis or the database, I would expect there to be some issues. It will eventually reconnect, and I suspect the job has already been cleaned up by the time it comes back, which produces the error. The good news is that this message shouldn't impact anything, but it will be interesting to see how things run for you once you have everything sorted. We are always working on improving our scaling process as well, so I suspect this will only get better as we spend more time on it.
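
To make the suspected sequence concrete, here is a toy model of a job lock with a time-to-live. It is a sketch under assumptions, not Bull's or n8n's implementation: the names `LockStore`, `acquire`, `renew`, `finish`, and `simulate` are hypothetical, and the 30 s / 15 s values are only meant to mirror Bull's `lockDuration` and `lockRenewTime` settings. The point it illustrates is that a worker which cannot renew its lock for longer than the lock lifetime (for example because it is CPU-bound or disconnected from Redis) gets a "Missing lock" error when it finally reports the result.

```typescript
// Toy model of a TTL-based job lock; NOT Bull's actual implementation.
// All class and function names here are hypothetical.
class LockStore {
  private locks = new Map<number, { token: string; expiresAt: number }>();

  // Take the lock for a job until now + ttlMs.
  acquire(jobId: number, token: string, now: number, ttlMs: number): void {
    this.locks.set(jobId, { token, expiresAt: now + ttlMs });
  }

  // Extend the lock; fails if it expired or is held under another token.
  renew(jobId: number, token: string, now: number, ttlMs: number): boolean {
    const lock = this.locks.get(jobId);
    if (!lock || lock.token !== token || lock.expiresAt <= now) return false;
    lock.expiresAt = now + ttlMs;
    return true;
  }

  // Report the job result; this requires a lock that is still valid.
  finish(jobId: number, token: string, now: number, state: "completed" | "failed"): string {
    const lock = this.locks.get(jobId);
    const valid = lock !== undefined && lock.token === token && lock.expiresAt > now;
    this.locks.delete(jobId);
    if (!valid) throw new Error(`Missing lock for job ${jobId} ${state}`);
    return `job ${jobId} ${state}`;
  }
}

// One healthy renewal, then a stall of stallMs with no renewals
// (e.g. a blocked event loop or a lost Redis connection), then the
// worker tries to move the job to "failed".
function simulate(stallMs: number, ttlMs = 30000, renewMs = 15000): string {
  const store = new LockStore();
  const token = "worker-1";
  let now = 0;
  store.acquire(155, token, now, ttlMs);

  now += renewMs;
  store.renew(155, token, now, ttlMs); // lock now valid until now + ttlMs

  now += stallMs; // no renewals happen during the stall
  try {
    return store.finish(155, token, now, "failed");
  } catch (err) {
    return (err as Error).message;
  }
}
```

In this model a 10-second stall still lets the worker report the failed job normally, while a 60-second stall outlives the lock and yields the same message as in the log above.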

@Joffcom Joffcom added the Needs Feedback Waiting for further input or clarification. label Mar 5, 2025