Describe the bug
This is after looking through all of the artifacts from this CI failure.
Scenario: a new isbsvc test-isbservice-rollout-1 was created with a new pipeline test-pipeline-rollout-2.
The sink vertex pod named "out" continually restarts. We have the log from the first failure but unfortunately not from the later ones; presumably they are the same:
{"level":"info","ts":"2025-03-04T15:35:05.204127832Z","logger":"numaflow.Sink-processor","caller":"commands/processor.go:48","msg":"Starting vertex data processor","version":"Version: v1.4.3-rc3+3b92155, BuildDate: 2025-02-24T00:39:14Z, GitCommit: 3b921554682af1b663f451aefe8ceb106bffebc8, GitTag: , GitTreeState: clean, GoVersion: go1.23.4, Compiler: gc, Platform: linux/amd64"}
Error: failed to get consumer info, nats: consumer not found
Usage:
numaflow processor [flags]
Flags:
-h, --help help for processor
--isbsvc-type string ISB Service type, e.g. jetstream
--type string Processor type, 'source', 'sink' or 'udf'
panic: failed to get consumer info, nats: consumer not found
goroutine 1 [running]:
github.com/numaproj/numaflow/cmd/commands.Execute(...)
/Users/jwang21/workspace/numaproj/numaflow/cmd/commands/root.go:33
main.main()
/Users/jwang21/workspace/numaproj/numaflow/cmd/main.go:24 +0x3c
Pod restarted 6 times and failed every time.
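For reference, the panic above is the vertex failing a JetStream consumer lookup. Here is a minimal, hypothetical sketch (not Numaflow's code) of the same kind of check using the nats.go client; the connection URL, the absence of auth, and reusing the stream name as the consumer name are assumptions based on the logs above:

```go
// consumer_check.go - hypothetical standalone check, not part of Numaflow.
// Assumes the ISB service is reachable at localhost:4222 (e.g. via a
// port-forward) with no auth; the real ISB service also requires credentials.
package main

import (
	"errors"
	"fmt"
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect("nats://localhost:4222")
	if err != nil {
		log.Fatalf("connect: %v", err)
	}
	defer nc.Close()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatalf("jetstream: %v", err)
	}

	// Per the job log, the consumer appears to share its stream's name.
	name := "numaplane-system-test-pipeline-rollout-2-out-0"

	info, err := js.ConsumerInfo(name, name)
	if errors.Is(err, nats.ErrConsumerNotFound) {
		// This is the condition the sink vertex panicked on.
		fmt.Println("consumer not found:", name)
		return
	}
	if err != nil {
		log.Fatalf("consumer info: %v", err)
	}
	fmt.Printf("consumer exists, pending=%d\n", info.NumPending)
}
```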
Prior to this, the Job Pod failed and then succeeded. Unfortunately, we don't have the log from the successful run, only the failed one:
{"level":"info","ts":"2025-03-04T15:34:32.556800234Z","logger":"numaflow.isbsvc-create","caller":"isbsvc/jetstream_service.go:89","msg":"Succeeded to create a side inputs KV","pipeline":"test-pipeline-rollout-2","kvName":"numaplane-system-test-pipeline-rollout-2_SIDE_INPUTS"}
{"level":"info","ts":"2025-03-04T15:34:33.061041705Z","logger":"numaflow.isbsvc-create","caller":"isbsvc/jetstream_service.go:161","msg":"Succeeded to create a stream","pipeline":"test-pipeline-rollout-2","stream":"numaplane-system-test-pipeline-rollout-2-cat-0"}
{"level":"info","ts":"2025-03-04T15:34:33.516849285Z","logger":"numaflow.isbsvc-create","caller":"isbsvc/jetstream_service.go:172","msg":"Succeeded to create a consumer for a stream","pipeline":"test-pipeline-rollout-2","stream":"numaplane-system-test-pipeline-rollout-2-cat-0","consumer":"numaplane-system-test-pipeline-rollout-2-cat-0"}
{"level":"error","ts":"2025-03-04T15:34:38.526356783Z","logger":"numaflow.isbsvc-create","caller":"commands/isbsvc_create.go:93","msg":"Failed to create buffers, buckets and side inputs store.","pipeline":"test-pipeline-rollout-2","error":"failed to create stream \"numaplane-system-test-pipeline-rollout-2-out-0\" and buffers, context deadline exceeded","stacktrace":"github.com/numaproj/numaflow/cmd/commands.NewISBSvcCreateCommand.func1\n\t/Users/jwang21/workspace/numaproj/numaflow/cmd/commands/isbsvc_create.go:93\ngithub.ghproxy.top/spf13/cobra.(*Command).execute\n\t/Users/jwang21/go/pkg/mod/github.com/spf13/[email protected]/command.go:985\ngithub.ghproxy.top/spf13/cobra.(*Command).ExecuteC\n\t/Users/jwang21/go/pkg/mod/github.com/spf13/[email protected]/command.go:1117\ngithub.ghproxy.top/spf13/cobra.(*Command).Execute\n\t/Users/jwang21/go/pkg/mod/github.com/spf13/[email protected]/command.go:1041\ngithub.ghproxy.top/numaproj/numaflow/cmd/commands.Execute\n\t/Users/jwang21/workspace/numaproj/numaflow/cmd/commands/root.go:32\nmain.main\n\t/Users/jwang21/workspace/numaproj/numaflow/cmd/main.go:24\nruntime.main\n\t/usr/local/Cellar/go/1.23.4/libexec/src/runtime/proc.go:272"}
{"level":"error","ts":"2025-03-04T15:34:38.638514817Z","logger":"numaflow.isbsvc-create","caller":"nats/nats_client.go:69","msg":"Nats default: disconnected","pipeline":"test-pipeline-rollout-2","stacktrace":"github.com/numaproj/numaflow/pkg/shared/clients/nats.NewNATSClient.func3\n\t/Users/jwang21/workspace/numaproj/numaflow/pkg/shared/clients/nats/nats_client.go:69\ngithub.ghproxy.top/nats-io/nats%2ego.(*Conn).close.func1\n\t/Users/jwang21/go/pkg/mod/github.com/nats-io/[email protected]/nats.go:5332\ngithub.ghproxy.top/nats-io/nats%2ego.(*asyncCallbacksHandler).asyncCBDispatcher\n\t/Users/jwang21/go/pkg/mod/github.com/nats-io/[email protected]/nats.go:3011"}
{"level":"info","ts":"2025-03-04T15:34:38.656961721Z","logger":"numaflow.isbsvc-create","caller":"nats/nats_client.go:63","msg":"Nats default: connection closed","pipeline":"test-pipeline-rollout-2"}
Error: failed to create stream "numaplane-system-test-pipeline-rollout-2-out-0" and buffers, context deadline exceeded
Usage:
numaflow isbsvc-create [flags]
Flags:
--buckets strings Buckets to create
--buffers strings Buffers to create
-h, --help help for isbsvc-create
--isbsvc-type string ISB Service type, e.g. jetstream
--serving-source-streams strings Serving source streams to create
--side-inputs-store string Name of the side inputs store
panic: failed to create stream "numaplane-system-test-pipeline-rollout-2-out-0" and buffers, context deadline exceeded
goroutine 1 [running]:
github.com/numaproj/numaflow/cmd/commands.Execute(...)
/Users/jwang21/workspace/numaproj/numaflow/cmd/commands/root.go:33
main.main()
/Users/jwang21/workspace/numaproj/numaflow/cmd/main.go:24 +0x3c
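For context, the job error above is a JetStream stream-creation call timing out. Below is a minimal, hypothetical sketch (again not Numaflow's implementation; the URL, lack of auth, timeout value, and subject naming are assumptions) showing how AddStream under a context deadline surfaces exactly this "context deadline exceeded" error when the JetStream server is slow or not yet ready:

```go
// stream_create.go - hypothetical sketch, not the Numaflow create job.
package main

import (
	"context"
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect("nats://localhost:4222")
	if err != nil {
		log.Fatalf("connect: %v", err)
	}
	defer nc.Close()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatalf("jetstream: %v", err)
	}

	// Deadline chosen arbitrarily for illustration; the real job uses its own timeout.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	_, err = js.AddStream(&nats.StreamConfig{
		Name:     "numaplane-system-test-pipeline-rollout-2-out-0",
		Subjects: []string{"numaplane-system-test-pipeline-rollout-2-out-0"},
	}, nats.Context(ctx))
	if err != nil {
		// With an overloaded or still-starting JetStream server, this is
		// where "context deadline exceeded" comes back, as in the job log.
		log.Fatalf("failed to create stream: %v", err)
	}
	log.Println("stream created")
}
```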
I know this has only happened once and may be hard to reproduce. I'm okay if we don't look into it yet, but I wanted to create a record for it in case it happens again.
I am attaching all artifacts and logs that we have:
pod-logs-progressive-functional (7).zip
resource-changes-progressive-functional (5).zip
The timeline is this:
2025-03-04T15:33:43.781272599Z pipeline created
2025-03-04T15:34:29.771129989Z Create Job starts running
2025-03-04T15:34:49: Create Job Pod restarts after failure and succeeds
2025-03-04T15:35:05.204127832Z test-pipeline-rollout-2 out-0 runs
2025-03-04T15:35:08 test-pipeline-rollout-2 out-0 panics
2025-03-04T15:38:04Z test-pipeline-rollout-2 out-0 has now restarted 5 times
To Reproduce
Steps to reproduce the behavior:
This may not be easily reproducible. This CI test usually passes.
Message from the maintainers:
Impacted by this bug? Give it a 👍. We often sort issues this way to know what to prioritize.