[BUG] new configuration ops failed #8998

Open
loomts opened this issue Mar 4, 2025 · 2 comments

@loomts
Contributor

loomts commented Mar 4, 2025

Describe the bug

  1. Use the main branch for testing and install the CRDs.
  2. Install clickhouse and clickhouse-cluster using the new configuration.

See error

{"level":"info","ts":"2025-03-04T11:53:50+08:00","logger":"ComponentParameterReconciler","msg":"failed to run configuration reconcile task.","controller":"componentparameter","controllerGroup":"parameters.kubeblocks.io","controllerKind":"ComponentParameter","ComponentParameter":{"name":"ch-cluster-ch-keeper","namespace":"default"},"namespace":"default","name":"ch-cluster-ch-keeper","reconcileID":"f8c05ff1-896a-4fc2-b3a5-1ae091c766b5","Namespace":"default","ComponentParameter":"ch-cluster-ch-keeper"}
{"level":"error","ts":"2025-03-04T11:53:50+08:00","msg":"Reconciler error","controller":"componentparameter","controllerGroup":"parameters.kubeblocks.io","controllerKind":"ComponentParameter","ComponentParameter":{"name":"ch-cluster-ch-keeper","namespace":"default"},"namespace":"default","name":"ch-cluster-ch-keeper","reconcileID":"f8c05ff1-896a-4fc2-b3a5-1ae091c766b5","error":"Object default/ch-cluster-ch-keeper-clickhouse-keeper-tpl is already owned by another Configuration controller ch-cluster-ch-keeper","errorCauses":[{"error":"Object default/ch-cluster-ch-keeper-clickhouse-keeper-tpl is already owned by another Configuration controller ch-cluster-ch-keeper"}],"stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/Users/loomt/gopath/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/Users/loomt/gopath/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/Users/loomt/gopath/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227"}
{"level":"info","ts":"2025-03-04T11:53:54+08:00","logger":"ComponentParameterReconciler","msg":"failed to run configuration reconcile task.","controller":"componentparameter","controllerGroup":"parameters.kubeblocks.io","controllerKind":"ComponentParameter","ComponentParameter":{"name":"ch-cluster-clickhouse","namespace":"default"},"namespace":"default","name":"ch-cluster-clickhouse","reconcileID":"b23185fe-c018-450b-b689-93a2b86e771d","Namespace":"default","ComponentParameter":"ch-cluster-clickhouse"}
{"level":"error","ts":"2025-03-04T11:53:54+08:00","msg":"Reconciler error","controller":"componentparameter","controllerGroup":"parameters.kubeblocks.io","controllerKind":"ComponentParameter","ComponentParameter":{"name":"ch-cluster-clickhouse","namespace":"default"},"namespace":"default","name":"ch-cluster-clickhouse","reconcileID":"b23185fe-c018-450b-b689-93a2b86e771d","error":"Object default/ch-cluster-clickhouse-clickhouse-tpl is already owned by another Configuration controller ch-cluster-clickhouse","errorCauses":[{"error":"Object default/ch-cluster-clickhouse-clickhouse-tpl is already owned by another Configuration controller ch-cluster-clickhouse"}],"stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/Users/loomt/gopath/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/Users/loomt/gopath/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/Users/loomt/gopath/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227"}

In this state, reconfiguration ops also get stuck.
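
The error text appears to have the same shape as the already-owned error that controller-runtime reports when a controller reference is set on an object that already has a different controller owner. Here the existing owner is of kind `Configuration` and has the same name as the ComponentParameter being reconciled. A minimal sketch for pulling those fields out of the JSON log lines follows; the regular expression is inferred from the message text above and the function name `ownership_conflicts` is hypothetical.

```python
import json
import re

# Pattern inferred from the error text in the logs above. It is an assumption
# about the message layout, not a guarantee made by any API.
OWNED_RE = re.compile(
    r"Object (?P<namespace>[^/\s]+)/(?P<object>\S+) is already owned by "
    r"another (?P<owner_kind>\S+) controller (?P<owner_name>\S+)"
)


def ownership_conflicts(log_lines):
    """Return one dict per error-level log line that reports an ownership conflict."""
    conflicts = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict) or entry.get("level") != "error":
            continue
        match = OWNED_RE.search(entry.get("error", ""))
        if match:
            # "name" is the ComponentParameter whose reconcile failed.
            conflicts.append({"reconciled": entry.get("name"), **match.groupdict()})
    return conflicts


# Trimmed copy of the first error line from the logs above.
SAMPLE = json.dumps({
    "level": "error",
    "msg": "Reconciler error",
    "name": "ch-cluster-ch-keeper",
    "error": "Object default/ch-cluster-ch-keeper-clickhouse-keeper-tpl is "
             "already owned by another Configuration controller ch-cluster-ch-keeper",
})

for conflict in ownership_conflicts([SAMPLE]):
    print(conflict)
```

Listing the `metadata.ownerReferences` of each reported object would then show which resource currently holds the controller reference.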

@loomts loomts added the kind/bug Something isn't working label Mar 4, 2025
@shanshanying shanshanying added this to the Release 1.0.0 milestone Mar 4, 2025
@sophon-zt
Contributor

I tried to reproduce the bug but could not. The steps were as follows:

Step 1: create the ch cluster

helm upgrade --install ch2 addons-cluster/clickhouse -n test

Step 2: prepare the ops CR

$ cat chops.yaml 
apiVersion: operations.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  name: ch-reconfiguring
  namespace: test
spec:
  # Specifies the name of the Cluster resource that this operation is targeting.
  clusterName: ch2
  # Instructs the system to bypass pre-checks (including cluster state checks and customized pre-conditions hooks) and immediately execute the opsRequest, except for the opsRequest of 'Start' type, which will still undergo pre-checks even if `force` is true.  Note: Once set, the `force` field is immutable and cannot be updated.
  force: false
  # Lists the Components and their configuration updates. This field replaces the deprecated `reconfigure` field.
  reconfigures:
    # Specifies the name of the Component.
  - componentName: clickhouse
    # Contains a list of ConfigurationItem objects, specifying the Component's configuration template name, upgrade policy, and parameter key-value pairs to be updated.
    parameters:
      # Represents the name of the parameter that is to be updated.
    - key: clickhouse.profiles.web.max_partition_size_to_drop
      # Represents the parameter values that are to be updated.
      # If set to nil, the parameter defined by the Key field will be removed from the configuration file.
      value: '0'
  # Specifies the name of the configuration template.
  # Specifies the maximum number of seconds the OpsRequest will wait for its start conditions to be met before aborting. If set to 0 (default), the start conditions must be met immediately for the OpsRequest to proceed.
  preConditionDeadlineSeconds: 0
  type: Reconfiguring
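
For repeated tests with different parameters, the same manifest can be generated rather than edited by hand. The sketch below builds a dict with the fields used in `chops.yaml` above and prints it as JSON; the helper name `build_reconfigure_ops` is hypothetical, and the field layout is copied from the manifest in this comment rather than from the API reference.

```python
import json


def build_reconfigure_ops(name, namespace, cluster, component, parameters):
    """Build an OpsRequest dict mirroring the chops.yaml manifest above.

    `parameters` maps a parameter key to its new value (as a string).
    """
    return {
        "apiVersion": "operations.kubeblocks.io/v1alpha1",
        "kind": "OpsRequest",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "clusterName": cluster,
            "force": False,
            "reconfigures": [
                {
                    "componentName": component,
                    "parameters": [
                        {"key": key, "value": value}
                        for key, value in parameters.items()
                    ],
                }
            ],
            "preConditionDeadlineSeconds": 0,
            "type": "Reconfiguring",
        },
    }


manifest = build_reconfigure_ops(
    name="ch-reconfiguring",
    namespace="test",
    cluster="ch2",
    component="clickhouse",
    parameters={"clickhouse.profiles.web.max_partition_size_to_drop": "0"},
)
print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON manifests as well as YAML, the printed output can be piped to `kubectl apply -f -`.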

Step 3: check the results

$ k get ops -n test |grep ch2
ch-reconfiguring          Reconfiguring   ch2       Succeed   -/-        22m

# zhangtao @ 192 in ~ [11:15:54] 
$ k get parameters -n test|grep ch2
ch-reconfiguring               ch2       Finished      22m

# zhangtao @ 192 in ~ [11:16:03] 
$ k get componentparameters -n test|grep ch2 
ch2-ch-keeper            ch2         ch-keeper          Finished   24m
ch2-clickhouse           ch2         clickhouse         Finished   24m

$ k get ops -n test ch-reconfiguring -o jsonpath='{.status}' |python3 -m json.tool 
{
    "clusterGeneration": 2,
    "completionTimestamp": "2025-03-05T02:53:54Z",
    "conditions": [
        {
            "lastTransitionTime": "2025-03-05T02:53:51Z",
            "message": "wait for the controller to process the OpsRequest: ch-reconfiguring in Cluster: ch2",
            "reason": "WaitForProgressing",
            "status": "True",
            "type": "WaitForProgressing"
        },
        {
            "lastTransitionTime": "2025-03-05T02:53:52Z",
            "message": "OpsRequest: ch-reconfiguring is validated",
            "reason": "ValidateOpsRequestPassed",
            "status": "True",
            "type": "Validated"
        },
        {
            "lastTransitionTime": "2025-03-05T02:53:52Z",
            "message": "Start to reconfigure in Cluster: ch2, Component: clickhouse",
            "reason": "ReconfigureStarted",
            "status": "True",
            "type": "Reconfigure"
        },
        {
            "lastTransitionTime": "2025-03-05T02:53:54Z",
            "message": "Successfully processed the OpsRequest: ch-reconfiguring in Cluster: ch2",
            "reason": "OpsRequestProcessedSuccessfully",
            "status": "True",
            "type": "Succeed"
        }
    ],
    "phase": "Succeed",
    "progress": "-/-",
    "startTimestamp": "2025-03-05T02:53:52Z"
}
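
When comparing runs, it can help to reduce a status like the one above to a few fields. The following sketch reads only keys that appear in this output (`conditions`, `lastTransitionTime`, `startTimestamp`, `completionTimestamp`, `phase`); the function name `summarize_ops_status` is hypothetical, and the sample is a trimmed copy of the status above.

```python
from datetime import datetime

# Timestamp layout used in the status output above (UTC, second precision).
TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_ts(value):
    """Parse a timestamp such as 2025-03-05T02:53:52Z."""
    return datetime.strptime(value, TS_FORMAT)


def summarize_ops_status(status):
    """Return the phase, the most recent condition type, and the elapsed seconds."""
    conditions = status.get("conditions", [])
    latest = max(
        conditions,
        key=lambda cond: parse_ts(cond["lastTransitionTime"]),
        default=None,
    )
    elapsed = None
    if "startTimestamp" in status and "completionTimestamp" in status:
        delta = parse_ts(status["completionTimestamp"]) - parse_ts(status["startTimestamp"])
        elapsed = delta.total_seconds()
    return {
        "phase": status.get("phase"),
        "latest_condition": latest["type"] if latest else None,
        "elapsed_seconds": elapsed,
    }


# Trimmed copy of the status shown above.
SAMPLE_STATUS = {
    "completionTimestamp": "2025-03-05T02:53:54Z",
    "conditions": [
        {"lastTransitionTime": "2025-03-05T02:53:51Z", "type": "WaitForProgressing"},
        {"lastTransitionTime": "2025-03-05T02:53:52Z", "type": "Validated"},
        {"lastTransitionTime": "2025-03-05T02:53:52Z", "type": "Reconfigure"},
        {"lastTransitionTime": "2025-03-05T02:53:54Z", "type": "Succeed"},
    ],
    "phase": "Succeed",
    "startTimestamp": "2025-03-05T02:53:52Z",
}

print(summarize_ops_status(SAMPLE_STATUS))
```

To run it against a live cluster, the output of the `k get ops ... -o jsonpath='{.status}'` command above can be read with `json.load(sys.stdin)` and passed to the function.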


$ k get parameters -n test ch-reconfiguring -o jsonpath='{.status}' |python3 -m json.tool
{
    "componentReconfiguringStatus": [
        {
            "componentName": "clickhouse",
            "parameterStatus": [
                {
                    "lastDoneRevision": "2",
                    "name": "clickhouse-user-tpl",
                    "phase": "Finished",
                    "reconcileDetail": {
                        "currentRevision": "3",
                        "execResult": "None",
                        "expectedCount": 2,
                        "policy": "restart",
                        "succeedCount": 2
                    },
                    "updateRevision": "2",
                    "updatedParameters": {
                        "user.xml": {
                            "parameters": {
                                "clickhouse.profiles.web.max_partition_size_to_drop": "0"
                            }
                        }
                    }
                }
            ],
            "phase": "Finished"
        }
    ],
    "observedGeneration": 1,
    "phase": "Finished"
}

@sophon-zt
Contributor

sophon-zt commented Mar 5, 2025

The test found two problems:

  1. The controller did not print an error log, so the cause of this error is unknown.
  2. The status of ops/parameters is occasionally inconsistent with the status of componentparameters.
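
The second problem can be checked mechanically once the phases have been collected. The sketch below assumes that `Succeed` (ops) and `Finished` (parameters, componentparameters) are the terminal phases, as seen in the outputs earlier in this thread; the function name `phase_mismatches` and the `DONE` table are hypothetical.

```python
# Assumed terminal phase per resource kind, taken from the command outputs
# shown earlier in this thread. Other phases are treated as "not done".
DONE = {
    "ops": "Succeed",
    "parameters": "Finished",
    "componentparameters": "Finished",
}


def phase_mismatches(ops_phase, parameters_phase, component_phases):
    """Return the resources that lag behind when at least one resource is done.

    `component_phases` maps a componentparameters name to its phase.
    An empty list means all resources agree (all done, or none done).
    """
    observed = [
        ("ops", "ops", ops_phase),
        ("parameters", "parameters", parameters_phase),
    ]
    observed += [
        ("componentparameters", name, phase)
        for name, phase in sorted(component_phases.items())
    ]
    pending = [
        (kind, name, phase)
        for kind, name, phase in observed
        if phase != DONE[kind]
    ]
    any_done = len(pending) < len(observed)
    return pending if any_done else []


# Phases from the successful run shown in the previous comment.
print(phase_mismatches(
    "Succeed",
    "Finished",
    {"ch2-ch-keeper": "Finished", "ch2-clickhouse": "Finished"},
))
```

For the run in the previous comment this prints an empty list. A non-empty result lists each lagging resource with its current phase.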
