ai-quota plugin fails to load #1830

Open
1 task done
Colstuwjx opened this issue Feb 27, 2025 · 27 comments

@Colstuwjx
Contributor

If you are reporting any crash or any potential security issue, do not
open an issue in this repo. Please report the issue via ASRC(Alibaba Security Response Center) where the issue will be triaged appropriately.

  • I have searched the issues of this repository and believe that this is not a duplicate.

Ⅰ. Issue Description

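After installing Higress locally with kind, I wanted to use ai-quota to set quotas for a specific consumer, but it does not take effect. The gateway logs suggest that the ai-quota plugin may have failed to load: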

2025-02-27T04:56:15.538253Z	warning	envoy config external/envoy/source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:128	gRPC config for type.googleapis.com/envoy.config.core.v3.TypedExtensionConfig rejected: Unable to create Wasm HTTP filter higress-system.ai-quota-1.0.0	thread=31
2025-02-27T04:56:15.538254Z	warning	envoy config external/envoy/source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:128	gRPC config for type.googleapis.com/envoy.config.core.v3.TypedExtensionConfig rejected: Unable to create Wasm HTTP filter higress-system.ai-quota-1.0.0	thread=31
2025-02-27T04:56:15.538256Z	warning	envoy config external/envoy/source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:128	gRPC config for type.googleapis.com/envoy.config.core.v3.TypedExtensionConfig rejected: Unable to create Wasm HTTP filter higress-system.ai-quota-1.0.0	thread=31
2025-02-27T04:56:15.538257Z	warning	envoy config external/envoy/source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:128	gRPC config for type.googleapis.com/envoy.config.core.v3.TypedExtensionConfig rejected: Unable to create Wasm HTTP filter higress-system.ai-quota-1.0.0	thread=31
2025-02-27T04:56:15.751393Z	info	Readiness succeeded in 3.823923794s
2025-02-27T04:56:15.751614Z	info	Envoy proxy is ready
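
A minimal sketch for pulling the rejected filter names out of gateway logs like the warnings above. It assumes the log text is available as a string; the function name `rejected_wasm_filters` is made up for illustration, and the pattern only covers the message format shown in this issue.

```python
import re

# Matches the Envoy warning format shown in this issue; other lines are ignored.
_REJECTED = re.compile(r"rejected: Unable to create Wasm HTTP filter (\S+)")


def rejected_wasm_filters(log_text: str) -> list[str]:
    """Return the distinct filter names Envoy refused to create, in first-seen order."""
    seen: dict[str, None] = {}
    for line in log_text.splitlines():
        match = _REJECTED.search(line)
        if match:
            seen.setdefault(match.group(1), None)
    return list(seen)


# Two lines copied from the log excerpt in this issue.
sample = (
    "2025-02-27T04:56:15.538253Z\twarning\tenvoy config "
    "external/envoy/source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:128\t"
    "gRPC config for type.googleapis.com/envoy.config.core.v3.TypedExtensionConfig rejected: "
    "Unable to create Wasm HTTP filter higress-system.ai-quota-1.0.0\tthread=31\n"
    "2025-02-27T04:56:15.751614Z\tinfo\tEnvoy proxy is ready\n"
)
print(rejected_wasm_filters(sample))
```

Deduplicating matters here because the same rejection is logged several times within a few microseconds.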

Ⅱ. Describe what happened

On the console page I configured the route to require request authentication and enabled the ai-quota plugin:

Image Image

But the request returns 404:

➜  ~ curl http://gateway.local/gpt-4o/v1/chat/completions/quota\?consumer\=test-consumer-01 -H 'Authorization: Bearer d906bd60-424f-471e-9adb-0e835ef967ff'
{"error":{"code":"404","message": "Resource not found"}}%
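
The backslashes before `?` and `=` in the command above are shell escaping, so the gateway should still receive an ordinary query string. A small sketch that checks this, assuming Python's POSIX-mode `shlex.split` is a close enough stand-in for how the shell unescapes the argument:

```python
import shlex
from urllib.parse import parse_qs, urlsplit

# The curl command from this issue, without the Authorization header.
command = r"curl http://gateway.local/gpt-4o/v1/chat/completions/quota\?consumer\=test-consumer-01"

# POSIX-style splitting drops the backslashes, similar to what the shell does.
url = shlex.split(command)[1]
parts = urlsplit(url)

print(parts.path)             # the path the gateway routes on
print(parse_qs(parts.query))  # the query parameters sent with the request
```

The resulting path is the same `/gpt-4o/v1/chat/completions/quota` that appears in the ai-proxy log line for this request.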

Besides the ai-quota load failure mentioned above, the logs show that this quota query actually ended up in the ai-proxy plugin:

2025-02-27T04:57:14.155290Z	warning	envoy wasm external/envoy/source/extensions/common/wasm/context.cc:1398	wasm log higress-system.ai-proxy.internal: [ai-proxy] [33c351e5-8f29-461f-9829-bb24abfd754a] [onHttpRequestHeader] unsupported path: /gpt-4o/v1/chat/completions/quota, will not process http path and body	thread=46

So my understanding is that this error is caused by ai-quota failing to load?

Ⅲ. Describe what you expected to happen

I expect to be able to see the quota and then carry on with quota validation.

Ⅳ. How to reproduce it (as minimally and precisely as possible)

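  1. Install a Higress cluster locally with kind
  2. Configure and install Redis locally, making sure redis.dns is set up
  3. In the console, configure a route, enable request authentication, then go to the policy page and enable the quota plugin configuration
  4. Try to query the quota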

Ⅴ. Anything else we need to know?

Ⅵ. Environment:

  • Higress version: 2.0.7
  • OS : macOS, kind
  • Others:
@johnlanni
Collaborator

When a plugin fails to load there will be an error log. Could you run curl localhost:15000/config_dump and search for ai-quota to see whether any related configuration is present?

@Colstuwjx
Contributor Author

It looks like it is there, but why does the query return 404?

curl localhost:15000/config_dump |grep -C 30 -i quota
...
             {
              "name": "higress-system.ai-quota-1.0.0",
              "config_discovery": {
               "config_source": {
                "ads": {},
                "initial_fetch_timeout": "0s",
                "resource_api_version": "V3"
               },
               "default_config": {
                "@type": "type.googleapis.com/envoy.extensions.filters.http.composite.v3.Composite"
               },
               "type_urls": [
                "type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm",
                "type.googleapis.com/envoy.extensions.filters.http.rbac.v3.RBAC",
                "type.googleapis.com/envoy.extensions.filters.http.composite.v3.Composite"
               ]
              }
             },
...
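
A hedged sketch of the same search done on the parsed JSON instead of with grep, which also reports where in the dump each match sits. The helper name `find_named_entries` and the trimmed-down sample document are illustrative assumptions; a real config_dump is much larger. Note that the entry shown in this thread is a `config_discovery` reference, so finding it does not by itself show that the Wasm configuration was accepted.

```python
import json


def find_named_entries(node, needle, path=""):
    """Recursively yield (path, name) for every dict whose "name" contains needle."""
    if isinstance(node, dict):
        name = node.get("name")
        if isinstance(name, str) and needle in name:
            yield path or "/", name
        for key, value in node.items():
            yield from find_named_entries(value, needle, f"{path}/{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_named_entries(value, needle, f"{path}/{index}")


# Trimmed-down stand-in for a config_dump document (structure is an assumption).
dump = json.loads("""
{"http_filters": [
  {"name": "higress-system.ai-quota-1.0.0",
   "config_discovery": {"type_urls": [
     "type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm"]}},
  {"name": "envoy.filters.http.router"}
]}
""")

for location, name in find_named_entries(dump, "ai-quota"):
    print(location, name)
```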

@johnlanni
Collaborator

OK, I'll reproduce it locally.

@johnlanni
Collaborator

@Colstuwjx
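Contributor Author

The ? here shouldn't need escaping, right?

I still get 404 locally. As I understand it, the main problem is that the request never reaches the ai-quota plugin and goes straight to ai-proxy, so it will always return 404: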

2025-02-27T05:52:38.100900Z	warning	envoy wasm external/envoy/source/extensions/common/wasm/context.cc:1398	wasm log higress-system.ai-proxy.internal: [ai-proxy] [d7cedf82-8af4-4f41-8bec-79204d205cb2] [onHttpRequestHeader] unsupported path: /gpt-4o/v1/chat/completions/quota, will not process http path and body	thread=45

@johnlanni
Collaborator

cc @2456868764 could you help take a look as well?

@2456868764
Collaborator

Most likely Redis could not be reached, so initialization failed.

@2456868764
Collaborator

Check whether the redis.dns cluster has been pushed to Envoy.

@Colstuwjx
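Contributor Author

> Most likely Redis could not be reached, so initialization failed.

Understood. I followed the guide here to set up a Redis cluster and then configured redis.dns through the MCP bridge. My question is: if Redis were the problem, shouldn't the quota plugin report something like a Redis connection error? So far I haven't seen any such log.

> Check whether the redis.dns cluster has been pushed to Envoy.

I'm not very familiar with this part. How should I go about troubleshooting it?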

@2456868764
Collaborator

2456868764 commented Feb 27, 2025

If Redis cannot be reached, plugin initialization fails and the error "Unable to create Wasm HTTP filter" is reported.

@Colstuwjx
Contributor Author

I do see this error: Unable to create Wasm HTTP filter higress-system.ai-quota-1.0.0 thread=31. I described how I set up the local Redis in this issue. How can I further troubleshoot connectivity between redis.dns and the plugin?

@2456868764
Collaborator

Inside the gateway container, run curl http://localhost:15000/clusters and check whether there is a cluster whose name contains redis.dns.
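
A minimal sketch of that check on the plain-text /clusters output, which lists each cluster name followed by `::` and a stat. The function name `cluster_names` and the sample lines (including the `outbound|6379||redis.dns` name and the address) are illustrative assumptions, not output captured from this environment.

```python
def cluster_names(clusters_text: str, needle: str) -> list[str]:
    """Return the sorted, distinct cluster names that contain needle.

    Assumes the plain-text /clusters format, where every line starts with
    the cluster name followed by "::".
    """
    names = set()
    for line in clusters_text.splitlines():
        name, separator, _rest = line.partition("::")
        if separator and needle in name:
            names.add(name)
    return sorted(names)


# Made-up sample in the shape of Envoy's /clusters output.
sample = (
    "outbound|6379||redis.dns::default_priority::max_connections::1024\n"
    "outbound|6379||redis.dns::10.0.0.5:6379::cx_active::0\n"
    "outbound|80||example.static::default_priority::max_connections::1024\n"
)
print(cluster_names(sample, "redis"))
```

An empty list would suggest that no matching cluster has been pushed to Envoy, which is the case the plugin cannot recover from.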

@Colstuwjx
Contributor Author

> run curl http://localhost:15000/clusters and check whether there is a cluster whose name contains redis.dns

Yes, there is:
Image

@2456868764
Collaborator

Has the redis.default.svc.cluster.local cluster been pushed? With the configuration from #1826 (comment), is redis.default.svc.cluster.local pushed? Try setting the plugin's service_name to "redis.default.svc.cluster.local" and test again.

@2456868764 2456868764 self-assigned this Feb 27, 2025
@2456868764
Collaborator

2456868764 commented Feb 27, 2025

@johnlanni @CH3CHO Please take a look: pointing redis.dns at redis.default.svc.cluster.local and connecting directly via DNS seems to be problematic. I'll reproduce it locally later.

@johnlanni
Collaborator

@2456868764 It is probably not a Redis problem. If it were, there would be an error log.

@Colstuwjx
Contributor Author

Colstuwjx commented Feb 27, 2025

I tried restarting the gateway component and found that model-mapper also fails to load?

2025-02-27T08:58:55.222950Z	warning	envoy config external/envoy/source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:128	gRPC config for type.googleapis.com/envoy.config.core.v3.TypedExtensionConfig rejected: Unable to create Wasm HTTP filter higress-system.model-mapper.internal	thread=34
2025-02-27T08:58:55.222951Z	warning	envoy config external/envoy/source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:128	gRPC config for type.googleapis.com/envoy.config.core.v3.TypedExtensionConfig rejected: Unable to create Wasm HTTP filter higress-system.model-mapper.internal	thread=34
[05:59:18:081 - FFFF9CE1A040]: WASM module load failed: failed to get llvm target from triple Unable to find target for this triple (no targets are registered

Can you reproduce this problem locally?

@johnlanni
Collaborator

Is this an ARM machine?

@Colstuwjx
Contributor Author

My understanding is that it is still a Redis problem? But I switched to an EC2 Linux machine and connected to an ElastiCache Redis, and it still fails:

2025-02-28T02:24:04.425034Z     critical        envoy wasm external/envoy/source/extensions/common/wasm/context.cc:1404 wasm log: failed to init redis: error status returned by host: bad argument     thread=47
2025-02-28T02:24:04.425039Z     warning envoy wasm external/envoy/source/extensions/common/wasm/context.cc:1398 wasm log: [ai-quota] parse rule config failed: error status returned by host: bad argument      thread=47

I'll see whether I can debug this plugin locally.

@johnlanni
Collaborator

Right, this means the Redis service cannot be discovered. Could you run curl localhost:15000/clusters |grep redis and share the output?

@Colstuwjx
Contributor Author

Result:

istio-proxy@higress-gateway-fb6fc585b-r29bp:/$ curl localhost:15000/clusters |grep redis 
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 20554    0 20554    0     0  20.3M      0 --:--:-- --:--:-- --:--:-- 19.6M

But I don't think that should be needed, since I have now switched to a cloud Redis:

admin_consumer: "gpt-4o-consumer-01"
admin_path: "/quota"
redis:
  service_name: "xxx.cache.amazonaws.com"
  service_port: 6379
  timeout: 2000
redis_key_prefix: "chat_quota:"

@2456868764
Collaborator

2456868764 commented Feb 28, 2025

You need to configure it through an McpBridge:

apiVersion: networking.higress.io/v1
kind: McpBridge
metadata:
  name: default
  namespace: higress-system
spec:
  registries:
  - domain: xxx.cache.amazonaws.com
    name: redis
    type: dns
    port: 6379

admin_consumer: "gpt-4o-consumer-01"
admin_path: "/quota"
redis:
  service_name: redis.dns
  service_port: 6379
  username:  xxx
  password:  xxx
  timeout: 2000
redis_key_prefix: "chat_quota:"

Also, does this Redis require a username and password? If so, you need to configure username and password.
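
A small sketch of the relationship between the two configurations above. It assumes that the plugin's `service_name` is formed as `<registry name>.<registry type>`, which is inferred only from the `redis` / `dns` to `redis.dns` pairing in this thread; the helper names are hypothetical.

```python
def registry_service_name(registry: dict) -> str:
    """Build the service name a plugin would reference for an McpBridge registry.

    Assumption: the name is "<registry name>.<registry type>", as in the
    redis / dns -> "redis.dns" pairing shown in this thread.
    """
    return f"{registry['name']}.{registry['type']}"


def check_plugin_redis(plugin_config: dict, registries: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means the names line up."""
    wanted = plugin_config.get("redis", {}).get("service_name")
    known = sorted(registry_service_name(registry) for registry in registries)
    if wanted in known:
        return []
    return [f"service_name {wanted!r} matches no McpBridge registry; known: {known}"]


registries = [
    {"domain": "xxx.cache.amazonaws.com", "name": "redis", "type": "dns", "port": 6379},
]
# Plugin config that goes through the registry, as recommended above.
via_bridge = {"redis": {"service_name": "redis.dns", "service_port": 6379, "timeout": 2000}}
# Plugin config that points at the raw domain, as in the earlier failing attempt.
raw_domain = {"redis": {"service_name": "xxx.cache.amazonaws.com", "service_port": 6379}}

print(check_plugin_redis(via_bridge, registries))
print(check_plugin_redis(raw_domain, registries))
```

The first call reports no problems, while the second flags the raw domain because it does not correspond to any registry-derived name.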

@Colstuwjx
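Contributor Author

With the cloud Redis plus the MCP configuration it works now... I have two questions, if you don't mind answering them:

1. Why do we need to go through a layer of MCP bridge + ingress instead of configuring the Redis domain directly? Operators who don't know this background can easily miss it;
2. Why doesn't it work to start a Redis in the default namespace of a local kind cluster and then configure MCP + ingress?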

@2456868764
Collaborator

2456868764 commented Feb 28, 2025

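  1. Envoy plugins are not allowed to access external services directly; they have to go through an Envoy cluster. Higress uses the MCP bridge to turn external services, including DNS, static IPs, and service-discovery sources such as Nacos, into Envoy clusters.

  2. If it is a Kubernetes Service:

    • By default, to reduce load on the data plane, Higress sets the global.onlyPushRouteCluster parameter to true, which means Kubernetes Services are not discovered automatically. To use a Kubernetes Service for service discovery, set global.onlyPushRouteCluster to false,
    • or bind the Kubernetes Service to an ingress route, which causes that Service to be pushed.
  3. Could it be a Redis username and password problem? Can you connect to Redis in a local test?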

@Colstuwjx
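Contributor Author

> Could it be a Redis username and password problem?

Neither the local Redis nor the cloud Redis has a username or password.

> Can you connect to Redis in a local test?

I configured Redis locally following the documentation and it does not take effect, which is what puzzles me.

Also, I now have the ai-quota plugin working end to end with cloud Redis + McpBridge, but ai-token-limit still does not work when I try it.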

@johnlanni
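Collaborator

For the local setup, could it be a Mac issue? We just tracked down a problem where plugins fail to start on ARM. It has been fixed, and a restart will pull the fixed plugin.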

@Colstuwjx
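Contributor Author

> We just tracked down a problem where plugins fail to start on ARM

Is there a write-up of the debugging process? For going to production, our main worry is that a plugin breaks and the cause is hard to track down. Right now all I can see is a log such as rejected: Unable to create Wasm HTTP filter, and it is still unclear what actually caused it.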
