
feat: add ollama embedding to ai-cache #1794

Merged
merged 3 commits into alibaba:main on Feb 21, 2025

Conversation

Beatrueman (Contributor)

Ⅰ. Describe what this PR did

Integrates the AI cache plugin (ai-cache) with Ollama.

Ⅱ. Does this pull request fix one issue?

Fix #1445

Ⅲ. Why don't you add test cases (unit test/integration test)?

Ⅳ. Describe how to verify it

1. Environment

Ollama is deployed locally with llama3.2 and exposed through an ngrok tunnel; calling the /api/embed endpoint directly works as expected.

(Screenshot: direct call to /api/embed succeeding)
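
For reference, the direct check can be reproduced with a request along these lines (a sketch against the Ollama /api/embed API; the ngrok hostname is the one used in this setup, and the exact response fields may vary across Ollama versions):

curl "https://a2b4-112-46-215-209.ngrok-free.app/api/embed" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "input": "你好"
  }'
# the response is expected to contain an "embeddings" array for the given input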

2. Build the plugins

ai-proxy

cd ./higress/plugins/wasm-go/extensions/ai-proxy

tinygo build -o ai.wasm -scheduler=none -target=wasi -gc=custom -tags="custommalloc nottinygc_finalizer proxy_wasm_version_0_2_100" ./

ai-cache

cd ./higress/plugins/wasm-go/extensions/ai-cache

tinygo build -o main.wasm -scheduler=none -target=wasi -gc=custom -tags="custommalloc nottinygc_finalizer proxy_wasm_version_0_2_100" ./
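
For convenience, both builds can be wrapped in a small script using the exact flags above (a sketch; the checkout layout and the copy destination next to docker-compose.yaml are assumptions):

#!/usr/bin/env bash
set -euo pipefail

# build the ai-proxy plugin
(cd ./higress/plugins/wasm-go/extensions/ai-proxy && \
  tinygo build -o ai.wasm -scheduler=none -target=wasi -gc=custom \
    -tags="custommalloc nottinygc_finalizer proxy_wasm_version_0_2_100" ./)

# build the ai-cache plugin
(cd ./higress/plugins/wasm-go/extensions/ai-cache && \
  tinygo build -o main.wasm -scheduler=none -target=wasi -gc=custom \
    -tags="custommalloc nottinygc_finalizer proxy_wasm_version_0_2_100" ./)

# copy the artifacts next to docker-compose.yaml (destination "." is an assumption)
cp ./higress/plugins/wasm-go/extensions/ai-proxy/ai.wasm \
   ./higress/plugins/wasm-go/extensions/ai-cache/main.wasm .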

3. Deploy Higress with Docker

docker-compose.yaml

services:
  envoy:
    image: higress-registry.cn-hangzhou.cr.aliyuncs.com/higress/gateway:v2.0.2
    entrypoint: /usr/local/bin/envoy
    command: -c /etc/envoy/envoy.yaml --component-log-level wasm:debug
    networks:
      - wasmtest
    ports:
      - "10002:10002"
    volumes:
      - ./envoy.yaml:/etc/envoy/envoy.yaml
      - ./main.wasm:/etc/envoy/main.wasm
      - ./ai.wasm:/etc/envoy/ai.wasm

networks:
  wasmtest: {}
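
With both .wasm files and the envoy.yaml shown below placed next to docker-compose.yaml, the gateway can be brought up roughly as follows (a sketch assuming Docker Compose v2):

docker compose up -d
# wasm logs are emitted at debug level (see --component-log-level wasm:debug above)
docker compose logs -f envoy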

envoy.yaml

admin:
  address:
    socket_address:
      protocol: TCP
      address: 0.0.0.0
      port_value: 9901
static_resources:
  listeners:
    - name: listener_0
      address:
        socket_address:
          protocol: TCP
          address: 0.0.0.0
          port_value: 10002
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                scheme_header_transformation:
                  scheme_to_overwrite: https
                stat_prefix: ingress_http
                # Output envoy logs to stdout
                access_log:
                  - name: envoy.access_loggers.stdout
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
                # Modify as required
                route_config:
                  name: local_route
                  virtual_hosts:
                    - name: local_service
                      domains: [ "*" ]
                      routes:
                        - match:
                            prefix: "/"
                          route:
                            cluster: ollama
                            timeout: 300s
                http_filters:
                  - name: wasmtest
                    typed_config:
                      "@type": type.googleapis.com/udpa.type.v1.TypedStruct
                      type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
                      value:
                        config:
                          name: wasmtest
                          vm_config:
                            runtime: envoy.wasm.runtime.v8
                            code:
                              local:
                                filename: /etc/envoy/ai.wasm
                          configuration:
                            "@type": "type.googleapis.com/google.protobuf.StringValue"
                            value: |
                              {
                                "provider": {
                                  "type": "ollama",
                                  "apiTokens": [
                                    "sk-"
                                  ],
                                  "ollamaServerHost": "a2b4-112-46-215-209.ngrok-free.app",
                                  "ollamaServerPort": 443,
                                  "ollamaModel": ""
                                }
                              }

                  - name: cache
                    typed_config:
                      "@type": type.googleapis.com/udpa.type.v1.TypedStruct
                      type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
                      value:
                        config:
                          name: cache
                          vm_config:
                            runtime: envoy.wasm.runtime.v8
                            code:
                              local:
                                filename: /etc/envoy/main.wasm
                          configuration:
                            "@type": "type.googleapis.com/google.protobuf.StringValue"
                            value: |
                              {
                                "embedding": {
                                  "type": "ollama",
                                  "serviceName": "ollama.dns",
                                  "servicePort": 443,
                                  "serviceHost": "a2b4-112-46-215-209.ngrok-free.app"
                                },
                                "vector": {
                                  "type": "dashvector",
                                  "serviceName": "dashvector.dns",
                                  "collectionID": "test1",
                                  "serviceHost": "your host",
                                  "apiKey": "your key",
                                  "threshold": 0.4
                                },
                                "cache": {
                                  "serviceName": "",
                                  "type": ""
                                }
                              }
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
    - name: ollama
      connect_timeout: 30s
      type: LOGICAL_DNS
      dns_lookup_family: V4_ONLY
      lb_policy: ROUND_ROBIN
      load_assignment:
        cluster_name: ollama
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: a2b4-112-46-215-209.ngrok-free.app
                      port_value: 443
      transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
          "sni": "a2b4-112-46-215-209.ngrok-free.app"

    - name: outbound|443||ollama.dns
      connect_timeout: 30s
      type: LOGICAL_DNS
      dns_lookup_family: V4_ONLY
      lb_policy: ROUND_ROBIN
      load_assignment:
        cluster_name: outbound|443||ollama.dns
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: a2b4-112-46-215-209.ngrok-free.app
                      port_value: 443
      transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
          "sni": "a2b4-112-46-215-209.ngrok-free.app"

    - name: outbound|443||dashvector.dns
      connect_timeout: 30s
      type: LOGICAL_DNS
      dns_lookup_family: V4_ONLY
      lb_policy: ROUND_ROBIN
      load_assignment:
        cluster_name: outbound|443||dashvector.dns
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: vrs-*aliyuncs.com
                      port_value: 443
      transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
          "sni": "vrs-*aliyuncs.com"

4. Test

curl "http://localhost:10002/ai/v1/chat/completions"  -H "Content-Type: application/json" -d '{
  "model": "llama3.2",
  "messages": [
    {
      "role": "user",
      "content": "你好"
    }
  ]
}'

Response

(Screenshot: chat completion response)

Container logs

(Screenshot: container logs)
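
To exercise the cache path end to end, the same question can be sent a second time; with the embedding and vector services configured above, the repeated request is expected to be answered from the cache rather than forwarded to Ollama. A sketch (the grep pattern is an assumption and may need adjusting to the actual log tags):

curl "http://localhost:10002/ai/v1/chat/completions" -H "Content-Type: application/json" -d '{
  "model": "llama3.2",
  "messages": [{ "role": "user", "content": "你好" }]
}'

# then look for embedding/cache activity in the wasm debug logs
docker compose logs envoy | grep -iE "cache|embedding"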

Thanks for your guidance.

@codecov-commenter commented on Feb 21, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 43.41%. Comparing base (ef31e09) to head (1a7ba56).
Report is 301 commits behind head on main.

Additional details and impacted files


@@            Coverage Diff             @@
##             main    #1794      +/-   ##
==========================================
+ Coverage   35.91%   43.41%   +7.50%     
==========================================
  Files          69       76       +7     
  Lines       11576    12278     +702     
==========================================
+ Hits         4157     5331    +1174     
+ Misses       7104     6617     -487     
- Partials      315      330      +15     

see 71 files with indirect coverage changes

@CH3CHO (Collaborator) left a comment

Overall it looks fine; just a few minor tweaks are needed.


//var ollamaConfig OllamaProviderConfig

//type ollamaProviderConfig struct {}
Collaborator: Are these two commented-out lines still needed?

It was originally 1.20, but my commit changed it to 1.21. I'm changing it back now.
@CH3CHO (Collaborator) left a comment

LGTM. Thanks.

@CH3CHO merged commit 2986e19 into alibaba:main on Feb 21, 2025
12 checks passed