Quickly install lws (LeaderWorkerSet)

Reference: https://github.com/kubernetes-sigs/lws/blob/main/charts/lws/README.md

git clone https://github.com/kubernetes-sigs/lws.git
cd lws/charts
helm upgrade --install lws lws --create-namespace --namespace lws-system \
  --set image.manager.repository=m.daocloud.io/gcr.io/k8s-staging-lws/lws
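
The --set flag points the controller image at the m.daocloud.io mirror of gcr.io, which helps when gcr.io is unreachable. To confirm the controller and its CRD came up (assuming the release name lws used above):

kubectl get pods -n lws-system
kubectl get crd leaderworkersets.leaderworkerset.x-k8s.io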

Uninstall:

helm uninstall lws --namespace lws-system

Or rebuild the kind cluster:
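
A minimal sketch of the rebuild path, assuming a default kind setup (note this destroys the current cluster and everything in it):

kind delete cluster
kind create cluster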

yankay commented Mar 19, 2025

Sample: a two-pod vLLM deployment (one leader, one worker) serving Qwen/Qwen2.5-0.5B-Instruct with pipeline parallelism across the group, plus a Service exposing the leader's OpenAI-compatible API.

apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: vllm
spec:
  replicas: 1
  leaderWorkerTemplate:
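    # size counts every pod in the group: 1 leader + 1 worker here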
    size: 2
    restartPolicy: RecreateGroupOnPodRestart
    leaderTemplate:
      metadata:
        labels:
          role: leader
      spec:
        containers:
          - name: vllm-leader
            image: docker.io/vllm/vllm-openai:latest
            env:
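              # Fetch model weights from ModelScope instead of the Hugging Face Hub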
              - name: VLLM_USE_MODELSCOPE
                value: "true"
            command:
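              # Start the Ray head, wait for the full group to join, then launch the OpenAI-compatible server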
              - sh
              - -c
              - "chmod +x /vllm-workspace/examples/online_serving/multi-node-serving.sh;
                 /vllm-workspace/examples/online_serving/multi-node-serving.sh leader --ray_cluster_size=$(LWS_GROUP_SIZE); 
                 python3 -m vllm.entrypoints.openai.api_server --port 8080 --model Qwen/Qwen2.5-0.5B-Instruct --pipeline_parallel_size 2 --gpu-memory-utilization=0.4"
            resources:
              limits:
                nvidia.com/gpu: "1"
            ports:
              - containerPort: 8080
            readinessProbe:
              tcpSocket:
                port: 8080
              initialDelaySeconds: 15
              periodSeconds: 10
            volumeMounts:
              - mountPath: /dev/shm
                name: dshm
              - mountPath: /root
                name: root-hostpath
        volumes:
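        # Memory-backed /dev/shm for Ray's object store and NCCL shared memory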
        - name: dshm
          emptyDir:
            medium: Memory
            sizeLimit: 15Gi
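        # Mount the host's /root (e.g., so the downloaded model cache survives pod restarts)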
        - name: root-hostpath
          hostPath:
            path: /root
            type: Directory
    workerTemplate:
      spec:
        containers:
          - name: vllm-worker
            image: docker.io/vllm/vllm-openai:latest
            command:
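              # Join the leader's Ray cluster; LWS injects LWS_LEADER_ADDRESS automatically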
              - sh
              - -c
              - "chmod +x /vllm-workspace/examples/online_serving/multi-node-serving.sh;
                 /vllm-workspace/examples/online_serving/multi-node-serving.sh worker --ray_address=$(LWS_LEADER_ADDRESS)"
            resources:
              limits:
                nvidia.com/gpu: "1"
            env:
              - name: VLLM_USE_MODELSCOPE
                value: "true"
            volumeMounts:
              - mountPath: /dev/shm
                name: dshm
              - mountPath: /root
                name: root-hostpath
        volumes:
        - name: dshm
          emptyDir:
            medium: Memory
            sizeLimit: 15Gi
        - name: root-hostpath
          hostPath:
            path: /root
            type: Directory
---
apiVersion: v1
kind: Service
metadata:
  name: vllm-leader
spec:
  ports:
    - name: http
      port: 8080
      protocol: TCP
      targetPort: 8080
  selector:
    leaderworkerset.sigs.k8s.io/name: vllm
    role: leader
  type: ClusterIP
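
To try the endpoint, apply the manifests and port-forward the leader Service (lws-vllm.yaml is a hypothetical filename for the sample above):

kubectl apply -f lws-vllm.yaml
kubectl port-forward svc/vllm-leader 8080:8080
curl http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen2.5-0.5B-Instruct", "prompt": "Hello", "max_tokens": 16}'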
