# Tensorflow Serving with a GPU Kubernetes cluster on Azure

## Prerequisites

* `az` command line version >= `2.0.23`

## Making the Kubernetes cluster

First make a resource group to house your deployment. Note that as of 12/22/2017 this will only work in westus2 and uksouth because of Nvidia driver deployment.

```bash
az group create -n gpu-cluster-rg -l westus2
```

Spin up a cluster:

```bash
az acs create \
    -n gpu-cluster \
    --orchestrator-type Kubernetes \
    -g gpu-cluster-rg \
    --agent-vm-size Standard_NC6 \
    --generate-ssh-keys \
    -l westus2
```

## Connect to your cluster with kubectl

Install the kubectl cli:

```
az acs kubernetes install-cli
```

Get the cluster's credentials:

```
az acs kubernetes get-credentials --resource-group=gpu-cluster-rg --name=gpu-cluster
```

### Quick sanity check

To make sure that you are connected to your cluster and that the drivers are in working condition, check the following:

```
kubectl get nodes | grep agentpool
```

This should return something like:

```
k8s-agentpool0-98822346-0   Ready     agent     24m       v1.7.9
```

We can now check that the cluster has the proper drivers:

```
kubectl describe node k8s-agentpool0-98822346-0 | grep nvidia-gpu
```

This should return something like:

```
 alpha.kubernetes.io/nvidia-gpu:  1
 alpha.kubernetes.io/nvidia-gpu:  1
```

The first line is your overall GPU capacity, and the second line is the number of GPUs currently available (this will be zero if you have GPU jobs running). If your first line is `alpha.kubernetes.io/nvidia-gpu: 0`, you might not be in a region that has the latest GPU drivers, or you might not have a recent enough `az` command line.

## Using kubectl's UI

You can see a nice graphical manager for your kubernetes cluster using the following:

```
kubectl proxy
```

This will deploy a website to localhost:8001. You can configure the port with the `--port` flag. Now you can navigate to `localhost:8001/ui` to see the manager.

## Tensorflow Serving

Tensorflow Serving is a library for deploying tensorflow models efficiently. They supply a GPU-enabled dockerfile that, as of 12/22/2017, does not compile. We have built and published an [earlier version of this docker image](https://hub.docker.com/r/mhamilton723/tensorflow-serving-devel-gpu/) so you can jump straight to the deployment.

To deploy your own model, zip up an exported tensorflow model and host it online. We use azure blob storage and generate SAS urls; a sketch of that workflow is shown below. Paste whatever URL you use into the `<YOUR_MODEL_URL>` section of the yaml.
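If you also host your model in Azure Blob Storage, a minimal sketch of the upload and SAS generation with the `az` CLI might look like the following. The storage account, key, container name, paths, and expiry date are placeholders to substitute, and it assumes the exported model directory has `saved_model.pb` and `variables/` at its top level so the zip unpacks into the layout the deployment below expects:

```bash
# Zip the exported model so that saved_model.pb sits at the root of the archive
cd /path/to/exported_model && zip -r /tmp/saved_model.zip .

# Credentials picked up automatically by the `az storage` commands
export AZURE_STORAGE_ACCOUNT=<your-storage-account>
export AZURE_STORAGE_KEY=<your-storage-key>

# Upload the zip to a blob container
az storage container create --name models
az storage blob upload --container-name models --name saved_model.zip --file /tmp/saved_model.zip

# Generate a read-only SAS token and print the full model URL
SAS=$(az storage blob generate-sas --container-name models --name saved_model.zip \
      --permissions r --expiry 2018-06-01T00:00Z --output tsv)
URL=$(az storage blob url --container-name models --name saved_model.zip --output tsv)
echo "${URL}?${SAS}"   # use this value as <YOUR_MODEL_URL>
```

Here is a yaml file for a simple tf-serving deployment: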
```yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: tf-deployment
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: tf-server
    spec:
      volumes:
      - name: bin
        hostPath:
          path: /usr/lib/nvidia-384/bin
      - name: lib
        hostPath:
          path: /usr/lib/nvidia-384
      - name: libcuda
        hostPath:
          path: /usr/lib/x86_64-linux-gnu/libcuda.so.1
      containers:
      - name: tf-container
        image: mhamilton723/tensorflow-serving-devel-gpu
        command: ["/bin/sh", "-c"]
        args: ["MODEL_URL=\"<YOUR_MODEL_URL>\"; MODEL_NAME=saved_model; PORT=9000; cd /serving; mkdir models; ZIP_FILE=\"model.zip\"; curl -o \"$ZIP_FILE\" \"$MODEL_URL\"; echo \"HERE\"; python -m zipfile -e \"$ZIP_FILE\" /serving/models/1; rm \"$ZIP_FILE\"; ls -l /serving/models/; tensorflow_model_server --port=\"$PORT\" --model_name=\"$MODEL_NAME\" --model_base_path=\"/serving/models/\""]
        ports:
        - containerPort: 9000
        resources:
          limits:
            alpha.kubernetes.io/nvidia-gpu: 1
        volumeMounts:
        - mountPath: /usr/local/nvidia/bin
          name: bin
        - mountPath: /usr/local/nvidia/lib64
          name: lib
        - mountPath: /usr/lib/x86_64-linux-gnu/libcuda.so.1
          name: libcuda
---
apiVersion: v1
kind: Service
metadata:
  labels:
    run: tf-service
  name: tf-service
spec:
  ports:
  - port: 9000
    targetPort: 9000
  selector:
    app: tf-server
  type: LoadBalancer
```

Once you have modified the above yaml file to point to your model, save it to a file such as `tf-serving.yaml`. Then you can deploy it to the cluster with

```bash
kubectl create -f tf-serving.yaml
```

Note that this will take about 1 minute to boot up the servers on the nodes, and around 5 minutes to create the service endpoint.

## Calling your API

Grab the IP address of your new service using

```
kubectl get services
```

You should see output like the following:

```
NAME         TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
kubernetes   ClusterIP      10.0.0.1       <none>        443/TCP          1d
tf-service   LoadBalancer   10.0.166.211   <YOUR_IP>     9000:30242/TCP   7m
```

`<YOUR_IP>` is the external IP address you will call in order to query your model.

### Authors

* Mark Hamilton, marhamil@microsoft.com
* Andrew Shonhoffer

### Thanks to

* William Buchwalter