This document contains all the information I could gather on remote development workflows on Azure ML Studio Compute Instances.
Use the Dev Containers feature of VS Code. Dev Containers lets VS Code connect to Docker containers running in your current Docker context, install the VS Code server inside them, and develop inside the container. You can set up a remote Docker context via SSH; when you use that context, VS Code sees the containers running on the remote machine as if they were local.
The VM image for the compute instance is not configurable. Some release notes are available here: https://learn.microsoft.com/en-us/azure/machine-learning/azure-machine-learning-ci-image-release-notes.
But essentially, when a compute instance is created, it just starts a VM with the latest image, with no option to customize the image at all. The image comes with several tools pre-installed, and there is no good documentation on what these VMs come with, so here is what I've found to be installed on them:
- Python 3.8.10
- Conda 22.11.1
- Docker version 20.10.22+azure
- git version 2.39.1
- JupyterLab Version 3.2.4
- Jupyter:
- IPython : 8.7.0
- ipykernel : 6.8.0
- ipywidgets : 7.7.1
- jupyter_client : 6.1.12
- jupyter_core : 5.1.0
- jupyter_server : 2.0.1
- jupyterlab : not installed
- nbclient : 0.7.2
- nbconvert : not installed
- nbformat : 5.2.0
- notebook : 6.5.2
- qtconsole : 5.4.0
- traitlets : 5.7.1
- nginx version: nginx/1.18.0 (Ubuntu)
- This is a reverse proxy which exposes the applications (Jupyter, custom applications, etc.). A few commands for poking at nginx (and c3 below) follow after this list.
- See /etc/nginx/nginx.conf for settings
- See /mnt/batch/tasks/startup for logs
- c3-progenitor:
- Not sure what this is but:
- It gets run like this:
docker run -a stdout -a stderr --log-driver none --rm --read-only --name=c3-progenitor -v /var/run/docker.sock:/var/run/docker.sock -v /:/host --privileged --network=host --ipc=host --pid=host localhost/c3:latest
- Pulled from here: ghcr.io/azure/c3
- Listens to port 8704
- A c3 page shows up when a custom application is set up incorrectly and nothing is listening on the target port, so perhaps it has something to do with port forwarding.
- Best guess is that it's related to this Red Team network tool used for tunneling, forwarding, etc.: https://github.com/WithSecureLabs/C3
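If you want to poke at nginx and c3 yourself on the instance, a few standard commands go a long way (nothing here is specific to the image; it's plain nginx/docker/ss usage):
sudo nginx -T | less
# dumps the full effective nginx configuration, including all included files
docker ps --filter name=c3-progenitor
# shows the running c3 container
sudo ss -tlnp | grep 8704
# shows which process is actually listening on port 8704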
The VM mounts a shared cloud storage under /home/azureuser/cloudfiles, which is meant to be used to store all workspace files. The VM's local disk does persist between stops and restarts, but if you delete the instance, the files on the local disk are lost with it; there is no option to save them. Files under the cloudfiles mount live in the workspace storage and survive the instance.
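To see what is actually backed by the shared storage versus the local disk, standard coreutils are enough:
df -h /home/azureuser/cloudfiles
# shows the mount point and the remote filesystem backing it
ls /home/azureuser/cloudfiles
# anything outside this mount lives on the VM's local disk and is lost when the instance is deleted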
There are a few options when it comes to remote development on Azure ML Compute Instances.
JupyterLab is installed on the VM. To access it, open your instance in ML Studio and click on JupyterLab under applications.
Note: It seems like auto-complete features aren't working, which makes JupyterLab fairly painful to use on its own. Still, custom conda environments / kernels are worth covering.
Open the terminal on your compute instance, create a new environment, activate it, and install your dependencies, e.g.:
conda create -n custom-env python=3.10
conda activate custom-env
# install pip, requirements.txt etc. Set up the environment
When the environment is ready, simply install ipykernel and register the environment as a kernel:
conda install ipykernel
python -m ipykernel install --user --name custom-env --display-name "Python (custom-env)"
The custom kernel will now show up in JupyterLab.
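To verify the registration (standard Jupyter tooling; the kernel name matches the example above):
jupyter kernelspec list
# custom-env should now be listed alongside the default python3 kernel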
It's possible to deploy a custom application on the Compute Instance. A custom application is simply an HTTP-based application in a docker image that runs on the VM instance. The target port is then exposed via NGINX.
There seem to be some limitations on what sort of application can be served this way. For example, code-server version 4 and above fails to load correctly due to some NGINX-related restriction, either to do with SSL configuration or websocket support. A Triton server should work, seeing as it doesn't need websockets or HTTPS.
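Before registering an image as a custom application, it can save time to smoke-test it locally and check that it answers plain HTTP on the target port (image name and port below are placeholders):
docker run -d -p 8080:8080 --name app-smoke-test some-custom-image
curl -i http://localhost:8080/
# if the app needs websockets or HTTPS to render, expect trouble behind the NGINX proxy
docker rm -f app-smoke-test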
To set up a custom application see: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-create-manage-compute-instance?tabs=python#setup-other-custom-applications
A requirement is that the docker image needs to be available on some container registry. We have set up a container registry in the Azure portal, available at: unitysolai.azurecr.io
To push images to this registry, first build and tag the image, for example:
docker build . -t some-custom-image
docker tag some-custom-image unitysolai.azurecr.io/aidin/some-custom-image:latest
In order to push the image to the registry, you need to login. If you don't have Azure CLI installed locally, install it: https://learn.microsoft.com/en-us/cli/azure/install-azure-cli then login:
az login
# opens browser to login to azure. when done, come back to the terminal.
az acr login -n unitysolai
# login done. Now just push the image to the registry
docker push unitysolai.azurecr.io/aidin/some-custom-image:latest
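To confirm the push landed, list the repositories in the registry:
az acr repository list -n unitysolai -o table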
Now you can use your image "unitysolai.azurecr.io/aidin/some-custom-image:latest" when setting up the custom application.
If developing on the Compute Instance, with whatever environment it comes with, is enough for you, it's possible to use VS Code or PyCharm to develop remotely on the instance.
To make this possible, we need to enable SSH for the Compute Instance when it's being created.
First go to <portal.azure.com> and search for ssh keys.
Create a new key, and download the .pem file generated.
Note: Make sure to restrict file permissions for the .pem file. On linux and mac:
chmod 400 <path to .pem file>
On Windows, if you have a terminal emulator with linux commands (e.g. cmder or Git Bash), you can also just use chmod. Otherwise, use the Windows GUI: go to the file's properties and remove access for the USERS group. Read more about it here: https://superuser.com/questions/1296024/windows-ssh-permissions-for-private-key-are-too-open
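From a plain Windows terminal, icacls can do the same: reset the ACL, grant yourself read-only access, and strip inherited permissions (this is essentially the recipe from the superuser thread above; mykey.pem is a placeholder path):
icacls.exe mykey.pem /reset
icacls.exe mykey.pem /grant:r "%username%":"(R)"
icacls.exe mykey.pem /inheritance:r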
Once the SSH key is created on Azure and the .pem file stored locally on your machine, you can create a Compute Instance and enable SSH for it.
When creating a new instance, go to advanced settings, Enable SSH and set the key to the stored key.
When the instance is created you can navigate to the compute instance, and click on "Connect" under SSH access. This will provide you with the connection details.
Once you have this information, and the .pem file, you can use VSCode or PyCharm to remote develop on the Compute Instance.
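It can be convenient to put the connection details into ~/.ssh/config so both IDEs can reuse a single alias (host alias, IP and key path below are placeholders; compute instances typically use the azureuser account and a non-default port such as 50001):
Host azureml-ci
    HostName <public-ip-of-instance>
    User azureuser
    Port 50001
    IdentityFile <path to .pem file>
After this, ssh azureml-ci should drop you into the instance.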
PyCharm supports remote development (still in beta as of 2022.3.2).
Open PyCharm and use the Remote Development tab to create a new connection. Simply fill in the details and start the connection. This starts PyCharm Gateway, which triggers installation and execution of the IDE backend on the VM. Once this is complete, you can use whatever remote development features PyCharm provides. These should get better once the feature is no longer in beta.
Note that if you're using Jupyter Notebooks, you have access to all the kernels, including custom kernels you've defined on the Compute Instance.
There are also multiple ways to develop remotely with VS Code on the Compute Instances.
Navigate to your compute instance on the ML Studio page and under Applications, click on VS Code.
If you have VS Code installed, this will use the Azure Machine Learning extension, and a lot of magic under the hood, to open an SSH connection to the instance, install the code server backend, and start a remote development session.
Once installed and set up, you get access to all the ML Studio resources in VS Code, and can also connect to any of the Compute Instances.
You'll need to install the Visual Studio Code Remote Development Extension Pack. Then open the Remote Explorer tab and, under SSH, create a new SSH remote with the connection details. You can now connect to the instance remotely.
This is probably the most advanced and customizable workflow. If working remotely on the Compute Instance VM itself is not flexible enough (e.g. the VM OS doesn't work for you, etc) and you prefer to have your custom work environment defined by a Dockerfile, then this section describes the workflow for you. The workflow is described here: https://code.visualstudio.com/docs/containers/ssh but I'll provide a summary.
For this workflow we assume you have the following done:
- SSH enabled and set up for the Compute Instance, and the .pem file available locally.
- A Dockerfile which defines your work environment (a minimal sketch follows this list).
- The docker image built and pushed to a registry.
- The docker CLI installed on the local developer's machine.
- The VS Code Remote Explorer extension installed.
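As a rough sketch of the second point, the work-environment Dockerfile doesn't need much; the base image and packages below are placeholder choices, and the sleep keeps the container alive so there is something to attach to:
FROM python:3.10-slim
RUN apt-get update && apt-get install -y git openssh-client && rm -rf /var/lib/apt/lists/*
RUN pip install --no-cache-dir ipykernel
WORKDIR /workspace
CMD ["sleep", "infinity"]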
To enable this workflow we will use:
- Remote Explorer extension: Allows connecting to docker containers.
- Docker Context: Allows making remote containers available locally
The Remote Explorer extension is normally used to connect to containers running on the local developer's machine. But if we use docker contexts, we get access to containers running on a remote machine as if they were running locally. This way, Remote Explorer can connect to remote containers, which is all we need for this workflow.
You'll need to add the ssh key (.pem file) to the ssh-agent using:
ssh-add <path to .pem file>
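If ssh-add complains that it cannot connect to the agent, start one first; ssh-add -l then confirms the key is loaded (standard OpenSSH):
eval "$(ssh-agent -s)"
ssh-add -l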
Once this is done, all that needs to be done is to use the Docker Context feature to get access to the containers on the Compute Instance. Normally, when using docker, you work with the containers on your own machine, i.e. your default context. But you can also get access to containers available elsewhere: for example, a kubernetes context, or containers on a remote machine via SSH.
What we will do next, is to create a docker context with an SSH endpoint to the Compute Instance.
docker context create compute-instance-context --docker "host=ssh://azureuser@<public-ip-of-instance>:50001"
Next, you can switch back and forth between contexts. To get a list of contexts use:
docker context list
And to use a context, e.g. "compute-instance-context" that we just created:
docker context use compute-instance-context
When you switch to a new context, all the docker containers and images in that context become visible to you. For example, if you run:
docker ps
It will list all the running docker containers in the context in use.
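When you're done working against the remote, remember to switch back to your local daemon:
docker context use default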
First ensure that the container you want to connect to is running on the remote. You can SSH in, or use the terminal feature in Azure ML Studio, to run the container you want to use for development on the Compute Instance, with the proper settings.
For example, I'll run a tritonserver container, giving it a volume for persistence, and gpu access.
docker run -d --gpus=all -p 8000:8000 -p 8001:8001 -p 8002:8002 -v /home/azureuser/cloudfiles:/home/azureuser/cloudfiles nvcr.io/nvidia/tritonserver:23.01-pyt-python-py3 tritonserver --model-repository=<path to model repository>
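To check that the server actually came up, while the remote context is active (note that curl against localhost on your own machine won't reach it, so run the health check inside the container; /v2/health/ready is part of Triton's standard HTTP API, and this assumes curl is present in the image):
docker ps
docker exec <container-id> curl -s -o /dev/null -w "%{http_code}\n" localhost:8000/v2/health/ready
# 200 means the server is ready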
If you open Remote Explorer, while a remote docker context is in use, you will see the remote containers available in the explorer, and you can simply connect to any of them.
That's it: we are now connected, via VS Code, to a container running on our Compute Instance, with GPU access and full control over what the docker environment looks like.
Note: Currently it's not easily possible to connect to a remote container using PyCharm. To hear all the people who are unhappy with this see https://youtrack.jetbrains.com/issue/PY-33489/Native-support-for-running-Docker-on-the-remote-machine