@ckandoth
Created October 20, 2025 07:42
Prototype Illumina's Nirvana on Dragen Pay-As-You-Go (PAYG) VMs on Azure

Purpose

Prototype Illumina's Nirvana on Dragen Pay-As-You-Go (PAYG) VMs on Azure.

Prerequisites

  1. Sign up for an Azure subscription at this link if you don't already have one.
  2. Follow these instructions to register resource providers Microsoft.Network, Microsoft.Storage, and Microsoft.Compute. Strictly speaking, you don't need Microsoft.Storage for the steps in this guide. But in a production environment, it is recommended to use blob storage for inputs/outputs. You should also upload Nirvana annotation source files into your own blob storage account so that you can quickly deploy them into multiple ephemeral VMs that run Nirvana in parallel on each sample.
  3. Visit this page, log in if needed, and increase your Quota for Total Regional vCPUs and for Standard EBSv5 Family vCPUs to 16 vCPUs each, in your preferred region. This allows you to create a single Standard_E16bs_v5 VM, which has enough RAM both to hold our annotation source files (under /dev/shm) and for Nirvana's runtime use. Depending on demand for these VM SKUs in your preferred region, you may also need to submit a service request and justify your use case to a person before the quota is approved.
  4. Visit this page, log in if needed, and ensure that Status is set to Enable for the Azure subscription you intend to use. This allows programmatic deployment of the VMs that we will use later.
  5. Follow these instructions to generate an SSH key using the Ed25519 algorithm and store it as ~/.ssh/id_ed25519. We'll use this to SSH into VMs.
  6. Follow these instructions to install Azure CLI, and then run az login --use-device-code and follow the instructions to link your subscription.
  7. Accept the Terms of Use for the Dragen PAYG image we will use.
    az vm image terms accept --urn illuminainc1586452220102:dragen-vm-payg:dragen-4-4-4-payg:latest
  8. Follow these instructions to set up an Illumina API key and save it somewhere safe. We'll need it to create a credentials.json file later in this guide.
  9. All the commands in this repo were tested on Ubuntu 24.04 in WSL2 with these dotfiles, but you should be fine with Bash in any Linux environment or Zsh on macOS.

Build

Make separate resource groups for networking and VMs.

az group create --name dgn-net-rg --location eastus2
az group create --name dgn-vms-rg --location eastus2

Create a VNet with a subnet that permits SSH connections, but only from our current IP address.

az network nsg create --resource-group dgn-net-rg --name dgn-nsg
az network vnet create --resource-group dgn-net-rg --network-security-group dgn-nsg --name dgn-vnet --address-prefixes "10.10.10.0/24"
az network vnet subnet create --resource-group dgn-net-rg --network-security-group dgn-nsg --vnet-name dgn-vnet --name dgn-sub1 --address-prefixes "10.10.10.0/25" --service-endpoints Microsoft.Storage
az network nsg rule create --resource-group dgn-net-rg --nsg-name dgn-nsg --name AllowSSHInBound --priority 200 --protocol TCP --access Allow --direction Inbound --source-address-prefixes $(curl -s https://icanhazip.com) --source-port-ranges "*" --destination-address-prefixes "*" --destination-port-ranges 22
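
If icanhazip.com answers with an IPv6 address or an error page, the rule above may not match the IPv4 address that our SSH connections come from. Here is a minimal Bash sketch that checks the response before it is used. The helper names is_ipv4 and to_cidr32 are our own (hypothetical), not part of Azure CLI.

```shell
# Return 0 if $1 looks like a dotted-quad IPv4 address, non-zero otherwise.
is_ipv4() {
    local ip="$1" octet
    local -a octets
    local re='^[0-9]{1,3}(\.[0-9]{1,3}){3}$'
    [[ "$ip" =~ $re ]] || return 1
    IFS=. read -r -a octets <<< "$ip"
    for octet in "${octets[@]}"; do
        # 10# forces base 10 so that an octet like "08" is not read as octal
        (( 10#$octet <= 255 )) || return 1
    done
    return 0
}

# Print "<ip>/32" for a valid IPv4 address, or fail with a message on stderr.
to_cidr32() {
    if is_ipv4 "$1"; then
        printf '%s/32\n' "$1"
    else
        printf 'Not an IPv4 address: %s\n' "$1" >&2
        return 1
    fi
}

# Possible usage (curl's -4 flag forces an IPv4 lookup):
#   MYIP=$(curl -4 -s https://icanhazip.com)
#   ... --source-address-prefixes "$(to_cidr32 "$MYIP")" ...
```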

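Set up an SSH configuration that we have determined (by trial and error) will reliably get us into Azure VMs to run long-running scripts. The keepalive options probe the server every 120 seconds and drop the connection only after 30 unanswered probes.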

SSH_USERNAME="azureuser"
SSH_AUTH_KEY="${HOME}/.ssh/id_ed25519"
SSH_OPTIONS="-q -i ${SSH_AUTH_KEY} -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o LogLevel=QUIET -o ServerAliveInterval=120 -o ServerAliveCountMax=30"
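
The options string above relies on the shell splitting an unquoted variable into separate words. Bash does that, but Zsh does not by default. As an alternative, here is a sketch that keeps the same options in an array, which expands to one argument per element in both shells. The name SSH_OPTS is our own choice, and ${HOME} is used because a tilde inside double quotes is not expanded by the shell.

```shell
SSH_USERNAME="azureuser"
SSH_AUTH_KEY="${HOME}/.ssh/id_ed25519"

# One array element per command-line argument; quoting keeps paths with spaces intact
SSH_OPTS=(
    -q
    -i "${SSH_AUTH_KEY}"
    -o StrictHostKeyChecking=no
    -o UserKnownHostsFile=/dev/null
    -o LogLevel=QUIET
    -o ServerAliveInterval=120
    -o ServerAliveCountMax=30
)

# Possible usage once VMIP is known:
#   ssh "${SSH_OPTS[@]}" "${SSH_USERNAME}@${VMIP}"
```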

Start an Azure VM running the Dragen PAYG image on the subnet and NSG created earlier. We'll choose a VM size with plenty of RAM so that we can keep annotation source files in memory for low-latency, high-bandwidth access. Keep in mind that as soon as this VM starts, you'll start incurring the hourly fee for the Dragen software license.

SUBNET=$(az network vnet subnet show --resource-group dgn-net-rg --vnet-name dgn-vnet --name dgn-sub1 --query id --output tsv)
NSG=$(az network nsg show --resource-group dgn-net-rg --name dgn-nsg --query id --output tsv)
VMIP=$(az vm create --resource-group dgn-vms-rg --subnet $SUBNET --nsg $NSG --public-ip-address dgn1-pip --name dgn1 --size Standard_E16bs_v5 --os-disk-name dgn1-os-disk --os-disk-size-gb 128 --os-disk-delete-option delete --nic-delete-option delete --image illuminainc1586452220102:dragen-vm-payg:dragen-4-4-4-payg:latest --admin-username ${SSH_USERNAME} --ssh-key-values ${SSH_AUTH_KEY}.pub --query publicIpAddress --output tsv)
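
A freshly created VM may refuse SSH connections for a short while after az vm create returns. Here is a small Bash sketch of a generic retry helper that can wrap the first SSH attempt. The function name retry and the attempt and delay values in the usage comment are our own assumptions, not something the Dragen image or Azure CLI prescribes.

```shell
# Run a command up to <attempts> times, sleeping <delay> seconds between tries.
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
retry() {
    local attempts="$1" delay="$2" n
    shift 2
    for (( n = 1; n <= attempts; n++ )); do
        "$@" && return 0
        # Skip the sleep after the final failed attempt
        (( n < attempts )) && sleep "$delay"
    done
    return 1
}

# Possible usage: try for about 5 minutes until "true" can be run on the VM over SSH
#   retry 20 15 ssh ${SSH_OPTIONS} -o ConnectTimeout=10 ${SSH_USERNAME}@${VMIP} true
```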

SSH into the VM and, once logged in, create the credentials file required by the Nirvana DataManager. Replace KEY in the command below with the Illumina API key you created in the Prerequisites.

ssh ${SSH_OPTIONS} ${SSH_USERNAME}@${VMIP}
mkdir -p ~/.ilmnAnnotations
echo -e '{\n  "MyIlluminaApiKey": "KEY"\n}' > ~/.ilmnAnnotations/credentials.json
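
Since credentials.json holds a secret, we may prefer to create it readable only by its owner. Here is a minimal sketch that writes the same JSON as the echo command above. The function name write_credentials is hypothetical, and the sketch assumes the API key contains no double quotes or backslashes that would need JSON escaping.

```shell
# Write Nirvana DataManager credentials to <dir>/credentials.json.
# $1 = API key, $2 = target directory (defaults to ~/.ilmnAnnotations)
write_credentials() {
    local key="$1" dir="${2:-${HOME}/.ilmnAnnotations}"
    mkdir -p "$dir" || return 1
    # umask 077 inside a subshell so a newly created file gets mode 600
    ( umask 077 && printf '{\n  "MyIlluminaApiKey": "%s"\n}\n' "$key" > "${dir}/credentials.json" )
}

# Possible usage in Bash, reading the key without echoing it to the terminal:
#   read -r -s -p "Illumina API key: " KEY && write_credentials "$KEY"
```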

Resize /dev/shm and start downloading annotation sources for GRCh38 into it.

sudo mount -o remount,size=64G /dev/shm
/opt/edico/share/nirvana/DataManager make-config --ref GRCh38
/opt/edico/share/nirvana/DataManager download --ref GRCh38 --thread 8 --dir /dev/shm
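
Because /dev/shm is backed by RAM, it can help to confirm how much space is free before and after the download. Here is a sketch using df. The helper names avail_gib and has_space are our own, and the 60 GiB threshold in the usage comment is a placeholder, not a measured size of the GRCh38 annotation sources.

```shell
# Print the space available under <dir> in whole GiB (rounded down).
avail_gib() {
    # -P forces one line per filesystem; -k reports 1024-byte blocks (column 4 = available)
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1048576) }'
}

# Succeed only if <dir> has at least <need_gib> GiB available.
has_space() {
    local dir="$1" need_gib="$2" have
    have=$(avail_gib "$dir") || return 1
    [ -n "$have" ] && [ "$have" -ge "$need_gib" ]
}

# Possible usage:
#   has_space /dev/shm 60 || echo "Not enough room in /dev/shm" >&2
```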

Install Illumina's BaseSpace CLI to help download a demo VCF from Illumina. In production, we'll use VCFs generated from our own FASTQs.

curl --create-dirs --output ~/.local/bin/bs https://launch.basespace.illumina.com/CLI/latest/amd64-linux/bs
chmod +x ~/.local/bin/bs

Visit basespace.illumina.com and log in, creating a free account if needed. Then visit basespace.illumina.com/s/htWXpgEKrRu6 and click "Accept" to add the demo WGS data to your BaseSpace account. Run "bs auth" and log in with the same account before running the command below.

Download a small-variant VCF generated by Dragen 4.3.6 on WGS from the NIST GIAB sample HG002.

bs list dataset --terse --project-name "TruSeq-PCRfree-HG001-HG007-10B-2-v13" --is-type "illumina.dragen.complete.v0.4.3" --filter-term "TSPF-HG002-10B-2-v13-Rep1" |\
    xargs -L1 bs contents dataset --terse --extension hard-filtered.vcf.gz,hard-filtered.vcf.gz.tbi --id |\
    xargs -L1 bs download file --no-metadata -o data --id

Run Nirvana on the WGS VCF and time how long it takes.

time /opt/edico/share/nirvana/Nirvana --cache /dev/shm/Cache --sd /dev/shm/SupplementaryAnnotation --ref /dev/shm/References/Homo_sapiens.GRCh38.Nirvana.dat --in data/TSPF-HG002-10B-2-v13-Rep1.hard-filtered.vcf.gz --out data/TSPF-HG002-10B-2-v13-Rep1.hard-filtered
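
To annotate several VCFs against the same in-memory annotation sources, the command above can be wrapped in a loop that derives each --out prefix from its input path. This is a sketch under the same paths used in this guide. The function names out_prefix and annotate_all are hypothetical.

```shell
# Derive the --out prefix from a VCF path by dropping a trailing .vcf.gz or .vcf.
out_prefix() {
    local vcf="$1"
    vcf="${vcf%.gz}"
    printf '%s\n' "${vcf%.vcf}"
}

# Annotate every VCF given as an argument, stopping at the first failure.
annotate_all() {
    local vcf
    for vcf in "$@"; do
        /opt/edico/share/nirvana/Nirvana \
            --cache /dev/shm/Cache \
            --sd /dev/shm/SupplementaryAnnotation \
            --ref /dev/shm/References/Homo_sapiens.GRCh38.Nirvana.dat \
            --in "$vcf" --out "$(out_prefix "$vcf")" || return 1
    done
}

# Possible usage:
#   time annotate_all data/*.hard-filtered.vcf.gz
```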

Delete the VM and its public IP to save money.

az vm delete --yes --resource-group dgn-vms-rg --name dgn1
az network public-ip delete --resource-group dgn-vms-rg --name dgn1-pip
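
Since a forgotten VM keeps accruing license fees, it can be worth wrapping the two delete commands in a function that can be previewed first. Here is a sketch with a dry-run switch. The names run, cleanup_vm, and DRY_RUN are our own, and the sketch assumes the public IP is named after the VM with a -pip suffix, as in this guide.

```shell
# Run a command, or only print it when DRY_RUN=1 is set.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Delete a VM and its public IP. $1 = VM name, $2 = resource group (defaults to dgn-vms-rg).
cleanup_vm() {
    local vm="$1" rg="${2:-dgn-vms-rg}"
    run az vm delete --yes --resource-group "$rg" --name "$vm" &&
    run az network public-ip delete --resource-group "$rg" --name "${vm}-pip"
}

# Possible usage:
#   DRY_RUN=1 cleanup_vm dgn1   # print the commands without running them
#   cleanup_vm dgn1             # actually delete
```

Afterwards, az resource list --resource-group dgn-vms-rg --output table should show whether anything billable is left behind.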

Now we are ready to orchestrate the creation of VMs and have them analyze multiple samples in parallel and/or in series.

Once we have CRAM files and gVCFs representing the raw FASTQs (produced by a Dragen alignment run, which this guide does not cover), we can either delete the FASTQs or move their blobs to the Cold tier, provided we are certain we won't need to read them back for at least 1 year. Here is how we can move a set of FASTQs into the Cold tier after alignment.

az storage blob list --container-name fqs --prefix ajtrio/mom --query "[].name" --output tsv | grep '\.fastq\.gz$' | xargs -L1 az storage blob set-tier --container-name fqs --tier cold --name
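
Before changing tiers in bulk, it is worth previewing exactly which blob names the filter selects. Here is a small sketch of that filter as a standalone function. The name select_fastqs is hypothetical, and the dots are escaped so that only names ending in a literal ".fastq.gz" pass through.

```shell
# Read blob names on stdin and print only those ending in .fastq.gz.
select_fastqs() {
    # "|| true" keeps an empty result from being reported as a failure
    grep -E '\.fastq\.gz$' || true
}

# Possible usage: list the blobs that would be moved, without changing anything
#   az storage blob list --container-name fqs --prefix ajtrio/mom --query "[].name" --output tsv | select_fastqs
```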