@arakashic
Last active August 29, 2025
On Setting Intel QuickAssist Accelerator for ZFS

I built a TrueNAS storage server with an Intel QuickAssist accelerator for ZFS back in 2022. It worked well for me. Since I am upgrading it to TrueNAS 25.04, I had to redo a lot of the past work, so I decided to write everything down. This article is not meant to be a comprehensive guide.

Also, the QAT support in ZFS is mostly a research product. It does not get much maintenance, as you can see from the git history.

Introduction

The plan was to build a TrueNAS storage server, but also to use it to host various containers and VMs for my services; essentially an all-in-one server (or all-in-BOOM if the server fails). Partly because my last NAS had a hardware RAID card, and with limited hardware resources in the new server, I was looking for ways to offload some ZFS work from the CPU. I did not find any ZFS-specific accelerator that I could get cheaply on eBay until I came across the QZFS paper at USENIX ATC '19 (https://www.usenix.org/conference/atc19/presentation/hu-xiaokang). It demonstrated a way to use the Intel QuickAssist (QAT) accelerator to handle checksums and compression, and QAT cards are relatively cheap on eBay.

ServeTheHome.com has a very good write-up about the different generations of QAT cards/accelerators (https://www.servethehome.com/intel-quickassist-parts-and-cards-by-qat-generation/). Basically, for ZFS purposes, there are two types that can be used:

  1. Gen 1 cards: Intel 8920/8950. These are 20 Gbps/50 Gbps cards. About $50 back in 2022 when I first attempted this.
  2. Gen 2 cards: Intel 8960/8970 (C620). These are 50 Gbps/100 Gbps cards. They are essentially an Intel C620 chipset on a card, so I guess you could use the C620 on the motherboard if it has QAT enabled. Those were expensive back then, but a few go for around $50 on eBay now.

From Gen 3, QAT is built into the CPU (Ice Lake, etc.). The driver for Gen 3+ hardware does not have kernel API support, so it does not work with the QAT code in ZFS. On the software side, Intel has two QAT drivers, one for HW2.0 and one for HW1.x. Gen 1/2 cards need the HW1.x driver. The latest is at https://www.intel.com/content/www/us/en/download/19734/intel-quickassist-technology-driver-for-linux-hw-version-1-x.html.

QZFS is part of the mainline OpenZFS code, but you need to rebuild the ZFS modules to use it. Here are the steps:

  1. Build and install the Intel QAT driver with --enable-kapi. This builds the QAT driver with the QAT kernel API that QZFS uses. The driver installs out-of-tree kernel modules and firmware for the QAT cards.
  2. Build/rebuild ZFS with the environment variable ICP_ROOT=<path to the QAT driver directory>. This enables the QAT support in ZFS and makes the zfs module depend on the intel_qat module. A consolidated sketch of both steps follows this list.
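
For reference, here is a minimal sketch of the two steps as shell commands. The paths (/opt/intel-qat for the unpacked driver, ~/src/zfs for the ZFS source) are assumptions for illustration, not the exact commands used in the TrueNAS image build.

# 1. Build and install the Intel QAT HW1.x driver with the kernel API enabled.
cd /opt/intel-qat                     # assumed location of the unpacked driver
./configure --enable-kapi
make -j"$(nproc)"
make install                          # installs the intel_qat modules, firmware, and qat.service

# 2. Rebuild ZFS against the QAT kernel API.
cd ~/src/zfs                          # assumed location of the ZFS source tree
./autogen.sh
ICP_ROOT=/opt/intel-qat ./configure
make -j"$(nproc)"
make install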

In fact, you can follow this issue to see how it is done on Debian 11 (openzfs/zfs#12723).

Problem on TrueNAS SCALE

Because TrueNAS SCALE is an appliance and is not designed to be tinkered with like a regular Linux distro, I could not just build the QAT driver and ZFS and replace them on a live system (I did not know about the developer mode back then). Also, it does not have DKMS, so I would need to do a live swap of the ZFS kernel modules. I suspect that would not work because the boot drive is also ZFS; anything can happen if I swap the ZFS module live, I guess.

So I took the route of rebuilding the TrueNAS image with QAT support enabled. There was a suggestion to add QAT support to TrueNAS officially (https://ixsystems.atlassian.net/browse/NAS-107334), but it was not accepted. The JIRA issue has an example of how to do this (truenas/scale-build#41). It creates an intel-qat Debian package and adds it to the TrueNAS build. It also changes the ZFS build configuration so that ZFS builds against intel-qat.

TrueNAS Image with QAT Support in ZFS

Long story short, I just created my own version of the intel-qat package and TrueNAS build script. The current one is based on the latest Intel QAT driver for HW1.x and TrueNAS 25.04.

Just build and install it, and you will have a TrueNAS with QAT support in ZFS (a rough build sketch follows the list below).

  1. The intel-qat driver directory is at /opt/intel-qat. It contains the source code, sample configurations, and the build directory.
  2. The sample configuration files are also available at /usr/share/qat/conf.
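
For reference, here is a rough sketch of the image build, assuming a Debian build host and the usual scale-build targets (check the scale-build README for your release for the exact workflow; the repository URL below is a placeholder for my fork):

git clone <your fork of truenas/scale-build> scale-build   # hypothetical fork with the intel-qat changes
cd scale-build
make checkout       # fetch the TrueNAS sources, including the intel-qat package
make packages       # build the Debian packages (intel-qat, the patched zfs, ...)
make update         # build an update file; use `make iso` for an installer image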

Setting Up Intel QAT Devices

Setting up the Intel QAT device is quite messy, with a few surprises. I had to go through a bunch of Intel documents to figure out how to create the device configurations. Intel QAT is probably the worst device I have ever used. Here are the documents if anyone wants to read further (all from https://www.intel.com/content/www/us/en/developer/topic-technology/open/quick-assist-technology/resources.html).

Issue 1: SR-IOV

The first issue that came up was SR-IOV. Apparently, the QAT device is designed to work either on the host machine with SR-IOV disabled, or on guest VMs with Virtual Functions (VFs) passed through. But it turns out the VFs can be used on the host if they are not passed to any VM. I don't remember where exactly I read about this (probably from VT or somewhere on the Internet), but it turned out to work. To do this, the following steps are needed (a consolidated sketch follows the list).

  1. Build QAT driver with --enable-icp-sriov=host.
  2. Make sure SRIOV_ENABLE=1 is in /etc/default/qat.
  3. Remove the kernel module for your device's VF from /etc/modprobe.d/blacklist-qat-vfs.conf. For Intel 8920/8950, remove qat_dh895xccvf. For Intel 8960/8970, remove qat_c62xvf. The VF drivers are not loaded by default, which makes the VFs unusable to the host.
  4. Create configuration files for the QAT Physical Functions (PFs). Depending on your device, you will need to create /etc/dh895xcc_dev0.conf or /etc/c6xx_dev[0-2].conf (the C620 needs three files). Just copy the content from the sample configuration file dh895xccpf_dev0.conf or c6xxpf_dev0.conf for now.
  5. Create configuration files for the QAT Virtual Functions (VFs). Depending on your device, you will need to create /etc/dh895xccvf_dev0.conf or /etc/c6xxvf_dev0.conf (the C620 needs three files). Just copy the content from the sample configuration file dh895xccvf_dev0.conf.vm or c6xxvf_dev0.conf.vm for now.
  6. Start/restart the QAT service to bring up the devices: systemctl restart qat.service.
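
Here is a consolidated sketch of steps 2-6 for a C620 (8960/8970) card, assuming the driver was built with --enable-icp-sriov=host and the sample configurations are in /usr/share/qat/conf; file names for a dh895xcc card differ as described above.

# 2. Make sure SR-IOV is enabled for the host-side driver.
grep -q '^SRIOV_ENABLE=1' /etc/default/qat || echo 'SRIOV_ENABLE=1' >> /etc/default/qat

# 3. Allow the VF driver to load on the host.
sed -i '/qat_c62xvf/d' /etc/modprobe.d/blacklist-qat-vfs.conf

# 4. PF configuration, one file per PF (the C620 has three).
for i in 0 1 2; do
    cp /usr/share/qat/conf/c6xxpf_dev0.conf /etc/c6xx_dev${i}.conf
done

# 5. VF configuration, starting with a single VF for now.
cp /usr/share/qat/conf/c6xxvf_dev0.conf.vm /etc/c6xxvf_dev0.conf

# 6. Bring the devices up and check their state.
systemctl restart qat.service
adf_ctl status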

Once these steps are done, run adf_ctl status to list all QAT devices. You should see some devices up. In the C620's case, there will be three c6xx devices up (PFs) and one c6xxvf device up (VF). Here is what I see on my machine.

Checking status of all devices.
There is 51 QAT acceleration device(s) in the system:
 qat_dev0 - type: c6xx,  inst_id: 0,  node_id: 0,  bsf: 0000:04:00.0,  #accel: 5 #engines: 10 state: up
 qat_dev1 - type: c6xx,  inst_id: 1,  node_id: 0,  bsf: 0000:06:00.0,  #accel: 5 #engines: 10 state: up
 qat_dev2 - type: c6xx,  inst_id: 2,  node_id: 0,  bsf: 0000:08:00.0,  #accel: 5 #engines: 10 state: up
 qat_dev3 - type: c6xxvf,  inst_id: 0,  node_id: 0,  bsf: 0000:04:01.0,  #accel: 1 #engines: 1 state: up
 qat_dev4 - type: c6xxvf,  inst_id: 1,  node_id: 0,  bsf: 0000:04:01.5,  #accel: 1 #engines: 1 state: down
 qat_dev5 - type: c6xxvf,  inst_id: 2,  node_id: 0,  bsf: 0000:04:01.6,  #accel: 1 #engines: 1 state: down
 qat_dev6 - type: c6xxvf,  inst_id: 3,  node_id: 0,  bsf: 0000:04:01.1,  #accel: 1 #engines: 1 state: down
 qat_dev7 - type: c6xxvf,  inst_id: 4,  node_id: 0,  bsf: 0000:04:01.7,  #accel: 1 #engines: 1 state: down
 qat_dev8 - type: c6xxvf,  inst_id: 5,  node_id: 0,  bsf: 0000:04:01.2,  #accel: 1 #engines: 1 state: down
 qat_dev9 - type: c6xxvf,  inst_id: 6,  node_id: 0,  bsf: 0000:04:02.0,  #accel: 1 #engines: 1 state: down
 qat_dev10 - type: c6xxvf,  inst_id: 7,  node_id: 0,  bsf: 0000:04:01.3,  #accel: 1 #engines: 1 state: down
 qat_dev11 - type: c6xxvf,  inst_id: 8,  node_id: 0,  bsf: 0000:04:02.1,  #accel: 1 #engines: 1 state: down
 qat_dev12 - type: c6xxvf,  inst_id: 9,  node_id: 0,  bsf: 0000:04:01.4,  #accel: 1 #engines: 1 state: down
 qat_dev13 - type: c6xxvf,  inst_id: 10,  node_id: 0,  bsf: 0000:04:02.2,  #accel: 1 #engines: 1 state: down
 qat_dev14 - type: c6xxvf,  inst_id: 11,  node_id: 0,  bsf: 0000:04:02.3,  #accel: 1 #engines: 1 state: down
 qat_dev15 - type: c6xxvf,  inst_id: 12,  node_id: 0,  bsf: 0000:04:02.4,  #accel: 1 #engines: 1 state: down
 qat_dev16 - type: c6xxvf,  inst_id: 13,  node_id: 0,  bsf: 0000:04:02.5,  #accel: 1 #engines: 1 state: down
 qat_dev17 - type: c6xxvf,  inst_id: 14,  node_id: 0,  bsf: 0000:04:02.6,  #accel: 1 #engines: 1 state: down
 qat_dev18 - type: c6xxvf,  inst_id: 15,  node_id: 0,  bsf: 0000:04:02.7,  #accel: 1 #engines: 1 state: down
 qat_dev19 - type: c6xxvf,  inst_id: 16,  node_id: 0,  bsf: 0000:06:01.0,  #accel: 1 #engines: 1 state: down
 qat_dev20 - type: c6xxvf,  inst_id: 17,  node_id: 0,  bsf: 0000:06:01.1,  #accel: 1 #engines: 1 state: down
 qat_dev21 - type: c6xxvf,  inst_id: 18,  node_id: 0,  bsf: 0000:06:01.2,  #accel: 1 #engines: 1 state: down
 qat_dev22 - type: c6xxvf,  inst_id: 19,  node_id: 0,  bsf: 0000:06:01.3,  #accel: 1 #engines: 1 state: down
 qat_dev23 - type: c6xxvf,  inst_id: 20,  node_id: 0,  bsf: 0000:06:01.4,  #accel: 1 #engines: 1 state: down
 qat_dev24 - type: c6xxvf,  inst_id: 21,  node_id: 0,  bsf: 0000:06:01.5,  #accel: 1 #engines: 1 state: down
 qat_dev25 - type: c6xxvf,  inst_id: 22,  node_id: 0,  bsf: 0000:06:01.6,  #accel: 1 #engines: 1 state: down
 qat_dev26 - type: c6xxvf,  inst_id: 23,  node_id: 0,  bsf: 0000:06:01.7,  #accel: 1 #engines: 1 state: down
 qat_dev27 - type: c6xxvf,  inst_id: 24,  node_id: 0,  bsf: 0000:06:02.0,  #accel: 1 #engines: 1 state: down
 qat_dev28 - type: c6xxvf,  inst_id: 25,  node_id: 0,  bsf: 0000:06:02.1,  #accel: 1 #engines: 1 state: down
 qat_dev29 - type: c6xxvf,  inst_id: 26,  node_id: 0,  bsf: 0000:06:02.2,  #accel: 1 #engines: 1 state: down
 qat_dev30 - type: c6xxvf,  inst_id: 27,  node_id: 0,  bsf: 0000:06:02.3,  #accel: 1 #engines: 1 state: down
 qat_dev31 - type: c6xxvf,  inst_id: 28,  node_id: 0,  bsf: 0000:06:02.4,  #accel: 1 #engines: 1 state: down
 qat_dev32 - type: c6xxvf,  inst_id: 29,  node_id: 0,  bsf: 0000:06:02.5,  #accel: 1 #engines: 1 state: down
 qat_dev33 - type: c6xxvf,  inst_id: 30,  node_id: 0,  bsf: 0000:06:02.6,  #accel: 1 #engines: 1 state: down
 qat_dev34 - type: c6xxvf,  inst_id: 31,  node_id: 0,  bsf: 0000:06:02.7,  #accel: 1 #engines: 1 state: down
 qat_dev35 - type: c6xxvf,  inst_id: 32,  node_id: 0,  bsf: 0000:08:01.0,  #accel: 1 #engines: 1 state: down
 qat_dev36 - type: c6xxvf,  inst_id: 33,  node_id: 0,  bsf: 0000:08:01.1,  #accel: 1 #engines: 1 state: down
 qat_dev37 - type: c6xxvf,  inst_id: 34,  node_id: 0,  bsf: 0000:08:01.2,  #accel: 1 #engines: 1 state: down
 qat_dev38 - type: c6xxvf,  inst_id: 35,  node_id: 0,  bsf: 0000:08:01.3,  #accel: 1 #engines: 1 state: down
 qat_dev39 - type: c6xxvf,  inst_id: 36,  node_id: 0,  bsf: 0000:08:01.4,  #accel: 1 #engines: 1 state: down
 qat_dev40 - type: c6xxvf,  inst_id: 37,  node_id: 0,  bsf: 0000:08:01.5,  #accel: 1 #engines: 1 state: down
 qat_dev41 - type: c6xxvf,  inst_id: 38,  node_id: 0,  bsf: 0000:08:01.6,  #accel: 1 #engines: 1 state: down
 qat_dev42 - type: c6xxvf,  inst_id: 39,  node_id: 0,  bsf: 0000:08:01.7,  #accel: 1 #engines: 1 state: down
 qat_dev43 - type: c6xxvf,  inst_id: 40,  node_id: 0,  bsf: 0000:08:02.0,  #accel: 1 #engines: 1 state: down
 qat_dev44 - type: c6xxvf,  inst_id: 41,  node_id: 0,  bsf: 0000:08:02.1,  #accel: 1 #engines: 1 state: down
 qat_dev45 - type: c6xxvf,  inst_id: 42,  node_id: 0,  bsf: 0000:08:02.2,  #accel: 1 #engines: 1 state: down
 qat_dev46 - type: c6xxvf,  inst_id: 43,  node_id: 0,  bsf: 0000:08:02.3,  #accel: 1 #engines: 1 state: down
 qat_dev47 - type: c6xxvf,  inst_id: 44,  node_id: 0,  bsf: 0000:08:02.4,  #accel: 1 #engines: 1 state: down
 qat_dev48 - type: c6xxvf,  inst_id: 45,  node_id: 0,  bsf: 0000:08:02.5,  #accel: 1 #engines: 1 state: down
 qat_dev49 - type: c6xxvf,  inst_id: 46,  node_id: 0,  bsf: 0000:08:02.6,  #accel: 1 #engines: 1 state: down
 qat_dev50 - type: c6xxvf,  inst_id: 47,  node_id: 0,  bsf: 0000:08:02.7,  #accel: 1 #engines: 1 state: down

Now, running the examples from the QAT driver should work on the system.

Issue 2: Configuration Files

The QAT support in ZFS uses the QAT kernel API. It is enabled in the driver, but the configuration files also need some work.

  1. ENABLE_KAPI=1 needs to be set in /etc/default/qat. If the driver is built correctly, this should already have been done.
  2. The VF's configuration file needs to have the [KERNEL_QAT] section, and only this section should have non-zero NumberCyInstances and NumberDcInstances. The sample configurations dh895xccvf_dev0.conf.vm.km and c6xxvf_dev0.conf.vm.km are good references.
  3. Because the KERNEL_QAT instances bind their interrupt handlers to individual cores (set by the CoreAffinity parameter in the file), it is probably a good idea to have multiple VFs that bind to different cores. This can be done by copying the configuration files and updating the core affinity accordingly. You can have as many VFs as adf_ctl lists, and each VF needs its own configuration file (see the sketch after this list).
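
For example, a small hypothetical helper along these lines stamps out one configuration file per VF with a rotated core affinity. It assumes /etc/c6xxvf_dev0.conf is a single-instance (1 Cy + 1 Dc) template with CoreAffinity = 0, as shown in the comments below.

NCORES=56                              # assumption: core count of this machine
for i in $(seq 1 47); do               # 48 VFs on a C620; dev0 keeps the template (core 0)
    core=$(( i % NCORES ))
    sed "s/CoreAffinity = 0$/CoreAffinity = ${core}/" /etc/c6xxvf_dev0.conf \
        > /etc/c6xxvf_dev${i}.conf
done
systemctl restart qat.service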

Normally, it would end here. But I decided to customize the configuration a bit further, which led me to a few quirks in the QAT configuration.

According to the documentation, the maximum number of instances each VF can have depends on which services are enabled in the ServicesEnabled parameter. By default, it is set to cy;dc, which means all crypto and compression services are enabled. In this case, each VF can only have 1 Cy instance and 1 Dc instance, because the symmetric and asymmetric crypto services need separate resources (Sec 4.3.3.2 of the PG). Since I only use QAT for ZFS, the asymmetric crypto service is useless to me. If I enable only the sym and dc services, I should be able to get 2 Cy and 2 Dc instances per VF. I want this because I want at least 1 Cy and 1 Dc instance for each of the 56 cores, but there are only 48 VFs, so 2 Cy and 2 Dc per VF would be perfect.

For a C620 to have 2 Cy and 2 Dc instances per VF, the VF configuration needs to look like this.

[GENERAL]
ServicesEnabled = dc;sym

ConfigVersion = 2

# Default values for number of concurrent requests
CyNumConcurrentSymRequests = 512
CyNumConcurrentAsymRequests = 64

# Statistics, valid values: 1,0
statsGeneral = 1
statsDh = 0
statsDrbg = 0
statsDsa = 0
statsEcc = 0
statsKeyGen = 0
statsDc = 1
statsLn = 0
statsPrime = 0
statsRsa = 0
statsSym = 1

##############################################
# Kernel Instances Section
##############################################
[KERNEL]
NumberCyInstances = 0
NumberDcInstances = 0

#############################################
# Kernel Instances Section for QAT API
#############################################
[KERNEL_QAT]
NumberCyInstances = 2
NumberDcInstances = 2

# Crypto - Kernel instance #0
Cy0Name = "SSL0"
Cy0IsPolled = 1
# List of core affinities
Cy0CoreAffinity = 0

# Data Compression - Kernel instance #0
Dc0Name = "Dc0"
Dc0IsPolled = 1
# List of core affinities
Dc0CoreAffinity = 0

# Crypto - Kernel instance #1
Cy1Name = "SSL1"
Cy1IsPolled = 1
# List of core affinities
Cy1CoreAffinity = 1

# Data Compression - Kernel instance #1
Dc1Name = "Dc1"
Dc1IsPolled = 1
# List of core affinities
Dc1CoreAffinity = 1

##############################################
# User Process Instance Section
##############################################
[SSL]
NumberCyInstances = 0
NumberDcInstances = 0
NumProcesses = 1
LimitDevAccess = 0

And the PF configuration looks like this.

[GENERAL]
ServicesEnabled = dc;sym

ServicesProfile = DEFAULT

ConfigVersion = 2

#Default values for number of concurrent requests*/
CyNumConcurrentSymRequests = 512
CyNumConcurrentAsymRequests = 64

#Statistics, valid values: 1,0
statsGeneral = 1
statsDh = 1
statsDrbg = 1
statsDsa = 1
statsEcc = 1
statsKeyGen = 1
statsDc = 1
statsLn = 1
statsPrime = 1
statsRsa = 1
statsSym = 1


# Specify size of intermediate buffers for which to
# allocate on-chip buffers. Legal values are 32 and
# 64 (default is 64). Specify 32 to optimize for
# compressing buffers <=32KB in size.
DcIntermediateBufferSizeInKB = 64

##############################################
# Kernel Instances Section
##############################################
[KERNEL]
NumberCyInstances = 0
NumberDcInstances = 0

##############################################
# User Process Instance Section
##############################################
[SSL]
NumberCyInstances = 0
NumberDcInstances = 0
NumProcesses = 1
LimitDevAccess = 0

Issue 3: Polling Mode vs Interrupt Mode

This one is fairly simple. A QAT kernel API instance can work in either polling mode or interrupt mode, controlled by the IsPolled parameter. ZFS needs interrupt mode, so the configuration needs IsPolled = 0. But this breaks the 2 Cy + 2 Dc per VF configuration: according to the PG, the increased instance count only works with instances in polling mode.

Enabling QAT in ZFS

QAT is disabled by default in ZFS. Update the module parameters under /sys:

# cat /sys/module/zfs/parameters/zfs_qat_checksum_disable 
0
# cat /sys/module/zfs/parameters/zfs_qat_compress_disable 
0
# cat /sys/module/zfs/parameters/zfs_qat_encrypt_disable 
0

These values are 1 by default. They need to be 0 for ZFS to use QAT. You can set one or more of them depending on what you need. I use init/shutdown scripts in TrueNAS to set/unset them, so I don't have to worry about them being set before qat.service is started. A sketch of such a script is below.
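
A minimal sketch of such a post-init script, assuming it is registered in the TrueNAS UI to run after qat.service is up (the parameter names are the real ZFS module parameters; the script itself is not part of TrueNAS):

#!/bin/sh
# Enable QAT offload in ZFS by clearing the three disable switches.
for p in zfs_qat_checksum_disable zfs_qat_compress_disable zfs_qat_encrypt_disable; do
    echo 0 > /sys/module/zfs/parameters/${p}
done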

Then, the following kstat should show QAT usage.

# cat /proc/spl/kstat/zfs/qat

A Few More Notes

Some notes so I don't have to look at the documents every time.

  1. Using VFs for both the kernel API and userspace applications. You can do this, just not with the same VF. A VF can be used either for the kernel API or for userspace, so you need to decide how many are used for each purpose and configure them accordingly. I have no comment on how to set this up, but it should be simple following the documentation (I hope, but Intel did a pretty poor job here).
  2. Available resources, summarized here: a dh895xcc device (8920/8950) has one PF with 32 VFs. A C620 device (8960/8970) has three PFs with 16 VFs each, for a total of 48 VFs.
  3. The VFs are just queues in the hardware. All VFs of a dh895xcc device share the same underlying accelerator hardware. The C620 is three accelerators connected by an internal PCIe switch, so VFs from the same PF share hardware, while VFs from different PFs are actually on different devices.
  4. The CoreAffinity parameter is only effective when IsPolled=1. The interrupt handler of one instance can only be bound to one core.
  5. /sys/kernel/debug/qat_* has a bunch of useful stats and info (see the sketch after this list).
  6. The kernel log shows a bunch of "There are 1 requests pending" messages and the QAT device cannot be stopped. This is due to a bug in the QAT code in ZFS. Basically, it looks like that when an interrupt callback is triggered, the request may still be in flight in the session. In this case, trying to remove the session results in the aforementioned message and a "retry" return value. But the QAT code in ZFS never checks the return value and retries, therefore leaving some sessions in the device that are never removed. The Intel guide suggests checking whether there is an in-flight request before removing the session. I have a patch here: https://gist.github.com/arakashic/3de54ecf8f355f315e45632ea964fef0 . Since this issue has never been fixed in the upstream ZFS code, I suspect that I might be the only user who uses QAT with ZFS regularly.
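
For example, the per-engine firmware counters of the first PF (0000:04:00.0 on this machine, per the adf_ctl output above) can be read roughly like this; the exact debugfs entry names depend on the driver version, so treat them as assumptions.

# List the QAT debugfs directories (one per PF/VF, named by PCI address).
ls -d /sys/kernel/debug/qat_*

# Per-engine request/response counters of the first PF.
cat /sys/kernel/debug/qat_c6xx_0000:04:00.0/fw_counters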
@arakashic (Author) commented:

My vf configuration is as follows:

# cat /etc/c6xxvf_dev0.conf 
[GENERAL]
ServicesEnabled = dc;sym

ConfigVersion = 2

# Default values for number of concurrent requests
CyNumConcurrentSymRequests = 512
CyNumConcurrentAsymRequests = 64

# Statistics, valid values: 1,0
statsGeneral = 1
statsDh = 0
statsDrbg = 0
statsDsa = 0
statsEcc = 0
statsKeyGen = 0
statsDc = 1
statsLn = 0
statsPrime = 0
statsRsa = 0
statsSym = 1

##############################################
# Kernel Instances Section
##############################################
[KERNEL]
NumberCyInstances = 0
NumberDcInstances = 0

#############################################
# Kernel Instances Section for QAT API
#############################################
[KERNEL_QAT]
NumberCyInstances = 2
NumberDcInstances = 2

# Crypto - Kernel instance #0
Cy0Name = "SSL0"
Cy0IsPolled = 1
# List of core affinities
Cy0CoreAffinity = 0

# Data Compression - Kernel instance #0
Dc0Name = "Dc0"
Dc0IsPolled = 1
# List of core affinities
Dc0CoreAffinity = 0

# Crypto - Kernel instance #1
Cy1Name = "SSL1"
Cy1IsPolled = 1
# List of core affinities
Cy1CoreAffinity = 1

# Data Compression - Kernel instance #1
Dc1Name = "Dc1"
Dc1IsPolled = 1
# List of core affinities
Dc1CoreAffinity = 1

##############################################
# User Process Instance Section
##############################################
[SSL]
NumberCyInstances = 0
NumberDcInstances = 0
NumProcesses = 1
LimitDevAccess = 0

For ZFS, the VF configuration needs some changes: the [KERNEL_QAT] section needs to look like this.

[KERNEL_QAT]
NumberCyInstances = 1
NumberDcInstances = 1

# Crypto - Kernel instance #0
Cy0Name = "SSL0"
Cy0IsPolled = 0
# List of core affinities
Cy0CoreAffinity = 0

# Data Compression - Kernel instance #0
Dc0Name = "Dc0"
Dc0IsPolled = 0
# List of core affinities
Dc0CoreAffinity = 0

ZFS needs IsPolled set to 0 to use the interrupt mode. This also means you can only have 1 Cy and 1 Dc per VF.

The following should be set to 0.

/sys/module/zfs/parameters/zfs_qat_checksum_disable 
/sys/module/zfs/parameters/zfs_qat_compress_disable 
/sys/module/zfs/parameters/zfs_qat_encrypt_disable

@crp2103 commented May 4, 2025

First, thanks a bunch for your writeup. It was very useful for me as I tried to bring up QAT on my own system.

I wanted to comment on this:

I need to do this because I want to have at least 1 Cy and 1 Dc instances for each of the 56 cores. But the number of VFs is only 48. So I 2 Cy and 2 Dc per VF would be perfect.

Your goal of 1 Dc instance (i.e. 1 "ring pair") per core won't quite have the benefits you think. Each actual QAT endpoint (i.e. each QAT PF) has a fixed number of underlying HW engines, which are what actually do the QAT acceleration offload. You'll see these listed as "#engines" here - i.e. your system has 30 total QAT engines:

qat_dev0 - type: c6xx, inst_id: 0, node_id: 0, bsf: 0000:04:00.0, #accel: 5 #engines: 10 state: up
qat_dev1 - type: c6xx, inst_id: 1, node_id: 0, bsf: 0000:06:00.0, #accel: 5 #engines: 10 state: up
qat_dev2 - type: c6xx, inst_id: 2, node_id: 0, bsf: 0000:08:00.0, #accel: 5 #engines: 10 state: up

The endpoint FW takes care of scheduling incoming requests from the ring pairs onto the underlying HW engines - i.e. the engines are context switching between the actual rings. As such, you can't have more than 30 ring pairs/instances actively being processed at any one time. So, if all 56 of your cores simultaneously wanted to do QAT activity, 26 of them would be stuck waiting. Nothing you do about allocating more ring pairs can change this.

I think it's really just a matter of whether you want your extra cores to be stuck waiting on the same ring pair or on different ring pairs, but I suspect the performance differences between those will be trivially small. Further, I suspect this is unlikely to matter, as it seems quite unlikely that all 56 cores will be trying to do compression at the same time.

@arakashic (Author) replied:

That is correct, the instances are just queues that get mapped to the underlying accelerator hardware.

For my use case with ZFS, this would not end up with some cores blocking, since it runs in interrupt mode. Having more instances allows one interrupt handler to be bound to each core. Then ZFS could issue requests in a locality-aware fashion, especially for inter-NUMA cases. However, the QAT code in ZFS does not do this yet; it only does round-robin when picking an instance.

The irony with the QAT device is that you cannot have more than one Cy and one Dc per VF in interrupt mode.
