Rapthor + HTCondor/GlideinWMS: Architecture, Late-Binding, and Integration Options

Technical analysis for radio astronomers and SKA/SRCNet engineers Date: May 2026

1. Background: The Two Systems

1.1 Rapthor

Rapthor is the LOFAR direction-dependent effects (DDE) calibration and imaging pipeline, developed at ASTRON and adopted as the basis of the SKAO ICAL pipeline. It automates the iterative self-calibration loop:

[Input MSes] → calibrate (DP3/DDECal) → image (WSClean+IDG) → extract sources → loop

Each pipeline operation (calibrate, image, subtract, mosaic, source-find) is implemented as a CWL (Common Workflow Language) workflow. Toil acts as the CWL execution engine and provides the pluggable batch-system layer. The supported backends are:

--batchSystem=single_machine — local development/testing
--batchSystem=slurm — HPC cluster, Toil leader submits Slurm jobs
--batchSystem=kubernetes — cloud/containerised clusters
--batchSystem=htcondor — HTCondor pool (community-supported)

Toil owns the full job DAG: it parses CWL, resolves step dependencies, marshals intermediate data, and issues individual compute tasks to the chosen batch backend. Rapthor's Python orchestration code drives the outer self-calibration loop, calling Toil repeatedly per operation.

Compute profile of Rapthor's job types:

Operation step	Primary tool	Resource profile
Direction-dependent calibration	DP3 / DDECal	CPU-heavy (many cores), high RAM (tens of GB per facet), I/O-heavy (MS read)
Wide-field imaging	WSClean + IDG	CPU or GPU (IDG-GPU mode); very high RAM (100+ GB for large fields); I/O-heavy
Beam application / subtraction	DP3	CPU-moderate, I/O-heavy
Source finding	PyBDSF / similar	CPU-light to moderate, memory-moderate
Mosaicking	custom	I/O-dominated, CPU-light

The mix is heterogeneous: some steps want GPUs (WSClean+IDG), most want many CPU cores and a lot of RAM, and several are gated by I/O bandwidth from measurement sets that can be hundreds of gigabytes per pointing.

1.2 HTCondor and GlideinWMS

HTCondor is the reference implementation of High Throughput Computing (HTC), developed at the University of Wisconsin-Madison since 1988. Unlike traditional HPC batch systems (Slurm, PBS) that assume a static cluster, HTCondor is designed for dynamic, heterogeneous resource pools where nodes come and go.

GlideinWMS (Glidein-based Workflow Management System) is a meta-scheduler built on top of HTCondor, developed by Fermilab and used by CMS, ATLAS, and the Open Science Grid (OSG). It implements the pilot job (Glidein) model for true late-binding across federated resources.

Core HTCondor/GlideinWMS components relevant here:

Component	Role
Central Manager	Matchmaker daemon; receives job and machine ClassAds, performs matchmaking
Schedd	Scheduler daemon; holds job queue, negotiates with central manager for slots
Startd	Execution daemon on worker nodes; advertises slot capabilities, runs jobs
GlideinWMS Factory	Submits pilot jobs (glideins) to remote batch systems on behalf of users
GlideinWMS Frontend	Monitors user job queues, requests glideins from factory based on demand
HTCondor-CE	Compute Entrypoint; accepts remote job submissions, translates to local batch system
ClassAds	Flexible key-value expression language for describing jobs, machines, and matchmaking

2. HTCondor Late Binding: A Deep Dive

Late binding is a defining feature of HTCondor's glidein model. Understanding it precisely is essential before mapping Rapthor onto it.

2.1 The Problem Late Binding Solves

In early binding (classical batch submission), a job is submitted to a specific queue at a specific site. If that site is congested, the job waits. If the site fails, the job is lost. The job is irrevocably coupled to a slot at submission time.

In HTCondor's late binding via GlideinWMS, the work and the resource are matched only when the resource is actually available — when a glidein starts executing and joins the HTCondor pool.

2.2 The Glidein Mechanism

The canonical GlideinWMS flow is:

1. User submits jobs to a local HTCondor Schedd (user pool)
   — jobs specify requirements via ClassAds but are NOT bound to any site

2. Frontend monitors the user pool: "There are N pending jobs with requirements X"

3. Frontend requests glideins from Factory: "Please submit pilots to sites A, B, C
   that can satisfy requirements X"

4. Factory submits glidein jobs to HTCondor-CEs at multiple sites
   — glideins are submitted to local batch systems (Slurm, PBS, K8s)
   — glideins carry NO payload; they just request a slot

5. When a glidein starts on a worker node, it:
   a. Contacts the user pool's Central Manager
   b. Advertises its capabilities as a ClassAd (CPU, RAM, disk, GPU, etc.)
   c. Requests a job that matches its capabilities

6. Central Manager performs matchmaking: job ClassAd ↔ slot ClassAd

7. Job is dispatched to the glidein; glidein executes payload

8. Glidein can request another job (slot reuse) or exit

The binding between job and compute slot happens at step 6, after the slot is already running — hence "late binding."

2.3 Why This Differs from PanDA

Both GlideinWMS and PanDA use pilot jobs, but their architectures differ:

Aspect	GlideinWMS	PanDA
Central queue	HTCondor Schedd (job ClassAds)	PanDA Server (job database)
Matchmaking	Condor Central Manager (ClassAd expressions)	JEDI brokerage (plugin architecture)
Pilot submission	Factory → HTCondor-CE → local batch	Harvester → local batch
Job pull	Startd negotiates with Central Manager	Pilot polls PanDA Server REST API
Data locality	Via ClassAd requirements/rank	Via Rucio integration in JEDI

The key difference: HTCondor's matchmaking is expression-based (ClassAds), while PanDA's is database-driven with explicit Rucio integration. HTCondor can express data locality via ClassAds, but this requires explicit infrastructure to populate machine ClassAds with data availability information.

2.4 ClassAd Matchmaking for Data-Aware Scheduling

HTCondor's ClassAd mechanism allows flexible job requirements and ranking:

Requirements (hard constraints):

requirements = (TARGET.HasGPU == True) && (TARGET.Memory >= 64000)

Rank (soft preferences):

rank = (TARGET.HasLocalData ? 100 : 0) + (TARGET.Memory / 1000)

For data-aware scheduling, one could imagine:

# Job requires data at a specific site
requirements = member("dataset_1234", TARGET.AvailableDatasets)

# Prefer sites with more data locally
rank = size(filter(lambda ds: member(ds, TARGET.AvailableDatasets), MY.RequiredDatasets))

This requires:

A service that populates worker/site ClassAds with data availability
Jobs that specify their data requirements as ClassAd attributes
Periodic refresh of data availability ClassAds

Pablo's proposal mentions a "Data Locality Awareness Metadata Hydrator" — this is exactly the missing component.

2.5 Why Late Binding Matters for SRCNet

In a federated system like SRCNet — where compute nodes are at different institutions across multiple countries with heterogeneous network links, storage backends, and queue policies — late binding provides:

Resilience: if a site becomes unavailable after glidein submission, another site's glideins can pick up the work
Opportunism: idle capacity at any site is automatically exploited
Load balancing: jobs naturally flow to sites with available slots
Heterogeneous resource routing: GPU vs. CPU decisions made at match time based on actual availability
Queue absorption: glideins wait in site queues, but jobs remain unbound until execution

3. Existing Toil + HTCondor Integration

3.1 Current State

Toil already includes an HTCondor batch system backend (--batchSystem htcondor). The HTCondorBatchSystem class:

class HTCondorBatchSystem(AbstractGridEngineBatchSystem):
    """
    Submits jobs to a local HTCondor pool via condor_submit.
    """
    
    def issueBatchJob(self, command, jobNode, job_environment=None):
        # Creates HTCondor submit description
        # Submits via Schedd API
        # Returns job ID
    
    def getRunningBatchJobIDs(self):
        # Queries condor_q for job status
    
    def killBatchJobs(self, jobIDs):
        # Calls condor_rm

This is a direct submission model: Toil leader submits jobs directly to a local HTCondor Schedd. It does NOT use GlideinWMS or HTCondor-CE — it assumes access to an existing HTCondor pool.

3.2 Limitations of Current Integration

The existing Toil HTCondor backend has several limitations:

Local pool only: Assumes direct Schedd access; no federation
No late binding: Jobs are submitted directly, not via glideins
Community-supported: Not actively maintained by Toil core team
Known issues: Environment variable handling bugs (issue #4363), Schedd busy handling
No data locality: No mechanism to express or match data requirements

For SRCNet's federated model, this is insufficient. We need either:

A GlideinWMS-aware batch system that submits to a glidein-backed pool, or
A new backend that submits via HTCondor-CE to remote sites

4. Mapping Rapthor's Job Graph onto HTCondor Concepts

4.1 Conceptual Correspondence

Rapthor / Toil concept	HTCondor / GlideinWMS concept
CWL `Workflow`	DAGMan workflow (or external orchestrator)
CWL `CommandLineTool` step	HTCondor job cluster
Individual Toil `BatchJob` (one scatter element)	HTCondor job (one proc in a cluster)
Toil batch backend	HTCondorBatchSystem (to Schedd)
Toil job store (file-based checkpoint)	No equivalent — would use shared filesystem or HTCondor file transfer
Toil leader process (DAG orchestration)	DAGMan or external (Toil stays)
Toil `--batchSystem` plugin	HTCondor submit + ClassAd configuration
CWL `ResourceRequirement` hints	ClassAd `request_cpus`, `request_memory`, `request_disk`, `request_gpus`
Toil scatter (parallel over facets)	HTCondor job cluster with N procs

4.2 The Self-Calibration Loop

Rapthor's outer loop (calibrate → image → extract → re-calibrate) is driven by Python code in Rapthor's Pipeline class. HTCondor's DAGMan supports conditional branching via SCRIPT PRE/POST and RETRY, but expressing convergence-based loops requires external logic.

The most natural mapping keeps Toil (or Rapthor's Python) as the outer orchestrator, submitting individual CWL operations to HTCondor. This is similar to Option A in the PanDA analysis.

4.3 The Scatter Pattern

Rapthor processes each facet or calibration direction independently. In CWL this is a scatter over direction items. In HTCondor, this maps to a job cluster:

# Submit 100 parallel jobs for 100 directions
queue 100

Each job in the cluster gets a unique $(Process) ID (0-99). GlideinWMS glideins at multiple sites can execute these jobs in parallel, with late binding ensuring jobs run where slots become available.

5. Integration Architectures: Three Options

Option A — Enhanced Toil-HTCondor Plugin (Toil Stays; Local HTCondor Pool)

Architecture:

Rapthor (Python orchestrator)
  └── Toil CWL runner (leader process, DAG management)
        └── HTCondorBatchSystem (existing, enhanced)
              └── Local HTCondor Schedd
                    └── GlideinWMS glideins from multiple SRC sites

How it works:

Deploy a central HTCondor pool (Schedd + Central Manager) accessible from SRCNet
Deploy GlideinWMS Factory + Frontend that provisions glideins at SRC sites via HTCondor-CE
Enhance Toil's existing HTCondorBatchSystem to:

Set data-locality ClassAd attributes on jobs
Configure resource requirements from CWL hints

Glideins from multiple sites join the central pool
Jobs are matched to glideins based on ClassAd expressions

Data staging: Toil's job store must be accessible from all SRC sites. Options:

Globally-mounted object store (Ceph/S3)
HTCondor file transfer (for small files)
External data management (Rucio or DMAPI) with ClassAd-based locality matching

Pros:

Minimal changes to Rapthor codebase
Uses existing Toil HTCondor backend (with enhancements)
Preserves Toil's CWL DAG management and checkpoint/restart
True late binding via GlideinWMS
Leverages HTCondor's mature matchmaking
Single central job queue simplifies monitoring

Cons:

Requires deploying GlideinWMS infrastructure (Factory, Frontend, per-site HTCondor-CEs)
Central HTCondor pool is a single point of failure
Data locality requires custom "hydrator" service to populate machine ClassAds
Toil leader must remain alive for pipeline duration
HTCondor backend is "community supported" with known issues

Verdict: This is the most direct path if SRCNet decides to adopt HTCondor. The infrastructure investment is significant but well-understood (OSG has run this at scale for 15+ years).

Option B — Toil + HTCondor-CE Direct Submission (Per-Site Late Binding)

Architecture:

Rapthor (Python orchestrator)
  └── Toil CWL runner
        └── [NEW] HTCondorCEBatchSystem
              └── Multiple HTCondor-CE endpoints (per SRC site)
                    └── Local batch systems (Slurm, K8s, etc.)

How it works:

Each SRC site deploys HTCondor-CE as a gateway to its local batch system
New Toil batch system plugin submits jobs directly to HTCondor-CE endpoints
Site selection logic (brokerage) lives in the plugin or a separate service
HTCondor-CE translates job to local batch system (Slurm, K8s, etc.)

This is not true GlideinWMS late binding — it's more like the current TIGER Broker/Gateway model but using HTCondor-CE as the gateway protocol.

Pros:

Simpler than full GlideinWMS deployment
HTCondor-CE is already used by 170+ sites in WLCG
SciTokens authentication supported
No central HTCondor pool required

Cons:

Not true late binding — job is bound to site at submission time
Requires implementing brokerage/site-selection logic (the same problem as TIGER)
Each site needs HTCondor-CE deployment
Less flexible than GlideinWMS for opportunistic resource use

Verdict: This provides a standardized gateway protocol but doesn't solve the late-binding problem. It's an alternative to TIGER's custom Job Gateway, not a fundamentally different scheduling model.

Option C — Replace Broker/Gateway with GlideinWMS (Full HTCondor Stack)

Architecture:

User / Client
  └── HTCondor Schedd (central job queue)
        └── Central Manager (matchmaking)
              └── GlideinWMS Factory
                    └── HTCondor-CE at each SRC site
                          └── Local batch (Slurm, K8s)
                                └── Glideins (workers)
                                      └── Execute Rapthor CWL steps

How it works:

Replace TIGER Broker entirely with HTCondor Schedd + Central Manager
Replace Job Gateways with HTCondor-CE + glidein infrastructure
Submit Rapthor CWL operations as HTCondor DAGMan workflows or via Toil
GlideinWMS provisions workers across all SRC sites
ClassAd matchmaking handles resource and data locality

This is the "full HTCondor" approach — using HTCondor as the unified job management layer.

Pros:

True slot-level late binding across federation
Battle-tested at scale (CMS processes millions of jobs/day)
Rich matchmaking via ClassAds
Large community, good documentation, annual conference
No custom brokerage code to maintain
Handles heterogeneous resources (CPU, GPU) natively

Cons:

Significant infrastructure investment (full GlideinWMS stack)
Requires HTCondor-CE at every SRC site
Data locality requires custom ClassAd hydration (no native Rucio integration)
Learning curve for operators unfamiliar with HTCondor
Doesn't leverage existing SRCNet investment in PanDA/Harvester

Data locality implementation:

Unlike PanDA (which integrates with Rucio), HTCondor has no native data management. Options:

ClassAd hydrator service: Periodically query DMAPI/Rucio, update machine ClassAds with available datasets
Job-side data staging: Jobs use DMAPI to stage data before execution (early binding of data, late binding of slot)
Storage-coupled workers: Glideins only run at sites with local data access (constrain factory submissions)

Verdict: This is a major architectural change that replaces rather than enhances the current SRCNet approach. It would work well technically but represents a strategic pivot away from PanDA.

6. Late Binding: Specific Advantages for Rapthor's Workload

6.1 Heterogeneous Resource Routing (CPU vs. GPU)

Rapthor's WSClean imaging step can run in CPU-only, idg (CPU), or idg-gpu (GPU) modes. With HTCondor:

# Job ClassAd
+WantsGPU = True
request_gpus = 1
rank = TARGET.HasGPU ? 1000 : 0

# Falls back to CPU if no GPU slots available
requirements = (TARGET.HasGPU == True) || (TARGET.Cpus >= 16 && TARGET.Memory >= 64000)

The matchmaker routes to GPU slots when available, falls back to CPU otherwise — without user intervention.

6.2 Data Locality Across SRCNet Nodes

Assuming a ClassAd hydrator populates machine ads with dataset availability:

# Machine ClassAd (populated by hydrator)
AvailableDatasets = {"obs_1234", "obs_5678", "skymodel_v3"}
SiteLocation = "SRCNET_UK"

# Job ClassAd
+RequiredDatasets = {"obs_1234", "skymodel_v3"}
requirements = subset(MY.RequiredDatasets, TARGET.AvailableDatasets)
rank = size(intersection(MY.RequiredDatasets, TARGET.AvailableDatasets))

Jobs preferentially run at sites with local data, but can run elsewhere if needed (triggering data staging).

6.3 HPC Queue Latency Absorption

GlideinWMS glideins wait in site batch queues. When they start, they immediately join the HTCondor pool and request work. This absorbs queue latency:

Factory submits 100 glideins across 5 sites
Glideins wait in local queues (minutes to hours depending on site)
As glideins start, they pull jobs from the central queue
Sites with faster queues naturally process more jobs
No job is "trapped" in a slow queue — work stays in the central pool until a glidein is ready

6.4 Multi-Site Parallelism

A Rapthor scatter over 100 facets creates 100 HTCondor jobs. With glideins at 5 sites:

20 glideins run at SRCNET_UK (fast queue)
15 at SRCNET_NL (medium queue)
10 at SRCNET_AU (slow queue)
Jobs flow to available slots as glideins start
Natural load balancing across sites

6.5 Resilience and Retry

If a glidein crashes or a site goes offline:

Running jobs on that glidein are marked failed
HTCondor automatically requeues them (configurable retry policy)
Another glidein (possibly at a different site) picks them up

No manual intervention required for transient failures.

7. Comparison: HTCondor/GlideinWMS vs. PanDA vs. TIGER

Feature	TIGER (current)	PanDA	HTCondor/GlideinWMS
Late binding level	Site (Gateway claims job)	Slot (pilot pulls job)	Slot (glidein pulls job)
Central queue	RabbitMQ	PanDA Server DB	HTCondor Schedd
Matchmaking	Simple (spread across sites)	JEDI plugins (data-aware)	ClassAd expressions
Data locality	Future (DMAPI integration)	Native (Rucio)	Custom (ClassAd hydrator)
Site gateway	Job Gateway	Harvester	HTCondor-CE
Pilot/worker	Toil batch runner	PanDA Pilot	Glidein (startd)
Workflow DAG	Toil	iDDS	DAGMan or external
GPU routing	Manual	Native	Native (ClassAds)
Community size	Small (in-house)	Medium (HEP)	Large (OSG, HEP, HTC)
Complexity	Low	High	Medium
SRCNet investment	Current focus	Already planned	Would be new

8. Gotchas and Open Problems

8.1 Toil Job Store Accessibility

Toil uses a job store for intermediate files between CWL steps. With HTCondor distributing jobs across sites:

Shared filesystem: All sites mount the same GPFS/Lustre/CephFS — simplest but requires infrastructure
Object store: Toil's S3 job store, accessible from all sites — adds latency for small files
HTCondor file transfer: Transfer input/output files with each job — works for small files, problematic for large MS data

For Rapthor's large intermediate products (GB-scale measurement sets), option 1 or 2 is necessary.

8.2 Container Image Distribution

Rapthor requires DP3, WSClean, EveryBeam, IDG — a complex ~10GB image. Options:

CVMFS: Distribute via CernVM-FS (used by LHC experiments)
Container registry: Per-site Harbor/Docker registry with pull-through caching
Singularity/Apptainer: Convert to SIF, stage to shared storage

HTCondor supports container_image in job ClassAds, with configurable execution via Singularity/Apptainer.

8.3 No Native Data Management Integration

Unlike PanDA (which integrates with Rucio), HTCondor has no built-in data management. The "ClassAd hydrator" for data locality is a custom component that must:

Query DMAPI/Rucio for dataset locations
Update machine ClassAds periodically
Handle replication-in-progress states

This is significant engineering — roughly equivalent to the "data-aware scheduling" goal in Michele's proposal, but implemented differently.

8.4 GlideinWMS Operational Complexity

Deploying and operating GlideinWMS requires:

Central HTCondor pool (Schedd + Central Manager + Collector)
GlideinWMS Factory (submits glideins, manages credentials)
GlideinWMS Frontend (monitors demand, requests glideins)
HTCondor-CE at each site (translates to local batch)
SciTokens/IAM integration for authentication

This is well-documented (OSG provides detailed guides), but it's not trivial. The OSG/CMS operations teams have decades of experience; SRCNet would be starting fresh.

8.5 HTCondor Community Support in Toil

Toil's HTCondor backend is "community supported" — the core Toil team at UCSC doesn't actively maintain it. Known issues include:

Environment variable handling bugs
Schedd busy handling
Limited testing in CI

Any SRCNet deployment would likely need to enhance and maintain this code.

9. Recommended Path Forward

For an SRCNet deployment, given the context of existing TIGER work and planned PanDA investment:

If HTCondor is seriously considered (Pablo's proposal):

Stage 1 (3-6 months): Proof of Concept

Deploy minimal HTCondor pool (single Schedd + Central Manager) in SRCNet dev environment
Use existing Toil HTCondor backend to run Rapthor calibration operation
No GlideinWMS yet — just validate HTCondor job execution
Document gaps in Toil backend, estimate enhancement effort

Stage 2 (6-12 months): Single-Site Late Binding

Deploy HTCondor-CE at one SRC site
Deploy GlideinWMS Factory + Frontend
Glideins join central pool from one site
Validate late-binding behavior under queue pressure

Stage 3 (12-18 months): Multi-Site + Data Locality

Add HTCondor-CE at additional sites
Implement ClassAd hydrator for data locality (DMAPI integration)
Test data-aware job placement
Compare performance/complexity with PanDA approach

If PanDA remains the primary path:

HTCondor evaluation becomes a parallel spike to validate Pablo's hypothesis:

Small team (0.2 FTE) explores HTCondor proof of concept
Results inform tool assessment decision
Avoid committing significant resources until comparative data exists

10. Summary

HTCondor/GlideinWMS is a legitimate alternative to PanDA for SRCNet federated job execution. Its strengths:

True slot-level late binding via glidein model
Mature, battle-tested (25+ years, millions of jobs/day at CMS)
Large community with active development and support
Flexible matchmaking via ClassAd expressions
Existing Toil integration (though community-supported)

Its weaknesses for SRCNet:

No native data management — requires custom ClassAd hydrator
Significant infrastructure investment — full GlideinWMS stack
Diverges from existing SRCNet plans — PanDA/Harvester already documented
Operational learning curve — SRCNet team would need to build expertise

The most practical near-term integration path is enhancing Toil's HTCondor backend to work with a GlideinWMS-backed pool, while implementing a data-locality ClassAd hydrator that integrates with DMAPI. This preserves Rapthor's CWL/Toil architecture while gaining HTCondor's scheduling capabilities.

Whether this is preferable to the Toil-PanDA plugin approach depends on:

SRCNet's strategic commitment to PanDA vs. openness to alternatives
Relative complexity of ClassAd hydrator vs. PanDA/Rucio integration
Availability of HTCondor operational expertise in the SRCNet community

Pablo's proposal to "do due diligence" by exploring HTCondor is reasonable — a controlled proof-of-concept would provide empirical data for the tool assessment decision.

d3v-null/rapthor_toil_htcondor_glidein.md

Rapthor + HTCondor/GlideinWMS: Architecture, Late-Binding, and Integration Options

1. Background: The Two Systems

1.1 Rapthor

1.2 HTCondor and GlideinWMS

2. HTCondor Late Binding: A Deep Dive

2.1 The Problem Late Binding Solves

2.2 The Glidein Mechanism

2.3 Why This Differs from PanDA

2.4 ClassAd Matchmaking for Data-Aware Scheduling

2.5 Why Late Binding Matters for SRCNet

3. Existing Toil + HTCondor Integration

3.1 Current State

3.2 Limitations of Current Integration

4. Mapping Rapthor's Job Graph onto HTCondor Concepts

4.1 Conceptual Correspondence

4.2 The Self-Calibration Loop

4.3 The Scatter Pattern

5. Integration Architectures: Three Options

Option A — Enhanced Toil-HTCondor Plugin (Toil Stays; Local HTCondor Pool)

Option B — Toil + HTCondor-CE Direct Submission (Per-Site Late Binding)

Option C — Replace Broker/Gateway with GlideinWMS (Full HTCondor Stack)

6. Late Binding: Specific Advantages for Rapthor's Workload

6.1 Heterogeneous Resource Routing (CPU vs. GPU)

6.2 Data Locality Across SRCNet Nodes

6.3 HPC Queue Latency Absorption

6.4 Multi-Site Parallelism

6.5 Resilience and Retry

7. Comparison: HTCondor/GlideinWMS vs. PanDA vs. TIGER

8. Gotchas and Open Problems

8.1 Toil Job Store Accessibility

8.2 Container Image Distribution

8.3 No Native Data Management Integration

8.4 GlideinWMS Operational Complexity

8.5 HTCondor Community Support in Toil

9. Recommended Path Forward

If HTCondor is seriously considered (Pablo's proposal):

If PanDA remains the primary path:

10. Summary

References and Sources