Benchmarking
afquery benchmark measures query performance on synthetic or real data and produces a JSON report.
Quick Benchmark
Run against synthetic data (no real database required):
This generates 1,000 synthetic samples and 10,000 variants per chromosome, builds an in-memory database, runs a suite of query types, and writes results to benchmark_report.json.
Options
| Option | Default | Description |
|---|---|---|
| --n-samples | 1000 | Number of synthetic samples |
| --n-variants | 10000 | Number of variants per chromosome |
| --output | benchmark_report.json | Output path for JSON report |
| --db | None | Run against an existing database instead of synthetic data |
Benchmark Against a Real Database
This uses your actual Parquet files and sample metadata, giving realistic performance numbers for your specific cohort size and variant density.
Report Format
The JSON report contains timing results for each query type:
{
  "n_samples": 1000,
  "n_variants": 10000,
  "genome_build": "GRCh38",
  "queries": {
    "point_query_cold_ms": 87.3,
    "point_query_warm_ms": 9.1,
    "region_query_1mbp_ms": 312.4,
    "batch_query_100_ms": 198.7,
    "annotation_100_variants_ms": 523.1
  },
  "timestamp": "2026-03-16T10:00:00Z"
}
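As a rough illustration of consuming this report, the sketch below parses the example JSON shown above and lists the timings from slowest to fastest. The helper names `query_timings` and `slowest_first` are hypothetical and not part of AFQuery; only the report keys come from the example.

```python
import json

# Example report text, copied from the sample shown above.
SAMPLE_REPORT = """
{
  "n_samples": 1000,
  "n_variants": 10000,
  "genome_build": "GRCh38",
  "queries": {
    "point_query_cold_ms": 87.3,
    "point_query_warm_ms": 9.1,
    "region_query_1mbp_ms": 312.4,
    "batch_query_100_ms": 198.7,
    "annotation_100_variants_ms": 523.1
  },
  "timestamp": "2026-03-16T10:00:00Z"
}
"""


def query_timings(report_text):
    """Return {metric_name: milliseconds} from a report's JSON text."""
    report = json.loads(report_text)
    return {name: float(ms) for name, ms in report["queries"].items()}


def slowest_first(timings):
    """Return (metric_name, milliseconds) pairs sorted by descending time."""
    return sorted(timings.items(), key=lambda item: item[1], reverse=True)


for name, ms in slowest_first(query_timings(SAMPLE_REPORT)):
    print(f"{name:<28} {ms:8.1f} ms")
```

To use it on a real run, read the file written by --output and pass its contents to `query_timings`.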
Interpreting Results
| Metric | Target | Notes |
|---|---|---|
| point_query_cold_ms | < 100 ms | First query after cold start; includes Parquet I/O |
| point_query_warm_ms | < 20 ms | Subsequent queries; OS page cache active |
| region_query_1mbp_ms | < 500 ms | Depends on variant density in region |
| batch_query_100_ms | < 300 ms | 100 pre-specified variants on same chromosome |
| annotation_100_variants_ms | < 2000 ms | VCF annotation, single-threaded |
If point_query_cold_ms exceeds 500 ms, check disk I/O performance. If point_query_warm_ms is slow, check available RAM for OS page cache.
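The targets above can be checked mechanically. The following is a minimal sketch, assuming the "queries" object of a report has already been loaded into a dict; `over_target` is a hypothetical helper, and the timings in `example` are invented purely for illustration.

```python
# Targets in milliseconds, copied from the table above.
TARGETS_MS = {
    "point_query_cold_ms": 100,
    "point_query_warm_ms": 20,
    "region_query_1mbp_ms": 500,
    "batch_query_100_ms": 300,
    "annotation_100_variants_ms": 2000,
}


def over_target(queries, targets=TARGETS_MS):
    """Return {metric: (measured_ms, target_ms)} for metrics that miss their target.

    Metrics absent from `queries` are skipped rather than treated as failures.
    """
    return {
        name: (queries[name], limit)
        for name, limit in targets.items()
        if name in queries and queries[name] >= limit
    }


# Invented timings for illustration only (not measured results).
example = {
    "point_query_cold_ms": 640.0,
    "point_query_warm_ms": 9.1,
    "region_query_1mbp_ms": 312.4,
}

for name, (measured, limit) in over_target(example).items():
    print(f"{name}: {measured:.1f} ms (target < {limit} ms)")
```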
Synthetic Data Generation
The benchmark generates:
- N samples with random sex (50/50) and random technology assignment
- M variants per chromosome with uniformly random positions
- Random genotypes with configurable carrier rates
Synthetic data is written to a temporary directory and cleaned up after the benchmark completes.
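To make the generation scheme concrete, here is a minimal sketch of the three steps listed above for a single chromosome. It is not AFQuery's actual generator: the function name `make_synthetic`, the technology labels, the chromosome length, the default carrier rate, and the data shapes are all assumptions chosen for illustration.

```python
import random


def make_synthetic(n_samples, n_variants, carrier_rate=0.01,
                   technologies=("WGS", "WES"),  # hypothetical labels
                   chrom_length=1_000_000,       # hypothetical length
                   seed=0):
    """Generate synthetic samples, variant positions, and carrier sets."""
    rng = random.Random(seed)  # seeded so repeated runs are reproducible

    # Each sample gets a 50/50 random sex and a random technology.
    samples = [
        {
            "id": f"S{i:05d}",
            "sex": rng.choice(("male", "female")),
            "technology": rng.choice(technologies),
        }
        for i in range(n_samples)
    ]

    # Uniformly random 1-based positions (duplicates are possible).
    positions = sorted(rng.randint(1, chrom_length) for _ in range(n_variants))

    # For each variant, every sample is a carrier with probability carrier_rate.
    carriers = [
        {i for i in range(n_samples) if rng.random() < carrier_rate}
        for _ in positions
    ]
    return samples, positions, carriers


samples, positions, carriers = make_synthetic(n_samples=100, n_variants=20)
print(len(samples), len(positions), sum(len(c) for c in carriers))
```

Storing carriers as sets of sample indices keeps the sketch short; a real implementation would more likely encode them in the database's own bitmap format.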
Tracking Performance Over Time
Run the benchmark after major updates to detect regressions:
# Before update
afquery benchmark --db ./db/ --output before.json
# After adding 500 samples
afquery update-db --db ./db/ --add-samples new_batch.tsv
afquery benchmark --db ./db/ --output after.json
# Compare
python3 -c "
import json
before = json.load(open('before.json'))['queries']
after = json.load(open('after.json'))['queries']
for k in before:
    delta = after[k] - before[k]
    print(f'{k}: {before[k]:.1f} → {after[k]:.1f} ms ({delta:+.1f})')
"
Comparison with Alternative Approaches
Feature Comparison
| Feature | AFQuery | bcftools stats | VCFtools --freq | GATK GenomicsDB | Hail |
|---|---|---|---|---|---|
| Query latency | <100 ms | Minutes (VCF scan) | Minutes (VCF scan) | Seconds | Seconds–minutes |
| Dynamic subcohort queries | Yes | No | No | Partial | Yes (programmatic) |
| Metadata filtering | Arbitrary labels | No | No | No | User-defined |
| Sex-stratified AF | Yes (auto ploidy) | Manual | Manual | No | Manual |
| Technology-aware AN | Yes (BED capture) | No | No | No | No |
| Incremental updates | Yes (no rebuild) | N/A | N/A | Yes (import) | Rebuild |
| Infrastructure required | None (file-based) | None | None | Java/server | Spark cluster |
| Input format | Single-sample VCFs | Any VCF | VCF | gVCF | VCF/BGEN |
| Output format | JSON/TSV/VCF annotation | Stats text | Freq file | Merged gVCF | Table/VCF |
| Max cohort size (tested) | 50,000 | 100,000+ | 100,000+ | 100,000+ | 1,000,000+ |
| Bitmap compression | Yes (Roaring) | No | No | Yes (GenomicsDB) | No |
| VCF annotation | Yes | No | No | No | Yes |
| Python API | Yes | No | No | No | Yes |
Query Latency Summary
The table below summarizes point-query latency for each tool:
| Tool | 50K samples, chr1, point query |
|---|---|
| AFQuery (cold) | <100 ms |
| AFQuery (warm) | ~10 ms |
| bcftools stats (VCF scan) | ~5 minutes |
| VCFtools --freq (VCF scan) | ~5 minutes |
AFQuery achieves this speed advantage because queries access only the bitmaps for a single 1-Mbp bucket (one Parquet file) rather than scanning the entire dataset.
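The bucket lookup described above can be sketched as simple integer arithmetic: a position maps to one 1-Mbp bucket, and therefore to one file. The bucket numbering (0-based, computed as position divided by bucket size) and the directory layout in `bucket_path` are assumptions for illustration, not AFQuery's documented on-disk format.

```python
BUCKET_SIZE = 1_000_000  # 1 Mbp per bucket, as described above


def bucket_index(pos):
    """Map a genomic position to a bucket index (numbering is an assumption)."""
    return pos // BUCKET_SIZE


def bucket_path(db_root, chrom, pos):
    """Build the path of the single file a point query would need to read.

    The '<root>/<chrom>/bucket_<n>.parquet' layout is hypothetical.
    """
    return f"{db_root}/{chrom}/bucket_{bucket_index(pos)}.parquet"


# A point query touches exactly one bucket, regardless of chromosome size.
print(bucket_path("db", "chr1", 2_500_000))
```

Because the file to read is computed directly from the coordinate, the work per point query stays roughly constant as the number of variants on the chromosome grows.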
Regression Testing
Run the benchmark after each major update to detect performance regressions:
# Save baseline before changes
afquery benchmark --db ./db/ --output baseline.json
# After update
afquery benchmark --db ./db/ --output after_update.json
# Compare
python3 - <<'EOF'
import json
b = json.load(open('baseline.json'))['queries']
a = json.load(open('after_update.json'))['queries']
for k in b:
    pct = (a[k] - b[k]) / b[k] * 100
    flag = " ⚠" if pct > 20 else ""
    print(f"{k}: {b[k]:.1f} → {a[k]:.1f} ms ({pct:+.1f}%){flag}")
EOF
A regression of >20% on point_query_cold_ms warrants investigation.
Next Steps
- Performance Tuning — tune threads and memory to improve build and query speed
- Debugging Results — diagnose unexpected AN=0 or surprising AF values