Benchmarking

afquery benchmark measures query performance on synthetic or real data and produces a JSON report.

Quick Benchmark

Run against synthetic data (no real database required):

afquery benchmark

This generates 1,000 synthetic samples and 10,000 variants per chromosome, builds an in-memory database, runs a suite of query types, and writes results to benchmark_report.json.

Options

Option	Default	Description
`--n-samples`	`1000`	Number of synthetic samples
`--n-variants`	`10000`	Number of variants per chromosome
`--output`	`benchmark_report.json`	Output path for JSON report
`--db`	None	Run against an existing database instead of synthetic data

Benchmark Against a Real Database

afquery benchmark --db ./my_db/ --output my_db_benchmark.json

This uses your actual Parquet files and sample metadata, giving realistic performance numbers for your specific cohort size and variant density.

Report Format

The JSON report contains timing results for each query type:

{
  "n_samples": 1000,
  "n_variants": 10000,
  "genome_build": "GRCh38",
  "queries": {
    "point_query_cold_ms": 87.3,
    "point_query_warm_ms": 9.1,
    "region_query_1mbp_ms": 312.4,
    "batch_query_100_ms": 198.7,
    "annotation_100_variants_ms": 523.1
  },
  "timestamp": "2026-03-16T10:00:00Z"
}

Interpreting Results

Metric	Target	Notes
`point_query_cold_ms`	< 100 ms	First query after cold start; includes Parquet I/O
`point_query_warm_ms`	< 20 ms	Subsequent queries; OS page cache active
`region_query_1mbp_ms`	< 500 ms	Depends on variant density in region
`batch_query_100_ms`	< 300 ms	100 pre-specified variants on same chromosome
`annotation_100_variants_ms`	< 2000 ms	VCF annotation, single-threaded

If point_query_cold_ms exceeds 500 ms, check disk I/O performance. If point_query_warm_ms is slow, check available RAM for OS page cache.

Synthetic Data Generation

The benchmark generates:

N samples with random sex (50/50) and random technology assignment
M variants per chromosome with uniformly random positions
Random genotypes with configurable carrier rates

Synthetic data is written to a temporary directory and cleaned up after the benchmark completes.

Tracking Performance Over Time

Run the benchmark after major updates to detect regressions:

# Before update
afquery benchmark --db ./db/ --output before.json

# After adding 500 samples
afquery update-db --db ./db/ --add-samples new_batch.tsv
afquery benchmark --db ./db/ --output after.json

# Compare
python3 -c "
import json
before = json.load(open('before.json'))['queries']
after  = json.load(open('after.json'))['queries']
for k in before:
    delta = after[k] - before[k]
    print(f'{k}: {before[k]:.1f} → {after[k]:.1f} ms  ({delta:+.1f})')
"

Comparison with Alternative Approaches

Feature Comparison

Feature	AFQuery	bcftools stats	VCFtools --freq	GATK GenomicsDB	Hail
Query latency	<100 ms	Minutes (VCF scan)	Minutes (VCF scan)	Seconds	Seconds–minutes
Dynamic subcohort queries	Yes	No	No	Partial	Yes (programmatic)
Metadata filtering	Arbitrary labels	No	No	No	User-defined
Sex-stratified AF	Yes (auto ploidy)	Manual	Manual	No	Manual
Technology-aware AN	Yes (BED capture)	No	No	No	No
Incremental updates	Yes (no rebuild)	N/A	N/A	Yes (import)	Rebuild
Infrastructure required	None (file-based)	None	None	Java/server	Spark cluster
Input format	Single-sample VCFs	Any VCF	VCF	gVCF	VCF/BGEN
Output format	JSON/TSV/VCF annotation	Stats text	Freq file	Merged gVCF	Table/VCF
Max cohort size (tested)	50,000	100,000+	100,000+	100,000+	1,000,000+
Bitmap compression	Yes (Roaring)	No	No	Yes (GenomicsDB)	No
VCF annotation	Yes	No	No	No	Yes
Python API	Yes	No	No	No	Yes

Query Latency Summary

Brief summary of query latency comparison:

Tool	50K samples, chr1, point query
AFQuery (cold)	<100 ms
AFQuery (warm)	~10 ms
bcftools stats (VCF scan)	~5 minutes
VCFtools --freq (VCF scan)	~5 minutes

AFQuery achieves this speed advantage because queries access only the bitmaps for a single 1-Mbp bucket (one Parquet file) rather than scanning the entire dataset.

Regression Testing

Run the benchmark after each major update to detect performance regressions:

# Save baseline before changes
afquery benchmark --db ./db/ --output baseline.json

# After update
afquery benchmark --db ./db/ --output after_update.json

# Compare
python3 - <<'EOF'
import json
b = json.load(open('baseline.json'))['queries']
a = json.load(open('after_update.json'))['queries']
for k in b:
    pct = (a[k] - b[k]) / b[k] * 100
    flag = " ⚠" if pct > 20 else ""
    print(f"{k}: {b[k]:.1f} → {a[k]:.1f} ms  ({pct:+.1f}%){flag}")
EOF

A regression of >20% on point_query_cold_ms warrants investigation.

Next Steps

Performance Tuning — tune threads and memory to improve build and query speed
Debugging Results — diagnose unexpected AN=0 or surprising AF values