Debugging Results
When AFQuery returns unexpected results (AN=0, surprising AF values, or missing variants), use this diagnostic checklist to identify the root cause.
Diagnostic Checklist
1. Unexpected AN=0
AN=0 means no eligible samples at the queried position. Work through these checks in order:
| Check | Command | What to look for |
|---|---|---|
| Chromosome normalization | afquery query --db ./db/ --chrom 1 ... vs --chrom chr1 | Database may use chr1 while you're querying 1 (or vice versa). Check manifest.json for genome_build. |
| Position exists in database | afquery query --db ./db/ --locus chr1:12345678 | If no result at all, the variant was not observed in any sample during ingestion. |
| BED coverage (WES) | afquery info --db ./db/ | If all eligible samples are WES and the position is outside capture regions, AN=0 is correct. |
| Sample filter too restrictive | Remove --phenotype and --sex filters | Query with no filters first. If AN>0 without filters, the filter is excluding all samples. |
| Technology filter | Remove --tech filter | Check if any samples match the requested technology. |
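The chromosome-naming check is easy to script. The sketch below is a minimal helper, not part of AFQuery: `chrom_variants` and `build_query_commands` are hypothetical names, and the only CLI flags it uses are the ones shown in the table above (`--db`, `--locus`). It builds one command line per spelling of the chromosome so that both can be run in turn. It does not handle special cases such as MT vs. chrM.

```python
import shlex


def chrom_variants(chrom: str) -> list[str]:
    """Return the chromosome name without and with the 'chr' prefix."""
    bare = chrom[3:] if chrom.lower().startswith("chr") else chrom
    return [bare, f"chr{bare}"]


def build_query_commands(db: str, chrom: str, pos: int) -> list[str]:
    """Build one `afquery query` command line per chromosome spelling.

    Only flags that appear in the checklist table are used (--db, --locus).
    """
    return [
        shlex.join(["afquery", "query", "--db", db, "--locus", f"{name}:{pos}"])
        for name in chrom_variants(chrom)
    ]


if __name__ == "__main__":
    # Print both variants; run each and compare AN between the two.
    for cmd in build_query_commands("./db/", "chr1", 12345678):
        print(cmd)
```

If one spelling returns AN>0 and the other returns nothing, the mismatch is in the chromosome naming rather than in the sample filters.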
2. Unexpected AF Value
| Symptom | Likely Cause | Resolution |
|---|---|---|
| AF higher than expected | Cohort enriched for disease with this variant | Compare with --phenotype ^disease_code to get control AF |
| AF lower than expected | Many WES samples without coverage at this position (diluted AN) | Filter by --tech wgs to check WGS-only AF |
| AF=1.0 | All eligible samples carry the variant | Check if AN is very small (e.g., AN=2 means just 1 sample) |
| AF differs from gnomAD | Expected: local cohort AF ≠ global population AF | This is by design; see Why Cohort-Specific AF Matters |
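The AF=1.0 row is easier to reason about with the arithmetic written out. The following is a minimal sketch of AF = AC / AN, assuming diploid genotypes (so one sample contributes two alleles to AN); `allele_frequency` is a hypothetical helper, not an AFQuery function.

```python
from typing import Optional


def allele_frequency(ac: int, an: int) -> Optional[float]:
    """Return AC / AN, or None when AN is 0 (no eligible alleles)."""
    if an == 0:
        return None
    return ac / an


# One homozygous-alt diploid sample: AC=2, AN=2
print(allele_frequency(2, 2))    # 1.0
# The same two alt alleles among 100 diploid samples: AN=200
print(allele_frequency(2, 200))  # 0.01
# No eligible samples at the position
print(allele_frequency(0, 0))    # None
```

An AF of 1.0 backed by AN=2 carries far less evidence than the same AF backed by AN=200, which is why AN should always be read alongside AF.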
3. Missing Variants
A variant you expect to find is not in the database:
| Check | Details |
|---|---|
| Was it in the source VCFs? | AFQuery only stores variants present in ingested VCFs |
| Was it FILTER=PASS? | Default ingestion skips non-PASS variants. Check with afquery info for pass_only_filter |
| Multiallelic sites | AFQuery stores each ALT separately. Query the specific ALT allele, not just position |
| Chromosome naming | Ensure consistent chr prefix usage |
4. Unexpected N_FAIL > 0
N_FAIL > 0 means some eligible samples had the alt allele called but with FILTER≠PASS. These samples are excluded from AC/AN. This is usually benign (1–2 samples), but a high N_FAIL warrants investigation:
| N_FAIL relative to n_eligible | Likely cause |
|---|---|
| 1–2 samples | Isolated low-quality calls; not concerning |
| > 5% of n_eligible | Systematic sequencing artifact at this site |
| All eligible samples | Site-wide QC failure: AF=0 but variant is present in source VCFs |
To inspect a site with high N_FAIL, add --format json to the query to see all fields, for example: afquery query --db ./db/ --locus chr1:12345678 --format json
Identify failing samples
Use afquery variant-info --db ./db/ --locus chr1:12345678 to see exactly which samples have FAIL status and their metadata (technology, phenotype codes). This helps determine if failures cluster in a specific technology or sample subset. See Variant Info.
If N_FAIL is consistently high across many sites, check the variant calling pipeline and FILTER field settings in your VCFs.
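When many sites need triage, the thresholds from the table above can be applied programmatically. This is a sketch under stated assumptions: `classify_n_fail` is a hypothetical helper, the order in which overlapping rules are applied (all samples first, then 1–2 samples, then the 5% threshold) is a choice made here, and the "intermediate" label covers the range the table leaves unspecified.

```python
def classify_n_fail(n_fail: int, n_eligible: int) -> str:
    """Map N_FAIL and n_eligible to the likely cause listed in the table."""
    if n_eligible <= 0 or not 0 <= n_fail <= n_eligible:
        raise ValueError("expected 0 <= n_fail <= n_eligible and n_eligible > 0")
    if n_fail == 0:
        return "no failures"
    if n_fail == n_eligible:
        return "site-wide QC failure"
    if n_fail <= 2:
        return "isolated low-quality calls"
    if n_fail > 0.05 * n_eligible:
        return "systematic sequencing artifact"
    # Not covered by the table: more than 2 samples but at most 5%.
    return "intermediate: review manually"


if __name__ == "__main__":
    for n_fail in (0, 1, 4, 10, 100):
        print(n_fail, classify_n_fail(n_fail, 100))
```

In small cohorts a single failing sample can already exceed 5% of n_eligible; the ordering above treats it as isolated, which may or may not suit your data.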
Diagnostic Commands
Database Info
afquery info --db ./db/ shows: sample count, technology list, schema version, genome build, and PASS-only status.
Check Database Integrity
Validates: manifest consistency, Parquet file integrity, capture index presence.
Query with Full Output
JSON format (--format json) shows all fields, including N_HET, N_HOM_ALT, N_HOM_REF, and N_FAIL, which is useful for understanding the composition of the result.
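One way to use these counts is to check whether they add up to the reported AC and AN. The sketch below assumes a plain diploid model (AC = N_HET + 2·N_HOM_ALT, AN = 2·(N_HET + N_HOM_ALT + N_HOM_REF)) and assumes the JSON output is a flat object keyed by the field names above; both are assumptions, not documented AFQuery behavior. The record is made-up illustrative data, and `diploid_expectation` is a hypothetical helper.

```python
import json


def diploid_expectation(record: dict) -> dict:
    """Compute AC and AN implied by genotype counts under a diploid model."""
    n_het = record["N_HET"]
    n_hom_alt = record["N_HOM_ALT"]
    n_hom_ref = record["N_HOM_REF"]
    return {
        "AC": n_het + 2 * n_hom_alt,
        "AN": 2 * (n_het + n_hom_alt + n_hom_ref),
    }


# Illustrative record only; the real output structure may differ.
record = json.loads(
    '{"AC": 4, "AN": 20, "N_HET": 2, "N_HOM_ALT": 1, "N_HOM_REF": 7, "N_FAIL": 0}'
)
expected = diploid_expectation(record)
consistent = expected["AC"] == record["AC"] and expected["AN"] == record["AN"]
print(expected, consistent)
```

A mismatch is not necessarily an error: sex chromosomes or other non-diploid regions would break this simple model, so treat a discrepancy as a prompt to look closer.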
Direct Metadata Inspection
import sqlite3
conn = sqlite3.connect("./db/metadata.sqlite")
# List all phenotype codes and sample counts
cursor = conn.execute("""
SELECT code, COUNT(*) as n_samples
FROM sample_phenotypes
GROUP BY code
ORDER BY n_samples DESC
""")
for row in cursor:
    print(f"{row[0]:30s} {row[1]} samples")
# List technologies
cursor = conn.execute("""
SELECT technology, COUNT(*) as n_samples
FROM samples
GROUP BY technology
""")
for row in cursor:
    print(f"{row[0]:20s} {row[1]} samples")
conn.close()
Common Root Causes
| Symptom | Root Cause | Fix |
|---|---|---|
| AN=0 for all queries | Wrong --db path or empty database | Verify path; run afquery info |
| AN=0 for specific region | WES-only cohort, position outside capture | Check BED file coverage |
| AN much lower than sample count | Mixed WGS/WES, position outside WES capture | Filter by --tech wgs to isolate |
| AF=None in output | AN=0 (division by zero) | See AN=0 diagnosis above |
| Different AF between query and annotate | Different default filters or phenotype context | Ensure same --phenotype, --sex, --tech flags |
| N_FAIL high at a site | Systematic QC failure in source VCFs at this position | Inspect site with --format json; check VCF FILTER annotations |
Next Steps
- Understanding Output: what each field means
- FAQ: common questions and answers
- Troubleshooting: error messages and solutions
- FILTER=PASS Tracking: how PASS filtering works