Clinical Variant Prioritization
Scenario
A rare disease patient has undergone whole-exome sequencing. After standard filtering, 500,000 candidate variants remain. You want to annotate each variant with cohort-specific allele frequency and filter to those that are genuinely rare in your population — removing variants that are common locally but might appear rare in gnomAD.
Why Standard Databases Fall Short
gnomAD provides an excellent first filter, but:
- Population mismatch: A variant at AF=0.001 in gnomAD may be at AF=0.02 in your local cohort, common locally but appearing rare globally. This discrepancy often reflects natural allele frequency variation between subpopulations driven by genetic drift, founder effects, and historical bottlenecks: alleles that are rare on a global scale may have reached appreciable frequencies in geographically or ethnically isolated groups.
- Fine-grained Control Cohort Selection: Unlike resources such as gnomAD, where allele frequencies are derived from largely phenotype-agnostic populations, AFQuery allows the dynamic inclusion or exclusion of samples based on any annotated feature. This is particularly valuable in rare disease studies, where overlapping genetic architectures may confound analyses: for example, samples associated with a related condition can be selectively excluded to avoid bias. Because phenotypes are treated as flexible annotations, this control extends to any variable of interest, enabling more precise and context-aware frequency estimation.
- Local artifacts: Systematic sequencing artifacts specific to your pipeline or capture kit manifest as recurrent variants that appear rare in gnomAD but accumulate high frequency in your cohort. These are best identified by elevated AF in your local database paired with low allele number (AN) or high N_FAIL counts, indicating poor genotype quality at the site.
AFQuery lets you apply cohort-specific AF as an additional filter layer on top of gnomAD, removing locally common variants that standard databases miss.
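Population mismatch and local artifacts both appear as a local AF well above the gnomAD AF; what separates them is the supporting AN and N_FAIL. The sketch below illustrates that distinction. It is not AFQuery code: the `SiteStats` fields, the function name, and every threshold (the 10x fold change, `min_an`, `max_fail_fraction`) are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class SiteStats:
    gnomad_af: float  # AF reported by gnomAD
    local_af: float   # AF in the local cohort
    local_an: int     # allele number in the local cohort
    n_fail: int       # samples with a non-PASS call at this site
    n_samples: int    # cohort size (hypothetical field, used to scale n_fail)


def classify_discrepancy(s, fold=10.0, min_an=1000, max_fail_fraction=0.05):
    """Label a site 'concordant', 'locally_common', or 'suspect_artifact'."""
    # Local AF is not clearly above gnomAD: nothing to explain.
    if s.local_af <= fold * s.gnomad_af:
        return "concordant"
    # Elevated local AF with weak support looks like a pipeline artifact.
    fail_fraction = s.n_fail / s.n_samples if s.n_samples else 0.0
    if s.local_an < min_an or fail_fraction > max_fail_fraction:
        return "suspect_artifact"
    # Elevated local AF with solid support: a plausible population difference.
    return "locally_common"


# Made-up sites for illustration.
founder = SiteStats(gnomad_af=0.001, local_af=0.02, local_an=4000, n_fail=3, n_samples=2000)
artifact = SiteStats(gnomad_af=0.0005, local_af=0.03, local_an=300, n_fail=400, n_samples=2000)
print(classify_discrepancy(founder))
print(classify_discrepancy(artifact))
```

Sites labelled `suspect_artifact` are candidates for manual review rather than automatic removal, since the thresholds here are arbitrary.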
Step-by-Step Example
1. Annotate patient VCF with cohort AF
afquery annotate \
--db ./db/ \
--input patient.vcf.gz \
--output patient_annotated.vcf.gz \
--threads 16
This adds to each variant:
- AFQUERY_AC: allele count in cohort
- AFQUERY_AN: allele number (eligible samples at this position)
- AFQUERY_AF: allele frequency
- AFQUERY_N_HET, AFQUERY_N_HOM_ALT: genotype counts
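To show how these fields relate to one another, the sketch below parses an INFO string carrying the tags listed above and checks two identities. The INFO string is made-up example data, `parse_afquery_info` is a hypothetical helper rather than part of AFQuery, and the check `AC = N_HET + 2 * N_HOM_ALT` assumes a biallelic, diploid autosomal site.

```python
def parse_afquery_info(info):
    """Extract AFQUERY_* tags from a VCF INFO string into a dict of numbers."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        if not sep or not key.startswith("AFQUERY_"):
            continue  # skip flags and unrelated annotations
        # AF is a fraction; the other tags are integer counts.
        fields[key] = float(value) if key == "AFQUERY_AF" else int(value)
    return fields


# Made-up INFO column for one variant (values are illustrative only).
info = (
    "DP=812;AFQUERY_AC=3;AFQUERY_AN=2000;AFQUERY_AF=0.0015;"
    "AFQUERY_N_HET=3;AFQUERY_N_HOM_ALT=0"
)
tags = parse_afquery_info(info)

# One alternate allele per heterozygote, two per homozygote (diploid assumption).
expected_ac = tags["AFQUERY_N_HET"] + 2 * tags["AFQUERY_N_HOM_ALT"]
print(tags["AFQUERY_AC"] == expected_ac)

# AF is AC divided by AN.
print(abs(tags["AFQUERY_AF"] - tags["AFQUERY_AC"] / tags["AFQUERY_AN"]) < 1e-9)
```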
2. Filter for rare variants with reliable AN
bcftools filter \
-i 'AFQUERY_AF < 0.001 && AFQUERY_AN >= 1000' \
-Oz \
patient_annotated.vcf.gz \
-o patient_rare.vcf.gz
The AFQUERY_AN >= 1000 threshold ensures the AF estimate is based on sufficient data. An AF estimate from AN=10 is meaningless — with AN=10, even a single carrier gives AF=0.1.
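One way to see why low AN is a problem is to put a confidence interval around the estimate. The sketch below uses the Wilson score interval, a standard binomial interval chosen here for illustration; it is not something AFQuery is known to compute, and `wilson_interval` is a hypothetical helper.

```python
import math


def wilson_interval(ac, an, z=1.96):
    """Approximate 95% Wilson score interval for an allele frequency AC/AN."""
    if an <= 0:
        raise ValueError("AN must be positive")
    p = ac / an
    denom = 1 + z * z / an
    centre = (p + z * z / (2 * an)) / denom
    half = z * math.sqrt(p * (1 - p) / an + z * z / (4 * an * an)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# A single alternate allele observed at increasing AN.
for an in (10, 100, 1000):
    low, high = wilson_interval(1, an)
    print(f"AN={an:5d}  AF={1 / an:.4f}  95% CI=({low:.4f}, {high:.4f})")
```

With a single observed allele, the upper bound at AN=10 is far above any rare-variant cutoff, while at AN=1000 the interval is narrow enough in absolute terms for a threshold such as 0.001 to carry some meaning.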
3. Handle variants not in the cohort
Variants absent from the database have AFQUERY_AN=0 (no eligible samples, or variant not observed). These require separate treatment:
# Variants NOT in cohort (AN=0): treat as novel
bcftools filter -i 'AFQUERY_AN == 0' patient_annotated.vcf.gz
# Variants in cohort with sufficient coverage
bcftools filter -i 'AFQUERY_AN >= 1000 && AFQUERY_AF < 0.001' patient_annotated.vcf.gz
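The two filters above leave one group unaddressed: variants with some cohort data but AN below the threshold. The sketch below makes every bucket explicit. The function name and the `low_confidence` label are assumptions for illustration; the thresholds mirror the ones used in the commands above.

```python
from collections import Counter


def classify_variant(an, af, min_an=1000, max_af=0.001):
    """Sort one annotated variant into an explicit bucket."""
    if an == 0:
        return "novel"           # no cohort data at all
    if an < min_an:
        return "low_confidence"  # AF exists but rests on too few alleles
    if af < max_af:
        return "rare"            # well covered and rare locally
    return "common"              # well covered and locally common


# Made-up (AN, AF) pairs standing in for annotated variants.
examples = [(0, 0.0), (40, 0.025), (2400, 0.0004), (2400, 0.03)]
counts = Counter(classify_variant(an, af) for an, af in examples)
print(dict(counts))
```

Whether `low_confidence` variants are kept for review or dropped is a policy choice; the commands in this section drop them.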
4. Annotation with subgroup AF (optional)
If you want AF relative to a matched control group:
afquery annotate \
--db ./db/ \
--input patient.vcf.gz \
--output patient_control_af.vcf.gz \
--phenotype ^rare_disease # AF in non-rare-disease samples
5. Python workflow
import cyvcf2
from afquery import Database

db = Database("./db/")

# Annotate and filter in memory
vcf = cyvcf2.VCF("patient.vcf.gz")
rare_candidates = []
for variant in vcf:
    if not variant.ALT:
        continue  # Reference-only record: nothing to query
    # Only the first ALT allele is queried; split multi-allelic sites beforehand
    results = db.query(variant.CHROM, pos=variant.POS, alt=variant.ALT[0])
    if not results:
        rare_candidates.append(variant)  # Not in cohort → novel
        continue
    r = results[0]
    if r.AN >= 1000 and r.AF < 0.001:
        rare_candidates.append(variant)

print(f"Rare candidates: {len(rare_candidates)}")
Biological Interpretation
| Filter | Retained | Removed | Reason |
|---|---|---|---|
| AFQUERY_AN >= 1000 | 45,000 | 455,000 | Insufficient cohort coverage |
| AFQUERY_AF < 0.001 | 1,200 | 43,800 | Locally common variants |
| Novel (AN=0) | 300 | — | Not observed in cohort |
A typical clinical pipeline retains ~1,500 rare/novel candidates after cohort AF filtering, compared to 500,000 before.
AN threshold guidance:
- AN >= 100: minimum for any AF interpretation
- AN >= 500: recommended for rare variant filtering
- AN >= 1000: conservative threshold for robust AF estimates
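These tiers can be encoded as a small lookup so that each AF value is reported together with how much weight it can bear. This is a sketch only: the tier labels and the `an_tier` helper are assumptions derived from the wording of the guidance above, not AFQuery output.

```python
import bisect

# Lower bounds of each tier, taken from the guidance above.
THRESHOLDS = [100, 500, 1000]
# One more label than thresholds: the first covers AN below the minimum.
LABELS = ["insufficient", "minimum", "recommended", "conservative"]


def an_tier(an):
    """Return the guidance tier that a given allele number falls into."""
    return LABELS[bisect.bisect_right(THRESHOLDS, an)]


for an in (10, 100, 750, 1000):
    print(an, an_tier(an))
```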
For detailed ACMG workflows with worked examples and AN threshold guidance, see ACMG Criteria (BA1/PM2/PS4).
6. Inspect carriers of rare candidates
After identifying rare variants, use variant-info to see who carries each one, which is useful for confirming case/control enrichment or checking sample quality.
The command lists each carrier with their phenotype codes, sequencing technology, genotype, and FILTER status. See Variant Info for filtering options.
Next Steps
- Variant Info — list carriers of any variant with metadata
- Annotate a VCF — full annotation options
- FILTER=PASS Tracking — using N_FAIL in filtering
- Population-Specific AF — local vs. gnomAD comparison