Tutorial: End-to-End Walkthrough
This tutorial walks through every major AFQuery feature using a synthetic demo dataset. By the end, you will have built a database, queried variants, filtered by metadata, annotated a VCF, and exported results.
Prerequisites
Make sure AFQuery is installed. See Installation if needed.
1. Generate Demo Data
AFQuery ships with a script that creates 10 synthetic VCFs, a manifest, and BED files for two WES technologies:
This creates examples/demo/demo_output/ with:
vcfs/β 10 single-sample VCFs (DEMO_001 through DEMO_010)beds/β capture BED files forwes_v1andwes_v2manifest.tsvβ sample metadata
The demo cohort has 4 WGS samples, 3 wes_v1 samples, and 3 wes_v2 samples, with phenotype codes E11.9, I10, and control. It includes a set of variants at fixed positions so all query examples in this tutorial produce reproducible output.
2. Create the Database
afquery create-db \
--manifest examples/demo/demo_output/manifest.tsv \
--output-dir ./demo_db/ \
--genome-build GRCh38 \
--bed-dir examples/demo/demo_output/beds/
This ingests all VCFs, builds Roaring Bitmap Parquet files, and writes manifest.json and metadata.sqlite.
3. Inspect the Database
Example output:
Database: ./demo_db/
Version: 1.0
Genome build: GRCh38 Schema: 2.0
Created: ... Updated: N/A
Samples: 10 total
By sex: female=5 male=5
By tech: wes_v1=3 wes_v2=3 wgs=4
By phenotype: E11.9=4 I10=5 control=4
The 10 samples span 3 technologies. Phenotype counts reflect samples tagged with each code; a sample may have multiple codes.
4. Query a Single Variant
Query the variant at chr1:925952:
Example output:
Reading the output
Every query result includes the following fields:
| Field | Description |
|---|---|
| AC | Allele count β alt allele copies in eligible samples |
| AN | Allele number β total alleles examined (adjusted for coverage and ploidy) |
| AF | Allele frequency β AC / AN |
| n_eligible | Samples in the filtered set (before coverage or ploidy adjustments). Here, 10 = no filter applied. |
| N_HET | Eligible samples heterozygous for the alt allele |
| N_HOM_ALT | Eligible samples homozygous for the alt allele |
| N_HOM_REF | Eligible samples homozygous reference |
| N_FAIL | Eligible samples with a non-ref allele called but FILTERβ PASS in the source VCF |
For chr1:925952: AN=20 = 10 diploid samples Γ 2 alleles; AC=6 = 4 het samples (1 copy each) + 1 hom-alt (2 copies).
Accounting identity: n_eligible = N_HET + N_HOM_ALT + N_HOM_REF + N_FAIL always holds (10 = 4+1+5+0 β).
N_FAIL in practice
Some variants have samples whose call passed genotyping but failed a quality filter in the source VCF. Query the variant at chr1:946000:
Example output:
Here, 2 samples were flagged as LowQual in their source VCF at this position. N_FAIL samples are:
- Counted in AN β they are eligible and contribute to the denominator
- Excluded from AC β their alleles are not counted in the numerator
This makes AF a conservative estimate: AF=2/20=0.1000 even though 2 additional samples may carry the variant. Verify: 10 = 2+0+6+2 β.
Tip
N_FAIL > 0 is a signal to inspect source VCF quality at this site. A high N_FAIL relative to n_eligible may indicate a systematic sequencing artifact. See Understanding Output for thresholds.
Region query
To find all variants in the WES capture region:
Example output:
chr1:925100 C>T AC=4 AN=20 AF=0.2000 n_eligible=10 N_HET=4 N_HOM_ALT=0 N_HOM_REF=6 N_FAIL=0
chr1:925952 G>A AC=6 AN=20 AF=0.3000 n_eligible=10 N_HET=4 N_HOM_ALT=1 N_HOM_REF=5 N_FAIL=0
chr1:946000 T>C AC=2 AN=20 AF=0.1000 n_eligible=10 N_HET=2 N_HOM_ALT=0 N_HOM_REF=6 N_FAIL=2
5. Inspect Variant Carriers
After finding a variant of interest, use variant-info to see which specific samples carry it:
Example output:
sample_id sample_name sex tech phenotypes genotype filter
--------- ----------- ------ ------ --------------- -------- ------
0 DEMO_001 female wgs E11.9,I10 het PASS
2 DEMO_003 male wgs E11.9 het PASS
4 DEMO_005 female wes_v1 E11.9,control het PASS
6 DEMO_007 male wes_v2 control het PASS
8 DEMO_009 female wgs E11.9 hom PASS
Each row is one carrier. The genotype column shows het (heterozygous), hom (homozygous alt), or alt (non-ref with FILTERβ PASS). The filter column indicates whether the call passed quality filters in the source VCF.
For machine-readable output, use --format tsv:
See Variant Info for full options including allele-specific queries and sample filtering.
6. Filter by Sex
Query only female samples:
Example output:
Now query male samples:
Example output:
AN drops from 20 to 10 in both cases because only 5 samples are eligible. The AF happens to be identical here (0.3000), but the genotype distributions differ: the male group has one homozygous-alt carrier while the female group has three heterozygous carriers.
7. Filter by Phenotype
Query samples tagged with E11.9:
Example output:
Exclude control samples:
Example output:
The ^ prefix means "exclude". Excluding controls removes 4 samples, leaving 6. See Sample Filtering for the full syntax.
8. Filter by Technology
Restrict to WGS samples only:
Example output:
Now query each WES technology separately:
chr1:925952 falls inside the chr1:900000-950000 capture region shared by both WES kits. All 3 wes_v1 samples and all 3 wes_v2 samples are covered, so AN=6 (3 samples Γ 2 alleles) for both.
The critical case: a position outside all WES capture regions
Now query a position that is not in any WES BED file:
chr1:1399914 lies between the two chr1 WES capture regions (900000β950000 and 1200000β1250000). When you restrict to --tech wes_v1, AFQuery excludes all wes_v1 samples from AN because none of them have coverage at this position β AN=0, so no results are returned.
No variants found β rare
"No variants found" here means AFQuery has no coverage data for the requested technology at this position. The variant exists in the database (visible with --tech wgs), but frequency cannot be estimated for wes_v1. Always check that AN is sufficient before interpreting AF.
9. Combine Filters
All filter dimensions compose with AND:
afquery query \
--db ./demo_db/ \
--locus chr1:925952 \
--sex female \
--phenotype E11.9 \
--tech wgs
Example output:
Only one sample meets all three criteria (DEMO_001: female, wgs, E11.9). With n_eligible=1, AN=2 (one diploid sample on an autosome) and AC=1 (heterozygous carrier).
10. Annotate a VCF
Use one of the demo VCFs as input:
afquery annotate \
--db ./demo_db/ \
--input examples/demo/demo_output/vcfs/DEMO_001.vcf.gz \
--output ./annotated_demo.vcf
The output VCF gains AFQUERY_AC, AFQUERY_AN, AFQUERY_AF, AFQUERY_N_HET, AFQUERY_N_HOM_ALT, AFQUERY_N_HOM_REF, and AFQUERY_N_FAIL INFO fields. Inspect the chr1 variants:
Example output (columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, DEMO_001):
chr1 925100 . C T 50 PASS AFQUERY_AC=4;AFQUERY_AN=20;AFQUERY_AF=0.2;AFQUERY_N_HET=4;AFQUERY_N_HOM_ALT=0;AFQUERY_N_HOM_REF=6;AFQUERY_N_FAIL=0 GT 0/1
chr1 925952 . G A 50 PASS AFQUERY_AC=6;AFQUERY_AN=20;AFQUERY_AF=0.3;AFQUERY_N_HET=4;AFQUERY_N_HOM_ALT=1;AFQUERY_N_HOM_REF=5;AFQUERY_N_FAIL=0 GT 0/1
chr1 946000 . T C 50 PASS AFQUERY_AC=2;AFQUERY_AN=20;AFQUERY_AF=0.1;AFQUERY_N_HET=2;AFQUERY_N_HOM_ALT=0;AFQUERY_N_HOM_REF=6;AFQUERY_N_FAIL=2 GT 0/1
AFQUERY_N_FAIL=2 for chr1:946000 matches the query output from Step 4. The annotated values reflect cohort-wide frequencies across all 10 samples, not just DEMO_001.
See Annotate a VCF for filtering and downstream usage.
11. Bulk Export with Dump
Export all variant frequencies to CSV:
Preview the first rows:
Example output:
chrom,pos,ref,alt,AC,AN,AF,N_HET,N_HOM_ALT,N_HOM_REF,N_FAIL
chr1,925100,C,T,4,20,0.2,4,0,6,0
chr1,925952,G,A,6,20,0.3,4,1,5,0
chr1,946000,T,C,2,20,0.1,2,0,6,2
chr1,1225000,A,G,5,20,0.25,5,0,5,0
chr1,1399914,G,T,2,8,0.25,2,0,2,0
Note that chr1:1399914 has AN=8 even in the full-cohort dump β technology-aware AN is applied automatically (WES samples are excluded for positions outside their capture regions).
Disaggregate by sex and technology:
This adds columns following the pattern AC_{sex}_{tech}, AN_{sex}_{tech}, AF_{sex}_{tech} (and genotype counts) for each sex-technology combination. The stratified dump is useful for case-control comparisons and population-specific frequency analysis.
12. Interpret Results with ACMG Criteria
With the annotated VCF or query results, you can apply ACMG criteria:
- BA1: Is AF > 5% in your cohort? β Variant is benign.
- PM2_Supporting: Is the variant absent (AC=0) with high AN? β Supporting pathogenic evidence.
- PS4: Is the variant enriched in cases vs. controls? Use
--phenotypeand--phenotype ^disease_codeto compare.
For detailed guidance, see ACMG Criteria (BA1/PM2/PS4).
Next Steps
- Key Concepts β understand how bitmaps, Parquet, and metadata filtering work
- Understanding Output β field definitions and interpretation of N_FAIL, AN=0, and other special cases
- Manifest Format β prepare your own cohort manifest
- Create a Database β build a database from your real data
- ACMG Criteria β apply local AF to variant classification