Quickstart
This tutorial walks through building a small AFQuery database and running your first queries. It takes about 5 minutes.
New to AFQuery?
If you want to understand how bitmaps, Parquet storage, and metadata filtering work together before diving in, read Key Concepts first. Otherwise, follow along — you can always come back to the theory later.
Normalize your VCFs first
AFQuery works best with normalized, left-aligned VCFs with ploidy-corrected sex chromosome calls. See VCF Preprocessing for a reference normalization pipeline using bcftools.
1. Create a Manifest
The manifest is a TSV file describing your initial cohort. Create manifest.tsv:
sample_name vcf_path sex tech_name phenotype_codes
SAMPLE_001 /data/vcfs/sample001.vcf.gz female wgs E11.9,I10
SAMPLE_002 /data/vcfs/sample002.vcf.gz male wgs E11.9
SAMPLE_003 /data/vcfs/sample003.vcf.gz female wes_v1 I10
Fields:
- sample_name: unique identifier
- vcf_path: path to a single-sample VCF (plain or .gz)
- sex: male or female
- tech_name: sequencing technology name. Use WGS for whole genome sequencing
- phenotype_codes: comma-separated phenotype codes (arbitrary strings)
See Manifest Format for full details.
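Before building the database, it can help to sanity-check the manifest. The following is a minimal sketch in plain Python, not part of AFQuery: the helper name check_manifest is hypothetical, and it assumes that all five columns shown above are present in the header and that the file is tab-separated. It checks only the two rules stated in the field list (unique sample_name, sex is male or female).

```python
import csv
import io

# Column names taken from the example manifest above; treating all of them
# as mandatory is an assumption of this sketch.
REQUIRED_COLUMNS = ["sample_name", "vcf_path", "sex", "tech_name", "phenotype_codes"]
VALID_SEX = {"male", "female"}


def check_manifest(text):
    """Return a list of human-readable problems found in manifest TSV text."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]

    problems = []
    seen = set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        name = row["sample_name"]
        if name in seen:
            problems.append(f"line {line_no}: duplicate sample_name {name!r}")
        seen.add(name)
        if row["sex"] not in VALID_SEX:
            problems.append(f"line {line_no}: sex must be male or female, got {row['sex']!r}")
    return problems


# The example manifest from this section, written with explicit tabs.
EXAMPLE = (
    "sample_name\tvcf_path\tsex\ttech_name\tphenotype_codes\n"
    "SAMPLE_001\t/data/vcfs/sample001.vcf.gz\tfemale\twgs\tE11.9,I10\n"
    "SAMPLE_002\t/data/vcfs/sample002.vcf.gz\tmale\twgs\tE11.9\n"
    "SAMPLE_003\t/data/vcfs/sample003.vcf.gz\tfemale\twes_v1\tI10\n"
)

for problem in check_manifest(EXAMPLE):
    print(problem)
```

To check a file on disk, pass its contents, for example check_manifest(open("manifest.tsv").read()). An empty list means neither rule was violated.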
2. Create the Database
afquery create-db \
--manifest manifest.tsv \
--output-dir ./my_db/ \
--genome-build GRCh38 \
--bed-dir ./beds/ \
--threads 12
BED files are required for WES and panel technologies. Place one file per technology in the --bed-dir directory, named <tech_name>.bed (e.g., wes_v1.bed). WGS technologies do not need a BED file.
The command will:
1. Ingest all VCFs
2. Build Roaring Bitmap Parquet files per chromosome/bucket
3. Write manifest.json and metadata.sqlite
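Because every non-WGS technology needs a matching BED file, a quick pre-flight check before running create-db can catch a missing file early. The sketch below is an illustration, not an AFQuery feature: the function name missing_bed_files is hypothetical, and it assumes that a technology counts as WGS when its name equals "wgs" ignoring case, which may not match how AFQuery itself decides.

```python
from pathlib import Path


def missing_bed_files(tech_names, bed_dir):
    """Return the BED file names expected in bed_dir but not found there.

    Assumption of this sketch: a tech_name equal to "wgs" (any case)
    is whole genome sequencing and needs no BED file.
    """
    bed_dir = Path(bed_dir)
    missing = []
    for tech in sorted(set(tech_names)):
        if tech.lower() == "wgs":
            continue
        # Naming rule from the text above: <tech_name>.bed
        if not (bed_dir / f"{tech}.bed").is_file():
            missing.append(f"{tech}.bed")
    return missing


# tech_name values from the example manifest, checked against ./beds/
print(missing_bed_files(["wgs", "wgs", "wes_v1"], "./beds/"))
```

With the example manifest, the only file this looks for is wes_v1.bed. An empty list means every expected BED file was found.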
3. Inspect the Database (optional)
4. Query a Single Position
Example output:
Filter to female samples with a specific phenotype:
See Sample Filtering for the full include/exclude syntax.
Warnings for missing data
If a phenotype code, technology name, or chromosome is not found in the database, afquery prints a warning to stderr and returns empty results. Use --no-warn to suppress these warnings.
5. Inspect Carriers (optional)
See which samples carry the variant you just queried:
This lists each carrier with their sex, technology, phenotype codes, genotype (het/hom), and FILTER status. See Variant Info for details.
6. Query a Region
7. Annotate a VCF
Given a VCF with variants you want to annotate:
The output VCF gains INFO fields (see Annotate a VCF for parallelism options and downstream usage):
| Field | Number | Description |
|---|---|---|
| AFQUERY_AC | A (per ALT) | Allele count |
| AFQUERY_AN | 1 (per site) | Allele number |
| AFQUERY_AF | A (per ALT) | Allele frequency |
| AFQUERY_N_HET | A (per ALT) | Heterozygous sample count |
| AFQUERY_N_HOM_ALT | A (per ALT) | Homozygous alt sample count |
| AFQUERY_N_HOM_REF | A (per ALT) | Homozygous ref sample count |
| AFQUERY_N_FAIL | 1 (per site) | Samples with FILTER≠PASS |
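For downstream scripts, the annotated INFO column can be read with a few lines of plain Python. This is a minimal sketch rather than an AFQuery API: the helper names parse_afquery_info and allele_frequencies are hypothetical, and the INFO string is invented for illustration. It relies on the Number column above (per-ALT fields hold one comma-separated value per ALT allele, per-site fields hold a single value) and on the usual definition of allele frequency as AC divided by AN; how AFQuery rounds or formats AFQUERY_AF is not assumed.

```python
def parse_afquery_info(info):
    """Extract AFQUERY_* entries from a VCF INFO string.

    Returns a dict mapping each field name to a list of string values,
    one per comma-separated item.
    """
    fields = {}
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        if not sep or not key.startswith("AFQUERY_"):
            continue  # skip flags and fields from other tools
        fields[key] = value.split(",")
    return fields


def allele_frequencies(fields):
    """Recompute per-ALT frequency as AC / AN (None when AN is zero)."""
    an = int(fields["AFQUERY_AN"][0])
    counts = [int(x) for x in fields["AFQUERY_AC"]]
    return [c / an if an else None for c in counts]


# Invented INFO string for a site with two ALT alleles.
info = "DP=100;AFQUERY_AC=3,1;AFQUERY_AN=200;AFQUERY_N_FAIL=2"
parsed = parse_afquery_info(info)
print(parsed["AFQUERY_AC"], parsed["AFQUERY_AN"])
print(allele_frequencies(parsed))
```

Recomputing AC / AN this way is mainly useful as a consistency check against the AFQUERY_AF value written by the annotation step.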
Next Steps
- Key Concepts — understand how bitmaps, Parquet, and metadata filtering work together
- Sample Filtering — full syntax for phenotype, sex, and technology filters
- Variant Info — list carriers of any variant with metadata
- Annotate a VCF — annotation options, parallelism, and downstream usage
- ACMG Criteria — applying local AF to BA1, PM2, and PS4