Skip to content

Query Allele Frequencies

afquery query retrieves allele frequencies from the database. Three query modes are available: point, region, and batch.


Point Query

Query a single genomic position:

afquery query --db ./db/ --locus chr1:925952

Filter to a specific alt allele (useful at multi-allelic sites):

afquery query --db ./db/ --locus chr1:925952 --alt A

Region Query

Query all variants in a genomic range:

afquery query --db ./db/ --region chr1:900000-1000000

The range is 1-based, inclusive on both ends.


Python API — multi-chromosome regions

To query variants across multiple regions (including different chromosomes) in a single call, use query_region_multi:

from afquery import Database

db = Database("./db/")
regions = [
    ("chr1",  900000,   1000000),
    ("chr17", 41196311, 41277500),
]
results = db.query_region_multi(regions, phenotype=["E11.9"])

Results are returned in genomic order (chr1, chr2, …, chr22, chrX, chrY, chrM). Overlapping regions are automatically deduplicated — each variant appears at most once. Chromosome names are normalized, so "1" and "chr1" are equivalent.

For querying specific variants across chromosomes, use query_batch_multi:

variants = [
    ("chr1",  925952,  "G", "A"),
    ("chrX",  5000000, "A", "G"),
]
results = db.query_batch_multi(variants)

Results are returned in input order (by original index). Duplicate entries are deduplicated per chromosome — if the same (chrom, pos, ref, alt) appears more than once, only the first occurrence is included. Chromosome names are normalized, so "1" and "chr1" are equivalent.


Batch Query

Query multiple positions at once from a file:

afquery query --db ./db/ --from-file variants.tsv

The input file is a headerless TSV with columns chrom pos [ref [alt]] (ref and alt are optional):

chr1    925952  G   A
chr1    1014541 C   T
chrX    5000000 A   G

Batch queries support variants across multiple chromosomes in a single file.


Output Formats

text (default)

Human-readable, one block per variant:

chr1:925952 G>A  AC=142  AN=2742  AF=0.0518  n_eligible=1371  N_HET=138  N_HOM_ALT=2  N_HOM_REF=1231  N_FAIL=0

tsv

Tab-separated, one row per variant, suitable for downstream processing:

afquery query --db ./db/ --region chr1:900000-1000000 --format tsv
chrom   pos ref alt AC  AN  AF  n_eligible  N_HET   N_HOM_ALT   N_HOM_REF   N_FAIL
chr1    925952  G   A   142 2742    0.051782    1371    138 2   1231    0

json

JSON array, one object per variant:

afquery query --db ./db/ --locus chr1:925952 --format json
[
  {
    "chrom": "chr1",
    "pos": 925952,
    "ref": "G",
    "alt": "A",
    "AC": 142,
    "AN": 2742,
    "AF": 0.05178,
    "n_eligible": 1371,
    "N_HET": 138,
    "N_HOM_ALT": 2,
    "N_HOM_REF": 1231,
    "N_FAIL": 0
  }
]

Sample Filtering

All query modes support the same filter options:

afquery query \
  --db ./db/ \
  --locus chr1:925952 \
  --phenotype E11.9 \
  --sex female \
  --tech wgs

Filters compose with AND:

  • --phenotype E11.9 --sex female = female samples with E11.9

Multiple values for the same filter compose with OR:

  • --phenotype E11.9 --phenotype I10 = samples with E11.9 OR I10

Exclude with ^ prefix:

  • --tech ^wes_v1 = all technologies except wes_v1

See Sample Filtering for full syntax.


Results When AN=0

If all samples are excluded by your filters, the result will have AC=0, AN=0, and AF=None. This is expected behavior — it means no samples in the selected subgroup were eligible at that position (e.g., all WES samples and the position is not in their capture regions).


Comparing AF Across Subgroups

Run two queries and compare:

# Diabetic patients
afquery query --db ./db/ --locus chr1:925952 --phenotype E11.9 --format json

# Healthy controls (exclude diabetic)
afquery query --db ./db/ --locus chr1:925952 --phenotype ^E11.9 --format json

For systematic comparison across many variants, consider Bulk Export with --by-phenotype, or see Cohort Stratification for a worked multi-group comparison.


Full Option Reference

See CLI Reference → query.


Next Steps