Update a Database

afquery update-db adds new samples, removes existing samples, and compacts the database to reclaim space. It can also correct sample metadata (sex and phenotype codes) in place.

Update Timeline

graph TD
    A["Initial Build<br/>1000 samples<br/>DB v1.0"]
    B["Add Batch 2<br/>+500 samples<br/>DB v1.1"]
    C["Add Batch 3<br/>+300 samples<br/>DB v1.2"]
    D["Remove<br/>50 samples<br/>DB v1.3"]
    E["Compact<br/>reclaim 2% disk<br/>DB v1.3c"]

    A -->|--add-samples batch2.tsv| B
    B -->|--add-samples batch3.tsv| C
    C -->|--remove-samples SAMP_*| D
    D -->|--compact| E

    style A fill:#e3f2fd
    style B fill:#fff3e0
    style C fill:#fff3e0
    style D fill:#ffebee
    style E fill:#e8f5e9

Add Samples

Provide a new manifest TSV with the samples to add:

afquery update-db \
  --db ./db/ \
  --add-samples new_samples.tsv

The new manifest follows the same format as the original (see Manifest Format). New samples are assigned monotonically increasing sample IDs.

To add multiple manifests at once:

afquery update-db \
  --db ./db/ \
  --add-samples batch1.tsv \
  --add-samples batch2.tsv
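When the manifests sit together in one directory, the repeated flags can be generated rather than typed. The following is a minimal sketch, not part of afquery: the function name add_all_manifests, the AFQUERY override variable, and the ./manifests/ directory are hypothetical. It relies only on the repeated --add-samples form shown above.

```shell
# Build one "--add-samples <file>" pair per *.tsv manifest in a directory,
# then pass them all to a single update-db call.
# AFQUERY is a hypothetical override so the command can be dry-run with echo.
add_all_manifests() {
  db=$1 dir=$2
  set --                                  # reuse positional params as the argument list
  for manifest in "$dir"/*.tsv; do
    [ -e "$manifest" ] || continue        # the glob matched nothing
    set -- "$@" --add-samples "$manifest"
  done
  if [ $# -eq 0 ]; then
    echo "no manifests found in $dir" >&2
    return 1
  fi
  ${AFQUERY:-afquery} update-db --db "$db" "$@"
}
```

Setting AFQUERY="echo afquery" before calling add_all_manifests ./db/ ./manifests/ prints the command instead of running it, which is a cheap way to review the file list first.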

For WES samples, provide the BED file directory:

afquery update-db \
  --db ./db/ \
  --add-samples new_samples.tsv \
  --bed-dir ./beds/

Remove Samples

Remove one or more samples by name:

afquery update-db \
  --db ./db/ \
  --remove-samples SAMP_001

Remove multiple samples:

afquery update-db \
  --db ./db/ \
  --remove-samples SAMP_001,SAMP_002,SAMP_003

Or repeat the flag:

afquery update-db \
  --db ./db/ \
  --remove-samples SAMP_001 \
  --remove-samples SAMP_002
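For longer removal lists it can be easier to keep the names in a file, one per line, and build the comma-separated argument from it. A small sketch, assuming a hypothetical file to_remove.txt and a hypothetical helper named join_samples:

```shell
# Join a one-name-per-line file into a comma-separated list.
# Blank lines are skipped; names are otherwise passed through unchanged.
join_samples() {
  grep -v '^[[:space:]]*$' "$1" | paste -sd, -
}

# Usage (requires afquery on PATH):
#   afquery update-db --db ./db/ --remove-samples "$(join_samples to_remove.txt)"
```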

Note

Removal marks the sample as inactive and clears its bit from all bitmaps, but the physical bit position is not reused. After removing many samples, run update-db with --compact to reclaim disk space.


Compact

After removing samples, compact the database to remove dead bits and reduce disk usage:

afquery update-db \
  --db ./db/ \
  --compact

This rewrites all Parquet files, removing bits for deleted samples. For large databases, compact runs in parallel and may take several minutes.
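The difference between clearing a bit on removal and dropping the dead position on compaction can be illustrated with a toy model. This is only a sketch of the idea: a single integer stands in for one bitmap, and it is not afquery's storage format or its actual compaction code. The helper names are hypothetical.

```shell
# Toy model: bit i of the integer is set when the sample at position i carries the variant.

# Removal: clear the sample's bit; the position itself stays allocated.
clear_bit() {
  echo $(( $1 & ~(1 << $2) ))
}

# Compaction: drop a dead position by shifting all higher bits down by one.
drop_position() {
  bitmap=$1 pos=$2
  low=$(( bitmap & ((1 << pos) - 1) ))    # bits below the dead position
  high=$(( bitmap >> (pos + 1) ))         # bits above the dead position
  echo $(( (high << pos) | low ))
}

clear_bit 11 1       # 0b1011 -> 0b1001, prints 9 (still four positions wide)
drop_position 9 1    # 0b1001 -> 0b101,  prints 5 (three positions remain)
```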

When to Compact

  • After removing more than 5–10% of samples
  • When disk space is a concern
  • Before archiving or sharing the database
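The first guideline can be turned into a simple check in a maintenance script. This sketch assumes you track the removed and total sample counts yourself; the function should_compact and its default threshold of 5 percent (the low end of the range above) are illustrative choices, not afquery behavior.

```shell
# Succeed (exit 0) when more than <threshold> percent of samples were removed.
# $1 = samples removed since the last compact
# $2 = total samples before removal
# $3 = threshold in percent (optional, default 5)
should_compact() {
  removed=$1 total=$2 threshold=${3:-5}
  [ "$total" -gt 0 ] || return 1
  # Integer form of: removed / total > threshold / 100
  [ $(( removed * 100 )) -gt $(( total * threshold )) ]
}

# Usage (requires afquery on PATH):
#   if should_compact 200 1800; then afquery update-db --db ./db/ --compact; fi
```

With the numbers from the timeline above, removing 50 of 1800 samples is under 3 percent, so should_compact 50 1800 fails and compaction would be skipped unless disk space or archiving is the concern.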

Combine Operations

Operations can be combined in a single command:

afquery update-db \
  --db ./db/ \
  --remove-samples SAMP_OLD_001 \
  --add-samples new_cohort.tsv \
  --compact

Operations execute in this order: remove → add → compact.


Database Version

By default, the version label auto-increments (e.g., 1.0 → 2.0). Set a custom version:

afquery update-db \
  --db ./db/ \
  --add-samples new_samples.tsv \
  --db-version 2026.03
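If the custom label is meant to be date-based, which is only an assumption about the 2026.03 example, it can be generated rather than typed. The helper name db_version_label is hypothetical.

```shell
# Print a YYYY.MM label from the current UTC date.
db_version_label() {
  date -u +%Y.%m
}

# Usage (requires afquery on PATH):
#   afquery update-db --db ./db/ --add-samples new_samples.tsv \
#     --db-version "$(db_version_label)"
```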

View Changelog

Every update operation is logged. View the history:

afquery info --db ./db/ --changelog

Example output:

v1.0  2026-01-15  create   1371 samples added
v2.0  2026-02-01  add       42 samples added
v2.0  2026-02-15  remove     3 samples removed
v3.0  2026-03-01  compact   compacted after removal


Update Sample Metadata

Correct a sample's sex or phenotype_codes without re-ingesting its VCF. Precomputed bitmaps are regenerated and the change is logged in the changelog.

Single sample

# Change sex
afquery update-db --db ./db/ --update-sample SAMP_001 --set-sex female

# Replace phenotype codes (replaces ALL current codes)
afquery update-db --db ./db/ --update-sample SAMP_001 --set-phenotype "E11.9,I10"

# Change both fields in one command
afquery update-db --db ./db/ \
  --update-sample SAMP_001 \
  --set-sex female \
  --set-phenotype "E11.9,I10"

Batch update from TSV

Create a tab-separated file with the header sample_name, field, new_value. Each row holds one change; the same sample can appear on multiple rows:

sample_name  field            new_value
SAMP_001     sex              female
SAMP_002     phenotype_codes  E11.9,I10
SAMP_003     sex              male
SAMP_003     phenotype_codes  C50

Save the file (here as corrections.tsv) and apply it:

afquery update-db --db ./db/ --update-samples-file corrections.tsv
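A batch file can be sanity-checked before it is applied. The sketch below is not part of afquery. It assumes the columns are separated by real tab characters and that sex and phenotype_codes are the only valid field values, as the description of this feature suggests. The function name validate_corrections is hypothetical.

```shell
# Check a corrections TSV: exact header, three tab-separated columns per row,
# and a field of either "sex" or "phenotype_codes".
# Problems go to stderr; the exit status is non-zero if any were found.
validate_corrections() {
  awk -F '\t' '
    NR == 1 {
      if ($0 != "sample_name\tfield\tnew_value") {
        print "line 1: unexpected header" > "/dev/stderr"; bad = 1
      }
      next
    }
    NF != 3 {
      printf "line %d: expected 3 columns, got %d\n", NR, NF > "/dev/stderr"
      bad = 1; next
    }
    $2 != "sex" && $2 != "phenotype_codes" {
      printf "line %d: unknown field %s\n", NR, $2 > "/dev/stderr"; bad = 1
    }
    END { exit bad + 0 }
  ' "$1"
}

# Usage (requires afquery on PATH):
#   validate_corrections corrections.tsv &&
#     afquery update-db --db ./db/ --update-samples-file corrections.tsv
```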

Operator note

Attach a free-text note to every changelog entry created by the update:

afquery update-db --db ./db/ \
  --update-sample SAMP_001 \
  --set-phenotype "E11.9" \
  --operator-note "Corrected after clinical review 2026-03-19"

Verify the change

# Inspect the changelog
afquery info --db ./db/ --changelog

# List samples to confirm new values
afquery info --db ./db/ --samples

# Query with the updated phenotype
afquery query --db ./db/ --locus chr1:925952 --phenotype E11.9

Full Option Reference

See CLI Reference → update-db.


Next Steps