Update a Database
afquery update-db supports four operations: adding new samples, removing existing samples, compacting the database to reclaim space, and correcting sample metadata.
Update Timeline
graph TD
A["Initial Build<br/>1000 samples<br/>DB v1.0"]
B["Add Batch 2<br/>+500 samples<br/>DB v1.1"]
C["Add Batch 3<br/>+300 samples<br/>DB v1.2"]
D["Remove<br/>50 samples<br/>DB v1.3"]
E["Compact<br/>reclaim 2% disk<br/>DB v1.3c"]
A -->|--add-samples batch2.tsv| B
B -->|--add-samples batch3.tsv| C
C -->|--remove-samples SAMP_*| D
D -->|--compact| E
style A fill:#e3f2fd
style B fill:#fff3e0
style C fill:#fff3e0
style D fill:#ffebee
style E fill:#e8f5e9
Add Samples
Provide a new manifest TSV listing the samples to add and pass it with --add-samples (for example, --add-samples batch2.tsv).
The new manifest follows the same format as the original (see Manifest Format). New samples are assigned monotonically increasing sample IDs.
To add multiple manifests at once:
For WES samples, provide the BED file directory:
Remove Samples
Remove a single sample by name with --remove-samples (for example, --remove-samples SAMP_OLD_001).
Remove multiple samples:
Or repeat the flag:
Note
Removal marks the sample as inactive and clears its bit from all bitmaps. The physical bit position is not reused. Run --compact after removing many samples to reclaim disk space.
Compact
After removing samples, run update-db with --compact to remove dead bits and reduce disk usage.
This rewrites all Parquet files, removing bits for deleted samples. For large databases, compact runs in parallel and may take several minutes.
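To make the difference between removal and compaction concrete, here is a toy sketch that uses a single integer as a bitmap. It is an illustration only: this page does not describe afquery's on-disk bitmap encoding, and every variable name below is hypothetical.

```shell
# Toy model: one integer stands in for one variant's sample bitmap.
# Bit i is set when the sample stored at position i carries the variant.
bitmap=22                      # binary 10110: positions 1, 2 and 4 are set
removed_pos=2                  # position of the sample being removed

# Removal: clear the sample's bit. Position 2 stays allocated but dead.
cleared=$(( bitmap & ~(1 << removed_pos) ))

# Compaction: drop the dead position and shift higher positions down by one.
low=$(( cleared & ((1 << removed_pos) - 1) ))
high=$(( cleared >> (removed_pos + 1) ))
compacted=$(( (high << removed_pos) | low ))

echo "after removal:    $cleared"
echo "after compaction: $compacted"
```

The script prints 18 (binary 10010) after removal and 10 (binary 1010) after compaction: the sample that sat at position 4 moves to position 3 once the dead slot is gone.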
When to Compact
- After removing more than 5–10% of samples
- When disk space is a concern
- Before archiving or sharing the database
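As a worked example of the first rule of thumb, the sketch below computes the removed fraction for the timeline at the top of this page (1000 + 500 + 300 samples, then 50 removed). The 5% cut-off is an assumption taken from the lower end of the 5–10% range, and the variable names are hypothetical.

```shell
# Numbers from the timeline: 1800 samples ingested, 50 later removed.
total=1800
removed=50
threshold_pct=5   # assumed cut-off: lower bound of the 5-10% rule of thumb

# Shell arithmetic is integer-only, so compute the percentage with awk.
removed_pct=$(awk -v r="$removed" -v t="$total" 'BEGIN { printf "%.1f", 100 * r / t }')

# awk exits 0 (success) only when the removed share exceeds the threshold.
if awk -v p="$removed_pct" -v th="$threshold_pct" 'BEGIN { exit !(p > th) }'; then
  decision="compact"
else
  decision="skip"
fi
echo "removed ${removed_pct}% of samples: ${decision}"
```

Here 50 of 1800 samples is about 2.8%, below the threshold, so the removed fraction alone does not call for compaction. The other two criteria (disk pressure, or archiving and sharing) can still justify it.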
Combine Operations
Operations can be combined in a single command:
afquery update-db \
--db ./db/ \
--remove-samples SAMP_OLD_001 \
--add-samples new_cohort.tsv \
--compact
Operations execute in this order: remove → add → compact.
Database Version
By default, the version label auto-increments (e.g., 1.0 → 2.0). Set a custom version:
View Changelog
Every update operation is logged. View the history with afquery info --db ./db/ --changelog.
Example output:
v1.0 2026-01-15 create 1371 samples added
v2.0 2026-02-01 add 42 samples added
v2.0 2026-02-15 remove 3 samples removed
v3.0 2026-03-01 compact compacted after removal
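The changelog can also be post-processed with standard tools. The sketch below derives the current number of active samples from the example output above. It assumes whitespace-separated columns in exactly this layout (version, date, operation, count), which the real output may not follow; the variable names are hypothetical.

```shell
# Example changelog text copied from above. In practice the same awk program
# could read the output of `afquery info --db ./db/ --changelog` from a pipe.
changelog='v1.0 2026-01-15 create 1371 samples added
v2.0 2026-02-01 add 42 samples added
v2.0 2026-02-15 remove 3 samples removed
v3.0 2026-03-01 compact compacted after removal'

# Assumed layout: column 3 is the operation, column 4 the sample count.
active=$(printf '%s\n' "$changelog" | awk '
  $3 == "create" || $3 == "add" { n += $4 }
  $3 == "remove"                { n -= $4 }
  END                           { print n }
')
echo "active samples: $active"
```

For the example output this gives 1371 + 42 - 3 = 1410 active samples; the compact entry does not change the count.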
Update Sample Metadata
Correct a sample's sex or phenotype_codes without re-ingesting its VCF. Precomputed bitmaps are regenerated and the change is logged in the changelog.
Single sample
# Change sex
afquery update-db --db ./db/ --update-sample SAMP_001 --set-sex female
# Replace phenotype codes (replaces ALL current codes)
afquery update-db --db ./db/ --update-sample SAMP_001 --set-phenotype "E11.9,I10"
# Change both fields in one command
afquery update-db --db ./db/ \
--update-sample SAMP_001 \
--set-sex female \
--set-phenotype "E11.9,I10"
Batch update from TSV
Create a tab-separated file whose header row is sample_name, field, new_value. Put one change per row; the same sample can appear on multiple rows:
sample_name field new_value
SAMP_001 sex female
SAMP_002 phenotype_codes E11.9,I10
SAMP_003 sex male
SAMP_003 phenotype_codes C50
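Because tabs are easy to lose when copying from a web page, one option is to generate the file with printf and check it before passing it to afquery. This is a minimal sketch: the file name updates.tsv is hypothetical, and the validation rules (three columns, field limited to sex or phenotype_codes) are assumptions based on the fields named in this section.

```shell
# Write the batch file with real tab separators (printf expands \t).
updates=updates.tsv   # hypothetical file name
{
  printf 'sample_name\tfield\tnew_value\n'
  printf 'SAMP_001\tsex\tfemale\n'
  printf 'SAMP_002\tphenotype_codes\tE11.9,I10\n'
  printf 'SAMP_003\tsex\tmale\n'
  printf 'SAMP_003\tphenotype_codes\tC50\n'
} > "$updates"

# Sanity check (assumed rules): exactly three tab-separated columns per row,
# and the field column is either sex or phenotype_codes.
bad_rows=$(awk -F '\t' '
  NR == 1 { next }
  NF != 3 || ($2 != "sex" && $2 != "phenotype_codes") { bad++ }
  END { print bad + 0 }
' "$updates")
echo "rows failing the check: $bad_rows"
```

A non-zero count points at rows where a tab was replaced by spaces or the field name is misspelled.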
Operator note
Attach a free-text note to every changelog entry created by the update:
afquery update-db --db ./db/ \
--update-sample SAMP_001 \
--set-phenotype "E11.9" \
--operator-note "Corrected after clinical review 2026-03-19"
Verify the change
# Inspect the changelog
afquery info --db ./db/ --changelog
# List samples to confirm new values
afquery info --db ./db/ --samples
# Query with the updated phenotype
afquery query --db ./db/ --locus chr1:925952 --phenotype E11.9
Full Option Reference
See CLI Reference → update-db.
Next Steps
- Create a Database — initial database creation from a manifest
- Performance Tuning — thread and memory configuration for the build phase
- Multi-cohort Strategies — organizing and versioning databases across cohorts