varcode
Varcode is a Python library for manipulating genomic variants and predicting their effects on protein sequences.
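Install
Varcode is distributed on PyPI and can be installed with pip (pip install varcode). Reference genome data is managed through PyEnsembl, so the Ensembl release you annotate against must be downloaded with the pyensembl install command before first use.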
Quick example
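```python
import varcode
# Load variants from a MAF file into a VariantCollection
variants = varcode.load_maf("tcga-ovarian-cancer-variants.maf")
# Group by gene name, pick the TP53 variants, and predict their effects
TP53_effects = variants.groupby_gene_name()["TP53"].effects()
# Print the single highest-priority effect
print(TP53_effects.top_priority_effect())
```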
See the project README for a longer walkthrough and the effect-class table.
Feature guides
- Effect annotation: how variants become effects, splice outcome representations, pluggable annotators, possibility sets for ambiguous outcomes, and structural variants. Start here.
- Germline-aware annotation: classify somatic variants against the patient's germline-applied transcript; possibility sets when phase is unknown; LOH detection. New in 4.19.
- Genotypes and sample-aware queries: per-sample zygosity on multi-sample VCFs.
- VariantCollection transforms: pure VC -> VC refinements; ships pair_breakends for collapsing MATEID-paired BND rows into one combined variant.
- CSV round-trip and metadata headers: to_csv / from_csv with genome recovered from the header.
- Error handling: ReferenceMismatchError, GenomeBuildMismatchError, SampleNotFoundError, and raise_on_error=False.
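To make the breakend pairing mentioned under VariantCollection transforms more concrete, here is a minimal standalone sketch of the general idea: matching VCF BND rows through their MATEID field. It does not use varcode and is not its pair_breakends implementation. The BndRow tuple, the pair_bnd_rows function, and the sample rows are all hypothetical names and data introduced only for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for one VCF breakend row; not a varcode class.
BndRow = namedtuple("BndRow", ["id", "chrom", "pos", "alt", "mateid"])


def pair_bnd_rows(rows):
    """Group BND rows that point at each other through MATEID.

    Returns (pairs, unpaired). pairs is a list of 2-tuples in order of
    first appearance; unpaired holds rows whose mate is missing or does
    not point back at them.
    """
    by_id = {row.id: row for row in rows}
    seen = set()
    pairs, unpaired = [], []
    for row in rows:
        if row.id in seen:
            continue
        mate = by_id.get(row.mateid)
        # Require a reciprocal link so a dangling MATEID is never paired.
        reciprocal = (
            mate is not None
            and mate.id != row.id
            and mate.mateid == row.id
            and mate.id not in seen
        )
        if reciprocal:
            pairs.append((row, mate))
            seen.update((row.id, mate.id))
        else:
            unpaired.append(row)
            seen.add(row.id)
    return pairs, unpaired


# Illustrative rows: two mates that reference each other, plus one row
# whose mate is absent from the input.
rows = [
    BndRow("bnd_W", "2", 321681, "G]17:198982]", "bnd_Y"),
    BndRow("bnd_Y", "17", 198982, "A]2:321681]", "bnd_W"),
    BndRow("bnd_X", "13", 123456, "C[2:321682[", "bnd_V"),
]
pairs, unpaired = pair_bnd_rows(rows)
```

Requiring the link to be reciprocal is a deliberate choice in this sketch: a row whose mate was filtered out upstream stays a single record instead of being silently dropped or mispaired. How varcode itself treats such rows is described in the VariantCollection transforms guide.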
API reference
The API reference is auto-generated from docstrings in the source.
Change log
See the changelog for release history.