varcode

Varcode is a Python library for manipulating genomic variants and predicting their effects on protein sequences.

Install

pip install varcode

Reference genome data is managed through PyEnsembl:

pyensembl install --release 75 76

Quick example

import varcode

variants = varcode.load_maf("tcga-ovarian-cancer-variants.maf")
TP53_effects = variants.groupby_gene_name()["TP53"].effects()
print(TP53_effects.top_priority_effect())

See the project README for a longer walkthrough and the effect-class table.

Feature guides

  • Effect annotation — how variants become effects, splice outcome representations, pluggable annotators, possibility sets for ambiguous outcomes, and structural variants. Start here.
  • Germline-aware annotation — classify somatic variants against the patient's germline-applied transcript; possibility sets when phase is unknown; LOH detection. New in 4.19.
  • Genotypes and sample-aware queries — per-sample zygosity on multi-sample VCFs.
  • VariantCollection transforms — pure VariantCollection -> VariantCollection refinements; ships pair_breakends for collapsing MATEID-paired BND rows into one combined variant.
  • CSV round-trip and metadata headers — to_csv / from_csv with genome recovered from the header.
  • Error handling — ReferenceMismatchError, GenomeBuildMismatchError, SampleNotFoundError, and raise_on_error=False.
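
To make the breakend-pairing idea from the transforms guide concrete, here is a minimal, library-independent sketch of matching BND rows through their MATEID values. It is not varcode's implementation and does not call pair_breakends: the Breakend class, the pair_by_mate_id function, and the sample rows are all hypothetical names and values chosen for illustration. The only assumption carried over from the VCF format is that each breakend row may name its partner's ID in MATEID.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


# Hypothetical stand-in for one BND row of a VCF; not varcode's data model.
@dataclass(frozen=True)
class Breakend:
    id: str
    chrom: str
    pos: int
    alt: str
    mate_id: Optional[str] = None


def pair_by_mate_id(
    rows: List[Breakend],
) -> Tuple[List[Tuple[Breakend, Breakend]], List[Breakend]]:
    """Split rows into reciprocal mate pairs and unpaired leftovers."""
    by_id = {row.id: row for row in rows}
    used = set()
    pairs = []
    singles = []
    for row in rows:
        if row.id in used:
            continue
        mate = by_id.get(row.mate_id) if row.mate_id else None
        # Pair only when the mate exists, is a different row, and points back.
        if (
            mate is not None
            and mate.id != row.id
            and mate.mate_id == row.id
            and mate.id not in used
        ):
            pairs.append((row, mate))
            used.update((row.id, mate.id))
        else:
            singles.append(row)
            used.add(row.id)
    return pairs, singles


# Illustrative rows: two reciprocal mates plus one row whose mate is absent.
rows = [
    Breakend("bnd_W", "2", 321681, "G]17:198982]", mate_id="bnd_Y"),
    Breakend("bnd_Q", "5", 1000, "A[9:2000[", mate_id="bnd_missing"),
    Breakend("bnd_Y", "17", 198982, "A]2:321681]", mate_id="bnd_W"),
]
pairs, singles = pair_by_mate_id(rows)
print([(a.id, b.id) for a, b in pairs])
print([s.id for s in singles])
```

Requiring the mate to point back keeps a row with a dangling or one-sided MATEID from being merged silently; such rows are returned separately so the caller can decide how to handle them.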

API reference

The API reference is auto-generated from docstrings in the source.

Change log

See the changelog for release history.