psite_annotation.annotators.SiteSequenceContextAnnotator

class psite_annotation.annotators.SiteSequenceContextAnnotator(annotation_file, pspInput=False, context_left=15, context_right=15, retain_other_mods=False, return_unique=False, return_sorted=False, organism='human')

Bases: object

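Annotates a pandas dataframe with the sequence context around each modified site: 15 amino acids on either side by default, configurable via context_left and context_right. Contexts for multiple sites are separated by semicolons.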

Example

from psite_annotation.annotators import SiteSequenceContextAnnotator

annotator = SiteSequenceContextAnnotator(<path_to_annotation_file>)
annotator.load_annotations()
df = annotator.annotate(df)

Initialize the input files and options for SiteSequenceContextAnnotator.

Parameters:
  • annotation_file (str) – fasta file containing protein sequences

  • pspInput (bool) – set to True if fasta file was obtained from PhosphositePlus

  • context_left (int) – number of amino acids to the left of the modification to include

  • context_right (int) – number of amino acids to the right of the modification to include

  • retain_other_mods (bool) – keep the other modifications of the modified peptide in the sequence context, written in lower case
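
To illustrate what context_left and context_right control, here is a minimal sketch of window extraction around a single site. It is not the library's implementation: the helper name site_context, the 1-based position convention, and the "_" padding character used when the window runs past a protein terminus are all assumptions made for this example.

```python
def site_context(protein_seq, site_pos, context_left=15, context_right=15, pad_char="_"):
    """Return the residues around a 1-based site position (hypothetical helper).

    Assumption: windows that extend past either end of the protein are
    padded with pad_char so that every context has the same length.
    """
    idx = site_pos - 1  # convert to a 0-based index
    if not 0 <= idx < len(protein_seq):
        raise ValueError(f"site position {site_pos} is outside the protein sequence")
    # Pad both ends so that plain slicing never runs out of characters.
    padded = pad_char * context_left + protein_seq + pad_char * context_right
    # In the padded string the site sits at idx + context_left,
    # so the window starts at idx and spans left + 1 + right characters.
    return padded[idx : idx + context_left + context_right + 1]


if __name__ == "__main__":
    # Site D3 in a toy sequence, with two residues of context on each side.
    print(site_context("ACDEFGHIK", 3, context_left=2, context_right=2))
```

With the defaults, each context is 31 characters long: 15 residues, the modified residue, and 15 more residues.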

Methods

annotate

Adds the ‘Site sequence context’ column to a pandas dataframe.

load_annotations

Reads in protein sequences from fasta file.

annotate(df, inplace=False)

Adds the sequence context around each modified site to a pandas dataframe.

Adds the following annotation columns to dataframe:

  • ‘Site sequence context’ = amino acids around each of the modified sites (context_left to the left and context_right to the right, 15 each by default), separated by semicolons

Parameters:
  • df (DataFrame) – pandas dataframe to be annotated; must contain a “Site positions” column

  • inplace (bool) – add the new column to df in place

Returns:

annotated dataframe

Return type:

pd.DataFrame

Required columns:

Site positions
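
The following sketch shows how one “Site positions” entry might be turned into a semicolon-separated context string. It is an illustration, not the library's code. The entry format assumed here (protein identifier, underscore, residue letter, 1-based position, e.g. "P1_D3"), the "_" padding character, and the function name contexts_for_row are all assumptions.

```python
def contexts_for_row(site_positions, sequences, context_left=15, context_right=15):
    """Build a semicolon-separated context string for one dataframe row.

    Assumptions (not confirmed by the documentation): each site looks like
    "<protein id>_<residue><1-based position>", and windows are padded
    with "_" at the protein termini.
    """
    contexts = []
    for site in site_positions.split(";"):
        if not site:
            continue  # skip empty entries, e.g. for unmodified peptides
        protein_id, residue = site.rsplit("_", 1)
        pos = int(residue[1:])  # drop the residue letter, keep the position
        padded = "_" * context_left + sequences[protein_id] + "_" * context_right
        # The window covers context_left + 1 + context_right characters.
        contexts.append(padded[pos - 1 : pos + context_left + context_right])
    return ";".join(contexts)


if __name__ == "__main__":
    toy_sequences = {"P1": "ACDEFGHIK"}  # hypothetical protein
    print(contexts_for_row("P1_D3;P1_K9", toy_sequences, 2, 2))
```

With a dataframe, a function like this could be applied to the “Site positions” column row by row; annotate performs the equivalent step internally and returns the annotated dataframe.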

load_annotations()

Reads in protein sequences from fasta file.

Return type:

None
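
For orientation, here is a minimal sketch of reading protein sequences from FASTA-formatted lines into a dictionary. It is not the parser used by load_annotations. The function name parse_fasta and the header handling (taking the second "|"-delimited field of a UniProt-style header, otherwise the first whitespace-delimited token) are assumptions.

```python
def parse_fasta(lines):
    """Map protein identifiers to sequences (hypothetical helper).

    Assumption: headers look like ">sp|P12345|NAME_HUMAN description";
    for other headers the first whitespace-delimited token is the identifier.
    """
    sequences = {}
    current_id = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current_id is not None:
                sequences[current_id] = "".join(chunks)
            header = line[1:]
            parts = header.split("|")
            current_id = parts[1] if len(parts) >= 3 else (header.split() or [""])[0]
            chunks = []
        elif current_id is not None:
            chunks.append(line)  # sequences may span several lines
    if current_id is not None:
        sequences[current_id] = "".join(chunks)
    return sequences


if __name__ == "__main__":
    example = [">sp|P1|X_HUMAN toy protein", "ACD", "EFG", ">Q2 other", "MK"]
    print(parse_fasta(example))
```

An open file handle can be passed in place of the list, since the function only iterates over lines.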