psite_annotation.annotators.SiteSequenceContextAnnotator

class psite_annotation.annotators.SiteSequenceContextAnnotator(annotation_file, pspInput=False, context_left=15, context_right=15, retain_other_mods=False, return_unique=False, return_sorted=False, organism='human')

Bases: object

Annotate a pandas dataframe with the sequence context around each modified site (+/- 15 amino acids by default). Contexts for multiple sites are separated by semicolons.

Example

annotator = SiteSequenceContextAnnotator(<path_to_annotation_file>)
annotator.load_annotations()
df = annotator.annotate(df)

Initialize the input files and options for SiteSequenceContextAnnotator.

Parameters:

annotation_file (str) – fasta file containing protein sequences
pspInput (bool) – set to True if the fasta file was obtained from PhosphoSitePlus
context_left (int) – number of amino acids to the left of the modification to include
context_right (int) – number of amino acids to the right of the modification to include
retain_other_mods (bool) – retain other modifications from the modified peptide in the sequence context in lower case
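
The effect of context_left and context_right can be illustrated with a small standalone sketch. The helper `site_context` below is hypothetical and is not part of psite_annotation; in particular, padding with `_` when the window runs past the protein ends is an assumption made for illustration, and the toy sequence is not a real protein.

```python
def site_context(sequence, position, context_left=15, context_right=15, pad="_"):
    """Return the residues around a 1-based `position` in `sequence`.

    Hypothetical helper, not part of psite_annotation. Window positions that
    fall outside the protein are filled with `pad` (an assumed convention).
    """
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    index = position - 1  # convert to a 0-based index
    left = sequence[max(0, index - context_left):index]
    right = sequence[index + 1:index + 1 + context_right]
    # Pad so the site always sits at offset `context_left` in the window.
    left = left.rjust(context_left, pad)
    right = right.ljust(context_right, pad)
    return left + sequence[index] + right


protein = "MSTAPSLKRQDEEGHIKLMN"  # toy sequence, not a real protein
window = site_context(protein, position=3, context_left=4, context_right=5)
print(window)  # __MSTAPSLK
```

With the defaults of 15 on each side, every window is 31 characters long and the modified residue sits in the middle.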

Methods

`annotate`	Adds a column with the sequence context around each modified site to a pandas dataframe.
`load_annotations`	Reads in protein sequences from fasta file.

annotate(df, inplace=False)

Adds a column with the sequence context around each modified site to a pandas dataframe.

Adds the following annotation column to the dataframe:

‘Site sequence context’ = the amino acids around each of the modified sites (context_left to the left and context_right to the right, 15 each by default), separated by semicolons

Parameters:

df (DataFrame) – pandas dataframe to be annotated which contains a column “Site positions”
inplace (bool) – add the new column to df in place

Returns:

annotated dataframe

Return type:

pd.DataFrame

Required columns:

Site positions
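
Since the output holds one context per site, it can help to check the contents of “Site positions” before annotating. The sketch below assumes each entry has the form `<protein id>_<residue><position>` and that multiple entries are separated by semicolons. This format is an assumption and should be verified against your own data. `parse_site_positions` is a hypothetical helper, not a function of this package.

```python
import re

# Assumed entry format: <protein id>_<residue><1-based position>, e.g. "P1_S3".
SITE_PATTERN = re.compile(r"^(?P<protein>.+)_(?P<residue>[A-Z])(?P<position>\d+)$")


def parse_site_positions(value):
    """Split a semicolon-separated string into (protein, residue, position) tuples."""
    sites = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty cells and trailing semicolons
        match = SITE_PATTERN.match(entry)
        if match is None:
            raise ValueError(f"unrecognized site entry: {entry!r}")
        sites.append((match["protein"], match["residue"], int(match["position"])))
    return sites


sites = parse_site_positions("P1_S3;P1_T7")  # toy identifiers
print(sites)  # [('P1', 'S', 3), ('P1', 'T', 7)]
```

Two parsed sites would correspond to two semicolon-separated contexts in ‘Site sequence context’.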

load_annotations()

Reads in protein sequences from fasta file.

Return type:

None
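
For readers unfamiliar with the fasta format, the following is a minimal standard-library sketch of reading protein sequences into a dictionary. It is illustrative only and is not the parser used by load_annotations. The function name `read_fasta` and the choice of the first header token as identifier are assumptions.

```python
import io


def read_fasta(handle):
    """Read fasta records from a text handle into {identifier: sequence}.

    Illustrative only: the identifier is taken as the first whitespace-delimited
    token of the header line, which may differ from what the annotator uses.
    """
    sequences = {}
    identifier = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if identifier is not None:
                sequences[identifier] = "".join(chunks)
            header = line[1:].split()
            identifier = header[0] if header else ""
            chunks = []
        elif identifier is not None:
            chunks.append(line)  # sequences may span several lines
    if identifier is not None:
        sequences[identifier] = "".join(chunks)
    return sequences


example = io.StringIO(">P1 toy protein\nMSTAPS\nLKRQ\n>P2\nGHIKL\n")
proteins = read_fasta(example)
print(proteins)  # {'P1': 'MSTAPSLKRQ', 'P2': 'GHIKL'}
```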