psite_annotation.addSiteSequenceContext

psite_annotation.addSiteSequenceContext(df, fastaFile, pspInput=False, context_left=15, context_right=15, retain_other_mods=False, return_unique=False, return_sorted=False, organism='human')

Annotate pandas dataframe with sequence context of a p-site.

Adds the following annotation columns to dataframe:

  • ‘Site sequence context’ = amino acids surrounding each of the modified sites (by default 15 on each side, set by context_left and context_right), separated by semicolons
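To make the idea of a sequence context concrete, the following is a minimal sketch of cutting a window around a 1-based site position. It is not the library's implementation: the helper name site_context, the toy protein sequence, and the choice to pad with "_" when the window runs past either end of the protein are all assumptions made for illustration.

```python
def site_context(sequence, position, context_left=15, context_right=15, pad="_"):
    """Return the residues around a 1-based position in a protein sequence.

    Hypothetical helper for illustration only. Padding with `pad` at the
    protein termini is an assumption, not documented library behaviour.
    """
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    idx = position - 1  # convert to a 0-based index
    # Pad both ends so that every window has the same length.
    padded = pad * context_left + sequence + pad * context_right
    # In the padded string the site sits at idx + context_left, so the
    # window starts at idx and spans left + 1 + right characters.
    return padded[idx : idx + context_left + 1 + context_right]


protein = "MKTAYIAKQRQISFVKSHFSRQ"  # toy sequence, not a real protein entry

# Two sites on the same protein, joined with semicolons as in the
# 'Site sequence context' column.
contexts = [site_context(protein, p, context_left=3, context_right=3) for p in (2, 13)]
joined = ";".join(contexts)
print(joined)  # __MKTAY;RQISFVK
```

With the default window of 15 residues on each side, every context in this sketch is 31 characters long, with the modified residue in the middle.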

Required columns:

  • ‘Site positions’

Parameters:
  • df (DataFrame) – pandas dataframe with ‘Site positions’ column

  • fastaFile (str) – fasta file containing protein sequences

  • pspInput (bool) – set to True if fasta file was obtained from PhosphoSitePlus

  • context_left (int) – number of amino acids to the left of the modification to include

  • context_right (int) – number of amino acids to the right of the modification to include

  • retain_other_mods (bool) – retain other modifications from the modified peptide in the sequence context in lower case

  • return_unique (bool) – eliminate duplicated sequences from the ‘Site sequence context’ column; the order of entries in this column then no longer matches the rest of the data frame

  • return_sorted (bool) – sort the sequences in the ‘Site sequence context’ column alphabetically; the order of entries in this column then no longer matches the rest of the data frame
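The effect of return_unique and return_sorted on a single semicolon-separated entry can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the helper name tidy_contexts is hypothetical, and keeping the first occurrence when removing duplicates is an assumed behaviour.

```python
def tidy_contexts(contexts, return_unique=False, return_sorted=False):
    """Post-process one semicolon-separated 'Site sequence context' entry.

    Hypothetical helper for illustration only.
    """
    items = contexts.split(";") if contexts else []
    if return_unique:
        # dict.fromkeys drops duplicates while keeping the first occurrence.
        items = list(dict.fromkeys(items))
    if return_sorted:
        items = sorted(items)
    return ";".join(items)


entry = "SFVKSHF;AKQRQIS;SFVKSHF"  # three sites, two share the same context

print(tidy_contexts(entry, return_unique=True))   # SFVKSHF;AKQRQIS
print(tidy_contexts(entry, return_sorted=True))   # AKQRQIS;SFVKSHF;SFVKSHF
print(tidy_contexts(entry, return_unique=True, return_sorted=True))  # AKQRQIS;SFVKSHF
```

In both cases the n-th context no longer necessarily belongs to the n-th entry of ‘Site positions’, which is why these options suit cases where only the set of sequences matters.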

Returns:

annotated dataframe

Return type:

pd.DataFrame