psite_annotation.annotators.ModifiedSequenceGroupAnnotator

class psite_annotation.annotators.ModifiedSequenceGroupAnnotator(match_tolerance=2)

Bases: object

Annotate a pandas dataframe with modified sequence groups, in which localizations are within match_tolerance positions of each other.

Example

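from psite_annotation.annotators import ModifiedSequenceGroupAnnotator

# df must contain a "Modified sequence" column
annotator = ModifiedSequenceGroupAnnotator()
df = annotator.annotate(df)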

Initialize the options for ModifiedSequenceGroupAnnotator.

Parameters:

match_tolerance (int) – Group all modifiable positions within match_tolerance positions of a modified site. Defaults to 2.
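To make the tolerance window concrete, here is a minimal sketch of how "modifiable positions within n positions of a modified site" could be computed for one peptide. It is not the library's implementation. The helper name modifiable_positions_near_sites is hypothetical, and two things are assumptions: that S, T and Y are the modifiable residues, and that distance is counted in residues along the unmodified backbone.

```python
def modifiable_positions_near_sites(mod_seq, tolerance=2, modifiable="STY"):
    """Return (backbone, modified sites, modifiable positions near a site).

    Positions are 0-based indices on the unmodified backbone.
    Assumption: "(ph)" directly follows the residue it modifies.
    """
    backbone = []
    sites = []
    i = 0
    while i < len(mod_seq):
        if mod_seq.startswith("(ph)", i):
            # The tag refers to the residue just before it.
            sites.append(len(backbone) - 1)
            i += len("(ph)")
        else:
            backbone.append(mod_seq[i])
            i += 1
    backbone = "".join(backbone)

    # Keep modifiable residues that lie within `tolerance` of any modified site.
    near = [
        pos
        for pos, aa in enumerate(backbone)
        if aa in modifiable and any(abs(pos - s) <= tolerance for s in sites)
    ]
    return backbone, sites, near


backbone, sites, near = modifiable_positions_near_sites("AS(ph)TPKSY", tolerance=2)
```

With a tolerance of 2, only the serine and threonine at backbone positions 1 and 2 fall inside the window around the modified site. Raising the tolerance to 4 also pulls in the serine at position 5.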

Methods

annotate

Group delocalized phospho-forms.

load_annotations

Return type: None

annotate(df, inplace=False)

Group delocalized phospho-forms.

This function identifies peptide sequences that differ only in the position of their phosphorylation group, written as (ph) in the sequence string, and collapses them into “delocalized” groups. Each group contains all modified sequence variants that share the same underlying peptide backbone.

The following columns are added to the dataframe:

  • ‘Delocalized sequence’ = Canonical unmodified backbone with an index suffix to distinguish the number of modifications.

  • ‘Modified sequence group’ = All peptide variants belonging to the same delocalized group, concatenated with semicolons.
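The two columns can be illustrated with a simplified pandas sketch. It is not the library's implementation: it groups purely by backbone and modification count, and ignores match_tolerance entirely. The function name group_delocalized and the "_<count>" suffix format are assumptions made for illustration.

```python
import pandas as pd


def group_delocalized(df: pd.DataFrame) -> pd.DataFrame:
    """Simplified grouping of phospho-forms by backbone and (ph) count."""
    out = df.copy()
    seqs = out["Modified sequence"]

    # Unmodified backbone: remove every literal "(ph)" tag.
    backbone = seqs.str.replace("(ph)", "", regex=False)
    # Number of phospho groups per peptide.
    n_mods = seqs.str.count(r"\(ph\)")

    # Hypothetical suffix format: backbone + "_" + number of modifications.
    out["Delocalized sequence"] = backbone + "_" + n_mods.astype(str)

    # Join all distinct variants that share a delocalized sequence.
    groups = seqs.groupby(out["Delocalized sequence"]).agg(
        lambda s: ";".join(sorted(set(s)))
    )
    out["Modified sequence group"] = out["Delocalized sequence"].map(groups)
    return out


example = pd.DataFrame(
    {"Modified sequence": ["AS(ph)TPK", "AST(ph)PK", "AS(ph)T(ph)PK", "LLK"]}
)
result = group_delocalized(example)
```

In this sketch the two singly phosphorylated forms of ASTPK end up in one group, while the doubly phosphorylated form and the unmodified peptide LLK each form their own group.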

Parameters:
  • df (DataFrame) – Input dataframe with a “Modified sequence” column containing peptide strings with (ph) annotations

  • inplace (bool) – If True, add the new columns to df in place. Defaults to False.

Returns:

Dataframe with the added ‘Delocalized sequence’ and ‘Modified sequence group’ columns

Return type:

pd.DataFrame

Required columns:

Modified sequence