psite_annotation.annotators.ModifiedSequenceAggregatorAnnotator

class psite_annotation.annotators.ModifiedSequenceAggregatorAnnotator(experiment_cols, agg_func='mean', agg_cols=None)

Bases: object

Annotate and aggregate pandas dataframe with representative modified sequence from a modified sequence group.

Example

annotator = ModifiedSequenceAggregatorAnnotator(experiment_cols=["Experiment 1", "Experiment 2"])
df = annotator.annotate(df)

Initialize the options for ModifiedSequenceAggregatorAnnotator.

Parameters:
  • experiment_cols (list[str]) – list of column names with quantitative values.

  • agg_func (str) – function used to aggregate quantitative values within each group, e.g. ‘mean’ or ‘sum’. Defaults to ‘mean’.
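
As a rough illustration of what a string-valued agg_func means, the sketch below uses plain pandas group-wise aggregation on a toy table. It is not the annotator's implementation: the helper name aggregate_groups and the toy data are invented for this example, and grouping on ‘Modified sequence group’ is an assumption.

```python
import pandas as pd


def aggregate_groups(df, experiment_cols, agg_func="mean"):
    """Hypothetical helper: aggregate experiment columns per group with pandas."""
    # Assumption: rows are grouped on the 'Modified sequence group' column.
    grouped = df.groupby("Modified sequence group", as_index=False)
    # A string such as 'mean' or 'sum' is resolved by pandas to the built-in reducer.
    return grouped[experiment_cols].agg(agg_func)


toy = pd.DataFrame(
    {
        "Modified sequence group": ["A", "A", "B"],
        "Experiment 1": [1.0, 3.0, 5.0],
        "Experiment 2": [2.0, 4.0, 6.0],
    }
)
experiment_cols = ["Experiment 1", "Experiment 2"]

mean_df = aggregate_groups(toy, experiment_cols, "mean")
sum_df = aggregate_groups(toy, experiment_cols, "sum")
print(mean_df)
print(sum_df)
```

With agg_func='mean' the two rows of group A collapse to their average, while 'sum' adds them up, so the choice changes the scale of the aggregated values.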

Methods

annotate

Group delocalized phospho-forms and aggregate their quantitative values.

load_annotations

Return type: None

annotate(df)

Group delocalized phospho-forms and aggregate their quantitative values.

This function identifies peptide sequences that differ only in the position of their phosphorylation group, written (ph), and collapses them into “delocalized” groups. Each group contains all modified sequence variants that share the same underlying peptide backbone.
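
The grouping idea can be sketched in plain pandas as follows. This only illustrates the concept and is not how the library derives its columns: the helper delocalize, the "_<n>" suffix format, and the toy peptide strings are all assumptions made for this example.

```python
import pandas as pd


def delocalize(mod_seq: str) -> str:
    """Hypothetical helper: drop (ph) positions but keep their count."""
    n_phospho = mod_seq.count("(ph)")
    backbone = mod_seq.replace("(ph)", "")
    # Assumption: the suffix is "_<number of (ph) groups>"; the real format may differ.
    return f"{backbone}_{n_phospho}"


toy = pd.DataFrame(
    {"Modified sequence": ["AS(ph)TPK", "AST(ph)PK", "AS(ph)T(ph)PK", "GGSK"]}
)
toy["Delocalized sequence"] = toy["Modified sequence"].map(delocalize)

# Variants sharing a delocalized key are joined with semicolons.
toy["Modified sequence group"] = toy.groupby("Delocalized sequence")[
    "Modified sequence"
].transform(lambda s: ";".join(sorted(s)))
print(toy)
```

Here the two singly phosphorylated variants of ASTPK fall into one group, while the doubly phosphorylated form keeps its own group because its modification count differs.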

The following columns are added to the dataframe:

  • ‘Modified sequence representative’ = A single representative sequence selected from the group, namely the one most frequently measured across experiments.

  • ‘Modified sequence representative degree’ = Fraction of summed observation frequency contributed by the representative peptide.
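
One possible reading of these two columns is sketched below in plain pandas. It assumes that "observation frequency" means the number of experiment columns holding a non-missing value for a peptide. The library may define it differently, and the helper pick_representatives and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def pick_representatives(df, experiment_cols, group_col="Delocalized sequence"):
    """Hypothetical helper: choose the most frequently observed variant per group."""
    # Assumption: frequency = count of experiments with a non-missing value.
    freq = df[experiment_cols].notna().sum(axis=1)
    rows = []
    for key, idx in df.groupby(group_col).groups.items():
        group_freq = freq.loc[idx]
        rows.append(
            {
                group_col: key,
                # idxmax returns the first row on ties.
                "Modified sequence representative": df.loc[
                    group_freq.idxmax(), "Modified sequence"
                ],
                # Share of the group's summed frequency held by the representative.
                "Modified sequence representative degree": group_freq.max()
                / group_freq.sum(),
            }
        )
    return pd.DataFrame(rows)


experiment_cols = ["Experiment 1", "Experiment 2", "Experiment 3"]
toy = pd.DataFrame(
    {
        "Modified sequence": ["AS(ph)TPK", "AST(ph)PK", "GS(ph)K"],
        "Delocalized sequence": ["ASTPK_1", "ASTPK_1", "GSK_1"],
        "Experiment 1": [1.0, 2.0, 5.0],
        "Experiment 2": [3.0, np.nan, 6.0],
        "Experiment 3": [4.0, np.nan, np.nan],
    }
)
reps = pick_representatives(toy, experiment_cols).set_index("Delocalized sequence")
print(reps)
```

In the toy data, AS(ph)TPK is seen in three experiments and AST(ph)PK in one, so AS(ph)TPK becomes the representative with a degree of 3 / (3 + 1) = 0.75. A group with a single member always has a degree of 1.0.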

All experiment columns (e.g. “Experiment 1”, “Experiment 2”, …) are aggregated per group by applying agg_func (e.g. ‘mean’ or ‘sum’) to the intensities of the member sequences.

Parameters:

df (DataFrame) – Input dataframe with the following columns:

  • ‘Modified sequence’ = Peptide strings with (ph) annotations.

  • ‘Delocalized sequence’ = Canonical unmodified backbone with an index suffix to distinguish the number of modifications.

  • ‘Modified sequence group’ = All peptide variants belonging to the same delocalized group, concatenated with semicolons.

Returns:

Dataframe with grouped phospho-forms and aggregated intensities.

Return type:

pd.DataFrame

Required columns:

Modified sequence, Delocalized sequence, Modified sequence group
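
Before calling annotate, it can help to confirm that the required columns are present. The check below is a small sketch in plain pandas; the helper missing_required_columns is hypothetical and not part of psite_annotation.

```python
import pandas as pd

# Column names taken from the "Required columns" list above.
REQUIRED_COLUMNS = ["Modified sequence", "Delocalized sequence", "Modified sequence group"]


def missing_required_columns(df: pd.DataFrame) -> list:
    """Hypothetical helper: list the required columns absent from df."""
    return [col for col in REQUIRED_COLUMNS if col not in df.columns]


toy = pd.DataFrame({"Modified sequence": ["AS(ph)TPK"]})
missing = missing_required_columns(toy)
if missing:
    print(f"Missing required columns: {missing}")
```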