psite_annotation.aggregateModifiedSequenceGroups

psite_annotation.aggregateModifiedSequenceGroups(df, experiment_cols, match_tolerance=2, agg_cols=None, agg_func='mean')

Annotate and aggregate DataFrame with representative sequences from grouped localizations.

Requires a “Modified sequence” column to be present in the dataframe.

Adds the following annotation columns to dataframe:

  • ‘Delocalized sequence’ = Canonical unmodified backbone with an index suffix that encodes the number of modifications.

  • ‘Modified sequence group’ = All peptide variants belonging to the same delocalized group, concatenated with semicolons.

  • ‘Modified sequence representative’ = A single representative sequence selected from the group, i.e. the most frequently measured across experiments.

  • ‘Modified sequence representative degree’ = Fraction of summed observation frequency contributed by the representative peptide.
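The four annotation columns can be illustrated with a plain-Python sketch. It rests on several assumptions that are not taken from psite_annotation: each modification is written in parentheses after its residue (e.g. `AS(ph)PTK`), the delocalized sequence uses a hypothetical `_<count>` suffix, a variant's observation frequency is the number of experiments with a non-missing value, and variants are grouped by backbone and modification count only, so match_tolerance is ignored. The helpers `delocalize` and `annotate` are illustrative names, not part of the library.

```python
import re
from collections import defaultdict

# Assumption: each modification is written in parentheses after its residue,
# e.g. "AS(ph)PTK". Real "Modified sequence" strings may use another notation.
MOD_PATTERN = re.compile(r"\([^)]*\)")


def delocalize(modified_sequence):
    """Strip modifications and append the modification count.

    The "_<count>" suffix is a hypothetical format chosen for illustration.
    """
    n_mods = len(MOD_PATTERN.findall(modified_sequence))
    return f"{MOD_PATTERN.sub('', modified_sequence)}_{n_mods}"


def annotate(rows, experiment_cols):
    """Return {modified sequence: annotation columns} for rows given as dicts.

    Simplification: variants are grouped by backbone and modification count
    only; match_tolerance is not applied here.
    """
    # Observation frequency per variant, nested under its delocalized group.
    groups = defaultdict(dict)
    for row in rows:
        seq = row["Modified sequence"]
        observed = sum(row.get(col) is not None for col in experiment_cols)
        group = groups[delocalize(seq)]
        group[seq] = group.get(seq, 0) + observed

    annotations = {}
    for deloc, freq in groups.items():
        representative = max(freq, key=freq.get)  # ties: first variant seen
        total = sum(freq.values())
        degree = freq[representative] / total if total else 0.0
        members = ";".join(sorted(freq))
        for seq in freq:
            annotations[seq] = {
                "Delocalized sequence": deloc,
                "Modified sequence group": members,
                "Modified sequence representative": representative,
                "Modified sequence representative degree": degree,
            }
    return annotations
```

With two one-site variants observed in two experiments and one experiment respectively, the first becomes the representative with a degree of 2/3.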

All columns in experiment_cols (e.g. “Experiment 1”, “Experiment 2”, …) are aggregated per group by combining the values of member sequences with agg_func (default ‘mean’).
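Per-group aggregation of the quantitative columns can be sketched without pandas. The function `aggregate_groups` and the `REDUCERS` table are hypothetical; the documented function works on a pandas DataFrame and may treat missing values differently. For example, pandas returns 0 for the sum of an all-missing group by default, whereas this sketch returns None.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical mapping from agg_func strings to reducers. Missing values (None)
# are skipped before reducing, similar in spirit to pandas' skipna default.
REDUCERS = {"mean": mean, "sum": sum, "min": min, "max": max}


def aggregate_groups(rows, group_key, experiment_cols, agg_func="mean"):
    """Combine experiment values of all rows sharing row[group_key]."""
    reduce_values = REDUCERS[agg_func]
    buckets = defaultdict(lambda: defaultdict(list))
    for row in rows:
        # Registers the group even if all of its values are missing.
        bucket = buckets[row[group_key]]
        for col in experiment_cols:
            value = row.get(col)
            if value is not None:
                bucket[col].append(value)

    return {
        key: {
            col: reduce_values(bucket[col]) if bucket[col] else None
            for col in experiment_cols
        }
        for key, bucket in buckets.items()
    }
```

Switching `agg_func` from `"mean"` to `"sum"` changes only the reducer, which mirrors how a single string parameter can select the aggregation strategy.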

Example

import psite_annotation as pa
df = pa.aggregateModifiedSequenceGroups(df, experiment_cols=["Experiment 1", "Experiment 2"])

Required columns:

Modified sequence

Parameters:
  • df (DataFrame) – pandas dataframe with ‘Modified sequence’ column.

  • experiment_cols (list[str]) – list of column names with quantitative values.

  • match_tolerance (int) – modifiable positions within match_tolerance residues of a modified site are grouped together.

  • agg_cols (Optional[dict[str, Any]]) – dictionary mapping non-quantitative column names to aggregation functions, i.e. {column name: aggregation function}.

  • agg_func (str) – function used to aggregate the quantitative values in experiment_cols within each group, e.g. ‘mean’ or ‘sum’.
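One possible reading of match_tolerance is single-linkage clustering: two variants of the same backbone with the same number of modifications are linked when their sorted site positions pair up within match_tolerance residues, and all linked variants end up in one group. The sketch below implements that reading with a small union-find. It is an assumption for illustration, not the library's documented rule, and `sites_within_tolerance` and `group_variants` are hypothetical helpers.

```python
from itertools import combinations


def sites_within_tolerance(sites_a, sites_b, match_tolerance=2):
    """True if sorted site positions pair up within match_tolerance residues."""
    if len(sites_a) != len(sites_b):
        return False  # different numbers of modifications never match
    return all(
        abs(a - b) <= match_tolerance
        for a, b in zip(sorted(sites_a), sorted(sites_b))
    )


def group_variants(variants, match_tolerance=2):
    """Cluster variants (lists of site positions on one backbone) by single linkage.

    Returns groups as sorted lists of variant indices.
    """
    parent = list(range(len(variants)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, j in combinations(range(len(variants)), 2):
        if sites_within_tolerance(variants[i], variants[j], match_tolerance):
            parent[find(i)] = find(j)  # merge the two groups

    clusters = {}
    for i in range(len(variants)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Because linkage is transitive, sites at positions 3, 5, 9 and 11 form two groups at a tolerance of 2 but collapse into a single group at a tolerance of 4, since each neighbouring pair is then close enough.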

Returns:

annotated and aggregated dataframe

Return type:

pd.DataFrame