psite_annotation.addKinaseLibraryAnnotations

psite_annotation.addKinaseLibraryAnnotations(df, motifs_file, quantiles_file, top_n=5, sort_type='total', threshold_type='total', score_cutoff=3, split_sequences=False)

Annotate pandas dataframe with highest scoring kinases from the kinase library.

Johnson et al. 2023, https://doi.org/10.1038/s41586-022-05575-3

Requires a “Site sequence context” column in the dataframe. This column can be generated with PeptidePositionAnnotator() followed by SiteSequenceContextAnnotator().

Adds the following annotation columns to dataframe:

  • Motif Kinases = semicolon-separated list of kinases whose motifs match the site sequence context

  • Motif Scores = semicolon-separated list of scores corresponding to Motif Kinases

  • Motif Percentiles = semicolon-separated list of percentiles corresponding to Motif Kinases

  • Motif Totals = semicolon-separated list of score*percentile values corresponding to Motif Kinases
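The four annotation columns hold parallel semicolon-separated lists, so they can be zipped back into one record per kinase. The following is a minimal sketch, not part of psite_annotation: the helper name parse_motif_columns, the kinase names and all numeric values are made up for illustration, and it assumes that an empty string means no kinase passed the cutoff.

```python
def parse_motif_columns(row):
    """Turn the four semicolon-separated annotation strings of one row into records."""
    columns = ["Motif Kinases", "Motif Scores", "Motif Percentiles", "Motif Totals"]
    # Assumption: an empty string means that no kinase was reported for this site.
    parts = [row[c].split(";") if row[c] else [] for c in columns]
    return [
        {"kinase": k, "score": float(s), "percentile": float(p), "total": float(t)}
        for k, s, p, t in zip(*parts)
    ]


# Made-up values for illustration only; not real kinase library output.
# The percentile scale (0-1 here) is an assumption as well.
example_row = {
    "Motif Kinases": "KINASE_B;KINASE_A",
    "Motif Scores": "5.0;4.0",
    "Motif Percentiles": "0.8;0.95",
    "Motif Totals": "4.0;3.8",
}
records = parse_motif_columns(example_row)
for record in records:
    print(record)
```

With a real annotated dataframe, the same helper could be applied row by row, for example via df.apply(parse_motif_columns, axis=1).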

Example

import psite_annotation as pa

# df is an existing pandas dataframe with peptide-level data
df = pa.addPeptideAndPsitePositions(df, pa.pspFastaFile, pspInput=True)
df = pa.addKinaseLibraryAnnotations(df, pa.kinaseLibraryMotifsFile, pa.kinaseLibraryQuantilesFile)

Required columns:

Site sequence context

Parameters:
  • df (DataFrame) – pandas dataframe with ‘Site sequence context’ column

  • motifs_file (str) – tab-separated file with odds ratios for each kinase, amino acid (AA) and position

  • quantiles_file (str) – tab-separated file with quantile scores for each kinase

  • top_n (int) – maximum number of returned kinases (default: 5)

  • sort_type (str) – score by which to sort the kinases, one of “percentile”, “score” or “total” (default: “total”)

  • threshold_type (str) – score to which the cutoff is applied, one of “percentile”, “score” or “total”, where total = score*percentile (default: “total”)

  • score_cutoff (float) – do not report kinases whose threshold_type score is below this cutoff (default: 3.0)

  • split_sequences (bool) – if set to True, the ‘Site sequence context’ column is split by ‘;’ and exploded before annotating (default: False)
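The interplay of top_n, sort_type, threshold_type and score_cutoff can be illustrated with a small stand-alone sketch. This is not the library's implementation: the function select_kinases is hypothetical, the candidate values are made up, and the order of operations (apply the cutoff, then sort, then keep the top_n entries) is an assumption based on the parameter descriptions above.

```python
ALLOWED_TYPES = ("percentile", "score", "total")


def select_kinases(candidates, top_n=5, sort_type="total",
                   threshold_type="total", score_cutoff=3.0):
    """Filter, sort and truncate candidate kinases for a single site.

    candidates: list of dicts with keys 'kinase', 'score' and 'percentile'.
    """
    if sort_type not in ALLOWED_TYPES or threshold_type not in ALLOWED_TYPES:
        raise ValueError(f"sort_type and threshold_type must be one of {ALLOWED_TYPES}")
    # total = score * percentile, as described for the threshold_type parameter
    scored = [dict(c, total=c["score"] * c["percentile"]) for c in candidates]
    # Assumed order: apply the cutoff first, then sort, then keep the top_n entries
    kept = [c for c in scored if c[threshold_type] >= score_cutoff]
    kept.sort(key=lambda c: c[sort_type], reverse=True)
    return kept[:top_n]


# Made-up candidates for illustration only
candidates = [
    {"kinase": "KINASE_A", "score": 4.0, "percentile": 0.95},
    {"kinase": "KINASE_B", "score": 5.0, "percentile": 0.5},
    {"kinase": "KINASE_C", "score": 3.5, "percentile": 0.9},
]

by_total = select_kinases(candidates)
by_score = select_kinases(candidates, top_n=2, sort_type="score", threshold_type="score")
print([c["kinase"] for c in by_total])
print([c["kinase"] for c in by_score])
```

In this sketch KINASE_B is dropped under the default settings because its total (5.0 * 0.5 = 2.5) is below the cutoff, but it ranks first when both the cutoff and the sorting use the raw score.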

Returns:

annotated dataframe

Return type:

pd.DataFrame