psite_annotation.annotators.KinaseLibraryAnnotator

class psite_annotation.annotators.KinaseLibraryAnnotator(motifs_file, quantiles_file, top_n=5, score_cutoff=3, split_sequences=False, threshold_type='total', sort_type='total')

Bases: object

Annotate pandas dataframe with highest scoring kinases from the kinase library.

Johnson et al. 2023, https://doi.org/10.1038/s41586-022-05575-3

Requires the “Site sequence context” column to be present in the dataframe. This column can be generated with PeptidePositionAnnotator().

Example

from psite_annotation.annotators import KinaseLibraryAnnotator

annotator = KinaseLibraryAnnotator(<path_to_motifs_file>, <path_to_quantiles_file>)
annotator.load_annotations()
df = annotator.annotate(df)

Initialize the input files and options for KinaseLibraryAnnotator.

Parameters:

motifs_file – tab-separated file with motifs and their identifiers

Methods

`annotate`	Adds columns with the highest scoring kinases for each site sequence context.
`load_annotations`	Reads in the tab-separated files with motif and quantile annotations.

annotate(df, inplace=False)

Adds columns with the highest scoring kinases for each site sequence context.

Adds the following annotation columns to dataframe:

Motif Kinases = semicolon-separated list of kinases that match the site sequence context
Motif Scores = semicolon-separated list of scores corresponding to Motif Kinases
Motif Percentiles = semicolon-separated list of percentiles corresponding to Motif Kinases
Motif Totals = semicolon-separated list of score*percentile values corresponding to Motif Kinases
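
Because each of these columns packs several values into one semicolon-separated string, downstream analysis may be easier after splitting them into one row per site and kinase. The following is a minimal sketch, assuming the four lists in a row have the same length and order. The helper name explode_kinase_columns and all toy values (kinase names, scores, percentiles, context labels) are invented for illustration and are not part of psite_annotation.

```python
import pandas as pd

KINASE_COLUMNS = ["Motif Kinases", "Motif Scores", "Motif Percentiles", "Motif Totals"]


def explode_kinase_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per (site, kinase) pair.

    Hypothetical helper, not part of psite_annotation.
    """
    out = df.copy()
    for col in KINASE_COLUMNS:
        # Turn "A;B;C" into ["A", "B", "C"]; missing values become [""]
        out[col] = out[col].fillna("").astype(str).str.split(";")
    # Explode the four list columns in parallel (needs pandas >= 1.3);
    # raises ValueError if the lists in a row differ in length
    out = out.explode(KINASE_COLUMNS, ignore_index=True)
    # Drop placeholder rows that came from sites without any kinase
    out = out[out["Motif Kinases"] != ""].reset_index(drop=True)
    for col in KINASE_COLUMNS[1:]:
        out[col] = pd.to_numeric(out[col])
    return out


# Toy stand-in for the output of annotate(); all values are made up
annotated = pd.DataFrame(
    {
        "Site sequence context": ["toy_context_1", "toy_context_2"],
        "Motif Kinases": ["AKT1;PKACA", "CDK1"],
        "Motif Scores": ["4.0;3.5", "5.0"],
        "Motif Percentiles": ["0.99;0.9", "0.95"],
        "Motif Totals": ["3.96;3.15", "4.75"],
    }
)
long_df = explode_kinase_columns(annotated)
print(long_df)
```

The long format makes it straightforward to filter or group by kinase with ordinary pandas operations, at the cost of repeating the site-level columns on every row.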

Parameters:

df (DataFrame) – pandas dataframe with “Site sequence context” column
inplace (bool) – Whether to modify the DataFrame rather than creating a new one.

Returns:

annotated dataframe

Return type:

pd.DataFrame

Required columns:

Site sequence context
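
Since annotate depends on this column, it can be useful to check for it before calling the annotator. The sketch below shows one way to do so with plain pandas; check_required_column is a hypothetical helper and not part of psite_annotation, and the error message wording is an assumption.

```python
import pandas as pd

REQUIRED_COLUMN = "Site sequence context"


def check_required_column(df: pd.DataFrame, column: str = REQUIRED_COLUMN) -> None:
    """Raise a descriptive error if the required column is missing.

    Hypothetical helper, not part of psite_annotation.
    """
    if column not in df.columns:
        raise KeyError(
            f"Column '{column}' not found; available columns: {list(df.columns)}. "
            "It can be generated with PeptidePositionAnnotator."
        )


# Toy dataframes; the column contents are placeholders
ready_df = pd.DataFrame({"Site sequence context": ["toy_context_1"]})
missing_df = pd.DataFrame({"Modified sequence": ["toy_peptide_1"]})

check_required_column(ready_df)  # passes silently
try:
    check_required_column(missing_df)
except KeyError as err:
    print(f"Cannot annotate yet: {err}")
```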

load_annotations()

Reads in the tab-separated files with motif and quantile annotations.

Return type:

None