Basic usage

The psite-annotation package consists of annotator functions and annotator classes. Using the annotator functions is generally easier, whereas using the annotator classes offers more flexibility.

The first step is always to import the package:

import psite_annotation as pa

After loading your dataframe with pandas, you can then add annotations using one of the annotator functions. These functions always have the same structure, following general pandas principles:

df = pa.addSomeAnnotation(df, other_arguments, optional_argument=optional_argument)

This adds one or more new columns containing the annotations to the dataframe.

Annotator functions

psite_annotation.addPeptideAndPsitePositions(df, ...)

Annotate pandas dataframe with positions of the peptide within the protein sequence based on a fasta file.

psite_annotation.addSiteSequenceContext(df, ...)

Annotate pandas dataframe with sequence context of a p-site.

psite_annotation.addPSPAnnotations(df, ...)

Annotate pandas dataframe with the number of high- and low-throughput studies according to PhosphoSitePlus.

psite_annotation.addPSPKinaseSubstrateAnnotations(df, ...)

Annotate pandas dataframe with upstream kinases according to PhosphoSitePlus.

psite_annotation.addPSPRegulatoryAnnotations(df, ...)

Annotate pandas dataframe with regulatory functions according to PhosphoSitePlus.

psite_annotation.addDomains(df, ...)

Add a column with the domains that the peptide overlaps.

psite_annotation.addTurnoverRates(df, ...)

Annotate pandas dataframe with PTM turnover behavior.

psite_annotation.addInVitroKinases(df, ...)

Annotate pandas dataframe with upstream in vitro kinases according to Sugiyama et al. (2019).

psite_annotation.addMotifs(df, motifsFile)

Add a column with the motifs that the site sequence context matches.

psite_annotation.addKinaseLibraryAnnotations(df, ...)

Annotate pandas dataframe with highest scoring kinases from the kinase library.

Please note the following:

  • Each annotator function has one or more required columns, which are listed in the documentation of the corresponding function.

  • Multiple annotator functions can (and sometimes have to) be applied to the dataframe in succession.

  • The documentation of each annotator function also includes one or more examples.
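Because each annotator function depends on specific input columns, it can help to check for them before calling the function. The following is a minimal sketch in plain Python; the helper name missing_columns is hypothetical and not part of psite-annotation, and the required column names are taken from the upstream kinase example on this page rather than from the function documentation.

```python
def missing_columns(available, required):
    """Return the required column names that are absent, in the order given.

    `available` can be any iterable of column names, e.g. `df.columns`
    of a pandas dataframe. This helper is hypothetical and only
    illustrates the check; it is not part of psite-annotation.
    """
    available = set(available)
    return [name for name in required if name not in available]


# Columns assumed to be required for the upstream kinase example.
required = ["Proteins", "Modified sequence"]

# Stand-in for `df.columns`; the extra "Intensity" column is made up.
columns = ["Proteins", "Intensity"]

missing = missing_columns(columns, required)
if missing:
    print("Missing required columns:", ", ".join(missing))
```

With a real dataframe, the call would be missing_columns(df.columns, required), and an empty list means the annotator's inputs are all present.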

Example: Add upstream kinases

The following example adds upstream kinases to a pandas dataframe df with the columns Proteins (UniProt identifiers separated by semicolons, e.g. Q86U42-2;Q86U42) and Modified sequence (standard MaxQuant notation, e.g. (ac)AAAAAAAAAAGAAGGRGS(ph)GPGR):

import psite_annotation as pa

df = pa.addPeptideAndPsitePositions(df, pa.pspFastaFile, pspInput=True)
df = pa.addPSPKinaseSubstrateAnnotations(df, pa.pspKinaseSubstrateFile)

We first annotate the peptide and modification positions within the protein using addPeptideAndPsitePositions(). This adds a column with an identifier for the phosphosite, which can then be mapped to the phosphorylating kinase using addPSPKinaseSubstrateAnnotations().
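To make the first step more concrete, here is a minimal pure-Python sketch of what locating a phosphosite within a protein involves. It illustrates the idea only and is not the package's implementation: the helper names, the toy protein sequence, and the identifier format (accession, underscore, residue, 1-based position) are all assumptions made for this example.

```python
import re
from typing import List


def strip_modifications(modified_sequence: str) -> str:
    """Remove parenthesised modifications such as (ac) or (ph)."""
    return re.sub(r"\([^)]*\)", "", modified_sequence).strip("_")


def phospho_offsets(modified_sequence: str) -> List[int]:
    """Return 0-based offsets, within the bare peptide, of residues followed by (ph)."""
    seq = modified_sequence.strip("_")
    offsets = []
    bare_length = 0  # number of amino acid letters seen so far
    i = 0
    while i < len(seq):
        if seq[i] == "(":
            end = seq.index(")", i)
            # (ph) refers to the residue directly before it.
            if seq[i + 1:end] == "ph" and bare_length > 0:
                offsets.append(bare_length - 1)
            i = end + 1
        else:
            bare_length += 1
            i += 1
    return offsets


def site_identifiers(protein_id: str, protein_sequence: str, modified_sequence: str) -> List[str]:
    """Build identifiers like ACCESSION_S123 (assumed format) for each phosphosite."""
    peptide = strip_modifications(modified_sequence)
    start = protein_sequence.find(peptide)  # 0-based start of the peptide, -1 if absent
    if start == -1:
        return []
    return [
        f"{protein_id}_{peptide[offset]}{start + offset + 1}"  # 1-based protein position
        for offset in phospho_offsets(modified_sequence)
    ]


modified = "(ac)AAAAAAAAAAGAAGGRGS(ph)GPGR"
# Toy protein sequence built around the peptide; NOT the real Q86U42 sequence.
toy_protein = "M" + strip_modifications(modified) + "KLM"
print(site_identifiers("Q86U42", toy_protein, modified))
```

In practice, addPeptideAndPsitePositions() performs this mapping against the sequences in the supplied fasta file, so a helper like this is only useful for understanding or spot-checking the resulting positions.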