psite_annotation.annotators.PeptidePositionAnnotator
- class psite_annotation.annotators.PeptidePositionAnnotator(annotation_file, pspInput=False, returnAllPotentialSites=False, localization_uncertainty=0, mod_dict={'S(Phospho (STY))': 's', 'S(ph)': 's', 'T(Phospho (STY))': 't', 'T(ph)': 't', 'Y(Phospho (STY))': 'y', 'Y(ph)': 'y', 'pS': 's', 'pT': 't', 'pY': 'y'}, return_unique=False, return_sorted=False, organism='human')
Bases: object
Annotate a pandas dataframe with the positions of the peptide within the protein sequence, based on a fasta file.
Example
annotator = PeptidePositionAnnotator(<path_to_annotation_file>)
annotator.load_annotations()
df = annotator.annotate(df)
Initialize the input files and options for PeptidePositionAnnotator.
- Parameters:
annotation_file (str) – fasta file containing protein sequences
pspInput (bool) – set to True if the fasta file was obtained from PhosphoSitePlus
returnAllPotentialSites (bool) – return all modifiable positions within the peptide as potential p-sites
localization_uncertainty (int) – return all modifiable positions within n positions of modified sites as potential p-sites
mod_regex – regex to capture all modification strings
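The interplay of mod_dict and localization_uncertainty can be illustrated with a small standalone sketch. It does not use psite_annotation and is not the library's implementation: the helper names mark_modified and potential_sites are hypothetical, and the sketch assumes that modified residues are marked as lowercase letters (as the default mod_dict suggests), that S, T and Y are the modifiable residues, and that indices are counted within the peptide.

```python
# Default modification strings from the class signature, mapped to lowercase residues.
MOD_DICT = {
    "S(Phospho (STY))": "s", "S(ph)": "s",
    "T(Phospho (STY))": "t", "T(ph)": "t",
    "Y(Phospho (STY))": "y", "Y(ph)": "y",
    "pS": "s", "pT": "t", "pY": "y",
}


def mark_modified(modified_sequence, mod_dict=MOD_DICT):
    """Hypothetical helper: replace modification strings by lowercase residue letters."""
    for mod, letter in mod_dict.items():
        modified_sequence = modified_sequence.replace(mod, letter)
    return modified_sequence


def potential_sites(marked, uncertainty=0, modifiable="STY"):
    """Hypothetical helper: 0-based peptide indices of modified residues plus
    all modifiable residues within `uncertainty` positions of a modified one."""
    modified = [i for i, aa in enumerate(marked) if aa.islower()]
    sites = set(modified)
    for i in modified:
        lo = max(0, i - uncertainty)
        hi = min(len(marked), i + uncertainty + 1)
        for j in range(lo, hi):
            if marked[j].upper() in modifiable:
                sites.add(j)
    return sorted(sites)


marked = mark_modified("AS(ph)TPEK")   # "AsTPEK"
exact = potential_sites(marked, uncertainty=0)
nearby = potential_sites(marked, uncertainty=1)
```

With an uncertainty of 0 only the annotated serine is reported; with an uncertainty of 1 the neighbouring threonine is included as well.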
Methods
annotate(df[, inplace]) – Adds columns regarding the peptide position within the protein to a pandas dataframe.
load_annotations() – Reads in protein sequences from fasta file.
- annotate(df, inplace=False)
Adds columns regarding the peptide position within the protein to a pandas dataframe.
Adds the following annotation columns to dataframe:
‘Matched proteins’ = subset of the proteins in the input ‘Proteins’ column in whose sequence the peptide could indeed be found. If the same peptide is found multiple times in a protein, the protein identifier is repeated.
‘Start positions’ = starting positions of the modified peptide in the protein sequence (1-based, methionine is counted). If multiple isoforms/proteins contain the sequence, the starting positions are separated by semicolons in the same order as they are listed in the ‘Matched proteins’ column.
‘End positions’ = end positions of the modified peptide in the protein sequence (see above for details)
‘Site positions’ = position of the modification (see ‘Start positions’ above for details on how the position is counted)
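The counting convention described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: the function name peptide_positions is hypothetical, modified residues are assumed to be marked as lowercase letters, and the end position is assumed to be inclusive. The exact string format that annotate() writes into the ‘Site positions’ column is not reproduced here.

```python
def peptide_positions(protein_seq, marked_peptide):
    """Hypothetical helper: find every occurrence of a peptide in a protein.

    Returns a list of (start, end, site_positions) tuples with 1-based
    positions, counting the initial methionine as position 1.
    """
    plain = marked_peptide.upper()
    results = []
    start = protein_seq.find(plain)
    while start != -1:
        start_pos = start + 1                # convert 0-based index to 1-based
        end_pos = start + len(plain)         # inclusive end (assumption)
        sites = [start_pos + i for i, aa in enumerate(marked_peptide) if aa.islower()]
        results.append((start_pos, end_pos, sites))
        start = protein_seq.find(plain, start + 1)  # continue after this hit
    return results


hits = peptide_positions("MKASTPEKLLASTPEK", "AsTPEK")
# Semicolon-separated start positions, one entry per occurrence.
start_positions = ";".join(str(start) for start, _, _ in hits)
```

The peptide occurs twice in the toy protein, so two start positions are joined by a semicolon, mirroring how repeated matches are listed.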
- Parameters:
df (DataFrame) – pandas dataframe to be annotated, with “Proteins” and “Modified sequence” columns
inplace (bool) – add the new columns to df in place
- Returns:
annotated dataframe
- Return type:
pd.DataFrame
- Required columns:
Proteins, Modified sequence
- load_annotations()
Reads in protein sequences from fasta file.
- Return type:
None
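For orientation, reading protein sequences from a fasta file amounts to grouping sequence lines under their ">" header lines. The sketch below shows the general idea with the standard library only. It is not how load_annotations() is implemented: read_fasta is a hypothetical name, and using the first whitespace-delimited token of the header as the identifier is an assumption.

```python
def read_fasta(lines):
    """Hypothetical helper: map each fasta identifier to its full sequence."""
    sequences = {}
    current_id = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if current_id is not None:
                sequences[current_id] = "".join(chunks)
            header = line[1:].split()
            current_id = header[0] if header else ""
            chunks = []
        elif current_id is not None:
            chunks.append(line)  # sequences may span several lines
    if current_id is not None:
        sequences[current_id] = "".join(chunks)
    return sequences


example = [
    ">sp|P00001|TEST_HUMAN Example protein",
    "MKAST",
    "PEK",
    ">Q2",
    "AAAA",
]
proteins = read_fasta(example)
```

The function accepts any iterable of lines, so an open file handle can be passed in place of the example list.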