psite_annotation.annotators.DomainAnnotator

class psite_annotation.annotators.DomainAnnotator(annotation_file)

Bases: object

Annotates a pandas DataFrame with protein domains from UniProt.

Requires the ‘Matched proteins’, ‘Start positions’ and ‘End positions’ columns in the dataframe to be annotated. These columns can be generated with PeptidePositionAnnotator().

Example

annotator = DomainAnnotator(<path_to_annotation_file>)
annotator.load_annotations()
df = annotator.annotate(df)

Initialize the input files and options for DomainAnnotator.

Parameters: annotation_file (Union[str, IO]) – comma-separated file with domains and their positions within the protein
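Because annotation_file is typed as Union[str, IO], either a path or an open file-like object should be accepted. The sketch below shows one way such a comma-separated domain file could be read and indexed with plain pandas. It is not the library's loader: the column names (protein, domain, start, end), the identifiers and the coordinates are all made up for illustration, and the real file layout may differ.

```python
import io

import pandas as pd

# Hypothetical file content: the column names and all values are placeholders,
# not the layout used by the actual annotation file.
csv_text = """protein,domain,start,end
PROT_A,Kinase-like,100,250
PROT_A,SH2-like,300,380
PROT_B,Zinc-finger-like,10,40
"""


def read_domains(annotation_file):
    """Read a comma-separated domain file from a path or a file-like object."""
    return pd.read_csv(annotation_file)


# A StringIO object stands in for an open file; a path string works the same way.
domains = read_domains(io.StringIO(csv_text))

# Index the domains per protein for fast lookup: protein -> [(domain, start, end), ...]
domains_by_protein = {
    protein: list(zip(group["domain"], group["start"], group["end"]))
    for protein, group in domains.groupby("protein")
}
```

Grouping by protein up front keeps the later per-peptide lookup to a single dictionary access.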

Methods

`annotate`	Adds column with domains the peptide overlaps with.
`load_annotations`	Reads in comma-separated file with domain annotations extracted from ProteomicsDB.

annotate(df, inplace=False)

Adds column with domains the peptide overlaps with.

Adds the following annotation columns to dataframe:

Domains = semicolon separated list of domains that overlap with the peptide

Parameters:

df (DataFrame) – pandas dataframe with ‘Matched proteins’, ‘Start positions’ and ‘End positions’ columns
inplace (bool) – Whether to modify the DataFrame in place rather than creating a new one.

Returns:

annotated dataframe

Return type:

pd.DataFrame

Required columns:

Matched proteins, Start positions, End positions
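The overlap logic described for annotate can be illustrated with a small pandas-only sketch. It is not the DomainAnnotator implementation. It assumes that the three required columns hold semicolon-separated strings with one entry per matched protein, that positions are closed intervals, and that domains are looked up in a hypothetical dictionary (DOMAINS) filled with placeholder values. The function names overlapping_domains and annotate_domains are likewise invented for this example.

```python
import pandas as pd

# Hypothetical lookup table: protein -> [(domain, start, end), ...].
# Names and coordinates are placeholders, not real annotations.
DOMAINS = {
    "PROT_A": [("Kinase-like", 100, 250), ("SH2-like", 300, 380)],
    "PROT_B": [("Zinc-finger-like", 10, 40)],
}


def overlapping_domains(proteins, starts, ends, domains=DOMAINS):
    """Return a semicolon-separated list of domains that overlap the peptide."""
    hits = []
    for protein, start, end in zip(
        proteins.split(";"), starts.split(";"), ends.split(";")
    ):
        if not protein:
            continue  # peptide without a matched protein
        start, end = int(start), int(end)
        for name, dom_start, dom_end in domains.get(protein, []):
            # Two closed intervals overlap if each starts before the other ends.
            if start <= dom_end and end >= dom_start and name not in hits:
                hits.append(name)
    return ";".join(hits)


def annotate_domains(df, inplace=False):
    """Add a 'Domains' column; copy the frame first unless inplace is True."""
    out = df if inplace else df.copy()
    out["Domains"] = [
        overlapping_domains(p, s, e)
        for p, s, e in zip(
            out["Matched proteins"], out["Start positions"], out["End positions"]
        )
    ]
    return out


peptides = pd.DataFrame(
    {
        "Matched proteins": ["PROT_A", "PROT_A;PROT_B", "PROT_B"],
        "Start positions": ["240", "290;35", "50"],
        "End positions": ["255", "305;45", "60"],
    }
)
annotated = annotate_domains(peptides)
```

With inplace=False the input frame is left untouched and the annotated copy is returned, which mirrors the behaviour documented for the inplace parameter.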

load_annotations()

Reads in comma-separated file with domain annotations extracted from ProteomicsDB.

Return type: None