Abstract

The subcellular location of a protein is a key factor in determining the molecular function of the protein in an organism. MetazSecKB is a secretome and subcellular proteome knowledgebase specifically designed for metazoan, i.e. human and animals. The protein sequence data, consisting of over 4 million entries with 121 species having a complete proteome, were retrieved from UniProtKB. Protein subcellular locations including secreted and 15 other subcellular locations were assigned based on either curated experimental evidence or prediction using seven computational tools. The protein or subcellular proteome data can be searched and downloaded using several different types of identifiers, gene name or keyword(s), and species. BLAST search and community annotation of subcellular locations are also supported. Our primary analysis revealed that the proteome sizes, secretome sizes and other subcellular proteome sizes vary tremendously in different animal species. The proportions of secretomes vary from 3 to 22% (average 8%) in metazoa species . The proportions of other major subcellular proteomes ranged approximately 21–43% (average 31%) in cytoplasm, 20–37% (average 30%) in nucleus, 3–19% (average 12%) as plasma membrane proteins and 3–9% (average 6%) in mitochondria. We also compared the protein families in secretomes of different primates. The Gene Ontology and protein family domain analysis of human secreted proteins revealed that these proteins play important roles in regulation of human structure development, signal transduction, immune systems and many other biological processes.

Database URL:http://proteomics.ysu.edu/secretomes/animal/index.php

Introduction

Secreted proteins play important roles in the development of multicellular organisms, serving as signal molecules, extracellular enzymes and structural matrix. The first sequenced protein, human insulin, was actually a secreted protein. Human secreted proteins have potential to be used as biomarkers for the diagnosis of diseases ( 1 ). The term ‘secretome’ was first used by Tjalsma et al . ( 2 ) to include all proteins that are synthesized and processed by the secretary pathway and proteins located in the secretion machinery. However, the term recently was limited to include only the set of secreted or extracellular proteins in a species ( 3 , 4 ). The secretome plays a central role in creating an extracellular environment that allows for physiological coordination and maintaining the homeostatic conditions that support cellular life and thus the organism.

Because of biomedical importance, secretome identification and analysis have been carried out in a number of human and animal cells or tissues including human arterial smooth muscle cells ( 5 ), human oligodendrocytes ( 6 ), human mesenchymal stem cells ( 7 ), human and mouse preimplantation embryos ( 8 ), primary human adipocytes during insulin resistance ( 9 ), rat adipose tissues ( 10 ), 23 cancer cell lines ( 11 ), and different types of human primary cell cultures and human body fluids including plasma, cerebrospinal fluid and urine ( 12 ). In addition to experimental characterization of human secretomes in various cell types, proteome-wide computational prediction of secretomes has been performed in mouse ( 13 ), human, pufferfish, pigs, and zebrafish ( 14 , 15 ). A secreted protein database was developed for human, rat and mouse, but unfortunately this database has not been updated since 2006 ( http://spd.cbi.pku.edu.cn/ ) ( 16 ), and another database, LOCATE, describing the membrane organization and subcellular location including secreted proteins was developed for mouse and human only ( http://locate.imb.uq.edu.au/ ) ( 17 ). However, as the complete genome sequencing projects have generated many complete proteome data in animal species, a database having information for computational prediction and curated information of secretomes and other subcellular proteomes in these species would provide a useful resource for both searching an individual protein subcellular location and performing proteome-wide comparative analysis.

In this work, we describe MetazSecKB, the Metazoan, i.e. human and animals, Secretome and Subcellular Proteome Knowledgebase. MetazSecKB is constructed with all available human and animal protein sequences by combining curated subcellular information and predicted information, with a well tested computational protocol, on secretomes and other subcellular proteomes of 15 subcellular locations. This knowledgebase is expected to serve as a central portal for providing information on metazoan protein subcellular locations for biological and medical researchers interested in protein biology.

Data collection and database implementation

Data collection

The protein sequences for the kingdom Animalia, also called Metazoa, were retrieved from the UniProtKB/Swiss-Prot dataset and the UniProtKB/TrEMBL dataset (release 2014_01) ( http://www.uniprot.org/downloads ). The UniProtKB/Swiss-Prot dataset contains manually annotated and reviewed protein sequences with information extracted from literature of experimental results and curator-evaluated computational analysis ( 18 ). The UniProtKB/TrEMBL dataset contains computationally analysed protein sequences. The combined metazoan dataset consisted of a total of 4 080 818 protein entries with 103 088 and 3 977 730 entries from the UniProtKB/Swiss-Prot dataset and the UniProtKB/TrEMBL dataset, respectively. The identifier mapping data including UniProt accession number (AC), UniProt ID, RefSeq accession number and gi number were retrieved from the UniProt ID mapping data file.

Protein subcellular localization prediction

We have previously evaluated several computational tools for predicting classic secreted proteins, i.e. proteins having a secretory signal peptide at the N-terminus ( 19 ) (Min 2010). These tools were chosen because they have relatively high prediction accuracy and are available as standalone tools for local processing of large datasets. The protein sequences were processed using the following programs: SignalP (version 3.0 and 4.0) ( 20 , 21 ), Phobius ( 22 ), WoLF PSORT ( 23 ) and TargetP ( 24 ) for secretory signal peptide and subcellular location prediction. TMHMM (version 2.0) was used to identify proteins having transmembrane domains ( 25 ) and Scan-Prosite (called PS-Scan in standalone version) ( http://www.expasy.org/tools/scanprosite/ ) was used to scan endoplasmic reticulum (ER) targeting sequence (Prosite: PS00014) ( 26 , 27 ). Proteins having one or more membrane domains, but not located within the N-terminus (the first 70 amino acids), were predicted as membrane proteins by TMHMM. The tools mentioned above were installed on a local Linux system for data processing. The commands for running these tools were summarized by Lum and Min ( 28 ). Protein sequences predicted to have a signal peptide by SignalP (version 3) were further processed using FragAnchor webserver to identify the glycosylphosphatidyinositol (GPI) anchors ( http://navet.ics.hawaii.edu/∼fraganchor/NNHMM/NNHMM.html ) ( 29 ). These tools have been used for processing fungal and plant protein sequences in construction of FunSecKB ( 3 ), FunSecKB2 ( 4 ) and PlantSecKB ( 30 ). However, based on our previous evaluations, the detailed methods were slightly different for assigning secretomes in different kingdoms of eukaryotes ( 19 ).

The metazoan protein subcellular locations are classified into the following categories: secreted proteins, mitochondrial (membrane or non-membrane), ER (membrane or lumen), cytosol (cytoplasm), cytoskeleton, Golgi apparatus (membrane or lumen), nuclear (membrane or non-membrane), vacuolar (membrane or non-membrane), lysosome, peroxisome, plasma membrane, other membrane and GPI-anchored proteins. For assigning a protein subcellular location, the UniProtKB subcellular annotation information was considered prior to using prediction information. For proteins not having annotated subcellular information, their subcellular location assignments are based on computational prediction. In this work, SignalP4 is used to replace SignalP3 as SignalP4 improves the prediction accuracy ( 21 , 31 ). However, the information generated by SignalP3 was also included as it predicts signal peptide cleavage sites more accurately than SignalP4 ( 21 ). The rules for assigning a protein subcellular location are defined below.

Secreted protein

Secreted proteins are further divided as curated secreted proteins, highly likely secreted, likely secreted, and weakly likely secreted. Curated secreted proteins are proteins that are annotated and reviewed to be ‘secreted’ or ‘extracellular’ in the subcellular location from the UniProtKB/Swiss-Prot dataset. Four predictors consisting of SignalP4, Phobius, TargetP and WoLF PSORT are used for protein secretory signal peptide or subcellular location prediction ( 19 ). The highly likely secreted, likely secreted and weakly likely secreted proteins are proteins that are predicted to be secreted or contain a secretory signal peptide by four and three, two or one of the four tools, respectively. The accuracies for these subcategories of secreted proteins are reported in the section of results. It should be noted that proteins having a transmembrane domain or an ER retention signal were excluded from this set. We recommend that the data for making up a secretome should consist of curated secreted proteins and the predicted highly likely secreted protein dataset. The rational for having subcategories of likely secreted and weakly likely secreted proteins is to provide a means for a user to access these data as some of them may be real secreted proteins.

Mitochondrial proteins

A protein predicted as ‘M’ (for mitochondrial) for subcellular location by TargetP and ‘mito’ by WoLF PSORT is classified as a mitochondrial protein. The accuracy is reported in the result. If it is also classified as a membrane protein by TMHMM, then it is further classified as mitochondrial membrane protein.

ER proteins

ER proteins were predicted using WoLF PSORT and PS-Scan. If they contain one or more transmembrane domains, they are classified as ER membrane proteins. Otherwise, they are classified as ER luminal proteins. Proteins predicted to contain a signal peptide by SignalP 4.0 and an ER target signal (Prosite: PS00014) by PS-Scan often are luminal ER proteins.

GPI-anchored proteins

Signal peptide containing proteins that were predicted to have a GPI anchor by FragAnchor were further classified as GPI-anchored proteins. Protein sequences predicted to have a signal peptide and a GPI anchor may attach to the outer leaflet of the plasma membrane or are secreted, thereby becoming components of the extracellular matrix.

Proteins in other subcellular locations

Other subcellular locations, including cytoplasm (cytosol), cytoskeleton, Golgi apparatus, lysosome, nucleus, peroxisome, plasma membrane and vacuole, were predicted by WoLF PSORT. For a protein predicted as located in Golgi apparatus, nucleus or vacuole, it was further classified as a membrane protein in that specific subcellular location if it contained one or more transmembrane domain predicted by TMHMM.

Database implementation

The protein sequence data, species information, subcellular annotation and information predicted from the tools mentioned above were formatted into tab-delimited text files and were stored in a relational database using MySQL hosted in a Linux server. The user interface and modules to access the data were implemented using PHP. BLAST utility and community annotation submission can be accessed from links on the main user interface at http://proteomics.ysu.edu/secretomes/animal/index.php . The supplementary tables and all other data described in the work can be downloaded at http://proteomics.ysu.edu/publication/data/MetazSecKB/ .

Evaluation of prediction accuracies of protein subcellular locations

The prediction tools we employed above were based on our previous evaluation ( 19 , 31 , 32 ). To further evaluate the prediction accuracies of our rule-based methods for each subcellular location in this dataset, we retrieved protein entries having an annotated, unique subcellular location from UniProtKB/Swiss-Prot dataset. Proteins having multiple subcellular locations or labeled as ‘fragment’ or not starting with ‘M’ or having a length < 70 amino acids were excluded. Protein entries having a term including ‘By similarity’, ‘Probable’ or ‘Potential’ in their subcellular location annotation were excluded. The prediction accuracy for each subcellular location was evaluated using prediction sensitivity ( Equation 1 ), specificity ( Equation 2 ) and Matthews Correlation Coefficient (MCC) ( Equation 3 ) ( 33 ).
Sensitivity (%)= TP/(TP + FN) × 100
(1)
Specificity (%)= TN/(TN + FP) × 100
(2)
MCC(%)=(TP×TNFP×FN)×100/((TP+FP)(TP+FN)(TN+FP)(TN+FN))1/2
(3)
TP is the number of true positives, FN is the number of false negatives, FP is the number of false positives and TN is the number of true negatives. The MCC is used as a measure of the quality of binary (two-class) classifications. It takes into account true and false positives and negatives and is generally regarded as a balanced measure. The MCC returns a value between −1 and +1. A coefficient of +1 represents a perfect prediction, 0 means no better than random prediction, and −1 indicates total disagreement between prediction and observation ( 33 ). The dataset contains a total of 18,874 proteins. For each category, the number of actual positives equals TP plus FN and the number of actual negatives equals FP plus TN ( Table 1 ). As both TargetP and WoLF PSORT can predict mitochondrial proteins, we evaluated their prediction accuracy, either used individually or combined, using a dataset consisting of 1870 annotated mitochondrial proteins as positives and 17 004 proteins located in other subcellular locations as negatives.
Table 1.

Prediction accuracy evaluation of human and animal protein subcellular locations a

TPFPTNFNSn (%)Sp (%)MCC
(a) Mitochondrial proteins
 TargetP93097216 03294049.794.30.44
 WoLF PSORT92048216 52295049.297.20.53
 TargetP AND WoLF PSORT79426216 742107642.598.50.53
 TaregetP OR WoLF PSORT1056120215 80281456.592.90.45
(b) Secreted proteins b
 Secreted502427612 87470087.897.90.88
 S + HLS535052212 62837493.596.00.89
 S + HLS + LS541379412 35631194.694.00.87
 S + HLS + LS + WLS5440146211 68828495.088.90.80
(c) The subcellular locations
 Cytoplasm1095112415 77987655.693.40.46
 Cytoskeleton2186318 02057327.699.70.45
 ER25718717 90652432.999.00.42
 Golgi122118 5842574.599.90.12
 Lysosome1818 6751900.5100.00.02
 Nucleus297989314 19081278.694.10.72
 Peroxisome410118 6531163.399.50.03
 Plasma membrane276764714 88058082.795.80.78
 Vacuole0018 855190.0100.0-
TPFPTNFNSn (%)Sp (%)MCC
(a) Mitochondrial proteins
 TargetP93097216 03294049.794.30.44
 WoLF PSORT92048216 52295049.297.20.53
 TargetP AND WoLF PSORT79426216 742107642.598.50.53
 TaregetP OR WoLF PSORT1056120215 80281456.592.90.45
(b) Secreted proteins b
 Secreted502427612 87470087.897.90.88
 S + HLS535052212 62837493.596.00.89
 S + HLS + LS541379412 35631194.694.00.87
 S + HLS + LS + WLS5440146211 68828495.088.90.80
(c) The subcellular locations
 Cytoplasm1095112415 77987655.693.40.46
 Cytoskeleton2186318 02057327.699.70.45
 ER25718717 90652432.999.00.42
 Golgi122118 5842574.599.90.12
 Lysosome1818 6751900.5100.00.02
 Nucleus297989314 19081278.694.10.72
 Peroxisome410118 6531163.399.50.03
 Plasma membrane276764714 88058082.795.80.78
 Vacuole0018 855190.0100.0-

Note: FP, false positives; FN, false negatives; MCC, Matthews correlation coefficient; Sn, sensitivity; Sp, specificity; TP, true positives; TN, true negatives.

a The dataset contains a total of 18 874 proteins.

b Secreted: predicted by four predictors; HLS: highly likely secreted, predicted by three out of four predictors; LS: likely secreted, predicted by two out of four predictors; WLS: weakly likely secreted, predicted by one out of four predictors.

Table 1.

Prediction accuracy evaluation of human and animal protein subcellular locations a

TPFPTNFNSn (%)Sp (%)MCC
(a) Mitochondrial proteins
 TargetP93097216 03294049.794.30.44
 WoLF PSORT92048216 52295049.297.20.53
 TargetP AND WoLF PSORT79426216 742107642.598.50.53
 TaregetP OR WoLF PSORT1056120215 80281456.592.90.45
(b) Secreted proteins b
 Secreted502427612 87470087.897.90.88
 S + HLS535052212 62837493.596.00.89
 S + HLS + LS541379412 35631194.694.00.87
 S + HLS + LS + WLS5440146211 68828495.088.90.80
(c) The subcellular locations
 Cytoplasm1095112415 77987655.693.40.46
 Cytoskeleton2186318 02057327.699.70.45
 ER25718717 90652432.999.00.42
 Golgi122118 5842574.599.90.12
 Lysosome1818 6751900.5100.00.02
 Nucleus297989314 19081278.694.10.72
 Peroxisome410118 6531163.399.50.03
 Plasma membrane276764714 88058082.795.80.78
 Vacuole0018 855190.0100.0-
TPFPTNFNSn (%)Sp (%)MCC
(a) Mitochondrial proteins
 TargetP93097216 03294049.794.30.44
 WoLF PSORT92048216 52295049.297.20.53
 TargetP AND WoLF PSORT79426216 742107642.598.50.53
 TaregetP OR WoLF PSORT1056120215 80281456.592.90.45
(b) Secreted proteins b
 Secreted502427612 87470087.897.90.88
 S + HLS535052212 62837493.596.00.89
 S + HLS + LS541379412 35631194.694.00.87
 S + HLS + LS + WLS5440146211 68828495.088.90.80
(c) The subcellular locations
 Cytoplasm1095112415 77987655.693.40.46
 Cytoskeleton2186318 02057327.699.70.45
 ER25718717 90652432.999.00.42
 Golgi122118 5842574.599.90.12
 Lysosome1818 6751900.5100.00.02
 Nucleus297989314 19081278.694.10.72
 Peroxisome410118 6531163.399.50.03
 Plasma membrane276764714 88058082.795.80.78
 Vacuole0018 855190.0100.0-

Note: FP, false positives; FN, false negatives; MCC, Matthews correlation coefficient; Sn, sensitivity; Sp, specificity; TP, true positives; TN, true negatives.

a The dataset contains a total of 18 874 proteins.

b Secreted: predicted by four predictors; HLS: highly likely secreted, predicted by three out of four predictors; LS: likely secreted, predicted by two out of four predictors; WLS: weakly likely secreted, predicted by one out of four predictors.

Results

Prediction accuracy evaluation

Mitochondrial proteins

The accuracy results are shown in Table 1 a. When an individual tool was used, WoLF PSORT prediction showed a slightly lower sensitivity but a higher specificity than TargetP prediction. Thus, the MCC value was higher in the set predicted by WoLF PSORT (0.53) than the set predicted by TargetP (0.44). If only positives predicted by both tools were used, the specificity was slightly increased and the MCC value remains unchanged (0.53) compared with WoLF PSORT prediction. In contrast, including positives predicted by either tool decreased the MCC value to 0.45. Thus we assigned mitochondrial subcellular locations to entries only predicted to be mitochondrial proteins by both programs. As the specificity was high (up to 98.5%) when both tools were used, these predicted entries were reasonably reliable. However, the prediction sensitivity (42.5%) of the tools was low, i.e. more than half of proteins located in mitochondria remained to be predicted. Thus future efforts need to be made to improve prediction sensitivity for mitochondrial proteins.

Secreted proteins

Our previous evaluation showed that secreted prediction accuracy can be improved by removing transmembrane proteins, which can be predicted using TMHMM, and ER resident proteins, which can be predicted using PS-Scan ( 19 ). As we employed four tools—SignalP (version 4), TargetP, WoLF PSORT and Phobius—for predicting secreted proteins or secretory signal peptides, we had to determine which should be included in the secretome set. After removing transmembrane proteins and ER proteins, the protein set predicted either to contain a secretory signal peptide or to be secreted are divided into four categories: (i) Secreted: predicted by 4 predictors; (ii) Highly likely secreted (HLS): predicted by 3 out of 4 predictors; (iii) Likely secreted (LS): predicted by 2 out of 4 predictors; and (iv) Weakly likely secreted (WLS): predicted by 1 out of 4 predictors. The dataset consisted of 5724 curated secreted proteins as positives and 13 150 proteins located in other subcellular locations as negatives. The accuracy results are shown in Table 1 b.

As expected, when only entries were predicted by all four tools to be positives as true positives, the prediction specificity was increased. However, the sensitivity was decreased. On the other hand, the prediction specificity was decreased but the sensitivity was increased when including all entries predicted by any of the four tools to be positives as true positives. Based on the MCC values, the most accurate prediction (0.89) for a secretome includes secreted entries predicted by at least three out of four predictors with a specificity of 96.0% and a sensitivity of 93.5% ( Table 1 b). Thus, we recommend including only curated secreted proteins and highly likely secreted proteins for estimating the secretome size. Though including the set of likely secreted proteins increased the coverage of a secretome, it increased more (272 entries) false positives than true (63 entries) positives. It should be noted that both entries predicted by 4 of 4 tools and 3 of 4 tools were assigned as the category of highly like secreted in the database, making them distinguishable from curated secreted entries.

Proteins in other subcellular locations

Proteins for the cytoplasm subset also include cytosol as these two terms are used interchangeably in the UniProtKB annotation. However, we noticed that the annotated cytoskeleton entries are also annotated as cytoplasm. In our evaluation, cytoskeleton proteins were not counted in the subset of cytoplasm. We would also like to point out that plasma membrane proteins were annotated as cell membrane in UniProtKB, thus cell membrane proteins were retrieved for evaluating the category of plasma membrane. The prediction accuracy results for proteins located in cytoplasm, cytoskeleton, ER, Golgi apparatus, lysosome, nucleus, peroxisome, plasma membrane and vacuole are shown in Table 1 c.

The prediction accuracies for these subcellular locations vary significantly. Predictions of proteins located in nucleus and plasma membrane were relatively accurate with a MCC value of 0.78 and 0.72, respectively. Predictions for proteins located in cytoplasm, cytoskeleton, and ER were highly specific (specificity 93.4–99.7%) with a MCC value of 0.42–0.46. However, the sensitivities (27.6–55.6%) need to be improved for these subcellular locations. Predictions for proteins located in Golgi apparatus, lysosome, peroxisome were also highly specific (specificity > 99%) but with a very low sensitivity (0.5–4.5%). Human and animal vacuolar proteins could not be predicted by WoLF PSORT as there were no positive being predicted ( Table 1 c). It should be noted that the low MCC values for some of the subcellular locations were caused by low sensitivities, and in fact, the specificities were relatively high. Thus, there are a good number of proteins located in these subcellular locations not being predicted. However, if a protein is predicted to be located in such a location, the prediction is most likely reliable.

Database statistics: subcellular proteome distribution in different species

The database contains curated and predicted subcellular location information of 4 080 818 metazoan proteins that were downloaded from UniProtKB. These proteins were generated from 185 256 metazoa species and subspecies with 121 of them having a complete proteome. Species specific proteins located at each subcellular location can be searched and downloaded from the database user interface. The distributions of subcellular proteomes in human and different animal species having a complete proteome are summarized in Table 2 and Supplementary Table S1 . Table 2 includes the following subcellular locations: secreted proteins (3 subcategories), mitochondrial membrane and mitochondrial non-membrane, cytoplasm (cytosol), nuclear membrane and nuclear non-membrane, plasma membrane. The category of secreted proteins includes the following subcategories: curated secreted, highly likely secreted and likely secreted. Information on other subcellular protein locations including weakly likely secreted, cytoskeleton, ER (membrane or lumen), Golgi apparatus (membrane or lumen), lysosome, peroxisome, vacuole (membrane or non-membrane), other membrane, other curated locations and the information of species taxonomy can be found in Supplementary Table S1 .

Table 2.

Summary of proteins located in some major subcellular locations in human and different animal species

ReferenceproteomeTotalproteinsCuratedsecreted Predicted
Mito
Nuc
PlasmamemSecr(%)
HLSLSmemnon-memCytmemnon-memSecr
Vertebrata (Actinopterygii)
Oryzias latipes24 63326 0601441805649141118573301628629358019497.5
Xiphophorus maculatus20 45120 527921476510739595288927237305215687.6
Oreochromis niloticus26 75327 5511222179638148105169711499638407823018.4
Gasterosteus aculeatus27 24828 1101141813618106141880801429443387719276.9
Takifugu rubripes47 85649 09026126551028172164512 63032317 843844329165.9
Tetraodon nigroviridis*23 07349 32719427001236182224813 33333716 961594428945.9
Danio rerio*41 05455 41437246351189282231914 90926719 521686650079.0
Vertebrata (Amphibia)
Xenopus tropicalis*23 49130 52119419266561691327967416810 086411321206.9
X. laevis16 011269105926216175251241095711163613288.3
Vertebrata (Mammalia)
 Glires
   Oryctolagus cuniculus21 15022 7883341670479135122256921507241366520048.8
   Heterocephalus glaber21 44921 54893126651390100963431036924326613596.3
   Cavia porcellus19 91120 3782361432461100101653491036410334216688.2
   Cricetulus griseus23 88424 442109140792796117070731167114278615166.2
   Mus musculus*43 53974 158179243501698717344320 45671423 226913761428.3
   Rattus norvegicus*27 34033 55596622116374761473915341110 094540731779.5
   Spermophilus tridecemlineatus19 96620 07911014374298393754881036603329015477.7
 Primates
   Macaca fascicularis *17 39628 9552331912928359197676391218186297021457.4
   M. mulatta*35 53669 56740745541694653371918 66732623 502729549617.1
   Gorilla gorilla gorilla27 28627 3712121994676218148067011519481335822068.1
   Homo sapiens*68 04913 5661202066823480373717 62334 82587734 27410 60787026.4
   Pan troglodytes*20 12633 32629622418204471825796613711 618419025377.6
   Pongo abelii22 78524 5292371818580229145764521688228287920558.4
   Nomascus leucogenys19 73419 837141148951811411435053996893245716308.2
   Callithrix jacchus*42 02555 08524437761280195286715 06430820 159617840207.3
   Otolemur garnettii19 93020 1569915154809310225226966801309916148.0
 Carnivora
   Canis familiaris25 43928 3623451813595385149170401709489404721587.6
   Mustela putorius furo38 8261732017984137212711 78515412 830407321905.6
   Neovison vison16 2371875035666839563652523315077684.7
   Ailuropoda melanoleuca*21 13635 74324720867791761746997516211 905513323336.5
   Felis catus20 30321 2301961406483108106558311076791309116027.5
 Cetartiodactyla
   Bos mutus18 92215013774531239314911855854315915278.1
   Bos taurus*23 84231 7808802215620508149181713179598449230959.7
   Sus scrofa*26 05433 9626452534779411156090381669463478731799.4
   Camelus ferus20 02867108463699113257151476257258811515.7
 Chiroptera
   Pteropus alecto19 52019 54897116248874116053641216774244712596.4
   Myotis brandtii19 3015810324329093858061046427225010905.6
   M. davidii15 44615 4666791634560782453073519418169836.4
   M. lucifugus20 65020 899143173843110010525716966782285518819.0
 Other mammalia
   Loxodonta africana25 61525 8321321744556128111965541298459483518767.3
   Equus caballus22 67627 8412721659514284104288861338701382519316.9
   Tupaia chinensis20 82420 85185127552764114956991256701311413606.5
   Sarcophilus harrisii22 38822 565107149055311086763681027495349515977.1
   Monodelphis domestica22 24022 794108150548584110363981067252393016137.1
   Ornithorhynchus anatinus23 55223 763113120269810311117229957184315713155.5
Vertebrata (Testudines + Archosauria group)
Anas platyrhynchos *16 37731 8791391360893123126910 5421489829331614994.7
Meleagris gallopavo16 53716 991114892413756735622835377207310065.9
Gallus gallus*17 62323 8004401640581278123164031477077328220808.7
Ficedula albicollis15 92216 14864985390577784669815208202110496.5
Taeniopygia guttata18 14119 7248571643277972674972619722118014.1
Chelonia mydas19 0667188047871794603197638421949515.0
Pelodiscus sinensis20 7841261271492647986724926703268313976.7
Other vertebrata
Petromyzon marinus13 1605452225566669394534379712395764.4
Latimeria chalumnae23 42923 5137512705938799381171167349290513455.7
Anolis carolinensis19 10919 5628112385102889365727896435247813196.7
Invertebrate
Chordata (Tunicata)
Oikopleura dioica*17 05029 0571518641177116149310 785928060254018796.5
Ciona intestinalis17 30818 639281507601957386231475014181215358.2
C. savignyi20 00420 1174582235952697779255568525358674.3
Ecdysozoa (Arachnida)
Tetranychus urticae18 08218 2431218916859570159865936152219190310.4
Ixodes ricinus16 1991355481913684835289736491246355521.9
I. scapularis20 47321 16221194376210314335706955905168119649.3
Rhipicephalus pulchellus11 205116204596096223804934241231162114.5
Ecdysozoa (Insecta)
Drosophila mojavensis14 52515 0862120493479583243027442411700207013.7
D. virilis14 45614 9283419413548477043085743231709197513.2
D. erecta15 1164422203627187339185943661721226415.0
D. grimshawi14 75414 7983118613626277343367941951698189212.8
D. ananassae14 96815 2982821393496479142436745031793216714.2
D. melanogaster*20 12039 9512544761923269212710 61319112 1014659501512.6
D. persimilis16 75416 8612821064207793048466850761701213412.7
D. pseudoobscura pseudoobscura 17 0474823164167793946327151261950236413.9
D. sechellia16 13416 3613722504107193644645447651734228714.0
D. simulans15 35419 057572372436100102854835653742165242912.7
D. willistoni15 44715 5642518753557781548086044341722190012.2
D. yakuba17 25741252139277100647525950241798256214.8
Megaselia scalaris11 46311 503107734173144949472424026397836.8
Anopheles darlingi10 44711 686879327293758368754397111628016.9
A. gambiae*13 07219 38450261041087110460275748341869266013.7
Aedes aegypti16 65417 68354236746918296150525648732008242113.7
Culex quinquefasciatus18 70319 062252345534128110457517355011866237012.4
Dendroctonus ponderosae23 65014199254910611538928966087250220068.5
Tribolium castaneum16 50217 0742617174236684658305041962109174310.2
Apis mellifera10 91012 2996575726781390444045329317148226.7
Camponotus floridanus14 78714 8011566232945744553365407813466774.6
Acromyrmex echinatior13 96213 9701759232736847522662421912536094.4
Atta cephalotes18 07918 11316753597991094657975471515597694.2
Solenopsis invicta14 19314 35926636437100748541331350811206624.6
Harpegnathos saltator15 02915 0421773932946696548460422312997565.0
Nasonia vitripennis17 04017 289141545305657016951554883142315599.0
Bombyx mori14 76717 915125177337910880662935445801681189810.6
Danaus plexippus16 25316 358341486441958085657664528149315209.3
Pararge aegeria15 104124285617585059831437635034402.9
Rhodnius prolixus15 18016 639441420537415626782623769147314648.8
Acyrthosiphon pisum35 80935 211241834814102173615 209668622173618585.3
Pediculus humanus subsp. corporis 10 8471151325737349429440317411935244.8
Ecdysozoa (Nematoda)
Ascaris suum*921318 53939122357710713025894754965243712626.8
Pristionchus pacificus29 07929 319143027103875136896999471573263304110.4
Caenorhabditis brenneri29 98230 71221360289676113410 3148873384255362311.8
C. briggsae21 75121 914302734655119874643510651783540276412.6
C. elegans26 17326 44718235738561631065715610761904401375514.2
C. japonica35 06335 06914266599895206112 2671239112323426797.6
C. remanei31 25232 133213859111784128210 3529371994981388012.1
Haemonchus contortus18 5803218155883101658267746792261218411.8
Brugia malayi*164311 561106683473757945404431399086785.9
Loa loa15 31915 3561178458846750577449374913877955.2
Wuchereria bancrofti19 29819 52518716677129870825439450412057343.8
Trichinella spiralis16 04116 2781793577073980538971343312349525.8
Ecdysozoa (Arthropoda)
Daphnia pulex30 13730 988222827892333143211 433888315213028499.2
Strigamia maritima14 97215 011191118428487265331683635183511377.6
Lophotrochozoa
Helobdella robusta23 32823 3791911706715992493851455866222611895.1
Capitella teleta31 20722218390776126310 8271067765391722057.1
Crassostrea gigas25 98226 8502619046338581410 1781407045291219307.2
Lottia gigantea23 721341683588486599382765530273417177.2
Platyhelminthes
Echinococcus granulosus11 1240614375381656285540351812606145.5
E. multilocularis10 572059132691656287848353212395915.6
Clonorchis sinensis13 60613 880656234955990429489507412345684.1
Schistosoma japonicum16 2361717676077085360863635111357178411.0
S. mansoni11 72312 8369605427203505449160374012426144.8
Other Invertebrates
Amphimedon queenslandica29 74129 8166149089365124611 722737333268514965.0
Nematostella vectensis24 43525 0356111355865810058651726385329311964.8
Strongylocentrotus purpuratus28 56729 56046219873794110198951458580402622447.6
Trichoplax adhaerens11 52011 590748221336489501344250217764894.2
Branchiostoma floridae28 54429 237372227710152114687991407800382622647.7
ReferenceproteomeTotalproteinsCuratedsecreted Predicted
Mito
Nuc
PlasmamemSecr(%)
HLSLSmemnon-memCytmemnon-memSecr
Vertebrata (Actinopterygii)
Oryzias latipes24 63326 0601441805649141118573301628629358019497.5
Xiphophorus maculatus20 45120 527921476510739595288927237305215687.6
Oreochromis niloticus26 75327 5511222179638148105169711499638407823018.4
Gasterosteus aculeatus27 24828 1101141813618106141880801429443387719276.9
Takifugu rubripes47 85649 09026126551028172164512 63032317 843844329165.9
Tetraodon nigroviridis*23 07349 32719427001236182224813 33333716 961594428945.9
Danio rerio*41 05455 41437246351189282231914 90926719 521686650079.0
Vertebrata (Amphibia)
Xenopus tropicalis*23 49130 52119419266561691327967416810 086411321206.9
X. laevis16 011269105926216175251241095711163613288.3
Vertebrata (Mammalia)
 Glires
   Oryctolagus cuniculus21 15022 7883341670479135122256921507241366520048.8
   Heterocephalus glaber21 44921 54893126651390100963431036924326613596.3
   Cavia porcellus19 91120 3782361432461100101653491036410334216688.2
   Cricetulus griseus23 88424 442109140792796117070731167114278615166.2
   Mus musculus*43 53974 158179243501698717344320 45671423 226913761428.3
   Rattus norvegicus*27 34033 55596622116374761473915341110 094540731779.5
   Spermophilus tridecemlineatus19 96620 07911014374298393754881036603329015477.7
 Primates
   Macaca fascicularis *17 39628 9552331912928359197676391218186297021457.4
   M. mulatta*35 53669 56740745541694653371918 66732623 502729549617.1
   Gorilla gorilla gorilla27 28627 3712121994676218148067011519481335822068.1
   Homo sapiens*68 04913 5661202066823480373717 62334 82587734 27410 60787026.4
   Pan troglodytes*20 12633 32629622418204471825796613711 618419025377.6
   Pongo abelii22 78524 5292371818580229145764521688228287920558.4
   Nomascus leucogenys19 73419 837141148951811411435053996893245716308.2
   Callithrix jacchus*42 02555 08524437761280195286715 06430820 159617840207.3
   Otolemur garnettii19 93020 1569915154809310225226966801309916148.0
 Carnivora
   Canis familiaris25 43928 3623451813595385149170401709489404721587.6
   Mustela putorius furo38 8261732017984137212711 78515412 830407321905.6
   Neovison vison16 2371875035666839563652523315077684.7
   Ailuropoda melanoleuca*21 13635 74324720867791761746997516211 905513323336.5
   Felis catus20 30321 2301961406483108106558311076791309116027.5
 Cetartiodactyla
   Bos mutus18 92215013774531239314911855854315915278.1
   Bos taurus*23 84231 7808802215620508149181713179598449230959.7
   Sus scrofa*26 05433 9626452534779411156090381669463478731799.4
   Camelus ferus20 02867108463699113257151476257258811515.7
 Chiroptera
   Pteropus alecto19 52019 54897116248874116053641216774244712596.4
   Myotis brandtii19 3015810324329093858061046427225010905.6
   M. davidii15 44615 4666791634560782453073519418169836.4
   M. lucifugus20 65020 899143173843110010525716966782285518819.0
 Other mammalia
   Loxodonta africana25 61525 8321321744556128111965541298459483518767.3
   Equus caballus22 67627 8412721659514284104288861338701382519316.9
   Tupaia chinensis20 82420 85185127552764114956991256701311413606.5
   Sarcophilus harrisii22 38822 565107149055311086763681027495349515977.1
   Monodelphis domestica22 24022 794108150548584110363981067252393016137.1
   Ornithorhynchus anatinus23 55223 763113120269810311117229957184315713155.5
Vertebrata (Testudines + Archosauria group)
Anas platyrhynchos *16 37731 8791391360893123126910 5421489829331614994.7
Meleagris gallopavo16 53716 991114892413756735622835377207310065.9
Gallus gallus*17 62323 8004401640581278123164031477077328220808.7
Ficedula albicollis15 92216 14864985390577784669815208202110496.5
Taeniopygia guttata18 14119 7248571643277972674972619722118014.1
Chelonia mydas19 0667188047871794603197638421949515.0
Pelodiscus sinensis20 7841261271492647986724926703268313976.7
Other vertebrata
Petromyzon marinus13 1605452225566669394534379712395764.4
Latimeria chalumnae23 42923 5137512705938799381171167349290513455.7
Anolis carolinensis19 10919 5628112385102889365727896435247813196.7
Invertebrate
Chordata (Tunicata)
Oikopleura dioica*17 05029 0571518641177116149310 785928060254018796.5
Ciona intestinalis17 30818 639281507601957386231475014181215358.2
C. savignyi20 00420 1174582235952697779255568525358674.3
Ecdysozoa (Arachnida)
Tetranychus urticae18 08218 2431218916859570159865936152219190310.4
Ixodes ricinus16 1991355481913684835289736491246355521.9
I. scapularis20 47321 16221194376210314335706955905168119649.3
Rhipicephalus pulchellus11 205116204596096223804934241231162114.5
Ecdysozoa (Insecta)
Drosophila mojavensis14 52515 0862120493479583243027442411700207013.7
D. virilis14 45614 9283419413548477043085743231709197513.2
D. erecta15 1164422203627187339185943661721226415.0
D. grimshawi14 75414 7983118613626277343367941951698189212.8
D. ananassae14 96815 2982821393496479142436745031793216714.2
D. melanogaster*20 12039 9512544761923269212710 61319112 1014659501512.6
D. persimilis16 75416 8612821064207793048466850761701213412.7
D. pseudoobscura pseudoobscura 17 0474823164167793946327151261950236413.9
D. sechellia16 13416 3613722504107193644645447651734228714.0
D. simulans15 35419 057572372436100102854835653742165242912.7
D. willistoni15 44715 5642518753557781548086044341722190012.2
D. yakuba17 25741252139277100647525950241798256214.8
Megaselia scalaris11 46311 503107734173144949472424026397836.8
Anopheles darlingi10 44711 686879327293758368754397111628016.9
A. gambiae*13 07219 38450261041087110460275748341869266013.7
Aedes aegypti16 65417 68354236746918296150525648732008242113.7
Culex quinquefasciatus18 70319 062252345534128110457517355011866237012.4
Dendroctonus ponderosae23 65014199254910611538928966087250220068.5
Tribolium castaneum16 50217 0742617174236684658305041962109174310.2
Apis mellifera10 91012 2996575726781390444045329317148226.7
Camponotus floridanus14 78714 8011566232945744553365407813466774.6
Acromyrmex echinatior13 96213 9701759232736847522662421912536094.4
Atta cephalotes18 07918 11316753597991094657975471515597694.2
Solenopsis invicta14 19314 35926636437100748541331350811206624.6
Harpegnathos saltator15 02915 0421773932946696548460422312997565.0
Nasonia vitripennis17 04017 289141545305657016951554883142315599.0
Bombyx mori14 76717 915125177337910880662935445801681189810.6
Danaus plexippus16 25316 358341486441958085657664528149315209.3
Pararge aegeria15 104124285617585059831437635034402.9
Rhodnius prolixus15 18016 639441420537415626782623769147314648.8
Acyrthosiphon pisum35 80935 211241834814102173615 209668622173618585.3
Pediculus humanus subsp. corporis 10 8471151325737349429440317411935244.8
Ecdysozoa (Nematoda)
Ascaris suum*921318 53939122357710713025894754965243712626.8
Pristionchus pacificus29 07929 319143027103875136896999471573263304110.4
Caenorhabditis brenneri29 98230 71221360289676113410 3148873384255362311.8
C. briggsae21 75121 914302734655119874643510651783540276412.6
C. elegans26 17326 44718235738561631065715610761904401375514.2
C. japonica35 06335 06914266599895206112 2671239112323426797.6
C. remanei31 25232 133213859111784128210 3529371994981388012.1
Haemonchus contortus18 5803218155883101658267746792261218411.8
Brugia malayi*164311 561106683473757945404431399086785.9
Loa loa15 31915 3561178458846750577449374913877955.2
Wuchereria bancrofti19 29819 52518716677129870825439450412057343.8
Trichinella spiralis16 04116 2781793577073980538971343312349525.8
Ecdysozoa (Arthropoda)
Daphnia pulex30 13730 988222827892333143211 433888315213028499.2
Strigamia maritima14 97215 011191118428487265331683635183511377.6
Lophotrochozoa
Helobdella robusta23 32823 3791911706715992493851455866222611895.1
Capitella teleta31 20722218390776126310 8271067765391722057.1
Crassostrea gigas25 98226 8502619046338581410 1781407045291219307.2
Lottia gigantea23 721341683588486599382765530273417177.2
Platyhelminthes
Echinococcus granulosus11 1240614375381656285540351812606145.5
E. multilocularis10 572059132691656287848353212395915.6
Clonorchis sinensis13 60613 880656234955990429489507412345684.1
Schistosoma japonicum16 2361717676077085360863635111357178411.0
S. mansoni11 72312 8369605427203505449160374012426144.8
Other Invertebrates
Amphimedon queenslandica29 74129 8166149089365124611 722737333268514965.0
Nematostella vectensis24 43525 0356111355865810058651726385329311964.8
Strongylocentrotus purpuratus28 56729 56046219873794110198951458580402622447.6
Trichoplax adhaerens11 52011 590748221336489501344250217764894.2
Branchiostoma floridae28 54429 237372227710152114687991407800382622647.7

Notes : Data of other protein subcellular locations are summarized in Supplementary Table 1 . HLS: highly likely secreted; LS: likely secreted; Mito: mitochondrial; mem: membrane; non-mem: non-membrane; Cyt: cytoplasm (or cytosol); Nuc: nuclear; Secr: secretome. Species labeled with * has more (or less) 5000 protein entries than its reference proteome.

Table 2.

Summary of proteins located in some major subcellular locations in human and different animal species

ReferenceproteomeTotalproteinsCuratedsecreted Predicted
Mito
Nuc
PlasmamemSecr(%)
HLSLSmemnon-memCytmemnon-memSecr
Vertebrata (Actinopterygii)
Oryzias latipes24 63326 0601441805649141118573301628629358019497.5
Xiphophorus maculatus20 45120 527921476510739595288927237305215687.6
Oreochromis niloticus26 75327 5511222179638148105169711499638407823018.4
Gasterosteus aculeatus27 24828 1101141813618106141880801429443387719276.9
Takifugu rubripes47 85649 09026126551028172164512 63032317 843844329165.9
Tetraodon nigroviridis*23 07349 32719427001236182224813 33333716 961594428945.9
Danio rerio*41 05455 41437246351189282231914 90926719 521686650079.0
Vertebrata (Amphibia)
Xenopus tropicalis*23 49130 52119419266561691327967416810 086411321206.9
X. laevis16 011269105926216175251241095711163613288.3
Vertebrata (Mammalia)
 Glires
   Oryctolagus cuniculus21 15022 7883341670479135122256921507241366520048.8
   Heterocephalus glaber21 44921 54893126651390100963431036924326613596.3
   Cavia porcellus19 91120 3782361432461100101653491036410334216688.2
   Cricetulus griseus23 88424 442109140792796117070731167114278615166.2
   Mus musculus*43 53974 158179243501698717344320 45671423 226913761428.3
   Rattus norvegicus*27 34033 55596622116374761473915341110 094540731779.5
   Spermophilus tridecemlineatus19 96620 07911014374298393754881036603329015477.7
 Primates
   Macaca fascicularis *17 39628 9552331912928359197676391218186297021457.4
   M. mulatta*35 53669 56740745541694653371918 66732623 502729549617.1
   Gorilla gorilla gorilla27 28627 3712121994676218148067011519481335822068.1
   Homo sapiens*68 04913 5661202066823480373717 62334 82587734 27410 60787026.4
   Pan troglodytes*20 12633 32629622418204471825796613711 618419025377.6
   Pongo abelii22 78524 5292371818580229145764521688228287920558.4
   Nomascus leucogenys19 73419 837141148951811411435053996893245716308.2
   Callithrix jacchus*42 02555 08524437761280195286715 06430820 159617840207.3
   Otolemur garnettii19 93020 1569915154809310225226966801309916148.0
 Carnivora
   Canis familiaris25 43928 3623451813595385149170401709489404721587.6
   Mustela putorius furo38 8261732017984137212711 78515412 830407321905.6
   Neovison vison16 2371875035666839563652523315077684.7
   Ailuropoda melanoleuca*21 13635 74324720867791761746997516211 905513323336.5
   Felis catus20 30321 2301961406483108106558311076791309116027.5
 Cetartiodactyla
   Bos mutus18 92215013774531239314911855854315915278.1
   Bos taurus*23 84231 7808802215620508149181713179598449230959.7
   Sus scrofa*26 05433 9626452534779411156090381669463478731799.4
   Camelus ferus20 02867108463699113257151476257258811515.7
 Chiroptera
   Pteropus alecto19 52019 54897116248874116053641216774244712596.4
   Myotis brandtii19 3015810324329093858061046427225010905.6
   M. davidii15 44615 4666791634560782453073519418169836.4
   M. lucifugus20 65020 899143173843110010525716966782285518819.0
 Other mammalia
   Loxodonta africana25 61525 8321321744556128111965541298459483518767.3
   Equus caballus22 67627 8412721659514284104288861338701382519316.9
   Tupaia chinensis20 82420 85185127552764114956991256701311413606.5
   Sarcophilus harrisii22 38822 565107149055311086763681027495349515977.1
   Monodelphis domestica22 24022 794108150548584110363981067252393016137.1
   Ornithorhynchus anatinus23 55223 763113120269810311117229957184315713155.5
Vertebrata (Testudines + Archosauria group)
Anas platyrhynchos *16 37731 8791391360893123126910 5421489829331614994.7
Meleagris gallopavo16 53716 991114892413756735622835377207310065.9
Gallus gallus*17 62323 8004401640581278123164031477077328220808.7
Ficedula albicollis15 92216 14864985390577784669815208202110496.5
Taeniopygia guttata18 14119 7248571643277972674972619722118014.1
Chelonia mydas19 0667188047871794603197638421949515.0
Pelodiscus sinensis20 7841261271492647986724926703268313976.7
Other vertebrata
Petromyzon marinus13 1605452225566669394534379712395764.4
Latimeria chalumnae23 42923 5137512705938799381171167349290513455.7
Anolis carolinensis19 10919 5628112385102889365727896435247813196.7
Invertebrate
Chordata (Tunicata)
Oikopleura dioica*17 05029 0571518641177116149310 785928060254018796.5
Ciona intestinalis17 30818 639281507601957386231475014181215358.2
C. savignyi20 00420 1174582235952697779255568525358674.3
Ecdysozoa (Arachnida)
Tetranychus urticae18 08218 2431218916859570159865936152219190310.4
Ixodes ricinus16 1991355481913684835289736491246355521.9
I. scapularis20 47321 16221194376210314335706955905168119649.3
Rhipicephalus pulchellus11 205116204596096223804934241231162114.5
Ecdysozoa (Insecta)
Drosophila mojavensis14 52515 0862120493479583243027442411700207013.7
D. virilis14 45614 9283419413548477043085743231709197513.2
D. erecta15 1164422203627187339185943661721226415.0
D. grimshawi14 75414 7983118613626277343367941951698189212.8
D. ananassae14 96815 2982821393496479142436745031793216714.2
D. melanogaster*20 12039 9512544761923269212710 61319112 1014659501512.6
D. persimilis16 75416 8612821064207793048466850761701213412.7
D. pseudoobscura pseudoobscura 17 0474823164167793946327151261950236413.9
D. sechellia16 13416 3613722504107193644645447651734228714.0
D. simulans15 35419 057572372436100102854835653742165242912.7
D. willistoni15 44715 5642518753557781548086044341722190012.2
D. yakuba17 25741252139277100647525950241798256214.8
Megaselia scalaris11 46311 503107734173144949472424026397836.8
Anopheles darlingi10 44711 686879327293758368754397111628016.9
A. gambiae*13 07219 38450261041087110460275748341869266013.7
Aedes aegypti16 65417 68354236746918296150525648732008242113.7
Culex quinquefasciatus18 70319 062252345534128110457517355011866237012.4
Dendroctonus ponderosae23 65014199254910611538928966087250220068.5
Tribolium castaneum16 50217 0742617174236684658305041962109174310.2
Apis mellifera10 91012 2996575726781390444045329317148226.7
Camponotus floridanus14 78714 8011566232945744553365407813466774.6
Acromyrmex echinatior13 96213 9701759232736847522662421912536094.4
Atta cephalotes18 07918 11316753597991094657975471515597694.2
Solenopsis invicta14 19314 35926636437100748541331350811206624.6
Harpegnathos saltator15 02915 0421773932946696548460422312997565.0
Nasonia vitripennis17 04017 289141545305657016951554883142315599.0
Bombyx mori14 76717 915125177337910880662935445801681189810.6
Danaus plexippus16 25316 358341486441958085657664528149315209.3
Pararge aegeria15 104124285617585059831437635034402.9
Rhodnius prolixus15 18016 639441420537415626782623769147314648.8
Acyrthosiphon pisum35 80935 211241834814102173615 209668622173618585.3
Pediculus humanus subsp. corporis 10 8471151325737349429440317411935244.8
Ecdysozoa (Nematoda)
Ascaris suum*921318 53939122357710713025894754965243712626.8
Pristionchus pacificus29 07929 319143027103875136896999471573263304110.4
Caenorhabditis brenneri29 98230 71221360289676113410 3148873384255362311.8
C. briggsae21 75121 914302734655119874643510651783540276412.6
C. elegans26 17326 44718235738561631065715610761904401375514.2
C. japonica35 06335 06914266599895206112 2671239112323426797.6
C. remanei31 25232 133213859111784128210 3529371994981388012.1
Haemonchus contortus18 5803218155883101658267746792261218411.8
Brugia malayi*164311 561106683473757945404431399086785.9
Loa loa15 31915 3561178458846750577449374913877955.2
Wuchereria bancrofti19 29819 52518716677129870825439450412057343.8
Trichinella spiralis16 04116 2781793577073980538971343312349525.8
Ecdysozoa (Arthropoda)
Daphnia pulex30 13730 988222827892333143211 433888315213028499.2
Strigamia maritima14 97215 011191118428487265331683635183511377.6
Lophotrochozoa
Helobdella robusta23 32823 3791911706715992493851455866222611895.1
Capitella teleta31 20722218390776126310 8271067765391722057.1
Crassostrea gigas25 98226 8502619046338581410 1781407045291219307.2
Lottia gigantea23 721341683588486599382765530273417177.2
Platyhelminthes
Echinococcus granulosus11 1240614375381656285540351812606145.5
E. multilocularis10 572059132691656287848353212395915.6
Clonorchis sinensis13 60613 880656234955990429489507412345684.1
Schistosoma japonicum16 2361717676077085360863635111357178411.0
S. mansoni11 72312 8369605427203505449160374012426144.8
Other Invertebrates
Amphimedon queenslandica29 74129 8166149089365124611 722737333268514965.0
Nematostella vectensis24 43525 0356111355865810058651726385329311964.8
Strongylocentrotus purpuratus28 56729 56046219873794110198951458580402622447.6
Trichoplax adhaerens11 52011 590748221336489501344250217764894.2
Branchiostoma floridae28 54429 237372227710152114687991407800382622647.7
ReferenceproteomeTotalproteinsCuratedsecreted Predicted
Mito
Nuc
PlasmamemSecr(%)
HLSLSmemnon-memCytmemnon-memSecr
Vertebrata (Actinopterygii)
Oryzias latipes24 63326 0601441805649141118573301628629358019497.5
Xiphophorus maculatus20 45120 527921476510739595288927237305215687.6
Oreochromis niloticus26 75327 5511222179638148105169711499638407823018.4
Gasterosteus aculeatus27 24828 1101141813618106141880801429443387719276.9
Takifugu rubripes47 85649 09026126551028172164512 63032317 843844329165.9
Tetraodon nigroviridis*23 07349 32719427001236182224813 33333716 961594428945.9
Danio rerio*41 05455 41437246351189282231914 90926719 521686650079.0
Vertebrata (Amphibia)
Xenopus tropicalis*23 49130 52119419266561691327967416810 086411321206.9
X. laevis16 011269105926216175251241095711163613288.3
Vertebrata (Mammalia)
 Glires
   Oryctolagus cuniculus21 15022 7883341670479135122256921507241366520048.8
   Heterocephalus glaber21 44921 54893126651390100963431036924326613596.3
   Cavia porcellus19 91120 3782361432461100101653491036410334216688.2
   Cricetulus griseus23 88424 442109140792796117070731167114278615166.2
   Mus musculus*43 53974 158179243501698717344320 45671423 226913761428.3
   Rattus norvegicus*27 34033 55596622116374761473915341110 094540731779.5
   Spermophilus tridecemlineatus19 96620 07911014374298393754881036603329015477.7
 Primates
   Macaca fascicularis *17 39628 9552331912928359197676391218186297021457.4
   M. mulatta*35 53669 56740745541694653371918 66732623 502729549617.1
   Gorilla gorilla gorilla27 28627 3712121994676218148067011519481335822068.1
   Homo sapiens*68 04913 5661202066823480373717 62334 82587734 27410 60787026.4
   Pan troglodytes*20 12633 32629622418204471825796613711 618419025377.6
   Pongo abelii22 78524 5292371818580229145764521688228287920558.4
   Nomascus leucogenys19 73419 837141148951811411435053996893245716308.2
   Callithrix jacchus*42 02555 08524437761280195286715 06430820 159617840207.3
   Otolemur garnettii19 93020 1569915154809310225226966801309916148.0
 Carnivora
   Canis familiaris25 43928 3623451813595385149170401709489404721587.6
   Mustela putorius furo38 8261732017984137212711 78515412 830407321905.6
   Neovison vison16 2371875035666839563652523315077684.7
   Ailuropoda melanoleuca*21 13635 74324720867791761746997516211 905513323336.5
   Felis catus20 30321 2301961406483108106558311076791309116027.5
 Cetartiodactyla
   Bos mutus18 92215013774531239314911855854315915278.1
   Bos taurus*23 84231 7808802215620508149181713179598449230959.7
   Sus scrofa*26 05433 9626452534779411156090381669463478731799.4
   Camelus ferus20 02867108463699113257151476257258811515.7
 Chiroptera
   Pteropus alecto19 52019 54897116248874116053641216774244712596.4
   Myotis brandtii19 3015810324329093858061046427225010905.6
   M. davidii15 44615 4666791634560782453073519418169836.4
   M. lucifugus20 65020 899143173843110010525716966782285518819.0
 Other mammalia
   Loxodonta africana25 61525 8321321744556128111965541298459483518767.3
   Equus caballus22 67627 8412721659514284104288861338701382519316.9
   Tupaia chinensis20 82420 85185127552764114956991256701311413606.5
   Sarcophilus harrisii22 38822 565107149055311086763681027495349515977.1
   Monodelphis domestica22 24022 794108150548584110363981067252393016137.1
   Ornithorhynchus anatinus23 55223 763113120269810311117229957184315713155.5
Vertebrata (Testudines + Archosauria group)
Anas platyrhynchos *16 37731 8791391360893123126910 5421489829331614994.7
Meleagris gallopavo16 53716 991114892413756735622835377207310065.9
Gallus gallus*17 62323 8004401640581278123164031477077328220808.7
Ficedula albicollis15 92216 14864985390577784669815208202110496.5
Taeniopygia guttata18 14119 7248571643277972674972619722118014.1
Chelonia mydas19 0667188047871794603197638421949515.0
Pelodiscus sinensis20 7841261271492647986724926703268313976.7
Other vertebrata
Petromyzon marinus13 1605452225566669394534379712395764.4
Latimeria chalumnae23 42923 5137512705938799381171167349290513455.7
Anolis carolinensis19 10919 5628112385102889365727896435247813196.7
Invertebrate
Chordata (Tunicata)
Oikopleura dioica*17 05029 0571518641177116149310 785928060254018796.5
Ciona intestinalis17 30818 639281507601957386231475014181215358.2
C. savignyi20 00420 1174582235952697779255568525358674.3
Ecdysozoa (Arachnida)
Tetranychus urticae18 08218 2431218916859570159865936152219190310.4
Ixodes ricinus16 1991355481913684835289736491246355521.9
I. scapularis20 47321 16221194376210314335706955905168119649.3
Rhipicephalus pulchellus11 205116204596096223804934241231162114.5
Ecdysozoa (Insecta)
Drosophila mojavensis14 52515 0862120493479583243027442411700207013.7
D. virilis14 45614 9283419413548477043085743231709197513.2
D. erecta15 1164422203627187339185943661721226415.0
D. grimshawi14 75414 7983118613626277343367941951698189212.8
D. ananassae14 96815 2982821393496479142436745031793216714.2
D. melanogaster*20 12039 9512544761923269212710 61319112 1014659501512.6
D. persimilis16 75416 8612821064207793048466850761701213412.7
D. pseudoobscura pseudoobscura 17 0474823164167793946327151261950236413.9
D. sechellia16 13416 3613722504107193644645447651734228714.0
D. simulans15 35419 057572372436100102854835653742165242912.7
D. willistoni15 44715 5642518753557781548086044341722190012.2
D. yakuba17 25741252139277100647525950241798256214.8
Megaselia scalaris11 46311 503107734173144949472424026397836.8
Anopheles darlingi10 44711 686879327293758368754397111628016.9
A. gambiae*13 07219 38450261041087110460275748341869266013.7
Aedes aegypti16 65417 68354236746918296150525648732008242113.7
Culex quinquefasciatus18 70319 062252345534128110457517355011866237012.4
Dendroctonus ponderosae23 65014199254910611538928966087250220068.5
Tribolium castaneum16 50217 0742617174236684658305041962109174310.2
Apis mellifera10 91012 2996575726781390444045329317148226.7
Camponotus floridanus14 78714 8011566232945744553365407813466774.6
Acromyrmex echinatior13 96213 9701759232736847522662421912536094.4
Atta cephalotes18 07918 11316753597991094657975471515597694.2
Solenopsis invicta14 19314 35926636437100748541331350811206624.6
Harpegnathos saltator15 02915 0421773932946696548460422312997565.0
Nasonia vitripennis17 04017 289141545305657016951554883142315599.0
Bombyx mori14 76717 915125177337910880662935445801681189810.6
Danaus plexippus16 25316 358341486441958085657664528149315209.3
Pararge aegeria15 104124285617585059831437635034402.9
Rhodnius prolixus15 18016 639441420537415626782623769147314648.8
Acyrthosiphon pisum35 80935 211241834814102173615 209668622173618585.3
Pediculus humanus subsp. corporis 10 8471151325737349429440317411935244.8
Ecdysozoa (Nematoda)
Ascaris suum*921318 53939122357710713025894754965243712626.8
Pristionchus pacificus29 07929 319143027103875136896999471573263304110.4
Caenorhabditis brenneri29 98230 71221360289676113410 3148873384255362311.8
C. briggsae21 75121 914302734655119874643510651783540276412.6
C. elegans26 17326 44718235738561631065715610761904401375514.2
C. japonica35 06335 06914266599895206112 2671239112323426797.6
C. remanei31 25232 133213859111784128210 3529371994981388012.1
Haemonchus contortus18 5803218155883101658267746792261218411.8
Brugia malayi*164311 561106683473757945404431399086785.9
Loa loa15 31915 3561178458846750577449374913877955.2
Wuchereria bancrofti19 29819 52518716677129870825439450412057343.8
Trichinella spiralis16 04116 2781793577073980538971343312349525.8
Ecdysozoa (Arthropoda)
Daphnia pulex30 13730 988222827892333143211 433888315213028499.2
Strigamia maritima14 97215 011191118428487265331683635183511377.6
Lophotrochozoa
Helobdella robusta23 32823 3791911706715992493851455866222611895.1
Capitella teleta31 20722218390776126310 8271067765391722057.1
Crassostrea gigas25 98226 8502619046338581410 1781407045291219307.2
Lottia gigantea23 721341683588486599382765530273417177.2
Platyhelminthes
Echinococcus granulosus11 1240614375381656285540351812606145.5
E. multilocularis10 572059132691656287848353212395915.6
Clonorchis sinensis13 60613 880656234955990429489507412345684.1
Schistosoma japonicum16 2361717676077085360863635111357178411.0
S. mansoni11 72312 8369605427203505449160374012426144.8
Other Invertebrates
Amphimedon queenslandica29 74129 8166149089365124611 722737333268514965.0
Nematostella vectensis24 43525 0356111355865810058651726385329311964.8
Strongylocentrotus purpuratus28 56729 56046219873794110198951458580402622447.6
Trichoplax adhaerens11 52011 590748221336489501344250217764894.2
Branchiostoma floridae28 54429 237372227710152114687991407800382622647.7

Notes : Data of other protein subcellular locations are summarized in Supplementary Table 1 . HLS: highly likely secreted; LS: likely secreted; Mito: mitochondrial; mem: membrane; non-mem: non-membrane; Cyt: cytoplasm (or cytosol); Nuc: nuclear; Secr: secretome. Species labeled with * has more (or less) 5000 protein entries than its reference proteome.

It should be noted that the distribution data of protein subcellular locations in Table 2 and Supplementary Table S1 were based on all available protein entries for each species in the database, which were different from a complete or reference proteome in some species. Several species had more redundant proteins in the dataset. For example, human reference proteome contained 68 049 proteins while a total of 135 661 human proteins were retrieved and used for analysis ( Table 2 ). Thus, the proportions of each subcellular proteome might be slightly different for some species when a reference proteome was used. The two largest compartments having a large proportion of proteins were cytoplasm and nucleus ( Table 2 ). The proteins located in cytoplasm, not including cytoskeleton proteins, accounted for 21–43% (average 31%), and the proteins located in nucleus accounted for 20–37% (average 30%) of total proteins in these species. Approximately 3–19% (average 12%) of total proteins are predicted to be plasma membrane proteins, and 3–9% of proteins (average 5.6%) are predicted to be located in mitochondria. We noticed that 15.7% of human proteins are located in mitochondria. This number is much higher than the proportions in other species. This might be due to relatively a large number (∼7000) of curated human mitochondrial proteins in the dataset. Also, the prediction sensitivity for mitochondrial proteins was relatively low (∼42.5%) ( Table 1 ), thereby likely underestimating the proportions of mitochondrial proteins in animal species reported here.

Classical secreted proteins from a species, i.e. secretome, can be relatively accurately predicted. Combining curated secreted proteins and predicted highly likely secreted proteins (at least 3 positives out of 4 predictors) as a secretome, our method for a secretome prediction reached a MCC of 0.89 with 93.5% in sensitivity and 96.0% in specificity ( Table 1 ). The proportions of secretomes vary from 2.9% to 21.9% with an average of 8.1% in animal species . Pararge aegeria , the Speckled Wood butterfly, had the smallest secretome of 440 proteins (2.9%), and Homo sapiens (human) has the largest secretome of 8702 proteins with 2020 proteins curated as secreted. However, human protein dataset contained a large proportion of redundant entries. After mapping to the human reference proteome, a total of 4969 secreted proteins (∼7.3%) were identified (see next section, Table 3 ). After excluding species having a large number (>5000 proteins) of duplicated protein entries (species labeled with * in Table 2 ) and using human secreted proteins mapped to human reference proteome, we plotted the secretome size and proteome size of remaining 103 species ( Figure 1 ). Overall there is a good correlation between the proteome size (X) and the secretome size (Y) with a correlation coefficient of 0.658 ( Y  = 289.9 + 0.066X). However, clearly the secretome size is not only determined by its proteome size in a species. There are variations among different species. For example, secretomes in mammals had a range of 4.7–9.7% (average 7.3%), while the proportions of secretomes in insecta were more variable from 2.9 to 15% (average 9.8%), with Drosophila species had an average of 13.5% secretome ( Table 2 ). We also noticed that among five species in Caenorhabditis , four exhibited a secretome accounting >11% of its proteome ( Table 2 ). Caenorhabditis is a genus of nematodes that live in bacteria-rich environments like compost piles, decaying dead animals and rotting fruit. Their large secretomes may be related to their lifestyle for digesting complex biomolecules. Recently Suh and Hutter identified 3484 putative secreted proteins C. elegans , which were retrieved from WormBase ( 34 ). Interestingly, their retrieved numbers for potential secreted proteins and trasmembrane proteins (5458) in C. elegans closely coincide with our predictions (3755 secreted proteins and 5548 transmembrane proteins).

Figrue 1.

Relationship between the predicted secretome size and the proteome size in metazoa.

Table 3.

The secretome size and the proportion of secretome relative to their reference proteomes in different primates

HsapCjarGgorMfasMmulNleuOgarPtroPabe
Secretome496932042198146028481617160418521923
Secretome (%)7.37.68.18.48.08.28.09.28.4
HsapCjarGgorMfasMmulNleuOgarPtroPabe
Secretome496932042198146028481617160418521923
Secretome (%)7.37.68.18.48.08.28.09.28.4

Note : The reference proteome size can be found in Table 2 . Hsap: Homo sapiens; Cjar: Callithrix jacchus; Ggor: Gorilla gorilla gorilla; Mfas: Macaca fascicularis; Mmul: Macaca lulatta; Nleu: Nomascus leucogenys; Ogar: Otolemur garnettii; Ptro: Pan troglodytes; Pabe: Pongo abelii .

Table 3.

The secretome size and the proportion of secretome relative to their reference proteomes in different primates

HsapCjarGgorMfasMmulNleuOgarPtroPabe
Secretome496932042198146028481617160418521923
Secretome (%)7.37.68.18.48.08.28.09.28.4
HsapCjarGgorMfasMmulNleuOgarPtroPabe
Secretome496932042198146028481617160418521923
Secretome (%)7.37.68.18.48.08.28.09.28.4

Note : The reference proteome size can be found in Table 2 . Hsap: Homo sapiens; Cjar: Callithrix jacchus; Ggor: Gorilla gorilla gorilla; Mfas: Macaca fascicularis; Mmul: Macaca lulatta; Nleu: Nomascus leucogenys; Ogar: Otolemur garnettii; Ptro: Pan troglodytes; Pabe: Pongo abelii .

Comparative analysis of secretomes in primates

Completely analysing the secretomes of all species mentioned above ( Table 2 ) is beyond the scope of this work. Here we selected the secretomes of nine primates for comparative analysis ( Table 3 ). As there are some redundant entries in the dataset, we mapped the identified secreted proteins to the reference or complete proteomes that are compiled by UniProtKB ( http://www.uniprot.org/taxonomy/complete-proteomes ). Among the nine primate species, the proportions of secretomes remained unchanged in three of them and others showed a slight increase, for example, the proportion of human secretome increased from 6.4% in the whole collection to 7.3% in the complete proteome set ( Tables 2 and 3 ). Among the nine primate species, human has the largest proteome consisting of 68 049 proteins and the largest secretome size consisting of 4,969 proteins ( Table 3 ). The large proteome size in human is mainly due to intensive collection of proteins generated by alternative splicing of protein coding genes ( 35 , 36 ). We also noted that Macaca mulatta has a much larger, nearly doubled, proteome and secretome size than M. fascicularis has ( Table 3 ). Whether such a large difference in these two closely related species is caused by the extensive genome segment duplications in M. mulatta ( 37 ) needs to be further examined.

To provide an overview of the functionalities of primate secreted proteins, we categorized the predicted secreted proteins into protein families using the rpsBLAST tool to search the Pfam database with a cutoff E-value of 1e−10. The secretomes of primates can be classified into a total of 841 unique protein families. The summary of the Pfam analysis with 28 families having 17 or more entries in a family in human is shown in Table 4 . A complete list can be found in Supplementary Table S2 . The top 10 highly encoded secreted protein families in primates were Trypsin, Immunoglobulin V-set domain, Serpin (serine protease inhibitor), Small cytokines (intecrine/chemokine), wnt family, von Willebrand factor type A domain, Immunoglobulin I-set domain, Fibrinogen beta and gamma chains, CUB domain and C1q domain. There are both variations in the Pfam categories and the number of entries in each Pfam among different primates. The significance of these secreted proteins in primate development and evolution certainly needs to be further investigated.

Table 4.

Comparison of protein families in primate secretomes

Pfam IDPfam NameHsapCjarGgorMfasMmulNleuOgarPtroPabePfam description
Total25862222157399219071128118713001349
pfam00089Trypsin14810094549258767877Trypsin
pfam07686V-set72100619377214913106Immunoglobulin V-set domain
pfam00079Serpin603023222516202023Serpin (serine protease inhibitor)
pfam00048IL8423435283834233433Small cytokines (intecrine/chemokine)
pfam00110wnt423625162621192220wnt family
pfam00092VWA395129122417202318von Willebrand factor type A domain
pfam07679I-set37281613211492212Immunoglobulin I-set domain
pfam00147Fibrinogen_C323725142421191922Fibrinogen beta and gamma chains
pfam00431CUB3223124206899CUB domain
pfam00386C1q303924122218272517C1q domain
pfam00019TGF_beta253029182720252323Transforming growth factor beta like domain
pfam00754F5_F8_type_C2594445458F5/8 type C domain
pfam01403Sema2520871110376Sema domain
pfam00413Peptidase_M10241721141815131512Matrixin
pfam00059Lectin_C233827162118151618Lectin C-type domain
pfam05986ADAM_spacer123311891913131617ADAM-TS Spacer 1
pfam00151Lipase221010789777Lipase
pfam00061Lipocalin192522814618108Lipocalin/cytosolic fatty-acid binding
pfam00167FGF1916144107121414Fibroblast growth factor
pfam00193Xlink1917105146788Extracellular link domain
pfam02931Neur_chan_LBD1920110000Neurotransmitter-gated ion-channel ligand
pfam03024Folate_rec1964333445Folate receptor family
pfam00530SRCR1863033444Scavenger receptor cysteine-rich domain
pfam00055Laminin_N172614322910119Laminin N-terminal (Domain VI)
pfam00143Interferon17111481611101313Interferon alpha/beta domain
pfam00246Peptidase_M141718148151291412Zinc carboxypeptidase
pfam07546EMI17108374773EMI domain
pfam13895Ig_21752063181Immunoglobulin domain
Pfam IDPfam NameHsapCjarGgorMfasMmulNleuOgarPtroPabePfam description
Total25862222157399219071128118713001349
pfam00089Trypsin14810094549258767877Trypsin
pfam07686V-set72100619377214913106Immunoglobulin V-set domain
pfam00079Serpin603023222516202023Serpin (serine protease inhibitor)
pfam00048IL8423435283834233433Small cytokines (intecrine/chemokine)
pfam00110wnt423625162621192220wnt family
pfam00092VWA395129122417202318von Willebrand factor type A domain
pfam07679I-set37281613211492212Immunoglobulin I-set domain
pfam00147Fibrinogen_C323725142421191922Fibrinogen beta and gamma chains
pfam00431CUB3223124206899CUB domain
pfam00386C1q303924122218272517C1q domain
pfam00019TGF_beta253029182720252323Transforming growth factor beta like domain
pfam00754F5_F8_type_C2594445458F5/8 type C domain
pfam01403Sema2520871110376Sema domain
pfam00413Peptidase_M10241721141815131512Matrixin
pfam00059Lectin_C233827162118151618Lectin C-type domain
pfam05986ADAM_spacer123311891913131617ADAM-TS Spacer 1
pfam00151Lipase221010789777Lipase
pfam00061Lipocalin192522814618108Lipocalin/cytosolic fatty-acid binding
pfam00167FGF1916144107121414Fibroblast growth factor
pfam00193Xlink1917105146788Extracellular link domain
pfam02931Neur_chan_LBD1920110000Neurotransmitter-gated ion-channel ligand
pfam03024Folate_rec1964333445Folate receptor family
pfam00530SRCR1863033444Scavenger receptor cysteine-rich domain
pfam00055Laminin_N172614322910119Laminin N-terminal (Domain VI)
pfam00143Interferon17111481611101313Interferon alpha/beta domain
pfam00246Peptidase_M141718148151291412Zinc carboxypeptidase
pfam07546EMI17108374773EMI domain
pfam13895Ig_21752063181Immunoglobulin domain

Note : A complete list is shown as Supplementary Table 2 . The species full names can be found in the note of Table 3 .

Table 4.

Comparison of protein families in primate secretomes

Pfam IDPfam NameHsapCjarGgorMfasMmulNleuOgarPtroPabePfam description
Total25862222157399219071128118713001349
pfam00089Trypsin14810094549258767877Trypsin
pfam07686V-set72100619377214913106Immunoglobulin V-set domain
pfam00079Serpin603023222516202023Serpin (serine protease inhibitor)
pfam00048IL8423435283834233433Small cytokines (intecrine/chemokine)
pfam00110wnt423625162621192220wnt family
pfam00092VWA395129122417202318von Willebrand factor type A domain
pfam07679I-set37281613211492212Immunoglobulin I-set domain
pfam00147Fibrinogen_C323725142421191922Fibrinogen beta and gamma chains
pfam00431CUB3223124206899CUB domain
pfam00386C1q303924122218272517C1q domain
pfam00019TGF_beta253029182720252323Transforming growth factor beta like domain
pfam00754F5_F8_type_C2594445458F5/8 type C domain
pfam01403Sema2520871110376Sema domain
pfam00413Peptidase_M10241721141815131512Matrixin
pfam00059Lectin_C233827162118151618Lectin C-type domain
pfam05986ADAM_spacer123311891913131617ADAM-TS Spacer 1
pfam00151Lipase221010789777Lipase
pfam00061Lipocalin192522814618108Lipocalin/cytosolic fatty-acid binding
pfam00167FGF1916144107121414Fibroblast growth factor
pfam00193Xlink1917105146788Extracellular link domain
pfam02931Neur_chan_LBD1920110000Neurotransmitter-gated ion-channel ligand
pfam03024Folate_rec1964333445Folate receptor family
pfam00530SRCR1863033444Scavenger receptor cysteine-rich domain
pfam00055Laminin_N172614322910119Laminin N-terminal (Domain VI)
pfam00143Interferon17111481611101313Interferon alpha/beta domain
pfam00246Peptidase_M141718148151291412Zinc carboxypeptidase
pfam07546EMI17108374773EMI domain
pfam13895Ig_21752063181Immunoglobulin domain
Pfam IDPfam NameHsapCjarGgorMfasMmulNleuOgarPtroPabePfam description
Total25862222157399219071128118713001349
pfam00089Trypsin14810094549258767877Trypsin
pfam07686V-set72100619377214913106Immunoglobulin V-set domain
pfam00079Serpin603023222516202023Serpin (serine protease inhibitor)
pfam00048IL8423435283834233433Small cytokines (intecrine/chemokine)
pfam00110wnt423625162621192220wnt family
pfam00092VWA395129122417202318von Willebrand factor type A domain
pfam07679I-set37281613211492212Immunoglobulin I-set domain
pfam00147Fibrinogen_C323725142421191922Fibrinogen beta and gamma chains
pfam00431CUB3223124206899CUB domain
pfam00386C1q303924122218272517C1q domain
pfam00019TGF_beta253029182720252323Transforming growth factor beta like domain
pfam00754F5_F8_type_C2594445458F5/8 type C domain
pfam01403Sema2520871110376Sema domain
pfam00413Peptidase_M10241721141815131512Matrixin
pfam00059Lectin_C233827162118151618Lectin C-type domain
pfam05986ADAM_spacer123311891913131617ADAM-TS Spacer 1
pfam00151Lipase221010789777Lipase
pfam00061Lipocalin192522814618108Lipocalin/cytosolic fatty-acid binding
pfam00167FGF1916144107121414Fibroblast growth factor
pfam00193Xlink1917105146788Extracellular link domain
pfam02931Neur_chan_LBD1920110000Neurotransmitter-gated ion-channel ligand
pfam03024Folate_rec1964333445Folate receptor family
pfam00530SRCR1863033444Scavenger receptor cysteine-rich domain
pfam00055Laminin_N172614322910119Laminin N-terminal (Domain VI)
pfam00143Interferon17111481611101313Interferon alpha/beta domain
pfam00246Peptidase_M141718148151291412Zinc carboxypeptidase
pfam07546EMI17108374773EMI domain
pfam13895Ig_21752063181Immunoglobulin domain

Note : A complete list is shown as Supplementary Table 2 . The species full names can be found in the note of Table 3 .

We further performed Gene Ontology (GO) analysis with the human secretome by searching the UniProtKB/Swiss-Prot dataset using BLASTP with a cutoff E-value of 1e−10. GO information was retrieved from UniProt ID mapping data ( http://www.uniprot.org/downloads ) and analysed using GO SlimViewer with generic GO terms ( 38 ). Among 4969 human secreted proteins, 4,512 entries had at least one GO mapping. As the proteins in the dataset are predicted to be secreted, thus, only GO biological process and molecular function classification is further analysed ( Figure 2 ; Supplementary Table S3 ). Secreted proteins in humans are involved in 67 biological processes with a total of 25,887 GO IDs. The top five processes include anatomical structure development (13.8%), signal transduction (9.7%), immune system process (7.5%), response to stress (6.3%), and cell differentiation (5.8%) ( Figure 2 a). Molecular function analysis revealed human secreted proteins had 39 types of molecular functions with a total of 3,059 GO IDs. The top five main molecular functions include ion binding (28.5%), peptidase activity (11.8%), signal transducer activity (9.9%), enzyme regulator activity (7.5%) and oxidoreductase activity (5.9%) ( Figure 2 b). GO analysis and functional protein family domain analysis are consistent in showing these proteins play important roles in signal transduction, immune system, regulation of human structure development and many other biological processes.

Figure 2.

Gene Ontology classification of the human secreted protein distribution in ( a ) biological process and ( b ) molecular function ontology.

Discussion

The work described here represents our efforts to computationally predict the subcellular locations for all human and animal proteins, with a focus on secretomes. In addition, for the secretomes, we further classified them as curated, predicted to be highly likely secreted, likely secreted, and weakly likely secreted protein subsets. This refinement of classifications of secreted proteins and other subcellular locations is expected to greatly facilitate comparative analysis of subcellular proteomes in different species. Human secretome research is an active research subject due to its importance in human health and medicine, such as the human secretome atlas initiative with a goal for identifying potential biomarkers and therapeutic targets in the secretome that can be traced back in accessible human body fluids ( 12 ). For example, recently the human secreted enzyme Notum was found to inhibit the Wnt signaling pathway through removal of a lipid that is linked to the Wnt proteins and that is required for activation of Wnt receptor proteins ( 39 , 40 ). Analysis of the secretome can yield valuable data leading to an understanding of the intricate interaction between different tissues as it relates to the coordination of physiology in multicellular organisms. An example is found in the interaction between muscles and bones ( 41 ). Many muscle specific growth factors, in the myosecretome, have been shown to have effects on bone repair and remodeling. Myostatin, a myocyte derived growth factor that inhibits muscle growth and thus acting as a break on uncontrolled growth, also has effects on suppression of bone marrow-derived stem cells and cartilage formation ( 41 ). In this study, we compared secretomes in different primates, and revealed that the highly enriched families including Trypsin, Immunoglobulin V-set domain, Serpin (serine protease inhibitor), Small cytokines (intecrine/chemokine) and wnt family, etc. Further we analysed the molecular functions and biological processes of the human secretome. Our analysis revealed the secreted proteins in humans play important roles in human structure development, immune systems, and response to stress, etc.

In this work, the secretome identification was limited to classical secreted proteins, i.e. signal peptide containing proteins, and curated secreted proteins that may include both classical and leadless-secreted proteins (LSP). SecretomeP was a tool implemented for predicting these LSPs in bacteria and mammals ( http://www.cbs.dtu.dk/services/SecretomeP/ ). Because the accuracy of this tool for predicting animal LSPs is not evaluated, we did not include this tool in our data processing. Thus we would like to request the research community to submit metazoan protein subcellular locations, particularly LSPs, with experimental evidence traceable from literature to the database. The information provided in the database, the easy to download feature, and BLAST tool to allow users to search all protein data or the secretome data will provide useful supports to researcher working in these subjects. Researchers working with a new protein sequence can predict protein subcellular locations using the tools we have used in this work or other available tools that were summarized by Meinken and Min ( 32 ) and Caccia et al. ( 42 ).

The LOCATE database was developed for the human and mouse protein subcellular locations using multiple sources of information including literature data and computational prediction ( 17 ). However, the limit of the database was only for human and mouse proteins and the database has not been updated since 2009. Recently a new database named COMPARTMENTS was developed for seven model organisms including yeast, Arabidopsis, human, mouse, rat, fruit fly and Caenorhabditis elegans ( http://compartments.jensenlab.org ) ( 43 ). Our database contains protein data from all available metazoan species, with 121 species or subspecies having a complete proteome, including these model organisms. For plant and fungal protein data, we have specifically developed the plant secretome and subcellular proteome knowledgebase (PlantSecKB) ( 30 ) and the fungal secretome and subcellular proteome knowledgebase (FunSecKB and FunSecKB2) ( 3 , 4 ). The COMPARTMENTS database was implemented by integrating information from UniProtKB, STRING, GO annotations from respective model organism databases, text mining, as well as prediction information using WoLF PSORT and YLoc-HighRes methods. In comparing with our database, both used the annotation information from UniProtKB and WoLF PSORT was the common tool used for prediction information. However, some other tools are used in our database development including TargetP, SignalP, Phobius, TMHMM and PS-Scan. In contrast, the COMPARTMENTS database used YLoc-HighRes method and also STRING, GO annotations. And also the COMPARTMENTS database has developed an automatically updated web resource to update from the major eukaryotic model organisms. Our database remained static for the predicted information and will be updated periodically for manually curated data based on the literature. Thus LOCATE, COMPARTMETNS and MetazSecKB may complement each other as each of them had specific features derived from different sources or prediction tools. Therefore, we recommend researchers to cross search these databases for proteins from model organisms. However, we noticed that these databases used different identifiers for protein entries, thus the data may not be compared directly. We anticipate the MetazSecKB, along with our published fungal secretome and subcellular proteome knowledgebase (FunSecKB2) ( 4 ) and the newly developed protist secretome and subcellular proteome knowledgebase (ProtSecKB) ( http://proteomics.ysu.edu/secretomes/protist/index.php ), will serve the community valuable resources for proteome-wide comparative analysis and for investigating protein–protein interactions of host and fungal or protist pathogens.

Acknowledgements

We thank Dr. Feng Yu for his assistance in maintaining the server. The work is supported by the College of Science, Technology, Engineering, and Mathematics Dean’s reassigned time for research to X.J.M.

Funding

The work is funded by University Research Council (URC), Youngstown State University (YSU). J.M. was supported with a graduate research assistantship by the Center for Applied Chemical Biology, YSU. Funding for open access charge: the College of Graduate Studies and Research, YSU.

Conflict of interest . None declared.

References

1

Xue
H.
Lu
B.
Lai
M.
(
2008
)
The cancer secretome: a reservoir of biomarkers
.
J. Transl. Med.
,
6
,
20
.

2

Tjalsma
H.
Bolhuis
A.
Jongbloed
J.D.
et al.  . (
2000
)
Signal peptide-dependent protein transport in Bacillus subtilis: a genome-based survey of the secretome
.
Microbiol. Mol. Biol. Rev.
,
64
,
515
547
.

3

Lum
G.
Min
X.J.
(
2011
)
FunSecKB: the fungal secretome knowledgebase
.
Database (Oxford)
,
2011
,
bar001
.

4

Meinken
J.
Asch
D.K.
Neizer-Ashun
K.A.
et al.  . (
2014
)
FunSecKB2: a fungal protein subcellular location knowledgebase
.
Comput. Mol. Biol.
,
4
,
1
17
.

5

Dupont
A.
Corseaux
D
Dekeyzer
O.
et al.  . (
2005
)
The proteome and secretome of human arterial smooth muscle cells
.
Proteomics
,
5
,
585
596
.

6

Kim
W.K.
Kim
D.
Cui
J.
et al.  . (
2014
)
Secretome analysis of human oligodendrocytes derived from neural stem cells
.
PLoS ONE
,
9
,
e84292
.

7

Rocha
B.
Calamia
V.
Casas
V.
et al.  . (
2014
).
Secretome analysis of human mesenchymal stem cells undergoing chondrogenic differentiation
.
J. Proteome Res.
,
13
,
1045
1054
.

8

Katz-Jaffe
M.G.
Schoolcraft
W.B.
Gardner
D.K.
(
2006
)
Analysis of protein expression (secretome) by human and mouse preimplantation embryos
.
Fertil Steril.
,
86
,
678
685
.

9

Lim
J.M.
Wollaston-Hayden
E.E.
Teo
C.F.
et al.  . (
2014
)
Quantitative secretome and glycome of primary human adipocytes during insulin resistance
.
Clin. Proteomics.
,
11
,
20
.

10

Roca-Rivada
A.
Alonso
J.
Al-Massadi
O.
et al.  . (
2011
)
Secretome analysis of rat adipose tissues shows location-specific roles for each depot type
.
J. Proteomics.
,
74
,
1068
1079
.

11

Wu
C.C.
Hsu
C.W.
Chen
C.D.
et al.  . (
2010
)
Candidate serological biomarkers for cancer identified from the secretomes of 23 cancer cell lines and the human protein atlas
.
Mol. Cell Proteomics
,
9
,
1100
1117
.

12

Brown
K.J.
Seol
H.
Pillai
D.K.
et al.  . (
2013
)
The human secretome atlas initiative: implications in health and disease conditions
.
Biochim. Biophys. Acta.
,
1834
,
2454
2461
.

13

Grimmond
S.M.
Miranda
K.C.
Yuan
Z.
et al.  . (
2003
)
The mouse secretome: functional classification of the proteins secreted into the extracellular environment
.
Genome Res.
,
13
,
1350
1359
.

14

Klee
E.W.
Carlson
D.F.
Fahrenkrug
S.C.
et al.  . (
2004
)
Identifying secretomes in people, pufferfish and pigs
.
Nucleic Acids Res.
,
32
,
1414
1421
.

15

Klee
E.W.
(
2008
)
The zebrafish secretome
.
Zebrafish
,
5
,
131
138
.

16

Chen
Y.
Zhang
Y.
Yin
Y.
et al.  . (
2005
)
SPD–a web-based secreted protein database
.
Nucleic Acids Res.
,
33
,
D169
D173
.

17

Sprenger
J.
Lynn Fink
J.
Karunaratne
S.
et al.  . (
2008
)
LOCATE: a mammalian protein subcellular localization database
.
Nucleic Acids Res.
,
36
,
D230
D233
.

18

The UniProt Consortium.
(
2014
)
Activities at the Universal Protein Resource (UniProt)
.
Nucleic Acids Res.
,
42
,
D191
D198
.

19

Min
X.J.
(
2010
)
Evaluation of computational methods for secreted protein prediction in different eukaryotes
.
J. Proteomics Bioinform.
,
3
,
143
147
.

20

Bendtsen
J.D.
Nielsen
H.
von Heijne
G.
et al.  . (
2004
)
Improved prediction of signal peptides: SignalP 3.0
.
J. Mol. Biol.
,
340
,
783
795
.

21

Petersen
T.N.
Brunak
S.
von Heijne
G.
et al.  . (
2011
)
SignalP 4.0: discriminating signal peptides from transmembrane regions
.
Nature Methods
,
8
,
785
786
.

22

Käll
L.
Krogh
A.
Sonnhammer
E.L.L.
(
2007
)
Advantages of combined transmembrane topology and signal peptide prediction - the Phobius web server
.
Nucleic Acids Res.
,
35
:
W429
432
.

23

Horton
P.
Park
K.-J.
Obayashi
T.
et al.  . (
2007
)
WoLF PSORT: protein localization predictor
.
Nucleic Acids Res.
,
35
,
W585
W587
.

24

Emanuelsson
O.
Brunak
S.
von Heijne
G.
et al.  . (
2007
)
Locating proteins in the cell using TargetP, SignalP and related tools
.
Nat. Protoc.
,
2
,
953
971
.

25

Krogh
A.
Larsson
B.
von Heijne
G.
et al.  . (
2001
)
Predicting transmembrane protein topology with a hidden Markov model: Application to complete genomes
.
J. Mol. Biol.
,
305
,
567
580
.

26

Sigrist
C.J.A.
Cerutti
L.
de Casro
E.
et al.  . (
2010
)
PROSITE, a protein domain database for functional characterization and annotation
.
Nucleic Acids Res.
,
38
,
161
166
.

27

de Castro
E.
Sigrist
C.J.
Gattiker
A.
et al.  . (
2006
)
ScanProsite: detection of PROSITE signature matches and ProRule-associated functional and structural residues in proteins
.
Nucleic Acids Res.
,
34
,
W362
W365
.

28

Lum
G.
Min
X.J
. (
2013
)
Bioinformatic protocols and the knowledge-base for secretomes in fungi
. In:
Gupta
V.K.
Tuohy
M.G.
Ayyachamy
M.
Turner
K.M.
O’Donovan
A.
(eds).
Laboratory Protocols in Fungal Biology: Current Methods in Fungal Biology
.
Springer
, pp.
545
557
.

29

Poisson
G.
Chauve
C.
Chen
X.
et al.  . (
2007
)
FragAnchor a large scale all Eukaryota predictor of Glycosylphosphatidylinositol-anchor in protein sequences by qualitative scoring
.
Genomics Proteomics Bioinform.
,
5
,
121
130
.

30

Lum
G.
Meinken
J.
Orr
J.
et al.  . (
2014
)
PlantSecKB: the plant secretome and subcellular proteome knowledgebase
.
Comput. Mol. Biol.
,
4
,
1
17
.

31

Melhem
H.
Min
X.J.
Butler
G
. (
2013
)
The impact of SignalP 4.0 on the prediction of secreted proteins. In: Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), 2013 IEEE Symposium on IEEE pp. 16–22
.

32

Meinken
J.
Min
X.J.
(
2012
)
Computational prediction of protein subcellular locations in eukaryotes: an experience report
.
Comput. Mol. Biol.
,
2
,
1
7
.

33

Matthews
B.W.
(
1975
)
Comparison of the predicted and observed secondary structure of T4 phage lysozyme
.
Biochim. Biophys. Acta
,
405
,
442
451
.

34

Suh
J.
Hutter
H.
(
2012
)
A survey of putative secreted and transmembrane proteins encoded in the C . elegans genome
. BMC Genomics
,
13
,
333
.

35

Maniatis
T.
Tasic
B.
(
2002
)
Alternative pre-mRNA splicing and proteome expansion in metazoans
.
Nature
,
418
,
236
243
.

36

Nilsen
T.W.
Graveley
B.R.
(
2010
)
Expansion of the eukaryotic proteome by alternative splicing
.
Nature
,
463
,
457
463
.

37

Rhesus Macaque Genome Sequencing and Analysis Consortium
. (
2007
)
Evolutionary and biomedical insights from the rhesus macaque genome
.
Science.
316
,
222
234
.

38

McCarthy
F.M.
Wang
N.
Magee
G.B.
et al.  . (
2006
)
AgBase: a functional genomics resource for agriculture
.
BMC Genomics
,
7
,
229
.

39

Nusse
R.
(
2015
)
Cell signalling: Disarming Wnt
.
Nature
,
519
,
163
164
.

40

Kakugawa
S.
Langton
P.F.
Zebisch
M.
et al.  . (
2015
)
Notum deacylates Wnt proteins to suppress signalling activity
.
Nature.
519
,
187
192
.

41

Hamrick
M.W.
(
2012
)
The skeletal muscle secretome: an emerging player in muscle-bone crosstalk
.
BoneKEy Reports
,
1
.

42

Caccia
D.
Dugo
M.
Callari
M.
et al.  . (
2013
)
Bioinformatics tools for secretome analysis
.
Biochim. Biophys. Acta.
,
1834
,
2442
2453
.

43

Binder
J.X.
Pletscher-Frankild
S.
Tsafou
K.
et al.  . (
2014
).
COMPARTMENTS: unification and visualization of protein subcellular localization evidence
.
Database
,
2014
,
bau012
.

Author notes

Present address: John Meinken, Center for Health Informatics, University of Cincinnati, Cincinnati, OH 45267-0840, USA

Citation details: Meinken,J., Walker,G., Cooper,C.R., et al . MetazSecKB: the human and animal secretome and subcellular proteome knowledgebase. Database (2015) Vol. 2015: article ID bav077; doi:10.1093/database/bav077

This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Supplementary data