SNED1 (Sushi, Nidogen, and EGF-like Domains 1) is an extracellular matrix (ECM) protein expressed at low levels in a wide range of tissues. The gene encoding SNED1 is located on human chromosome 2 at locus 2q37.3. The corresponding mRNA was isolated from the spleen and is 6834 bp in length, and the corresponding protein is 1413 amino acids long. The mouse ortholog of SNED1 was cloned in 2004 from the embryonic kidney by Leimeister et al.[1] SNED1 contains domains characteristic of ECM proteins, including an amino-terminal NIDO domain, several calcium-binding EGF-like domains (EGF_CA), a Sushi domain, also known as a complement control protein (CCP) domain, and three type III fibronectin (FN3) domains in the carboxy-terminal region.

Gene

Locus

SNED1 is located on the plus strand of chromosome 2 at locus 2q37.3. The RefSeq accession number is NM_001080437.3. The genomic DNA sequence of SNED1 spans 98,159 bp, and the longest spliced mRNA predicted by AceView is 7048 bp long and contains 31 exons. AceView predicts 9 splice variants of SNED1; their protein structure matches, obtained with the Phyre2 server, are discussed under "Tertiary and quaternary structure".[2]

Common aliases

SNED1 is an acronym for Sushi, Nidogen, and EGF-like Domains 1. Obsolete aliases for SNED1 include Snep, SST3, and IRE-BP1.[2]

Homology/evolution

Homologs and phylogeny

SNED1 is highly conserved across vertebrates, including fish, reptiles, amphibians, birds, and mammals.[3] It is unclear whether SNED1 is conserved in invertebrates, although protein domains found in SNED1 are also found in invertebrates.[3] The abundant cysteine residues, mostly located within EGF-like domains where they form disulfide bonds, appear to be very highly conserved, suggesting that cysteine richness is an important feature of this protein.[4]

Paralogs

SNED1 has several paralogs within the human genome, each of which covers only a small portion of the peptide sequence. Genes encoding proteins that share domains (EGF-like, Sushi) with SNED1 include the neurogenic locus notch homolog (NOTCH) proteins, the jagged proteins, eyes shut homolog proteins, the crumbs homolog proteins, delta and notch-like epidermal growth factor receptors, the sushi von Willebrand factor A protein (SVEP1), and slit homolog 3 protein.

Protein

Figure 3. The longest ORF of SNED1 was identified and translated using the tool at the SDSC Biology Workbench. The domains of interest found in NCBI are annotated, along with selected secondary structures and post-translational modifications.

Primary sequence

UniProt, the protein knowledge database, reports that the full-length SNED1 protein is 1413 amino acids long (UniProt Q8TER0).

The full sequence, obtained by an NCBI BLAST search, can be accessed with the reference ID NP_001073906.1. A notable feature of this protein is that it is extraordinarily cysteine rich, with 107 cysteines total, giving an overall cysteine composition of 13.2%.[5]
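
A residue composition such as the cysteine percentage is the residue count divided by the sequence length. The following is a minimal sketch of that calculation for any protein sequence given as a one-letter string; the function name `residue_fraction` and the toy peptide are illustrative assumptions and are not taken from the cited analysis. As a point of reference, 107 cysteines over 1413 residues works out to roughly 7.6%, so the 13.2% figure presumably rests on a different count or sequence basis.

```python
def residue_fraction(sequence: str, residue: str = "C") -> float:
    """Return the fraction of positions in `sequence` equal to `residue`."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence is empty")
    return seq.count(residue.upper()) / len(seq)


# Toy peptide (hypothetical, NOT the SNED1 sequence): 3 cysteines in 10 residues.
toy_peptide = "ACDCEFGCHK"
print(f"{residue_fraction(toy_peptide):.1%}")  # 30.0%

# The same ratio computed directly from the counts quoted in the text.
print(f"{107 / 1413:.1%}")  # 7.6%
```

To apply this to SNED1 itself, the sequence for NP_001073906.1 would need to be downloaded and passed in place of the toy peptide.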

Domains and motifs

SNED1 is a secreted protein of the extracellular matrix. It contains a signal peptide (amino acids 1–24) that directs the protein to the secretory pathway.[5]

Precise prediction of domain boundaries can be obtained using the InterPro domain database or SMART.

SNED1 contains several notable domains.[1][3][5] The first, shown in pink in the annotated sequence above, is the NIDO domain, which is also found in nidogen-1 (also known as entactin). Other than SNED1, only four human proteins contain this domain: the basement membrane proteins nidogen-1, nidogen-2, and alpha-tectorin; and mucin-4, which has been demonstrated to play a role in promoting pancreatic cancer metastasis.[6][7]

The second regions of interest, underlined in the figure, are the calcium-binding EGF-like domains (EGF_CA). The sequence contains many of these domains, which are present in a large number of membrane-bound and extracellular proteins. The EGF_CA domains may suggest a "sticky" nature to this protein, as ECM proteins often require calcium cations to form homo- and heterodimeric complexes with other ECM proteins. The Sushi domain, annotated in green in the figure, has been identified in many proteins involved in the complement system. It is also known as the complement control protein (CCP) motif or short consensus repeat (SCR), and it is the domain from which the protein gets part of its name. The fibronectin type III (FN3) domains are annotated in blue, and their presence may suggest a role for this protein in cell adhesion. SNED1 also contains an RGD and an LDV sequence, motifs important for the binding of ECM proteins to integrins, which are cell-membrane proteins that mediate cell-ECM interactions.[5]
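
Short linear motifs such as RGD and LDV can be located by a plain substring scan over the protein sequence. The sketch below illustrates the idea; the function name `find_motifs` and the toy peptide are hypothetical, and the positions it prints are not the positions of these motifs in SNED1.

```python
import re

# Integrin-recognition motifs named in the text.
MOTIFS = ("RGD", "LDV")


def find_motifs(sequence: str, motifs=MOTIFS) -> dict:
    """Map each motif to the 1-based start positions where it occurs."""
    seq = sequence.upper()
    hits = {}
    for motif in motifs:
        # A zero-width lookahead reports overlapping occurrences as well.
        pattern = re.compile(f"(?={re.escape(motif)})")
        hits[motif] = [m.start() + 1 for m in pattern.finditer(seq)]
    return hits


# Toy peptide (hypothetical, NOT the SNED1 sequence).
toy_peptide = "MKRGDSTLDVAARGD"
print(find_motifs(toy_peptide))  # {'RGD': [3, 13], 'LDV': [8]}
```

A sequence match alone does not show that a motif is functional; accessibility and structural context also matter for integrin binding.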

Post-translational modifications

Thirteen N-glycosylation sites are predicted in the sequence of SNED1, and N-linked glycosylation has been confirmed experimentally.[5] SNED1 also has several predicted attachment sites for O-linked glycans and glycosaminoglycans, but these have not yet been validated experimentally.
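
N-linked glycans are typically attached at the sequon Asn-X-Ser/Thr, where X is any residue except proline, so candidate sites can be listed with a simple pattern scan. The sketch below shows that scan; it is not the predictor used in the cited work (the text does not name one), and the function name `n_glyco_sites` and the toy peptide are hypothetical.

```python
import re

# Canonical sequon: Asn, then any residue except Pro, then Ser or Thr.
# The lookahead keeps the match zero-width so overlapping sequons are found.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glyco_sites(sequence: str) -> list:
    """Return 1-based positions of asparagines that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]


# Toy peptide (hypothetical, NOT the SNED1 sequence).
# N2-G-T and N10-N-S match; N6-P-S is excluded because X is proline.
toy_peptide = "MNGTANPSQNNSA"
print(n_glyco_sites(toy_peptide))  # [2, 10]
```

A sequon is necessary but not sufficient for glycosylation, which is why experimental confirmation is reported separately from the predictions.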

Only a few kinase-dependent phosphorylation sites scored above 0.8 with the NetPhosK program in the ExPASy bioinformatics suite of proteomics tools. These sites are highlighted in yellow in the conceptual translation above. All of them are predicted to be phosphorylated by either protein kinase A (PKA) or protein kinase C (PKC). Experimental evidence exists for phosphorylation at 12 residues: 5 serine, 5 threonine, and 2 tyrosine residues.[5]

Secondary structure

The amino acid sequence of the longest variant is very cysteine rich, presumably resulting in extensive disulfide bond formation. In the conceptual translation, beta sheets are annotated as purple text and alpha helices as red text.

The percentage of intrinsic disorder of processed human SNED1 (residues 25–1413) predicted by IUPred2A is 15.3%.[5] A large proportion of random coil (73%) was predicted in SNED1, together with 26% β-strands and 1% helix, the latter corresponding to a sequence found in the amino-terminal region of SNED1.[5]

Tertiary and quaternary structure

[This section needs referencing to figures and experimental demonstration] The program Phyre2 was used to construct structure predictions for the conserved domain regions NIDO, CCP, and FN3, as well as for each of the splice variants. Some of the results are consistent with the proposed function of an extracellular "sticky" protein possibly involved in cell-cell adhesion or in clotting. Protein matches found in Phyre2 comprise an array of proteins with functions including clotting, hydrolysis, plasminogen activation, hormone or growth factor activity, protein binding, and cell adhesion, as well as ECM proteins.

Splice variants a, b, and e have >99% structural similarity to the protein neurexin 1-alpha (NRXN1). Neurexins are cell adhesion molecules that often contain EGF binding domains, enhancing the formation of intercellular junctions. NRXN1 is also proposed to play a role in angiogenesis. Alpha-neurexins interact with neurexophilins and possibly function in the synaptic junctions of the vertebrate nervous system. They often use alternate promoters and splice sites, resulting in many different transcripts from one gene, which may explain this gene's abundance of alternative transcripts. Splice variant d has a 100% structural match to low-density lipoprotein receptor-related protein 4 (LRP4). This protein is involved in SOST-mediated inhibition of bone formation and in the inhibition of Wnt signaling, and it plays an important role in the formation of neuromuscular junctions. Splice variants f and g have >99% similarity to fibrillin-1, an ECM protein that is a structural component of calcium-binding microfibrils. Splice variant i and the conserved domain CCP are >99% structurally similar to tissue plasminogen activator (PLAT). PLAT is secreted by vascular endothelial cells and acts as a serine protease that converts plasminogen to plasmin. Plasmin is a fibrinolytic enzyme that breaks down blood clots, and recombinant PLAT is used clinically for that purpose.

The conserved domain NIDO was >99% similar to coagulation factor IX, also known as Factor IX (F9). F9 is a secreted coagulation factor involved in the clotting cascade that requires activation by multiple other coagulation factors within the cascade. The 3 consecutive conserved FN3 domains together are 100% similar, with 100% coverage, to anosmin-1. Anosmin-1 is an ECM glycoprotein responsible for normal neural development of the brain, spinal cord, and kidney.

Interacting proteins

Computational prediction using several databases, focusing on secreted proteins and membrane proteins, yielded 114 unique interactions predicted by at least one algorithm, including SNED1 auto-interaction.[5] More than half of the protein partners of SNED1 were annotated as membrane proteins in UniProtKB. Forty-seven extracellular proteins were identified as SNED1 binding partners, including 30 core matrisome proteins,[8] 10 matrisome-associated proteins, and 7 secreted proteins. Among the 30 core matrisome proteins are 6 collagens: COL6A3, found in basement membranes and other ECMs; COL7A1; and four Fibril-Associated Collagens with Interrupted Triple helices (FACITs), all containing a thrombospondin domain (COL12A1, COL14A1, COL16A1, and COL20A1). They also include a number of ECM glycoproteins: 4 tenascins (TNC, TNN, TNR, and TNXB), fibronectin (FN1), the latent TGFβ-binding protein 2 (LTBP2), and the basement membrane glycoproteins nidogens 1 and 2.[5]
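
The partner counts quoted above can be cross-checked with simple arithmetic. The sketch below uses only the numbers stated in the text; the variable names are illustrative, and treating every non-extracellular partner as a membrane protein is an assumption made here for the comparison, not a claim from the source.

```python
# Counts quoted in the text for predicted SNED1 partners.
total_partners = 114
extracellular = {
    "core matrisome": 30,
    "matrisome-associated": 10,
    "other secreted": 7,
}

extracellular_total = sum(extracellular.values())
remaining = total_partners - extracellular_total

print(extracellular_total)                  # 47
print(remaining)                            # 67
print(remaining / total_partners > 0.5)     # True

# Collagens listed among the core matrisome partners.
collagens = ["COL6A3", "COL7A1", "COL12A1", "COL14A1", "COL16A1", "COL20A1"]
print(len(collagens))                       # 6
```

The extracellular categories sum to the stated 47, and the 67 remaining partners are compatible with the statement that more than half of the partners are membrane proteins.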

Independently, the STRING database of known and predicted protein interactions was used to identify candidate interacting proteins: somatostatin (SST), somatostatin receptor 2 (SSTR2) and a variety of other somatostatin receptors,[9] spermine synthase (SMS), and TMEM132C. All of the somatostatin-related proteins are involved in the inhibition of hormones. Very little is known about TMEM132C, and all publications related to the protein are large-scale genomic screens. The protein expression profile of TMEM132C is very similar to that of SNED1, with protein abundance found in blood plasma, platelets, and liver. All of the interacting proteins described are expressed in these three locations.

Expression

SNED1 is ubiquitously expressed at low to intermediate levels in adult tissues, so RNA expression profiles do not make clear which cells secrete SNED1 in tissues. Experimental data obtained in mice have shown that the Sned1 promoter is broadly active during embryogenesis, particularly in the limb buds, tail, sclerotome, vertebrae and ribs, lung, kidney, adrenal gland, cerebellum, choroid plexus, and head mesenchyme.[1][3] The protein expression profiles of SNED1 from MOPED (Multi-Omics Profiling Expression Database) and PaxDb (Protein Abundance Across Organisms database) indicate that the protein is found in blood serum, blood plasma, blood T-lymphocytes, platelets, HEK-293 kidney cells, and liver, and at low levels in the brain.

Transcript variants

The program AceView was used to predict transcript variants, shown in Figure 6. There are 9 spliced forms and 3 unspliced forms. Three of the transcript variants (b, c, and e) contain green regions that represent upstream open reading frames (uORFs), which suggests that these transcripts carry regulatory elements upstream of the main coding sequence. All of the spliced transcript variants (a to i) were analyzed with the Phyre2 server to predict protein structure; see "Tertiary and quaternary structure". The existence of the splice variants has not yet been validated experimentally.

Promoter

The promoter was predicted and analyzed for transcription factor binding sites using the ElDorado software in the Genomatix software suite. There were alternative promoters downstream of the selected 845 bp promoter.

Transcription factors

The following transcription factor binding sites were found in the ElDorado-predicted promoter with a matrix similarity of 1.00 and a match across the entire binding domain.

| Matrix family | Detailed family information | Matrix | Detailed matrix information | Strand | Matrix similarity | Sequence |
| --- | --- | --- | --- | --- | --- | --- |
| BRAC | Brachyury gene, mesoderm developmental factor | TBX20.01 | T-box transcription factor TBX20 | (-) | 1.00 | gcatcgcggAGGTgtgcgggcgg |
| TF2B | RNA polymerase II transcription factor II B | BRE.01 | Transcription factor II B (TFIIB) recognition element | (-/+) | 1.00 | ccgCGCC |
| XCPE | Activator-, mediator-, and TBP-dependent core promoter element for RNA polymerase II transcription from TATA-less promoter | XCPE1.01 | X gene core promoter element 1 | (-) | 1.00 | ggGCGGgaccg |
| ZF02 | C2H2 zinc finger transcription factors 2 | ZKSCAN3.01 | Zinc finger with KRAB and SCAN domains 3 | (+) | 1.00 | catggCCCCaccacagggcgcgc |
| SP1F | GC-Box factors SP1/GC | SP1.03 | Stimulating protein 1, ubiquitous zinc finger transcription factor | (-) | 1.00 | cggggGGGCggggccat |
| PLAG | Pleomorphic adenoma gene | PLAG1.02 | Pleomorphic adenoma gene 1 | (+) | 1.00 | aaGGGGgcagcacggaacgggtt |
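
In the binding-site sequences listed above, a short run of bases is written in capital letters. Assuming this follows the usual Genomatix MatInspector convention, in which capitals mark the core sequence of the matrix match, the core of each site can be pulled out as sketched below. The helper name `core_sequence` is hypothetical, and the interpretation of the capitals is an assumption rather than something stated in the text.

```python
# Binding-site sequences copied from the table of predicted sites.
SITES = {
    "TBX20.01": "gcatcgcggAGGTgtgcgggcgg",
    "BRE.01": "ccgCGCC",
    "XCPE1.01": "ggGCGGgaccg",
    "ZKSCAN3.01": "catggCCCCaccacagggcgcgc",
    "SP1.03": "cggggGGGCggggccat",
    "PLAG1.02": "aaGGGGgcagcacggaacgggtt",
}


def core_sequence(site: str) -> str:
    """Return the capitalized bases of a site (assumed to be the matrix core)."""
    return "".join(base for base in site if base.isupper())


for matrix, site in SITES.items():
    print(matrix, core_sequence(site), len(site))
```

Every core recovered this way is four bases long and consists mostly or entirely of G and C, in line with the GC-rich, TATA-less character implied by the SP1, BRE, and XCPE1 matches.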

Protein functions and Clinical significance

Selected datasets in NCBI's GEO Profiles highlight clinically relevant changes in SNED1 expression in response to certain conditions. In aldosterone-producing adenoma versus control lung tissue, SNED1 expression decreased about 25-fold in the adenoma tissue. In a developmental study of the transition from oligodendrocyte precursors to mature oligodendrocytes, expression decreased almost 100-fold upon differentiation into mature oligodendrocytes. It may be of interest to explore SNED1 expression in clotting disorders or other blood-related diseases. A study published in 2014 demonstrated that SNED1 is a promoter of breast cancer metastasis.[10][11]

The recent generation of a Sned1 knockout mouse model is also shedding light on the multiple roles of SNED1 in development and physiology.[3] The global Sned1 knockout leads to early post-natal lethality and severe craniofacial and skeletal anomalies, indicating that Sned1 is an essential gene.[3]

References

  1. Leimeister C, Schumacher N, Diez H, Gessler M (June 2004). "Cloning and expression analysis of the mouse stroma marker Snep encoding a novel nidogen domain protein". Developmental Dynamics. 230 (2): 371–7. doi:10.1002/dvdy.20056. PMID 15162516.
  2. "GeneCards". Weizmann Institute of Science. Retrieved 2013-05-13.
  3. Barqué A, Jan K, De La Fuente E, Nicholas CL, Hynes RO, Naba A (February 2021). "Knockout of the gene encoding the extracellular matrix protein SNED1 results in early neonatal lethality and craniofacial malformations". Developmental Dynamics. 250 (2): 274–294. doi:10.1002/dvdy.258. hdl:1721.1/131167. ISSN 1058-8388. PMC 8721894. PMID 33012048.
  4. Thompson JD, Higgins DG, Gibson TJ (November 1994). "CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice". Nucleic Acids Research. 22 (22): 4673–80. doi:10.1093/nar/22.22.4673. PMC 308517. PMID 7984417.
  5. Vallet SD, Davis MN, Barqué A, Thahab AH, Ricard-Blum S, Naba A (April 2021). "Computational and experimental characterization of the novel ECM glycoprotein SNED1 and prediction of its interactome". The Biochemical Journal. 478 (7): 1413–1434. doi:10.1042/BCJ20200675. PMID 33724335.
  6. Zhu Y, Zhang JJ, Peng YP, Liu X, Xie KL, Tang J, et al. (February 2017). "NIDO, AMOP and vWD domains of MUC4 play synergic role in MUC4 mediated signaling". Oncotarget. 8 (6): 10385–10399. doi:10.18632/oncotarget.14420. PMC 5354666. PMID 28060749.
  7. Senapati S, Gnanapragassam VS, Moniaux N, Momi N, Batra SK (July 2012). "Role of MUC4-NIDO domain in the MUC4-mediated metastasis of pancreatic cancer cells". Oncogene. 31 (28): 3346–56. doi:10.1038/onc.2011.505. PMC 3298579. PMID 22105367.
  8. Naba A, Clauser KR, Hoersch S, Liu H, Carr SA, Hynes RO (April 2012). "The matrisome: in silico definition and in vivo characterization by proteomics of normal and tumor extracellular matrices". Molecular & Cellular Proteomics. 11 (4): M111.014647. doi:10.1074/mcp.M111.014647. PMC 3322572. PMID 22159717.
  9. Hannon JP, Nunn C, Stolz B, Bruns C, Weckbecker G, Lewis I, et al. (Feb–Apr 2002). "Drug design at peptide receptors: somatostatin receptor ligands". Journal of Molecular Neuroscience. 18 (1–2): 15–27. doi:10.1385/JMN:18:1-2:15. PMID 11931345. S2CID 22652825.
  10. Naba A, Clauser KR, Lamar JM, Carr SA, Hynes RO (March 2014). "Extracellular matrix signatures of human mammary carcinoma identify novel metastasis promoters". eLife. 3: e01308. doi:10.7554/eLife.01308. PMC 3944437. PMID 24618895.
  11. Socovich AM, Naba A (May 2019). "The cancer matrisome: From comprehensive characterization to biomarker discovery". Seminars in Cell & Developmental Biology. 89: 157–166. doi:10.1016/j.semcdb.2018.06.005. PMID 29964200. S2CID 49646954.