

MSGD: a manually curated database of genomic, transcriptomic, proteomic and drug information for multiple sclerosis


Abstract

Multiple sclerosis (MS) is the most common inflammatory demyelinating disease of the central nervous system. ‘Omics’ technologies (genomics, transcriptomics, proteomics) and associated drug information have begun to reshape our understanding of MS. However, these data are scattered across numerous publications, which makes them difficult to use fully. We manually mined and compiled these data into the Multiple Sclerosis Gene Database (MSGD), which we intend to keep updating. We screened 5485 publications to construct the current version of MSGD, which comprises 6255 entries: 3274 variant entries, 1175 RNA entries, 418 protein entries, 313 knockout entries, 612 drug entries and 463 high-throughput entries. Each entry contains detailed information, such as species, disease type, gene descriptions (such as official gene symbols) and the original reference. MSGD is freely accessible through a user-friendly web interface, where users can search for genes of interest, view their expression patterns and detailed information, manage gene sets and submit new MS–gene associations. MSGD is designed as an exploratory platform that minimizes barriers to filtering and interpreting the data while presenting them in a highly accessible form. We expect it to help researchers decipher gene mechanisms and improve the prevention, diagnosis and treatment of MS.

Database URL: http://bio-bigdata.hrbmu.edu.cn/MSGD

Introduction

Multiple sclerosis (MS) is the most common autoimmune disorder of the central nervous system (CNS) and is characterized by neuroinflammation and neurodegeneration. According to the third edition of the Atlas of MS, published in 2020, someone in the world is diagnosed with MS every 5 minutes, at an average age at diagnosis of 32 years. Approximately 2.8 million people currently live with MS worldwide, and this number continues to rise (1). MS is a complex disease: in addition to genetic variation, lifestyle and environmental factors contribute substantially to disease risk. Because its pathophysiology is highly intricate, uncovering its precise molecular mechanisms through genomics, transcriptomics, proteomics and related fields remains a central challenge.

MS is influenced by both genetic and environmental factors (2–4). First-degree relatives and monozygotic twins of affected individuals have a markedly higher lifetime risk of MS, approximately 7 times and more than 100 times that of the general population, respectively, indicating strong genetic susceptibility to the disease (5–7). Early studies linked MS susceptibility to the major histocompatibility complex (MHC) locus, paving the way for the identification of further genetic factors (8). Many loci outside the MHC region have since been associated with MS risk, including variants in genes such as interleukin 7 receptor (IL7R) (9–12), vitamin D receptor (VDR) (13–16), tumor necrosis factor (TNF) (17–20) and interleukin 2 receptor subunit alpha (IL2RA) (21–24).

The number of newly discovered genetic variants has grown rapidly in recent years, but this information remains highly fragmented across the literature, which makes comprehensive analysis difficult. In particular, retrieving relevant findings efficiently from such a large volume of text has become very hard. A comprehensive database that consolidates reliable genetic and clinical information offers an effective way to meet this need.

To bridge this gap, we developed the Multiple Sclerosis Gene Database (MSGD), a manually curated database of experimentally supported gene–MS associations. MSGD covers MS-related gene variations, transcripts, proteins and drug-related information, and is intended as a resource for researchers exploring the relationship between genes and MS.

Methods

Data collection and management

To ensure data quality, we followed the curation procedures used for databases our team built previously, such as Lnc2Cancer 3.0 (25), LncACTdb 3.0 (26) and NSDNA (27). The key curation steps were as follows: (i) PubMed was searched with the keywords ‘multiple sclerosis’, ‘experimental autoimmune encephalomyelitis’ and ‘gene’, with a cutoff date of 20 November 2023, retrieving 5485 published studies. (ii) Relevant information was extracted from each study and summarized; every study was reviewed and analyzed by at least two researchers. (iii) Entries were categorized by research content into gene variations, mRNA, proteins, gene knockouts/knock-ins, drugs and high-throughput data. (iv) Gene names were standardized: mouse genes according to Mouse Genome Informatics (http://www.informatics.jax.org/mgihome/nomen/), human genes according to the HUGO Gene Nomenclature Committee (https://www.genenames.org), rat genes according to the Rat Genome Database (https://rgd.mcw.edu/nomen/nomen.shtml), and genes from other species according to the corresponding nomenclature authorities. (v) Variant descriptions were standardized against records retrieved from the dbSNP database. This curation procedure is intended to ensure the reliability and integrity of MSGD as a resource for researchers exploring the gene–MS relationship.
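Steps (i), (iv) and (v) above are mechanical enough to sketch. The Python helpers below are a minimal illustration, not the MSGD pipeline. The first builds a query URL for NCBI's public E-utilities ESearch endpoint; the Boolean combination of the three keywords is an assumption, since the exact search string is not given. The other two normalize reported gene names and rsIDs. The function names are hypothetical, and the alias tables are hand-picked excerpts; a real pipeline would load the full HGNC, MGI and RGD nomenclature files.

```python
import re
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(cutoff="2023/11/20", retmax=10000):
    """Build a PubMed ESearch URL limited by publication date.

    ASSUMPTION: the paper names the keywords but not the Boolean logic;
    here the two disease terms are OR-ed and the result is AND-ed with "gene".
    """
    term = ('("multiple sclerosis" OR "experimental autoimmune encephalomyelitis") '
            "AND gene")
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",       # filter on publication date
        "mindate": "1800/01/01",  # ESearch needs mindate and maxdate together
        "maxdate": cutoff,
        "retmax": retmax,
    }
    return ESEARCH + "?" + urlencode(params)


# HYPOTHETICAL alias tables with a few illustrative rows only.
ALIASES = {
    "human": {"IL7RA": "IL7R", "CD25": "IL2RA", "TNFA": "TNF"},
    "mouse": {"IL7RA": "Il7r", "CD25": "Il2ra", "TNFA": "Tnf"},
}


def normalize_symbol(raw, species):
    """Map a reported gene name to an official symbol for the species."""
    key = raw.strip().upper()
    table = ALIASES.get(species, {})
    if key in table:
        return table[key]
    # Fallback case convention: human symbols are all upper case, while
    # mouse and rat symbols capitalize only the first letter.
    return key if species == "human" else key.capitalize()


# Accepts rsIDs written as "rs2104286", "RS 2104286" or "rs-2104286".
RSID = re.compile(r"^\s*rs\s*[-_]?\s*(\d+)\s*$", re.IGNORECASE)


def normalize_rsid(raw):
    """Return the canonical dbSNP-style rsID, or None if raw is not an rsID."""
    m = RSID.match(raw)
    return f"rs{m.group(1)}" if m else None
```

For example, `normalize_symbol("CD25", "human")` returns `"IL2RA"` and `normalize_rsid("RS 2104286")` returns `"rs2104286"`. Re-running the query today would not necessarily return exactly 5485 records, because both the search string and PubMed's index may differ.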

We screened 5485 publications to compile the current version of MSGD. The database contains 6255 entries: 3274 variant entries, 1175 RNA entries, 418 protein entries, 313 knockout entries, 612 drug entries and 463 high-throughput entries. Each entry records the species, disease type, detailed gene descriptions (such as official gene symbols) and the original reference (see Figure 1). By species, 4547 entries come from human studies, 1484 from mice, 174 from rats and 50 from other species.
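The category and species breakdowns each partition the same 6255 entries (3274 + 1175 + 418 + 313 + 612 + 463 = 6255, and 4547 + 1484 + 174 + 50 = 6255). The short Python check below uses only the counts stated above; the helper names are illustrative, and it also converts the counts into percentages.

```python
# Entry counts reported for the current MSGD release.
BY_CATEGORY = {
    "variant": 3274, "RNA": 1175, "protein": 418,
    "knockout": 313, "drug": 612, "high-throughput": 463,
}
BY_SPECIES = {"human": 4547, "mouse": 1484, "rat": 174, "other": 50}
TOTAL = 6255


def check_partition(counts, total):
    """True if the per-group counts add up exactly to the reported total."""
    return sum(counts.values()) == total


def shares(counts):
    """Percentage of all entries falling in each group, to one decimal place."""
    s = sum(counts.values())
    return {k: round(100 * v / s, 1) for k, v in counts.items()}


assert check_partition(BY_CATEGORY, TOTAL)
assert check_partition(BY_SPECIES, TOTAL)
print(shares(BY_SPECIES))
```

This prints `{'human': 72.7, 'mouse': 23.7, 'rat': 2.8, 'other': 0.8}`: roughly three-quarters of the entries come from human studies, and variant entries account for about 52.3% of the total.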


Figure 1.

A schematic workflow of MSGD.


Database construction

All data in MSGD are stored and managed in MySQL (https://www.mysql.com/), a freely available relational database management system. The web interface is built with JavaServer Pages (https://www.java.com/), the data processing scripts are written in Java, and the web service is hosted on an Apache Tomcat web server. MSGD is freely accessible at http://bio-bigdata.hrbmu.edu.cn/MSGD.
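To make the storage layer concrete, the sketch below shows one possible table layout for curated entries. It is a hypothetical schema whose columns only mirror the fields the paper says each entry records, not MSGD's actual MySQL schema. It uses Python's built-in `sqlite3` module as a stand-in so that it runs without a database server.

```python
import sqlite3

# HYPOTHETICAL schema: column names are illustrative. MSGD itself runs on
# MySQL; SQLite is used here so the sketch runs without a server.
SCHEMA = """
CREATE TABLE entry (
    id           INTEGER PRIMARY KEY,
    category     TEXT NOT NULL,  -- variant, RNA, protein, knockout, drug, high-throughput
    species      TEXT NOT NULL,
    disease_type TEXT,           -- e.g. an MS subtype or an animal model
    gene_symbol  TEXT NOT NULL,  -- official symbol after nomenclature standardization
    entrez_id    INTEGER,
    pmid         INTEGER         -- PubMed ID of the original reference
);
CREATE INDEX idx_entry_symbol ON entry (gene_symbol);
"""


def open_db():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def entries_for_gene(conn, symbol):
    """Look up all entries for one gene symbol with a parameterized query."""
    cur = conn.execute(
        "SELECT category, species, pmid FROM entry "
        "WHERE gene_symbol = ? ORDER BY category",
        (symbol,),
    )
    return cur.fetchall()


# Toy rows for illustration only (not MSGD records).
conn = open_db()
conn.executemany(
    "INSERT INTO entry (category, species, disease_type, gene_symbol) "
    "VALUES (?, ?, ?, ?)",
    [("variant", "human", "MS", "IL7R"),
     ("knockout", "mouse", "EAE", "Il7r"),
     ("RNA", "human", "MS", "IL7R")],
)
print(entries_for_gene(conn, "IL7R"))
```

The lookup prints `[('RNA', 'human', None), ('variant', 'human', None)]`. The mouse row is not returned because mouse symbols are stored in their own case convention (`Il7r`). In a Java/JSP stack like the one described above, a JDBC `PreparedStatement` would fill the same `?` placeholder role.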

Results

The web interface for MSGD

The MSGD web interface is designed to make database queries straightforward (see Figure 2). On the ‘Search’ page, users can search by gene variation or gene symbol, and both fuzzy and advanced searches are supported. On the advanced search page for drug targets, users can filter by specific MS subtypes or MS animal models. All matching results are displayed in a table, and clicking the ‘Detailed Information’ hyperlink in a row opens the full details for that entry. Clicking a column name sorts the results in ascending or descending order, and entering any keyword in the ‘Search’ box applies a secondary filter to the results. On the ‘details’ page, genes are linked to authoritative annotation databases, and high-throughput data are linked to their source databases where available.
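The paper does not describe how fuzzy search, column sorting and keyword refinement are implemented. The Python sketch below illustrates one possible approach under that caveat. `fuzzy_gene_search` returns prefix matches first and then similar spellings, using the standard-library `difflib.get_close_matches` similarity ratio as a swapped-in technique. `refine` applies a secondary keyword filter followed by a column sort. Both function names are hypothetical.

```python
from difflib import get_close_matches


def fuzzy_gene_search(query, symbols, n=5, cutoff=0.6):
    """Prefix matches first, then similar spellings (difflib ratio >= cutoff)."""
    q = query.strip().upper()
    by_upper = {s.upper(): s for s in symbols}
    hits = [orig for key, orig in by_upper.items() if key.startswith(q)]
    for key in get_close_matches(q, list(by_upper), n=n, cutoff=cutoff):
        if by_upper[key] not in hits:
            hits.append(by_upper[key])
    return hits[:n]


def refine(rows, keyword, sort_key, descending=False):
    """Secondary filter: keep rows with keyword in any field, then sort by a column."""
    kw = keyword.lower()
    kept = [r for r in rows if any(kw in str(v).lower() for v in r.values())]
    return sorted(kept, key=lambda r: r[sort_key], reverse=descending)


symbols = ["IL7R", "IL2RA", "IL2RB", "VDR", "TNF", "HLA-DRB1", "HLA-DQB1"]
print(fuzzy_gene_search("il2r", symbols))
print(fuzzy_gene_search("HLA-DRB7", symbols))
```

The first call returns `['IL2RA', 'IL2RB', 'IL7R']`: the two prefix hits come first, followed by the similar symbol IL7R. The second call tolerates the typo and returns `['HLA-DRB1', 'HLA-DQB1']`.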


Figure 2.

The web interface and usage for MSGD.


Users can also browse information on gene variations, transcriptomics, proteomics, drugs and more on the ‘Browse’ page. The ‘Hot Points’ section showcases genes of particular interest in recent MS publications, also taking into account how often they are searched for within MSGD. All collected data can be downloaded free of charge from the ‘Download’ section. Users can also submit new MS–gene associations via the ‘Submit’ page. After review by our submission review committee, accepted submissions are included in the next public release. A comprehensive tutorial is available on the ‘Help’ page.
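The paper does not say how recent publication interest and search popularity are combined into ‘Hot Points’. Purely as a hypothetical illustration, the Python sketch below min-max scales each signal across genes and mixes the two with an assumed weight. The function name, its inputs and the 0.5 default weight are all inventions for this example.

```python
def hot_points(recent_pubs, searches, weight=0.5, top=3):
    """HYPOTHETICAL ranking: weighted mix of two min-max-scaled signals.

    recent_pubs: gene -> number of recent MS publications mentioning it
    searches:    gene -> number of times the gene was searched for
    weight:      share given to the publication signal (an assumption)
    """
    genes = set(recent_pubs) | set(searches)
    if not genes:
        return []

    def scaled(counts):
        values = [counts.get(g, 0) for g in genes]
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
        return {g: (counts.get(g, 0) - lo) / span for g in genes}

    p, s = scaled(recent_pubs), scaled(searches)
    score = {g: weight * p[g] + (1 - weight) * s[g] for g in genes}
    # Sort by descending score; ties broken alphabetically for stable output.
    return sorted(genes, key=lambda g: (-score[g], g))[:top]


# Toy counts with placeholder gene names, not MSGD statistics.
pubs = {"GENE1": 10, "GENE2": 6, "GENE3": 0}
hits = {"GENE1": 20, "GENE2": 100, "GENE3": 60}
print(hot_points(pubs, hits))
print(hot_points(pubs, hits, weight=1.0))
```

With these toy counts and the default weight, GENE2 (moderately published, searched often) outranks GENE1 (heavily published, rarely searched). Setting `weight=1.0` ranks the genes by publications alone, which puts GENE1 first.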

Data statistics in MSGD

We analyzed the annual number of PubMed publications on MS-associated genes (Figure 3a) and observed a clear upward trend, particularly over the past decade, reflecting a substantial accumulation of research. This suggests growing efforts by researchers and neurologists to decipher the molecular mechanisms underlying MS, and indicates that genetic research has been one of the prominent areas of MS research over the past decade.


Figure 3.

Data statistics, data integration and functional analysis in MSGD. (a) Annual publication counts. (b) Distribution of genetic variants on chromosomes. (c) Word cloud of MS risk genes (the size of the word represents the amount of evidence). (d) GO and KEGG functional enrichment analysis of MS risk genes. (e) MS drug target network. Nodes represent genes or drugs, while edges represent experimentally supported associations between genes and drugs.


Data integration and functional analysis

The analysis section of MSGD shows that evidence for MS-related risk genes is distributed widely across the chromosomes. The highest concentration of evidence is on chromosome 6 (Figure 3b), which harbors the MHC region containing HLA-DRB1, HLA-DQB1 and TNF. Users can click any chromosome to display the corresponding results. We counted the total amount of evidence for each gene, and the top 20 genes are displayed in ‘Data Statistics’. We also counted the positive evidence for each gene and generated a word cloud in which the size of each word represents the amount of positive evidence for that gene (Figure 3c). Ranked by positive evidence, the top five MS risk genes are HLA-DRB1 (10.76%), IL7R (3.13%), TNF (2.63%), VDR (2.57%) and HLA-DQB1 (2.5%).
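The chromosome distribution and the gene percentages above are counts over curated evidence records. The Python sketch below assumes each record is a hypothetical `(gene, chromosome, is_positive)` tuple. It also assumes that a gene's percentage is its share of all positive evidence; the paper does not spell out the denominator, so treat that as an assumption. The records are toy data, not MSGD data, although the chromosome assignments are real: HLA-DRB1 and TNF lie on chromosome 6, IL7R on 5, VDR on 12 and IL2RA on 10.

```python
from collections import Counter


def evidence_summary(records, top=5):
    """Summarize (gene, chromosome, is_positive) evidence records.

    Returns the number of evidence records per chromosome and the top genes
    by their percentage share of all positive evidence. ASSUMPTION: shares
    are taken over positive evidence only.
    """
    records = list(records)
    per_chrom = Counter(chrom for _, chrom, _ in records)
    positive = Counter(gene for gene, _, ok in records if ok)
    total_pos = sum(positive.values())
    top_genes = [(gene, round(100 * n / total_pos, 2))
                 for gene, n in positive.most_common(top)]
    return per_chrom, top_genes


# Toy records (not MSGD data); chromosome assignments are the genes' real loci.
records = ([("HLA-DRB1", "6", True)] * 4 + [("IL7R", "5", True)] * 2 +
           [("VDR", "12", True), ("TNF", "6", True), ("TNF", "6", False),
            ("IL2RA", "10", False)])
per_chrom, top_genes = evidence_summary(records)
print(per_chrom.most_common(1))
print(top_genes[:2])
```

On the toy data, chromosome 6 carries the most evidence (6 of 10 records). HLA-DRB1 accounts for 50% of the positive evidence and IL7R for 25%.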

MSGD also provides functional enrichment analysis of all genes with positive association results. Gene Ontology (GO) analysis revealed significant enrichment of genes associated with the cell membrane. Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis showed that these genes are mainly involved in antigen presentation and immune cell activation pathways (Figure 3d). To visualize these complex relationships, we constructed a bipartite network of MS genes and drugs using Cytoscape (version 3.7.1). In this network, nodes represent genes or drugs, and edges represent experimentally supported gene–drug associations (Figure 3e).
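The paper does not state which enrichment tool was used. Many over-representation tools score a GO or KEGG term with a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. Cytoscape can also import networks from simple interaction format (SIF) files. The Python sketch below illustrates both ideas under those assumptions. `term_pvalue` computes the probability of observing at least the seen overlap between a gene list and a term. `to_sif` and `degrees` serialize and summarize a bipartite drug–gene edge list. All function names are hypothetical, and the drug–target pairs are well-known illustrative examples rather than MSGD records. A real analysis would also correct p-values for multiple testing across terms.

```python
from collections import Counter
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes, K annotated to the term, n drawn)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def term_pvalue(genes, term_genes, background_size):
    """One-sided over-representation p-value of one GO/KEGG term in a gene set."""
    overlap = len(genes & term_genes)
    return hypergeom_sf(overlap, background_size, len(term_genes), len(genes))


def to_sif(edges, relation="targets"):
    """Drug-gene pairs as Cytoscape SIF lines: source, relation, target (tab-separated)."""
    return "\n".join(f"{d}\t{relation}\t{g}" for d, g in sorted(set(edges)))


def degrees(edges):
    """Node degree in the bipartite drug-gene network (duplicate edges ignored)."""
    deg = Counter()
    for d, g in set(edges):
        deg[d] += 1
        deg[g] += 1
    return deg


# Illustrative drug-target pairs, not exported from MSGD.
edges = [("natalizumab", "ITGA4"), ("fingolimod", "S1PR1"),
         ("siponimod", "S1PR1"), ("siponimod", "S1PR5"),
         ("ocrelizumab", "MS4A1")]
print(to_sif(edges))
print(degrees(edges).most_common(2))
```

For example, with a background of 10 genes, a term annotating 4 of them and a 3-gene list lying entirely inside that term, the p-value is C(4,3)/C(10,3) = 4/120 = 1/30. In the toy network, S1PR1 and siponimod each have degree 2.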

Discussion

With rapid advances in MS genetics, a substantial volume of genetic and MS-related data has accumulated (8). However, data on gene–MS associations are scattered across many published articles, so a high-quality database of comprehensive MS-related gene data is important for a thorough understanding of the disease process. Few existing databases provide comprehensive resources on gene–MS associations across different species. We therefore developed MSGD, a specialized MS database covering gene variations, transcripts, proteins, drugs, high-throughput data and more across multiple species, as a resource for researchers exploring gene–MS relationships comprehensively.

Beyond collecting a broad range of gene–MS associations, MSGD offers several advantages. First, it provides detailed gene information, including official gene symbols, Entrez Gene IDs, official full names, gene types, map positions and dbXrefs, together with information on the source articles. Second, it offers cross-species data and a user-friendly web interface through which all available data can be retrieved and downloaded. Third, it incorporates data on gene-related variations, targeted drugs and gene knockouts. MSGD is therefore a specialized, comprehensive resource for gene–MS associations.

We plan to update the database every 1–2 years, depending on the volume of newly published data. We are currently collecting data for the next version of MSGD, which will include newly validated gene–MS associations, interface improvements based on user feedback, enhanced integration and visualization of high-throughput datasets, and gene targets of approved drugs and drugs in clinical trials together with RNA expression data. These updates are intended to keep MSGD an up-to-date resource for researchers studying gene–MS associations in multiple sclerosis.

Conclusion

In summary, MSGD is a comprehensive, specialized database of experimentally supported gene–MS associations that also offers a broader perspective on gene function in MS. We plan to update the database regularly, integrate additional sources and information, and provide tools for predicting gene–MS associations. We expect MSGD to help researchers decipher gene mechanisms and improve the diagnosis and treatment of multiple sclerosis.

Data availability

All data used in the analysis can be obtained at http://bio-bigdata.hrbmu.edu.cn/MSGD.

Funding

National Natural Science Foundation of China (81820108014, 82171396).

Conflict of interest statement

None declared.

References

1. Walton C., King R., Rechtman L. et al. (2020) Rising prevalence of multiple sclerosis worldwide: insights from the Atlas of MS, third edition. Mult. Scler., 26, 1816–1821.

2. Olsson T., Barcellos L.F. and Alfredsson L. (2017) Interactions between genetic, lifestyle and environmental risk factors for multiple sclerosis. Nat. Rev. Neurol., 13, 25–36.

3. Thompson A.J., Baranzini S.E., Geurts J. et al. (2018) Multiple sclerosis. Lancet, 391, 1622–1636.

4. Reich D.S., Lucchinetti C.F. and Calabresi P.A. (2018) Multiple sclerosis. N. Engl. J. Med., 378, 169–180.

5. Charabati M., Wheeler M.A., Weiner H.L. et al. (2023) Multiple sclerosis: neuroimmune crosstalk and therapeutic targeting. Cell, 186, 1309–1327.

6. Robertson N.P., Fraser M., Deans J. et al. (1996) Age-adjusted recurrence risks for relatives of patients with multiple sclerosis. Brain, 119, 449–455.

7. Westerlind H., Ramanujam R., Uvehag D. et al. (2014) Modest familial risks for multiple sclerosis: a registry-based study of the population of Sweden. Brain, 137, 770–778.

8. Baranzini S.E. and Oksenberg J.R. (2017) The genetics of multiple sclerosis: from 0 to 200 in 50 years. Trends Genet., 33, 960–970.

9. Sombekke M.H., van der Voort L.F., Kragt J.J. et al. (2011) Relevance of IL7R genotype and mRNA expression in Dutch patients with multiple sclerosis. Mult. Scler., 17, 922–930.

10. Akkad D.A., Hoffjan S., Petrasch-Parwez E. et al. (2009) Variation in the IL7RA and IL2RA genes in German multiple sclerosis patients. J. Autoimmun., 32, 110–115.

11. Heidari M., Behmanesh M. and Sahraian M.A. (2011) Variation in SNPs of the IL7Ra gene is associated with multiple sclerosis in the Iranian population. Immunol. Invest., 40, 279–289.

12. Simsek H., Geckin H., Sensoz N.P. et al. (2019) Association between IL7R promoter polymorphisms and multiple sclerosis in Turkish population. J. Mol. Neurosci., 67, 38–47.

13. Ben-Selma W., Ben-Fredj N., Chebel S. et al. (2015) Age- and gender-specific effects on VDR gene polymorphisms and risk of the development of multiple sclerosis in Tunisians: a preliminary study. Int. J. Immunogenet., 42, 174–181.

14. Bulan B., Hoscan A.Y., Keskin S.N. et al. (2022) Vitamin D receptor polymorphisms among the Turkish population are associated with multiple sclerosis. Balk. J. Med. Genet., 25, 41–50.

15. Narooie-Nejad M., Moossavi M., Torkamanzehi A. et al. (2015) Positive association of vitamin D receptor gene variations with multiple sclerosis in South East Iranian population. Biomed Res. Int., 2015, 427519.


16. Cancela Diez B., Perez-Ramirez C., Maldonado-Montoro M.D.M. et al. (2021) Association between polymorphisms in the vitamin D receptor and susceptibility to multiple sclerosis. Pharmacogenet. Genom., 31, 40–47.

17. Napier M.D., Poole C., Satten G.A. et al. (2016) Heavy metals, organic solvents, and multiple sclerosis: an exploratory look at gene-environment interactions. Arch. Environ. Occup. Health, 71, 26–34.

18. Huizinga T.W., Westendorp R.G., Bollen E.L. et al. (1997) TNF-alpha promoter polymorphisms, production and susceptibility to multiple sclerosis in different groups of patients. J. Neuroimmunol., 72, 149–153.

19. Lucotte G., Bathelier C. and Mercier G. (2000) TNF-alpha polymorphisms in multiple sclerosis: no association with −238 and −308 promoter alleles, but the microsatellite allele a11 is associated with the disease in French patients. Mult. Scler., 6, 78–80.


20. Fernandez-Arquero M., Arroyo R., Rubio A. et al. (1999) Primary association of a TNF gene polymorphism with susceptibility to multiple sclerosis. Neurology, 53, 1361–1363.

21. Dendrou C.A., Plagnol V., Fung E. et al. (2009) Cell-specific protein phenotypes for the autoimmune locus IL2RA using a genotype-selectable human bioresource. Nat. Genet., 41, 1011–1015.

22. Alcina A., Fedetz M., Ndagire D. et al. (2009) IL2RA/CD25 gene polymorphisms: uneven association with multiple sclerosis (MS) and type 1 diabetes (T1D). PLoS One, 4, e4137.


23. Chorazy M., Wawrusiewicz-Kurylonek N., Adamska-Patruno E. et al. (2020) Some common SNPs of the T-Cell homeostasis-related genes are associated with multiple sclerosis, but not with the clinical manifestations of the disease, in the Polish population. J. Immunol. Res., 2020, 8838014.


24. Buhelt S., Laigaard H.M., von Essen M.R. et al. (2021) IL2RA methylation and gene expression in relation to the multiple sclerosis-associated gene variant rs2104286 and Soluble IL-2Ralpha in CD8(+) T Cells. Front. Immunol., 12, 676141.


25. Gao Y., Shang S., Guo S. et al. (2021) Lnc2Cancer 3.0: an updated resource for experimentally supported lncRNA/circRNA cancer associations and web tools based on RNA-seq and scRNA-seq data. Nucleic Acids Res., 49, D1251–D1258.

26. Wang P., Guo Q., Qi Y. et al. (2022) LncACTdb 3.0: an updated database of experimentally supported ceRNA interactions and personalized networks contributing to precision medicine. Nucleic Acids Res., 50, D183–D189.

27. Wang J., Cao Y., Zhang H. et al. (2017) NSDNA: a manually curated database of experimentally supported ncRNAs associated with nervous system diseases. Nucleic Acids Res., 45, D902–D907.

© The Author(s) 2024. Published by Oxford University Press.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
