Abstract

The Gene Expression Database (GXD) is a comprehensive online database within the Mouse Genome Informatics resource, aiming to provide available information about endogenous gene expression during mouse development. The information stems primarily from many thousands of biomedical publications that database curators must identify and read. Given the very large number of biomedical papers published each year, automatic document classification plays an important role in biomedical research. Specifically, an effective and efficient document classifier is needed to support the GXD annotation workflow. We present here an effective yet relatively simple classification scheme, which uses readily available tools while employing feature selection, aiming to assist curators in identifying publications relevant to GXD. We examine the performance of our method over a large manually curated dataset, consisting of more than 25 000 PubMed abstracts, of which about half are curated as relevant to GXD and the other half as irrelevant. In addition to text from title-and-abstract, we also consider image captions, an important information source that we integrate into our method. We apply a captions-based classifier to a subset of about 3300 documents, for which the full text of the curated articles is available. The results demonstrate that our proposed approach is robust and effectively addresses the GXD document classification task. Moreover, using information obtained from image captions clearly improves performance compared to title and abstract alone, affirming the utility of image captions as a substantial evidence source for automatically determining the relevance of biomedical publications to a specific subject area.

Database URL: http://www.informatics.jax.org

Introduction

Automatically collecting and searching biomedical publications plays an important role in biomedical research, because much information is conveyed in the form of publications. However, since the number of biomedical publications has grown rapidly over the past few decades (1), it has become impractical for researchers to quickly find all and only those biomedical publications that are related to their needs. One way to address this challenge is through automatic categorization of large numbers of publications by relevance to a topic, which can potentially save considerable time and resources. As such, automated biomedical document classification, aiming to identify publications relevant to a specific research field, is an important task that has attracted much interest (2–7).

The Mouse Genome Informatics (MGI; http://www.informatics.jax.org/) database is the most comprehensive international resource focused on the laboratory mouse as a model organism, providing integrated genetic, genomic and biological data for facilitating the study of human health and disease. MGI comprises several databases, namely the Mouse Genome Database (8), the Gene Expression Database (9, 10), the Mouse Tumor Biology Database (11) and the MouseMine project (12). The Gene Expression Database (GXD), on which we focus here, is an extensive resource of mouse developmental expression information. GXD collects and integrates data from a wide variety of expression experiments (assaying both RNA and protein). Expression data from wild-type and mutant mice are captured, with a focus on endogenous gene expression during development. Knock-in reporter studies are included because they usually reflect the endogenous expression pattern of the targeted gene. However, studies reporting on ectopic gene expression via the use of transgenes, or experiments studying the effects of treatments or other external/environmental factors, are excluded.

Much of the detailed information provided by GXD is manually curated from the literature. Among all the publications surveyed MGI-wide, GXD curators first identify those that meet the criteria described above.

Once the publications are identified, the curators annotate within them the genes and ages analyzed and the types of expression assays used. These annotations are combined with the publication information from PubMed to create a searchable index of published experiments on endogenous gene expression during mouse development (http://www.informatics.jax.org/gxdlit). This index allows researchers to find publications with specific types of expression data. Moreover, it supports GXD staff in prioritizing papers toward a more extensive curation step, namely, the detailed annotation of the expression results. The comprehensive, up-to-date index includes Expression Literature Content Records for >24 000 publications and over 15 000 genes.

In the work presented here we use this large and well-curated dataset to train and test a classifier that partitions publications in MGI into those that are relevant for GXD and those that are not. This classification task, referred to as triage, is important toward expediting literature curation of developmental expression information.

Basic mainstream methods for biomedical document classification typically use information obtained from titles and abstracts of publications. Lakiotaki et al. (4) presented a method for classifying a small manually tagged subset (474 articles) of the OHSUMED TREC collection (13) into documents aimed at medical professionals (clinicians) and those aimed at consumers by employing a medical document indexing method, AMTEx (14). The latter uses MeSH (15) as a basis for vector representation of medical documents and for classification. The system obtained high precision (94.49%) but low recall (50.85%). Ren et al. (16) also used vector representation, reduced via feature selection, for identifying 9 different experimental designs in the context of neuroimaging, within a small set of 247 published abstracts from human neuroimaging journal articles selected from BrainMap (17, 18). The reported precision and recall were, on average, 40% and 85%, respectively. Notably, these datasets are both more than an order of magnitude smaller than the number of documents considered by GXD, and the tasks addressed do not reflect the magnitude and complexity of the GXD classification challenge.

Larger scale experiments are reported by Yu et al. (19), who developed the TopicalMeSH representation, using MeSH terms as a basis for latent topic vectors, to classify a set of 18 000 drug review documents as relevant or irrelevant to each of 15 classes based on specific treatment conditions. The reported performance, as measured by the F-measure (combining precision and recall), was about 50% per class. Aside from the relatively low level of performance, as MeSH terms are assigned to articles by the NLM only several months after publication, a classification system that relies on MeSH annotations or on AMTEx cannot be applied to new publications that have not yet been assigned MeSH terms. As GXD directly examines new publications as soon as they are available, a classification system relying on MeSH annotations is not an effective route to pursue.

Another large-scale work by Van Auken et al. (6) is described in the context of the Textpresso information retrieval system (20), identifying documents relevant to the WormBase database (21, 22). They employ support vector machines (SVMs) as a basis for a semi-automated curation workflow. However, the reported classification results are described for sentences rather than for abstracts or papers, and the reported level of performance is on average 50% recall, 80% precision and 60% F-measure. The same group also reported applying Textpresso to a small set of documents, containing 55 research articles, to identify relevant papers for Gene Ontology Cellular Component curation (23), obtaining 79.1% recall and 61.8% precision (7).

In contrast to classification systems that use term-frequency-based document representation, Rinaldi et al. (5) introduced the OntoGene system (24), relying on advanced natural language processing (NLP) tools to support rich semantic document representation. The OntoGene system was used for the triage task of BioCreative'12 (25), aiming to select and prioritize documents within a relatively small dataset of 1725 publications relevant to the Comparative Toxicogenomics Database. The system ranks publications by relevance, based on the extraction of target entities and of interactions in which the target entities are involved. OntoGene's reported performance, which was the best among the participants in the task, is about 80% precision and recall. The method has not been shown to be fast enough for practical application, and has not been applied to a large dataset of GXD's magnitude. To summarize, while some of the existing methods perform effectively on a small-scale specific task, they have not been shown to be a good fit for the large-scale GXD categorization task with which we are concerned.

Another important distinction between previous work and the work presented here is that the above methods use text obtained only from PubMed abstracts or drug reviews. However, image captions in the biomedical literature often contain significant and useful information for determining the topic discussed in a publication, as our group and several others have noted before (26–29). As such, we consider here the use of text obtained from image captions as part of the GXD classification process.

We present a biomedical document classification scheme that uses statistical feature selection to reduce the representation size and to focus the representation on terms that support the GXD classification task. The classifiers we employ and compare are widely applied in biomedical document classification, namely the Naïve Bayes (30) and Random Forest classifiers (31–33). Our experiments and results, performed over a set of many thousands of documents, demonstrate that using these relatively simple classifiers, coupled with a well-targeted feature-selection method, leads to highly accurate and stable classification. The importance of feature selection has also been demonstrated by others (34, 35). Our system retains its high level of performance even when applied to the very large set of documents considered by GXD. We also provide results obtained from integrating image captions into the classification process, showing improved performance. The latter demonstrates the utility and importance of using image captions for supporting biomedical document classification.

Materials and methods

Data source

We downloaded from the MGI references website (http://www.informatics.jax.org/reference/) a file that contains information pertaining to all 115 027 publications used to curate gene expression information for the MGI database throughout the years 2004–2014. The information includes: PubMed identifier (PMID), title, publication year and a curated indicator stating the sub-project within MGI to which the corresponding paper is relevant. From the references included in the file, we identified 13 035 PMIDs of publications that have the indication Expression Literature Records or Expression: Assay Results shown in the Curated Data column. The topics discussed in these publications are considered relevant to GXD. Of these 13 035 relevant publications, only 12 966 were available for free download online in PDF format; we use those and refer to them as the positive set in our experiments. Of the remaining 101 992 publications in the downloaded references file, 79 284 were available for free download online in PDF format. These publications are considered unrelated, or irrelevant, to GXD. Notably, there is a significant imbalance between the number of available relevant examples (12 966) and that of irrelevant examples (79 284). It is important to directly address this imbalance in order to train a stable, sensitive and specific classifier (36). To avoid over-simplification of the classification task (e.g. simply classifying all documents as irrelevant would already yield a rather high accuracy), we select 12 354 publications out of the 79 284 irrelevant publications to comprise the actual negative set. To overcome potential bias stemming from differences in language use and topic distribution between the positive and negative sets, the publications included in the negative set are obtained from the same journals as the relevant ones, and have a similar distribution of publication years. The 12 354 negative publications are thus selected at random from among the irrelevant publications that satisfy these two conditions, as sketched below. By keeping the size of the negative set similar to that of the positive set and avoiding a shift in time, we ensure similarity of writing style and overall areas of interest between the positive and negative sets, and avoid semantic drift between the two sets, a phenomenon that may otherwise bias classification (37).
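The journal- and year-matched sampling can be sketched as follows. This is an illustrative implementation rather than the exact script we used, and the record fields ("journal", "year") are assumed:

```python
import random
from collections import Counter

def sample_matched_negatives(positives, candidates, n_samples, seed=0):
    """Draw a negative set whose journal and publication-year profile
    matches that of the positive set (illustrative sketch)."""
    rng = random.Random(seed)
    pos_journals = {p["journal"] for p in positives}
    year_counts = Counter(p["year"] for p in positives)
    # Restrict the candidate pool to publications from the same journals.
    pool = [c for c in candidates if c["journal"] in pos_journals]
    chosen = []
    for year, count in year_counts.items():
        year_pool = [c for c in pool if c["year"] == year]
        # Per-year quota proportional to the positive year distribution.
        quota = round(n_samples * count / len(positives))
        chosen.extend(rng.sample(year_pool, min(quota, len(year_pool))))
    return chosen
```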

Image captions, which provide descriptions for figures, form an important source of information in publications. To integrate information from image captions into the classifier, we build an additional dataset, to which we refer as the GXD-caption dataset, in which the text associated with each publication consists not only of the title and abstract but also of the image captions within the paper. Of the 12 966 publications in our positive set, 1630 were available in plain-text format from the PMC Author Manuscript Collection (38), allowing us to easily obtain figure captions. These documents, for which we have access to titles, abstracts and image captions, comprise the positive GXD-caption set. Among the 79 284 irrelevant publications, 11 099 papers were available in plain-text format from PMC, from which figure captions can be readily obtained. As described before, to retain balance, we select 1696 of these 11 099 publications to build the negative GXD-caption set, similar in size to the positive GXD-caption set. The publications in the negative GXD-caption set are selected at random from a set of papers published in the same journals and having a similar distribution of publication years as the publications in the positive GXD-caption set. We train/test our classifiers over the combined set, including both positive and negative documents. Table 1 summarizes the datasets used in our experiments.
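For illustration, a simple heuristic for pulling figure captions out of plain-text manuscripts might look as follows. The assumption that a caption starts with "Fig."/"Figure" plus a number and runs until the next blank line is ours; real PMC files may need more careful parsing:

```python
import re

# Assumed layout: a caption starts with "Fig."/"Figure" and a number,
# and continues until the next blank line. Real files may differ.
CAPTION_START = re.compile(r"^\s*fig(?:ure)?\.?\s*\d+", re.IGNORECASE)

def extract_captions(lines):
    """Collect caption paragraphs from a plain-text article."""
    captions, current = [], None
    for line in lines:
        if CAPTION_START.match(line):
            if current:
                captions.append(" ".join(current))
            current = [line.strip()]
        elif current is not None and line.strip():
            current.append(line.strip())          # caption continues
        elif current is not None:
            captions.append(" ".join(current))    # blank line ends it
            current = None
    if current:
        captions.append(" ".join(current))
    return captions
```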

Table 1.

The datasets used for training and testing of our biomedical document classification

Dataset          Number of examples
                 Positive    Negative    Total
GXD              12 966      12 354      25 320
GXD-caption      1630        1696        3326

As an additional verification step, we also test our classifiers over an additional subset of 1000 irrelevant documents outside the training/test set. These documents were selected uniformly at random from among the remaining 9403 irrelevant documents for which figure captions are available.

Document representation

Our document representation is based on a variation of the bag-of-terms model that we introduced and used in our earlier work (39–41). The representation uses a set of terms consisting of both unigrams (single words) and bigrams (pairs of two consecutive words). Using a limited number of meaningful terms as features has proven effective and efficient in our earlier work. To reduce the number of features, we first remove standard stop words (42); we also remove rare terms (appearing in a single publication within the dataset), as well as overly frequent ones (appearing in over 60% of the publications in the dataset). The last dimensionality-reduction step, which we introduced before (39), employs the Z-score test (43) to select features whose probability of occurring in the positive set is statistically significantly different from their probability of occurring in the negative set. The Z-score calculation is described next.
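Before turning to the Z-score calculation, we note that the pre-filtering steps above map directly onto standard tooling. A minimal scikit-learn sketch follows; the toy texts are illustrative stand-ins, and our actual experiments used our own pipeline:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy stand-ins for title-and-abstract texts.
texts = [
    "gene expression in the developing mouse embryo",
    "expression of the gene in mouse embryo sections",
    "in situ hybridization assay of mouse tissue",
    "clinical evaluation of a drug in adult patients",
    "randomized drug trial in adult patients",
]

# Unigrams and bigrams; remove standard stop words, terms appearing in
# a single document (min_df=2) and terms appearing in more than 60% of
# the documents (max_df=0.6); keep binary occurrence indicators only.
vectorizer = CountVectorizer(ngram_range=(1, 2), stop_words="english",
                             min_df=2, max_df=0.6, binary=True)
X = vectorizer.fit_transform(texts)
print(vectorizer.get_feature_names_out())
```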

Let t be a term, d be a publication, Dr denote the set of GXD-relevant documents and Dn denote the set of documents irrelevant to GXD. The probability of term t to occur within relevant publications, Pr(t|Dr), is estimated as:
\[ \Pr(t \mid D_r) = \frac{\#\ \text{of documents in } D_r \text{ that contain term } t}{\text{total } \#\ \text{of documents in } D_r}\,. \]
Similarly, the probability of term t to occur within irrelevant publications, Pr(t|Dn), is estimated as:
\[ \Pr(t \mid D_n) = \frac{\#\ \text{of documents in } D_n \text{ that contain term } t}{\text{total } \#\ \text{of documents in } D_n}\,. \]

We calculate Pr(t|Dr) and Pr(t|Dn) for each term t.

Using the formulation above, a term t is considered distinguishing for the Gene Expression topic, if and only if its probability to occur in publications associated with GXD, Pr(t|Dr), is statistically significantly different from its probability to occur in publications not associated with GXD, Pr(t|Dn). To determine the significance of the difference between these two probabilities, the Z-score statistic is employed (39, 43), where:
\[ Z = \frac{\Pr(t \mid D_r) - \Pr(t \mid D_n)}{\sqrt{\bar{p}\,(1-\bar{p})\left(\frac{1}{|D_r|} + \frac{1}{|D_n|}\right)}}\,, \]
and
\[ \bar{p} = \frac{|D_r| \cdot \Pr(t \mid D_r) + |D_n| \cdot \Pr(t \mid D_n)}{|D_r| + |D_n|}\,. \]

The higher the absolute value of the Z-score, the greater the confidence that the difference between Pr(t|Dr) and Pr(t|Dn) is significant. For the publications in the GXD dataset that we have constructed, we set a threshold of 1.96 on the Z-score; that is, if the absolute value of a term's Z-score is higher than 1.96, the term is considered distinguishing with respect to our classification task. For the documents in the GXD-caption dataset, we employ two different Z-score thresholds: one for selecting distinguishing terms from the captions, set at 1.63, and one for selecting terms from the titles/abstracts, set at 1.96.
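A vectorized sketch of this selection step is shown below, assuming binary document-term matrices (NumPy arrays) for the positive and negative training documents; the names are illustrative:

```python
import numpy as np

def distinguishing_terms(X_pos, X_neg, terms, z_threshold=1.96):
    """Two-proportion Z-test per term; keep terms whose occurrence rates
    differ significantly between relevant and irrelevant training docs."""
    n_r, n_n = X_pos.shape[0], X_neg.shape[0]
    p_r = X_pos.sum(axis=0) / n_r                    # Pr(t | D_r)
    p_n = X_neg.sum(axis=0) / n_n                    # Pr(t | D_n)
    p_bar = (n_r * p_r + n_n * p_n) / (n_r + n_n)    # pooled proportion
    se = np.sqrt(p_bar * (1 - p_bar) * (1.0 / n_r + 1.0 / n_n))
    # Terms occurring in all or in no documents get z = 0 and are dropped.
    z = np.divide(p_r - p_n, se, out=np.zeros_like(se), where=se > 0)
    return [t for t, z_t in zip(terms, z) if abs(z_t) > z_threshold]
```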

To ensure that the test set is not used for feature selection and is excluded from both the representation and the classification-learning process, the feature-selection steps discussed above use only data from the training set. After n distinguishing terms are identified through the feature-selection process from the training set, these distinguishing terms are used to represent the documents in the test set. Each document d in the test set is represented as a simple binary vector of the form \(V_d = \langle v_{t_1}, v_{t_2}, \ldots, v_{t_n} \rangle\), where \(v_{t_i} = 1\) if the i-th term in the distinguishing-terms list is present in document d, and 0 otherwise.
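The resulting representation is straightforward; for example:

```python
def represent(document_terms, selected_terms):
    """Binary vector V_d over the list of distinguishing terms."""
    present = set(document_terms)
    return [1 if t in present else 0 for t in selected_terms]

# A test document containing "in situ" and "embryo", represented over
# three (illustrative) distinguishing terms:
print(represent({"in situ", "embryo", "patients"},
                ["in situ", "gene expression", "embryo"]))   # [1, 0, 1]
```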

Classifiers

We trained and tested two types of classifiers, Naïve Bayes and Random Forest, over the GXD dataset and over the GXD-caption dataset, based on the document representation described in the Document Representation section. The Weka implementation was used to train and test the classifiers (44).

The first classifier used is Naïve Bayes, a simple probabilistic classifier based on the assumption that the value of each feature is conditionally independent of all other features, given the class value. In our case, to determine whether a publication d is relevant to GXD, the posterior probability P(relevant|d) is compared to the posterior probability P(irrelevant|d). If the former is greater, publication d is classified as relevant; otherwise, it is labeled as irrelevant. As the Naïve Bayes classifier is simple and fast, it is readily applicable to a large and high-dimensional dataset, and we use it in our experiments.
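We used the Weka implementation; an equivalent sketch using scikit-learn's Bernoulli Naïve Bayes, which suits binary occurrence features, would be:

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Toy binary term-occurrence vectors with relevance labels (1 = relevant).
X_train = np.array([[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0]])
y_train = np.array([1, 1, 0, 0])

nb = BernoulliNB()
nb.fit(X_train, y_train)
# The predicted class is the one with the larger posterior probability.
print(nb.predict(np.array([[1, 0, 0]])))
```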

We also ran experiments using the Random Forest classifier. A Random Forest consists of an ensemble of tree-structured classifiers, such that each node in each tree checks the value of one of a subset of features, typically chosen through a stochastic process called 'feature bagging' (32). Without such random feature selection, if one or a few of the features were strong predictors for the target class, these features would be checked in many of the trees, causing the trees to become correlated (33). The Random Forest classifier has been shown to be applicable to high-dimensional, high-volume data. We thus use it for the GXD document classification task. The forest consists of 2000 decision trees, and the number of features stochastically selected for each tree is set to 90.
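In scikit-learn terms, the corresponding configuration would look roughly as follows; note that scikit-learn's max_features subsamples features at each split, a close analogue of the per-tree selection described above, and the toy data here are random placeholders:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_train = rng.integers(0, 2, size=(200, 500))   # toy binary term matrix
y_train = rng.integers(0, 2, size=200)

# 2000 trees; 90 features considered per split, mirroring the
# configuration described above.
rf = RandomForestClassifier(n_estimators=2000, max_features=90,
                            n_jobs=-1, random_state=0)
rf.fit(X_train, y_train)
print(rf.predict(X_train[:1]))
```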

Experiments and results

Experiments

We first conducted experiments over the large GXD dataset, in which each document is represented based on title-and-abstract only. We then conducted experiments over the GXD-caption dataset, in which captions are also included in the representation. In addition, we tested our classifier that uses title, abstract and captions over a set of 1000 irrelevant documents selected as described in the Methods section.

The first group of experiments aims to test the performance of the proposed classification method over the large-scale GXD dataset. To ensure that our results are stable and statistically meaningful, we trained/tested the classifiers using three sets of cross-validation runs with different settings, as described next.

For the first set of cross-validation experiments over the GXD dataset, to ensure stability of the results, we executed five distinct complete 5-fold cross-validation runs, each using a different 5-way split of the dataset, for a total of 25 training/test sessions. In each session, 80% of the data was used for training, in which about 11 400 distinguishing terms were selected to represent publications and classifiers were trained based on the represented publications, while the remaining 20% of the data was used for testing the classifiers.
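The key detail is that term selection is redone inside every training fold, so the test fold never influences the chosen features. A minimal, runnable sketch of one complete run follows, using the vectorizer's document-frequency filtering as a stand-in for the full pipeline (the Z-test step of the earlier sketch would slot in after the vectorizer):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import f1_score

def complete_run(texts, labels, n_splits=5, seed=0):
    """One complete n-fold cross-validation run over a fresh split."""
    texts, labels = np.asarray(texts, dtype=object), np.asarray(labels)
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for tr, te in skf.split(texts, labels):
        vec = CountVectorizer(ngram_range=(1, 2), binary=True, min_df=2)
        X_tr = vec.fit_transform(texts[tr])  # terms chosen from train fold only
        X_te = vec.transform(texts[te])      # test fold merely represented
        clf = BernoulliNB().fit(X_tr, labels[tr])
        scores.append(f1_score(labels[te], clf.predict(X_te)))
    return scores

# Five distinct complete runs, each over a different 5-way split:
# all_scores = [complete_run(texts, labels, seed=s) for s in range(5)]
```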

To validate that the classification results remain steady even when the size of the training set varies, we performed two additional sets of experiments. In the first (to which we refer as the second set of experiments over the GXD dataset), we enlarged the training set to 90% of the data, while reducing the test set to 10%, by running 10-fold cross-validation. As before, we employed 10 complete runs of stratified 10-fold cross-validation, for a total of 100 training/test sessions. In each session, about 12 500 distinguishing terms were selected from the training part of the data and were used to represent documents in the test set.

In the third set of experiments, we reduced the size of the training set, and increased the size of the test set, by dividing both the positive set and the negative set into just two subsets: half of the data was treated as the training set while the other half became the test set. The subsets were formed so that the training and the test sets have a similar distribution of publication years. We trained the Random Forest classifier on the training set, where about 7600 terms were selected and used for document representation. The classifier was then tested on the test set.

To assess the impact of using captions vs. titles-and-abstracts only, we executed a second group of experiments, consisting of three sets of experiments over the GXD-caption dataset, described next. Since this dataset is smaller, we used only 5-fold cross-validation, employing five complete runs, each using a different 5-way split of the dataset, for a total of 25 training/test sessions, to ensure the statistical significance of the results.

For the first set of experiments over the GXD-caption set, we used only the titles and abstracts of the publications as training/test data. In each of the 25 sessions, we selected about 1460 distinguishing terms based on the training set, which were used as features for representing the documents in the test set.

For the second set of experiments over the GXD-caption dataset, we used only the image captions of the publications in our training/test datasets. We selected about 1940 distinguishing terms based on the training set alone, and used those as features for representing the documents in the test dataset.

In addition to the above experiments, we conducted a third set of experiments over the GXD-caption dataset, in which we used text from captions as well as text from titles and abstracts to represent and classify documents. To integrate the two sources of information into one representation, we first identified a set of distinguishing terms from the captions, and another from the titles and abstracts, separately; we then used the union of the two sets as the basis for representation and classification. The total number of terms used in this integrated representation was about 2740 per experiment.
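The integration amounts to taking the union of the two independently selected term sets (selected at the 1.63 threshold for captions and 1.96 for titles/abstracts, as described earlier). The terms below are illustrative:

```python
# Distinguishing terms selected separately from each source (illustrative).
terms_abstract = {"in situ", "expression pattern", "embryos"}
terms_caption = {"in situ", "stained sections", "e10.5"}

# The union forms the basis of the integrated representation.
combined_terms = sorted(terms_abstract | terms_caption)
print(combined_terms)
```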

As an additional verification step, we also trained a Random Forest classifier using the whole GXD-caption dataset (3326 documents) as the training set, where 3438 distinguishing terms from title, abstract and captions were selected as features for document representation. We tested the classifier over the set of 1000 irrelevant documents selected from among ∼9400 negative documents as described in the Methods section.

Results and analysis

We report results using standard measures widely employed for document classification evaluation, namely Precision, Recall, F-measure and Accuracy (45). For biomedical document curation, Recall is often viewed as more important than Precision, because missing relevant documents may compromise the integrity of the database. Therefore, we also include the utility measure introduced in the TREC Genomics track, which biases the evaluation in favor of high recall. We use two versions of the utility measure, Utility-10 and Utility-20, each giving a different weight to true positives (46). The two measures are calculated as follows:
\[ \text{UTIL-10} = \frac{10 \times TP - FP}{10 \times TP + FP}\,, \qquad \text{UTIL-20} = \frac{20 \times TP - FP}{20 \times TP + FP}\,, \]
where TP is the number of true positives and FP is the number of false positives.
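In code, the two measures, as defined above, are:

```python
def utility(tp, fp, weight):
    """Recall-biased utility: true positives count `weight` times as
    much as false positives (Utility-10: weight=10; Utility-20: weight=20)."""
    return (weight * tp - fp) / (weight * tp + fp)

# Illustrative counts: 950 true positives, 100 false positives.
print(utility(950, 100, 10))   # UTIL-10
print(utility(950, 100, 20))   # UTIL-20
```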

Table 2 shows the results obtained from the three sets of cross-validation experiments over the GXD dataset. Rows 1 and 2 show results obtained from the Naïve Bayes classifier and from the Random Forest classifier, respectively, where the representation is based on title-and-abstract terms, under 5 complete runs of 5-fold cross-validation. Rows 3 and 4 show the performance of the two classifiers on the same dataset, under 10 complete runs of 10-fold cross-validation. Row 5 shows results obtained when half of the GXD dataset was used for training and the other half for testing. Figure 1 graphically depicts the results shown in Table 2. Both Table 2 and Figure 1 demonstrate that our proposed method attains a very high level of performance on the large-scale GXD dataset according to every evaluation measure, indicating that the proposed document classification method is effective and can indeed be useful in practice.

Table 2.

Classification evaluation measures for our classifiers on the GXD dataset using different cross-validation settings (standard deviations in parentheses)

Classifier     Precision       Recall          F-measure       Accuracy        Utility-10      Utility-20
NB5            0.892 (0.005)   0.957 (0.003)   0.923 (0.003)   0.917 (0.004)   0.876 (0.006)   0.881 (0.005)
RF5            0.908 (0.006)   0.921 (0.005)   0.915 (0.004)   0.912 (0.005)   0.895 (0.007)   0.899 (0.007)
NB10           0.891 (0.007)   0.957 (0.006)   0.923 (0.004)   0.917 (0.005)   0.875 (0.008)   0.881 (0.008)
RF10           0.908 (0.008)   0.922 (0.008)   0.915 (0.006)   0.912 (0.007)   0.894 (0.009)   0.899 (0.009)
RF-H-and-H     0.905           0.925           0.915           0.913           0.896           0.900

NB denotes the Naïve Bayes classifier; RF denotes the Random Forest classifier. The suffix 5 indicates 5 complete runs of 5-fold cross-validation; the suffix 10 indicates 10 complete runs of 10-fold cross-validation. H-and-H indicates using half of the GXD dataset for training and the other half for testing.

Figure 1.

Performance of our classifiers, measured on the GXD dataset according to the different performance metrics, calculated over the various cross-validation settings. NB denotes Naïve Bayes; RF denotes the Random Forest classifier. The suffix 5 denotes the average over 5 complete runs of 5-fold cross-validation (25 sessions in total); the suffix 10 denotes the average over 10 complete runs of 10-fold cross-validation (100 sessions in total). Half-training-half-testing represents the run in which half of the GXD dataset was used for training and the other half for testing.

Table 3 shows results obtained from the cross-validation experiments over the GXD-caption dataset. Rows 1 and 2 summarize the performance of the Naïve Bayes and the Random Forest classifiers, respectively, when applied to publications in the GXD-caption dataset, represented based on information obtained from title and abstract only. Rows 3 and 4 show the results of the two respective classifiers when applied to the data represented using terms obtained from the image captions only. Rows 5 and 6 summarize results obtained from the two classifiers when the representation uses terms from title, abstract and image captions. Table 3 shows that the classifiers generated in the last set of experiments, which rely on features obtained from title, abstract and image captions, have the highest Precision, Recall, F-measure, Accuracy, Utility-10 and Utility-20. Figure 2 graphically depicts the results shown in Table 3. In particular, Figure 2 shows that the Recall, F-measure and Accuracy of our classifiers all clearly increase with the introduction of text from image captions. Both Table 3 and Figure 2 indicate that image captions indeed provide valuable information supporting the GXD document classification task.

Table 3.

Classification results obtained over the GXD-caption dataset using different sets of features (standard deviations in parentheses)

Classifier     Precision       Recall          F-measure       Accuracy        Utility-10      Utility-20
NB_AB          0.874 (0.023)   0.656 (0.027)   0.749 (0.022)   0.783 (0.017)   0.866 (0.024)   0.872 (0.023)
RF_AB          0.779 (0.016)   0.768 (0.024)   0.773 (0.017)   0.802 (0.015)   0.765 (0.018)   0.776 (0.017)
NB_CAP         0.831 (0.018)   0.766 (0.024)   0.797 (0.019)   0.808 (0.017)   0.820 (0.019)   0.828 (0.018)
RF_CAP         0.855 (0.019)   0.758 (0.017)   0.804 (0.015)   0.817 (0.014)   0.846 (0.021)   0.853 (0.020)
NB_AB&CAP      0.877 (0.018)   0.816 (0.018)   0.846 (0.014)   0.853 (0.013)   0.870 (0.019)   0.876 (0.018)
RF_AB&CAP      0.876 (0.019)   0.829 (0.015)   0.852 (0.013)   0.858 (0.012)   0.869 (0.020)   0.875 (0.019)

AB indicates using features from titles/abstracts only; CAP indicates using features from captions alone; AB&CAP indicates using features from both captions and titles/abstracts.

Figure 2.

Comparison of classification results obtained over the GXD-caption dataset using different sets of features. NB denotes Naïve Bayes; RF denotes the Random Forest classifier. AB indicates using text features from titles/abstracts only; CAP indicates using features from captions alone; AB&CAP indicates using features from both captions and titles/abstracts. The results shown are averaged over five complete runs of 5-fold cross-validation (25 sessions in total; standard deviations are shown in Table 3).

Recall that we also applied our classifier to a set of 1000 irrelevant documents, selected as described in the Methods section, as an additional verification step. The resulting true negative rate (the proportion of negative publications that are correctly identified) is 0.863. This is about the same as the true negative rate obtained through the cross-validation experiments above, which is on average 0.877 with a standard deviation of 0.009. That is, the level of performance obtained through the cross-validation experiments is retained on documents outside the training/test set. The results thus indicate that our classifiers are indeed effective for addressing the GXD classification task.

Conclusion and future work

We have presented a biomedical document classification framework for effectively identifying publications relevant to the mouse Gene Expression Database (GXD), evaluated over a realistic large-scale dataset. Our precision, recall, F-measure, accuracy and utility measures are all about 90% across experiments employing different cross-validation settings, showing that our classifier is effective and robust. This performance level is higher than any previously reported over realistically large biomedical document datasets (19, 47), despite the relative simplicity of the classifiers used. We note again that our feature-selection step is an important contributor to this improved performance. Moreover, our classifier retains a similar level of performance on documents outside the training/test set, which suggests that it is stable and applicable in practice.

Additionally, in terms of efficiency, the total time for pre-processing and classifying a single new publication using our classifier is 20 ms on average. Thus, our method is sufficiently efficient for supporting the actual document classification task addressed by GXD.

Notably, our experiments use a variety of features obtained from different parts of the publication. Our results indicate that using features obtained from title, abstract and image captions performs best, supporting the idea that image captions provide substantial evidence for biomedical document classification.

Given the vast number of available irrelevant publications (79 284), as part of future work, rather than training a classifier on a balanced subsample, we plan to further improve the classifier by developing strategies that make better use of more of the irrelevant documents during training itself. We also note that, as captions were not readily available for all the documents in the GXD dataset, the comparative experiments that use caption-based representation were limited to the smaller GXD-caption dataset. In the future, we plan to use image captions from a larger set of documents by harvesting figure captions directly from the PDF files. We are also actively working on combining information obtained directly from the text as well as from the images for addressing biomedical document classification.

Acknowledgements

We thank Debarati Roychowdhury and Pengyuan Li for their helpful comments on earlier versions of the manuscript, and Constance Smith for helpful discussions.

Funding

This work was partially supported by NIH grant R56LM011354A. The GXD is supported by NIH grant HD062499.

Conflict of interest. None declared.

References

1. Hunter,L. and Cohen,K.B. (2006) Biomedical language processing: what's beyond PubMed? Mol. Cell, 21, 589–594.
2. Cohen,A.M. and Hersh,W.R. (2005) A survey of current work in biomedical text mining. Brief. Bioinformatics, 6, 57–71.
3. Huang,C.C. and Lu,Z. (2016) Community challenges in biomedical text mining over 10 years: success, failure and the future. Brief. Bioinformatics, 17, 132–144.
4. Lakiotaki,K., Hliaoutakis,A., Koutsos,S. et al. (2013) Towards personalized medical document classification by leveraging UMLS semantic network. In: International Conference on Health Information Science. Springer, pp. 93–104.
5. Rinaldi,F., Clematide,S., Hafner,S. et al. (2013) Using the OntoGene pipeline for the triage task of BioCreative 2012. Database, bas053.
6. Van Auken,K., Fey,P., Berardini,T.Z. et al. (2012) Text mining in the biocuration workflow: applications for literature curation at WormBase, dictyBase and TAIR. Database, bas040.
7. Van Auken,K., Jaffery,J., Chan,J. et al. (2009) Semi-automated curation of protein subcellular localization: a text mining-based approach to Gene Ontology (GO) Cellular Component curation. BMC Bioinformatics, 10, 228.
8. Bult,C.J., Eppig,J.T., Blake,J.A. et al. (2016) The Mouse Genome Database 2016. Nucleic Acids Res., 44(Database issue), D840–D847.
9. Finger,J.H., Smith,C.M., Hayamizu,T.F. et al. (2015) The mouse gene expression database: new features and how to use them effectively. Genesis, 53, 510–522.
10. Smith,C.M., Finger,J.H., Hayamizu,T.F. et al. (2015) GXD: a community resource of mouse Gene Expression Data. Mamm. Genome, 26, 314–324.
11. Bult,C.J., Krupke,D.M., Begley,D.A. et al. (2015) Mouse Tumor Biology (MTB): a database of mouse models for human cancer. Nucleic Acids Res., 43(Database issue), D818–D824.
12. Motenko,H., Neuhauser,S.B., O'Keefe,M. et al. (2015) MouseMine: a new data warehouse for MGI. Mamm. Genome, 26, 325–330.
14. Hliaoutakis,A., Zervanou,K. and Petrakis,E.G. (2009) The AMTEx approach in the medical document indexing and retrieval application. Data Knowl. Eng., 68, 380–392.
15. Medical Subject Headings (2016) https://www.nlm.nih.gov/mesh/
16. Ren,D., Ma,L., Zhang,Y. et al. (2015) Online biomedical publication classification using multi-instance multi-label algorithms with feature reduction. In: IEEE 14th International Conference on Cognitive Informatics & Cognitive Computing, pp. 234–241.
17. Fox,P.T., Laird,A.R. and Fox,S.P. (2005) BrainMap taxonomy of experimental design: description and evaluation. Hum. Brain Mapp., 25, 185–198.
18. BrainMap (2016) http://www.brainmap.org
19. Yu,Z., Bernstam,E., Cohen,T. et al. (2016) Improving the utility of MeSH® terms using the TopicalMeSH representation. J. Biomed. Inform., 61, 77–86.
20. Muller,H.M., Kenny,E.E. and Sternberg,P.W. (2004) Textpresso: an ontology-based information retrieval and extraction system for biological literature. PLoS Biol., 2, e309.
21. Howe,K.L., Bolt,B.J., Cain,S. et al. (2016) WormBase 2016: expanding to enable helminth genomic research. Nucleic Acids Res., 44, D774–D780.
22. WormBase (2016) http://www.wormbase.org
23. Ashburner,M., Ball,C.A., Blake,J.A. et al. (2000) Gene Ontology: tool for the unification of biology. Nat. Genet., 25, 25–29.
24. OntoGene (2016) http://www.ontogene.org
25. Lu,Z. and Hirschman,L. (2012) Biocuration workflows and text mining: overview of the BioCreative 2012 Workshop Track II. Database, bas043.
26. Regev,Y., Finkelstein-Landau,M., Feldman,R. et al. (2002) Rule-based extraction of experimental evidence in the biomedical domain: the KDD Cup 2002 (task 1). ACM SIGKDD Explorations Newsletter, 4, 90–92.
27. Agarwal,S., Liu,F. and Yu,H. (2011) Simple and efficient machine learning frameworks for identifying protein-protein interaction relevant articles and experimental methods used to study the interactions. BMC Bioinformatics, 12, 1.
28. Shatkay,H., Chen,N. and Blostein,D. (2006) Integrating image data into biomedical text categorization. Bioinformatics, 22, e446–e453.
29. Xue,Z., You,D., Chachra,S. et al. (2015) Extraction of endoscopic images for biomedical figure classification. In: SPIE Medical Imaging. International Society for Optics and Photonics, 94180P.
30. Domingos,P. and Pazzani,M. (1997) On the optimality of the simple Bayesian classifier under zero-one loss. Mach. Learn., 29, 103–130.
31. Breiman,L. (2001) Random forests. Mach. Learn., 45, 5–32.
32. Ho,T.K. (1995) Random decision forests. In: Proceedings of the Third IEEE International Conference on Document Analysis and Recognition, pp. 278–282.
33. Ho,T.K. (1998) The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell., 20, 832–844.
34. Briesemeister,S., Rahnenführer,J. and Kohlbacher,O. (2010) Going from where to why—interpretable prediction of protein subcellular localization. Bioinformatics, 26, 1232–1238.
35. Briesemeister,S., Rahnenführer,J. and Kohlbacher,O. (2010) YLoc—an interpretable web server for predicting subcellular localization. Nucleic Acids Res., 38, W497–W502.
36. González,J.F., Pavón,R. and Laza,R. The class imbalance problem in Medline documents classification.
37. Cohen,A.M., Bhupatiraju,R.T. and Hersh,W.R. (2004) Feature generation, feature selection, classifiers, and conceptual drift for biomedical document triage. In: Proceedings of TREC 2004.
38. PMC Author Manuscript Collection (2015) http://www.ncbi.nlm.nih.gov/pmc/about/mscollection
39. Brady,S. and Shatkay,H. (2008) EpiLoc: a (working) text-based system for predicting protein subcellular location. Pacific Symp. Biocomput., 604–615.
40. Ma,K., Jeong,H., Rohith,M.V. et al. (2015) Utilizing image-based features in biomedical document classification. In: IEEE International Conference on Image Processing (ICIP), pp. 4451–4455.
41. Shatkay,H., Ramya,N., Santosh,S. et al. (2012) OCR-based image features for biomedical image and article classification: identifying documents relevant to cis-regulatory elements. In: Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine, pp. 98–104.
42. Leskovec,J., Rajaraman,A. and Ullman,J.D. (2012) Mining of Massive Datasets. Cambridge University Press, Cambridge.
43. Myers,S.L., Myers,R.H., Walpole,R.E. et al. (1993) Probability and Statistics for Engineers and Scientists. Macmillan, New York.
44. Hall,M., Frank,E., Holmes,G. et al. (2009) The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter, 11, 10–18.
45. Powers,D.M. (2011) Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation.
46. Hersh,W.R., Cohen,A.M., Yang,J. et al. (2005) TREC 2005 Genomics Track overview. In: TREC 2005 Notebook, pp. 14–25.
47. Wang,P., Morgan,A.A., Zhang,Q. et al. (2007) Automating document classification for the Immune Epitope Database. BMC Bioinformatics, 8, 1.

Author notes

Jiang,X., Ringwald,M., Blake,J. et al. Effective biomedical document classification for identifying publications relevant to the mouse Gene Expression Database (GXD). Database (2017) Vol. 2017: article ID bax017; doi:10.1093/database/bax017

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.