Pressing needs of biomedical text mining in biocuration and beyond: opportunities and challenges

PubMed

3

Baxevanis

A.D.

Bateman

A.

(

2006

)

The importance of biological databases in biological discovery

.

Curr. Protoc. Bioinformatics

, 50, 1.1.1.–1.1.8.

4

Wei

C.H.

Kao

H.Y.

Lu

Z.

(

2013

)

PubTator: a web-based text mining tool for assisting biocuration

.

Nucleic Acids Res

.,

41

,

W518

–

W522

.

5

The Europe PMC Consortium (

2015

)

Europe PMC: a full-text literature database for the life sciences and platform for innovation

.

Nucleic Acids Res

.,

43

,

D1042

–

D1048

.

PubMed

6

Lemberger

T.

(

2014

)

Tools of discovery

.

Mol. Syst. Biol

.,

10

,

715.

7

Hirschman

L.

Burns

G.A.

Krallinger

M.

et al. . (

2012

)

Text mining for the biocuration workflow

.

Database (Oxford)

, 2012,

bas020.

8

Lu

Z.

Hirschman

L.

(

2012

)

Biocuration workflows and text mining: overview of the BioCreative 2012 Workshop Track II

.

Database (Oxford)

, 2012,

bas043.

9

Van Auken

K.

Jaffery

J.

Chan

J.

et al. . (

2009

)

Semi-automated curation of protein subcellular localization: a text mining-based approach to Gene Ontology (GO) Cellular Component curation

.

BMC Bioinformatics

,

10

,

228.

10

Van Auken

K.

Fey

P.

Berardini

T.Z.

et al. . (

2012

)

Text mining in the biocuration workflow: applications for literature curation at WormBase, dictyBase and TAIR

.

Database (Oxford)

, 2012,

bas040.

11

Wei

C.H.

Harris

B.R.

Li

D.

et al. . (

2012

)

Accelerating literature curation with text-mining tools: a case study of using PubTator to curate genes in PubMed abstracts

.

Database (Oxford)

, 2012,

bas041.

12

Tudor

C.O.

Arighi

C.N.

Wang

Q.

et al. . (

2012

)

The eFIP system for text mining of protein interaction networks of phosphorylated proteins

.

Database (Oxford)

, 2012,

bas044.

13

Cejuela

J.M.

McQuilton

P.

Ponting

L.

et al. . (

2014

)

tagtog: interactive and text-mining-assisted annotation of gene mentions in PLOS full-text articles

.

Database (Oxford)

, 2014,

bau033.

14

Arighi

C.N.

Roberts

P.M.

Agarwal

S.

et al. . (

2011

)

BioCreative III interactive task: an overview

.

BMC Bioinformatics

,

12

,

S4.

15

Arighi

C.N.

Carterette

B.

Cohen

K.B.

et al. . (

2013

)

An overview of the BioCreative 2012 Workshop Track III: interactive text mining task

.

Database (Oxford)

, 2013,

bas056.

16

Matis-Mitchell

S.

Roberts

P.

Tudor

C.O.

Arighi

C.N.

(

2013

) BioCreative IV interactive task. Proceedings of the Fourth BioCreative Challenge Evaluation Workshop, vol 1. Bethesda, MD, pp. 190–203.

17

Wang

Q.H.

Abdul

S.S.

Almeida

L.

et al. . (

2016

)

Overview of the interactive task in BioCreative V

.

Database

(Oxford), 2016, baw119.

18

Liechti

R.

George

N.

El-Gebali

S.

et al. . (

2016

)

SourceData - a semantic platform for curating and searching figures

.

bioRxiv

, 058529.

19

Singhal

A.

Kasturi

R.

Sivakumar

V.

et al. . (

2013

)

Leveraging web intelligence for finding interesting research datasets. Proceedings of the International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT)

.

2013 IEEE/WIC/ACM. IEEE

, vol.

1

, pp.

321

–

328

.

20

Singhal

A.

(

2014

)

Leveraging Open Source Web Resources to Improve Retrieval of Low Text Content Items

.

Department of Computer science. University of Minnesota

,

Minneapolis, MN, 144

.

Google Preview

21

Martin

F.

Thomson

T.M.

Sewer

A.

et al. . (

2012

)

Assessment of network perturbation amplitudes by applying high-throughput data to causal biological networks

.

BMC Syst. Biol

.,

6

,

54.

22

Thomson

T.M.

Sewer

A.

Martin

F.

et al. . (

2013

)

Quantitative assessment of biological impact using transcriptomic data and mechanistic network models

.

Toxicol. Appl. Pharmacol

.,

272

,

863

–

878

.

23

Catlett

N.L.

Bargnesi

A.J.

Ungerer

S.

et al. . (

2013

)

Reverse causal reasoning: applying qualitative causal knowledge to the interpretation of high-throughput data

.

BMC Bioinformatics

,

14

,

340.

24

Boue

S.

Talikka

M.

Westra

J.W.

et al. . (

2015

)

Causal biological network database: a comprehensive platform of causal biological network models focused on the pulmonary and vascular systems

.

Database (Oxford)

,

2015

,

bav030.

25

Fluck

J.

Madan

S.

Ansari

S.

et al. . (

2014

)

BELIEF-a Semiautomatic Workflow for BEL Network Creation. Proceedings of the 6th International Symposium on Semantic Mining in Biomedicine (SMBM)

,

University of Aveiro

,

Portugal

,

109

–

113

.

Google Preview

26

Gomez

A.

Rothman

J.M.

Petrzelkova

K.

et al. . (

2016

)

Temporal variation selects for diet-microbe co-metabolic traits in the gut of Gorilla spp

.

ISME J

.,

10

,

532.

27

Sampson

T.R.

Mazmanian

S.K.

(

2015

)

Control of brain development, function, and behavior by the microbiome

.

Cell Host Microbe

,

17

,

565

–

576

.

28

Neish

A.S.

(

2014

)

Mucosal immunity and the microbiome

.

Ann. Am. Thoracic Soc

.,

11

,

S28

–

S32

.

29

McDermott

A.J.

Huffnagle

G.B.

(

2014

)

The microbiome and regulation of mucosal immunity

.

Immunology

,

142

,

24

–

31

.

30

Human Microbiome Project

Consortium

(

2012

)

Structure, function and diversity of the healthy human microbiome

.

Nature

,

486

,

207

–

214

.

31

Schulz

M.D.

Atay

C.

Heringer

J.

et al. . (

2014

)

High-fat-diet-mediated dysbiosis promotes intestinal carcinogenesis independently of obesity

.

Nature

,

514

,

508

–

512

.

32

Wommack

K.E.

Bhavsar

J.

Polson

S.W.

et al. . (

2012

)

VIROME: a standard operating procedure for analysis of viral metagenome sequences

.

Stand. Genomic Sci

.,

6

,

421–433.

33

Glass

E.M.

Wilkening

J.

Wilke

A.

et al. . (

2010

)

Using the metagenomics RAST server (MG-RAST) for analyzing shotgun metagenomes

.

Cold Spring Harb. Protoc

., 2010, pdb. prot5368.

34

Wu

C.H.

Huang

H.

Arminski

L.

et al. . (

2002

)

The Protein Information Resource: an integrated public resource of functional annotation of proteins

.

Nucleic Acids Res

.,

30

,

35

–

37

.

35

Pafilis

E.

Buttigieg

P.L.

Ferrell

B.

et al. . (

2016

)

EXTRACT: interactive extraction of environment metadata and term suggestion for metagenomic sample annotation

.

Database (Oxford)

, 2016.

36

Pafilis

E.

Frankild

S.P.

Schnetzer

J.

et al. . (

2015

)

ENVIRONMENTS and EOL: identification of Environment Ontology terms in text and the annotation of the Encyclopedia of Life

.

Bioinformatics

,

31

,

1872

–

1874

.

37

Buttigieg

P.L.

Morrison

N.

Smith

B.

et al. . (

2013

)

The environment ontology: contextualising biological and biomedical entities

.

J. Biomed. Semant

.,

4

,

43.

38

Leaman

R.

Lu

Z.

(

2016

)

TaggerOne: joint named entity recognition and normalization with semi-Markov models

.

Bioinformatics

,

32

,

2839

–

2846

.

39

Wei

C.H.

Kao

H.Y.

Lu

Z.

(

2012

)

SR4GN: a species recognition software tool for gene normalization

.

PloS ONE

,

7

,

e38460.

40

Wiegers

T.C.

Davis

A.P.

Mattingly

C.J.

(

2014

)

Web services-based text-mining demonstrates broad impacts for interoperability and process simplification

.

Database (Oxford)

,

2014

, bau050.

41

Ferrucci

D.

Lally

A.

(

2004

)

UIMA: an architectural approach to unstructured information processing in the corporate research environment

.

Nat. Lang. Eng

.,

10

,

327

–

348

.

42

Verspoor

K.

Baumgartner

W.

Jr

Roeder

C.

et al. . (

2009

) Abstracting the types away from a UIMA type system. In C. Chiarcos, R. Eckhart de Castilho, M. Stede (Eds.), Form to Meaning: Processing Texts Automatically, pp. 249–256.

43

Comeau

D.C.

Islamaj Dogan

R.

Ciccarese

P.

et al. . (

2013

)

BioC: a minimalist approach to interoperability for biomedical text processing

.

Database (Oxford)

, 2013,

bat064.

44

Comeau

D.C.

Batista-Navarro

R.T.

Dai

H.J.

et al. . (

2014

)

BioC interoperability track overview

.

Database (Oxford)

,

2014

, bau053.

45

Pafilis

E.

O'donoghue

S.I.

Jensen

L.J.

et al. . (

2009

)

Reflect: augmented browsing for the life scientist

.

Nat. Biotechnol

.,

27

,

508

–

510

.

46

Wei

C.H.

Leaman

R.

Lu

Z.

(

2016

)

Beyond accuracy: creating interoperable and scalable text-mining web services

.

Bioinformatics

,

32

,

1907

–

1910

.

47

Gerner

M.

Nenadic

G.

Bergman

C.M.

(

2010

)

LINNAEUS: a species name identification system for biomedical literature

.

BMC Bioinformatics

,

11

,

85.

48

Gerner

M.

Sarafraz

F.

Bergman

C.M.

et al. . (

2012

)

BioContext: an integrated text mining system for large-scale extraction and contextualization of biomolecular events

.

Bioinformatics

,

28

,

2154

–

2161

.

49

Sun

Z.

Errami

M.

Long

T.

et al. . (

2010

)

Systematic characterizations of text similarity in full text biomedical publications

.

PloS ONE

,

5

,

e12704.

50

Thomas

P.

Starlinger

J.

Vowinkel

A.

et al. . (

2012

)

GeneView: a comprehensive semantic search engine for PubMed

.

Nucleic Acids Res

.,

40

,

W585

–

W591

.

51

Caporaso

J.G.

Deshpande

N.

Fink

J.L.

et al. . (

2008

)

Intrinsic evaluation of text mining tools may not predict performance on realistic tasks

.

Pac Symp Biocomput

,

640

–

651

.

52

Cohen

K.B.

Johnson

H.L.

Verspoor

K.

et al. . (

2010

)

The structural and content aspects of abstracts versus bodies of full text journal articles are different

.

BMC Bioinformatics

,

11

,

492.

53

Huang

C.C.

Lu

Z.

(

2016

)

Community challenges in biomedical text mining over 10 years: success, failure and the future

.

Brief. Bioinformatics

,

17

,

132

–

144

.

54

Moult

J.

Pedersen

J.T.

Judson

R.

et al. . (

1995

)

A large-scale experiment to assess protein structure prediction methods

.

Proteins

,

23

,

ii

–

iv

.

55

Moult

J.

Fidelis

K.

Kryshtafovych

A.

et al. . (

2014

)

Critical assessment of methods of protein structure prediction (CASP)–round x

.

Proteins

,

82

,

1

–

6

.

56

Mao

Y.

Van Auken

K.

Li

D.

et al. . (

2014

)

Overview of the gene ontology task at BioCreative IV

.

Database (Oxford)

, 2014, bau086.