Finding relevant biomedical datasets: the UC San Diego solution for the bioCADDIE Retrieval Challenge

2

Canese

K.

(

2016

)

PubMed celebrates its 20th anniversary

.

NLM Tech. Bull

.,

410

,

e12

.

3

Benson

D.A.

,

Cavanaugh

M.

,

Clark

K.

et al. (

2013

)

GenBank

.

Nucleic Acids Res

.,

41

,

D36

–

D42

.

4

Metzker

M.L.

(

2010

)

Sequencing technologies – the next generation

.

Nat. Rev. Genet

.,

11

,

31

–

46

.

5

Clark

K.

,

Vendt

B.

,

Smith

K.

et al. (

2013

)

The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository

.

J. Digit Imaging

,

26

,

1045

–

1057

.

6

Marcus

D.S.

,

Fotenos

A.F.

,

Csernansky

J.G.

et al. (

2010

)

Open access series of imaging studies: longitudinal MRI data in nondemented and demented older adults

.

J. Cogn. Neurosci

.,

22

,

2677

–

2684

.

7

Haines

L.L.

,

Light

J.

,

O’Malley

D.

,

Delwiche

F.A.

(

2010

)

Information-seeking behavior of basic science researchers: implications for library services

.

J. Med. Libr. Assoc

.,

98

,

73

–

81

.

8

Grefsheim

S.F.

,

Rankin

J.A.

(

2007

)

Information needs and information seeking in a biomedical research setting: a study of scientists and science administrators

.

J. Med. Libr. Assoc

.,

95

,

426

–

434

.

9

Stein

L.D.

(

2003

)

Integrating biological databases

.

Nat. Rev. Genet

.,

4

,

337

–

345

.

10

Ostell

J.

(

2014

) The Entrez Search and Retrieval System. 2nd edn.

National Center for Biotechnology Information (US)

,

Bethesda, MD

.

11

Squizzato

S.

,

Park

Y.M.

,

Buso

N.

et al. (

2015

)

The EBI search engine: providing search and retrieval functionality for biological data from EMBL-EBI

.

Nucleic Acids Res

.,

43

,

W585

–

W588

.

12

Ohno-Machado

L.

,

Sansone

S.-A.

,

Alter

G.

et al. (

2017

)

Finding useful data across multiple biomedical data repositories using DataMed

.

Nat. Genet

.,

49

,

816

–

819

.

13

Sansone

S.-A.

,

Gonzalez-Beltran

A.

,

Rocca-Serra

P.

et al. (

2017

)

DATS: the data tag suite to enable discoverability of datasets

.

Sci. Data

,

4

,

170059

.

14

Roberts

K.

,

Gururaj

A.

,

Chen

X.

et al. (

2017

)

Information retrieval for biomedical datasets: the 2016 bioCADDIE dataset retrieval challenge

.

Database

,

2017

,

1

–

9

.

15

Sayers

E.W.

,

Barrett

T.

,

Benson

D.A.

et al. (

2011

)

Database resources of the national center for biotechnology information

.

Nucleic Acids Res

.,

39

,

D38

–

D51

.

16

Butte

A.J.

,

Kohane

I.S.

(

2006

)

Creation and implications of a phenome-genome network

.

Nat. Biotechnol

.,

24

,

55

–

62

.

17

Lindberg

D.A.B.

,

Humphreys

B.L.

,

McCray

A.T.

(

1993

) The Unified Medical Language System. IMIA Yearb. (1993: IMIA Yearbook 1993): 41–51.

18

Shah

N.H.

,

Jonquet

C.

,

Chiang

A.P.

et al. (

2009

)

Ontology-driven indexing of public datasets for translational bioinformatics

.

BMC Bioinformatics

,

10

,

S1.

19

Carpineto

C.

,

Romano

G.

(

2012

)

A survey of automatic query expansion in information retrieval

.

ACM Comput. Surv

.,

44

,

1

–

50

.

20

Chum

O.

,

Mikulik

A.

,

Perdoch

M.

,

Matas

J.

(

2011

) Total recall II: query expansion revisited. In: 2011 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Colorado Springs, CO, USA. pp. 889–96.

21

Dramé

K.

,

Mougin

F.

,

Diallo

G.

(

2014

) Query expansion using external resources for improving information retrieval in the biomedical domain. In: Cappellato,L., Ferro,N., Halvey,M., Kraaij,W. (eds). Working Notes for CLEF 2014 Conference. CEUR Workshop Proceedings, Sheffield, UK. pp. 189–94.

22

Almeida

H.

,

Jean-Louis

L.

,

Meurs

M.-J.

(

2016

) Mining biomedical literature: an open source and omdular approach. In:

Khoury

R.

,

Drummond

C.

(eds).

Advances in Artificial Intelligence AI 2016 Lecture Notes in Computer Science

, Vol

9673

.

Springer

,

Cham

. pp.

168

–

79

.

23

Abdulla

A.A.A.

,

Lin

H.

,

Xu

B.

,

Banbhrani

S.K.

(

2016

)

Improving biomedical information retrieval by linear combinations of different query expansion techniques

.

BMC Bioinformatics

,

17

,

443

–

454

.

24

Mei

T.

,

Rui

Y.

,

Li

S.

,

Tian

Q.

(

2014

)

Multimedia search reranking: a literature survey

.

ACM Comput. Surv

.,

46

,

1.

25

Cohen

T.

,

Roberts

K.

,

Gururaj

A.

et al. (

2017

)

A publicly available benchmark for biomedical dataset retrieval: the reference standard for the 2016 bioCADDIE dataset retrieval challenge

.

Database

,

2017

,

1

–

10

.

26

BioCADDIE 2016 dataset retrieval challenge [Internet]. https://biocaddie.org/biocaddie-2016-dataset-retrieval-challenge (29 January 2018, date last accessed).

27

Bird

S.

,

Klein

E.

,

Loper

E.

(

2009

)

Natural Language Processing with Python

.

O’Reilly Media Inc

., CA, USA.

28

Sayers

E.

(2010)

A General Introduction to the E-utilities

.

National Center for Biotechnology Information (US)

,

Bethesda, MD

. https://www.ncbi.nlm.nih.gov/books/NBK25497/. (29 January 2018, date last accessed).

29

Metzler

D.

,

Croft

W.B.

(

2005

) A Markov random field model for term dependencies. In: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, Salvador, Brazil. pp. 472–9.

30

Bendersky

M.

,

Metzler

D.

,

Croft

W.B.

(

2010

) Learning concept importance using a weighted dependence model. In: Proceedings of the Third ACM International Conference on Web Search and Data Mining. ACM, New York, NY, USA. pp. 31–40.

31

Metzler

D.

,

Croft

W.B.

(

2007

)

Linear feature-based models for information retrieval

.

Inf. Retr. Boston

,

10

,

257

–

274

.

32

Aronson

A.R.

,

Lang

F.-M.

(

2010

)

An overview of MetaMap: historical perspective and recent advances

.

J. Am. Med. Informatics Assoc

.,

17

,

229

–

236

.

33

Hiemstra

D.

(

2001

) Using language models for information retrieval. Taaluitgeverij Neslia Paniculata, Enschede, Netherlands. http://wwwhome.cs.utwente.nl/~hiemstra/papers/thesis.pdf. (29 January 2018, date last accessed).

34

Yilmaz

E.

,

Kanoulas

E.

,

Aslam

J.A.

(

2008

) A simple and efficient sampling method for estimating AP and NDCG. In: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY. pp. 603–10.

35

Yilmaz

E.

,

Aslam

J.A.

(

2006

) Inferred AP : estimating average precision with incomplete judgments. In: Fifteenth ACM International Conference on Information and Knowledge Management. ACM Press, Arlington, Virginia, USA. pp. 102–11.

36

Ohno-Machado

L.

,

Bafna

V.

,

Boxwala

A.A.

et al. (

2012

)

iDASH: integrating data for analysis, anonymization, and sharing

.

J. Am. Med. Informatics Assoc

.,

19

,

196

–

201

.