A publicly available benchmark for biomedical dataset retrieval: the reference standard for the 2016 bioCADDIE dataset retrieval challenge
