A publicly available benchmark for biomedical dataset retrieval: the reference standard for the 2016 bioCADDIE dataset retrieval challenge
