GeniePool: genomic database with corresponding annotated samples based on a cloud data lake architecture
