Bioinformatics tools developed to support BioCompute Objects

Table 1.

Training module user survey

Question	Answer
How clear were the evaluation unit objectives?	1 (unclear) → 5 (extremely clear)
Did the structure and sequence of the lectures make sense?	1 (not at all) → 5 (very much so)
Did the unit expose you to new knowledge, tools and practices?	Definitely, yes / Yes, sort of / Not really / Definitely not
Of the new knowledge, tools and practices this module taught, how comfortable do you feel using GitHub as a means of code/project management and collaboration?	1 (not at all comfortable) → 5 (extremely comfortable)
Of the new knowledge, tools and practices this module taught, how comfortable do you feel with running computational analysis on HIVE?	1 (not at all comfortable) → 5 (extremely comfortable)
Of the new knowledge, tools and practices this module taught, how comfortable do you feel with running computational analysis on CGC?	1 (not at all comfortable) → 5 (extremely comfortable)
Of the new knowledge, tools and practices this module taught, how comfortable do you feel with generating BioCompute Objects in CGC?	1 (not at all comfortable) → 5 (extremely comfortable)
Of the new knowledge, tools and practices this module taught, how comfortable do you feel with creating BioCompute Objects in the BCO Editor?	1 (not at all comfortable) → 5 (extremely comfortable)
GUI: In your opinion, how user friendly is HIVE’s interface?	1 (not at all) → 5 (extremely)
GUI: In your opinion, how user friendly is CGC’s interface?	1 (not at all) → 5 (extremely)
GUI: In your opinion, how user friendly is BCO Editor’s interface?	1 (not at all) → 5 (extremely)

User stories

The following user stories describe how integrating BCOs within existing applications is advancing FAIR principles.

Knowledge base

GlyGen, an NIH-funded glycoconjugate database, leverages the BCO Portal to document data integration pipelines and provide complete transparency and accessibility to its users and collaborators. GlyGen data managers use a separate instance of the BCO Portal to generate a BCO for each integrated dataset in GlyGen. This form-based system lets the data managers efficiently enter the relevant information into predefined fields governed by predefined rules. This information is readily converted into JSON, which is hosted in the GlyGen database. In addition to the user-friendly interface, the generated GlyGen BCOs can be viewed under one domain and searched by BCO name, contributors or BCO ID. Besides maintaining a standardized format, the BCO Portal significantly reduces the manual effort and time that GlyGen data managers would otherwise need to generate a single BCO.
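The search described above can be sketched in a few lines of Python. This is only an illustration, not the BCO Portal's implementation: the function name `search_bcos` and the sample records are invented, and the field names (`object_id`, `provenance_domain.name`, `provenance_domain.contributors[].name`) reflect one reading of the IEEE 2791-2020 schema that should be checked against the specification.

```python
from typing import Any


def search_bcos(bcos: list, term: str) -> list:
    """Return the BCOs whose ID, name or contributor names contain `term`.

    Matching is a case-insensitive substring test. The field names are an
    assumption based on the IEEE 2791-2020 schema, not taken from the Portal.
    """
    needle = term.lower()
    hits = []
    for bco in bcos:
        provenance: dict[str, Any] = bco.get("provenance_domain", {})
        fields = [bco.get("object_id", ""), provenance.get("name", "")]
        fields += [c.get("name", "") for c in provenance.get("contributors", [])]
        if any(needle in field.lower() for field in fields):
            hits.append(bco)
    return hits


# Invented sample records, for illustration only.
SAMPLE_BCOS = [
    {
        "object_id": "BCO_000001",
        "provenance_domain": {
            "name": "Glycan dataset",
            "contributors": [{"name": "A. Curator"}],
        },
    },
    {
        "object_id": "BCO_000002",
        "provenance_domain": {
            "name": "Protein dataset",
            "contributors": [{"name": "B. Manager"}],
        },
    },
]
```

Calling `search_bcos(SAMPLE_BCOS, "glycan")` returns only the first record, while a shared ID fragment such as `"bco_0000"` matches both.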

Computation

The HIVE platform team uses BCOs to communicate internal testing protocols. BCOs have been developed using the HIVE BCO App and the BCO Portal to record routine testing pipelines that evaluate the functionality of novel and third-party tools within an instance of HIVE hosted on Amazon Web Services (28). Further testing computations and BCO details can be found in Table 2.

Table 2.

BCOs recording HIVE platform testing protocol

Tool(s)	BCO(s)	Computation	Inputs
HIVE-Hexagon	BioCompute Portal: BCO_026619	HIVE Object ID: 3249	Read (paired-end) Name: SRR1004397_1 SRR1004397_2
	GitHub: HIVE-hexagon Test Computation_BCO_026619		HIVE Object ID: 729, 731 Reference Name: GrCH38.2 MARCH14–2016
HIVE-Heptagon	BioCompute Portal: BCO_023769	HIVE Object ID: 3260	Alignment: SRR1004397_1 against GrCH38.2 MARCH14–2016
	GitHub: HIVE-heptagon_Test_Computation_BCO_023769		HIVE Object ID: 3249
CensuScope	BioCompute Portal: BCO_015623	HIVE Object ID: 3242	Read: mgm4461125.3.050.upload.fna 219458
	GitHub: HIVE-CensuScope Test Computation_BCO_015623		HIVE Object ID: 2218 Reference: filtered_nt_July_2018 HIVE Object ID: 2242

Testing Pipeline #1 evaluates Hexagon (29), a sequence alignment tool that allows the user to align reads from a high-throughput experiment to a reference genome. Hexagon was used to align human DNA samples from Whole Exome Sequencing of lung squamous carcinoma (SQCC) patients against human reference genome GRCh38. BCOs were created using the BCO Portal and the HIVE BCO App. These BCOs are freely available in the BCOs GitHub repository (https://github.com/biocompute-objects).
Testing Pipeline #2 evaluates Heptagon (30), a tool that performs base and SNP-calling for a previously computed alignment and provides quality and noise assessment profiles. Heptagon was used to identify SNPs from the previous Hexagon alignment of Whole Exome Sequencing of lung SQCC patients against human reference genome GRCh38. BCOs were created using the BCO Portal and the HIVE BCO App. These BCOs are freely available in the BCOs GitHub repository (https://github.com/biocompute-objects).
Testing Pipeline #3 evaluates CensuScope (31), a tool designed and optimized for the quick detection of the components of a given NGS metagenomic dataset, providing users with a species-level composition of a given sample. CensuScope was used to map a human gut microbiome sample (sourced from MG-RAST) against FilteredNT to view the sample’s taxonomic composition. BCOs were created using the BCO Portal and HIVE BCO App. These BCOs are freely available in the BCOs GitHub repository (https://github.com/biocompute-objects).

Regulatory submission

The 2019 BCO Proof of Concept project (32) began as a collaboration between GW, FDA and DDL to investigate how a BCO would supplement the submission of a Phase II, randomized, open-label clinical trial that evaluated the efficacy and safety of a combination of HCV1a drugs. The project objective was to replicate a clinical trial submission with mock clinical data from the FDA, in order to confirm whether a BCO facilitates the regulatory submission process by investigating potential discrepancies between data analysis pipelines. The DDL Athena NGS pipeline has been used to test more than 20 000 samples from clinical trials involving hepatitis C virus (HCV), hepatitis B virus (HBV), cytomegalovirus (CMV), respiratory syncytial virus (RSV) and SARS-CoV-2 (33). While there are clear guidelines on how to report NGS data to the FDA, there is no standard for describing the computational workflow used during the data analysis. A BCO would not only help sponsors communicate clearly with regulatory agencies but would also help demonstrate the quality of the sequencing results. Additionally, with a BCO, sponsors such as DDL can generate the necessary submission documentation faster and therefore reduce internal costs. Two separate analyses were executed: one to simulate a pharmaceutical submission to the FDA and another to simulate the FDA review. BCOs were generated from both analyses to communicate the process and to compare results.

Discussion

This paper introduces four novel tools for generating BCOs: BCO Portal, HIVE BCO App, CGC BCO App and Galaxy BCO API Extension. The stand-alone BCO Portal supports multi-platform workflows and provides a universal method for BCO creation and storage. The tools used in the context of a platform (CGC, HIVE and Galaxy) are designed to semi-automate the process of generating a BCO by extracting pertinent information from workflows native to the specific platform. These platform-specific tools generate a formatted BCO JSON object by extracting pipeline steps, platform information, data locations and parameters, while allowing a user to manually enter any provenance and metadata information that was not extracted. Because interoperability is a key feature of BioCompute (34), these four BioCompute tools were developed with the capability to ingest and store the same BCO. It is envisioned that other platforms may also integrate support for the standard, enabling researchers to collaborate more easily across environments, or to communicate workflows to a central authority like the FDA or to a publisher.
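As a rough illustration of this semi-automated extraction, the sketch below maps a platform workflow record onto BCO-style domains. It does not reproduce any of the four tools: the `workflow` record layout and the function name `workflow_to_bco` are hypothetical, the domain and field names follow one reading of the IEEE 2791-2020 schema, and required top-level fields such as the object identifier are deliberately left out.

```python
import json
from typing import Any, Optional


def workflow_to_bco(workflow: dict,
                    provenance: Optional[dict] = None) -> dict:
    """Build a partial BCO-style dict from a hypothetical workflow record.

    `workflow` is assumed to hold 'platform', 'steps', 'inputs' and 'outputs';
    each step holds a 'tool' name and optional 'parameters'.
    """
    steps = workflow["steps"]
    bco: dict[str, Any] = {
        # Provenance is supplied by the user when it cannot be extracted.
        "provenance_domain": provenance or {},
        "description_domain": {
            "platform": [workflow["platform"]],
            "pipeline_steps": [
                {"step_number": number, "name": step["tool"]}
                for number, step in enumerate(steps, start=1)
            ],
        },
        # One entry per parameter, tagged with the step it belongs to.
        "parametric_domain": [
            {"step": str(number), "param": key, "value": str(value)}
            for number, step in enumerate(steps, start=1)
            for key, value in step.get("parameters", {}).items()
        ],
        "io_domain": {
            "input_subdomain": [{"uri": {"uri": u}} for u in workflow["inputs"]],
            "output_subdomain": [{"uri": {"uri": u}} for u in workflow["outputs"]],
        },
    }
    return bco


if __name__ == "__main__":
    # Invented workflow record; names and URIs are placeholders.
    example = {
        "platform": "ExamplePlatform",
        "steps": [
            {"tool": "aligner", "parameters": {"seed": 14}},
            {"tool": "variant-caller"},
        ],
        "inputs": ["example://reads"],
        "outputs": ["example://variants"],
    }
    print(json.dumps(workflow_to_bco(example, {"name": "Demo BCO"}), indent=2))
```

Keeping the provenance argument separate mirrors the split described above between what a platform can extract and what a user must still enter by hand.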

To evaluate the BCO Portal and CGC BCO App, bioinformaticians at the George Washington University built and curated BCOs from published workflows using the BCO creation tools introduced in this paper, and the completed BCOs were subsequently submitted to the precisionFDA BCO Challenge (21). Prior to pFDA challenge submission, each BCO was submitted for review to a BioCompute technical assistant in the GitHub repository, allowing rapid feedback from the BCO reviewer through the built-in issue-tracking system, and leveraging reviewer metadata intrinsic to the standard. This review process simulated a real quality and integrity review and established an official reviewer who was recorded in the BCO provenance domain. This user evaluation of the BioCompute tools via a training module served two purposes: indicating the usability of the tools and furthering adoption initiatives.
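Recording a reviewer in the provenance domain could look like the minimal sketch below. The helper `add_review` is hypothetical, and the `review` entry layout, the status values and the `curatedBy` contribution label are assumptions drawn from one reading of the IEEE 2791-2020 schema rather than from the review workflow used in this study.

```python
import copy
from datetime import datetime, timezone
from typing import Optional

# Status values assumed from the IEEE 2791-2020 schema; verify before use.
ALLOWED_STATUSES = {"unreviewed", "in-review", "approved", "rejected", "suspended"}


def add_review(bco: dict, reviewer_name: str, status: str,
               comment: str = "", when: Optional[datetime] = None) -> dict:
    """Return a copy of `bco` with one review entry appended to its provenance."""
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"unknown review status: {status!r}")
    updated = copy.deepcopy(bco)  # leave the caller's BCO untouched
    entry = {
        "status": status,
        "reviewer_comment": comment,
        "date": (when or datetime.now(timezone.utc)).isoformat(),
        "reviewer": {"name": reviewer_name, "contribution": ["curatedBy"]},
    }
    provenance = updated.setdefault("provenance_domain", {})
    provenance.setdefault("review", []).append(entry)
    return updated
```

Returning a modified copy rather than editing in place keeps the pre-review BCO available for comparison, which suits a review loop run through an issue tracker.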

The primary purpose of user evaluation was to indicate the usability of the tools and the intelligibility of the BioCompute standard itself; this evaluation highlighted potential challenges within the BCO creation process. These challenges led to further development in the form of tool bug fixes and the introduction of new features. Assessing the tools in this manner engaged users and benefitted tool developers by identifying specific areas in which their tools can be improved.

User evaluation also introduced novice bioinformaticians to the process of testing recently developed tools and further developing an emerging standard. Building BCOs from published work provided users with: (i) exposure to collaborative workflows and the process of building bioinformatics pipelines, (ii) hands-on experience participating in a review process of a public repository, (iii) exposure to tool development as testers and (iv) a portfolio item in the form of a pFDA challenge submission. Users had the opportunity to work with biotechnology professionals active in the development of BioCompute, which gave them a greater understanding of the interaction between academic, industry and government institutions. Learning to navigate new code bases (bugs) by engaging directly with developers is a learning experience to which most novice bioinformaticians are not exposed. Users who reported and discussed challenges in generating BCOs developed a strong understanding of both the tools and the standard. These novice non-informatician biologists ultimately produced actionable feedback as participants in the testing and development of the tools and training materials that will further enhance the BCO Specification and likely accelerate the acceptance of the BCO standards.

FAIR compliance

As BCOs are compliant with FAIR principles, the specification and schema contain features designed to make the encapsulated software, datasets and workflows findable, accessible, interoperable and reusable. Each BCO provides execution data with the corresponding scripts and script drivers necessary for workflow reproducibility, and the requirements for accessing data locations are transparent.

In addition to the adherence to FAIR data standards, the BioCompute Framework aligns with USFDA guidelines for Database Procedures and Operations (35); it enables transparency and public accessibility of data sources and standard operating procedures, in addition to ensuring secure version control.

Database applications

Beyond bioinformatics analysis, the BioCompute framework has successfully been applied to knowledge base data integration. Over the last two years, GlyGen (36), an NIH-funded glycoinformatics project, has generated BCOs for over 200 individual datasets (https://data.glygen.org/). Each individual BCO not only provides complete transparency of its data integration process to its authors, contributors and users but also includes detailed information of its data usability, data modification, versioning, keywords and quality control pipelines. Using the I/O (Input/Output) and execution domains, GlyGen provides the input, output (validated and failed) and script files to allow easy reproducibility and replicability by its users. Through BCO’s predefined fields and rules, GlyGen is able to document different data-specific workflows in a standardized format effectively. The generated BCOs are freely accessible for browsing and downloading through the GlyGen data portal (https://data.glygen.org/) under license CC BY 4.0. Similar BCOs also are available for the OncoMX knowledge base (https://data.oncomx.org/) (37).

In summary, in addition to the evaluation of the BCO framework, the process has been an effective method for evaluating BCO creation tools (CGC BCO App and BCO Portal) and training users to be resourceful in tool development. The BioCompute tools this paper presents make it easier to create tools for an emerging standard and are available prior to release. A preliminary review of the feedback provided identified potential changes to the BioCompute Specification Document, additions to the CGC BCO App training materials and BCO Portal modifications. Future tool releases will have increased usability due to tool enhancements and documentation revisions recommended by users.

Future applications

Future work will build on these tools, for example by building databases and repositories of validated BCOs, integrating them into relevant government and academic systems and working with private sector participants to help integrate the standard into their existing platforms, based on the work presented here, to expedite communication. The BCO-based system could evolve to become a formalized mechanism of communication, for example through the Drug Master File or as its own section in an application, to government agencies like the FDA, USPTO (United States Patent and Trademark Office), CMS (Centers for Medicare & Medicaid Services), CDC (Centers for Disease Control and Prevention), EPA (Environmental Protection Agency) and others.

Conclusion

Emerging data analysis challenges include increasing dataset size and complexity: datasets cannot be practically copied for analysis because of slow transfer rates, archival maintenance, privacy concerns and data ownership restrictions. As datasets grow very large, there is a growing interest in bringing computations to data rather than the other way around, such as through cloud service providers partnering with the STRIDES (Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability) program. Consequently, there is a need to document NGS workflow analyses, including NGS data provenance, across a range of computational environments, including computational platforms that support genomic analysis, with the goal of ensuring that these analyses can be replicated in a variety of environments. We further posit that detailed NGS documentation requires a clear communication of NGS analysis (workflows and data usage) that is both human and machine readable. The BioCompute standard, IEEE 2791-2020, aims to fill this need to clearly communicate NGS analysis workflows and has the potential to accelerate research (34). BCO tools allow researchers to create BCOs that adhere to the community-developed BioCompute Specification to encode pertinent information to record data provenance, facilitate regulatory review and improve reproducibility of results, and they allow them to do so quickly and easily, without needing to learn the standard.

Supplementary data

Supplementary data are available at Database Online.

Acknowledgements

We would like to thank Holly Stephens and Sean Watford from Booz Allen Hamilton for their participation in the Bioinformatics unit that piloted the methods described in this manuscript.

Funding

This project was supported in part by funds from the NIH National Cancer Institute (awards HHSN261201400008C and HHSN261201500003I for Cancer Genomics Cloud and for development of the CGC BCO App), U.S. Food and Drug Administration (75F40119C10136 and HHSF223201510129C to RM), U.S. National Institutes of Health, Glycoscience Common Fund (1U01GM125267 to RM) and U.S. National Institutes of Health, National Cancer Institute (CA215010 to RM).

Disclaimer

This paper is an informal communication between FDA employees and a collaborator. It represents their own best judgment. This article does not bind or obligate FDA nor does it express any opinions of the Agency.

References

1. Simonyan, Chumakov, Dingerdissen et al. (2016) High-performance integrated virtual environment (HIVE): a robust infrastructure for next-generation sequence data analysis. Database (Oxford), 2016.

2. Simonyan and Mazumder (2014) High-Performance Integrated Virtual Environment (HIVE) Tools and Applications for Big Data Analysis. Genes (Basel), 957–981.

3. Lau J.W., Lehnert, Sethi et al. (2017) The cancer genomics cloud: collaborative, reproducible, and democratized – a new paradigm in large-scale computational research. Cancer Res.

4. Jalili, Afgan et al. (2020) The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2020 update. Nucleic Acids Res., W395–W402.

5. Genomic Knowledge Standards. (2017) https://www.ga4gh.org/work_stream/genomic-knowledge-standards/ (31 December 2020, date last accessed).

6. Watkins, Rynearson, Henrie et al. (2019) Implementing the VMC specification to reduce ambiguity in genomic variant representation. AMIA Annu. Symp. Proc., 2019, 1226–1235.

7. Hl7. (2014) FHIR Specification FHIR v0.0.82. http://hl7.org/fhir/ (31 December 2020, date last accessed).

8. Amstutz, Chapman, Chilton et al. (2016) Common Workflow Language, v1.0 Common Workflow Language (CWL) Command Line Tool Description, v1.0.

9. OpenWDL. (2018) Workflow Description Language. https://openwdl.org/#three (31 December 2020, date last accessed).

10. Koster and Rahmann (2012) Snakemake—a scalable bioinformatics workflow engine. Bioinformatics, 2520–2522.

11. Seqera Labs. (2020) Nextflow - A DSL for Parallel and Scalable Computational Pipelines. https://www.nextflow.io/ (31 December 2020, date last accessed).

12. Carragáin E.Ó., Goble, Sefton et al. (2019) A lightweight approach to research object data packaging.

13. Kanwal, Khan F.Z., Lonie et al. (2017) Investigating reproducibility and tracking provenance – a genomic workflow case study. BMC Bioinform., 337.

14. IEEE 2791–2020 - IEEE Standard for Bioinformatics Analyses Generated by High-Throughput Sequencing (HTS) to Facilitate Communication. (2020) https://standards.ieee.org/standard/2791-2020.html (31 December 2020, date last accessed).

15. Simonyan, Goecks and Mazumder (2017) Biocompute Objects-A Step towards Evaluation and Validation of Biomedical Scientific Computations. PDA J. Pharm. Sci. Technol, 136–146.

16. BCO_Specification. (2018) Repository for Support of the IEEE 2791–2020 Standard. https://github.com/biocompute-objects/BCO_Specification (6 May 2020, date last accessed).

17. Pezoa, Reutter J.L., Suarez et al. (2016) Foundations of JSON schema. In: 25th International World Wide Web Conference, WWW 2016. International World Wide Web Conferences Steering Committee, Montréal Québec, Canada, pp. 263–273.

18. Federal Register. (2020) Electronic Submissions; Data Standards; Support for the International Institute of Electrical and Electronics Engineers Bioinformatics Computations and Analyses Standard for Bioinformatic Workflows. https://www.federalregister.gov/documents/2020/07/22/2020-15771/electronic-submissions-data-standards-support-for-the-international-institute-of-electrical-and (8 January 2021, date last accessed).

19. Xiao, Koc, Roberson et al. (2020) BCO app: tools for generating BioCompute Objects from next-generation sequencing workflows and computations. F1000Research, 1144.

20. Hornik (2012) The comprehensive R archive network. Wiley Interdiscip. Rev. Comput. Stat., 394–398.

21. Stephens S.H., King C.H., Watford et al. (2020) Strengthening the BioCompute standard by crowdsourcing on PrecisionFDA. bioRxiv, 2020.11.02.365528.

22. Wilkinson M.D., Dumontier, Aalbersberg I.J. et al. (2016) Comment: the FAIR guiding principles for scientific data management and stewardship. Sci. Data, 160018.

23. Afgan, Baker, Batut et al. (2018) The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update. Nucleic Acids Res., W537–W544.

24. Grüning, Chilton, Köster et al. (2018) Practical computational reproducibility in the life sciences. Cell Syst., 631–635.

25. Sloggett, Goonasekera and Afgan (2013) BioBlend: automating pipeline analyses within Galaxy and CloudMan. Bioinformatics, 1685–1686.

26. Dingerdissen H.M., Torcivia-Rodriguez et al. (2018) BioMuta and BioXpress: mutation and expression knowledgebases for cancer biomarker discovery. Nucleic Acids Res, D1128–D1136.

27. Martin (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.J., 10.

28. Amazon. (2015) About AWS. https://aws.amazon.com/about-aws/ (15 October 2019, date last accessed).

29. Santana-Quintero, Dingerdissen, Thierry-Mieg et al. (2014) HIVE-hexagon: high-performance, parallelized sequence alignment for next-generation sequencing data analysis. PLoS One, e99033. doi: 10.1371/journal.pone.0099033

30. Simonyan, Chumakov, Donaldson et al. (2017) HIVE-heptagon: a sensible variant-calling algorithm with post-alignment quality controls. Genomics, 109, 131–140.

31. Shamsaddini, Pan, Johnson W.E. et al. (2014) Census-based rapid and accurate metagenome taxonomic profiling. BMC Genomics, 918. doi: 10.1186/1471-2164-15-918

32. Hadley, King, Keeney et al. (2020) Communicating regulatory high throughput sequencing data using BioCompute Objects disclaimer. bioRxiv, 2020.12.07.415059.

33. Diagnostic Laboratory (DDL). (2016) Bioinformatics - DDL Diagnostic Laboratory. https://www.ddl.nl/bio-informatics/#athena-virology-pipeline (8 January 2021, date last accessed).

34. Alterovitz, Dean, Goble et al. (2018) Enabling precision medicine via standard communication of HTS provenance, analysis, and results. PLoS Biol., e3000099.

35. FDA. Use of public human genetic variant databases to support clinical validity for genetic and genomic-based in vitro diagnostics. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/use-public-human-genetic-variant-databases-support-clinical-validity-genetic-and-genomic-based-vitro (15 October 2019, date last accessed).

36. York W.S., Mazumder, Ranzinger et al. (2019) GlyGen: computational and informatics resources for glycoscience. Glycobiology.

37. Dingerdissen H.M., Bastian, Vijay-Shanker et al. (2020) OncoMX: a knowledgebase for exploring cancer biomarkers in the context of related cancer and healthy data. JCO Clin. Cancer Inform., 210–220.