A combinatorial approach implementing new database structures to facilitate practical data curation management of QTL, association, correlation and heritability data on trait variants

This scheme helped relieve curation and data management burdens caused by long and unwieldy lists of ‘sibling traits’. In addition to these modifiers and qualifiers as controlled vocabularies, we also have a free-text field to allow additional descriptions when the modifier/qualifier does not precisely cover the scenario. The data collected with this free-text field will be used to improve the controlled list of modifiers/qualifiers.
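As an illustration only, the interplay between the controlled vocabularies and the free-text fallback can be sketched as below. The dictionary holds a small excerpt of the modifiers and qualifiers from Table 1; the function name `check_annotation` and its return convention are assumptions made for this example and do not describe the actual curation tool.

```python
# Hypothetical sketch: an excerpt of the Table 1 vocabulary (not the full list).
CONTROLLED_VOCABULARY = {
    "Analysis": {"adjusted", "calculated", "estimated"},
    "Environment": {"challenge", "confinement", "stress"},
    "Kinship": {"dam", "daughter", "maternal", "paternal", "sire"},
    "Parity": {"count"},
}


def check_annotation(modifier, qualifier, free_text=""):
    """Return (is_controlled, note) for one curated modifier/qualifier pair.

    A qualifier found in the controlled list is accepted as-is. Otherwise the
    curator must supply a free-text description, which is kept so that it can
    later inform extensions of the controlled list.
    """
    allowed = CONTROLLED_VOCABULARY.get(modifier)
    if allowed is None:
        raise ValueError(f"unknown modifier: {modifier!r}")
    if qualifier.lower() in allowed:
        return True, free_text
    if free_text.strip():
        return False, free_text.strip()
    raise ValueError(
        f"{qualifier!r} is not a controlled qualifier for {modifier!r}; "
        "a free-text description is required"
    )
```

In this sketch, entries that fall back on free text are flagged rather than rejected, which mirrors the idea of collecting such cases to refine the vocabulary over time.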

Table 1.

‘Modifiers’ and ‘qualifiers’ used in the implementation of a new trait variant management system, where trait variants are curated at the experiment level

	Modifiers	Qualifiers
1	Analysis	Adjusted, calculated and estimated
2	Anatomy location	Above, anterior, at, below, by, dorsal, in, of, on and posterior
3	Environment	Challenge, confinement and stress
4	Population	Calves, cows, ewes, heifers, layers and sows
5	Instrument	Manufacturer, name and type
6	Measurement	Amount, area, character, color, composition, count, length, maximum, response, speed and weight
7	Parity	Count
8	Kinship	Dam, daughter, maternal, paternal and sire
9	Stage	Adult, end, feeder, finisher, gestation, lactation, nursing, parturition, start, weaning and yearling
10	Time	After, age, at, basis, before, by, duration and weight
11	Treatment	Challenge, drug, fast, feed, freeze, thaw and trim

Figure 3.

A screenshot of a curation web form showing part of the experiment curation environment. It shows how this implementation allows trait variants to be created from their base traits using controlled vocabulary lists to define modifiers/qualifiers.

Multiple modifiers

With this new management scheme at the experiment level, we can keep the number of controlled vocabulary terms for modifiers and qualifiers to a minimum, which facilitates the consistent use of terms over time. Currently, the system can accommodate up to three modifiers attached to a base trait, which covers most, if not all, trait variants we encounter. An example of a trait variant with multiple modifiers is ‘drip loss in pectoralis muscle at 24-hr post-mortem’. If more than one modifier is required to define a trait variant, the new curator tool has a mechanism to denote the relationships between modifiers. For instance, in the example mentioned earlier, the anatomy location (‘in pectoralis muscle’) and time (‘at 24-hr post-mortem’) modifiers depend on each other to fully describe the trait; we consider these modifiers linked, or ‘chained’. On the other hand, body weight at weaning could be described either by the stage (weaning) or by the age at which it occurs (e.g. 21 days). In this case, ‘at 21 days’ and ‘at weaning’ are independent modifiers (alternatives, or ‘in parallel’).
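The distinction between ‘chained’ and ‘parallel’ modifiers can be made concrete with a minimal data-structure sketch. The class and field names (`Modifier`, `TraitVariant`, `Relation`, `names`) are hypothetical and chosen for illustration; only the limit of three modifiers and the two example traits come from the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List

MAX_MODIFIERS = 3  # limit stated in the text


class Relation(Enum):
    CHAINED = "chained"    # modifiers jointly describe one trait variant
    PARALLEL = "parallel"  # modifiers are alternative descriptions


@dataclass(frozen=True)
class Modifier:
    category: str  # a Table 1 modifier, e.g. "Time"
    phrase: str    # the qualifier plus detail, e.g. "at 24-hr post-mortem"


@dataclass
class TraitVariant:
    base_trait: str
    modifiers: List[Modifier]
    relation: Relation = Relation.CHAINED

    def __post_init__(self) -> None:
        if not 1 <= len(self.modifiers) <= MAX_MODIFIERS:
            raise ValueError(f"expected 1 to {MAX_MODIFIERS} modifiers")

    def names(self) -> List[str]:
        """Render the variant name(s) implied by the modifier relationship."""
        phrases = [m.phrase for m in self.modifiers]
        if self.relation is Relation.CHAINED:
            # Chained modifiers are read together as a single name.
            return [" ".join([self.base_trait] + phrases)]
        # Parallel modifiers each yield an alternative name.
        return [f"{self.base_trait} {p}" for p in phrases]


drip_loss = TraitVariant(
    "drip loss",
    [Modifier("Anatomy location", "in pectoralis muscle"),
     Modifier("Time", "at 24-hr post-mortem")],
)
weaning_weight = TraitVariant(
    "body weight",
    [Modifier("Stage", "at weaning"), Modifier("Time", "at 21 days")],
    Relation.PARALLEL,
)
```

Here `drip_loss.names()` yields one combined name, whereas `weaning_weight.names()` yields two alternatives. A single relation per variant cannot express mixed chained and parallel relationships; this sketch makes no attempt to cover that case.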

Improvements

As part of the database transition to the new curation scheme described earlier, we have begun migrating all ‘sibling trait’ data curated in previous years to the ‘trait variant’ scheme under the new structure in both the QTLdb and CorrDB. During this transition, a total of 1256 new trait variants have been created for 278 base traits. The new trait variants include 418 for QTL/associations, 425 for correlations and 413 for heritability. This process has affected 22 215 curated data, including 16 227 QTL/associations, 5573 correlations and 415 heritability data (Table 2). As a result of these changes, we have effectively reduced the number of extended trait data managed within the database trait ontology structure by an average of 71.5% for QTL/association/correlation/heritability data in both the QTLdb and CorrDB (Table 3). These results reflect a significant positive impact on the QTLdb and CorrDB, in terms of not only providing a simpler structure for trait concepts but also helping to standardize the curation protocols and setting a sustainable stage for future database developments.
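The overall counts quoted in this paragraph are sums over the three data types in the ‘Total’ column of Table 2. A short sketch of that arithmetic follows; the dictionary keys and the helper name `column_sum` are illustrative only.

```python
# 'Total' column of Table 2, per data type (values copied from the table).
TABLE2_TOTALS = {
    "QTL/association": {"bt_with_variants": 93, "new_variants": 418, "annotated": 16227},
    "Correlation": {"bt_with_variants": 91, "new_variants": 425, "annotated": 5573},
    "Heritability": {"bt_with_variants": 94, "new_variants": 413, "annotated": 415},
}


def column_sum(key):
    """Sum one Table 2 row type across the three data types."""
    return sum(row[key] for row in TABLE2_TOTALS.values())


if __name__ == "__main__":
    for key in ("bt_with_variants", "new_variants", "annotated"):
        print(key, column_sum(key))
```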

Table 2.

The number of experiments and annotated data affected in the QTLdb and CorrDB due to trait management changes from ‘sibling traits’ to ‘trait variants’ in 2022

Data	Affected data types	Cattle	Chicken	Goat	Horse	Pig	Rainbow trout	Sheep	Total
QTL/association	Total base traits (BT)	678	370	25	65	692	28	265	2123
	BT with variants	28	10	2	1	33	6	13	93
	New trait variants	112	114	4	1	110	6	71	418
	Experiments affected	123	342	2	1	93	1	39	625
	Annotated data affected	10 010	648	10	16	4906	174	463	16 227
Correlation	Total BTs	373	106	33	36	252		76	876
	BT with variants	42	13		1	17		18	91
	New trait variants	181	52		1	101		90	425
	Experiments affected	40	21		1	18		22	102
	Annotated data affected	1392	135		10	3143		893	5573
Heritability	Total BTs	395	112	2	53	285		96	943
	BT with variants	43	13	1	1	18		18	94
	New trait variants	170	52	3	1	97		90	413
	Experiments affected	45	19	1	1	19		22	107
	Annotated data affected	163	9	3	1	203		36	415

Table 3.

Total number of trait changes due to the database transition from using ‘sibling traits’ to ‘trait variants’ in 2022

	Sibling traits	Trait variants	Change (%)
QTL/association	2272	418	−81.6
Correlation	902	425	−52.9
Heritability	1061	413	−79.9
Average			−71.5

The successful migration of ‘sibling traits’ to ‘trait variants’ in a relatively short period of time demonstrates that the new data management implementation works well as designed. Furthermore, this implementation has also significantly reduced many of the frustrations of our data curators, as well as database maintainers, regarding the day-to-day work dealing with emerging cases when curating ‘sibling traits’. Allowing trait variants to be curated at the experiment level gives curators the flexibility to address them on a case-by-case basis and helps reduce clutter in the database trait hierarchy while maintaining data stringency at the database level.

From a database management perspective, this work added ‘trait variants’ as an extension to trait ontology terms (‘base traits’), which separates the management of trait variants from the handling of the trait ontology hierarchy (Figure 2b). The addition of MySQL tables in the current implementation (Figure 2a versus Figure 2b) facilitated trait data partitioning, compartmentalization, relationship building and other logistics. To accommodate the data structural changes, web interface tools have been created or updated to facilitate the trait variant curation, integrity checking and data display/download. Overall, these database changes have helped simplify the manual curation of trait nomenclature information, while simultaneously capturing the complexity of published traits.
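To illustrate how separate tables can keep trait variants outside the trait ontology hierarchy, a minimal relational sketch is given below. It uses Python's built-in `sqlite3` module as a stand-in for MySQL, and every table and column name (`base_trait`, `trait_variant`, `trait_variant_modifier` and so on) is an assumption made for this example; the actual schema is the one shown in Figure 2b.

```python
import sqlite3

# Hypothetical schema: base traits stay in their own table, while variants
# and their modifiers are stored separately and scoped to an experiment.
SCHEMA = """
CREATE TABLE base_trait (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE trait_variant (
    id            INTEGER PRIMARY KEY,
    base_trait_id INTEGER NOT NULL REFERENCES base_trait(id),
    experiment_id INTEGER NOT NULL,
    reported_name TEXT
);
CREATE TABLE trait_variant_modifier (
    variant_id INTEGER NOT NULL REFERENCES trait_variant(id),
    position   INTEGER NOT NULL CHECK (position BETWEEN 1 AND 3),
    modifier   TEXT NOT NULL,
    phrase     TEXT NOT NULL,
    PRIMARY KEY (variant_id, position)
);
"""


def build_demo():
    """Create an in-memory database holding one example trait variant."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO base_trait (id, name) VALUES (1, 'drip loss')")
    conn.execute(
        "INSERT INTO trait_variant (id, base_trait_id, experiment_id, reported_name) "
        "VALUES (1, 1, 101, 'drip loss in pectoralis muscle at 24-hr post-mortem')"
    )
    conn.executemany(
        "INSERT INTO trait_variant_modifier VALUES (?, ?, ?, ?)",
        [(1, 1, "Anatomy location", "in pectoralis muscle"),
         (1, 2, "Time", "at 24-hr post-mortem")],
    )
    return conn


def variant_names(conn, base_trait):
    """Assemble variant names for a base trait from its stored modifiers."""
    rows = conn.execute(
        """
        SELECT v.id, b.name, m.phrase
        FROM trait_variant v
        JOIN base_trait b ON b.id = v.base_trait_id
        JOIN trait_variant_modifier m ON m.variant_id = v.id
        WHERE b.name = ?
        ORDER BY v.id, m.position
        """,
        (base_trait,),
    ).fetchall()
    parts_by_variant = {}
    for variant_id, name, phrase in rows:
        parts_by_variant.setdefault(variant_id, [name]).append(phrase)
    return [" ".join(parts) for parts in parts_by_variant.values()]
```

In this layout the `base_trait` table is never altered when a new variant is curated, which is the separation of concerns the paragraph above describes.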

Appended to base traits, trait variant information is valuable to facilitate data comparisons for end users evaluating data across time and experiments. At the time of this report, we are in the process of making the newly produced trait variant information available in data downloads and web tools. These data will be visible to the public by the April 2023 database release.

Discussion

The sheer volume of newly published data creates challenges for Animal QTLdb and CorrDB curation, and curation/database processes must also be adapted to accommodate different data formats, new analysis methods and varying levels of trait data granularity. In contrast to our earlier ‘sibling traits’ system, which attempted to add trait variations into a trait ontology and presented extra challenges for ontology development, our method of developing ‘trait variants’ as extensions of ontology terms (‘base traits’) helps isolate complex trait handling outside of trait ontology development. While the concept partitioning method is effective in simplifying the management of complex trait information, we wish to point out that the level of granularity captured needs careful consideration in order to maximize the overall benefits. For example, the need to consider how traits are defined in multiple animal species further increases the level of complexity.

Gkoutos et al. (9) demonstrated the use of a decomposition strategy to dissect the terms in the Human Phenotype Ontology into their entity/quality properties using the Phenotype and Trait Ontology. While this was effective in their work using human medical data, it is obvious that more factors are needed for the accurate dissemination of trait information in livestock animals. Our approach using modifiers/qualifiers demonstrates the possibility of partitioning complex traits using additional trait descriptor information and provides a better structure for the curation management of trait details.

Our approach has effectively helped reduce the lengthy list of ‘compound modifiers’, which were impractical to use. (In our previous ‘sibling trait’ management system, trait modifiers were almost developed into a separate ‘ontology’ structure.) While the modifier factor partitioning approach provides possibilities for a more scalable system, it also opens additional opportunities for complex trait curation and management. For example, while we have implemented mechanisms to handle ‘chained’ or ‘parallel’ modifiers, more complex modifier relationships (such as mixed ‘chained’ and ‘parallel’ modifiers) may exist which require solutions in the near future. This is one area in which the current system is still subject to further development to refine the details.

Note that on the trait variant curation form, there is a free-text field (Figure 3) to collect the trait name reported in a publication. This serves to link real-world trait terms used by researchers and/or producers to ontology terms via the trait variant structure and is useful from a data comparison perspective.

Trait ontology development is an ongoing process, and it is expected that the trait variant system will also need to be expanded or updated in the future. It is important to carefully consider the details regarding the implementation of the trait variant system to ensure its ongoing stability and viability. For instance, it is necessary to appropriately distinguish ‘base traits’ and ‘trait variants’. As an example, since 305-day MY is such a widely used measurement standard for bovine dairy production, people may consider it to be synonymous with MY, but there are several other potential modifiers that may apply to the base ‘MY’ trait. In cases like these, there are multiple factors to consider before determining the most appropriate base trait.

Since trait variants are now created and managed at the experiment level, each trait variant must be re-created for every experiment in which it is used. This will be simplified once the patterns of common complex trait partitions/compositions are established. Until then, curators need to be familiar with the commonly reused complex traits or to refer to the established trait variant list (https://www.animalgenome.org/QTLdb/doc/meta/tvarinfo). This could be a steep learning curve for new curators, necessitating further improvements to the trait curation environment. One possibility is the implementation of an artificial intelligence helper to suggest trait variants and make them easier to introduce. Overall, these changes have not only provided a workable solution for curating complex traits but also given opportunities for further improvements with better-structured data that are more accessible programmatically.
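As a very rough illustration of what a suggestion helper might do, the sketch below proposes established trait variants that resemble a trait name reported in a publication. It relies on plain string similarity from Python's standard `difflib` module as a stand-in and is not an artificial intelligence method; the list `ESTABLISHED_VARIANTS` and the function `suggest_variants` are hypothetical, with example names taken from the text.

```python
import difflib

# Hypothetical excerpt of an established trait variant list.
ESTABLISHED_VARIANTS = [
    "body weight at weaning",
    "body weight at 21 days",
    "drip loss in pectoralis muscle at 24-hr post-mortem",
]


def suggest_variants(reported_name, known=ESTABLISHED_VARIANTS, limit=3):
    """Return up to `limit` known variants similar to a reported trait name.

    Matching is case-insensitive; results are ordered best match first.
    """
    by_lower = {name.lower(): name for name in known}
    hits = difflib.get_close_matches(
        reported_name.lower(), list(by_lower), n=limit, cutoff=0.5
    )
    return [by_lower[hit] for hit in hits]
```

A curator-facing tool could show such suggestions next to the free-text trait name field, leaving the final choice of base trait and modifiers to the curator.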

Data availability

The database contents and tools are all freely available online. QTLdb: https://www.animalgenome.org/QTLdb/; CorrDB: https://www.animalgenome.org/CorrDB/. In addition, the data are also available upon release at several data alliance partner websites, including NCBI: http://www.ncbi.nlm.nih.gov/gene; Ensembl: http://www.ensembl.org/; UCSC: https://genome.ucsc.edu/cgi-bin/hgGateway; Reuters Data Citation Index: http://wokinfo.com/products_tools/multidisciplinary/dci/.

Funding

This work was supported by the United States Department of Agriculture National Institute of Food and Agriculture [grant number GR-024831-00002].

Conflict of interest statement

None declared.

References

1. Z.L., Park C.A. and Reecy J.M. (2022) Bringing the Animal QTLdb and CorrDB into the future: meeting new challenges and providing updated services. Nucleic Acids Res., D956–D961.

2. Z.L. and Reecy J.M. (2007) Animal QTLdb: beyond a repository. A public platform for QTL comparisons and integration with diverse types of structural genomic information. Mamm. Genome.

3. Shimoyama, Nigam, McIntosh L.S. et al. (2012) Three ontologies to define phenotype measurement data. Front. Genet., 87.

4. Park C.A., Bello S.M., Smith C.L. et al. (2013) The Vertebrate Trait Ontology: a controlled vocabulary for the annotation of trait data across species. J. Biomed. Semantics, 13.

5. Z.L., Park C.A. and Reecy J.M. (2016) Developmental progress and current status of the Animal QTLdb. Nucleic Acids Res., D827–D833.

6. Smith J.R., Park C.A., Nigam et al. (2013) The clinical measurement, measurement method and experimental condition ontologies: expansion, improvements and new applications. J. Biomed. Semantics, 26.

7. Fabian, Wächter and Schroeder (2012) Extending ontologies by finding siblings using set expansion techniques. Bioinformatics, i292–i300.

8. Z.L., Park C.A. and Reecy J.M. (2019) Building a livestock genetic and genomic information knowledgebase through integrative developments of Animal QTLdb and CorrDB. Nucleic Acids Res., D701–D710.

9. Gkoutos G.V., Mungall, Dolken et al. (2009) Entity/quality-based logical definitions for the human skeletal phenome using PATO. Annu. Int. Conf. IEEE Eng. Med. Biol. Soc., 2009, 7069–7072.