We Still Cannot Predict an Organism's Traits from Its Genome
Decades after the first genome was sequenced, the biggest gap in biological knowledge remains our inability to look at an organism's genetics and environment and predict its observable characteristics (phenotype). NSF has designated "Understanding the Rules of Life: Predicting Phenotype" as one of its formal Big Ideas for Future Investments. The theoretical constructs that explain and predict characteristics of living systems — from molecular components to cells, whole organisms, communities, and biomes — remain largely undiscovered. This is not a data quantity problem: we have petabytes of genomic data. It is a knowledge gap about how genetic information interacts with epigenetic regulation, environmental context, and emergent properties across levels of biological organization to produce phenotypes.
Genotype-to-phenotype prediction would transform medicine (predicting disease risk and drug response from a patient's genome), agriculture (designing crops for specific environments), conservation (predicting which species can adapt to climate change), and biotechnology (engineering organisms with desired properties). Current approaches to crop improvement, personalized medicine, and conservation genetics all rely on statistical associations rather than mechanistic understanding, and those associations break down in new genetic backgrounds, new environments, or new species. For plants specifically, species with complex genomes, difficult breeding systems, and long generation times cannot be improved using the statistical approaches that work for model organisms, leaving most of the world's crop diversity inaccessible to genomics-assisted improvement.
Functional genomics has focused narrowly on candidate genes and on single-gene manipulation of large-effect genes. That captures only a fraction of phenotypic variation, because most traits are influenced by hundreds or thousands of loci with small effects. Genome-wide association studies (GWAS) identify statistical associations but not causal mechanisms, and their predictions do not transfer across populations or environments. Epigenetic complexity (heritable phenotypic properties that arise without genome sequence modification) complicates the picture beyond what sequence-based approaches can capture. Emergent properties arise from complex, nonlinear interactions among biological systems that do not exhibit those properties in isolation, and determining these emergent network properties is a critical unsolved problem. For plants, transformation bottlenecks (genotype-dependent regeneration from tissue culture) prevent functional validation in most crop species.
Progress would likely depend on several advances: systems-level analyses of gene-regulatory networks and their emergent functional properties, elucidating causal connections across levels of biological organization; high-throughput phenotyping tools linked to genomic data, especially for non-model organisms and field conditions; AI/ML approaches capable of integrating multi-omic datasets (genomic, proteomic, metabolomic, phenomic) into predictive frameworks; genotype-independent plant transformation methods that circumvent tissue culture bottlenecks; and integrative approaches bridging plant physiology, ecology, evolution, and development through engineering and quantitative modeling.
A student team could select a specific quantitative trait in a model organism (e.g., drought tolerance in Arabidopsis) and compare the predictive accuracy of different modeling approaches — GWAS, gene network models, machine learning on multi-omic data — using publicly available datasets from TAIR or Phytozome, identifying where each approach fails and why. Alternatively, a team could develop a high-throughput phenotyping protocol using computer vision and low-cost imaging hardware for a specific crop trait (plant height, leaf area, color-based stress indicators) in a campus greenhouse. Relevant skills: genomics, bioinformatics, machine learning, plant biology, image processing.
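As a starting point for the imaging project, the sketch below estimates what fraction of an image is green plant tissue using the Excess Green index (ExG = 2g - r - b on channel values normalized by their sum), a common color index for separating vegetation from soil. It is a minimal illustration, not a validated protocol: the function names, the toy pixel values, and the 0.1 threshold are all assumptions made here for demonstration, and a real setup would need calibration against the greenhouse lighting and camera.

```python
"""Estimate the green-canopy fraction of an RGB image with the Excess Green index."""

from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel 0-255


def excess_green(pixel: Pixel) -> float:
    """Return ExG = 2g - r - b, with each channel divided by the channel sum."""
    r, g, b = pixel
    total = r + g + b
    if total == 0:  # pure black pixel carries no color information
        return 0.0
    return (2 * g - r - b) / total


def plant_fraction(image: List[List[Pixel]], threshold: float = 0.1) -> float:
    """Fraction of pixels whose ExG exceeds `threshold`.

    The default threshold is an illustrative assumption; tune it per setup.
    """
    pixels = [p for row in image for p in row]
    if not pixels:
        raise ValueError("image is empty")
    plant = sum(1 for p in pixels if excess_green(p) > threshold)
    return plant / len(pixels)


# Hypothetical 2x3 "image": three leaf-like green pixels, three soil-like brown pixels.
LEAF, SOIL = (40, 160, 50), (120, 90, 60)
toy_image = [[LEAF, SOIL, LEAF], [SOIL, SOIL, LEAF]]
print(plant_fraction(toy_image))  # prints 0.5
```

Multiplying the plant fraction by the known ground area covered by the frame gives a rough projected leaf area. Tracking it across days yields a simple growth curve that can be joined to genotype records.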
- "Understanding the Rules of Life" is one of NSF's 10 Big Ideas for Future Investments, reflecting the fundamental importance and difficulty of this problem.
- Cross-domain connection: shares structure with `bio-organism-climate-response-prediction` (predicting organismal response to climate requires genotype-phenotype prediction in environmental context) and `bio-ecological-forecasting-skill-gap` (ecological forecasting is limited by our inability to predict how individual organisms will respond).
- The `failure:not-attempted` tag applies because the multi-scale, multi-omic integration required for genuine genotype-to-phenotype prediction has not been attempted at the systems level for most organisms; work has focused on individual genes or statistical associations rather than mechanistic, predictive models.
- The `failure:disciplinary-silo` tag applies because genomics, phenomics, ecology, and systems biology remain largely separate fields with different data types, tools, and conceptual frameworks.
- The IntBIO track within IOS explicitly funds proposals that integrate biology with engineering, math, and physical sciences to address the rules of life.
"Understanding the Rules of Life: Predicting Phenotype," NSF Big Ideas; "IOS Core Programs," NSF 24-546 (IntBIO Track); "PGRP: Plant Genome Research Program," NSF 24-547; "URoL:ASC," NSF 23-512. https://www.nsf.gov/funding/opportunities/pgrp-plant-genome-research-program/5338/nsf24-547 (accessed 2026-02-15).