Gene ancestries reveal diverse microbial associations during eukaryogenesis
Researchers have reconstructed the proteome of the Last Eukaryotic Common Ancestor (LECA) by comparing 256 eukaryotic proteomes against a large database of bacterial, archaeal, and viral sequences. According to the technical documentation, the analysis identified genetic contributions from Alphaproteobacteria, Asgard archaea, Planctomycetota, Myxococcota, and the viral phylum Nucleocytoviricota.
The project used a rigorous data pipeline to filter out contamination and redundancy. Analysts used three distinct datasets, named eTOLDB-A, eTOLDB-B, and eTOLDB-C, to ensure the reconstructed proteome remained consistent across different species sets.
How was the LECA proteome reconstructed?
The team retrieved 276 eukaryotic proteomes from sources including NCBI, UniProt, and Ensembl. After removing proteins based on low-complexity regions and size constraints, 256 clean proteomes remained for analysis.
To identify ancestral genes, researchers used OrthoFinder v.2.5.5 and a broad reference database of non-eukaryotic sequences called BroadDB. BroadDB included representatives from 62,291 bacteria and 3,412 archaea, alongside 1,319,927 viral protein clusters from RVDB v.25.0.
Genes were classified as “innovations” if they had no non-eukaryotic homologues, or “acquisitions” if they were nested within non-eukaryotic sequences. The researchers used a “taxonomic bootstrap” to ensure the donor groups were identified accurately and to remove noise from the data.
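The two categories can be illustrated with a minimal Python sketch. It is not the study's pipeline: the OrthoGroup record, its two boolean fields, and the "unresolved" fallback label are hypothetical, introduced here only to make the decision rule concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrthoGroup:
    """Hypothetical summary of one orthologous group (not the study's data model)."""
    name: str
    has_non_eukaryotic_homologues: bool
    nested_within_non_eukaryotes: bool


def classify_origin(group: OrthoGroup) -> str:
    """Apply the innovation/acquisition rule described in the text."""
    # No homologues outside eukaryotes: the gene is treated as an innovation.
    if not group.has_non_eukaryotic_homologues:
        return "innovation"
    # Eukaryotic sequences nested within non-eukaryotic ones: an acquisition.
    if group.nested_within_non_eukaryotes:
        return "acquisition"
    # Assumption: the text does not describe this case, so it is left open.
    return "unresolved"


if __name__ == "__main__":
    print(classify_origin(OrthoGroup("og_example", False, False)))
```

In practice the two flags would come from homology searches and phylogenetic trees; the sketch only covers the final labeling step.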
Which organisms contributed to the LECA genome?
The analysis confirmed the expected contributions from Alphaproteobacteria and Asgard archaea. Under the study's “stress test” parameters, three additional non-negligible donors also emerged.
These additional contributors include two bacterial clades, Planctomycetota and Myxococcota, and the viral phylum Nucleocytoviricota. Some acquisitions were categorized as “viral-mediated,” meaning the eukaryotic gene's closest sister group was viral, with prokaryotic sequences as the next-closest relatives.
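As a rough illustration of the “viral-mediated” label, the check below takes the sister groups of a eukaryotic gene, ordered from closest to most distant, and tests whether a viral sister comes first with prokaryotic sisters after it. The function name, the domain labels, and the requirement that every remaining sister be prokaryotic are assumptions made for this sketch, not details confirmed by the study.

```python
from typing import Sequence

# Assumed domain labels for this sketch.
PROKARYOTIC = {"bacteria", "archaea"}


def is_viral_mediated(sister_domains: Sequence[str]) -> bool:
    """Return True if the closest sister is viral and the rest are prokaryotic.

    sister_domains lists the domain of each successive sister group,
    ordered from the closest sister outward.
    """
    # At least one viral sister and one prokaryotic sister are needed.
    if len(sister_domains) < 2:
        return False
    closest, rest = sister_domains[0], sister_domains[1:]
    return closest == "virus" and all(d in PROKARYOTIC for d in rest)


if __name__ == "__main__":
    print(is_viral_mediated(["virus", "bacteria", "archaea"]))
```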
To validate these findings, the team performed a “verticality test.” This involved excluding all alphaproteobacterial sequences to determine if other bacterial signals were simply a result of vertical evolution from the main alphaproteobacterial donor.
What are the implications for eukaryotic metabolism?
The reconstruction allowed for the creation of a consensus proteome. This proteome includes KO (KEGG Orthology) annotations that were pervasive across the three eTOLDB datasets or supported by at least five supergroups.
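The inclusion rule can be sketched with set operations. This is a simplified reading, not the authors' code: “pervasive” is assumed to mean present in all datasets supplied, and the function name, argument names, and placeholder KO identifiers are hypothetical.

```python
from typing import Dict, Set


def consensus_kos(
    kos_by_dataset: Dict[str, Set[str]],
    supergroups_by_ko: Dict[str, Set[str]],
    min_supergroups: int = 5,
) -> Set[str]:
    """Keep KOs found in every dataset, or backed by enough supergroups."""
    if not kos_by_dataset:
        return set()
    datasets = list(kos_by_dataset.values())
    # Assumption: "pervasive" means the KO appears in every dataset.
    pervasive = set.intersection(*datasets)
    # Any KO seen in at least one dataset is a candidate for the second rule.
    candidates = set.union(*datasets)
    well_supported = {
        ko
        for ko in candidates
        if len(supergroups_by_ko.get(ko, set())) >= min_supergroups
    }
    return pervasive | well_supported


if __name__ == "__main__":
    datasets = {"A": {"ko1", "ko2"}, "B": {"ko1", "ko3"}, "C": {"ko1", "ko2"}}
    support = {
        "ko1": {"sg1"},
        "ko2": {"sg1", "sg2", "sg3", "sg4", "sg5"},
        "ko3": {"sg1", "sg2"},
    }
    print(sorted(consensus_kos(datasets, support)))
```

Here ko1 qualifies because it appears in all three datasets, ko2 qualifies through its five supergroups, and ko3 meets neither condition.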
Researchers used the anvi’o v.8 package to infer the presence and completeness of metabolic pathways. This resulted in a reconstructed metabolic map that defines the general metabolism of the LECA.
The team also identified “eukaryotic signature proteins.” These are defined as KOs from the consensus proteome that were found exclusively as innovations, meaning they emerged within the eukaryotic lineage itself.
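A minimal sketch of that definition follows, assuming each KO in the consensus proteome is mapped to the set of origin labels of the orthologous groups annotated with it. The mapping, the function name, and the placeholder identifiers are hypothetical.

```python
from typing import Dict, Set


def signature_kos(
    consensus: Set[str],
    origins_by_ko: Dict[str, Set[str]],
) -> Set[str]:
    """Return consensus KOs whose only recorded origin is 'innovation'."""
    return {
        ko
        for ko in consensus
        # A KO with any acquisition, or with no recorded origin, is excluded.
        if origins_by_ko.get(ko) == {"innovation"}
    }


if __name__ == "__main__":
    origins = {
        "ko1": {"innovation"},
        "ko2": {"innovation", "acquisition"},
        "ko3": {"acquisition"},
    }
    print(sorted(signature_kos({"ko1", "ko2", "ko3"}, origins)))
```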
What may happen next?
Future analysis could involve comparing the inferred LECA proteome against a wider array of unicellular, heterotrophic eukaryotic organisms. This could further refine the percentage of shared KOs between the ancestor and modern species.
Researchers may also apply the Bayesian framework used for “acquisition waves” to other ancestral reconstructions. This could potentially establish more precise probabilistic timelines for how different gene families were transferred to the protoeukaryote.
Frequently Asked Questions
How many proteomes were used in the final eTOLDB construction?
The researchers started with 276 proteomes and discarded 20 based on quality and completeness checks, leaving 256 proteomes.
What is an “innovation” in the context of this study?
An innovation is an mLECA-OG (orthologous group) that has no non-eukaryotic homologues, suggesting it evolved within eukaryotes.
What criteria were used to define a “strict” LECA group?
A strict definition required the group to comprise proteins from at least five species, representatives of the two stems, and five or more different supergroups.
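Those three conditions can be written as a single predicate. The sketch below is illustrative only: the Protein record, the stem labels "stem_1" and "stem_2", and the function name are hypothetical stand-ins, since the study's own identifiers are not given here.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class Protein:
    """Hypothetical record for one protein in an orthologous group."""
    species: str
    supergroup: str
    stem: str


def is_strict_leca_group(
    proteins: Iterable[Protein],
    required_stems: FrozenSet[str] = frozenset({"stem_1", "stem_2"}),
    min_species: int = 5,
    min_supergroups: int = 5,
) -> bool:
    """Check the species, stem, and supergroup criteria for a strict group."""
    members = list(proteins)
    species = {p.species for p in members}
    supergroups = {p.supergroup for p in members}
    stems = {p.stem for p in members}
    return (
        len(species) >= min_species
        and required_stems <= stems  # both stems must be represented
        and len(supergroups) >= min_supergroups
    )


if __name__ == "__main__":
    group = [
        Protein(f"species_{i}", f"supergroup_{i}", "stem_1" if i < 3 else "stem_2")
        for i in range(5)
    ]
    print(is_strict_leca_group(group))
```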