Overinterpreting underpowered multi-omics experiments
Betaine, Exercise-in-a-pill, Geroprotection
I enjoy scrolling the “recently published” pages of journals, searching for interesting new papers. I have a list of journals and preprint servers that I screen almost weekly. While I’ve been doing this for years, only recently did I start sharing the papers I find interesting. I generally avoid being critical, preferring instead to stay positive and optimistic about the incredible opportunities that modern science continues to open for our future.
While screening paper titles a couple of weeks ago, I came across this Cell paper titled "Systematic profiling reveals betaine as an exercise mimetic for geroprotection". Although the paper uses more buzzwords than one would expect (“exercise in a pill”, “geroprotection”), deep multi-omics profiling studies in human populations are a focus of my searches. Thus, the paper attracted my attention and I wrote my thoughts about it in an X post.
The post drew more attention than I’m used to, so I decided to follow up with a few thoughts sparked by some of the replies and comments it received.
A summary of the study’s human multiomics analyses
The researchers enrolled 13 healthy young (aged 24-33 years) men in a three-phase experiment:
Baseline (BL): 45 days of low physical activity.
Acute Exercise (AE): A single 5km run.
Long-Term Exercise (LE): 25 days of regular 5km runs.
The researchers collected blood and stool samples across these phases and applied a battery of high-throughput assessments: complete blood counts, inflammatory biomarkers, classical biochemistry, single-cell transcriptomics of peripheral blood mononuclear cells (PBMCs), plasma proteomics and metabolomics, and fecal microbiome and metabolome analyses. Furthermore, they used fitness trackers to ensure adherence. The study’s omics profiling depth is impressive for a longitudinal human cohort.
This resulted in a very rich multi-dimensional dataset that aimed to map how exercise reshapes the human body at a molecular level. The authors reported a number of interesting findings that align with what we know about exercise’s systemic benefits, from reducing inflammation to improving metabolic health. For example:
Acute exercise led to increases in pro-inflammatory markers, while long-term exercise reduced them.
Long-term exercise lowered gamma-glutamyl transferase (gGT), indirect bilirubin, and triglycerides, suggesting positive metabolic effects.
Long-term exercise shifted immune cells toward a lymphoid lineage and reduced heart rate and BMI.
Such datasets are extremely expensive to generate, with a questionable return on investment at such small sample sizes. However, their multimodal nature makes them unique and potentially valuable for a number of reasons. As an open-source resource, this dataset could act as:
the foundation for generating new hypotheses
a reference atlas for other researchers of biomarker changes in response to acute or long-term exercise
a guide for designing more efficient future studies with larger sample sizes and far fewer measurements.
Most importantly, this could be a stepping stone toward generating larger multi-omics datasets in global collaborative consortia.
So far, so good.
tl;dr It is a very expensive experiment (it wouldn’t be the first, and it won’t be the last), but the generated dataset could have value for the scientific community as a free resource.
The experimental data on betaine
Could this lead to a good publication? Most likely yes.
Could this be a Cell paper by itself? Probably not.
Unfortunately, high-profile academic publishing incentivizes “comprehensive narratives”. Going down this rabbit hole, the authors went a step further, building a narrative around the idea that this small sample of 13 healthy men could reveal groundbreaking results, and claiming to have identified a molecule that mimics the effects of exercise.
Based on a pattern of differences between acute and long-term exercise in the metabolomic data, the study identified a pathway signal suggesting increased amino acid metabolism and particularly betaine metabolism in the long-term condition.
Building on this finding, the authors conducted a series of in vitro and in vivo experiments suggesting that betaine has anti-inflammatory and “geroprotective” effects. Specifically, they found:
In vitro: Betaine pretreatment in human cell lines under LPS challenge showed anti-inflammatory effects (LPS challenge is a model of microbial infection, as lipopolysaccharide or LPS is a component of Gram-negative bacteria and induces a pro-inflammatory response similar to that of a bacterial infection). The authors identified a potential mechanism by showing that betaine binds and potentially inhibits TBK1, a kinase involved in immune responses (a kinase promotes the phosphorylation of other molecules, enhancing or inhibiting their action).
In vivo: In LPS-challenged mice, betaine was confirmed to inhibit TBK1 in kidney and to lead to systemic anti-inflammatory activity.
Additional aging models: Finally, they tested betaine in senescent cells and aged mice, where 3 months of oral treatment led to behavioral and molecular changes consistent with anti-aging effects across multiple organs.
Based on these data, the study concludes that betaine is an exercise-mimicking molecule that has geroprotective effects and should be explored for pharmacological interventions aiming to deliver “exercise in a pill”.
The authors’ words:
“These findings systematically elucidate the molecular benefits of exercise and position betaine as an exercise mimetic for healthy aging.”
“Elucidating the multi-faceted health effects of exercise in humans identifies kidney-derived betaine as an exercise mimetic acting as a TBK1 inhibitor with anti-inflammatory and geroprotective activities.”
“Our study delineated the molecular blueprint through which exercise reshapes human physiology, providing mechanistic insights into its health benefits. The identified exercise-induced factors, including betaine, offer potential for developing "exercise in a pill" to promote healthy aging.”
“There might be some questionable statistics here. So what?”*
As exciting as this sounds, the narrative becomes less convincing when we take a closer look at the human data, which after all were the foundation for generating the betaine hypothesis.
First, changes across baseline, acute, and long-term exercise in plasma and fecal metabolomics, proteomics, Olink inflammation panels, biochemistry, and metagenomics were not corrected for multiple comparisons. As the authors write repeatedly:
“…to mitigate the risk of high false-negative rates in differential analysis of plasma proteome data in our cohort, we used raw p values without multiple hypothesis correction in our statistical analyses. […] Differentially expressed proteins (DEPs) analysis was achieved using the two-tailed paired Wilcoxon test with threshold p value < 0.05.”
This means the authors treated any p-value < 0.05 as significant, with no correction for multiple testing. This is practically unheard of and goes against basic, widely accepted standards in omics analyses. I really can’t imagine how no one (co-authors, reviewers, editors) flagged this. With thousands of features measured in just 13 individuals, this approach all but guarantees hundreds of false-positive results.
But would it matter if the effect of betaine were so strong that it would stand out regardless of the statistical approach? Fair question. I have to admit that it was hard to pinpoint the main result for betaine in the paper, as the presentation of the results was too descriptive and avoided concrete numbers. Hidden in Table S5, and reported without any p-value, betaine was among the upregulated metabolites in the long-term vs. acute exercise comparison. It was the 77th most upregulated metabolite, with a log2FC of 0.106. This means that long-term exercise induced a ~7% increase in betaine levels when compared to acute exercise.
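To make the scale of the problem concrete, here is a minimal sketch of what uncorrected testing does when nothing at all is going on. It assumes p-values are uniform under the null (only approximately true for a Wilcoxon test with 13 pairs), and the feature count of 5,000 is a hypothetical round number, not one taken from the paper. The Benjamini-Hochberg step is shown only as one common correction, not as what the authors should necessarily have used.

```python
import random


def benjamini_hochberg(p_values, q=0.05):
    """Return how many features the Benjamini-Hochberg step-up procedure rejects."""
    m = len(p_values)
    n_rejected = 0
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= q * rank / m:
            n_rejected = rank  # keep the largest rank that passes its threshold
    return n_rejected


random.seed(0)
n_features = 5000  # hypothetical number of measured features
alpha = 0.05

# Simulate a world with no true effects: every p-value is pure noise.
null_p_values = [random.random() for _ in range(n_features)]

raw_hits = sum(p < alpha for p in null_p_values)
bh_hits = benjamini_hochberg(null_p_values, q=alpha)

print(f"expected false positives at p < {alpha}: {n_features * alpha:.0f}")
print(f"'significant' features without correction: {raw_hits}")
print(f"features surviving Benjamini-Hochberg: {bh_hits}")
```

Every uncorrected hit in this simulation is a false positive by construction, which is the situation a raw p < 0.05 threshold cannot distinguish from a real signal.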
This is a tiny effect! Still, the authors argue that they replicate this finding also with targeted metabolomics in the same samples. This time, they provide a detailed graph, which shows a huge overlap across conditions.
I find it really hard to believe that a molecule upregulated by just 7% is the main causal driver of the effects of exercise and could be used as a pill that mimics exercise. As such, the preclinical results, while intriguing, feel like a narrative built on a foundation that starts to fall apart when you look more closely.
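For readers who want to check the effect size themselves, a log2 fold change converts to a percent change as 2^log2FC minus 1. The sketch below uses a helper name of my own choosing and the 0.106 value quoted from Table S5. It lands between 7% and 8%, in line with the rough figure used in this post.

```python
def log2fc_to_percent(log2fc):
    """Convert a log2 fold change into a percent change relative to the reference."""
    return (2 ** log2fc - 1) * 100


betaine_log2fc = 0.106  # long-term vs. acute exercise, as quoted from Table S5
print(f"{log2fc_to_percent(betaine_log2fc):.1f}% change")
```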
Thoughts and implications
Some thoughts regarding this paper. Many of them were inspired by comments and feedback I received on my X post.
N=13 is far too small a sample size for most biological questions. There is a massive number of such omics datasets out there. This is to be expected, as most of these analyses are very expensive and difficult to scale. But we shouldn’t forget that analyses at such sample sizes can only reveal very obvious low-hanging fruit that we would most likely have been able to capture with less advanced techniques. In a biostatistics lecture many years ago in medical school, our lecturer illustrated this by showing us that even for a straightforward difference like height between men and women, you’d need about 10 individuals per group to detect it reliably.
As anyone who has worked with omics data knows, technical variation is assay-specific and substantial across most modalities and datasets. What makes it even more challenging is that we often don’t understand where this variation comes from. To detect reliable differences, either sample sizes must be large enough to rise above the noise or the differences need to be massive. This makes aggressive correction for multiple comparisons, and ideally external validation, essential.
I received some comments along the lines of “the dataset is worthless”. I think I was already critical of the paper. But it’s short-sighted to dismiss such datasets entirely just because of small sample sizes. Every effort to build larger, more informative resources has to start somewhere. Sure, in an ideal world of unlimited resources, we’d design a perfectly powered study, run multi-omics across thousands of individuals, and thoroughly characterize the molecular effects of exercise. But that’s pure imagination and rarely feasible. So, generating smaller well-characterized datasets, especially if made open-source, can guide the design of new experiments and serve as seeds for future meta-analyses or pooled efforts.
Omics is an expensive hobby. We often like to say that such datasets are transformative for research and have the potential to change clinical practice, yet the cost is a major bottleneck to scaling. I followed up with a post estimating the cost of the human part of this study: $130K-190K just for the omics analyses, without considering working hours, the costs of recruiting and following up with the study participants, the fitness tracking systems, or bioinformatics. If we add all those, then it’s easily over $200K. These are massive costs for an academic lab, and the result is a dataset of only 13 participants.
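The lecturer’s point can be sketched with the textbook normal-approximation formula for comparing two group means. The standardized effect sizes below are illustrative assumptions (no height data are used), and the approximation is slightly optimistic at very small n, so treat the output as a rough guide rather than a formal power analysis.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    effect_size is the standardized difference (mean difference / SD).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


# Illustrative effect sizes, from very large to small.
for d in (2.0, 1.0, 0.5, 0.2):
    print(f"standardized effect {d}: about {n_per_group(d)} per group")
```

An obvious difference like the height example sits toward the large end of this scale, where a handful of individuals per group is enough. Subtle shifts need sample sizes that grow with the inverse square of the effect.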
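There is also a hard floor that comes from the sample size itself. With an exact two-tailed paired Wilcoxon test, the smallest p-value 13 pairs can ever produce is 2 / 2^13, reached when all 13 differences point the same way. The sketch below assumes no ties and no zero differences, and the feature count is a hypothetical placeholder. It shows that a Bonferroni threshold across thousands of features lies below that floor, which is likely why the authors worried about false negatives, and it is a symptom of the study being underpowered rather than a reason to skip correction.

```python
def min_two_sided_wilcoxon_p(n_pairs):
    """Smallest two-sided p-value of an exact signed-rank test (no ties, no zeros)."""
    return 2 / 2 ** n_pairs


n_pairs = 13
n_features = 5000  # hypothetical number of tested features
alpha = 0.05

p_min = min_two_sided_wilcoxon_p(n_pairs)
bonferroni_threshold = alpha / n_features
# Largest number of tests for which a Bonferroni-corrected hit is even possible.
max_features = int(alpha / p_min)

print(f"smallest attainable p-value with {n_pairs} pairs: {p_min:.6f}")
print(f"Bonferroni threshold for {n_features} features: {bonferroni_threshold:.6f}")
print(f"Bonferroni can only ever be passed with at most {max_features} features")
```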
Some have criticized the choice to include only healthy male participants. I agree that if the study findings have any validity, they cannot be generalized outside of this setting. Given the known and well-documented differences in metabolic profiles of men and women, this is another major red flag that the findings might not reflect universal biological truths.
I received some feedback that my argument focuses too narrowly on statistics without considering the broader biological implications. I don’t doubt that the results of the experimental part of the study are valid. Betaine might have an anti-aging effect (although experts in preclinical aging research argue that demonstrating effects on mortality is the gold standard for anti-aging claims in mouse models, which this study did not show). The authors themselves, however, built their narrative on a statistical argument: that betaine is upregulated as a result of long-term exercise, based on a statistical comparison of betaine measurements in plasma after long-term and acute exercise. This 7% difference would never have been picked up across thousands of measurements without using statistics. My main concern was not the experiments themselves, but the way the authors tried to pool them into a narrative with a logical hole, in this case a statistical one. As in biology, biostatistical methods have evolved over more than 100 years, and ignoring established best practices can easily lead to misconceptions.
In the end, this study exemplifies an interesting paradox of modern science: huge potential from data-rich resources, accompanied by statistically fragile conclusions. Journal editors, academic committees, and paper and grant reviewers all want to see a groundbreaking novel outcome from most projects. We’d all love to generate a multi-modal dataset, run it through an AI model, and get a concrete idea for a miracle pill. But unfortunately, more often than not, this is not how it works. Despite advances in the way we pursue biomedical discovery, drug development continues to become harder and harder.
There is hardly any space to simply open-source a dataset with an objective description of the results, without overinterpreting a few exemplary findings as transformative. Most of these conclusions end up not being true. Much greater value lies in the datasets themselves, either as building blocks for larger consortia or as training data for future AI models.
*PS: I do intend to start using “There might be some questionable statistics here. So what?” in my presentations! I’ll acknowledge the source!
Original paper
This is really interesting. As a STEM student looking to get into omics in the future, I find that these kinds of articles help me identify things I need to keep in mind.
Great article! Thanks for sharing your thoughts and clearing up some systemic facts!