Data interpretation is often overlooked in omics training. But without knowing how to make sense of our results in the broader biological context, we’ll struggle to pull out actionable insights from our data.
The biological interpretation of metabolomics is a long but rewarding process. It requires a broad set of skills that can be honed as we go, making it a great place for those of us who love to learn. However, depending on how and why you perform data interpretation, you may find that the required skill set varies.
My experience of data interpretation has centered on understanding molecular mechanisms. Whether I was considering toxicological or medical contexts, the work was roughly the same: plan the experiment, generate and collect data, check and clean the data, analyze it with various bioinformatic tools, and then piece together the results.
I’ve discovered that the success of any data interpretation project depends on three key assets:
• the right tools for the job
• knowledge of metabolism
• perseverance
You may wonder whether you have the skills required to perform this type of work… The good news is that all three can be learned. Here, I discuss what makes each one so crucial to data interpretation projects and how you can hone them yourself.
The right tools for the job
Automated data interpretation
If you are used to working with automated workflows for sample or data processing, you may be living in hope that there is also one for data interpretation. Well… there isn’t. This makes data interpretation both beautiful and a little painful.
I often say that data interpretation is an art, which may not sit well with those aiming to describe the hard facts of biology. But this is how biological knowledge grows. We perform and analyze measurements, then interpret what the results (most likely) mean, sometimes adding a pinch of theory that remains to be proven or refuted by future work.
Happily, there are bioinformatic tools that help us get closer to our final story. The basic starting point is often the use of classical statistical methods to identify statistically significant differences between groups. More elaborate tools exist to harness the power of machine learning (data-driven tools), of pre-existing biological knowledge (biology-driven tools) or of structural similarities between metabolites (chemistry-driven tools).
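As a concrete illustration of the “classical statistical methods” starting point, here is a minimal sketch in Python (standard library only) that compares two groups for each metabolite with a Mann-Whitney U test and then controls the false discovery rate with the Benjamini-Hochberg procedure. These two methods are my choice for the example, not a prescription. The p-value uses the normal approximation without tie or continuity correction, so for small groups an exact test from a dedicated statistics package is preferable. The metabolite names and intensities are invented.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_p(group_a, group_b):
    """Two-sided Mann-Whitney U p-value (normal approximation, no tie/continuity correction)."""
    n1, n2 = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma  # u <= mu, so z <= 0
    return min(1.0, 2 * NormalDist().cdf(z))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical intensities: (patients, controls) for each metabolite.
data = {
    "metabolite_A": ([5.1, 5.4, 5.0, 5.3, 5.2], [6.8, 7.1, 6.9, 7.0, 7.2]),
    "metabolite_B": ([3.0, 3.2, 2.9, 3.1, 3.3], [3.1, 2.9, 3.2, 3.0, 3.1]),
}
names = list(data)
raw_p = [mann_whitney_p(*data[name]) for name in names]
q_values = benjamini_hochberg(raw_p)
for name, p, q in zip(names, raw_p, q_values):
    print(f"{name}: p={p:.4f}, q={q:.4f}")
```

Correcting for multiple testing matters here because a metabolomics panel can include hundreds of features, and uncorrected p-values would flag many of them by chance alone.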
Some of these tools are online software with easy-to-use graphical user interfaces; others are scripts provided by publications and research institutes. Many are free and often come with tutorials and related publications to help us understand how best to use them.
All of them require data preparation (or pre-processing) to make sure your dataset will be handled properly. For example, you’ll need to remove low-quality features, remove data not relevant to your analysis, add identifiers that the tool can recognize, and more.
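To make the pre-processing step more concrete, here is a minimal Python sketch of two common operations: dropping features with too many missing values or too much variability across pooled quality-control (QC) injections, and attaching database identifiers. The data layout, the function names, and the thresholds (at most 20% missing values, at most 30% relative standard deviation in QC samples) are assumptions for illustration. These cut-offs are widely used rules of thumb, but the right values depend on your platform and study. The identifier in the lookup table is a placeholder, not a real database entry.

```python
from statistics import mean, stdev

# Hypothetical thresholds; adjust them to your platform and study design.
MAX_MISSING_FRACTION = 0.2  # max share of study samples with no value
MAX_QC_RSD = 0.3            # max relative standard deviation across QC injections


def filter_features(table, sample_cols, qc_cols,
                    max_missing=MAX_MISSING_FRACTION, max_qc_rsd=MAX_QC_RSD):
    """Keep features with few missing sample values and stable QC measurements.

    `table` maps a feature name to {column name: intensity or None}.
    Columns not listed in `sample_cols` or `qc_cols` are ignored by the filter.
    """
    kept = {}
    for feature, row in table.items():
        sample_values = [row.get(col) for col in sample_cols]
        missing = sum(value is None for value in sample_values) / len(sample_values)
        if missing > max_missing:
            continue  # too sparse to analyze reliably
        qc_values = [row[col] for col in qc_cols if row.get(col) is not None]
        if len(qc_values) < 2 or mean(qc_values) <= 0:
            continue  # QC variability cannot be estimated
        if stdev(qc_values) / mean(qc_values) > max_qc_rsd:
            continue  # too variable across repeated QC injections
        kept[feature] = row
    return kept


def add_identifiers(table, id_lookup):
    """Attach a database identifier to each feature; drop features without a match."""
    return {
        feature: {**row, "identifier": id_lookup[feature]}
        for feature, row in table.items()
        if feature in id_lookup
    }


# Usage with invented intensities (two study samples, three QC injections).
raw = {
    "feature_1": {"s1": 10.0, "s2": 11.0, "qc1": 100.0, "qc2": 102.0, "qc3": 98.0},
    "feature_2": {"s1": None, "s2": 5.0, "qc1": 50.0, "qc2": 51.0, "qc3": 49.0},
    "feature_3": {"s1": 1.0, "s2": 1.2, "qc1": 50.0, "qc2": 100.0, "qc3": 150.0},
}
clean = filter_features(raw, ["s1", "s2"], ["qc1", "qc2", "qc3"])
annotated = add_identifiers(clean, {"feature_1": "ID-0001"})  # placeholder ID
print(annotated)
```

Here feature_2 is dropped for missing values and feature_3 for its unstable QC signal, leaving only feature_1 for downstream analysis.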
In my experience, the first limiting factor in using any tool is simply knowing that it exists. In the same way that you want to be aware of the latest technological advances for the type of instrument you use, you need to know the latest approaches to prepare and handle your type of data.
Some of the best strategies to stay on top of this ever-growing list of resources are to:
• discuss with colleagues
• attend conferences and webinars
• screen the literature for new methods
• read reviews focused on bioinformatic tools.
To help with this, there are people in the community generous enough to compile regularly updated lists of software and code, often shared as peer-reviewed publications. Once you find them in PubMed, don’t hesitate to set up an automated alert so you know when their next publication or review comes out. This is a great way to get regular updates on the latest developments in the field.
The worst-case scenario is to discover the “perfect” tool right at the end of a project and wonder if you should repeat your work. To protect yourself from this nightmare, make sure to scour the literature for new tools that may be relevant to your research.
Choosing the right tool
The right tool for the job satisfies multiple needs and possibilities, not all of which are scientifically “honorable.” Ideally, we would choose tools that are the best fit for our data. In real life, we must consider cost, analysis time, learning curves, influence from colleagues and other factors.
We’re also limited by our ability to run certain tools. What should you do if you have no programming skills but your perfect tool runs only in R? It may be time to ask a colleague or even the developers of the code in question if they want to collaborate and check how well the tool works for your data.
In reality, compromise is likely.
Another factor that may influence your choice is comparability. If you want to be able to compare your results with your last three studies, you might opt for the exact same analytical workflow that you used before. If a new tool or algorithm has become available since you ran those past studies, it may be worth re-running all four with that new tool to see whether it brings you more information. It may be time-consuming, but it could provide material for an additional publication, a meta-analysis of past data, alongside your current project.
Know your tools
A critical aspect of running any data analysis is to know what your bioinformatics tools can and cannot do and what types of data they can and cannot accept. In short: know your tools’ limitations.
No tool can do everything. This is why combining several tools is often a great way to approach an interpretation project. We often run more tools than those described in the final paper. Don’t be afraid to perform many analyses if there is a sound reason to do so.
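One simple way to combine several tools, sketched below under my own assumptions, is to look for metabolites that more than one tool flags. The tool names and hits are hypothetical, and this is only one of many ways to merge results. Keep in mind that agreement is not proof: tools built on similar assumptions can agree for the same wrong reason, so a consensus list is a way to prioritize, not a substitute for interpretation.

```python
from collections import Counter


def consensus_hits(tool_results, min_tools=2):
    """Return metabolites flagged by at least `min_tools` tools, best supported first.

    `tool_results` maps a tool name to the collection of metabolites it flagged.
    Ties in support are broken alphabetically so the output is deterministic.
    """
    counts = Counter()
    for hits in tool_results.values():
        counts.update(set(hits))  # count each metabolite once per tool
    supported = [name for name, n in counts.items() if n >= min_tools]
    return sorted(supported, key=lambda name: (-counts[name], name))


# Hypothetical outputs from three different analyses.
results = {
    "univariate_test": {"cholic acid", "deoxycholic acid", "glutamine"},
    "multivariate_model": {"deoxycholic acid", "cholic acid", "lactate"},
    "enrichment_tool": {"deoxycholic acid"},
}
print(consensus_hits(results))  # → ['deoxycholic acid', 'cholic acid']
```

Raising `min_tools` gives a shorter, more conservative list; lowering it to 1 simply ranks every hit by how many tools support it.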
Whichever tool you use, imagine a warning label that reads, “no tool will write your paper for you.”
No tool will tell you exactly what’s happening in your biological system and how this fits into the larger context of biomedicine today. That’s a job for the human operating the tools.
Many tools can help identify the mechanisms that will form the core of your interpretation, but at the end of the day, you’ll still have to make the last connections yourself by studying your data in its biological context. This is where your knowledge of metabolism comes in.
Knowledge of metabolism
How much is enough to start?
If you have master’s-level knowledge in a biology-related field, you have the understanding required to begin interpreting metabolomics. You don’t need to know everything, but you need to know enough about biology to make sense of the literature while you research your topic further.
A first-year PhD student is less likely to come up with a detailed picture of what’s happening in a metabolomics experiment than someone with 10 or 20 years’ experience, but that doesn’t mean that beginners shouldn’t play.
In fact, this type of work is exactly how experience is gained. Inexperience can be compensated for through hard work and automated tools that help focus the work on crucial metabolites or pathways.
While proficiency in molecular biology and metabolism is certainly a plus, it can be refined along the way. You can begin a metabolomics interpretation project without an extensive knowledge of metabolism and expect to know quite a bit more by the end of it.
Best source of knowledge
I would argue that working on a data interpretation project is possibly the best way to acquire knowledge about metabolism and biology in general. Here is an example of how that may play out in practice.
Let’s say you measured a broad panel of metabolites in blood samples from patients with Alzheimer’s disease and found that the levels of several bile acids are different compared to healthy controls. If your knowledge of this class of metabolites is a bit fuzzy, all you need to do is go to the literature to fill in the gaps with the latest knowledge on these molecules.
You may begin at the more generic level, using textbooks and web searches to get an overview of what’s known about these metabolites. You may discover that bile acids are synthesized primarily in the liver, but require the gut microbiome for certain steps.
You may learn about primary and secondary bile acids, about so-called “toxic” bile acids, and about their roles in digestion and as signaling molecules. Other metabolites may be less well documented, indicating that more research is needed to understand their roles in biology.
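Once you know that bile acids split into primary and secondary classes, you can check whether the changes in your own data concentrate in one class. Below is a minimal Python sketch that computes a log2 fold change (patients versus controls) for each metabolite and groups the results by class. The intensities are invented, the class map covers only four well-known bile acids (cholic and chenodeoxycholic acids are primary; deoxycholic and lithocholic acids are secondary), and intensities must be positive for the logarithm to be defined. A fold change says nothing about statistical significance on its own, so pair it with an appropriate test.

```python
from math import log2
from statistics import mean

# Bile acid classes for a few well-known species; extend this map for your panel.
BILE_ACID_CLASS = {
    "cholic acid": "primary",
    "chenodeoxycholic acid": "primary",
    "deoxycholic acid": "secondary",
    "lithocholic acid": "secondary",
}


def log2_fold_changes(cases, controls):
    """log2(mean in cases / mean in controls) for metabolites measured in both groups.

    Intensities must be positive, otherwise the ratio or logarithm is undefined.
    """
    return {
        name: log2(mean(cases[name]) / mean(controls[name]))
        for name in cases
        if name in controls
    }


def group_by_class(fold_changes, classes=BILE_ACID_CLASS):
    """Group fold changes by class; metabolites missing from the map go under 'other'."""
    grouped = {}
    for name, fc in fold_changes.items():
        grouped.setdefault(classes.get(name, "other"), {})[name] = fc
    return grouped


# Invented intensities, for illustration only.
patients = {"cholic acid": [2.0, 2.0], "deoxycholic acid": [4.0, 4.0], "glutamine": [1.0, 1.0]}
controls = {"cholic acid": [1.0, 1.0], "deoxycholic acid": [1.0, 1.0], "glutamine": [1.0, 1.0]}
print(group_by_class(log2_fold_changes(patients, controls)))
# → {'primary': {'cholic acid': 1.0}, 'secondary': {'deoxycholic acid': 2.0}, 'other': {'glutamine': 0.0}}
```

A pattern like this (larger shifts in one class than the other) is the kind of observation that then sends you back to the literature with sharper questions.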
Next, you may look into what’s known about these metabolites in relation to your disease of interest, Alzheimer’s disease. You will find several publications describing changes in bile acids in Alzheimer’s. You’ll compare levels in different sample types (blood-derived, cerebrospinal fluid, and maybe even tissues) and see what others have concluded on the topic. You will also ask the literature for answers to questions such as “do bile acids cross the blood-brain barrier, and how?” or “can bile acids be synthesized in the brain too?”
After that, you may want to look into how these metabolites have been studied in other contexts. Much of the knowledge we have on metabolites is repeated across different fields of biomedicine, and what is common knowledge to a cardiology specialist may be groundbreaking to a neurologist. So don’t hesitate to jump around at the beginning of your search to get inspired about the possibilities of your dataset.
Someone with more experience may already know (or think they know) the answers to these questions, but there’s nothing to stop you learning about this with a thorough literature search.
Finally, you will connect these pieces of information to build the story of what may be happening in your experiment at the level of bile acids. You’ll look at how this relates to other metabolites in your dataset and to the disease in general, including known symptoms and risk factors.
By combining a tailored literature search with your own findings, you’ll create a storyline that becomes your version of what’s happening in the experiment. This story becomes the narrative you use to explain your work to yourself and to others. It will also form one of the many building blocks for your own knowledge of metabolism. That is, if you stick to your plan…
Perseverance
The final asset needed for data interpretation is perseverance. Perseverance may seem like a soft skill of little relevance, but it’s incredibly important if you ever want to complete your project.
It’s easy to progress through the first exciting steps of collecting information. But connecting the dots and coming up with an enticing story to tell about your experiment can be a bit more challenging.
You need perseverance to stick to an interpretation when the results are not as interesting as you expected, when there are too many things to discuss, or when you feel overwhelmed by the number of possible directions to pursue.
You certainly need perseverance when your supervisor makes a “suggestion” that is really a request to start over. The list of things that can get in your way is practically endless. When the road is already difficult to navigate, it can be hard to stick to the plan.
The challenge is compounded by the fact that there’s no protocol for data interpretation. Staying motivated is hard when no step-by-step process tells you how to get from start to finish, or even how to tell when the interpretation is finished.
The STORY principle
In an attempt to answer this need, I’ve developed the STORY principle. This is a 5-step framework designed to help you construct your own plan, avoid the pitfalls commonly associated with data interpretation projects, and execute your interpretation as smoothly as humanly possible.
The STORY principle is based on my own experience and that of others who have planned and executed metabolomics data interpretation projects. I describe the five steps in detail in a book that will soon be published and in a webinar that you can sign up for today.
I have fallen into all sorts of traps over the years, from interpretation quicksand, where you can’t seem to see the end of a project, to chasing butterflies, where other priorities distract you from your goal. These pitfalls are typical of omics data analysis and very much apply to metabolomics as well.
The STORY principle is my best advice on how to steer clear of these traps so you can be as efficient as possible in your data interpretation and enjoy it at the same time.
But here is perhaps the most important point: data interpretation can (and should) be fun!
It’s a great way to learn about metabolism and biology, and every project is a new opportunity to refresh your knowledge of the latest bioinformatic tools. Lastly, sharing the insightful interpretations that you made of your data is an entryway into the metabolomics community where you’ll get to connect with like-minded scientists.
Are you ready to get started?
If you want to learn more about strategies to plan and execute your data interpretation projects, register for my webinar on the STORY principle.