We believe that accurately identifying and articulating the most critical unmet needs in health is the first and most fundamental step in deriving solutions that positively impact health at scale. A meaningful understanding of such needs requires a broad view, one that embraces how questions of science and technology are tied inextricably to economic, policy, and social circumstances and histories.
Here we write and publish on life sciences, health technologies & services, and animal health.
Over the past year, I’ve been collecting papers with surprising results about the success of machine learning in biology, ones that run against the grain of popular conceptions and throw into question whether our models are learning biology at all. These papers show models fixating on patterns in datasets that are too complex for a human to identify, patterns that turn out to be noise, not signal.
Engineering Biology
The hardest problem in biopharma today is picking the right targets. Our ability to modify biology has increased exponentially over the past decades. No longer is it a question of if we can hit a target (or a pathway) with some compound. Today, we can hit nearly any biological target with multiple different modalities from traditional small molecules to antibodies to interfering RNAs to cell and gene therapies and beyond. The key questions today are what should we hit and how.
Engineering Biology
Over at the new OpenProtein.ai blog, Tristan Bepler and I wrote about the seemingly mysterious power of Deep Protein Language Models. Not only do they identify related proteins, they predict functionality, stability, and immunogenicity, in many cases “out-of-the-box.” Why should this be?
Engineering Biology
The combination of years of “Big Data” hype and obviously flawed inferences, of overpromising and under-delivering, has led to pervasive online tracking and a miasma of distrust. It is simultaneously too difficult to deploy novel consumer-facing information technology and avoid the sale or at least use of personal information.
Engineering Biology
The story goes that an angry father confronted Target employees after his daughter was mailed coupons for maternity products unnecessarily, only to find out later that she was pregnant. A triumph of big data combined with statistical learning, and a creepy portent of the future, right? That’s how the story went at least.
Engineering Biology
My focus in writing over the past three months has been the interplay between powerful new computational methods, digital technologies, and operational processes. It began with the observation that successful Machine Learning (ML)-integrated biopharma companies have a moat in data generation and the scientific application of computation to these data—not in machine learning itself. Operational excellence is requisite for these companies, not merely a nice-to-have.
Engineering Biology
I was on a panel about digitization and the data revolution at the annual Academy of Management meeting last weekend. My co-panelist and I were there to give an operational perspective on how data are used in biopharma for everything from R&D to commercialization, and on how that use compared with the empirical studies from a variety of industries presented earlier in the session.
Engineering Biology
The integration of Machine Learning (ML) into scientific work exists on a continuum between wholesale replacement of human processes and providing inputs to complement the judgment of a human arbiter. As I’ve argued previously, current models are insufficient at best for fully substituting human knowledge in biology for all but base-level tasks…
Engineering Biology
The past weeks have seen a flurry of articles debating the efficacy (and proof thereof) of “AI” in drug discovery and biotech writ large, kicked off by a large layoff at Benevolent, an “AI” drug developer. I would argue the lesson of the recent AI boom in biopharma is a simple one: If you don't have novel and effective science in the first place, no amount of data science will save you. Data Science and Machine Learning* (hereafter ds/ml) will be most successful in biology where they sit atop transformative science that needs no special analytics.
Engineering Biology
Back in 2017, when I was just starting to build out data science at Indigo, Tristan Bepler joined us as a summer intern. We had a large and growing amount of sequencing data from microbial communities: both their composition from marker genes and whole genomes of organisms of interest. Both of these datasets resisted conventional methods. The mathematical modeling of microbial communities remains underdeveloped, with heuristic methods that produce nonsense and potentially more correct ones that are difficult to implement.
Engineering Biology
The hope with Machine Learning has long been that we can eliminate complex, slow, and expensive physical processes with accurate predictions, inferred directly from data. As I’ve written previously, the complexity and scarcity of data make supervised learning like this less relevant in problems of biology. Unlike internet companies, generating reams of labeled data daily, our experimental throughput is orders of magnitude lower and our data modalities considerably more complex.
Engineering Biology
Where does Machine Learning belong in biology? Nearly all successful efforts fall into one of three categories: Exploration—Summarizing large complex datasets that cannot be fathomed by the human mind (gene sequences, chemical structures, images, etc.) and enabling scientists to explore them. Scaling—Automating, standardizing, and debiasing heuristics and calculations. Prediction—Estimating how a complex process will perform on a new element.
Engineering Biology
We don’t know what the hard problems are going to be. Most of us were trained as academic scientists in a culture of finding winding paths through the dark forest of the unknown. Today, we are much closer to engineers — using data and computation to industrialize the production of knowledge. Biology presents an endless series of learning and inference problems for us to solve.
Engineering Biology
How does data science fit into the biopharma tech stack? The analytical operations involved are certainly more complex than the transformation and aggregation of data. This might suggest that data science is an artisanal, intellectual operation built off of the core data repository; in essence, an extension of laboratory science to computation. While tempting, this pattern only leads to confusion, frustration, and a misuse of human and silicon capital. Just as we are industrializing biological discovery and drug development, so must we with data science.
Engineering Biology
Technology in a biopharma company tends to grow by accretion rather than design. Tools and systems are brought in house as functions are brought online. LIMS comes with the establishment of a lab, a compound registry with the first experiments with small molecules, a cheminformatics tool when it’s time to start digging into SAR. Growth reflects staffing and capabilities — much as you don’t hire a medicinal chemist until it’s time to design small molecules, you don’t bring in the systems they would use until the function is present.
Engineering Biology
Capturing and recording all relevant data is only half the battle. We then need to make it useful. In practice, we will have a deluge of information, much of which will be hard to parse without the relevant context: high throughput instrumental recordings, metadata tables, and the tracing of samples throughout laboratory workflows.
Engineering Biology
What do we need to generate data at scale? Practically, we need tools to allow us to run experiments: laboratory informatics, automation, and data capture. More is needed in order to always be performing the key experiment. We need to be able to design new experiments based on results as they come in, rather than laying out ten thousand in advance and waiting a month. These are fundamentally data problems, yet we do not have systems designed to enable their solution.
Engineering Biology
In mid-2017, my data science team was tasked with building out a new genome assembly and annotation pipeline that could cover the vast expanse of fungal and bacterial diversity to support our development of novel microbial products. Our company was engaged in bioprospecting of microbes from sites across the US. Back in the lab, we were isolating, identifying, and then assaying a previously unmeasured wealth of biological diversity.
Engineering Biology
So where do we begin? With a hypothesis and a key set of experiments. From there, we must process the data, analyze them, and make decisions. If we are lucky enough to have seized on a real insight, there will be the immediate paired questions of replication and scale. How do we confirm these results and generalize beyond? To properly modify a system, as in drug development, we will need to move from science to engineering, and work with a myriad of slightly different experiments to arrive at the one we can use to, say, improve human health.
Engineering Biology
There’s been an exceptional amount of talk about and investment in learning from data in biology, especially with the advent of effective ML systems. The ability to quantitatively model and learn from data at scale is real: look at the continued progress in protein structure prediction in CASP. Every biopharma company now has a data science org with diverse operational models from centralized to distributed, and there is continual talk of innovation and AI.
Engineering Biology