NOTES

We believe that accurately identifying and articulating the most critical unmet needs in health is the first and most fundamental step in deriving solutions that positively impact health at scale. A meaningful understanding of such needs requires a broad view, one that embraces how questions of science and technology are tied inextricably to economic, policy, and social circumstances and histories.

Here we write and publish on life sciences, health technologies & services, and animal health.

Engineering Biology: The Unreasonable Effectiveness of Design / Build / Test

Jacob Oppenheim, PhD

June 20, 2024

Over the past year, I’ve been collecting papers with surprising results about the success of machine learning in biology: ones that run against the grain of popular conceptions and throw into question whether our models are learning biology at all. These papers show models fixating on dataset patterns too complex for a human to identify, patterns that turn out to be noise, not signal.

Engineering Biology: Announcing Fresnel

Jacob Oppenheim, PhD

April 9, 2024

The hardest problem in biopharma today is picking the right targets. Our ability to modify biology has increased exponentially over the past decades. No longer is it a question of if we can hit a target (or a pathway) with some compound. Today, we can hit nearly any biological target with multiple different modalities from traditional small molecules to antibodies to interfering RNAs to cell and gene therapies and beyond. The key questions today are what should we hit and how.

Engineering Biology: Learning from Evolution—Why Protein Language Models Work

Jacob Oppenheim, PhD

March 26, 2024

Over at the new OpenProtein.ai blog, Tristan Bepler and I wrote about the seemingly mysterious power of Deep Protein Language Models. Not only do they identify related proteins, they predict functionality, stability, and immunogenicity, in many cases “out-of-the-box.” Why should this be?

Engineering Biology: Podcast—Data in Biotech

Jacob Oppenheim, PhD

March 14, 2024

Ross Katz from CorrDyn generously hosted me on the Data in Biotech podcast last week.

Engineering Biology: The World is not as Digitized as it Seems

Jacob Oppenheim, PhD

January 29, 2024

Stranded airplanes, packages arriving months late, organizations caught in endless spin unable to make decisions. Stories like these expose broken systems and paradigms that have failed to scale and reveal a quiet truth: the world is not as digitized as it seems.

Engineering Biology: Big Data—A Path Forward

Jacob Oppenheim, PhD

October 27, 2023

The combination of years of “Big Data” hype and obviously flawed inferences, of overpromising and under-delivering, has led to pervasive online tracking and a miasma of distrust. It is simultaneously too difficult to deploy novel consumer-facing information technology and to avoid the sale, or at least the use, of personal information.

Engineering Biology: Big Data—A False Panacea

Jacob Oppenheim, PhD

October 6, 2023

The story goes that an angry father confronted Target employees after his daughter was mailed coupons for maternity products unnecessarily, only to find out later that she was pregnant. A triumph of big data combined with statistical learning, and a creepy portent of the future, right? That’s how the story went at least.

Engineering Biology: Software is Management Technology

Jacob Oppenheim, PhD

September 22, 2023

My focus in writing over the past three months has been the interplay between powerful new computational methods, digital technologies, and operational processes. It began with the observation that successful Machine Learning (ML)-integrated biopharma companies have a moat in data generation and the scientific application of computation to these data, not in machine learning itself. Operational excellence is requisite for these companies, not merely a nice-to-have.

The Age of Engineering Biology

Jacob Oppenheim, PhD

August 15, 2023

I was on a panel about digitization and the data revolution at the annual Academy of Management meeting last weekend. My co-panelist and I were there to give an operational perspective on how data are used in biopharma for everything from R&D to commercialization, and how that use compared to the empirical studies from a variety of industries presented earlier in the session.

Engineering Biology: ML as Process Efficiency

Jacob Oppenheim, PhD

July 31, 2023

The integration of Machine Learning (ML) into scientific work exists on a continuum between whole-scale replacement of human processes and providing inputs to complement the judgment of a human arbiter. As I’ve argued previously, current models are insufficient at best for fully substituting human knowledge in biology for all but base-level tasks…

Engineering Biology: ML + Medicine—A Hammer in Search of Nails

Jacob Oppenheim, PhD

June 29, 2023

We have heard stories about how computation and Machine Learning (ML) are poised to change medicine for well over a decade now. Conferences, press reports, promising results – yet little has changed…

Engineering Biology: How to Build Data-Centric Biotech

Jacob Oppenheim, PhD

June 14, 2023

The past weeks have seen a flurry of articles debating the efficacy (and proof thereof) of “AI” in drug discovery and biotech writ large, kicked off by a large layoff at Benevolent, an “AI” drug developer. I would argue the lesson of the recent AI boom in biopharma is a simple one: if you don't have novel and effective science in the first place, no amount of data science will save you. Data Science and Machine Learning (hereafter ds/ml) will be most successful in biology where they sit atop transformative science that needs no special analytics.

Engineering Biology: Machine Learning for Biology that Works—a Journey with OpenProtein.ai

Jacob Oppenheim, PhD

June 2, 2023

Back in 2017, when I was just starting to build out data science at Indigo, Tristan Bepler joined us as a summer intern. We had a large and growing amount of sequencing data from microbial communities: both their composition from marker genes and whole genomes of organisms of interest. Both of these datasets resisted conventional methods. The mathematical modeling of microbial communities remains underdeveloped, with heuristic methods that produce nonsense and potentially more correct ones that are difficult to implement.

Engineering Biology: ML in Bio—Supervised Learning is Core IP

Jacob Oppenheim, PhD

April 26, 2023

The hope with Machine Learning has long been that we can eliminate complex, slow, and expensive physical processes with accurate predictions, inferred directly from data. As I’ve written previously, the complexity and scarcity of data make supervised learning like this less relevant in problems of biology. Unlike internet companies, which generate reams of labeled data daily, we have experimental throughput that is orders of magnitude lower and data modalities that are considerably more complex.

Engineering Biology: ML in Bio—There’s No Labeled Data to Fit

Jacob Oppenheim, PhD

April 11, 2023

Where does Machine Learning belong in biology? Nearly all successful efforts fall into one of three categories: (1) Exploration: summarizing large, complex datasets that cannot be fathomed by the human mind (gene sequences, chemical structures, images, etc.) and enabling scientists to explore them. (2) Scaling: automating, standardizing, and debiasing heuristics and calculations. (3) Prediction: estimating how a complex process will perform on a new element.

Engineering Biology: Learning Biology from Data—Focus on Simplicity

Jacob Oppenheim, PhD

March 28, 2023

How do we build models under resource constraints? We almost never have enough data to adjust for every possible confounding factor, nor do we know what all those factors are.

Engineering Biology: The Ladder of Computational Sophistication

Jacob Oppenheim, PhD

March 15, 2023

We don’t know what the hard problems are going to be. Most of us were trained as academic scientists in a culture of finding winding paths through the dark forest of the unknown. Today, we are much closer to engineers — using data and computation to industrialize the production of knowledge. Biology presents an endless series of learning and inference problems for us to solve.

Engineering Biology: Whither Data Science?

Jacob Oppenheim, PhD

February 27, 2023

How does data science fit into the biopharma tech stack? The analytical operations involved are certainly more complex than the transformation and aggregation of data. This might suggest that data science is an artisanal, intellectual operation built off of the core data repository; in essence, an extension of laboratory science to computation. While tempting, this pattern only leads to confusion, frustration, and a misuse of human and silicon capital. Just as we are industrializing biological discovery and drug development, so must we with data science.

Engineering Biology: Systems, Tools, and Technology

Jacob Oppenheim, PhD

February 7, 2023

Technology in a biopharma company tends to grow by accretion rather than design. Tools and systems are brought in house as functions are brought online. A LIMS comes with the establishment of a lab, a compound registry with the first experiments with small molecules, a cheminformatics tool when it’s time to start digging into SAR. Growth reflects staffing and capabilities: much as you don’t hire a medicinal chemist until it’s time to design small molecules, you don’t bring in the systems they would use until the function is present.

Engineering Biology: Generating Data at Scale—The Organization of Information

Jacob Oppenheim, PhD

January 24, 2023

Capturing and recording all relevant data is only half the battle. We then need to make it useful. In practice, we will have a deluge of information, much of which will be hard to parse without the relevant context: high throughput instrumental recordings, metadata tables, and the tracing of samples throughout laboratory workflows.

Engineering Biology: Generating Data at Scale—Tools and Systems

Jacob Oppenheim, PhD

January 17, 2023

What do we need to generate data at scale? Practically, we need tools to allow us to run experiments: laboratory informatics, automation, and data capture. More is needed in order to always be performing the key experiment. We need to be able to design new experiments based on results as they come in, not laying out ten thousand in advance and waiting a month. These are fundamentally data problems, yet we do not have systems designed to enable their solution.

Engineering Biology: Case Study I—Genomics

Jacob Oppenheim, PhD

January 10, 2023

In mid-2017, my data science team was tasked with building out a new genome assembly and annotation pipeline that could cover the vast expanse of fungal and bacterial diversity to support our development of novel microbial products. Our company was engaged in bioprospecting of microbes from sites across the US. Back in the lab, we were isolating, identifying, and then assaying a previously unmeasured wealth of biological diversity.

Engineering Biology: Data and Bio—What do we need to Solve?

Jacob Oppenheim, PhD

January 3, 2023

So where do we begin? With a hypothesis and a key set of experiments. From there, we must process the data, analyze them, and make decisions. If we are lucky enough to have seized on a real insight, there will be the immediate paired questions of replication and scale. How do we confirm these results and generalize beyond? To properly modify a system, as in drug development, we will need to move from science to engineering, and work with a myriad of slightly different experiments to arrive at the one we can use to, say, improve human health.

Engineering Biology: Data and Bio—Are We Learning?

Jacob Oppenheim, PhD

January 2, 2023

There’s been an exceptional amount of talk about and investment in learning from data in biology, especially with the advent of effective ML systems. The ability to quantitatively model and learn from data at scale is real: look at the continued progress in protein structure prediction in CASP. Every biopharma company now has a data science org with diverse operational models from centralized to distributed, and there is continual talk of innovation and AI.