September 22, 2023
My focus in writing over the past three months has been the interplay between powerful new computational methods, digital technologies, and operational processes. It began with the observation that successful Machine Learning (ML)-integrated biopharma companies have a moat in data generation and in the scientific application of computation to these data, not in machine learning itself. Operational excellence is requisite for these companies, not merely a nice-to-have.
This implies that ML is a process efficiency: when you have great science and technology, ML makes you better at generating and applying novel results, but it does not replace the need for fundamental scientific insight. Further, the reason we can exploit computation in biology today is not merely novel computational algorithms, but rather that biological science has moved from an alchemical era to an engineering one. Standardization, the ability to manipulate biological primitives, and the reusability of machinery, assays, and catalogs of “parts” enable a fundamentally different way of developing therapeutics, with implications both for wet-lab biology and for the digitization built on top of it.
With common, reusable components, we can design standardized processes. Software sits atop these processes, reifies them, and enforces consistency. Data accumulates, enabling analytics, computation, and learning, both human and machine. A technological superstructure can be built.
The first step is software. We need tools to capture our workflows, track them, and collect the results. We need the digital primitives that correspond to the biological primitives we are manipulating. Software is neither data science nor machine learning, though. It is not the application of scientific methods to computational science. It is many steps further from biology, or chemistry, or any experimental science. Rather, software is pure engineering and management technology.
Software is opinionated: it does not allow you to do anything under the sun. Machines are dumb: they need clear instructions and can accept only certain inputs. A word processor is not a spreadsheet tool; neither is it a publishing design tool nor a photo editor. And all of us have experienced the pain of trying to use software that did too much, and so ended up doing nothing particularly well.
Software has a thesis: a core representation of information (a data model) and capabilities for manipulating it. For a word processor, this is recording text; for a spreadsheet, manipulating columns of data; for publishing design, laying out a formal page with text and images; for photo editing, manipulating a single image. Each of these tools has an underlying set of digital primitives to store and manipulate the information it is given, be it alphanumeric characters, data cells, page geometry, or pixel values.
Software requires standardization of human activities. It is nearly impossible to build a software system that can handle every kind of erroneous input a user might enter. Are numbers or letters allowed? For numerical values, what about fractions or negative numbers? For letters, what language, alphabet, and character set? What happens when emojis appear on the keyboard and people try to type them in?1
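To make those questions concrete, here is a minimal Python sketch of what answering them looks like for a single field. The field, the function name `parse_concentration`, and the rule set (non-negative decimals written with ASCII digits only) are hypothetical choices made for illustration. The point is that the software picks one convention and rejects everything else with a reminder, rather than trying to interpret every possible input.

```python
import re
from decimal import Decimal

# Hypothetical rule for one free-text field: a non-negative decimal number
# written with ASCII digits and at most one decimal point, e.g. "12" or "0.25".
# [0-9] is used instead of \d because \d also matches digits from other scripts.
_ALLOWED = re.compile(r"[0-9]+(?:\.[0-9]+)?")


def parse_concentration(raw: str) -> Decimal:
    """Parse one entry, or raise ValueError with a reminder of the convention."""
    text = raw.strip()  # tolerate stray surrounding spaces, nothing else
    if not _ALLOWED.fullmatch(text):
        # Fractions ("1/2"), negatives ("-3"), other scripts, emoji: all rejected.
        raise ValueError(
            f"{raw!r} not accepted: enter a non-negative decimal such as 0.25"
        )
    return Decimal(text)


if __name__ == "__main__":
    for entry in ["0.25", " 12 ", "1/2", "-3", "\u0663", "5 \U0001F642"]:
        try:
            print(repr(entry), "->", parse_concentration(entry))
        except ValueError as err:
            print(err)
```

The choice of `[0-9]` over `\d` is itself one of the decisions the text describes: in Python, `\d` also matches digits such as the Arabic-Indic `٣`, so even "are numbers allowed?" needs a written-down answer.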
Software can shape behavior: it can catalyze the design and implementation of engineering processes, acting as a forcing function for alignment on what to do and how to do it. Software can, and often should, represent the first step toward order from the chaos of experimentation.
The application of software to business processes requires us to define the underlying data model and the capabilities for manipulating it; to decide which values will be accepted and train users to enter them; and to specify which actions must be performed before and after using the tool. Software demands standardization of processes. Software is the logical culmination of Taylorism. Software is management technology.
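As an illustration of what writing those decisions down might look like, here is a minimal Python sketch of a hypothetical sample record. The names (`Sample`, `SampleType`, `Status`, `ALLOWED_NEXT`) and the four-step workflow are assumptions invented for the example, not a description of any particular LIMS. The accepted values form a closed vocabulary, and the order of process steps is data that the tool enforces.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Union


class SampleType(Enum):
    # Hypothetical controlled vocabulary: the only sample types the tool accepts.
    PLASMA = "plasma"
    SERUM = "serum"
    TISSUE = "tissue"


class Status(Enum):
    # Hypothetical process steps a sample moves through.
    REGISTERED = "registered"
    PREPARED = "prepared"
    ASSAYED = "assayed"
    REVIEWED = "reviewed"


# The process written down as data: which step may follow which.
ALLOWED_NEXT = {
    Status.REGISTERED: {Status.PREPARED},
    Status.PREPARED: {Status.ASSAYED},
    Status.ASSAYED: {Status.REVIEWED},
    Status.REVIEWED: set(),
}


@dataclass
class Sample:
    sample_id: str
    sample_type: Union[SampleType, str]
    status: Status = Status.REGISTERED
    history: list[Status] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Map free text onto the vocabulary; unknown values raise ValueError.
        self.sample_type = SampleType(self.sample_type)

    def advance(self, new_status: Status) -> None:
        # Refuse any step the process does not allow (e.g. skipping a step).
        if new_status not in ALLOWED_NEXT[self.status]:
            raise ValueError(
                f"{self.sample_id}: cannot go from "
                f"{self.status.value} to {new_status.value}"
            )
        self.history.append(self.status)
        self.status = new_status


if __name__ == "__main__":
    s = Sample("S-001", "plasma")  # free text coerced to SampleType.PLASMA
    s.advance(Status.PREPARED)
    s.advance(Status.ASSAYED)
    print(s.status, s.history)
    # Sample("S-002", "blood") or s.advance(Status.REGISTERED) would raise ValueError.
```

The design choice is that the process lives in `ALLOWED_NEXT` rather than in users' heads: changing the workflow means agreeing on a change to that table, which is where software's role as a forcing function for alignment comes from.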
Consider where we commonly see digital technology and software today: repetitive workflows that require fine detail, non-duplication, and audit trails. Precision manufacturing has had computational interfaces for over half a century now, and financial data processing for far longer. More recently, Salesforce and others built a novel category of software, CRM (Customer Relationship Management), to enable sales and marketing across the diversity of new digital channels provided by the internet. In doing so, they created a standardized process for an entire industry2.
The example of digital marketing and CRM seems to be the exception that proves the rule, however. We are all familiar with organizations that are happy with haphazard Excel sheets, despite the chaos that ensues. Excel gives the illusion of digitization: it has just enough capability to manipulate data and track activities for new technology to be integrated and throughput increased, while creating massive downstream costs in focus, auditability, and compliance. Using versatile software without clear processes is akin to running an organization without management: you can only hope everyone else did their job right.
Similarly, many biopharmas will hire software engineers to build databases and tools to use them, thinking that they are digitizing, only to find a year later that a considerable portion of the engineers’ time is spent uploading scientific data by hand. The lack of standardized processes and records makes every use of the digital tools ad hoc and inaccessible to all but the most technical. The personnel costs are high: engineers are not cheap, and uploading data by hand is boring. The process costs are high: data generators in the lab get no immediate feedback because they cannot rapidly use their data in the digital tools, which limits the utility of the tools themselves and makes scientists doubt the benefits of the whole initiative. No one gets rapid feedback on whether things are working. Processes do not improve. Bringing in software without designing workflows for it adds costs and inefficiencies for little to no gain, and creates a breeding ground for resentment.
When more opinionated software is brought in, it frequently imports workflow and management paradigms from other industries. LIMS (Laboratory Information Management Systems) are modeled on parts tracking in factories, which has certain commonalities with sample management in the lab. The match, unsurprisingly, is imperfect in volume, consistency, and flexibility. And so laboratory and informatics teams spend hours managing their software rather than improving their processes and workflows, or doing science. Bringing in the wrong software is bringing in the wrong management paradigm.
Software is management technology: when you have a process, software routinizes it, enforces it, and enables scale, speed, and breadth. Software does not create the process sui generis. Backing into digitization does not work: factory systems brought into laboratories don’t fix operations. Overly versatile software is not the answer either. Either it enables chaos at higher speed (Excel), or it requires monthslong, agonizing, and costly installations that frequently do not even get the processes right (Configurable Workflow Systems)3. We need to design and implement effective engineering processes at scale and then build software around them.
I plan on returning to this topic in the future: How can we learn from successfully digitized industries? How can we make software versatile without creating chaos? How can we make tools configurable without requiring monthslong installations by committee?
I’m curious what readers think4: How have other industries digitized? Who does this well? Have you seen software create good process? Am I wrong? What am I missing?
—————
1 This is a famous problem that all too easily ends in engineer exhaustion, bloated software, and unhappy users. It’s often far easier to train people to use software properly, with conventions that match their workflows, and to put some basic reminders and sanity checks into the software tools.
2 Historical Footnotes: Though dramatically different in practice (manipulating physical objects versus abstract electronic representations), manufacturing and finance both require performing the same process precisely, over and over again, beyond the tolerance of the human mind and body. Our most digitized (and hence standardized) legacy industries and processes look like accounting, payroll, transaction processing, and numerical control of manufacturing (CNC).
Standardization began for these industries before digital technology was a possibility. For hardware manufacturing, the need to supply consistent machine tools and parts across a rapidly growing and diversifying economy led to government-mandated standardization efforts in the 1920s, driven to completion by the scale of manufacturing production necessitated by the Second World War. The institutions shaped by these efforts (e.g., ANSI, NIST) continue to design and promulgate new industry-wide standards. Similarly, the 1929 stock market crash led to new institutions in finance (e.g., the SEC) and, eventually, to bodies such as FASB, which have standardized much of how accounting and financial reporting (e.g., GAAP) is done. There’s an excellent discussion of these efforts, especially in manufacturing, in Immerwahr’s How to Hide an Empire.
In contrast to these standardization “pushes,” there can also be “pulls,” where following a standardized process enables the use of dramatically productivity-enhancing technology, as occurred in digital marketing. The internet created an explosion in potential marketing and advertising channels for companies. Coordinating and managing campaigns across media and technologies required novel tools. Companies like Salesforce capitalized on these needs, building a new industry of digital marketing and CRM (Customer Relationship Management) tools.
3 Examples in biopharma include Clinical Trial Management Systems and Quality Management Systems.
4 Initial inspiration from the article here and provocative discussion here
Engineering Biology by Jacob Oppenheim