November 28, 2023
March 1, 2023

Where does AI fit in the therapeutics discovery equation?


The biopharmaceutical research and development (R&D) sector has been struggling with high failure rates, and its ever-growing costs reached over $100 billion annually in the US alone. Moreover, pipeline shortages and the impact of Covid-19 have exacerbated existing operational hurdles. Recent technological breakthroughs, particularly in Artificial Intelligence (AI), have enabled new approaches to drug discovery, leading to a turnaround in productivity and efficiency. In fact, the use of AI in drug discovery has led to more than thirty assets that are currently in clinical trials, up from zero in 2020. Whilst more than half of AI-driven drug discovery programs are focused on small-molecule drugs, the rise of biologic therapeutics and gene therapies presents exciting new opportunities for AI to transform existing processes.

In this article, we investigate the alleged impact of AI on the challenging field of drug discovery. We review the current state of the sector, as well as the significance of investments by venture capital firms. The latter have substantially slowed down, but have enabled early players to develop ambitious pipelines. Thereafter, to elucidate which forces are shaping the AI-driven drug discovery landscape, we summarize the latest notable deals involving startups, big tech, and large pharmaceutical companies. In this context, we examine the major players that are leading the charge, and briefly present novel discovery methods and technologies. We conclude by exploring current and future hurdles, and provide a roadmap for new and incumbent players to successfully integrate AI technology in drug discovery.

Has AI Reached an Inflection Point in Drug Discovery?

Bringing a new drug to the market is a long and costly journey that takes an average of 13–15 years and upwards of US$2–3 billion (1). The biopharmaceutical R&D sector has seen a steady decline in efficiency, as evidenced by what has become known as Eroom’s Law (Moore’s Law spelled backwards). The industry has been grappling with high failure rates and ever-growing R&D costs, which have reached more than $100 billion annually in the US alone. Yet profound technological progress over the last decade has enabled radically new approaches to the way we discover and develop new drugs. Importantly, the availability of higher-quality information, and improvements in how it is used, appear to be driving a turnaround in biopharma R&D that may have broken Eroom’s Law. With AI at the forefront, drug discovery is undergoing a major transformation. As more and more AI-driven therapeutics move into clinical development, a question of paramount importance arises: where does AI fit in the therapeutics discovery equation?

The current excitement surrounding AI in drug discovery is driven by a number of factors, such as the impact of COVID-19 on the digital economy, the rising cost of R&D, and pharmaceutical company pipeline shortages. With the abundance of massive datasets to build more advanced models and the entry of tech giants such as Google, Microsoft, and Amazon into the drug discovery arena through partnerships with industry leaders, the enthusiasm for AI in drug discovery is staggering (2). For example, in September 2022, Novo Nordisk and Microsoft announced their collaboration to combine Microsoft’s computational services, cloud, and artificial intelligence with Novo Nordisk’s drug discovery, development, and data science capabilities.

At the end of the first half of 2022, over 60% of AI-driven drug discovery programs were developing small-molecule drugs. However, a key trend in biopharma has been the rise of biologic therapeutics (e.g., antibodies, peptides, vaccines, and other large molecules extracted or manufactured in living organisms), protein degraders, and gene therapies. Already 7 out of 10 bestselling drugs are biologics. This therapeutic class requires more complex development processes, such as genetic engineering and cell modification, than small-molecule drugs do. Protein degraders (which hijack naturally occurring biological machines by regulating the abundance of protein targets with induced proteolysis) and gene therapies (which treat or prevent disease by correcting the underlying genetic cause) belong to a new class of therapeutic modalities that, together with biologics, seem ripe to benefit from the aid of AI. All in all, terrific technological progress holds the promise of finding meaningful patterns in the complexity of patient medical data, discovering new compounds, and developing new processes to bring novel medicines to patients more efficiently and predictably.

DALL-E: scientists discovering new drugs in a pharmaceutical laboratory

Thanks to improvements in areas such as faster target identification and validation, and shorter iterations of molecule design and optimization, AI seems to be disrupting the status quo in drug discovery. In this context, the industry landscape is undoubtedly evolving at a very fast pace. Several new players, including Isomorphic Laboratories, Data2Discovery, Standigm, Exscientia, Genome Biologics, Zephyr AI, Outpace*, and Unnatural Products* (*ARTIS portfolio companies), are spearheading this change. As of August 2022, AI-driven drug discovery companies had a combined pipeline equivalent to ~50% of the in-house discovery and preclinical output of traditional “big-pharma” companies. Importantly, AI-driven drug discovery has led to more than 30 assets in clinical trials, up from 0 in 2020. Between 2011 and 2021, funding and capital investment in AI-driven drug discovery companies grew at an impressive 62% CAGR, with US- and UK-based companies taking home more than half of the investments. With such a large amount of money flowing into AI-driven drug discovery, how many of these programs will reach the clinic and ultimately end up in the hands of patients?
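
To put the 62% CAGR figure in perspective, the short sketch below computes the total growth multiple that such a rate implies. Treating 2011 to 2021 as ten compounding periods is our assumption, and the function names are illustrative rather than taken from any cited source.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Total growth factor implied by a compound annual growth rate."""
    return (1.0 + cagr) ** years


def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between a start and an end value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # Assumption: 2011 to 2021 is treated as 10 compounding periods.
    multiple = growth_multiple(0.62, 10)
    print(f"Implied growth multiple: {multiple:.0f}x")
    # Round trip: recover the annual rate from the implied multiple.
    print(f"Recovered CAGR: {cagr(1.0, multiple, 10):.2%}")
```

Under that assumption, annual investment at the end of the period would be more than a hundred times its level at the start, which illustrates how quickly capital accumulated in this space.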

The number and significance of investments going into companies working on AI-enabled technology for drug discovery have further fueled the conversation around its real value for transforming healthcare, with over $1.6B invested in the space in 2022 at the Seed/Series A stages alone (3). Several startups, such as Insitro and Owkin, are valued at over $1 billion, which places them in unicorn territory. Owkin received a $180 million investment from Sanofi. Softbank-backed Exscientia also had unicorn status before its IPO, which closed at $510 million in September 2021. In December 2021, Recursion received an upfront investment of $150 million with the potential for milestone payments of $300 million from Roche/Genentech to use its AI-guided high-content screening platform to identify novel targets and therapeutics in neuroscience and oncology for up to 40 research programs (4). Finally, stealing the scene of pharma-led investments with an impressive deal, Nimbus Therapeutics is set to receive $4 billion in upfront cash from Takeda (and up to $2 billion in commercial milestone payments) for its AI-developed, highly selective allosteric TYK2 inhibitor.

To truly grasp the alleged impact of AI in drug discovery, it is essential to clarify some key nuances. Take for instance the contentious role played by computational methods in the Nimbus deal. Was the TYK2 inhibitor molecule really discovered by AI, or does the computational method of structure-based drug design (the specific mechanism used for this discovery) fall in a completely different category? The answer boils down to what we define as AI. In light of similar controversies, we need to wonder: How many of the new molecules reaching the clinic were truly discovered de novo by AI? Was AI employed to find ways to repurpose other drugs? Or, was AI the tool that discovered novel and effective delivery methods for abandoned therapeutics? Since most drugs being developed with AI are still in the discovery or preclinical stages, there seems to be little evidence to support the vague and broad claims that AI will accelerate the drug discovery process or reduce costs (5).

In this article, we describe some of the most exciting deals of 2022 and early 2023 in AI-driven drug discovery, and explore whether the hype in this space justifies further interest from institutions and investors. We start by providing a brief overview of the traditional drug discovery process for context. Thereafter, we get to the core of this article by reviewing the current landscape and then diving into the opportunities and challenges that the latest advancements of AI bring to drug discovery. We refer the interested reader to the Appendix for a short primer on AI.

Traditional Drug Discovery

Bringing a new drug to the market is a complex and lengthy process that requires years of expertise and research. Traditional drug discovery starts with target selection, that is, choosing the molecule that the drug is intended to interact with. It takes years of experience for discovery researchers and a large amount of literature to suggest potential genes or pathways that may lead to disease phenotypes. The process continues with the target-to-hit step, which relies on high-throughput screening. The latter is a method of experimentation that allows researchers to conduct an extremely large number of chemical, genetic, or pharmacological tests using technological aids, such as robotics, data processing software, and liquid handling devices. The process can take several months and often results in low hit rates or poor-quality hits. Advances in computational technology have allowed for increased exploration of the vast chemical space, which comprises more than 10^60 molecules. After promising hits are found, the more drug-like compounds are selected through what is known as the hit-to-lead process. The most promising leads enter a lead optimization process that ends with one or more drug candidates entering preclinical development. However, only 10% of successful candidates make it through preclinical development and into clinical trials. The high failure rate can be due to few hits progressing, preclinical toxicity, off-target effects, and unsatisfactory pharmacodynamic and pharmacokinetic profiles.

Current Landscape: Key Players, Collaborations, and VC Deals

The current landscape of AI-driven drug discovery can be partitioned into 4 main groups (6). To reverse the emphasis, we refer to biotech companies that utilize an AI and Machine Learning (ML)-first approach to drug discovery as TechBio®, a term coined by ARTIS Ventures (7).

  1. BioPharma
  2. Big Tech
  3. AI Driven TechBio
  4. Service Providers

The first two groups (in blue) are established companies, whereas the remaining two groups (in red) are relatively young players that provide specific platforms, technologies, and capabilities (8).

By the end of 2022 there were already more than 270 companies in the AI-driven drug discovery business (9). In this crowded space, the early movers of the AI-driven TechBio group are leading the pack in terms of the number of drugs under development. These pioneer companies have a significant advantage over newer players in this field. As of June 2022, Recursion Pharmaceuticals, SOM Biotech, HealX, InSilico, and Aria Pharmaceuticals had 82, 33, 26, 17, and 16 drugs in their pipelines, respectively (10). Despite the head start of established players, newer companies are carving their niche in the expansive landscape of AI-driven drug discovery, and even mine genomes and metabolomes to find molecules that are outside of the chemical space of clinical-stage drugs. Celeris Therapeutics is designing novel ML-enabled proximity-inducing compounds to degrade proteins, BigHat Biosciences focuses on AI-first antibody design, Enveda Biosciences and Brightseed are creating new medicines from plants through platforms that combine AI and metabolomics, Erebagen and Hexagon Bio mine genomes to discover new medicines, Persephone Biosciences explores microbial genomes to develop microbial therapeutics, and Abalone Bio uses ML to identify antibodies to control G-protein coupled receptors.

The increasing adoption of AI-based technology is reflected in the growth of the AI-first small molecule drug pipeline, which is projected to expand at an annual rate of approximately 40% (11). The proliferation of startups investing in AI's potential to improve the drug discovery process emphasizes the promises of this innovative technology in the pharmaceutical industry. In the figure below, we display a (non-exhaustive) list of key players by specific focus areas (12):

2022 Key Collaborations in AI Drug Discovery

High valuations, multiple deals, and significant capital raised have led to a number of novel partnerships (13):

Venture Capital Investments in AI Drug Discovery Startups

The COVID-19 pandemic has spurred a surge in Venture Capital (VC) investments in healthcare and biotechnologies, further solidifying the sector's status as a highly sought-after area for investment. AI-enabled biotechnology, in particular, continued to be attractive to investors for the first three quarters of 2022, especially in the United States. The aging population, talent shortages, and medical systems in need of modernization are expected to keep the interest of investors high for the foreseeable future (14). Supporting this prediction, venture funds raised nearly $22B to invest into healthcare companies, making 2022 the second-largest fundraising year ever (15).

Despite investor interest being high, 2022 has seen a sizable contraction in the healthcare investment landscape as a whole. Data from Silicon Valley Bank shows that early-stage biopharma companies’ valuations in 2022 were similar to 2021 levels, but the number of $50M+ deals saw a sizable drop in the second half of 2022 (30 in H1, 17 in H2). Total 2022 biopharma deals comprised $29.5B through 778 deals, almost a $10B decline from the $38.7B invested in 2021 through 910 deals (16). This table from Silicon Valley Bank data illustrates the stark contrast in investments deployed between Q2 and Q3 of 2022:

In contrast to the aforementioned trend, investors were more willing to fund companies that applied a computational approach to discover and develop drugs. Companies that fall within this group must have a team with computational experience and the ability or potential for platform creation –key components of TechBio companies, which are presented further below. Platform computational biology companies reached high valuations in 2022 despite having no assets in the clinic. In fact, among the highest-valued 2022 financings, we find 3 platform companies (17):

  • Eikon Therapeutics, which is focused on tracking and measuring the movements of individual proteins in living cells, valued at $3.0B post
  • Tessera Therapeutics, which develops novel genome engineering technology, valued at $1.7B post
  • Treeline Biosciences, which leverages computational approaches to improve the druggability of molecular targets in oncology, valued at $1.3B post

Besides maintaining high valuations, this category showcases higher median multiples than non-platform computational biology companies at the earlier stages: 2.9x vs 2.3x in the Seed to A step-up, and 2.0x vs 1.7x in the A-B step-up. Data from Silicon Valley Bank highlight the top step-ups for companies that apply novel computational tools to drug discovery (18):

Finally, exits were tepid in 2022. After a record 92 biopharma IPOs in 2021, there were just 19 IPOs in 2022. Altogether, pre-money valuations have fallen to 2019 levels and we have witnessed a mixed post-IPO performance for the 11 US and EU IPOs. Of the nine M&As in 2022, five were in the preclinical stage (19). Notably, the Nimbus deal mentioned above was announced at the end of the year and is expected to involve the second-largest upfront payment ever for a venture-backed biopharma company.

Opportunities for the use of AI-Enabled Technology in Drug Discovery

Biotech companies want AI models that predict whether a specific molecule can access a target, that can generate a molecule to bind to a target, and that can explain how the target and molecules interact with each other. Thousands of compounds are screened and put into preclinical testing, but the attrition rate is high, which makes finding new molecules extremely difficult and expensive. AI holds the promise of alleviating many of the burdens pharma and biotech companies face in the post-blockbuster drug era. For instance, in October 2022 Nature Machine Intelligence published an article on a context-aware deconfounding autoencoder (CODE-AE), an AI technology developed by researchers at CUNY Graduate Center. CODE-AE predicts human response to novel drug candidates with remarkable accuracy by extracting intrinsic biological signals masked by context-specific patterns and confounding factors.

The excitement for AI in drug discovery is unmistakable, with a growing number of companies exploring novel approaches based on machine learning models. Finding ways in which known drugs can be used for new indications could reduce development costs since much of the preclinical testing would already be done –as highlighted by the CODE-AE example above. Furthermore, identifying safety issues at the earliest stages of the drug development process would prevent investing time and money toward the development of drugs that eventually need to be abandoned (20).

Another critical part of modern drug discovery is high-throughput screening (HTS). Embedding HTS with AI has been shown to significantly improve speed, autonomy, and accuracy. AI-enabled HTS can be broken down into 6 parts, as shown in the figure below (21).

Screening starts with diverse sets of chemical compounds spanning a wide range of chemical structures. Next, high-throughput hardware automatically transfers individual compounds to wells in which cells replicate specific experimental conditions. Promising compounds are labeled “hits” after the cell response to each compound is evaluated with AI tools such as computer vision-based analysis. This last step enables the training of an ML model on the first cell response outcomes for each type of tested chemical structure. Once the model reaches the desired accuracy, it is used to scan the remainder of the library compounds to predict which plates should be prioritized and to increase the number of hits in the next screening. Finally, compound selection is automated based on the ML model recommendations, which are used in the next screening. The algorithm continuously learns at each new screening iteration, and scientists use its recommendations to decide which new structures should be investigated next, thus feeding back into the selection of chemical compounds in the first step.
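
The screening loop described above can be sketched as a toy simulation. Everything in it is an assumption made for illustration: compounds are reduced to two made-up descriptors, `run_assay` stands in for the wet-lab readout and image analysis, and `predict_hit_score` is a simple similarity heuristic rather than a trained ML model of the kind a real platform would use.

```python
import random


def make_library(n, seed=1):
    """Hypothetical compound library: each compound is a pair of descriptors in [0, 1)."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]


def run_assay(compound):
    """Stand-in for the assay readout: True if the compound is a 'hit'.

    Assumption: activity depends on closeness to a hidden optimum.
    """
    x, y = compound
    return (x - 0.7) ** 2 + (y - 0.3) ** 2 < 0.02


def predict_hit_score(compound, tested):
    """Toy model: reward similarity to known hits, mildly penalize similarity to misses."""
    score = 0.0
    for other, is_hit in tested.items():
        dist2 = (compound[0] - other[0]) ** 2 + (compound[1] - other[1]) ** 2
        weight = 1.0 / (dist2 + 1e-3)
        score += weight if is_hit else -0.1 * weight
    return score


def screen(library, rounds=5, batch_size=100, seed=0):
    """Iterate: test a batch, update the results, let the model pick the next batch."""
    rng = random.Random(seed)
    untested = list(library)
    rng.shuffle(untested)  # the first batch is a random, diverse subset
    tested = {}
    hits_per_round = []
    for round_idx in range(rounds):
        if round_idx > 0:
            # Later rounds: prioritize the compounds the model ranks highest.
            untested.sort(key=lambda c: predict_hit_score(c, tested), reverse=True)
        batch, untested = untested[:batch_size], untested[batch_size:]
        results = {c: run_assay(c) for c in batch}
        tested.update(results)  # new outcomes feed back into the next selection
        hits_per_round.append(sum(results.values()))
    return hits_per_round


if __name__ == "__main__":
    print(screen(make_library(2000)))
```

The point of the sketch is the feedback structure rather than the model: each round's outcomes change which compounds are selected next, so the hit rate of later batches can exceed that of the initial random batch.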

Challenges of AI-driven Drug Discovery

While there is considerable excitement about the transformative potential of AI in drug discovery, many experts advise caution. It is important to note that the hype surrounding AI has not yet been fully supported by robust evidence to persuade investors. In fact, no drug identified using an AI-first approach has yet been approved. Recent setbacks for drug candidates in clinical trials also add weight to these warnings. For example, Sumitomo Dainippon Pharma, in a joint venture with Exscientia, was the first to enter a Phase 1 clinical trial with an AI-generated long-acting 5-HT1A receptor agonist drug for the treatment of obsessive-compulsive disorder (OCD). The companies claim that the exploratory research took 12 months, which is significantly less than standard timeframes for discovering molecule candidates. However, it was announced earlier last year that the study failed to meet its pre-defined endpoints. Many skeptics are convinced that “it will take years for AI use to peak in the drug discovery and development process.”

Although more and more AI-derived therapeutics are entering clinical trials, none has advanced to Phase 3 yet. The figure below illustrates the clinical stage of the most advanced candidate in the respective pipeline of early players in this space.

Major Challenges to Consider

Building AI enablement capabilities in-house is difficult: assembling the cross-functional teams required to drive the transformation is challenging, and it has been observed that AI enablement is often implemented in a relatively isolated way. These obstacles are exacerbated in large biopharma companies by the inherent operational inertia (due to their size), and the fragmented corporate hierarchy. What’s more, AI-enabled approaches are often undertaken separately from day-to-day science, with computational tools that are not fully integrated into routine research activities. Such practices lead to myopic experiments that are not aligned with modern scientific and operational processes, thereby hindering their potential impact and limiting their effectiveness at a larger scale. Furthermore, investment in digitized drug discovery capabilities and datasets frequently translates into leveraging partner platforms, ultimately enriching the partner’s IP and delaying the construction of in-house end-to-end tech stacks and capabilities. Biopharma companies, therefore, need to strike a balance between internal capability-building and partnerships with AI-enabled drug discovery companies (22).

Aside from the aforementioned challenges with partnerships, several other obstacles arise when adopting AI within an organization. For instance, R&D departments face limited availability of data and privacy concerns. Further, when integrating AI into drug discovery processes, companies face a scarcity of talent and difficulty interpreting insights generated from AI. Finally, lack of clarity hinders AI adoption due to a rapidly evolving landscape, changing requirements for algorithm transparency, and the very few tangible proofs that AI-enhanced pipelines can deliver on their promises.

Is AI for drug discovery in the "trough of disillusionment" of the technology hype cycle?

“It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of light, it was the season of darkness, it was the spring of hope, it was the winter of despair.” – Charles Dickens, A Tale of Two Cities

Depending on whom you ask, opinions on the status of AI in drug discovery vary wildly. According to research from Gartner (23), however, the general consensus seems to place this technology right in between the peak of inflated expectations and the trough of disillusionment. Besides the obvious reasons listed in the previous paragraph, other operational issues justify this placement in the technology hype cycle. Such operational issues include: the struggles of business leaders to overcome challenges with the validation of AI processes, challenges in building up a bench of AI experts to support initiatives, and governance issues that will require orchestration and ongoing mediation between scientists and AI-promoting vendors.

Acquiring appropriate data and tech talent to unlock AI benefits

The most difficult and time-consuming part of creating ML algorithms is acquiring robust training data for model development. If the data are poor in quality, insufficient, or biased, the model cannot provide predictions with a high probability of success. Hence the saying, “garbage in, garbage out.” Algorithms are trained on large datasets so that what they learn from the training set can be applied to new situations. Data scientists and AI specialists believe that drug discovery organizations are not using appropriate data and lack the skill set needed to use ML tools properly.

Current challenges for biopharma and AI-first biotech companies

  • Ensuring quality data is used for modeling
  • Lack of AI/ML and data science talent
  • Lack of infrastructure to support increase in technological assets and scaling requirements
  • Vertical integration/compatibility (tech integration from startup to biopharma, CROs, and CDMOs)
  • Evolving regulatory requirements

Closing Remarks and Future Directions

From the patients’ point of view – especially those currently underserved due to lack of drug developments for their specific disease states – AI has created a surge of shots on goal that were not possible before. Will AI enable the era of industrialized drug discovery? The jury is still out, but there seems to be sufficient evidence that it might very well be the case.

Vasudev Bailey (Senior partner at ARTIS Ventures): ‘For us, we’re investing in technology towards the discovery or development of a particular drug so even if the drug didn't make it, or the asset didn't make it through the FDA, there's still value on the table because of the technology or how to make the drug, or how to discover the drug. And that's the key difference.’

Critical components of successful TechBio companies that leverage AI at their core

A growing class of companies are taking an engineering-first approach to drug discovery by focusing on the development of technologies and platforms that enable the fast and simultaneous discovery of multiple drugs. These ventures fall into the category of TechBio. To achieve success in this field, however, several key components must be in place, both in terms of human capital and technical expertise.

To harness the full potential of AI in drug discovery, companies need to build their in-house data science and engineering capabilities, as well as equip teams with the technical know-how to effectively leverage AI. Resourcing, technical skill sets, management focus, maturity, and culture are major elements to consider when making the decision to develop AI capabilities. Specifically, the company should possess a capable team with (i) a clear vision for a problem that is well-suited for AI-enabled solutions, and (ii) a deep understanding of what the synergy between human and AI will look like in day-to-day activities. Looking ahead, companies should gear up with in-house ethics committees as an integral part of company and product design. This step will ensure that structure, behavior, and governance are set up to assure fairness, mitigate discrimination, and maintain the integrity of data, algorithms, and solutions.

The technical components of a successful TechBio drug discovery company are equally crucial. Robust and interpretable AI algorithms, optimized data acquisition, storage, analysis, and a strong data moat are all vital components of this approach. On the one hand, building a data moat through in-house, large-scale, and high-quality experiments that leverage a healthy, secure, and scalable infrastructure is extremely valuable. On the other hand, the team should be aware of the opportunity cost incurred when generating new data, and understand when to prioritize the analysis over the collection of data. With a focus on efficiency, predictability, and replicability, TechBio companies are poised to revolutionize the world of drug discovery and lead the way towards a healthier future.

How do we take an AI-first approach to the next level for drug discovery companies?

As this exciting and critical space continues to develop, the following key factors should be considered:

  • The successful implementation of AI in drug discovery requires a strategic approach that leverages available open-source data and industry partnerships. The alignment of specific technologies and datasets with the associated drug discovery business activity is crucial.

  • IT leaders working within R&D should evaluate potential applications in specific disciplines and specify the leading technologies and vendors with associated competencies. To fully realize the benefits of AI, data generation and collection must be specifically designed for AI applications, with a focus on clear annotation, efficient archiving, and consistency. This goes beyond simply collecting data points from laboratory experiments (24).

  • Improving the accuracy of current models will yield improved outcomes for:
  1. various therapeutic targets, such as DNA, RNA, and ribosomal targets, not just enzyme/protein targets;
  2. various therapeutic modalities: large molecules, such as biologics, and not just small molecules.

  • Models should be able to simultaneously predict multiple properties, such as toxicity and solubility, as well as efficacy. To construct better models, companies should develop data science and engineering capabilities in parallel with AI technology acumen. For instance, AI-enabled screening assays of organoids (25) and state-of-the-art interconnected organ chips (devices that mimic the structural and functional interactions across desired organs) could overcome the pre-clinical gold standard of animal models, allowing scientists to save time and money.

  • Advancements in computing power offer a long-term opportunity: the advent of Exascale computing (26) (computers that can perform more than 1,000,000,000,000,000,000 FLOPS, or 1 exaFLOPS) will massively speed up the analysis of extremely large data volumes and complex environmental genomes. Boosted medical research support will enable faster and automated discovery of previously hidden insights through the analysis of, e.g., patient genetics, tumor genomes, and molecular simulations.

  • Employing the latest AI research and tools can further enhance and disrupt current pipelines, for example:
  1. Diffusion Models can be applied to drug discovery and protein design. In fact, many researchers have started applying this powerful novel class of generative models to molecular graphs. Namely, diffusion models have been used for structure-based drug design and generation of novel ligands (27), molecular docking (predicting the binding structure of a small molecule ligand to a protein) (28), and predicting molecular conformations from molecular graphs (29).
  2. Large Language Models have already been shown to be capable of fast single-sequence protein structure prediction (30). We have also seen the first applications of generative pre-trained transformers for advanced biomedical text generation and mining (BioGPT) (31). In the near future, small-molecule and protein-design tools may look like predictive AI tools such as GitHub Copilot, where a scientist can simply describe in plain language the chemical, structural, and functional features of a desired output.

As AI continues to evolve, with increasing potential for transforming the drug discovery process, groundbreaking technical advancements are making experiments, assays, computational tools, and processes that were once considered impossible, possible. The potential of AI to improve human health is vast, but it will require steadfast dedication and commitment from all stakeholders involved. Practitioners, scientists, engineers, investors, and technology enthusiasts should remain focused on the ultimate goal: to swiftly bring innovative and improved therapeutics to patients in need.


Appendix: Short Primer on Artificial Intelligence

While the scope of this article is to analyze AI in the context of therapeutics discovery, it may be useful to lay out definitions that address the core of this piece. Readers who are already familiar with the basics of AI and ML can use this section as a refresher or ignore it altogether.

According to the 2004 definition from John McCarthy, one of the founding fathers of artificial intelligence together with Alan Turing, AI “is the science and engineering of making intelligent machines, especially intelligent computer programs”. In the life sciences, AI is used as a tool to aid physicians and scientists in their workflows by providing actionable insights. AI is closely related to the field of Data Science, which studies the origin, the value, and the transformations of data to gain valuable information. At the intersection of Data Science and AI lies Machine Learning, the part of the artificial intelligence space that deals with the development of methods that leverage data to improve machine performance on some tasks. Early examples include email filtering, computer vision, language recognition, and translation. ML is particularly useful when the dataset a scientist needs to analyze is extremely large (thousands, millions, or billions of data points) or too complex (large number of features) for human analysis, or when automation of the data processing pipeline is desirable.

As is often the case, data from biological processes possess these properties. As a natural consequence, most AI algorithms used in drug discovery are based on ML and make use of deep learning. Deep learning is an artificial intelligence technique that relies on layered artificial neural networks, which are loosely based on how neurons are interconnected in the brain. The adjective “deep” refers to the use of many hidden layers in the network, that is, all the layers between the input layer and the output layer. In a neural network, data flows from the input layer to the output layer and is transformed along the way by the operations of the hidden layers, which depend on the chosen architecture. In the case of static data, such as clinical images and empirical measurements, there even exist theorems (the universal approximation theorems) showing that sufficiently large neural networks can approximate essentially any continuous function to arbitrary accuracy. Despite the sense of novelty, it is worth emphasizing that artificial neural networks in chemoinformatics had their first heyday in the 1990s (32).
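To make the data flow through a layered network concrete, here is a minimal sketch in plain Python. It is an illustration only, not a method used by any company discussed in this article: the layer sizes, the tanh activation, the random (untrained) weights, and the helper names dense, init_layer, and forward are all assumptions chosen for readability.

```python
import math
import random

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b), computed row by row."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for a layer mapping n_in values to n_out values."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(x, layers):
    """Pass x through every hidden layer (tanh), then through a linear output layer."""
    *hidden, output = layers
    for weights, biases in hidden:
        x = dense(x, weights, biases, math.tanh)
    weights, biases = output
    return dense(x, weights, biases, lambda z: z)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
sizes = [4, 8, 8, 1]    # 4 input features, two hidden layers of 8 units, 1 output
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

# Four made-up input measurements flow from the input layer to the output layer.
prediction = forward([0.2, -1.3, 0.7, 0.05], layers)
print(prediction)
```

The network here is untrained, so the printed number is meaningless; training would adjust the weights and biases so that outputs match known examples. Adding entries to sizes adds hidden layers, which is what makes a network “deeper”.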

ML is mainly divided into supervised learning and unsupervised learning. To build a supervised ML algorithm, a scientist must specify a model, such as linear regression, a support vector machine, or a neural network. The model is then fit to a large, labeled training dataset, that is, one in which each data point has been assigned a known label, such as a category. The trained model is then evaluated on a test set of data that was not used in training. The accuracy of the trained model on the test set is analyzed, and the model is tuned in order to reduce the observed error. By contrast, unsupervised learning algorithms are developed to identify patterns in unlabeled data. That is, there is no need to provide a ground truth to the algorithm in the form of predetermined labels. For instance, unsupervised learning could be used to predict the effects of mutations from gene sequence co-variation. Finally, the two approaches can be combined in what is called semi-supervised learning. The latter is used when only a small portion of a dataset is labeled and the remaining, unlabeled data points would be labeled analogously. The following figure summarizes the (simplified) procedure for choosing and training an ML model (33).
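The difference between the two settings can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions: the data are synthetic, the supervised model is a one-variable linear regression fit by ordinary least squares, the unsupervised method is a two-cluster k-means on one-dimensional values, and the function names fit_line, mean_squared_error, and two_means are invented for this example.

```python
import random

def fit_line(xs, ys):
    """Supervised: ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

def mean_squared_error(model, xs, ys):
    """Average squared gap between the model's predictions and the true labels."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def two_means(values, iterations=20):
    """Unsupervised: split unlabeled 1-D values into two clusters (k-means, k = 2)."""
    low, high = min(values), max(values)
    for _ in range(iterations):
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        low = sum(near_low) / len(near_low)
        high = sum(near_high) / len(near_high)
    return low, high

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Supervised: every x comes with a label y (synthetic: y = 2x + 1 plus noise).
xs = [rng.uniform(0.0, 10.0) for _ in range(200)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.5) for x in xs]
train_x, test_x = xs[:150], xs[150:]  # the test set is held out of training
train_y, test_y = ys[:150], ys[150:]
model = fit_line(train_x, train_y)
test_mse = mean_squared_error(model, test_x, test_y)

# Unsupervised: no labels, only values drawn from two hidden groups.
unlabeled = [rng.gauss(0.0, 0.3) for _ in range(50)] + [rng.gauss(5.0, 0.3) for _ in range(50)]
centers = two_means(unlabeled)

print("fitted slope and intercept:", model)
print("error on held-out test set:", test_mse)
print("cluster centers found without labels:", centers)
```

In the supervised half, the held-out test error is what indicates whether the fitted model generalizes beyond the examples it was trained on. In the unsupervised half, the algorithm recovers the two groups without ever being told which point belongs to which.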


(1) Source: S. Pushpakom et al., Nature Reviews Drug Discovery, volume 18, 41–58 (2019)

(2) Global Data Thematic Analysis June 2022 AI in Drug Discovery, Analyst Briefing, Report Code: GDHC4632EI

(3) SVB Healthcare Investments and Exits 2022 (Annual Report) - Comp Bio category

(4) Source 1: Global Data Thematic Research: Pharma Artificial Intelligence in Drug Discovery June 2022, Table 3, page 16, Report Code: GDHCHT333. Source 2: company websites and press releases

(5) Global Data Analysis July 2022, Analyst Briefing, Report Code: GDHC4695EI

(6) The list of companies is not exhaustive.

(7) Vator News, “Meet Vasudev Bailey, Senior Partner at ARTIS Ventures”:

(8) Companies partition adapted from “An Ounce of Biotechnology – Machine learning-powered drug discovery: Now and Tomorrow” 2022 blog post by Lin Ning

(9) McKinsey 2022 report “AI in biopharma research: A time to focus and scale”

(10) Global Data Thematic Research: Pharma Artificial Intelligence in Drug Discovery June 2022, Table 3, page 16, Report Code: GDHCHT333

(11) Jayatunga MKP, Xie W, Ruder L, Schulze U, Meier C. AI in small-molecule drug discovery: a coming wave? Nat Rev Drug Discov. 2022 Mar;21(3):175-176. doi: 10.1038/d41573-022-00025-1

(12) Sources: company websites, LEK Consulting research, BCG research.

(13) Source 1: company websites and press releases. Source 2: Source 3:

(14) McKinsey 2022 report “AI in biopharma research: A time to focus and scale”. Data: PitchBook data, 2022; IQVIA Pharma Deals, 2022

(15) SVB Healthcare Investments and Exits 2022 (Annual Report)

(16) - (19) SVB Healthcare Investments and Exits 2022 Q3 Update (Seed/A Advisory Board)

(20) Global Data Thematic Research: Pharma Artificial Intelligence in Drug Discovery June 2022, Table 3, page 16, Report Code: GDHCHT333

(21) - (22) McKinsey 2022 report “AI in biopharma research: A time to focus and scale”

(23) Gartner Hype Cycle for Life Science Discovery Research, 2022

(24) K. Huang et al., Nature Chemical Biology, volume 18, pages 1033–1036 (2022)

(25) A. Mullard, “Mini-organs attract big pharma”, Nature Reviews Drug Discovery. doi:

(26) McKinsey 2022 Explainer: What is exascale computing?




(30) Chowdhury, R., Bouatta, N., Biswas, S. et al. Single-sequence protein structure prediction using a language model and deep learning. Nat Biotechnol 40, 1617–1623 (2022).

(31) R. Luo et al., Briefings in Bioinformatics, Volume 23, Issue 6, November 2022

(32) J. Zupan and J. Gasteiger, Neural Networks for Chemists: An Introduction, VCH Publishers (1993)

(33) Figure adapted and updated from