The Molecular Engineering Layer in Precision Oncology: From Actionable Target to Clinical Candidate

For forty years, KRAS was the oncogene that could not be drugged. Mutated in roughly one-quarter of all human cancers (including 90% of pancreatic ductal adenocarcinomas, 40% of colorectal cancers, and 30% of non-small cell lung cancers), KRAS was the single most validated oncology target in existence, and the single most resistant to pharmaceutical intervention. The protein's surface was smooth and featureless, and it bound its endogenous substrate GTP with picomolar affinity, leaving a competitive small molecule little chance of displacing the nucleotide. There was, as far as medicinal chemists could determine, nowhere to place a drug.

Then, in 2013, Kevan Shokat's laboratory at UCSF identified something unexpected: a cryptic binding site beneath the switch-II loop of KRAS G12C that only exists when the protein is in its inactive, GDP-bound conformation. The pocket was shallow, conformationally dynamic, and transiently accessible. But it was adjacent to the mutant cysteine at position 12 — a nucleophilic residue that does not exist in wild-type KRAS, and therefore a potential anchor for a covalent inhibitor.

The molecular engineering problem was now defined. Design a molecule that: (a) fits a shallow, flexible pocket with weak reversible affinity; (b) positions a reactive electrophilic warhead within covalent bonding distance of Cys-12; (c) reacts selectively with the mutant cysteine and not with the thousands of other cysteines in the cell; and (d) does all of this while remaining orally bioavailable, metabolically stable, and synthetically accessible. This was not a target identification problem. It was not a biological insight problem. It was, from initial design hypothesis to final compound, a molecular engineering problem.

Amgen solved it in 2021 with sotorasib (Lumakras), a 560.61-dalton small molecule bearing an acrylamide warhead that forms an irreversible covalent bond with Cys-12 in the switch-II pocket. Mirati Therapeutics solved it differently in 2022 with adagrasib (Krazati), a 604.13-dalton compound whose 2-fluoroacrylamide warhead is a less reactive electrophile, a deficit the molecule offsets with stronger reversible affinity for the same pocket. Same target, same pocket, fundamentally different molecular engineering strategies reflecting different balances of the same design tradeoffs.

As of March 2026, there are 34 active clinical trials testing KRAS inhibitors across multiple mutation subtypes. Eli Lilly is running a 1,264-patient Phase 3 pivotal trial of LY3537982 combined with pembrolizumab in KRAS G12C-mutant NSCLC (NCT06119581). Roche has a 600-patient Phase 3 trial of divarasib — a next-generation G12C inhibitor pushing the boundaries of drug-likeness at 622 daltons and two Rule-of-Five violations — combined with pembrolizumab (NCT06793215). Revolution Medicines is testing zoldonrasib (RMC-9805), a first-in-class tri-complex inhibitor of KRAS G12D that recruits a cellular chaperone protein to form a ternary complex with the mutant oncoprotein — a paradigm that did not exist five years ago. All three compounds, along with nearly every KRAS inhibitor in clinical development, share the `-rasib` USAN (United States Adopted Name) stem, designating them as members of a drug class that did not exist before 2021. A protein that was undruggable for four decades is now the target of three fundamentally different classes of molecular intervention, each representing a distinct solution to a distinct molecular engineering challenge.

This is the pattern. Precision oncology's diagnostic infrastructure (next-generation sequencing, molecular tumor boards, basket and umbrella trial designs) has matured to the point where identifying actionable mutations is routine. What remains non-routine, and what determines whether a genomic finding translates into a therapeutic option, is the molecular engineering layer: the design, evaluation, and synthesis of molecules that can selectively engage the mutant protein with sufficient potency, selectivity, and pharmacological properties to become a clinical candidate. This article examines how that layer works, where it is advancing, and where it must go next.

The Precision Oncology Bottleneck Is Molecular

The infrastructure for molecularly guided cancer treatment has reached a scale that would have been unimaginable a decade ago. The NCI's ComboMATCH trial (NCT05564377) enrolls 2,900 patients across 475 sites, matching patients to targeted therapy combinations based on their tumor's molecular profile. Cancer Research UK's DETERMINE platform (NCT05722886) tests six different targeted agents across 825 patients with rare molecular subtypes. Roche's TAPISTRY trial (NCT04589845) runs eleven treatment arms across 920 patients. ASCO's TAPUR study (NCT02693535) has enrolled over 4,200 patients across 160 sites, testing 15 commercially available targeted agents in molecularly defined populations.

These are not exploratory studies. They are industrial-scale infrastructure for matching patients to drugs based on the molecular identity of their tumors. And they share a common dependency: every treatment arm requires a molecule that was designed, synthesized, optimized, and clinically developed by someone who solved a specific molecular engineering problem.

The word "actionable" in precision oncology (as in "actionable mutation") is routinely used as though it were a property of the mutation itself. It is not. A mutation is actionable only if molecular engineering has produced a compound that exploits it. KRAS G12C was not actionable in 2012. It became actionable in 2021, when sotorasib received FDA approval. KRAS G12D was not actionable in 2023. It is becoming actionable now, as MRTX1133, zoldonrasib, HRS-4642, GFH375, and at least six other compounds enter clinical testing. Each is the product of a different molecular engineering campaign that solved the problem of targeting an aspartate mutation, which offers no nucleophilic cysteine to serve as a conventional covalent handle.

The distinction matters because it reframes where a critical bottleneck in precision oncology actually lies. Molecular profiling can now identify hundreds of potentially significant alterations in a single tumor. The number of those alterations for which a viable therapeutic molecule exists is a small fraction of the total, and the rate at which that fraction grows is determined not by sequencing throughput or bioinformatics capacity, but by the pace of molecular engineering: how quickly new chemical matter can be designed, evaluated for drug-likeness and synthesizability, synthesized, and advanced to the point where it can enter a clinical arm.

To be precise about where this claim does and does not apply: molecular engineering is not the only bottleneck in precision oncology. Clinical development timelines consume years and hundreds of millions of dollars per compound. Target validation failures account for the majority of Phase 2 and Phase 3 attrition in oncology; most drugs fail because the biology does not cooperate, not because the chemistry was wrong. And even when molecular engineering succeeds, clinical reality can be humbling. Sotorasib, the molecular engineering triumph that broke the KRAS barrier, faced significant regulatory headwinds after the confirmatory CodeBreaK 200 trial demonstrated progression-free survival improvement but no overall survival advantage versus docetaxel (de Langen, A.J. et al., 2023). The FDA maintained its accelerated approval but required dose-optimization studies, and the drug's regulatory path became a case study in the limitations of surrogate endpoints for oncology approvals. The molecule was exquisitely designed. The tumor's biology (resistance kinetics, pathway bypass, intratumoral heterogeneity) limited the clinical benefit that molecular engineering alone could deliver.

But the argument here is specific: for the subset of precision oncology that depends on matching drugs to molecularly defined targets, the molecular engineering layer is the gateway constraint. Regulatory timelines and clinical development infrastructure cannot begin until a molecule exists. Target validation cannot proceed without a chemical tool compound to probe the biology. The platform trials enrolling thousands of patients are limited not by enrollment capacity or statistical design, but by the number of treatment arms they can populate, which is determined by how many targets have viable molecules. In that specific sense, molecular engineering gates the system. And in no therapeutic area is this constraint more consequential than oncology, where the target landscape is enormous, the molecular diversity of driver mutations is vast, and the clinical urgency is visceral.

Three Paradigms of Molecular Intervention

The KRAS revolution is instructive not because it produced one successful drug, but because it produced three fundamentally different classes of molecular intervention — each imposing different design constraints, requiring different computational tools, and presenting different resistance liabilities. For molecular engineers working in oncology, understanding these paradigms and the tradeoffs between them is essential to navigating the design space for the next generation of targeted agents.

A caveat is warranted. KRAS is, in important respects, a best-case scenario for molecular engineering: a single oncoprotein with well-characterized point mutations, high-resolution crystal structures, clear driver biology, and a defined binding pocket once the switch-II site was identified. Many oncology targets are far less tractable. Transcription factors like MYC, protein-protein interactions like beta-catenin/TCF, intrinsically disordered proteins, and epigenetic regulators resist molecular engineering for reasons that are fundamentally different from the "no pocket" problem KRAS presented — they lack stable, well-defined binding surfaces entirely. The paradigms described below apply most directly to targets with defined structural features. That said, the design principles (balancing affinity, selectivity, and synthesizability; choosing between covalent, non-covalent, and cooperative mechanisms; anticipating resistance) are generalizable even where the specific structural solutions are not.

Covalent Inhibition: Engineering Selectivity Through Warhead Chemistry

Sotorasib and adagrasib both target the switch-II pocket (SIIP) of KRAS G12C in its GDP-bound inactive state, and both form covalent bonds with the mutant cysteine at position 12. But their molecular engineering strategies diverge in a way that illuminates a fundamental design tension in covalent drug design: the balance between reversible binding affinity and electrophile reactivity.

Sotorasib (CHEMBL4535757) has weak reversible affinity to the SIIP. Its potency depends primarily on the irreversible covalent reaction between its acrylamide warhead and Cys-12. The compound's molecular properties (560.61 Da, ALogP 4.48, four aromatic rings, one Rule-of-Five violation, polar surface area of 104.45 Å²) reflect a design philosophy that prioritizes warhead positioning over non-covalent complementarity. The crystal structure reveals that the acrylamide sits in precisely the geometric relationship to Cys-12 required for Michael addition, and the remainder of the molecule provides just enough reversible interaction to orient the warhead correctly.

Adagrasib (CHEMBL4594350) inverts this balance. Its 2-fluoroacrylamide warhead (an acrylamide bearing a vinylic fluorine) is a weaker electrophile than sotorasib's acrylamide, but the compound achieves stronger reversible affinity to the SIIP, with a reported Ki of approximately 4 µM for the non-covalent interaction. At 604.13 Da, with an ALogP of 4.73 and zero hydrogen bond donors, adagrasib is a larger, more lipophilic molecule that fills more of the pocket through non-covalent contacts. The absence of donors is unusual for an oral drug and reflects a design that derives its binding entirely from hydrophobic packing and polar contacts with backbone atoms rather than classical donor-acceptor pairs. Recent biophysical analysis has shown that adagrasib is strictly dependent on interaction with His-95 in the SIIP for its binding, while sotorasib is less dependent on this residue. The distinction has direct implications for resistance, since H95D/G/N/R mutations are among the most frequently observed acquired resistance alterations.

Divarasib (CHEMBL5095236), Roche's next-generation G12C inhibitor now in Phase 3 trials, pushes the molecular envelope further: 622.07 Da, ALogP 5.09, two Ro5 violations. Structural analysis reveals optimized switch-II pocket engagement through additional fluorine substituents and a rearranged core scaffold. This progression (sotorasib → adagrasib → divarasib) illustrates the iterative molecular engineering cycle: each generation incorporates structural learning from the previous, trading some drug-likeness parameters for improved target engagement or resistance coverage.
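The Rule-of-Five counts quoted for these compounds can be reproduced from the figures given in the text. The following is a minimal sketch, assuming the conventional Lipinski thresholds (molecular weight above 500 Da, logP above 5, more than 5 hydrogen bond donors, more than 10 acceptors). The function name `ro5_violations` is illustrative, and because the text gives no acceptor counts and only adagrasib's donor count, criteria with no data are simply skipped rather than guessed.

```python
from typing import Optional

def ro5_violations(mw: float, logp: float,
                   hbd: Optional[int] = None,
                   hba: Optional[int] = None) -> int:
    """Count Rule-of-Five violations; criteria with no data are skipped."""
    checks = [
        mw > 500,                        # molecular weight above 500 Da
        logp > 5,                        # lipophilicity above 5
        hbd is not None and hbd > 5,     # more than 5 hydrogen bond donors
        hba is not None and hba > 10,    # more than 10 hydrogen bond acceptors
    ]
    return sum(checks)

# Molecular weight and ALogP as quoted in the text.
# Only adagrasib's donor count is stated there; nothing else is filled in.
compounds = {
    "sotorasib": dict(mw=560.61, logp=4.48),
    "adagrasib": dict(mw=604.13, logp=4.73, hbd=0),
    "divarasib": dict(mw=622.07, logp=5.09),
}

violations = {name: ro5_violations(**props) for name, props in compounds.items()}
for name, count in violations.items():
    print(f"{name}: {count} violation(s) from the available properties")
```

On these inputs sotorasib breaks only the weight limit, while divarasib breaks both the weight and lipophilicity limits, which matches the one and two violations cited for them.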

The design tradeoff is generalizable. Covalent inhibitors in oncology require tuning three interdependent variables: warhead reactivity (determines covalent bond formation rate and off-target liability), reversible affinity (determines residence time in the binding pocket and selectivity window), and overall molecular properties (determine oral bioavailability, metabolic stability, and manufacturability). No single optimization addresses all three. The molecular engineer must navigate a multidimensional landscape where improving one parameter often degrades another. The optimal balance depends on the specific biology of the target, the geometry of the binding site, and the clinical context.
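The interplay between reversible affinity and warhead reactivity can be made concrete with the standard two-step model of covalent inhibition, in which reversible binding (constant K_I) precedes bond formation (rate k_inact). The sketch below assumes inhibitor in large excess over protein, so that modification follows pseudo-first-order kinetics. Both parameter sets are hypothetical and chosen only so that the two profiles share the same k_inact/K_I ratio; they are not measured values for any compound named above.

```python
import math

def k_obs(conc_uM: float, k_inact: float, K_I_uM: float) -> float:
    """Observed modification rate (1/s) for E + I <-> E.I -> E-I."""
    return k_inact * conc_uM / (K_I_uM + conc_uM)

def fraction_modified(conc_uM: float, k_inact: float, K_I_uM: float, t_s: float) -> float:
    """Fraction of target covalently modified after t_s seconds."""
    return 1.0 - math.exp(-k_obs(conc_uM, k_inact, K_I_uM) * t_s)

# Hypothetical profiles with identical efficiency k_inact / K_I = 0.001 per uM per s
warhead_driven = dict(k_inact=0.1, K_I_uM=100.0)    # weak binding, reactive warhead
affinity_driven = dict(k_inact=0.004, K_I_uM=4.0)   # tight binding, attenuated warhead

results = {}
for conc in (0.1, 10.0):  # micromolar
    results[conc] = (
        fraction_modified(conc, t_s=600.0, **warhead_driven),
        fraction_modified(conc, t_s=600.0, **affinity_driven),
    )
    print(f"{conc} uM: warhead-driven {results[conc][0]:.3f}, "
          f"affinity-driven {results[conc][1]:.3f}")
```

Well below both K_I values the two profiles are nearly indistinguishable, because the rate depends only on k_inact/K_I. At higher concentration the affinity-driven profile saturates first, and its slower warhead caps the rate. Under this simplified model, that is one way to see why the two strategies trade off rather than dominate one another.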

The warhead design space itself is expanding. Twelve targeted covalent inhibitors have been approved for cancer treatment as of late 2025, targeting six different protein classes, and the majority rely on acrylamide or related Michael acceptor warheads. But acrylamides have known limitations: off-target reactivity with glutathione (present at millimolar concentrations intracellularly) can deplete the drug before it reaches its target, and irreversible binding precludes dose titration once the covalent bond is formed. Emerging warhead chemistries (reversible covalent warheads such as cyanoacrylamides, non-traditional electrophiles like vinyl sulfonamides and 2-sulfonylpyrimidines, and strain-release alkylation strategies) are being explored precisely because the standard acrylamide toolkit is insufficient for the next generation of targets. Each new warhead chemistry introduces new computational requirements: different reactivity models, different selectivity prediction frameworks, and different metabolic liability profiles. The molecular engineering community must expand its computational infrastructure in parallel with its chemical toolkit.

Non-Covalent Inhibition: The G12D Challenge

KRAS G12D presents a fundamentally different molecular engineering problem. The aspartate at position 12 is not a classical nucleophile, and conventional electrophilic warheads do not react with it. A conventional inhibitor of KRAS G12D must therefore achieve its potency and selectivity entirely through non-covalent interactions, which is a far more demanding design requirement. Wild-type KRAS has glycine at position 12, whose side chain is a single hydrogen atom, so G12D differs from it only by the addition of an aspartate's carboxylate-bearing side chain. Any therapeutic molecule must discriminate between the two to avoid inhibiting the normal RAS signaling that healthy cells depend on.

MRTX1133, developed by Mirati Therapeutics, achieves this selectivity by exploiting ionic interactions between a positively charged pyrrolidine group in the inhibitor and the negatively charged carboxylate of the mutant aspartate. This electrostatic complementarity provides selectivity over wild-type KRAS, where no such negative charge exists at position 12. The approach works — MRTX1133 has demonstrated low-nanomolar potency for KRAS G12D with reported selectivity of over 700-fold versus wild-type KRAS (Shi, Z. et al., 2025) — but it required structure-based design guided by extensive molecular dynamics simulations to map the conformational landscape of the switch-II pocket in the G12D context, where the pocket geometry differs from the G12C case due to the different side chain at position 12.

The computational demands of non-covalent selectivity engineering are qualitatively different from those of covalent design. In covalent inhibitor design, the warhead provides a thermodynamic anchor — once the covalent bond forms, the compound is irreversibly bound regardless of its reversible affinity. Non-covalent inhibitors have no such anchor. Every interaction must earn its binding energy through van der Waals contacts, hydrogen bonds, electrostatic complementarity, and hydrophobic packing — and the sum of these interactions must be substantially more favorable for the mutant than for the wild-type protein. Achieving this discrimination computationally requires free energy perturbation (FEP) calculations or thermodynamic integration methods that can resolve differences of 1-2 kcal/mol between binding to two proteins that differ by a single amino acid. These calculations are computationally expensive (hundreds of GPU-hours per compound) but increasingly feasible for lead optimization campaigns, and their accuracy has improved markedly with modern force fields and enhanced sampling methods.
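To see why 1-2 kcal/mol matters, it helps to convert between a binding free energy difference and a fold-selectivity ratio using the relation ΔΔG = RT ln(ratio). The sketch below assumes a temperature of 298.15 K and treats the selectivity ratio as a ratio of dissociation constants; the function names are illustrative.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def fold_selectivity(ddg_kcal: float, temp_k: float = 298.15) -> float:
    """Fold preference for the mutant implied by a free energy gap."""
    return math.exp(ddg_kcal / (R_KCAL * temp_k))

def ddg_from_fold(fold: float, temp_k: float = 298.15) -> float:
    """Free energy gap (kcal/mol) implied by a fold-selectivity ratio."""
    return R_KCAL * temp_k * math.log(fold)

for ddg in (1.0, 2.0):
    print(f"{ddg:.1f} kcal/mol -> {fold_selectivity(ddg):.1f}-fold")
print(f"700-fold -> {ddg_from_fold(700.0):.2f} kcal/mol")
```

At room temperature, 1 kcal/mol corresponds to roughly a 5-fold preference and 2 kcal/mol to roughly 29-fold, so an error of that size in a calculation can change the apparent selectivity window by an order of magnitude. By the same relation, a 700-fold selectivity corresponds to a gap of about 3.9 kcal/mol.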

At least ten KRAS G12D inhibitors are now in clinical studies, including HRS-4642 and GFH375/VS-7375. Each represents an independent solution to the same molecular engineering problem (achieving selective target engagement without covalent anchoring), and each navigates slightly different regions of the design space. The diversity of approaches itself is informative: it reflects the difficulty of the problem and the degree to which non-covalent selectivity remains a frontier challenge in molecular engineering for oncology. It also reflects an emerging truth about the field: computational methods have compressed the time from target structure to clinical candidate, enabling multiple independent groups to tackle the same "impossible" target in parallel and arrive at distinct solutions within a narrow time window.

Tri-Complex Inhibition: Engineering Cooperativity

Zoldonrasib (RMC-9805), developed by Revolution Medicines, represents a paradigm that is conceptually distinct from either covalent or non-covalent inhibition. Rather than binding KRAS alone, zoldonrasib recruits cyclophilin A (CypA) — an abundant intracellular chaperone — to form a ternary complex with the active, GTP-bound state of KRAS G12D. The drug molecule sits at the interface between KRAS and CypA, deriving its binding energy from contacts with both proteins simultaneously.

This is a three-body molecular engineering problem. The design must satisfy: (a) sufficient affinity for the KRAS surface to initiate the interaction; (b) a molecular geometry that creates a complementary interface for CypA recruitment; and (c) a covalent warhead positioned at the KRAS-CypA interface in an orientation that enables crosslinking with the mutant Asp-12 — which is not a classical nucleophile and required structure-guided warhead selection. Revolution Medicines reported that the warhead chosen for zoldonrasib showed no intrinsic reactivity to a model aspartic acid system in solution; the reaction only proceeds in the context of the ternary complex, where the geometric constraints of the protein-drug-protein interface position the warhead with sufficient precision to enable bond formation. This is molecular engineering at its most exacting — designing reactivity that is context-dependent rather than intrinsic.

The clinical results justify the engineering effort. In Phase 1 data reported at ASCO 2025, zoldonrasib achieved a 30% objective response rate and 80% disease control rate in patients with KRAS G12D-mutant pancreatic ductal adenocarcinoma — a malignancy with a five-year survival rate below 12% and historically negligible responses to targeted therapy.

A fourth paradigm is emerging alongside these three: targeted protein degradation through PROTACs (proteolysis-targeting chimeras) and molecular glues. ASP3082, a selective KRAS G12D degrader, uses a heterobifunctional design to simultaneously engage KRAS G12D and the VHL E3 ubiquitin ligase, inducing ubiquitination and proteasomal destruction of the oncoprotein. Recently, researchers have identified highly cooperative PROTAC degraders targeting GTP-loaded KRAS alleles, with VHL-based PROTACs achieving greater than 95% maximum degradation at nanomolar concentrations in KRAS G12D-mutant cancer cells. Machine learning is increasingly applied to this design problem (predicting ternary complex formation, optimizing linker geometry, and scoring cooperativity) because the three-body interactions involved are computationally expensive to model with physics-based methods alone.
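Cooperativity in a ternary complex is commonly summarized as a single ratio. As a sketch of that conventional definition (the symbols are introduced here for illustration and do not come from the studies cited above), let the binary constant describe the degrader binding one protein alone and the ternary constant describe the same binding event with the partner protein already engaged:

```latex
\alpha = \frac{K_d^{\text{binary}}}{K_d^{\text{ternary}}},
\qquad
\Delta G_{\text{coop}} = -RT \ln \alpha
```

Values of α above 1 indicate positive cooperativity, meaning the protein-protein interface contributes favorable binding energy beyond what the small molecule achieves with either partner alone. Values below 1 indicate that the two proteins clash in the complex.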

Each paradigm imposes fundamentally different constraints on the molecular engineer. Covalent inhibition requires warhead tuning and selectivity engineering. Non-covalent inhibition demands exquisite shape and electrostatic complementarity. Tri-complex design requires engineering cooperativity across a three-body interface. PROTAC design adds the constraint of linker optimization and E3 ligase recruitment. The computational toolkit must be paradigm-aware — the right tools and the right scoring functions differ depending on which class of molecule is being designed.

The Computational Molecular Engineering Toolkit

The molecular engineering challenges described above — covalent warhead optimization, non-covalent selectivity engineering, ternary complex design, synthesizability assessment — are not solved by intuition and manual design alone. They require computational infrastructure that operates at every stage of the design cycle, from initial hit identification through lead optimization to synthesis planning. The components of this infrastructure are well-established individually, but their integration into a coherent pipeline for oncology drug design remains inconsistent across the field.

Structure Standardization and Compound Identity

Screening campaigns against oncology targets — whether physical high-throughput screens or virtual screens against crystal structures — generate thousands to millions of candidate structures drawn from diverse compound libraries and generative models. Before any structure-activity analysis can proceed, these structures must be standardized: salt forms stripped, tautomers canonicalized, stereochemistry resolved, and unique identifiers assigned.

The mechanics of standardization pipelines are well-documented elsewhere (including in prior articles on this platform). What is worth emphasizing here is the specific failure mode that standardization prevents in oncology target campaigns. In the KRAS context, medicinal chemistry teams iterate rapidly through analogs of initial hits — hundreds of compounds synthesized and tested across a multi-year campaign. Compound identity errors that survive into SAR analysis do not merely inflate hit counts; they corrupt the structure-activity models that drive design decisions, potentially directing synthesis effort toward a region of chemical space that appears active only because of duplicate counting or tautomeric ambiguity. When the design cycle operates on timelines of weeks per iteration and the target is a patient population with progressing disease, months lost to corrupted SAR are months that matter. Standardization is not preprocessing — it is quality infrastructure for the design loop itself.
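The duplicate-counting failure mode can be illustrated with a toy hit list. This sketch assumes a standardized structure key has already been computed for each registered compound by an upstream standardizer. The registry IDs, keys, and activity flags are hypothetical placeholders, not campaign data.

```python
from collections import defaultdict

# (registry_id, standardized_key, active_in_assay); all values are hypothetical
records = [
    ("CMPD-001", "KEY-A", True),
    ("CMPD-002", "KEY-A", True),    # salt form of the same parent structure
    ("CMPD-003", "KEY-A", True),    # same structure drawn as another tautomer
    ("CMPD-004", "KEY-B", False),
    ("CMPD-005", "KEY-C", False),
    ("CMPD-006", "KEY-D", True),
]

def hit_rate(flags) -> float:
    """Fraction of entries flagged active."""
    flags = list(flags)
    return sum(flags) / len(flags)

# Naive view: every registration counts as a distinct compound
raw_rate = hit_rate(active for _, _, active in records)

# Standardized view: group registrations by structure key first
by_key = defaultdict(list)
for _, key, active in records:
    by_key[key].append(active)

# A unique structure counts once, and is active if any registration was active
dedup_rate = hit_rate(any(flags) for flags in by_key.values())

print(f"raw hit rate: {raw_rate:.2f}, deduplicated hit rate: {dedup_rate:.2f}")
```

In this toy set, three registrations of one structure lift the apparent hit rate from one half to two thirds, and a model trained on the raw list would see that scaffold as three independent confirmations rather than one.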

Substructure Pattern Matching

SMARTS-based substructure analysis is standard in medicinal chemistry, but its integration into oncology-specific design workflows remains underexploited. The conventional applications — pharmacophore identification from hit sets, PAINS/toxicophore filtering, warhead classification — are well-established. The more consequential and less common application is using SMARTS as an engineering tool for resistance-aware scaffold design.

Specifically: given a set of on-target resistance mutations observed or predicted for a clinical candidate, structural analysis of crystal structures or docking poses can identify which pharmacophoric features of the candidate make contacts with the mutated residues. SMARTS patterns can then encode those vulnerable substructures, enabling automated classification across compound libraries — flagging which analogs share the vulnerable contacts and which substitute alternative interactions. This decomposition of molecules into "resistance-vulnerable" and "resistance-resilient" substructures, identified through 3D structural analysis and operationalized through 2D SMARTS encoding, informs where modifications should be explored to maintain activity against resistance mutants. In the covalent inhibitor context, SMARTS-based warhead classification becomes particularly important as the design space expands beyond acrylamides: systematically enumerating which warhead chemistries (vinyl sulfonamides, cyanoacrylamides, 2-sulfonylpyrimidines, strain-release electrophiles) are compatible with a given scaffold's geometry and the target's nucleophile position is a combinatorial problem that benefits from automated substructure enumeration rather than manual inspection.
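Warhead classification of this kind can be sketched with RDKit's substructure matching. The SMARTS patterns below are illustrative assumptions covering only four of the warhead families mentioned; a production catalog would need broader, curated definitions. The test molecules are small fragments invented for the example, not clinical compounds.

```python
from rdkit import Chem

# Illustrative warhead definitions (assumed patterns, not a curated catalog)
WARHEAD_SMARTS = {
    "acrylamide": "[CH2]=[CH]C(=O)[#7]",
    "fluoroacrylamide": "[CH2]=C(F)C(=O)[#7]",
    "cyanoacrylamide": "C=C(C#N)C(=O)[#7]",
    "vinyl sulfonamide": "[CH2]=[CH]S(=O)(=O)[#7]",
}

# Compile each SMARTS once so that library-scale scans do not reparse them
PATTERNS = {name: Chem.MolFromSmarts(s) for name, s in WARHEAD_SMARTS.items()}

def classify_warheads(smiles: str) -> list:
    """Return the names of all warhead patterns present in a molecule."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return [name for name, patt in PATTERNS.items() if mol.HasSubstructMatch(patt)]

# Hypothetical fragments carrying each warhead type
examples = [
    "C=CC(=O)N1CCNCC1",
    "C=C(F)C(=O)N1CCNCC1",
    "CC(C)C=C(C#N)C(=O)N1CCCC1",
    "C=CS(=O)(=O)NC",
]
for smi in examples:
    print(smi, classify_warheads(smi))
```

The explicit hydrogen counts in the acrylamide pattern keep it from also matching the fluorinated and cyano-substituted variants, so each fragment lands in exactly one class. The same loop could flag resistance-vulnerable substructures if the dictionary held patterns for contacts with mutated residues instead of warheads.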

Synthetic Accessibility Scoring

Generative molecular design — whether through variational autoencoders, graph neural networks, reinforcement learning, or genetic algorithms — can produce millions of candidate structures optimized for predicted activity against an oncology target. The fundamental limitation is not generating candidates; it is determining which candidates can actually be synthesized.

Synthetic accessibility (SA) scoring assigns a numerical estimate of how difficult a molecule is to synthesize. The Ertl SA score, the most widely used approach, combines fragment-based complexity analysis with structural features to produce a score on a 1-10 scale (1 = easy to synthesize, 10 = extremely difficult). More recent approaches integrate AI-driven retrosynthetic analysis into the scoring: the DFRscore (Drug-Focused Retrosynthetic score) predicts the minimum number of synthetic steps required, trained exclusively on drug-focused reactions rather than the broader chemical literature.

For oncology drug design specifically, SA scoring serves as a critical reality check on computational output. A virtual screen or generative model may identify a compound with predicted sub-nanomolar KRAS G12D affinity, excellent selectivity, and favorable ADMET properties. But if its SA score is 8.5 and retrosynthetic analysis cannot identify a viable route from commercially available starting materials, the compound has no practical value. The time pressure in oncology, where patients are on treatment and resistance is evolving, makes synthesizability not just a convenience factor but a clinical constraint. A compound that takes eighteen months to synthesize through a twelve-step linear route is a compound that arrives after the disease has progressed.

Integration of SA scoring directly into generative design loops — penalizing candidates with poor synthesizability during optimization, rather than filtering them post hoc — has been shown to produce candidate sets that are both active and makeable, without significant sacrifice in predicted potency. This integration is particularly valuable in oncology, where the chemical matter being designed is often structurally complex (the two approved and one Phase 3 KRAS G12C inhibitors all exceed 560 Da and feature stereocenters and constrained ring systems) and the synthesis challenge is real.
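One simple way to fold synthesizability into the optimization objective is a soft penalty that applies only above a tolerance threshold. The sketch below is one possible formulation, not the method of any specific published work. The threshold, weight, candidate names, and scores are arbitrary assumptions, with potency expressed on a hypothetical pIC50-like scale and SA on the 1-10 Ertl scale described above.

```python
def penalized_score(predicted_potency: float, sa_score: float,
                    sa_threshold: float = 4.0, weight: float = 0.5) -> float:
    """Reduce the objective linearly once SA exceeds the tolerated threshold."""
    penalty = weight * max(0.0, sa_score - sa_threshold)
    return predicted_potency - penalty

# (name, predicted potency, SA score); hypothetical generator output
candidates = [
    ("gen-001", 9.2, 8.5),   # most potent on paper, very hard to make
    ("gen-002", 8.7, 3.9),   # slightly less potent, readily accessible
    ("gen-003", 8.9, 5.0),   # intermediate on both axes
]

ranked = sorted(candidates,
                key=lambda c: penalized_score(c[1], c[2]),
                reverse=True)
for name, potency, sa in ranked:
    print(f"{name}: potency {potency}, SA {sa}, "
          f"objective {penalized_score(potency, sa):.2f}")
```

With these settings the nominally most potent candidate drops to last place. Because the penalty is part of the objective rather than a hard filter, an optimizer receives a gradient-like signal toward makeable chemistry instead of discovering the problem only after generation.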

Retrosynthetic Analysis

Where SA scoring estimates whether a molecule can be made, retrosynthetic analysis determines how. AI-driven retrosynthesis tools — both template-based systems that apply known reaction rules and template-free neural models that predict transformations directly — decompose target molecules into progressively simpler precursors until commercially available starting materials are reached.

In the context of oncology drug design, retrosynthetic analysis serves two critical functions. First, during lead optimization, it enables chemists to assess whether proposed structural modifications are synthetically tractable before committing resources to synthesis attempts. If adding a fluorine substituent to improve metabolic stability requires an entirely different synthetic route, that cost is visible before the decision is made. Second, at the candidate selection stage, retrosynthetic analysis informs the critical path to clinical supply. A drug candidate that can be synthesized in five steps from commodity building blocks has a fundamentally different development timeline than one requiring twelve steps including a chiral resolution and a hazardous organometallic coupling.

The pairing of SA scoring with retrosynthetic analysis creates a two-stage filter: SA scoring rapidly triages millions of candidates to identify the synthesizable fraction, and retrosynthetic analysis maps detailed routes for the survivors. This workflow is especially valuable when applied to the output of generative models, which are agnostic to synthetic feasibility unless explicitly constrained.
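The two-stage filter can be sketched as a generator that calls the expensive planner only on candidates that pass the cheap score. Here `sa_score` and `plan_route` are hypothetical callables standing in for a real scoring function and a real retrosynthesis tool, and the cutoffs are arbitrary assumptions; the toy lookup tables exist only to make the example runnable.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Tuple

Route = List[str]

def two_stage_filter(
    candidates: Iterable[str],
    sa_score: Callable[[str], float],
    plan_route: Callable[[str], Optional[Route]],
    sa_cutoff: float = 4.5,
    max_steps: int = 8,
) -> Iterator[Tuple[str, Route]]:
    """Yield (candidate, route) pairs that survive both stages."""
    for candidate in candidates:
        if sa_score(candidate) > sa_cutoff:      # stage 1: cheap triage
            continue
        route = plan_route(candidate)            # stage 2: expensive planning
        if route is not None and len(route) <= max_steps:
            yield candidate, route

# Toy stand-ins for the real scorer and planner (hypothetical data)
toy_sa = {"cand-A": 3.0, "cand-B": 8.5, "cand-C": 4.0, "cand-D": 4.2}
toy_routes = {
    "cand-A": ["step 1", "step 2", "step 3"],
    "cand-C": None,                                    # planner finds no route
    "cand-D": [f"step {i}" for i in range(1, 13)],     # twelve steps, too long
}
planner_calls = []

def toy_planner(candidate: str) -> Optional[Route]:
    planner_calls.append(candidate)   # record how often stage 2 runs
    return toy_routes.get(candidate)

survivors = list(two_stage_filter(toy_sa, toy_sa.__getitem__, toy_planner))
print(survivors)
print("planner invoked for:", planner_calls)
```

The planner is never invoked for the candidate rejected at stage one, which is the point of the ordering when the first stage costs milliseconds and the second costs minutes.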

Molecular Feature Engineering for Predictive Modeling

Quantitative structure-activity relationship (QSAR) models, molecular property predictors, and machine learning classifiers all depend on the quality and consistency of molecular feature representations. Morgan fingerprints (ECFP), physicochemical descriptors (molecular weight, LogP, PSA, hydrogen bond counts, rotatable bonds), and pharmacophore features encode different aspects of molecular structure, and the choice of representation materially affects model performance.

For oncology drug design, feature engineering intersects with the target-specific challenges described above under "Three Paradigms of Molecular Intervention." Predicting covalent inhibitor potency requires features that capture not just non-covalent complementarity but also warhead reactivity: metrics such as electrophile softness, warhead strain energy, and Michael acceptor propensity. Predicting PROTAC efficacy requires features that describe ternary complex stability: linker length and flexibility distributions, surface complementarity scores, and cooperativity metrics. Standard molecular fingerprints do not capture these properties; they must be engineered specifically for the design paradigm.

Consistency of feature computation between training and inference environments (the training-serving skew problem documented extensively in the data engineering literature) is equally critical here. A QSAR model trained on Morgan fingerprints computed with RDKit 2023.03 may produce subtly incorrect predictions if served features computed with RDKit 2025.09, should aromaticity perception or other defaults have changed between releases. For oncology programs where model predictions influence compound prioritization and synthesis decisions, this is not an academic concern. It is a source of silent design error that can waste months of medicinal chemistry effort.
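A lightweight guard against this kind of skew is to store a manifest of the featurization setup alongside the trained model and compare it at serving time. The sketch below is one possible approach using only the standard library. The manifest fields and the fingerprint parameters are illustrative assumptions, and the version strings are the two mentioned above.

```python
import hashlib
import json

def feature_manifest(library: str, version: str, params: dict) -> dict:
    """Describe a featurization setup and derive a stable digest from it."""
    spec = {"library": library, "version": version, "params": params}
    canonical = json.dumps(spec, sort_keys=True)   # key order must not matter
    spec["digest"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return spec

def check_feature_parity(training: dict, serving: dict) -> None:
    """Raise if the serving featurizer differs from the one used in training."""
    if training["digest"] != serving["digest"]:
        raise RuntimeError(
            "feature skew: trained with "
            f"{training['library']} {training['version']}, "
            f"serving with {serving['library']} {serving['version']}"
        )

# Hypothetical fingerprint settings, identical in both environments
fp_params = {"kind": "morgan", "radius": 2, "n_bits": 2048}
train_manifest = feature_manifest("rdkit", "2023.03", fp_params)
serve_manifest = feature_manifest("rdkit", "2025.09", fp_params)

try:
    check_feature_parity(train_manifest, serve_manifest)
    skew_detected = False
except RuntimeError as err:
    skew_detected = True
    print(err)
```

A version mismatch does not prove the features differ, so a stricter variant would also hash the feature vectors of a fixed set of reference molecules. Failing loudly on any mismatch is the conservative default when predictions steer synthesis decisions.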

The convergence of machine learning and genomics is opening new possibilities for feature engineering that transcends traditional molecular descriptors. Foundation models trained on large corpora of molecular structures can generate learned representations that capture patterns invisible to handcrafted descriptors. Graph neural networks operating directly on molecular graphs can learn task-specific features without manual feature engineering. These approaches are particularly relevant for oncology, where the structure-activity relationships are often complex, non-linear, and influenced by protein-protein interaction contexts (as in the tri-complex paradigm) that traditional descriptors were never designed to capture. The molecular engineer's challenge is not choosing between classical and learned representations, but determining which combination provides the most predictive power for the specific design question at hand — and ensuring that whatever representation is chosen is computed identically at every stage of the design cycle.

Resistance as a Design Input, Not a Surprise

The most consequential limitation of the current molecular engineering paradigm for oncology is that it treats resistance as an outcome to be observed rather than a constraint to be designed against. Drugs are optimized for potency and selectivity against the initial target. When resistance emerges — months or years later, in clinical populations — a new molecular engineering campaign begins to address it. This sequential approach cedes the evolutionary initiative to the tumor.

The Resistance Landscape

Acquired resistance to KRAS G12C inhibitors illustrates the scope of the problem. Genomic analysis of patient samples resistant to sotorasib or adagrasib has identified multiple categories of resistance mechanism. On-target mutations — including R68S, H95D/G/N/R, Y96C/D/H/N, and others — cluster in the switch-II pocket and directly interfere with inhibitor binding. H95 mutations are particularly consequential for adagrasib, whose binding is critically dependent on this residue. Off-target pathway activation includes secondary mutations in NRAS, BRAF, EGFR, and FGFR, gene fusions involving RET, ALK, and RAF1, and amplification of MET, KRAS itself, and MYC. Non-genetic resistance through epithelial-to-mesenchymal transition (EMT) and lineage plasticity enables tumor cells to bypass KRAS dependence entirely.

Recent work has demonstrated that adaptive signaling rewiring enables rapid, sequential resistance to both mutation-specific KRAS inhibitors and broader pan-RAS inhibitors. The tumor does not wait for the "right" resistance mutation to appear by chance. It activates pre-existing transcriptional programs that dampen the therapeutic response through feedback reactivation of receptor tyrosine kinases, which in turn reactivate downstream signaling pathways. This adaptive resistance operates on a timescale of days to weeks — fast enough to blunt initial response before acquired genetic resistance has time to emerge through clonal selection.

Computational Anticipation of Resistance

Every on-target resistance mutation is, at its core, a structural perturbation of the inhibitor binding site. The effect of that perturbation on inhibitor binding can, in principle, be predicted computationally before it is observed clinically. Molecular dynamics simulations can model the impact of specific mutations on pocket geometry, binding energy, and residence time. Free energy perturbation (FEP) calculations can rank the severity of predicted resistance mutations. SMARTS-based scaffold analysis, combined with the binding-site contacts seen in the complex structure, can identify which structural features of the current inhibitor are most vulnerable to specific mutations and suggest which modifications might maintain binding despite the mutation.

The accuracy limitations of these tools must be stated plainly. Current FEP methods for protein-ligand binding have root-mean-square errors of 1-2 kcal/mol against experimental data, which corresponds to roughly an order of magnitude in binding affinity (at 298 K, 1.36 kcal/mol is a 10-fold change). A mutation predicted to cause a 10-fold loss in binding could, within the method's error bars, cause a 100-fold loss or negligible loss. This uncertainty is real and should not be minimized. But for resistance prediction, the question is not whether FEP can predict exact binding affinities to resistant mutants; it cannot, reliably. The question is whether it can rank-order a set of mutations by severity and identify which ones are most likely to be disruptive. Rank-ordering is a less demanding task than absolute prediction, and current methods are substantially more reliable for ranking than for absolute values. Moreover, even imperfect predictions that correctly identify the top three most vulnerable binding contacts provide actionable design intelligence that is categorically better than no prediction at all.
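
The link between a free energy error and a fold change in affinity follows from the standard relation between binding free energy and the binding constant, fold = exp(ΔΔG / RT). The short sketch below works through that arithmetic at 298.15 K. The function names are illustrative, and the 1.364 kcal/mol error is chosen only because it corresponds to one order of magnitude at this temperature.

```python
import math

R_KCAL_PER_MOL_K = 0.0019872  # gas constant in kcal/(mol*K)


def fold_change(ddg_kcal: float, temp_k: float = 298.15) -> float:
    """Fold change in binding constant for a free energy shift of ddg_kcal."""
    return math.exp(ddg_kcal / (R_KCAL_PER_MOL_K * temp_k))


def fold_change_interval(predicted_fold: float, error_kcal: float,
                         temp_k: float = 298.15) -> tuple:
    """Range of fold changes consistent with +/- error_kcal around a prediction."""
    spread = fold_change(error_kcal, temp_k)
    return predicted_fold / spread, predicted_fold * spread


# A predicted 10-fold loss with a 1.364 kcal/mol error spans roughly
# "no loss" to "100-fold loss", which is the range described above.
low, high = fold_change_interval(predicted_fold=10.0, error_kcal=1.364)
print(f"10-fold prediction spans about {low:.1f}x to {high:.0f}x")
```

The same function shows why the 1-2 kcal/mol range matters: an error of 2 kcal/mol widens the multiplicative spread to nearly 30-fold in each direction.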

This is not speculative. The biophysical analysis showing that adagrasib depends on H95 while sotorasib does not was published after resistance mutations at H95 were observed clinically — but the computational tools to predict this differential vulnerability existed before either drug was approved. Molecular dynamics simulations of the switch-II pocket with in silico mutations at H95 would have revealed the allele-specific binding constraints that now define the clinical resistance landscape. The information was computationally accessible. The workflow to extract it was not standard practice.

Consider the specific example. The crystal structure of adagrasib bound to KRAS G12C shows a critical hydrogen bond network involving His-95 that stabilizes the compound in the switch-II pocket. A molecular dynamics simulation introducing the H95D mutation, which replaces the bulky imidazole ring of histidine with the shorter carboxylate of aspartate, would predict two effects: loss of critical hydrogen bond contacts (the shorter side chain cannot reach the same interaction points) and electrostatic repulsion between the negatively charged aspartate and electronegative groups on the inhibitor. Together these would substantially reduce binding affinity. Other H95 mutations cause resistance through different structural mechanisms: H95R introduces a larger, positively charged guanidinium group that creates a steric clash with the inhibitor scaffold. The same simulation performed on sotorasib, whose binding contacts with H95 are less extensive, would predict a smaller effect. This analysis requires no experimental data beyond the crystal structures of the two drug-protein complexes and standard molecular dynamics protocols. The computational cost is measured in GPU-days, not GPU-years. Yet this analysis was not part of the standard development workflow for either drug, not because it was technically infeasible, but because resistance-aware design was not a design objective.

The deeper problem is that adaptive resistance — the rapid, non-genetic rewiring of signaling pathways in response to KRAS inhibition — operates on a timescale that structural analysis alone cannot address. When KRAS is inhibited, receptor tyrosine kinases including EGFR, FGFR, and MET are reactivated through feedback loops that restore downstream MAPK and PI3K signaling within days. This is not a structural problem that molecular dynamics can model directly. It is a systems-level problem that requires network-level modeling of signaling pathway dynamics under drug perturbation — a different class of computation, but one that can inform molecular design by identifying which combination partners would most effectively block the adaptive escape routes.

The argument is this: the molecular engineering pipeline for oncology drugs should produce not just a lead compound optimized against the initial target, but a resistance-contingent design portfolio. For each clinical candidate, the pipeline should:

  1. Enumerate probable on-target resistance mutations based on the binding site contacts of the candidate, weighted by the evolutionary accessibility of each mutation (amino acid substitutions reachable by a single nucleotide change are more likely than those requiring two).

  2. Computationally evaluate the impact of each predicted mutation on candidate binding, using molecular dynamics and free energy calculations.

  3. Design backup compounds that maintain activity against the most probable resistance mutations, using structure-based design informed by the predicted mutant pocket geometries.

  4. Score the backup compounds for synthetic accessibility and generate retrosynthetic routes, so that synthesis can begin rapidly when clinical resistance signals emerge.

  5. Pre-position the backup portfolio so that the time from resistance detection to backup compound availability is measured in months, not years.
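
The accessibility weighting in step 1 can be sketched with nothing more than the standard genetic code. The example below lists the amino acid substitutions reachable from a codon by one nucleotide change and counts the minimum number of changes needed to reach a given residue. The histidine codon `CAT` is used purely for illustration and is an assumption, not a statement about the actual KRAS coding sequence; the function names are likewise hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def single_nt_neighbors(codon: str) -> set:
    """Amino acids reachable by one nucleotide change (no silent or stop codons)."""
    wild_type = CODON_TABLE[codon]
    reachable = set()
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            aa = CODON_TABLE[mutant]
            if aa not in (wild_type, "*"):
                reachable.add(aa)
    return reachable


def min_nt_changes(codon: str, target_aa: str) -> int:
    """Minimum number of nucleotide changes needed to encode target_aa."""
    return min(
        sum(a != b for a, b in zip(codon, candidate))
        for candidate, aa in CODON_TABLE.items()
        if aa == target_aa
    )


# Illustrative histidine codon; the real codon should come from the reference sequence.
print(sorted(single_nt_neighbors("CAT")))
print(min_nt_changes("CAT", "D"), min_nt_changes("CAT", "W"))
```

In a real pipeline the codon would be read from the reference transcript, and the raw count could be refined with mutational signature data, since not all single nucleotide changes are equally likely in a given tumor.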

This is a fundamentally different engineering philosophy from the current sequential approach. It treats the tumor's evolutionary response as a design constraint — as known and plannable as the target's binding site geometry or the compound's metabolic liabilities. The computational tools to implement each step exist, though with the accuracy limitations discussed above — FEP rank-ordering is imperfect, fitness estimation is approximate, and synthesis route prediction can fail for novel scaffolds. What is missing is not perfection of individual tools, but the integration of these imperfect-but-useful tools into a resistance-aware design workflow, and the organizational commitment to produce portfolio-based molecular engineering rather than single-compound optimization.

The Combination Design Problem

The clinical trial landscape reflects an emerging consensus that combinations will be necessary to address resistance. KRAS G12C inhibitors are being tested in combination with PARP inhibitors (adagrasib + olaparib, NCT06130254), checkpoint inhibitors (multiple trials), and pathway inhibitors targeting the escape routes that resistance exploits. CDK4/6 inhibitors are also being explored as combination partners for targeted agents (the EGFR inhibitor almonertinib + palbociclib, NCT06947811).

Combination design introduces an additional molecular engineering constraint: the off-target profiles of the combined agents must be complementary, not overlapping. Two drugs that both cause hepatotoxicity cannot be safely combined regardless of their on-target synergy. Two drugs that compete for the same metabolic enzymes will have unpredictable pharmacokinetics at combination doses. Computational prediction of these interactions from molecular features is partially mature: CYP-mediated metabolic interactions can be predicted with reasonable accuracy from structural features, and known toxicophore databases enable automated screening for overlapping organ toxicity liabilities. But pharmacodynamic synergy prediction — whether two agents will produce greater-than-additive tumor killing — remains unreliable from molecular features alone, and idiosyncratic toxicity (immune-mediated, off-target organ damage) is poorly predicted by any current computational method. What computation can do today is narrow the combinatorial search space by eliminating combinations with obvious metabolic or toxicity conflicts, leaving a tractable number of candidates for empirical testing. Without even this level of computational triage, the search space of possible combinations (dozens of KRAS inhibitors crossed with dozens of potential combination partners) vastly exceeds what can be explored through clinical trials alone.
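
The triage step described here amounts to a pairwise filter over annotated liabilities. The sketch below shows the idea on made-up data: every agent name, toxicity flag, and enzyme assignment is a hypothetical placeholder, and a real implementation would draw these annotations from curated toxicophore and metabolism predictions rather than a hand-written dictionary.

```python
from itertools import combinations

# Hypothetical liability annotations; names and flags are illustrative only.
AGENTS = {
    "agent_A": {"tox": {"hepatic"}, "cyp": {"CYP3A4"}},
    "agent_B": {"tox": {"hepatic", "gi"}, "cyp": {"CYP2D6"}},
    "agent_C": {"tox": {"skin"}, "cyp": {"CYP3A4"}},
    "agent_D": {"tox": {"cardiac"}, "cyp": {"CYP2C9"}},
}


def triage_pairs(agents: dict):
    """Split all agent pairs into kept pairs and rejected pairs with reasons."""
    kept, rejected = [], {}
    for a, b in combinations(sorted(agents), 2):
        shared_tox = agents[a]["tox"] & agents[b]["tox"]   # overlapping organ toxicity
        shared_cyp = agents[a]["cyp"] & agents[b]["cyp"]   # same metabolic enzyme
        if shared_tox or shared_cyp:
            rejected[(a, b)] = sorted(shared_tox | shared_cyp)
        else:
            kept.append((a, b))
    return kept, rejected


kept, rejected = triage_pairs(AGENTS)
print("kept for empirical testing:", kept)
print("rejected:", rejected)
```

A hard filter like this is deliberately crude. It removes only the obvious conflicts and says nothing about pharmacodynamic synergy, which, as noted above, still has to be established empirically.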

Toward Adaptive Molecular Engineering

The current paradigm of oncology drug design is fundamentally sequential. A target is identified. A molecular engineering campaign produces a clinical candidate. The candidate is tested. Resistance emerges. A new campaign begins. Each cycle takes years. Tumor evolution, unconstrained by our development timelines, proceeds in real time.

The vision of adaptive molecular engineering inverts this sequence. Instead of reacting to resistance after it emerges, the molecular engineering pipeline anticipates it from the outset and produces a portfolio of compounds that cover the predicted resistance landscape before the first compound enters patients.

This is not science fiction. The computational components exist: molecular dynamics for resistance mutation modeling, SMARTS-based analysis for scaffold vulnerability assessment, SA scoring and retrosynthetic analysis for synthesis planning, QSAR models for activity prediction against mutant panels. The clinical infrastructure exists: platform trials like ComboMATCH, DETERMINE, and TAPISTRY are designed to add and drop treatment arms based on emerging molecular data, creating a feedback loop between clinical observation and therapeutic intervention. The synthetic chemistry infrastructure exists: modern parallel synthesis and flow chemistry can produce analogs on timescales of days to weeks, not months.

What does not yet exist is the integration layer — the engineering system that connects computational resistance prediction to molecular design to synthesis planning to clinical deployment in a continuous, automated workflow. Building this integration layer is not primarily a computational challenge. It is an organizational, regulatory, and economic challenge that must be confronted honestly.

Every backup compound in a resistance-contingent portfolio is a new chemical entity. Before it can enter a patient — even through a platform trial — it requires IND-enabling studies: GLP toxicology, pharmacokinetics, formulation development, and manufacturing characterization. This takes 12-18 months and costs $2-5 million per compound at minimum. A portfolio of five backup compounds means $10-25 million in preclinical development costs for molecules that may never be needed. No pharmaceutical company currently funds drug development this way, and the economic logic of doing so is far from self-evident.

The honest framing is this: the full adaptive vision — pre-positioned backup compounds with completed IND-enabling packages ready to enter clinical testing within months of a resistance signal — requires either a fundamentally different funding model (government-backed resistance preparedness programs, BARDA-style advance development contracts for oncology) or a fundamentally different regulatory model (platform INDs that enable faster bridging of structurally related backup compounds, or regulatory sandbox frameworks for resistance-contingent portfolios). Neither exists today. But the computational and molecular design components of the vision can be built now, and they retain value even in a more conservative implementation: having backup compounds designed, synthesized, and pharmacologically characterized — even without completed IND-enabling packages — compresses the response timeline by months to years when resistance does emerge and the decision to advance a backup is made.

The organizational challenge is no less real: aligning the incentives, data flows, and decision processes of computational chemistry, medicinal chemistry, process chemistry, and clinical operations around a shared model of the tumor's evolutionary trajectory.

The platform trials already function as real-time signal generators. When a treatment arm in DETERMINE or TAPISTRY shows declining response rates in a molecular subgroup, that signal contains information about resistance that could, in principle, feed directly back into the molecular design pipeline. Genomic analysis of resistant tumors identifies the specific mutations or pathway alterations driving escape. Computational modeling predicts which pre-designed backup compounds maintain activity against those alterations. SA scoring and retrosynthetic routes — already computed for the backup portfolio — enable rapid synthesis. The backup compound, if IND-enabling studies have been completed or can be bridged from closely related compounds, enters the next treatment arm of the platform trial. Even where full IND readiness is not pre-positioned, the cycle from resistance signal to clinical testing compresses from the current baseline of years to a realistic target of months.

This is not a minor efficiency improvement. It is a qualitative change in the relationship between drug design and tumor evolution. The current paradigm allows tumors to evolve faster than we can design against them. The adaptive paradigm aims to design faster than tumors can evolve — not by speeding up any single step, but by eliminating the sequential dependencies that make the current workflow slow.

For the molecular engineering community, this vision raises specific technical challenges that warrant focused attention:

  • Resistance mutation enumeration and prioritization: Which mutations are evolutionarily accessible from the current state? Which are most likely to confer selective advantage under drug pressure? Combining structural analysis (which mutations disrupt binding) with evolutionary modeling (which mutations require one vs. two nucleotide changes) and fitness estimation (which mutations preserve enough KRAS function to sustain oncogenic signaling) is a multi-scale problem that no single tool currently addresses end-to-end.

  • Multi-target scaffold design: Can a single molecular scaffold be designed with substituent positions that can be varied to maintain activity across a panel of predicted resistance mutations, analogous to how broadly neutralizing antibodies are engineered to cover viral escape variants? SMARTS-based scaffold analysis combined with enumeration of substituent libraries and predictive modeling against mutant panels could, in principle, identify such scaffolds — but the search space is vast and the fitness landscape is rugged.

  • Real-time synthesis-aware design: Integrating SA scoring and retrosynthetic analysis directly into the resistance-contingent design loop so that every backup compound in the portfolio has a validated synthesis route and an estimated time-to-supply from the moment it is designed, not as a post hoc assessment.

  • Feedback integration: Establishing data pipelines from clinical resistance genomics (through platform trial biobanking and real-time sequencing) back into computational design tools, closing the loop between clinical observation and molecular engineering response.

  • Evolutionary fitness landscape mapping: The most ambitious — and most speculative — version of this vision borrows from evolutionary biology: constructing a fitness landscape that maps the relationship between KRAS genotype, drug exposure, and tumor proliferative capacity. If the fitness cost of each resistance mutation can be estimated — how much oncogenic signaling capacity does KRAS H95R retain compared to KRAS G12C? — then the evolutionary trajectory of the tumor under drug pressure becomes partially predictable. Mutations that confer resistance but carry high fitness costs are less likely to dominate than mutations that preserve both resistance and oncogenic function. A resistance-contingent drug portfolio that targets the high-fitness resistance mutations first could, in principle, force the tumor into lower-fitness evolutionary paths, extending the time before clinically significant resistance emerges. This is the most difficult challenge on this list, and the caveats are substantial: fitness effects are context-dependent (varying by tissue type, co-mutations, and tumor microenvironment), poorly measured in clinical settings, and likely non-additive across mutations. Current methods for estimating mutant KRAS fitness rely on cell-line growth assays that may not reflect in vivo selective pressures. Nevertheless, even approximate fitness estimates — sufficient to distinguish high-fitness from low-fitness resistance mutations — would provide useful design guidance. The framework requires integrating structural biology, population genetics, and synthetic chemistry in a way that does not currently exist as a unified system, but the component disciplines are mature enough that the integration is tractable for focused efforts.
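
To make the first and last challenges concrete, the sketch below combines the three signals discussed above (evolutionary accessibility, predicted loss of binding, and retained fitness) into a single heuristic priority score. The scoring formula, the threshold, the penalty for a second nucleotide change, and all numeric values are assumptions invented for illustration; none of them are measured or published quantities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResistanceCandidate:
    name: str
    nt_changes: int          # nucleotide changes needed to reach the mutation
    ddg_kcal: float          # predicted loss of binding free energy (positive = weaker)
    relative_fitness: float  # signaling retained relative to the parent allele, 0 to 1


def priority(m: ResistanceCandidate, ddg_threshold: float = 1.0,
             extra_nt_penalty: float = 0.01) -> float:
    """Heuristic score: accessibility x retained fitness x predicted binding loss."""
    if m.ddg_kcal < ddg_threshold:
        return 0.0  # predicted to leave inhibitor binding largely intact
    accessibility = extra_nt_penalty ** (m.nt_changes - 1)
    return accessibility * m.relative_fitness * m.ddg_kcal


def rank(candidates):
    """Order candidates from highest to lowest design priority."""
    return sorted(candidates, key=priority, reverse=True)


# Placeholder panel with invented values.
PANEL = [
    ResistanceCandidate("mutation_A", nt_changes=1, ddg_kcal=2.5, relative_fitness=0.9),
    ResistanceCandidate("mutation_B", nt_changes=1, ddg_kcal=3.0, relative_fitness=0.2),
    ResistanceCandidate("mutation_C", nt_changes=2, ddg_kcal=3.5, relative_fitness=0.9),
    ResistanceCandidate("mutation_D", nt_changes=1, ddg_kcal=0.4, relative_fitness=1.0),
]
ranked_names = [m.name for m in rank(PANEL)]
print(ranked_names)
```

A multiplicative score is only one possible choice. Its main purpose here is to show that a mutation with a large predicted binding loss can still rank low if it is hard to reach or carries a high fitness cost, which is the reasoning the fitness landscape argument depends on.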

These are hard problems. They are also, in the author's assessment, the problems most worth solving in computational oncology today. The era of single-compound, single-target, sequential drug design will not end because it has failed — it has, by the evidence of the KRAS revolution, succeeded spectacularly. It will end because it is too slow. Tumors evolve. Our molecules must evolve faster.

Conclusion

The KRAS story is, at every level, a story about molecular engineering. The biological insight, that a cryptic pocket exists on a protein everyone had given up on, was necessary. But it was not sufficient. Converting that insight into sotorasib required solving a specific covalent design problem. Converting it differently into adagrasib required solving the same problem with a different balance of design variables. Extending it to KRAS G12D required abandoning covalent strategies entirely and engineering non-covalent selectivity from scratch. Extending it further with zoldonrasib required a paradigm of ternary complex formation that has little precedent in synthetic small molecule drug design.

Each of these achievements depended on computational tools that are now standard in the field: structure-based design, molecular dynamics, docking and scoring, synthetic accessibility assessment, and QSAR modeling. None of them depended on a single tool alone. All of them required the integration of multiple computational capabilities into a coherent design workflow tailored to the specific engineering challenge at hand. And none of them, on its own, was sufficient to guarantee clinical success — as sotorasib's subsequent clinical limitations demonstrated. Molecular engineering is necessary. It is not sufficient. The molecule must be designed well, and the design must account for the biological reality of resistance, heterogeneity, and adaptation that determines whether molecular potency translates into patient survival.

The precision oncology infrastructure — the sequencing, the molecular tumor boards, the platform trials — is built. It is waiting for molecules. The rate at which it can deliver therapeutic benefit to patients is gated by the rate at which molecular engineering can produce clinical candidates for the targets that profiling identifies. For the majority of actionable oncology targets, that rate is not fast enough, and resistance narrows the window further with each month of treatment.

The molecular engineering community has the tools to change this. Structure standardization, substructure analysis, synthetic accessibility scoring, retrosynthetic planning, and predictive feature computation are mature capabilities. What remains is to deploy them not as isolated utilities applied ad hoc at various stages of drug design, but as an integrated, resistance-aware engineering system that produces compound portfolios rather than single candidates, and that operates at the tempo of tumor evolution rather than the tempo of traditional pharmaceutical development.

The KRAS revolution proves that the "undruggable" is a temporary designation. It lasts exactly as long as the molecular engineering takes. The question for the field is not whether computational tools can accelerate that engineering — they demonstrably can. The question is whether we will integrate them into a design philosophy that matches the adaptive intelligence of the disease we are trying to treat.

Forty years from target to first drug. Two years from first drug to three distinct intervention paradigms. The acceleration is real. It is not fast enough.

References

  1. Ostrem, J.M. et al. K-Ras(G12C) inhibitors allosterically control GTP affinity and effector interactions. *Nature* 503, 548-551 (2013). doi:10.1038/nature12796

  2. Canon, J. et al. The clinical KRAS(G12C) inhibitor AMG 510 drives anti-tumour immunity. *Nature* 575, 217-223 (2019). doi:10.1038/s41586-019-1694-1

  3. Fell, J.B. et al. Identification of the Clinical Development Candidate MRTX849, a Covalent KRASG12C Inhibitor for the Treatment of Cancer. *J. Med. Chem.* 63, 6679-6693 (2020). doi:10.1021/acs.jmedchem.9b02052

  4. Hallin, J. et al. The KRASG12C Inhibitor MRTX849 Provides Insight toward Therapeutic Susceptibility of KRAS-Mutant Cancers in Mouse Models and Patients. *Cancer Discov.* 10, 54-71 (2020). doi:10.1158/2159-8290.CD-19-1167

  5. Tanaka, N. et al. Mechanisms of Resistance to KRAS Inhibitors: Cancer Cells' Strategic Use of Normal Cellular Mechanisms to Adapt. *Cancer Sci.* (2025). doi:10.1111/cas.16441

  6. Awad, M.M. et al. Acquired Resistance to KRASG12C Inhibition in Cancer. *N. Engl. J. Med.* 384, 2382-2393 (2021). doi:10.1056/NEJMoa2105281

  7. Shi, Z. et al. Targeting KRAS G12D: Advances in Inhibitor Design. *Thorac. Cancer* (2025). doi:10.1111/1759-7714.70203

  8. Seton-Rogers, S. Convergence of machine learning and genomics for precision oncology. *Nat. Rev. Cancer* (2025). doi:10.1038/s41568-025-00897-6

  9. Discovery of RMC-9805, an oral, covalent tri-complex KRASG12D(ON) inhibitor. Presented at AACR 2024. *Cancer Res.* 84(7_Supplement), ND03.

  10. Preliminary safety, antitumor activity, and ctDNA changes with RMC-9805 in KRAS G12D PDAC. *J. Clin. Oncol.* 43(4_suppl), 724 (2025). doi:10.1200/JCO.2025.43.4_suppl.724

  11. Beyond KRAS(G12C): Biochemical and Computational Characterization of Sotorasib and Adagrasib Binding Specificity and the Critical Role of H95 and Y96. *ACS Chem. Biol.* (2024). doi:10.1021/acschembio.4c00315

  12. The structure of KRASG12C bound to divarasib highlights features of potent switch-II pocket engagement. *Small GTPases* (2025). doi:10.1080/21541248.2025.2505441

  13. Identification of a Highly Cooperative PROTAC Degrader Targeting GTP-Loaded KRAS(On) Alleles. *J. Am. Chem. Soc.* (2025). doi:10.1021/jacs.5c10354

  14. Targeting cancer with small-molecule pan-KRAS degraders. *Science* (2024). doi:10.1126/science.adm8684

  15. Genomic landscape of clinically acquired resistance alterations in patients treated with KRASG12C inhibitors. *Ann. Oncol.* (2025). doi:10.1016/j.annonc.2025.01.009

  16. Adaptive Signaling Rewiring Enables Rapid, Sequential Resistance to KRAS and Pan-RAS Inhibitors. Abstract B028, presented at AACR Annual Meeting 2026. *Cancer Res.* 86(5_Supplement_1), B028.

  17. KRAS inhibitors: resistance drivers and combinatorial strategies. *Trends Cancer* (2025). doi:10.1016/S2405-8033(24)00275-9

  18. Pan-KRAS inhibition: unlocking broad-spectrum targeted therapy for KRAS-mutant cancers. *Cancer Biol. Med.* (2026). doi:10.20892/j.issn.2095-3941.2025.0612

  19. Machine learning in targeted protein degradation drug design: a technical review of PROTACs and molecular glues. *Drug Discov. Today* (2025). doi:10.1016/j.drudis.2025.104344

  20. Integrating synthetic accessibility with AI-based generative drug design. *J. Cheminform.* 15, 83 (2023). doi:10.1186/s13321-023-00742-8

  21. Integrating Synthetic Accessibility Scoring and AI-Based Retrosynthesis Analysis to Evaluate AI-Generated Drug Molecules Synthesizability. *BioMedInformatics* 4(2), 26 (2024). doi:10.3390/biomedinformatics4020026

  22. DFRscore: Deep Learning-Based Scoring of Synthetic Complexity with Drug-Focused Retrosynthetic Analysis. *J. Chem. Inf. Model.* (2024). doi:10.1021/acs.jcim.3c01134

  23. Advancing Covalent Ligand and Drug Discovery beyond Cysteine. *Chem. Rev.* (2025). doi:10.1021/acs.chemrev.5c00001

  24. Recent advances in the design of small molecular drugs with acrylamides covalent warheads. *Med. Chem. Res.* (2024). doi:10.1007/s00044-024-03313-0

  25. Targeting the untargetable: accelerated discovery of KRAS G12D inhibitors through a deep learning-enhanced in silico pipeline. *Comput. Struct. Biotechnol. J.* (2025). doi:10.1016/j.csbj.2025.03.043

  26. de Langen, A.J. et al. Sotorasib versus docetaxel for previously treated non-small-cell lung cancer with KRASG12C mutation: a randomised, open-label, phase 3 trial (CodeBreaK 200). *Lancet* 401, 733-746 (2023). doi:10.1016/S0140-6736(23)00221-0

This article was written for Teapot Commons. For prior articles in the pharmaceutical ML and data engineering series, see: Common Failure Modes in Pathogen Genomics Machine Learning Pipelines, Developing Customizable Machine Learning Pipelines, Ten Essential Practices for Building Sustainable ML Systems in Pharma, and Building the Data Foundation.
