Replacing probabilistic drug discovery chatbots with deterministic Graph Neural Networks (GNNs) and Policy-as-Code toxicity firewalls.
A research team attempted to use a standard generative AI agent to discover new applications for existing drugs. Because the LLM was allowed to rely on its internal weights rather than a grounded topology, it failed silently and dangerously.
The LLM confidently hallucinated protein-protein interactions that do not exist in nature, wasting weeks of expensive laboratory validation time.
Standard vector-similarity RAG retrieved documents where a drug and disease were mentioned together, failing to recognize when the drug actually caused the disease rather than curing it.
Without hardcoded physical constraints, the AI frequently proposed drug repurposing hypotheses that violated known structural alerts for hepatotoxicity (liver damage).
Symbolic logic predicts. Neural logic translates.
We separated probabilistic generation from biomedical reasoning. The language model is completely stripped of its ability to "invent" science. It is strictly relegated to translating the deterministic mathematical outputs of a Knowledge Graph.
Goal: Grounding in reality. Millions of PubMed abstracts are parsed via NLP and loaded into a Neo4j graph database. Data is strictly modeled as semantic triples (e.g., [Metformin] -> INHIBITS -> [mTOR]), ensuring relationships have explicit biomedical directionality.
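To make the value of directionality concrete, here is a minimal sketch of how typed, directed triples separate "drug treats disease" from "drug causes disease", where a co-mention check cannot. The `Triple` class, the helper functions, and the `DrugX`/`DrugZ`/`DiseaseY` rows are hypothetical illustrations, not the team's schema or data; only the Metformin row comes from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """A directed biomedical assertion: subject -[predicate]-> obj."""
    subject: str
    predicate: str
    obj: str


# Illustrative rows only. DrugX, DrugZ and DiseaseY are placeholders.
TRIPLES = {
    Triple("Metformin", "INHIBITS", "mTOR"),
    Triple("DrugX", "TREATS", "DiseaseY"),
    Triple("DrugZ", "CAUSES", "DiseaseY"),
}


def relations(subject: str, obj: str) -> set[str]:
    """Return every predicate asserted from subject to obj (direction matters)."""
    return {t.predicate for t in TRIPLES if t.subject == subject and t.obj == obj}


def co_mentioned(a: str, b: str) -> bool:
    """What a co-occurrence retriever effectively sees: the pair, with no type or direction."""
    return any({t.subject, t.obj} == {a, b} for t in TRIPLES)
```

Both `DrugX` and `DrugZ` are co-mentioned with `DiseaseY`, but `relations` returns `TREATS` for one and `CAUSES` for the other, and nothing at all when the direction is reversed.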
The LLM does NOT generate hypotheses. Instead, a Graph Neural Network (GNN) analyzes the topology of the Neo4j graph and estimates the probability that a hidden "edge" (relationship) exists between a specific drug node and a disease node.
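The text does not say which GNN architecture or decoder is used. One common link-prediction pattern is to have the encoder produce an embedding per node and then score a candidate edge as the sigmoid of the dot product of the two embeddings. The sketch below shows only that scoring step, with hand-written embedding values standing in for a trained encoder's output; the vectors, the `UnrelatedDisease` node, and the function names are all assumptions for illustration.

```python
import math

# Hypothetical 3-dimensional node embeddings. In a real pipeline these would
# come from the trained GNN encoder; the numbers here are made up.
EMBEDDINGS = {
    "Metformin": [0.9, 1.2, -0.3],
    "Endometrial Neoplasms": [1.1, 0.8, 0.1],
    "UnrelatedDisease": [-1.0, -0.7, 0.4],
}


def edge_probability(src: str, dst: str) -> float:
    """Dot-product decoder: sigmoid(z_src . z_dst), a value strictly between 0 and 1."""
    score = sum(a * b for a, b in zip(EMBEDDINGS[src], EMBEDDINGS[dst]))
    return 1.0 / (1.0 + math.exp(-score))


def rank_candidates(drug: str, diseases: list[str]) -> list[tuple[str, float]]:
    """Score every (drug, disease) pair and sort from most to least probable."""
    scored = [(disease, edge_probability(drug, disease)) for disease in diseases]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Given fixed embeddings the scoring is a pure function, so the same graph snapshot and model weights always yield the same ranking.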
Before any mathematically viable hypothesis is forwarded, an Open Policy Agent (OPA) gate intercepts the graph path. Symbolic rules evaluate the drug's SMILES structure against known toxicity databases. If the molecule trips a cardiotoxic or hepatotoxic alert, the hypothesis is deterministically killed.
The orchestrator feeds the LLM only the verified, policy-cleared graph path. The LLM is strictly prompted to act as a scientific translator, converting the exact nodes and edges (e.g., Drug -> Target -> Pathway -> Disease) into a readable mechanistic summary for human researchers.
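As a rough sketch of the translation step, the orchestrator could serialize the cleared path into a fixed textual form and wrap it in a restrictive prompt. The function names and prompt wording below are hypothetical, not the team's orchestrator; the edge types are taken from the example query in this document, and the node names in the usage note are illustrative.

```python
def render_path(nodes: list[str], edges: list[str]) -> str:
    """Serialize a path as '[A] -REL-> [B] -REL-> [C]' in traversal order."""
    if len(nodes) != len(edges) + 1:
        raise ValueError("a path needs exactly one more node than edges")
    parts = [f"[{nodes[0]}]"]
    for edge, node in zip(edges, nodes[1:]):
        parts.append(f"-{edge}-> [{node}]")
    return " ".join(parts)


def build_translation_prompt(nodes: list[str], edges: list[str]) -> str:
    """Build a prompt that exposes only the verified path to the language model."""
    path = render_path(nodes, edges)
    return (
        "You are a scientific translator. Summarize the mechanism below "
        "for human researchers.\n"
        "Use only the entities and relationships in PATH. Do not add others.\n"
        f"PATH: {path}\n"
    )
```

For example, `render_path(["Metformin", "AMPK", "mTOR signaling", "Endometrial Neoplasms"], ["TARGETS", "PARTICIPATES_IN", "ASSOCIATED_WITH"])` produces a single line the model can restate but has no reason to extend. A prompt alone does not guarantee compliance, so a real system would likely also check the summary against the path's entities.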
Scenario: Discovering secondary applications for an existing FDA-approved compound, bypassing toxic candidates.
// Layer 1: Deterministic retrieval of grounded biological topology.
MATCH path = (d:Drug {name: 'Metformin'})-[r1:TARGETS]->(p:Protein)
             -[r2:PARTICIPATES_IN]->(pw:Pathway)
             <-[r3:ASSOCIATED_WITH]-(dz:Disease)
WHERE NOT (d)-[:TREATS]->(dz)
RETURN p.name, pw.name, dz.name, type(r1), type(r2)
LIMIT 50;
// Layer 2: GNN proposes a hypothesis via mathematical edge probability.
{
  "hypothesis_id": "HYP-992-B",
  "source_node": { "type": "Drug", "name": "Metformin" },
  "target_node": { "type": "Disease", "name": "Endometrial Neoplasms" },
  "predicted_edge": "TREATS",
  "metrics": {
    "edge_probability": 0.942,
    "mechanistic_path": ["AMPK_activation", "mTOR_inhibition"]
  }
}
# Governance Gate (NAYAR): Symbolic rules evaluate the structural safety.
package hypothesis.governance

# rego.v1 enables the `if` / `contains` keywords required by OPA 1.x.
import rego.v1

default allow_hypothesis := false

# RED FLAG DETECTED: Hepatotoxic substructure match
deny_hypothesis contains msg if {
    input.drug.smiles_alert == "hepatotoxic_substructure_A2"
    msg := "FATAL: Molecule violates DILI (Drug-Induced Liver Injury) threshold."
}

# PASS: Allow only if safety gates clear and probability is high
allow_hypothesis if {
    input.metrics.edge_probability >= 0.85
    count(deny_hypothesis) == 0
}
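For quick unit tests without a running OPA instance, the gate's logic can be mirrored by hand. The sketch below is a manual translation of the two rules above, not OPA's evaluation engine, and it assumes the input document carries both `drug.smiles_alert` and `metrics.edge_probability` as the policy references them. One deliberate difference: non-numeric probabilities are rejected here, which is a choice of this sketch.

```python
DILI_ALERT = "hepatotoxic_substructure_A2"
DILI_MESSAGE = "FATAL: Molecule violates DILI (Drug-Induced Liver Injury) threshold."
MIN_EDGE_PROBABILITY = 0.85


def deny_reasons(doc: dict) -> list[str]:
    """Mirror of deny_hypothesis: collect a message for each tripped alert."""
    reasons = []
    if doc.get("drug", {}).get("smiles_alert") == DILI_ALERT:
        reasons.append(DILI_MESSAGE)
    return reasons


def allow_hypothesis(doc: dict) -> bool:
    """Mirror of allow_hypothesis: default deny, pass only on high probability and no denials."""
    prob = doc.get("metrics", {}).get("edge_probability")
    # Missing or non-numeric probability falls through to the default (False).
    if isinstance(prob, bool) or not isinstance(prob, (int, float)):
        return False
    return prob >= MIN_EDGE_PROBABILITY and not deny_reasons(doc)
```

A hypothesis with probability 0.942 and no alert passes. The same hypothesis with the hepatotoxic alert set is blocked regardless of its score, as is any document missing the `metrics` block.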
Hallucinated Pathways
Every hypothesis is grounded in explicit PubMed triples.
Semantic Triples Parsed
Replacing vector-search with deterministic graph traversal.
Toxicity-Gated Output
Failed molecular structures are blocked by OPA rules.