Complex functional materials and devices sit in a regime mainstream ML-for-science rarely touches: the systems are complicated, every data point is expensive, and there is no database or benchmark to start from. That regime is what my research targets, with an AI co-scientist stack running from hypothesis generation through device digital twins to physical validation.
Public databases and benchmarks cover a thin slice of materials science: mostly single crystals, small molecules, idealized surfaces. The systems I work on (FET sensors, membranes, electrocatalytic assemblies) are multi-component and strongly coupled across scales, and one data point can cost days of synthesis and testing. The big data is not coming. So I design AI systems that work the way experimentalists do: pull priors from the literature, build physics-constrained surrogates, and spend the experimental budget where it counts.
Text: an agentic pipeline, prompt-optimized with TextGrad, extracts structured knowledge graphs from raw publication corpora (21.8% BLEU improvement in knowledge extraction).
Twin: a graph neural network that knows the device topology, trained as a digital twin of the coupled material–device system; on FET sensors it predicts sensitivity with 92.3% accuracy.
Translation: the twin then screens 123.2 million PubChem compounds for out-of-distribution tasks such as PFAS-sensing probe design.
Accepted at SIGKDD 2026 AI4Science; Spotlight Oral at ICLR 2026 AI4Mat.
Text
AI reads the literature and extracts structured device knowledge.
Twin
The full device context becomes a topology-aware graph.
Translation
The twin ranks new candidate materials and molecules.
The Twin is a message-passing GNN over the device topology graph: $$h_v^{(k+1)}=\phi\Big(h_v^{(k)},\ \bigoplus_{u\in\mathcal{N}(v)}\psi\big(h_v^{(k)},h_u^{(k)},e_{uv}\big)\Big)$$ where edges $e_{uv}$ encode physical couplings between material, channel, and electrode nodes.
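To make the update rule concrete, here is a minimal sketch of one message-passing step in plain Python. It is not the Twin's implementation: the choices of $\psi$ (neighbour state scaled by a scalar coupling, ignoring $h_v$), $\bigoplus$ (sum), and $\phi$ (tanh of own state plus aggregate) are illustrative assumptions, as are the node names, the coupling values, and the function name `message_passing_step`. A trained model would use learned functions for $\phi$ and $\psi$ and richer edge features.

```python
import math
from collections import defaultdict

def message_passing_step(h, edges):
    """One update h_v^(k) -> h_v^(k+1) with sum aggregation.

    h:     dict node -> list of floats (node state h_v^(k))
    edges: dict (u, v) -> float coupling e_uv; the message flows u -> v
    """
    dim = len(next(iter(h.values())))
    agg = defaultdict(lambda: [0.0] * dim)
    for (u, v), e_uv in edges.items():
        # psi (assumed form): neighbour state scaled by the coupling strength
        msg = [e_uv * x for x in h[u]]
        # aggregation (assumed form): element-wise sum over incoming messages
        agg[v] = [a + m for a, m in zip(agg[v], msg)]
    # phi (assumed form): tanh of own state plus the aggregated messages
    return {v: [math.tanh(s + a) for s, a in zip(h[v], agg[v])] for v in h}

# Toy device graph with hypothetical states and couplings.
h0 = {"material": [1.0, 0.0], "channel": [0.0, 1.0], "electrode": [0.5, 0.5]}
edges = {
    ("material", "channel"): 0.8,
    ("channel", "material"): 0.8,
    ("channel", "electrode"): 0.3,
    ("electrode", "channel"): 0.3,
}
h1 = message_passing_step(h0, edges)
```

Stacking several such steps lets information from the material node reach the electrode through the channel, which is the sense in which the graph carries the device context.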
Deep Tree of Research (DToR) is a hypothesis-generation engine, not a chat agent: a local-first RAG system with a tree-structured orchestrator that adaptively expands and prunes research branches for coverage, depth, and coherence. Benchmarked on 27 nanomaterials/device topics against 44 agent configurations, its reports achieved a ~79% mean pairwise win rate against commercial deep-research systems, while running entirely on in-house, consumer-grade hardware with open-source LLMs.
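The expand-and-prune idea can be sketched in a few lines. This is only an illustration under assumptions, not DToR's orchestrator: `Branch`, `grow_tree`, `propose`, and `score` are hypothetical names, the fixed beam width and score threshold stand in for whatever adaptive criteria the real system uses, and the toy `propose` and `score` functions replace what would be LLM or retrieval calls.

```python
import heapq
from dataclasses import dataclass, field

@dataclass
class Branch:
    question: str
    depth: int = 0
    score: float = 1.0
    children: list = field(default_factory=list)

def grow_tree(root, propose, score, max_depth=2, beam=2, min_score=0.3):
    """Expand the tree level by level, keeping only the strongest branches."""
    frontier = [root]
    while frontier:
        next_frontier = []
        for node in frontier:
            if node.depth >= max_depth:
                continue  # depth budget reached, do not expand further
            candidates = [
                Branch(q, node.depth + 1, score(q)) for q in propose(node.question)
            ]
            # Prune: drop weak branches, then keep at most `beam` per node.
            viable = (c for c in candidates if c.score >= min_score)
            node.children = heapq.nlargest(beam, viable, key=lambda c: c.score)
            next_frontier.extend(node.children)
        frontier = next_frontier
    return root

def propose(question):
    # Hypothetical stand-in for an LLM that suggests sub-questions.
    return [f"{question} > {t}" for t in ("mechanism", "materials", "failure modes")]

def score(question):
    # Hypothetical stand-in for a relevance or coherence judge.
    toy = {"mechanism": 0.9, "materials": 0.6, "failure modes": 0.2}
    return toy[question.rsplit(" > ", 1)[-1]]

def count(node):
    return 1 + sum(count(c) for c in node.children)

tree = grow_tree(Branch("PFAS sensing with FET devices"), propose, score)
```

In this toy run the "failure modes" branch falls below the threshold at every level, so the tree keeps two children per expanded node down to depth 2.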
RAPIDS is the atomistic validation engine of the stack. It evaluates machine-learning interatomic potentials (MLIPs) against DFT on 5,567 probe–target dimer interactions across 18 benchmarks; the main finding is that geometry, not the energy surface, drives the neutral MLIP–DFT gap. It is also packaged as a tool that autonomous LLM agents can call for fast physical sanity checks. The last gate is the one I was trained for: nanomaterial synthesis, device fabrication, and electrochemical testing in the wet lab.
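As a rough illustration of what an MLIP-versus-DFT comparison involves, the sketch below groups interaction energies by benchmark and reports the mean absolute error for each. It is not RAPIDS code: the function name `mae_by_benchmark`, the record layout, and the benchmark labels are assumptions, and the numbers are placeholders rather than results.

```python
from collections import defaultdict
from statistics import mean

def mae_by_benchmark(records):
    """Mean absolute MLIP-DFT error per benchmark.

    records: iterable of (benchmark, e_mlip, e_dft), energies in the same unit.
    """
    errors = defaultdict(list)
    for benchmark, e_mlip, e_dft in records:
        errors[benchmark].append(abs(e_mlip - e_dft))
    return {name: mean(values) for name, values in errors.items()}

# Placeholder interaction energies (arbitrary units), not measured data.
records = [
    ("set_a", -0.50, -0.45),
    ("set_a", -0.30, -0.35),
    ("set_b", -1.00, -0.80),
]
gap = mae_by_benchmark(records)
```

Separating a geometry contribution from an energy-surface contribution would need more than this, for example evaluating both methods on the same fixed geometries; the sketch covers only the per-benchmark bookkeeping.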
Not every candidate deserves full-cost validation: as the fidelity of the check rises, the field of candidates thins out.
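One way to picture this is a fidelity funnel, in which each stage is more expensive than the last and keeps only its top-ranked candidates. The sketch below is a simplified illustration: the stage names, pool sizes, and toy scoring functions are assumptions chosen to mirror the twin screen, atomistic check, and wet-lab gate, not the actual selection criteria.

```python
def fidelity_funnel(candidates, stages):
    """Run candidates through increasingly expensive checks.

    stages: list of (name, scorer, keep_n), ordered from cheap to costly.
    Returns the survivors and the pool size after each stage.
    """
    pool = list(candidates)
    sizes = [("start", len(pool))]
    for name, scorer, keep_n in stages:
        # Rank by this stage's score and pass only the best keep_n onward.
        pool = sorted(pool, key=scorer, reverse=True)[:keep_n]
        sizes.append((name, len(pool)))
    return pool, sizes

# Toy scorers on integer candidate IDs; real stages would call a surrogate
# model, an atomistic calculation, and an experiment respectively.
stages = [
    ("twin screen", lambda c: -abs(c - 500), 100),
    ("atomistic check", lambda c: -(c % 7), 10),
    ("wet lab", lambda c: c, 3),
]
survivors, sizes = fidelity_funnel(range(1000), stages)
```

Because the expensive scorers only ever see the survivors of the cheaper ones, the total cost is dominated by the first stage, where each evaluation is cheapest.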
Electrocatalysis
Accelerated discovery of OER/HER catalysts for clean hydrogen production.
Chemical Sensing
Neuromorphic approaches for FET sensor design and environmental monitoring.
Water & Environment
PFAS detection probes and AI-guided design of PFAS-selective membrane interfaces.