Arash Nicoomanesh

Agentic AI Architect

Deterministic Core · Stochastic Range · Multi-Agent Scale

AI Engineering Services & Consulting

Don't Just Build "Expensive Chatbots": Real "Agentic Systems" Reason and Learn Within an "Adaptive Architecture"

  • The era of "chat" is over. Businesses need systems that read reports, flag risks, draft notices, and update ledgers while their teams sleep. Intelligence without execution is just entertainment.
  • Most "AI Agents" fail the Planning Rubicon: they react but never truly plan, execute, or persist state over time. Stateless LLM wrappers are expensive chatbots, not autonomous systems.
  • True agentic architectures separate the "brain" (stochastic reasoning) from the "skeleton" (deterministic execution). Governance first, intelligence second, because intelligence without control is a liability.

Selected Case Studies

Medical Triage Agent

Hybrid conversational agent using Gemini Pro and Med-PaLM 2

Engineered to mimic a clinician's stepwise reasoning, combining multiple AI technologies for comprehensive patient assessment and support.

Conversational Drug Repurposing

Applying Google LLMs to recommend alternative therapies from real-world data (RWD) in electronic health records (EHR)

The generator analyzes biomedical data and produces novel hypotheses for drug repurposing.

ICU Metrics Forecasting

Multivariate time-series models that predict readmission, mortality, and length of stay (LOS) from clinical variables

Multi-Turn Product Recommendation QA Bot

Fine-tuned open-source LLMs with hybrid retrieval, efficient inference, and scalable deployment

Banking Resource Management System

End-to-end customer lifetime value (LTV) and churn prediction, plus transaction fraud detection

COVID-19 Diagnosis with Audio Biomarkers

COVID-19 diagnosis through acoustic analysis of breathing, cough, and speech signals

Featured Articles

Fine-Tuning DeepSeek R1 on Medical Chain-of-Thought

Latest technical walk-through on enhancing medical-reasoning LLMs with CoT fine-tuning

Gemma 3n Edge AI for Support Bots

Low-memory, high-speed training on customer-support data

LLM Output Config & Guardrails

Master reasoning prompts and guardrails for reliable outputs

Few-Shot & Zero-Shot Learning Deep-Dive

Push LLMs beyond narrow fine-tuning.

In the age of large language models (LLMs), the ability to perform complex tasks with minimal data is changing how we approach artificial intelligence. Few-shot and zero-shot learning are two pivotal techniques that let models generalize across domains and perform tasks they were not explicitly trained on. This article explains the origins, mechanisms, and real-world applications of both paradigms.

Model Drift: A Survival Guide

Monitor & remediate production ML models

Fine-Tune Gemma-3 12B with Unsloth

End-to-end Unsloth & TRL workflow for customer service

In this article, I dive into the technical details of Unsloth and Gemma 3, showing how their features can be combined to build an optimized, fine-tuned model for any customer support assistant, whether a sophisticated chatbot, an intelligent agent, or an interactive FAQ system. I walk step by step through the fine-tuning process, from data preparation to model deployment, highlighting best practices and practical considerations for real-world customer service scenarios.

Is RAG Dead in 2025?

Rethinking documentation Q&A with large-context LLMs.

Talk: Agentic RAG in Healthcare

15-min deck on self-evolving retrieval agents.

Anthropic Claude API · Azure AI Foundry · Azure ML · BentoML · BitsAndBytes · Celery · Chainlit · Cloudflare Workers AI · CrewAI · Databricks Mosaic ML · Docker · Elasticsearch · FAISS · FastAPI · GCP Vertex AI · GitHub Actions · GoLang · GraalVM · Gradio · Haystack (Deepset) · Hugging Face Transformers · IBM Granite 3.0 · Jupyter / JupyterHub · Kafka · Kedro · Kubernetes · LangChain · LangGraph · Milvus · MLflow · Modal · MongoDB Atlas Vector Search · Nebula Graph · Neo4j · Nginx · Nvidia Merlin · Nvidia Triton Inference Server · Okta · Ollama · OpenAI Swarm · OpenTelemetry · PGVector · PostHog · Prefect · Prometheus · Pulumi · Pydantic · Python · PyTorch · Quarkus · Ray Serve · Redis · Replicate · Rust · S3 (MinIO, AWS) · Semantic Kernel · Snowflake Arctic · SQLAlchemy · Streamlit · Supabase · Temporal · TensorRT · Terraform · TGI (Text Generation Inference) · TorchServe · TypeScript · Vercel AI SDK · Vespa

Get in touch