Arash Nicoomanesh

Agentic AI Architect

Deterministic Core · Stochastic Range · Multi-Agent Scale

AI Engineering Services & Consulting

Don't Just Build "Expensive Chatbots": Real "Agentic Systems" Reason and Learn within an "Adaptive Architecture"

Execution over Conversation

Build AI systems that read reports, flag risks, generate artifacts, and update systems autonomously while learning from outcomes. Intelligence that cannot act or improve from action is not a system; it's a demo.

Planning over Reaction

Most so-called "AI agents" respond to prompts but fail to plan, execute multi-step workflows, persist state, or refine behavior over time. Stateless LLM wrappers are both costly and fragile.

Control over Cleverness

Apply robust agentic architectures that separate stochastic reasoning from deterministic execution, enabling learning within explicit constraints. Governance, validation, and recovery come first.
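One way to picture this separation is the minimal sketch below. It is an illustration under stated assumptions, not a description of any specific production system: the `Action` type, the `flag_risk` tool, the allow-list, and the `stub_model` function standing in for an LLM call are all hypothetical names. The stochastic part only proposes an action; a deterministic layer validates it against explicit constraints, executes it, and falls back to human review when retries run out.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Action:
    """A structured proposal produced by the stochastic (model) side."""
    name: str
    args: dict


# Deterministic core: an explicit allow-list of executable tools (hypothetical).
def flag_risk(report_id: str, level: str) -> str:
    return f"report {report_id} flagged as {level}"


TOOLS: Dict[str, Callable[..., str]] = {"flag_risk": flag_risk}
ALLOWED_LEVELS = {"low", "medium", "high"}


def validate(action: Action) -> None:
    """Governance: reject anything outside the explicit constraints."""
    if action.name not in TOOLS:
        raise ValueError(f"unknown tool: {action.name}")
    if set(action.args) != {"report_id", "level"}:
        raise ValueError("flag_risk expects exactly report_id and level")
    if action.args["level"] not in ALLOWED_LEVELS:
        raise ValueError(f"invalid level: {action.args['level']}")


def run_step(propose: Callable[[str], Action], task: str, max_retries: int = 2) -> str:
    """Ask for a proposal, validate it, execute it, or recover."""
    feedback = ""
    for _ in range(max_retries + 1):
        action = propose(task + feedback)          # stochastic reasoning
        try:
            validate(action)                       # deterministic validation
        except ValueError as err:
            # Recovery: feed the rejection back so the next proposal can improve.
            feedback = f"\nPrevious proposal rejected: {err}"
            continue
        return TOOLS[action.name](**action.args)   # deterministic execution
    return "escalated to human review"


def stub_model(prompt: str) -> Action:
    """Stand-in for an LLM call; a real system would parse model output here."""
    if "rejected" in prompt:
        return Action("flag_risk", {"report_id": "R-17", "level": "high"})
    return Action("flag_risk", {"report_id": "R-17", "level": "catastrophic"})


result = run_step(stub_model, "Assess report R-17")
print(result)
```

In this sketch the first proposal is rejected by the validator, the rejection is fed back, and the corrected proposal is executed. The model never touches a tool directly, so a bad proposal can cost a retry but cannot cause an unvalidated side effect.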

Agentic AI Architectures Mindmap

Case Studies

Medical Triage Agent

Hybrid conversational agent using Gemini Pro and Med-PaLM 2

Engineered to mimic a clinician's stepwise reasoning process, combining multiple AI technologies for comprehensive patient assessment and support

Conversational Drug Repurposing

Applying Google LLMs to recommend alternative therapies from real-world data (RWD) in electronic health records (EHR)

The generator uses advanced AI techniques to analyze biomedical data and generate novel hypotheses for drug repurposing

ICU Metrics Forecasting

Multivariate time-series models predicting readmission, mortality, and length of stay (LOS) from clinical variables

Multi-Turn Product Recommendation QA Bot

Fine-tuned open-source LLMs with hybrid retrieval, efficient inference, and scalable deployment

Banking Resource Management System

End-to-end customer lifetime value (LTV) and churn prediction, plus transactional fraud detection

COVID-19 Diagnosis with Audio Biomarkers

COVID-19 diagnosis through acoustic analysis of breathing, cough, and speech signals

Knowledge Base

Fine-Tuning DeepSeek R1 on Medical Chain-of-Thought

Latest technical walk-through on enhancing medical-reasoning LLMs with CoT fine-tuning

Gemma 3n Edge AI for Support Bots

Low-memory, high-speed training on customer-support data

LLM Output Config & Guardrails

Master reasoning prompts and guardrails for reliable outputs

Few-Shot & Zero-Shot Learning Deep-Dive

Push LLMs beyond narrow fine-tuning.

In the age of large language models (LLMs), the ability to perform complex tasks with minimal data is changing how we approach artificial intelligence. Few-shot and zero-shot learning are two pivotal techniques that let models generalize across domains and perform tasks they were not explicitly trained on. This article covers both paradigms, explaining their origins, mechanisms, and real-world applications.
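As a rough illustration of the difference, the sketch below builds a zero-shot and a few-shot prompt for the same task. It covers only the prompt-based (in-context) form of these paradigms, and the `build_prompt` helper, the `Input:`/`Output:` layout, and the sample tickets are assumptions made for this example rather than anything taken from the article.

```python
from typing import Sequence, Tuple


def build_prompt(
    instruction: str,
    query: str,
    examples: Sequence[Tuple[str, str]] = (),
) -> str:
    """Build a prompt: zero-shot with no examples, few-shot with some."""
    parts = [instruction]
    # Each worked example shows the model the expected input/output pattern.
    for text, label in examples:
        parts.append(f"Input: {text}\nOutput: {label}")
    # The final block leaves the output open for the model to complete.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


instruction = "Classify the support ticket as 'billing' or 'technical'."
query = "My invoice shows a charge I do not recognize."

# Zero-shot: only the task description and the query.
zero_shot = build_prompt(instruction, query)

# Few-shot: the same task, plus a handful of labeled demonstrations.
few_shot = build_prompt(
    instruction,
    query,
    examples=[
        ("I was charged twice this month.", "billing"),
        ("The app crashes when I open settings.", "technical"),
    ],
)

print(zero_shot)
print("---")
print(few_shot)
```

The model weights are identical in both cases; only the prompt changes, which is why these techniques need so little task-specific data.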

Model Drift: A Survival Guide

Monitor & remediate production ML models

Fine-Tune Gemma-3 12B with Unsloth

End-to-end Unsloth & TRL workflow for customer service

In this article, I cover the technical details of Unsloth and Gemma 3 and show how to combine them to build an optimized, fine-tuned model for any type of customer support assistant, whether a chatbot, an intelligent agent, or an interactive FAQ system. I walk step by step through the fine-tuning process, from data preparation to model deployment, and highlight best practices and practical considerations for real-world customer service scenarios.

Is RAG Dead in 2025?

Rethinking documentation Q&A with large-context LLMs.

Talk: Agentic RAG in Healthcare

15-min deck on self-evolving retrieval agents.

Anthropic Claude API · Azure AI Foundry · Azure ML · BentoML · BitsAndBytes · Celery · Chainlit · Cloudflare Workers AI · CrewAI · Databricks Mosaic ML · Docker · ElasticSearch · FAISS · FastAPI · GCP Vertex AI · GitHub Actions · GoLang · GraalVM · Gradio · Haystack (Deepset) · Hugging Face Transformers · IBM Granite 3.0 · Jupyter / JupyterHub · Kafka · Kedro · Kubernetes · LangChain · LangGraph · Milvus · MLflow · Modal · MongoDB Atlas Vector Search · Nebula Graph · Neo4j · Nginx · Nvidia Merlin · Nvidia Triton Inference Server · Okta · Ollama · OpenAI Swarm · OpenTelemetry · PGVector · PostHog · Prefect · Prometheus · Pulumi · Pydantic · Python · PyTorch · Quarkus · Ray Serve · Redis · Replicate · Rust · S3 (MinIO, AWS) · Semantic Kernel · Snowflake Arctic · SQLAlchemy · Streamlit · Supabase · Temporal · TensorRT · Terraform · TGI (Text Generation Inference) · Torch Serve · TypeScript · Vercel AI SDK · Vespa

Get in touch