Narayana Sastry

Test Every Idea, Together: An Autonomous ML Agent That Multiplies Team Research Bandwidth

Submitted Jun 26, 2026

Session Description

Research teams are bottlenecked on the number of experiments they can run. A team’s research output can be roughly formalized as (number of people) × (experiments per person) × (research taste). With fixed headcount, the question becomes: how do we reduce the marginal cost of running one more experiment without lowering the quality of judgment behind what gets tested?

ML teams generate ideas at every stage of the lifecycle: feasibility, accuracy, latency, deployment, and business-impact evaluation. The constraint is not a lack of ideas; it is the cost of testing them and the lack of shared visibility into what teammates are already trying. Project Nexus addresses both by adapting Hugging Face’s ML-Intern into an always-on, asynchronous ML agent that autonomously executes experiments on team infrastructure. A researcher submits an idea in natural language, and Nexus reviews relevant literature, writes training and evaluation code, runs GPU jobs, iterates on failures, benchmarks results, pushes code, and documents the experiment where the team can reuse it.
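As a rough illustration of the workflow described above, the sketch below models a shared queue that takes ideas from several researchers, runs each one through the named stages with a bounded number of retries, and records the outcome in a team-visible history. It is a minimal sketch under stated assumptions, not Nexus’s actual implementation: the class names, the `run_stage` callback, the retry limit, and the status strings are all hypothetical; only the stage names are taken from the description.

```python
from collections import deque
from dataclasses import dataclass, field

# Stage names follow the session description; everything else is hypothetical.
STAGES = [
    "review_literature",
    "write_code",
    "run_gpu_job",
    "benchmark",
    "push_code",
    "document",
]


@dataclass
class Experiment:
    author: str
    idea: str
    status: str = "queued"
    log: list = field(default_factory=list)  # (stage, attempt, ok) tuples


class SharedQueue:
    """One FIFO queue and one history list, both visible to the whole team."""

    def __init__(self, max_retries=2):
        self.pending = deque()
        self.history = []
        self.max_retries = max_retries

    def submit(self, author, idea):
        exp = Experiment(author, idea)
        self.pending.append(exp)
        return exp

    def run_next(self, run_stage):
        """Run the oldest queued idea through every stage.

        run_stage(experiment, stage) -> bool is a caller-supplied callable
        standing in for the agent's real work (an assumption of this sketch).
        """
        if not self.pending:
            return None
        exp = self.pending.popleft()
        exp.status = "running"
        for stage in STAGES:
            # First attempt plus up to max_retries retries on failure.
            for attempt in range(1, self.max_retries + 2):
                ok = run_stage(exp, stage)
                exp.log.append((stage, attempt, ok))
                if ok:
                    break
            else:
                # Every attempt failed: park the idea, keeping the log as evidence.
                exp.status = "parked"
                break
        else:
            exp.status = "succeeded"
        self.history.append(exp)
        return exp
```

Keeping the per-stage log on the experiment record means a parked idea still carries the evidence of what was tried, which is what lets teammates avoid repeating it.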

In two weeks of production use by eight engineers, Nexus ran 27 distinct experiments across about 40 sessions. Turnaround per idea dropped from roughly two weeks to hours, often overnight, and compute cost per substantive experiment fell to $7-$22. Of the 27 ideas, 20 succeeded and 7 were parked with evidence. The 20 wins translated into 3 patent pursuits, 3 production-model improvements, 6 product-capability expansions, and 8 new capabilities.

The talk is not just a demo. We explain how we adapted Hugging Face’s open-source ML-Intern agent for an enterprise environment, then focus on the harder production lessons: what broke in long autonomous runs, the reliability and observability fixes we had to add, and the active research questions that remain.

HF ML-Intern vs. NetApp Nexus — What Changed For Enterprise Adoption

| Criteria       | HF ML-Intern                   | NetApp Nexus                                                                |
|----------------|--------------------------------|-----------------------------------------------------------------------------|
| Infrastructure | HF Jobs, public APIs, HF repos | On-prem multi-node GPU, internal LLM service                                |
| Context        | HF Hub + GitHub                | Enterprise context (code repos, documents) + web (arXiv, HF MCP/CLI)        |
| Collaboration  | Single-user CLI                | Multi-user system: one URL, single queue, and team-visible history          |
| Governance     | Public artifacts as-is         | Enterprise-approved artifacts backed by AD                                  |
| Memory         | No persistent memory           | Learns from past experiments and recommends next ideas                      |

Key Takeaways

  1. A practical blueprint for running a 24x7 autonomous ML research agent inside a regulated enterprise, on internal compute and tooling.
  2. What failed in production, what we fixed, and what still needs research before autonomous ML agents can be treated as dependable teammates.

Who This Session Is For

  • ML researchers and data scientists testing ideas across the ML lifecycle.
  • ML research managers deciding which ideas deserve team time.
  • ML infrastructure leads bringing agentic workflows onto internal compute and enterprise tooling.

Bio

Narayana Sastry is a Staff Data & Applied Scientist at NetApp on the Research and Data Science team. The team builds AI-powered governance and security products for compliance and cyber-resilience, spanning classical ML, Generative AI, and AI agents for real-world data governance and security challenges.

Sastry leads data governance initiatives, with interests in AI security, multimodal file support, and high-ROI uses of AI to accelerate development. They adapted Hugging Face’s ML-Intern agent for NetApp’s enterprise infrastructure as Project Nexus, a shared autonomous ML research service used by the team to test ideas across the ML lifecycle on internal compute.

Hosted by

Jumpstart better data engineering and AI futures