Abstract / Overview
This article presents a detailed walkthrough of how to build a natural-language Bash terminal agent using the NVIDIA Nemotron Nano 9B v2 model. The system lets you issue high-level instructions in plain English; the agent interprets your intent, chooses safe Bash commands, asks for confirmation, executes them, and returns the results. It is based on the official NVIDIA blog post “Create Your Own Bash Computer Use Agent with NVIDIA Nemotron in One Hour” (Oct 22, 2025) by Mehran Maghoumi. (NVIDIA Developer)
Conceptual Background
![bash_agent_nemotron_hero]()
What is a “computer-use agent”?
A computer-use agent is a tool that receives high-level instructions in natural language, internally converts them into actions (e.g., Bash commands) on the computer, and then returns results. Unlike a chatbot that only replies with text, a true agent acts. The NVIDIA blog emphasises: “You provide a high-level instruction … it decides which Bash commands to run via tool calling.” (NVIDIA Developer)
Key enabling technology: tool calling (also called function calling) via an LLM. The model reasons about intent, picks commands, executes them through a wrapper, receives stdout/stderr, and adapts its next step. (NVIDIA Developer)
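To make the mechanics concrete, here is a minimal sketch of what a tool call emitted by the model looks like in the OpenAI-style function-calling format used throughout this walkthrough; the command shown is purely illustrative.

```python
# Hypothetical tool call produced by the model (OpenAI-style function calling).
# The model never runs anything itself; it asks the Bash wrapper to run this.
tool_call = {
    "id": "call_0",
    "type": "function",
    "function": {
        "name": "exec_bash_command",
        "arguments": '{"cmd": "du -sh * | sort -rh | head -n 10"}',
    },
}

# After execution, the wrapper's stdout/stderr is sent back to the model as a
# "tool" message, and the model decides what to do next.
```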
Why use NVIDIA Nemotron Nano 9B v2?
The model, an open reasoning LLM from NVIDIA, is designed for both reasoning and instruction-following tasks: 9 billion parameters, a hybrid Mamba-Transformer architecture, long-context support (up to 128K tokens), and tool calling. (NVIDIA NIM APIs)
Because it’s efficient and can run locally (on a suitable GPU), it makes building this kind of agent feasible. The blog states the prerequisites: local deployment ~20 GB disk + GPU ≥24 GB VRAM. (NVIDIA Developer)
Note: If you lack that hardware, you can use cloud endpoints (e.g., via OpenRouter) as shown. (NVIDIA Developer)
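A minimal sketch of connecting a client, using the OpenAI Python SDK; the endpoint URLs, environment variable, and model identifier below are assumptions to verify against the blog and the model catalog.

```python
import os
from openai import OpenAI

# Hosted endpoint on NVIDIA's API catalog (assumes an API key from build.nvidia.com).
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)

# Alternatively, point at a local OpenAI-compatible server (e.g., vLLM or a NIM
# container), or at an aggregator such as OpenRouter:
# client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

MODEL = "nvidia/nvidia-nemotron-nano-9b-v2"  # model id; check the catalog listing
```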
Four core agent considerations
The blog summarises key concerns when building such agents:
Exposing the tool interface (Bash) so that the model can invoke commands. (NVIDIA Developer)
Safety: restrict commands, use confirmation, and avoid destructive commands. (NVIDIA Developer)
Memory/state: maintain the current working directory and history so the agent understands context. (NVIDIA Developer)
Error-handling: capture command failures and feed them back to the agent so it can adjust. (NVIDIA Developer)
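As a concrete illustration of the last two points, both state (the working directory) and errors travel back to the model as an ordinary tool-result message; the payload below is a hypothetical example of that shape.

```python
# Hypothetical tool-result message (OpenAI chat format) returned to the model
# after a failed command. The model sees stderr plus the tracked cwd and can
# retry, fix its syntax, or ask the user for clarification.
tool_result_message = {
    "role": "tool",
    "tool_call_id": "call_0",
    "content": '{"stdout": "", "stderr": "du: cannot access ...: No such file or directory", "cwd": "/home/user"}',
}
```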
Step-by-Step Walkthrough
Prerequisites
A machine with Bash (Linux, macOS, or Windows WSL). (NVIDIA Developer)
Python 3.10+ environment with openai (or equivalent) package installed. (NVIDIA Developer)
The Nemotron model endpoint (locally or via cloud). If local: ~20 GB disk, GPU ≥24 GB VRAM. (NVIDIA Developer)
(Optional) For LangGraph bonus: install langchain-openai and langgraph. (NVIDIA Developer)
Architecture Overview
The system consists of:
Bash tool/wrapper: a Python class that wraps subprocess.run, keeps track of the current working directory, enforces the command allowlist, and prompts for user confirmation. (NVIDIA Developer)
Agent: the LLM (Nemotron) plus a system prompt that defines its role, allowed commands, and behaviour. The loop runs: user → model → (optional) tool call → user confirmation → execution → result fed back to the model → next iteration. See the diagram:
![bash_agent_nemotron_workflow]()
Writing the Bash Class
Here is a simplified version of the Bash tool class (adapted from the blog):
```python
from typing import List, Dict, Any
import re
import subprocess


class Bash:
    def __init__(self, cwd: str, allowed_commands: List[str]):
        self.cwd = cwd
        self._allowed_commands = allowed_commands

    def _extract_commands(self, cmd: str) -> List[str]:
        # Very simple parse: take the first word of each segment, splitting on
        # semicolons, pipes, and logical operators, so that every chained
        # command gets checked against the allowlist.
        segments = re.split(r"[;|&]+", cmd)
        return [seg.strip().split()[0] for seg in segments if seg.strip()]

    def exec_bash_command(self, cmd: str) -> Dict[str, str]:
        if not cmd:
            return {"error": "No command was provided"}
        for cmd_part in self._extract_commands(cmd):
            if cmd_part not in self._allowed_commands:
                return {"error": f"Parts of this command were not in the allowlist: {cmd_part}"}
        return self._run_bash_command(cmd)

    def _run_bash_command(self, cmd: str) -> Dict[str, str]:
        stdout = ""
        stderr = ""
        new_cwd = self.cwd
        try:
            # Append a sentinel and `pwd` so the new working directory can be
            # recovered after the command runs (e.g., after `cd`).
            wrapped = f"{cmd}; echo __END__; pwd"
            result = subprocess.run(
                wrapped, shell=True, cwd=self.cwd,
                capture_output=True, text=True, executable="/bin/bash"
            )
            stderr = result.stderr
            splits = result.stdout.split("__END__")
            stdout = splits[0].strip()
            if not stdout and not stderr:
                stdout = "Command executed successfully, with no output."
            new_cwd = splits[-1].strip()
            self.cwd = new_cwd
        except Exception as e:
            stdout = ""
            stderr = str(e)
        return {"stdout": stdout, "stderr": stderr, "cwd": new_cwd}

    def to_json_schema(self) -> Dict[str, Any]:
        return {
            "type": "function",
            "function": {
                "name": "exec_bash_command",
                "description": "Execute a bash command and return stdout/stderr and the working directory",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "cmd": {
                            "type": "string",
                            "description": "The bash command to execute"
                        }
                    },
                    "required": ["cmd"]
                }
            }
        }
```
Explanation:
_extract_commands splits the command string into segments (on semicolons, pipes, and logical operators) and returns the first word of each; exec_bash_command checks every one against the allowlist.
On execution, the command is wrapped with `; echo __END__; pwd` so that the new working directory can be captured after the command runs.
The internal cwd is updated after each run.
The method returns a structured result with stdout, stderr, and cwd.
(Adapted from the code listing in the blog post. (NVIDIA Developer))
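For a quick sanity check, here is a short, hypothetical usage of the class (the allowlist is an example, not the blog's exact list):

```python
bash = Bash(cwd="/tmp", allowed_commands=["ls", "pwd", "cd", "echo"])

print(bash.exec_bash_command("echo hello"))
# -> {'stdout': 'hello', 'stderr': '', 'cwd': '/tmp'}

print(bash.exec_bash_command("cd /var && ls"))
# 'cd' and 'ls' are both allowlisted, so the command runs; bash.cwd is now '/var'
# and subsequent commands execute from there.

print(bash.exec_bash_command("rm -rf /tmp/data"))
# -> {'error': "Parts of this command were not in the allowlist: rm"}
```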
Writing the Agent Loop From Scratch
Define the system prompt (with allowed commands list) and implement a loop:
```python
SYSTEM_PROMPT = f"""/think
You are a helpful Bash assistant with the ability to execute commands in the shell.
You engage with users to help answer questions about bash commands, or execute their intent.
If user intent is unclear, keep engaging with them to figure out what they need and how to best help them.
If they ask questions that are not relevant to bash or computer use, decline to answer.
When a command is executed, you will be given the output from that command and any errors. Based on that, either take further actions or yield control to the user.

You are only allowed to execute the following commands:
{LIST_OF_ALLOWED_COMMANDS}

**Never** attempt to execute a command not in this list. **Never** attempt to execute dangerous commands like `rm`, `mv`, `rmdir`, `sudo`, etc. If the user asks you to do so, politely refuse.
When you switch to new directories, always list files so you can get more context.
"""
```
Then the loop:
```python
import json
import os

bash = Bash(cwd=os.getcwd(), allowed_commands=MY_ALLOWLIST)
llm = LLM(...)                      # client connecting to the Nemotron endpoint
messages = Messages(SYSTEM_PROMPT)  # conversation-history helper

while True:
    user = input("[🙂] ").strip()
    messages.add_user_message(user)

    while True:
        # Ask the model for the next step, advertising the Bash tool schema.
        response, tool_calls = llm.query(messages, [bash.to_json_schema()])
        messages.add_assistant_message(response)

        if tool_calls:
            for tc in tool_calls:
                fn = tc.function.name
                args = json.loads(tc.function.arguments)
                if fn == "exec_bash_command" and "cmd" in args:
                    if confirm_execution(args["cmd"]):
                        result = bash.exec_bash_command(args["cmd"])
                    else:
                        result = {"error": "The user declined the execution of this command."}
                else:
                    result = {"error": "Incorrect tool or function argument"}
                # Feed the tool result back so the model can plan its next step.
                messages.add_tool_message(result, tc.id)
        else:
            print(f"\n[🤖] {response.strip()}")
            break
```
This loop allows the agent to make multiple tool calls if required for a single user request. (NVIDIA Developer)
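The loop relies on a few helpers that the blog's full code provides: LLM (a thin client for the Nemotron endpoint), Messages (the chat-history container), and confirm_execution (the human-in-the-loop gate). The confirmation gate is small enough to sketch here; its exact shape is an assumption, mirroring the _confirm_execution method used in the LangGraph version below. For the LLM and Messages implementations, see the GitHub code linked from the blog.

```python
def confirm_execution(cmd: str) -> bool:
    # Nothing runs unless the user explicitly approves the proposed command.
    return input(f" ▶️ Execute '{cmd}'? [y/N]: ").strip().lower() == "y"
```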
Bonus: Simplifying With LangGraph
If you install LangGraph, you can replace much of the manual loop code with:
```python
from typing import Dict

from langgraph.prebuilt import create_react_agent
from langgraph.checkpoint.memory import InMemorySaver
from langchain_openai import ChatOpenAI


class ExecOnConfirm:
    def __init__(self, bash: Bash):
        self.bash = bash

    def _confirm_execution(self, cmd: str) -> bool:
        return input(f" ▶️ Execute '{cmd}'? [y/N]: ").strip().lower() == "y"

    def exec_bash_command(self, cmd: str) -> Dict[str, str]:
        """Execute a bash command and return stdout/stderr and the working directory."""
        if self._confirm_execution(cmd):
            return self.bash.exec_bash_command(cmd)
        return {"error": "The user declined the execution of this command."}


bash = Bash(...)
agent = create_react_agent(
    model=ChatOpenAI(model=...),
    tools=[ExecOnConfirm(bash).exec_bash_command],
    prompt=SYSTEM_PROMPT,
    checkpointer=InMemorySaver(),
)

# The in-memory checkpointer keeps per-conversation state, keyed by thread_id.
config = {"configurable": {"thread_id": "bash-session"}}

while True:
    user = input("[🙂] ").strip()
    result = agent.invoke({"messages": [{"role": "user", "content": user}]}, config)
    response = result["messages"][-1].content.strip()
    # Strip the model's reasoning trace, if present, before printing.
    if "</think>" in response:
        response = response.split("</think>")[-1].strip()
    if response:
        print(f"\n[🤖] {response}")
```
This reduces boilerplate and error handling overhead. (NVIDIA Developer)
Use-Cases / Scenarios
System administration via natural language: “Analyze disk usage, list top 10 directories by size, and create a report file.”
Developer productivity: Without remembering Bash syntax, simply tell the agent what you want, and it executes safe commands.
Educational tool: New Linux users learn commands by letting the agent propose, then confirm execution.
Automated workflows: Wrap the agent into a service where non-technical staff can trigger filesystem operations (with a limited command set) safely.
Prototype for multi-agent systems: This shell agent is a simple example of an agent; you can extend principles (tool-calling, state) into larger domains (logs, database queries, cloud ops) as NVIDIA shows in other blogs. (NVIDIA Developer)
Limitations / Considerations
The command allowlist must be chosen carefully. If you include rm, sudo, mv, and similar commands, you risk unintended destructive operations; the blog emphasises refusing dangerous commands. (NVIDIA Developer)
Running the model locally requires high-spec GPU hardware (≥24 GB VRAM); cloud endpoints avoid this but may incur cost. (NVIDIA Developer)
The agent is only as safe as the prompt + wrapper enforcement. Malicious user instructions may still attempt to bypass confirmation or exploit shell capabilities; you must audit.
The model may propose multiple commands or commands with unexpected side-effects; human-in-the-loop confirmation is required.
The system tracks working directory, but more complex workflows (e.g., parallel commands, subprocesses, file permissions) may need richer state management.
This is a demo/proof-of-concept. For production, you need logging, audit trails, user-roles, sandboxing, and resource constraints.
Language model hallucinations remain possible: the model might propose a command that passes the allowlist but misinterprets user intent, so verify outputs.
Fixes (Common Pitfalls & Troubleshooting)
Issue: Model proposes a command not in the allowlist → Fix: Ensure the system prompt clearly lists allowed commands and the wrapper rejects unallowed commands (as shown).
Issue: Commands execute but no output / unexpected cwd change → Fix: Ensure _run_bash_command uses ; echo __END__; pwd pattern and splits correctly; verify cwd state update.
Issue: Model times out or gives long latency → Fix: Use a smaller context window, tune model latency settings, or deploy via cloud.
Issue: Permission denied or environment mis-configured → Fix: Ensure Python process has Bash access, correct permissions, working directory is valid.
Issue: Agent ignores human confirmation → Fix: Check the confirmation wrapper logic (input() capturing). For headless systems, adapt to programmatic confirmation.
Issue: Unexpected file changes or dangerous behavior → Fix: Restrict the allowlist to safe commands only (e.g., ls, cat, touch, mkdir, echo, grep) and exclude all destructive commands.
FAQs
Q1: Can I run this agent without a GPU locally?
A1: You could, but running the full Nemotron Nano 9B v2 model efficiently requires significant GPU resources (~24 GB VRAM) per the blog. Cloud endpoints are suggested. (NVIDIA Developer)
Q2: How do I choose the allowlist of commands?
A2: Start with minimal safe commands: ls, pwd, cat, grep, touch, mkdir, df, free, echo. Exclude destructive commands (rm, mv, rmdir, sudo). The blog emphasises rejecting anything outside the list. (NVIDIA Developer)
Q3: What happens if the agent fails a command?
A3: The wrapper returns stderr and the new cwd (or error message). The LLM receives this and can reason about the next step (retry, fix syntax, ask user). The system prompt instructs it to handle errors. (NVIDIA Developer)
Q4: Can I extend this to multiple tools beyond Bash?
A4: Yes. The same pattern applies: define a tool wrapping class, a schema for tool-calling, and update the prompt accordingly. You could integrate database queries, web APIs, cloud commands, etc. The blog hints at multi-agent systems built with Nemotron. (NVIDIA Developer)
Q5: Is the code open source?
A5: Yes: The blog links to the agent code on GitHub. (NVIDIA Developer)
References
Maghoumi, Mehran. “Create Your Own Bash Computer Use Agent with NVIDIA Nemotron in One Hour.” NVIDIA Developer Blog, Oct 22, 2025. (NVIDIA Developer)
NVIDIA build catalog: “NVIDIA-Nemotron-Nano-9B-v2” model card. (NVIDIA NIM APIs)
“Exploring the Capabilities of NVIDIA Nemotron Nano 9B v2.” Medium. (Medium)
Conclusion
You can build a working natural-language Bash agent in approximately an hour, using the NVIDIA Nemotron Nano 9B v2 model along with a lightweight Python wrapper for Bash. The tutorial covers the core components: tool wrapper, allowed commands, state tracking, LLM prompt, and loop logic (from scratch or via LangGraph). While the demo is simple, the architecture scales, allowing you to extend it to other tools, build complex workflows, and integrate it into production systems. Pay attention to safety (command allowlist, human confirmation), resource constraints (GPU/endpoint), and system robustness (error handling, state management).