Overview
This article walks you through building a command-line agent that understands natural-language instructions and executes safe Bash commands on your computer, using the NVIDIA Nemotron Nano v2 language model. The agent architecture is simple: a Bash-wrapper tool plus an LLM reasoning engine. With around 200 lines of Python, you can create a fully functional “computer use agent” in under an hour.
This guide covers prerequisites, the architecture, code snippets, safety and error handling, integration with LangGraph, use cases, limitations, and troubleshooting.
A few numbers and facts to anchor expectations:
The agent fits in roughly 200 lines of Python with minimal dependencies. (NVIDIA Developer)
Deploying Nemotron Nano v2 locally requires about 20 GB of disk space and an NVIDIA GPU with at least 24 GB of VRAM. (NVIDIA Developer)
Safety relies on an allowlist of permitted commands plus human-in-the-loop confirmation before execution. (NVIDIA Developer)
Conceptual Background
Agentic AI vs Chatbot
Unlike a traditional chatbot, which only replies to user messages, an AI agent reasons about high-level goals, plans steps, executes actions via tools and returns results. This tutorial’s agent accepts a natural-language goal (“Make a directory, create a file with system info”) and executes Bash commands accordingly. (NVIDIA Developer)
Why Nemotron?
Nemotron Nano v2 is a compact reasoning model from NVIDIA, with strong reasoning skills, optimized for agentic use-cases. It’s efficient, responsive, and suitable for lightweight agents. (NVIDIA Developer)
Role of the Bash-wrapper tool
The Bash class wraps Python’s subprocess.run(), tracks working-directory changes, enforces an allowlist of commands, and returns results (stdout, stderr, cwd) to the agent. (NVIDIA Developer)
System Architecture
Here’s a high-level diagram of the components and control flow:
[Figure: bash agent architecture flowchart]
In practice:
User issues a command in plain English.
The LLM interprets intent, chooses commands (must be from the allowlist).
The bash tool executes commands when the user confirms.
Results get fed back to the model; it may decide the next step or finish.
The loop repeats.
Step-by-Step Walkthrough
Prerequisites
Make sure you have:
A recent Python installation with the packages used in the snippets below (an OpenAI-compatible client for the model endpoint, plus langgraph and langchain-openai if you follow the LangGraph bonus section).
Access to Nemotron Nano v2, either through a hosted API endpoint or deployed locally (which needs roughly 20 GB of disk space and an NVIDIA GPU with at least 24 GB of VRAM).
A Linux or macOS machine with /bin/bash available, since the tool executes commands through the Bash shell.
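If you reach the model through an OpenAI-compatible endpoint, the connection can be as small as the sketch below. The base URL and model id here are assumptions, not values from the tutorial: point base_url at your local server or at NVIDIA's hosted API, and check the model catalog for the exact model id.

```python
# Minimal connection sketch (assumed base_url and model id; adjust for your setup).
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.getenv("NEMOTRON_BASE_URL", "http://localhost:8000/v1"),  # local server or hosted endpoint
    api_key=os.getenv("NEMOTRON_API_KEY", "not-needed-for-local"),
)

completion = client.chat.completions.create(
    model=os.getenv("NEMOTRON_MODEL", "nvidia/nvidia-nemotron-nano-9b-v2"),  # assumed id; verify it in the catalog
    messages=[{"role": "user", "content": "Say hello."}],
)
print(completion.choices[0].message.content)
```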
Build the Bash class
Here is a simplified snippet (adapted):
```python
import json
import re
import subprocess
from typing import Any, Dict, List


class Bash:
    """Tool that executes Bash commands with an allowlist and tracks the working directory."""

    def __init__(self, cwd: str, allowed_commands: List[str]):
        self.cwd = cwd
        self._allowed_commands = allowed_commands

    def _extract_commands(self, cmd: str) -> List[str]:
        """Return the leading word of each sub-command so it can be checked against the allowlist."""
        # Split on common shell separators (;, &&, ||, |, &) so chained commands are checked too.
        segments = re.split(r"[;&|]+", cmd)
        commands = []
        for segment in segments:
            tokens = segment.strip().split()
            if tokens:
                commands.append(tokens[0])
        return commands

    def exec_bash_command(self, cmd: str) -> Dict[str, str]:
        if not cmd:
            return {"error": "No command was provided"}
        for cmd_part in self._extract_commands(cmd):
            if cmd_part not in self._allowed_commands:
                return {"error": f"Parts of this command ('{cmd_part}') were not in the allowlist."}
        return self._run_bash_command(cmd)

    def _run_bash_command(self, cmd: str) -> Dict[str, str]:
        stdout = ""
        stderr = ""
        new_cwd = self.cwd
        try:
            # Append a marker and `pwd` so the tool can track directory changes made by the command.
            wrapped = f"{cmd}; echo __END__; pwd"
            result = subprocess.run(
                wrapped, shell=True, cwd=self.cwd,
                capture_output=True, text=True,
                executable="/bin/bash"
            )
            stderr = result.stderr
            split = result.stdout.split("__END__")
            stdout = split[0].strip()
            if not stdout and not stderr:
                stdout = "Command executed successfully, without any output."
            new_cwd = split[-1].strip()
            self.cwd = new_cwd
        except Exception as e:
            stderr = str(e)
        return {"stdout": stdout, "stderr": stderr, "cwd": new_cwd}

    def to_json_schema(self) -> Dict[str, Any]:
        """Describe the tool in the JSON schema format expected for LLM tool calling."""
        return {
            "type": "function",
            "function": {
                "name": "exec_bash_command",
                "description": "Execute a bash command and return stdout/stderr and the working directory",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "cmd": {
                            "type": "string",
                            "description": "The bash command to execute"
                        }
                    },
                    "required": ["cmd"]
                }
            }
        }
```
This follows the Bash tool in the tutorial, with light adaptations. (NVIDIA Developer)
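Before wiring the tool to the model, you can exercise it directly. A quick sanity check, with a purely illustrative allowlist:

```python
# Quick manual test of the Bash tool (the allowlist here is just an example).
import os

bash = Bash(
    cwd=os.getcwd(),
    allowed_commands=["ls", "pwd", "cd", "mkdir", "cat", "echo", "grep", "head", "df", "free"],
)

print(bash.exec_bash_command("mkdir demo"))   # stdout/stderr plus the tracked cwd
print(bash.exec_bash_command("cd demo"))      # cwd is updated by the trailing `pwd`
print(bash.exec_bash_command("rm -rf demo"))  # rejected: 'rm' is not in the allowlist
```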
Build the Agent
Define your system prompt for alignment:
```
/system think
You are a helpful Bash assistant with the ability to execute commands in the shell.
You engage with users to help answer questions about bash commands, or execute their intent.
If user intent is unclear, keep engaging with them to figure out what they need and how to best help them.
If they ask questions that are not relevant to bash or computer use, decline to answer.
The bash interpreter's output and current working directory will be given to you every time a command is executed.
Take that into account for the rest of the conversation.
You are only allowed to execute the following commands:
{LIST_OF_ALLOWED_COMMANDS}
Never attempt to execute a command not in this list. Never attempt to execute dangerous commands like `rm`, `mv`, `rmdir`, `sudo`, etc. If the user asks you to do so, politely refuse.
When you switch to new directories, always list files so you can get more context.
```
(NVIDIA Developer)
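The {LIST_OF_ALLOWED_COMMANDS} placeholder has to be filled in before the prompt is used. A minimal sketch, assuming the prompt text above is stored in a string called PROMPT_TEMPLATE and the allowlist matches the one passed to the Bash tool:

```python
# Fill the allowlist placeholder in the system prompt (PROMPT_TEMPLATE holds the text above).
ALLOWED_COMMANDS = ["ls", "pwd", "cd", "mkdir", "cat", "echo", "grep", "head", "df", "free"]

SYSTEM_PROMPT = PROMPT_TEMPLATE.format(
    LIST_OF_ALLOWED_COMMANDS=", ".join(ALLOWED_COMMANDS)
)
```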
Then your loop in Python might look like:
```python
import json
import os

# LLM, Messages, and confirm_execution are helper abstractions from the tutorial:
# LLM wraps the chat-completion client, Messages keeps the conversation history,
# and confirm_execution asks the user to approve each command (sketches follow below).
bash = Bash(cwd=os.getcwd(), allowed_commands=[...])
llm = LLM(...)  # initialize the connection to Nemotron (local server or API)
messages = Messages(SYSTEM_PROMPT)

while True:
    user = input("[🙂] ").strip()
    messages.add_user_message(user)

    # Inner loop: keep querying the model until it answers without requesting a tool call.
    while True:
        response, tool_calls = llm.query(messages, [bash.to_json_schema()])
        messages.add_assistant_message(response)

        if tool_calls:
            for tc in tool_calls:
                fn = tc.function.name
                args = json.loads(tc.function.arguments)
                if fn == "exec_bash_command" and "cmd" in args:
                    if confirm_execution(args["cmd"]):
                        result = bash.exec_bash_command(args["cmd"])
                    else:
                        result = {"error": "The user declined the execution of this command."}
                else:
                    result = {"error": "Incorrect tool or function argument"}
                messages.add_tool_message(result, tc.id)
        else:
            print(f"\n[🤖] {response.strip()}")
            break
```
(NVIDIA Developer)
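The loop leans on two helpers that are not shown: confirm_execution asks the user to approve each command, and Messages keeps the running chat history. Minimal sketches of both follow, assuming an OpenAI-style message format; the exact implementations in the tutorial may differ.

```python
# Minimal sketches of the helpers the loop assumes (names follow the snippet above).
import json
from typing import Any, Dict, List


def confirm_execution(cmd: str) -> bool:
    """Ask the user to approve a command before it runs (human in the loop)."""
    return input(f" ▶️ Execute '{cmd}'? [y/N]: ").strip().lower() == "y"


class Messages:
    """Keeps the conversation history as OpenAI-style message dicts."""

    def __init__(self, system_prompt: str):
        self.messages: List[Dict[str, Any]] = [{"role": "system", "content": system_prompt}]

    def add_user_message(self, content: str) -> None:
        self.messages.append({"role": "user", "content": content})

    def add_assistant_message(self, content: str) -> None:
        self.messages.append({"role": "assistant", "content": content})

    def add_tool_message(self, result: Dict[str, str], tool_call_id: str) -> None:
        self.messages.append(
            {"role": "tool", "tool_call_id": tool_call_id, "content": json.dumps(result)}
        )
```

Note that a strict OpenAI-compatible backend also expects the assistant message that requested a tool call to carry its tool_calls field; the tutorial's LLM wrapper is assumed to handle that detail.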
Bonus: Use LangGraph
If you prefer less boilerplate, you can use LangGraph to simplify the logic:
```python
from typing import Dict

from langchain_openai import ChatOpenAI
from langgraph.checkpoint.memory import InMemorySaver
from langgraph.prebuilt import create_react_agent


class ExecOnConfirm:
    """Wraps the Bash tool so every command is confirmed by the user before it runs."""

    def __init__(self, bash: Bash):
        self.bash = bash

    def _confirm_execution(self, cmd: str) -> bool:
        return input(f" ▶️ Execute '{cmd}'? [y/N]: ").strip().lower() == "y"

    def exec_bash_command(self, cmd: str) -> Dict[str, str]:
        """Execute a bash command and return stdout/stderr and the working directory."""
        if self._confirm_execution(cmd):
            return self.bash.exec_bash_command(cmd)
        return {"error": "The user declined the execution of this command."}


bash = Bash(...)
agent = create_react_agent(
    model=ChatOpenAI(model=...),
    tools=[ExecOnConfirm(bash).exec_bash_command],
    prompt=SYSTEM_PROMPT,
    checkpointer=InMemorySaver(),
)

while True:
    user = input("[🙂] ").strip()
    result = agent.invoke({"messages": [{"role": "user", "content": user}]}, config=...)
    response = result["messages"][-1].content.strip()
    # Nemotron may emit its reasoning inside <think>...</think>; show only the final answer.
    if "</think>" in response:
        response = response.split("</think>")[-1].strip()
    if response:
        print(f"\n[🤖] {response}")
```
(NVIDIA Developer)
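The config=... placeholder is where LangGraph expects a thread id whenever a checkpointer such as InMemorySaver is used, so that conversation state persists across turns. A minimal example (the thread id value is arbitrary):

```python
# With a checkpointer, pass a thread_id so the conversation state persists across turns.
config = {"configurable": {"thread_id": "bash-agent-session"}}

result = agent.invoke(
    {"messages": [{"role": "user", "content": user}]},
    config=config,
)
```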
Use Cases / Scenarios
A developer can ask: “Create a directory logs, run df -h then free -h, save them to logs/sysinfo.txt, and give me a summary.” The agent translates that into allowlisted commands and executes them with confirmation; a rough sketch of what this boils down to follows below. (NVIDIA Developer)
A sysadmin could use it to inspect states: “List all .conf files in /etc, show me the first 20 lines of each.”
Non-technical users may ask: “What’s my disk usage? Write that to a file and summarise.”
As a learning tool: experiment with different allowed-command lists, or model agent behaviour for educational purposes.
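The first scenario above roughly reduces to the tool calls below. This is a sketch only: in the real agent the model chooses the commands and each one is confirmed by the user first, and the example allowlist from earlier is assumed to include mkdir, df, free, and cat.

```python
# Roughly what the "system info" scenario reduces to, as direct calls to the Bash tool.
for cmd in [
    "mkdir logs",
    "df -h > logs/sysinfo.txt",
    "free -h >> logs/sysinfo.txt",
    "cat logs/sysinfo.txt",
]:
    print(bash.exec_bash_command(cmd))
# The model then reads the final stdout and writes the summary itself.
```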
Limitations / Considerations
The allowlist restricts potential destructive commands, but any new command you add must be carefully evaluated for safety.
Running shell commands still carries risk: if you misconfigure the working directory or forget to restrict certain operations, misuse is possible.
Local model deployment requires heavy hardware (24 GB+ VRAM) unless you use the cloud API.
The reasoning capabilities may still misinterpret ambiguous instructions; always review agent suggestions.
This setup is best suited to demonstrations and controlled environments, not to production or sensitive systems, unless you add further security checks.
Fixes (Common Pitfalls & Troubleshooting)
Model fails to understand the tool-call schema → Ensure the output of to_json_schema() matches your API's tool-calling spec exactly, and that tool calling is enabled and configured for the model.
Agent suggests a prohibited command (e.g., rm -rf) → Confirm the allowlist includes only safe commands and enforce the prompt’s “Never attempt…” clause.
Wrong working directory tracking → The wrapped command must use ; echo __END__; pwd so that the working directory is captured correctly. If outputs are split incorrectly, adjust the marker. (NVIDIA Developer)
Tool call not executed → Check that the inner loop in the agent logic correctly distinguishes when tool_calls is present and processes them before breaking.
Model hangs or is slow → Large model or insufficient hardware. Use a cloud endpoint or a smaller model.
Unexpected output formats → The model might not account for newline conventions or terminal escape codes; sanitize shell output before further reasoning (see the sketch below).
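One way to do that sanitization, as a hypothetical helper that is not part of the tutorial: strip ANSI escape sequences and normalize line endings before adding the result to the conversation.

```python
# Hypothetical helper: strip ANSI escape sequences and normalize newlines in shell output.
import re

ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*[a-zA-Z]")


def sanitize_output(text: str) -> str:
    text = ANSI_ESCAPE.sub("", text)           # drop color codes and cursor-control sequences
    return text.replace("\r\n", "\n").strip()  # normalize line endings and trim whitespace
```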
FAQs
Q: Can I add commands like sudo or rm?
A: Yes, technically, but it is strongly discouraged unless you fully understand the risk. The tutorial explicitly forbids dangerous commands in both code and the prompt. (NVIDIA Developer)
Q: Do I need a GPU to run Nemotron locally?
A: Yes. The blog mentions that an NVIDIA GPU with at least 24 GB VRAM is required for local deployment. (NVIDIA Developer)
Q: Can the agent handle multi-step tasks?
A: Yes. The loop allows the model to perform multiple tool calls in one user instruction and then decide when to finish. (NVIDIA Developer)
Q: Is this production-ready?
A: Not out-of-the-box. It’s a strong prototype. For production, you'd need stronger sandboxing, security, permission controls, logging, error recovery, and monitoring.
Q: Can I connect this agent to GUI applications?
A: The tutorial covers only Bash/terminal commands. With extension, you could integrate other tools, but you'd need to wrap them with safe APIs and update the model/tool schema accordingly.
References
Maghoumi, Mehran (October 22, 2025). “Create Your Own Bash Computer Use Agent with NVIDIA Nemotron in One Hour.” NVIDIA Developer Blog.
NVIDIA Developer Blog: background articles on agentic AI concepts.
LangGraph documentation: create_react_agent and checkpointers.
Conclusion
You now have a blueprint for building a natural-language Bash-terminal agent using NVIDIA Nemotron and a lightweight Python wrapper. This guide covered the architecture, code, safety measures, integration with LangGraph, use cases, and troubleshooting. Although the agent can be built in about an hour, the design lays the foundation for more advanced agentic systems. Experiment by adding commands, tweaking prompts, and integrating other tools.