Run LLMs Faster on AnythingLLM Using NVIDIA RTX AI PCs

Large language models (LLMs) — trained on billions of tokens — are the engines behind today’s most popular AI applications, from chatbots and virtual assistants to code generators and beyond. While cloud-based LLMs have powered much of this revolution, a new wave of privacy-focused, all-in-one desktop solutions is bringing the power of AI directly to users’ PCs. Among the most accessible of these is AnythingLLM, a desktop app designed for AI enthusiasts who want flexible, private, and high-performance AI at their fingertips.

What Is AnythingLLM?

AnythingLLM is a feature-rich AI application that enables users to run local LLMs, retrieval-augmented generation (RAG) systems, and agentic tools — all from a single, intuitive interface. It acts as a bridge between your favorite LLMs (like Llama and DeepSeek R1) and your personal data, letting you:

  • Answer Questions: Query top LLMs locally or in the cloud for instant answers, with no usage costs when running locally.
  • Query Personal Data: Use RAG to privately search and analyze your own documents, PDFs, codebases, and more (see the sketch after this list).
  • Summarize Documents: Generate concise summaries of research papers, reports, and lengthy files.
  • Analyze Data: Load files and extract insights using LLM-powered queries.
  • Take Agentic Actions: Automate research, run generative tools, and perform tasks based on your prompts.
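
For users who prefer to script these capabilities instead of clicking through the interface, the sketch below shows a RAG-style query against a local workspace. Note that the endpoint, port, API key, field names, and workspace slug are assumptions based on AnythingLLM’s developer API, not details confirmed in this article:

```python
# Minimal sketch: a RAG-style query against a local AnythingLLM workspace.
# Assumptions (not confirmed by this article): the developer API is enabled
# in the app's settings, it listens on the default port 3001, an API key has
# been generated, and a workspace with the slug "docs" already contains your
# embedded documents.
import requests

API_KEY = "YOUR-ANYTHINGLLM-API-KEY"      # generated in the app's settings (assumption)
BASE_URL = "http://localhost:3001/api/v1"  # default local endpoint (assumption)

resp = requests.post(
    f"{BASE_URL}/workspace/docs/chat",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "message": "Summarize the key findings in my uploaded PDFs.",
        "mode": "query",  # "query" mode grounds the answer in the workspace's documents
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json().get("textResponse"))
```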

AnythingLLM supports a wide range of open-source local LLMs, as well as cloud-based models from OpenAI, Microsoft, and Anthropic. Its extensible “skills” system and community hub allow users to add new agentic capabilities with ease. With one-click installation and the option to run as a standalone app or browser extension, AnythingLLM is especially appealing to users with NVIDIA GeForce RTX or NVIDIA RTX PRO GPUs.

RTX Acceleration: Bringing AI Performance to the Desktop

NVIDIA’s GeForce RTX and RTX PRO GPUs are engineered for AI, featuring Tensor Cores that dramatically accelerate inference and agentic tasks in AnythingLLM. The app leverages Ollama for on-device execution, with Llama.cpp and ggml tensor libraries optimized for RTX GPUs. This means users see significant speed-ups: for example, the GeForce RTX 5090 delivers up to 2.4x faster LLM inference in AnythingLLM than Apple’s M3 Ultra on both the Llama 3.1 8B and DeepSeek R1 8B models.
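
Because the runtime underneath is standard Ollama, the same RTX-accelerated models can also be exercised directly from a script. Below is a minimal sketch, assuming Ollama is running on its default port (11434) and a Llama 3.1 8B model has already been pulled with `ollama pull llama3.1:8b`:

```python
# Minimal sketch: talking directly to the Ollama runtime that handles
# on-device execution. Assumes Ollama is installed, serving on its default
# port 11434, and that the model has been pulled beforehand.
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.1:8b",  # one of the RTX-accelerated local models
        "messages": [
            {"role": "user", "content": "Explain Tensor Cores in one paragraph."}
        ],
        "stream": False,  # return one JSON response instead of a token stream
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```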

AnythingLLM Now Supports NVIDIA NIM Microservices

The latest update brings NVIDIA NIM microservices to AnythingLLM. NIMs are performance-optimized, prepackaged generative AI models that make it simple to launch AI workflows on RTX AI PCs via a streamlined API. For developers and enthusiasts, NIMs remove the friction of downloading, configuring, and connecting models, offering a single container that works both locally and in the cloud.
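
Because each NIM exposes an industry-standard, OpenAI-compatible endpoint, the same client code runs unchanged whether the container lives on an RTX PC or in the cloud. Here is a minimal sketch, assuming a NIM container is already serving locally on port 8000; the model name is illustrative and depends on which NIM you launch:

```python
# Minimal sketch: calling a locally running NIM container through its
# OpenAI-compatible API. Assumptions: the container is already running and
# serving on port 8000, and "meta/llama-3.1-8b-instruct" matches the NIM
# you launched (used here only as an example).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # local NIM endpoint (assumption)
    api_key="not-needed-locally",          # no key required for a local container
)

completion = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "What do NIM microservices provide?"}],
    max_tokens=256,
)
print(completion.choices[0].message.content)
```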

Inside AnythingLLM’s user-friendly interface, users can quickly experiment with NIM microservices for tasks like language and image generation, computer vision, and speech processing. Whether prototyping locally or preparing for cloud deployment, NIMs make it easier than ever to integrate advanced AI into your workflows.

Expanding the AI Ecosystem: Skills, Blueprints, and Community

AnythingLLM’s modular design means users can extend its capabilities through community-contributed skills and integrate with NVIDIA AI Blueprints — ready-made reference workflows for multimodal AI use cases. The app’s community hub offers documentation, sample code, and a growing library of tools for building creative workflows, digital humans, productivity apps, and more.