The world is on fire, thanks to the orange clown who wages war for personal gain. Or is it because data centers are super-heated running all these AI models 24/7?
In any case, the AI boom wreaked havoc with my plans to purchase a new computer in order to replace my ageing build server here at home. RAM sticks are 4 to 5 times as expensive now compared to half a year ago, and hard drives are pretty hard to come by.
I decided to postpone a full replacement of the server hardware for a while. Instead I bought one item which has so far seen only a moderate price increase: a GeForce RTX 5060 Ti graphics card with 16 GB of VRAM. I installed that as the second GPU card in the server and did not connect any screen to it.
Instead I decided to do a local experiment with Artificial Intelligence. The result of that experiment is a new episode in my Slackware Cloud Server series. I am going to show you how to make the unused Nvidia GPU available to Slackware, how to install and configure a tool that manages (downloads, runs) local Large Language Models (aka LLMs, aka AI models) and then expose the AI models via a web page that looks a lot like a Claude Chat or ChatGPT instance.
Check out the list below which shows past, present and future episodes in my Slackware Cloud Server series. If the article has already been written you’ll be able to click on the subject.
The first episode also contains an introduction with some more detail about what you can expect.
- Episode 1: Managing your Docker Infrastructure
- Episode 2: Identity and Access management (IAM)
- Episode 3: Video Conferencing
- Episode 4: Productivity Platform
- Episode 5: Collaborative document editing
- Episode 6: Etherpad with Whiteboard
- Episode 7: Decentralized Social Media
- Episode 8: Media Streaming Platform
- Episode 9: Cloudsync for 2FA Authenticator
- Episode 10: Workflow Management
- Episode 11: Jukebox Audio Streaming
- Episode 12 (this article): Local AI
Setting up Ollama and Open WebUI, adding Open Terminal as well, to give you a fully local and offline AI chatbot using any open source Large Language Model of your choice. Your data, under your control.
- Why the hell would I want to run a local AI at home?
- Preamble
- Web hosts
- Secret keys
- Docker network
- Port numbers
- File locations
- Server configuration steps
- Install the Nvidia GPU card
- Install the NVidia binary driver and kernel module
- The CUDA toolkit
- Install Ollama
- Create Docker network and local directories
- Install Open WebUI
- Start and configure Open WebUI
- Apache reverse proxy (https)
- Check the result
- Add Open Terminal
- Enable Open Terminal in Open WebUI
- Ollama integration in Nextcloud
- Single Sign-On (SSO)
- Attribution
- Final thoughts
- Episode X: Docker Registry
Why the hell would I want to run a local AI at home?
I think that the advantages are pretty obvious, but let me spell them out for you.
- Privacy & data security
It’s my main reason for doing this. The whole Cloud Server series is about owning, controlling and managing your own data without giving it away to the big tech corporations. Literally all of the processing happens on our Slackware server. None of your sensitive files, private documents or even the software code that you are developing gets sent to someone else’s infrastructure.
- Offline accessibility
The local LLM does not require an active internet connection. It’ll be your conscious decision to give the AI model access to Internet search engines.
- Cost efficiency (at least long-term)
I had to make an upfront investment in the required hardware of course. LLM’s need to run inside the VRAM of a high-end GPU in order to respond with a decent speed. If you have a spare GPU in your gaming rig, then by all means re-use that card! But really, running LLMs locally will remove the need for monthly subscription fees and per-token API costs. I know people who pay 100 euros per month to be able to consume the API tokens that they need for business development.
If you are a high-volume LLM user, running the model locally can lead to substantial savings over time. Your local AI may not be one of the fancy new commercial models and the speed of answering may be a bit slower, but there are always going to be trade-offs.
- Reduced Latency
Often overlooked actually, but all your ChatGPT, Gemini or Claude queries involve a “network round trip” to a remote server. If you want to create an interactive AI service like a voice assistant, a local model may be able to offer snappier responses.
- Customization & Control
Obviously, you have full ownership of the model you downloaded and you control its environment. This allows you to:
- Fine-tune the model on your own niche datasets.
- Choose specific open-source models (like Llama 3, Gemma 4 or Mistral) and quantization levels that fit your hardware.
- Avoid content restrictions or “guardrails” defined by commercial providers.
- Reliability & Independence
You are not at the mercy of Big Tech! Any downtime is your own problem to solve; you never run into rate limits; you will never be hit with sudden policy changes that deprecate the model you rely on overnight.
Here is an architectural overview of what the stack looks like. We install Ollama on bare metal and will be running Open WebUI in a Docker container.
Preamble
This section describes the technical details of our setup, as well as the things which you should have prepared before trying to implement the instructions in this article.
Web Hosts
For the sake of this instruction, I will use the URL “https://ai.darkstar.lan” as your landing page for your private AI chatbot.
Setting up your domain (which will hopefully be something other than “darkstar.lan”…) with new hostnames and then setting up web servers for the hostnames in that domain is an exercise left to the reader. Before continuing, please ensure that your equivalent for the following host has a web server running. It doesn’t have to serve any content yet, but we will add some blocks of configuration to its VirtualHost definition during the steps outlined in the remainder of this article:
- ai.darkstar.lan
Using a Let’s Encrypt SSL certificate to provide encrypted connections (HTTPS) to your webserver is documented in an earlier blog article.
Note that I am talking about webserver “hosts” but in fact, all of these are just virtual webservers running on the same machine, at the same IP address, served by the same Apache httpd program, but with different DNS entries. There is no need at all for multiple computers when setting up your Slackware Cloud server.
Port numbers
- Ollama uses TCP port 11434 by default and we are not going to change that.
- The Open WebUI docker container will listen at the loopback TCP port 3456 which is where the Apache Reverse Proxy will direct incoming traffic.
Secret keys
- For persistent logins:
WEBUI_SECRET_KEY=eePAAjEgEnZdgAQcVKb/DA993rwU+xbBb1scG0Zz1sQ=
- Connecting Open Terminal to Open WebUI:
OPEN_TERMINAL_API_KEY=qIShpFT2IUZaglqLTX5UCw6oQSyuCuKgpgF/xViqUWA=
Random strings like these can be generated using a convoluted series of commands like:
$ cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 32 | head -n 1 | openssl dgst -binary -sha256 | openssl base64
… which outputs a 45-character string ending in ‘=’.
Or generate 32 random characters with a truncated version of that commandline:
$ cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 32 | head -n 1
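Alternatively, OpenSSL can generate a suitable random base64 string in a single command (this is just a convenience; any sufficiently long random string will do):
$ openssl rand -base64 32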
Docker network
- We assign a Docker network segment to our Open WebUI container: 172.24.1.0/24, and call it “localai.lan”.
File Locations
- The Docker configuration goes into: /usr/local/docker-localai/
- The vector database for maintaining chat history and other data downloaded by the Ollama server or generated by Open Terminal go into: /opt/dockerfiles/localai/
- Our Nextcloud server is installed into /var/www/htdocs/nextcloud/
Server configuration steps
I will break down the story into its main parts:
- Physically install a GPU with sufficient Video RAM (VRAM)
- Install the Nvidia binary driver and kernel module
- Install CUDA toolkit
- Install and configure Ollama – which will make use of the Nvidia driver and the CUDA toolkit to load LLMs into the GPU’s VRAM
- Create the Docker network and the local directory structure
- Install Open WebUI – which gives us a nice web page where we can manage and query our AI models
- Configure Apache reverse proxy to expose the Open WebUI to the network
Install the GPU card
Before you buy any new GPU hardware, you need to make sure that your motherboard and PSU support the card. In my case, the GeForce RTX 5060 Ti card needs an 8-pin MOLEX power connector and requires a 650W PSU. Even my 10-year-old server meets those requirements. Caveat: the GeForce RTX 5060 Ti supports PCIe 5.0 but my old ASUS Prime B350-plus motherboard only supports PCIe 3.0. This is a backward-compatible protocol, so this rather recent graphics card still works in my server, but it will not be able to reach its full performance and speed. Eventually, I will have to upgrade the rest of my server hardware as well.
However, the point is to run a local AI model entirely in Video RAM (VRAM), and in that case PCIe speed is not an important consideration.
I kept my fanless GeForce GT 1030 card in the server as well. It is connected to a monitor and uses a regular kernel driver. That way, the new card can be fully utilized for AI inference and I still have local access to the server console.
Install the NVidia binary driver and kernel module
The GeForce RTX 5060 (which is based on the Blackwell architecture) requires the Nvidia open GPU kernel modules for proper functionality on Linux. The standard proprietary kernel module downright refuses to support this rather new card.
On the other hand, my old GT1030 card is not even detected by the open GPU kernel module, which made it really easy for me to keep both cards in the server – the old card using the Linux kernel driver to allow local access to the console, and the new card using the Nvidia open driver which enables the use of local AI.
Typically I would now point you to packages in my local repository to install the software you need. But for the Nvidia driver I do not have packages. They are too much of a moving target, with multiple versions each supporting a different range of GPU models, and each needing a kernel module that matches a Slackware kernel.
Instead I would like to point you to the SlackBuilds.org script repository, where you can download the required SlackBuild scripts and supporting files to compile these packages yourself.
You will need:
- nvidia-driver (I used the 580.105.08 release but the currently available version is already at 595)
- nvidia-kernel (edit the SlackBuild script and enable the “OPEN” build by setting the variable OPEN to “yes”)
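If you have not used SlackBuilds.org scripts before, the procedure is roughly the sketch below (the tarball names and the /tmp output location are the usual SBo defaults; check each script’s README and .info file for the exact source downloads):
# tar xvf nvidia-driver.tar.gz && cd nvidia-driver
# # download the NVIDIA installer listed in nvidia-driver.info into this directory
# ./nvidia-driver.SlackBuild
# installpkg /tmp/nvidia-driver-*.t?z
# # repeat the same steps for nvidia-kernel, after setting OPEN="yes" in its SlackBuild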
Build those two packages and install them, then reboot your computer. Use the ‘nvidia-smi’ program which is part of the nvidia-driver package to verify that your GPU is recognized and ready for use:
# nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 5060 Ti (UUID: GPU-065f08c5-cd2f-48bc-ac7c-2454613fd064)
We’re ready for the next step.
The CUDA toolkit
CUDA what?
A brief explanation first. I assume that you have given some thought to why there is such hype around GPUs in relation to AI. There is a fundamental overlap between the technology that was developed to speed up the rendering of three-dimensional graphics in computer games (more specifically the Graphics Processing Unit or GPU) and the capabilities needed to train AI programs and make them respond in real-time. Both require hardware that can perform thousands of simple, identical calculations simultaneously, at scale:
- To render a 3D scene, a GPU must calculate the color and position of millions of pixels at once. This is done using linear algebra (matrix and vector multiplication).
- Training a neural network (the basic building block of any AI program) also relies on massive matrix multiplications to adjust millions of “weights” or parameters.
The math involved here is nearly identical! The hardware designed to “paint” a video game frame was accidentally perfect for “training” an AI model.
Now, CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model developed by NVIDIA. With CUDA, you can use Nvidia GPUs for general-purpose processing, not just graphics. It enables you to harness the power of GPU parallelism to accelerate stuff like scientific simulations and deep learning. CUDA has “democratized” the graphics hardware, allowing AI researchers to write code for GPUs using standard languages like C++ and Python. The speedup compared to CPU-only sequential implementations is gigantic.
The CUDA toolkit is what’s going to drive our local AI management, see also the architecture diagram I included previously.
By the way, here is an interesting read for you: “The Origins of GPU Computing“.
Install the CUDA toolkit
Similar to the NVIDIA driver and kernel module, we install the CUDA toolkit using the SlackBuilds.org scripts.
I needed to use cudatoolkit_13, which was at version 13.2.0 when I compiled my package. The compilation of an Ollama package in a later step kept failing because of CUDA issues until I upgraded the toolkit from 10.x (the SBo default) to 13.x.
After installing the CUDA toolkit, log out and log in again so that your shell can ‘source’ the installed profile script ‘/etc/profile.d/cuda-13.2.sh‘.
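A quick sanity check that the toolkit is found in your new login shell (the version reported should of course match the cudatoolkit package you installed; if the command is not found, source the profile script manually first):
$ nvcc --version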
Install Ollama
For this article, I tried building Ollama from source. That went well eventually, but the resulting binary would never contain a working set of GPU library stubs (CUDA). What happens if you use an Ollama binary without support for CUDA is that the AI models you use will be running on your CPU instead of your GPU, killing the real-time experience.
I will leave the build instructions for Ollama in this section, but what I actually did was download the official binary and install that as ‘/usr/local/bin/ollama’. If any of you can explain to me what I potentially did wrong, and show how to compile Ollama with CUDA support, let me know in the comments below!
Compile from source
Before we can compile Ollama, some packages need to be installed first:
- google-go-lang
This is part of Slackware -current but if you are still on Slackware 15.0 you can download this Go compiler from my own repository.
- go-md2man
Needed to generate man pages. You can download this package from my own repository.
- jq
A commandline JSON processor which is part of Slackware -current and which you can download from my own repository if you are running Slackware 15.0.
Note that after installing google-go-lang you need to logout and login again to allow your shell to ‘source’ the profile script ‘/etc/profile.d/go.sh‘.
Then, compile Ollama, using yet another script you can download from SlackBuilds.org. Install the resulting package.
Or… download official binaries
Instead of compiling Ollama from source, you can also download and install the official binaries from Ollama’s server. Simply do this:
# wget https://ollama.com/download/ollama-linux-amd64.tar.zst
# tar -C /usr/local -xf ollama-linux-amd64.tar.zst
# /usr/local/bin/ollama --version
Since we have an NVIDIA card in our server, and the NVIDIA proprietary driver as well as the CUDA toolkit have already been installed, the Ollama binary auto-detects these capabilities. No extra steps are needed on Slackware, as long as libcuda.so is in the dynamic linker path.
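You can verify that the CUDA library is indeed visible to the dynamic linker with a quick check; if this prints nothing, run ‘ldconfig’ as root and confirm that the library directories of the nvidia-driver and cudatoolkit packages are listed in /etc/ld.so.conf:
# ldconfig -p | grep libcuda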
Create dedicated system account and directories
The ‘ollama’ account will be created as system user and group:
# groupadd -g 393 -r ollama
# useradd -r -u 393 -g ollama -d /var/lib/ollama -s /sbin/nologin -c "Ollama service account" ollama
The directory to store the AI models:
# mkdir -p /var/lib/ollama/models
# chown -R ollama:ollama /var/lib/ollama
# chmod 750 /var/lib/ollama
Pre-create the log file:
# touch /var/log/ollama.log
# chown ollama:ollama /var/log/ollama.log
Start Ollama
We’ll make sure that Ollama starts when the server boots using a ‘rc’ script. It is also possible to run Ollama in a container, which may be a future extension to the article.
Create the following ‘rc’ script called ‘/etc/rc.d/rc.ollama‘ to start Ollama when your computer boots:
#!/bin/bash
# /etc/rc.d/rc.ollama - Ollama service for Slackware
# Created by Jerry B Nettrouer II https://www.inpito.org/projects.php
# Load configuration (if file exists)
[ -f /etc/default/ollama ] && . /etc/default/ollama
# Load CUDA toolkit locations if those exist:
[ -f /etc/profile.d/cuda-13.2.sh ] && . /etc/profile.d/cuda-13.2.sh
# Set the Process ID file
PIDDIR="/run/ollama/"
PIDFILE="/run/ollama/ollama.pid"
# Set the log file
LOGFILE="/var/log/ollama.log"
case "$1" in
  start)
    if [ -f "$PIDFILE" ] && kill -0 $(cat "$PIDFILE") 2>/dev/null; then
      echo "Ollama already running."
      exit 0
    fi
    echo "Starting Ollama... (models: $OLLAMA_MODELS, host: $OLLAMA_HOST)"
    # Create the run directory
    mkdir -p $PIDDIR
    chown -R ollama:ollama $PIDDIR
    # Use nohup + setsid for clean daemon behavior
    su -s /bin/sh -c "setsid nohup ollama serve >> $LOGFILE 2>&1 & echo \$! > $PIDFILE" ollama
    echo "Started with PID $(cat "$PIDFILE")"
    ;;
  stop)
    echo "Stopping Ollama..."
    if [ -f "$PIDFILE" ]; then
      kill $(cat "$PIDFILE") 2>/dev/null
      rm -f "$PIDFILE"
    else
      pkill -f "ollama serve" 2>/dev/null
    fi
    ;;
  restart)
    $0 stop
    sleep 1
    $0 start
    ;;
  status)
    if [ -f "$PIDFILE" ] && kill -0 $(cat "$PIDFILE") 2>/dev/null; then
      echo "Ollama is running (PID $(cat "$PIDFILE"))."
    elif pgrep -f "ollama serve" >/dev/null; then
      echo "Ollama is running (but no PID file)."
    else
      echo "Ollama is not running."
    fi
    ;;
  *)
    echo "Usage: $0 {start|stop|restart|status}"
    exit 1
    ;;
esac
exit 0
# ---
This ‘rc’ script relies on a configuration file ‘/etc/default/ollama‘ which needs the following content (you will probably change a few parameters):
# ---
OLLAMA_MODELS=${OLLAMA_MODELS:-"/var/lib/ollama/models"}
OLLAMA_HOST=${OLLAMA_HOST:-"0.0.0.0:11434"}
OLLAMA_ORIGINS=*
# You can add more variables if needed, e.g.:
#OLLAMA_KEEP_ALIVE="-1" # Never unload automatically
#OLLAMA_DEBUG="1"
#OLLAMA_GPU_MEMORY_FRACTION="0.85" # Constrain VRAM usage:
# Need to export these, otherwise the ollama rc script will not pick them up:
export OLLAMA_MODELS OLLAMA_HOST OLLAMA_ORIGINS
# ---
Note in this configuration file that we instruct Ollama to listen on all interfaces (0.0.0.0), not just the loopback address (127.0.0.1). For a local Ollama that we are going to expose via an Apache reverse proxy, the safest option would indeed be to listen only on the loopback address. But Ollama does not only talk to you (the user); it also needs to be talked to. The web page via which you are going to access your local AI is provided by Open WebUI, and that runs inside a Docker container. The Open WebUI server inside Docker cannot reach the host’s loopback interface. That is why we tell Ollama to listen on all network interfaces.
And then we mitigate this risk by adding a firewall rule which blocks access to Ollama from anything else but the loopback address and our AI Docker network.
To bring it all together, invoke this ‘rc’ script in your ‘/etc/rc.d/rc.local’ by adding the following text block to it. There we ensure that the Ollama server port is firewalled from the outside world:
if [ -x /etc/rc.d/rc.ollama ]; then
  # Protect from outside abuse via firewall:
  # Allow established connections and loopback
  /usr/sbin/iptables -A INPUT -i lo -p tcp --dport 11434 -j ACCEPT
  # Allow Docker Ollama bridge network
  # (adjust the subnet to match 'docker network inspect localai.lan')
  /usr/sbin/iptables -A INPUT -s 172.24.1.0/24 -p tcp --dport 11434 -j ACCEPT
  # Drop everything else hitting this port
  /usr/sbin/iptables -A INPUT -p tcp --dport 11434 -j DROP
  # Start Ollama LLM offline server:
  echo "Starting Ollama LLM offline: /etc/rc.d/rc.ollama start"
  /etc/rc.d/rc.ollama start
fi
Run the start script manually now to start the Ollama server without rebooting:
# /etc/rc.d/rc.ollama start
Note that Ollama does not offer any form of authentication mechanism. Any user or process that can access the TCP port can use it.
Test Ollama
Test from your non-root user account whether Ollama is ready for action:
$ ollama list
… or else:
$ curl http://127.0.0.1:11434/api/tags
LLM quickstart: pull and use Mistral
Let’s try to pull the ‘Mistral’ Large Language Model (this downloads ~4GB for mistral:7b):
$ ollama pull mistral
If you want to experience an interactive chat session:
$ ollama run mistral
You can also use a non-interactive single prompt, which is useful for scripting:
$ ollama run mistral "Explain lithography in one paragraph"
… or use the REST API directly:
$ curl http://127.0.0.1:11434/api/generate -d '{"model":"mistral","prompt":"Hello, Mistral!","stream":false}' | python3 -m json.tool
Query Ollama about the AI models it has loaded. This also shows how much of the model runs in the GPU VRAM versus on the CPU:
$ ollama ps
NAME               ID              SIZE    PROCESSOR    CONTEXT    UNTIL
ministral-3:14b    4760c35aeb9d    10 GB   100% GPU     4096       Forever
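You can cross-check the VRAM usage from the Nvidia side while a model is loaded (the output is comma-separated; the exact numbers depend on the model and quantization you pulled):
# nvidia-smi --query-gpu=memory.used,memory.total --format=csv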
Create docker network and local directories
# docker network create --driver=bridge --subnet=172.24.1.0/24 --gateway=172.24.1.1 localai.lan
# mkdir -p /usr/local/docker-localai/
# mkdir -p /opt/dockerfiles/localai/{data,open-terminal-data}/
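A quick check that the network exists with the expected subnet (the same subnet is referenced in the firewall rules of the ‘rc.local’ snippet above):
# docker network inspect localai.lan | grep -E '"Subnet"|"Gateway"'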
Install Open WebUI
Open WebUI is the current best-maintained self-hosted frontend for Ollama. This is its ‘docker-compose.yml‘ file which you should create in directory ‘/usr/local/docker-localai/‘:
# ---
name: open-webui

services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    networks:
      - localai.lan
    # host-gateway resolves to the host machine's IP on the Docker bridge,
    # allowing the container to reach host-resident services.
    extra_hosts:
      - "host.docker.internal:host-gateway"
    ports:
      # Bind ONLY to localhost. Apache will be the public-facing entry point.
      - "127.0.0.1:3456:8080"
    environment:
      # Point Open WebUI at host-resident Ollama via the bridge gateway (127.0.0.1 does NOT work here).
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
      # Must match what Apache sends as the public URL. This is critical for
      # cookies, redirects, and CSRF protection once behind a reverse proxy.
      - WEBUI_URL=https://ai.darkstar.lan
      # Best practice: put this in a .env file next to docker-compose.yml
      - WEBUI_SECRET_KEY=${WEBUI_SECRET_KEY}
      # Harden cookies when served over HTTPS via the proxy
      - WEBUI_SESSION_COOKIE_SECURE=true
      - WEBUI_SESSION_COOKIE_SAMESITE=lax
      # Explicitly enable WebSocket support
      - ENABLE_WEBSOCKET_SUPPORT=true
      # Socket.IO ping interval and timeout (milliseconds).
      # Ping every 20s; the client must respond within 30s.
      # This keeps the WebSocket alive through NAT devices with
      # short idle timers.
      - WEBSOCKET_PING_INTERVAL=20000
      - WEBSOCKET_PING_TIMEOUT=30000
    volumes:
      - /opt/dockerfiles/localai/data:/app/backend/data
    depends_on:
      - open-terminal

  open-terminal:
    # Use 'slim' (200 MB) instead of 'latest' (2 GB) unless you specifically
    # need Node.js, ffmpeg, or data science libraries available to the AI agent.
    image: ghcr.io/open-webui/open-terminal:${OPEN_TERMINAL_VARIANT}
    container_name: open-terminal
    restart: unless-stopped
    networks:
      - localai.lan
    # No 'ports:' section - intentionally not exposed to the host.
    # Open WebUI backend proxies to it via the Docker network.
    environment:
      - OPEN_TERMINAL_API_KEY=${OPEN_TERMINAL_API_KEY}
    volumes:
      # Persistent home directory for the terminal user.
      # Files the AI creates here survive container restarts.
      - /opt/dockerfiles/localai/open-terminal-data:/home/user
      # Only add this if you specifically want the AI to read host files
      #- /host/path/to/AI/data:/data:ro  # :ro = read-only

networks:
  localai.lan:
    external: true
    name: localai.lan
# ---
The accompanying ‘.env‘ file which should be created in the same location contains the following:
# ---
# Persistent login:
WEBUI_SECRET_KEY=eePAAjEgEnZdgAQcVKb/DA993rwU+xbBb1scG0Zz1sQ=
# Connecting Open Terminal to Open WebUI:
OPEN_TERMINAL_API_KEY=qIShpFT2IUZaglqLTX5UCw6oQSyuCuKgpgF/xViqUWA=
OPEN_TERMINAL_VARIANT=latest
# ---
Don’t forget to create and use the variable “WEBUI_SECRET_KEY”!
Without a persistent “WEBUI_SECRET_KEY”, you’ll be logged out every time the container is recreated.
Why does the “127.0.0.1” address not work from inside a container?
When Open WebUI runs in a Docker container on a bridge network (which is what any custom docker network uses), the address “127.0.0.1” inside that container refers to the container’s own loopback interface… not that of the host!
Setting OLLAMA_BASE_URL to “http://127.0.0.1:11434” would have the container talking to itself on a port where nothing is listening. You would get an immediate connection refused. The “extra_hosts” entry in the Compose file: “host.docker.internal:host-gateway” is a specific syntax meant to instruct Docker to inject a hosts-file entry into the container that maps the name “host.docker.internal” to the host’s IP address on the Docker bridge (typically something like 172.18.0.1, but you never need to hard-code that). This is Docker’s own supported mechanism for containers to reach host-resident services.
Even with “host.docker.internal” resolving correctly, there is still a bind problem. If Ollama’s OLLAMA_HOST is set to “127.0.0.1:11434”, the kernel will only accept connections arriving on the loopback interface. Traffic coming in from the Docker bridge (e.g., 172.18.0.x) arrives on a different interface and gets refused at the TCP socket level, not by a firewall rule.
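Once the stack is running (next section), you can verify this plumbing from inside the container; a working setup returns the same JSON model list that ‘curl http://127.0.0.1:11434/api/tags’ gives you on the host:
# docker exec open-webui curl -s http://host.docker.internal:11434/api/tags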
Note that I am already including the Docker configuration for Open Terminal, so that you have everything in one place from the start. I will explain about Open Terminal further down in another section of the article.
Start and configure Open WebUI
We perform the initial configuration of the Open WebUI container while it is not yet exposed to the LAN. Just to be safe, since the very first user account that gets created automatically receives full admin rights. In the next step we will configure a reverse proxy to expose Open WebUI to the network.
# cd /usr/local/docker-localai
# docker compose up -d
Docker downloads (pulls) the image and then the container will start. Watch the logs during first start (DB migrations run on first boot):
# docker logs -f open-webui
Look for a line like “INFO Application startup complete” which indicates that the server is ready. Let’s log in!
On the host, navigate to http://127.0.0.1:3456
The admin account
Register your first account. This user automatically becomes the admin account. Naturally you need to define a strong password… Open WebUI’s admin user is able to control access to the AI models, user creation, and system-level settings.
- Go to ‘Settings > Connections‘ and confirm that the Ollama URL is shown as connected (a green indicator).
- Go to ‘Settings > Models‘. Your previously pulled models (e.g., “mistral:latest“) should appear.
- Start a new chat, select “mistral“, and away you go!
Internet access
To give your local AI model internet access via Open WebUI, you need to enable the built-in Web Search feature in the Admin Panel.
Recent AI models are highly capable of “tool use,” and this setup allows the model to search the web, read the top results, and summarize them for you.
- Enable Web Search in Admin Settings
- Open the Open WebUI interface https://ai.darkstar.lan/ in your browser.
- Click your Profile Icon (bottom-left) and select ‘Admin Panel‘.
- Navigate to the ‘Settings‘ tab and click on ‘Web Search‘.
- Toggle ‘Enable Web Search‘ to “ON”.
- Choose and Configure a Search Engine
You must select a provider to fetch the actual search results. Here are the most common options:
- DuckDuckGo (Easiest): Works out of the box without an API key. Select “DDGS” as the search engine and “DuckDuckGo” as its backend from the dropdowns.
- Tavily (Recommended for AI): Specifically built for LLMs to get clean, searchable data. You will need a free Tavily API Key. Paste it into the Tavily API Key field in ‘Settings‘.
- Google PSE: Best for comprehensive results but requires creating a Google Programmable Search Engine to get a Search Engine ID and API Key.
- SearXNG (Private/Local): If you want to stay 100% local, you can run a SearXNG instance in a separate Docker container and point Open WebUI to its local URL (e.g., http://localhost:8080).
- Using Search in Chat
Once configured, you can use the web search in two ways:
- Manual Toggle: In a new chat, look for the “+” icon or the Web Search toggle (globe icon) near the message box to activate it for that session.
- Keyword Trigger: You can often trigger a search by typing # or using a specific prefix if you have set up a “Search” tool/action in the Workspace settings.
- To make sure the AI actually uses the retrieved data effectively:
- Go to ‘Workspace > Models‘.
- Click the ‘Edit‘ (pencil) icon for your AI model.
- In the ‘Tools or Capabilities‘ section, ensure that “Web Search” is checked so the model knows it is allowed to use this external tool.
Apache reverse proxy configuration
Ensure that the following modules are loaded in httpd.conf or in a separate included configuration file below /etc/httpd/:
# ---
LoadModule proxy_module lib64/httpd/modules/mod_proxy.so
LoadModule proxy_http_module lib64/httpd/modules/mod_proxy_http.so
LoadModule proxy_wstunnel_module lib64/httpd/modules/mod_proxy_wstunnel.so
LoadModule ssl_module lib64/httpd/modules/mod_ssl.so
LoadModule rewrite_module lib64/httpd/modules/mod_rewrite.so
LoadModule headers_module lib64/httpd/modules/mod_headers.so
# ---
A note about ‘mod_proxy_wstunnel’: people often forget to account for WebSockets. Open WebUI streams LLM responses over a WebSocket connection. Without this module, you get a UI that connects, shows the model list, but then silently fails to stream any generated text.
Therefore, these are the essential bits you need to add to your Apache HTTPD server configuration:
# --- Proxy core ---
ProxyPreserveHost On
ProxyRequests Off

# Tell Open WebUI the real client IP (used in logs and rate limiting)
RequestHeader set X-Forwarded-Proto "https"
RequestHeader set X-Forwarded-Host "ai.darkstar.lan"
Header always set Strict-Transport-Security "max-age=63072000; includeSubDomains"

# Increase timeouts for long LLM inference (large models can think slowly)
ProxyTimeout 300
Timeout 300

# Disable response buffering - critical for streaming LLM output.
# Without this, Apache may buffer the entire response before forwarding.
SetEnv proxy-sendchunked 1
SetEnv proxy-initial-not-buffered 1

# WebSocket upgrade support (essential for LLM streaming)
RewriteEngine On
RewriteCond %{HTTP:Upgrade} websocket [NC]
RewriteCond %{HTTP:Connection} upgrade [NC]
RewriteRule ^/?(.*) "ws://127.0.0.1:3456/$1" [P,L]

# Open WebUI reverse proxy, connects to an Ollama backend:
ProxyPass / http://127.0.0.1:3456/ keepalive=On
ProxyPassReverse / http://127.0.0.1:3456/

# Optionally (you can remove this if you don't care)
# Ensure that only the people you know can access the Web Interface:
<Location />
  Require all granted
  <RequireAny>
    Require host yourowndomain.com
    Require ip 192.168
    Require ip 10.10
  </RequireAny>
</Location>
After adding this configuration block to the “VirtualHost” definition of ai.darkstar.lan, run a configuration check and then restart the Apache webserver:
# apachectl configtest
# apachectl -k graceful
Check the result
Your Open WebUI page should now be accessible at https://ai.darkstar.lan/
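A quick non-interactive check from another machine on your LAN (substitute your own hostname); an HTTP status of 200 means the reverse proxy and the container are talking to each other:
$ curl -s -o /dev/null -w '%{http_code}\n' https://ai.darkstar.lan/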
Add Open Terminal
Open Terminal is a capability we can add to the Docker stack that gives the AI model a real computer to work on.
It connects a containerized computing environment to Open WebUI. The AI model you are using can use that sandboxed shell environment to write code, execute it, read the output, fix errors, and iterate, all without leaving the chat. It handles files, installs packages, runs servers, and returns results directly to you. Because we will run it in a Docker container it offers complete isolation from the host processes. We will give it persistent storage so that you can grab the resulting artifacts straight from a local directory.
This setup mirrors a capability that the Big Tech companies also provide with their commercial LLM’s: formulate an idea and let your AI generate working software. Ask it a question and get a functional script. Describe a website and watch it being rendered live.
When you upload a spreadsheet, CSV file or a database, you can instruct your AI to read the data, run analysis scripts and generate charts or reports.
So much for the promotional text taken from the project’s web site. :-)
An important architectural consideration: Open WebUI proxies AI requests to Open Terminal. Open Terminal will never connect the other way round. This means that the Open Terminal container never needs to be reachable from the internet or even from the browser. It only needs to be reachable from within the Docker network. This keeps it nicely isolated.
To get Open Terminal up and running, nothing more is required. We already added all the code to the ‘docker-compose.yml’ and ‘.env’ files. When the stack is running you can validate that Open Terminal is ready by examining the logs:
# docker logs -f open-terminal
Verify that Open WebUI can talk to Open Terminal via a command you execute inside the Open WebUI container (the docker command uses the ‘open-webui’ service name as defined in the ‘docker-compose.yml’ file):
# docker exec open-webui curl -s \
-H "Authorization: Bearer $(grep OPEN_TERMINAL_API_KEY /usr/local/docker-localai/.env | cut -d= -f2)" \
http://open-terminal:8000/health
A healthy response looks like:
{"status": "ok"}
If that returns successfully, the two containers can see each other on the network and the API key is accepted.
Enable Open Terminal in Open WebUI
This needs to be done through the Open WebUI admin interface. It can not be done via configuration files.
- Navigate to https://ai.darkstar.lan/ and log in as an admin user.
- Go to ‘Admin Settings > Integrations > Open Terminal‘ and fill in the fields:
- URL: “http://open-terminal:8000”
- API Key: qIShpFT2IUZaglqLTX5UCw6oQSyuCuKgpgF/xViqUWA= (which is the value of OPEN_TERMINAL_API_KEY in your .env file)
- Click ‘Save‘, then toggle the connection ‘Enabled‘.
Open WebUI will immediately test the connection, and a green indicator confirms success.
The URL http://open-terminal:8000 works because Docker’s internal DNS resolves the service name ‘open-terminal’ to the container’s IP on localai.lan.
This is why the container needs no exposed port. It is only ever spoken to by Open WebUI’s backend, never by your browser directly.
You have a choice to make regarding the Docker Image variant of Open Terminal. Using the ‘slim‘ tag in the Compose file above would be a deliberate choice. I prefer ‘latest‘ instead. Here is what each variant gives an AI agent to work with:
- alpine:
~100 MB image. This gives you a basic shell, curl, jq, git. It’s minimal but functional.
- slim:
~200 MB image. Content is identical to the ‘alpine’ image but this one is Debian-based, which guarantees better package compatibility.
- latest:
~2 GB image. You will get a full Python environment, Node.js, the Docker CLI, ffmpeg and data science libraries.
For a personal server, ‘slim‘ may be the pragmatic choice. The AI can run shell commands, use git, curl APIs, and manage files, which covers the vast majority of useful agent tasks without pulling a 2 GB image. But I may also need the AI to run Python data processing tasks or Node.js scripts. Therefore I configured ‘latest‘ myself.
Ollama integration in Nextcloud
Official documentation for the integration of local AI into your Nextcloud server can be found here: https://docs.nextcloud.com/server/stable/admin_manual/ai/overview.html
In short, these are the steps you need to take to integrate your Ollama AI server into Nextcloud.
- Install the Nextcloud Assistant app using the administrator account of your Nextcloud instance
- Similarly, install the OpenAI integration app
- Click on the administrator avatar in the top right, and go to ‘Administration Settings > Administration > Artificial Intelligence‘
- In ‘OpenAI and LocalAI configuration‘, set ‘http://127.0.0.1:11434/v1‘ as the OpenAI-compatible ‘Service URL‘.
- Add one line to Nextcloud’s ‘config/config.php‘ file (manually; there is no GUI to do this):
'allow_local_remote_servers' => true,
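If you prefer not to edit config.php by hand, the same setting can also be applied with Nextcloud’s ‘occ’ tool; I assume here that ‘apache’ is the user your web server runs as, adjust to your setup:
# cd /var/www/htdocs/nextcloud
# sudo -u apache php occ config:system:set allow_local_remote_servers --value=true --type=boolean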
If you want your AI to feel responsive in Nextcloud, it is also imperative to run a number (minimum 4) of local ‘AI workers’ that pick up AI requests from the queue and process them immediately in the background. Otherwise request processing only happens every 5 minutes via Nextcloud’s internal cron. My advice is to run them inside screen (or tmux) with this command added to ‘/etc/rc.d/rc.local‘:
/usr/bin/screen -S NEXTCLOUD -t AI_1 \
  -Adm /usr/local/sbin/nextcloud_occ_backgroundworker.sh 1 && \
sleep 1 && \
/usr/bin/screen -S NEXTCLOUD -X screen -t AI_2 \
  -Adm /usr/local/sbin/nextcloud_occ_backgroundworker.sh 2 && \
/usr/bin/screen -S NEXTCLOUD -X screen -t AI_3 \
  -Adm /usr/local/sbin/nextcloud_occ_backgroundworker.sh 3 && \
/usr/bin/screen -S NEXTCLOUD -X screen -t AI_4 \
  -Adm /usr/local/sbin/nextcloud_occ_backgroundworker.sh 4
Where the executable shell script ‘/usr/local/sbin/nextcloud_occ_backgroundworker.sh‘ is something you need to create yourself with the following content:
#!/bin/bash
if [ -n "$1" ]; then
  echo "Starting Nextcloud AI Worker $1"
else
  echo "Starting Nextcloud AI Worker"
fi
cd /var/www/htdocs/nextcloud/
set -e
while true; do
  sudo -u apache php -d memory_limit=512M ./occ background-job:worker \
    -v -t 60 "OC\TaskProcessing\SynchronousBackgroundJob"
done
# ---
If you need to access these AI workers at any time, you can do so from root’s commandline via:
# screen -x NEXTCLOUD
… and cycle through the four worker screens using [Ctrl]-a-n
Tasks are run as part of the background job system in Nextcloud, which only runs jobs every 5 minutes by default. To pick up scheduled jobs faster you can set up background job workers inside the Nextcloud main server/container that process (AI and other) tasks as soon as they are scheduled.
If the PHP code or the Nextcloud settings values are changed while a worker is running, those changes won’t be effective inside the runner. For that reason, the worker needs to be restarted regularly. It is done with a timeout of N seconds, which means any changes to the settings or the code will be picked up after N seconds (worst case scenario). This timeout does not, in any way, affect the processing or the timeout of AI tasks.
The result of this configuration is the appearance of a new “AI” button in the Nextcloud task bar which you can click to access the Assistant, giving you access to chat, translation, image and audio analysis, and more:
Single Sign-On (SSO)
Open WebUI supports OpenID Connect (OIDC) out of the box. See https://docs.openwebui.com/reference/env-configuration/#openid-oidc for the variables that enable Single Sign-On and https://docs.openwebui.com/troubleshooting/sso/ for additional troubleshooting information. You should be able to connect Open WebUI to your Cloud Server’s Keycloak Identity Provider without issues.
Unfortunately the local server that I equipped with an NVIDIA GPU is not running Keycloak or any other OIDC provider, so I could not validate this SSO capability myself.
Let me know if you were able to add SSO to your own setup!
Attribution
Many thanks to INPITO (the Indiana Non-Profit Information Technology Organization) who use Slackware as their OS and wrote the article that formed the inspiration for my own journey into local AI: https://www.inpito.org/ollama.php. I copied their Slackware boot script for Ollama.
Final thoughts
I hope this article will remove some of the resistance that many people still show towards the use of AI chatbots. The fact that you can run a Large Language Model on your gaming rig, use it to experiment with new technologies and be certain that none of that data will ever be shared externally, is great!
Leave your comments, suggestions and opinions below.
Thanks for reading. Eric