Docker Installation
Setting up SurfSense using Docker
This guide explains how to run SurfSense with Docker Compose, the recommended deployment method.
Prerequisites
Before you begin, ensure you have:
- Docker and Docker Compose installed on your machine
- Git (to clone the repository)
- Completed all the prerequisite setup steps, including:
  - Auth setup
  - File Processing ETL Service (choose one):
    - Unstructured.io API key (supports 34+ formats)
    - LlamaIndex API key (enhanced parsing, supports 50+ formats)
    - Docling (local processing, no API key required; supports PDF, Office docs, images, HTML, CSV)
  - Other required API keys
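The tool prerequisites can be checked from a terminal before continuing. The following is a minimal sketch, assuming a POSIX shell; the function name check_prerequisites is illustrative and not part of SurfSense. It only verifies that docker, git, and the Compose subcommand are callable, not that any API keys are configured.

```shell
# Hypothetical helper: report which required tools are missing.
check_prerequisites() {
  missing=0
  for tool in docker git; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  # Compose v2 is invoked as a Docker CLI subcommand: "docker compose".
  if ! docker compose version >/dev/null 2>&1; then
    echo "missing: docker compose" >&2
    missing=1
  fi
  return "$missing"
}

if check_prerequisites; then
  echo "All required tools are available."
else
  echo "Install the missing tools before continuing."
fi
```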
Installation Steps
- Configure Environment Variables
  Set up the necessary environment variables:
  Linux/macOS:
      # Copy example environment files
      cp surfsense_backend/.env.example surfsense_backend/.env
      cp surfsense_web/.env.example surfsense_web/.env
      cp .env.example .env  # For Docker-specific settings
  Windows (Command Prompt):
      copy surfsense_backend\.env.example surfsense_backend\.env
      copy surfsense_web\.env.example surfsense_web\.env
      copy .env.example .env
  Windows (PowerShell):
      Copy-Item -Path surfsense_backend\.env.example -Destination surfsense_backend\.env
      Copy-Item -Path surfsense_web\.env.example -Destination surfsense_web\.env
      Copy-Item -Path .env.example -Destination .env
  Edit all .env files and fill in the required values:
Docker-Specific Environment Variables (Optional)
| ENV VARIABLE | DESCRIPTION | DEFAULT VALUE |
|---|---|---|
| FRONTEND_PORT | Port for the frontend service | 3000 |
| BACKEND_PORT | Port for the backend API service | 8000 |
| POSTGRES_PORT | Port for the PostgreSQL database | 5432 |
| PGADMIN_PORT | Port for pgAdmin web interface | 5050 |
| REDIS_PORT | Port for Redis (used by Celery) | 6379 |
| FLOWER_PORT | Port for Flower (Celery monitoring tool) | 5555 |
| POSTGRES_USER | PostgreSQL username | postgres |
| POSTGRES_PASSWORD | PostgreSQL password | postgres |
| POSTGRES_DB | PostgreSQL database name | surfsense |
| PGADMIN_DEFAULT_EMAIL | Email for pgAdmin login | admin@surfsense.com |
| PGADMIN_DEFAULT_PASSWORD | Password for pgAdmin login | surfsense |
| NEXT_PUBLIC_FASTAPI_BACKEND_URL | URL of the backend API (used by frontend during build and runtime) | http://localhost:8000 |
| NEXT_PUBLIC_FASTAPI_BACKEND_AUTH_TYPE | Authentication method for frontend: LOCAL or GOOGLE | LOCAL |
| NEXT_PUBLIC_ETL_SERVICE | Document parsing service for frontend UI: UNSTRUCTURED, LLAMACLOUD, or DOCLING | DOCLING |
Note: Frontend environment variables with the NEXT_PUBLIC_ prefix are embedded into the Next.js production build at build time. Since the frontend now runs as a production build in Docker, these variables must be set in the root .env file (Docker-specific configuration) and will be passed as build arguments during the Docker build process.
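As an illustration of how these settings fit together, the fragment below sketches a root .env that moves the frontend and backend to different host ports. The port numbers are arbitrary examples. It assumes that the backend URL seen by the browser has to track BACKEND_PORT, which follows from the descriptions above but should be confirmed against your docker-compose.yml.

```shell
# Root .env (Docker-specific settings); values are illustrative only
FRONTEND_PORT=3001
BACKEND_PORT=8001

# Embedded into the Next.js build, so keep it consistent with BACKEND_PORT
NEXT_PUBLIC_FASTAPI_BACKEND_URL=http://localhost:8001
NEXT_PUBLIC_FASTAPI_BACKEND_AUTH_TYPE=LOCAL
NEXT_PUBLIC_ETL_SERVICE=DOCLING
```

Because NEXT_PUBLIC_ values are fixed at build time, the frontend image needs to be rebuilt after changing them, for example with docker compose up --build.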
Backend Environment Variables:
| ENV VARIABLE | DESCRIPTION |
|---|---|
| DATABASE_URL | PostgreSQL connection string (e.g., postgresql+asyncpg://postgres:postgres@localhost:5432/surfsense) |
| SECRET_KEY | JWT Secret key for authentication (should be a secure random string) |
| NEXT_FRONTEND_URL | URL where your frontend application is hosted (e.g., http://localhost:3000) |
| AUTH_TYPE | Authentication method: GOOGLE for OAuth with Google, LOCAL for email/password authentication |
| GOOGLE_OAUTH_CLIENT_ID | (Optional) Client ID from Google Cloud Console (required if AUTH_TYPE=GOOGLE) |
| GOOGLE_OAUTH_CLIENT_SECRET | (Optional) Client secret from Google Cloud Console (required if AUTH_TYPE=GOOGLE) |
| EMBEDDING_MODEL | Name of the embedding model (e.g., sentence-transformers/all-MiniLM-L6-v2, openai://text-embedding-ada-002) |
| RERANKERS_ENABLED | (Optional) Enable or disable document reranking for improved search results (e.g., TRUE or FALSE, default: FALSE) |
| RERANKERS_MODEL_NAME | Name of the reranker model (e.g., ms-marco-MiniLM-L-12-v2) (required if RERANKERS_ENABLED=TRUE) |
| RERANKERS_MODEL_TYPE | Type of reranker model (e.g., flashrank) (required if RERANKERS_ENABLED=TRUE) |
| TTS_SERVICE | Text-to-Speech API provider for Podcasts (e.g., local/kokoro, openai/tts-1). See supported providers |
| TTS_SERVICE_API_KEY | (Optional if local) API key for the Text-to-Speech service |
| TTS_SERVICE_API_BASE | (Optional) Custom API base URL for the Text-to-Speech service |
| STT_SERVICE | Speech-to-Text API provider for Audio Files (e.g., local/base, openai/whisper-1). See supported providers |
| STT_SERVICE_API_KEY | (Optional if local) API key for the Speech-to-Text service |
| STT_SERVICE_API_BASE | (Optional) Custom API base URL for the Speech-to-Text service |
| FIRECRAWL_API_KEY | API key for Firecrawl service for web crawling |
| ETL_SERVICE | Document parsing service: UNSTRUCTURED (supports 34+ formats), LLAMACLOUD (supports 50+ formats including legacy document types), or DOCLING (local processing, supports PDF, Office docs, images, HTML, CSV) |
| UNSTRUCTURED_API_KEY | API key for Unstructured.io service for document parsing (required if ETL_SERVICE=UNSTRUCTURED) |
| LLAMA_CLOUD_API_KEY | API key for LlamaCloud service for document parsing (required if ETL_SERVICE=LLAMACLOUD) |
| CELERY_BROKER_URL | Redis connection URL for Celery broker (e.g., redis://localhost:6379/0) |
| CELERY_RESULT_BACKEND | Redis connection URL for Celery result backend (e.g., redis://localhost:6379/0) |
| SCHEDULE_CHECKER_INTERVAL | (Optional) How often to check for scheduled connector tasks. Format: <number><unit> where unit is m (minutes) or h (hours). Examples: 1m, 5m, 1h, 2h (default: 1m) |
| REGISTRATION_ENABLED | (Optional) Enable or disable new user registration (e.g., TRUE or FALSE, default: TRUE) |
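SECRET_KEY should be a secure random string. One way to produce such a value, sketched here under the assumption of a Unix-like shell with /dev/urandom, is to hex-encode 32 random bytes. The length is an arbitrary choice, not a SurfSense requirement; openssl rand -hex 32 is a common alternative.

```shell
# Read 32 random bytes and hex-encode them (64 hexadecimal characters).
SECRET_KEY="$(head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n')"

# Print in .env syntax; paste the line into surfsense_backend/.env
echo "SECRET_KEY=${SECRET_KEY}"
```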
Optional Backend LangSmith Observability:
| ENV VARIABLE | DESCRIPTION |
|---|---|
| LANGSMITH_TRACING | Enable LangSmith tracing (e.g., true) |
| LANGSMITH_ENDPOINT | LangSmith API endpoint (e.g., https://api.smith.langchain.com) |
| LANGSMITH_API_KEY | Your LangSmith API key |
| LANGSMITH_PROJECT | LangSmith project name (e.g., surfsense) |
Backend Uvicorn Server Configuration:
| ENV VARIABLE | DESCRIPTION | DEFAULT VALUE |
|---|---|---|
| UVICORN_HOST | Host address to bind the server | 0.0.0.0 |
| UVICORN_PORT | Port to run the backend API | 8000 |
| UVICORN_LOG_LEVEL | Logging level (e.g., info, debug, warning) | info |
| UVICORN_PROXY_HEADERS | Enable/disable proxy headers | false |
| UVICORN_FORWARDED_ALLOW_IPS | Comma-separated list of allowed IPs | 127.0.0.1 |
| UVICORN_WORKERS | Number of worker processes | 1 |
| UVICORN_ACCESS_LOG | Enable/disable access log (true/false) | true |
| UVICORN_LOOP | Event loop implementation | auto |
| UVICORN_HTTP | HTTP protocol implementation | auto |
| UVICORN_WS | WebSocket protocol implementation | auto |
| UVICORN_LIFESPAN | Lifespan implementation | auto |
| UVICORN_LOG_CONFIG | Path to logging config file or empty string | |
| UVICORN_SERVER_HEADER | Enable/disable Server header | true |
| UVICORN_DATE_HEADER | Enable/disable Date header | true |
| UVICORN_LIMIT_CONCURRENCY | Max concurrent connections | |
| UVICORN_LIMIT_MAX_REQUESTS | Max requests before worker restart | |
| UVICORN_TIMEOUT_KEEP_ALIVE | Keep-alive timeout (seconds) | 5 |
| UVICORN_TIMEOUT_NOTIFY | Worker shutdown notification timeout (sec) | 30 |
| UVICORN_SSL_KEYFILE | Path to SSL key file | |
| UVICORN_SSL_CERTFILE | Path to SSL certificate file | |
| UVICORN_SSL_KEYFILE_PASSWORD | Password for SSL key file | |
| UVICORN_SSL_VERSION | SSL version | |
| UVICORN_SSL_CERT_REQS | SSL certificate requirements | |
| UVICORN_SSL_CA_CERTS | Path to CA certificates file | |
| UVICORN_SSL_CIPHERS | SSL ciphers | |
| UVICORN_HEADERS | Comma-separated list of headers | |
| UVICORN_USE_COLORS | Enable/disable colored logs | true |
| UVICORN_UDS | Unix domain socket path | |
| UVICORN_FD | File descriptor to bind to | |
| UVICORN_ROOT_PATH | Root path for the application | |
For more details, see the Uvicorn documentation.
Frontend Environment Variables
Important: Frontend environment variables are now configured in the Docker-Specific Environment Variables section above since the Next.js application runs as a production build in Docker. The following NEXT_PUBLIC_* variables should be set in your root .env file:
- NEXT_PUBLIC_FASTAPI_BACKEND_URL: URL of the backend service
- NEXT_PUBLIC_FASTAPI_BACKEND_AUTH_TYPE: Authentication method (LOCAL or GOOGLE)
- NEXT_PUBLIC_ETL_SERVICE: Document parsing service (should match backend ETL_SERVICE)
These variables are embedded into the application during the Docker build process and affect the frontend's behavior and available features.
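Since NEXT_PUBLIC_ETL_SERVICE should match the backend ETL_SERVICE, a quick consistency check can catch a mismatch before building. This is a rough sketch, not a SurfSense tool: the function names are hypothetical, and it assumes simple KEY=VALUE lines without export prefixes or inline comments.

```shell
# Hypothetical helper: print the value of KEY ($1) from env file ($2).
# The last assignment wins; quotes and carriage returns are stripped.
get_env_value() {
  grep -E "^$1=" "$2" | tail -n 1 | cut -d= -f2- | tr -d "\"'\r"
}

# Hypothetical helper: compare the backend and root ETL settings.
# Usage: check_etl_match surfsense_backend/.env .env
check_etl_match() {
  backend_etl="$(get_env_value ETL_SERVICE "$1")"
  frontend_etl="$(get_env_value NEXT_PUBLIC_ETL_SERVICE "$2")"
  if [ -n "$backend_etl" ] && [ "$backend_etl" = "$frontend_etl" ]; then
    echo "ETL services match: $backend_etl"
  else
    echo "ETL mismatch: backend='$backend_etl' frontend='$frontend_etl'" >&2
    return 1
  fi
}
```

Run from the repository root, check_etl_match surfsense_backend/.env .env exits with a non-zero status when the two values differ or the backend value is missing.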
- Build and Start Containers
  Start the Docker containers:
  Linux/macOS/Windows:
      docker compose up --build
  To run in detached mode (in the background):
  Linux/macOS/Windows:
      docker compose up -d
  Note for Windows users: If you're using an older Docker Desktop version, you might need to use docker-compose (with a hyphen) instead of docker compose (with a space).
- Access the Applications
Once the containers are running, you can access:
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
- pgAdmin: http://localhost:5050
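To confirm from a terminal that the services are reachable, each URL can be probed with curl. This is a minimal sketch, assuming curl is installed and the default ports are in use; check_url is an illustrative name. A failed probe shortly after startup may only mean the containers are still initializing.

```shell
# Hypothetical helper: succeed unless the request fails or returns HTTP 4xx/5xx.
check_url() {
  curl --fail --silent --output /dev/null --max-time 5 "$1"
}

for url in \
  http://localhost:3000 \
  http://localhost:8000/docs \
  http://localhost:5050
do
  if check_url "$url"; then
    echo "reachable:   $url"
  else
    echo "unreachable: $url"
  fi
done
```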
Docker Services Overview
The Docker setup includes several services that work together:
- Backend: FastAPI application server
- Frontend: Next.js web application
- PostgreSQL (db): Database with pgvector extension
- Redis: Message broker for Celery
- Celery Worker: Handles background tasks (document processing, indexing, etc.)
- Celery Beat: Scheduler for periodic tasks (enables scheduled connector indexing)
  - The schedule interval can be configured using the SCHEDULE_CHECKER_INTERVAL environment variable in your backend .env file
  - Default: checks every minute for connectors that need indexing
- pgAdmin: Database management interface
All services start automatically with docker compose up. The Celery Beat service ensures that periodic indexing functionality works out of the box.
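The SCHEDULE_CHECKER_INTERVAL format, a number followed by m (minutes) or h (hours), can be illustrated with a small shell function that converts a value to seconds. This is only a sketch of the documented format; it is not SurfSense's own parser, and interval_to_seconds is a hypothetical name. Rejecting zero and leading zeros is a choice made for this example.

```shell
# Hypothetical helper: convert "<number><unit>" (unit m or h) to seconds.
interval_to_seconds() {
  value="$1"
  number="${value%[mh]}"     # drop a trailing m or h, if present
  unit=${value#"$number"}    # whatever remains is the unit
  case "$number" in
    ''|0*|*[!0-9]*)
      echo "invalid interval: $value" >&2
      return 1
      ;;
  esac
  case "$unit" in
    m) echo $((number * 60)) ;;
    h) echo $((number * 3600)) ;;
    *)
      echo "invalid interval: $value" >&2
      return 1
      ;;
  esac
}

interval_to_seconds 5m   # prints 300
interval_to_seconds 2h   # prints 7200
```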
Using pgAdmin
pgAdmin is included in the Docker setup to help manage your PostgreSQL database. To connect:
- Open pgAdmin at http://localhost:5050
- Log in with the credentials from your .env file (default: admin@surfsense.com / surfsense)
- Right-click "Servers" > "Create" > "Server"
- In the "General" tab, name your connection (e.g., "SurfSense DB")
- In the "Connection" tab:
  - Host: db
  - Port: 5432
  - Maintenance database: surfsense
  - Username: postgres (or your custom POSTGRES_USER)
  - Password: postgres (or your custom POSTGRES_PASSWORD)
- Click "Save" to connect
Useful Docker Commands
Container Management
- Stop containers:
  Linux/macOS/Windows:
      docker compose down
- View logs:
  Linux/macOS/Windows:
      # All services
      docker compose logs -f
      # Specific service
      docker compose logs -f backend
      docker compose logs -f frontend
      docker compose logs -f db
- Restart a specific service:
  Linux/macOS/Windows:
      docker compose restart backend
- Execute commands in a running container:
  Linux/macOS/Windows:
      # Backend
      docker compose exec backend python -m pytest
      # Frontend
      docker compose exec frontend pnpm lint
Troubleshooting
- Linux/macOS: If you encounter permission errors, you may need to run the docker commands with sudo.
- Windows: If you see access denied errors, make sure you're running Command Prompt or PowerShell as Administrator.
- If ports are already in use, change the corresponding port variables (e.g., FRONTEND_PORT, BACKEND_PORT) in the root .env file, or modify the port mappings in the docker-compose.yml file.
- For backend dependency issues, check the Dockerfile in the backend directory.
- For frontend dependency issues, check the Dockerfile in the frontend directory.
- Windows-specific: If you encounter line ending issues (CRLF vs LF), configure Git with git config --global core.autocrlf input before cloning the repository, so that files are checked out with the LF endings that Linux containers expect.
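Before changing any port settings, it can help to see which of the default ports are already taken on the host. The sketch below assumes a netcat (nc) build that supports the -z scan flag, which is common but not universal; port_in_use is an illustrative name. The port list mirrors the defaults in the Docker-specific table.

```shell
# Hypothetical helper: succeed if something accepts TCP connections on local port $1.
port_in_use() {
  nc -z -w 2 127.0.0.1 "$1" >/dev/null 2>&1
}

# Default ports: frontend, backend, PostgreSQL, pgAdmin, Redis, Flower
for port in 3000 8000 5432 5050 6379 5555; do
  if port_in_use "$port"; then
    echo "port $port is in use"
  else
    echo "port $port looks free"
  fi
done
```

If nc is not installed, every port is reported as free, so treat the output as a hint rather than a guarantee.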
Next Steps
Once your installation is complete, you can start using SurfSense. Navigate to the frontend URL and log in with the authentication method you configured: email and password for LOCAL, or your Google account for GOOGLE.