SurfSense Documentation

Prerequisites

Required setup before installing SurfSense

Auth Setup

SurfSense supports both Google OAuth and local email/password authentication. Google OAuth is optional: if you prefer local authentication and do not need the Gmail or Google Calendar connectors, you can skip this section.

Note: If you want to use the Gmail or Google Calendar connectors in SurfSense, Google OAuth must be configured in your .env files.

To set up Google OAuth:

  1. Log in to your Google Developer Console
  2. Enable the required APIs:
    • People API (required for basic Google OAuth)
    • Gmail API (required if you want to use the Gmail connector)
    • Google Calendar API (required if you want to use the Google Calendar connector)
  3. Set up the OAuth consent screen
  4. Create an OAuth client ID and secret
  5. Review the finished configuration (screenshot: Google Developer Console Config)
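
Once the client ID and secret exist, they go into your .env files. The sketch below shows one way to check that both values are present before starting SurfSense. The variable names GOOGLE_OAUTH_CLIENT_ID and GOOGLE_OAUTH_CLIENT_SECRET and the helper functions are assumptions made for illustration; check the example .env file shipped with SurfSense for the actual names.

```python
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


# Hypothetical variable names, not confirmed by this page.
GOOGLE_OAUTH_KEYS = ("GOOGLE_OAUTH_CLIENT_ID", "GOOGLE_OAUTH_CLIENT_SECRET")


def missing_keys(env: dict, required=GOOGLE_OAUTH_KEYS) -> list:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]


# Placeholder values only; the secret is deliberately left empty.
sample = """
# Google OAuth
GOOGLE_OAUTH_CLIENT_ID=example-id.apps.googleusercontent.com
GOOGLE_OAUTH_CLIENT_SECRET=
"""

print(missing_keys(parse_env(sample)))
```

An empty list means both values are set. With the sample above, the secret is reported as missing.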

File Uploads

SurfSense supports three ETL (Extract, Transform, Load) services for converting files to LLM-friendly formats:

Option 1: Unstructured

Files are converted using Unstructured.

  1. Get an Unstructured.io API key from the Unstructured Platform
  2. Once registered, you should be able to generate API keys from the dashboard

Option 2: LlamaIndex (LlamaCloud)

Files are converted using LlamaIndex (LlamaCloud), which supports 50+ file formats.

  1. Sign up for a LlamaCloud account to access its parsing services
  2. Get a LlamaIndex API key from LlamaCloud
  3. LlamaCloud provides enhanced parsing capabilities for complex documents

Option 3: Docling

Files are processed locally using Docling, IBM's open-source document parsing library.

  1. No API key required - all processing happens locally
  2. Privacy-focused - documents never leave your system
  3. Supported formats: PDF, Office documents (Word, Excel, PowerPoint), images (PNG, JPEG, TIFF, BMP, WebP), HTML, CSV, AsciiDoc
  4. Enhanced features: Advanced table detection, image extraction, and structured document parsing
  5. GPU acceleration support for faster processing (when available)

Note: You only need to set up one of these services.
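
Because only one service is needed, the choice determines which credential you have to obtain, if any. A minimal sketch of that decision follows. The service identifiers and the API key variable names are hypothetical, chosen for illustration; SurfSense's actual .env keys may differ.

```python
from typing import Optional

# Hypothetical mapping from ETL service to the API key variable it needs.
ETL_API_KEY_VARS = {
    "UNSTRUCTURED": "UNSTRUCTURED_API_KEY",
    "LLAMACLOUD": "LLAMA_CLOUD_API_KEY",
    "DOCLING": None,  # processed locally, so no API key is involved
}


def required_api_key(service: str) -> Optional[str]:
    """Return the API key variable a service needs, or None if it needs none."""
    name = service.strip().upper()
    if name not in ETL_API_KEY_VARS:
        options = ", ".join(sorted(ETL_API_KEY_VARS))
        raise ValueError(f"unknown ETL service {service!r}; expected one of: {options}")
    return ETL_API_KEY_VARS[name]


for choice in ("Unstructured", "LlamaCloud", "Docling"):
    key_var = required_api_key(choice)
    print(choice, "->", key_var or "no API key required")
```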


LLM Observability (Optional)

LLM observability is not required for SurfSense to work, but monitoring LLM interactions makes unexpected agent behavior much easier to diagnose.

  1. Get a LangSmith API key from smith.langchain.com
  2. Use LangSmith to observe the SurfSense Researcher Agent
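
LangSmith tracing is normally switched on through environment variables. The sketch below builds the usual set (LANGSMITH_TRACING, LANGSMITH_API_KEY, LANGSMITH_PROJECT) and renders them as .env lines. Whether SurfSense reads exactly these variables is an assumption, and the project name "surfsense" and both helper functions are placeholders introduced here.

```python
def langsmith_env(api_key: str, project: str = "surfsense") -> dict:
    """Return environment variables that enable LangSmith tracing."""
    if not api_key:
        raise ValueError("a LangSmith API key is required")
    return {
        "LANGSMITH_TRACING": "true",
        "LANGSMITH_API_KEY": api_key,
        "LANGSMITH_PROJECT": project,  # placeholder project name
    }


def to_env_lines(env: dict) -> str:
    """Render a mapping as KEY=VALUE lines suitable for a .env file."""
    return "\n".join(f"{key}={value}" for key, value in env.items())


# Replace the placeholder with the key generated at smith.langchain.com.
print(to_env_lines(langsmith_env("your-langsmith-api-key")))
```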

Crawler

SurfSense has two options for saving webpages:

  • SurfSense Extension (recommended): better overall experience, and can save private webpages
  • Crawler: for saving public webpages

Note: SurfSense currently uses Firecrawl.py for web crawling. If you plan to use the crawler, you will need to create a Firecrawl account and get an API key.


Next Steps

Once you have all prerequisites in place, proceed to the installation guide to set up SurfSense.