The Sources section is where you manage all the training data for your AI Agent. Upload files, add text content, create Q&A pairs, and crawl websites to train your agent with relevant knowledge.

Training Overview

View your training data summary including:
  • Total Sources - Count of all training sources across all types
  • Training Data - Total storage used (displayed in KB/MB)
  • Last Trained - When your agent was last trained
  • Breakdown by Type - See how many sources you have for Files, Text Content, Q&A, and Websites
The system automatically tracks the training state of each source and updates the summary in real time as processing completes.

Files

Upload documents to train your AI Agent on existing content like manuals, FAQs, product information, and business documents.

Supported File Types

  • PDF (.pdf) - PDF documents
  • Text (.txt) - Plain text files
  • Word (.doc, .docx) - Microsoft Word documents
  • CSV (.csv) - Comma-separated values
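If you prepare uploads with a script, it can help to filter out unsupported files before uploading. The following is a minimal sketch that assumes the extension list above is exhaustive; the function name `is_supported` is hypothetical and not part of the product.

```python
from pathlib import Path

# Extensions taken from the supported file types listed above.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".doc", ".docx", ".csv"}

def is_supported(filename: str) -> bool:
    """Return True if the file extension is in the supported list (case-insensitive)."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS

# Hypothetical file names used only for illustration.
files = ["manual.pdf", "notes.TXT", "logo.png", "faq.docx"]
uploadable = [name for name in files if is_supported(name)]
print(uploadable)
```

A check like this only looks at the file name. It does not tell you whether a PDF contains selectable text.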

Uploading Files

  1. Navigate to the Files tab in Training
  2. Drag and drop files into the upload zone, or click to browse
  3. Files are processed automatically with real-time status updates
  4. Character count is tracked for each file

Processing States

Files go through several states during processing:
  • Extracting - Text is being extracted from the file
  • Ready - Content is ready for training
  • Training - Embeddings are being generated
  • Trained - Successfully indexed and ready to use
  • Failed - A processing error occurred

File Management

  • Preview - Click any file to view its extracted content
  • Delete - Remove files individually
  • Restore - Deleted files can be restored from trash
  • Search - Filter files by name
  • Pagination - View 50 files per page

Text Content

Add custom text snippets to train your agent with specific knowledge, FAQs, or formatted content.

Creating Text Content

  1. Navigate to the Text Content tab
  2. Click Add New to create a snippet
  3. Enter a Title for easy identification
  4. Use the rich text editor to format your content:
    • Headings and subheadings
    • Bold, italic, underline formatting
    • Bulleted and numbered lists
    • Links to external resources
    • Code blocks and quotes

Text Content Features

  • Rich HTML Editor - Format content with full HTML support
  • Character Count - Tracks storage usage per snippet
  • Edit Anytime - Update existing content as needed
  • Search & Filter - Find snippets quickly
  • Soft Delete - Restore accidentally deleted content

Best Practices

  • Use descriptive titles for easy organization
  • Break long content into separate snippets
  • Format content clearly with headings and lists
  • Include relevant keywords for better AI matching

Q&A Pairs

Create custom question-answer pairs to ensure your AI Agent provides precise responses to specific queries.

Creating Q&A Pairs

  1. Navigate to the Q&A tab
  2. Click Add Q&A Pair
  3. Enter the Question (supports HTML formatting)
  4. Enter the Answer (supports HTML formatting)
  5. Optionally assign a Category for organization

Q&A Features

  • Rich Formatting - Format both questions and answers with HTML
  • Categories - Organize Q&A pairs by topic
  • Bulk Import - Import multiple Q&A pairs from CSV
  • Edit & Delete - Modify existing pairs anytime
  • Character Tracking - Monitor storage usage

Creating Q&A from Conversations

You can create Q&A pairs directly from agent conversations:
  1. View a conversation in the Activity section
  2. Click Revise Answer on any AI response
  3. Edit the response if needed
  4. Check Save as Q&A Pair to add it to training data
This feature helps you continuously improve your agent based on real interactions.

CSV Import Format

When bulk importing Q&A pairs, use this CSV format:
question,answer,category
"What are your hours?","We're open Monday-Friday 9am-5pm","Business Hours"
"Do you ship internationally?","Yes, we ship worldwide","Shipping"

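If your Q&A pairs live in a spreadsheet or database, a short script can produce a file in the format above. This sketch uses Python's standard `csv` module so that commas inside answers are quoted correctly; the helper name `build_qa_csv` is hypothetical, and the rows are the two examples shown above.

```python
import csv
import io

# (question, answer, category) tuples, taken from the example above.
pairs = [
    ("What are your hours?", "We're open Monday-Friday 9am-5pm", "Business Hours"),
    ("Do you ship internationally?", "Yes, we ship worldwide", "Shipping"),
]

def build_qa_csv(rows):
    """Serialize Q&A tuples to CSV text with the question,answer,category header."""
    buffer = io.StringIO()
    buffer.write("question,answer,category\n")
    # QUOTE_ALL wraps every field in double quotes, matching the format above.
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()

csv_text = build_qa_csv(pairs)
print(csv_text)

# Round-trip check: reading the text back yields the original fields.
parsed = list(csv.DictReader(io.StringIO(csv_text)))
```

Writing the result to disk with `open("qa.csv", "w", newline="", encoding="utf-8")` should give a file suitable for import, assuming the importer expects UTF-8.
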
Website Crawler

Automatically extract content from websites to train your AI Agent on web-based documentation, blogs, or product pages.

Crawling Methods

Full Website Crawl

  1. Navigate to the Website tab
  2. Enter the homepage URL
  3. Configure crawl settings:
    • Max Depth - How many levels deep to crawl (default: 2)
    • Include Paths - Only crawl URLs matching these patterns
    • Exclude Paths - Skip URLs matching these patterns
  4. Click Start Crawl

Individual URLs

Add specific pages without crawling an entire site:
  1. Enter the URL of the page
  2. Click Add Single URL
  3. The page is fetched and processed immediately

Crawl Settings

Include Paths (optional)
  • Specify path patterns to crawl only specific sections
  • Example: /docs/, /blog/, /products/
  • Multiple patterns can be added
Exclude Paths (optional)
  • Skip unwanted sections like login pages or admin areas
  • Example: /login, /admin, /cart
  • Multiple patterns can be added
Max Depth
  • Controls how many link levels to follow from the starting URL
  • Depth 1: Only the starting page
  • Depth 2: Starting page + direct links
  • Depth 3: Starting page + direct links + pages linked from those
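To see how these settings interact, here is a rough sketch of a depth-limited, breadth-first crawl plan over a hypothetical in-memory site. It is not the product's crawler. It assumes that a pattern matches when the URL path starts with it, and that the starting URL is always crawled regardless of the filters. `SITE`, `path_allowed`, and `plan_crawl` are illustrative names.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical site: each URL maps to the links found on that page.
SITE = {
    "https://example.com/": ["https://example.com/docs/", "https://example.com/login"],
    "https://example.com/docs/": ["https://example.com/docs/setup", "https://example.com/blog/"],
    "https://example.com/docs/setup": ["https://example.com/docs/setup/advanced"],
    "https://example.com/blog/": [],
    "https://example.com/login": [],
    "https://example.com/docs/setup/advanced": [],
}

def path_allowed(url, include_paths, exclude_paths):
    """Assumed rule: a pattern matches when the URL path starts with it."""
    path = urlparse(url).path
    if any(path.startswith(p) for p in exclude_paths):
        return False
    # An empty include list means no restriction.
    return not include_paths or any(path.startswith(p) for p in include_paths)

def plan_crawl(start_url, max_depth=2, include_paths=(), exclude_paths=()):
    """Breadth-first walk where depth 1 is the starting page only."""
    seen = {start_url}
    queue = deque([(start_url, 1)])
    pages = []
    while queue:
        url, depth = queue.popleft()
        pages.append(url)
        if depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        for link in SITE.get(url, []):
            if link not in seen and path_allowed(link, include_paths, exclude_paths):
                seen.add(link)
                queue.append((link, depth + 1))
    return pages

print(plan_crawl("https://example.com/", max_depth=2, exclude_paths=["/login"]))
print(plan_crawl("https://example.com/", max_depth=3, include_paths=["/docs/"]))
```

In this model, raising Max Depth by one adds one more layer of linked pages, which is why page counts can grow quickly on large sites.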

Website Source Management

Hierarchical View
  • Website sources are displayed in a tree structure
  • Parent URLs show child pages discovered during crawling
  • Click a parent to expand and view all child pages
Actions
  • Recrawl - Refresh content from the original URL
  • Cancel - Stop an in-progress crawl
  • Delete - Remove individual URLs or entire website sources
  • Bulk Delete - Select multiple sources to delete at once

Crawl Progress

Real-time updates show:
  • Crawled - Pages successfully processed
  • Queued - Pages waiting to be crawled
  • Total - Total pages discovered
Progress is updated live over a WebSocket connection.

Your subscription plan includes a link limit. The system tracks:
  • Links used across all agents
  • Links available in your plan
  • Upgrade prompts when approaching limits

Training States

All training sources go through these states:
| State | Description |
|---|---|
| Extracting | Processing file or fetching content |
| Discovering | Website crawler finding URLs |
| Ready | Content ready for training |
| Training | Generating embeddings for AI |
| Trained | Successfully indexed |
| Edited | Content modified, needs retraining |
| Failed | Processing error |
| Removed | Soft deleted (can be restored) |
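One way to reason about these states is to group them by what they mean for your next training run. The sketch below is an interpretation of the descriptions on this page, not the product's internal model. It treats Ready and Edited as waiting for training, and only Trained as used in answers. The source names are hypothetical.

```python
from enum import Enum

class SourceState(Enum):
    EXTRACTING = "Extracting"
    DISCOVERING = "Discovering"
    READY = "Ready"
    TRAINING = "Training"
    TRAINED = "Trained"
    EDITED = "Edited"
    FAILED = "Failed"
    REMOVED = "Removed"

# Inferred from the descriptions: new or modified content awaiting a training run.
NEEDS_TRAINING = {SourceState.READY, SourceState.EDITED}

def needs_training(state: SourceState) -> bool:
    return state in NEEDS_TRAINING

def used_by_agent(state: SourceState) -> bool:
    # Per this page, the agent answers from Trained sources.
    return state is SourceState.TRAINED

# Hypothetical sources and their current states.
sources = {
    "pricing.pdf": SourceState.TRAINED,
    "returns-policy": SourceState.EDITED,
    "https://example.com/docs/": SourceState.DISCOVERING,
    "old-faq.txt": SourceState.REMOVED,
    "new-faq.txt": SourceState.READY,
}

pending = [name for name, state in sources.items() if needs_training(state)]
active = [name for name, state in sources.items() if used_by_agent(state)]
print("Waiting for training:", pending)
print("Used in answers:", active)
```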

Storage & Limits

  • Character Tracking - Each source displays character count
  • Total Storage - View total KB/MB used across all sources
  • Plan Limits - Different plans have different storage limits
  • Warnings - System alerts when approaching limits
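To estimate how much storage a set of sources might use before uploading, you can approximate it from the text itself. This sketch assumes storage is measured in UTF-8 bytes and that 1 KB is 1024 bytes; neither is confirmed on this page, so treat the result as a rough estimate. The helper names and sample sources are hypothetical.

```python
def utf8_size(text: str) -> int:
    """Size of the text in bytes when encoded as UTF-8 (assumed storage encoding)."""
    return len(text.encode("utf-8"))

def format_size(num_bytes: int) -> str:
    """Render a byte count as KB or MB, using 1 KB = 1024 bytes (an assumption)."""
    kb = num_bytes / 1024
    if kb < 1024:
        return f"{kb:.1f} KB"
    return f"{kb / 1024:.1f} MB"

# Hypothetical sources: name -> extracted text.
sources = {
    "faq.txt": "What are your hours?" * 100,
    "manual.pdf": "x" * 3_000_000,
}

total_bytes = sum(utf8_size(text) for text in sources.values())
for name, text in sources.items():
    print(f"{name}: {len(text)} characters")
print("Total:", format_size(total_bytes))
```

Character count and byte size differ for non-ASCII text: `"héllo"` is 5 characters but 6 bytes in UTF-8.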

Retraining Your Agent

After adding, editing, or deleting sources:
  1. Click the Train Agent button
  2. The system generates embeddings for all new/modified content
  3. Real-time progress updates show training status
  4. Training typically completes in 2-5 minutes
Your agent automatically uses all Trained sources when responding to queries. There’s no need to manually enable individual sources.

Keep the following in mind:
  • Ensure uploaded files contain selectable text (not scanned images)
  • Website crawling respects robots.txt and may be blocked by some sites
  • Large files or websites may take several minutes to process
  • Character limits vary by subscription plan
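If a crawl returns fewer pages than expected, the site's robots.txt is worth checking. The sketch below uses Python's standard `urllib.robotparser` to test URLs against a hypothetical robots.txt. The rules, URLs, and the `example-crawler` user agent are illustrative, and the product's crawler may identify itself differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real check would fetch it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /cart
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def can_crawl(url: str, user_agent: str = "example-crawler") -> bool:
    """Return True if the parsed robots.txt rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)

for url in [
    "https://example.com/docs/intro",
    "https://example.com/admin/users",
    "https://example.com/cart/checkout",
]:
    print(url, "allowed" if can_crawl(url) else "blocked")
```

To check a live site instead, `RobotFileParser` also offers `set_url()` and `read()` to download the file before calling `can_fetch()`.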
