Sources - ChatRos Documentation

The Sources section is where you manage all the training data for your AI Agent. Upload files, add text content, create Q&A pairs, and crawl websites to train your agent with relevant knowledge.

Training Overview

View a summary of your training data, including:

Total Sources - Count of all training sources across all types
Training Data - Total storage used (displayed in KB/MB)
Last Trained - When your agent was last trained
Breakdown by Type - See how many sources you have for Files, Text Content, Q&A, and Websites

The system automatically tracks the training state of each source and updates it in real time as processing completes.

Files

Upload documents to train your AI Agent on existing content like manuals, FAQs, product information, and business documents.

Supported File Types

PDF (.pdf) - PDF documents
Text (.txt) - Plain text files
Word (.doc, .docx) - Microsoft Word documents
CSV (.csv) - Comma-separated values
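
As a rough illustration, the extension list above can be checked locally before uploading. This is a minimal Python sketch, not ChatRos code: the names is_supported and split_by_support are hypothetical, and the check looks only at the file extension, so it cannot tell whether a file will actually process successfully.

```python
from pathlib import Path

# Extensions taken from the Supported File Types list above.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".doc", ".docx", ".csv"}


def is_supported(filename: str) -> bool:
    """Return True if the file extension is one of the documented types."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def split_by_support(filenames: list[str]) -> tuple[list[str], list[str]]:
    """Separate filenames into (supported, unsupported) lists, keeping order."""
    supported = [name for name in filenames if is_supported(name)]
    unsupported = [name for name in filenames if not is_supported(name)]
    return supported, unsupported
```

Calling split_by_support on a folder listing before an upload gives a quick view of which files would need converting first.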

Uploading Files

Navigate to the Files tab in Training
Drag and drop files into the upload zone, or click to browse
Files are processed automatically with real-time status updates
Character count is tracked for each file

Processing States

Files go through several states during processing:

Extracting - Text is being extracted from the file
Ready - Content is ready for training
Training - Embeddings are being generated
Trained - Successfully indexed and ready to use
Failed - Processing error occurred

File Management

Preview - Click any file to view its extracted content
Delete - Remove files individually
Restore - Deleted files can be restored from trash
Search - Filter files by name
Pagination - View 50 files per page

Text Content

Add custom text snippets to train your agent with specific knowledge, FAQs, or formatted content.

Creating Text Content

Navigate to the Text Content tab
Click Add New to create a snippet
Enter a Title for easy identification
Use the rich text editor to format your content:
- Headings and subheadings
- Bold, italic, underline formatting
- Bulleted and numbered lists
- Links to external resources
- Code blocks and quotes

Text Content Features

Rich HTML Editor - Format content with full HTML support
Character Count - Tracks storage usage per snippet
Edit Anytime - Update existing content as needed
Search & Filter - Find snippets quickly
Soft Delete - Restore accidentally deleted content

Best Practices

Use descriptive titles for easy organization
Break long content into separate snippets
Format content clearly with headings and lists
Include relevant keywords for better AI matching

Q&A Pairs

Create custom question-answer pairs to ensure your AI Agent provides precise responses to specific queries.

Creating Q&A Pairs

Navigate to the Q&A tab
Click Add Q&A Pair
Enter the Question (supports HTML formatting)
Enter the Answer (supports HTML formatting)
Optionally assign a Category for organization

Q&A Features

Rich Formatting - Format both questions and answers with HTML
Categories - Organize Q&A pairs by topic
Bulk Import - Import multiple Q&A pairs from CSV
Edit & Delete - Modify existing pairs anytime
Character Tracking - Monitor storage usage

Creating Q&A from Conversations

You can create Q&A pairs directly from agent conversations:

View a conversation in the Activity section
Click Revise Answer on any AI response
Edit the response if needed
Check Save as Q&A Pair to add it to training data

This feature helps you continuously improve your agent based on real interactions.

CSV Import Format

When bulk importing Q&A pairs, use this CSV format:

question,answer,category
"What are your hours?","We're open Monday-Friday 9am-5pm","Business Hours"
"Do you ship internationally?","Yes, we ship worldwide","Shipping"

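If you generate the import file programmatically, a CSV library handles quoting for you. The sketch below uses Python's standard csv module to produce the layout shown above: an unquoted header row followed by fully quoted data rows. The function name build_qa_csv is hypothetical, and escaping embedded quotes by doubling them is standard CSV behavior that this example assumes the importer accepts.

```python
import csv
import io

# Column order taken from the header row in the format above.
FIELDNAMES = ["question", "answer", "category"]


def build_qa_csv(pairs: list[dict[str, str]]) -> str:
    """Serialize Q&A pairs to CSV text with every data field quoted."""
    buffer = io.StringIO()
    # The header row is written unquoted, matching the example above.
    buffer.write(",".join(FIELDNAMES) + "\n")
    writer = csv.DictWriter(
        buffer,
        fieldnames=FIELDNAMES,
        quoting=csv.QUOTE_ALL,  # quote every field so commas in answers are safe
        lineterminator="\n",
    )
    writer.writerows(pairs)
    return buffer.getvalue()


if __name__ == "__main__":
    print(build_qa_csv([
        {
            "question": "What are your hours?",
            "answer": "We're open Monday-Friday 9am-5pm",
            "category": "Business Hours",
        },
    ]))
```

Quoting every field keeps answers that contain commas or line breaks from being split into extra columns.
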
Website Crawler

Automatically extract content from websites to train your AI Agent on web-based documentation, blogs, or product pages.

Crawling Methods

Full Website Crawl

Navigate to the Website tab
Enter the homepage URL
Configure crawl settings:
- Max Depth - How many levels deep to crawl (default: 2)
- Include Paths - Only crawl URLs matching these patterns
- Exclude Paths - Skip URLs matching these patterns
Click Start Crawl

Individual URLs

Add specific pages without crawling an entire site:

Enter the URL of the page
Click Add Single URL
The page is fetched and processed immediately

Crawl Settings

Include Paths (optional)

Specify path patterns to crawl only specific sections
Example: /docs/, /blog/, /products/
Multiple patterns can be added

Exclude Paths (optional)

Skip unwanted sections like login pages or admin areas
Example: /login, /admin, /cart
Multiple patterns can be added
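
To make the interaction between the two settings concrete, here is a minimal Python sketch of one way such a filter could work. It rests on assumptions this page does not confirm: a pattern matches when the URL path starts with it, exclude patterns take priority over include patterns, and an empty include list allows every path. The function name should_crawl is hypothetical.

```python
from urllib.parse import urlparse


def should_crawl(url: str, include_paths: list[str], exclude_paths: list[str]) -> bool:
    """Decide whether a URL passes the include/exclude filters.

    Assumption: a pattern matches when the URL path starts with it.
    """
    path = urlparse(url).path or "/"
    # Assumption: exclusions win over inclusions.
    if any(path.startswith(pattern) for pattern in exclude_paths):
        return False
    # Assumption: with no include patterns, every remaining path is allowed.
    if include_paths:
        return any(path.startswith(pattern) for pattern in include_paths)
    return True
```

Under these assumptions, include patterns of /docs/ and /blog/ would skip a page at /pricing, while an exclude pattern of /login would skip the login page even with no include patterns set.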

Max Depth

Controls how many link levels to follow from the starting URL
Depth 1: Only the starting page
Depth 2: Starting page + direct links
Depth 3: Starting page + direct links + pages linked from those
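
The depth levels above can be sketched as a breadth-first traversal over a link graph. This Python example is illustrative only: it works on an in-memory dictionary of links rather than fetching pages, the function name pages_within_depth is hypothetical, and it counts the starting page as depth 1 to match the list above.

```python
from collections import deque


def pages_within_depth(links: dict[str, list[str]], start: str, max_depth: int) -> dict[str, int]:
    """Map each reachable page to the depth at which it is first seen.

    The starting page counts as depth 1, matching the list above.
    """
    depths = {start: 1}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if depths[page] >= max_depth:
            continue  # do not follow links out of pages at the depth limit
        for target in links.get(page, []):
            if target not in depths:  # each page is visited only once
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


if __name__ == "__main__":
    site = {
        "/": ["/docs", "/blog"],
        "/docs": ["/docs/setup"],
        "/blog": ["/"],
    }
    print(pages_within_depth(site, "/", 2))
```

With the sample site, a max depth of 2 covers the home page plus /docs and /blog, and raising it to 3 adds /docs/setup.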

Website Source Management

Hierarchical View

Website sources are displayed in a tree structure
Parent URLs show child pages discovered during crawling
Click a parent to expand and view all child pages

Actions

Recrawl - Refresh content from the original URL
Cancel - Stop an in-progress crawl
Delete - Remove individual URLs or entire website sources
Bulk Delete - Select multiple sources to delete at once

Crawl Progress

Real-time updates show:

Crawled - Pages successfully processed
Queued - Pages waiting to be crawled
Total - Total pages discovered

Progress is updated live via a WebSocket connection.

Link Limits

Your subscription plan includes a link limit. The system tracks:

Links used across all agents
Links available in your plan
Upgrade prompts when approaching limits

Training States

All training sources go through these states:

State	Description
Extracting	Processing file or fetching content
Discovering	Website crawler finding URLs
Ready	Content ready for training
Training	Generating embeddings for AI
Trained	Successfully indexed
Edited	Content modified, needs retraining
Failed	Processing error
Removed	Soft deleted (can be restored)
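
If you track source states in your own tooling, the table above maps naturally onto an enumeration. The following Python sketch is a hypothetical model, not a ChatRos API: the TrainingState class and the needs_training helper are illustrative names. Treating only Ready and Edited as waiting for a training run is an assumption based on the descriptions in the table, and it ignores deletions for simplicity.

```python
from enum import Enum


class TrainingState(Enum):
    """States listed in the Training States table."""
    EXTRACTING = "Extracting"
    DISCOVERING = "Discovering"
    READY = "Ready"
    TRAINING = "Training"
    TRAINED = "Trained"
    EDITED = "Edited"
    FAILED = "Failed"
    REMOVED = "Removed"


# Assumption: these are the states whose content is waiting for a training run.
PENDING_STATES = {TrainingState.READY, TrainingState.EDITED}


def needs_training(states: list[TrainingState]) -> bool:
    """Return True if any source is waiting for a training run."""
    return any(state in PENDING_STATES for state in states)
```

A check like needs_training can serve as a reminder that content has changed since the agent was last trained.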

Storage & Limits

Character Tracking - Each source displays character count
Total Storage - View total KB/MB used across all sources
Plan Limits - Different plans have different storage limits
Warnings - System alerts when approaching limits
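
For a rough local estimate of how much storage a piece of text might use, character data can be converted to a KB/MB figure. This Python sketch makes assumptions this page does not state: size is measured as UTF-8 bytes, and 1 KB equals 1024 bytes. The function names text_size and format_size are hypothetical, and the numbers may differ from what ChatRos reports.

```python
def text_size(text: str) -> int:
    """Return the size of the text in bytes (assumption: UTF-8 encoding)."""
    return len(text.encode("utf-8"))


def format_size(num_bytes: int) -> str:
    """Format a byte count as KB or MB (assumption: 1 KB = 1024 bytes)."""
    if num_bytes < 0:
        raise ValueError("num_bytes must be non-negative")
    kilobytes = num_bytes / 1024
    if kilobytes < 1024:
        return f"{kilobytes:.1f} KB"
    return f"{kilobytes / 1024:.1f} MB"
```

Summing text_size over your snippets before adding them gives an early indication of whether you are approaching a plan limit.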

Retraining Your Agent

After adding, editing, or deleting sources:

Click the Train Agent button
The system generates embeddings for all new/modified content
Real-time progress updates show training status
Training typically completes in 2-5 minutes

Your agent automatically uses all Trained sources when responding to queries. There’s no need to manually enable individual sources.

Keep the following in mind:

Ensure uploaded files contain selectable text (not scanned images)
Website crawling respects robots.txt and may be blocked by some sites
Large files or websites may take several minutes to process
Character limits vary by subscription plan
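
Because crawling respects robots.txt, you can check in advance whether a site's rules would block a given page. The Python sketch below uses the standard library's urllib.robotparser on robots.txt text that you supply. The function name allowed_by_robots is hypothetical, and the user agent string is a placeholder, since this page does not say which user agent the ChatRos crawler sends.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /admin/\n"
    # "ExampleBot" is a placeholder, not the real crawler's user agent.
    print(allowed_by_robots(rules, "ExampleBot", "https://example.com/docs/"))
    print(allowed_by_robots(rules, "ExampleBot", "https://example.com/admin/users"))
```

If a page you expect to see is missing after a crawl, checking the site's robots.txt this way is a reasonable first diagnostic step.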

What’s Next?

Test Agent - Test your agent with the new training data
Settings - Configure your agent’s behavior
Activity - View conversations and responses
Analytics - Monitor performance metrics
