Website crawling is the easiest way to train your chatbot. BubblaV scans your pages, extracts content, and makes it searchable so your bot can answer questions accurately.

Adding Your Main Website

When you create a new website in your dashboard, you’ll be prompted to enter your main website URL. The crawling process starts automatically after you add the website. To add a main website later:
  1. Navigate to Knowledge: Go to Knowledge → Websites
  2. Enter Your URL: Type your website URL (e.g., https://example.com)
  3. Add Website: Click Add - crawling begins automatically in the background
  4. Monitor Progress: Check the Knowledge → Pages section to see crawling progress. No manual start is needed.
  5. Review Results: Once crawling completes, check the list of crawled pages and disable any you don't want.

How Crawling Works

When you add a website or sub-website, BubblaV automatically starts crawling:
  1. Visits your URL and extracts all text content
  2. Follows links to discover other pages on your domain
  3. Detects sitemaps at /sitemap.xml and crawls listed URLs
  4. Processes content into searchable chunks with embeddings
  5. Updates status for each page (Crawled, Pending, Failed)
Crawling respects your robots.txt file. Pages blocked there won’t be crawled.
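
For a sense of what this discovery process looks like, here is a minimal crawl-preview sketch in Python that walks same-domain links and honors robots.txt, roughly mirroring the steps above. It is an illustration only, not BubblaV's actual crawler; the start URL and page cap are placeholders.

```python
# Minimal same-domain crawl preview (illustration only, not BubblaV's crawler).
# Requires: pip install requests beautifulsoup4
from collections import deque
from urllib import robotparser
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

START_URL = "https://example.com"   # placeholder: your main website URL
MAX_PAGES = 50                      # keep the preview small

robots = robotparser.RobotFileParser()
robots.set_url(urljoin(START_URL, "/robots.txt"))
robots.read()

domain = urlparse(START_URL).netloc
queue, seen = deque([START_URL]), set()

while queue and len(seen) < MAX_PAGES:
    url = queue.popleft()
    if url in seen or not robots.can_fetch("*", url):
        continue  # skip already-visited or robots.txt-blocked pages
    seen.add(url)
    try:
        html = requests.get(url, timeout=10).text
    except requests.RequestException:
        continue  # a real crawler would mark this page as Failed
    soup = BeautifulSoup(html, "html.parser")
    text = soup.get_text(" ", strip=True)
    print(f"{url} -> {len(text)} characters of text")
    # Follow links, but stay on the same domain.
    for link in soup.find_all("a", href=True):
        target = urljoin(url, link["href"]).split("#")[0]
        if urlparse(target).netloc == domain:
            queue.append(target)
```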

Adding More Content Sources

Sub-websites

What is a sub-website? A sub-website is an additional website or domain that you want to include in your knowledge base alongside your main website. This allows you to train your chatbot on content from multiple related sites.
To add a sub-website:
  1. Go to Knowledge → Websites
  2. Click Add Website
  3. Enter the sub-website URL (e.g., https://blog.example.com)
  4. Click Add - crawling starts automatically
Add Sub-website
Use cases:
  • Blog on a subdomain (e.g., https://blog.example.com)
  • Help center on a different domain
  • Regional or language-specific sites
  • Multiple related websites you want to include in one knowledge base

Individual Pages

Add specific URLs that aren’t linked from your main site:
  1. Go to Knowledge → Pages
  2. Click Add Page
  3. Paste the full URL
  4. Click Add - the page will be crawled automatically
Add Page
Use cases:
  • Landing pages
  • PDF documents hosted online
  • Specific product pages

Sitemap Import

Import all URLs from your sitemap at once:
  1. Go to Knowledge → Sitemaps
  2. Click Add Sitemap
  3. Enter your sitemap URL (e.g., https://example.com/sitemap.xml)
  4. Click Import
Add Sitemap
All URLs in the sitemap will be automatically queued for crawling.
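
Before importing, you can quickly confirm which URLs your sitemap actually exposes. The sketch below fetches a sitemap and prints its <loc> entries; the sitemap URL is a placeholder, and this is only an illustration of the standard sitemap format, not how BubblaV parses sitemaps internally.

```python
# List the URLs declared in a sitemap (quick sanity check before importing).
import xml.etree.ElementTree as ET

import requests

SITEMAP_URL = "https://example.com/sitemap.xml"  # placeholder: your sitemap URL
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

root = ET.fromstring(requests.get(SITEMAP_URL, timeout=10).content)

if root.tag.endswith("sitemapindex"):
    # A sitemap index points at child sitemaps rather than pages.
    for loc in root.findall("sm:sitemap/sm:loc", NS):
        print("child sitemap:", loc.text.strip())
else:
    urls = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]
    print(f"{len(urls)} URLs found")
    for url in urls:
        print(url)
```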

Managing Crawled Pages

Enable/Disable Pages

Toggle pages on/off to control what the bot knows:
  • Enabled: Bot can use this content to answer questions
  • Disabled: Content is stored but not used
Disable pages like login, cart, checkout, and privacy policy that shouldn’t influence answers.

Delete Pages

Permanently remove pages from your knowledge base:
  1. Find the page in the list
  2. Click the delete icon
  3. Confirm deletion

Automatic Incremental Crawling

BubblaV automatically performs incremental crawls to keep your knowledge base up to date. The system detects changes on your website and only crawls new or updated pages, making the process efficient and fast.
How it works:
  • The system monitors your websites for changes
  • New pages are automatically discovered and crawled
  • Updated pages are re-indexed when changes are detected
  • No manual action is required
Sync frequency by plan:
Plan      Auto Sync
Free      Manual only
Starter   Monthly
Pro       Weekly
Turbo     Weekly
Incremental crawls run automatically in the background. You don’t need to manually trigger re-crawls.
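
As an illustration of the general idea (not a description of BubblaV's actual sync mechanism), the sketch below re-checks a placeholder list of URLs and flags a page for re-indexing only when a hash of its content differs from the value stored on the previous run.

```python
# Illustrative change detection: re-index only pages whose content hash changed.
# Conceptual sketch only; the URLs and state file are placeholders.
import hashlib
import json
from pathlib import Path

import requests

STATE_FILE = Path("crawl_state.json")  # placeholder local store of known hashes
URLS = ["https://example.com", "https://example.com/pricing"]  # placeholder URLs

state = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}

for url in URLS:
    body = requests.get(url, timeout=10).text
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    if state.get(url) == digest:
        print("unchanged, skipping:", url)
        continue
    print("new or updated, would re-index:", url)
    state[url] = digest

STATE_FILE.write_text(json.dumps(state, indent=2))
```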

Plan Page Limits

Plan      Max Pages (Total)
Free      50 pages
Starter   500 pages
Pro       5,000 pages
Turbo     50,000 pages
“Total Pages” includes:
  • Crawled Web Pages
  • Uploaded Files (1 file = 1 page)
  • Q&A Entries (1 entry = 1 page)
When you hit your limit, new pages won’t be crawled and you won’t be able to upload files. Upgrade your plan for more capacity.
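For example, on the Starter plan, 450 crawled web pages plus 30 uploaded files plus 20 Q&A entries add up to the 500-page limit, so the next crawled page or file upload would be blocked until you free up pages or upgrade.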

Best Practices

  • Crawl product pages, FAQs, and support content first; these have the highest impact on customer satisfaction.
  • Skip login, registration, cart, and checkout pages; they don't help answer customer questions.
  • Let automatic incremental crawls detect and index new or updated content; after major updates, the automatic sync picks up changes based on your plan's frequency.
  • Review failed pages to ensure important content isn't missing, and fix issues on your website if needed.

Troubleshooting

Pages missing from the crawl:
  • Ensure pages are linked from your main site
  • Check that your sitemap includes all pages
  • Add pages manually via the Pages tab

Page crawled but no content extracted:
  • Verify the page has visible text (not just images)
  • Check that JavaScript-rendered content is also server-side rendered (see the sketch below)
  • Contact support for complex pages

Crawling is taking a long time:
  • Large sites may take hours to fully crawl
  • Check progress in the dashboard
  • Pages are usable as soon as they're crawled
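
If you suspect a page's text only appears after JavaScript runs, a quick way to check is to fetch the raw HTML (which is what a crawler without a browser sees) and look for a phrase you expect on the page. A minimal sketch, with a placeholder URL and phrase:

```python
# Check whether a page's text is present in the raw HTML (without running JavaScript).
# Requires: pip install requests beautifulsoup4
import requests
from bs4 import BeautifulSoup

URL = "https://example.com/pricing"  # placeholder: the page to check
EXPECTED_PHRASE = "per month"        # placeholder: text you expect on the page

html = requests.get(URL, timeout=10).text
text = BeautifulSoup(html, "html.parser").get_text(" ", strip=True)

if EXPECTED_PHRASE.lower() in text.lower():
    print("Phrase found in raw HTML - the content is crawlable without JavaScript.")
else:
    print("Phrase not found - the content may be rendered client-side and need server-side rendering.")
```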

Next Steps