What "unlimited" really means for sitemap generation
Most online sitemap generators enforce one or more hidden limits: a 500-URL free cap, a daily generation quota, a maximum site size, or memory ceilings that crash the crawler on larger websites. SitemapMaker.net was built specifically to handle large sites without any of those artificial constraints.
"Unlimited" here means three concrete things: no per-sitemap URL cap (we hit the sitemaps.org maximum of 50,000 then auto-split), no overall site size limit (sites with 1,000,000+ URLs work — we just generate more index files), and no daily quota that would force you to pay or wait. The only practical limits are the sitemaps.org protocol itself and a courtesy rate limit of 30 generations per hour per IP to prevent abuse.
How the sitemap index handles large sites
The sitemaps.org protocol caps a single sitemap file at 50,000 URLs and 50 MB uncompressed. For sites larger than this, you need a sitemap index — a master file that points to multiple smaller sitemaps. Our generator handles this automatically:
- If your site has up to 50,000 URLs, you get a single `sitemap.xml` file.
- If your site exceeds 50,000 URLs, we generate `sitemap-1.xml`, `sitemap-2.xml`, etc., each containing up to 50,000 URLs.
- A master `sitemap.xml` index file is generated that links to all the sub-sitemaps.
- You only submit the index URL to Google; it fetches the linked sub-sitemaps automatically.
This split happens transparently. You do not need to configure batch sizes, choose split strategies, or run multiple jobs. Submit one URL, get one set of files, upload, done.
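For reference, this is what a minimal sitemap index looks like in the sitemaps.org format (the domain, filenames, and dates below are illustrative):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-1.xml</loc>
    <lastmod>2025-01-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-2.xml</loc>
    <lastmod>2025-01-15</lastmod>
  </sitemap>
</sitemapindex>
```

Each `<sitemap>` entry points to one child file; Google treats the index as a single submission.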
Memory and performance characteristics
The crawler is built to scale. Specifically:
- Streamed URL processing — URLs are flushed to disk in batches as they are discovered, so memory usage stays bounded regardless of site size.
- Bounded request timeouts — Each page fetch has a 15-second timeout to prevent stalls on slow servers.
- Maximum page size 512 KB — We use HTTP Range requests to cap how much HTML we read per page, avoiding memory blowups on huge pages (see the sketch after this list).
- Maximum crawl depth 6 — Prevents infinite loops on sites with calendar widgets or paginated archives.
- Same-host enforcement — The crawler stays on the original hostname for safety; subdomains require separate runs.
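To make the fetch bounds concrete, here is a minimal Python sketch of a size-capped, timeout-bounded page fetch. It is an illustration of the technique, not our production code; the `fetch_page` helper and its defaults simply mirror the limits listed above:

```python
import requests

def fetch_page(url: str, timeout: float = 15.0, max_bytes: int = 512 * 1024) -> str:
    """Fetch at most max_bytes of HTML with a bounded connect/read timeout."""
    resp = requests.get(
        url,
        timeout=timeout,                                # connect/read timeout
        headers={"Range": f"bytes=0-{max_bytes - 1}"},  # ask the server to truncate
        stream=True,                                    # read the body incrementally
    )
    body = b""
    for chunk in resp.iter_content(chunk_size=16_384):
        body += chunk
        if len(body) >= max_bytes:  # enforce the cap ourselves,
            break                   # since many servers ignore Range
    return body[:max_bytes].decode(resp.encoding or "utf-8", errors="replace")
```

Streaming the body and cutting it off client-side matters because Range support is optional; the cap holds either way.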
Real-world performance benchmarks
Here are typical generation times we have observed across different site sizes:
| Site size | Typical time | Output |
|---|---|---|
| 1–500 URLs | 5–15 seconds | Single sitemap.xml |
| 501–5,000 URLs | 20–60 seconds | Single sitemap.xml |
| 5,001–50,000 URLs | 2–4 minutes | Single sitemap.xml |
| 50,001–200,000 URLs | 5–12 minutes | Sitemap index + 2–4 children |
| 200,000+ URLs | 15+ minutes | Sitemap index + 5+ children |
Actual times depend heavily on your server's response speed. A site with fast server response times (under 200ms per page) finishes much quicker than one with slow back-end queries.
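As a back-of-envelope check on those numbers, crawl time is roughly URL count times average response time, divided by how many pages are fetched concurrently. The response time and concurrency below are assumptions for illustration, not measured properties of our crawler:

```python
def estimated_minutes(num_urls: int, avg_response_s: float = 0.2,
                      concurrency: int = 40) -> float:
    """Crude crawl-time estimate: total fetch time split across workers."""
    return num_urls * avg_response_s / concurrency / 60

print(round(estimated_minutes(5_000), 1))    # ~0.4 min, in line with the table
print(round(estimated_minutes(50_000), 1))   # ~4.2 min
print(round(estimated_minutes(50_000, avg_response_s=0.8), 1))  # ~16.7 min on a slow server
```

The last line shows why a slow back end dominates: quadrupling response time quadruples the crawl.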
When unlimited matters
For most blogs and small business sites, the URL cap is irrelevant — they have well under 1,000 pages. Unlimited matters specifically when you run:
- Large e-commerce stores with tens of thousands of product, category, and faceted URLs
- News and publisher sites with deep article archives going back years
- Real estate, jobs, or classified portals with high listing turnover
- Marketplaces with seller pages, listing pages, and category pages
- Educational platforms with thousands of courses, lessons, and resources
- SaaS knowledge bases with extensive documentation
If your site is in any of these categories, the typical 500-URL free tier from competitors is essentially unusable, and the paid tiers run $50–$200 per month. Our unlimited free tier is a meaningful cost saving.
Tips for crawling large sites efficiently
- Use exclusions aggressively — large sites tend to have many low-value URL patterns (faceted nav, session IDs, sort orders) that bloat the sitemap without SEO benefit. Exclude them (see the sketch after this list).
- Generate during off-peak hours — for very large sites, run the crawl when your server has spare capacity to handle the additional request load.
- Submit the index, not individual sitemaps — Google Search Console only needs the master index URL; it discovers and fetches sub-sitemaps automatically.
- Monitor in Search Console — after submission, watch the "Pages indexed / submitted" ratio. If it is low, your sitemap may include URLs Google does not consider valuable.
- Regenerate quarterly minimum — even static sites benefit from a quarterly refresh to update lastmod values and discover any newly added pages.
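To make the first tip concrete, here is a hypothetical exclusion filter. The patterns are examples of common low-value URL shapes, not a built-in feature list; adjust them to your own site's URL structure:

```python
import re

# Hypothetical patterns for common low-value URL shapes; tune to your site.
EXCLUDE = [
    re.compile(r"[?&](sort|order|sessionid|sid)="),  # sort orders, session IDs
    re.compile(r"/filter/"),                         # faceted navigation paths
]

def keep(url: str) -> bool:
    return not any(pattern.search(url) for pattern in EXCLUDE)

demo = [
    "https://example.com/shoes",
    "https://example.com/shoes?sort=price",
    "https://example.com/filter/red/size-9",
]
print([u for u in demo if keep(u)])  # only the canonical /shoes URL survives
```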
Architecture, briefly
For the technically curious: our crawler uses a queue-based BFS algorithm. URLs are added to a queue as they are discovered, processed in order, and their HTML responses are parsed for new internal links. A Bloom filter rejects duplicates in constant memory, at the cost of a tiny false-positive rate: a rare URL may be skipped, but none is crawled twice. Output is streamed to a temp file as we go, then split into the final sitemap files at the end. The entire process is bounded in memory regardless of site size.
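Here is a minimal sketch of that loop, assuming a `fetch_links(url)` callback that downloads a page and returns the links found on it; the toy Bloom filter stands in for the production one:

```python
import hashlib
from collections import deque
from urllib.parse import urljoin, urlparse

class BloomFilter:
    """Toy Bloom filter: constant memory, small false-positive rate.
    A false positive means a URL is skipped; nothing is crawled twice."""
    def __init__(self, size_bits: int = 8 * 1024 * 1024, num_hashes: int = 4):
        self.size, self.k = size_bits, num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: str):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

def crawl(start_url: str, fetch_links, max_depth: int = 6) -> None:
    """Queue-based BFS: discover, dedupe, stream each URL to disk."""
    host = urlparse(start_url).netloc
    seen, queue = BloomFilter(), deque([(start_url, 0)])
    seen.add(start_url)
    with open("urls.tmp", "w") as out:
        while queue:
            url, depth = queue.popleft()
            out.write(url + "\n")       # streamed output keeps memory bounded
            if depth == max_depth:      # depth cap stops crawl traps
                continue
            for href in fetch_links(url):
                link = urljoin(url, href)
                if urlparse(link).netloc != host:
                    continue            # same-host enforcement
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
```

The only unbounded structure is the queue of undiscovered URLs, which stays small in practice because each URL is written out and discarded as soon as it is processed.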