Sitemap Best Practices: Your Ultimate Guide 2025

Learn everything about sitemaps—what they are, their types (XML, HTML, video, image), why they are crucial for SEO, and how to create and optimize them step-by-step. Boost your website’s visibility now!

What Is a Sitemap?

A sitemap is a file that lists all the important pages, files, or sections of a website to help search engines like Google, Bing, and others understand the site structure. It acts like a roadmap, guiding search engine crawlers to discover and index content efficiently. Sitemaps can also include metadata about pages, such as:

  • Last updated date
  • Frequency of changes
  • Priority relative to other URLs on the site

Types of Sitemaps

There are two primary types of sitemaps: XML and HTML, along with specialized types. Let’s explore them:

1. XML Sitemap

  • Purpose: Designed for search engines.
  • Format: Written in XML (Extensible Markup Language).
  • Content: Lists URLs (<loc>) along with optional metadata such as <lastmod>, <changefreq>, and <priority>.

Example:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page1</loc>
    <lastmod>2023-12-25</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>https://example.com/page2</loc>
    <lastmod>2023-12-24</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.6</priority>
  </url>
</urlset>

2. HTML Sitemap

  • Purpose: Created for users, not search engines.
  • Format: A webpage listing links to all important pages.
  • Content: Includes clickable links to improve navigation, especially for large websites.

Example:

<h1>HTML Sitemap</h1>
<ul>
  <li><a href="https://example.com/page1">Page 1</a></li>
  <li><a href="https://example.com/page2">Page 2</a></li>
</ul>
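You can generate an HTML sitemap from code instead of writing it by hand. All you need is a list of page titles and URLs. The minimal Python sketch below shows the idea. `build_html_sitemap` and the sample `pages` list are hypothetical; a real site would pull its pages from the CMS or routing table.

```python
from html import escape

def build_html_sitemap(pages):
    """Render a simple HTML sitemap with one <li> link per (title, url) pair."""
    items = []
    for title, url in pages:
        # Escape the link text and the URL so characters like & or < stay valid HTML.
        items.append(f'  <li><a href="{escape(url, quote=True)}">{escape(title)}</a></li>')
    return "\n".join(["<h1>HTML Sitemap</h1>", "<ul>", *items, "</ul>"])

# Hypothetical page list; in practice this comes from your CMS or database.
pages = [
    ("Page 1", "https://example.com/page1"),
    ("Page 2", "https://example.com/page2"),
]
print(build_html_sitemap(pages))
```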

3. Video Sitemap

  • Purpose: Focuses on video content.
  • Content: Includes video-specific metadata such as duration, thumbnail, and publication date.

4. Image Sitemap

  • Purpose: Focuses on image content.
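  • Content: Lists the images on each page so search engines can discover images they might otherwise miss (for example, images loaded with JavaScript).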

5. News Sitemap

  • Purpose: Used for websites publishing news to help Google News crawl articles quickly.

Importance of Sitemaps

1. For Search Engines

  • Helps discover new pages faster, especially on large or complex websites.
  • Ensures indexing of important pages that might not have many internal links.
  • Improves crawl efficiency by guiding search bots.

2. For Users (HTML Sitemaps)

  • Enhances navigation for visitors, especially on websites with many pages.

Do You Always Need a Sitemap?

  • YES if:
    • You have a large website with thousands of pages.
    • Your site has limited internal links or orphan pages.
    • You frequently add new content (e.g., blogs, news, or e-commerce products).
  • NO if:
    • Your website is small and well-interlinked.

What Are Sitemap Codes?

1. Sitemap Tags in XML

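  • <loc>: The page's full, absolute URL (required).
  • <lastmod>: Date of last modification, in W3C Datetime format (e.g., 2023-12-25 or 2023-12-25T12:00:00Z).
  • <changefreq>: How frequently the page is likely to change (always, hourly, daily, weekly, monthly, yearly, or never).
  • <priority>: The page's priority relative to other URLs on the same site (0.0 to 1.0; the default is 0.5). Google has stated that it ignores <changefreq> and <priority>, so treat them as optional hints for other search engines.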
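You can check each entry against these rules before you publish, so malformed values never go live. The Python sketch below assumes each entry's values are already available as strings. `check_url_entry` is a hypothetical helper, and its date regex is a simplified W3C Datetime check, not a complete validator.

```python
import re

# Values allowed for <changefreq> by the sitemaps.org protocol.
CHANGEFREQ_VALUES = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}

# Simplified W3C Datetime: YYYY-MM-DD, optionally followed by a time plus a timezone.
LASTMOD_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2}"
    r"(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?$"
)

def check_url_entry(loc, lastmod=None, changefreq=None, priority=None):
    """Return a list of problems found in one <url> entry (empty list means OK)."""
    problems = []
    if not loc.startswith(("https://", "http://")):
        problems.append("loc must be an absolute URL")
    if lastmod is not None and not LASTMOD_RE.match(lastmod):
        problems.append("lastmod is not in W3C Datetime format")
    if changefreq is not None and changefreq not in CHANGEFREQ_VALUES:
        problems.append("changefreq has an unsupported value")
    if priority is not None:
        try:
            in_range = 0.0 <= float(priority) <= 1.0
        except ValueError:
            in_range = False  # not a number at all
        if not in_range:
            problems.append("priority must be a number between 0.0 and 1.0")
    return problems

print(check_url_entry("https://example.com/page1", "2023-12-25", "daily", "0.8"))  # []
print(check_url_entry("/page2", "25-12-2023", "often", "1.5"))
```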

2. Robots.txt Sitemap Directive

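Add your sitemap's full URL to the robots.txt file so crawlers can discover it. The Sitemap: directive does not belong to any User-agent group, so it can appear anywhere in the file:

Sitemap: https://example.com/sitemap.xml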

How to Optimize Sitemaps: Sitemap Best Practices

Follow these steps to ensure your sitemaps are effective:

1. Ensure Proper Structure

  • Include only canonical URLs (not duplicate or redirected URLs).
  • Keep URLs consistent with HTTPS and avoid mixed protocols.
  • Do not include noindex pages or 404 URLs.
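These three rules work well as a filter over crawl data. The sketch below assumes that, for each URL, you already have its HTTP status, its declared canonical URL, and a noindex flag (for example, from a site crawl export). The `PageRecord` fields and the `sitemap_eligible` function are illustrative names, not part of any tool.

```python
from dataclasses import dataclass

@dataclass
class PageRecord:
    """Hypothetical crawl record for one URL."""
    url: str
    status: int       # HTTP status code returned by the URL
    canonical: str    # URL declared in the page's rel="canonical" tag
    noindex: bool     # True if the page carries a noindex directive

def sitemap_eligible(records):
    """Keep only HTTPS, 200-status, self-canonical, indexable URLs."""
    eligible = []
    for r in records:
        if not r.url.startswith("https://"):
            continue  # avoid mixed protocols
        if r.status != 200:
            continue  # drops redirects (3xx) and errors such as 404
        if r.canonical != r.url:
            continue  # duplicate: another URL is the canonical version
        if r.noindex:
            continue  # pages excluded from the index don't belong in the sitemap
        eligible.append(r.url)
    return eligible

records = [
    PageRecord("https://example.com/page1", 200, "https://example.com/page1", False),
    PageRecord("https://example.com/old", 301, "https://example.com/old", False),
    PageRecord("https://example.com/page1?ref=nav", 200, "https://example.com/page1", False),
    PageRecord("https://example.com/thank-you", 200, "https://example.com/thank-you", True),
    PageRecord("http://example.com/page2", 200, "http://example.com/page2", False),
]
print(sitemap_eligible(records))  # ['https://example.com/page1']
```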

2. Split Large Sitemaps

  • If a sitemap exceeds 50 MB (uncompressed) or 50,000 URLs, split it into smaller sitemaps and use a sitemap index file to manage them.
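Here is a minimal Python sketch that splits a large URL list into files of at most 50,000 URLs and builds the matching sitemap index. The file names (sitemap-1.xml, sitemap-2.xml, and so on) and the base URL are assumptions for illustration. To keep the example short, each file lists only <loc>.

```python
from xml.sax.saxutils import escape

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000        # protocol limit: URLs per sitemap file
MAX_BYTES = 52_428_800   # protocol limit: 50 MB per file, uncompressed

def build_urlset(urls):
    """Return a minimal <urlset> document listing the given URLs."""
    entries = "".join(f"<url><loc>{escape(u)}</loc></url>" for u in urls)
    return f'<?xml version="1.0" encoding="UTF-8"?><urlset xmlns="{NS}">{entries}</urlset>'

def split_sitemaps(urls, base_url, limit=MAX_URLS):
    """Split URLs into files of at most `limit` entries plus a sitemap index.

    Returns (files, index_xml); `files` maps each file name to its XML.
    """
    files = {}
    for start in range(0, len(urls), limit):
        name = f"sitemap-{start // limit + 1}.xml"  # hypothetical naming scheme
        xml = build_urlset(urls[start:start + limit])
        if len(xml.encode("utf-8")) > MAX_BYTES:
            raise ValueError(f"{name} is over 50 MB; lower the per-file limit")
        files[name] = xml
    entries = "".join(
        f"<sitemap><loc>{escape(base_url + name)}</loc></sitemap>" for name in files
    )
    index_xml = (f'<?xml version="1.0" encoding="UTF-8"?>'
                 f'<sitemapindex xmlns="{NS}">{entries}</sitemapindex>')
    return files, index_xml

# Example: 120,000 hypothetical product URLs become three files plus one index.
urls = [f"https://example.com/product/{n}" for n in range(120_000)]
files, index_xml = split_sitemaps(urls, "https://example.com/sitemaps/")
print(list(files))  # ['sitemap-1.xml', 'sitemap-2.xml', 'sitemap-3.xml']
```

Upload the index file, then submit only its URL to Google Search Console. Search engines discover the child sitemaps from the index.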

3. Keep Metadata Updated

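  • Keep <lastmod> accurate and update it only when a page's content meaningfully changes. Google uses <lastmod> only when it is consistently accurate, and an accurate value helps search engines prioritize new or changed content.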

4. Submit Sitemaps to Google Search Console

  • Add your sitemap in the Sitemaps section of Google Search Console to track your indexing status.

5. Regularly Validate and Fix Errors

  • Use tools like:
    • Google Search Console
    • Screaming Frog
    • XML Sitemap Validator
  • Check for 404 errors, blocked pages, or invalid formatting.

6. Include Important Pages Only

  • Exclude unnecessary or low-quality pages (e.g., tag pages, internal search result pages).

7. Optimize for Page Speed

  • Ensure all listed pages load quickly. A slow-loading URL might reduce crawl efficiency.

Advanced Sitemap Tips

  1. Dynamically Generate Sitemaps for Large Websites:
    • Use CMS plugins (e.g., Yoast SEO for WordPress) or custom scripts to generate sitemaps dynamically.
  2. Use Image and Video Sitemaps:
    • Enhance indexing for multimedia-rich websites.
  3. Test Crawling with Tools:
    • Simulate crawling using Screaming Frog or Sitebulb to check how search engines interact with your sitemap.

Why You Can’t Access a Sitemap (and How to Fix It)

If you can’t access or check a website’s XML sitemap, here are the most common reasons and troubleshooting steps:

1. Incorrect Sitemap URL

  • Reason: The sitemap URL might not follow the standard naming convention or could be located at a custom path.
  • Solution:
    • Check the most common URLs for XML sitemaps:
      • example.com/sitemap.xml
      • example.com/sitemap_index.xml
    • Use Google Search Console (if you have access) to locate the sitemap URL submitted by the site owner.
    • Check the robots.txt file (example.com/robots.txt) for the Sitemap: directive.
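You can combine the robots.txt check with the common-path check: read any Sitemap: lines from robots.txt first, then fall back to the usual paths. This sketch uses Python's standard `urllib.robotparser`, whose `site_maps()` method requires Python 3.8+. It assumes you have already downloaded the robots.txt text. `sitemap_candidates` is a hypothetical helper name.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

COMMON_PATHS = ["/sitemap.xml", "/sitemap_index.xml"]

def sitemap_candidates(site, robots_txt):
    """Return sitemap URLs to try: robots.txt declarations first, then common paths."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    declared = parser.site_maps() or []  # site_maps() returns None if nothing is declared
    candidates = []
    for url in declared + [urljoin(site, path) for path in COMMON_PATHS]:
        if url not in candidates:  # keep the order, drop duplicates
            candidates.append(url)
    return candidates

robots = """User-agent: *
Disallow: /search
Sitemap: https://example.com/sitemaps/index.xml
"""
print(sitemap_candidates("https://example.com", robots))
```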

2. Sitemap is Missing or Not Published

  • Reason: The website owner might not have created or published an XML sitemap.
  • Solution: Confirm whether the sitemap exists by opening the likely sitemap URLs in your browser, or by using tools like Screaming Frog or Google Search Console (GSC).

3. Server or Hosting Issues

  • Reason: The server hosting the website might be down, have poor performance, or block automated tools.
  • Solution:
    • Check if the website and sitemap URLs are accessible in your browser.
    • Use a tool like HTTP Status Checker to see if the sitemap URL returns a valid status code (200 OK).

4. Sitemap Contains Errors

  • Reason: The XML sitemap might have structural or formatting errors that prevent tools or browsers from parsing it correctly.
  • Solution: Validate the sitemap using:
    • Google Search Console’s Sitemaps report (for sites you own)
    • Third-party validators like XML Sitemap Validator.
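You can script a few basic structural checks before turning to a third-party validator. This Python sketch uses only the standard library. It confirms that:

  • the file is well-formed XML,
  • the root element is a <urlset> in the sitemaps.org namespace,
  • every <url> has a <loc>, and
  • the 50,000-URL and 50 MB limits are respected.

It does not validate against the official XSD schema, and it does not handle sitemap index files.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS = 50_000
MAX_BYTES = 52_428_800  # 50 MB, measured uncompressed

def validate_sitemap(data: bytes) -> list:
    """Run basic structural checks on a <urlset> sitemap; returns a list of problems."""
    problems = []
    if len(data) > MAX_BYTES:
        problems.append("file is larger than 50 MB uncompressed")
    try:
        root = ET.fromstring(data)
    except ET.ParseError as exc:
        return problems + [f"not well-formed XML: {exc}"]
    if root.tag != NS + "urlset":
        # Also catches a missing or misspelled xmlns declaration.
        return problems + [f"root element is {root.tag!r}, expected <urlset> in the sitemaps.org namespace"]
    urls = root.findall(NS + "url")
    if len(urls) > MAX_URLS:
        problems.append(f"{len(urls)} URLs exceeds the 50,000-URL limit")
    for i, url in enumerate(urls, start=1):
        loc = url.find(NS + "loc")
        if loc is None or not (loc.text or "").strip():
            problems.append(f"<url> #{i} has no <loc>")
    return problems

good = (b'<?xml version="1.0" encoding="UTF-8"?>'
        b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        b'<url><loc>https://example.com/</loc></url></urlset>')
missing_loc = (b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
               b'<url><lastmod>2023-12-25</lastmod></url></urlset>')
print(validate_sitemap(good))         # []
print(validate_sitemap(missing_loc))  # ['<url> #1 has no <loc>']
```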

5. Restricted Access (Blocked by Robots.txt or Firewall)

  • Reason: The sitemap might be blocked by the website’s robots.txt file or restricted by firewalls and IP blocking.
  • Solution:
    • Review the robots.txt file for Disallow rules affecting the sitemap.
    • Use a proxy or VPN if you suspect your IP is being blocked.

6. Sitemap Too Large

  • Reason: Sitemaps exceeding 50,000 URLs or 50MB need to be split into multiple smaller sitemaps.
  • Solution: Check for sitemap_index.xml or smaller sub-sitemaps.

7. Sitemap Requires Authentication

  • Reason: Some websites restrict access to sitemaps behind login credentials or IP whitelisting.
  • Solution: Contact the website administrator for access or check their policy for authenticated users.

8. Tool or Network Issues

  • Reason: If you’re using a tool or network with restrictions, it might fail to fetch sitemaps.
  • Solution:
    • Test the sitemap using a different device, browser, or internet connection.
    • Ensure your tool (e.g., Screaming Frog, Sitebulb) is updated and configured correctly.

Tips for Troubleshooting Sitemaps

  1. Use browser dev tools (F12) to check response headers for sitemap requests.
  2. Use a bulk URL checker, such as a free Zoho Sheet URL Checker Tool, to test multiple sitemap URLs for HTTP status codes or errors.
  3. If you frequently encounter blocked access, you can bypass restrictions using a VPN or user-agent switcher extension.

Let’s walk through a real example: we’ll take a popular website, analyze its XML sitemap, and explain the findings step by step.

Example Website: CNN (https://www.cnn.com)

Step 1: Check Common Sitemap URLs

We start by testing the most common sitemap URLs:

  • https://www.cnn.com/sitemap.xml
  • https://www.cnn.com/sitemap_index.xml

Result:
Accessing https://www.cnn.com/sitemap.xml redirects us to https://www.cnn.com/sitemaps/cnn/index.xml.
This means CNN uses a custom path for its sitemap.

Step 2: Analyze the Sitemap Structure

Visit the sitemap at https://www.cnn.com/sitemaps/cnn/index.xml.
Observation:

  • CNN uses a sitemap index file that links to multiple sub-sitemaps, such as:
    • https://www.cnn.com/sitemaps/cnn/article-2023-12.xml
    • https://www.cnn.com/sitemaps/cnn/video-2023-12.xml

This approach is common for large websites, as it splits URLs into multiple smaller files to avoid exceeding the 50,000 URL or 50MB size limit.
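To see which sub-sitemaps an index file points to, list the <loc> of every <sitemap> entry. In the Python sketch below, the sample XML is hand-written to mirror the structure described above, using example.com URLs. It is not a copy of CNN's actual file.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def child_sitemaps(index_xml: bytes) -> list:
    """Return the <loc> URL of every <sitemap> entry in a sitemap index."""
    root = ET.fromstring(index_xml)
    if root.tag != "{%s}sitemapindex" % NS["sm"]:
        raise ValueError(f"expected <sitemapindex>, got {root.tag!r}")
    return [(loc.text or "").strip() for loc in root.findall("sm:sitemap/sm:loc", NS)]

sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemaps/article-2023-12.xml</loc></sitemap>
  <sitemap><loc>https://example.com/sitemaps/video-2023-12.xml</loc></sitemap>
</sitemapindex>"""
for url in child_sitemaps(sample):
    print(url)
```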

Step 3: Validate Sitemap Accessibility

Check whether the sitemap and its sub-sitemaps return a valid HTTP 200 OK status.
Use a browser or tools like:

  • Screaming Frog
  • HTTP Status Checker

Result: All sub-sitemaps return 200 OK, meaning they are accessible and functional.

Step 4: Look for Errors or Issues

Analyze one of the sub-sitemaps, e.g., https://www.cnn.com/sitemaps/cnn/article-2023-12.xml, to check:

  • Proper XML structure (e.g., <urlset> and <url> tags).
  • Valid URLs with proper <loc> tags.

Result: The sitemap is well-structured and error-free, listing URLs like:

<url>
  <loc>https://www.cnn.com/2023/12/25/world/sample-article/index.html</loc>
  <lastmod>2023-12-25T12:00:00Z</lastmod>
</url>

Step 5: Check Robots.txt

Next, inspect the robots.txt file at https://www.cnn.com/robots.txt for any Sitemap: directives or restrictions.

Result:
The robots.txt file includes:

Sitemap: https://www.cnn.com/sitemaps/cnn/index.xml

  • This confirms that the sitemap is properly declared.

Step 6: Test the Sitemap in Google Search Console

If this were your website, you’d log into Google Search Console, go to the Sitemaps section, and submit the sitemap URL (https://www.cnn.com/sitemaps/cnn/index.xml) to ensure that it’s recognized by Google.

Expected Outcome:
Google accepts the sitemap, and you can track indexing coverage.

Potential Issues You Could Encounter

If this process fails for other websites, here are scenarios to watch for:

  1. 404 Not Found: The sitemap URL does not exist.
  2. 403 Forbidden: Your IP is blocked or access is restricted.
  3. Empty or Invalid Sitemap: The file contains no URLs or has errors.
  4. Blocked by Robots.txt: The sitemap or certain pages are disallowed.
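Once you have the HTTP status code, the response body, and the site's robots.txt, you can triage these scenarios in a few lines. The `diagnose` function below is a hypothetical, rough Python sketch; fetching is left out. It uses the standard `urllib.robotparser` to check whether the sitemap URL is disallowed for a given user agent.

```python
from urllib.robotparser import RobotFileParser

def diagnose(status, body, robots_txt, url, user_agent="Googlebot"):
    """Map a sitemap fetch result to the common issues listed above.

    A rough triage only: `status` is the HTTP status code, `body` is the
    response text, and `robots_txt` is the site's robots.txt contents.
    """
    issues = []
    if status == 404:
        issues.append("404 Not Found: the sitemap URL does not exist")
    elif status == 403:
        issues.append("403 Forbidden: your IP is blocked or access is restricted")
    elif status == 200 and "<loc>" not in body:
        issues.append("Empty or invalid sitemap: no <loc> entries found")
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch(user_agent, url):
        issues.append("Blocked by robots.txt: the sitemap URL is disallowed")
    return issues

print(diagnose(404, "", "", "https://example.com/sitemap.xml"))
print(diagnose(200, "<loc>https://example.com/</loc>",
               "User-agent: *\nDisallow: /sitemaps/\n",
               "https://example.com/sitemaps/index.xml"))
```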

How to Create a Sitemap (for Any Website)

Creating a sitemap for any website is an essential step in improving its SEO and ensuring search engines can easily crawl and index its pages. Here’s a detailed guide on how to create a sitemap, whether you’re using a CMS, a custom-coded site, or a manual process.

1. Using a CMS (e.g., WordPress, Shopify, Wix)

If your website uses a Content Management System (CMS), the process is usually automated.

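WordPress: Use a Plugin

  1. Install a plugin like Yoast SEO, Rank Math, or All in One SEO.
  2. Go to the plugin settings.
  3. Enable the sitemap feature.
  4. Find the generated sitemap; Yoast, for example, publishes it at https://yourdomain.com/sitemap_index.xml.
  5. Submit this URL to Google Search Console.

Since WordPress 5.5, WordPress core also generates a basic sitemap at https://yourdomain.com/wp-sitemap.xml. SEO plugins such as Yoast typically replace it with their own.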

Shopify
Shopify automatically generates an XML sitemap.
Visit https://yourstore.myshopify.com/sitemap.xml.
You can customize your sitemap via the Shopify admin panel or with apps if needed.

Wix
Wix automatically creates and updates a sitemap for your site.
Find it at https://yourwixsite.com/sitemap.xml.

2. Using Online Sitemap Generators

For non-CMS websites, you can use tools to generate sitemaps automatically.

Recommended Tools

XML-Sitemaps.com: Free and simple to use (up to 500 URLs for free).

Screaming Frog (Desktop Software):

  • Crawl your website.
  • Go to Sitemaps > XML Sitemap.
  • Export the file.

SEOptimer Sitemap Generator: Free tool for smaller sites.

General steps for online generators:

  • Enter your website URL into the tool.
  • Specify crawling depth (how many levels of links to follow).
  • Review the generated sitemap.
  • Download the XML file and upload it to your root directory.
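The crawling depth setting controls how many levels of links the generator follows from your starting page. The Python sketch below illustrates the idea with a breadth-first crawl over an in-memory link graph. The `site` dictionary stands in for real pages. Actual generators also fetch and parse HTML, respect robots.txt, and handle errors.

```python
from collections import deque

def crawl(start, links, max_depth):
    """Breadth-first crawl of a link graph, following links up to max_depth levels.

    `links` maps each URL to the URLs it links to (a stand-in for fetching
    and parsing real pages). Returns URLs in the order they were discovered.
    """
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth == max_depth:
            continue  # don't follow links beyond the configured depth
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append((target, depth + 1))
    return order

site = {
    "https://example.com/": ["https://example.com/blog", "https://example.com/about"],
    "https://example.com/blog": ["https://example.com/blog/post-1"],
    "https://example.com/blog/post-1": ["https://example.com/blog/post-1/comments"],
}
print(crawl("https://example.com/", site, max_depth=2))
```

With a depth of 2, the comments page is three clicks from the homepage and is never found. This is one reason deep or poorly linked pages go missing from auto-generated sitemaps.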

3. For Custom-Coded Websites

If you’re building a custom-coded website, you can manually create or dynamically generate a sitemap.

Manual XML Creation

Open a text editor (e.g., Notepad or VS Code).
Use the following structure:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2025-01-01</lastmod>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/page1</loc>
    <lastmod>2025-01-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>

Save the file as sitemap.xml.
Upload it to the root directory of your website using FTP or a file manager (https://example.com/sitemap.xml).

Dynamically Generate Sitemaps

Write a script (e.g., PHP, Python) that queries your database for URLs and generates an XML sitemap dynamically.

Example in PHP:

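<?php
// Serve the output as XML in UTF-8 so the HTTP header matches the XML declaration below.
header("Content-Type: application/xml; charset=utf-8");
echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";

// Placeholder data: in a real site, fetch each URL and its last-modified date from the database.
$pages = [
    ["loc" => "https://example.com/", "lastmod" => "2025-01-01"],
    ["loc" => "https://example.com/page1", "lastmod" => "2024-12-15"],
];

foreach ($pages as $page) {
    echo "  <url>\n";
    // Escape &, <, > and quotes so URLs with query strings remain valid XML.
    echo "    <loc>" . htmlspecialchars($page["loc"], ENT_XML1 | ENT_QUOTES, "UTF-8") . "</loc>\n";
    // Use the page's real modification date; stamping every URL with today's date makes <lastmod> meaningless.
    echo "    <lastmod>" . $page["lastmod"] . "</lastmod>\n";
    echo "    <changefreq>daily</changefreq>\n";
    echo "    <priority>0.8</priority>\n";
    echo "  </url>\n";
}
echo "</urlset>\n";
?>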

Save this file as sitemap.php and access it via https://example.com/sitemap.php.
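The same approach works in Python. This sketch assumes a hypothetical `pages` table with `url`, `updated_at`, and `published` columns, held here in an in-memory SQLite database. It builds the XML with the standard library's ElementTree, which escapes special characters such as & automatically. The `xml_declaration` argument to `ET.tostring` requires Python 3.8+.

```python
import sqlite3
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(conn):
    """Build sitemap XML (as UTF-8 bytes) from published rows of the pages table."""
    urlset = ET.Element("urlset", xmlns=NS)
    rows = conn.execute(
        "SELECT url, updated_at FROM pages WHERE published = 1 ORDER BY url"
    )
    for url, updated_at in rows:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url             # ElementTree escapes & and <
        ET.SubElement(entry, "lastmod").text = updated_at  # the page's real last-change date
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)

# Hypothetical data for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (url TEXT, updated_at TEXT, published INTEGER)")
conn.executemany("INSERT INTO pages VALUES (?, ?, ?)", [
    ("https://example.com/", "2025-01-01", 1),
    ("https://example.com/page1?a=1&b=2", "2024-12-15", 1),
    ("https://example.com/draft", "2025-01-02", 0),
])
print(build_sitemap(conn).decode("utf-8"))
```

In a web framework, you would return these bytes with a Content-Type of application/xml from a /sitemap.xml route.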

Conclusion

Sitemaps help search engines discover and index your website’s pages, especially on large or frequently updated sites. A well-maintained sitemap improves crawl efficiency and points search engines toward the content you most want indexed, which supports your website’s overall SEO performance.

Need help creating or optimizing a sitemap for your website? Let us know!


Q: How often does Google crawl sitemaps?

Google does not crawl sitemaps on a set schedule. How often it fetches a sitemap depends on factors such as the size of the website, how often its content changes, and its crawl budget. Sites that are updated frequently tend to be crawled more often, as do sites with higher authority or more traffic. In practice, Googlebot may revisit a sitemap anywhere from every few days to every few weeks. Keep your sitemap up to date so that new and changed URLs are discovered and indexed.


Q: Is it important to include category pages in the sitemap?

Yes, including category pages in your sitemap is important, especially for e-commerce or large content-based websites. Category pages help search engines discover and understand the structure of your site. Including them can:

  • Ensure that important category pages are crawled and indexed by search engines.
  • Help improve your site’s overall SEO by making it easier for search engines to find and rank relevant content.
  • Help search engines reach the products or articles linked from each category page, since those pages act as hubs in your internal linking.

However, if the category page is low-quality or duplicate in nature, it may not need to be included in the sitemap. Always aim for quality over quantity when selecting which pages to include.


Q: How to find a sitemap on a website?

To find a sitemap on a website, you can try these methods:

  1. Check the robots.txt file: Many websites reference their sitemap in the robots.txt file, which you can open by adding /robots.txt to the end of the domain (e.g., https://www.example.com/robots.txt). Look for a line that begins with Sitemap: followed by the sitemap URL.
  2. Try common sitemap URLs: You can try visiting these common sitemap paths directly:
    • https://www.example.com/sitemap.xml
    • https://www.example.com/sitemap_index.xml
  3. Use tools: Google Search Console lists the sitemaps submitted for sites you own, and crawlers like Screaming Frog can auto-discover sitemaps declared in a site’s robots.txt.

Q: How to enable the XML sitemaps option in Screaming Frog?

Screaming Frog is a useful tool for creating and managing XML sitemaps. It crawls your site and collects every page’s URL, title, meta description, headers, and more, and it can generate an XML sitemap from that data. You can then submit the sitemap to Google Search Console or other search engines to help them discover and index your pages more effectively. To enable and use XML sitemaps in Screaming Frog, follow these steps:

  1. Open Screaming Frog SEO Spider.
  2. Go to “Configuration” > “Spider” and open the “Crawl” tab.
  3. Enable XML sitemap crawling:
    • Under “XML Sitemaps”, tick “Crawl Linked XML Sitemaps” so the crawler also finds and audits the site’s existing sitemaps.
    • Optionally, tick “Auto Discover XML Sitemaps via robots.txt”.
  4. Start Crawling: Enter the website URL and click “Start”.
  5. Generate the Sitemap: Once the crawl finishes, go to Sitemaps > XML Sitemap and choose which pages and fields to include (for example, whether to include noindex or canonicalised URLs, images, and lastmod values).
  6. Export the Sitemap: Save the generated sitemap.xml file, then upload it to your site’s root directory.

About author

I’m Mayank, a digital marketing expert with over five years of experience in SEO, lead generation, and online growth strategies. I currently work as an SEO Manager and run my own platform, GreatInflux, where I help businesses of all sizes with backlink building, SMM, PPC campaigns, and SEO consulting.