In this article, we will learn how to configure sitemaps in AEM CaaS, step by step.

What is a Sitemap

According to Google Developers:

A sitemap is a file where you provide information about the pages, videos, and other files on your site, and the relationships between them. Search engines like Google read this file to crawl your site more efficiently. A sitemap tells search engines which pages and files you think are important in your site, and also provides valuable information about these files. For example, when the page was last updated and any alternate language versions of the page.

A sitemap helps search engines identify the URLs that are eligible for crawling. Search engines crawl these pages to index them so that other people can find your content through search.

Search engines look for sitemap.xml at the site root path before crawling the site (e.g. https://blog.bagwanpankaj.com/sitemap.xml).

Below is an example sitemap snippet that contains each page's URL, last modified date, change frequency, and priority.

<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
  <url>
    <loc>https://blog.bagwanpankaj.com/aem/aem-caas-how-to-configure-robots-txt</loc>
    <lastmod>2025-02-16T16:18:36+05:30</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://blog.bagwanpankaj.com/rust/introduction-to-webassembly-using-rust</loc>
    <lastmod>2025-02-16T16:18:36+05:30</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://blog.bagwanpankaj.com/rust/best-top-6-rust-framework-to-watch-out-in-2023</loc>
    <lastmod>2025-02-16T16:18:36+05:30</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://blog.bagwanpankaj.com/architecture/12-design-principles-you-can-implement-in-rust</loc>
    <lastmod>2025-02-16T16:18:36+05:30</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://blog.bagwanpankaj.com/javascript/bun-sh-an-introduction-another-js-runtime</loc>
    <lastmod>2025-02-16T16:18:36+05:30</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
</urlset>
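To make the structure concrete, here is a minimal Python sketch that reads a sitemap the way a crawler might and lists each URL with its last modified date. It uses only the standard library; the function name read_sitemap and the shortened SAMPLE document are illustrative assumptions, not part of AEM.

```python
import xml.etree.ElementTree as ET

# Namespace of the sitemap protocol, as declared on <urlset> in the snippet above.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# A shortened copy of the snippet above, embedded so the example is self-contained.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://blog.bagwanpankaj.com/aem/aem-caas-how-to-configure-robots-txt</loc>
    <lastmod>2025-02-16T16:18:36+05:30</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://blog.bagwanpankaj.com/rust/introduction-to-webassembly-using-rust</loc>
    <lastmod>2025-02-16T16:18:36+05:30</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
</urlset>
"""


def read_sitemap(xml_text):
    """Return a list of (loc, lastmod) tuples, one per <url> entry."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", namespaces=SITEMAP_NS)
        lastmod = url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        entries.append((loc, lastmod))
    return entries


if __name__ == "__main__":
    for loc, lastmod in read_sitemap(SAMPLE):
        print(lastmod, loc)
```

Note that every element lives in the sitemap namespace, so lookups without the namespace prefix would find nothing.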

As per the AEM documentation, there are different ways to create a sitemap for author and publish.

Author Sitemap Implementation

Setting allOnDemand to true in the OSGi configuration below allows us to generate a sitemap on author. Place the configuration in the author configuration folder, config.author:

org.apache.sling.sitemap.impl.SitemapGeneratorManagerImpl~practice.cfg.json

{
  "allOnDemand": true
}

Note: The allOnDemand configuration has a drawback: it processes the content and generates the sitemap every time the sitemap URL is requested.

Common Configuration for both author and publish

Below is the common configuration for both author and publish, which allows us to include the last modified date and represent data in XML format.

Place the configuration below in the config folder so that it applies to both author and publish:

com.adobe.aem.wcm.seo.impl.sitemap.PageTreeSitemapGeneratorImpl.cfg.json

{
  "enableLastModified": true,
  "lastModifiedSource": "cq:lastModified",
  "enableLanguageAlternates": false
}
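The split between config.author and config can be illustrated with a small Python sketch that writes the two configurations shown so far into their run-mode folders. The file names and property values are the ones from this article; the helper name write_configs and the destination directory passed to it are hypothetical, and the folder layout is an assumption about how the project is organised.

```python
import json
from pathlib import Path

# Run-mode folder -> {file name -> properties}.
# "config.author" applies to author only, "config" applies to author and publish.
CONFIGS = {
    "config.author": {
        "org.apache.sling.sitemap.impl.SitemapGeneratorManagerImpl~practice.cfg.json": {
            "allOnDemand": True,
        },
    },
    "config": {
        "com.adobe.aem.wcm.seo.impl.sitemap.PageTreeSitemapGeneratorImpl.cfg.json": {
            "enableLastModified": True,
            "lastModifiedSource": "cq:lastModified",
            "enableLanguageAlternates": False,
        },
    },
}


def write_configs(osgiconfig_dir):
    """Write each configuration as a .cfg.json file under its run-mode folder."""
    written = []
    for run_mode, files in CONFIGS.items():
        folder = Path(osgiconfig_dir) / run_mode
        folder.mkdir(parents=True, exist_ok=True)
        for name, props in files.items():
            target = folder / name
            target.write_text(json.dumps(props, indent=2) + "\n", encoding="utf-8")
            written.append(target)
    return written


if __name__ == "__main__":
    # Hypothetical output directory; point it at your project's OSGi config folder.
    for path in write_configs("osgiconfig"):
        print(path)
```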

Page Properties Update

To generate a sitemap, you must enable the sitemap option in the page properties of the root page, under the Advanced tab, as shown below.

[Image: AEM Sitemap Page Properties]

Exclude Pages and Apply Other Properties

To keep a page from being indexed, apply the noindex option under Robots Tags in its page properties.

We can also apply other Robots Tags options specific to that page, such as follow, nofollow, noarchive, etc.

[Image: AEM Sitemap Page Properties, Exclude Pages]

Generate Sitemap on author

Request the URL below to generate the sitemap on author:

http://localhost:4502/content/practice/en.sitemap.xml

Publish Sitemap Implementation

The allOnDemand setting above should not be used on publish, because it processes the content and regenerates the sitemap on every request.

  1. Create the configuration below as part of the publish config. It runs a scheduler at the given time or interval to generate the sitemap for the configured search path.

org.apache.sling.sitemap.impl.SitemapScheduler~practice.cfg.json

{
  "scheduler.name": "Practice Daily Sitemap Scheduler",
  "scheduler.expression": "0 0 2 1/1 * ? *",
  "searchPath": "/content/practice/us"
}

Note: Each scheduler run generates the sitemap under /var/sitemaps, mirroring the content hierarchy, for example /var/sitemaps/content/practice/us/sitemap.xml.

  2. Generating a sitemap also requires the publish configuration below, which defines the extension, resource types, and selectors that the sitemap servlet responds to.

org.apache.sling.sitemap.impl.SitemapServlet~practice.cfg.json

{
 "sling.servlet.extensions": "xml",
 "sling.servlet.resourceTypes": [
  "practice/components/structure/homepage",
  "practice/components/structure/profile",
  "practice/components/ea/structure/search"
 ],
 "sling.servlet.selectors": [
  "sitemap",
  "sitemap-index"
 ]
}
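The scheduler.expression value in step 1 is a Quartz-style cron expression, which is easy to misread because it starts with a seconds field. The Python sketch below only labels the fields and derives the storage path described in the note; it does not evaluate the schedule. The helper names describe_quartz and storage_path are hypothetical.

```python
# Quartz cron fields in order; the seventh (year) field is optional.
QUARTZ_FIELDS = [
    "seconds", "minutes", "hours",
    "day_of_month", "month", "day_of_week", "year",
]


def describe_quartz(expression):
    """Map each part of a Quartz cron expression to its field name."""
    parts = expression.split()
    if not 6 <= len(parts) <= 7:
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}")
    return dict(zip(QUARTZ_FIELDS, parts))


def storage_path(search_path):
    """Location of the generated sitemap, per the /var/sitemaps note above."""
    return f"/var/sitemaps{search_path}/sitemap.xml"


if __name__ == "__main__":
    for field, value in describe_quartz("0 0 2 1/1 * ? *").items():
        print(f"{field:>12}: {value}")
    print(storage_path("/content/practice/us"))
```

Read this way, "0 0 2 1/1 * ? *" means second 0, minute 0, hour 2, every day starting from day 1 of the month: a daily run at 02:00, which matches the scheduler name.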

Generate Sitemap on publish

Enable the sitemap option under the Advanced tab of the page properties and publish the page.

Request the URL below to generate the sitemap on publish:

http://localhost:4503/content/practice/en.sitemap.xml
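The author and publish URLs follow the same pattern: the page path, then a selector, then the xml extension. As a small sketch of that pattern, the hypothetical helper sitemap_url below builds the URL for either instance, using the two selectors from the servlet configuration above.

```python
ALLOWED_SELECTORS = ("sitemap", "sitemap-index")


def sitemap_url(host, page_path, selector="sitemap"):
    """Build <host><page_path>.<selector>.xml for a sitemap root page."""
    if selector not in ALLOWED_SELECTORS:
        raise ValueError(f"unsupported selector: {selector}")
    return f"{host.rstrip('/')}/{page_path.strip('/')}.{selector}.xml"


if __name__ == "__main__":
    # Local author (4502) and publish (4503) instances, as used in this article.
    print(sitemap_url("http://localhost:4502", "/content/practice/en"))
    print(sitemap_url("http://localhost:4503", "/content/practice/en"))
    print(sitemap_url("http://localhost:4503", "/content/practice/en", "sitemap-index"))
```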

Dispatcher Update

The dispatcher needs a few small updates so that the sitemap can be accessed and rendered:

  1. Add the entry below to the dispatcher/src/conf.dispatcher.d/filters/filters.any file.
/0200 { /type "allow" /path "/content/*" /selectors '(sitemap-index|sitemap)' /extension "xml" }
  2. Allow the .xml extension in the rewrite rules file, dispatcher/src/conf.d/rewrites/rewrite.rules.
RewriteCond %{REQUEST_URI} (.html|.jpe?g|.png|.svg|.xml)$
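To reason about which requests the filter entry lets through, the rule can be approximated in a few lines of Python. This is a simplified emulation written for illustration, not the dispatcher's actual matching logic: it splits the last URL segment into name, selectors, and extension, then applies the same three conditions as the /0200 rule. The function names split_request and is_allowed are hypothetical.

```python
import re
from fnmatch import fnmatchcase


def split_request(url_path):
    """Split '/a/b.sel1.sel2.ext' into (resource path, selector string, extension)."""
    folder, _, last = url_path.rpartition("/")
    name, *rest = last.split(".")
    if not rest:
        # No dot in the last segment: no selectors and no extension.
        return url_path, "", ""
    *selectors, extension = rest
    return f"{folder}/{name}", ".".join(selectors), extension


def is_allowed(url_path):
    """Approximate the /0200 rule: path glob, selector regex, and extension."""
    path, selectors, extension = split_request(url_path)
    return (
        fnmatchcase(path, "/content/*")
        and re.fullmatch(r"(sitemap-index|sitemap)", selectors) is not None
        and extension == "xml"
    )


if __name__ == "__main__":
    for candidate in (
        "/content/practice/en.sitemap.xml",
        "/content/practice/en.sitemap-index.xml",
        "/content/practice/en.infinity.json",
        "/etc/practice.sitemap.xml",
    ):
        print(candidate, is_allowed(candidate))
```

Only the first two requests satisfy all three conditions, which is the intent of the rule: expose the sitemap selectors with the xml extension under /content and nothing else.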


About The Author

I am Pankaj Baagwan, a System Design Architect. A Computer Scientist by heart, process enthusiast, and open source author/contributor/writer. Advocates Karma. Love working with cutting edge, fascinating, open source technologies.

Stay tuned <3. Signing off for RAAM