15 October 2025

Creating a Google XML Sitemap using Oracle APEX. #JoelKallmanDay

Oracle APEX makes it easy to build modern web apps, but ensuring that your public pages are properly indexed by Google sometimes requires a little extra work. Recently, I was asked to generate a dynamic XML Sitemap for one such APEX app, and here’s how I went about it.

What is a Google Sitemap, and why is it important?

A Google Sitemap is a simple XML file, usually named sitemap.xml, that lists all the important URLs on your site along with metadata like update dates and priority. It helps search engines like Google discover and understand the structure of your site more efficiently, especially pages that may be hard to find via internal links or other navigation. Submitting it through Google Search Console gives you visibility. In short, it is a practical means to ensure faster, more reliable indexing of your site.

Our APEX application

For simplicity, we’ll ignore the non-essential parts and reduce the app to its three main pages.  The pages that we want to appear on the Google XML Sitemap are the detail pages. We want to be sure that each one of our thousands of articles are indexed.

  • A home page
  • A search page
  • A detailed page for each entry in the database (1,000s of entries)

Understanding the XML structure

The first step was to check Google’s documentation for the required XML structure:

Google Sitemap XML documentation

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/foo.html</loc>
    <lastmod>2022-06-04</lastmod>
  </url>
</urlset>

We quickly realized that, for our use-case, we also needed to include a News Sitemap:

Google News Sitemap documentation

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
  <url>
    <loc>http://www.mysite.org/ords/r/press/articles?p_slug=companies-a-b-in-merger-talks</loc>
    <lastmod>2022-06-04</lastmod>
    <news:news>
      <news:publication>
        <news:name>The Example Times</news:name>
        <news:language>en</news:language>
      </news:publication>
      <news:publication_date>2008-12-23</news:publication_date>
      <news:title>Companies A, B in Merger Talks</news:title>
    </news:news>
  </url>
</urlset>


Our Data Source: vw_articles View

For the purposes of this blog, I’ve replaced the actual view that we used with a view called vw_articles containing generated demo data. The view returns a list of news articles along with metadata such as their unique IDs, titles, publication dates, last update dates, etc.  A field called url_slug contains a descriptive but also unique, alternate key in a readable text form (e.g., tree-falls-in-a-forest, prime-minister-slips-on-banana) 

Name             Null?    Type          
---------------- -------- ------------- 
ARTICLE_ID       NOT NULL NUMBER        
TITLE            NOT NULL VARCHAR2(200) 
URL_SLUG         NOT NULL VARCHAR2(200) 
PUBLICATION_DATE NOT NULL DATE          
LAST_MODIFIED    NOT NULL DATE          
AUTHOR                    VARCHAR2(100) 
CATEGORY                  VARCHAR2(50)



Generating the XML per article

Now that we know what XML structure Google expects, let’s generate it dynamically using SQL. We'll start by generating the XML for each article. This will generate one row for each article. Each row will contain the XML describing that article.

select xmlelement("url",
         xmlelement("loc", 'http://www.mysite.org/ords/r/press/articles?p_slug=' 
                           || a.url_slug),
         xmlelement("lastmod", to_char(a.last_modified, 'YYYY-MM-DD')),
         xmlelement("changefreq", 
                     case
                       when a.last_modified > sysdate-15 
                         then 'daily'
                       when a.last_modified > sysdate-60 
                         then 'weekly'
                         else 'monthly'
                     end),
         xmlelement("priority", 
                     case
                       when a.last_modified > sysdate-15 
                         then '0.8'
                       when a.last_modified > sysdate-60 
                         then '0.6'
                         else '0.4'
                     end),
          xmlelement("news:news",
            xmlelement("news:publication",
              xmlelement("news:name",'The Best Daily News '||a.article_id),
              xmlelement("news:language",'en')
             ),  
            xmlelement("news:publication_date", a.publication_date),
            xmlelement("news:title", a.title)
           )
         )
  from vw_articles a;

You may notice that we’ve set each entry’s change frequency (changefreq) and priority dynamically - they are set based on the number of days since the article was last updated.

The SQL statement above gives us something that looks like this for each entry:

<url>
 <loc>http://www.mysite.org/ords/r/press/articles?p_slug=olympic-athletes-break-three-world-records</loc>
 <lastmod>2025-05-18</lastmod>
 <changefreq>monthly</changefreq>
 <priority>0.4</priority>
 <news:news>
  <news:publication>
   <news:name>The Best Daily News 4</news:name>
   <news:language>en</news:language>
  </news:publication>
  <news:publication_date>2025-05-18</news:publication_date>
  <news:title>Olympic Athletes Break Three World Records</news:title>
 </news:news>
</url>


Aggregating entries into a sitemap

Rather than generating multiple XML fragments, we’d like to aggregate each entry into a single XML document. To do this, we first use the XMLAGG function so that all the entries are returned as a single, aggregated row. We would like the entries to be sorted in descending order by last_modified, so we add order by a.last_modified desc to the end of the XMLAGG function.

We’ll enclose these aggregated entries inside a <urlset> tag to create a complete, valid XML document. Finally, we’ll use xmlserialize to convert our XMLTYPE output to an indented CLOB. We’ve also added the XML declaration at the start of the document.

select '<?xml version="1.0" encoding="UTF-8"?>' || chr(10) ||
   xmlserialize(
     content
     xmlelement(
           "urlset", 
           xmlattributes
              ('http://www.sitemaps.org/schemas/sitemap/0.9' as "xmlns",
               'http://www.google.com/schemas/sitemap-news/0.9' as "xmlns:news"),
           xmlagg(
             xmlelement("url",
               xmlelement("loc", 'http://www.mysite.org/ords/r/press/articles?p_slug=' 
                                 || a.url_slug),
               xmlelement("lastmod", to_char(a.last_modified, 'YYYY-MM-DD')),
               xmlelement("changefreq", 
                          case
                            when a.last_modified > sysdate-15 
                              then 'daily'
                            when a.last_modified > sysdate-90 
                              then 'weekly'
                              else 'monthly'
                          end),
               xmlelement("priority", 
                          case
                            when a.last_modified > sysdate-15 
                              then '0.8'
                            when a.last_modified > sysdate-90 
                              then '0.6'
                              else '0.4'
                          end),
                xmlelement("news:news",
                  xmlelement("news:publication",
                    xmlelement("news:name",'The Very Best Daily News'),
                    xmlelement("news:language",'en')
                   ),  
                  xmlelement("news:publication_date", a.publication_date),
                  xmlelement("news:title", a.title)
                 )
               )
            )
        )
     as clob indent size = 2
   ) as xml_output
from vw_articles a;

 For very large datasets (tens of thousands of URLs), consider chunking your sitemap into multiple sitemap files and generating a sitemap index file, as per Google’s recommendation (maximum 50,000 URLs per sitemap).

Final XML

This gives us our final XML, which is one <urlset> that encompasses all the <url> entries that we want to appear on the sitemap:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
  <url>
    <loc>http://www.mysite.org/ords/r/press/articles?p_slug=climate-change-policy-update</loc>
<lastmod>2025-08-17</lastmod> <changefreq>daily</changefreq> <priority>0.8</priority> <news:news> <news:publication> <news:name>The Very Best Daily News</news:name> <news:language>en</news:language> </news:publication> <news:publication_date>2025-08-16</news:publication_date> <news:title>Climate Change Policy Update</news:title> </news:news> </url> ... ... thousands of entries ... </urlset>

You can validate your sitemap using the W3C XML Validator or Google’s own testing tools before submitting.

Serving the Sitemap from APEX

  1. Create a blank APEX page with a minimal template. Call this new page sitemap with an alias of sitemap. Make sure that this page is a public page.
Screenshot showing sitemap output XML
  1. I strongly recommend using packages when developing any substantial piece of PL/SQL. For this example, I’ve created a package that contains the SQL statement to generate the XML shown above. The package is called googlesitemaps and you can download it from here.

    Basically, this package's procedure generateMainSitemap selects the XML <urlset> into a CLOB and then outputs that CLOB to the HTTP stream.

   procedure generateMainSitemap is
      l_clob_output  clob;  -- Holds the generated XML sitemap content
   begin
     ...
     ...   (removed for brevity)
     ...   A section that retrieves the XML in a CLOB variable 
     ...   called l_clob_output using the SQL shown earlier.
     ...   You can download this using the link above
     ...
     
     -- Set HTTP response content type to XML
     owa_util.mime_header('application/xml', false);
     
     -- Specify content length for client optimization
     htp.p('Content-Length: ' || dbms_lob.getlength(l_clob_output));
     
     -- Close HTTP headers
     owa_util.http_header_close;
     
     -- Output the XML content without HTML escaping (preserve XML structure)
     apex_util.prn(p_clob => l_clob_output, p_escape => false);

   end generateMainSitemap;
  1. Once the package is in place, create a pre-rendering, before-header process called “Output the XML sitemap”. Inside the begin ... end, this process will contain two lines.

    The first one calls the googlesitemaps.generateMainSitemap procedure to output the XML directly to the HTTP stream.

    The second one calls apex_application.stop_apex_engine to stop all APEX processing dead in its tracks. This ensures that nothing else is output to the HTTP stream.


Submitting the sitemap to the Google Search Console

Now we're ready to submit the sitemap to Google.  
Google will automatically search for a file called "sitemap.xml" but, unfortunately, with the current version of APEX (24.2), we can't add a dot in a page alias.  This means that we can't "pretend" to have a file called "sitemap.xml".  

The workaround to this is to submit your new sitemap directly to Google using the Google Search Console for your site.  The search console looks something like this.


If we click on Sitemaps we’ll come to a page that looks like this:

Here, you simply have to enter the URL of your newly created sitemap page and submit it to Google. It won't be processed immediately, but it will be added to a queue for processing. In my experience, it seems to get processed within two or three days.

Conclusion

So, there we have it. With this approach, you can make your APEX applications more discoverable and ensure your content reaches a wider audience through proper search indexing.  I hope that this blog post helps some of you make your public-facing APEX applications gain a wider audience.

Happy APEXing!!

No comments:

Post a Comment