14 December 2020

Ensuring that XMLTYPE to JSON transformations create JSON arrays - even when there is only a single element.

Recently, my colleagues and I were transforming a large quantity of XML to JSON via the APEX_JSON package.  We wanted any XML tag that was repeated more than once to be transformed into a JSON array.  However there was an issue with repeatable tags, whenever there was only one instance of the element then it wasn't created as an array.  

Here's a very simplified example to illustrate the issue.

<parents>
  <parent>
    <name>Aidan</name>
    <children>
      <child>Aoife</child>    <=== two entries
      <child>Fionn</child>
    </children>
  </parent>
  <parent>
    <name>Eamon</name>
    <children>
      <child>Saoirse</child>  <=== a single entry
    </children>
  </parent>
</parents>

When an XML document such as the one above is converted to JSON, the two children of Aidan are represented as an array containing two elements - but the single child of Eamon is not.  
This can cause some issues for JSON parsers consuming the data, as they now have to cater for two scenarios (with-array and without-array) to correctly extract the data. 
[
   {
      "name":"Aidan",
      "children":[         <=== this is what we want, an array.
         "Aoife",
         "Fionn"
      ]
   },
   {
      "name":"Eamon",
      "children":{         <=== we wanted an array here too :(
         "child":"Saoirse"
      }
   }
]

This piece of PL/SQL code illustrates the issue.
set serveroutput on
declare
  v_xml    xmltype;
  v_json   clob;
begin
  v_xml := xmltype(
  '<parents>
      <parent>
        <name>Aidan</name>
        <children>
          <child>Aoife</child>
          <child>Fionn</child>
        </children>
      </parent>
      <parent>
        <name>Eamon</name>
        <children>
          <child>Saoirse</child>
        </children>
      </parent>
    </parents>');

  apex_json.initialize_clob_output;
  apex_json.write(v_xml);
  v_json := apex_json.get_clob_output;
  apex_json.free_output;  
  
  dbms_output.put_line(v_json);
end;
/
[
 {"name":"Aidan",
  "children":["Aoife","Fionn"]},    <=== is an array
 {"name":"Eamon",
  "children":{"child":"Saoirse"}}   <=== is not an array :(
]
We tried several ways to work around this. For example, one unsatisfactory solution that we tried was to create of an empty <child/> tag whenever there was only one entry and then to subsequently strip it out of the resulting JSON.  

However, a quick communication with the ever-responsive APEX Development Team yielded a more elegant solution... 

Apparently, there are some naming conventions used by the DB XML to JSON generators.  Amongst these are :
  • if an XML node name is "rowset", then it always maps to an JSON array. 
  • if an XML node has a sub-node that ends in "_row", then it also always maps to an JSON array.
Armed with this knowledge, we modified our XML slightly and renamed the <child> tag to <child_row> so that our PL/SQL example now becomes:
set serveroutput on
declare
  v_xml    xmltype;
  v_json   clob;
begin
  v_xml := xmltype(
  '<parents>
      <parent>
        <name>Aidan</name>
        <children>
          <child_row>Aoife</child_row>
          <child_row>Fionn</child_row>
        </children>
      </parent>
      <parent>
        <name>Eamon</name>
        <children>
          <child_row>Saoirse</child_row>
        </children>
      </parent>
    </parents>');

  apex_json.initialize_clob_output;
  apex_json.write(v_xml);
  v_json := apex_json.get_clob_output;
  apex_json.free_output;  
  
  dbms_output.put_line(v_json);
end;
/
[ 
 {"name":"Aidan",
  "children":["Aoife","Fionn"]},  <=== is an array
 {"name":"Eamon",
  "children":["Saoirse"]}         <=== is also an array!!
]

Now we can consistently generate arrays without having to resort to complex pre- and post-processing.  The only little bit of pre-processing that we need to do is ensuring that any XML tags that must become JSON arrays end with "_row".

Many thanks to the great APEX Development Team and especially to Christian Neumueller for the support and help that they gave us for this issue!!

The above was run and tested on the Oracle Autonomous Database Cloud using an "Always Free" database that is running database version 19c at the time of writing.