19 August 2022

Correctly sorting data containing accented characters (a.k.a. the Côte d'Ivoire and Türkiye issue)

This will be short and sweet. Hopefully it will be useful to some of you.

As you may or may not know, the country formerly known as Turkey recently changed its official name to Türkiye , even in English. However, when sorting by country name, it should appear between Tunisia and Turkmenistan rather than later in the list. (Official U.N. sorted list can be found here.)

In the weeks following this change, I was asked on at least five separate occasions to look at sorting issues that arose. Please accept my apologies in advance if you already know what I'm about to write. I thought it was common knowledge, but recent experience has shown otherwise.

Below, I'll explain how I correctly sort by country within Oracle. This technique also applies to any other data (names, etc.) that need to take accented characters into account when sorting. I hope it's useful to some of you.

Note: What comes below applies principally to Latin alphabets. I haven’t tested on non-Latin alphabets, but I suspect a similar approach exists.

First, demonstrating the incorrect sort

This is what some use by default — the problem is that accented characters are sorted after non-accented letters.

So we can see that both Côte d'Ivoire and Türkiye are incorrectly sorted here and are placed after their peers.

SQL> select country_name from vw_temp_countries order by country_name;

COUNTRY_NAME
-------------
Cook Islands
Costa Rica
Croatia
Cuba
Curaçao
Cyprus
Czechia
Côte d'Ivoire
Tunisia
Turkmenistan
Tuvalu
Türkiye

13 rows selected.

Now, a correct sort

With this sort, accented characters are taken into account and sorted appropriately. The Oracle NLSSORT function ensures the correct order. The NLS_SORT parameter is set to swiss, as this setting accommodates most Latin-based languages.

SQL> select country_name 
         from vw_temp_countries
         order by nlssort(country_name, 'NLS_SORT = swiss');

COUNTRY_NAME
-------------
Cook Islands
Costa Rica
Côte d'Ivoire
Croatia
Cuba
Curaçao
Cyprus
Czechia
Tunisia
Türkiye
Turkmenistan
Tuvalu

13 rows selected.

That's it — as promised, short and sweet! 😀