String Similarity and Confusion – What we know

The issue of string similarity and confusion in the next round of the gTLD programme has remained challenging as work has carried into implementation. We already know of some changes to the rules from the last round in 2012: IDN variants of both existing TLDs and new applications will need to be taken into consideration during the string similarity evaluation, and singulars and plurals in the same language are expected to be placed in contention sets to limit end-user confusion (provided that someone flags the singular and plural to ICANN). However, there is also uncertainty because the outcomes of String Confusion Objections in 2012 were perceived as mixed. Based on the implementation work so far and the outcomes of the previous String Confusion Objections, we have set out what we know about string similarity and confusion in the next round.

What is Confusion?

There are two mechanisms by which string similarity can be identified – the String Similarity Evaluation conducted by ICANN, or an Objection being raised against a string. In 2012, the standard for ICANN’s String Similarity Evaluation was visual resemblance, while the standard for objections was resemblance more broadly.

In every case where a String Confusion Objection led to a finding of confusing similarity, the panel found similarity of meaning as well as visual and aural similarity.

Identical strings are Similar

This may be stating the obvious, but if more than one party applies for a string, then (subject to the rules being developed which may allow applicants to switch at the outset to a pre-selected alternative string) all applications will be placed in a contention set. This was the rule in 2012 and is being carried forward for the next round.

Variants

Variant strings could not be delegated in the 2012 round. Any applications that were variants of each other would be placed in a contention set together, from which only one could succeed. In future rounds, variant strings can be delegated, but only to the same registry operator. There are also rules around which variants can be delegated – for instance, a variant string that uses two different scripts cannot be delegated.

The Label Generation Rules define which code points are considered variants of each other. For example, the Label Generation Rules for Latin Script do not define e and é as variants of each other, so café and cafe are not considered variants (had they been variants, they could co-exist as TLDs only if run by the same registry operator). Rather, the question would be whether the two are confusingly similar. If café and cafe are found to be confusingly similar, then only one of the strings could be delegated. If they are not found to be confusingly similar, both could be delegated, even to different registry operators.
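The distinction between "variant" and "confusingly similar" can be illustrated with a minimal Python sketch. It is only an illustration: `VARIANT_PAIRS` and `are_variants` are hypothetical names, the table is a stand-in rather than real Label Generation Rules data, and actual Label Generation Rules are richer than a simple character-by-character comparison.

```python
import unicodedata

def code_points(label):
    """List (U+XXXX, Unicode name) for each character of the NFC-normalised label."""
    normalised = unicodedata.normalize("NFC", label)
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in normalised]

# HYPOTHETICAL stand-in for a variant table. It is left empty because, as the
# text above notes, the Latin script rules do not define e and é as variants.
VARIANT_PAIRS = set()

def are_variants(a, b, variant_pairs=VARIANT_PAIRS):
    """Simplified check: two different labels of equal length count as variants
    if every differing position is listed as a variant pair (an assumption
    made for illustration, not the real evaluation procedure)."""
    a = unicodedata.normalize("NFC", a)
    b = unicodedata.normalize("NFC", b)
    if a == b or len(a) != len(b):
        return False
    return all(x == y or frozenset({x, y}) in variant_pairs for x, y in zip(a, b))

if __name__ == "__main__":
    print(code_points("cafe")[-1])
    print(code_points("café")[-1])
    print(are_variants("café", "cafe"))
```

Because the two labels end in different code points and no variant relationship is defined between them, the sketch reports that they are not variants. Whether they are confusingly similar is a separate question that a lookup of this kind cannot answer.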

Singulars and Plurals

After extensive discussions between the GNSO Council and members of the ICANN Board, it is anticipated that singulars and plurals of the same word in the same language will be treated as if they were confusingly similar. The plan is for ICANN to rely on crowdsourcing: anyone who identifies strings as the singular and plural of the same word in a language (irrespective of the actual intended language of the applications) can notify ICANN, providing evidence from a dictionary. This marks a change from the last round, where singular and plural similarity was decided in the Objection process.

Language and Script

In the 2012 round, of the 10 findings of string confusion arising from String Confusion Objections, all were in the same language and script. For there to be a finding of string confusion, panels expected visual, aural and meaning similarity to be established by the objector.

String Similarity Table 1

There were three disputes that considered strings in the same script but in different languages (BOM/COM; HOTEIS/HOTEL; HOTELES/HOTEL), all of which were found not to be confusing. In the next round, the dictionary standard for singulars and plurals could potentially change this: hotel and hoteles are the singular and plural in Spanish, for example, while hotel and hotéis (with an accent) are the singular and plural in Portuguese.

String Similarity Table 2

It only takes one (character)

Of the String Confusion Objections in 2012, nearly half considered strings that were identical but for one character being changed, or identical but for one character being added or removed. With no new policy on this, we can expect similar outcomes to the 2012 round, the exception of course being singular/plurals that fall into this category.
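The categories used in this section can be made concrete with a small Python sketch. The function name `classify_pair` and the category labels are illustrative choices for this article, not part of any ICANN procedure, and the sketch only describes how two strings differ; it says nothing about whether a panel would find them confusing.

```python
def classify_pair(a, b):
    """Describe how two strings differ: one character changed, one character
    added/removed (front, end or middle), identical, or something else."""
    a, b = a.upper(), b.upper()
    if a == b:
        return "identical"
    if len(a) == len(b):
        diffs = sum(1 for x, y in zip(a, b) if x != y)
        return "one character changed" if diffs == 1 else "other"
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    if len(longer) - len(shorter) != 1:
        return "other"
    # Positions in the longer string whose removal yields the shorter string.
    # Repeated letters can make more than one position valid; front and end
    # are preferred over middle in that case.
    positions = [i for i in range(len(longer))
                 if longer[:i] + longer[i + 1:] == shorter]
    if not positions:
        return "other"
    if 0 in positions:
        return "one character added at the front"
    if len(longer) - 1 in positions:
        return "one character added at the end"
    return "one character added in the middle"

if __name__ == "__main__":
    for pair in [("CAM", "COM"), ("EMERCK", "MERCK"), ("ITV", "TV"), ("HOTELES", "HOTEL")]:
        print(pair, "->", classify_pair(*pair))
```

Pairs such as HOTELES/HOTEL or SHOPPING/SHOP differ by more than one character, so they fall outside the one-character categories and are reported as "other".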

One letter changed

Changing one letter in a string resulted in only one finding of string confusion (CAM/COM), which was not upheld upon review.

String Similarity Table 3

One character added/removed

Strings where the difference was one character added or removed had mixed outcomes – 9 were found to be confusing, while 11 were found not to be confusing. Of the 9 found to be confusing, 8 were plural/singulars, while only one had the additional letter at the front. Of the 11 found not to be confusing, 4 were plural/singulars, 6 had the additional letter at the front, and 1 had an additional letter in the middle. So, excluding plural/singulars, there were 8 decisions to consider. Of these 8, 7 were found not to be confusing.

String Similarity Table 4
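The arithmetic behind the one-character-added/removed figures can be checked with a few lines of Python. The counts are copied from the paragraph above; the dictionary keys and the helper name `without_plurals` are illustrative labels only.

```python
# Counts taken from the text: outcomes of 2012 objections where the strings
# differed by one character added or removed.
found_confusing = {"plural/singular": 8, "letter added at front": 1}
found_not_confusing = {
    "plural/singular": 4,
    "letter added at front": 6,
    "letter added in middle": 1,
}

def without_plurals(counts):
    """Drop the plural/singular category, which new policy is expected to cover."""
    return {k: v for k, v in counts.items() if k != "plural/singular"}

total_confusing = sum(found_confusing.values())
total_not_confusing = sum(found_not_confusing.values())
remaining_confusing = sum(without_plurals(found_confusing).values())
remaining_not_confusing = sum(without_plurals(found_not_confusing).values())
remaining_total = remaining_confusing + remaining_not_confusing

if __name__ == "__main__":
    print(f"All decisions: {total_confusing} confusing, {total_not_confusing} not confusing")
    print(f"Excluding plural/singulars: {remaining_not_confusing} of "
          f"{remaining_total} found not to be confusing")
```

Once plural/singulars are set aside, the remaining decisions lean heavily towards a finding of no confusion, which is the basis for expecting similar outcomes next time.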

Objecting against all applications for a string

In the 2012 round, a single String Confusion Objection could not be filed against all applicants for a particular string. Rather, an objector had to file a separate objection against each application for a string. These separate objections would not necessarily be considered together. This was best illustrated by the SHOPPING/SHOP objection. While two specific applications for SHOPPING and SHOP were found to be confusing, other applications for those strings did not receive objections and so were able to proceed, leading to both .SHOPPING and .SHOP being delegated as gTLDs. This kind of inconsistency should not be an issue in the future, as objectors will be able to file a single string confusion objection against all applications for a string.

Trademarks

Trademarks were a basis for establishing that strings are not similar, but panellists varied in which elements they considered to be similar. There were three String Confusion Objections which concerned trademarked strings – MERCK/EMERCK, GBIZ/BIZ, and ITV/TV. The MERCK/EMERCK case differs from GBIZ/BIZ and ITV/TV in that MERCK and EMERCK are both trademarks. In MERCK/EMERCK, the panellist found that there was visual similarity, but not aural similarity or similarity of meaning.

In the other two cases, the disputes were between an existing TLD and an applicant applying for a dotBrand – BIZ and TV, which were both existing TLDs, are not trademarks, while GBIZ and ITV are both trademarks. Interestingly, in both cases the panellist found similarity of meaning, but neither visual nor aural similarity.

String Similarity Table 5

A review of the String Confusion Objections from 2012 shows that the outcomes are more consistent than a first glance might suggest. For two of the key areas of concern – filing an objection against all applications for a string rather than on an application-by-application basis, and singular/plurals – there is new policy which should clarify and standardise the outcomes.

