public inbox for libc-help@sourceware.org
* getaddrinfo chokes at hostnames containing "emoji" characters
From: Name Surname @ 2018-05-16  8:40 UTC
  To: libc-help

Greetings everyone.

I recently bought a domain name containing "emoji" characters, as a
novelty and in order to run some experiments. I tried to get the IP
address associated with it using getaddrinfo, but the call fails with
"Name or service not known". The same thing happens with any program
that uses glibc for name resolution. I understand that emoji domains are
not valid according to IDNA2008. However, some ccTLDs sell them, they
were supported under IDNA2003, and web browsers resolve them normally
according to IDNA2003 (at least Firefox does).

Is this a bug or a feature?

Should you want to experiment, an example of a valid emoji domain name
is "📙.la".
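
For anyone who wants to reproduce this without relying on the resolver's
own IDN handling, here is a minimal Python sketch that builds the
ASCII-compatible ("xn--") form of the name by hand. It uses only the
standard library's punycode codec and deliberately skips the mapping and
validation steps that IDNA2003 or IDNA2008 would apply, so it is an
illustration and not a ToASCII implementation. The helper name to_ace is
made up for this example.

```python
def to_ace(hostname: str) -> str:
    """Encode each non-ASCII label as "xn--" + Punycode (no IDNA validation)."""
    labels = []
    for label in hostname.split("."):
        if label.isascii():
            # Pure ASCII labels are passed through unchanged.
            labels.append(label)
        else:
            # Raw Punycode (RFC 3492) of the label, prefixed with the ACE marker.
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


if __name__ == "__main__":
    print(to_ace("📙.la"))  # xn--yt8h.la
```

The result is plain ASCII, so it can be handed to getaddrinfo like any
other host name; whether it resolves then depends only on the DNS zone.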

* Re: getaddrinfo chokes at hostnames containing "emoji" characters
From: Florian Weimer @ 2018-05-16  9:10 UTC
  To: Name Surname, libc-help

On 05/16/2018 10:40 AM, Name Surname wrote:
> Is this a bug or a feature?

In the near future, glibc will use the system libidn2 library to 
implement AI_IDN getaddrinfo support.  You will have to convince the 
libidn2 maintainers to enable Emoji support (by default), but as long as 
there is no published standard for that at all (perhaps with the 
exception of Unicode TR46 transitional mode, which is not recommended), 
this seems difficult.

Thanks,
Florian

* Re: getaddrinfo chokes at hostnames containing "emoji" characters
From: Name Surname @ 2018-05-16  9:49 UTC
  To: libc-help

Florian Weimer wrote:
> In the near future, glibc will use the system libidn2 library to 
> implement AI_IDN getaddrinfo support.  You will have to convince the 
> libidn2 maintainers to enable Emoji support (by default), but as long as 
> there is no published standard for that at all (perhaps with the 
> exception of Unicode TR46 transitional mode, which is not recommended), 
> this seems difficult.

Would it not be possible to have glibc look the domain up according to
IDNA2008 first and, if that fails, look it up again using the
transitional mode? That seems to be what web browsers do, and it is most
probably what most end users would expect to happen. libidn2 has a
section in its documentation on this:
https://libidn.gitlab.io/libidn2/manual/libidn2.html#Converting-with-backwards-compatibility
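
The fallback order I am asking about could be sketched roughly as
follows. This is Python, not glibc or libidn2 code, and both converters
are stand-ins invented for the example: strict_to_ascii only imitates
IDNA2008 by rejecting symbol code points (Unicode general category "So",
which covers most emoji), and lenient_to_ascii is bare Punycode without
any TR46 mapping. Only the try-strict-then-fall-back control flow is the
point.

```python
import unicodedata


def _ace(label: str) -> str:
    # Bare Punycode with the ACE prefix; no mapping or normalization.
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def strict_to_ascii(hostname: str) -> str:
    """Stand-in for an IDNA2008 conversion: refuse symbol code points."""
    for ch in hostname:
        if unicodedata.category(ch) == "So":
            raise ValueError(f"disallowed code point U+{ord(ch):04X}")
    return ".".join(_ace(label) for label in hostname.split("."))


def lenient_to_ascii(hostname: str) -> str:
    """Stand-in for a transitional conversion: accept anything."""
    return ".".join(_ace(label) for label in hostname.split("."))


def to_ascii_with_fallback(hostname: str) -> tuple[str, str]:
    """Return (ascii_name, mode), trying the strict conversion first."""
    try:
        return strict_to_ascii(hostname), "strict"
    except ValueError:
        return lenient_to_ascii(hostname), "fallback"
```

Returning the mode alongside the name lets the caller see (and log) when
the lenient path was taken, which seems useful given that the fallback
is the contentious part.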


* Re: getaddrinfo chokes at hostnames containing "emoji" characters
From: Name Surname @ 2018-05-16 14:04 UTC
  To: libc-help

Florian Weimer wrote:
> In the near future, glibc will use the system libidn2 library to 
> implement AI_IDN getaddrinfo support.  You will have to convince the 
> libidn2 maintainers to enable Emoji support (by default), but as long as 
> there is no published standard for that at all (perhaps with the 
> exception of Unicode TR46 transitional mode, which is not recommended), 
> this seems difficult.

It seems that, according to the WHATWG URL standard, IDNs should be 
processed as per IDNA2008:

 > Let result be the result of running Unicode ToASCII with
 > domain_name set to domain, UseSTD3ASCIIRules set to beStrict,
 > CheckHyphens set to false,
 > CheckBidi set to true, CheckJoiners set to true,
 > *processing_option set to Nontransitional_Processing*,
 > and VerifyDnsLength set to beStrict.

Source: https://url.spec.whatwg.org/#idna

(Emphasis mine)

If I am understanding the standard correctly, then discussion of this 
matter is moot, as this implies that emoji domains are not even 
considered valid URLs.

* Re: getaddrinfo chokes at hostnames containing "emoji" characters
From: Florian Weimer @ 2018-05-16 14:09 UTC
  To: Name Surname, libc-help

On 05/16/2018 04:03 PM, Name Surname wrote:
> It seems that, according to the WHATWG URL standard, IDNs should be
> processed as per IDNA2008:
> 
>   > Let result be the result of running Unicode ToASCII with
>   > domain_name set to domain, UseSTD3ASCIIRules set to beStrict,
>   > CheckHyphens set to false,
>   > CheckBidi set to true, CheckJoiners set to true,
>   > *processing_option set to Nontransitional_Processing*,
>   > and VerifyDnsLength set to beStrict.
> 
> Source: https://url.spec.whatwg.org/#idna
> 
> (Emphasis mine)
> 
> If I am understanding the standard correctly, then discussion of this
> matter is moot, as this implies that emoji domains are not even
> considered valid URLs.

Yes, Firefox implements something else.  It generates a DNS request for 
xn--nmchen_2-0za.wildcard.t.enyo.de. from 
<http://nämchen_2.wildcard.t.enyo.de/>, which is not allowed according 
to UseSTD3ASCIIRules.  This is probably a specification bug.
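
To make the UseSTD3ASCIIRules point concrete: with that flag set, a
label may contain only ASCII letters, digits and hyphens, so the
underscore in the name above is what violates it. A rough Python sketch
of that check follows. The function name is made up for the example, and
it covers only the character-set part of the rule, not the restriction
on leading and trailing hyphens or the rest of TR46 validation.

```python
import string

# Letters, digits and hyphen: the "LDH" set allowed in STD3 host name labels.
LDH = set(string.ascii_letters + string.digits + "-")


def violates_std3_charset(ascii_label: str) -> list[str]:
    """Return the characters in an ASCII label that fall outside LDH."""
    return [ch for ch in ascii_label if ch not in LDH]


if __name__ == "__main__":
    # The label from the DNS request quoted above.
    print(violates_std3_charset("xn--nmchen_2-0za"))  # ['_']
```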

But based on what I understand, IDNA with TR46 non-transitional 
processing does not actually allow emojis.

Thanks,
Florian

* Re: getaddrinfo chokes at hostnames containing "emoji" characters
From: Name Surname @ 2018-05-16 18:20 UTC
  To: libc-help

Florian Weimer wrote:
> Yes, Firefox implements something else.  It generates a DNS request for 
> xn--nmchen_2-0za.wildcard.t.enyo.de. from 
> <http://nämchen_2.wildcard.t.enyo.de/>, which is not allowed according 
> to UseSTD3ASCIIRules.  This is probably a specification bug.

 > But based on what I understand, IDNA with TR46 non-transitional
 > processing does not actually allow emojis.

This is true.

It appears, though, that WHATWG changed their URL standard to recommend
Nontransitional_Processing quite recently (2017-02-20). Before that
date, they recommended Transitional_Processing. I suppose that, given
enough time, the confusion will naturally clear itself up. It certainly
has cleared up for me :).
(Reference:
https://github.com/whatwg/url/commit/f4d84a52e67b154b2d11e04889fe0a35a029c833)

Thanks for helping me out.
