* getaddrinfo chokes at hostnames containing "emoji" characters
From: Name Surname @ 2018-05-16 8:40 UTC
To: libc-help

Greetings everyone.

I recently bought a domain name containing "emoji" characters, as a
novelty and in order to do some experiments. I tried getting the IP
address associated to it using getaddrinfo, however, it errs and returns
"Name or service not known". The same thing happens with any program
that uses glibc for name resolution. I understand that emoji domains are
not valid according to IDNA2008, however, some ccTLDs sell them, they
were supported according to IDNA2003, and web browsers resolve them
normally according to IDNA2003 (at least firefox does).

Is this a bug or a feature?

Should you want to experiment, an example of a valid emoji domain name is
"📙.la".
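For readers who want to experiment with the name above, here is a minimal Python sketch. It shows what an ASCII form of the emoji label looks like and how a getaddrinfo error can be reported instead of raised. The helper names `to_a_label` and `lookup` are hypothetical, and the conversion is plain Punycode with no IDNA validation at all, so it says nothing about whether a given IDNA version accepts the name.

```python
import socket


def to_a_label(label: str) -> str:
    """Return an ASCII form of one label using plain Punycode.

    Assumption: no IDNA mapping or validation is applied here.
    """
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def lookup(ascii_host: str):
    """Call getaddrinfo and return (addresses, error_text)."""
    try:
        infos = socket.getaddrinfo(ascii_host, None)
    except socket.gaierror as exc:
        # On glibc an unknown name is reported as EAI_NONAME,
        # whose message is "Name or service not known".
        return [], str(exc)
    return sorted({info[4][0] for info in infos}), None


# Convert the example name label by label; no network access happens here.
host = ".".join(to_a_label(part) for part in "📙.la".split("."))
print(host)
```

Passing `host` to `lookup` then returns either the resolved addresses or the resolver's error text, which makes it easy to compare the raw name against its ASCII form.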
* Re: getaddrinfo chokes at hostnames containing "emoji" characters
From: Florian Weimer @ 2018-05-16 9:10 UTC
To: Name Surname, libc-help

On 05/16/2018 10:40 AM, Name Surname wrote:
> Greetings everyone.
>
> I recently bought a domain name containing "emoji" characters, as a
> novelty and in order to do some experiments. I tried getting the IP
> address associated to it using getaddrinfo, however, it errs and returns
> "Name or service not known". The same thing happens with any program
> that uses glibc for name resolution. I understand that emoji domains are
> not valid according to IDNA2008, however, some ccTLDs sell them, they
> were supported according to IDNA2003, and web browsers resolve them
> normally according to IDNA2003 (at least firefox does).
>
> Is this a bug or a feature?

In the near future, glibc will use the system libidn2 library to
implement AI_IDN getaddrinfo support. You will have to convince the
libidn2 maintainers to enable Emoji support (by default), but as long as
there is no published standard for that at all (perhaps with the
exception of Unicode TR46 transitional mode, which is not recommended),
this seems difficult.

Thanks,
Florian
* Re: getaddrinfo chokes at hostnames containing "emoji" characters
From: Name Surname @ 2018-05-16 9:49 UTC
To: libc-help

Florian Weimer wrote:
> On 05/16/2018 10:40 AM, Name Surname wrote:
>> Greetings everyone.
>>
>> I recently bought a domain name containing "emoji" characters, as a
>> novelty and in order to do some experiments. I tried getting the IP
>> address associated to it using getaddrinfo, however, it errs and returns
>> "Name or service not known". The same thing happens with any program
>> that uses glibc for name resolution. I understand that emoji domains are
>> not valid according to IDNA2008, however, some ccTLDs sell them, they
>> were supported according to IDNA2003, and web browsers resolve them
>> normally according to IDNA2003 (at least firefox does).
>>
>> Is this a bug or a feature?
>
> In the near future, glibc will use the system libidn2 library to
> implement AI_IDN getaddrinfo support. You will have to convince the
> libidn2 maintainers to enable Emoji support (by default), but as long as
> there is no published standard for that at all (perhaps with the
> exception of Unicode TR46 transitional mode, which is not recommended),
> this seems difficult.
>
> Thanks,
> Florian

Is it not possible to have glibc look the domain up according to
IDNA2008 first and, if that fails, look it up using the transitional
mode? It seems to be what web browsers do, and is most probably what
most end users would expect to happen. libidn2 has a section in its
documentation regarding this:
https://libidn.gitlab.io/libidn2/manual/libidn2.html#Converting-with-backwards-compatibility
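The "strict first, then fall back" idea proposed above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: `to_ascii_strict` is a hypothetical stand-in that only imitates one IDNA2008 property (symbol characters, which include emoji, are rejected), and `to_ascii_permissive` is a hypothetical stand-in for a more lenient conversion that merely Punycode-encodes. Neither is a libidn2 call, and neither implements the real mapping and validation rules.

```python
import unicodedata


def _encode_label(label: str) -> str:
    # Plain Punycode with the ACE prefix; ASCII labels pass through.
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def to_ascii_strict(domain: str) -> str:
    """Hypothetical strict converter: reject Unicode symbol characters."""
    for ch in domain:
        if unicodedata.category(ch).startswith("S"):
            raise ValueError(f"symbol not permitted: U+{ord(ch):04X}")
    return ".".join(_encode_label(part) for part in domain.split("."))


def to_ascii_permissive(domain: str) -> str:
    """Hypothetical lenient converter: encode without validation."""
    return ".".join(_encode_label(part) for part in domain.split("."))


def to_ascii_with_fallback(domain: str):
    """Try the strict conversion first, then the lenient one."""
    try:
        return to_ascii_strict(domain), "strict"
    except ValueError:
        return to_ascii_permissive(domain), "fallback"


print(to_ascii_with_fallback("example.org"))
print(to_ascii_with_fallback("📙.la"))
```

Returning which path was taken lets a caller log or refuse fallback results, which matters because a lenient conversion can accept names that the strict rules deliberately exclude.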
* Re: getaddrinfo chokes at hostnames containing "emoji" characters
From: Name Surname @ 2018-05-16 14:04 UTC
To: libc-help

Florian Weimer wrote:
> On 05/16/2018 10:40 AM, Name Surname wrote:
>> Greetings everyone.
>>
>> I recently bought a domain name containing "emoji" characters, as a
>> novelty and in order to do some experiments. I tried getting the IP
>> address associated to it using getaddrinfo, however, it errs and returns
>> "Name or service not known". The same thing happens with any program
>> that uses glibc for name resolution. I understand that emoji domains are
>> not valid according to IDNA2008, however, some ccTLDs sell them, they
>> were supported according to IDNA2003, and web browsers resolve them
>> normally according to IDNA2003 (at least firefox does).
>>
>> Is this a bug or a feature?
>
> In the near future, glibc will use the system libidn2 library to
> implement AI_IDN getaddrinfo support. You will have to convince the
> libidn2 maintainers to enable Emoji support (by default), but as long as
> there is no published standard for that at all (perhaps with the
> exception of Unicode TR46 transitional mode, which is not recommended),
> this seems difficult.
>
> Thanks,
> Florian

It seems that, according to the WHATWG URL standard, IDNs should be
processed as per IDNA2008:

> Let result be the result of running Unicode ToASCII with
> domain_name set to domain, UseSTD3ASCIIRules set to beStrict,
> CheckHyphens set to false,
> CheckBidi set to true, CheckJoiners set to true,
> *processing_option set to Nontransitional_Processing*,
> and VerifyDnsLength set to beStrict.

Source: https://url.spec.whatwg.org/#idna

(Emphasis mine)

If I am understanding the standard correctly, then discussion of this
matter is moot, as this implies that emoji domains are not even
considered valid URLs.
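Of the ToASCII parameters quoted above, VerifyDnsLength is the easiest to illustrate in isolation. As far as I understand UTS #46, it requires every label to be 1 to 63 characters long and the whole name, not counting a trailing root dot, to be 1 to 253 characters long. The function name below is hypothetical and the sketch assumes its input is already in ASCII form.

```python
def verify_dns_length(ascii_domain: str) -> bool:
    """Check the DNS length limits on an already-ASCII domain name."""
    # A single trailing dot denotes the root label and is not counted.
    name = ascii_domain[:-1] if ascii_domain.endswith(".") else ascii_domain
    if not 1 <= len(name) <= 253:
        return False
    # Every label must be non-empty and at most 63 characters long.
    return all(1 <= len(label) <= 63 for label in name.split("."))


print(verify_dns_length("example.org"))
print(verify_dns_length("a" * 64 + ".org"))
```

The check runs on the converted name because Punycode encoding changes label lengths, so a short Unicode label can still exceed the limit once encoded.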
* Re: getaddrinfo chokes at hostnames containing "emoji" characters
From: Florian Weimer @ 2018-05-16 14:09 UTC
To: Name Surname, libc-help

On 05/16/2018 04:03 PM, Name Surname wrote:
> Florian Weimer wrote:
>> On 05/16/2018 10:40 AM, Name Surname wrote:
>>> Greetings everyone.
>>>
>>> I recently bought a domain name containing "emoji" characters, as a
>>> novelty and in order to do some experiments. I tried getting the IP
>>> address associated to it using getaddrinfo, however, it errs and returns
>>> "Name or service not known". The same thing happens with any program
>>> that uses glibc for name resolution. I understand that emoji domains are
>>> not valid according to IDNA2008, however, some ccTLDs sell them, they
>>> were supported according to IDNA2003, and web browsers resolve them
>>> normally according to IDNA2003 (at least firefox does).
>>>
>>> Is this a bug or a feature?
>>
>> In the near future, glibc will use the system libidn2 library to
>> implement AI_IDN getaddrinfo support. You will have to convince the
>> libidn2 maintainers to enable Emoji support (by default), but as long as
>> there is no published standard for that at all (perhaps with the
>> exception of Unicode TR46 transitional mode, which is not recommended),
>> this seems difficult.
>
> It seems that, according to the WHATWG URL standard, IDNs should be
> processed as per IDNA2008:
>
> > Let result be the result of running Unicode ToASCII with
> > domain_name set to domain, UseSTD3ASCIIRules set to beStrict,
> > CheckHyphens set to false,
> > CheckBidi set to true, CheckJoiners set to true,
> > *processing_option set to Nontransitional_Processing*,
> > and VerifyDnsLength set to beStrict.
>
> Source: https://url.spec.whatwg.org/#idna
>
> (Emphasis mine)
>
> If I am understanding the standard correctly, then discussion of this
> matter is moot, as this implies that emoji domains are not even
> considered valid URLs.

Yes, Firefox implements something else. It generates a DNS request for
xn--nmchen_2-0za.wildcard.t.enyo.de. from
<http://nämchen_2.wildcard.t.enyo.de/>, which is not allowed according
to UseSTD3ASCIIRules. This is probably a specification bug.

But based on what I understand, IDNA with TR46 non-transitional
processing does not actually allow emojis.

Thanks,
Florian
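The label in the Firefox example above can be reproduced with Python's built-in Punycode codec, and a simplified STD3 check shows why the underscore is the problem. The helper names are hypothetical, and the check is a reduced version of the STD3 rules: it only tests for the letter, digit, and hyphen repertoire, and ignores the restrictions on leading and trailing hyphens.

```python
import re

# Simplified STD3 repertoire: ASCII letters, digits and hyphen only.
_STD3_LABEL = re.compile(r"[A-Za-z0-9-]+")


def to_a_label(label: str) -> str:
    """Plain Punycode conversion of one label, without IDNA validation."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def violates_std3(ascii_label: str) -> bool:
    """Return True if the label contains characters outside the repertoire."""
    return _STD3_LABEL.fullmatch(ascii_label) is None


a_label = to_a_label("nämchen_2")
print(a_label)                 # xn--nmchen_2-0za
print(violates_std3(a_label))  # True, because of the underscore
```

Punycode copies ASCII characters through unchanged, so the underscore survives the conversion and has to be rejected by a separate validation step.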
* Re: getaddrinfo chokes at hostnames containing "emoji" characters
From: Name Surname @ 2018-05-16 18:20 UTC
To: libc-help

Florian Weimer wrote:
> On 05/16/2018 04:03 PM, Name Surname wrote:
>> Florian Weimer wrote:
>>> On 05/16/2018 10:40 AM, Name Surname wrote:
>>>> Greetings everyone.
>>>>
>>>> I recently bought a domain name containing "emoji" characters, as a
>>>> novelty and in order to do some experiments. I tried getting the IP
>>>> address associated to it using getaddrinfo, however, it errs and
>>>> returns
>>>> "Name or service not known". The same thing happens with any program
>>>> that uses glibc for name resolution. I understand that emoji domains
>>>> are
>>>> not valid according to IDNA2008, however, some ccTLDs sell them, they
>>>> were supported according to IDNA2003, and web browsers resolve them
>>>> normally according to IDNA2003 (at least firefox does).
>>>>
>>>> Is this a bug or a feature?
>>>
>>> In the near future, glibc will use the system libidn2 library to
>>> implement AI_IDN getaddrinfo support. You will have to convince the
>>> libidn2 maintainers to enable Emoji support (by default), but as long as
>>> there is no published standard for that at all (perhaps with the
>>> exception of Unicode TR46 transitional mode, which is not recommended),
>>> this seems difficult.
>
>> It seems that, according to the WHATWG URL standard, IDNs should be
>> processed as per IDNA2008:
>>
>> > Let result be the result of running Unicode ToASCII with
>> > domain_name set to domain, UseSTD3ASCIIRules set to beStrict,
>> > CheckHyphens set to false,
>> > CheckBidi set to true, CheckJoiners set to true,
>> > *processing_option set to Nontransitional_Processing*,
>> > and VerifyDnsLength set to beStrict.
>>
>> Source: https://url.spec.whatwg.org/#idna
>>
>> (Emphasis mine)
>>
>> If I am understanding the standard correctly, then discussion of this
>> matter is moot, as this implies that emoji domains are not even
>> considered valid URLs.
>
> Yes, Firefox implements something else. It generates a DNS request for
> xn--nmchen_2-0za.wildcard.t.enyo.de. from
> <http://nämchen_2.wildcard.t.enyo.de/>, which is not allowed according
> to UseSTD3ASCIIRules. This is probably a specification bug.
>
> But based on what I understand, IDNA with TR46 non-transitional
> processing does not actually allow emojis.
>
> Thanks,
> Florian

> But based on what I understand, IDNA with TR46 non-transitional
> processing does not actually allow emojis.

This is true. It appears, though, that WHATWG changed their URL standard
to recommend using Nontransitional_Processing quite recently
(20/02/2017). Before that date, they recommended using
Transitional_Processing. I suppose that, given enough time, the
confusion will naturally clear itself up. It certainly has cleared up
for me :).

(Reference:
https://github.com/whatwg/url/commit/f4d84a52e67b154b2d11e04889fe0a35a029c833)

Thanks for helping me out.