Subject: Re: getaddrinfo chokes at hostnames containing "emoji" characters
To: Name Surname, "libc-help@sourceware.org"
From: Florian Weimer
Date: Wed, 16 May 2018 14:09:00 -0000

On 05/16/2018 04:03 PM, Name Surname wrote:
> Florian Weimer wrote:
>> On 05/16/2018 10:40 AM, Name Surname wrote:
>>> Greetings everyone.
>>>
>>> I recently bought a domain name containing "emoji" characters, as a
>>> novelty and in order to do some experiments. I tried getting the IP
>>> address associated to it using getaddrinfo, however, it errs and returns
>>> "Name or service not known". The same thing happens with any program
>>> that uses glibc for name resolution. I understand that emoji domains are
>>> not valid according to IDNA2008, however, some ccTLDs sell them, they
>>> were supported according to IDNA2003, and web browsers resolve them
>>> normally according to IDNA2003 (at least firefox does).
>>>
>>> Is this a bug or a feature?
>>
>> In the near future, glibc will use the system libidn2 library to
>> implement AI_IDN getaddrinfo support.  You will have to convince the
>> libidn2 maintainers to enable Emoji support (by default), but as long as
>> there is no published standard for that at all (perhaps with the
>> exception of Unicode TR46 transitional mode, which is not recommended),
>> this seems difficult.
> It seems that, according to the WHATWG URL standard, IDNs should be
> processed as per IDNA2008:
>
> > Let result be the result of running Unicode ToASCII with
> > domain_name set to domain, UseSTD3ASCIIRules set to beStrict,
> > CheckHyphens set to false,
> > CheckBidi set to true, CheckJoiners set to true,
> > *processing_option set to Nontransitional_Processing*,
> > and VerifyDnsLength set to beStrict.
>
> Source: https://url.spec.whatwg.org/#idna
>
> (Emphasis mine)
>
> If I am understanding the standard correctly, then discussion of this
> matter is moot, as this implies that emoji domains are not even
> considered valid URLs.

Yes, Firefox implements something else.  It generates a DNS request for
xn--nmchen_2-0za.wildcard.t.enyo.de. from , which is not allowed
according to UseSTD3ASCIIRules.  This is probably a specification bug.

But based on what I understand, IDNA with TR46 non-transitional
processing does not actually allow emojis.

Thanks,
Florian
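
To make the UseSTD3ASCIIRules point concrete, here is a minimal Python
sketch. It is not a TR46 implementation: it only approximates the effect
of UseSTD3ASCIIRules on labels that are already ASCII, by rejecting
anything outside letters, digits, and hyphen. The helper names
non_ldh_labels and decode_a_label are made up for illustration, and the
decoding step relies on Python's built-in "punycode" codec.

```python
import re

# Approximation of UseSTD3ASCIIRules for ASCII labels: only letters,
# digits and hyphen (LDH) are accepted. Hyphen placement is not checked,
# matching CheckHyphens=false in the quoted WHATWG text.
_LDH = re.compile(r"[A-Za-z0-9-]+")


def non_ldh_labels(hostname):
    """Return the labels of an ASCII hostname that contain non-LDH characters."""
    if hostname.endswith("."):
        hostname = hostname[:-1]  # drop the root label of a fully qualified name
    return [label for label in hostname.split(".")
            if _LDH.fullmatch(label) is None]


def decode_a_label(label):
    """Decode an xn-- label to Unicode with the standard punycode codec.

    Labels without the xn-- prefix are returned unchanged. No IDNA
    validity checks are performed here.
    """
    if label.lower().startswith("xn--"):
        return label[4:].encode("ascii").decode("punycode")
    return label


if __name__ == "__main__":
    name = "xn--nmchen_2-0za.wildcard.t.enyo.de."
    # The first label contains "_", so it is reported as non-LDH.
    print(non_ldh_labels(name))
```

With the name from the DNS request above, the first label is the one
flagged, because of the underscore. Decoding it with decode_a_label
shows that the underscore is part of the Unicode label itself and is
simply carried through into the ASCII form.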