From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <32013319-a2af-d324-78bd-a227da91618b@sourceware.org>
Date: Fri, 13 May 2022 13:48:33 +0530
MIME-Version: 1.0
Subject: Re: [PATCH v3] wcrtomb: Make behavior POSIX compliant
To: Paul Eggert
Cc: adhemerval.zanella@linaro.org, fweimer@redhat.com, dickey@his.com, libc-alpha@sourceware.org
References: <20220505184348.3357550-3-siddhesh@sourceware.org> <20220512131503.764504-1-siddhesh@sourceware.org> <4ffe566a-8002-b574-daee-d6927b8ceaef@cs.ucla.edu>
From: Siddhesh Poyarekar
In-Reply-To: <4ffe566a-8002-b574-daee-d6927b8ceaef@cs.ucla.edu>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
List-Id: Libc-alpha mailing list

On 13/05/2022 10:26, Paul Eggert wrote:
> On 5/12/22 06:15, Siddhesh Poyarekar wrote:
>> +      switch (result)
>> +        {
>> +        case 4:
>> +          s[3] = buf[3];
>> +          /* Fall through.  */
>> +        case 3:
>> +          s[2] = buf[2];
>> +          /* Fall through.  */
>> +        case 2:
>> +          s[1] = buf[1];
>> +          /* Fall through.  */
>> +        case 1:
>> +          s[0] = buf[0];
>> +          break;
>> +        default:
>> +          memcpy (s, buf, result);
>> +        }
>
> For me with GCC 12.1 -O2 on x86-64, the above code generates 2
> comparisons and 3 conditional branches in the common case where RESULT
> is 1. Plus, GCC generates a jmp from the end of case 3 to the start of
> case 2 (which precedes case 3 in the machine code), which is a bit odd.

It seems to hit actual performance less, presumably because those
branches get predicted correctly most times. However we could simply
elide the check for the first byte like you've done.

> How about something like the following instead? This generates machine
> code with only one conditional branch - the one that decides whether to
> call memcpy - and this branch is never taken with glibc-supplied
> charmaps. (I'm assuming RESULT is at least 1.)
>
>   s[0] = buf[0];
>   if (2 <= result && result <= 4)
>     {
>       s[1] = buf[1];
>       memcpy (&s[result != 2], &buf[result != 2], 2);
>       s[result - 1] = buf[result - 1];
>     }
>   else
>     memcpy (s, buf, result);

This produces redundant loads and stores, which hits 2, 3, 4 byte copies
much worse; I reckon the condition in the subscript ends up preventing
the compiler from eliminating them.
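
To make the quoted suggestion concrete, here is a minimal standalone sketch of it together with a check against a plain memcpy. The names copy_overlapping and copy_overlapping_matches_memcpy are made up for illustration and are not part of the patch, and the sketch assumes result is at least 1, as the quoted text does.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper; the body follows the quoted suggestion.
   Assumes result >= 1 and that s and buf hold at least result bytes.  */
void
copy_overlapping (char *s, const char *buf, size_t result)
{
  s[0] = buf[0];
  if (2 <= result && result <= 4)
    {
      s[1] = buf[1];
      /* result == 2 copies bytes 0..1; result == 3 or 4 copies bytes 1..2.  */
      memcpy (&s[result != 2], &buf[result != 2], 2);
      s[result - 1] = buf[result - 1];
    }
  else
    memcpy (s, buf, result);
}

/* Return 1 if the helper agrees with memcpy for every length 1..max
   (max is clamped to the 16-byte scratch buffers), 0 otherwise.  */
int
copy_overlapping_matches_memcpy (size_t max)
{
  char buf[16], got[16], want[16];

  if (max > sizeof buf)
    max = sizeof buf;
  for (size_t i = 0; i < sizeof buf; i++)
    buf[i] = (char) ('a' + i);

  for (size_t n = 1; n <= max; n++)
    {
      memset (got, 0, sizeof got);
      memset (want, 0, sizeof want);
      copy_overlapping (got, buf, n);
      memcpy (want, buf, n);
      /* Comparing the whole buffer also catches writes past byte n - 1.  */
      if (memcmp (got, want, sizeof got) != 0)
        return 0;
    }
  return 1;
}
```

Tracing the stores by hand shows where the redundancy comes from: for result == 3, bytes 1 and 2 are each written twice, once by a single-byte assignment and once by the two-byte memcpy.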

If result >= 1 is a safe assumption (as it should be, I think), then we
could hoist the first-byte copy out of the switch like so:

      s[0] = buf[0];
      switch (result)
        {
        case 4:
          s[3] = buf[3];
          /* Fall through.  */
        case 3:
          s[2] = buf[2];
          /* Fall through.  */
        case 2:
          s[1] = buf[1];
          /* Fall through.  */
        case 1:
          break;
        default:
          memcpy (s, buf, result);
        }

This improves things ever so slightly for the 1 and 2 byte locale tests
in the microbenchmark, while keeping things the same for the rest.
There's still a redundant byte copy in the > 4 bytes case, but that
should be nothing compared to the PLT indirection.

What do you think?

Thanks,
Siddhesh
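
As a rough way to sanity-check the hoisted variant outside glibc, a standalone sketch might look like the following. The names copy_hoisted and copy_hoisted_matches_memcpy are hypothetical and not part of the patch; the sketch assumes result >= 1, since the first byte is stored unconditionally.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper around the hoisted switch.
   Assumes result >= 1 and that s and buf hold at least result bytes.  */
void
copy_hoisted (char *s, const char *buf, size_t result)
{
  s[0] = buf[0];
  switch (result)
    {
    case 4:
      s[3] = buf[3];
      /* Fall through.  */
    case 3:
      s[2] = buf[2];
      /* Fall through.  */
    case 2:
      s[1] = buf[1];
      /* Fall through.  */
    case 1:
      break;
    default:
      /* Lengths above 4 copy byte 0 a second time here.  */
      memcpy (s, buf, result);
    }
}

/* Return 1 if copy_hoisted agrees with memcpy for every length 1..max
   (max is clamped to the 16-byte scratch buffers), 0 otherwise.  */
int
copy_hoisted_matches_memcpy (size_t max)
{
  char buf[16], got[16], want[16];

  if (max > sizeof buf)
    max = sizeof buf;
  for (size_t i = 0; i < sizeof buf; i++)
    buf[i] = (char) ('A' + i);

  for (size_t n = 1; n <= max; n++)
    {
      memset (got, 0, sizeof got);
      memset (want, 0, sizeof want);
      copy_hoisted (got, buf, n);
      memcpy (want, buf, n);
      /* Comparing the whole buffer also catches writes past byte n - 1.  */
      if (memcmp (got, want, sizeof got) != 0)
        return 0;
    }
  return 1;
}
```

The check only covers correctness; it says nothing about the branch or store counts discussed above, which would need the generated assembly or the microbenchmark.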