From: "amonakov at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/96633] missed optimization?
Date: Mon, 17 Aug 2020 12:07:51 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96633

--- Comment #2 from Alexander Monakov ---
Martin added me to CC so I assume he wants me to chime in.
First of all, I find Nathan's behavior in that gcc@ thread distasteful at best
(but if you ask me, such responses are simply more harm than good; link:
https://lwn.net/ml/gcc/1a363f89-6f98-f583-e22a-a7fc02efb4db@acm.org/ ).

Next, statements like "I've determined the following is abot 12% faster" don't
carry weight without details such as the CPU family, structure of the benchmark
and the workload. Obviously, on input that lacks whitespace GCC's original code
is faster as the initial branch is 100% predictable. Likewise, if the input was
taken from /dev/random, the 12% figure is irrelevant to real-world uses of such
code. What the benchmark is doing with the return value of the function also
matters a lot.

With that out of the way: striving to get efficient branchless code on this
code is not very valuable in practice, because the caller is likely to perform
a conditional branch on the result anyway. So making isWhitespace branchless
simply moves the misprediction cost to the caller, making the overall code
slower. (but of course such considerations are too complex for the compiler's
limited brain)

In general such "bitmask tests" will benefit from the BT instruction on x86
(not an extension, was in the ISA since before I was born), plus CMOV to get
the right mask if it doesn't fit in a register.

For 100% branchless code we want to generate code similar to:

char is_ws(char c)
{
    unsigned long long mask = 1ll<<' ' | 1<<'\t' | 1<<'\r' | 1<<'\n';
    unsigned long long v = c;
    if (v > 32)
#if 1
        mask = 0;
#else
        return 0;
#endif
    char r;
    asm("bt %1, %2; setc %0" : "=r"(r) : "r"(v), "r"(mask));
    return r;
}

        movsbq  %dil, %rax
        movl    $0, %edx
        movabsq $4294977024, %rdi
        cmpq    $33, %rax
        cmovnb  %rdx, %rdi
        bt %rax, %rdi; setc %al
        ret

(note we get %edx zeroing suboptimal, should have used xor %edx, %edx)

This is generalizable to any input type, not just char.
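
For reference, the same bitmask test can be sketched in portable C without
inline asm. This is only an illustration: the name is_ws_portable is
hypothetical, the input is taken as unsigned char for simplicity, and whether
a given compiler turns it into BT plus CMOV is not guaranteed. The shift
count is masked to 0..63, which matches how BT with a 64-bit register operand
reduces the bit offset modulo 64.

```c
#include <stdint.h>

/* Portable sketch of the bitmask whitespace test (hypothetical helper name).
   Returns 1 for ' ', '\t', '\r', '\n' and 0 for every other byte value. */
static int is_ws_portable(unsigned char c)
{
    /* One bit per whitespace character: ' ' = 32, '\t' = 9, '\n' = 10, '\r' = 13. */
    const uint64_t mask = (UINT64_C(1) << ' ') | (UINT64_C(1) << '\t')
                        | (UINT64_C(1) << '\r') | (UINT64_C(1) << '\n');

    /* Select an empty mask for out-of-range input instead of returning early. */
    const uint64_t m = (c > 32) ? 0 : mask;

    /* Masking the count keeps the shift well-defined for any input value. */
    return (int)((m >> (c & 63)) & 1);
}
```

The mask constant here has the same value as the 4294977024 immediate in the
assembly above (bits 9, 10, 13 and 32 set).
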
We even already get the "test against a mask" part of the idea right ;)

Branchy testing is even cheaper with BT:

void is_ws_cb(unsigned char c, void f(void))
{
    unsigned long long mask = 1ll<<' ' | 1<<'\t' | 1<<'\r' | 1<<'\n';
    if (c <= 32 && (mask & (1ll<