From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 1251) id 69AB0398A85B; Wed, 16 Jun 2021 08:58:34 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 69AB0398A85B
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="utf-8"
From: Roger Sayle
To: gcc-cvs@gcc.gnu.org
Subject: [gcc r12-1525] [PATCH] PR rtl-optimization/46235: Improved use of bt for bit tests on x86_64.
X-Act-Checkin: gcc
X-Git-Author: Roger Sayle
X-Git-Refname: refs/heads/master
X-Git-Oldrev: 041f74177072df1d66502319205990a4d970c92a
X-Git-Newrev: 3155d51bfd1de8b6c4645dcb2292248a8d7cc3c9
Message-Id: <20210616085834.69AB0398A85B@sourceware.org>
Date: Wed, 16 Jun 2021 08:58:34 +0000 (GMT)
X-BeenThere: gcc-cvs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-cvs mailing list
List-Unsubscribe: ,
List-Archive:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 16 Jun 2021 08:58:34 -0000

https://gcc.gnu.org/g:3155d51bfd1de8b6c4645dcb2292248a8d7cc3c9

commit r12-1525-g3155d51bfd1de8b6c4645dcb2292248a8d7cc3c9
Author: Roger Sayle
Date:   Wed Jun 16 09:56:09 2021 +0100

    [PATCH] PR rtl-optimization/46235: Improved use of bt for bit tests on x86_64.

    This patch tackles PR46235 to improve the code generated for bit tests
    on x86_64 by making more use of the bt instruction.  Currently, GCC emits
    bt instructions when followed by conditional jumps (thanks to Uros'
    splitters).  This patch adds splitters in i386.md, to catch the cases
    where bt is followed by a conditional move (as in the original report),
    or by a setc/setnc (as in comment 5 of the Bugzilla PR).

    With this patch, the function in the original PR

    int foo(int a, int x, int y) {
      if (a & (1 << x))
        return a;
      return 1;
    }

    which with -O2 on mainline generates:

    foo:    movl    %edi, %eax
            movl    %esi, %ecx
            sarl    %cl, %eax
            testb   $1, %al
            movl    $1, %eax
            cmovne  %edi, %eax
            ret

    now generates:

    foo:    btl     %esi, %edi
            movl    $1, %eax
            cmovc   %edi, %eax
            ret

    Likewise, IsBitSet1 and IsBitSet2 (from comment 5)

    bool IsBitSet1(unsigned char byte, int index) {
      return (byte & (1<> index) & 1;
    }

    Before:
            movzbl  %dil, %eax
            movl    %esi, %ecx
            sarl    %cl, %eax
            andl    $1, %eax
            ret

    After:
            movzbl  %dil, %edi
            btl     %esi, %edi
            setc    %al
            ret

    According to Agner Fog, SAR/SHR r,cl takes 2 cycles on Skylake, whereas
    BT r,r takes only one, so the performance improvements on recent hardware
    may be more significant than implied by just the reduced number of
    instructions.  I've avoided transforming cases (such as btsi_setcsi)
    where using bt sequences may not be a clear win (over sarq/andl).

    2021-06-15  Roger Sayle

    gcc/ChangeLog
            PR rtl-optimization/46235
            * config/i386/i386.md: New define_split for bt followed by cmov.
            (*bt_setcqi): New define_insn_and_split for bt followed by setc.
            (*bt_setncqi): New define_insn_and_split for bt then setnc.
            (*bt_setnc): New define_insn_and_split for bt followed by setnc
            with zero extension.

    gcc/testsuite/ChangeLog
            PR rtl-optimization/46235
            * gcc.target/i386/bt-5.c: New test.
            * gcc.target/i386/bt-6.c: New test.
            * gcc.target/i386/bt-7.c: New test.

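The commit message treats the mask form of a bit test (as in foo) and the shift form (as in the Before listing, which shifts and then masks with 1) as the same question: is bit x of the operand set? The following is a minimal sketch, not part of the patch, that checks that equivalence at the C level, along with the negated form that the setnc patterns match. All function names here are hypothetical, and the sketch deliberately uses unsigned operands so that every shift is well defined, whereas the functions in the PR use int. It says nothing about which instructions a given compiler emits.

```c
#include <assert.h>
#include <stdbool.h>

/* Mask form: test bit x of a with a shifted mask (shape used by foo). */
static bool bit_set_mask(unsigned int a, unsigned int x)
{
    return (a & (1u << x)) != 0;
}

/* Shift form: move bit x down to bit 0 and isolate it. */
static bool bit_set_shift(unsigned int a, unsigned int x)
{
    return ((a >> x) & 1u) != 0;
}

/* Negated shift form: and (not (lshiftrt a x)) 1, the setnc shape. */
static bool bit_clear_shift(unsigned int a, unsigned int x)
{
    return (~(a >> x) & 1u) != 0;
}

/* Checks all three forms against each other for a few sample values
   and every bit index 0..31.  Returns the number of pairs checked;
   asserts on any mismatch.  */
static int check_bit_test_forms(void)
{
    static const unsigned int values[] = {
        0u, 1u, 5u, 0x80u, 0xdeadbeefu, 0xffffffffu
    };
    int checked = 0;

    for (unsigned int i = 0; i < sizeof values / sizeof values[0]; i++)
        for (unsigned int x = 0; x < 32; x++) {
            assert(bit_set_mask(values[i], x) == bit_set_shift(values[i], x));
            assert(bit_clear_shift(values[i], x) == !bit_set_shift(values[i], x));
            checked++;
        }
    return checked;
}
```

Calling check_bit_test_forms() from a test driver exercises 6 values times 32 bit positions. The index is kept below the operand width because larger shift counts are undefined in C.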
Diff:
---
 gcc/config/i386/i386.md              | 94 ++++++++++++++++++++++++++++++++++++
 gcc/testsuite/gcc.target/i386/bt-5.c | 44 +++++++++++++++++
 gcc/testsuite/gcc.target/i386/bt-6.c | 69 ++++++++++++++++++++++++++
 gcc/testsuite/gcc.target/i386/bt-7.c | 69 ++++++++++++++++++++++++++
 4 files changed, 276 insertions(+)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 6e4abf32e7c..48532eb7ddf 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -12794,6 +12794,100 @@
   operands[0] = shallow_copy_rtx (operands[0]);
   PUT_CODE (operands[0], reverse_condition (GET_CODE (operands[0])));
 })
+
+;; Help combine recognize bt followed by cmov
+(define_split
+  [(set (match_operand:SWI248 0 "register_operand")
+        (if_then_else:SWI248
+         (ne
+          (zero_extract:SWI48
+           (match_operand:SWI48 1 "register_operand")
+           (const_int 1)
+           (zero_extend:SI (match_operand:QI 2 "register_operand")))
+          (const_int 0))
+         (match_operand:SWI248 3 "nonimmediate_operand")
+         (match_operand:SWI248 4 "nonimmediate_operand")))]
+  "TARGET_USE_BT && TARGET_CMOVE
+   && !(MEM_P (operands[3]) && MEM_P (operands[4]))
+   && ix86_pre_reload_split ()"
+  [(set (reg:CCC FLAGS_REG)
+        (compare:CCC
+         (zero_extract:SWI48 (match_dup 1) (const_int 1) (match_dup 2))
+         (const_int 0)))
+   (set (match_dup 0)
+        (if_then_else:SWI248 (eq (reg:CCC FLAGS_REG) (const_int 0))
+                             (match_dup 3)
+                             (match_dup 4)))]
+{
+  operands[2] = lowpart_subreg (SImode, operands[2], QImode);
+})
+
+;; Help combine recognize bt followed by setc
+(define_insn_and_split "*bt_setcqi"
+  [(set (subreg:SWI48 (match_operand:QI 0 "register_operand") 0)
+        (zero_extract:SWI48
+         (match_operand:SWI48 1 "register_operand")
+         (const_int 1)
+         (zero_extend:SI (match_operand:QI 2 "register_operand"))))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_USE_BT && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (reg:CCC FLAGS_REG)
+        (compare:CCC
+         (zero_extract:SWI48 (match_dup 1) (const_int 1) (match_dup 2))
+         (const_int 0)))
+   (set (match_dup 0)
+        (eq:QI (reg:CCC FLAGS_REG) (const_int 0)))]
+{
+  operands[2] = lowpart_subreg (SImode, operands[2], QImode);
+})
+
+;; Help combine recognize bt followed by setnc
+(define_insn_and_split "*bt_setncqi"
+  [(set (match_operand:QI 0 "register_operand")
+        (and:QI
+         (not:QI
+          (subreg:QI
+           (lshiftrt:SWI48 (match_operand:SWI48 1 "register_operand")
+                           (match_operand:QI 2 "register_operand")) 0))
+         (const_int 1)))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_USE_BT && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (reg:CCC FLAGS_REG)
+        (compare:CCC
+         (zero_extract:SWI48 (match_dup 1) (const_int 1) (match_dup 2))
+         (const_int 0)))
+   (set (match_dup 0)
+        (ne:QI (reg:CCC FLAGS_REG) (const_int 0)))]
+{
+  operands[2] = lowpart_subreg (SImode, operands[2], QImode);
+})
+
+(define_insn_and_split "*bt_setnc"
+  [(set (match_operand:SWI48 0 "register_operand")
+        (and:SWI48
+         (not:SWI48
+          (lshiftrt:SWI48 (match_operand:SWI48 1 "register_operand")
+                          (match_operand:QI 2 "register_operand")))
+         (const_int 1)))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_USE_BT && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (reg:CCC FLAGS_REG)
+        (compare:CCC
+         (zero_extract:SWI48 (match_dup 1) (const_int 1) (match_dup 2))
+         (const_int 0)))
+   (set (match_dup 3)
+        (ne:QI (reg:CCC FLAGS_REG) (const_int 0)))
+   (set (match_dup 0) (zero_extend:SWI48 (match_dup 3)))]
+{
+  operands[2] = lowpart_subreg (SImode, operands[2], QImode);
+  operands[3] = gen_reg_rtx (QImode);
+})
 
 ;; Store-flag instructions.
diff --git a/gcc/testsuite/gcc.target/i386/bt-5.c b/gcc/testsuite/gcc.target/i386/bt-5.c
new file mode 100644
index 00000000000..73e7ed282d3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/bt-5.c
@@ -0,0 +1,44 @@
+/* PR rtl-optimization/46235 */
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O2 -mtune=core2" } */
+
+int foo (int a, int x, int y)
+{
+  if (a & (1<> y) & 1;
+}
+
+unsigned char set1_wb (int x, int y)
+{
+  return (x & (1<> y) & 1;
+}
+
+unsigned char clr1_bb (unsigned char x, int y)
+{
+  return (x & (1<> y) & 1);
+}
+
+unsigned char clr1_wb (int x, int y)
+{
+  return (x & (1<> y) & 1);
+}
+
+int clr1_bw (unsigned char x, int y)
+{
+  return (x & (1<> y) & 1);
+}
+
+int clr1_ww (int x, int y)
+{
+  return (x & (1<> y) & 1);
+}
+
+/* { dg-final { scan-assembler-times "bt\[lq\]\[ \t\]" 12 } } */
+/* { dg-final { scan-assembler-not "sar\[lq\]\[ \t\]" } } */
+/* { dg-final { scan-assembler-not "and\[lq\]\[ \t\]" } } */
+/* { dg-final { scan-assembler-not "not\[lq\]\[ \t\]" } } */
+
diff --git a/gcc/testsuite/gcc.target/i386/bt-7.c b/gcc/testsuite/gcc.target/i386/bt-7.c
new file mode 100644
index 00000000000..292d7414c42
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/bt-7.c
@@ -0,0 +1,69 @@
+/* PR rtl-optimization/46235 */
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O2 -mtune=core2" } */
+
+unsigned char set1_lb (long long x, int y)
+{
+  return (x & (1LL<> y) & 1;
+}
+
+unsigned char clr1_lb (long long x, int y)
+{
+  return (x & (1LL<> y) & 1);
+}
+
+int clr1_lw (long long x, int y)
+{
+  return (x & (1LL<> y) & 1);
+}
+
+long long clr1_bl (unsigned char x, int y)
+{
+  return (x & (1<> y) & 1);
+}
+
+long long clr1_wl (int x, int y)
+{
+  return (x & (1<> y) & 1);
+}
+
+long long clr1_ll (long long x, int y)
+{
+  return (x & (1LL<> y) & 1);
+}
+
+/* { dg-final { scan-assembler-times "bt\[lq\]\[ \t\]" 12 } } */
+/* { dg-final { scan-assembler-not "sar\[lq\]\[ \t\]" } } */
+/* { dg-final { scan-assembler-not "and\[lq\]\[ \t\]" } } */
+/* { dg-final { scan-assembler-not "not\[lq\]\[ \t\]" } } */
+