From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 1251) id 18D7738515D6; Fri, 9 Jul 2021 16:48:53 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 18D7738515D6 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset="utf-8" From: Roger Sayle To: gcc-cvs@gcc.gnu.org Subject: [gcc r12-2228] Improvement to signed division of integer constant on x86_64. X-Act-Checkin: gcc X-Git-Author: Roger Sayle X-Git-Refname: refs/heads/master X-Git-Oldrev: 0d5db79a61af150cba48612c9fbc3267262adb93 X-Git-Newrev: 59045273cc648e354ba72f9188f69927f00802e2 Message-Id: <20210709164853.18D7738515D6@sourceware.org> Date: Fri, 9 Jul 2021 16:48:53 +0000 (GMT) X-BeenThere: gcc-cvs@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-cvs mailing list List-Unsubscribe: , List-Archive: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 09 Jul 2021 16:48:53 -0000 https://gcc.gnu.org/g:59045273cc648e354ba72f9188f69927f00802e2 commit r12-2228-g59045273cc648e354ba72f9188f69927f00802e2 Author: Roger Sayle Date: Fri Jul 9 17:45:40 2021 +0100 Improvement to signed division of integer constant on x86_64. This patch tweaks the way GCC handles 32-bit integer division on x86_64, when the numerator is constant. Currently the function int foo (int x) { return 100/x; } generates the code: foo: movl $100, %eax cltd idivl %edi ret where the sign-extension instruction "cltd" creates a long dependency chain, as it depends on the "mov" before it, and is depended upon by "idivl" after it. With this patch, GCC now matches both icc and LLVM and uses an xor instead, generating: foo: xorl %edx, %edx movl $100, %eax idivl %edi ret Microbenchmarking confirms that this is faster on Intel processors (Kaby lake), and no worse on AMD processors (Zen2), which agrees with intuition, but oddly disagrees with the llvm-mca cycle count prediction on godbolt.org. The tricky bit is that this sign-extension instruction is only produced by late (postreload) splitting, and unfortunately none of the subsequent passes (e.g. cprop_hardreg) is able to propagate and simplify its constant argument. The solution here is to introduce a define_insn_and_split that allows the constant numerator operand to be captured (by combine) and then split into an optimal form after reload. The above microbenchmarking also shows that eliminating the sign extension of negative values (using movl $-1,%edx) is also a performance improvement, as performed by icc but not by LLVM. Both the xor and movl sign-extensions are larger than cltd, so this transformation is prevented for -Os. 2021-07-09 Roger Sayle Uroš Bizjak gcc/ChangeLog * config/i386/i386.md (*divmodsi4_const): Optimize SImode divmod of a constant numerator with new define_insn_and_split. gcc/testsuite/ChangeLog * gcc.target/i386/divmod-9.c: New test case. Diff: --- gcc/config/i386/i386.md | 27 ++++++++++++++++++++++++++- gcc/testsuite/gcc.target/i386/divmod-9.c | 14 ++++++++++++++ 2 files changed, 40 insertions(+), 1 deletion(-) diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 26fb81b9b4b..8b809c49fe0 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -8385,7 +8385,7 @@ (ashiftrt:SWIM248 (match_dup 4) (match_dup 5))) (clobber (reg:CC FLAGS_REG))]) (parallel [(set (match_dup 0) - (div:SWIM248 (match_dup 2) (match_dup 3))) + (div:SWIM248 (match_dup 2) (match_dup 3))) (set (match_dup 1) (mod:SWIM248 (match_dup 2) (match_dup 3))) (use (match_dup 1)) @@ -8661,6 +8661,31 @@ [(set_attr "type" "idiv") (set_attr "mode" "SI")]) +;; Avoid sign-extension (using cdq) for constant numerators. +(define_insn_and_split "*divmodsi4_const" + [(set (match_operand:SI 0 "register_operand" "=&a") + (div:SI (match_operand:SI 2 "const_int_operand" "n") + (match_operand:SI 3 "nonimmediate_operand" "rm"))) + (set (match_operand:SI 1 "register_operand" "=&d") + (mod:SI (match_dup 2) (match_dup 3))) + (clobber (reg:CC FLAGS_REG))] + "!optimize_function_for_size_p (cfun)" + "#" + "reload_completed" + [(set (match_dup 0) (match_dup 2)) + (set (match_dup 1) (match_dup 4)) + (parallel [(set (match_dup 0) + (div:SI (match_dup 0) (match_dup 3))) + (set (match_dup 1) + (mod:SI (match_dup 0) (match_dup 3))) + (use (match_dup 1)) + (clobber (reg:CC FLAGS_REG))])] +{ + operands[4] = INTVAL (operands[2]) < 0 ? constm1_rtx : const0_rtx; +} + [(set_attr "type" "multi") + (set_attr "mode" "SI")]) + (define_expand "divmodqi4" [(parallel [(set (match_operand:QI 0 "register_operand") (div:QI diff --git a/gcc/testsuite/gcc.target/i386/divmod-9.c b/gcc/testsuite/gcc.target/i386/divmod-9.c new file mode 100644 index 00000000000..1515e6970e2 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/divmod-9.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +int foo (int x) +{ + return 100/x; +} + +int bar(int x) +{ + return -100/x; +} +/* { dg-final { scan-assembler-not "(cltd|cdq)" } } */ +