* [gcc r12-2228] Improvement to signed division of integer constant on x86_64.
@ 2021-07-09 16:48 Roger Sayle
From: Roger Sayle @ 2021-07-09 16:48 UTC
  To: gcc-cvs

https://gcc.gnu.org/g:59045273cc648e354ba72f9188f69927f00802e2

commit r12-2228-g59045273cc648e354ba72f9188f69927f00802e2
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Fri Jul 9 17:45:40 2021 +0100

    Improvement to signed division of integer constant on x86_64.
    
    This patch tweaks the way GCC handles 32-bit integer division on
    x86_64, when the numerator is constant.  Currently the function
    
    int foo (int x) {
      return 100/x;
    }
    
    generates the code:
    foo:    movl    $100, %eax
            cltd
            idivl   %edi
            ret
    
    where the sign-extension instruction "cltd" creates a long
    dependency chain, as it depends on the "mov" before it, and
    is depended upon by "idivl" after it.
    
    With this patch, GCC now matches both icc and LLVM and uses
    an xor instead, generating:
    foo:    xorl    %edx, %edx
            movl    $100, %eax
            idivl   %edi
            ret
    
    Microbenchmarking confirms that this is faster on Intel
    processors (Kaby Lake), and no worse on AMD processors (Zen2),
    which agrees with intuition, but oddly disagrees with the
    llvm-mca cycle count prediction on godbolt.org.
    
    The tricky bit is that this sign-extension instruction is only
    produced by late (postreload) splitting, and unfortunately none
    of the subsequent passes (e.g. cprop_hardreg) is able to
    propagate and simplify its constant argument.  The solution
    here is to introduce a define_insn_and_split that allows the
    constant numerator operand to be captured (by combine) and
    then split into an optimal form after reload.
    
    The above microbenchmarking also shows that replacing the
    sign extension of negative values with a constant load (using
    movl $-1,%edx) is a performance improvement, as performed by
    icc but not by LLVM.  Both the xor and movl forms are larger
    than cltd, so this transformation is disabled when optimizing
    for size (-Os).
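
As an illustration of why the split can replace cltd with a constant load, the value cltd would leave in %edx depends only on the sign of the numerator, so it is known at compile time when the numerator is constant. The sketch below is not part of the commit; the function names `cltd_high_half`, `const_high_half` and `check_high_half` are hypothetical, and it assumes GCC's documented arithmetic right shift for negative signed values.

```c
#include <assert.h>
#include <stdint.h>

/* What cltd/cdq leaves in %edx: the upper 32 bits of %eax sign-extended
   to 64 bits.  Right-shifting a negative value is implementation-defined
   in ISO C; GCC documents it as an arithmetic shift (assumed here). */
int32_t cltd_high_half(int32_t eax)
{
    int64_t wide = (int64_t)eax;
    return (int32_t)(wide >> 32);
}

/* The same value chosen from the sign of a constant numerator, mirroring
   the choice between constm1_rtx and const0_rtx in the new pattern. */
int32_t const_high_half(int32_t numerator)
{
    return numerator < 0 ? -1 : 0;
}

/* Hypothetical helper: both computations must agree for any numerator. */
void check_high_half(int32_t numerator)
{
    assert(cltd_high_half(numerator) == const_high_half(numerator));
}
```

Calling `check_high_half(100)` and `check_high_half(-100)` covers the two numerators used in the new test case: the first corresponds to the xorl form, the second to the movl $-1 form.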
    
    2021-07-09  Roger Sayle  <roger@nextmovesoftware.com>
                Uroš Bizjak  <ubizjak@gmail.com>
    
    gcc/ChangeLog
            * config/i386/i386.md (*divmodsi4_const): Optimize SImode
            divmod of a constant numerator with new define_insn_and_split.
    
    gcc/testsuite/ChangeLog
            * gcc.target/i386/divmod-9.c: New test case.

Diff:
---
 gcc/config/i386/i386.md                  | 27 ++++++++++++++++++++++++++-
 gcc/testsuite/gcc.target/i386/divmod-9.c | 14 ++++++++++++++
 2 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 26fb81b9b4b..8b809c49fe0 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -8385,7 +8385,7 @@
 		   (ashiftrt:SWIM248 (match_dup 4) (match_dup 5)))
 	      (clobber (reg:CC FLAGS_REG))])
    (parallel [(set (match_dup 0)
-	           (div:SWIM248 (match_dup 2) (match_dup 3)))
+		   (div:SWIM248 (match_dup 2) (match_dup 3)))
 	      (set (match_dup 1)
 		   (mod:SWIM248 (match_dup 2) (match_dup 3)))
 	      (use (match_dup 1))
@@ -8661,6 +8661,31 @@
   [(set_attr "type" "idiv")
    (set_attr "mode" "SI")])
 
+;; Avoid sign-extension (using cdq) for constant numerators.
+(define_insn_and_split "*divmodsi4_const"
+  [(set (match_operand:SI 0 "register_operand" "=&a")
+	(div:SI (match_operand:SI 2 "const_int_operand" "n")
+		(match_operand:SI 3 "nonimmediate_operand" "rm")))
+   (set (match_operand:SI 1 "register_operand" "=&d")
+	(mod:SI (match_dup 2) (match_dup 3)))
+   (clobber (reg:CC FLAGS_REG))]
+  "!optimize_function_for_size_p (cfun)"
+  "#"
+  "reload_completed"
+  [(set (match_dup 0) (match_dup 2))
+   (set (match_dup 1) (match_dup 4))
+   (parallel [(set (match_dup 0)
+		   (div:SI (match_dup 0) (match_dup 3)))
+	      (set (match_dup 1)
+		   (mod:SI (match_dup 0) (match_dup 3)))
+	      (use (match_dup 1))
+	      (clobber (reg:CC FLAGS_REG))])]
+{
+  operands[4] = INTVAL (operands[2]) < 0 ? constm1_rtx : const0_rtx;
+}
+  [(set_attr "type" "multi")
+   (set_attr "mode" "SI")])
+
 (define_expand "divmodqi4"
   [(parallel [(set (match_operand:QI 0 "register_operand")
 		   (div:QI
diff --git a/gcc/testsuite/gcc.target/i386/divmod-9.c b/gcc/testsuite/gcc.target/i386/divmod-9.c
new file mode 100644
index 00000000000..1515e6970e2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/divmod-9.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int foo (int x)
+{
+  return 100/x;
+}
+
+int bar(int x)
+{
+  return -100/x;
+}
+/* { dg-final { scan-assembler-not "(cltd|cdq)" } } */
+
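
The test case above is compile-only and checks just that no cltd or cdq appears in the assembly. A runtime check of the computed quotients could look like the sketch below. It is not part of the commit: `check_divmod_const` is a hypothetical helper, and `foo` and `bar` simply repeat the bodies from divmod-9.c. The expected values follow from C's truncation toward zero for integer division.

```c
#include <assert.h>

/* Same bodies as foo and bar in divmod-9.c. */
int foo(int x) { return 100 / x; }
int bar(int x) { return -100 / x; }

/* Hypothetical runtime check, not part of the commit: the quotients must
   be unchanged regardless of how %edx is initialized before idivl. */
void check_divmod_const(void)
{
    assert(foo(7) == 14);    /* positive numerator, positive divisor */
    assert(bar(7) == -14);   /* negative numerator, positive divisor */
    assert(foo(-7) == -14);  /* positive numerator, negative divisor */
    assert(bar(-7) == 14);   /* negative numerator, negative divisor */
    assert(foo(100) == 1);
    assert(bar(1) == -100);
}
```

Both signs of numerator are exercised because the patch handles them differently: non-negative constants clear %edx with xorl, while negative constants load -1 with movl.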

