Re: [x86_64 PATCH] PR middle-end/105135: Catch more cmov idioms in combine.

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Uros Bizjak <ubizjak@gmail.com>
To: Roger Sayle <roger@nextmovesoftware.com>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: Re: [x86_64 PATCH] PR middle-end/105135: Catch more cmov idioms in combine.
Date: Wed, 20 Apr 2022 09:40:32 +0200	[thread overview]
Message-ID: <CAFULd4ZhSZwaUsgrJhaZMJrHf9fer6eUHrFtRBpjdBW32D9o6g@mail.gmail.com> (raw)
In-Reply-To: <004601d853e4$d11a99e0$734fcda0$@nextmovesoftware.com>

On Tue, Apr 19, 2022 at 1:58 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> This patch addresses PR middle-end/105135, a missed-optimization regression
> affecting mainline.  I agree with Jakub's comment that the middle-end
> optimizations are sound, reducing basic blocks and conditional expressions
> at the tree-level, but requiring backend's to recognize conditional move
> instructions/idioms if/when beneficial.  This patch introduces two new
> define_insn_and_split in i386.md to recognize two additional cmove idioms.
>
> The first recognizes (PR105135's):
>
> int foo(int x, int y, int z)
> {
>   return ((x < y) << 5) + z;
> }
>
> and transforms (the 6 insns, 13 bytes):
>
>         xorl    %eax, %eax      ;; 2 bytes
>         cmpl    %esi, %edi      ;; 2 bytes
>         setl    %al             ;; 3 bytes
>         sall    $5, %eax        ;; 3 bytes
>         addl    %edx, %eax      ;; 2 bytes
>         ret                     ;; 1 byte
>
> into (the 4 insns, 9 bytes):
>
>         cmpl    %esi, %edi      ;; 2 bytes
>         leal    32(%rdx), %eax  ;; 3 bytes
>         cmovge  %edx, %eax      ;; 3 bytes
>         ret                     ;; 1 byte
>
>
> The second catches the very closely related (from PR 98865):
>
> int bar(int x, int y, int z)
> {
>   return -(x < y) & z;
> }
>
> and transforms the (6 insns, 12 bytes):
>         xorl    %eax, %eax      ;; 2 bytes
>         cmpl    %esi, %edi      ;; 2 bytes
>         setl    %al             ;; 3 bytes
>         negl    %eax            ;; 2 bytes
>         andl    %edx, %eax      ;; 2 bytes
>         ret                     ;; 1 byte
>
> into (4 insns, 8 bytes):
>         xorl    %eax, %eax      ;; 2 bytes
>         cmpl    %esi, %edi      ;; 2 bytes
>         cmovl   %edx, %eax      ;; 3 bytes
>         ret                     ;; 1 byte
>
> They both have in common that they recognize a setcc followed by two
> instructions, and replace them with one instruction and a cmov, which
> is typically a performance win, but always a size win.  Fine tuning
> these decisions based on microarchitecture is much easier in the
> backend, than the middle-end.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?
>
>
> 2022-04-19  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR target/105135
>         * config/i386/i386.md (*xor_cmov<mode>): Transform setcc, negate
>         then and into mov $0, followed by a cmov.
>         (*lea_cmov<mode>): Transform setcc, ashift const then plus into
>         lea followed by cmov.
>
> gcc/testsuite/ChangeLog
>         PR target/105135
>         * gcc.target/i386/cmov10.c: New test case.
>         * gcc.target/i386/cmov11.c: New test case.
>         * gcc.target/i386/pr105135.c: New test case.
>
>
> Thanks in advance,
> Roger


+;; Transform setcc;negate;and into mov_zero;cmov
+(define_insn_and_split "*xor_cmov<mode>"
+  [(set (match_operand:SWI248 0 "register_operand")
+    (and:SWI248
+      (neg:SWI248 (match_operator:SWI248 1 "ix86_comparison_operator"
+            [(match_operand 2 "flags_reg_operand")
+             (const_int 0)]))
+      (match_operand:SWI248 3 "register_operand")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_CMOVE && can_create_pseudo_p ()"

Please use ix86_pre_reload_split instead of can_create_pseudo_p () here.

+  "#"
+  "&& 1"
+  [(set (match_dup 4) (const_int 0))
+   (set (match_dup 0)
+    (if_then_else:SWI248 (match_op_dup 1 [(match_dup 2) (const_int 0)])
+                 (match_dup 3) (match_dup 4)))]
+{
+  operands[4] = gen_reg_rtx (<MODE>mode);
+})

Single line preparation statements should use double quotes instead of
curly braces. See many examples in i386 .md files.

+;; Transform setcc;ashift_const;plus into lea_const;cmov
+(define_insn_and_split "*lea_cmov<mode>"
+  [(set (match_operand:SWI 0 "register_operand")
+    (plus:SWI (ashift:SWI (match_operator:SWI 1 "ix86_comparison_operator"
+                [(match_operand 2 "flags_reg_operand")
+                 (const_int 0)])
+                  (match_operand:SWI 3 "const_int_operand"))
+          (match_operand:SWI 4 "register_operand")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_CMOVE && can_create_pseudo_p ()"

Same here, ix86_pre_reload_split should be used for
define_insn_and_split (FYI, can_create_pseudo_p is still good for
define_split where no instruction is defined).

+  "#"
+  "&& 1"
+  [(set (match_dup 5) (plus:<LEAMODE> (match_dup 4) (match_dup 6)))
+   (set (match_dup 0)
+    (if_then_else:<LEAMODE> (match_op_dup 1 [(match_dup 2) (const_int 0)])
+                (match_dup 5) (match_dup 4)))]
+{
+  operands[5] = gen_reg_rtx (<LEAMODE>mode);
+  operands[6] = GEN_INT (1 << INTVAL (operands[3]));
+  if (<MODE>mode != <LEAMODE>mode)
+    {
+      operands[0] = gen_lowpart (<LEAMODE>mode, operands[0]);
+      operands[4] = gen_lowpart (<LEAMODE>mode, operands[4]);

gen_lowpart is dangerous to use before reload. It can choke when
integer mode SUBREG of e.g. FP mode register is passed here. So you
have to either guarantee there are no unsupported subregs (but please
note that the compiler is extremely creative in this area) or you have
to force register to a pseudo (which can possibly defeat your
optimization by generating unwanted moves).

Uros.

+    }
+})
>

next prev parent reply	other threads:[~2022-04-20  7:40 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-19 11:58 Roger Sayle
2022-04-20  7:40 ` Uros Bizjak [this message]
2022-04-21 16:41   ` Roger Sayle
2022-04-21 18:04     ` Uros Bizjak

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAFULd4ZhSZwaUsgrJhaZMJrHf9fer6eUHrFtRBpjdBW32D9o6g@mail.gmail.com \
    --to=ubizjak@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=roger@nextmovesoftware.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).