public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/89984] Extra register move
       [not found] <bug-89984-4@http.gcc.gnu.org/bugzilla/>
@ 2021-09-01  8:42 ` pinskia at gcc dot gnu.org
  2021-09-04  6:41 ` pinskia at gcc dot gnu.org
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-09-01  8:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89984

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-09-01
           Keywords|                            |easyhack
     Ever confirmed|0                           |1
          Component|rtl-optimization            |target

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The problem is here:
(define_insn_and_split "@xorsign<mode>3_1"
  [(set (match_operand:MODEF 0 "register_operand" "=Yv")
        (unspec:MODEF
          [(match_operand:MODEF 1 "register_operand" "Yv")
           (match_operand:MODEF 2 "register_operand" "0")
           (match_operand:<ssevecmode> 3 "nonimmediate_operand" "Yvm")]
          UNSPEC_XORSIGN))]
  "SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH"
  "#"
  "&& reload_completed"
  [(const_int 0)]
  "ix86_split_xorsign (operands); DONE;")


for AVX operand 2 does not need to be the same as operand 0.
Shouldn't be a hard change for someone starting out to improve this.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug target/89984] Extra register move
       [not found] <bug-89984-4@http.gcc.gnu.org/bugzilla/>
  2021-09-01  8:42 ` [Bug target/89984] Extra register move pinskia at gcc dot gnu.org
@ 2021-09-04  6:41 ` pinskia at gcc dot gnu.org
  2021-09-06 12:18 ` cvs-commit at gcc dot gnu.org
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-09-04  6:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89984

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #1)
> The problem is here:
> (define_insn_and_split "@xorsign<mode>3_1"
>   [(set (match_operand:MODEF 0 "register_operand" "=Yv")
>         (unspec:MODEF
>           [(match_operand:MODEF 1 "register_operand" "Yv")
>            (match_operand:MODEF 2 "register_operand" "0")
>            (match_operand:<ssevecmode> 3 "nonimmediate_operand" "Yvm")]
>           UNSPEC_XORSIGN))]
>   "SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH"
>   "#"
>   "&& reload_completed"
>   [(const_int 0)]
>   "ix86_split_xorsign (operands); DONE;")
> 
> 
> for AVX operand 2 does not need to be the same as operand 0.
> Shouldn't be a hard change for someone starting out to improve this.

Right,  The way copysign is defined is like this:
(define_insn "@copysign<mode>3_var"
  [(set (match_operand:SSEMODEF 0 "register_operand" "=Yv,Yv,Yv,Yv,Yv")
        (unspec:SSEMODEF
          [(match_operand:SSEMODEF 2 "register_operand" "Yv,0,0,Yv,Yv")
           (match_operand:SSEMODEF 3 "register_operand" "1,1,Yv,1,Yv")
           (match_operand:<ssevecmodef> 4
             "nonimmediate_operand" "X,Yvm,Yvm,0,0")
           (match_operand:<ssevecmodef> 5
             "nonimmediate_operand" "0,Yvm,1,Yvm,1")]
          UNSPEC_COPYSIGN))
   (clobber (match_scratch:<ssevecmodef> 1 "=Yv,Yv,Yv,Yv,Yv"))]
  "(SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)
   || (TARGET_SSE && (<MODE>mode == TFmode))"
  "#")

I suspect xorsign should be improved similarlly.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug target/89984] Extra register move
       [not found] <bug-89984-4@http.gcc.gnu.org/bugzilla/>
  2021-09-01  8:42 ` [Bug target/89984] Extra register move pinskia at gcc dot gnu.org
  2021-09-04  6:41 ` pinskia at gcc dot gnu.org
@ 2021-09-06 12:18 ` cvs-commit at gcc dot gnu.org
  2021-09-06 12:35 ` hjl.tools at gmail dot com
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 7+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-09-06 12:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89984

--- Comment #3 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by H.J. Lu <hjl@gcc.gnu.org>:

https://gcc.gnu.org/g:652bef70d392f9541b12ef65b461009c8c8fd54a

commit r12-3369-g652bef70d392f9541b12ef65b461009c8c8fd54a
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sat Sep 4 08:28:00 2021 -0700

    x86: Add non-destructive source to @xorsign<mode>3_1

    Add non-destructive source alternative to @xorsign<mode>3_1 for AVX.

    gcc/

            PR target/89984
            * config/i386/i386-expand.c (ix86_split_xorsign): Use operands[2].
            * config/i386/i386.md (@xorsign<mode>3_1): Add non-destructive
            source alternative for AVX.

    gcc/testsuite/

            PR target/89984
            * gcc.target/i386/pr89984-1.c: New test.
            * gcc.target/i386/pr89984-2.c: Likewise.
            * gcc.target/i386/xorsign-avx.c: Likewise.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug target/89984] Extra register move
       [not found] <bug-89984-4@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2021-09-06 12:18 ` cvs-commit at gcc dot gnu.org
@ 2021-09-06 12:35 ` hjl.tools at gmail dot com
  2021-09-08  8:24 ` jakub at gcc dot gnu.org
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 7+ messages in thread
From: hjl.tools at gmail dot com @ 2021-09-06 12:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89984

H.J. Lu <hjl.tools at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
   Target Milestone|---                         |12.0
             Status|NEW                         |RESOLVED

--- Comment #4 from H.J. Lu <hjl.tools at gmail dot com> ---
Fixed for GCC 12.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug target/89984] Extra register move
       [not found] <bug-89984-4@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2021-09-06 12:35 ` hjl.tools at gmail dot com
@ 2021-09-08  8:24 ` jakub at gcc dot gnu.org
  2021-09-08 12:09 ` cvs-commit at gcc dot gnu.org
  2021-09-08 12:11 ` jakub at gcc dot gnu.org
  6 siblings, 0 replies; 7+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-09-08  8:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89984

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |---

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
This patch is incorrect, it breaks e.g.
float bar (float x, float y) { return x * __builtin_copysignf (1.0f, y); }

On top of the PR102224 patch, I've tried:
--- gcc/config/i386/i386.md.jj  2021-09-08 09:55:40.791551976 +0200
+++ gcc/config/i386/i386.md     2021-09-08 10:15:10.241011077 +0200
@@ -10919,18 +10919,19 @@ (define_expand "xorsign<mode>3"
 })

 (define_insn_and_split "@xorsign<mode>3_1"
-  [(set (match_operand:MODEF 0 "register_operand" "=&Yv,&Yv,&Yv")
+  [(set (match_operand:MODEF 0 "register_operand" "=&Yv,Yv")
        (unspec:MODEF
-         [(match_operand:MODEF 1 "register_operand" "Yv,0,Yv")
-          (match_operand:MODEF 2 "register_operand" "0,Yv,Yv")
-          (match_operand:<ssevecmode> 3 "nonimmediate_operand" "Yvm,Yvm,Yvm")]
-         UNSPEC_XORSIGN))]
+         [(match_operand:MODEF 1 "register_operand" "Yv,Yv")
+          (match_operand:MODEF 2 "register_operand" "0,Yv")
+          (match_operand:<ssevecmode> 3 "nonimmediate_operand" "Yvm,Yvm")]
+         UNSPEC_XORSIGN))
+   (clobber (match_scratch:MODEF 4 "=X,Yv"))]
   "SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH"
   "#"
   "&& reload_completed"
   [(const_int 0)]
   "ix86_split_xorsign (operands); DONE;"
-  [(set_attr "isa" "*,avx,avx")])
+  [(set_attr "isa" "*,avx")])


 ;; One complement instructions

--- gcc/config/i386/i386-expand.c.jj    2021-09-08 09:55:40.000000000 +0200
+++ gcc/config/i386/i386-expand.c       2021-09-08 10:04:15.470268062 +0200
@@ -2296,23 +2296,25 @@ void
 ix86_split_xorsign (rtx operands[])
 {
   machine_mode mode, vmode;
-  rtx dest, op0, op1, mask, x;
+  rtx dest, op0, op1, mask, x, temp;

   dest = operands[0];
   op0 = operands[1];
   op1 = operands[2];
   mask = operands[3];
+  temp = operands[4];

   mode = GET_MODE (dest);
   vmode = GET_MODE (mask);

-  /* The constraints ensure that for non-AVX dest == op1 is
-     different from op0, and for AVX that at most two of
-     dest, op0 and op1 are the same register but the third one
-     is different.  */
+  if (!reg_overlap_mentioned_p (dest, op0))
+    temp = dest;
+  else
+    gcc_assert (TARGET_AVX);
+
   if (rtx_equal_p (op0, op1))
     {
-      gcc_assert (TARGET_AVX && !rtx_equal_p (op0, dest));
+      gcc_assert (TARGET_AVX);
       if (vmode == V4SFmode)
        vmode = V4SImode;
       else
@@ -2323,9 +2325,9 @@ ix86_split_xorsign (rtx operands[])
       mask = lowpart_subreg (vmode, mask, GET_MODE (mask));
       if (MEM_P (mask))
        {
-         rtx msk = lowpart_subreg (vmode, dest, mode);
-         emit_insn (gen_rtx_SET (msk, mask));
-         mask = msk;
+         temp = lowpart_subreg (vmode, temp, mode);
+         emit_insn (gen_rtx_SET (temp, mask));
+         mask = temp;
        }
       op0 = lowpart_subreg (vmode, op0, mode);
       x = gen_rtx_AND (vmode, gen_rtx_NOT (vmode, mask), op0);
@@ -2334,10 +2336,11 @@ ix86_split_xorsign (rtx operands[])
     {
       op1 = lowpart_subreg (vmode, op1, mode);
       x = gen_rtx_AND (vmode, op1, mask);
-      emit_insn (gen_rtx_SET (op1, x));
+      temp = lowpart_subreg (vmode, temp, mode);
+      emit_insn (gen_rtx_SET (temp, x));

       op0 = lowpart_subreg (vmode, op0, mode);
-      x = gen_rtx_XOR (vmode, op1, op0);
+      x = gen_rtx_XOR (vmode, temp, op0);
     }

   dest = lowpart_subreg (vmode, dest, mode);
and also with additional &Yv <- Yv, Yv, Yvm, X alternative with various ways to
disparage it slightly (^s or $s), but never convinced RA that it should use the
alternative with the scratch.
--- gcc/config/i386/i386.md.jj  2021-09-08 09:55:40.791551976 +0200
+++ gcc/config/i386/i386.md     2021-09-08 10:20:29.633498636 +0200
@@ -10919,18 +10919,19 @@
 })

 (define_insn_and_split "@xorsign<mode>3_1"
-  [(set (match_operand:MODEF 0 "register_operand" "=&Yv,&Yv,&Yv")
+  [(set (match_operand:MODEF 0 "register_operand" "=&Yv,Yv")
        (unspec:MODEF
-         [(match_operand:MODEF 1 "register_operand" "Yv,0,Yv")
-          (match_operand:MODEF 2 "register_operand" "0,Yv,Yv")
-          (match_operand:<ssevecmode> 3 "nonimmediate_operand" "Yvm,Yvm,Yvm")]
-         UNSPEC_XORSIGN))]
+         [(match_operand:MODEF 1 "register_operand" "Yv,Yv")
+          (match_operand:MODEF 2 "register_operand" "0,Yv")
+          (match_operand:<ssevecmode> 3 "nonimmediate_operand" "Yvm,Yvm")]
+         UNSPEC_XORSIGN))
+   (clobber (match_scratch:MODEF 4 "=X,&Yv"))]
   "SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH"
   "#"
   "&& reload_completed"
   [(const_int 0)]
   "ix86_split_xorsign (operands); DONE;"
-  [(set_attr "isa" "*,avx,avx")])
+  [(set_attr "isa" "noavx,avx")])


 ;; One complement instructions
with the same i386-expand.c changes does, but then it forces a scratch even for
the cases where it is not needed.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug target/89984] Extra register move
       [not found] <bug-89984-4@http.gcc.gnu.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2021-09-08  8:24 ` jakub at gcc dot gnu.org
@ 2021-09-08 12:09 ` cvs-commit at gcc dot gnu.org
  2021-09-08 12:11 ` jakub at gcc dot gnu.org
  6 siblings, 0 replies; 7+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-09-08 12:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89984

--- Comment #6 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:7485a52551d71db2e8bbfc4c484196bcc321a1cd

commit r12-3417-g7485a52551d71db2e8bbfc4c484196bcc321a1cd
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Wed Sep 8 14:06:10 2021 +0200

    i386: Fix up xorsign for AVX [PR89984]

    Thinking about it more this morning, while this patch fixes the problems
    revealed in the testcase, the recent PR89984 change was buggy too, but
    perhaps that can be fixed incrementally.  Because for AVX the new code
    destructively modifies op1.  If that is different from dest, say on:
    float
    foo (float x, float y)
    {
      return x * __builtin_copysignf (1.0f, y) + y;
    }
    then we get after RA:
    (insn 8 7 9 2 (set (reg:SF 20 xmm0 [orig:82 _2 ] [82])
            (unspec:SF [
                    (reg:SF 20 xmm0 [88])
                    (reg:SF 21 xmm1 [89])
                    (mem/u/c:V4SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0 
S16 A128])
                ] UNSPEC_XORSIGN)) "hohoho.c":4:12 649 {xorsignsf3_1}
         (nil))
    (insn 9 8 15 2 (set (reg:SF 20 xmm0 [87])
            (plus:SF (reg:SF 20 xmm0 [orig:82 _2 ] [82])
                (reg:SF 21 xmm1 [89]))) "hohoho.c":4:44 1021 {*fop_sf_comm}
         (nil))
    but split the xorsign into:
            vandps  .LC0(%rip), %xmm1, %xmm1
            vxorps  %xmm0, %xmm1, %xmm0
    and then the addition:
            vaddss  %xmm1, %xmm0, %xmm0
    which means we miscompile it - instead of adding y in the end we add
    __builtin_copysignf (0.0f, y).
    So, wonder if we don't want instead in addition to the &Yv <- Yv, 0
    alternative (enabled for both pre-AVX and AVX as in this patch) the
    &Yv <- Yv, Yv where destination must be different from inputs and another
    Yv <- Yv, Yv where it can be the same but then need a match_scratch
    (with X for the other alternatives and =Yv for the last one).
    That way we'd always have a safe register we can store the op1 & mask
    value into, either the destination (in the first alternative known to
    be equal to op1 which is needed for non-AVX but ok for AVX too), in the
    second alternative known to be different from both inputs and in the third
    which could be used for those
    float bar (float x, float y) { return x * __builtin_copysignf (1.0f, y); }
    cases where op1 is naturally xmm1 and dest == op0 naturally xmm0 we'd use
    some other register like xmm2.

    On Wed, Sep 08, 2021 at 05:23:40PM +0800, Hongtao Liu wrote:
    > I'm curious why we need the  post_reload splitter @xorsign<mode>3_1
    > for scalar mode, can't we just expand them into and/xor operations in
    > the expander, just like vector modes did.

    Following seems to work for all the testcases I've tried (and in some
    generates better code than the post-reload splitter).

    2021-09-08  Jakub Jelinek  <jakub@redhat.com>
                liuhongt  <hongtao.liu@intel.com>

            PR target/89984
            * config/i386/i386.md (@xorsign<mode>3_1): Remove.
            * config/i386/i386-expand.c (ix86_expand_xorsign): Expand right
away
            into AND with mask and XOR, using paradoxical subregs.
            (ix86_split_xorsign): Remove.
            * config/i386/i386-protos.h (ix86_split_xorsign): Remove.

            * gcc.target/i386/avx-pr102224.c: Fix up PR number.
            * gcc.dg/pr89984.c: New test.
            * gcc.target/i386/avx-pr89984.c: New test.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug target/89984] Extra register move
       [not found] <bug-89984-4@http.gcc.gnu.org/bugzilla/>
                   ` (5 preceding siblings ...)
  2021-09-08 12:09 ` cvs-commit at gcc dot gnu.org
@ 2021-09-08 12:11 ` jakub at gcc dot gnu.org
  6 siblings, 0 replies; 7+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-09-08 12:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89984

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|REOPENED                    |RESOLVED

--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Should be fixed now.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-09-08 12:11 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <bug-89984-4@http.gcc.gnu.org/bugzilla/>
2021-09-01  8:42 ` [Bug target/89984] Extra register move pinskia at gcc dot gnu.org
2021-09-04  6:41 ` pinskia at gcc dot gnu.org
2021-09-06 12:18 ` cvs-commit at gcc dot gnu.org
2021-09-06 12:35 ` hjl.tools at gmail dot com
2021-09-08  8:24 ` jakub at gcc dot gnu.org
2021-09-08 12:09 ` cvs-commit at gcc dot gnu.org
2021-09-08 12:11 ` jakub at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).