[Bug rtl-optimization/100085] New: Bad code for union transfer from _

public inbox for gcc-bugs@sourceware.org

* [Bug rtl-optimization/100085] New: Bad code for union transfer from __float128 to vector types
@ 2021-04-14 18:19 munroesj at gcc dot gnu.org
  2021-04-14 18:22 ` [Bug rtl-optimization/100085] " munroesj at gcc dot gnu.org
                   ` (22 more replies)
  0 siblings, 23 replies; 24+ messages in thread
From: munroesj at gcc dot gnu.org @ 2021-04-14 18:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

            Bug ID: 100085
           Summary: Bad code for union transfer from __float128 to vector
                    types
           Product: gcc
           Version: 10.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: munroesj at gcc dot gnu.org
  Target Milestone: ---

Created attachment 50595
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50595&action=edit
Reduced example of union and __float128 to vector transfer.

GCC 10/9/8/7 generates poor code for -mcpu=power8 when a union is used to
transfer a __float128 scalar to any vector type. __float128 is a scalar type
and is not typecast compatible with any vector type, even though both are held
in vector registers.

But for runtime code implementing __float128 operations for -mcpu=power8 it is
useful (and faster) to perform some operations (data_class, conversions, etc.)
directly in vector registers. The only way to do this is to use a union to
transfer values between __float128 and vector types. This should be a simple
vector register transfer and optimized as such.

But with GCC for PowerPCle and -mcpu=power8, we consistently see
store/reload sequences. On Power8 this can cause load-hit-store and pipeline
rejects (33 cycles).

We don't see this when targeting -mcpu=power9, but power9 supports hardware
Float128 instructions. We also don't see this when targeting BE.
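
For reference, a minimal sketch of the kind of union transfer described above.
The type, union, and function names are hypothetical (the attached test case is
not reproduced here), and the GCC vector extension stands in for the AltiVec
`vector` keyword so that the sketch also builds on other targets that provide
__float128:

```c
#include <stdint.h>

/* 128-bit vector of four 32-bit words via the GCC vector extension.
   All names in this sketch are hypothetical, not from the attachment. */
typedef uint32_t vui32_sketch_t __attribute__ ((vector_size (16)));

/* Three views of the same 16 bytes. */
typedef union
{
  __float128 vf1;          /* scalar quad-precision view */
  unsigned __int128 ui1;   /* scalar 128-bit integer view */
  vui32_sketch_t vx4;      /* vector view */
} vf128_sketch_t;

/* Reinterpret the bits of a __float128 as a vector; no value conversion. */
static inline vui32_sketch_t
xfer_f128_to_vui32 (__float128 f128)
{
  vf128_sketch_t u;
  u.vf1 = f128;
  return u.vx4;
}

/* Reinterpret the bits of a vector as a 128-bit integer. */
static inline unsigned __int128
xfer_vui32_to_u128 (vui32_sketch_t v)
{
  vf128_sketch_t u;
  u.vx4 = v;
  return u.ui1;
}
```

Since source and destination both live in vector registers on POWER, the hope
is that xfer_f128_to_vui32 compiles to a register move or to nothing at all.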


* [Bug rtl-optimization/100085] Bad code for union transfer from __float128 to vector types
  2021-04-14 18:19 [Bug rtl-optimization/100085] New: Bad code for union transfer from __float128 to vector types munroesj at gcc dot gnu.org
@ 2021-04-14 18:22 ` munroesj at gcc dot gnu.org
  2021-04-15  6:59 ` [Bug target/100085] " rguenth at gcc dot gnu.org
                   ` (21 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: munroesj at gcc dot gnu.org @ 2021-04-14 18:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

Steven Munroe <munroesj at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |munroesj at gcc dot gnu.org

--- Comment #1 from Steven Munroe <munroesj at gcc dot gnu.org> ---
Created attachment 50596
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50596&action=edit
Compile test case for xfer operation.

Compile for PowerPCle for both -mcpu=power8 -mfloat128 and -mcpu=power9
-mfloat128 and see the different asm generated.


* [Bug target/100085] Bad code for union transfer from __float128 to vector types
  2021-04-14 18:19 [Bug rtl-optimization/100085] New: Bad code for union transfer from __float128 to vector types munroesj at gcc dot gnu.org
  2021-04-14 18:22 ` [Bug rtl-optimization/100085] " munroesj at gcc dot gnu.org
@ 2021-04-15  6:59 ` rguenth at gcc dot gnu.org
  2021-04-15 18:41 ` segher at gcc dot gnu.org
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-04-15  6:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |powerpc
   Last reconfirmed|                            |2021-04-15
     Ever confirmed|0                           |1
           Keywords|                            |missed-optimization
          Component|rtl-optimization            |target
             Status|UNCONFIRMED                 |NEW

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
RTL expansion for

vui128_t test_xfer_bin128_2_vui128t (__binary128 f128)
{
  vector(1) __int128 unsigned _3;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _3 = VIEW_CONVERT_EXPR<vector(1) __int128 unsigned>(f128_2(D));
  return _3;

power9 (-) vs power8 (+) is

 (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
-(insn 6 3 7 2 (set (mem/c:KF (reg/f:DI 112 virtual-stack-vars) [1  S16 A128])
-        (reg/v:KF 118 [ f128 ])) "vec_f128_ppc.h":143:19 -1
-     (nil))
-(insn 7 6 8 2 (set (reg:V1TI 120)
-        (mem/c:V1TI (reg/f:DI 112 virtual-stack-vars) [1  S16 A128])) "t.c":13:10 -1
+(insn 6 3 7 2 (set (subreg:V1TI (reg:KF 120 [ f128 ]) 0)
+        (rotate:V1TI (subreg:V1TI (reg/v:KF 118 [ f128 ]) 0)
+            (const_int 64 [0x40]))) "vec_f128_ppc.h":143:19 -1
+     (nil))
+(insn 7 6 8 2 (set (mem/c:V1TI (reg/f:DI 112 virtual-stack-vars) [1  S16 A128])
+        (rotate:V1TI (subreg:V1TI (reg:KF 120 [ f128 ]) 0)
+            (const_int 64 [0x40]))) "vec_f128_ppc.h":143:19 -1
+     (nil))
+(insn 8 7 9 2 (set (reg:V2DI 122)
+        (vec_select:V2DI (mem/c:V2DI (reg/f:DI 112 virtual-stack-vars) [1  S16 A128])
+            (parallel [
+                    (const_int 1 [0x1])
+                    (const_int 0 [0])
+                ]))) "t.c":13:10 -1
+     (nil))
+(insn 9 8 10 2 (set (subreg:V2DI (reg:V1TI 121) 0)
+        (vec_select:V2DI (reg:V2DI 122)
+            (parallel [
+                    (const_int 1 [0x1])
+                    (const_int 0 [0])
+                ]))) "t.c":13:10 -1
      (nil))

so power8 avoids the stack but in turn ends up with something that's not
optimized down the road.


* [Bug target/100085] Bad code for union transfer from __float128 to vector types
  2021-04-14 18:19 [Bug rtl-optimization/100085] New: Bad code for union transfer from __float128 to vector types munroesj at gcc dot gnu.org
  2021-04-14 18:22 ` [Bug rtl-optimization/100085] " munroesj at gcc dot gnu.org
  2021-04-15  6:59 ` [Bug target/100085] " rguenth at gcc dot gnu.org
@ 2021-04-15 18:41 ` segher at gcc dot gnu.org
  2021-04-16 20:30 ` munroesj at gcc dot gnu.org
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: segher at gcc dot gnu.org @ 2021-04-15 18:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

--- Comment #3 from Segher Boessenkool <segher at gcc dot gnu.org> ---
The rotates in 6 and 7 are not merged, and neither are the vec_selects in
8 and 9.  Both should be pretty easy to do, there is no unspec in sight,
etc.
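
To illustrate why these pairs should cancel, here is a small C model (not GCC
code; all names are hypothetical). It treats a 128-bit register as two 64-bit
doublewords, so a V1TI rotate by 64 and a V2DI vec_select with (parallel [1 0])
are both plain doubleword swaps:

```c
#include <stdbool.h>
#include <stdint.h>

/* A 128-bit register modelled as two 64-bit doublewords (hypothetical). */
typedef struct { uint64_t dw[2]; } reg128_model_t;

static reg128_model_t
make_reg128 (uint64_t dw0, uint64_t dw1)
{
  reg128_model_t r = { { dw0, dw1 } };
  return r;
}

static bool
reg128_equal (reg128_model_t a, reg128_model_t b)
{
  return a.dw[0] == b.dw[0] && a.dw[1] == b.dw[1];
}

/* Models (rotate:V1TI x (const_int 64)): rotating 128 bits by 64
   exchanges the two doublewords. */
static reg128_model_t
rotate_v1ti_64 (reg128_model_t x)
{
  return make_reg128 (x.dw[1], x.dw[0]);
}

/* Models (vec_select:V2DI x (parallel [1 0])): element 1, then element 0. */
static reg128_model_t
vec_select_v2di_1_0 (reg128_model_t x)
{
  return make_reg128 (x.dw[1], x.dw[0]);
}

/* The data flow of insns 6..9 from the power8 dump, ignoring the memory
   slot in between: each pair applies the same swap twice. */
static reg128_model_t
apply_insns_6_to_9 (reg128_model_t x)
{
  x = rotate_v1ti_64 (x);        /* insn 6 */
  x = rotate_v1ti_64 (x);        /* insn 7 */
  x = vec_select_v2di_1_0 (x);   /* insn 8 */
  x = vec_select_v2di_1_0 (x);   /* insn 9 */
  return x;
}
```

In this model the whole sequence is the identity, which is what a merged form
of the insns would express directly.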


* [Bug target/100085] Bad code for union transfer from __float128 to vector types
  2021-04-14 18:19 [Bug rtl-optimization/100085] New: Bad code for union transfer from __float128 to vector types munroesj at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2021-04-15 18:41 ` segher at gcc dot gnu.org
@ 2021-04-16 20:30 ` munroesj at gcc dot gnu.org
  2021-04-29 15:04 ` munroesj at gcc dot gnu.org
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: munroesj at gcc dot gnu.org @ 2021-04-16 20:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

--- Comment #4 from Steven Munroe <munroesj at gcc dot gnu.org> ---
I am seeing a similar problem with union transfers from __float128 to
__int128.


     static inline unsigned __int128
     vec_xfer_bin128_2_int128t (__binary128 f128)
     {
       __VF_128 vunion;

       vunion.vf1 = f128;

       return (vunion.ui1);
     }

and 

unsigned __int128
test_xfer_bin128_2_int128 (__binary128 f128)
{
  return vec_xfer_bin128_2_int128t (f128);
}

generates:

0000000000000030 <test_xfer_bin128_2_int128>:
  30:   57 12 42 f0     xxswapd vs34,vs34
  34:   20 00 20 39     li      r9,32
  38:   d0 ff 41 39     addi    r10,r1,-48
  3c:   99 4f 4a 7c     stxvd2x vs34,r10,r9
  40:   f0 ff 61 e8     ld      r3,-16(r1)
  44:   f8 ff 81 e8     ld      r4,-8(r1)
  48:   20 00 80 4e     blr

For POWER8 this should use mfvsrd/xxpermdi/mfvsrd.

This looks like the root cause of poor performance for __float128 soft-float on
POWER8. A simple benchmark uses __float128 in C code, calling libgcc for
-mcpu=power8 and then hardware instructions for -mcpu=power9.

P8 target P8AT14, Uses libgcc __addkf3_sw and __mulkf3_sw:
test_time_f128 f128 CC  tb delta = 52589, sec = 0.000102713

P9 Target P8AT14, Uses libgcc __addkf3_hw and __mulkf3_hw:
test_time_f128 f128 CC  tb delta = 18762, sec = 3.66445e-05

P9 Target P9AT14, inline hardware binary128 float:
test_time_f128 f128 CC  tb delta = 3809, sec = 7.43945e-06

I used Valgrind Itrace, Sim-ppc, and perfstat analysis. Every call to libgcc
__add/sub/mul/divkf3 takes a load-hit-store flush. This explains why
__float128 is 13.8 X slower on P8 than on P9.
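
As a quick check of that figure, the ratios can be recomputed from the three
timebase deltas above. This is only arithmetic on the reported numbers; the
helper and constant names are made up for the sketch:

```c
#include <stdio.h>

/* Timebase deltas copied from the three benchmark runs reported above. */
enum
{
  TB_P8_LIBGCC_SW = 52589,   /* P8 target, __addkf3_sw / __mulkf3_sw */
  TB_P9_LIBGCC_HW = 18762,   /* P9 target, __addkf3_hw / __mulkf3_hw */
  TB_P9_INLINE_HW = 3809     /* P9 target, inline binary128 instructions */
};

/* Ratio of two timebase deltas: how many times slower run a is than b. */
static double
tb_ratio (long a, long b)
{
  return (double) a / (double) b;
}

/* Print the three pairwise ratios (hypothetical helper). */
static void
print_tb_ratios (void)
{
  printf ("P8 sw    / P9 inline: %.2f\n",
          tb_ratio (TB_P8_LIBGCC_SW, TB_P9_INLINE_HW));
  printf ("P8 sw    / P9 libgcc: %.2f\n",
          tb_ratio (TB_P8_LIBGCC_SW, TB_P9_LIBGCC_HW));
  printf ("P9 libgcc / P9 inline: %.2f\n",
          tb_ratio (TB_P9_LIBGCC_HW, TB_P9_INLINE_HW));
}
```

The first ratio is about 13.8, matching the number quoted; the libgcc software
path is about 2.8 times slower than the libgcc hardware path, which in turn is
about 4.9 times slower than inline hardware instructions.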


* [Bug target/100085] Bad code for union transfer from __float128 to vector types
  2021-04-14 18:19 [Bug rtl-optimization/100085] New: Bad code for union transfer from __float128 to vector types munroesj at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2021-04-16 20:30 ` munroesj at gcc dot gnu.org
@ 2021-04-29 15:04 ` munroesj at gcc dot gnu.org
  2021-04-30 19:52 ` bergner at gcc dot gnu.org
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: munroesj at gcc dot gnu.org @ 2021-04-29 15:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

--- Comment #5 from Steven Munroe <munroesj at gcc dot gnu.org> ---
Any progress on this?


* [Bug target/100085] Bad code for union transfer from __float128 to vector types
  2021-04-14 18:19 [Bug rtl-optimization/100085] New: Bad code for union transfer from __float128 to vector types munroesj at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2021-04-29 15:04 ` munroesj at gcc dot gnu.org
@ 2021-04-30 19:52 ` bergner at gcc dot gnu.org
  2021-05-24  6:41 ` luoxhu at gcc dot gnu.org
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: bergner at gcc dot gnu.org @ 2021-04-30 19:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

--- Comment #6 from Peter Bergner <bergner at gcc dot gnu.org> ---
(In reply to Steven Munroe from comment #5)
> Any progress on this?

Sorry, not yet.  We've been busy with P10 items and the gcc11 release.  It is
on our list for looking into.


* [Bug target/100085] Bad code for union transfer from __float128 to vector types
  2021-04-14 18:19 [Bug rtl-optimization/100085] New: Bad code for union transfer from __float128 to vector types munroesj at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2021-04-30 19:52 ` bergner at gcc dot gnu.org
@ 2021-05-24  6:41 ` luoxhu at gcc dot gnu.org
  2021-05-24 21:49 ` segher at gcc dot gnu.org
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: luoxhu at gcc dot gnu.org @ 2021-05-24  6:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

luoxhu at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |luoxhu at gcc dot gnu.org

--- Comment #7 from luoxhu at gcc dot gnu.org ---
(In reply to Segher Boessenkool from comment #3)
> The rotates in 6 and 7 are not merged, and neither are the vec_selects in
> 8 and 9.  Both should be pretty easy to do, there is no unspec in sight,
> etc.

Should this be done in pass bswaps or combine or by peephole2? :)


* [Bug target/100085] Bad code for union transfer from __float128 to vector types
  2021-04-14 18:19 [Bug rtl-optimization/100085] New: Bad code for union transfer from __float128 to vector types munroesj at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2021-05-24  6:41 ` luoxhu at gcc dot gnu.org
@ 2021-05-24 21:49 ` segher at gcc dot gnu.org
  2021-06-02  8:27 ` luoxhu at gcc dot gnu.org
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: segher at gcc dot gnu.org @ 2021-05-24 21:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

--- Comment #8 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to luoxhu from comment #7)
> (In reply to Segher Boessenkool from comment #3)
> > The rotates in 6 and 7 are not merged, and neither are the vec_selects in
> > 8 and 9.  Both should be pretty easy to do, there is no unspec in sight,
> > etc.
> 
> Should this be done in pass bswaps or combine or by peephole2? :)

It should be done by simplify-rtx.c at least (which will make it work in
combine and other places): two rotates that together do nothing should be
optimised to that, or generally, two rotates should be optimised to just one
(which then can be optimised to nothing).  Similar for vec_select.  Maybe
something in bswaps can help as well, I don't know, I haven't looked closely
yet.
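
The general rule can be sketched in plain C on a 128-bit integer. This is only
a model of the algebra, not simplify-rtx.c code, and the function names are
hypothetical: two left rotates by a and b equal one rotate by (a + b) mod 128,
and a rotate by 0 is a no-op.

```c
#include <stdint.h>

typedef unsigned __int128 u128_t;

/* Build a 128-bit value from two 64-bit halves (hypothetical helper). */
static u128_t
make_u128 (uint64_t hi, uint64_t lo)
{
  return ((u128_t) hi << 64) | lo;
}

/* Rotate left by n bits; n is reduced modulo 128 and n == 0 is handled
   separately so that no shift by the full width occurs. */
static u128_t
rotl128 (u128_t x, unsigned n)
{
  n %= 128;
  return n ? (x << n) | (x >> (128 - n)) : x;
}

/* Amount of the single rotate equivalent to rotating by a, then by b. */
static unsigned
merge_rotate_amounts (unsigned a, unsigned b)
{
  return (a + b) % 128;
}
```

With a = b = 64 the merged amount is 0, so the pair disappears entirely, which
is the case in the dump above.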


* [Bug target/100085] Bad code for union transfer from __float128 to vector types
  2021-04-14 18:19 [Bug rtl-optimization/100085] New: Bad code for union transfer from __float128 to vector types munroesj at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2021-05-24 21:49 ` segher at gcc dot gnu.org
@ 2021-06-02  8:27 ` luoxhu at gcc dot gnu.org
  2021-06-09  5:13 ` luoxhu at gcc dot gnu.org
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: luoxhu at gcc dot gnu.org @ 2021-06-02  8:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

--- Comment #9 from luoxhu at gcc dot gnu.org ---
Patch sent; it could fix the __float128 to vector __int128 issue:

https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571689.html


But the __float128 to __int128 case mentioned in #c4 needs a hack in
rs6000_modes_tieable_p to remove the stack operation in dse1. I am not sure
this is *LEGAL*, since TImode is allocated to GPRs. Is it not valid to access
TImode from ALTIVEC or VSX registers without copying?

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index ad11b67b125..ee69463ac46 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1974,6 +1974,9 @@ rs6000_modes_tieable_p (machine_mode mode1, machine_mode mode2)
       || mode2 == PTImode || mode2 == OOmode || mode2 == XOmode)
     return mode1 == mode2;

+  if (mode1 == TImode && ALTIVEC_OR_VSX_VECTOR_MODE (mode2))
+    return true;
+


        xxpermdi %vs0,%vs34,%vs34,3
        mfvsrd %r4,%vs34
        mfvsrd %r3,%vs0
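
For readers less familiar with these instructions, a simplified C model of the
three-instruction sequence above. The types and function names are
hypothetical, and the instruction semantics are reduced to the doubleword
selection that matters here (dw0 is the most significant doubleword of a VSR):

```c
#include <stdint.h>

/* A VSX register modelled as two doublewords; dw0 is the high one. */
typedef struct { uint64_t dw0, dw1; } vsr_model_t;

/* The GPR pair that receives the 128-bit result. */
typedef struct { uint64_t r3, r4; } gpr_pair_model_t;

static vsr_model_t
make_vsr (uint64_t dw0, uint64_t dw1)
{
  vsr_model_t v = { dw0, dw1 };
  return v;
}

/* xxpermdi XT,XA,XB,DM: the high bit of DM selects which doubleword of XA
   becomes XT.dw0, the low bit selects which doubleword of XB becomes
   XT.dw1. */
static vsr_model_t
xxpermdi_model (vsr_model_t xa, vsr_model_t xb, unsigned dm)
{
  vsr_model_t xt;
  xt.dw0 = (dm & 2) ? xa.dw1 : xa.dw0;
  xt.dw1 = (dm & 1) ? xb.dw1 : xb.dw0;
  return xt;
}

/* mfvsrd RA,XS: copy doubleword 0 of the VSR into a GPR. */
static uint64_t
mfvsrd_model (vsr_model_t xs)
{
  return xs.dw0;
}

/* The sequence from the listing: xxpermdi vs0,vs34,vs34,3 followed by
   mfvsrd r4,vs34 and mfvsrd r3,vs0. */
static gpr_pair_model_t
xfer_vsr_to_gprs (vsr_model_t vs34)
{
  vsr_model_t vs0 = xxpermdi_model (vs34, vs34, 3);
  gpr_pair_model_t g;
  g.r4 = mfvsrd_model (vs34);   /* high doubleword */
  g.r3 = mfvsrd_model (vs0);    /* low doubleword */
  return g;
}
```

In this model r4 receives the high doubleword and r3 the low one, with no trip
through memory.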


* [Bug target/100085] Bad code for union transfer from __float128 to vector types
  2021-04-14 18:19 [Bug rtl-optimization/100085] New: Bad code for union transfer from __float128 to vector types munroesj at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2021-06-02  8:27 ` luoxhu at gcc dot gnu.org
@ 2021-06-09  5:13 ` luoxhu at gcc dot gnu.org
  2021-06-09 21:35 ` bergner at gcc dot gnu.org
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: luoxhu at gcc dot gnu.org @ 2021-06-09  5:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

--- Comment #10 from luoxhu at gcc dot gnu.org ---
float128 to vector __int128 is fixed by:

https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=f700e4b0ee3ef53b48975cf89be26b9177e3a3f3


* [Bug target/100085] Bad code for union transfer from __float128 to vector types
  2021-04-14 18:19 [Bug rtl-optimization/100085] New: Bad code for union transfer from __float128 to vector types munroesj at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2021-06-09  5:13 ` luoxhu at gcc dot gnu.org
@ 2021-06-09 21:35 ` bergner at gcc dot gnu.org
  2021-06-09 22:08 ` segher at gcc dot gnu.org
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: bergner at gcc dot gnu.org @ 2021-06-09 21:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

--- Comment #11 from Peter Bergner <bergner at gcc dot gnu.org> ---
(In reply to luoxhu from comment #9)
> But for __float128 to __int128 mentioned in #c4, need hack
> rs6000_modes_tieable_p
> to remove the stack operation in dse1. But I am not sure this is *LEGAL*
> since TImode is allocated to GPR, It seems not true to access TImode from
> ALTIVEC or VSX without copying?

We used to have a -mvsx-timode option which allowed TImode pseudos into the VSX
registers.  We deprecated the option a while back and basically always allow
TImode in the VSX registers now.  I would say we even prefer them in VSX
registers over GPR registers.  The only "issue" is that our ABIs define that
parameter passing and return values for TImode values go through the GPRs. :-(


* [Bug target/100085] Bad code for union transfer from __float128 to vector types
  2021-04-14 18:19 [Bug rtl-optimization/100085] New: Bad code for union transfer from __float128 to vector types munroesj at gcc dot gnu.org
                   ` (10 preceding siblings ...)
  2021-06-09 21:35 ` bergner at gcc dot gnu.org
@ 2021-06-09 22:08 ` segher at gcc dot gnu.org
  2021-06-10 15:00 ` munroesj at gcc dot gnu.org
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: segher at gcc dot gnu.org @ 2021-06-09 22:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

--- Comment #12 from Segher Boessenkool <segher at gcc dot gnu.org> ---
We want to use plain TImode instead of V1TImode on newer cpus.  It probably is
a good idea (for performance) on p9 already, but this will need testing. That's
only sideways related to this issue though (but so is -mvsx-timode :-) )


* [Bug target/100085] Bad code for union transfer from __float128 to vector types
  2021-04-14 18:19 [Bug rtl-optimization/100085] New: Bad code for union transfer from __float128 to vector types munroesj at gcc dot gnu.org
                   ` (11 preceding siblings ...)
  2021-06-09 22:08 ` segher at gcc dot gnu.org
@ 2021-06-10 15:00 ` munroesj at gcc dot gnu.org
  2021-06-11 20:28 ` segher at gcc dot gnu.org
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: munroesj at gcc dot gnu.org @ 2021-06-10 15:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

--- Comment #13 from Steven Munroe <munroesj at gcc dot gnu.org> ---
"We want to use plain TImode instead of V1TImode on newer cpus."

Actually I disagree. We have vector __int128 in the ABI and with POWER10 a
complete set of arithmetic operations for 128-bit in VRs.

Also this issue is not restricted to TImode. It also affects _Float128
(KFmode), __ibm128 (TFmode) and Libmvec for vector float/double. The proper and
optimum handling of these "union transfers" has been broken in GCC for years.

And I have grave reservations about the vague plans of a small/fringe minority
to subset the PowerISA for their convenience.


* [Bug target/100085] Bad code for union transfer from __float128 to vector types
  2021-04-14 18:19 [Bug rtl-optimization/100085] New: Bad code for union transfer from __float128 to vector types munroesj at gcc dot gnu.org
                   ` (12 preceding siblings ...)
  2021-06-10 15:00 ` munroesj at gcc dot gnu.org
@ 2021-06-11 20:28 ` segher at gcc dot gnu.org
  2022-01-14 17:17 ` wschmidt at gcc dot gnu.org
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: segher at gcc dot gnu.org @ 2021-06-11 20:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

--- Comment #14 from Segher Boessenkool <segher at gcc dot gnu.org> ---
We *have* TImode already, but most 128-bit scalars currently use V1TImode.
This often leads to reduced performance because that is not a scalar mode and
does not get all the optimisations we have generically for all other integer
scalars.  We have to do a lot of it manually, which is a lot of (combine)
patterns, and we still miss almost all cases.

I am not saying we should remove V1TImode.  I am saying we want to use
plain TImode for scalars, on newer cpus.  On p8 we had V1TImode so that
we could reduce the traffic between the vector register files and the
GPR register file, because that was very costly on p8 (mtvsr* and mfvsr*
were 5 cycles, and mtvsrdd and mfvsrld didn't even exist yet).

Using V1TImode for scalars on p8 was a pretty big win.  It should be a win
again to use TImode on later cpus though.

> And I have grave reservations about the vague plans of small/fringe minority to 
> subset the PowerISA for their convenience.

I don't have reservations about that.  Instead, I battle that with all I can.


* [Bug target/100085] Bad code for union transfer from __float128 to vector types
  2021-04-14 18:19 [Bug rtl-optimization/100085] New: Bad code for union transfer from __float128 to vector types munroesj at gcc dot gnu.org
                   ` (13 preceding siblings ...)
  2021-06-11 20:28 ` segher at gcc dot gnu.org
@ 2022-01-14 17:17 ` wschmidt at gcc dot gnu.org
  2022-02-24 20:48 ` munroesj at gcc dot gnu.org
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: wschmidt at gcc dot gnu.org @ 2022-01-14 17:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

Bill Schmidt <wschmidt at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #15 from Bill Schmidt <wschmidt at gcc dot gnu.org> ---
This was fixed a while back in r12-1316 by Xiong Hu Luo.


* [Bug target/100085] Bad code for union transfer from __float128 to vector types
  2021-04-14 18:19 [Bug rtl-optimization/100085] New: Bad code for union transfer from __float128 to vector types munroesj at gcc dot gnu.org
                   ` (14 preceding siblings ...)
  2022-01-14 17:17 ` wschmidt at gcc dot gnu.org
@ 2022-02-24 20:48 ` munroesj at gcc dot gnu.org
  2022-02-24 20:53 ` munroesj at gcc dot gnu.org
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: munroesj at gcc dot gnu.org @ 2022-02-24 20:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

--- Comment #16 from Steven Munroe <munroesj at gcc dot gnu.org> ---
Created attachment 52510
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52510&action=edit
Reduced tests for xfers from __float128 to vector or __int128

Cover more types including __int128 and vector __int128


* [Bug target/100085] Bad code for union transfer from __float128 to vector types
  2021-04-14 18:19 [Bug rtl-optimization/100085] New: Bad code for union transfer from __float128 to vector types munroesj at gcc dot gnu.org
                   ` (15 preceding siblings ...)
  2022-02-24 20:48 ` munroesj at gcc dot gnu.org
@ 2022-02-24 20:53 ` munroesj at gcc dot gnu.org
  2022-02-24 21:17 ` segher at gcc dot gnu.org
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: munroesj at gcc dot gnu.org @ 2022-02-24 20:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

Steven Munroe <munroesj at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |---

--- Comment #17 from Steven Munroe <munroesj at gcc dot gnu.org> ---
I don't think this is fixed.

The fix was supposed to be back-ported to GCC11 for Advance Toolchain 15.

The updated test case shows that this is clearly not working as advertised.

Either the GCC12 fix has regressed due to subsequent updates or the AT15 GCC11
back-port fails due to some missing/different code between GCC11/12.


* [Bug target/100085] Bad code for union transfer from __float128 to vector types
  2021-04-14 18:19 [Bug rtl-optimization/100085] New: Bad code for union transfer from __float128 to vector types munroesj at gcc dot gnu.org
                   ` (16 preceding siblings ...)
  2022-02-24 20:53 ` munroesj at gcc dot gnu.org
@ 2022-02-24 21:17 ` segher at gcc dot gnu.org
  2022-02-24 21:22 ` segher at gcc dot gnu.org
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: segher at gcc dot gnu.org @ 2022-02-24 21:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

Segher Boessenkool <segher at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |WAITING

--- Comment #18 from Segher Boessenkool <segher at gcc dot gnu.org> ---
What do you see, what do you want to see?

For me (powerpc64-linux -mcpu=power10) I see three empty functions, and
for the last

        stxv 34,-16(1)
        ld 3,-16(1)
        ld 4,-8(1)
        blr

(and the same for power7 and later, just less efficient until p9; and the
same with -mlittle -mabi=elfv2).


* [Bug target/100085] Bad code for union transfer from __float128 to vector types
  2021-04-14 18:19 [Bug rtl-optimization/100085] New: Bad code for union transfer from __float128 to vector types munroesj at gcc dot gnu.org
                   ` (17 preceding siblings ...)
  2022-02-24 21:17 ` segher at gcc dot gnu.org
@ 2022-02-24 21:22 ` segher at gcc dot gnu.org
  2022-02-24 21:26 ` segher at gcc dot gnu.org
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: segher at gcc dot gnu.org @ 2022-02-24 21:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

--- Comment #19 from Segher Boessenkool <segher at gcc dot gnu.org> ---
And the same with all of GCC 8, GCC 9, GCC 10, GCC 11, and current trunk.


* [Bug target/100085] Bad code for union transfer from __float128 to vector types
  2021-04-14 18:19 [Bug rtl-optimization/100085] New: Bad code for union transfer from __float128 to vector types munroesj at gcc dot gnu.org
                   ` (18 preceding siblings ...)
  2022-02-24 21:22 ` segher at gcc dot gnu.org
@ 2022-02-24 21:26 ` segher at gcc dot gnu.org
  2022-02-25 15:31 ` munroesj at gcc dot gnu.org
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: segher at gcc dot gnu.org @ 2022-02-24 21:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

Segher Boessenkool <segher at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |REOPENED

--- Comment #20 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Ah, there are problems with -mcpu=power8 -mlittle -mabi=elfv2, on GCC 11
and before.


* [Bug target/100085] Bad code for union transfer from __float128 to vector types
  2021-04-14 18:19 [Bug rtl-optimization/100085] New: Bad code for union transfer from __float128 to vector types munroesj at gcc dot gnu.org
                   ` (19 preceding siblings ...)
  2022-02-24 21:26 ` segher at gcc dot gnu.org
@ 2022-02-25 15:31 ` munroesj at gcc dot gnu.org
  2022-02-25 22:57 ` segher at gcc dot gnu.org
  2022-02-26 16:22 ` munroesj at gcc dot gnu.org
  22 siblings, 0 replies; 24+ messages in thread
From: munroesj at gcc dot gnu.org @ 2022-02-25 15:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

--- Comment #21 from Steven Munroe <munroesj at gcc dot gnu.org> ---
Yes, I was told by Peter Bergner that the fix from
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085#c15 had been backported
to AT15.0-1.

But when I ran this test with AT15.0-1 I saw:

0000000000000000 <test_xfer_bin128_2_vui32t_V0>:
   0:   20 00 20 39     li      r9,32
   4:   d0 ff 41 39     addi    r10,r1,-48
   8:   57 12 42 f0     xxswapd vs34,vs34
   c:   99 4f 4a 7c     stxvd2x vs34,r10,r9
  10:   ce 48 4a 7c     lvx     v2,r10,r9
  14:   20 00 80 4e     blr

0000000000000030 <test_xfer_bin128_2_vui64t_V0>:
  30:   20 00 20 39     li      r9,32
  34:   d0 ff 41 39     addi    r10,r1,-48
  38:   57 12 42 f0     xxswapd vs34,vs34
  3c:   99 4f 4a 7c     stxvd2x vs34,r10,r9
  40:   ce 48 4a 7c     lvx     v2,r10,r9
  44:   20 00 80 4e     blr

0000000000000060 <test_xfer_bin128_2_vui128t_V0>:
  60:   20 00 20 39     li      r9,32
  64:   d0 ff 41 39     addi    r10,r1,-48
  68:   57 12 42 f0     xxswapd vs34,vs34
  6c:   99 4f 4a 7c     stxvd2x vs34,r10,r9
  70:   99 4e 4a 7c     lxvd2x  vs34,r10,r9
  74:   57 12 42 f0     xxswapd vs34,vs34
  78:   20 00 80 4e     blr

0000000000000090 <test_xfer_bin128_2_ui128t_V0>:
  90:   57 12 42 f0     xxswapd vs34,vs34
  94:   20 00 40 39     li      r10,32
  98:   d0 ff 01 39     addi    r8,r1,-48
  9c:   f0 ff 21 39     addi    r9,r1,-16
  a0:   99 57 48 7c     stxvd2x vs34,r8,r10
  a4:   00 00 69 e8     ld      r3,0(r9)
  a8:   08 00 89 e8     ld      r4,8(r9)
  ac:   20 00 80 4e     blr

So either the patch for AT15.0-1 was not applied correctly, or it is
non-functional because of some difference between GCC11/GCC12, or it regressed
because of some other change/patch.

In my experience this part of GCC is fragile (based on the long/sad history of
IBM long double). So this needs to be monitored with each new update.
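
To make clear that the first listing above (xxswapd, stxvd2x, lvx) moves the
value through the stack without changing it, here is a simplified C model. The
names are hypothetical and the instruction semantics are reduced to doubleword
placement in little-endian mode; it is a sketch, not a definition of the ISA:

```c
#include <stdint.h>
#include <string.h>

/* A VSX register modelled as two doublewords; dw0 is the high one. */
typedef struct { uint64_t dw0, dw1; } vsr_model_t;

static vsr_model_t
make_vsr (uint64_t dw0, uint64_t dw1)
{
  vsr_model_t v = { dw0, dw1 };
  return v;
}

/* xxswapd: exchange the two doublewords. */
static vsr_model_t
xxswapd_model (vsr_model_t v)
{
  return make_vsr (v.dw1, v.dw0);
}

/* stxvd2x: doubleword 0 goes to the lower address, doubleword 1 to the
   higher address. */
static void
stxvd2x_model (unsigned char *ea, vsr_model_t v)
{
  memcpy (ea, &v.dw0, 8);
  memcpy (ea + 8, &v.dw1, 8);
}

/* lvx in little-endian mode: the lower address holds the least
   significant doubleword of the register. */
static vsr_model_t
lvx_le_model (const unsigned char *ea)
{
  vsr_model_t v;
  memcpy (&v.dw1, ea, 8);
  memcpy (&v.dw0, ea + 8, 8);
  return v;
}

/* The store/reload path from the listing, through a 16-byte stack slot. */
static vsr_model_t
store_reload_model (vsr_model_t in)
{
  unsigned char slot[16];
  stxvd2x_model (slot, xxswapd_model (in));
  return lvx_le_model (slot);
}
```

In this model the register holds the same value after the reload as before the
store, so the memory round trip (and the load-hit-store hazard it brings) buys
nothing.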


* [Bug target/100085] Bad code for union transfer from __float128 to vector types
  2021-04-14 18:19 [Bug rtl-optimization/100085] New: Bad code for union transfer from __float128 to vector types munroesj at gcc dot gnu.org
                   ` (20 preceding siblings ...)
  2022-02-25 15:31 ` munroesj at gcc dot gnu.org
@ 2022-02-25 22:57 ` segher at gcc dot gnu.org
  2022-02-26 16:22 ` munroesj at gcc dot gnu.org
  22 siblings, 0 replies; 24+ messages in thread
From: segher at gcc dot gnu.org @ 2022-02-25 22:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

--- Comment #22 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Well, we do not do anything AT here; but the patch is not on the GCC 11
branch either.

Xiong Hu, does it backport there cleanly?


* [Bug target/100085] Bad code for union transfer from __float128 to vector types
  2021-04-14 18:19 [Bug rtl-optimization/100085] New: Bad code for union transfer from __float128 to vector types munroesj at gcc dot gnu.org
                   ` (21 preceding siblings ...)
  2022-02-25 22:57 ` segher at gcc dot gnu.org
@ 2022-02-26 16:22 ` munroesj at gcc dot gnu.org
  22 siblings, 0 replies; 24+ messages in thread
From: munroesj at gcc dot gnu.org @ 2022-02-26 16:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

--- Comment #23 from Steven Munroe <munroesj at gcc dot gnu.org> ---
Ok, but I strongly recommend a compiler test that verifies that the compiler is
generating the expected code (for this and other cases).

We have a history of common code changes (accidental or deliberate) causing
regressions for POWER targets.

Best to find these early, before they impact customer performance.
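
A sketch of what such a test might look like in the style of the GCC
testsuite. This is an illustration under assumptions, not the test that went in
with the fix: the file contents, type and function names, the choice of
options, and the instructions scanned for are all made up here, and the GCC
vector extension is used in place of the `vector` keyword.

```c
/* { dg-do compile } */
/* { dg-require-effective-target powerpc_p8vector_ok } */
/* { dg-options "-O2 -mdejagnu-cpu=power8 -mfloat128" } */

/* Hypothetical types: a 4 x 32-bit vector and a union used for the
   __float128 to vector transfer. */
typedef unsigned int vui32_sketch_t __attribute__ ((vector_size (16)));

typedef union
{
  __float128 vf1;
  vui32_sketch_t vx4;
} vf128_sketch_t;

/* Non-static so that the compiler has to emit code for it. */
vui32_sketch_t
test_xfer_f128_to_vui32 (__float128 f128)
{
  vf128_sketch_t u;
  u.vf1 = f128;
  return u.vx4;
}

/* The transfer should stay in registers: no store to the stack and no
   reload from it. */
/* { dg-final { scan-assembler-not {\mstxvd2x\M} } } */
/* { dg-final { scan-assembler-not {\mlxvd2x\M} } } */
/* { dg-final { scan-assembler-not {\mlvx\M} } } */
```

A test along these lines would fail on the AT15.0-1 output shown earlier in the
thread, which is the kind of early warning being asked for.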

