public inbox for gcc-bugs@sourceware.org
* [Bug target/43892] PowerPC suboptimal "add with carry" optimization
       [not found] <bug-43892-4@http.gcc.gnu.org/bugzilla/>
@ 2010-09-29 12:45 ` joakim.tjernlund at transmode dot se
  2015-01-18 16:34 ` segher at gcc dot gnu.org
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 28+ messages in thread
From: joakim.tjernlund at transmode dot se @ 2010-09-29 12:45 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892

--- Comment #18 from joakim.tjernlund at transmode dot se <joakim.tjernlund at transmode dot se> 2010-09-29 09:02:49 UTC ---
I hope you don't mind me asking for status again (because I am curious)?

I upgraded to gcc 4.4.4 and noticed one (small) improvement:
add32carry:
        add 3,3,4
        subfc 0,4,3
        subfe 0,0,0
        subf 0,0,3
        mr 3,0

so one subfe becomes subf, but the extra mr insn is still
there (gcc 3.4.6 doesn't add that).


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug target/43892] PowerPC suboptimal "add with carry" optimization
       [not found] <bug-43892-4@http.gcc.gnu.org/bugzilla/>
  2010-09-29 12:45 ` [Bug target/43892] PowerPC suboptimal "add with carry" optimization joakim.tjernlund at transmode dot se
@ 2015-01-18 16:34 ` segher at gcc dot gnu.org
  2015-01-18 17:00 ` joakim.tjernlund at transmode dot se
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 28+ messages in thread
From: segher at gcc dot gnu.org @ 2015-01-18 16:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892

--- Comment #19 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Current code:

add 3,3,4
subfc 4,4,3
subfe 9,9,9
subf 3,9,3

so we got rid of the useless register move.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug target/43892] PowerPC suboptimal "add with carry" optimization
       [not found] <bug-43892-4@http.gcc.gnu.org/bugzilla/>
  2010-09-29 12:45 ` [Bug target/43892] PowerPC suboptimal "add with carry" optimization joakim.tjernlund at transmode dot se
  2015-01-18 16:34 ` segher at gcc dot gnu.org
@ 2015-01-18 17:00 ` joakim.tjernlund at transmode dot se
  2015-01-18 20:31 ` segher at gcc dot gnu.org
                   ` (18 subsequent siblings)
  21 siblings, 0 replies; 28+ messages in thread
From: joakim.tjernlund at transmode dot se @ 2015-01-18 17:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892

--- Comment #20 from joakim.tjernlund at transmode dot se <joakim.tjernlund at transmode dot se> ---
(In reply to Segher Boessenkool from comment #19)
> Current code:
> 
> add 3,3,4
> subfc 4,4,3
> subfe 9,9,9
> subf 3,9,3
> 
> so we got rid of the useless register move.

Which gcc version?

Any progress with the bigger change (cc expander)?


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug target/43892] PowerPC suboptimal "add with carry" optimization
       [not found] <bug-43892-4@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2015-01-18 17:00 ` joakim.tjernlund at transmode dot se
@ 2015-01-18 20:31 ` segher at gcc dot gnu.org
  2015-01-18 21:44 ` joakim.tjernlund at transmode dot se
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 28+ messages in thread
From: segher at gcc dot gnu.org @ 2015-01-18 20:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892

--- Comment #21 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Mainline (will be GCC 5 in a few months).

There is no addcc thing, that is not suitable for PowerPC.
The big changes are in though (and they are much bigger than
I originally thought, fwiw -- scope creep ;-) )


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug target/43892] PowerPC suboptimal "add with carry" optimization
       [not found] <bug-43892-4@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2015-01-18 20:31 ` segher at gcc dot gnu.org
@ 2015-01-18 21:44 ` joakim.tjernlund at transmode dot se
  2015-01-18 22:54 ` segher at gcc dot gnu.org
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 28+ messages in thread
From: joakim.tjernlund at transmode dot se @ 2015-01-18 21:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892

--- Comment #22 from joakim.tjernlund at transmode dot se <joakim.tjernlund at transmode dot se> ---
(In reply to Segher Boessenkool from comment #21)
> Mainline (will be GCC 5 in a few months).
> 
> There is no addcc thing, that is not suitable for PowerPC.
> The big changes are in though (and they are much bigger than
> I originally thought, fwiw -- scope creep ;-) )

Nice, but why is addcc not suitable for powerpc?
I guess you mean that, in its current form, it needs to be adapted to
really fit ppc (so that a loop with addcc in it is optimal)?


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug target/43892] PowerPC suboptimal "add with carry" optimization
       [not found] <bug-43892-4@http.gcc.gnu.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2015-01-18 21:44 ` joakim.tjernlund at transmode dot se
@ 2015-01-18 22:54 ` segher at gcc dot gnu.org
  2015-01-19  0:05 ` joakim.tjernlund at transmode dot se
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 28+ messages in thread
From: segher at gcc dot gnu.org @ 2015-01-18 22:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892

--- Comment #23 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Do you know what addcc does?  PowerPC does not have any instruction
that behaves like it at all.  So it would have to expand to a big
fat sequence of instructions, that then hopefully are optimised to
something sane later.  Instead, the current code expands to something
sane immediately.

The problem for this testcase (as for all other similar "manual carry"
cases) is that we need to replace the carry in our original patterns
with (1 - carry).  There is no good way to do that in combine, we would
need to combine three insns, the first two of which are a parallel.
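
To make the (1 - carry) point concrete, here is a minimal C model of the
current add/subfc/subfe/subf sequence. It is only a sketch: the function
names are made up for illustration, `ca` stands in for the CA bit, and the
instruction semantics are written out as I read them from the Power ISA.

```c
#include <assert.h>
#include <stdint.h>

/* The testcase's end-around-carry add, used as the reference. */
uint32_t add32carry_ref(uint32_t sum, uint32_t x)
{
    uint32_t z = sum + x;
    if (z < x)              /* unsigned wrap-around means the add carried */
        z++;
    return z;
}

/* Instruction-by-instruction model; `ca` is a stand-in for the CA bit. */
uint32_t add32carry_model(uint32_t a, uint32_t b)
{
    uint32_t r3 = a + b;                              /* add   3,3,4 */
    uint64_t wide = (uint64_t)(uint32_t)~b + r3 + 1;  /* subfc 4,4,3: ~r4 + r3 + 1 */
    uint32_t ca = (uint32_t)(wide >> 32);             /* CA = 1 - (carry of the add) */
    uint32_t r9 = 0xFFFFFFFFu + ca;                   /* subfe 9,9,9: ~r9 + r9 + CA = CA - 1 */
    return r3 - r9;                                   /* subf  3,9,3: r3 - (CA - 1) */
}

/* Spot-check the model against the reference on a few edge values. */
void check_model(void)
{
    static const uint32_t v[] = {
        0u, 1u, 2u, 0x7FFFFFFFu, 0x80000000u, 0xFFFFFFFEu, 0xFFFFFFFFu
    };
    const unsigned n = sizeof v / sizeof v[0];

    for (unsigned i = 0; i < n; i++)
        for (unsigned j = 0; j < n; j++)
            assert(add32carry_model(v[i], v[j]) == add32carry_ref(v[i], v[j]));
}
```

The subfc leaves CA set exactly when the original add did not carry, so
the last two insns exist only to turn that back into "+ carry".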


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug target/43892] PowerPC suboptimal "add with carry" optimization
       [not found] <bug-43892-4@http.gcc.gnu.org/bugzilla/>
                   ` (5 preceding siblings ...)
  2015-01-18 22:54 ` segher at gcc dot gnu.org
@ 2015-01-19  0:05 ` joakim.tjernlund at transmode dot se
  2020-10-20 18:37 ` christophe.leroy at csgroup dot eu
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 28+ messages in thread
From: joakim.tjernlund at transmode dot se @ 2015-01-19  0:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892

--- Comment #24 from joakim.tjernlund at transmode dot se <joakim.tjernlund at transmode dot se> ---
(In reply to Segher Boessenkool from comment #23)
> Do you know what addcc does?  PowerPC does not have any instruction

No, just guessing :) To me it was a generic way to express add with carry.

> that behaves like it at all.  So it would have to expand to a big
> fat sequence of instructions, that then hopefully are optimised to
> something sane later.  Instead, the current code expands to something
> sane immediately.

I was hoping that this 
    add 3,3,4
    subfc 0,4,3
    subfe 0,0,0
    subfc 0,0,3
    mr 3,0
could be this instead, once you had a notion of carry in gcc for ppc:
    addc 3,3,4
    addze 3,3

the optimized loop would be an extra bonus


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug target/43892] PowerPC suboptimal "add with carry" optimization
       [not found] <bug-43892-4@http.gcc.gnu.org/bugzilla/>
                   ` (6 preceding siblings ...)
  2015-01-19  0:05 ` joakim.tjernlund at transmode dot se
@ 2020-10-20 18:37 ` christophe.leroy at csgroup dot eu
  2020-10-20 18:59 ` segher at gcc dot gnu.org
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 28+ messages in thread
From: christophe.leroy at csgroup dot eu @ 2020-10-20 18:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892

Christophe Leroy <christophe.leroy at csgroup dot eu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |christophe.leroy at csgroup dot eu

--- Comment #25 from Christophe Leroy <christophe.leroy at csgroup dot eu> ---
With GCC 10.1, I still get:

00000000 <f>:
   0:   7c 63 22 14     add     r3,r3,r4
   4:   7c 84 18 10     subfc   r4,r4,r3
   8:   7d 29 49 10     subfe   r9,r9,r9
   c:   7c 69 18 50     subf    r3,r9,r3
  10:   4e 80 00 20     blr

Any plan to get the expected adde/addze instead?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug target/43892] PowerPC suboptimal "add with carry" optimization
       [not found] <bug-43892-4@http.gcc.gnu.org/bugzilla/>
                   ` (7 preceding siblings ...)
  2020-10-20 18:37 ` christophe.leroy at csgroup dot eu
@ 2020-10-20 18:59 ` segher at gcc dot gnu.org
  2020-10-20 19:37 ` joakim.tjernlund at infinera dot com
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 28+ messages in thread
From: segher at gcc dot gnu.org @ 2020-10-20 18:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892

--- Comment #26 from Segher Boessenkool <segher at gcc dot gnu.org> ---
It isn't easy to do.  Feel free to try your hand at it :-)

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug target/43892] PowerPC suboptimal "add with carry" optimization
       [not found] <bug-43892-4@http.gcc.gnu.org/bugzilla/>
                   ` (8 preceding siblings ...)
  2020-10-20 18:59 ` segher at gcc dot gnu.org
@ 2020-10-20 19:37 ` joakim.tjernlund at infinera dot com
  2020-10-21  6:09 ` christophe.leroy at csgroup dot eu
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 28+ messages in thread
From: joakim.tjernlund at infinera dot com @ 2020-10-20 19:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892

--- Comment #27 from Joakim Tjernlund <joakim.tjernlund at infinera dot com> ---
It has been 10 years, it is not that hard :)

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug target/43892] PowerPC suboptimal "add with carry" optimization
       [not found] <bug-43892-4@http.gcc.gnu.org/bugzilla/>
                   ` (9 preceding siblings ...)
  2020-10-20 19:37 ` joakim.tjernlund at infinera dot com
@ 2020-10-21  6:09 ` christophe.leroy at csgroup dot eu
  2020-10-21 19:28 ` segher at gcc dot gnu.org
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 28+ messages in thread
From: christophe.leroy at csgroup dot eu @ 2020-10-21  6:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892

--- Comment #28 from Christophe Leroy <christophe.leroy at csgroup dot eu> ---
Looks like we have a way to do it. Works at least with GCC 5.5, 8.2, 9.2, 10.1

unsigned long g(unsigned long a, unsigned long b)
{
        unsigned long long s = (unsigned long long)a + (unsigned long long)b;

        return (s >> 32) + s;
}

00000020 <g>:
  20:   7c 63 20 14     addc    r3,r3,r4
  24:   7c 63 01 94     addze   r3,r3
  28:   4e 80 00 20     blr



Though GCC 4.9.4 does:

00000014 <g>:
  14:   7c 69 1b 78     mr      r9,r3
  18:   7c 8b 23 78     mr      r11,r4
  1c:   39 00 00 00     li      r8,0
  20:   39 40 00 00     li      r10,0
  24:   7d 6b 48 14     addc    r11,r11,r9
  28:   7d 4a 41 14     adde    r10,r10,r8
  2c:   7c 6a 5a 14     add     r3,r10,r11
  30:   4e 80 00 20     blr
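
For anyone who wants to check the rewrite on a host where unsigned long is
not 32 bits, a sketch with fixed-width types follows. The names
add32carry_wide and add32carry_ref are made up for illustration, and the
reference is assumed to be the add32carry testcase from this PR.

```c
#include <assert.h>
#include <stdint.h>

/* The PR's original formulation: add, then add 1 if the add wrapped. */
uint32_t add32carry_ref(uint32_t sum, uint32_t x)
{
    uint32_t z = sum + x;
    if (z < x)
        z++;
    return z;
}

/* The widening formulation: do the add in 64 bits, then fold the
   carry (bit 32) back into the low word. */
uint32_t add32carry_wide(uint32_t a, uint32_t b)
{
    uint64_t s = (uint64_t)a + (uint64_t)b;
    return (uint32_t)((s >> 32) + s);
}

/* Spot-check that both formulations agree on a few edge values. */
void check_wide(void)
{
    static const uint32_t v[] = {
        0u, 1u, 0x7FFFFFFFu, 0x80000000u, 0xFFFFFFFEu, 0xFFFFFFFFu
    };
    const unsigned n = sizeof v / sizeof v[0];

    for (unsigned i = 0; i < n; i++)
        for (unsigned j = 0; j < n; j++)
            assert(add32carry_wide(v[i], v[j]) == add32carry_ref(v[i], v[j]));
}
```

The fold cannot overflow: whenever the 64-bit sum has bit 32 set, its low
word is at most 0xFFFFFFFE.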

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug target/43892] PowerPC suboptimal "add with carry" optimization
       [not found] <bug-43892-4@http.gcc.gnu.org/bugzilla/>
                   ` (10 preceding siblings ...)
  2020-10-21  6:09 ` christophe.leroy at csgroup dot eu
@ 2020-10-21 19:28 ` segher at gcc dot gnu.org
  2020-10-21 21:01 ` jakub at gcc dot gnu.org
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 28+ messages in thread
From: segher at gcc dot gnu.org @ 2020-10-21 19:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892

--- Comment #29 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Yup, and that is a more elegant way of writing this anyway.  But we
still do not handle the exact testcase code optimally ;-)

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug target/43892] PowerPC suboptimal "add with carry" optimization
       [not found] <bug-43892-4@http.gcc.gnu.org/bugzilla/>
                   ` (11 preceding siblings ...)
  2020-10-21 19:28 ` segher at gcc dot gnu.org
@ 2020-10-21 21:01 ` jakub at gcc dot gnu.org
  2020-10-21 21:31 ` segher at gcc dot gnu.org
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 28+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-10-21 21:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #30 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Well, there is another way to write it:
  u32 z;
  u32 y = __builtin_add_overflow (sum, x, &z);
  return y + z;
and for the last 5 years GCC even pattern recognizes the #c0 way as the above
one.
Except that the powerpc backend doesn't define uaddv*4/usubv*4 expanders that
are needed for that (unlike aarch64, arm, x86, sparc, visium).
But if powerpc has instructions to efficiently handle double-word
additions/subtractions (as seen in this PR it has), it is really desirable to
implement those.
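
Spelled out as a complete function, the snippet above might look like the
sketch below. The name add32carry_builtin and the use of uint32_t for u32
are assumptions for illustration; __builtin_add_overflow is the generic
GCC overflow builtin, which returns true when the result does not fit.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

/* End-around-carry add written with the overflow builtin. */
u32 add32carry_builtin(u32 sum, u32 x)
{
    u32 z;
    /* y is 1 when sum + x wrapped, 0 otherwise; z gets the wrapped sum. */
    u32 y = __builtin_add_overflow(sum, x, &z);
    return y + z;
}

/* A couple of hand-checked cases. */
void check_builtin(void)
{
    assert(add32carry_builtin(1u, 2u) == 3u);           /* no carry */
    assert(add32carry_builtin(0xFFFFFFFFu, 1u) == 1u);  /* wraps to 0, plus carry */
}
```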

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug target/43892] PowerPC suboptimal "add with carry" optimization
       [not found] <bug-43892-4@http.gcc.gnu.org/bugzilla/>
                   ` (12 preceding siblings ...)
  2020-10-21 21:01 ` jakub at gcc dot gnu.org
@ 2020-10-21 21:31 ` segher at gcc dot gnu.org
  2021-06-03  1:50 ` pinskia at gcc dot gnu.org
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 28+ messages in thread
From: segher at gcc dot gnu.org @ 2020-10-21 21:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892

--- Comment #31 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Performing a jump based on the carry bit is not something we can
easily do (there are no simple insns for it, and those sequences
that will do the trick are expensive).  But I'll look at that,
thanks for the hint!  At least in the __builtin_add_overflow case
most of it will be optimised away :-)

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug target/43892] PowerPC suboptimal "add with carry" optimization
       [not found] <bug-43892-4@http.gcc.gnu.org/bugzilla/>
                   ` (13 preceding siblings ...)
  2020-10-21 21:31 ` segher at gcc dot gnu.org
@ 2021-06-03  1:50 ` pinskia at gcc dot gnu.org
  2021-06-03  4:33 ` segher at gcc dot gnu.org
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 28+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-06-03  1:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892

--- Comment #32 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #5)
> No.
> 
> Actually compilable testcase:
> 
> typedef unsigned int u32;
> 
> u32
> add32carry(u32 sum, u32 x)
> {
>   u32 z = sum + x;
>   if (sum + x < x)
>     z++;
>   return z;
> }
> 
> u32
> loop(u32 *buf, int len)
> {
>   u32 sum = 0;
>   for(; len; --len)
>     sum = add32carry(sum, *++buf);
>   return sum;
> }


Note on the trunk this code is recognized at least on the gimple level as add
with overflow and does:
  _7 = .ADD_OVERFLOW (sum_2(D), x_3(D));
  z_4 = REALPART_EXPR <_7>;
  _8 = IMAGPART_EXPR <_7>;
  if (_8 != 0)
    goto <bb 3>; [50.00%]
  else
    goto <bb 4>; [50.00%]

  <bb 3> [local count: 536870913]:
  z_5 = z_4 + 1;

  <bb 4> [local count: 1073741824]:
  # z_1 = PHI <z_4(2), z_5(3)>

---- CUT ---
So it is more about the back-end of PowerPC at this point.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug target/43892] PowerPC suboptimal "add with carry" optimization
       [not found] <bug-43892-4@http.gcc.gnu.org/bugzilla/>
                   ` (14 preceding siblings ...)
  2021-06-03  1:50 ` pinskia at gcc dot gnu.org
@ 2021-06-03  4:33 ` segher at gcc dot gnu.org
  2021-06-03  7:00 ` joakim.tjernlund at infinera dot com
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 28+ messages in thread
From: segher at gcc dot gnu.org @ 2021-06-03  4:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892

--- Comment #33 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #32)
> So it is more about the back-end of PowerPC at this point.

For the testcase

===
typedef unsigned int u32;
typedef unsigned long long u64;

u32 f(u32 a, u32 b)
{
        u32 s = a + b;
        if (a + b < b)
                s++;
        return s;
}

u32 g(u32 *p, u32 n)
{
        u32 s = 0;
        while (n--)
                s = f(s, *p++);
        return s;
}

u32 g4(u32 *p)
{
        u32 s = 0;
        s = f(s, *p++);
        s = f(s, *p++);
        s = f(s, *p++);
        s = f(s, *p++);
        return s;
}

u32 h4(u32 *p)
{
        u64 s = 0;
        s += *p++;
        s += *p++;
        s += *p++;
        s += *p++;
        s = (s >> 32) + (u32)s;
        s = (s >> 32) + (u32)s;
        return s;
}
===

... GCC does not do anything with ADD_OVERFLOW.  But all *do* compile
to reasonable code (albeit not optimal).  So no, you cannot say Gimple
is super here and it is all the backend's fault :-)

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug target/43892] PowerPC suboptimal "add with carry" optimization
       [not found] <bug-43892-4@http.gcc.gnu.org/bugzilla/>
                   ` (15 preceding siblings ...)
  2021-06-03  4:33 ` segher at gcc dot gnu.org
@ 2021-06-03  7:00 ` joakim.tjernlund at infinera dot com
  2021-06-03 19:32 ` segher at gcc dot gnu.org
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 28+ messages in thread
From: joakim.tjernlund at infinera dot com @ 2021-06-03  7:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892

--- Comment #34 from Joakim Tjernlund <joakim.tjernlund at infinera dot com> ---
(In reply to Christophe Leroy from comment #28)
> Looks like we have a way to do it. Works at least with GCC 5.5, 8.2, 9.2,
> 10.1
> 
> unsigned long g(unsigned long a, unsigned long b)
> {
> 	unsigned long long s = (unsigned long long)a + (unsigned long long)b;
> 
> 	return (s >> 32) + s;
> }
> 
> 00000020 <g>:
>   20:	7c 63 20 14 	addc    r3,r3,r4
>   24:	7c 63 01 94 	addze   r3,r3
>   28:	4e 80 00 20 	blr
> 

Does that work when placed in a loop too?
Something close to the example in the first comment:

for(;len; --len)
   sum = add32carry(sum, *++buf);


        addic 3, 3, 0 /* clear carry */
.L31:
        lwzu 0,4(9)
        adde 3, 3, 0 /* add with carry */
        bdnz .L31

        addze 3, 3 /* add in final carry */
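
In C terms, the loop I am hoping for behaves like the sketch below. This
is only a model under assumptions: `ca` stands in for the CA bit, the
function names are made up, and the buffer is walked with *buf++ instead
of *++buf so the example works on an ordinary C array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;
typedef uint64_t u64;

/* Reference: one end-around-carry add per word, as in the testcase. */
u32 loop_ref(const u32 *buf, size_t len)
{
    u32 sum = 0;
    for (; len; --len) {
        u32 x = *buf++;
        u32 z = sum + x;
        if (z < x)
            z++;
        sum = z;
    }
    return sum;
}

/* Model of the hand-written loop: the carry stays pending in `ca`
   between iterations and is folded in once at the end. */
u32 loop_carry_chain(const u32 *buf, size_t len)
{
    u32 sum = 0;
    u32 ca = 0;                          /* addic 3,3,0: clear carry */
    for (; len; --len) {
        u64 t = (u64)sum + *buf++ + ca;  /* lwzu + adde: sum + word + CA */
        sum = (u32)t;
        ca = (u32)(t >> 32);
    }
    return sum + ca;                     /* addze: add in final carry */
}

/* Both loops should agree on an input that carries twice. */
void check_loop(void)
{
    static const u32 w[] = { 0xFFFFFFFFu, 0xFFFFFFFFu, 1u, 2u };
    assert(loop_carry_chain(w, 4) == loop_ref(w, 4));
}
```

After every iteration the reference sum equals sum + ca in the model, so
the final addze cannot wrap.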

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug target/43892] PowerPC suboptimal "add with carry" optimization
       [not found] <bug-43892-4@http.gcc.gnu.org/bugzilla/>
                   ` (16 preceding siblings ...)
  2021-06-03  7:00 ` joakim.tjernlund at infinera dot com
@ 2021-06-03 19:32 ` segher at gcc dot gnu.org
  2021-12-03 19:44 ` roger at nextmovesoftware dot com
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 28+ messages in thread
From: segher at gcc dot gnu.org @ 2021-06-03 19:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892

--- Comment #35 from Segher Boessenkool <segher at gcc dot gnu.org> ---
You get something like

.L5:
        lwzu 9,4(10)
        addc 8,3,9
        adde 3,9,3
        bdnz .L5

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug target/43892] PowerPC suboptimal "add with carry" optimization
       [not found] <bug-43892-4@http.gcc.gnu.org/bugzilla/>
                   ` (17 preceding siblings ...)
  2021-06-03 19:32 ` segher at gcc dot gnu.org
@ 2021-12-03 19:44 ` roger at nextmovesoftware dot com
  2023-06-06 20:14 ` jakub at gcc dot gnu.org
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 28+ messages in thread
From: roger at nextmovesoftware dot com @ 2021-12-03 19:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |roger at nextmovesoftware dot com

--- Comment #36 from Roger Sayle <roger at nextmovesoftware dot com> ---
Patch proposed:
https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586169.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug target/43892] PowerPC suboptimal "add with carry" optimization
       [not found] <bug-43892-4@http.gcc.gnu.org/bugzilla/>
                   ` (18 preceding siblings ...)
  2021-12-03 19:44 ` roger at nextmovesoftware dot com
@ 2023-06-06 20:14 ` jakub at gcc dot gnu.org
  2023-08-29 16:41 ` bergner at gcc dot gnu.org
  2023-08-29 17:43 ` roger at nextmovesoftware dot com
  21 siblings, 0 replies; 28+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-06-06 20:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892

--- Comment #37 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
What happened with this patch?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug target/43892] PowerPC suboptimal "add with carry" optimization
       [not found] <bug-43892-4@http.gcc.gnu.org/bugzilla/>
                   ` (19 preceding siblings ...)
  2023-06-06 20:14 ` jakub at gcc dot gnu.org
@ 2023-08-29 16:41 ` bergner at gcc dot gnu.org
  2023-08-29 17:43 ` roger at nextmovesoftware dot com
  21 siblings, 0 replies; 28+ messages in thread
From: bergner at gcc dot gnu.org @ 2023-08-29 16:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892

--- Comment #38 from Peter Bergner <bergner at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #37)
> What happened with this patch?

It looks like David approved Roger's patch here:

  https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586813.html

...but it was never committed upstream.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug target/43892] PowerPC suboptimal "add with carry" optimization
       [not found] <bug-43892-4@http.gcc.gnu.org/bugzilla/>
                   ` (20 preceding siblings ...)
  2023-08-29 16:41 ` bergner at gcc dot gnu.org
@ 2023-08-29 17:43 ` roger at nextmovesoftware dot com
  21 siblings, 0 replies; 28+ messages in thread
From: roger at nextmovesoftware dot com @ 2023-08-29 17:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892

--- Comment #39 from Roger Sayle <roger at nextmovesoftware dot com> ---
My apologies for dropping the ball on this patch (series)... My only access to
PowerPC hardware is/was via the GCC compile farm, which complicates things.

Shortly after David's approval, Segher enquired whether the patch could be
modified to also handle -mcpu=power10 (which represents carry differently):
https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586868.html

Trying to (also) address this then opened up a rabbit hole/can of worms
related to how middle-end (and rs6000.md) represents overflow, which included a
combine patch:
https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586572.html

Soon after GCC entered stage 4 (or stage 3), and the above patches (and an
unsubmitted one for power10) simply got lost in the backlog.  I believe this
patch is sound, but unfortunately I don't have the bandwidth/patience to
(re)check it against mainline on (multiple variants of) rs6000.

If one of the IBM folks could take it from here, that'd be much appreciated.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug target/43892] PowerPC suboptimal "add with carry" optimization
  2010-04-26 13:33 [Bug rtl-optimization/43892] New: " gcc-bugzilla at gcc dot gnu dot org
                   ` (4 preceding siblings ...)
  2010-05-27  1:37 ` dje at gcc dot gnu dot org
@ 2010-05-27  7:33 ` joakim dot tjernlund at transmode dot se
  5 siblings, 0 replies; 28+ messages in thread
From: joakim dot tjernlund at transmode dot se @ 2010-05-27  7:33 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #17 from joakim dot tjernlund at transmode dot se  2010-05-27 07:33 -------
(In reply to comment #16)
> >> You have no patience, now do you?
> 
> > Sure I do. It is just that its been almost a month and from the
> > description it sounded like an easy fix:
> > "config/rs6000/rs6000.md would need to add a add<mode>cc expander"
> 
> No you do not have any patience; in fact, your comments are rather obnoxious,
> such as: "its been almost a month".  If you do not know what you are talking
> about, stop talking.  No, it is not an easy fix.  The high-level concept and
> description is simple, the implementation is extremely complex and tedious.

Oops, sorry if I sounded obnoxious. Of course I don't know what I am talking
about. I am not a gcc dev. and gcc is a complex piece of SW but I don't
think I should just shut up either. Now that you have made it clear what is
involved I will stop pestering you, I was hoping you would have a small
patch for me to test but I see now that it won't happen.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug target/43892] PowerPC suboptimal "add with carry" optimization
  2010-04-26 13:33 [Bug rtl-optimization/43892] New: " gcc-bugzilla at gcc dot gnu dot org
                   ` (3 preceding siblings ...)
  2010-05-26 20:47 ` joakim dot tjernlund at transmode dot se
@ 2010-05-27  1:37 ` dje at gcc dot gnu dot org
  2010-05-27  7:33 ` joakim dot tjernlund at transmode dot se
  5 siblings, 0 replies; 28+ messages in thread
From: dje at gcc dot gnu dot org @ 2010-05-27  1:37 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #16 from dje at gcc dot gnu dot org  2010-05-27 01:37 -------
>> You have no patience, now do you?

> Sure I do. It is just that its been almost a month and from the
> description it sounded like an easy fix:
> "config/rs6000/rs6000.md would need to add a add<mode>cc expander"

No you do not have any patience; in fact, your comments are rather obnoxious,
such as: "its been almost a month".  If you do not know what you are talking
about, stop talking.  No, it is not an easy fix.  The high-level concept and
description is simple, the implementation is extremely complex and tedious.

>> even though it sounded that the
>> initial fix was easy(add<mode>cc expander)
> 
>> The fix will be a few thousand lines of patch.  Literally.

> Oops, just to add a cc expander?

Yes.  Again, if you do not know the complexity of what you are requesting, get
more information instead of acting annoyed that people are not jumping to solve
your problem.

It is not "just adding" a cc expander.  Do you even know what that means or
what it involves?  For the expander to be effective, the PowerPC port of GCC
needs to be taught to track the carry bit, which it currently does not.  *ALL*
patterns that produce instructions affecting the carry bit must be updated. 
One cannot add the pattern in isolation.

If you do not understand the implications of your request, then *ask* why it is
more complicated than you assumed.  There is no "simple" fix.  The only fix is
the ultimate fix: completely propagating the carry bit throughout the PowerPC
port.

You apparently have not read the documentation to understand the -mcpu= option
or the --with-cpu= configure option.  You are making a lot of incorrect
assumptions and assertions, apparently without making any effort to gain some
knowledge before you start writing.  That really does not encourage anyone to
help you, especially when it requires a lot of work.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug target/43892] PowerPC suboptimal "add with carry" optimization
  2010-04-26 13:33 [Bug rtl-optimization/43892] New: " gcc-bugzilla at gcc dot gnu dot org
                   ` (2 preceding siblings ...)
  2010-05-26 16:46 ` segher at kernel dot crashing dot org
@ 2010-05-26 20:47 ` joakim dot tjernlund at transmode dot se
  2010-05-27  1:37 ` dje at gcc dot gnu dot org
  2010-05-27  7:33 ` joakim dot tjernlund at transmode dot se
  5 siblings, 0 replies; 28+ messages in thread
From: joakim dot tjernlund at transmode dot se @ 2010-05-26 20:47 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #15 from joakim dot tjernlund at transmode dot se  2010-05-26 20:47 -------
(In reply to comment #14)
> (In reply to comment #13)
> > > Please see -mcpu= .
> > 
> > Almost forgot, but how do I specify that at gcc build/configure ?
> 
> You can configure with --with-cpu= to set a default for -mcpu= .

Thanks.

> 
> > Also, I haven't seen any progress on this issue
> 
> You have no patience, now do you?

Sure I do. It is just that its been almost a month and from the
description it sounded like an easy fix:
"config/rs6000/rs6000.md would need to add a add<mode>cc expander"

> 
> > even though it sounded that the
> > initial fix was easy(add<mode>cc expander)
> 
> The fix will be a few thousand lines of patch.  Literally.

Oops, just to add a cc expander?

> 
> In order to fix this problem (and a whole host of way more important
> missed optimisation opportunities) we need to expose the CA bit to
> the compiler as an actual register.  Currently, whenever GCC uses the
> carry bit it does so by having the consumer and producer in a canned
> asm sequence; this is suboptimal for many reasons.

This looks like you are aiming for the ultimate impl. so
you can address other cases too. I can understand that takes
some time :) 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug target/43892] PowerPC suboptimal "add with carry" optimization
  2010-04-26 13:33 [Bug rtl-optimization/43892] New: " gcc-bugzilla at gcc dot gnu dot org
  2010-05-21 17:42 ` [Bug target/43892] " segher at gcc dot gnu dot org
  2010-05-25 21:42 ` joakim dot tjernlund at transmode dot se
@ 2010-05-26 16:46 ` segher at kernel dot crashing dot org
  2010-05-26 20:47 ` joakim dot tjernlund at transmode dot se
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 28+ messages in thread
From: segher at kernel dot crashing dot org @ 2010-05-26 16:46 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #14 from segher at kernel dot crashing dot org  2010-05-26 16:46 -------
(In reply to comment #13)
> > Please see -mcpu= .
> 
> Almost forgot, but how do I specify that at gcc build/configure ?

You can configure with --with-cpu= to set a default for -mcpu= .

> Also, I haven't seen any progress on this issue

You have no patience, now do you?

> even though it sounded that the
> initial fix was easy(add<mode>cc expander)

The fix will be a few thousand lines of patch.  Literally.

In order to fix this problem (and a whole host of way more important
missed optimisation opportunities) we need to expose the CA bit to
the compiler as an actual register.  Currently, whenever GCC uses the
carry bit it does so by having the consumer and producer in a canned
asm sequence; this is suboptimal for many reasons.

Fixing it properly will take a while.
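
For reference, the operation under discussion can be written in C roughly
as follows. This is only a sketch reconstructed from the add32carry
assembly quoted elsewhere in this thread, not the exact test case from
the bug report. The second variant and its name add32carry_builtin are
hypothetical additions for illustration; it relies on
__builtin_add_overflow, which assumes GCC 5 or later and was not
available in the GCC releases current in 2010.

```c
#include <stdint.h>

/* End-around-carry add: add two 32-bit words and fold the carry-out
   back into the low bit of the result (the ones'-complement sum used
   in checksum code). */
static uint32_t add32carry(uint32_t sum, uint32_t x)
{
    uint32_t z = sum + x;   /* unsigned add wraps modulo 2^32 */

    if (z < x)              /* result wrapped, so the carry-out was 1 */
        z++;
    return z;
}

/* Hypothetical variant: the same operation with the carry requested
   explicitly instead of being recovered by a comparison.
   Assumes GCC 5 or later for __builtin_add_overflow. */
static uint32_t add32carry_builtin(uint32_t sum, uint32_t x)
{
    uint32_t z;
    uint32_t carry = __builtin_add_overflow(sum, x, &z);

    return z + carry;
}
```

Both functions compute the same value. In the first one the compiler has
to recognise that the comparison is really a test of the carry produced
by the addition, which is the kind of producer/consumer link that is hard
to express while the CA bit is hidden inside canned sequences.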


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892



* [Bug target/43892] PowerPC suboptimal "add with carry" optimization
  2010-04-26 13:33 [Bug rtl-optimization/43892] New: " gcc-bugzilla at gcc dot gnu dot org
  2010-05-21 17:42 ` [Bug target/43892] " segher at gcc dot gnu dot org
@ 2010-05-25 21:42 ` joakim dot tjernlund at transmode dot se
  2010-05-26 16:46 ` segher at kernel dot crashing dot org
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 28+ messages in thread
From: joakim dot tjernlund at transmode dot se @ 2010-05-25 21:42 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #13 from joakim dot tjernlund at transmode dot se  2010-05-25 21:42 -------
(In reply to comment #12)
> (In reply to comment #11)
> > If this is the case for something as simple as add with carry, one really
> > needs a simple way to tell gcc what ppc class CPU one wants to use.
> 
> Please see -mcpu= .
> 

Almost forgot, but how do I specify that when configuring/building gcc?

Also, I haven't seen any progress on this issue, even though it sounded like
the initial fix was easy (an add<mode>cc expander).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892



* [Bug target/43892] PowerPC suboptimal "add with carry" optimization
  2010-04-26 13:33 [Bug rtl-optimization/43892] New: " gcc-bugzilla at gcc dot gnu dot org
@ 2010-05-21 17:42 ` segher at gcc dot gnu dot org
  2010-05-25 21:42 ` joakim dot tjernlund at transmode dot se
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 28+ messages in thread
From: segher at gcc dot gnu dot org @ 2010-05-21 17:42 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #12 from segher at gcc dot gnu dot org  2010-05-21 17:42 -------
(In reply to comment #11)
> If this is the case for something as simple as add with carry, one really
> needs a simple way to tell gcc what ppc class CPU one wants to use.

Please see -mcpu= .


-- 

segher at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |segher at gcc dot gnu dot
                   |                            |org
         AssignedTo|unassigned at gcc dot gnu   |segher at gcc dot gnu dot
                   |dot org                     |org
           Severity|normal                      |enhancement
             Status|NEW                         |ASSIGNED
          Component|regression                  |target
   Last reconfirmed|2010-04-26 13:52:59         |2010-05-21 17:42:28
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892



Thread overview: 28+ messages
     [not found] <bug-43892-4@http.gcc.gnu.org/bugzilla/>
2010-09-29 12:45 ` [Bug target/43892] PowerPC suboptimal "add with carry" optimization joakim.tjernlund at transmode dot se
2015-01-18 16:34 ` segher at gcc dot gnu.org
2015-01-18 17:00 ` joakim.tjernlund at transmode dot se
2015-01-18 20:31 ` segher at gcc dot gnu.org
2015-01-18 21:44 ` joakim.tjernlund at transmode dot se
2015-01-18 22:54 ` segher at gcc dot gnu.org
2015-01-19  0:05 ` joakim.tjernlund at transmode dot se
2020-10-20 18:37 ` christophe.leroy at csgroup dot eu
2020-10-20 18:59 ` segher at gcc dot gnu.org
2020-10-20 19:37 ` joakim.tjernlund at infinera dot com
2020-10-21  6:09 ` christophe.leroy at csgroup dot eu
2020-10-21 19:28 ` segher at gcc dot gnu.org
2020-10-21 21:01 ` jakub at gcc dot gnu.org
2020-10-21 21:31 ` segher at gcc dot gnu.org
2021-06-03  1:50 ` pinskia at gcc dot gnu.org
2021-06-03  4:33 ` segher at gcc dot gnu.org
2021-06-03  7:00 ` joakim.tjernlund at infinera dot com
2021-06-03 19:32 ` segher at gcc dot gnu.org
2021-12-03 19:44 ` roger at nextmovesoftware dot com
2023-06-06 20:14 ` jakub at gcc dot gnu.org
2023-08-29 16:41 ` bergner at gcc dot gnu.org
2023-08-29 17:43 ` roger at nextmovesoftware dot com
2010-04-26 13:33 [Bug rtl-optimization/43892] New: " gcc-bugzilla at gcc dot gnu dot org
2010-05-21 17:42 ` [Bug target/43892] " segher at gcc dot gnu dot org
2010-05-25 21:42 ` joakim dot tjernlund at transmode dot se
2010-05-26 16:46 ` segher at kernel dot crashing dot org
2010-05-26 20:47 ` joakim dot tjernlund at transmode dot se
2010-05-27  1:37 ` dje at gcc dot gnu dot org
2010-05-27  7:33 ` joakim dot tjernlund at transmode dot se
