public inbox for gcc-bugs@sourceware.org
* [Bug target/99591] New: Improving __builtin_add_overflow performance on x86-64
@ 2021-03-15  8:46 eggert at gnu dot org
  2021-09-01  5:30 ` [Bug target/99591] " pinskia at gcc dot gnu.org
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: eggert at gnu dot org @ 2021-03-15  8:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99591

            Bug ID: 99591
           Summary: Improving __builtin_add_overflow performance on x86-64
           Product: gcc
           Version: 10.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: eggert at gnu dot org
  Target Milestone: ---

This is with gcc (GCC) 10.2.1 20201125 (Red Hat 10.2.1-9) on x86-64. For the
function:

_Bool signed1_overflow (signed char a, signed char b)
{
  signed char r;
  return __builtin_add_overflow (a, b, &r);
}

gcc generates the code:

signed1_overflow:
        movsbl  %sil, %esi
        movsbl  %dil, %edi
        addb    %sil, %dil
        seto    %al
        ret

The movsbl instructions are unnecessary and can be omitted.
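
As a rough illustration (not part of the original report), the overflow answer for an 8-bit add is fully determined by the two 8-bit operand values, which is all that addb reads. The sketch below compares the builtin against a reference check done in int. The names ref_add_overflow_s8, count_mismatches_s8 and check_s8 are hypothetical, and the code assumes GCC or Clang for the builtin.

```c
#include <assert.h>
#include <limits.h>

/* Reference check: do the addition in int, where it cannot overflow,
   and test whether the sum still fits in signed char.  */
_Bool ref_add_overflow_s8 (signed char a, signed char b)
{
  int wide = (int) a + (int) b;
  return wide < SCHAR_MIN || wide > SCHAR_MAX;
}

/* Count the operand pairs for which the builtin and the reference
   disagree.  */
int count_mismatches_s8 (void)
{
  int mismatches = 0;
  for (int a = SCHAR_MIN; a <= SCHAR_MAX; a++)
    for (int b = SCHAR_MIN; b <= SCHAR_MAX; b++)
      {
        signed char x = (signed char) a, y = (signed char) b;
        signed char r;
        _Bool got = __builtin_add_overflow (x, y, &r);
        if (got != ref_add_overflow_s8 (x, y))
          mismatches++;
      }
  return mismatches;
}

/* Hypothetical self-check wrapper.  */
void check_s8 (void)
{
  assert (count_mismatches_s8 () == 0);
}
```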


For the function:

_Bool signed2_overflow (short a, short b)
{
  short r;
  return __builtin_add_overflow (a, b, &r);
}

gcc generates:

signed2_overflow:
        movswl  %di, %edi
        movswl  %si, %esi
        xorl    %eax, %eax
        addw    %si, %di
        jo      .L8
.L6:
        andl    $1, %eax
        ret
.L8:
        movl    $1, %eax
        jmp     .L6

Better would be this:

signed2_overflow:
        addw    %si, %di
        seto    %al
        retq

There are similar opportunities for improvement in __builtin_sub_overflow and
__builtin_mul_overflow.
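
For concreteness, the subtraction and multiplication analogues of signed2_overflow might look like the sketch below. The function names are hypothetical (they do not appear in the report), and the sample values in check_sub_mul assume a 16-bit short, as on x86-64.

```c
#include <assert.h>
#include <limits.h>

/* Same shape as signed2_overflow, but for subtraction.  */
_Bool signed2_sub_overflow (short a, short b)
{
  short r;
  return __builtin_sub_overflow (a, b, &r);
}

/* Same shape as signed2_overflow, but for multiplication.  */
_Bool signed2_mul_overflow (short a, short b)
{
  short r;
  return __builtin_mul_overflow (a, b, &r);
}

/* Hypothetical spot checks, assuming short is 16 bits wide.  */
void check_sub_mul (void)
{
  assert (signed2_sub_overflow (SHRT_MIN, 1));   /* below SHRT_MIN */
  assert (!signed2_sub_overflow (0, 1));
  assert (signed2_mul_overflow (256, 128));      /* 32768 > SHRT_MAX */
  assert (!signed2_mul_overflow (181, 181));     /* 32761 fits */
}
```

Compiling these with -O2 -S on an affected GCC release should show whether the same redundant sign extensions appear.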

This bug report follows up on this discussion about Gnulib:

https://lists.gnu.org/r/bug-gnulib/2021-03/msg00078.html
https://lists.gnu.org/r/bug-gnulib/2021-03/msg00079.html
https://lists.gnu.org/r/bug-gnulib/2021-03/msg00080.html


* [Bug target/99591] Improving __builtin_add_overflow performance on x86-64
  2021-03-15  8:46 [Bug target/99591] New: Improving __builtin_add_overflow performance on x86-64 eggert at gnu dot org
@ 2021-09-01  5:30 ` pinskia at gcc dot gnu.org
  2021-09-01  5:32 ` pinskia at gcc dot gnu.org
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-09-01  5:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99591

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to fail|                            |10.3.0
      Known to work|                            |11.1.0

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Looks fixed for GCC 11+.


* [Bug target/99591] Improving __builtin_add_overflow performance on x86-64
  2021-03-15  8:46 [Bug target/99591] New: Improving __builtin_add_overflow performance on x86-64 eggert at gnu dot org
  2021-09-01  5:30 ` [Bug target/99591] " pinskia at gcc dot gnu.org
@ 2021-09-01  5:32 ` pinskia at gcc dot gnu.org
  2021-09-01  7:44 ` eggert at cs dot ucla.edu
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-09-01  5:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99591

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #1)
> Looks fixed for GCC 11+.

signed2_overflow(short, short):
.LFB0:
        .cfi_startproc
        addw    %si, %di
        seto    %al
        ret

signed1_overflow(signed char, signed char):
.LFB1:
        .cfi_startproc
        addb    %sil, %dil
        seto    %al
        ret


* [Bug target/99591] Improving __builtin_add_overflow performance on x86-64
  2021-03-15  8:46 [Bug target/99591] New: Improving __builtin_add_overflow performance on x86-64 eggert at gnu dot org
  2021-09-01  5:30 ` [Bug target/99591] " pinskia at gcc dot gnu.org
  2021-09-01  5:32 ` pinskia at gcc dot gnu.org
@ 2021-09-01  7:44 ` eggert at cs dot ucla.edu
  2021-09-01  7:57 ` [Bug c/99591] " pinskia at gcc dot gnu.org
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: eggert at cs dot ucla.edu @ 2021-09-01  7:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99591

eggert at cs dot ucla.edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |eggert at cs dot ucla.edu

--- Comment #3 from eggert at cs dot ucla.edu ---
(In reply to Andrew Pinski from comment #2)
> (In reply to Andrew Pinski from comment #1)
> > Looks fixed for GCC 11+.
It doesn't appear to be fixed in GCC 11.2.1 20210728 (Red Hat 11.2.1-1). For
signed1_overflow I get the same suboptimal machine code described in comment
#0. For signed2_overflow I get:

signed2_overflow:
        movswl  %si, %esi
        movswl  %di, %edi
        addw    %si, %di
        seto    %al
        ret

Although this is better than the machine code described in comment #0, it's
still clearly suboptimal, as the two movswl instructions are redundant and both
can be omitted.


* [Bug c/99591] Improving __builtin_add_overflow performance on x86-64
  2021-03-15  8:46 [Bug target/99591] New: Improving __builtin_add_overflow performance on x86-64 eggert at gnu dot org
                   ` (2 preceding siblings ...)
  2021-09-01  7:44 ` eggert at cs dot ucla.edu
@ 2021-09-01  7:57 ` pinskia at gcc dot gnu.org
  2021-09-01  8:16 ` jakub at gcc dot gnu.org
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-09-01  7:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99591

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2021-09-01
     Ever confirmed|0                           |1
          Component|target                      |c
             Status|UNCONFIRMED                 |NEW

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to eggert from comment #3)
> (In reply to Andrew Pinski from comment #2)
> > (In reply to Andrew Pinski from comment #1)
> > > Looks fixed for GCC 11+.
> It doesn't appear to be fixed in GCC 11.2.1 20210728 (Red Hat 11.2.1-1). For
> signed1_overflow I get the same suboptimal machine code described in comment
> #0. For signed2_overflow I get:

This is interesting: the C++ front end is fine, but the C front end is not.

C front end:
  return r = REALPART_EXPR <SAVE_EXPR <.ADD_OVERFLOW ((int) a, (int) b)>>;, (_Bool) IMAGPART_EXPR <SAVE_EXPR <.ADD_OVERFLOW ((int) a, (int) b)>>;;

C++ front end:
  <<cleanup_point return <retval> = r = REALPART_EXPR <SAVE_EXPR <.ADD_OVERFLOW (a, b)>>;, (bool) IMAGPART_EXPR <SAVE_EXPR <.ADD_OVERFLOW (a, b)>>;>>;


* [Bug c/99591] Improving __builtin_add_overflow performance on x86-64
  2021-03-15  8:46 [Bug target/99591] New: Improving __builtin_add_overflow performance on x86-64 eggert at gnu dot org
                   ` (3 preceding siblings ...)
  2021-09-01  7:57 ` [Bug c/99591] " pinskia at gcc dot gnu.org
@ 2021-09-01  8:16 ` jakub at gcc dot gnu.org
  2021-09-01  8:53 ` [Bug tree-optimization/99591] " jakub at gcc dot gnu.org
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-09-01  8:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99591

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
But the user could have written:
int signed1_overflow (signed char a, signed char b)
{
  signed char r;
  return __builtin_add_overflow (a, b, &r);
}

int signed2_overflow (short a, short b)
{
  short r;
  return __builtin_add_overflow (a, b, &r);
}

int signed3_overflow (signed char a, signed char b)
{
  signed char r;
  return __builtin_add_overflow ((int) a, (int) b, &r);
}

int signed4_overflow (short a, short b)
{
  short r;
  return __builtin_add_overflow ((int) a, (int) b, &r);
}
and then the latter two functions behave the same in C and C++.

So, I think it would be better to optimize this at the RTL level (only when
we've decided what exact operation we are using), but then I think the problem
is that this kind of thing is usually optimized by combine, which doesn't
trigger here because the registers have multiple uses:
(insn 9 6 10 2 (set (reg:SI 92)
        (sign_extend:SI (reg/v:QI 88 [ a ]))) "pr99591.c":16:33 151 {extendqisi2}
     (nil))
(insn 10 9 11 2 (set (reg:SI 93)
        (sign_extend:SI (reg/v:QI 90 [ b ]))) "pr99591.c":16:33 151 {extendqisi2}
     (nil))
(insn 11 10 12 2 (set (reg:QI 86 [ _6+1 ])
        (const_int 0 [0])) "pr99591.c":16:33 77 {*movqi_internal}
     (nil))
(insn 12 11 13 2 (parallel [
            (set (reg:CCO 17 flags)
                (eq:CCO (plus:HI (sign_extend:HI (subreg:QI (reg:SI 92) 0))
                        (sign_extend:HI (subreg:QI (reg:SI 93) 0)))
                    (sign_extend:HI (plus:QI (subreg:QI (reg:SI 92) 0)
                            (subreg:QI (reg:SI 93) 0)))))
            (set (reg:QI 94)
                (plus:QI (subreg:QI (reg:SI 92) 0)
                    (subreg:QI (reg:SI 93) 0)))
        ]) "pr99591.c":16:33 238 {*addvqi4}
     (nil))
The uses are all in the same insn, but there are still several of them.
Another option is a GIMPLE optimization: notice that the arguments are
promoted, repeat part of the expand_arith_overflow analysis, and demote the
arguments if possible.  Or maybe just always demote and let
expand_arith_overflow promote again if needed?
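
A minimal sketch of why demoting should be safe at the source level: for signed char operands, casting to int first, as in signed3_overflow above, changes neither the overflow flag nor the stored result. The helper names count_promotion_differences and check_promotion are hypothetical. This only checks source-level behavior with GCC or Clang and says nothing about how the GIMPLE transformation would be written.

```c
#include <assert.h>
#include <limits.h>

/* Count the operand pairs for which promoting the operands to int
   changes either the overflow flag or the (wrapped) stored result.  */
int count_promotion_differences (void)
{
  int differences = 0;
  for (int a = SCHAR_MIN; a <= SCHAR_MAX; a++)
    for (int b = SCHAR_MIN; b <= SCHAR_MAX; b++)
      {
        signed char x = (signed char) a, y = (signed char) b;
        signed char r_narrow, r_wide;
        _Bool o_narrow = __builtin_add_overflow (x, y, &r_narrow);
        _Bool o_wide = __builtin_add_overflow ((int) x, (int) y, &r_wide);
        if (o_narrow != o_wide || r_narrow != r_wide)
          differences++;
      }
  return differences;
}

/* Hypothetical self-check wrapper.  */
void check_promotion (void)
{
  assert (count_promotion_differences () == 0);
}
```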


* [Bug tree-optimization/99591] Improving __builtin_add_overflow performance on x86-64
  2021-03-15  8:46 [Bug target/99591] New: Improving __builtin_add_overflow performance on x86-64 eggert at gnu dot org
                   ` (4 preceding siblings ...)
  2021-09-01  8:16 ` jakub at gcc dot gnu.org
@ 2021-09-01  8:53 ` jakub at gcc dot gnu.org
  2021-09-02  9:26 ` cvs-commit at gcc dot gnu.org
  2021-09-02  9:29 ` jakub at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-09-01  8:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99591

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |jakub at gcc dot gnu.org

--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 51393
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51393&action=edit
gcc12-pr99591.patch

Untested fix.


* [Bug tree-optimization/99591] Improving __builtin_add_overflow performance on x86-64
  2021-03-15  8:46 [Bug target/99591] New: Improving __builtin_add_overflow performance on x86-64 eggert at gnu dot org
                   ` (5 preceding siblings ...)
  2021-09-01  8:53 ` [Bug tree-optimization/99591] " jakub at gcc dot gnu.org
@ 2021-09-02  9:26 ` cvs-commit at gcc dot gnu.org
  2021-09-02  9:29 ` jakub at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-09-02  9:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99591

--- Comment #7 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:2af6dd77ea742d4ee911f466878624972929508a

commit r12-3312-g2af6dd77ea742d4ee911f466878624972929508a
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Thu Sep 2 11:25:07 2021 +0200

    match.pd: Demote IFN_{ADD,SUB,MUL}_OVERFLOW operands [PR99591]

    The overflow builtins work on infinite precision integers and then convert
    to the result type's precision, so any argument promotions are useless.
    The expand_arith_overflow expansion is able to demote the arguments itself
    through get_range_pos_neg and get_min_precision calls and if needed promote
    to whatever mode it decides to perform the operations in, but if there are
    any promotions it demoted, those are already expanded.  Normally combine
    would remove the useless sign or zero extensions when it sees the result
    of those is only used in a lowpart subreg, but typically those lowpart
    subregs appear multiple times in the pattern so that they describe properly
    the overflow behavior and combine gives up, so we end up with e.g.
            movswl  %si, %esi
            movswl  %di, %edi
            imulw   %si, %di
            seto    %al
    where both movswl insns are useless.

    The following patch fixes it by demoting operands of the ifns (only gets
    rid of integral to integral conversions that increase precision).
    While IFN_{ADD,MUL}_OVERFLOW are commutative and just one simplify would be
    enough, IFN_SUB_OVERFLOW is not, therefore two simplifications.

    2021-09-02  Jakub Jelinek  <jakub@redhat.com>

            PR tree-optimization/99591
            * match.pd: Demote operands of IFN_{ADD,SUB,MUL}_OVERFLOW if they
            were promoted.

            * gcc.target/i386/pr99591.c: New test.
            * gcc.target/i386/pr97950.c: Match or reject setb or jn?b
            instructions together with seta or jn?a.
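
The commit message's point about commutativity can be seen at the source level with a small sketch: swapping the operands never changes the overflow answer for addition, but it can for subtraction. The helper names are hypothetical, and this illustrates only the arithmetic, not the match.pd patterns themselves.

```c
#include <assert.h>
#include <limits.h>

_Bool add_overflow_s8 (signed char a, signed char b)
{
  signed char r;
  return __builtin_add_overflow (a, b, &r);
}

_Bool sub_overflow_s8 (signed char a, signed char b)
{
  signed char r;
  return __builtin_sub_overflow (a, b, &r);
}

/* Hypothetical spot checks.  */
void check_commutativity (void)
{
  /* Addition: operand order does not matter.  */
  assert (add_overflow_s8 (100, 28) == add_overflow_s8 (28, 100));

  /* Subtraction: 0 - SCHAR_MIN exceeds SCHAR_MAX, so it overflows,
     while SCHAR_MIN - 0 is representable.  */
  assert (sub_overflow_s8 (0, SCHAR_MIN));
  assert (!sub_overflow_s8 (SCHAR_MIN, 0));
}
```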


* [Bug tree-optimization/99591] Improving __builtin_add_overflow performance on x86-64
  2021-03-15  8:46 [Bug target/99591] New: Improving __builtin_add_overflow performance on x86-64 eggert at gnu dot org
                   ` (6 preceding siblings ...)
  2021-09-02  9:26 ` cvs-commit at gcc dot gnu.org
@ 2021-09-02  9:29 ` jakub at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-09-02  9:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99591

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Fixed for GCC 12.

