public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c/68110] New: __builtin_sub_overflow unsigned performance issue
@ 2015-10-27  5:07 eggert at gnu dot org
  2015-10-27  5:15 ` [Bug target/68110] " pinskia at gcc dot gnu.org
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: eggert at gnu dot org @ 2015-10-27  5:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68110

            Bug ID: 68110
           Summary: __builtin_sub_overflow unsigned performance issue
           Product: gcc
           Version: 5.2.1
            Status: UNCONFIRMED
          Severity: minor
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: eggert at gnu dot org
  Target Milestone: ---

I ran into this minor performance issue when changing Gnulib's lib/intprops.h
to use the new __builtin_sub_overflow function. I found that
__builtin_sub_overflow is less efficient than the portable C code that it
replaced, when the operands and result are unsigned, or unsigned long, or
unsigned long long. To reproduce the problem, compile and run the following
program with 'gcc -O2 -S' on x86-64:

  _Bool f1 (unsigned long long a, unsigned long long b)
  {
    return a < b;
  }

  _Bool f2 (unsigned long long a, unsigned long long b)
  {
    unsigned long long r;
    return __builtin_sub_overflow (a, b, &r);
  }

Although the functions are semantically equivalent, f1 uses only 3
instructions:

        f1:
                cmpq    %rsi, %rdi
                setb    %al
                ret

whereas f2 uses 5 instructions:

        f2:
                movq    %rdi, %rax
                subq    %rsi, %rax
                cmpq    %rdi, %rax
                seta    %al
                ret

There is a similar problem for x86.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/68110] __builtin_sub_overflow unsigned performance issue
  2015-10-27  5:07 [Bug c/68110] New: __builtin_sub_overflow unsigned performance issue eggert at gnu dot org
@ 2015-10-27  5:15 ` pinskia at gcc dot gnu.org
  2015-10-27  5:19 ` pinskia at gcc dot gnu.org
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2015-10-27  5:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68110

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64*-* i686-*-*
          Component|c                           |target

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Looks like the unused sub is not being removed.
Normally when someone uses __builtin_sub_overflow, the sub will not be unused.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/68110] __builtin_sub_overflow unsigned performance issue
  2015-10-27  5:07 [Bug c/68110] New: __builtin_sub_overflow unsigned performance issue eggert at gnu dot org
  2015-10-27  5:15 ` [Bug target/68110] " pinskia at gcc dot gnu.org
@ 2015-10-27  5:19 ` pinskia at gcc dot gnu.org
  2015-10-27  6:34 ` eggert at gnu dot org
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2015-10-27  5:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68110

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
So the question is does anyone use this function without "a - b" later on?  If
not then it is just a microbenchmark of this code is showing the regression and
I would say the microbenchmark is wrong.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/68110] __builtin_sub_overflow unsigned performance issue
  2015-10-27  5:07 [Bug c/68110] New: __builtin_sub_overflow unsigned performance issue eggert at gnu dot org
  2015-10-27  5:15 ` [Bug target/68110] " pinskia at gcc dot gnu.org
  2015-10-27  5:19 ` pinskia at gcc dot gnu.org
@ 2015-10-27  6:34 ` eggert at gnu dot org
  2015-10-27  9:46 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: eggert at gnu dot org @ 2015-10-27  6:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68110

--- Comment #3 from Paul Eggert <eggert at gnu dot org> ---
(In reply to Andrew Pinski from comment #2)
> So the question is does anyone use this function without "a - b" later on? 

Not that I know of. The usual pattern for callers of the Gnulib macro is to use
the macro to check whether there's overflow, and then to subtract if the result
wouldn't overflow. So you're correct that my example is a microbenchmark and is
not a good representative.

That being said, and I'm just thinking aloud here, it may be helpful for GCC to
try "a < b" as an alternative way to test for unsigned subtraction overflow, or
conversely to have GCC use __builtin_sub_overflow as an alternative way to
compute "a < b". Here's a slightly-more-realistic example:

  unsigned long
  sub1 (unsigned long a, unsigned long b)
  {
    if (a < b)
      return b - a;
    else
      return a - b;
  }

  unsigned long
  sub2 (unsigned long a, unsigned long b)
  {
    unsigned long c;
    if (__builtin_sub_overflow (a, b, &c))
      return b - a;
    else
      return c;
  }

On x86-64, gcc -O2 -S generates this:

  sub1:
          movq    %rsi, %rdx
          movq    %rdi, %rax
          subq    %rdi, %rdx
          subq    %rsi, %rax
          cmpq    %rsi, %rdi
          cmovb   %rdx, %rax
          ret

  sub2:
          movq    %rdi, %rax
          subq    %rsi, %rax
          cmpq    %rdi, %rax
          ja      .L13
          rep ret
  .L13:
          movq    %rsi, %rax
          subq    %rdi, %rax
          ret

The two functions have the same behavior. Presumably one implementation is
better than the other, and could be used for both sub1 and sub2.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/68110] __builtin_sub_overflow unsigned performance issue
  2015-10-27  5:07 [Bug c/68110] New: __builtin_sub_overflow unsigned performance issue eggert at gnu dot org
                   ` (2 preceding siblings ...)
  2015-10-27  6:34 ` eggert at gnu dot org
@ 2015-10-27  9:46 ` rguenth at gcc dot gnu.org
  2021-08-21 22:33 ` [Bug tree-optimization/68110] " pinskia at gcc dot gnu.org
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-10-27  9:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68110

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
value-numbering would need to special-case them via the insertion trick it does
for conversions.  somehow.  not sure if feasible or worthwhile.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug tree-optimization/68110] __builtin_sub_overflow unsigned performance issue
  2015-10-27  5:07 [Bug c/68110] New: __builtin_sub_overflow unsigned performance issue eggert at gnu dot org
                   ` (3 preceding siblings ...)
  2015-10-27  9:46 ` rguenth at gcc dot gnu.org
@ 2021-08-21 22:33 ` pinskia at gcc dot gnu.org
  2021-08-21 22:34 ` pinskia at gcc dot gnu.org
  2023-11-13  3:05 ` pinskia at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-08-21 22:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68110

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |tree-optimization
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-08-21

--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The testcase is fixed on rtl level since gcc 6 for x86_64.

For aarch64, it was fixed on the rtl level since GCC 9.

On the gimple level we still have:
  _4 = .SUB_OVERFLOW (a_2(D), b_3(D));
  _1 = IMAGPART_EXPR <_4>;

Maybe a simple match.pd pattern is needed:
(if (TYPE_SIGN (type) == UNSIGNED)
 (simplify
  (imagpart (sub_overflow:s @0 @1))
  (convert (le @0 @1))))

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug tree-optimization/68110] __builtin_sub_overflow unsigned performance issue
  2015-10-27  5:07 [Bug c/68110] New: __builtin_sub_overflow unsigned performance issue eggert at gnu dot org
                   ` (4 preceding siblings ...)
  2021-08-21 22:33 ` [Bug tree-optimization/68110] " pinskia at gcc dot gnu.org
@ 2021-08-21 22:34 ` pinskia at gcc dot gnu.org
  2023-11-13  3:05 ` pinskia at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-08-21 22:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68110

--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Marc Glisse from comment #4)
> but since we don't
> have any pass to generate __builtin_sub_overflow from a-b and a<b and don't
> CSE __builtin_sub_overflow with either of those, I am a bit wary of dropping
> the builtin too early.

That changed in GCC 10 IIRC.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug tree-optimization/68110] __builtin_sub_overflow unsigned performance issue
  2015-10-27  5:07 [Bug c/68110] New: __builtin_sub_overflow unsigned performance issue eggert at gnu dot org
                   ` (5 preceding siblings ...)
  2021-08-21 22:34 ` pinskia at gcc dot gnu.org
@ 2023-11-13  3:05 ` pinskia at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-11-13  3:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68110

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |pinskia at gcc dot gnu.org
             Status|NEW                         |ASSIGNED

--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Mine to finish up later on.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-11-13  3:05 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-27  5:07 [Bug c/68110] New: __builtin_sub_overflow unsigned performance issue eggert at gnu dot org
2015-10-27  5:15 ` [Bug target/68110] " pinskia at gcc dot gnu.org
2015-10-27  5:19 ` pinskia at gcc dot gnu.org
2015-10-27  6:34 ` eggert at gnu dot org
2015-10-27  9:46 ` rguenth at gcc dot gnu.org
2021-08-21 22:33 ` [Bug tree-optimization/68110] " pinskia at gcc dot gnu.org
2021-08-21 22:34 ` pinskia at gcc dot gnu.org
2023-11-13  3:05 ` pinskia at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).