public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/104357] New: [Aarch64] Failure to use csinv instead of mvn+csel where possible
@ 2022-02-02 23:40 gabravier at gmail dot com
  2022-02-02 23:51 ` [Bug tree-optimization/104357] " pinskia at gcc dot gnu.org
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: gabravier at gmail dot com @ 2022-02-02 23:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104357

            Bug ID: 104357
           Summary: [Aarch64] Failure to use csinv instead of mvn+csel
                    where possible
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gabravier at gmail dot com
  Target Milestone: ---

unsigned char stbi__clamp(int x)
{
   if ((unsigned)x > 255) {
      if (x < 0) return 0;
      if (x > 255) return 255;
   }
   return x;
}

With -O3, GCC outputs this (on aarch64):

stbi__clamp(int):
  mvn w1, w0
  cmp w0, 256
  and w0, w0, 255
  asr w1, w1, 31
  and w1, w1, 255
  csel w0, w0, w1, cc
  ret

LLVM instead outputs this:

stbi__clamp(int):
  asr w8, w0, #31
  cmp w0, #255
  csinv w0, w0, w8, ls
  ret

I don't know if the `and`s are there because of ABI differences, but it seems
to me like the `mvn` can definitely be replaced by using `csinv` instead of
`csel`.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug tree-optimization/104357] [Aarch64] Failure to use csinv instead of mvn+csel where possible
  2022-02-02 23:40 [Bug target/104357] New: [Aarch64] Failure to use csinv instead of mvn+csel where possible gabravier at gmail dot com
@ 2022-02-02 23:51 ` pinskia at gcc dot gnu.org
  2022-02-03  0:02 ` pinskia at gcc dot gnu.org
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-02-02 23:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104357

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2022-02-02
          Component|target                      |tree-optimization
     Ever confirmed|0                           |1

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
This will get GCC closer to what clang/LLVM produces:
unsigned char stbi__clamp(int x)
{
   int t = x;
   if ((unsigned)x > 255) {
      if (x < 0) t =  0;
      else if (x > 255) t =  -1;
   }
   return t;
}

---- CUT ----
The zero-extends are due to the cast not being outside of the csel and the RTL
level is not really good at cross bb optimizations.
The gimple level looks like:
  <bb 2> [local count: 1073741824]:
  x.0_1 = (unsigned int) x_3(D);
  if (x.0_1 > 255)
    goto <bb 3>; [50.00%]
  else
    goto <bb 4>; [50.00%]

  <bb 3> [local count: 536870913]:
  _7 = x_3(D) >= 0;
  _6 = (unsigned char) _7;
  _8 = -_6;
  goto <bb 5>; [100.00%]

  <bb 4> [local count: 536870913]:
  _4 = (unsigned char) x_3(D);

  <bb 5> [local count: 1073741824]:
  # _2 = PHI <_8(3), _4(4)>
  return _2;

Which in theory could be improved to the what I gave above.
The gimple level has no knowledge of the rtl/target level that to do - in
unsigned, you need to a zero extend still.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug tree-optimization/104357] [Aarch64] Failure to use csinv instead of mvn+csel where possible
  2022-02-02 23:40 [Bug target/104357] New: [Aarch64] Failure to use csinv instead of mvn+csel where possible gabravier at gmail dot com
  2022-02-02 23:51 ` [Bug tree-optimization/104357] " pinskia at gcc dot gnu.org
@ 2022-02-03  0:02 ` pinskia at gcc dot gnu.org
  2022-02-03  0:05 ` [Bug tree-optimization/104357] " pinskia at gcc dot gnu.org
  2023-05-04  5:37 ` pinskia at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-02-03  0:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104357

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
           Assignee|unassigned at gcc dot gnu.org      |pinskia at gcc dot gnu.org
             Status|NEW                         |ASSIGNED

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
One thing I should note:
  _7 = x_3(D) >= 0;
  _6 = (unsigned char) _7;
  _8 = -_6;

Should be done on the gimple level as:
t = x_3(D) >> (sizeof(x_3(D))*8 - 1)
_8 = (unsigned char)t;

And then we can factor out the cast and I think it will produce the same code.

And yes it does, that is:
unsigned char stbi__clamp(int x)
{
   int t = x;
   if ((unsigned)x > 255) {
      t = x >> 31;
   }
   return t;
}

So Mine for GCC 13.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug tree-optimization/104357] Failure to use csinv instead of mvn+csel where possible
  2022-02-02 23:40 [Bug target/104357] New: [Aarch64] Failure to use csinv instead of mvn+csel where possible gabravier at gmail dot com
  2022-02-02 23:51 ` [Bug tree-optimization/104357] " pinskia at gcc dot gnu.org
  2022-02-03  0:02 ` pinskia at gcc dot gnu.org
@ 2022-02-03  0:05 ` pinskia at gcc dot gnu.org
  2023-05-04  5:37 ` pinskia at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-02-03  0:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104357

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[Aarch64] Failure to use    |Failure to use csinv
                   |csinv instead of mvn+csel   |instead of mvn+csel where
                   |where possible              |possible

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Oh by the way it looks like LLVM does not do a good job on x86 either. But with
my idea, GCC will do better than LLVM on x86 even.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug tree-optimization/104357] Failure to use csinv instead of mvn+csel where possible
  2022-02-02 23:40 [Bug target/104357] New: [Aarch64] Failure to use csinv instead of mvn+csel where possible gabravier at gmail dot com
                   ` (2 preceding siblings ...)
  2022-02-03  0:05 ` [Bug tree-optimization/104357] " pinskia at gcc dot gnu.org
@ 2023-05-04  5:37 ` pinskia at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-05-04  5:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104357

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #2)
> One thing I should note:
>   _7 = x_3(D) >= 0;
>   _6 = (unsigned char) _7;
>   _8 = -_6;
> 
> Should be done on the gimple level as:
> t = x_3(D) >> (sizeof(x_3(D))*8 - 1)
> _8 = (unsigned char)t;

Actually it is:
  _2 = x_3(D) >> 31;
  _5 = ~_2;
  _8 = (unsigned char) _5;

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-05-04  5:37 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-02 23:40 [Bug target/104357] New: [Aarch64] Failure to use csinv instead of mvn+csel where possible gabravier at gmail dot com
2022-02-02 23:51 ` [Bug tree-optimization/104357] " pinskia at gcc dot gnu.org
2022-02-03  0:02 ` pinskia at gcc dot gnu.org
2022-02-03  0:05 ` [Bug tree-optimization/104357] " pinskia at gcc dot gnu.org
2023-05-04  5:37 ` pinskia at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).