public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug middle-end/115551] New: [missed optimization] "c1 << (a + c2)" not optimized into "(c1 << c2) << a"
@ 2024-06-19 15:15 burnus at gcc dot gnu.org
  2024-06-20  1:58 ` [Bug middle-end/115551] " xry111 at gcc dot gnu.org
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: burnus at gcc dot gnu.org @ 2024-06-19 15:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115551

            Bug ID: 115551
           Summary: [missed optimization] "c1 << (a + c2)" not optimized
                    into "(c1 << c2) << a"
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: burnus at gcc dot gnu.org
                CC: pinskia at gcc dot gnu.org
  Target Milestone: ---

Created attachment 58468
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58468&action=edit
patch to show how to get a nice output – but doesn't actually use it. Not to be
used..

"c1 << (a + c2)" not optimized into "(c1 << c2) << a"

Example:

int f(int ch) {
  unsigned long mask1 = ((((1UL))) << (1 + 4 * ((1) - 1))) << (ch * 4); 
  unsigned long mask2 = ((((1UL))) << (1 + 4 * ((ch + 1) - 1)));
  return mask1-mask2;
}

GCC converts this currently to:

mask1 = 2 << (ch * 4)
mask2 = 1 << (ch * 4 + 1)

* * *

Related to
https://lore.kernel.org/lkml/d7ef7a6158df4ba6687233b0e00d37796b069fb3.1718791090.git.u.kleine-koenig@baylibre.com/

Result: 
* With the 2nd form the resulting binary gets ~25% smaller
* Saving nearly 500 bytes!

* * *

On ARM, the generated code for mask1 is:

lsls    r0, r0, #2
movs    r3, #2
lsl.w   r0, r3, r0

and for mask2:

lsls    r0, r0, #2
adds    r0, #1  // additional 'adds' instruction
movs    r3, #1
lsl.w   r0, r3, r0

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug middle-end/115551] [missed optimization] "c1 << (a + c2)" not optimized into "(c1 << c2) << a"
  2024-06-19 15:15 [Bug middle-end/115551] New: [missed optimization] "c1 << (a + c2)" not optimized into "(c1 << c2) << a" burnus at gcc dot gnu.org
@ 2024-06-20  1:58 ` xry111 at gcc dot gnu.org
  2024-06-20  6:57 ` burnus at gcc dot gnu.org
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: xry111 at gcc dot gnu.org @ 2024-06-20  1:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115551

Xi Ruoyao <xry111 at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |xry111 at gcc dot gnu.org

--- Comment #1 from Xi Ruoyao <xry111 at gcc dot gnu.org> ---
"c1 << (-5 + 5)" is fine but "(c1 << -5) << 5" invokes undefined behavior. 
Thus we need some range info to do this optimization.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug middle-end/115551] [missed optimization] "c1 << (a + c2)" not optimized into "(c1 << c2) << a"
  2024-06-19 15:15 [Bug middle-end/115551] New: [missed optimization] "c1 << (a + c2)" not optimized into "(c1 << c2) << a" burnus at gcc dot gnu.org
  2024-06-20  1:58 ` [Bug middle-end/115551] " xry111 at gcc dot gnu.org
@ 2024-06-20  6:57 ` burnus at gcc dot gnu.org
  2024-06-20  7:21 ` burnus at gcc dot gnu.org
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: burnus at gcc dot gnu.org @ 2024-06-20  6:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115551

--- Comment #2 from Tobias Burnus <burnus at gcc dot gnu.org> ---
> Thus we need some range info to do this optimization.

Good point.

It seems as if for c1 << (c2 * a + c3),  C requires a >= -c3/c2 (read as float
division; c2 ≠ 0)

And the suggested optimization requires c2*a >= 0 and c3 >= 0 to fulfill C
requirement of nonnegative shifts.

Thus, this is fulfilled for any value of 'a' if c3 >= 0 and abs(c2) > c3.


The optimization can also be done for any value of 'a', if the hardware
supports c1 << (negative value)  (as right shift, fillung with zeros) and
popcount(c1) == popcount(c1 << c3).


The first condition is fulfilled in this example.

I don't know about the second, but observed that Clang/LLVM optimizes the diff
mask1-mask2 to 0 on ARM but not x86_64 (not checked why nor whether ARM handles
negative shifts in a well-defined way or not).

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug middle-end/115551] [missed optimization] "c1 << (a + c2)" not optimized into "(c1 << c2) << a"
  2024-06-19 15:15 [Bug middle-end/115551] New: [missed optimization] "c1 << (a + c2)" not optimized into "(c1 << c2) << a" burnus at gcc dot gnu.org
  2024-06-20  1:58 ` [Bug middle-end/115551] " xry111 at gcc dot gnu.org
  2024-06-20  6:57 ` burnus at gcc dot gnu.org
@ 2024-06-20  7:21 ` burnus at gcc dot gnu.org
  2024-06-20  7:32 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: burnus at gcc dot gnu.org @ 2024-06-20  7:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115551

--- Comment #3 from Tobias Burnus <burnus at gcc dot gnu.org> ---
As we want to have a >= 0, I tried to convey it differently for the example in
comment 0:

(a) __attribute__((assume(ch >= 0)));
(b) 'unsigned ch' (instead of 'int ch')

but it didn't help.  Thus, it looks as if it could be implemented for:
  c1 << (a + c2)
if (a >= 0 and c2 >= 0) doing then the conversion
  (c1 << c2) << a

Whether Ranger handles abs(c2) > c3 >= 0 → a >= 0 for 'c1 << (a*c2 + c3)', I
don't know, but the variant above with unsigned and 'assume(a >= 0)' should be
implementable and seems to make sense.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug middle-end/115551] [missed optimization] "c1 << (a + c2)" not optimized into "(c1 << c2) << a"
  2024-06-19 15:15 [Bug middle-end/115551] New: [missed optimization] "c1 << (a + c2)" not optimized into "(c1 << c2) << a" burnus at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2024-06-20  7:21 ` burnus at gcc dot gnu.org
@ 2024-06-20  7:32 ` rguenth at gcc dot gnu.org
  2024-06-20  7:55 ` jakub at gcc dot gnu.org
  2024-06-20  7:58 ` burnus at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-06-20  7:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115551

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
all same for right-shifts (both logical and arithmetic).

Note that 1 << (a + 5) might be cheaper than (1<<5) << a due to constraints
on immediates but for GIMPLE the latter is definitely more canonical.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug middle-end/115551] [missed optimization] "c1 << (a + c2)" not optimized into "(c1 << c2) << a"
  2024-06-19 15:15 [Bug middle-end/115551] New: [missed optimization] "c1 << (a + c2)" not optimized into "(c1 << c2) << a" burnus at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2024-06-20  7:32 ` rguenth at gcc dot gnu.org
@ 2024-06-20  7:55 ` jakub at gcc dot gnu.org
  2024-06-20  7:58 ` burnus at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-06-20  7:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115551

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
We should also verify that it doesn't stand in the way of shift sanitization,
because
unsigned int a;
...
1 << (5 + a);
is well defined only for a in [0, 26] while once we optimize it to
(1 << 5) << a;
that would be well defined for a in [0, 31].  I think the shift sanitization is
done early, so just something to be verified in a testcase next to the patch.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug middle-end/115551] [missed optimization] "c1 << (a + c2)" not optimized into "(c1 << c2) << a"
  2024-06-19 15:15 [Bug middle-end/115551] New: [missed optimization] "c1 << (a + c2)" not optimized into "(c1 << c2) << a" burnus at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2024-06-20  7:55 ` jakub at gcc dot gnu.org
@ 2024-06-20  7:58 ` burnus at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: burnus at gcc dot gnu.org @ 2024-06-20  7:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115551

--- Comment #6 from Tobias Burnus <burnus at gcc dot gnu.org> ---
Crossref: New Bug 115555 is for the range analysis to deduce from 'x << a' that
'a' must be nonnegative.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-06-20  7:58 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-06-19 15:15 [Bug middle-end/115551] New: [missed optimization] "c1 << (a + c2)" not optimized into "(c1 << c2) << a" burnus at gcc dot gnu.org
2024-06-20  1:58 ` [Bug middle-end/115551] " xry111 at gcc dot gnu.org
2024-06-20  6:57 ` burnus at gcc dot gnu.org
2024-06-20  7:21 ` burnus at gcc dot gnu.org
2024-06-20  7:32 ` rguenth at gcc dot gnu.org
2024-06-20  7:55 ` jakub at gcc dot gnu.org
2024-06-20  7:58 ` burnus at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).