public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/108757] New: We do not simplify (a - (N*M)) / N + M -> a / N
@ 2023-02-10 22:26 bergner at gcc dot gnu.org
  2023-02-10 22:28 ` [Bug tree-optimization/108757] " bergner at gcc dot gnu.org
                   ` (26 more replies)
  0 siblings, 27 replies; 28+ messages in thread
From: bergner at gcc dot gnu.org @ 2023-02-10 22:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

            Bug ID: 108757
           Summary: We do not simplify (a - (N*M)) / N + M -> a / N
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: bergner at gcc dot gnu.org
  Target Milestone: ---

The Eigen project code has a missed optimization that basically simplifies down
to the following test case:

linux$ cat bug.c 
#define N 32
#define M 2

unsigned long int
foo (unsigned long int a)
{
  return (a - (N*M)) / N + M;
}

linux$ gcc -O2 -S bug.c 

linux$ cat bug.s 
foo:
        addi 3,3,-64
        srdi 3,3,5
        addi 3,3,2
        blr

We should be able to simplify this down to just 'a / N', which for power-of-2
N, results in just the srdi...although, I don't think N is required to be a
power-of-2 to fold this down to just 'a / N'.  Can we also simplify this for
non-constant N & M?  Maybe under -fast-math or something similar?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
  2023-02-10 22:26 [Bug tree-optimization/108757] New: We do not simplify (a - (N*M)) / N + M -> a / N bergner at gcc dot gnu.org
@ 2023-02-10 22:28 ` bergner at gcc dot gnu.org
  2023-02-10 22:33 ` pinskia at gcc dot gnu.org
                   ` (25 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: bergner at gcc dot gnu.org @ 2023-02-10 22:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

Peter Bergner <bergner at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-02-10
                 CC|                            |chip.kerchner at ibm dot com
             Target|                            |powerpc64le-linux
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Peter Bergner <bergner at gcc dot gnu.org> ---
Reported by our Eigen developers and confirmed by me.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
  2023-02-10 22:26 [Bug tree-optimization/108757] New: We do not simplify (a - (N*M)) / N + M -> a / N bergner at gcc dot gnu.org
  2023-02-10 22:28 ` [Bug tree-optimization/108757] " bergner at gcc dot gnu.org
@ 2023-02-10 22:33 ` pinskia at gcc dot gnu.org
  2023-02-10 22:40 ` pinskia at gcc dot gnu.org
                   ` (24 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-02-10 22:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
  2023-02-10 22:26 [Bug tree-optimization/108757] New: We do not simplify (a - (N*M)) / N + M -> a / N bergner at gcc dot gnu.org
  2023-02-10 22:28 ` [Bug tree-optimization/108757] " bergner at gcc dot gnu.org
  2023-02-10 22:33 ` pinskia at gcc dot gnu.org
@ 2023-02-10 22:40 ` pinskia at gcc dot gnu.org
  2023-02-11 13:14 ` anlauf at gcc dot gnu.org
                   ` (23 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-02-10 22:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I am not sure this can be done in the normal case unless you know the range of
a to be [64...INF] .
The wrap around case might be an issue ...
But I am not 100% sure.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
  2023-02-10 22:26 [Bug tree-optimization/108757] New: We do not simplify (a - (N*M)) / N + M -> a / N bergner at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2023-02-10 22:40 ` pinskia at gcc dot gnu.org
@ 2023-02-11 13:14 ` anlauf at gcc dot gnu.org
  2023-02-11 16:38 ` segher at gcc dot gnu.org
                   ` (22 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: anlauf at gcc dot gnu.org @ 2023-02-11 13:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #3 from anlauf at gcc dot gnu.org ---
(In reply to Andrew Pinski from comment #2)
> I am not sure this can be done in the normal case unless you know the range
> of a to be [64...INF] .
> The wrap around case might be an issue ...
> But I am not 100% sure.

It is.  Just print foo(0).

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
  2023-02-10 22:26 [Bug tree-optimization/108757] New: We do not simplify (a - (N*M)) / N + M -> a / N bergner at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2023-02-11 13:14 ` anlauf at gcc dot gnu.org
@ 2023-02-11 16:38 ` segher at gcc dot gnu.org
  2023-02-11 17:32 ` bergner at gcc dot gnu.org
                   ` (21 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: segher at gcc dot gnu.org @ 2023-02-11 16:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

Segher Boessenkool <segher at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |segher at gcc dot gnu.org

--- Comment #4 from Segher Boessenkool <segher at gcc dot gnu.org> ---
If N is a power of two optimising this to a/N is valid, but for other values
of N it is not (division is not the inverse of multiplication in C).  It also
only works for unsigned of course.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
  2023-02-10 22:26 [Bug tree-optimization/108757] New: We do not simplify (a - (N*M)) / N + M -> a / N bergner at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2023-02-11 16:38 ` segher at gcc dot gnu.org
@ 2023-02-11 17:32 ` bergner at gcc dot gnu.org
  2023-02-11 17:55 ` segher at gcc dot gnu.org
                   ` (20 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: bergner at gcc dot gnu.org @ 2023-02-11 17:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #5 from Peter Bergner <bergner at gcc dot gnu.org> ---
(In reply to Segher Boessenkool from comment #4)
> If N is a power of two optimising this to a/N is valid, but for other values
> of N it is not (division is not the inverse of multiplication in C).  It also
> only works for unsigned of course.

Isn't this valid for any N & M, such that M is a factor of N?  Meaning, as long
as N / M has no remainder, it seems like this should be ok.  For example, N =
30 & M = 2 should be just as ok as the N = 32 & M = 2 case, correct?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
  2023-02-10 22:26 [Bug tree-optimization/108757] New: We do not simplify (a - (N*M)) / N + M -> a / N bergner at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2023-02-11 17:32 ` bergner at gcc dot gnu.org
@ 2023-02-11 17:55 ` segher at gcc dot gnu.org
  2023-02-11 18:05 ` bergner at gcc dot gnu.org
                   ` (19 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: segher at gcc dot gnu.org @ 2023-02-11 17:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #6 from Segher Boessenkool <segher at gcc dot gnu.org> ---
No?  Take a=59 as counterexample:

(a - (N*M)) / N + M = (59 - 2*30)/30 + 2 = ~0UL/30 + 2

but

a / N = 59/30 = 1

Integer division in C is division towards zero, almost no normal algebraic
simplifications apply there.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
  2023-02-10 22:26 [Bug tree-optimization/108757] New: We do not simplify (a - (N*M)) / N + M -> a / N bergner at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2023-02-11 17:55 ` segher at gcc dot gnu.org
@ 2023-02-11 18:05 ` bergner at gcc dot gnu.org
  2023-02-11 18:15 ` segher at gcc dot gnu.org
                   ` (18 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: bergner at gcc dot gnu.org @ 2023-02-11 18:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #7 from Peter Bergner <bergner at gcc dot gnu.org> ---
(In reply to Segher Boessenkool from comment #6)
> No?  Take a=59 as counterexample:
> 
> (a - (N*M)) / N + M = (59 - 2*30)/30 + 2 = ~0UL/30 + 2

For unsigned integers, isn't having a < N*M UB so we're free to do as we wish
for either the optimized and non-optimized sequences?  Meaning, can't we assume
here that a >= N*M?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
  2023-02-10 22:26 [Bug tree-optimization/108757] New: We do not simplify (a - (N*M)) / N + M -> a / N bergner at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2023-02-11 18:05 ` bergner at gcc dot gnu.org
@ 2023-02-11 18:15 ` segher at gcc dot gnu.org
  2023-02-11 19:45 ` chip.kerchner at ibm dot com
                   ` (17 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: segher at gcc dot gnu.org @ 2023-02-11 18:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #8 from Segher Boessenkool <segher at gcc dot gnu.org> ---
No, addition and subtraction are well defined for all inputs, for unsigned
integers.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
  2023-02-10 22:26 [Bug tree-optimization/108757] New: We do not simplify (a - (N*M)) / N + M -> a / N bergner at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2023-02-11 18:15 ` segher at gcc dot gnu.org
@ 2023-02-11 19:45 ` chip.kerchner at ibm dot com
  2023-02-11 21:55 ` chip.kerchner at ibm dot com
                   ` (16 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: chip.kerchner at ibm dot com @ 2023-02-11 19:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #9 from Chip Kerchner <chip.kerchner at ibm dot com> ---
Doesn't this work for powers of two (N) and signed values (for A, N and M)?

(59 - (33 * -2)) / -2 + 31 = -62 + 31 = -29

and

59 / -2 = -29

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
  2023-02-10 22:26 [Bug tree-optimization/108757] New: We do not simplify (a - (N*M)) / N + M -> a / N bergner at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2023-02-11 19:45 ` chip.kerchner at ibm dot com
@ 2023-02-11 21:55 ` chip.kerchner at ibm dot com
  2023-02-11 22:03 ` chip.kerchner at ibm dot com
                   ` (15 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: chip.kerchner at ibm dot com @ 2023-02-11 21:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #10 from Chip Kerchner <chip.kerchner at ibm dot com> ---
Oops that should be 31 * -2, not 33.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
  2023-02-10 22:26 [Bug tree-optimization/108757] New: We do not simplify (a - (N*M)) / N + M -> a / N bergner at gcc dot gnu.org
                   ` (10 preceding siblings ...)
  2023-02-11 21:55 ` chip.kerchner at ibm dot com
@ 2023-02-11 22:03 ` chip.kerchner at ibm dot com
  2023-02-13 20:13 ` chip.kerchner at ibm dot com
                   ` (14 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: chip.kerchner at ibm dot com @ 2023-02-11 22:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #11 from Chip Kerchner <chip.kerchner at ibm dot com> ---
Nevermind, using a similar example that Segher gave, it would failed too.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
  2023-02-10 22:26 [Bug tree-optimization/108757] New: We do not simplify (a - (N*M)) / N + M -> a / N bergner at gcc dot gnu.org
                   ` (11 preceding siblings ...)
  2023-02-11 22:03 ` chip.kerchner at ibm dot com
@ 2023-02-13 20:13 ` chip.kerchner at ibm dot com
  2023-02-13 20:20 ` pinskia at gcc dot gnu.org
                   ` (13 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: chip.kerchner at ibm dot com @ 2023-02-13 20:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #12 from Chip Kerchner <chip.kerchner at ibm dot com> ---
Here is an example of the original problem

#define EIGEN_ALWAYS_INLINE __attribute__((always_inline)) inline

typedef __vector float Packet4f;
typedef size_t Index;

EIGEN_ALWAYS_INLINE Packet4f ploadu(const float* from)
{
  return vec_xl(0, const_cast<float*>(from));
}

EIGEN_ALWAYS_INLINE void pstoreu(float* to, const Packet4f &from)
{
  vec_xst(from, 0, to);
}

void convert(Index rows, float*src, float *result)
{
  for(Index i = 0; i + 4 <= rows; i+=4) {
    Packet4f r32_0 = ploadu(src + i +  0);
    pstoreu(result + i +  0, r32_0);
  }
}

And the output (with notation on the lines in question)

    cmpldi 0,3,3
    blelr 0
    addi 3,3,-4  <- i = rows - 4
    li 9,0
    srdi 3,3,2   <- i >>= 2
    addi 8,3,1   <- i = i + 1
    andi. 7,8,0x3
    mr 10,8
    beq 0,.L10
    cmpdi 0,7,1
    beq 0,.L14
    cmpdi 0,7,2
    beq 0,.L15
    lxv 0,0(4)
    mr 8,3
    li 9,16
    stxv 0,0(5)
.L15:
    lxvx 0,4,9
    addi 8,8,-1
    stxvx 0,5,9
    addi 9,9,16
.L14:
    lxvx 0,4,9
    cmpdi 0,8,1
    stxvx 0,5,9
    addi 9,9,16
    beqlr 0
.L10:
    srdi 10,10,2
    mtctr 10
.L3:
    lxvx 0,4,9
    addi 10,9,16
    addi 7,9,32
    addi 8,9,48
    stxvx 0,5,9
    lxvx 0,4,10
    addi 9,9,64
    stxvx 0,5,10
    lxvx 0,4,7
    stxvx 0,5,7
    lxvx 0,4,8
    stxvx 0,5,8
    bdnz .L3
    blr

In this case the 3 lines notated can be replaced a simple `srdi 8,3,2`

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
  2023-02-10 22:26 [Bug tree-optimization/108757] New: We do not simplify (a - (N*M)) / N + M -> a / N bergner at gcc dot gnu.org
                   ` (12 preceding siblings ...)
  2023-02-13 20:13 ` chip.kerchner at ibm dot com
@ 2023-02-13 20:20 ` pinskia at gcc dot gnu.org
  2023-02-13 20:27 ` [Bug rtl-optimization/108757] " pinskia at gcc dot gnu.org
                   ` (12 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-02-13 20:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #13 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
IIRC this is a doloop issue and has been reported before. Maybe even by myself
while I was working at Sony. I think I tried to fix it too.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug rtl-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
  2023-02-10 22:26 [Bug tree-optimization/108757] New: We do not simplify (a - (N*M)) / N + M -> a / N bergner at gcc dot gnu.org
                   ` (13 preceding siblings ...)
  2023-02-13 20:20 ` pinskia at gcc dot gnu.org
@ 2023-02-13 20:27 ` pinskia at gcc dot gnu.org
  2023-02-13 20:36 ` chip.kerchner at ibm dot com
                   ` (11 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-02-13 20:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=37451
          Component|tree-optimization           |rtl-optimization

--- Comment #14 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Oh yes pr 37451 . There are a few other related ones referenced now too. But
basically it is doloop which is introducing this.
The original example does not correspond to the example in given by the eigen
folks either..

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug rtl-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
  2023-02-10 22:26 [Bug tree-optimization/108757] New: We do not simplify (a - (N*M)) / N + M -> a / N bergner at gcc dot gnu.org
                   ` (14 preceding siblings ...)
  2023-02-13 20:27 ` [Bug rtl-optimization/108757] " pinskia at gcc dot gnu.org
@ 2023-02-13 20:36 ` chip.kerchner at ibm dot com
  2023-02-13 20:37 ` chip.kerchner at ibm dot com
                   ` (10 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: chip.kerchner at ibm dot com @ 2023-02-13 20:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #15 from Chip Kerchner <chip.kerchner at ibm dot com> ---
How about this (from Peter's testcase)?  Does it still have issues?  It
produces the same assembly.

#define N 32
#define M 2

unsigned long int
foo (unsigned long int a)
{
  return (a - (N*M)) / N + M;
}

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug rtl-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
  2023-02-10 22:26 [Bug tree-optimization/108757] New: We do not simplify (a - (N*M)) / N + M -> a / N bergner at gcc dot gnu.org
                   ` (15 preceding siblings ...)
  2023-02-13 20:36 ` chip.kerchner at ibm dot com
@ 2023-02-13 20:37 ` chip.kerchner at ibm dot com
  2023-02-13 20:58 ` [Bug tree-optimization/108757] " pinskia at gcc dot gnu.org
                   ` (9 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: chip.kerchner at ibm dot com @ 2023-02-13 20:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #16 from Chip Kerchner <chip.kerchner at ibm dot com> ---
Dang copy and paste issue...  This is what I meant.

unsigned long int
foo (unsigned long int a)
{
  return (a + (N*M)) / N - M;
}

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
  2023-02-10 22:26 [Bug tree-optimization/108757] New: We do not simplify (a - (N*M)) / N + M -> a / N bergner at gcc dot gnu.org
                   ` (16 preceding siblings ...)
  2023-02-13 20:37 ` chip.kerchner at ibm dot com
@ 2023-02-13 20:58 ` pinskia at gcc dot gnu.org
  2023-02-13 20:59 ` pinskia at gcc dot gnu.org
                   ` (8 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-02-13 20:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|rtl-optimization            |tree-optimization

--- Comment #17 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
  _13 = rows_8(D) + 18446744073709551612;
  _15 = _13 >> 2;
  doloop.8_5 = _15 + 1;


This is what IV-OPTS produces.

Reduced testcase:
typedef __SIZE_TYPE__ size_t;

void convert(size_t rows, float*src, float *result)
{
  for(size_t i = 0; i + 4 <= rows; i+=4) {
    float t = src[i];
    result[i] = t;
  }
}

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
  2023-02-10 22:26 [Bug tree-optimization/108757] New: We do not simplify (a - (N*M)) / N + M -> a / N bergner at gcc dot gnu.org
                   ` (17 preceding siblings ...)
  2023-02-13 20:58 ` [Bug tree-optimization/108757] " pinskia at gcc dot gnu.org
@ 2023-02-13 20:59 ` pinskia at gcc dot gnu.org
  2023-02-13 21:01 ` pinskia at gcc dot gnu.org
                   ` (7 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-02-13 20:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|powerpc64le-linux           |powerpc64le-linux,
                   |                            |aarch64-*-*

--- Comment #18 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note this shows up on aarch64 too with my reduced testcase.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
  2023-02-10 22:26 [Bug tree-optimization/108757] New: We do not simplify (a - (N*M)) / N + M -> a / N bergner at gcc dot gnu.org
                   ` (18 preceding siblings ...)
  2023-02-13 20:59 ` pinskia at gcc dot gnu.org
@ 2023-02-13 21:01 ` pinskia at gcc dot gnu.org
  2023-05-11 13:15 ` guojiufu at gcc dot gnu.org
                   ` (6 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-02-13 21:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #19 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note in the loop case we know it does not wrap because there is a check
already:
  <bb 2> [local count: 118111600]:
  if (rows_8(D) > 3)
    goto <bb 5>; [89.00%]
  else
    goto <bb 4>; [11.00%]

  <bb 5> [local count: 105119324]:
  _13 = rows_8(D) + 18446744073709551612;
  _15 = _13 / 4;
  doloop.6_5 = _15 + 1;

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
  2023-02-10 22:26 [Bug tree-optimization/108757] New: We do not simplify (a - (N*M)) / N + M -> a / N bergner at gcc dot gnu.org
                   ` (19 preceding siblings ...)
  2023-02-13 21:01 ` pinskia at gcc dot gnu.org
@ 2023-05-11 13:15 ` guojiufu at gcc dot gnu.org
  2023-05-11 17:32 ` pinskia at gcc dot gnu.org
                   ` (5 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: guojiufu at gcc dot gnu.org @ 2023-05-11 13:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

Jiu Fu Guo <guojiufu at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |guojiufu at gcc dot gnu.org

--- Comment #20 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #19)
> Note in the loop case we know it does not wrap because there is a check
> already:
>   <bb 2> [local count: 118111600]:
>   if (rows_8(D) > 3)
>     goto <bb 5>; [89.00%]
>   else
>     goto <bb 4>; [11.00%]
> 
>   <bb 5> [local count: 105119324]:
>   _13 = rows_8(D) + 18446744073709551612;
>   _15 = _13 / 4;
>   doloop.6_5 = _15 + 1;

Checking why below code does not work:
/* Simplify ((t + -N*M) / N + M) -> t / N: (t + -C) >> N + (C>>N) ==> t >> N */
(for div (trunc_div round_div)
 (simplify
  (plus (rshift (plus @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)
  (if (ANY_INTEGRAL_TYPE_P (type) &&
       (wi::to_wide (@3) << wi::to_wide (@2)) == -wi::to_wide (@1))
   (if (TYPE_OVERFLOW_UNDEFINED (type))
    (div @0 @2)
#if GIMPLE
    (with
     {
       bool overflowed = true;
       value_range vr0;
       if (INTEGRAL_TYPE_P (type)
           && get_global_range_query ()->range_of_expr (vr0, @0)
           && !vr0.varying_p () && !vr0.undefined_p ())
         {
           wide_int wmin0 = vr0.lower_bound ();
           wide_int wmax0 = vr0.upper_bound ();
           wide_int w1 = -wi::to_wide (@1);
           wi::overflow_type min_ovf, max_ovf;
           wi::add (wmin0, w1, TYPE_SIGN (type), &min_ovf);
           wi::add (wmax0, w1, TYPE_SIGN (type), &max_ovf);
           if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE)
             overflowed = false;
         }
     }
    (if (!overflowed)
     (div @0 @2)))
#endif
   ))))


Interesting thing:
the VR is always VR_VARYING, even for the below simple case:

typedef unsigned long INT;
INT __attribute__ ((noinline)) foo (INT x)
{
  if (x < 4)
    return 0;
  INT a = x + 18446744073709551612ULL;
  INT b = a >> 2;
  return b + 1;
}

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
  2023-02-10 22:26 [Bug tree-optimization/108757] New: We do not simplify (a - (N*M)) / N + M -> a / N bergner at gcc dot gnu.org
                   ` (20 preceding siblings ...)
  2023-05-11 13:15 ` guojiufu at gcc dot gnu.org
@ 2023-05-11 17:32 ` pinskia at gcc dot gnu.org
  2023-05-12  1:52 ` guojiufu at gcc dot gnu.org
                   ` (4 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-05-11 17:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #21 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Jiu Fu Guo from comment #20)
> Interesting thing:
> the VR is always VR_VARYING, even for the below simple case:
> 
> typedef unsigned long INT;
> INT __attribute__ ((noinline)) foo (INT x)
> {
>   if (x < 4)
>     return 0;
>   INT a = x + 18446744073709551612ULL;
>   INT b = a >> 2;
>   return b + 1;
> }

Yes that is because x does not have a "global" range.

You could try the following testcase:
```
typedef unsigned long INT;
INT __attribute__ ((noinline)) foo (INT x)
{
  if (x < 4)
    __builtin_unreachable();
  INT a = x + 18446744073709551612ULL;
  INT b = a >> 2;
  return b + 1;
}
```

Which gets a (global) range for x_1(D) during forwprop3:
  # RANGE [irange] INT [4, +INF]
  INTD.2750 x_1(D) = xD.2751;

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
  2023-02-10 22:26 [Bug tree-optimization/108757] New: We do not simplify (a - (N*M)) / N + M -> a / N bergner at gcc dot gnu.org
                   ` (21 preceding siblings ...)
  2023-05-11 17:32 ` pinskia at gcc dot gnu.org
@ 2023-05-12  1:52 ` guojiufu at gcc dot gnu.org
  2023-05-12  6:25 ` guojiufu at gcc dot gnu.org
                   ` (3 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: guojiufu at gcc dot gnu.org @ 2023-05-12  1:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #22 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #21)
> (In reply to Jiu Fu Guo from comment #20)
> > Interesting thing:
> > the VR is always VR_VARYING, even for the below simple case:
> > 
> > typedef unsigned long INT;
> > INT __attribute__ ((noinline)) foo (INT x)
> > {
> >   if (x < 4)
> >     return 0;
> >   INT a = x + 18446744073709551612ULL;
> >   INT b = a >> 2;
> >   return b + 1;
> > }
> 
> Yes that is because x does not have a "global" range.

I also tried "get_range_query (cfun)->range_of_expr (vr0, @0)", 
> 
> You could try the following testcase:
> ```
> typedef unsigned long INT;
> INT __attribute__ ((noinline)) foo (INT x)
> {
>   if (x < 4)
>     __builtin_unreachable();
>   INT a = x + 18446744073709551612ULL;
>   INT b = a >> 2;
>   return b + 1;
> }
> ```
> 
> Which gets a (global) range for x_1(D) during forwprop3:
>   # RANGE [irange] INT [4, +INF]
>   INTD.2750 x_1(D) = xD.2751;

(In reply to Andrew Pinski from comment #21)
> (In reply to Jiu Fu Guo from comment #20)
> > Interesting thing:
> > the VR is always VR_VARYING, even for the below simple case:
> > 
> > typedef unsigned long INT;
> > INT __attribute__ ((noinline)) foo (INT x)
> > {
> >   if (x < 4)
> >     return 0;
> >   INT a = x + 18446744073709551612ULL;
> >   INT b = a >> 2;
> >   return b + 1;
> > }
> 
> Yes that is because x does not have a "global" range.
> 
> You could try the following testcase:
> ```
> typedef unsigned long INT;
> INT __attribute__ ((noinline)) foo (INT x)
> {
>   if (x < 4)
>     __builtin_unreachable();
>   INT a = x + 18446744073709551612ULL;
>   INT b = a >> 2;
>   return b + 1;
> }
> ```
> 
> Which gets a (global) range for x_1(D) during forwprop3:
>   # RANGE [irange] INT [4, +INF]
>   INTD.2750 x_1(D) = xD.2751;

Thanks so much!
"get_range_query (cfun)->range_of_expr (vr0, @0)" works for both the case!

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
  2023-02-10 22:26 [Bug tree-optimization/108757] New: We do not simplify (a - (N*M)) / N + M -> a / N bergner at gcc dot gnu.org
                   ` (22 preceding siblings ...)
  2023-05-12  1:52 ` guojiufu at gcc dot gnu.org
@ 2023-05-12  6:25 ` guojiufu at gcc dot gnu.org
  2023-05-12  9:35 ` guojiufu at gcc dot gnu.org
                   ` (2 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: guojiufu at gcc dot gnu.org @ 2023-05-12  6:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #23 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
/* Simplify ((t + -N*M) / N + M) -> t / N: (t + -C) >> N + (C>>N) ==> t >> N */
(for div (trunc_div exact_div)
 (simplify
  (plus (rshift (plus @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)
  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type) &&
       (wi::to_wide (@3) << wi::to_wide (@2)) == -wi::to_wide (@1))
   (if (TYPE_OVERFLOW_UNDEFINED (type))
    (div @0 @2)
#if GIMPLE
    (with
     {
       bool overflowed = true;
       value_range vr0;
       if (get_range_query (cfun)->range_of_expr (vr0, @0)
           && !vr0.varying_p () && !vr0.undefined_p ())
         {
           wide_int wmin0 = vr0.lower_bound ();
           wide_int wmax0 = vr0.upper_bound ();
           wide_int w1 = -wi::to_wide (@1);
           wi::overflow_type min_ovf, max_ovf;
           wi::sub (wmin0, w1, TYPE_SIGN (type), &min_ovf);
           wi::sub (wmax0, w1, TYPE_SIGN (type), &max_ovf);
           if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE)
             overflowed = false;
         }
     }
    (if (!overflowed)
     (rshift @0 @2)))
#endif
   ))))

Got one match for the case.
Checking if it is safe(condition) or how to support other forms:
signed type, negative N, non-power2 N, negative M ...

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
  2023-02-10 22:26 [Bug tree-optimization/108757] New: We do not simplify (a - (N*M)) / N + M -> a / N bergner at gcc dot gnu.org
                   ` (23 preceding siblings ...)
  2023-05-12  6:25 ` guojiufu at gcc dot gnu.org
@ 2023-05-12  9:35 ` guojiufu at gcc dot gnu.org
  2023-09-04  2:46 ` cvs-commit at gcc dot gnu.org
  2023-09-04  3:26 ` guojiufu at gcc dot gnu.org
  26 siblings, 0 replies; 28+ messages in thread
From: guojiufu at gcc dot gnu.org @ 2023-05-12  9:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #24 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
(In reply to Jiu Fu Guo from comment #23)
> /* Simplify ((t + -N*M) / N + M) -> t / N: (t + -C) >> N + (C>>N) ==> t >> N
> */
> (for div (trunc_div exact_div)
div was not used in this matcher, yet.  Here rshift is used:  t/(1<<N) could be
optimized to "t>>N".

>  (simplify
>   (plus (rshift (plus @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type) &&
>        (wi::to_wide (@3) << wi::to_wide (@2)) == -wi::to_wide (@1))
>    (if (TYPE_OVERFLOW_UNDEFINED (type))
>     (div @0 @2)
This should be "(rshift @0 @2)", otherwise it will be error if relax
"TYPE_UNSIGNED (type)"

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
  2023-02-10 22:26 [Bug tree-optimization/108757] New: We do not simplify (a - (N*M)) / N + M -> a / N bergner at gcc dot gnu.org
                   ` (24 preceding siblings ...)
  2023-05-12  9:35 ` guojiufu at gcc dot gnu.org
@ 2023-09-04  2:46 ` cvs-commit at gcc dot gnu.org
  2023-09-04  3:26 ` guojiufu at gcc dot gnu.org
  26 siblings, 0 replies; 28+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-09-04  2:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #25 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jiu Fu Guo <guojiufu@gcc.gnu.org>:

https://gcc.gnu.org/g:1aceceb1e2d6e86ce183c8cc448750fa03b6f79e

commit r14-3644-g1aceceb1e2d6e86ce183c8cc448750fa03b6f79e
Author: Jiufu Guo <guojiufu@linux.ibm.com>
Date:   Mon Sep 4 10:31:04 2023 +0800

    Optimize '(X - N * M) / N' to 'X / N - M' if valid

    Integer expression "(X - N * M) / N" can be optimized to "X / N - M" with
    the below conditions:
    1. There is no wrap/overflow/underflow.
       wrap/overflow/underflow breaks the arithmetic operation.
    2. "X - N * M" and "X" are not of opposite sign.
       Here, the operation "/" would be "trunc_div", the fractional part is
       discarded. If "X - N * M" and "X" are in different signs, then trunc_div
       discards the fractional parts (of /N) in different directions.

            PR tree-optimization/108757

    gcc/ChangeLog:

            * match.pd ((X - N * M) / N): New pattern.
            ((X + N * M) / N): New pattern.
            ((X + C) div_rshift N): New pattern.

    gcc/testsuite/ChangeLog:

            * gcc.dg/pr108757-1.c: New test.
            * gcc.dg/pr108757-2.c: New test.
            * gcc.dg/pr108757.h: New test.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
  2023-02-10 22:26 [Bug tree-optimization/108757] New: We do not simplify (a - (N*M)) / N + M -> a / N bergner at gcc dot gnu.org
                   ` (25 preceding siblings ...)
  2023-09-04  2:46 ` cvs-commit at gcc dot gnu.org
@ 2023-09-04  3:26 ` guojiufu at gcc dot gnu.org
  26 siblings, 0 replies; 28+ messages in thread
From: guojiufu at gcc dot gnu.org @ 2023-09-04  3:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

Jiu Fu Guo <guojiufu at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #26 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
Patch was committed.

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2023-09-04  3:26 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-10 22:26 [Bug tree-optimization/108757] New: We do not simplify (a - (N*M)) / N + M -> a / N bergner at gcc dot gnu.org
2023-02-10 22:28 ` [Bug tree-optimization/108757] " bergner at gcc dot gnu.org
2023-02-10 22:33 ` pinskia at gcc dot gnu.org
2023-02-10 22:40 ` pinskia at gcc dot gnu.org
2023-02-11 13:14 ` anlauf at gcc dot gnu.org
2023-02-11 16:38 ` segher at gcc dot gnu.org
2023-02-11 17:32 ` bergner at gcc dot gnu.org
2023-02-11 17:55 ` segher at gcc dot gnu.org
2023-02-11 18:05 ` bergner at gcc dot gnu.org
2023-02-11 18:15 ` segher at gcc dot gnu.org
2023-02-11 19:45 ` chip.kerchner at ibm dot com
2023-02-11 21:55 ` chip.kerchner at ibm dot com
2023-02-11 22:03 ` chip.kerchner at ibm dot com
2023-02-13 20:13 ` chip.kerchner at ibm dot com
2023-02-13 20:20 ` pinskia at gcc dot gnu.org
2023-02-13 20:27 ` [Bug rtl-optimization/108757] " pinskia at gcc dot gnu.org
2023-02-13 20:36 ` chip.kerchner at ibm dot com
2023-02-13 20:37 ` chip.kerchner at ibm dot com
2023-02-13 20:58 ` [Bug tree-optimization/108757] " pinskia at gcc dot gnu.org
2023-02-13 20:59 ` pinskia at gcc dot gnu.org
2023-02-13 21:01 ` pinskia at gcc dot gnu.org
2023-05-11 13:15 ` guojiufu at gcc dot gnu.org
2023-05-11 17:32 ` pinskia at gcc dot gnu.org
2023-05-12  1:52 ` guojiufu at gcc dot gnu.org
2023-05-12  6:25 ` guojiufu at gcc dot gnu.org
2023-05-12  9:35 ` guojiufu at gcc dot gnu.org
2023-09-04  2:46 ` cvs-commit at gcc dot gnu.org
2023-09-04  3:26 ` guojiufu at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).