* [Bug tree-optimization/109393] Very trivial address calculation does not fold
2023-04-03 17:57 [Bug tree-optimization/109393] New: Very trivial address calculation does not fold manolis.tsamis at vrull dot eu
@ 2023-04-03 18:07 ` pinskia at gcc dot gnu.org
2023-04-04 12:12 ` manolis.tsamis at vrull dot eu
` (7 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-04-03 18:07 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109393
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note sometimes -fwrapv will optimize things because it can assume that overflow
is defined as wrapping and this is one case that is true. Yes it sounds counter
intuitive but it is true. Even re-association happens more with -fwrapv :).
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug tree-optimization/109393] Very trivial address calculation does not fold
2023-04-03 17:57 [Bug tree-optimization/109393] New: Very trivial address calculation does not fold manolis.tsamis at vrull dot eu
2023-04-03 18:07 ` [Bug tree-optimization/109393] " pinskia at gcc dot gnu.org
@ 2023-04-04 12:12 ` manolis.tsamis at vrull dot eu
2023-04-11 12:16 ` [Bug c/109393] " rguenth at gcc dot gnu.org
` (6 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: manolis.tsamis at vrull dot eu @ 2023-04-04 12:12 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109393
--- Comment #2 from manolis.tsamis at vrull dot eu ---
(In reply to Andrew Pinski from comment #1)
> Note sometimes -fwrapv will optimize things because it can assume that
> overflow is defined as wrapping and this is one case that is true. Yes it
> sounds counter intuitive but it is true. Even re-association happens more
> with -fwrapv :).
Yes it is truly counter intuitive :)
Yet, leaving this wrapv behaviour aside, isn't this something that should be
always optimized regardless of wrap/overflow flags?
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug c/109393] Very trivial address calculation does not fold
2023-04-03 17:57 [Bug tree-optimization/109393] New: Very trivial address calculation does not fold manolis.tsamis at vrull dot eu
2023-04-03 18:07 ` [Bug tree-optimization/109393] " pinskia at gcc dot gnu.org
2023-04-04 12:12 ` manolis.tsamis at vrull dot eu
@ 2023-04-11 12:16 ` rguenth at gcc dot gnu.org
2023-05-10 12:55 ` manolis.tsamis at vrull dot eu
` (5 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-04-11 12:16 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109393
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Component|tree-optimization |c
Status|UNCONFIRMED |NEW
Last reconfirmed| |2023-04-11
Ever confirmed|0 |1
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
It's probably a mismatch of GENERIC/GIMPLE folding. In this case it's
pointer_int_sum prematurely distributing the multiplication:
/* Return a tree for the sum or difference (RESULTCODE says which)
of pointer PTROP and integer INTOP. */
tree
pointer_int_sum (location_t loc, enum tree_code resultcode,
tree ptrop, tree intop, bool complain)
{
...
/* If what we are about to multiply by the size of the elements
contains a constant term, apply distributive law
and multiply that constant term separately.
This helps produce common subexpressions. */
but this kind of stuff shouldn't be done by the frontends these days.
Gating this fixes the issue. I think this piece of code should be axed
(after careful evaluation of its effect)
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug c/109393] Very trivial address calculation does not fold
2023-04-03 17:57 [Bug tree-optimization/109393] New: Very trivial address calculation does not fold manolis.tsamis at vrull dot eu
` (2 preceding siblings ...)
2023-04-11 12:16 ` [Bug c/109393] " rguenth at gcc dot gnu.org
@ 2023-05-10 12:55 ` manolis.tsamis at vrull dot eu
2023-05-11 12:47 ` rguenth at gcc dot gnu.org
` (4 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: manolis.tsamis at vrull dot eu @ 2023-05-10 12:55 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109393
--- Comment #4 from manolis.tsamis at vrull dot eu ---
(In reply to Richard Biener from comment #3)
> It's probably a mismatch of GENERIC/GIMPLE folding. In this case it's
> pointer_int_sum prematurely distributing the multiplication:
>
> /* Return a tree for the sum or difference (RESULTCODE says which)
> of pointer PTROP and integer INTOP. */
>
> tree
> pointer_int_sum (location_t loc, enum tree_code resultcode,
> tree ptrop, tree intop, bool complain)
> {
> ...
> /* If what we are about to multiply by the size of the elements
> contains a constant term, apply distributive law
> and multiply that constant term separately.
> This helps produce common subexpressions. */
>
> but this kind of stuff shouldn't be done by the frontends these days.
>
> Gating this fixes the issue. I think this piece of code should be axed
> (after careful evaluation of its effect)
I have been testing this change and by looking at some tests that it causes to
fail, there is a regression on testcases like this (taken from
copy-headers-5.c, there are some similar fails):
int is_sorted(int *a, int n)
{
for (int i = 0; i < n - 1; i++)
if (a[i] > a[i + 1])
return 0;
return 1;
}
The gimple code for gcc currently is (taken from dump-tree-ch2):
_1 = (long unsigned int) i_18;
_2 = _1 * 4;
_3 = a_13(D) + _2;
_4 = *_3;
_5 = _1 + 1;
_6 = _5 * 4;
_7 = a_13(D) + _6;
_8 = *_7;
While with this change the result is:
_1 = (long unsigned int) i_11;
_2 = _1 * 4;
_3 = a_14(D) + _2;
_4 = *_3;
_5 = i_11 + 1;
_6 = (long unsigned int) _5;
_7 = _6 * 4;
_8 = a_14(D) + _7;
_9 = *_8;
As a result the generated code has two loads per loop instead of one.
The same holds if e.g. a[i + 1] > a[i + 2] is used because the constant
addition is done before the cast to long unsigned int. I believe this is
because the logic to distribute the constant is missing as a GIMPLE pattern?
Given the original transform it should be valid to propagate the constant
addition through the cast?
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug c/109393] Very trivial address calculation does not fold
2023-04-03 17:57 [Bug tree-optimization/109393] New: Very trivial address calculation does not fold manolis.tsamis at vrull dot eu
` (3 preceding siblings ...)
2023-05-10 12:55 ` manolis.tsamis at vrull dot eu
@ 2023-05-11 12:47 ` rguenth at gcc dot gnu.org
2023-05-11 12:59 ` manolis.tsamis at vrull dot eu
` (3 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-05-11 12:47 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109393
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to manolis.tsamis from comment #4)
> Given the original transform it should be valid to propagate the constant
> addition through the cast?
Yes. Note doing so loses information, we know i + 1 doesn't overflow
(undefined behavior). Widening preserves this knowledge I think, but if
just an unsigned conversion would be propagated it would be lost.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug c/109393] Very trivial address calculation does not fold
2023-04-03 17:57 [Bug tree-optimization/109393] New: Very trivial address calculation does not fold manolis.tsamis at vrull dot eu
` (4 preceding siblings ...)
2023-05-11 12:47 ` rguenth at gcc dot gnu.org
@ 2023-05-11 12:59 ` manolis.tsamis at vrull dot eu
2023-09-04 11:38 ` manolis.tsamis at vrull dot eu
` (2 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: manolis.tsamis at vrull dot eu @ 2023-05-11 12:59 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109393
--- Comment #6 from manolis.tsamis at vrull dot eu ---
(In reply to Richard Biener from comment #5)
> (In reply to manolis.tsamis from comment #4)
> > Given the original transform it should be valid to propagate the constant
> > addition through the cast?
>
> Yes. Note doing so loses information, we know i + 1 doesn't overflow
> (undefined behavior). Widening preserves this knowledge I think, but if
> just an unsigned conversion would be propagated it would be lost.
So, is that the reason that this transform isn't already implemented as an
optimisation?
But then again isn't this information also lost in the code currently produced
by GCC, where the constant is already propagated? Although I can see how it is
different to do the propagation of constants in the front-end only vs doing it
anywhere while transforming the code; maybe that's the difference that matters.
But hopefully doing better canonicalization/constant folding on address
calculations and constants should also result in better optimisation
opportunities overall.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug c/109393] Very trivial address calculation does not fold
2023-04-03 17:57 [Bug tree-optimization/109393] New: Very trivial address calculation does not fold manolis.tsamis at vrull dot eu
` (5 preceding siblings ...)
2023-05-11 12:59 ` manolis.tsamis at vrull dot eu
@ 2023-09-04 11:38 ` manolis.tsamis at vrull dot eu
2023-09-04 13:06 ` philipp.tomsich at vrull dot eu
2023-09-08 13:14 ` manolis.tsamis at vrull dot eu
8 siblings, 0 replies; 10+ messages in thread
From: manolis.tsamis at vrull dot eu @ 2023-09-04 11:38 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109393
--- Comment #7 from Manolis Tsamis <manolis.tsamis at vrull dot eu> ---
After some attempts to improve on this, my current solution is:
1) Do not change pointer_int_sum in c-common (otherwise codegen regressions
are observed)
2) Introduce a pattern that folds (unsigned type) (x + CONST_INT_1) *
CONST_INT_2 to (unsigned type) (x * CONST_INT_2) + (unsigned type) (CONST_INT_1
* CONST_INT_1) under the right circumstances.
This combination improves codegen (including the testcase provided) and also
doesn't cause any regressions on the GCC testsuite or benchmarks that I have
tried so far.
Given that a[(ulong) i +- C] in GIMPLE is quite common, I believe this change
helps with folding / canonicalization in many cases.
My current match.pd pattern to do that is below; any feedback or improvements
are welcome.
(simplify
(mult (convert@3 (plus @0 INTEGER_CST@1)) INTEGER_CST@2)
(with { tree tt = TREE_TYPE(@3); }
(if (INTEGRAL_TYPE_P (type)
&& !TYPE_SATURATING (type)
&& !TYPE_OVERFLOW_TRAPS (type)
&& !TYPE_OVERFLOW_SANITIZED (type)
&& !TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
(plus (mult (convert:tt @0) @2) (mult (convert:tt @1) @2)))))
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug c/109393] Very trivial address calculation does not fold
2023-04-03 17:57 [Bug tree-optimization/109393] New: Very trivial address calculation does not fold manolis.tsamis at vrull dot eu
` (6 preceding siblings ...)
2023-09-04 11:38 ` manolis.tsamis at vrull dot eu
@ 2023-09-04 13:06 ` philipp.tomsich at vrull dot eu
2023-09-08 13:14 ` manolis.tsamis at vrull dot eu
8 siblings, 0 replies; 10+ messages in thread
From: philipp.tomsich at vrull dot eu @ 2023-09-04 13:06 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109393
--- Comment #8 from philipp.tomsich at vrull dot eu ---
On Mon, 4 Sept 2023 at 13:38, manolis.tsamis at vrull dot eu <
gcc-bugzilla@gcc.gnu.org> wrote:
> My current match.pd pattern to do that is below; any feedback or
> improvements
> are welcome.
>
> (simplify
> (mult (convert@3 (plus @0 INTEGER_CST@1)) INTEGER_CST@2)
> (with { tree tt = TREE_TYPE(@3); }
> (if (INTEGRAL_TYPE_P (type)
> && !TYPE_SATURATING (type)
> && !TYPE_OVERFLOW_TRAPS (type)
> && !TYPE_OVERFLOW_SANITIZED (type)
> && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
> (plus (mult (convert:tt @0) @2) (mult (convert:tt @1) @2)))))
>
Please attach a current .patch to this ticket to make sure that there are
no surprises in reproducing.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug c/109393] Very trivial address calculation does not fold
2023-04-03 17:57 [Bug tree-optimization/109393] New: Very trivial address calculation does not fold manolis.tsamis at vrull dot eu
` (7 preceding siblings ...)
2023-09-04 13:06 ` philipp.tomsich at vrull dot eu
@ 2023-09-08 13:14 ` manolis.tsamis at vrull dot eu
8 siblings, 0 replies; 10+ messages in thread
From: manolis.tsamis at vrull dot eu @ 2023-09-08 13:14 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109393
--- Comment #9 from Manolis Tsamis <manolis.tsamis at vrull dot eu> ---
Created attachment 55856
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55856&action=edit
Address calculation pattern v1
^ permalink raw reply [flat|nested] 10+ messages in thread