public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/104106] New: Fail to remove some useless loop
@ 2022-01-18 22:25 denis.campredon at gmail dot com
2022-01-18 22:27 ` [Bug tree-optimization/104106] " pinskia at gcc dot gnu.org
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: denis.campredon at gmail dot com @ 2022-01-18 22:25 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104106
Bug ID: 104106
Summary: Fail to remove some useless loop
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: denis.campredon at gmail dot com
Target Milestone: ---
In the following snippet none of the loops are removed when compiled with -O2
or -O3.
In f and g the optimizers shoulds detect that tmp_a is only written and never
read.
In h, only one index of tmp_a is read, so it should be the only one computed.
Ideally, if not too complex for gcc, the first two loops should be removed and
the computations, if any, done on the last loop.
-------------
int f(char *a, unsigned n) {
char tmp_a[n];
for (unsigned i = 1; i != n; i++) tmp_a[i] = a[i];
return a[0];
}
int g(char *a, int n) {
char tmp_a[n];
for (int i = 1; i < n; i++) tmp_a[i] = a[i] - a[i - 1];
return a[0];
}
int h(char *a, int n) {
char tmp_a[n];
for (int i = 0; i < n; i++) tmp_a[i] = a[i];
return tmp_a[1];
}
int i(char *a, char *b, int n) {
char tmp_a[n];
char tmp_b[n];
for (int i = 1; i < n; i++) tmp_a[i] = a[i] - a[i - 1];
for (int i = 1; i < n; i++) tmp_b[i] = b[i] - b[i - 1];
int result = 0;
for (int i = 1; i < n; i++) result += tmp_a[i] + tmp_b[i];
return result;
}
---------------------
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Bug tree-optimization/104106] Fail to remove some useless loop
2022-01-18 22:25 [Bug tree-optimization/104106] New: Fail to remove some useless loop denis.campredon at gmail dot com
@ 2022-01-18 22:27 ` pinskia at gcc dot gnu.org
2022-01-18 22:35 ` [Bug tree-optimization/104106] Fail to remove stores to VLA inside loops pinskia at gcc dot gnu.org
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-01-18 22:27 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104106
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|normal |enhancement
Keywords| |missed-optimization
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Bug tree-optimization/104106] Fail to remove stores to VLA inside loops
2022-01-18 22:25 [Bug tree-optimization/104106] New: Fail to remove some useless loop denis.campredon at gmail dot com
2022-01-18 22:27 ` [Bug tree-optimization/104106] " pinskia at gcc dot gnu.org
@ 2022-01-18 22:35 ` pinskia at gcc dot gnu.org
2022-01-19 8:00 ` denis.campredon at gmail dot com
2022-01-19 10:55 ` rguenth at gcc dot gnu.org
3 siblings, 0 replies; 5+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-01-18 22:35 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104106
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Ever confirmed|0 |1
Status|UNCONFIRMED |NEW
Summary|Fail to remove some useless |Fail to remove stores to
|loop |VLA inside loops
Last reconfirmed| |2022-01-18
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
If we change the VLA to a normal array, GCC is able to optimize f and g.
h is almost done:
<bb 2> [local count: 118111600]:
if (n_8(D) > 0)
goto <bb 3>; [89.00%]
else
goto <bb 4>; [11.00%]
<bb 3> [local count: 105119324]:
_18 = (unsigned int) n_8(D);
_6 = (sizetype) _18;
__builtin_memcpy (&tmp_a, a_11(D), _6);
<bb 4> [local count: 118111600]:
_4 = tmp_a[0];
GCC could do a PRE for the load of tmp_a[0] to a_11[0] and unspecified.
This is true even with the VLA.
As for i, that requires loop fision which GCC does not implement yet (there is
another bug about that even).
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Bug tree-optimization/104106] Fail to remove stores to VLA inside loops
2022-01-18 22:25 [Bug tree-optimization/104106] New: Fail to remove some useless loop denis.campredon at gmail dot com
2022-01-18 22:27 ` [Bug tree-optimization/104106] " pinskia at gcc dot gnu.org
2022-01-18 22:35 ` [Bug tree-optimization/104106] Fail to remove stores to VLA inside loops pinskia at gcc dot gnu.org
@ 2022-01-19 8:00 ` denis.campredon at gmail dot com
2022-01-19 10:55 ` rguenth at gcc dot gnu.org
3 siblings, 0 replies; 5+ messages in thread
From: denis.campredon at gmail dot com @ 2022-01-19 8:00 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104106
--- Comment #2 from denis.campredon at gmail dot com ---
The missed optimisations are also present if the arrays are allocated with
malloc or new.
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Bug tree-optimization/104106] Fail to remove stores to VLA inside loops
2022-01-18 22:25 [Bug tree-optimization/104106] New: Fail to remove some useless loop denis.campredon at gmail dot com
` (2 preceding siblings ...)
2022-01-19 8:00 ` denis.campredon at gmail dot com
@ 2022-01-19 10:55 ` rguenth at gcc dot gnu.org
3 siblings, 0 replies; 5+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-01-19 10:55 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104106
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think DSE doesn't quite understand free() as must-def, likewise for
__builtin_stack_restore though that is _much_ more difficult to tie to
a specific alloca allocation.
It would be nice if the gimplifier would emit CLOBBERs for the VLAs
that go out of scope at the __builtin_stack_restore point, that would
fix the VLA case I think. For
int f(char *a, unsigned n) {
char *tmp_a = __builtin_malloc (n);
for (unsigned i = 1; i != n; i++) tmp_a[i] = a[i];
__builtin_free (tmp_a);
return a[0];
}
it should already work but DSE is not set up to consider variable indexed
stores in loops that end up as pointer accesses like
*_5 = _6;
or rather stmt_kills_ref_p is too simple minded when looking at free():
case BUILT_IN_FREE:
{
tree ptr = gimple_call_arg (stmt, 0);
tree base = ao_ref_base (ref);
if (base && TREE_CODE (base) == MEM_REF
&& TREE_OPERAND (base, 0) == ptr)
{
++alias_stats.stmt_kills_ref_p_yes;
return true;
}
it might be able to use points-to analysis or track down the base of
_5 via DR analysis but in the end I think it's DSEs job to do better
here.
With
int f(char *a, unsigned n) {
char (*tmp_a)[n] = __builtin_malloc (n);
for (unsigned i = 1; i != n; i++) (*tmp_a)[i] = a[i];
__builtin_free (tmp_a);
return a[0];
}
this issue is avoided but we then run into
/* If we visit this PHI by following a backedge then we have to
make sure ref->ref only refers to SSA names that are invariant
with respect to the loop represented by this PHI node. */
if (dominated_by_p (CDI_DOMINATORS, gimple_bb (stmt),
gimple_bb (temp))
&& !for_each_index (ref->ref ? &ref->ref : &ref->base,
check_name, gimple_bb (temp)))
return DSE_STORE_LIVE;
which explicitely disables DSE of variably indexed accesses with the index
varying in the loop. That's because of cross iteration dependences otherwise
mishandled.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2022-01-19 10:55 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-18 22:25 [Bug tree-optimization/104106] New: Fail to remove some useless loop denis.campredon at gmail dot com
2022-01-18 22:27 ` [Bug tree-optimization/104106] " pinskia at gcc dot gnu.org
2022-01-18 22:35 ` [Bug tree-optimization/104106] Fail to remove stores to VLA inside loops pinskia at gcc dot gnu.org
2022-01-19 8:00 ` denis.campredon at gmail dot com
2022-01-19 10:55 ` rguenth at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).