public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/67323] New: Use non-unit stride loads by preference when applicable
From: michael.collison at linaro dot org @ 2015-08-23 3:14 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67323
Bug ID: 67323
Summary: Use non-unit stride loads by preference when applicable
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: michael.collison at linaro dot org
Target Milestone: ---
On ARM targets the following code fails to generate a vld3:
struct pixel {
    char r,g,b;
};

void
t2(int len, struct pixel * __restrict p, struct pixel * __restrict x)
{
    len = len & ~31;
    for (int i = 0; i < len; i++){
        p[i].r = x[i].r * 2;
        p[i].g = x[i].g * 3;
        p[i].b = x[i].b * 4;
    }
}
Yet the same code with the assignment to p[i].g changed to:
    p[i].g = x[i].g;
does generate a vld3.
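For readers unfamiliar with load-lanes instructions, the following is a rough scalar model of what a three-way de-interleaving load and store (such as vld3.8 / vst3.8) would do for the loop in the report. It is only a sketch: the names load_lanes3, store_lanes3, t2_lanes and LANES are hypothetical, the lane count of eight is modelled on the 64-bit register form, and unsigned char is assumed instead of char so that the narrowing stores are well defined. It does not show what GCC emits.

```c
#include <assert.h>

/* Assumption: unsigned char instead of the report's plain char. */
struct pixel { unsigned char r, g, b; };

enum { LANES = 8 };  /* hypothetical: eight pixels per load-lanes step */

/* Reference: the scalar loop from the report. */
static void t2_scalar(int len, struct pixel *restrict p,
                      const struct pixel *restrict x)
{
    len = len & ~31;
    for (int i = 0; i < len; i++) {
        p[i].r = x[i].r * 2;
        p[i].g = x[i].g * 3;
        p[i].b = x[i].b * 4;
    }
}

/* Model of a 3-way de-interleaving load: one pass over LANES pixels
   splits the r, g and b components into three separate "registers". */
static void load_lanes3(const struct pixel *src, unsigned char r[LANES],
                        unsigned char g[LANES], unsigned char b[LANES])
{
    for (int k = 0; k < LANES; k++) {
        r[k] = src[k].r;
        g[k] = src[k].g;
        b[k] = src[k].b;
    }
}

/* Model of the matching 3-way interleaving store. */
static void store_lanes3(struct pixel *dst, const unsigned char r[LANES],
                         const unsigned char g[LANES],
                         const unsigned char b[LANES])
{
    for (int k = 0; k < LANES; k++) {
        dst[k].r = r[k];
        dst[k].g = g[k];
        dst[k].b = b[k];
    }
}

/* The same computation expressed with the load-lanes model: each
   component register gets a single uniform multiplier. */
static void t2_lanes(int len, struct pixel *restrict p,
                     const struct pixel *restrict x)
{
    len = len & ~31;
    assert(len % LANES == 0);  /* 32 is a multiple of LANES, so no tail */
    for (int i = 0; i < len; i += LANES) {
        unsigned char r[LANES], g[LANES], b[LANES];
        load_lanes3(&x[i], r, g, b);
        for (int k = 0; k < LANES; k++) {
            r[k] = r[k] * 2;
            g[k] = g[k] * 3;
            b[k] = b[k] * 4;
        }
        store_lanes3(&p[i], r, g, b);
    }
}
```

The point of the model is that after the de-interleaving load every register holds one component only, so each multiply uses a single constant across all lanes.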
* [Bug tree-optimization/67323] Use non-unit stride loads by preference when applicable
From: rguenth at gcc dot gnu.org @ 2015-08-25 9:14 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67323
Richard Biener <rguenth at gcc dot gnu.org> changed:
What             |Removed                           |Added
----------------------------------------------------------------------------
Status           |UNCONFIRMED                       |ASSIGNED
Last reconfirmed |                                  |2015-08-25
CC               |richard.guenther at gmail dot com |rguenth at gcc dot gnu.org
Depends on       |                                  |66721
Assignee         |unassigned at gcc dot gnu.org     |rguenth at gcc dot gnu.org
Ever confirmed   |0                                 |1
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed. We go down the SLP path here because the vectorizer thinks that
SLP is always cheaper than using interleaving (which would generally be true
if there were no targets that can do the load plus interleave with
load-lanes ...).
I think this may be a regression as well, because I enhanced SLP to apply
to many more cases.
Note that my plan is to make the vectorizer consider both SLP and non-SLP
(well, not really, but this bug shows I maybe should try) and evaluate based
on costs which route to take.
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66721
[Bug 66721] [6 regression] gcc.target/i386/pr61403.c FAILs
* [Bug tree-optimization/67323] Use non-unit stride loads by preference when applicable
From: michael.collison at linaro dot org @ 2015-08-25 9:57 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67323
--- Comment #2 from Michael Collison <michael.collison at linaro dot org> ---
Richard,
Should I create a test case that fails until you resolve this in GCC 6?
* [Bug tree-optimization/67323] Use non-unit stride loads by preference when applicable
From: rguenther at suse dot de @ 2015-08-25 10:05 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67323
--- Comment #3 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 25 Aug 2015, michael.collison at linaro dot org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67323
>
> --- Comment #2 from Michael Collison <michael.collison at linaro dot org> ---
> Richard,
>
> Should I create a test case that fails until you resolve this in GCC 6?
If you can provide one that I can check in together with a fix, that
would be nice. Having it in the tree now while it FAILs would be against
our policies.
* [Bug tree-optimization/67323] Use non-unit stride loads by preference when applicable
From: michael.collison at linaro dot org @ 2015-08-25 10:14 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67323
--- Comment #4 from Michael Collison <michael.collison at linaro dot org> ---
Hi Richard,
No, I do not have a fix now. Thanks for the info on the policy.
* [Bug tree-optimization/67323] Use non-unit stride loads by preference when applicable
From: rguenth at gcc dot gnu.org @ 2015-10-07 11:37 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67323
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
I note that the efficiency you gain comes only from a reduced number of
load/store instructions: vld3 instead of six vldr (huh, apparently vld3 can
load 16-byte vectors but vldr only 8-byte ones?). I assume vld3 has no penalty
for the lane-split itself, so the code-size reduction is always wanted.
Thus we'd want to always use a lane load/store, even if the permutation is
pointless, as soon as we would otherwise issue more than one SLP load, say for
void
t5 (int len, int * __restrict p, int * __restrict q)
{
    for (int i = 0; i < len; i+=8) {
        p[i] = q[i] * 2;
        p[i+1] = q[i+1] * 2;
        p[i+2] = q[i+2] * 2;
        p[i+3] = q[i+3] * 2;
        p[i+4] = q[i+4] * 2;
        p[i+5] = q[i+5] * 2;
        p[i+6] = q[i+6] * 2;
        p[i+7] = q[i+7] * 2;
    }
}
instead of
.L4:
        vldr    d18, [r2, #-16]
        vldr    d19, [r2, #-8]
        vldr    d16, [r2, #-32]
        vldr    d17, [r2, #-24]
        vshl.i32        q9, q9, #1
        vshl.i32        q8, q8, #1
        add     r3, r3, #1
        cmp     r0, r3
        vstr    d18, [r1, #-16]
        vstr    d19, [r1, #-8]
        vstr    d16, [r1, #-32]
        vstr    d17, [r1, #-24]
        add     r2, r2, #32
        add     r1, r1, #32
        bhi     .L4
use vld2.32 / vst2.32? Generally for SLP the implicit permute performed
by those instructions could be modeled properly (and the SLP chain
permuted accordingly).
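To illustrate why the permutation is "pointless" for t5 yet harmless, here is a small scalar model of a two-way lane load and store (in the spirit of vld2.32 / vst2.32 on two 128-bit registers). This is a sketch under assumptions, not GCC output: load_lanes2, store_lanes2, t5_lanes and VL are hypothetical names, and len is assumed to be a multiple of eight.

```c
#include <assert.h>

enum { VL = 4 };  /* hypothetical: four 32-bit lanes per register */

/* Reference: the scalar loop from the comment above. */
static void t5_scalar(int len, int *restrict p, const int *restrict q)
{
    for (int i = 0; i < len; i += 8) {
        for (int k = 0; k < 8; k++)
            p[i + k] = q[i + k] * 2;
    }
}

/* Model of a 2-way de-interleaving load: even-indexed elements go to
   one register, odd-indexed elements to the other. */
static void load_lanes2(const int *src, int even[VL], int odd[VL])
{
    for (int k = 0; k < VL; k++) {
        even[k] = src[2 * k];
        odd[k]  = src[2 * k + 1];
    }
}

/* Model of the matching 2-way interleaving store, which undoes the
   permutation performed by the load. */
static void store_lanes2(int *dst, const int even[VL], const int odd[VL])
{
    for (int k = 0; k < VL; k++) {
        dst[2 * k]     = even[k];
        dst[2 * k + 1] = odd[k];
    }
}

/* One lane load and one lane store replace four loads and four stores
   per iteration; the operation is the same on every lane, so the
   permutation cancels out between load and store. */
static void t5_lanes(int len, int *restrict p, const int *restrict q)
{
    assert(len % (2 * VL) == 0);  /* sketch handles whole groups of eight */
    for (int i = 0; i < len; i += 2 * VL) {
        int even[VL], odd[VL];
        load_lanes2(&q[i], even, odd);
        for (int k = 0; k < VL; k++) {
            even[k] *= 2;
            odd[k]  *= 2;
        }
        store_lanes2(&p[i], even, odd);
    }
}
```

Because every lane receives the same operation, the order in which elements sit in the registers does not matter, which is the situation the comment describes.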
* [Bug tree-optimization/67323] Use non-unit stride loads by preference when applicable
From: rguenth at gcc dot gnu.org @ 2021-05-04 12:32 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67323
Richard Biener <rguenth at gcc dot gnu.org> changed:
What             |Removed                           |Added
----------------------------------------------------------------------------
Status           |NEW                               |ASSIGNED