public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/67323] New: Use non-unit stride loads by preference when applicable
@ 2015-08-23  3:14 michael.collison at linaro dot org
  2015-08-25  9:14 ` [Bug tree-optimization/67323] " rguenth at gcc dot gnu.org
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: michael.collison at linaro dot org @ 2015-08-23  3:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67323

            Bug ID: 67323
           Summary: Use non-unit stride loads by preference when
                    applicable
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: michael.collison at linaro dot org
  Target Milestone: ---

On ARM targets the following code fails to generate a vld3:

struct pixel {
  char r,g,b;
};

void 
t2(int len, struct pixel * __restrict p, struct pixel * __restrict x)
{
  len = len & ~31;
  for (int i = 0; i < len; i++){
      p[i].r = x[i].r * 2;
      p[i].g = x[i].g * 3;
      p[i].b = x[i].b * 4;
  }
}

Yet the same code with line 11 (the p[i].g assignment) changed to:

p[i].g = x[i].g;

does generate a vld3.
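
For reference, the access pattern being asked for can be modelled in portable C. The sketch below is not what GCC emits; it only mimics what a vld3/vst3 pair does, namely splitting the interleaved r,g,b bytes into three per-channel lanes and merging them back afterwards. The names load_lanes3, store_lanes3 and t2_lanes, and the block size of 16 pixels, are assumptions made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

struct pixel {
  char r, g, b;
};

enum { BLOCK = 16 };  /* hypothetical: pixels handled per step */

/* Model of a vld3-style load: de-interleave BLOCK pixels into three lanes.  */
static void
load_lanes3 (const struct pixel *src, char *r, char *g, char *b)
{
  for (size_t k = 0; k < BLOCK; k++)
    {
      r[k] = src[k].r;
      g[k] = src[k].g;
      b[k] = src[k].b;
    }
}

/* Model of a vst3-style store: re-interleave the three lanes.  */
static void
store_lanes3 (struct pixel *dst, const char *r, const char *g, const char *b)
{
  for (size_t k = 0; k < BLOCK; k++)
    {
      dst[k].r = r[k];
      dst[k].g = g[k];
      dst[k].b = b[k];
    }
}

/* Same computation as t2, but with one uniform multiplier per lane.  */
void
t2_lanes (int len, struct pixel *__restrict p,
          const struct pixel *__restrict x)
{
  char r[BLOCK], g[BLOCK], b[BLOCK];

  len = len & ~31;
  /* len is now a multiple of 32, hence also of BLOCK.  */
  assert (len % BLOCK == 0);
  for (int i = 0; i < len; i += BLOCK)
    {
      load_lanes3 (&x[i], r, g, b);
      for (int k = 0; k < BLOCK; k++)
        {
          r[k] = (char) (r[k] * 2);
          g[k] = (char) (g[k] * 3);
          b[k] = (char) (b[k] * 4);
        }
      store_lanes3 (&p[i], r, g, b);
    }
}
```

Once the channels are separated, each lane is multiplied by a single constant, which is the shape a load-lanes instruction is meant to feed.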



* [Bug tree-optimization/67323] Use non-unit stride loads by preference when applicable
From: rguenth at gcc dot gnu.org @ 2015-08-25  9:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67323

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                            |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                        |ASSIGNED
   Last reconfirmed|                                   |2015-08-25
                 CC|richard.guenther at gmail dot com  |rguenth at gcc dot gnu.org
         Depends on|                                   |66721
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
     Ever confirmed|0                                  |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.  We go down the SLP path here because the vectorizer thinks that
SLP is always cheaper than using interleaving (which would generally be true
if there were no targets that can do the load plus interleave with
load-lanes ...).

I think this may be a regression as well because I enhanced SLP to apply
to way more cases.

Note that my plan is to make the vectorizer consider both SLP and non-SLP
(well, not really, but this bug shows I maybe should try) and choose
between them based on costs.
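
To make the two routes concrete: roughly, the SLP route keeps the three stores of one iteration together as a group, so the data is loaded contiguously and multiplied by a constant that cycles through {2, 3, 4}, whereas the interleaving route separates the channels first. The following is a scalar C sketch of the SLP-style access pattern only. The name t2_slp_model and the flat byte view of the arrays are illustrative assumptions, not vectorizer output.

```c
#include <assert.h>
#include <stddef.h>

struct pixel {
  char r, g, b;
};

/* The model views the arrays as flat bytes, so it relies on struct pixel
   having no padding (an assumption that holds on the common ABIs).  */
_Static_assert (sizeof (struct pixel) == 3, "struct pixel must be 3 bytes");

void
t2_slp_model (int len, struct pixel *__restrict p,
              const struct pixel *__restrict x)
{
  /* The per-channel multipliers, repeated with period 3 across the bytes.  */
  static const char mult[3] = { 2, 3, 4 };
  char *dst = (char *) p;
  const char *src = (const char *) x;

  len = len & ~31;
  if (len <= 0)
    return;

  /* Contiguous loads and stores; no de-interleaving is needed because the
     constant operand follows the r,g,b layout instead.  */
  for (size_t i = 0; i < (size_t) len * 3; i++)
    dst[i] = (char) (src[i] * mult[i % 3]);
}
```

With 16-byte vectors the period-3 constant does not line up with the vector width, so a real SLP version needs several loads and several differently rotated constant vectors per step, where a load-lanes version needs one multi-register load.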


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66721
[Bug 66721] [6 regression] gcc.target/i386/pr61403.c FAILs



* [Bug tree-optimization/67323] Use non-unit stride loads by preference when applicable
From: michael.collison at linaro dot org @ 2015-08-25  9:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67323

--- Comment #2 from Michael Collison <michael.collison at linaro dot org> ---
Richard,

Should I create a test case that fails until you resolve this in GCC 6?




* [Bug tree-optimization/67323] Use non-unit stride loads by preference when applicable
From: rguenther at suse dot de @ 2015-08-25 10:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67323

--- Comment #3 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 25 Aug 2015, michael.collison at linaro dot org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67323
> 
> --- Comment #2 from Michael Collison <michael.collison at linaro dot org> ---
> Richard,
> 
> Should I create a test case that fails until you resolve this in GCC 6?

If you can provide one that I can check in together with a fix that
would be nice.  Having it in the tree now and FAILing would be against
our policies.




* [Bug tree-optimization/67323] Use non-unit stride loads by preference when applicable
From: michael.collison at linaro dot org @ 2015-08-25 10:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67323

--- Comment #4 from Michael Collison <michael.collison at linaro dot org> ---
Hi Richard,

No, I do not have a fix now. Thanks for the info on the policy.




* [Bug tree-optimization/67323] Use non-unit stride loads by preference when applicable
From: rguenth at gcc dot gnu.org @ 2015-10-07 11:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67323

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
I note that the efficiency you gain comes only from a reduced number of
load/store instructions: vld3 instead of six vldr (huh, apparently vld3 can
load 16 byte vectors but vldr only 8 byte ones?).  I assume vld3 has no penalty
for the lane-split itself so the code-size reduction is always wanted.
Thus we'd want to always use a lane load/store even if the permutation is
pointless as soon as we would otherwise issue more than one SLP load, say for

void
t5 (int len, int * __restrict p, int * __restrict q)
{
  for (int i = 0; i < len; i+=8) {
      p[i] = q[i] * 2;
      p[i+1] = q[i+1] * 2;
      p[i+2] = q[i+2] * 2;
      p[i+3] = q[i+3] * 2;
      p[i+4] = q[i+4] * 2;
      p[i+5] = q[i+5] * 2;
      p[i+6] = q[i+6] * 2;
      p[i+7] = q[i+7] * 2;
  }
}

instead of

.L4:
        vldr    d18, [r2, #-16]
        vldr    d19, [r2, #-8]
        vldr    d16, [r2, #-32]
        vldr    d17, [r2, #-24]
        vshl.i32        q9, q9, #1
        vshl.i32        q8, q8, #1
        add     r3, r3, #1
        cmp     r0, r3
        vstr    d18, [r1, #-16]
        vstr    d19, [r1, #-8]
        vstr    d16, [r1, #-32]
        vstr    d17, [r1, #-24]
        add     r2, r2, #32
        add     r1, r1, #32
        bhi     .L4

use vld2.32 / vst2.32?  Generally for SLP the implicit permute performed
by those instructions could be modeled properly (and the SLP chain
permuted accordingly).
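
As a hedged illustration of that suggestion, t5 can be written by hand so that each step uses one two-lane load and one two-lane store. This is a sketch only: the name t5_lanes is made up here, the NEON path uses the arm_neon.h intrinsics vld2q_s32, vshlq_n_s32 and vst2q_s32, whether the compiler actually emits vld2.32/vst2.32 for them depends on target and options, and the fallback branch is merely a portable model of the same permutation.

```c
#include <assert.h>
#include <stdint.h>

#if defined (__ARM_NEON)
#include <arm_neon.h>
#endif

/* Doubles len ints from q into p; len must be a multiple of 8.  */
void
t5_lanes (int len, int32_t *__restrict p, const int32_t *__restrict q)
{
  assert (len % 8 == 0);
  for (int i = 0; i < len; i += 8)
    {
#if defined (__ARM_NEON)
      /* Two-lane load: val[0] gets elements 0,2,4,6 and val[1] gets
         elements 1,3,5,7 of the 8-element group.  */
      int32x4x2_t v = vld2q_s32 (&q[i]);
      v.val[0] = vshlq_n_s32 (v.val[0], 1);
      v.val[1] = vshlq_n_s32 (v.val[1], 1);
      /* Two-lane store re-interleaves, undoing the permutation.  */
      vst2q_s32 (&p[i], v);
#else
      /* Portable model of the same de-interleave / re-interleave.  */
      int32_t even[4], odd[4];
      for (int k = 0; k < 4; k++)
        {
          even[k] = q[i + 2 * k];
          odd[k] = q[i + 2 * k + 1];
        }
      for (int k = 0; k < 4; k++)
        {
          p[i + 2 * k] = even[k] * 2;
          p[i + 2 * k + 1] = odd[k] * 2;
        }
#endif
    }
}
```

Because the same operation is applied to every element, the lane split and merge cancel out, which is why the permutation is "pointless" here and only the reduced instruction count matters.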



* [Bug tree-optimization/67323] Use non-unit stride loads by preference when applicable
From: rguenth at gcc dot gnu.org @ 2021-05-04 12:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67323

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
