public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/107412] New: Miss to fold LEN_{LOAD,STORE} when the specified length equal to vector length
@ 2022-10-26  9:13 linkw at gcc dot gnu.org
  2022-10-26  9:15 ` [Bug tree-optimization/107412] " linkw at gcc dot gnu.org
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: linkw at gcc dot gnu.org @ 2022-10-26  9:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107412

            Bug ID: 107412
           Summary: Miss to fold LEN_{LOAD,STORE} when the specified
                    length equal to vector length
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: linkw at gcc dot gnu.org
  Target Milestone: ---

test case:
===
#define N 16
int src[N];
int dest[N];

void foo (){
  for (int i = 0; i < (N-1); i++)
   dest[i] = src[i];
}

===

Options: -mcpu=power10 -fno-tree-loop-distribute-patterns --param
vect-partial-vector-usage=2 -O2 -ftree-vectorize -funroll-loops
-fno-vect-cost-model

optimized gimple output:

void foo ()
{
  vector(16) unsigned char vect_2;
  vector(16) unsigned char vect_13;
  vector(16) unsigned char vect_34;
  vector(16) unsigned char vect_47;

  <bb 2> [local count: 67108864]:
  vect_2 = .LEN_LOAD (&src, 128B, 16, 0);
  .LEN_STORE (&dest, 128B, 16, vect_2, 0);
  vect_34 = .LEN_LOAD (&MEM <int[16]> [(void *)&src + 16B], 128B, 16, 0);
  .LEN_STORE (&MEM <int[16]> [(void *)&dest + 16B], 128B, 16, vect_34, 0);
  vect_47 = .LEN_LOAD (&MEM <int[16]> [(void *)&src + 32B], 128B, 16, 0);
  .LEN_STORE (&MEM <int[16]> [(void *)&dest + 32B], 128B, 16, vect_47, 0);
  vect_13 = .LEN_LOAD (&MEM <int[16]> [(void *)&src + 48B], 128B, 12, 0);
  .LEN_STORE (&MEM <int[16]> [(void *)&dest + 48B], 128B, 12, vect_13, 0); [tail call]
  return;

}

The expected output has only one .LEN_LOAD/.LEN_STORE pair, the one with
length 12; the others, whose length of 16 equals the full vector size, can
just use normal vector loads/stores.
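
To make the expectation concrete, here is a minimal scalar model in C of the dump above. It is only a sketch: the names vec16, len_load, len_store, full_load, full_store and foo_model are hypothetical, and the trailing bias operand (0 in the dump) is assumed to be zero and ignored. It shows that a length of 16 on a 16-byte vector behaves like a whole-vector access, so only the last 12-byte pair really needs a length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VEC_BYTES 16   /* one 128-bit vector of unsigned char */

typedef struct { unsigned char b[VEC_BYTES]; } vec16;

/* Model of .LEN_LOAD (ptr, align, len, 0): read only the first LEN bytes.
   The real operation leaves the remaining bytes undefined; this model
   zeroes them to stay deterministic.  */
static vec16 len_load (const void *ptr, size_t len)
{
  vec16 v;
  assert (len <= VEC_BYTES);
  memset (v.b, 0, sizeof v.b);
  memcpy (v.b, ptr, len);
  return v;
}

/* Model of .LEN_STORE (ptr, align, len, v, 0): write only LEN bytes.  */
static void len_store (void *ptr, size_t len, vec16 v)
{
  assert (len <= VEC_BYTES);
  memcpy (ptr, v.b, len);
}

/* Whole-vector accesses, i.e. what a normal vector load/store does.
   They are the LEN == VEC_BYTES special case of the models above.  */
static vec16 full_load (const void *ptr) { return len_load (ptr, VEC_BYTES); }
static void full_store (void *ptr, vec16 v) { len_store (ptr, VEC_BYTES, v); }

/* The unrolled loop after the expected folding: three full-vector copies
   plus one 12-byte partial copy, 60 bytes (15 ints) in total.  */
static void foo_model (int *dest, const int *src)
{
  const unsigned char *s = (const unsigned char *) src;
  unsigned char *d = (unsigned char *) dest;

  for (size_t off = 0; off < 48; off += VEC_BYTES)
    full_store (d + off, full_load (s + off));
  len_store (d + 48, 12, len_load (s + 48, 12));
}
```

Calling foo_model on two 16-element int arrays copies elements 0 to 14 and leaves element 15 of the destination untouched, matching the source loop.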

* [Bug tree-optimization/107412] Miss to fold LEN_{LOAD,STORE} when the specified length equal to vector length
From: linkw at gcc dot gnu.org @ 2022-10-26  9:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107412

Kewen Lin <linkw at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |linkw at gcc dot gnu.org
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2022-10-26
           Keywords|                            |missed-optimization
             Target|                            |powerpc64le-linux-gnu

--- Comment #1 from Kewen Lin <linkw at gcc dot gnu.org> ---
I'm working on a patch to extend the current
gimple_fold_mask_load_store_mem_ref.

* [Bug tree-optimization/107412] Miss to fold LEN_{LOAD,STORE} when the specified length equal to vector length
From: segher at gcc dot gnu.org @ 2022-10-27 22:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107412

Segher Boessenkool <segher at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |segher at gcc dot gnu.org

--- Comment #2 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Make sure we only use "plain" accesses on machines that allow all unaligned
accesses?  p8 and later I think.  The load-with-length insns are even later,
but a builtin does not necessarily translate to those newer insns, so some
care is required :-)

* [Bug tree-optimization/107412] Miss to fold LEN_{LOAD,STORE} when the specified length equal to vector length
From: linkw at gcc dot gnu.org @ 2022-10-31  6:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107412

--- Comment #3 from Kewen Lin <linkw at gcc dot gnu.org> ---
(In reply to Segher Boessenkool from comment #2)
> Make sure we only use "plain" accesses on machines that allow all unaligned
> accesses?  p8 and later I think.  The load-with-length insns are even later,
> but a builtin does not necessarily translate to those newer insns, so some
> care is required :-)

Thanks for raising this. For now, these LEN_{LOAD,STORE} calls can ONLY be
generated when the target defines the relevant len_{load,store} optab: on Power
that means power9 and later, and on s390 presumably only recent CPUs. They are
internal functions, invisible to users, and both the support and the
documentation assume they work for unaligned accesses. If some target wants to
emulate them later, it will need more updates than just this folding. So we
don't need to worry about it for now, IMHO. :)

* [Bug tree-optimization/107412] Miss to fold LEN_{LOAD,STORE} when the specified length equal to vector length
From: cvs-commit at gcc dot gnu.org @ 2022-11-07  8:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107412

--- Comment #4 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Kewen Lin <linkw@gcc.gnu.org>:

https://gcc.gnu.org/g:8408120fecc56385b316dafec1bdfe3aac61fc05

commit r13-3716-g8408120fecc56385b316dafec1bdfe3aac61fc05
Author: Kewen Lin <linkw@linux.ibm.com>
Date:   Mon Nov 7 02:07:27 2022 -0600

    vect: Fold LEN_{LOAD,STORE} if it's for the whole vector [PR107412]

    As the test case in PR107412 shows, we can fold IFN .LEN_{LOAD,
    STORE} into normal vector load/store if the given length is known
    to be equal to the length of the whole vector.  It would help to
    improve overall cycles as normally the latency of vector access
    with length in bytes is bigger than normal vector access, and it
    also saves the preparation for length if constant length can not
    be encoded into instruction (such as on power).

            PR tree-optimization/107412

    gcc/ChangeLog:

            * gimple-fold.cc (gimple_fold_mask_load_store_mem_ref): Rename to ...
            (gimple_fold_partial_load_store_mem_ref): ... this, add one parameter
            mask_p indicating it's for mask or length, and add some handlings for
            IFN LEN_{LOAD,STORE}.
            (gimple_fold_mask_load): Rename to ...
            (gimple_fold_partial_load): ... this, add one parameter mask_p.
            (gimple_fold_mask_store): Rename to ...
            (gimple_fold_partial_store): ... this, add one parameter mask_p.
            (gimple_fold_call): Add the handlings for IFN LEN_{LOAD,STORE},
            and adjust calls on gimple_fold_mask_load_store_mem_ref to
            gimple_fold_partial_load_store_mem_ref.

    gcc/testsuite/ChangeLog:

            * gcc.target/powerpc/pr107412.c: New test.
            * gcc.target/powerpc/p9-vec-length-epil-8.c: Adjust scan times for
            folded LEN_LOAD.
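
The folding condition described in this commit message can be sketched in plain C as follows. This is not the code in gimple-fold.cc: struct len_access and can_fold_to_full_access are hypothetical names, and the sketch conservatively assumes a zero bias operand, as in the PR's dump.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the operands of a .LEN_LOAD/.LEN_STORE call.  */
struct len_access
{
  bool len_is_constant;    /* is the length known at compile time?  */
  uint64_t len_bytes;      /* requested length in bytes  */
  int bias;                /* target bias operand; 0 in the PR's dump  */
  uint64_t vector_bytes;   /* size of the whole vector in bytes  */
};

/* Return true if the access covers the whole vector and could therefore
   be replaced by a normal vector load/store.  */
static bool can_fold_to_full_access (const struct len_access *a)
{
  if (!a->len_is_constant)
    return false;          /* a variable length cannot be folded  */
  if (a->bias != 0)
    return false;          /* assumption: this sketch handles zero bias only  */
  return a->len_bytes == a->vector_bytes;
}
```

With the test case from the PR, the three accesses of length 16 satisfy the predicate and the final access of length 12 does not.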

* [Bug tree-optimization/107412] Miss to fold LEN_{LOAD,STORE} when the specified length equal to vector length
From: linkw at gcc dot gnu.org @ 2022-11-07  8:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107412

Kewen Lin <linkw at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #5 from Kewen Lin <linkw at gcc dot gnu.org> ---
Fixed on trunk.

* [Bug tree-optimization/107412] Miss to fold LEN_{LOAD,STORE} when the specified length equal to vector length
From: linkw at gcc dot gnu.org @ 2022-11-07  8:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107412

Kewen Lin <linkw at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |13.0

* [Bug tree-optimization/107412] Miss to fold LEN_{LOAD,STORE} when the specified length equal to vector length
From: cvs-commit at gcc dot gnu.org @ 2022-12-05  5:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107412

--- Comment #6 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Kewen Lin <linkw@gcc.gnu.org>:

https://gcc.gnu.org/g:380d62c14c99d8df13b7a86660e7ee67d01ad827

commit r13-4488-g380d62c14c99d8df13b7a86660e7ee67d01ad827
Author: Kewen Lin <linkw@linux.ibm.com>
Date:   Sun Dec 4 23:27:08 2022 -0600

    gimple-fold: Refine gimple_fold_partial_load_store_mem_ref [PR107412]

    Following Richard's review comments, this patch uses the
    untruncated type for the length used for IFN_LEN_{LOAD,STORE}
    instead of "unsigned int" for better robustness.  It also
    avoids using to_constant and tree arithmetic for subtraction.

    Co-authored-by: Richard Sandiford  <richard.sandiford@arm.com>

            PR tree-optimization/107412

    gcc/ChangeLog:

            * gimple-fold.cc (gimple_fold_partial_load_store_mem_ref): Use
            untruncated type for the length, and avoid to_constant and tree
            arithmetic for subtraction.
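
The commit message does not spell out a failing case, so the following C sketch is only one hypothetical illustration of why comparing a truncated length is less robust. The function names are made up, and uint32_t stands in for a 32-bit "unsigned int".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Compare after narrowing the length to 32 bits: any high bits are
   silently dropped before the comparison.  */
static bool matches_truncated (uint64_t len, uint64_t vector_bytes)
{
  uint32_t narrow = (uint32_t) len;   /* stands in for "unsigned int"  */
  return narrow == vector_bytes;
}

/* Compare at full width: no information is lost.  */
static bool matches_untruncated (uint64_t len, uint64_t vector_bytes)
{
  return len == vector_bytes;
}
```

For a length of 2^32 + 16 and a 16-byte vector, the truncated comparison reports a match while the untruncated one does not; for ordinary lengths such as 16 or 12 both agree.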
