public inbox for gcc-bugs@sourceware.org
* [Bug target/95254] New: aarch64: gcc generate inefficient code with fixed sve vector length
@ 2020-05-21  7:33 felix.yang at huawei dot com
  2020-06-04 12:05 ` [Bug target/95254] " cvs-commit at gcc dot gnu.org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: felix.yang at huawei dot com @ 2020-05-21  7:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95254

            Bug ID: 95254
           Summary: aarch64: gcc generate inefficient code with fixed sve
                    vector length
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: felix.yang at huawei dot com
  Target Milestone: ---
            Target: aarch64

Test case:

typedef short __attribute__((vector_size (8))) v4hi;

typedef union U4HI { v4hi v; short a[4]; } u4hi;

short b[4];

void pass_v4hi (v4hi v)
{
    int i;
    u4hi u;
    u.v = v;
    for (i = 0; i < 4; i++)
      b[i] = u.a[i];
}

$ gcc -O2 -ftree-slp-vectorize -S -march=armv8.2-a+sve foo.c
assembly code:
pass_v4hi:
.LFB0:
        .cfi_startproc
        adrp    x0, .LANCHOR0
        str     d0, [x0, #:lo12:.LANCHOR0]
        ret
        .cfi_endproc

$ gcc -O2 -ftree-slp-vectorize -S -march=armv8.2-a+sve -msve-vector-bits=256 foo.c
assembly code:
pass_v4hi:
.LFB0:
        .cfi_startproc
        sub     sp, sp, #16
        .cfi_def_cfa_offset 16
        ptrue   p0.b, vl32
        adrp    x0, .LANCHOR0
        add     x0, x0, :lo12:.LANCHOR0
        str     d0, [sp, 8]
        ld1h    z0.d, p0/z, [sp, #1, mul vl]
        st1h    z0.d, p0, [x0]
        add     sp, sp, 16
        .cfi_def_cfa_offset 0
        ret
        .cfi_endproc
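
Both listings store the same eight bytes: the loop in pass_v4hi copies the four 2-byte lanes of v into b, which is exactly what the single "str d0" in the first listing does. The sketch below makes that equivalence concrete in GNU C (it relies on the vector_size extension and union type punning, as the test case does). The names pass_v4hi_loop, pass_v4hi_copy and check_same_result are hypothetical, introduced only for illustration.

```c
#include <assert.h>
#include <string.h>

typedef short __attribute__((vector_size (8))) v4hi;
typedef union U4HI { v4hi v; short a[4]; } u4hi;

short b[4];

/* The loop from the test case: copy the four lanes one at a time.  */
void pass_v4hi_loop (v4hi v)
{
  u4hi u;
  u.v = v;
  for (int i = 0; i < 4; i++)
    b[i] = u.a[i];
}

/* The same effect as one 8-byte copy, which is what
   "str d0, [x0, #:lo12:.LANCHOR0]" does in the first listing.  */
void pass_v4hi_copy (v4hi v)
{
  memcpy (b, &v, sizeof v);
}

/* Run both versions on V and check that they leave B identical.  */
void check_same_result (v4hi v)
{
  short after_loop[4];

  pass_v4hi_loop (v);
  memcpy (after_loop, b, sizeof b);
  memset (b, 0, sizeof b);
  pass_v4hi_copy (v);
  assert (memcmp (after_loop, b, sizeof b) == 0);
}
```

Since the two forms are interchangeable, the stack round trip in the second listing (str d0 to the stack, then ld1h back) adds work without changing the result.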


The root cause is that aarch64_vectorize_related_mode [1] chooses a different mode: VNx2HImode instead of V4HImode.
As a result, the final tree-ssa forwprop pass has to insert a VIEW_CONVERT from V4HImode to VNx2HImode.
One way to fix this is to catch and simplify that pattern in aarch64_expand_sve_mem_move, emitting a V4HImode mov pattern instead.
I am assuming endianness does not make a difference here. I will propose a patch for comments.
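
To see why VNx2HImode can stand in for V4HImode here, it helps to work through the sizes under -msve-vector-bits=256 ("ptrue p0.b, vl32" in the listing: 32 bytes). The sketch assumes my understanding of the SVE modes: VNx2HImode holds two halfwords per 128-bit granule, each in a 64-bit container (matching the .d element size of "ld1h z0.d"), and the "#imm, mul vl" offset of LD1H/ST1H is scaled by the number of bytes the instruction transfers. The helper names are hypothetical.

```c
#include <assert.h>

/* Number of 128-bit granules in an SVE vector of VL_BITS bits.  */
int sve_vq (int vl_bits)
{
  /* SVE vector lengths are multiples of 128 bits, up to 2048.  */
  assert (vl_bits >= 128 && vl_bits <= 2048 && vl_bits % 128 == 0);
  return vl_bits / 128;
}

/* VNx2HImode: two halfwords per 128-bit granule, each in its own
   64-bit container (the ".d" element size of "ld1h z0.d").  */
int vnx2hi_nunits (int vl_bits)
{
  return 2 * sve_vq (vl_bits);
}

/* Bytes transferred by "ld1h z0.d" / "st1h z0.d": one 2-byte halfword
   per element.  The "#imm, mul vl" offset is scaled by this amount.  */
int vnx2hi_mem_bytes (int vl_bits)
{
  return vnx2hi_nunits (vl_bits) * 2;
}

/* V4HImode: a fixed 64-bit Advanced SIMD vector of four halfwords.  */
enum { V4HI_NUNITS = 4, V4HI_BYTES = 4 * 2 };
```

With 256-bit vectors, VNx2HImode therefore has four elements and transfers eight bytes, the same as V4HImode, and [sp, #1, mul vl] resolves to sp + 8, the slot written by "str d0, [sp, 8]".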


[1] call trace:
(gdb) bt
#0  aarch64_vectorize_related_mode (vector_mode=E_VNx8HImode, element_mode=..., nunits=...) at ../../gcc-git/gcc/config/aarch64/aarch64.c:2377
#1  0x00000000012983b4 in related_vector_mode (vector_mode=E_VNx8HImode, element_mode=..., nunits=...) at ../../gcc-git/gcc/stor-layout.c:535
#2  0x0000000001652918 in get_related_vectype_for_scalar_type (prevailing_mode=E_VNx8HImode, scalar_type=0xffffb22da498, nunits=...) at ../../gcc-git/gcc/tree-vect-stmts.c:11463
#3  0x0000000001653304 in get_vectype_for_scalar_type (vinfo=0x2f0dc80, scalar_type=0xffffb22da498, group_size=4) at ../../gcc-git/gcc/tree-vect-stmts.c:11545
#4  0x00000000016533a0 in get_vectype_for_scalar_type (vinfo=0x2f0dc80, scalar_type=0xffffb22da498, node=0x2e5d460) at ../../gcc-git/gcc/tree-vect-stmts.c:11569
#5  0x00000000016987e8 in vect_get_constant_vectors (vinfo=0x2f0dc80, slp_node=0x2e53080, op_num=0, vec_oprnds=0xffffffffc738) at ../../gcc-git/gcc/tree-vect-slp.c:3562
#6  0x00000000016993f8 in vect_get_slp_defs (vinfo=0x2f0dc80, slp_node=0x2e53080, vec_oprnds=0xffffffffc7a8, n=1) at ../../gcc-git/gcc/tree-vect-slp.c:3786
#7  0x0000000001631c70 in vect_get_vec_defs (vinfo=0x2f0dc80, op0=0xffffb20e3120, op1=0x0, stmt_info=0x2feef60, vec_oprnds0=0xffffffffcdd0, vec_oprnds1=0x0, slp_node=0x2e53080) at ../../gcc-git/gcc/tree-vect-stmts.c:1726
#8  0x0000000001648bc8 in vectorizable_store (vinfo=0x2f0dc80, stmt_info=0x2feef60, gsi=0xffffffffdad0, vec_stmt=0xffffffffd5b0, slp_node=0x2e53080, cost_vec=0x0) at ../../gcc-git/gcc/tree-vect-stmts.c:8186
#9  0x0000000001651808 in vect_transform_stmt (vinfo=0x2f0dc80, stmt_info=0x2feef60, gsi=0xffffffffdad0, slp_node=0x2e53080, slp_node_instance=0x2fefe70) at ../../gcc-git/gcc/tree-vect-stmts.c:11184
#10 0x000000000169a4a0 in vect_schedule_slp_instance (vinfo=0x2f0dc80, node=0x2e53080, instance=0x2fefe70) at ../../gcc-git/gcc/tree-vect-slp.c:4134
#11 0x000000000169aaac in vect_schedule_slp (vinfo=0x2f0dc80) at ../../gcc-git/gcc/tree-vect-slp.c:4258
#12 0x00000000016972f0 in vect_slp_bb_region (region_begin=..., region_end=..., datarefs=..., n_stmts=10) at ../../gcc-git/gcc/tree-vect-slp.c:3227
#13 0x0000000001697c60 in vect_slp_bb (bb=0xffffb22ce340) at ../../gcc-git/gcc/tree-vect-slp.c:3350
#14 0x00000000016a56f0 in (anonymous namespace)::pass_slp_vectorize::execute (this=0x2e6aae0, fun=0xffffb2116000) at ../../gcc-git/gcc/tree-vectorizer.c:1320


* [Bug target/95254] aarch64: gcc generate inefficient code with fixed sve vector length
From: cvs-commit at gcc dot gnu.org @ 2020-06-04 12:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95254

--- Comment #1 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:43088bb4dadd3d14b6b594c5f9363fe879f3d7f7

commit r11-928-g43088bb4dadd3d14b6b594c5f9363fe879f3d7f7
Author: liuhongt <hongtao.liu@intel.com>
Date:   Fri May 29 13:38:49 2020 +0800

    Fix zero-masking for vcvtps2ph when dest operand is memory.

    When dest is memory, zero-masking is not valid; only merging-masking is available.

    2020-06-24  Hongtao Liu  <hongtao.liu@intel.com>

    gcc/ChangeLog:
            PR target/95254
            * config/i386/sse.md (*vcvtps2ph_store<merge_mask_name>):
            Refine from *vcvtps2ph_store<mask_name>.
            (vcvtps2ph256<mask_name>): Refine constraint from vm to v.
            (<mask_codefor>avx512f_vcvtps2ph512<mask_name>): Ditto.
            (*vcvtps2ph256<merge_mask_name>): New define_insn.
            (*avx512f_vcvtps2ph512<merge_mask_name>): Ditto.
            * config/i386/subst.md (merge_mask): New define_subst.
            (merge_mask_name): New define_subst_attr.
            (merge_mask_operand3): Ditto.

    gcc/testsuite/ChangeLog:
            * gcc.target/i386/avx512f-vcvtps2ph-pr95254.c: New test.
            * gcc.target/i386/avx512vl-vcvtps2ph-pr95254.c: Ditto.
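
The zero-masking versus merging-masking distinction in this commit can be modelled on plain arrays. This is a scalar model only: it skips the float-to-half conversion itself, assumes up to eight 16-bit result lanes with one mask bit per lane (as for the 256-bit form), and the function names are hypothetical, not GCC or intrinsic APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Merging-masking: lanes whose bit in K is clear keep their old value
   in DST.  A masked store to memory behaves this way.  */
void merge_mask (uint16_t *dst, const uint16_t *src, uint8_t k, int n)
{
  assert (n >= 0 && n <= 8);  /* K has one bit per lane.  */
  for (int i = 0; i < n; i++)
    if (k & (1u << i))
      dst[i] = src[i];
}

/* Zero-masking: lanes whose bit in K is clear are set to zero.  Per the
   commit message, this form is only valid for a register destination.  */
void zero_mask (uint16_t *dst, const uint16_t *src, uint8_t k, int n)
{
  assert (n >= 0 && n <= 8);
  for (int i = 0; i < n; i++)
    dst[i] = (k & (1u << i)) ? src[i] : 0;
}
```

As I understand the AVX-512 rules, a masked store can only leave unselected memory lanes untouched, which is the merge behaviour, hence the new merge_mask substitution for the memory-destination patterns.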


* [Bug target/95254] aarch64: gcc generate inefficient code with fixed sve vector length
From: cvs-commit at gcc dot gnu.org @ 2020-06-05  9:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95254

--- Comment #2 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Sandiford <rsandifo@gcc.gnu.org>:

https://gcc.gnu.org/g:9a182ef9ee011935d827ab5c6c9a7cd8e22257d8

commit r11-966-g9a182ef9ee011935d827ab5c6c9a7cd8e22257d8
Author: Fei Yang <felix.yang@huawei.com>
Date:   Fri Jun 5 10:34:59 2020 +0100

    expand: Simplify removing subregs when expanding a copy [PR95254]

    In rtl expand, if we have a copy that matches one of the following patterns:
      (set (subreg:M1 (reg:M2 ...)) (subreg:M1 (reg:M2 ...)))
      (set (subreg:M1 (reg:M2 ...)) (mem:M1 ADDR))
      (set (mem:M1 ADDR) (subreg:M1 (reg:M2 ...)))
      (set (subreg:M1 (reg:M2 ...)) (constant C))
    where mode M1 is equal in size to M2, try to detect whether the mode change
    involves an implicit round trip through memory.  If so, see if we can avoid
    that by removing the subregs and doing the move in mode M2 instead.

    2020-06-05  Felix Yang  <felix.yang@huawei.com>

    gcc/
            PR target/95254
            * expr.c (emit_move_insn): Check src and dest of the copy to see
            if one or both of them are subregs, try to remove the subregs when
            innermode and outermode are equal in size and the mode change
involves
            an implicit round trip through memory.

    gcc/testsuite/
            PR target/95254
            * gcc.target/aarch64/pr95254.c: New test.
            * gcc.target/i386/pr67609.c: Check "movq\t%xmm0" instead of "movdqa".
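
The decision described in this commit message can be sketched as a small C model. It is not GCC's implementation: the operand struct, mode ids and callbacks are hypothetical, and the real code in expr.c works on rtx and may apply further conditions (for example on alignment, or on whether the target can move mode M2 at all) that are omitted here. The callback stands in for a target query of whether a register can be reinterpreted in another mode without a round trip through memory; to my understanding GCC's TARGET_CAN_CHANGE_MODE_CLASS hook plays that role.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified operand model (not GCC's rtx).  */
enum op_kind { OP_REG, OP_SUBREG, OP_MEM, OP_CONST };
enum { MODE_V4HI = 1, MODE_VNX2HI = 2 };  /* illustrative mode ids */

struct operand
{
  enum op_kind kind;
  int mode;        /* outer mode M1 */
  int size;        /* size of M1 in bytes */
  int inner_mode;  /* OP_SUBREG only: mode M2 of the inner register */
  int inner_size;  /* OP_SUBREG only: size of M2 in bytes */
};

/* Stand-in for the target query: can a register holding mode INNER be
   reinterpreted as mode OUTER without a round trip through memory?  */
typedef bool (*mode_change_ok_fn) (int inner, int outer);

bool never_ok (int inner, int outer) { (void) inner; (void) outer; return false; }
bool always_ok (int inner, int outer) { (void) inner; (void) outer; return true; }

/* Only a subreg of an equal-sized register is a candidate.  */
static bool candidate_subreg (const struct operand *op)
{
  return op->kind == OP_SUBREG && op->inner_size == op->size;
}

/* Return the mode in which to perform the copy DST = SRC: the inner
   mode M2 when the subregs can be dropped, otherwise the outer mode M1.  */
int choose_move_mode (const struct operand *dst, const struct operand *src,
                      mode_change_ok_fn mode_change_ok)
{
  assert (dst->size == src->size);  /* a copy moves equal-sized values */
  int m1 = dst->mode;
  bool dst_sub = candidate_subreg (dst);
  bool src_sub = candidate_subreg (src);

  /* (set (subreg:M1 (reg:M2)) (subreg:M1 (reg:M2)))  */
  if (dst_sub && src_sub && dst->inner_mode == src->inner_mode
      && !mode_change_ok (dst->inner_mode, m1))
    return dst->inner_mode;

  /* (set (subreg:M1 (reg:M2)) (mem:M1 ADDR)) and
     (set (subreg:M1 (reg:M2)) (constant C))  */
  if (dst_sub && (src->kind == OP_MEM || src->kind == OP_CONST)
      && !mode_change_ok (dst->inner_mode, m1))
    return dst->inner_mode;

  /* (set (mem:M1 ADDR) (subreg:M1 (reg:M2)))  */
  if (src_sub && dst->kind == OP_MEM
      && !mode_change_ok (src->inner_mode, m1))
    return src->inner_mode;

  return m1;
}
```

Assuming the expanded store in this PR looks like (set (mem:VNx2HI ...) (subreg:VNx2HI (reg:V4HI ...))), a callback that reports the mode change as not free makes choose_move_mode return the V4HI id, i.e. the copy is done as a plain 8-byte V4HImode store.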


* [Bug target/95254] aarch64: gcc generate inefficient code with fixed sve vector length
From: rsandifo at gcc dot gnu.org @ 2020-06-05  9:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95254

rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
                 CC|                            |rsandifo at gcc dot gnu.org
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #3 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
Fixed.

