From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
 id D7A503861004; Wed, 24 Mar 2021 08:52:38 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org D7A503861004
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/99728] code pessimization when using wrapper
 classes around SIMD types
Date: Wed, 24 Mar 2021 08:52:38 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 10.2.1
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: cc bug_status everconfirmed component
 cf_reconfirmed_on
Message-ID: <bug-99728-4-n7HrNFNbFD@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-99728-4@http.gcc.gnu.org/bugzilla/>
References: <bug-99728-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99728

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
          Component|c++                         |tree-optimization
   Last reconfirmed|                            |2021-03-24
--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
The issue for store-motion is that we see an aggregate copy:

Unanalyzed memory reference 0: *d_28(D).lam1 = *d_28(D).lam2;

  __MEM <struct s0data_s> (d_28(D)).lam1 = __MEM <struct s0data_s> (d_28(D)).lam2;
  __MEM <struct s0data_s> (d_28(D)).lam2.v = _38;
  il_36 = il_74 + 1ul;

There is another PR about those unanalyzed refs preventing LIM/SM, but
getting rid of these aggregate copies would be nice as well, since many
passes do not handle them well.  I suppose 'vtype' in this case has an FP
mode, which prevents us from simplistically folding this (unless we always
expanded those to FP load/store sequences).
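
For illustration only, a minimal sketch of the loop shape behind the
GIMPLE above.  The names Tvsimple, s0data_s, lam1, lam2 and vtype are
taken from the dumps in this comment; the function advance(), the coef
array and the arithmetic are assumptions made up for the example and are
not the reporter's testcase.

```cpp
#include <cassert>
#include <cstddef>

// V4DF-sized vector via the GCC/Clang vector extension (assumed definition).
typedef double vtype __attribute__((vector_size(32)));

// Wrapper class around the SIMD type; the real class has more members.
struct Tvsimple {
  vtype v;
};

struct s0data_s {
  Tvsimple lam1, lam2;
};

// Hypothetical recurrence: each iteration does the aggregate copy
// lam1 = lam2 and then a member store to lam2.v, the same two statements
// that appear in the GIMPLE above.
void advance(s0data_s *d, const double *coef, std::size_t n) {
  assert(d != nullptr && coef != nullptr);
  for (std::size_t il = 0; il < n; ++il) {
    // The scalar coef[il] is broadcast to all four lanes.
    vtype next = d->lam2.v * coef[il] - d->lam1.v;
    d->lam1 = d->lam2;  // whole-struct (aggregate) copy
    d->lam2.v = next;   // store to a member of the struct
  }
}
```

Ideally both lam1 and lam2 would live in registers for the whole loop and
be stored once after it; the aggregate copy is what stands in the way of
that here.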

Indeed, we're copying

    type <record_type 0x7ffff437dbd0 Tvsimple sizes-gimplified needs-constructing cxx-odr-p type_1 type_5 type_6 V4DF
        size <integer_cst 0x7ffff658b228 constant 256>
        unit-size <integer_cst 0x7ffff658b318 constant 32>
        align:256 warn_if_not_align:0 symtab:0 alias-set 1 canonical-type 0x7ffff437dbd0
        fields <function_decl 0x7ffff4391400 operator= type <method_type 0x7ffff43933f0>
            public external autoinline decl_3 QI t.C:3:8 align:16 warn_if_not_align:0 context <record_type 0x7ffff437dbd0 Tvsimple>
            full-name "constexpr Tvsimple& Tvsimple::operator=(Tvsimple&&) noexcept (<uninstantiated>)"
            not-really-extern chain <function_decl 0x7ffff4391300 operator=>> context <translation_unit_decl 0x7ffff6578168 t.C>
        full-name "struct Tvsimple"
        needs-constructor X() X(constX&) this=(X&) n_parents=0 use_template=0 interface-unknown
        pointer_to_this <pointer_type 0x7ffff437dd20> reference_to_this <reference_type 0x7ffff437d690> chain <type_decl 0x7ffff4a554c0 Tvsimple>>

OK, so for a simple

struct X { double x; };

void foo (struct X *x, struct X *y)
{
  *x = *y;
}

we do generate x87 FP load/store insns and do not transfer bytes.  Probably
OK from a C language perspective but questionable on the GIMPLE side
(we've been there before).
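
To make the distinction concrete, a small source-level sketch of the two
ways such a copy can be carried out.  struct X is the one from above; the
helper names copy_as_double() and copy_as_bits() are hypothetical.  The
first mirrors an FP load/store, which on x87 may, as far as I know, turn
a signalling NaN into a quiet one; the second moves the object
representation and is bit-exact by construction.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct X { double x; };

// FP-style copy: load the member as a double, then store it.
// The value is transferred, not necessarily the exact bit pattern.
void copy_as_double(X *dst, const X *src) {
  assert(dst != nullptr && src != nullptr);
  double tmp = src->x;
  dst->x = tmp;
}

// Byte-style copy: move the object representation through an integer
// of the same size, so every bit is preserved.
void copy_as_bits(X *dst, const X *src) {
  assert(dst != nullptr && src != nullptr);
  static_assert(sizeof(X) == sizeof(std::uint64_t),
                "sketch assumes an 8-byte struct");
  std::uint64_t bits;
  std::memcpy(&bits, src, sizeof bits);
  std::memcpy(dst, &bits, sizeof bits);
}
```

For ordinary values both helpers give the same result; they can only
differ for bit patterns the FP unit does not pass through unchanged.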

So one thing we can experiment with is to gimplify those aggregate
copies to register load/store when the aggregates have been assigned
non-BLKmode by the target.  This might of course confuse SRA which
means that SRA itself might be a better place to perform this
optimization.  [mind struct { double; double; } on x86 gets TImode
for example]
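
As a rough source-level analogue of the TImode case, a sketch that copies
a two-double struct through a single 16-byte integer temporary.  The
struct name Pair, the typedef u128 and the helper copy_pair_as_ti() are
hypothetical, and unsigned __int128 is a GCC/Clang extension that I
assume is available (64-bit targets).  This only shows what a register
load/store of such an aggregate would amount to; it is not what the
gimplifier or SRA currently emits.

```cpp
#include <cassert>
#include <cstring>

// Two doubles: 16 bytes, which per the note above gets TImode on x86.
struct Pair { double a; double b; };

// 128-bit integer type (GCC/Clang extension, assumed to be available).
__extension__ typedef unsigned __int128 u128;

// Copy the aggregate as one 16-byte integer load followed by one store.
void copy_pair_as_ti(Pair *dst, const Pair *src) {
  assert(dst != nullptr && src != nullptr);
  static_assert(sizeof(Pair) == sizeof(u128),
                "sketch assumes a 16-byte struct");
  u128 tmp;
  std::memcpy(&tmp, src, sizeof tmp);  // the "load"
  std::memcpy(dst, &tmp, sizeof tmp);  // the "store"
}
```

Note that the single temporary hides the two double fields, which is the
kind of thing that might confuse SRA if done too early.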