From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 9 Nov 2022 12:56:49 +0000 (UTC)
From: Richard Biener
To: Jiufu Guo
cc: Richard Biener, Jeff Law, gcc-patches@gcc.gnu.org, pinskia@gcc.gnu.org, dje.gcc@gmail.com, linkw@gcc.gnu.org, segher@kernel.crashing.org,
 rguenth@gcc.gnu.org
Subject: Re: [RFC] propagation leap over memory copy for struct
In-Reply-To: <7e8rkktrdz.fsf@pike.rch.stglabs.ibm.com>
References: <20221031024235.110995-1-guojiufu@linux.ibm.com> <7e8rkktrdz.fsf@pike.rch.stglabs.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII

On Wed, 9 Nov 2022, Jiufu Guo wrote:

> Hi,
>
> Richard Biener writes:
>
> > On Mon, Oct 31, 2022 at 11:14 PM Jeff Law via Gcc-patches wrote:
> >>
> >>
> >> On 10/30/22 20:42, Jiufu Guo via Gcc-patches wrote:
> >> > Hi,
> >> >
> >> > We know that for struct variable assignment, a memory copy may be
> >> > used, and for memcpy we may load and store as many bytes as possible
> >> > at one time.  But that may not be best here:
> >> > 1. Before/after a struct variable assignment, the variable may be
> >> > operated on, and it is hard for some optimizations to leap over the
> >> > memcpy.  Some struct operations may then be sub-optimal, like the
> >> > issue in PR65421.
> >> > 2. The size of a struct is mostly constant, so the memcpy would be
> >> > expanded.  Using a small size to load/store and executing in parallel
> >> > may not be slower than using a large size to load/store (though more
> >> > registers may be used for the smaller accesses).
> >> >
> >> > In PR65421, for source code as below:
> >> > ////////t.c
> >> > #define FN 4
> >> > typedef struct { double a[FN]; } A;
> >> >
> >> > A foo (const A *a) { return *a; }
> >> > A bar (const A a) { return a; }
> >>
> >> So the first question in my mind is can we do better at the gimple
> >> phase?  For the second case in particular, can't we just "return a"
> >> rather than copying a into <retval> then returning <retval>?
> >> This feels a lot like the return value optimization from C++.  I'm not
> >> sure if it applies to the first case or not; it's been a long time
> >> since I looked at NRV optimizations, but it might be worth poking
> >> around in there a bit (tree-nrv.cc).
> >>
> >>
> >> But even so, these kinds of things are still bound to happen, so it's
> >> probably worth thinking about whether we can do better in RTL as well.
> >>
> >>
> >> The first thing that comes to my mind is to annotate memcpy calls that
> >> are structure assignments.  The idea here is that we may want to expand
> >> a memcpy differently in those cases.  Changing how we expand an opaque
> >> memcpy call is unlikely to be beneficial in most cases.  But changing
> >> how we expand a structure copy may be beneficial by exposing the
> >> underlying field values.  This would roughly correspond to your
> >> method #1.
> >>
> >> Or instead of changing how we expand, teach the optimizers about these
> >> annotated memcpy calls -- they're just a copy of each field.  That's
> >> how CSE and the propagators could treat them.  After some point we'd
> >> lower them in the usual ways, but at least early in the RTL pipeline we
> >> could keep them as annotated memcpy calls.  This roughly corresponds to
> >> your second suggestion.
> >
> > In the end it depends on the access patterns, so some analysis like SRA
> > performs would be nice.  The issue with expanding memcpy on GIMPLE
> > is that we currently cannot express 'rep movsb' or other target-specific
> > sequences from the cpymem-like optabs on GIMPLE, and recovering those
> > from piecewise copies on RTL is going to be difficult.
> Actually, it is a special memcpy.  It is generated when expanding the
> struct assignment (expand_assignment/store_expr/emit_block_move).
> We may introduce a function block_move_for_record for struct types, and
> this function could be a target hook to generate target-specific
> sequences.
> For example:
>   r125:DF=[r112:DI+0x20]
>   r126:DF=[r112:DI+0x28]
>   [r112:DI]=r125:DF
>   [r112:DI+0x8]=r126:DF
>
> After expanding, the following passes (cse/prop/dse/...) could optimize
> the insn sequences, e.g.
>   [r112:DI+0x20]=f1; r125:DF=[r112:DI+0x20];
>   [r112:DI]=r125:DF; r129:DF=[r112:DI]
> ==>
>   r129:DF=f1
>
> And if the small read/write insns still occur in late passes,
> e.g. combine, we could recover the insns into a better sequence:
>   r125:DF=[r112:DI+0x20]; r126:DF=[r112:DI+0x28]
> ==>
>   r155:V2DI=[r112:DI+0x20]
>
> Any comments? Thanks!

As said, the best copying decomposition depends on the followup uses and
the argument-passing ABI, which is why I suggested performing an SRA-like
analysis that collects the access patterns and uses that to drive the
heuristic expanding this special memcpy.

Richard.