From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ed1-x52e.google.com (mail-ed1-x52e.google.com [IPv6:2a00:1450:4864:20::52e]) by sourceware.org (Postfix) with ESMTPS id B447D385781A for ; Wed, 17 Mar 2021 07:46:28 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org B447D385781A Received: by mail-ed1-x52e.google.com with SMTP id j3so1021546edp.11 for ; Wed, 17 Mar 2021 00:46:28 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=zcn0yRh6cNr/Jhv52s1ELVNtZTA6S+JFHHaAbpB0nG4=; b=VDA6vJmBLMzvzpn78q14fqAS/gA/FwPyblYYe2qJTf17QeW8YJ2a8b9ocKEN69e078 k5Zk7bJoJWjqoKAapqSzpbvQ5xJp9c6mlgAux48+BuAqi0VECsWKUcxyGQbBqBUp/+rk eLw3BQX9fpZN/ET/0WgJkLyju9dfZ2YDgkPg0cfeMqexhLX4K4s1b/SiL7+HejjXg0ld Ht0tXF3wXKBFGIxZaXjAXSQMT5XAnCfbAPqezFiW9Ki0N3vAnkdBeZuvW3sl4A3/2G75 uh/yuLYUfaSw3Km/fQ3cw+VgO6hm6pM73e5ui1Bw4/zBnvEIhONs73fkmeytFSCAbBOO F/eQ== X-Gm-Message-State: AOAM530oOF1SRBJUy05N2N/laENQ6xch72FKc7JCRWw8V/m35a+fyya/ QL8raqJ4U9JYwTPnjnumSI2cVv6uY05BVS5di3U= X-Google-Smtp-Source: ABdhPJw61uSQi/bxG3EWqcTI3VKuWBdRBfkdVGFUoIwpEm1p0CZifKH0/7Tkw0jTnMiH2DVXD2TZoqrZzpvFZ84NZMw= X-Received: by 2002:a05:6402:3075:: with SMTP id bs21mr40653619edb.274.1615967187765; Wed, 17 Mar 2021 00:46:27 -0700 (PDT) MIME-Version: 1.0 References: <87pn00z2st.fsf@euler.schwinge.homeip.net> <87mtv3ywdp.fsf@euler.schwinge.homeip.net> <87eege1y2g.fsf@dem-tschwing-1.ger.mentorg.com> In-Reply-To: <87eege1y2g.fsf@dem-tschwing-1.ger.mentorg.com> From: Richard Biener Date: Wed, 17 Mar 2021 08:46:16 +0100 Message-ID: Subject: Re: 'walk_stmt_load_store_addr_ops' for non-'gimple_assign_single_p (stmt)' To: Thomas Schwinge Cc: Michael Matz , GCC Development , Jakub Jelinek , Frederik Harwath Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Mar 2021 07:46:31 -0000 On Wed, Mar 17, 2021 at 12:25 AM Thomas Schwinge wrote: > > Hi! > > Thanks, Michael, and again Richard for your quick responses. > > On 2021-03-16T15:25:10+0000, Michael Matz wrote: > > On Tue, 16 Mar 2021, Thomas Schwinge wrote: > > > >> >>Indeed, given (Fortran) 'zzz =3D 1', we produce GIMPLE: > >> >> > >> >> gimple_assign > >> >> > >> >>..., and calling 'walk_stmt_load_store_addr_ops' on that, I see, as > >> >>expected, the 'visit_store' callback invoked, with 'rhs' and 'arg': > >> >>''. > >> >> > >> >>However, given (Fortran) 'zzz =3D r + r2', we produce GIMPLE: > >> >> > >> >> gimple_assign > > > > But that's pre-ssa form. After writing into SSA 'zzz' will be replaced= by > > an SSA name, and the actual store into 'zzz' will happen in a store > > instruction. > > Oh, I see, and... > > >> >>..., and calling 'walk_stmt_load_store_addr_ops' on that, I see, > >> >>unexpectedly, no callback at all invoked: neither 'visit_load', nor > >> >>'visit_store' (nor 'visit_address', obviously). > >> > > >> > The variables involved are registers. You only get called on memory = operands. > >> > >> How would I have told that from the 'walk_stmt_load_store_addr_ops' > >> function description? (How to improve that one "to reflect relatity"?= ) > >> > >> But 'zzz' surely is the same in 'zzz =3D 1' vs. 'zzz =3D r + r2' -- fo= r the > >> former I *do* see the 'visit_store' callback invoked, for the latter I > >> don't? > > > > The walk_gimple functions are intended to be used on the SSA form of > > gimple (i.e. the one that it is in most of the time). > > Yes, "most of the time", but actually not in my case: I'm doing stuff > right after gimplification (before OMP lowering)... So that's the > "detail" I was missing -- sorry for not mentioning that right away. :-| > > As 'walk_gimple_[...]' are used during gimplification, OMP lowering, > supposedly they're fine to use in non-SSA form -- but evidently some of > the helper functions are not. (Might there be a way to add > 'gcc_checking_assert (gimple_in_ssa_p (cfun))' or similar for that? > Putting that into 'walk_stmt_load_store_addr_ops' triggers a lot, as > called from 'gcc/cgraphbuild.c:cgraph_node::record_stmt_references', for > example.) > > > And in that it's > > not the case that 'zzz =3D 1' and 'zzz =3D r + r2' are similar. The fo= rmer > > can have memory as the lhs (that includes static variables, or indirect= ion > > through pointers), the latter can not. The lhs of a binary statement i= s > > always an SSA name. A write to an SSA name is not a store, which is wh= y > > it's not walked for walk_stmt_load_store_addr_ops. > > > > Maybe it helps to look at simple C examples: [...] > > I see, many thanks for reminding me about these items! > > > If you want > > to capture writes into SSA names as well ([...]) > > you need the per-operand callback indeed. > > What I actually need is loads from/uses of actual variables (and I didn't > see 'walk_stmt_load_store_addr_ops' give me these). > > > But that depends on > > what you actually want to do. > > This is a prototype/"initial hack" for a very simple implementation of > "Avoid unnecessary data transfer out of OMP > construct". (... to be completed and posted later.) So simple that it > will easily fail (gracefully, of course), but yet is effective for a lot > of real-world code: > > subroutine [...] > [...] > real([...]) xx !temporary variable, for distance calculation > [...] > !$acc kernels pcopyin(x, zii) reduction(+:eva) ! implicit 'copy(xx)' = for scalar used inside region; established during gimplification > do 100 i=3D0,n-1 > evx=3D0.0d0 > do 90 j=3D0,n-1 > xx=3Dabs(x(1,i)-x(1,j)) > [...] > !$acc end kernels > [...] > ['xx' never used here] > end subroutine [...] > > Inside 'kernels', we'd like to automatically parallelize loops (we've > basically got that working; analysis by Graphite etc.), but the problem > is that given "implicit 'copy(xx)' for scalar used inside region", when > Graphite later looks at the outlined 'kernels' region's function, it must > assume that 'xx' is still live after the OpenACC 'kernels' construct -- > and thus cannot treat it as a thread-private temporary, cannot > parallelize. > > Now, walking each function backwards (!), I'm taking note of any > variables' uses, and if I reach an 'kernels' construct, but have not seen > a use of 'xx', I may then optimize 'copy(xx)' -> 'firstprivate(xx)', > enabling Graphite to do its thing. (Other such optimizations may be > added later.) (This is inspired by Jakub's commit > 1a80d6b87d81c3f336ab199a901cf80ae349c335 "re PR tree-optimization/68128 > (A huge regression in Parboil v2.5 OpenMP CUTCP test (2.5 times lower > performance))".) > > I've now got a simple 'callback_op', which for '!is_lhs' looks at > 'get_base_address ([op])', and if that 'var' is contained in the set of > current candidates (initialized per containg 'bind's, which we enter > first, even if walking a 'gimple_seq' backwards), removes that 'var' as a > candidate for such optimization. (Plus some "details", of couse.) This > seems to work fine, as far as I can tell. :-) It might then still fail for x =3D a[var] when you are interested in 'var'. I think you want to use walk_gimple_stmt and provide walk_tree_fn which will recurse into the complex tree operands (also making get_base_address unnecessary). > > Of course, the eventual IPA-based solution (see PR90591, PR94693, etc.) > will be much better -- but we need something now. > > > Gr=C3=BC=C3=9Fe > Thomas > ----------------- > Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 M=C3=BCnchen= Registergericht M=C3=BCnchen HRB 106955, Gesch=C3=A4ftsf=C3=BChrer: Thomas= Heurung, Frank Th=C3=BCrauf