Subject: Re: [RFC] [P2] [PR tree-optimization/33562] Lowering more complex assignments.
To: Richard Biener
Cc: gcc-patches
From: Jeff Law
Message-ID: <56C782A7.7060606@redhat.com>
Date: Fri, 19 Feb 2016 21:01:00 -0000
References: <56BD1EFB.90008@redhat.com> <56BE14B0.9040801@redhat.com> <56C0ACC1.60905@redhat.com> <0D3F08EF-9DC3-4848-AEC7-1AE639464B3D@gmail.com> <56C4217F.2040809@redhat.com> <56C47D5C.7010804@redhat.com> <56C49B69.2090209@redhat.com>

On 02/18/2016 02:56 AM, Richard Biener wrote:
>>> Just a short quick comment - the above means you only handle partial
>>> stores
>>> with no intervening uses.  You don't handle, say
>>>
>>> struct S { struct R { int x; int y; } r; int z; } s;
>>>
>>> s = { {1, 2}, 3 };
>>> s.r.x = 1;
>>> s.r.y = 2;
>>> struct R r = s.r;
>>> s.z = 3;
>>>
>>> where s = { {1, 2}, 3} is still dead.
>>
>> Right.  But handling that has never been part of DSE's design goals.  Once
>> there's a use, DSE has always given up.
>
> Yeah, which is why I in the end said we need a "better" DSE ...

So I cobbled up a quick test for this.  I only handle assignments which
may reference the same memory as the currently tracked store.  Obviously
that could be extended to handle certain builtins and the like.

It triggers a bit here and there.  While looking at those cases it
occurred to me that we could also look at this as a failure earlier in
the optimization pipeline.  In fact DOM already has code to handle a
closely related situation.

When DOM sees a store to memory, it creates a new fake statement with
the RHS and LHS reversed.  So in the case above DOM creates statements
that look like:

  1 = s.r.x
  2 = s.r.y

DOM then puts the RHS into the available expression table as equivalent
to the LHS.  So if it finds a later load of s.r.x, it will replace that
load with "1".
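
As a rough illustration of the reversed-store idea described above, here
is a toy model in C.  It is only a sketch under stated assumptions, not
DOM's actual implementation: the names avail_table, record_store and
lookup_load are hypothetical, and memory references are keyed by their
text rather than by hashed expressions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for an available-expression table: maps the text of a
   memory reference to the constant last stored there.  Hypothetical;
   not how GCC represents expressions.  */
#define MAX_ENTRIES 16

struct avail_entry
{
  const char *ref;
  int value;
};

struct avail_table
{
  struct avail_entry entries[MAX_ENTRIES];
  size_t count;
};

/* Seeing the store "REF = VALUE" records the reversed fact
   "VALUE = REF", so a later load of REF can be answered from the table.  */
void
record_store (struct avail_table *t, const char *ref, int value)
{
  for (size_t i = 0; i < t->count; i++)
    if (strcmp (t->entries[i].ref, ref) == 0)
      {
        t->entries[i].value = value;	/* A later store overrides.  */
        return;
      }
  if (t->count < MAX_ENTRIES)
    t->entries[t->count++] = (struct avail_entry) { ref, value };
}

/* Return 1 and set *VALUE if a load of REF can be replaced by a
   constant; return 0 if nothing is known about REF.  */
int
lookup_load (const struct avail_table *t, const char *ref, int *value)
{
  for (size_t i = 0; i < t->count; i++)
    if (strcmp (t->entries[i].ref, ref) == 0)
      {
        *value = t->entries[i].value;
        return 1;
      }
  return 0;
}

/* The two scalar stores from the example, followed by a load of s.r.x.  */
int
demo_load_of_s_r_x (void)
{
  struct avail_table t = { .count = 0 };
  int v = 0;

  record_store (&t, "s.r.x", 1);
  record_store (&t, "s.r.y", 2);
  assert (lookup_load (&t, "s.r.x", &v));
  return v;
}
```

In this model a lookup of the whole aggregate "s.r" finds nothing, even
though both of its fields are known, because the table only holds
entries for the individual scalar references.
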
I haven't looked at that DOM code in a while, but it certainly was
functional prior to the tuple merge.

Presumably DOM is not looking at r = s.r and realizing it could look up
s.r piecewise in the available expression table.  If it did, it would
effectively turn that fragment into:

  s = { {1, 2}, 3 };
  s.r.x = 1;
  s.r.y = 2;
  struct R r = {1, 2};
  s.z = 3;

At which point we no longer have the may-read of s.r.{x,y} and DSE would
see the initial assignment as dead.

I'm not sure if it's advisable to teach DOM how to look up structure
references piecewise or not.  The code to handle this case in DSE is
relatively simple, so perhaps we just go with the DSE variant.

I also looked a bit at cases where we find that while an entire store
(such as an aggregate initialization or mem*) may not be dead, pieces of
the store may be dead.  That's trivial to detect.  It triggers
relatively often.  The trick is once detected, we have to go back and
rewrite the original statement to only store the live parts.  I've only
written the detection code; the rewriting might be somewhat painful.

I'm starting to wonder if what we have is a 3-part series.

[1/3] The basic tracking to handle 33562, possibly included in gcc-6

[2/3] Ignore reads that reference only bytes not in live_bytes, for
      gcc-7

[3/3] Detect partially dead aggregate stores, rewriting the partially
      dead store to only store the live bytes.  Also for gcc-7.

Obviously [1/3] would need compile-time benchmarking, but I really don't
expect any issues there.

Jeff
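
For readers following the thread, the byte-level tracking discussed in
the message above can be sketched as below.  This is a simplified model,
not the code from the patch: the names tracked_store, note_store,
read_uses_store and classify are hypothetical, the tracked store is
assumed to be at most 32 bytes, and struct S is assumed to be laid out
with 4-byte ints and no padding (x at offset 0, y at 4, z at 8).

```c
#include <assert.h>
#include <stdint.h>

/* One bit per byte of the tracked store (assumes at most 32 bytes).  */
typedef uint32_t live_mask;

enum store_state { STORE_LIVE, STORE_PARTIALLY_DEAD, STORE_DEAD };

struct tracked_store
{
  live_mask all;	/* Every byte the original store wrote.  */
  live_mask live;	/* Bytes not yet overwritten by a later store.  */
};

/* Mask with bits [OFFSET, OFFSET + SIZE) set.  The caller must keep
   OFFSET + SIZE within 32.  */
live_mask
byte_range (unsigned offset, unsigned size)
{
  live_mask low = size >= 32 ? ~(live_mask) 0 : (((live_mask) 1 << size) - 1);
  return low << offset;
}

/* Begin tracking a store of SIZE bytes starting at offset 0.  */
void
track_store (struct tracked_store *s, unsigned size)
{
  s->all = s->live = byte_range (0, size);
}

/* A later store kills the bytes it overwrites.  */
void
note_store (struct tracked_store *s, unsigned offset, unsigned size)
{
  s->live &= ~byte_range (offset, size);
}

/* A read only counts as a use if it touches a byte that is still live;
   reads of already-overwritten bytes can be ignored.  */
int
read_uses_store (const struct tracked_store *s, unsigned offset, unsigned size)
{
  return (s->live & byte_range (offset, size)) != 0;
}

enum store_state
classify (const struct tracked_store *s)
{
  if (s->live == 0)
    return STORE_DEAD;
  if (s->live != s->all)
    return STORE_PARTIALLY_DEAD;
  return STORE_LIVE;
}

/* The example from the thread under the layout assumptions above.  */
enum store_state
demo_thread_example (void)
{
  struct tracked_store s;

  track_store (&s, 12);			/* s = { {1, 2}, 3 };  */
  note_store (&s, 0, 4);		/* s.r.x = 1;  */
  note_store (&s, 4, 4);		/* s.r.y = 2;  */
  assert (!read_uses_store (&s, 0, 8));	/* struct R r = s.r;  */
  note_store (&s, 8, 4);		/* s.z = 3;  */
  return classify (&s);
}
```

In this model the read of s.r touches only bytes that were already
overwritten, so tracking continues and the initial aggregate store ends
up classified as dead.  Stopping before the final s.z store would leave
it partially dead, which corresponds to the [3/3] case where the
original statement would need rewriting.
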