From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 121149 invoked by alias); 6 Mar 2018 14:17:47 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 121140 invoked by uid 89); 6 Mar 2018 14:17:47 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS,T_RP_MATCHES_RCVD autolearn=ham version=3.3.2 spammy=highlights, conceivably X-HELO: mx2.suse.de Received: from mx2.suse.de (HELO mx2.suse.de) (195.135.220.15) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 06 Mar 2018 14:17:45 +0000 Received: from relay1.suse.de (charybdis-ext.suse.de [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 5C667ACC8; Tue, 6 Mar 2018 14:17:43 +0000 (UTC) Date: Tue, 06 Mar 2018 14:17:00 -0000 From: Michael Matz To: Jeff Law cc: Richard Biener , gcc-patches Subject: Re: [RFA][PATCH][PR middle-end/61118] Improve tree CFG accuracy for setjmp/longjmp In-Reply-To: Message-ID: References: <93b4b8a7-c6fe-65c0-0609-62ebee669967@redhat.com> <80db4b7a-9b15-4c09-222e-646d68fdf6c0@redhat.com> <0869f1e3-78f3-7601-225e-e97ad68b5609@redhat.com> User-Agent: Alpine 2.21 (LSU 202 2017-01-01) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-IsSubscribed: yes X-SW-Source: 2018-03/txt/msg00263.txt.bz2 Hi, On Mon, 5 Mar 2018, Jeff Law wrote: > >>> Actually, without further conditions I don't see how it would be safe > >>> for the successor to have multiple preds. We might have this > >>> situation: > >>> > >>> bb1: ret = setjmp > >>> bb2: x0 = phi > >> No. Can't happen -- we're still building the initial CFG. There are no > >> PHI nodes, there are no SSA_NAMEs. > > > > While that is currently true I think it's short-sighted. Thinking about > > the situation in terms of SSA names and PHI nodes clears up the mind. In > > addition there is already code which builds (sub-)cfgs when SSA form > > exists (gimple_find_sub_bbs). Currently that code can't ever generate > > setjmp calls, so it's not an issue. > > It's not clearing up anything for me. Clearly you're onto something > that I'm missing, but still trying to figure out. I'm saying that if there is SSA form then having the second-return edge to the successor block if that has multiple predecessors, then this would be incorrect. Do you agree? I'm also saying that if it's incorrect when in SSA form, then it can't be correct (perhaps only very subtly so) without SSA form either; I guess that's where we don't agree. I'm also saying that we currently don't have SSA form while dealing with setjmp by more luck and giving up than thought: inlining and va-arg expansion do construct sub-cfgs while in SSA. inlining setjmp calls is simply disabled, va-arg expansion doesn't emit setjmp calls into the sequence. But there is nothing inherently magic about SSA form and setjmp, so if we're going to fiddle with the CFG form created by setjmp to make it more precise, then I think we ought to do it in a way that is correct in all intermediate forms, especially at this devel stage, because that'd give confidence that it's also correct in the constrained way we're needing it. > Certainly we have to be careful WRT the implicit set of the return value > of the setjmp call that occurs on the longjmp path. That's worth > investigating. I suspect that works today more by accident of having an > incorrect CFG than by design. No, our current imprecise CFG makes it so that the setting of the return value is reached by the abnormal edge from the dispatcher, so in a data-flow sense it's executed on the first- and second-return case and all is well, and so the IL subsumes all the effects that happen in reality (and more, which is the reason for the missed optimizations). You want to change the CFG such that it doesn't subsume effects that don't happen in reality, and that makes sense, but by that you're making it not reflect actions which _do_ happen. > But it does or at least it should. It's implicitly set on the longjmp > side. If we get this wrong I'd expect we'll see uninit uses in the PHI. > That's easy enough to instrument and check for. Not only uninit uses but conceivably misoptimizations as well. Let's contrieve an example which uses gotos to demonstrate the abnormal edges and let's have the second-return edge to land after the setjmp call (and setting of return value): ret = setjmp(buf); second_return: if (ret == 0) { // first return inside: // see below call foo(buf) // might call longjmp maybe-goto dispatch // therefore we have an edge to dispatch } return; dispatch: // our abnormal dispatcher goto second_return; // go to second-return target Now, as far as data-flow is concerned, as soon as the ret==0 block is entered once, ret will never change from 0 again (there simply is no visible set which could change it). So an optimizer would be correct in threading the sequence maybe-goto-dispatch->second_return to inside (bear with me): ret = setjmp(); if (ret == 0) { // first return inside: // see below call foo() // might call longjmp maybe-goto inside // if it does we loop, ret must still be zero ?! } return; This currently would not happen: first we don't thread through abnormal edges, and propagation of knowledge over abnormal edges is artificially limited as well. But from a pure data-flow perspective the above transformation would be correct, meaning that initial IL can't have been correct. You might say in practice even if the above would happen it doesn't matter, because at runtime, with a real longjmp, the "maybe-goto inside" would in reality land at the setjmp return again, hence setting ret, then the test comes along and all is well. That would probably be true currently, but assume foo always calls longjmp, and GCC can figure out it's using the same buffer as the above setjmp, then a further optimization might do away with the longjmp and just jump directly to the inside label: misoptimization achieved. None of the above events can happen when the second-return target is imprecise and before the setjmp. Now that all might sound like theoretical games, but I think it highlights why we should think long and hard about setjmp CFGs and IL before we relax it. FWIW: I think none of the issues I'm thinking about matter when you check two things: 1) that the setjmp call has no LHS, and 2) that the target BB has a single predecessor. I was only triggered by the discussion between you and Richi of why (2) might not be important. But you certainly miss checking (1) as well. IIRC the testcase you were trying to optimize had no LHS, right? It'd be nice if we try to make the setjmp IL correct For Real (tm) in the next devel phase: 1) make setjmp have two outgoing edges, always 2) create a setjmp_receive internal function that has a return value For 'ret = setjmp(buf)', create this CFG: bb1 ret = setjmp(buf) | \ bb-recv | ----------------\ | ret = setjmp_receiver | / normal /---------------/ path / | / bb-succ None of these edges would be abnormal. bb-recv would be the target for edges from all calls that might call longjmp(buf). Those edges might need to be abnormal. As the above CFG reflects all runtime effects precisely, but not which instructions are used to achieve them the expansion to RTL will be special. Ciao, Michael.