From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ej1-x62c.google.com (mail-ej1-x62c.google.com [IPv6:2a00:1450:4864:20::62c]) by sourceware.org (Postfix) with ESMTPS id 012713857C4F for ; Wed, 5 May 2021 10:04:07 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 012713857C4F Received: by mail-ej1-x62c.google.com with SMTP id w3so1926179ejc.4 for ; Wed, 05 May 2021 03:04:07 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=semihDYg9pGgKTX2k9t/kBfmDE+ST4vm+Tt+ZEaERT4=; b=azrlakCtSLScc7unHqACp8JRI66xiC/9eIR66ufyUVTzG0vbwuA1tftMuNGGWM2U24 oKgrp7/AaHw8hj1UqYGciY2/PKTE3Ug2vsV9qRJ62gNjGLMORhBWJ1gXwLn3n/b5LPhW b7UeYf3v/KnwwHAmEuyVrz3s5oUEPdcJLQsdAVAbZY9Cch4NityF6SCtCJz4iDcNGz8k hxKy06OHORB9unXW5WMrY43mYidOw52s5kG8rfw/1m7AvLPIWHmRbFjQHqLrzXdX4e37 hCihGxKE2sZ+ZnA7xpJkCZTpnXbR3Y/LQrv/ObCqxgxJ7hzGJRnMQ3g7YjZHCrMwkiSN sZwA== X-Gm-Message-State: AOAM530tZNBGld5JLlY22fWmMruYd0KERjPzSLWha49h9eFXwwK0vWBK Px90XJOJzLMMNII0ICTeO9/XwDrf27irxBCkSVGDWgR3pt0= X-Google-Smtp-Source: ABdhPJxSsBQarIJXk4L9uqP9g+cXSaVMUdaTAckc1EESlTH+6HVmB6xFqgUL3Si1r3elJA86X5bdRQlEZD8vsUwxtoI= X-Received: by 2002:a17:906:cd27:: with SMTP id oz39mr22849692ejb.129.1620209046900; Wed, 05 May 2021 03:04:06 -0700 (PDT) MIME-Version: 1.0 References: <20210208124716.487709-1-stefansf@linux.ibm.com> In-Reply-To: From: Richard Biener Date: Wed, 5 May 2021 12:03:55 +0200 Message-ID: Subject: Re: [RFC] ldist: Recognize rawmemchr loop patterns To: Stefan Schulze Frielinghaus Cc: GCC Patches Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-3.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 05 May 2021 10:04:10 -0000 On Wed, May 5, 2021 at 11:36 AM Richard Biener wrote: > > On Tue, Mar 16, 2021 at 6:13 PM Stefan Schulze Frielinghaus > wrote: > > > > [snip] > > > > Please find attached a new version of the patch. A major change compared to > > the previous patch is that I created a separate pass which hopefully makes > > reviewing also easier since it is almost self-contained. After realizing that > > detecting loops which mimic the behavior of rawmemchr/strlen functions does not > > really fit into the topic of loop distribution, I created a separate pass. > > It's true that these reduction-like patterns are more difficult than > the existing > memcpy/memset cases. > > > Due > > to this I was also able to play around a bit and schedule the pass at different > > times. Currently it is scheduled right before loop distribution where loop > > header copying already took place which leads to the following effect. > > In fact I'd schedule it after loop distribution so there's the chance that loop > distribution can expose a loop that fits the new pattern. > > > Running > > this setup over > > > > char *t (char *p) > > { > > for (; *p; ++p); > > return p; > > } > > > > the new pass transforms > > > > char * t (char * p) > > { > > char _1; > > char _7; > > > > [local count: 118111600]: > > _7 = *p_3(D); > > if (_7 != 0) > > goto ; [89.00%] > > else > > goto ; [11.00%] > > > > [local count: 105119324]: > > > > [local count: 955630225]: > > # p_8 = PHI > > p_6 = p_8 + 1; > > _1 = *p_6; > > if (_1 != 0) > > goto ; [89.00%] > > else > > goto ; [11.00%] > > > > [local count: 105119324]: > > # p_2 = PHI > > goto ; [100.00%] > > > > [local count: 850510901]: > > goto ; [100.00%] > > > > [local count: 12992276]: > > > > [local count: 118111600]: > > # p_9 = PHI > > return p_9; > > > > } > > > > into > > > > char * t (char * p) > > { > > char * _5; > > char _7; > > > > [local count: 118111600]: > > _7 = *p_3(D); > > if (_7 != 0) > > goto ; [89.00%] > > else > > goto ; [11.00%] > > > > [local count: 105119324]: > > _5 = p_3(D) + 1; > > p_10 = .RAWMEMCHR (_5, 0); > > > > [local count: 118111600]: > > # p_9 = PHI > > return p_9; > > > > } > > > > which is fine so far. However, I haven't made up my mind so far whether it is > > worthwhile to spend more time in order to also eliminate the "first unrolling" > > of the loop. > > Might be a phiopt transform ;) Might apply to quite some set of > builtins. I wonder how the strlen case looks like though. > > > I gave it a shot by scheduling the pass prior pass copy header > > and ended up with: > > > > char * t (char * p) > > { > > [local count: 118111600]: > > p_5 = .RAWMEMCHR (p_3(D), 0); > > return p_5; > > > > } > > > > which seems optimal to me. The downside of this is that I have to initialize > > scalar evolution analysis which might be undesired that early. > > > > All this brings me to the question where do you see this peace of code running? > > If in a separate pass when would you schedule it? If in an existing pass, > > which one would you choose? > > I think it still fits loop distribution. If you manage to detect it > with your pass > standalone then you should be able to detect it in loop distribution. Can you > explain what part is "easier" as standalone pass? Btw, another "fitting" pass would be final value replacement (pass_scev_cprop) since what these patterns provide is a builtin call to compute the value of one of the loop PHIs on exit. Note this pass leaves removal of in-loop computations to followup DCE which means that in some cases it does unprofitable transforms. There's a bug somewhere where I worked on doing final value replacement on-demand when DCE figures out the loop is otherwise dead but I never finished this (loop distribution could also use such mechanism to get rid of unwanted PHIs). > > Another topic which came up is whether there exists a more elegant solution to > > my current implementation in order to deal with stores (I'm speaking of the `if > > (store_dr)` statement inside of function transform_loop_1). For example, > > > > extern char *p; > > char *t () > > { > > for (; *p; ++p); > > return p; > > } > > > > ends up as > > > > char * t () > > { > > char * _1; > > char * _2; > > char _3; > > char * p.1_8; > > char _9; > > char * p.1_10; > > char * p.1_11; > > > > [local count: 118111600]: > > p.1_8 = p; > > _9 = *p.1_8; > > if (_9 != 0) > > goto ; [89.00%] > > else > > goto ; [11.00%] > > > > [local count: 105119324]: > > > > [local count: 955630225]: > > # p.1_10 = PHI <_1(6), p.1_8(5)> > > _1 = p.1_10 + 1; > > p = _1; > > _3 = *_1; > > if (_3 != 0) > > goto ; [89.00%] > > else > > goto ; [11.00%] > > > > [local count: 105119324]: > > # _2 = PHI <_1(3)> > > goto ; [100.00%] > > > > [local count: 850510901]: > > goto ; [100.00%] > > > > [local count: 12992276]: > > > > [local count: 118111600]: > > # p.1_11 = PHI <_2(8), p.1_8(7)> > > return p.1_11; > > > > } > > > > where inside the loop a load and store occurs. For a rawmemchr like loop I > > have to show that we never load from a memory location to which we write. > > Currently I solve this by hard coding those facts which are not generic at all. > > I gave compute_data_dependences_for_loop a try which failed to determine the > > fact that stores only happen to p[0] and loads from p[i] where i>0. Maybe > > there are more generic solutions to express this in contrast to my current one? > > So the example loop is not valid to be trasformed to rawmemchr since it's > valid to call it with p = &p; - but sure, once you pass the first *p != 0 check > things become fishy but GCC isn't able to turn that into a non-dependence. > > Why is the case of stores inside the loop important? In fact that you > think it is > makes a case for integrating this with loop distribution since loop distribution > would be able to prove (if possible) that the stores can be separated into > a different loop. > > And sorry for the delay in answering ... > > Thanks, > Richard. > > > > > Thanks again for your input so far. Really appreciated. > > > > Cheers, > > Stefan