From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pj1-x102c.google.com (mail-pj1-x102c.google.com [IPv6:2607:f8b0:4864:20::102c]) by sourceware.org (Postfix) with ESMTPS id 0E52C3858D32 for ; Tue, 18 Oct 2022 23:36:03 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 0E52C3858D32 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-pj1-x102c.google.com with SMTP id t10-20020a17090a4e4a00b0020af4bcae10so15413563pjl.3 for ; Tue, 18 Oct 2022 16:36:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=wYvEs6COUOzv+KYKOqe0/TGomga8VzMUXlt3jcOaWYk=; b=SxIWnbjmw4Astltu4QfRQ6KxMs0bKKlnS5tLr+ql7rK4EN8lQgS+xbFTekEWKxYap/ aEB2CfnXKsamZ/7r88h6WNYB4Wie5MbbrJwpk9PXElznvUAO4DbTtesDKajeoGIydyvk 9pqShy3Xfn4vjVYkr5mf+yP8dOnkctIjpqVyiDjb0UJyoEk0p3AvstdxYY8XmEJaC7Rh z0f6Oma7fEl4BSZly7VNPJ5HHkXN3wcQ4hxZsGXcJSGmBc2CTQFX4jPUbugfLfCB7tQO xo1zq7D/JR4JO1MKfbecaM4turnDttUaLP6hqjTe0hUw+EkxyGc5D47Tc0W1MAL2x9YX OqqQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=wYvEs6COUOzv+KYKOqe0/TGomga8VzMUXlt3jcOaWYk=; b=nh0qgcsCVe1S8P4PrDdiHK8vsDmIZOOHgXBDehhzDdLSuiUAmIv1i6iBYJ38WzAPTQ LGThiFNuVH4rlnBXaujHJYpC3nvrljslP5yIPru7tDoWBDv4jPiKZaPDoNio+Qol47F3 LOBciOwy3hVi8Km6wsy/LoRsvHlqVtRZCY8UihItOR4ZCYpnvYI7BRH5vHWbk6v4dI8R zvTOVKXBV5LoMsH91A0s2gYTXa70BQlovw7VViRIsBQoQVQhQ52V26E/fjo7IucJKdlf sWfu7mdyLSyECovNBUjlosjSRV/d6n02qRt29rn77GR6/Qne7fd/PL3lZRFvzZHAoD3Y ut8g== X-Gm-Message-State: ACrzQf1x+XJwW9isl4jQVVLaUY3OuEHD/WZ0ZIFo4PJHDAFoVN/0bF+u ikdvtFeT1Ke6qKzcdnxX9JE= X-Google-Smtp-Source: AMsMyM5jnQ6bHjF7hQFkK+cc1/jlQqxyPev5qQ6mOidYy0MJ5GGKmk+JPnE805BfHCJXv505i4aQ5w== X-Received: by 2002:a17:902:ead1:b0:181:991f:2d25 with SMTP id p17-20020a170902ead100b00181991f2d25mr5473304pld.107.1666136161534; Tue, 18 Oct 2022 16:36:01 -0700 (PDT) Received: from ?IPV6:2601:681:8600:13d0::f0a? ([2601:681:8600:13d0::f0a]) by smtp.gmail.com with ESMTPSA id 6-20020a620406000000b0054ee4b632dasm9794748pfe.169.2022.10.18.16.36.00 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 18 Oct 2022 16:36:01 -0700 (PDT) Message-ID: <53dcbef4-7aef-5f63-9bd8-e11c614b0be8@gmail.com> Date: Tue, 18 Oct 2022 17:36:00 -0600 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.3.1 Subject: Re: Redundant constants in coremark crc8 for RISCV/aarch64 (no-if-conversion) Content-Language: en-US To: Vineet Gupta , gcc@gcc.gnu.org Cc: Kito Cheng , Philipp Tomsich References: <1a636f1e-31be-1735-5d8f-649df3c5e018@gmail.com> <1e118c0c-5d9a-4fca-9fe9-12e2baa34019@rivosinc.com> From: Jeff Law In-Reply-To: <1e118c0c-5d9a-4fca-9fe9-12e2baa34019@rivosinc.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-1.8 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,NICE_REPLY_A,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,URI_DOTEDU autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 10/18/22 15:51, Vineet Gupta wrote: > > > >> Where BB4 corresponds to .L2 and BB6 corresponds to .L3. Evaluation >> of the constants occurs in BB3 and BB5. > > And Evaluation here means use of the constant (vs. definition ?). In this case, use of the constant. > >> PRE/GCSE is better suited for this scenario, but it has a critical >> constraint.  In particular our PRE formulation is never allowed to >> put an evaluation of an expression on a path that didn't have one >> before. So >> while there clearly a redundancy on the path 2->3->4->5 (BB3 and >> BB5), there is nowhere we could put an evaluation that would reduce >> the number of evaluation on that path without introducing an >> evaluation on paths that didn't have one.  So consider 2->4->6.  On >> that path there are zero evaluations.  So we can't place an eval in >> BB2 because that will cause evaluations on 2->4->6 which didn't have >> any evaluations. > > OK. How does PRE calculate all possible paths to consider: say your > example 2-3-4-5 and 2-4-6 ? Is that just indicative or would actually > be the one PRE calculates for this case. Would there be more ? PRE has a series of dataflow equations it solves which gives it the answer to that question.  The one that computes this property is usually called anticipated.  Given some block B in a graph G. An expression is anticipated at B when the expression is guaranteed to be computed if we reach B.  That doesn't mean the evaluation must happen in B, just that evaluation at some point is guaranteed if we reach B. If an expression is not anticipated in a block, then PRE will not insert in that block since doing so would add evaluations on paths where they did not previously have any. > >> There isn't a great place in GCC to handle this right now.  If the >> constraints were relaxed in PRE, then we'd have a chance, but getting >> the cost model right is going to be tough. > > It would have been better (for this specific case) if loop unrolling > was not being done so early. The tree pass cunroll is flattening it > out and leaving for rest of the all tree/rtl passes to pick up the > pieces and remove any redundancies, if at all. It obviously needs to > be early if we are injecting 7x more instructions, but seems like a > lot to unravel. Yup.  If that loop gets unrolled, it's going to be a mess.  It will almost certainly make this problem worse as each iteration is going to have a pair of constants loaded and no good way to remove them. > > FWIW -fno-unroll-loops only seems to work at -O2. At -O3 it always > unrolls. Is that expected ? The only case I'm immediately aware of where this wouldn't work would be if -O3 came after -fno-unroll-oops. > > If this seems worthwhile and you have ideas to do this any better, I'd > be happy to work on this with some guidance. I don't see  a great solution here.    Something like Cliff Click's work might help, but it's far from a guarantee.  Click's work essentially throws away the PRE constraint about never inserting an expression evaluation on a path where it didn't exit, along with all kinds of other things.  Essentially it's a total reformulation of redundancy elimination. I did an implementation eons ago in gimple, but never was able to convince myself the implementation was correct or that integrating it was a good thing.   It's almost certainly going to cause performance regressions elsewhere so it may end up doing more harm than good.  I don't really know. https://courses.cs.washington.edu/courses/cse501/06wi/reading/click-pldi95.pdf Jeff