Message-ID: <1e118c0c-5d9a-4fca-9fe9-12e2baa34019@rivosinc.com>
Date: Tue, 18 Oct 2022 14:51:03 -0700
Subject: Re: Redundant constants in coremark crc8 for RISCV/aarch64 (no-if-conversion)
To: Jeff Law, gcc@gcc.gnu.org
Cc: Kito Cheng, Philipp Tomsich
References: <1a636f1e-31be-1735-5d8f-649df3c5e018@gmail.com>
From: Vineet Gupta
In-Reply-To: <1a636f1e-31be-1735-5d8f-649df3c5e018@gmail.com>

Hi Jeff,

On 10/14/22 09:54, Jeff Law via Gcc wrote:
...
>> .L2:
>>     xor     a4,a4,a5
>>     andi    a4,a4,1
>>     srli    a3,a0,2
>>     srli    a5,a5,1
>>     beq     a4,zero,.L3
>>
>>     li      a4,-24576    # 0xFFFF_A000
>>     addi    a4,a4,1      # 0xFFFF_A001
>>     xor     a5,a5,a4
>>     zext.h  a5,a5
>>
>> .L3:
>>     xor     a3,a3,a5
>>     andi    a3,a3,1
>>     srli    a4,a0,3
>>     srli    a5,a5,1
>>     beq     a3,zero,.L4
>>
>>     li      a3,-24576    # 0xFFFF_A000
>>     addi    a3,a3,1      # 0xFFFF_A001
>> ...
>> ...
>>
>> I see that with small tests cse1 is able to substitute redundant
>> constant reg with equivalent old reg.
>
> I find it easier to reason about this stuff with a graphical CFG, so a
> bit of ascii art...
>
>
>             2
>           /   \
>         3 ---> 4
>              /   \
>            5 ---> 6
>

Yeah, a picture is worth a thousand words :-)

> Where BB4 corresponds to .L2 and BB6 corresponds to .L3. Evaluation of
> the constants occurs in BB3 and BB5.

And evaluation here means use of the constant (vs. definition)?

> CSE isn't going to catch this.  The best way to think about CSE's
> capabilities is that it can work on extended basic blocks.  An
> extended basic block can have jumps out, but not jumps in.  There are 3
> EBBs in this code.  (1,2), (4,5) and 6.  So BB4 is in a different EBB
> than BB3.  So the evaluation in BB3 can't be used by CSE in the EBB
> containing BB4, BB5.

Thanks for the detailed explanation.

> PRE/GCSE is better suited for this scenario, but it has a critical
> constraint.  In particular our PRE formulation is never allowed to put
> an evaluation of an expression on a path that didn't have one before. So
> while there clearly a redundancy on the path 2->3->4->5 (BB3 and BB5),
> there is nowhere we could put an evaluation that would reduce the number
> of evaluation on that path without introducing an evaluation on paths
> that didn't have one.  So consider 2->4->6.  On that path there are zero
> evaluations.
> So we can't place an eval in BB2 because that will cause
> evaluations on 2->4->6 which didn't have any evaluations.

OK. How does PRE calculate all the possible paths to consider, say your
examples 2-3-4-5 and 2-4-6? Are those just indicative, or are they the
actual paths PRE would calculate for this case? Would there be more?

> There isn't a great place in GCC to handle this right now.  If the
> constraints were relaxed in PRE, then we'd have a chance, but getting
> the cost model right is going to be tough.

It would have been better (for this specific case) if loop unrolling
were not being done so early. The tree pass cunroll flattens the loop
out and leaves it to the rest of the tree/RTL passes to pick up the
pieces and remove any redundancies, if at all. It obviously needs to be
early if we are injecting 7x more instructions, but it seems like a lot
to unravel.

FWIW -fno-unroll-loops only seems to work at -O2. At -O3 it always
unrolls. Is that expected?

If this seems worthwhile and you have ideas to do this any better, I'd
be happy to work on this with some guidance.

Thx,
-Vineet
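
For reference, the path argument quoted above can be checked by brute
force on the five-block CFG from the ascii art. This is only an
illustrative sketch, not how GCC works: as far as I understand, PRE
solves dataflow equations over the CFG rather than enumerating paths.
The edge list is read off the diagram, and the names (CFG, all_paths,
evals_on, adds_eval_to_clean_path) are made up for this example.

```python
# Illustration only: enumerate paths in the CFG from the ascii art and count
# how many times the constant is evaluated on each one.

# Successor edges read off the diagram: 2->3, 2->4, 3->4, 4->5, 4->6, 5->6.
CFG = {2: [3, 4], 3: [4], 4: [5, 6], 5: [6], 6: []}


def all_paths(cfg, entry, exit_):
    """Return every path from entry to exit_ (the graph here is acyclic)."""
    if entry == exit_:
        return [[entry]]
    paths = []
    for succ in cfg[entry]:
        for tail in all_paths(cfg, succ, exit_):
            paths.append([entry] + tail)
    return paths


def evals_on(path, eval_blocks):
    """Number of blocks on the path that evaluate the expression."""
    return sum(1 for bb in path if bb in eval_blocks)


def adds_eval_to_clean_path(cfg, entry, exit_, before, after):
    """Paths with zero evaluations before the move but at least one after."""
    return [p for p in all_paths(cfg, entry, exit_)
            if evals_on(p, before) == 0 and evals_on(p, after) > 0]


BEFORE = {3, 5}   # constant materialized in BB3 and BB5, as in the thread
HOISTED = {2}     # hypothetical placement: a single evaluation in BB2

if __name__ == "__main__":
    for path in all_paths(CFG, 2, 6):
        print(path, evals_on(path, BEFORE), evals_on(path, HOISTED))
    print("violations:", adds_eval_to_clean_path(CFG, 2, 6, BEFORE, HOISTED))
```

Under these assumptions, hoisting to BB2 removes the double evaluation
on 2->3->4->5 but is rejected because 2->4->6 had no evaluation before
the move, which is the constraint described in the quoted text.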