From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oi1-x233.google.com (mail-oi1-x233.google.com [IPv6:2607:f8b0:4864:20::233]) by sourceware.org (Postfix) with ESMTPS id 773F53856DEA for ; Tue, 10 Oct 2023 11:31:27 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 773F53856DEA Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=embecosm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=embecosm.com Received: by mail-oi1-x233.google.com with SMTP id 5614622812f47-3af609b9264so3825928b6e.2 for ; Tue, 10 Oct 2023 04:31:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=embecosm.com; s=google; t=1696937487; x=1697542287; darn=gcc.gnu.org; h=to:subject:message-id:date:from:mime-version:from:to:cc:subject :date:message-id:reply-to; bh=HuWdtan2jPNAvYKoH465xXP83+SLAnrbjba/MEhehbU=; b=LzgcOW8DaqVNxaBdez9V/L+JIiyTrpKKR8tu77Hdw/zegzhh+lj7Qxxw21bXeXHuhT C73qt7WfsNoSoNXdjb0ZD56KG3uZMpLf7hHQ18Zb3Lird2J7O4x1aC9+qQcQJnUoCWoS oEEz9TEPj5YnQjNr078UD31qqhPeysMxv7G072hIcQ9iPi/D3BjmIRnbQ34SexTg8dZW OwhcSf7pZU0UNwyY5+Y4WKr8l+PyZTaRklM+QK1ePUXr1+If6mrELf+KEq1dDkSrI6yP Eu7LkkJnNXadm0ofvI/CyMCk0wmJe+UjhLrxfiKXMbwJgp6CE2+4wMIdMvV4CvXiC4A8 t3DA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1696937487; x=1697542287; h=to:subject:message-id:date:from:mime-version:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=HuWdtan2jPNAvYKoH465xXP83+SLAnrbjba/MEhehbU=; b=F7e/RsebZxjtzZWWL94neNTsm7fBP/u5fA4oiDb8vyIm9mbTS7R4vE/+RxD3uM05JS CbtaoH5xI+Sm6AVOnM9JxdbWRCBRbyNRC2GLPLyjGt2A6xTADY9426SrabH24LEDnpC1 Drmbln68YpPjmww5+flCY9q5KnP38peH8Iqb1EHRqVw8B0YI1bNy8WFCYy9kOkACyWHP zUnikIdp2UpJ6DIr1sNLUCMSz+rRCuFBR/HuXsDhse5cCMeVWSR7jJqRDyHpnkjtMYIj Wo68ZXz0x8yyam4qBElVZEQ3GgYg+IFqTf9SuKWJPTAgKwaqnwsV2+GA1AQrZvWdDl70 oVSw== X-Gm-Message-State: AOJu0YzwyvxydKErbnwxf0nK2LslxzF+IQnglToTYNUI0p47D6p5iJ4d 
A3RuoxtiEoU34DfJZ2pgnG0nNEEKDWPIEOG9ukVxjRC3CBYJ4Um1T9E= X-Google-Smtp-Source: AGHT+IGP1MkaoDCRguudsIk8ZwLhkqoSo33jYMwXmvYvxXwt0XMQop0XSwFWn0hyVCFR2MOLquNg+vH6MUTABJGHobU= X-Received: by 2002:a05:6358:e49f:b0:143:9827:5f71 with SMTP id by31-20020a056358e49f00b0014398275f71mr19938275rwb.8.1696937486557; Tue, 10 Oct 2023 04:31:26 -0700 (PDT) MIME-Version: 1.0 From: Joern Rennecke Date: Tue, 10 Oct 2023 12:31:15 +0100 Message-ID: Subject: RFD: doloop needs better support for nested loops To: GCC Content-Type: multipart/mixed; boundary="00000000000018aa3c06075b0c48" X-Spam-Status: No, score=-9.8 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: --00000000000018aa3c06075b0c48 Content-Type: text/plain; charset="UTF-8"

I'm working on implementing hardware loops for the CORE-V CV32E40P
https://docs.openhwgroup.org/projects/cv32e40p-user-manual/en/latest/corev_hw_loop.html

This core supports nested hardware loops, but does not allow any other flow control inside hardware loops. I found that our existing interfaces do not allow sufficient control over when to emit doloop patterns, i.e. allowing nested doloops while rejecting other flow control inside the loop.

TARGET_CAN_USE_DOLOOP_P does not get passed anything that lets it look at the individual loop. Most convenient would be the loop structure, although that would cause tight coupling of the target port with the internal data structures of the loop optimizers. OTOH we already have a precedent with TARGET_PREDICT_DOLOOP_P.

TARGET_INVALID_WITHIN_DOLOOP is missing context. We neither know the loop nesting depth, nor whether a jump instruction under consideration is the final branch to jump back to the loop latch.
Actually, the second part is the main problem for the CV32E40P: inner doloops that have been transformed can be recognized as such, but un-transformed condjumps could either be spaghetti code inside the loop or the final jump instruction of the loop.

The doloop_end pattern is also missing context to make meaningful decisions. Although we know the label where the pattern is supposed to jump to, we don't know where the original branch is. Even if we scan the insn stream, this is ambiguous, since there can be two (or more) nested doloop candidates.

What we could do here is add optional arguments; there is precedent, e.g. for the call pattern. The advantage of this approach is that ports that are fine with the current interface need not be patched. To make it possible to scrutinize the control flow of the loop, the branch at the end of the loop makes a good optional argument.

There is also the issue that loop setup is a bit more costly for large loops, and it would be nice to weigh that against the iteration count. We had information about the iteration count at TARGET_CAN_USE_DOLOOP_P, but nothing to allow us to analyze the loop body. Although the port could stash away the iteration count in a global variable or machine_function member, it would be more straightforward and robust to pass the information together so that it can be considered in context.

Attached is a patch for an optional 3rd parameter to doloop_end.
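
To illustrate the kind of check a port could make once it knows which branch closes the loop, here is a toy model in plain C++. It is only a sketch and does not use GCC's RTL or any real GCC interface: InsnKind, Insn and hwloop_ok are hypothetical names made up for this example, and the loop body is modelled as a flat vector of instructions. The index loop_end_branch plays the role of the proposed optional operand.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified instruction kinds (not GCC's RTL codes).
enum class InsnKind { Plain, CondJump, Jump, HwLoopEnd };

struct Insn {
  InsnKind kind;
};

// Return true if the loop body may become a hardware loop under the rule
// "nested hardware loops are fine, any other flow control is not".
// `loop_end_branch` is the index of the branch that closes this loop; without
// it, that branch could not be told apart from a stray conditional jump.
bool hwloop_ok(const std::vector<Insn>& body, std::size_t loop_end_branch) {
  // The closing branch must exist and must be a conditional jump.
  if (loop_end_branch >= body.size()
      || body[loop_end_branch].kind != InsnKind::CondJump)
    return false;

  for (std::size_t i = 0; i < body.size(); ++i) {
    if (i == loop_end_branch)
      continue;  // this loop's own back branch is what gets replaced
    switch (body[i].kind) {
      case InsnKind::Plain:
      case InsnKind::HwLoopEnd:  // an inner loop that was already transformed
        break;
      default:
        return false;  // any other jump counts as flow control inside the loop
    }
  }
  return true;
}
```

In this model, a body containing an already-transformed inner loop followed by the closing branch is accepted, while the same body with an extra conditional jump in the middle is rejected. Whether a real port would scan the body this way or consult the loop structure instead is left open here.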
--00000000000018aa3c06075b0c48 Content-Type: text/plain; charset="US-ASCII"; name="tmp.txt" Content-Disposition: attachment; filename="tmp.txt" Content-Transfer-Encoding: base64 Content-ID: X-Attachment-Id: f_lnk8p22h0 MjAyMy0xMC0wNSAgSm9lcm4gUmVubmVja2UgIDxqb2Vybi5yZW5uZWNrZUBlbWJlY29zbS5jb20+ CgpnY2MvCiAgICAgICAgICAgICogZG9jL21kLnRleGkgKGRvbG9vcF9lbmQpOiBEb2N1bWVudCBv cHJpb25hbCBvcGVyYW5kIDIuCiAgICAgICAgICAgICogbG9vcC1kb2xvb3AuY2MgKGRvbG9vcF9v cHRpbWl6ZSk6IFByb3ZpZGUgM3JkIG9wZXJhbmQgdG8KICAgICAgICAgICAgZ2VuX2RvbG9vcF9l bmQuCiAgICAgICAgICAgICogdGFyZ2V0LWluc25zLmRlZiAoZG9sb29wX2VuZCk6IEFkZCBvcHRp b25hbCAzcmQgb3BlcmFuZC4KCmRpZmYgLS1naXQgYS9nY2MvY29uZmlnLmdjYyBiL2djYy9jb25m aWcuZ2NjCmluZGV4IGVlNDZkOTZiZjYyLi5iYTQyYWMzZDQyNSAxMDA2NDQKLS0tIGEvZ2NjL2Nv bmZpZy5nY2MKKysrIGIvZ2NjL2NvbmZpZy5nY2MKQEAgLTU0NCw3ICs1NDQsNyBAQCByaXNjdiop CiAJZXh0cmFfb2Jqcz0icmlzY3YtYnVpbHRpbnMubyByaXNjdi1jLm8gcmlzY3Ytc3IubyByaXNj di1zaG9ydGVuLW1lbXJlZnMubyByaXNjdi1zZWxmdGVzdHMubyByaXNjdi1zdHJpbmcubyIKIAll eHRyYV9vYmpzPSIke2V4dHJhX29ianN9IHJpc2N2LXYubyByaXNjdi12c2V0dmwubyByaXNjdi12 ZWN0b3ItY29zdHMubyIKIAlleHRyYV9vYmpzPSIke2V4dHJhX29ianN9IHJpc2N2LXZlY3Rvci1i dWlsdGlucy5vIHJpc2N2LXZlY3Rvci1idWlsdGlucy1zaGFwZXMubyByaXNjdi12ZWN0b3ItYnVp bHRpbnMtYmFzZXMubyIKLQlleHRyYV9vYmpzPSIke2V4dHJhX29ianN9IHRoZWFkLm8iCisJZXh0 cmFfb2Jqcz0iJHtleHRyYV9vYmpzfSB0aGVhZC5vIGNvcmV2Lm8iCiAJZF90YXJnZXRfb2Jqcz0i cmlzY3YtZC5vIgogCWV4dHJhX2hlYWRlcnM9InJpc2N2X3ZlY3Rvci5oIgogCXRhcmdldF9ndGZp bGVzPSIkdGFyZ2V0X2d0ZmlsZXMgXCQoc3JjZGlyKS9jb25maWcvcmlzY3YvcmlzY3YtdmVjdG9y LWJ1aWx0aW5zLmNjIgpkaWZmIC0tZ2l0IGEvZ2NjL2xvb3AtZG9sb29wLmNjIGIvZ2NjL2xvb3At ZG9sb29wLmNjCmluZGV4IDRmZWIwYTI1YWI5Li5kNzAzY2I1ZjJhZiAxMDA2NDQKLS0tIGEvZ2Nj L2xvb3AtZG9sb29wLmNjCisrKyBiL2djYy9sb29wLWRvbG9vcC5jYwpAQCAtNzIwLDcgKzcyMCw4 IEBAIGRvbG9vcF9vcHRpbWl6ZSAoY2xhc3MgbG9vcCAqbG9vcCkKICAgY291bnQgPSBjb3B5X3J0 eCAoZGVzYy0+bml0ZXJfZXhwcik7CiAgIHN0YXJ0X2xhYmVsID0gYmxvY2tfbGFiZWwgKGRlc2Mt PmluX2VkZ2UtPmRlc3QpOwogICBkb2xvb3BfcmVnID0gZ2VuX3JlZ19ydHggKG1vZGUpOwotICBy 
dHhfaW5zbiAqZG9sb29wX3NlcSA9IHRhcmdldG0uZ2VuX2RvbG9vcF9lbmQgKGRvbG9vcF9yZWcs IHN0YXJ0X2xhYmVsKTsKKyAgcnR4X2luc24gKmRvbG9vcF9zZXEgPSB0YXJnZXRtLmdlbl9kb2xv b3BfZW5kIChkb2xvb3BfcmVnLCBzdGFydF9sYWJlbCwKKwkJCQkJCSBCQl9FTkQgKGRlc2MtPmlu X2VkZ2UtPnNyYykpOwogCiAgIHdvcmRfbW9kZV9zaXplID0gR0VUX01PREVfUFJFQ0lTSU9OICh3 b3JkX21vZGUpOwogICB3b3JkX21vZGVfbWF4ID0gKEhPU1RfV0lERV9JTlRfMVUgPDwgKHdvcmRf bW9kZV9zaXplIC0gMSkgPDwgMSkgLSAxOwpAQCAtNzM3LDcgKzczOCw4IEBAIGRvbG9vcF9vcHRp bWl6ZSAoY2xhc3MgbG9vcCAqbG9vcCkKICAgICAgIGVsc2UKIAljb3VudCA9IGxvd3BhcnRfc3Vi cmVnICh3b3JkX21vZGUsIGNvdW50LCBtb2RlKTsKICAgICAgIFBVVF9NT0RFIChkb2xvb3BfcmVn LCB3b3JkX21vZGUpOwotICAgICAgZG9sb29wX3NlcSA9IHRhcmdldG0uZ2VuX2RvbG9vcF9lbmQg KGRvbG9vcF9yZWcsIHN0YXJ0X2xhYmVsKTsKKyAgICAgIGRvbG9vcF9zZXEgPSB0YXJnZXRtLmdl bl9kb2xvb3BfZW5kIChkb2xvb3BfcmVnLCBzdGFydF9sYWJlbCwKKwkJCQkJICAgQkJfRU5EIChk ZXNjLT5pbl9lZGdlLT5zcmMpKTsKICAgICB9CiAgIGlmICghIGRvbG9vcF9zZXEpCiAgICAgewpk aWZmIC0tZ2l0IGEvZ2NjL3RhcmdldC1pbnNucy5kZWYgYi9nY2MvdGFyZ2V0LWluc25zLmRlZgpp bmRleCBjNDQxNWQwMDczNS4uOTYyYzVjYzUxZDEgMTAwNjQ0Ci0tLSBhL2djYy90YXJnZXQtaW5z bnMuZGVmCisrKyBiL2djYy90YXJnZXQtaW5zbnMuZGVmCkBAIC00OCw3ICs0OCw3IEBAIERFRl9U QVJHRVRfSU5TTiAoY2FzZXNpLCAocnR4IHgwLCBydHggeDEsIHJ0eCB4MiwgcnR4IHgzLCBydHgg eDQpKQogREVGX1RBUkdFVF9JTlNOIChjaGVja19zdGFjaywgKHJ0eCB4MCkpCiBERUZfVEFSR0VU X0lOU04gKGNsZWFyX2NhY2hlLCAocnR4IHgwLCBydHggeDEpKQogREVGX1RBUkdFVF9JTlNOIChk b2xvb3BfYmVnaW4sIChydHggeDAsIHJ0eCB4MSkpCi1ERUZfVEFSR0VUX0lOU04gKGRvbG9vcF9l bmQsIChydHggeDAsIHJ0eCB4MSkpCitERUZfVEFSR0VUX0lOU04gKGRvbG9vcF9lbmQsIChydHgg eDAsIHJ0eCB4MSwgcnR4IG9wdDIpKQogREVGX1RBUkdFVF9JTlNOIChlaF9yZXR1cm4sIChydHgg eDApKQogREVGX1RBUkdFVF9JTlNOIChlcGlsb2d1ZSwgKHZvaWQpKQogREVGX1RBUkdFVF9JTlNO IChleGNlcHRpb25fcmVjZWl2ZXIsICh2b2lkKSkK --00000000000018aa3c06075b0c48--