From: Andrew Waterman
Date: Tue, 10 Oct 2023 17:24:33 -0700
Subject: Re: [committed] [PR target/93062] RISC-V: Handle long conditional branches for RISC-V
To: Jeff Law
Cc: "gcc-patches@gcc.gnu.org"
In-Reply-To: <001ae968-da60-4e3b-8909-d6b99980ea63@ventanamicro.com>

I remembered another concern since we discussed this patch privately.
Using ra for long calls results in a sequence that will corrupt the return-address stack. Corrupting the RAS is potentially more costly than mispredicting a branch, since it can result in a cascading sequence of mispredictions as the program returns up the stack. Of course, if these long calls are dynamically quite rare, this isn't the end of the world. But it's always preferable to use a register other than ra or t0 to avoid this performance reduction.

I know nothing about the complexity of register scavenging, but it would be nice to opportunistically use a scratch register (other than t0), falling back to ra only when necessary.

Tangentially, I noticed the patch uses `jump label, ra' for far branches but uses `call label' for far jumps. These corrupt the RAS in opposite ways (the former pops the RAS and the latter pushes it). Any reason for using a different sequence in one than the other?

On Tue, Oct 10, 2023 at 3:11 PM Jeff Law wrote:
>
>
> Ventana has had a variant of this patch from Andrew W. in its tree for
> at least a year.  I'm dusting it off and submitting it on Andrew's behalf.
>
> There are multiple approaches we could be using here.
>
> First we could make $ra fixed and use it as the scratch register for the
> long branch sequences.
>
> Second, we could add a match_scratch to all the conditional branch
> patterns and allow the register allocator to assign the scratch register
> from the pool of GPRs.
>
> Third we could do register scavenging.  This can usually work, though it
> can get complex in some scenarios.
>
> Fourth we could use trampolines for extended reach.
>
> Andrew's original patch did a bit of the first approach (make $ra fixed)
> and mostly the second approach.  The net is it was probably the worst in
> terms of impacting code generation -- we lost a register *and* forced
> every branch instruction to get a scratch register allocated.
>
> I had expected the second approach to produce better code than the
> first, but that wasn't actually the case in practice.  It's probably a
> combination of allocating a GPR at every branch point (even with a life
> of a single insn, there's a cost) and perhaps the additional operands on
> conditional branches spoiling simplistic pattern matching in one or more
> passes.
>
> In addition to performing better based on dynamic instruction counts,
> the first approach is significantly simpler to implement.  Given those
> two positives, that's what I've chosen to go with.  Yes it does remove
> $ra from the set of registers available, but the impact of that is *tiny*.
>
> If someone wanted to dive into one of the other approaches to address a
> real world impact, that's great.  If that happens I would strongly
> suggest also evaluating perlbench from spec2017.  It seems particularly
> sensitive to this issue in terms of approach #2's impact on code generation.
>
> I've built & regression tested this variant on the vt1 configuration
> without regressions.  Earlier versions have been bootstrapped as well.
>
> Pushed to the trunk,
>
> Jeff
>
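To make the RAS concern above concrete, here is a rough sketch in Python. It is not taken from the patch. The function ras_action follows my reading of the JALR return-address-stack hint table in the RISC-V unprivileged spec, and it assumes that `jump label, ra' ends in `jalr x0, ra' while `call label' ends in `jalr ra, ra'. The function count_return_mispredictions is a hypothetical toy predictor (an unbounded stack, one stray action in the innermost function), so real hardware may behave differently.

```python
# Link registers for RAS-hint purposes: x1 (ra) and x5 (t0).
LINK_REGS = {"ra", "t0"}


def ras_action(rd, rs1):
    """RAS hint for `jalr rd, rs1`, per the spec's hint table (as I read it)."""
    rd_link = rd in LINK_REGS
    rs1_link = rs1 in LINK_REGS
    if not rd_link and not rs1_link:
        return "none"
    if not rd_link and rs1_link:
        return "pop"
    if rd_link and not rs1_link:
        return "push"
    # Both are link registers: same register pushes, different ones pop then push.
    return "push" if rd == rs1 else "pop+push"


def count_return_mispredictions(depth, stray):
    """Toy model: `depth` nested calls, one stray RAS action, then all returns."""
    ras = []
    for level in range(depth):
        ras.append(level)  # each real call pushes its return address
    # Stray action caused by a long branch/jump in the innermost function.
    if stray in ("pop", "pop+push") and ras:
        ras.pop()
    if stray in ("push", "pop+push"):
        ras.append("bogus")
    mispredicts = 0
    for level in reversed(range(depth)):
        predicted = ras.pop() if ras else None
        if predicted != level:
            mispredicts += 1
    return mispredicts


if __name__ == "__main__":
    print("jump label, ra ->", ras_action("zero", "ra"))
    print("call label     ->", ras_action("ra", "ra"))
    for stray in ("none", "pop", "push"):
        print(stray, count_return_mispredictions(4, stray))
```

In this model a single stray pop or push shifts every later prediction by one entry, so each return up the stack mispredicts. A scratch register that is neither ra nor t0 gives "none" and leaves the stack untouched.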