Date: Sun, 7 May 2023 18:34:48 +0100 (BST)
From: "Maciej W. Rozycki"
To: Jiaxun Yang
cc: YunQiang Su, Richard Sandiford, gcc-patches@gcc.gnu.org, YunQiang Su
Subject: Re: [PATCH v2] MIPS: add speculation_barrier support
In-Reply-To: <3C634BC6-7556-4724-8012-83F8F3C1C1B3@flygoat.com>
References: <20230428123327.686353-1-yunqiang.su@cipunited.com> <20230428131249.713463-1-yunqiang.su@cipunited.com> <3C634BC6-7556-4724-8012-83F8F3C1C1B3@flygoat.com>

On Wed, 3 May 2023, Jiaxun Yang wrote:

> Since it’s possible to run R2- binary on R2+ processor, we’d better find a
> semantic that do eliminate speculation on all processors. While SSNOPs
> on R2+ processors is pretty much undefined, there is no guarantee that
> SSNOP sequence can eliminate speculation.

Not exactly undefined on R2+: SSNOP is still required to single-issue,
so it does act as an execution barrier. Good point otherwise.

Both EHB and J[AL]R.HB are backwards compatible however (except for an
obscure 4Kc J[AL]R.HB erratum I came across once and which may be no
longer relevant), so I think the legacy sequence ought to just return via
JR.HB as well, therefore providing the required semantics with newer
hardware. If it does trap for 4Kc, then the OS can emulate it (and we can
ignore it for bare metal, deferring to whoever might be interested for a
workaround).

> My proposal is for R2- CPUs we can do a dummy syscall to act as instruction
> hazard barrier, since exception must clear the pipeline this should be true
> for all known implementations.

I think the SSNOP approach should be sufficient.

> The most lightweight syscall I know is to do a MIPS_ATOMIC_SET with
> sysmips. A dummy variable on stack should do the trick. Do let me know if there
> is a better option.

That would have to be gettimeofday(2) then, the most performance-critical
one, and also one that does not have side effects. The real syscall and
not vDSO emulation of course (there's a reason it's emulated via vDSO,
which is exactly our reason too).

> I have a vague memory about a discussion finding that exception does not indicate
> a memory barrier, so perhaps we still need a sync preceding to that syscall.

There is no claim that I could find in the architecture specification
saying that taking an exception implies a memory barrier, and therefore we
must conclude it does not. Likewise executing ERET.

As I say I think the SSNOP approach should be sufficient, along with
relying on SYNC emulation.

> > I think there may be no authoritative source of information here, this is
> > a grey area. The longest SSNOP sequences I have seen were for the various
> > Broadcom implementations and counted 7 instructions. Both the Linux
> > kernel and the CFE firmware have them.
>
> Was it for SiByte or BMIPS?

Both AFAICT.

> > Also we may not be able to fully enforce ordering for the oldest devices
> > that do not implement SYNC, as this is system-specific, e.g. involving
> > branching on the CP0 condition with the BC0F instruction, and inventing an
> > OS interface for that seems unreasonable at this point of history.
>
> I guess this is not a valid concern for user space applications?
> As per R4000 manual BC0F will issue “Coprocessor unusable exception”
> exception and it’s certain that we have Status.CU0 = 0 in user space.

Exactly, which is why an OS service would have to provide the required
semantics to the userland, and none might be available.

And we probably do not care anyway, because I gather this is a security
feature to prevent certain types of data leaks via a side channel. I
wouldn't expect anyone doing any serious security-sensitive processing
with legacy MIPS hardware. And then there's no speculative execution with
all these pieces of legacy hardware (R3000, eh?) that have no suitable
barriers provided, mostly because they are not relevant for them anyway.

For bare metal we probably do not care about such legacy hardware either
way.

Overall I'd say let's do the best we can without bending over backwards
and then rely on people's common sense.

  Maciej
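
The distinction drawn above between the real gettimeofday(2) syscall and
its vDSO emulation can be sketched in C as follows. This is a minimal
illustration only, not part of the patch under discussion: the helper name
dummy_syscall_barrier is hypothetical, the sketch assumes Linux with a libc
that provides the syscall(2) wrapper, and it makes no claim that a kernel
entry is a sufficient speculation barrier on any given core. Calling
gettimeofday() through libc would normally be serviced by the vDSO without
entering the kernel, so the sketch goes through syscall(2) with
SYS_gettimeofday instead.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for the syscall(2) wrapper declaration */
#endif
#include <stddef.h>
#include <sys/syscall.h>        /* SYS_gettimeofday, where the target has it */
#include <sys/time.h>           /* struct timeval */
#include <unistd.h>             /* syscall() */

/*
 * Hypothetical helper (an assumption for illustration, not from the patch):
 * force a real kernel entry by issuing gettimeofday(2) through syscall(2),
 * bypassing the libc wrapper that is usually satisfied by the vDSO.
 *
 * Returns 0 on success and -1 on failure.
 */
int dummy_syscall_barrier(void)
{
#ifdef SYS_gettimeofday
    struct timeval tv;          /* result is discarded; only the trap matters */

    return (int)syscall(SYS_gettimeofday, &tv, NULL);
#else
    return -1;                  /* no gettimeofday syscall number on this target */
#endif
}
```

A caller would invoke dummy_syscall_barrier() at the point where the
barrier is wanted and may ignore the timeval result. On targets that define
no SYS_gettimeofday number the sketch reports failure rather than silently
falling back to the vDSO path.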