Date: Fri, 27 Jan 2023 13:59:00 -0600
From: Segher Boessenkool
To: Michael Meissner, gcc-patches@gcc.gnu.org, "Kewen.Lin", David Edelsohn, Peter Bergner, Will Schmidt
Subject: Re: [PATCH 0/6] PowerPC Dense Math prelimary support (-mcpu=future)
Message-ID: <20230127195900.GS25951@gate.crashing.org>

Hi!

On Wed, Nov 09, 2022 at 09:43:16PM -0500, Michael Meissner wrote:
> This patch is very preliminary support for a potential new feature to the
> PowerPC that extends the current power10 MMA architecture. This feature may or
> may not be present in any specific future PowerPC processor.

MMA is an optional facility in ISA 3.1 -- please don't say it is power10
only.
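For reference, here is a minimal sketch of how the ISA 3.1 MMA accumulators relate to the VSRs. Accumulator ACC[i] overlays the four consecutive VSRs VSR[4*i] through VSR[4*i+3]. The eight accumulators therefore cover exactly VSR0..VSR31, and each one is 4 x 128 = 512 bits. Four 64-bit FPRs would only give 256 bits. The code is purely illustrative Python. The names (acc_to_vsrs, vsr_to_acc, and the constants) are hypothetical and do not come from GCC or the ISA document.

```python
from typing import Optional

# Illustrative model only: ISA 3.1 MMA accumulator <-> VSR association.
# ACC[i] overlays VSR[4*i] .. VSR[4*i + 3]; all names here are hypothetical.
NUM_ACCS = 8        # ACC0 .. ACC7
VSRS_PER_ACC = 4    # each accumulator overlays four consecutive VSRs
VSR_BITS = 128      # a VSR is 128 bits wide
FPR_BITS = 64       # an FPR is only doubleword 0 of one of VSR0 .. VSR31


def acc_to_vsrs(acc: int) -> range:
    """Return the VSR numbers overlaid by accumulator `acc`."""
    if not 0 <= acc < NUM_ACCS:
        raise ValueError(f"no accumulator ACC{acc}")
    first = acc * VSRS_PER_ACC
    return range(first, first + VSRS_PER_ACC)


def vsr_to_acc(vsr: int) -> Optional[int]:
    """Return the accumulator overlaying `vsr`, or None (VSR32..63 have none)."""
    if 0 <= vsr < NUM_ACCS * VSRS_PER_ACC:
        return vsr // VSRS_PER_ACC
    return None


for acc in range(NUM_ACCS):
    vsrs = acc_to_vsrs(acc)
    print(f"ACC{acc}: VSR{vsrs[0]}..VSR{vsrs[-1]}")
# ACC0: VSR0..VSR3 ... ACC7: VSR28..VSR31

# Four VSRs give a 512-bit accumulator; four FPRs would only give 256 bits.
print(VSRS_PER_ACC * VSR_BITS, VSRS_PER_ACC * FPR_BITS)
# 512 256
```

The accumulator number is simply the VSR number divided by four. This is why an accumulator can only be associated with a VSR group whose first register number is a multiple of 4.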
> In the current MMA subsystem for Power10, there are 8 512-bit accumulator
> registers. These accumulators are each tied to sets of 4 FPR registers.

Four VSRs. FPRs are only 64 bits. You mean VSRs 0..31 here.

> When
> you issue a prime instruction, it makes sure the accumulator is a copy of the 4

I suppose you mean the xxmtacc instruction?

> FPR registers the accumulator is tied to. When you issue a deprime
> instruction, it makes sure that the accumulator data content is logically
> copied to the matching FPR register.

And xxmfacc. Very importantly, all the other rules in 7.2.1.3 "VSX
Accumulators" apply as well. That should make old code work on new systems
transparently.

> In terms of changes, we now use the wD constraint for accumulators. If you
> compile with -mcpu=power10, the wD constraint will match the equivalent FPR
> register that overlaps with the accumulator.

The set of *four* *VSX* registers. Of course in the end it is just a
number, but :-)

> If you compile with -mcpu=future,
> the wD constraint will match the DMR register and not the FPR register.

Constraints do not "match" anything. "Will allow", perhaps?

> In general, if you only use the built-in functions, things work between the two
> systems. If you use extended asm, you will likely need to modify the code.
> Going forward, hopefully if you modify your code to use the wD constraint and
> %A output modifier, you can write code that switches more easily between the
> two systems.

You *already* are required to follow all these rules that make this painless
and transparent.

> There is one bug that I noticed. When you use the full DMR instruction the
> constant copy propagation patch issues internal errors. I believe this is due
> to the CCP pass not handling opaque types cleanly enough, and it only shows up
> in larger types. I would like to get these patches committed, and then work
> with the maintainers of the CCP to fix the problem.

Erm. If the compiler ICEs, we cannot include this code.
But hopefully you mean something else?


Segher