From mboxrd@z Thu Jan  1 00:00:00 1970
From: Richard Sandiford <richard.sandiford@arm.com>
To: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Mail-Followup-To: Wilco Dijkstra <Wilco.Dijkstra@arm.com>,GCC Patches <gcc-patches@gcc.gnu.org>, richard.sandiford@arm.com
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] AArch64: Add fma_reassoc_width [PR107413]
References: <PAWPR08MB89824348B31E96B4F6432A3C833E9@PAWPR08MB8982.eurprd08.prod.outlook.com>
	<mpt5yf7beug.fsf@arm.com>
	<PAWPR08MB898293A1C6C76ED8AAC5DC80830D9@PAWPR08MB8982.eurprd08.prod.outlook.com>
Date: Tue, 22 Nov 2022 14:13:56 +0000
In-Reply-To: <PAWPR08MB898293A1C6C76ED8AAC5DC80830D9@PAWPR08MB8982.eurprd08.prod.outlook.com>
	(Wilco Dijkstra's message of "Tue, 22 Nov 2022 10:35:59 +0000")
Message-ID: <mptilj784bf.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
List-Id: <gcc-patches.gcc.gnu.org>

Wilco Dijkstra <Wilco.Dijkstra@arm.com> writes:
> Hi Richard,
>
>> I guess an obvious question is: if 1 (rather than 2) was the right value
>> for cores with 2 FMA pipes, why is 4 the right value for cores with 4 FMA
>> pipes?  It would be good to clarify how, conceptually, the core property
>> should map to the fma_reassoc_width value.
>
> 1 turns off reassociation so that FMAs get properly formed. After reassociation, far
> fewer FMAs get formed, so we end up with more FP operations, which means slower execution.
> It's a significant slowdown on cores that are not wide, have only 1 or 2 FP pipes and
> may have high FP latencies. So we turn it off by default on all older cores.
>
>> It sounds from the existing comment like the main motivation for returning 1
>> was to encourage more FMAs to be formed, rather than to prevent FMAs from
>> being reassociated.  Is that no longer an issue?  Or is the point that,
>> with more FMA pipes, lower FMA formation is a price worth paying for
>> the better parallelism we get when FMAs can be formed?
>
> Exactly. A wide CPU can deal with the extra instructions, so the loss from fewer
> FMAs ends up lower than the speedup from the extra parallelism. Having more FMAs
> will be even faster of course.

Thanks.  It would be good to put this in a comment somewhere, perhaps above
the fma_reassoc_width field.  It isn't obvious from the patch as posted,
and changing the existing comment drops the previous hint about what
was going on.
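A rough cost model makes the trade-off above concrete. It is a hypothetical sketch, not GCC code, and it rests on four simplifying assumptions:

- A sum of N products a[i]*b[i] is split by reassociation into WIDTH independent chains.
- The chains are combined with a balanced tree of plain additions.
- FMA formation fuses every product except the first one in each chain.
- Every FP operation has the same latency.

GCC's reassociation pass does not necessarily lay out the tree this way.

```c
#include <assert.h>

/* Hypothetical cost model (not GCC code) for sum (a[i] * b[i]), i < n,
   after reassociation into WIDTH independent accumulation chains.  */
struct chain_cost
{
  int fmas;   /* fused multiply-adds */
  int muls;   /* standalone multiplies (first product of each chain) */
  int adds;   /* standalone additions combining the partial sums */
  int depth;  /* critical path in dependent FP ops, unit latency assumed */
};

struct chain_cost
sum_of_products_cost (int n, int width)
{
  assert (n >= 1 && width >= 1 && width <= n);
  struct chain_cost c = { 0, 0, 0, 0 };

  /* Each chain starts with a plain multiply; every later product in the
     chain can be fused into the running sum as an FMA.  */
  c.muls = width;
  c.fmas = n - width;

  /* Combining WIDTH partial sums needs WIDTH - 1 additions, none of
     which can become an FMA because neither operand is a product.  */
  c.adds = width - 1;

  /* Longest chain (ceil (n / width)) followed by a balanced addition
     tree of depth ceil (log2 (width)).  Exact when width divides n,
     an upper bound otherwise.  */
  int longest = (n + width - 1) / width;
  int tree = 0;
  for (int k = 1; k < width; k *= 2)
    tree++;
  c.depth = longest + tree;
  return c;
}
```

Take N = 4. With width 1 the sum is 1 multiply and 3 FMAs on a dependency chain 4 operations deep. With width 2 it becomes 2 multiplies, 2 FMAs and 1 addition: 5 operations instead of 4, but a chain only 3 deep. A core with enough FP pipes can absorb the extra operation, which is the argument for a larger fma_reassoc_width on wide cores.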

>
>> Does this code ever see opc == FMA?
>
> No, that's the problem: reassociation ignores the fact that we actually want FMAs.

Yeah, but I was wondering if later code would sometimes query this
hook for existing FMAs, even if that code wasn't the focus of the patch.
Once we add the distinction between FMAs and other ops, it seemed natural
to test for existing FMAs.

But of course, FMA is an rtl code rather than a tree code (oops), so that
was never going to happen.

> A smart reassociation pass could form more FMAs while also increasing
> parallelism, but the way it currently works always results in fewer FMAs.

Yeah, as Richard said, that seems the right long-term fix.
It would also avoid the hack of treating PLUS_EXPR as a signal
of an FMA, which has the drawback of assuming (for 2-FMA cores)
that plain addition never benefits from reassociation in its own right.
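Below is a minimal sketch of how the width query discussed here could dispatch, in the spirit of GCC's TARGET_SCHED_REASSOCIATION_WIDTH hook. It does not use GCC's real types. tune_params_sketch, reassociation_width_sketch and the SK_* enumerators are hypothetical stand-ins.

The sketch shows the PLUS_EXPR heuristic. The pass only ever passes tree codes and never an FMA, so every FP addition is treated as a potential FMA and gets fma_reassoc_width. Other FP operations use fp_reassoc_width. The field comment is one possible wording of the explanation requested earlier in this mail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC's tree codes and mode classes.  */
enum tree_code_sketch { SK_PLUS_EXPR, SK_MULT_EXPR, SK_MIN_EXPR };
enum mode_class_sketch { SK_MODE_INT, SK_MODE_FLOAT, SK_MODE_VECTOR };

struct tune_params_sketch
{
  int int_reassoc_width;
  int fp_reassoc_width;
  int vec_reassoc_width;
  /* Width for FP additions, which the reassociation pass cannot tell
     apart from future FMAs.  1 keeps serial chains so that FMAs are
     formed (narrow cores with 1 or 2 FP pipes); larger values trade
     fewer FMAs for more parallelism on wide cores.  */
  int fma_reassoc_width;
};

int
reassociation_width_sketch (const struct tune_params_sketch *p,
                            enum tree_code_sketch opc,
                            enum mode_class_sketch mclass)
{
  assert (p != NULL);
  if (mclass == SK_MODE_VECTOR)
    return p->vec_reassoc_width;
  if (mclass == SK_MODE_INT)
    return p->int_reassoc_width;
  /* FMA is an RTL code rather than a tree code, so OPC is never an FMA
     here; assume any FP addition might later become one.  */
  if (opc == SK_PLUS_EXPR)
    return p->fma_reassoc_width;
  return p->fp_reassoc_width;
}
```

Take a hypothetical narrow-core setting with fma_reassoc_width = 1. FP additions stay serial, while FP multiplications can still be reassociated with width 2. That is the assumption that plain addition never benefits from reassociation in its own right.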

Still, I guess the hackiness is pre-existing and the patch removes it
for some cores, so from that point of view it's a strict
improvement over the status quo.  And it's too late in the GCC 13
cycle to do FMA reassociation properly.  So I'm OK with the patch
in principle, but could you post an update with more commentary?

Thanks,
Richard