From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=9VWM=6X=arm.com=richard.sandiford@sourceware.org>
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
	by sourceware.org (Postfix) with ESMTP id CD0773858D32
	for <gcc-patches@gcc.gnu.org>; Mon, 27 Feb 2023 21:33:04 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org CD0773858D32
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 9E38DC14;
	Mon, 27 Feb 2023 13:33:47 -0800 (PST)
Received: from localhost (e121540-lin.manchester.arm.com [10.32.99.50])
	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id AA11D3F99C;
	Mon, 27 Feb 2023 13:33:03 -0800 (PST)
From: Richard Sandiford <richard.sandiford@arm.com>
To: Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>
Mail-Followup-To: Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>,Tamar Christina <Tamar.Christina@arm.com>,  nd <nd@arm.com>,  "rguenther\@suse.de" <rguenther@suse.de>,  "jlaw\@ventanamicro.com" <jlaw@ventanamicro.com>, richard.sandiford@arm.com
Cc: Tamar Christina <Tamar.Christina@arm.com>,  nd <nd@arm.com>,  "rguenther\@suse.de" <rguenther@suse.de>,  "jlaw\@ventanamicro.com" <jlaw@ventanamicro.com>
Subject: Re: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask by using new optabs [PR108583]
References: <patch-16909-tamar@arm.com> <mptzg9l4p7n.fsf@arm.com>
	<VI1PR08MB53257CBBF08FD81509AED8ACFFDE9@VI1PR08MB5325.eurprd08.prod.outlook.com>
	<mptwn4p384e.fsf@arm.com>
	<VI1PR08MB53255CE1AC5F885D2660E3CFFFAF9@VI1PR08MB5325.eurprd08.prod.outlook.com>
	<mptfsarux3d.fsf@arm.com>
	<VI1PR08MB5325BCA53FA5F1C80A003843FFAF9@VI1PR08MB5325.eurprd08.prod.outlook.com>
Date: Mon, 27 Feb 2023 21:33:02 +0000
In-Reply-To: <VI1PR08MB5325BCA53FA5F1C80A003843FFAF9@VI1PR08MB5325.eurprd08.prod.outlook.com>
	(Tamar Christina via Gcc-patches's message of "Mon, 27 Feb 2023
	12:14:05 +0000")
Message-ID: <mptcz5upze9.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Status: No, score=-28.6 required=5.0 tests=BAYES_00,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=no autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Sent: Monday, February 27, 2023 12:12 PM
>> To: Tamar Christina <Tamar.Christina@arm.com>
>> Cc: Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>; nd
>> <nd@arm.com>; rguenther@suse.de; jlaw@ventanamicro.com
>> Subject: Re: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask
>> by using new optabs [PR108583]
>> 
>> Tamar Christina <Tamar.Christina@arm.com> writes:
>> > Hi,
>> >
>> >> > I avoided open coding it with add and shift because it creates a 4
>> >> > instructions (and shifts which are typically slow) dependency chain
>> >> > instead of a load and multiply.  This change, unless the target is
>> >> > known to optimize it further is unlikely to be beneficial.  And by
>> >> > the time we get to costing the only alternative is to undo the
>> >> > existing pattern and
>> >> so you lose the general shift optimization.
>> >> >
>> >> > So it seemed unwise to open code as shifts, given the codegen out
>> >> > of the vectorizer would be degenerate for most targets or one needs
>> >> > the more complicated route of costing during pattern matching already.
>> >>
>> >> Hmm, OK.  That seems like a cost-model thing though, rather than
>> >> something that should be exposed through optabs.  And I imagine the
>> >> open-coded version would still be better than nothing on targets without
>> highpart multiply.
>> >>
>> >> So how about replacing the hook with one that simply asks whether
>> >> division through highpart multiplication is preferred over the add/shift
>> sequence?
>> >> (Unfortunately it's not going to be possible to work that out from
>> >> existing
>> >> information.)
>> >
>> > So this doesn't work for SVE.  For SVE the multiplication widening
>> > pass introduces FMAs at gimple level.  So in the cases where the
>> > operation is fed from a widening multiplication we end up generating FMA.
>> If that was it I could have matched FMA.
>> >
>> > But it also pushes the multiplication in the second operand because it
>> > no longer has a mul to share the results with.
>> >
>> > In any case, the gimple code is transformed into
>> >
>> > vect__3.8_122 = .MASK_LOAD (_29, 8B, loop_mask_121);
>> > vect_patt_57.9_123 = (vector([8,8]) unsigned short) vect__3.8_122;
>> > vect_patt_64.11_127 = .FMA (vect_patt_57.9_123, vect_cst__124, { 257,
>> > ... });
>> > vect_patt_65.12_128 = vect_patt_64.11_127 >> 8;
>> > vect_patt_66.13_129 = .FMA (vect_patt_57.9_123, vect_cst__124,
>> > vect_patt_65.12_128);
>> > vect_patt_62.14_130 = vect_patt_66.13_129 >> 8;
>> > vect_patt_68.15_131 = (vector([8,8]) unsigned charD.21)
>> > vect_patt_62.14_130;
>> >
>> > This transformation is much worse than the original code, it extended
>> > the dependency chain with another expensive instruction. I can try to
>> > correct this in RTL by matching FMA and shift and splitting into MUL +
>> ADDHNB and hope CSE takes care of the extra mul.
>> >
>> > But this seems like a hack, and it's basically undoing the earlier
>> > transformation.  It seems to me that the open coding is a bad idea.
>> 
>> Could you post the patch that gives this result?  I'll have a poke around.
>
> Sure, I'll post the new series, it needs all of them.

Thanks.  Which testcase did you use to get the above?

But since SVE does have highpart multiply, and since the assumption for
SVE is that MULH+shift is better than ADD*3+shift*2, shouldn't SVE just
be one of the targets for which the hook that "asks whether division
through highpart multiplication is preferred over the add/shift
sequence" returns true?

For extra conservativeness, we could make the hook default to true
and explicitly return false for Advanced SIMD and for SVE2.

Richard