on 2021/7/15 下午7:58, Richard Biener wrote:
> On Thu, Jul 15, 2021 at 10:41 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>
>> on 2021/7/15 下午4:04, Kewen.Lin via Gcc-patches wrote:
>>> Hi Uros,
>>>
>>> on 2021/7/15 下午3:17, Uros Bizjak wrote:
>>>> On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>>>
>>>>> on 2021/7/14 下午3:45, Kewen.Lin via Gcc-patches wrote:
>>>>>> on 2021/7/14 下午2:38, Richard Biener wrote:
>>>>>>> On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>>>>>>
>>>>>>>> on 2021/7/13 下午8:42, Richard Biener wrote:
>>>>>>>>> On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>>>>>
>>>>>>>> I guess the proposed IFN would be directly mapped for [us]mul_highpart?
>>>>>>>
>>>>>>> Yes.
>>>>>>>
>>>>>>
>>>>>> Thanks for confirming!  The related patch v2 is attached and the testing
>>>>>> is ongoing.
>>>>>>
>>>>>
>>>>> It's bootstrapped & regtested on powerpc64le-linux-gnu P9 and
>>>>> aarch64-linux-gnu.  But on x86_64-redhat-linux there are XPASSes as below:
>>>>>
>>>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
>>>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
>>>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
>>>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
>>>>
>>>> These XFAILs should be removed after your patch.
>>>>
>>> I'm curious whether it's intentional not to specify -fno-vect-cost-model
>>> for this test case.  As noted above, this case is sensitive on how we
>>> cost mult_highpart.  Without cost modeling, the XFAILs can be removed
>>> only with this mul_highpart pattern support, no matter how we model it
>>> (x86 part of this patch exists or not).
>>>
>>>> This is PR100696 [1], we want PMULH.W here, so x86 part of the patch
>>>> is actually not needed.
>>>>
>>>
>>> Thanks for the information!  The justification for the x86 part is that:
>>> the IFN_MULH essentially covers MULT_HIGHPART_EXPR with mul_highpart
>>> optab support, i386 port has already customized costing for
>>> MULT_HIGHPART_EXPR (should mean/involve the case with mul_highpart optab
>>> support), if we don't follow the same way for IFN_MULH, I'm worried that
>>> we may cost the IFN_MULH wrongly.  If taking IFN_MULH as normal stmt is
>>> a right thing (we shouldn't cost it specially), it at least means we
>>> have to adjust ix86_multiplication_cost for MULT_HIGHPART_EXPR when it
>>> has direct mul_highpart optab support, I think they should be costed
>>> consistently.  Does it sound reasonable?
>>>
>>
>> Hi Richard(s),
>>
>> This possibly inconsistent handling problem seems like a counter example
>> better to use a new IFN rather than the existing tree_code, it seems hard
>> to maintain (should remember to keep consistent for its handlings).  ;)
>> From this perspective, maybe it's better to move backward to use tree_code
>> and guard it under can_mult_highpart_p == 1 (just like IFN and avoid
>> costing issue Richi pointed out before)?
>>
>> What do you think?
> 
> No, whenever we want to do code generation based on machine
> capabilities the canonical way to test for those is to look at optabs
> and then it's most natural to keep that 1:1 relation and emit
> internal function calls which directly map to supported optabs
> instead of going back to some tree codes.
> 
> When targets "lie" and provide expanders for something they can
> only emulate then they have to compensate in their costing.
> But as I understand this isn't the case for x86 here.
> 
> Now, in this case we already have the MULT_HIGHPART_EXPR tree,
> so yes, it might make sense to use that instead of introducing an
> alternate way via the direct internal function.  Somebody decided
> that MULT_HIGHPART is generic enough to warrant this - but I
> see that expand_mult_highpart can fail unless can_mult_highpart_p
> and this is exactly one of the cases we want to avoid - either
> we can handle something generally in which case it can be a
> tree code or we can't, then it should be 1:1 tied to optabs at best
> (mult_highpart has scalar support only for the direct optab,
> vector support also for widen_mult).
> 

Thanks for the detailed explanation!  The attached v4 follows the
preferred IFN way like v3, just with extra test case updates.

Bootstrapped & regtested again on powerpc64le-linux-gnu P9,
x86_64-redhat-linux and aarch64-linux-gnu.

Is it ok for trunk?

BR,
Kewen
-----
gcc/ChangeLog:

	PR tree-optimization/100696
	* internal-fn.c (first_commutative_argument): Add info for IFN_MULH.
	* internal-fn.def (IFN_MULH): New internal function.
	* tree-vect-patterns.c (vect_recog_mulhs_pattern): Add support to
	recog normal multiply highpart as IFN_MULH.
	* config/i386/i386.c (ix86_add_stmt_cost): Adjust for combined
	function CFN_MULH.

gcc/testsuite/ChangeLog:

	PR tree-optimization/100696
	* gcc.target/i386/pr100637-3w.c: Adjust for mul_highpart recog.