on 2021/7/15 下午7:58, Richard Biener wrote: > On Thu, Jul 15, 2021 at 10:41 AM Kewen.Lin wrote: >> >> on 2021/7/15 下午4:04, Kewen.Lin via Gcc-patches wrote: >>> Hi Uros, >>> >>> on 2021/7/15 下午3:17, Uros Bizjak wrote: >>>> On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin wrote: >>>>> >>>>> on 2021/7/14 下午3:45, Kewen.Lin via Gcc-patches wrote: >>>>>> on 2021/7/14 下午2:38, Richard Biener wrote: >>>>>>> On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin wrote: >>>>>>>> >>>>>>>> on 2021/7/13 下午8:42, Richard Biener wrote: >>>>>>>>> On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin wrote: >>>>>>> >>>>>>>> I guess the proposed IFN would be directly mapped for [us]mul_highpart? >>>>>>> >>>>>>> Yes. >>>>>>> >>>>>> >>>>>> Thanks for confirming! The related patch v2 is attached and the testing >>>>>> is ongoing. >>>>>> >>>>> >>>>> It's bootstrapped & regtested on powerpc64le-linux-gnu P9 and >>>>> aarch64-linux-gnu. But on x86_64-redhat-linux there are XPASSes as below: >>>>> >>>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw >>>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw >>>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw >>>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw >>>> >>>> These XFAILs should be removed after your patch. >>>> >>> I'm curious whether it's intentional not to specify -fno-vect-cost-model >>> for this test case. As noted above, this case is sensitive on how we >>> cost mult_highpart. Without cost modeling, the XFAILs can be removed >>> only with this mul_highpart pattern support, no matter how we model it >>> (x86 part of this patch exists or not). >>> >>>> This is PR100696 [1], we want PMULH.W here, so x86 part of the patch >>>> is actually not needed. >>>> >>> >>> Thanks for the information! The justification for the x86 part is that: >>> the IFN_MULH essentially covers MULT_HIGHPART_EXPR with mul_highpart >>> optab support, i386 port has already customized costing for >>> MULT_HIGHPART_EXPR (should mean/involve the case with mul_highpart optab >>> support), if we don't follow the same way for IFN_MULH, I'm worried that >>> we may cost the IFN_MULH wrongly. If taking IFN_MULH as normal stmt is >>> a right thing (we shouldn't cost it specially), it at least means we >>> have to adjust ix86_multiplication_cost for MULT_HIGHPART_EXPR when it >>> has direct mul_highpart optab support, I think they should be costed >>> consistently. Does it sound reasonable? >>> >> >> Hi Richard(s), >> >> This possibly inconsistent handling problem seems like a counter example >> better to use a new IFN rather than the existing tree_code, it seems hard >> to maintain (should remember to keep consistent for its handlings). ;) >> From this perspective, maybe it's better to move backward to use tree_code >> and guard it under can_mult_highpart_p == 1 (just like IFN and avoid >> costing issue Richi pointed out before)? >> >> What do you think? > > No, whenever we want to do code generation based on machine > capabilities the canonical way to test for those is to look at optabs > and then it's most natural to keep that 1:1 relation and emit > internal function calls which directly map to supported optabs > instead of going back to some tree codes. > > When targets "lie" and provide expanders for something they can > only emulate then they have to compensate in their costing. > But as I understand this isn't the case for x86 here. > > Now, in this case we already have the MULT_HIGHPART_EXPR tree, > so yes, it might make sense to use that instead of introducing an > alternate way via the direct internal function. Somebody decided > that MULT_HIGHPART is generic enough to warrant this - but I > see that expand_mult_highpart can fail unless can_mult_highpart_p > and this is exactly one of the cases we want to avoid - either > we can handle something generally in which case it can be a > tree code or we can't, then it should be 1:1 tied to optabs at best > (mult_highpart has scalar support only for the direct optab, > vector support also for widen_mult). > Thanks for the detailed explanation! The attached v4 follows the preferred IFN way like v3, just with extra test case updates. Bootstrapped & regtested again on powerpc64le-linux-gnu P9, x86_64-redhat-linux and aarch64-linux-gnu. Is it ok for trunk? BR, Kewen ----- gcc/ChangeLog: PR tree-optimization/100696 * internal-fn.c (first_commutative_argument): Add info for IFN_MULH. * internal-fn.def (IFN_MULH): New internal function. * tree-vect-patterns.c (vect_recog_mulhs_pattern): Add support to recog normal multiply highpart as IFN_MULH. * config/i386/i386.c (ix86_add_stmt_cost): Adjust for combined function CFN_MULH. gcc/testsuite/ChangeLog: PR tree-optimization/100696 * gcc.target/i386/pr100637-3w.c: Adjust for mul_highpart recog.