Hi all, I revised my patch according to all your reviews. Regtested on x86_64-pc-linux-gnu. BRs, Haochen > -----Original Message----- > From: Liu, Hongtao > Sent: Thursday, June 30, 2022 4:57 PM > To: Uros Bizjak ; Jiang, Haochen > > Cc: gcc-patches@gcc.gnu.org > Subject: RE: [PATCH] i386: Extend cvtps2pd to memory > > > > > -----Original Message----- > > From: Uros Bizjak > > Sent: Thursday, June 30, 2022 4:53 PM > > To: Jiang, Haochen > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao > > Subject: Re: [PATCH] i386: Extend cvtps2pd to memory > > > > On Thu, Jun 30, 2022 at 10:45 AM Uros Bizjak wrote: > > > > > > On Thu, Jun 30, 2022 at 9:41 AM Uros Bizjak wrote: > > > > > > > > On Thu, Jun 30, 2022 at 9:24 AM Jiang, Haochen > > > wrote: > > > > > > > > > > > -----Original Message----- > > > > > > From: Uros Bizjak > > > > > > Sent: Thursday, June 30, 2022 2:20 PM > > > > > > To: Jiang, Haochen > > > > > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao > > > > > > > > > > > > Subject: Re: [PATCH] i386: Extend cvtps2pd to memory > > > > > > > > > > > > On Thu, Jun 30, 2022 at 7:59 AM Haochen Jiang > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > This patch aims to fix the cvtps2pd insn, which should also > > > > > > > work on memory operand but currently does not. After this fix, > > > > > > > when loop == 2, it will eliminate movq instruction. > > > > > > > > > > > > > > Regtested on x86_64-pc-linux-gnu. Ok for trunk? > > > > > > > > > > > > > > BRs, > > > > > > > Haochen > > > > > > > > > > > > > > gcc/ChangeLog: > > > > > > > > > > > > > > PR target/43618 > > > > > > > * config/i386/sse.md (extendv2sfv2df2): New define_expand. > > > > > > > (sse2_cvtps2pd_load): Rename > extendvsdfv2df2. > > > > > > Rename FROM ... > > > > > > Please also mention change to sse2_cvtps2pd. > > > > > > > > > > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > > > > > > > > > PR target/43618 > > > > > > > * gcc.target/i386/pr43618-1.c: New test. > > > > > > > > > > > > This patch could be as simple as: > > > > > > > > > > > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md > > > > > > index 8cd0f617bf3..c331445cb2d 100644 > > > > > > --- a/gcc/config/i386/sse.md > > > > > > +++ b/gcc/config/i386/sse.md > > > > > > @@ -9195,7 +9195,7 @@ > > > > > > (define_insn "extendv2sfv2df2" > > > > > > [(set (match_operand:V2DF 0 "register_operand" "=v") > > > > > > (float_extend:V2DF > > > > > > - (match_operand:V2SF 1 "register_operand" "v")))] > > > > > > + (match_operand:V2SF 1 "nonimmediate_operand" "vm")))] > > > > > > "TARGET_MMX_WITH_SSE" > > > > > > "%vcvtps2pd\t{%1, %0|%0, %1}" > > > > > > [(set_attr "type" "ssecvt") > > > > > > > > > > We also tested on this version, it is ok. > > > > > > > > > > The reason why the patch looks like this is because in the > > > > > previous insn sse2_cvtps2pd, the constraint vm and > > > > > vector_operand actually does not match the actual instruction. > > > > > Memory operand is V2SF, not V4SF. > > > > > > > > > > Therefore, we changed the constraint in that insn. Then it caused > another > > issue. > > > > > For memory operand, it seems that we cannot generate those mask > > instructions. > > > > > So I change the pattern to how extendv2hfv2df2 works. > > > > > > > > If you want to change the memory access in > sse2_cvtps2pd, > > > > then please see how e.g. v2hiv2di is handled in sse.md. In > > > > addition to two instructions, you will need one > > > > define_insn_and_split with a pre-reload splitter. > > > > > > Oh, nowadays combine does vec_select from a paradoxical subreg on its > own. > > > > > > +(define_expand "extendv2sfv2df2" > > > + [(set (match_operand:V2DF 0 "register_operand") > > > + (float_extend:V2DF > > > + (match_operand:V2SF 1 "nonimmediate_operand")))] > > > + "TARGET_MMX_WITH_SSE" > > > +{ > > > + if (!MEM_P (operands[1])) > > > + { > > > > > > You will need force reg here: > > > > > > rtx op1 = force_reg (V2SFmode, operands[1]); > > > + operands[1] = lowpart_subreg (V4SFmode, op1, V2SFmode); > > > + emit_insn (gen_sse2_cvtps2pd (operands[0], operands[1])); > > > + DONE; > > > + } > > > +}) > > > > > > > > > -(define_insn "extendv2sfv2df2" > > > +(define_insn "sse2_cvtps2pd_load" > > > > > > Please name this insn "*sse2_cvtps2pd_1". Please note > the > > > star at the beginning, You don't have to make the name public. > > > > > > OK with the above changes. > > > > Forgot to mention: > > > > > > - (match_operand:V2SF 1 "register_operand" "v")))] > > - "TARGET_MMX_WITH_SSE" > > - "%vcvtps2pd\t{%1, %0|%0, %1}" > > + (match_operand:V2SF 1 "memory_operand" "m")))] > > + "TARGET_MMX_WITH_SSE && " > > + "%vcvtps2pd\t{%1, %0|%0 > and2>, %q1}" > > [(set_attr "type" "ssecvt") > > > > The new insn does not need to be limited to TARGET_MMX_WITH_SSE, so > we > > can use TARGET_SSE2 here. > > > > Which opens the question if the expander could also be TARGET_SSE2 only. > > There are no MMX registers involved in any of the patterns anymore. > Yes. > > > > Uros. > > > > > > Thanks, > > > Uros,