From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 3756A385E02C for ; Fri, 30 Jul 2021 09:05:20 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 3756A385E02C Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id CA4E61FB; Fri, 30 Jul 2021 02:05:19 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.126]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 3A9933F66F; Fri, 30 Jul 2021 02:05:19 -0700 (PDT) From: Richard Sandiford To: "H.J. Lu" Mail-Followup-To: "H.J. Lu" , "H.J. Lu via Gcc-patches" , Jeff Law , richard.sandiford@arm.com Cc: "H.J. Lu via Gcc-patches" , Jeff Law Subject: Re: [PATCH v4] Add QI vector mode support to by-pieces for memset References: <20210721200216.256776-1-hjl.tools@gmail.com> Date: Fri, 30 Jul 2021 10:05:18 +0100 In-Reply-To: (H. J. Lu's message of "Tue, 27 Jul 2021 20:20:21 -0700") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-6.4 required=5.0 tests=BAYES_00, KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Jul 2021 09:05:22 -0000 "H.J. Lu" writes: > On Tue, Jul 27, 2021 at 8:31 AM H.J. Lu wrote: >> >> On Mon, Jul 26, 2021 at 4:19 PM H.J. Lu wrote: >> > >> > On Mon, Jul 26, 2021 at 3:56 PM H.J. Lu wrote: >> > > >> > > On Mon, Jul 26, 2021 at 2:53 PM Richard Sandiford >> > > wrote: >> > > > >> > > > "H.J. Lu via Gcc-patches" writes: >> > > > > On Mon, Jul 26, 2021 at 11:42 AM Richard Sandiford >> > > > > wrote: >> > > > >> >> > > > >> "H.J. Lu via Gcc-patches" writes: >> > > > >> > +to avoid stack realignment when expanding memset. The defau= lt is >> > > > >> > +@code{gen_reg_rtx}. >> > > > >> > +@end deftypefn >> > > > >> > + >> > > > >> > @deftypefn {Target Hook} unsigned TARGET_LOOP_UNROLL_ADJUST = (unsigned @var{nunroll}, class loop *@var{loop}) >> > > > >> > This target hook returns a new value for the number of times= @var{loop} >> > > > >> > should be unrolled. The parameter @var{nunroll} is the numbe= r of times >> > > > >> > [=E2=80=A6] >> > > > >> > @@ -1446,7 +1511,10 @@ can_store_by_pieces (unsigned HOST_WID= E_INT len, >> > > > >> > max_size =3D STORE_MAX_PIECES + 1; >> > > > >> > while (max_size > 1 && l > 0) >> > > > >> > { >> > > > >> > - scalar_int_mode mode =3D widest_int_mode_for_size (ma= x_size); >> > > > >> > + /* Since this can be called before virtual registers = are ready >> > > > >> > + to use, avoid QI vector mode here. */ >> > > > >> > + fixed_size_mode mode >> > > > >> > + =3D widest_fixed_size_mode_for_size (max_size, fals= e); >> > > > >> >> > > > >> I think I might have asked this before, sorry, but: when is tha= t true >> > > > >> and why does it matter? >> > > > > >> > > > > can_store_by_pieces may be called: >> > > > > >> > > > > value-prof.c: if (!can_store_by_pieces (val, builtin_memset= _read_str, >> > > > > value-prof.c: if (!can_store_by_pieces (val, builtin_memset= _read_str, >> > > > > >> > > > > before virtual registers can be used. When true is passed to >> > > > > widest_fixed_size_mode_for_size, virtual registers may be used >> > > > > to expand memset to broadcast, which leads to ICE. Since for t= he >> > > > > purpose of can_store_by_pieces, we don't need to expand memset >> > > > > to broadcast and pass false here can avoid ICE. >> > > > >> > > > Ah, I see, thanks. >> > > > >> > > > That sounds like a problem in the way that the memset const functi= on is >> > > > written though. can_store_by_pieces is just a query function, so = I don't >> > > > think it should be trying to create new registers for can_store_by= _pieces, >> > > > even if it could. At the same time, can_store_by_pieces should ma= ke the >> > > > same choices as the real expander would. >> > > > >> > > > I think this means that: >> > > > >> > > > - gen_memset_broadcast should be inlined into its callers, with the >> > > > builtin_memset_read_str getting the CONST_INT_P case and >> > > > builtin_memset_gen_str getting the variable case. >> > > > >> > > > - builtin_memset_read_str should then stop at and return the >> > > > gen_const_vec_duplicate when the prev argument is null. >> > >> > This doesn't work since can_store_by_pieces has >> > >> > cst =3D (*constfun) (constfundata, nullptr, offset, m= ode); >> > if (!targetm.legitimate_constant_p (mode, cst)) >> >> We can add a target hook, targetm.legitimate_memset_constant_p, >> which defaults to targetm.legitimate_constant_p. Will it be acceptable? > > In the v5 patch, I changed it to > > cst =3D (*constfun) (constfundata, nullptr, offset, mode= ); > /* All CONST_VECTORs are legitimate if vec_duplicate > is supported. */ Maybe =E2=80=9Ccan be loaded=E2=80=9D rather than =E2=80=9Care legitimate= =E2=80=9D, since they're not necessarily legitimate in the sense of legitimate_constant_p (hence the patch). Also, since we assume elsewhere that vec_duplicate is a precondition for picking a vector mode, I think we should do the same here (and note that in the comment). So=E2=80=A6 > if (!((memsetp > && VECTOR_MODE_P (mode) > && GET_MODE_INNER (mode) =3D=3D QImode > && (optab_handler (vec_duplicate_optab, mode) > !=3D CODE_FOR_nothing)) I think we need only the (memsetp && VECTOR_MODE_P (mode)) check. This feels a bit of a hack TBH. I think the same principles apply to vectors and integers here: forcing the constant to memory is still likely to be an optimisation, but is an extra overhead that we should probably account for. However, I agree this is probably the most practical way forward at the moment. Thanks, Richard > || targetm.legitimate_constant_p (mode, cst))) > return 0; > >> > ix86_legitimate_constant_p only allows 0 or -1 for CONST_VECTOR. >> > can_store_by_pieces doesn't work well for vector modes. >> > >> > > > Only when prev is nonnull should it go on to call the hook >> > > > and copy the constant to the register that the hook returns. >> > > >> > > How about keeping gen_memset_broadcast and passing PREV to it: >> > > >> > > rtx target; >> > > if (CONST_INT_P (data)) >> > > { >> > > rtx const_vec =3D gen_const_vec_duplicate (mode, data); >> > > if (prev =3D=3D NULL) >> > > /* Return CONST_VECTOR when called by a query function. */ >> > > target =3D const_vec; >> > > else >> > > { >> > > /* Use the move expander with CONST_VECTOR. */ >> > > target =3D targetm.gen_memset_scratch_rtx (mode); >> > > emit_move_insn (target, const_vec); >> > > } >> > > } >> > > else >> > > { >> > > target =3D targetm.gen_memset_scratch_rtx (mode); >> > > class expand_operand ops[2]; >> > > create_output_operand (&ops[0], target, mode); >> > > create_input_operand (&ops[1], data, QImode); >> > > expand_insn (icode, 2, ops); >> > > if (!rtx_equal_p (target, ops[0].value)) >> > > emit_move_insn (target, ops[0].value); >> > > } >> > > >> > > > I admit that's uglier than the current version, but it looks like = that's >> > > > what the current interface expects. >> > > > >> > > > Thanks, >> > > > Richard >> > > >> > > >> > > >> > > -- >> > > H.J. >> > >> > >> > >> > -- >> > H.J. >> >> >> >> -- >> H.J.