From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id CBEDB385AC19 for ; Tue, 20 Jul 2021 14:26:22 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org CBEDB385AC19 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 6762E1FB; Tue, 20 Jul 2021 07:26:22 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.126]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id CA4193F694; Tue, 20 Jul 2021 07:26:21 -0700 (PDT) From: Richard Sandiford To: "H.J. Lu via Gcc-patches" Mail-Followup-To: "H.J. Lu via Gcc-patches" , Jeff Law , "H.J. Lu" , richard.sandiford@arm.com Cc: Jeff Law , "H.J. Lu" Subject: Re: [PATCH] Add QI vector mode support to by-pieces for memset References: <20210713214956.2010942-1-hjl.tools@gmail.com> Date: Tue, 20 Jul 2021 15:26:20 +0100 In-Reply-To: (H. J. Lu via Gcc-patches's message of "Tue, 20 Jul 2021 05:48:35 -0700") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-6.4 required=5.0 tests=BAYES_00, KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 20 Jul 2021 14:26:25 -0000 "H.J. Lu via Gcc-patches" writes: > On Mon, Jul 19, 2021 at 11:38 PM Richard Sandiford > wrote: >> >> "H.J. Lu via Gcc-patches" writes: >> >> > + { >> >> > + /* First generate subreg of word mode if the previous mode = is >> >> > + wider than word mode and word mode is wider than MODE. = */ >> >> > + prev_rtx =3D simplify_gen_subreg (word_mode, prev_rtx, >> >> > + prev_mode, 0); >> >> > + prev_mode =3D word_mode; >> >> > + } >> >> > + if (prev_rtx !=3D nullptr) >> >> > + target =3D simplify_gen_subreg (mode, prev_rtx, prev_mode, 0); >> >> >> >> This should be lowpart_subreg, since 0 isn't the right offset for >> >> big-endian targets. Using lowpart_subreg should also avoid the need >> >> for the word_size =E2=80=9Cif=E2=80=9D above: lowpart_subreg can hand= le lowpart subword >> >> subregs of multiword values. >> > >> > I tried it. It didn't work since it caused the LRA failure. I repla= ced >> > simplify_gen_subreg with lowpart_subreg instead. >> >> What specifically went wrong? > > With vector broadcast, for > --- > extern void *ops; > > void > foo (int c) > { > __builtin_memset (ops, c, 18); > } > --- > we generate HI from V16QI. With a single lowpart_subreg, I get > > (insn 10 9 0 2 (set (mem:HI (plus:DI (reg/f:DI 84) > (const_int 16 [0x10])) [0 MEM [(void > *)ops.0_1]+16 S2 A8]) > (subreg:HI (reg:V16QI 51 xmm15) 0)) "s2a.i":6:3 76 {*movhi_intern= al} > (nil)) > > instead of > > (insn 10 9 0 2 (set (mem:HI (plus:DI (reg/f:DI 84) > (const_int 16 [0x10])) [0 MEM [(void > *)ops.0_1]+16 S2 A8]) > (subreg:HI (reg:DI 51 xmm15) 0)) "s2a.i":6:3 76 {*movhi_internal} > (nil)) > > IRA and LRA fail to reload: > > (insn 10 9 0 2 (set (mem:HI (plus:DI (reg/f:DI 84) > (const_int 16 [0x10])) [0 MEM [(void > *)ops.0_1]+16 S2 A8]) > (subreg:HI (reg:V16QI 51 xmm15) 0)) "s2a.i":6:3 76 {*movhi_intern= al} > (nil)) > > since ix86_can_change_mode_class has > > if (MAYBE_SSE_CLASS_P (regclass) || MAYBE_MMX_CLASS_P (regclass)) > { > /* Vector registers do not support QI or HImode loads. If we don't > disallow a change to these modes, reload will assume it's ok to > drop the subreg from (subreg:SI (reg:HI 100) 0). This affects > the vec_dupv4hi pattern. */ > if (GET_MODE_SIZE (from) < 4) > return false; > } Ah! OK. In that case, maybe we should have something like: if (REG_P (prev_rtx) && HARD_REGISTER_P (prev_rtx) && REG_CAN_CHANGE_MODE_P (REGNO (prev_rtx), prev->mode, mode)) prev_rtx =3D copy_to_reg (prev_rtx); and then just have the single lowpart_subreg after that. Thanks, Richard