From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 447F9385800E for ; Tue, 20 Jul 2021 15:12:22 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 447F9385800E Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id E37FF31B; Tue, 20 Jul 2021 08:12:21 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.126]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 528753F694; Tue, 20 Jul 2021 08:12:21 -0700 (PDT) From: Richard Sandiford To: "H.J. Lu via Gcc-patches" Mail-Followup-To: "H.J. Lu via Gcc-patches" , Jeff Law , "H.J. Lu" , richard.sandiford@arm.com Cc: Jeff Law , "H.J. Lu" Subject: Re: [PATCH] Add QI vector mode support to by-pieces for memset References: <20210713214956.2010942-1-hjl.tools@gmail.com> Date: Tue, 20 Jul 2021 16:12:20 +0100 In-Reply-To: (Richard Sandiford via Gcc-patches's message of "Tue, 20 Jul 2021 15:26:20 +0100") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-6.4 required=5.0 tests=BAYES_00, KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 20 Jul 2021 15:12:23 -0000 Richard Sandiford via Gcc-patches writes: > "H.J. Lu via Gcc-patches" writes: >> On Mon, Jul 19, 2021 at 11:38 PM Richard Sandiford >> wrote: >>> >>> "H.J. Lu via Gcc-patches" writes: >>> >> > + { >>> >> > + /* First generate subreg of word mode if the previous mode= is >>> >> > + wider than word mode and word mode is wider than MODE. = */ >>> >> > + prev_rtx =3D simplify_gen_subreg (word_mode, prev_rtx, >>> >> > + prev_mode, 0); >>> >> > + prev_mode =3D word_mode; >>> >> > + } >>> >> > + if (prev_rtx !=3D nullptr) >>> >> > + target =3D simplify_gen_subreg (mode, prev_rtx, prev_mode, 0= ); >>> >> >>> >> This should be lowpart_subreg, since 0 isn't the right offset for >>> >> big-endian targets. Using lowpart_subreg should also avoid the need >>> >> for the word_size =E2=80=9Cif=E2=80=9D above: lowpart_subreg can han= dle lowpart subword >>> >> subregs of multiword values. >>> > >>> > I tried it. It didn't work since it caused the LRA failure. I repl= aced >>> > simplify_gen_subreg with lowpart_subreg instead. >>> >>> What specifically went wrong? >> >> With vector broadcast, for >> --- >> extern void *ops; >> >> void >> foo (int c) >> { >> __builtin_memset (ops, c, 18); >> } >> --- >> we generate HI from V16QI. With a single lowpart_subreg, I get >> >> (insn 10 9 0 2 (set (mem:HI (plus:DI (reg/f:DI 84) >> (const_int 16 [0x10])) [0 MEM [(void >> *)ops.0_1]+16 S2 A8]) >> (subreg:HI (reg:V16QI 51 xmm15) 0)) "s2a.i":6:3 76 {*movhi_inter= nal} >> (nil)) >> >> instead of >> >> (insn 10 9 0 2 (set (mem:HI (plus:DI (reg/f:DI 84) >> (const_int 16 [0x10])) [0 MEM [(void >> *)ops.0_1]+16 S2 A8]) >> (subreg:HI (reg:DI 51 xmm15) 0)) "s2a.i":6:3 76 {*movhi_internal} >> (nil)) >> >> IRA and LRA fail to reload: >> >> (insn 10 9 0 2 (set (mem:HI (plus:DI (reg/f:DI 84) >> (const_int 16 [0x10])) [0 MEM [(void >> *)ops.0_1]+16 S2 A8]) >> (subreg:HI (reg:V16QI 51 xmm15) 0)) "s2a.i":6:3 76 {*movhi_inter= nal} >> (nil)) >> >> since ix86_can_change_mode_class has >> >> if (MAYBE_SSE_CLASS_P (regclass) || MAYBE_MMX_CLASS_P (regclass)) >> { >> /* Vector registers do not support QI or HImode loads. If we don't >> disallow a change to these modes, reload will assume it's ok to >> drop the subreg from (subreg:SI (reg:HI 100) 0). This affects >> the vec_dupv4hi pattern. */ >> if (GET_MODE_SIZE (from) < 4) >> return false; >> } > > Ah! OK. In that case, maybe we should have something like: > > if (REG_P (prev_rtx) > && HARD_REGISTER_P (prev_rtx) > && REG_CAN_CHANGE_MODE_P (REGNO (prev_rtx), prev->mode, mode)) Sorry, make that last line: && lowpart_subreg_regno (REGNO (prev_rtx), prev->mode, mode) < 0 where lowpart_subreg_regno is a new wrapper around simplify_subreg_regno that uses subreg_lowpart_offset (mode, prev->mode) as the offset. Thanks, Richard > prev_rtx =3D copy_to_reg (prev_rtx); > > and then just have the single lowpart_subreg after that. > > Thanks, > Richard