From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pj1-x102b.google.com (mail-pj1-x102b.google.com [IPv6:2607:f8b0:4864:20::102b]) by sourceware.org (Postfix) with ESMTPS id 3AC013857007 for ; Tue, 20 Jul 2021 13:11:23 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 3AC013857007 Received: by mail-pj1-x102b.google.com with SMTP id gx2so2553830pjb.5 for ; Tue, 20 Jul 2021 06:11:23 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:content-transfer-encoding; bh=sZzfptFZBTnAQCrsvHvXlGN6QrIum6XGJ5xk0hs/cSE=; b=Ki3z7aWZM9DT0fRX0lPaiReGS3/Ml5TB68C637CbIpVWZRgteSxI01r829f+RNYXoW +TXBjBhCUgS3I77k05b8IxUJjaXem1IRtk1nIUwQqWOhqJm0PnjghzPJdeXfVO8j01/V xZn5bDYVdHSHmRaofVVAbKuycJHgNNfVXbmUGeQQLzddo94U3h0Otf2rgffWBeUo/BPv BY0vI1n5qyr5iO7nk2Sb7PLSWzhXlx7vLdBSqJq8P9zdljyBw0CBK7DOTbX5kYbkvWpv ZMCWU1VjJVcvBgHpDSHGAGEygjfQ9kRVbbY+Qq4io+QdFu2svpToSh/OHDSOFFlu8jBy 1NrA== X-Gm-Message-State: AOAM533Sro1TaUY9Zk7t/ILMSyknh5mTqWmgwQ6s8L7dEKDRy//GgvcH HHfgTu3lclx0TQS3WeBKKv3pZR/SYIFt6m90+dXAfd/lJ54= X-Google-Smtp-Source: ABdhPJz3KI/ObbFUyTYY4Xvu/Kl3GoK7V2fe+Wk7Os7PwQLjnGpO0Hi8lugXUh4nRTsUf1EFzIGTY/yUUB8qkTSyEy0= X-Received: by 2002:a17:90a:d254:: with SMTP id o20mr35633696pjw.136.1626786682015; Tue, 20 Jul 2021 06:11:22 -0700 (PDT) MIME-Version: 1.0 References: <20210713214956.2010942-1-hjl.tools@gmail.com> In-Reply-To: From: "H.J. Lu" Date: Tue, 20 Jul 2021 06:10:46 -0700 Message-ID: Subject: Re: [PATCH] Add QI vector mode support to by-pieces for memset To: "H.J. Lu via Gcc-patches" , Jeff Law , Richard Sandiford Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-3024.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 20 Jul 2021 13:11:27 -0000 On Tue, Jul 20, 2021 at 5:48 AM H.J. Lu wrote: > > On Mon, Jul 19, 2021 at 11:38 PM Richard Sandiford > wrote: > > > > "H.J. Lu via Gcc-patches" writes: > > >> > + { > > >> > + /* First generate subreg of word mode if the previous mode= is > > >> > + wider than word mode and word mode is wider than MODE. = */ > > >> > + prev_rtx =3D simplify_gen_subreg (word_mode, prev_rtx, > > >> > + prev_mode, 0); > > >> > + prev_mode =3D word_mode; > > >> > + } > > >> > + if (prev_rtx !=3D nullptr) > > >> > + target =3D simplify_gen_subreg (mode, prev_rtx, prev_mode, 0= ); > > >> > > >> This should be lowpart_subreg, since 0 isn't the right offset for > > >> big-endian targets. Using lowpart_subreg should also avoid the need > > >> for the word_size =E2=80=9Cif=E2=80=9D above: lowpart_subreg can han= dle lowpart subword > > >> subregs of multiword values. > > > > > > I tried it. It didn't work since it caused the LRA failure. I repl= aced > > > simplify_gen_subreg with lowpart_subreg instead. > > > > What specifically went wrong? > > With vector broadcast, for > --- > extern void *ops; > > void > foo (int c) > { > __builtin_memset (ops, c, 18); > } > --- > we generate HI from V16QI. With a single lowpart_subreg, I get > > (insn 10 9 0 2 (set (mem:HI (plus:DI (reg/f:DI 84) > (const_int 16 [0x10])) [0 MEM [(void > *)ops.0_1]+16 S2 A8]) > (subreg:HI (reg:V16QI 51 xmm15) 0)) "s2a.i":6:3 76 {*movhi_intern= al} > (nil)) > > instead of > > (insn 10 9 0 2 (set (mem:HI (plus:DI (reg/f:DI 84) > (const_int 16 [0x10])) [0 MEM [(void > *)ops.0_1]+16 S2 A8]) > (subreg:HI (reg:DI 51 xmm15) 0)) "s2a.i":6:3 76 {*movhi_internal} > (nil)) > > IRA and LRA fail to reload: > > (insn 10 9 0 2 (set (mem:HI (plus:DI (reg/f:DI 84) > (const_int 16 [0x10])) [0 MEM [(void > *)ops.0_1]+16 S2 A8]) > (subreg:HI (reg:V16QI 51 xmm15) 0)) "s2a.i":6:3 76 {*movhi_intern= al} > (nil)) > > since ix86_can_change_mode_class has > > if (MAYBE_SSE_CLASS_P (regclass) || MAYBE_MMX_CLASS_P (regclass)) > { > /* Vector registers do not support QI or HImode loads. If we don't > disallow a change to these modes, reload will assume it's ok to > drop the subreg from (subreg:SI (reg:HI 100) 0). This affects > the vec_dupv4hi pattern. */ > if (GET_MODE_SIZE (from) < 4) > return false; > } Correction. It is ix86_hard_regno_mode_ok which doesn't allow HImode in XMM registers. > If we don't use a hard scratch register, (subreg:HI (reg:V16QI)) compiles= . But > codegen is worse: > > vmovd %edi, %xmm0 > vpbroadcastb %xmm0, %xmm0 > vmovdqa %xmm0, -24(%rsp) > movq ops(%rip), %rax > movzwl -24(%rsp), %edx > vmovdqu %xmm0, (%rax) > movw %dx, 16(%rax) > > vs > > vmovd %edi, %xmm15 > movq ops(%rip), %rax > vpbroadcastb %xmm15, %xmm15 > vmovq %xmm15, %rdx > movw %dx, 16(%rax) > vmovdqu %xmm15, (%rax) > > > >> > +/* Callback routine for store_by_pieces. Read GET_MODE_BITSIZE (= MODE) > > >> > + bytes from constant string DATA + OFFSET and return it as targ= et > > >> > + constant. If PREV isn't nullptr, it has the RTL info from the > > >> > + previous iteration. */ > > >> > + > > >> > +rtx > > >> > +builtin_memset_read_str (void *data, void *prev, > > >> > + HOST_WIDE_INT offset ATTRIBUTE_UNUSED, > > >> > + machine_mode mode) > > >> > +{ > > >> > + rtx target; > > >> > const char *c =3D (const char *) data; > > >> > - char *p =3D XALLOCAVEC (char, GET_MODE_SIZE (mode)); > > >> > + char *p; > > >> > + unsigned int size =3D GET_MODE_SIZE (mode).to_constant (); > > >> > + > > >> > + /* Don't use the previous value if size is 1. */ > > >> > > >> Why not though? The existing code does, and that seems like the rig= ht > > >> thing to do when operating on integers. > > >> > > >> I can see the check would be a good thing to do if MODE isn't a vect= or > > >> mode and the previous mode was. Is that the case you're worried abo= ut? > > >> If so, that check could go in gen_memset_value_from_prev instead. > > > > > > We are storing one byte. Doing it directly is faster. > > > > But the first thing being protected here is=E2=80=A6 > > > > >> > + if (size !=3D 1) > > >> > + { > > >> > + target =3D gen_memset_value_from_prev (prev, mode); > > >> > + if (target !=3D nullptr) > > >> > + return target; > > > > =E2=80=A6this attempt to use the previous value. If the target uses, s= ay, > > SImode for the first piece and QImode for a final byte, using the QImod= e > > lowpart of the SImode register would avoid having to move the byte valu= e > > into a separate QImode register. Why's that a bad thing to do? AFAICT > > it's what the current code would do, so if we want to change it even fo= r > > integer modes, I think it should be a separate patch with a separate > > justification. > > I removed the size =3D=3D 1 check. I didn't notice any issues. > > > Like I say, I can understand that using the QImode lowpart of a vector > > wouldn't be a good idea. But if that's specifically what you're trying > > to prevent, I think we should test for it. > > > > Thanks, > > Richard > > I will submit the v3 patch. > > Thanks. > > -- > H.J. --=20 H.J.