From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 6847A3857424 for ; Wed, 21 Jul 2021 19:42:55 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 6847A3857424 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 122BAD6E; Wed, 21 Jul 2021 12:42:55 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.126]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 518243F66F; Wed, 21 Jul 2021 12:42:54 -0700 (PDT) From: Richard Sandiford To: "H.J. Lu via Gcc-patches" Mail-Followup-To: "H.J. Lu via Gcc-patches" , Richard Biener , Jeff Law , "H.J. Lu" , richard.sandiford@arm.com Cc: Richard Biener , Jeff Law , "H.J. Lu" Subject: Re: [PATCH v3] Add QI vector mode support to by-pieces for memset References: <20210720185058.244593-1-hjl.tools@gmail.com> Date: Wed, 21 Jul 2021 20:42:53 +0100 In-Reply-To: (Richard Sandiford's message of "Wed, 21 Jul 2021 20:20:55 +0100") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 21 Jul 2021 19:42:57 -0000 Richard Sandiford writes: > "H.J. Lu via Gcc-patches" writes: >> On Wed, Jul 21, 2021 at 7:50 AM Richard Sandiford >> wrote: >>> >>> "H.J. Lu" writes: >>> > diff --git a/gcc/builtins.c b/gcc/builtins.c >>> > index 39ab139b7e1..1972301ce3c 100644 >>> > --- a/gcc/builtins.c >>> > +++ b/gcc/builtins.c >>> > @@ -3890,13 +3890,16 @@ expand_builtin_strnlen (tree exp, rtx target, machine_mode target_mode) >>> > >>> > static rtx >>> > builtin_memcpy_read_str (void *data, void *, HOST_WIDE_INT offset, >>> > - scalar_int_mode mode) >>> > + fixed_size_mode mode) >>> > { >>> > /* The REPresentation pointed to by DATA need not be a nul-terminated >>> > string but the caller guarantees it's large enough for MODE. */ >>> > const char *rep = (const char *) data; >>> > >>> > - return c_readstr (rep + offset, mode, /*nul_terminated=*/false); >>> > + /* NB: Vector mode in the by-pieces infrastructure is only used by >>> > + the memset expander. */ >>> >>> Sorry to nitpick, but I guess this might get out out-of-date. Maybe: >>> >>> /* The by-pieces infrastructure does not try to pick a vector mode >>> for memcpy expansion. */ >> >> Fixed. >> >>> > + return c_readstr (rep + offset, as_a (mode), >>> > + /*nul_terminated=*/false); >>> > } >>> > >>> > /* LEN specify length of the block of memcpy/memset operation. >>> > @@ -6478,14 +6481,16 @@ expand_builtin_stpncpy (tree exp, rtx) >>> > >>> > rtx >>> > builtin_strncpy_read_str (void *data, void *, HOST_WIDE_INT offset, >>> > - scalar_int_mode mode) >>> > + fixed_size_mode mode) >>> > { >>> > const char *str = (const char *) data; >>> > >>> > if ((unsigned HOST_WIDE_INT) offset > strlen (str)) >>> > return const0_rtx; >>> > >>> > - return c_readstr (str + offset, mode); >>> > + /* NB: Vector mode in the by-pieces infrastructure is only used by >>> > + the memset expander. */ >>> >>> Similarly here for strncpy expansion. >> >> Fixed. >> >>> > + return c_readstr (str + offset, as_a (mode)); >>> > } >>> > >>> > /* Helper to check the sizes of sequences and the destination of calls >>> > @@ -6686,30 +6691,117 @@ expand_builtin_strncpy (tree exp, rtx target) >>> > return NULL_RTX; >>> > } >>> > >>> > -/* Callback routine for store_by_pieces. Read GET_MODE_BITSIZE (MODE) >>> > - bytes from constant string DATA + OFFSET and return it as target >>> > - constant. If PREV isn't nullptr, it has the RTL info from the >>> > +/* Return the RTL of a register in MODE generated from PREV in the >>> > previous iteration. */ >>> > >>> > -rtx >>> > -builtin_memset_read_str (void *data, void *prevp, >>> > - HOST_WIDE_INT offset ATTRIBUTE_UNUSED, >>> > - scalar_int_mode mode) >>> > +static rtx >>> > +gen_memset_value_from_prev (by_pieces_prev *prev, fixed_size_mode mode) >>> > { >>> > - by_pieces_prev *prev = (by_pieces_prev *) prevp; >>> > + rtx target = nullptr; >>> > if (prev != nullptr && prev->data != nullptr) >>> > { >>> > /* Use the previous data in the same mode. */ >>> > if (prev->mode == mode) >>> > return prev->data; >>> > + >>> > + fixed_size_mode prev_mode = prev->mode; >>> > + >>> > + /* Don't use the previous data to write QImode if it is in a >>> > + vector mode. */ >>> > + if (VECTOR_MODE_P (prev_mode) && mode == QImode) >>> > + return target; >>> > + >>> > + rtx prev_rtx = prev->data; >>> > + >>> > + if (REG_P (prev_rtx) >>> > + && HARD_REGISTER_P (prev_rtx) >>> > + && lowpart_subreg_regno (REGNO (prev_rtx), prev_mode, mode) < 0) >>> > + { >>> > + /* If we can't put a hard register in MODE, first generate a >>> > + subreg of word mode if the previous mode is wider than word >>> > + mode and word mode is wider than MODE. */ >>> > + if (UNITS_PER_WORD < GET_MODE_SIZE (prev_mode) >>> > + && UNITS_PER_WORD > GET_MODE_SIZE (mode)) >>> > + { >>> > + prev_rtx = lowpart_subreg (word_mode, prev_rtx, >>> > + prev_mode); >>> > + if (prev_rtx != nullptr) >>> > + prev_mode = word_mode; >>> > + } >>> > + else >>> > + prev_rtx = nullptr; >>> >>> I don't understand this. Why not just do the: >>> >>> if (REG_P (prev_rtx) >>> && HARD_REGISTER_P (prev_rtx) >>> && lowpart_subreg_regno (REGNO (prev_rtx), prev_mode, mode) < 0) >>> prev_rtx = copy_to_reg (prev_rtx); >>> >>> that I suggested in the previous review? >> >> But for >> --- >> extern void *ops; >> >> void >> foo (int c) >> { >> __builtin_memset (ops, c, 18); >> } >> --- >> I got >> >> vpbroadcastb %edi, %xmm31 >> vmovdqa64 %xmm31, -24(%rsp) >> movq ops(%rip), %rax >> movzwl -24(%rsp), %edx >> vmovdqu8 %xmm31, (%rax) >> movw %dx, 16(%rax) >> ret >> >> I want to avoid store and load. I am testing >> >> if (REG_P (prev_rtx) >> && HARD_REGISTER_P (prev_rtx) >> && lowpart_subreg_regno (REGNO (prev_rtx), prev_mode, mode) < 0) >> { >> /* Find the smallest subreg mode in the same mode class which >> is not narrower than MODE and narrower than PREV_MODE. */ >> machine_mode m; >> fixed_size_mode candidate; >> FOR_EACH_MODE_IN_CLASS (m, GET_MODE_CLASS (mode)) >> if (is_a (m, &candidate)) >> { >> if (GET_MODE_SIZE (candidate) >> >= GET_MODE_SIZE (prev_mode)) >> break; >> if (GET_MODE_SIZE (candidate) >= GET_MODE_SIZE (mode) >> && lowpart_subreg_regno (REGNO (prev_rtx), >> prev_mode, candidate) >= 0) >> { >> target = lowpart_subreg (candidate, prev_rtx, >> prev_mode); >> prev_rtx = target; >> prev_mode = candidate; >> break; >> } >> } > > That doesn't seem better though. There are still two subregs involved. Actually, I take that back. I guess it does make sense, and is definitely better than hard-coding word_mode. How about a comment along the lines of the following, instead of the one above: /* This case occurs when PREV_MODE is a vector and when MODE is too small to store using vector operations. After register allocation, the code will need to move the lowpart of the vector register into a non-vector register. Also, the target has chosen to use a hard register instead of going with the default choice of using a pseudo register. We should respect that choice and try to avoid creating a pseudo register with the same mode as the current hard register. In principle, we could just use a lowpart MODE subreg of the vector register. However, the vector register mode might be too wide for non-vector registers, and we already know that the non-vector mode is too small for vector registers. It's therefore likely that we'd need to spill to memory in the vector mode and reload the non-vector value from there. Try to avoid that by reducing the vector register to the smallest size that it can hold. This should increase the chances that non-vector registers can hold both the inner and outer modes of the subreg that we generate later. */ Thanks, Richard