From mboxrd@z Thu Jan 1 00:00:00 1970
From: Richard Biener
Date: Thu, 25 May 2023 12:49:20 +0200
Subject: Re: [PATCH] [x86] reenable dword MOVE_MAX for better memmove inlining
To: Alexandre Oliva
Cc: gcc-patches@gcc.gnu.org, "H.J. Lu", Jan Hubicka, Uros Bizjak

On Thu, May 25, 2023 at 12:01 PM Alexandre Oliva wrote:
>
> --text follows this line--
> On May 24, 2023, Richard Biener wrote:
>
> > gimple_fold_builtin_memory_op tries to expand the call to a single
> > load plus a single store so we can handle overlaps by first loading
> > everything to registers and then storing:
>
> *nod*, that's why I figured we could afford to go back to allowing
> DImode (with -m32) or TImode (with -m64) even without vector modes: we'd
> just use a pair of registers, a single insn, even though not a single
> hardware instruction.
>
> > using DImode on i?86 without SSE means we eventually perform two
> > loads and two stores which means we need two registers available.
>
> *nod*. But the alternative is to issue an out-of-line call to memmove,
> which would clobber more than 2 registers. ISTM that inlining such
> calls is better, whether optimizing for speed or size.
>
> > So I think if we want to expand this further at the GIMPLE level we
> > should still honor MOVE_MAX but eventually emit multiple loads/stores
> > honoring the MOVE_MAX_PIECES set of constraints there and avoid
> > expanding to sequences where we cannot interleave the loads/stores
> > (aka for the memmove case).
>
> But... don't we already? If I'm reading the code right, we'll already
> issue gimple code to load the whole block into a temporary and then
> store it, but current MOVE_MAX won't let us go past 4 bytes on SSE-less
> x86.

I mean we could do what RTL expansion would do later and do by-pieces,
thus emit multiple loads/stores but not n loads and then n stores but
interleaved.

Richard.

>
> --
> Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/
> Free Software Activist GNU Toolchain Engineer
> Disinformation flourishes because many people care deeply about injustice
> but very few check the facts. Ask me about
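
The distinction discussed in this message, n loads followed by n stores versus interleaved load/store pairs, can be sketched in plain C. This is only an illustration under stated assumptions, not GCC or GIMPLE code: the helper names move8_load_all_then_store and move8_interleaved are hypothetical, and the 4-byte piece size simply mirrors the 4-byte MOVE_MAX limit on SSE-less x86 mentioned in the thread.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: move 8 bytes by loading both 4-byte pieces
   first and only then storing them.  Both temporaries are live at
   once (two registers), but the result is correct even when the
   source and destination overlap, as memmove requires.  */
static void move8_load_all_then_store(void *dst, const void *src)
{
    uint32_t lo, hi;
    memcpy(&lo, src, 4);                     /* load piece 0 */
    memcpy(&hi, (const char *)src + 4, 4);   /* load piece 1 */
    memcpy(dst, &lo, 4);                     /* store piece 0 */
    memcpy((char *)dst + 4, &hi, 4);         /* store piece 1 */
}

/* Hypothetical helper: move 8 bytes piece by piece, storing each
   piece right after loading it.  Only one temporary is live at a
   time, but the first store may clobber bytes that the second load
   still needs, so this is only safe when the buffers do not overlap
   (memcpy semantics).  */
static void move8_interleaved(void *dst, const void *src)
{
    uint32_t tmp;
    memcpy(&tmp, src, 4);                    /* load piece 0 */
    memcpy(dst, &tmp, 4);                    /* store piece 0 */
    memcpy(&tmp, (const char *)src + 4, 4);  /* load piece 1 */
    memcpy((char *)dst + 4, &tmp, 4);        /* store piece 1 */
}
```

Calling either helper on disjoint buffers gives the same result. With an overlapping destination such as buf + 2 for a source of buf, only the load-all-then-store variant matches what memmove would produce, which is consistent with the suggestion to avoid the interleaved form for the memmove case.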