From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Fri, 30 Apr 2021 10:33:04 -0700
From: "H.J. Lu"
To: "H.J.
 Lu via Gcc-patches" , richard.sandiford@arm.com
Subject: Re: [PATCH 02/12] Allow generating pseudo register with specific alignment
References: <20210429125415.1634118-1-hjl.tools@gmail.com>
 <20210429125415.1634118-3-hjl.tools@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
List-Id: Gcc-patches mailing list

On Fri, Apr 30, 2021 at 04:56:30PM +0100, Richard Sandiford wrote:
> "H.J. Lu via Gcc-patches" writes:
> > On Fri, Apr 30, 2021 at 5:49 AM H.J. Lu wrote:
> >>
> >> On Fri, Apr 30, 2021 at 5:42 AM Richard Sandiford wrote:
> >> >
> >> > "H.J. Lu via Gcc-patches" writes:
> >> > > On Fri, Apr 30, 2021 at 2:06 AM Richard Sandiford wrote:
> >> > >>
> >> > >> "H.J. Lu via Gcc-patches" writes:
> >> > >> > gen_reg_rtx tracks stack alignment needed for pseudo registers so that
> >> > >> > associated hard registers can be properly spilled onto stack.  But there
> >> > >> > are cases where associated hard registers will never be spilled onto
> >> > >> > stack.  gen_reg_rtx is changed to take an argument for register alignment
> >> > >> > so that stack realignment can be avoided when not needed.
> >> > >>
> >> > >> How is it guaranteed that they will never be spilled though?
> >> > >> I don't think that that guarantee exists for any kind of pseudo,
> >> > >> except perhaps for the temporary pseudos that the RA creates to
> >> > >> replace (match_scratch …)es.
> >> > >>
> >> > >
> >> > > The caller creating pseudo registers with a specific alignment must
> >> > > guarantee that they will never be spilled.  I am only using it in
> >> > >
> >> > >   /* Make operand1 a register if it isn't already.  */
> >> > >   if (can_create_pseudo_p ()
> >> > >       && !register_operand (op0, mode)
> >> > >       && !register_operand (op1, mode))
> >> > >     {
> >> > >       /* NB: Don't increase stack alignment requirement when forcing
> >> > >          operand1 into a pseudo register to copy data from one memory
> >> > >          location to another since it doesn't require a spill.  */
> >> > >       emit_move_insn (op0,
> >> > >                       force_reg (GET_MODE (op0), op1,
> >> > >                                  (UNITS_PER_WORD * BITS_PER_UNIT)));
> >> > >       return;
> >> > >     }
> >> > >
> >> > > for vector moves.  The RA shouldn't spill it.
> >> >
> >> > But this is the point: it's a case of hoping that the RA won't spill it,
> >> > rather than having a guarantee that it won't.
> >> >
> >> > Even if the moves start out adjacent, they could be separated by later
> >> > RTL optimisations, particularly scheduling.  (I realise pre-RA scheduling
> >> > isn't enabled by default for x86, but it can still be enabled explicitly.)
> >> > Or if the same data is being copied to two locations, we might reuse
> >> > values loaded by the first copy for the second copy as well.
> >
> > There are cases where pseudo vector registers are created as pure
> > temporary registers in the backend, and they shouldn't ever be spilled
> > to the stack.  They will be spilled to the stack only if there is other,
> > non-temporary vector register usage, in which case the stack will be
> > properly re-aligned.  The caller creating pseudo registers with a
> > specific alignment guarantees that they are used only as pure
> > temporary registers.
>
> I don't think there's really a distinct category of pure temporary
> registers though.  The things I mentioned above can happen for any
> kind of pseudo register.
>

This special pseudo register is only generated when inlining memcpy and
memset.  For memcpy, there is no need to spill:

[hjl@gnu-cfl-2 pieces]$ cat spill1.i
extern void *ops1;
extern void *ops2;
extern void bar (void);

void
foo (void)
{
  __builtin_memcpy (ops1, ops2, 32);
  bar ();
  __builtin_memcpy (ops1, ops2, 32);
}
[hjl@gnu-cfl-2 pieces]$ make spill1.s
/export/build/gnu/tools-build/gcc-gitlab-debug/build-x86_64-linux/gcc/xgcc -B/export/build/gnu/tools-build/gcc-gitlab-debug/build-x86_64-linux/gcc/ -O2 -march=haswell -S spill1.i
[hjl@gnu-cfl-2 pieces]$ cat spill1.s
	.file	"spill1.i"
	.text
	.p2align 4
	.globl	foo
	.type	foo, @function
foo:
.LFB0:
	.cfi_startproc
	subq	$8, %rsp
	.cfi_def_cfa_offset 16
	movq	ops2(%rip), %rax
	vmovdqu	(%rax), %ymm0
	movq	ops1(%rip), %rax
	vmovdqu	%ymm0, (%rax)
	vzeroupper
	call	bar
	movq	ops2(%rip), %rax
	vmovdqu	(%rax), %ymm0
	movq	ops1(%rip), %rax
	vmovdqu	%ymm0, (%rax)
	vzeroupper
	addq	$8, %rsp
	.cfi_def_cfa_offset 8
	ret
	.cfi_endproc
.LFE0:
	.size	foo, .-foo
	.ident	"GCC: (GNU) 12.0.0 20210430 (experimental)"
	.section	.note.GNU-stack,"",@progbits
[hjl@gnu-cfl-2 pieces]$

For memset, the x86 backend supports an unaligned spill:

[hjl@gnu-cfl-2 pieces]$ cat spill2.i
extern void *ops1;
extern void *ops2;
extern void bar (void);

void
foo (int c)
{
  __builtin_memset (ops1, c, 32);
  bar ();
  __builtin_memset (ops2, c, 32);
}
[hjl@gnu-cfl-2 pieces]$ make spill2.s
/export/build/gnu/tools-build/gcc-gitlab-debug/build-x86_64-linux/gcc/xgcc -B/export/build/gnu/tools-build/gcc-gitlab-debug/build-x86_64-linux/gcc/ -O2 -march=haswell -S spill2.i
[hjl@gnu-cfl-2 pieces]$ cat spill2.s
	.file	"spill2.i"
	.text
	.p2align 4
	.globl	foo
	.type	foo, @function
foo:
.LFB0:
	.cfi_startproc
	subq	$40, %rsp
	.cfi_def_cfa_offset 48
	vmovd	%edi, %xmm0
	movq	ops1(%rip), %rax
	vpbroadcastb	%xmm0, %ymm0
	vmovdqu	%ymm0, (%rax)
	vmovdqu	%ymm0, (%rsp)
	vzeroupper
	call	bar
	movq	ops2(%rip), %rax
	vmovdqu	(%rsp), %ymm0
	vmovdqu	%ymm0, (%rax)
	vzeroupper
	addq	$40, %rsp
	.cfi_def_cfa_offset 8
	ret
	.cfi_endproc
.LFE0:
	.size	foo, .-foo
	.ident	"GCC: (GNU) 12.0.0 20210430 (experimental)"
	.section	.note.GNU-stack,"",@progbits
[hjl@gnu-cfl-2 pieces]$

H.J.