From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Fri, 30 Apr 2021 10:33:04 -0700
From: "H.J. Lu"
To: "H.J.
 Lu via Gcc-patches" , richard.sandiford@arm.com
Subject: Re: [PATCH 02/12] Allow generating pseudo register with specific alignment
References: <20210429125415.1634118-1-hjl.tools@gmail.com>
 <20210429125415.1634118-3-hjl.tools@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
List-Id: Gcc-patches mailing list

On Fri, Apr 30, 2021 at 04:56:30PM +0100, Richard Sandiford wrote:
> "H.J. Lu via Gcc-patches" writes:
> > On Fri, Apr 30, 2021 at 5:49 AM H.J. Lu wrote:
> >>
> >> On Fri, Apr 30, 2021 at 5:42 AM Richard Sandiford wrote:
> >> >
> >> > "H.J. Lu via Gcc-patches" writes:
> >> > > On Fri, Apr 30, 2021 at 2:06 AM Richard Sandiford wrote:
> >> > >>
> >> > >> "H.J. Lu via Gcc-patches" writes:
> >> > >> > gen_reg_rtx tracks stack alignment needed for pseudo registers so that
> >> > >> > associated hard registers can be properly spilled onto stack.  But there
> >> > >> > are cases where associated hard registers will never be spilled onto
> >> > >> > stack.  gen_reg_rtx is changed to take an argument for register alignment
> >> > >> > so that stack realignment can be avoided when not needed.
> >> > >>
> >> > >> How is it guaranteed that they will never be spilled though?
> >> > >> I don't think that that guarantee exists for any kind of pseudo,
> >> > >> except perhaps for the temporary pseudos that the RA creates to
> >> > >> replace (match_scratch …)es.
> >> > >>
> >> > >
> >> > > The caller creating pseudo registers with a specific alignment must
> >> > > guarantee that they will never be spilled.  I am only using it in
> >> > >
> >> > >   /* Make operand1 a register if it isn't already.  */
> >> > >   if (can_create_pseudo_p ()
> >> > >       && !register_operand (op0, mode)
> >> > >       && !register_operand (op1, mode))
> >> > >     {
> >> > >       /* NB: Don't increase stack alignment requirement when forcing
> >> > >          operand1 into a pseudo register to copy data from one memory
> >> > >          location to another since it doesn't require a spill.  */
> >> > >       emit_move_insn (op0,
> >> > >                       force_reg (GET_MODE (op0), op1,
> >> > >                                  (UNITS_PER_WORD * BITS_PER_UNIT)));
> >> > >       return;
> >> > >     }
> >> > >
> >> > > for vector moves.  The RA shouldn't spill it.
> >> >
> >> > But this is the point: it's a case of hoping that the RA won't spill it,
> >> > rather than having a guarantee that it won't.
> >> >
> >> > Even if the moves start out adjacent, they could be separated by later
> >> > RTL optimisations, particularly scheduling.  (I realise pre-RA scheduling
> >> > isn't enabled by default for x86, but it can still be enabled explicitly.)
> >> > Or if the same data is being copied to two locations, we might reuse
> >> > values loaded by the first copy for the second copy as well.
> >
> > There are cases where pseudo vector registers are created as pure
> > temporary registers in the backend, and they shouldn't ever be spilled
> > to the stack.  They will be spilled to the stack only if there is other,
> > non-temporary vector register usage, in which case the stack will be
> > properly re-aligned.  The caller creating pseudo registers with a
> > specific alignment guarantees that they are used only as pure
> > temporary registers.
>
> I don't think there's really a distinct category of pure temporary
> registers though.  The things I mentioned above can happen for any
> kind of pseudo register.
>

This special pseudo register is only generated when inlining memcpy and
memset.  For memcpy, there is no need to spill:

[hjl@gnu-cfl-2 pieces]$ cat spill1.i
extern void *ops1;
extern void *ops2;
extern void bar (void);

void
foo (void)
{
  __builtin_memcpy (ops1, ops2, 32);
  bar ();
  __builtin_memcpy (ops1, ops2, 32);
}
[hjl@gnu-cfl-2 pieces]$ make spill1.s
/export/build/gnu/tools-build/gcc-gitlab-debug/build-x86_64-linux/gcc/xgcc -B/export/build/gnu/tools-build/gcc-gitlab-debug/build-x86_64-linux/gcc/ -O2 -march=haswell -S spill1.i
[hjl@gnu-cfl-2 pieces]$ cat spill1.s
	.file	"spill1.i"
	.text
	.p2align 4
	.globl	foo
	.type	foo, @function
foo:
.LFB0:
	.cfi_startproc
	subq	$8, %rsp
	.cfi_def_cfa_offset 16
	movq	ops2(%rip), %rax
	vmovdqu	(%rax), %ymm0
	movq	ops1(%rip), %rax
	vmovdqu	%ymm0, (%rax)
	vzeroupper
	call	bar
	movq	ops2(%rip), %rax
	vmovdqu	(%rax), %ymm0
	movq	ops1(%rip), %rax
	vmovdqu	%ymm0, (%rax)
	vzeroupper
	addq	$8, %rsp
	.cfi_def_cfa_offset 8
	ret
	.cfi_endproc
.LFE0:
	.size	foo, .-foo
	.ident	"GCC: (GNU) 12.0.0 20210430 (experimental)"
	.section	.note.GNU-stack,"",@progbits
[hjl@gnu-cfl-2 pieces]$

For memset, the x86 backend supports an unaligned spill:

[hjl@gnu-cfl-2 pieces]$ cat spill2.i
extern void *ops1;
extern void *ops2;
extern void bar (void);

void
foo (int c)
{
  __builtin_memset (ops1, c, 32);
  bar ();
  __builtin_memset (ops2, c, 32);
}
[hjl@gnu-cfl-2 pieces]$ make spill2.s
/export/build/gnu/tools-build/gcc-gitlab-debug/build-x86_64-linux/gcc/xgcc -B/export/build/gnu/tools-build/gcc-gitlab-debug/build-x86_64-linux/gcc/ -O2 -march=haswell -S spill2.i
[hjl@gnu-cfl-2 pieces]$ cat spill2.s
	.file	"spill2.i"
	.text
	.p2align 4
	.globl	foo
	.type	foo, @function
foo:
.LFB0:
	.cfi_startproc
	subq	$40, %rsp
	.cfi_def_cfa_offset 48
	vmovd	%edi, %xmm0
	movq	ops1(%rip), %rax
	vpbroadcastb	%xmm0, %ymm0
	vmovdqu	%ymm0, (%rax)
	vmovdqu	%ymm0, (%rsp)
	vzeroupper
	call	bar
	movq	ops2(%rip), %rax
	vmovdqu	(%rsp), %ymm0
	vmovdqu	%ymm0, (%rax)
	vzeroupper
	addq	$40, %rsp
	.cfi_def_cfa_offset 8
	ret
	.cfi_endproc
.LFE0:
	.size	foo, .-foo
	.ident	"GCC: (GNU) 12.0.0 20210430 (experimental)"
	.section	.note.GNU-stack,"",@progbits
[hjl@gnu-cfl-2 pieces]$

H.J.