Re: [PATCH V1] rs6000: New pass for replacement of adjacent (load) lxv with lxvp

From: Michael Meissner <meissner@linux.ibm.com>
To: Ajit Agarwal <aagarwa1@linux.ibm.com>
Cc: Richard Biener <richard.guenther@gmail.com>,
	"Kewen.Lin" <linkw@linux.ibm.com>,
	Vladimir Makarov <vmakarov.gcc@gmail.com>,
	Michael Meissner <meissner@linux.ibm.com>,
	Segher Boessenkool <segher@kernel.crashing.org>,
	Peter Bergner <bergner@linux.ibm.com>,
	David Edelsohn <dje.gcc@gmail.com>,
	gcc-patches <gcc-patches@gcc.gnu.org>,
	Richard Sandiford <richard.sandiford@arm.com>
Subject: Re: [PATCH V1] rs6000: New pass for replacement of adjacent (load) lxv with lxvp
Date: Thu, 18 Jan 2024 23:19:00 -0500
Message-ID: <Zan4NM7CmsLVtFdo@cowardly-lion.the-meissners.org>
In-Reply-To: <00272349-aa2a-4ea3-9859-913b7b4fe049@linux.ibm.com>

On Mon, Jan 15, 2024 at 06:25:13PM +0530, Ajit Agarwal wrote:
> Also Mike and Kewen suggested to use this pass before IRA register
> allocator. They are in To List. They have other concerns doing after
> register allocator.
> 
> They have responded in other mail Chain.

The problem with doing it after register allocation is it limits the hit rate
to the situation where the register allocation happened to guess right, and
allocated adjacent registers.

Note, the PowerPC has some twists:

1) load/store vector pair must use an even/odd VSX register pair.

2) Some instructions only operate on traditional FPR registers (VSX registers
0..31) and others only operate on traditional Altivec registers (VSX registers
32..63).  I.e. if you are doing a load vector pair, and you are going to do,
say, a V2DI vector add, you need to load the vector pair into Altivec registers
to avoid having to do a copy operation.
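
To make the two twists concrete, here is a minimal sketch in plain C.  The
names (vsx_bank_of, vsx_pair_ok, vsx_pair_ok_for_bank) are made up for
illustration and are not functions in the rs6000 back end; the sketch only
assumes the VSX numbering described above (0..31 overlap the FPRs, 32..63
overlap the Altivec registers).

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only; not GCC code.  */
enum vsx_bank { VSX_BANK_FPR, VSX_BANK_ALTIVEC, VSX_BANK_INVALID };

/* Classify a VSX register number: 0..31 are the traditional FPRs,
   32..63 are the traditional Altivec registers.  */
static enum vsx_bank vsx_bank_of(int regno)
{
    if (regno >= 0 && regno <= 31)
        return VSX_BANK_FPR;
    if (regno >= 32 && regno <= 63)
        return VSX_BANK_ALTIVEC;
    return VSX_BANK_INVALID;
}

/* Twist 1: a vector pair must be an even register followed by the
   next (odd) register.  */
static bool vsx_pair_ok(int first, int second)
{
    if (vsx_bank_of(first) == VSX_BANK_INVALID
        || vsx_bank_of(second) == VSX_BANK_INVALID)
        return false;
    return first % 2 == 0 && second == first + 1;
}

/* Twist 2: the pair should also land in the bank that the consuming
   instruction can use, otherwise a copy is needed.  */
static bool vsx_pair_ok_for_bank(int first, int second, enum vsx_bank wanted)
{
    return vsx_pair_ok(first, second) && vsx_bank_of(first) == wanted;
}
```

With these definitions, vsx_pair_ok_for_bank (34, 35, VSX_BANK_ALTIVEC) holds,
while the pair (2, 3) is a valid pair but is in the FPR bank, so an
Altivec-only consumer would need a copy.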

In general, I tend to feel that stuffing things into a larger register and then
using SUBREG is often going to generate other moves.  On the PowerPC right now,
we can't even use SUBREG of OOmode (the 256-bit opaque type), but Peter has
patches to deal with some of the issues.

But at the moment, we don't have support for expressing this load such that
register allocation can handle it.

Rather than using a large register mode, I tend to feel that we should enhance
match_parallel so that register allocation can allocate the registers
sequentially.  Now, I haven't looked at match_parallel for 15-20 years, but my
sense was that it only worked for fixed registers generated elsewhere (such as
for the load/store string instruction support).

I.e. rather than doing something like:

	(set (reg:OO <oo_reg1>)
	     (mem:OO <oo_mem1>))

	(set (reg:V2DF <v2df_reg1>)
	     (subreg:V2DF (reg:OO <oo_reg1>) 0))

	(set (reg:V2DF <v2df_reg2>)
	     (subreg:V2DF (reg:OO <oo_reg1>) 16))

	; do stuff involving v2df_reg1 and v2df_reg2

	(clobber (reg:OO <oo_reg2>))

	(set (subreg:V2DF (reg:OO <oo_reg2>) 0)
	     (reg:V2DF <v2df_reg1>))

	(set (subreg:V2DF (reg:OO <oo_reg2>) 16)
	     (reg:V2DF <v2df_reg2>))

	(set (mem:OO <oo_mem2>)
	     (reg:OO <oo_reg2>))

We would do:

	(parallel [(set (reg:V2DF <v2df_reg1>)
	                (mem:V2DF <v2df_mem1>))
	           (set (reg:V2DF <v2df_reg2>)
		        (mem:V2DF <v2df_mem2>))])

	; do stuff involving v2df_reg1 and v2df_reg2

	(parallel [(set (mem:V2DF <v2df_mem3>)
	                (reg:V2DF <v2df_reg1>))
		   (set (mem:V2DF <v2df_mem4>)
		        (reg:V2DF <v2df_reg2>))])

Now in those two parallels above, we would need to use match_parallel to ensure
that the registers are allocated sequentially (and in the PowerPC, start on an
even VSX register), and the addresses are bumped up by 16 bytes.
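
As a rough sketch of what such a check would have to verify, here it is in
plain C over a made-up struct.  This is not GCC's rtx API and not an actual
match_parallel predicate; pair_elt and vector_pair_parallel_ok are hypothetical
names, and each element just records the register number and the byte offset
of the memory address for one SET in the parallel.

```c
#include <stdbool.h>
#include <stddef.h>

#define VECTOR_BYTES 16   /* size of one V2DF vector */

/* Hypothetical summary of one (set (reg) (mem)) inside the parallel.  */
struct pair_elt {
    int regno;     /* VSX register number, 0..63 */
    long offset;   /* byte offset of the memory operand from a common base */
};

/* Return true if the elements describe a valid vector pair access:
   exactly two elements, the first register even, registers sequential,
   and each address 16 bytes past the previous one.  */
static bool vector_pair_parallel_ok(const struct pair_elt *elts, size_t n)
{
    if (n != 2)
        return false;
    if (elts[0].regno < 0 || elts[0].regno > 62)
        return false;
    if (elts[0].regno % 2 != 0)
        return false;
    for (size_t i = 1; i < n; i++) {
        if (elts[i].regno != elts[0].regno + (int) i)
            return false;
        if (elts[i].offset != elts[0].offset + (long) i * VECTOR_BYTES)
            return false;
    }
    return true;
}
```

For example, registers 34 and 35 at offsets 0 and 16 pass the check, while
registers 35 and 36, or offsets 0 and 32, do not.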

Ideally, the combiner should try to combine things, but it may be simpler to
use a separate MD pass.

It would be nice if we had a standard constraint mechanism like %<n> that says
use %<n> but add 1/2/3/etc. to the register number if it is a REG, or a
size*number added to a memory address if it is a MEM.
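
The semantics I have in mind can be sketched in plain C as below.  Nothing
here exists in GCC today; the operand struct and operand_plus are hypothetical
and only illustrate "same operand, bumped by k".

```c
/* Hypothetical operand model, for illustration only; not GCC's rtx.  */
enum operand_kind { OPERAND_REG, OPERAND_MEM };

struct operand {
    enum operand_kind kind;
    int regno;      /* meaningful when kind == OPERAND_REG */
    long address;   /* meaningful when kind == OPERAND_MEM */
    long size;      /* element size in bytes, used for MEM operands */
};

/* Return the operand that "base plus k" would refer to: register number
   plus k for a REG, address plus size * k for a MEM.  */
static struct operand operand_plus(struct operand base, int k)
{
    struct operand result = base;
    if (base.kind == OPERAND_REG)
        result.regno = base.regno + k;
    else
        result.address = base.address + base.size * k;
    return result;
}
```

So for a REG operand with register 34, operand_plus (op, 1) refers to register
35, and for a 16-byte MEM operand at address 1000 it refers to address 1016.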

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com
