Re: [PATCH 2/3] testsuite: Require 128-bit vectors for bb-slp-pr95839.c

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: "Maciej W. Rozycki" <macro@embecosm.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: Rainer Orth <ro@cebitec.uni-bielefeld.de>,
	 Mike Stump <mikestump@comcast.net>,
	gcc-patches@gcc.gnu.org
Subject: Re: [PATCH 2/3] testsuite: Require 128-bit vectors for bb-slp-pr95839.c
Date: Tue, 11 Jul 2023 16:01:24 +0100 (BST)	[thread overview]
Message-ID: <alpine.DEB.2.20.2307111423190.28892@tpp.orcam.me.uk> (raw)
In-Reply-To: <CAFiYyc23ujqdOfJ=R5uQL1YycJvxgdiyMuYY=-+J2ppSu7JDvA@mail.gmail.com>

On Fri, 7 Jul 2023, Richard Biener wrote:

> > The bb-slp-pr95839.c test assumes quad-single float vector support, but
> > some targets only support pairs of floats, causing this test to fail
> > with such targets.  Limit this test to targets that support at least
> > 128-bit vectors then, and add a complementing test that can be run with
> > targets that have support for 64-bit vectors only.  There is no need to
> > adjust bb-slp-pr95839-2.c as 128 bits are needed even for the smallest
> > vector of doubles, so support is implied by the presence of vectors of
> > doubles.
> 
> I wonder why you see the testcase FAIL, on x86-64 when doing
> 
> typedef float __attribute__((vector_size(32))) v4f32;
> 
> v4f32 f(v4f32 a, v4f32 b)
> {
>   /* Check that we vectorize this CTOR without any loads.  */
>   return (v4f32){a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3],
>   a[4] + b[4], a[5] + b[5], a[6] + b[6], a[7] + b[7]};
> }
> 
> I see we vectorize the add and the "store".  We fail to perform
> extraction from the incoming vectors (unless you enable AVX),
> that's a missed optimization.
> 
> So with paired floats I would expect sth similar?  Maybe
> x86 is saved by kind-of-presence (but disabled) of V8SFmode vectors.

 I am not familiar enough with this stuff to answer your question.

 As we pass and return V2SF data in FP registers just as with complex 
float data with this hardware the function from my bb-slp-pr95839-v8.c 
expands to a single vector FP add instruction, followed by a function 
return.

 Conversely, the original function from bb-slp-pr95839.c expands to a 
sequence of 22 instructions to extract incoming vector FP data from 4 
64-bit GPRs into 8 FPRs, add the vectors piecemeal with 4 scalar FP add 
instructions, and then insert outgoing vector FP data from 4 FPRs back to 
2 64-bit GPRs.  As an experiment I have modified the backend minimally so 
as to pass and return V4SF data in FP registers as well, but that didn't 
make the vectoriser trigger.

> That said, we should handle this better so can you file an
> enhancement bugreport for this?

 Filed as PR -optimization/110630.  I can't publish RISC-V information 
related to the hardware affected, but as a quick check I ran the MIPS 
compiler:

$ mips-linux-gnu-gcc -march=mips64 -mabi=64 -mpaired-single -O2 -S bb-slp-pr95839*.c

and got this code for bb-slp-pr95839-v8.c (mind the branch delay slot):

	jr	$31
	add.ps	$f0,$f12,$f13

vs code for bb-slp-pr95839.c:

	daddiu	$sp,$sp,-64
	sd	$5,24($sp)
	sd	$7,40($sp)
	lwc1	$f0,24($sp)
	lwc1	$f1,40($sp)
	sd	$4,16($sp)
	sd	$6,32($sp)
	add.s	$f3,$f0,$f1
	lwc1	$f0,28($sp)
	lwc1	$f1,44($sp)
	lwc1	$f4,36($sp)
	swc1	$f3,56($sp)
	add.s	$f2,$f0,$f1
	lwc1	$f0,16($sp)
	lwc1	$f1,32($sp)
	swc1	$f2,60($sp)
	add.s	$f1,$f0,$f1
	lwc1	$f0,20($sp)
	ld	$3,56($sp)
	add.s	$f0,$f0,$f4
	swc1	$f1,48($sp)
	swc1	$f0,52($sp)
	ld	$2,48($sp)
	jr	$31
	daddiu	$sp,$sp,64

so this is essentially the same scenario (up to the machine instruction 
count), and therefore it seems backend-agnostic.  I can imagine the latter 
case could expand to something like (instruction reordering surely needed 
for performance omitted for clarity):

	dmtc1	$4,$f0
	dmtc1	$5,$f1
	dmtc1	$6,$f2
	dmtc1	$7,$f3
	add.ps	$f0,$f0,$f1
	add.ps	$f2,$f2,$f3
	dmfc1	$2,$f0
	jr	$31
	dmfc1	$3,$f2

saving a lot of cycles, and removing the need for spilling temporaries to 
the stack and for frame creation in the first place.

 Do you agree it still makes sense to include bb-slp-pr95839-v8.c with the 
testsuite?

  Maciej

next prev parent reply	other threads:[~2023-07-11 15:01 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-06 21:35 [PATCH 0/3] testsuite: Exclude vector tests for unsupported targets Maciej W. Rozycki
2023-07-06 21:35 ` [PATCH 1/3] testsuite: Add check for vectors of 128 bits being supported Maciej W. Rozycki
2023-07-07  6:24   ` Richard Biener
2023-07-11 15:00     ` Maciej W. Rozycki
2023-07-06 21:36 ` [PATCH 2/3] testsuite: Require 128-bit vectors for bb-slp-pr95839.c Maciej W. Rozycki
2023-07-07  6:33   ` Richard Biener
2023-07-11 15:01     ` Maciej W. Rozycki [this message]
2023-07-12  7:15       ` Richard Biener
2023-07-19 14:34         ` Maciej W. Rozycki
2023-07-20  7:15           ` Richard Biener
2023-07-20  9:16             ` Maciej W. Rozycki
2023-07-20 12:09               ` Richard Biener
2023-07-20 13:02                 ` Maciej W. Rozycki
2023-07-06 21:36 ` [PATCH 3/3] testsuite: Require vectors of doubles for pr97428.c Maciej W. Rozycki
2023-07-07  6:33   ` Richard Biener
2023-07-11 15:01     ` Maciej W. Rozycki
2023-07-12  7:15       ` Richard Biener
2023-07-19 10:29         ` Maciej W. Rozycki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.20.2307111423190.28892@tpp.orcam.me.uk \
    --to=macro@embecosm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=mikestump@comcast.net \
    --cc=richard.guenther@gmail.com \
    --cc=ro@cebitec.uni-bielefeld.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).