Re: [PATCH 10/11][RS6000] Migrate reduction optabs to reduc_..._scal

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Alan Lawrence <alan.lawrence@arm.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>,
	 Michael Meissner <meissner@linux.vnet.ibm.com>,
	"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
	 David Edelsohn <dje.gcc@gmail.com>
Subject: Re: [PATCH 10/11][RS6000] Migrate reduction optabs to reduc_..._scal
Date: Wed, 12 Nov 2014 18:53:00 -0000	[thread overview]
Message-ID: <5463ABD0.8020004@arm.com> (raw)
In-Reply-To: <54635096.1040508@arm.com>

Have run check-gcc on gcc110.fsffrance.org (powerpc64-unknown-linux-gnu) using 
this snippet on top of original patch; no regressions.

Alan Lawrence wrote:
> So I'm no expert on RS6000 here, but following on from Segher's observation 
> about the change in pattern...so the difference in 'expand' is exactly that, a 
> vsx_reduc_splus_v2df followed by a vec_extract to DF, becomes a 
> vsx_reduc_splus_v2df_scalar - as I expected the combiner to produce by combining 
> the two previous insns.
> 
> However, inspecting the logs from -fdump-rtl-combine-all, *without* my patch, 
> when the combiner tries to put those two together, I see:
> 
> Trying 30 -> 31:
> Failed to match this instruction:
> (set (reg:DF 179 [ stmp_s_5.7D.2196 ])
>      (vec_select:DF (plus:V2DF (vec_select:V2DF (reg:V2DF 173 [ vect_s_5.6D.2195 ])
>                  (parallel [
>                          (const_int 1 [0x1])
>                          (const_int 0 [0])
>                      ]))
>              (reg:V2DF 173 [ vect_s_5.6D.2195 ]))
>          (parallel [
>                  (const_int 1 [0x1])
>              ])))
> 
> That is, it looks like combine_simplify_rtx has transformed the (vec_concat 
> (vec_select ... 1) (vec_select ... 0)) from the vsx_reduc_plus_v2df insn, into a 
> single vec_select, which does not match the vsx_reduc_plus_v2df_scalar insn.
> 
> So despite the comment (in vsx.md):
> 
> ;; Combiner patterns with the vector reduction patterns that knows we can get
> ;; to the top element of the V2DF array without doing an extract.
> 
> It looks like the code generation prior to my patch, considered better, was 
> because the combiner didn't actually use the pattern?
> 
> In that case whilst you may want to dig into register allocation, 
> cannot_change_mode_class, etc., for other reasons, I think the best fix for 
> migrating to reduc_plus_scal... is simply to avoid using the "Combiner" patterns 
> and just emit two insns, the old pattern followed by a vec_extract. The attached 
> snippet does this (I won't call it a patch yet, and it applies on top of the 
> previous patch - I went the route of calling the two gen functions rather than 
> copying their RTL sequences, but could do the latter if that were 
> preferable???), and restores code generation to the original form on your 
> example above; it bootstraps OK but I'm still running check-gcc on the Compile 
> Farm...
> 
> However, again on your example above, I note that if I *remove* the 
> reduc_plus_scal_v2df pattern altogether, I get:
> 
> .sum:
>          li 10,512        # 52   *movdi_internal64/4     [length = 4]
>          ld 9,.LC2@toc(2)         # 20   *movdi_internal64/2     [length = 4]
>          xxlxor 0,0,0     # 17   *vsx_movv2df/12 [length = 4]
>          mtctr 10         # 48   *movdi_internal64/11    [length = 4]
>          .align 4
> .L2:
>          lxvd2x 12,0,9    # 23   *vsx_movv2df/2  [length = 4]
>          addi 9,9,16      # 25   *adddi3_internal1/2     [length = 4]
>          xvadddp 0,0,12   # 24   *vsx_addv2df3/1 [length = 4]
>          bdnz .L2         # 47   *ctrdi_internal1/1      [length = 4]
>          xxsldwi 12,0,0,2         # 30   vsx_xxsldwi_v2df        [length = 4]
>          xvadddp 1,0,12   # 31   *vsx_addv2df3/1 [length = 4]
>          nop      # 37   *vsx_extract_v2df_internal2/1   [length = 4]
>          blr      # 55   return  [length = 4]
> 
> this is presumably using gcc's scalar reduction code, but (to my untrained eye 
> on powerpc!) it looks even better than the first form above (the same in the 
> loop, and in the reduction, an xxpermdi is replaced by a nop !)...
> 
> --Alan
> 
> 
> Segher Boessenkool wrote:
>> On Mon, Nov 10, 2014 at 05:36:24PM -0500, Michael Meissner wrote:
>>> However, the double pattern is completely broken.  This cannot go in.
>> [snip]
>>
>>> It is unacceptable to have to do the inner loop doing a load, vector add, and
>>> store in the loop.
>> Before the patch, the final reduction used *vsx_reduc_splus_v2df; after
>> the patch, it is *vsx_reduc_plus_v2df_scalar.  The former does a vector
>> add, the latter a float add.  And it uses the same pseudoregister for the
>> accumulator throughout.  IRA decides a register is more expensive than
>> memory for this, I suppose because it wants both V2DF and DF?  It doesn't
>> seem to like the subreg very much.
>>
>> The new code does look nicer otherwise :-)
>>
>>
>> Segher

next prev parent reply	other threads:[~2014-11-12 18:49 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-10-24 11:57 [PATCH v2 0-6/11] Fix PR/61114, make direct vector reductions endianness-neutral Alan Lawrence
2014-10-24 11:58 ` [PATCH 7/11][ARM] Migrate to new reduc_plus_scal_optab Alan Lawrence
2014-11-03 17:32   ` Ramana Radhakrishnan
2014-10-24 12:01 ` [PATCH v2 0-6/11] Fix PR/61114, make direct vector reductions endianness-neutral Richard Biener
2014-10-24 12:05 ` [PATCH 8/11][ARM] Migrate to new reduc_[us](min|max)_scal_optab Alan Lawrence
2014-11-04 11:08   ` Ramana Radhakrishnan
2014-10-24 12:06 ` [PATCH 9/11][i386] Migrate reduction optabs to reduc_..._scal Alan Lawrence
2014-10-24 12:07 ` [PATCH 10/11][RS6000] " Alan Lawrence
2014-10-24 12:14   ` Alan Lawrence
2014-10-25  0:08   ` David Edelsohn
2014-11-03 17:51     ` Bill Schmidt
2014-11-06 16:44       ` Alan Lawrence
2014-11-06 18:57         ` Bill Schmidt
2014-11-07 10:09           ` Alan Lawrence
2014-11-10 22:39   ` Michael Meissner
2014-11-11  7:10     ` Segher Boessenkool
2014-11-12  1:54       ` Michael Meissner
2014-11-12  9:26         ` Segher Boessenkool
2014-11-12 19:20           ` Michael Meissner
2014-11-12 12:32       ` Alan Lawrence
2014-11-12 18:53         ` Alan Lawrence [this message]
2014-12-11 15:59           ` Ping: " Alan Lawrence
2014-12-11 18:37             ` Alan Lawrence
2014-10-24 12:12 ` [Protopatch 11/11][IA64] Migrate to reduc_(plus|min|max)_scal_v2df optab Alan Lawrence
2014-10-24 12:50   ` Alan Lawrence
2014-10-24 15:19 ` [PATCH v2 0-6/11] Fix PR/61114, make direct vector reductions endianness-neutral Matthew Fortune
2014-10-27 11:48   ` Richard Biener

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5463ABD0.8020004@arm.com \
    --to=alan.lawrence@arm.com \
    --cc=dje.gcc@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=meissner@linux.vnet.ibm.com \
    --cc=segher@kernel.crashing.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).