public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/54400] New: recognize haddpd
@ 2012-08-29  6:10 glisse at gcc dot gnu.org
  2012-09-01  9:40 ` [Bug target/54400] " glisse at gcc dot gnu.org
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: glisse at gcc dot gnu.org @ 2012-08-29  6:10 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54400

             Bug #: 54400
           Summary: recognize haddpd
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: glisse@gcc.gnu.org
            Target: x86_64-linux-gnu


Hello,

for this program:

#include <x86intrin.h>
double f(__m128d v){return v[1]+v[0];}

gcc -O3 -msse4 (same with -Os) generates:

    movapd    %xmm0, %xmm2
    unpckhpd    %xmm2, %xmm2
    movapd    %xmm2, %xmm1
    addsd    %xmm0, %xmm1
    movapd    %xmm1, %xmm0

(yes, the number of mov instructions is a bit high...)

Looking at the x86 backend, it can expand reduc_splus_v2df and
__builtin_ia32_haddpd, but it doesn't provide any pattern that could be
recognized. hsubpd is even less present.

It seems to me that, considering only the low part of the result of haddpd, the
pattern should be small enough to be matched: (plus (vec_select (match_operand
1) const_a) (vec_select (match_dup 1) const_b)) where a and b are 0 and 1 in
any order.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/54400] recognize haddpd
  2012-08-29  6:10 [Bug target/54400] New: recognize haddpd glisse at gcc dot gnu.org
@ 2012-09-01  9:40 ` glisse at gcc dot gnu.org
  2012-09-03 10:13 ` [Bug middle-end/54400] " rguenth at gcc dot gnu.org
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: glisse at gcc dot gnu.org @ 2012-09-01  9:40 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54400

--- Comment #1 from Marc Glisse <glisse at gcc dot gnu.org> 2012-09-01 09:40:14 UTC ---
The code below seems to optimize v[0]-v[1] and v[1]+v[0]. It doesn't recognize
v[0]+v[1], but that would not be too hard to add I guess. Compared to the true
hadd insn, I removed the setattr "type" "sseadd" because it crashed the
compiler (in cost computation maybe). Apart from the things left in here that
may not make sense, I don't know if a peephole would be more relevant. Maybe
the insn helps more if I want to recognize dot products (dppd) later on? At
least thanks to it {v[0]-v[1],w[0]-w[1]} is now recognized as a hsub (although
it doesn't work if v==w because vec_duplicate doesn't match vec_concat).

(define_insn "*sse3_h<plusminus_insn>v2df3_low_MARC"
  [(set (match_operand:DF 0 "register_operand" "=x,x")
        (plusminus:DF
          (vec_select:DF
            (match_operand:V2DF 1 "register_operand" "0,x")
            (parallel [(const_int 0)]))
          (vec_select:DF
            (match_dup 1)
            (parallel [(const_int 1)]))))]
  "TARGET_SSE3"
  "@
   h<plusminus_mnemonic>pd\t{%0, %0|%0, %0}
   vh<plusminus_mnemonic>pd\t{%1, %1, %0|%0, %1, %1}"
  [(set_attr "isa" "noavx,avx")
   (set_attr "prefix" "orig,vex")
   (set_attr "mode" "V2DF")])


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug middle-end/54400] recognize haddpd
  2012-08-29  6:10 [Bug target/54400] New: recognize haddpd glisse at gcc dot gnu.org
  2012-09-01  9:40 ` [Bug target/54400] " glisse at gcc dot gnu.org
@ 2012-09-03 10:13 ` rguenth at gcc dot gnu.org
  2012-09-03 10:22 ` glisse at gcc dot gnu.org
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-09-03 10:13 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54400

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2012-09-03
          Component|target                      |middle-end
             Blocks|                            |53947
     Ever Confirmed|0                           |1

--- Comment #2 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-09-03 10:12:36 UTC ---
The basic-block vectorizer does not handle reductions and it would be another
natural place to perform this optimization.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug middle-end/54400] recognize haddpd
  2012-08-29  6:10 [Bug target/54400] New: recognize haddpd glisse at gcc dot gnu.org
  2012-09-01  9:40 ` [Bug target/54400] " glisse at gcc dot gnu.org
  2012-09-03 10:13 ` [Bug middle-end/54400] " rguenth at gcc dot gnu.org
@ 2012-09-03 10:22 ` glisse at gcc dot gnu.org
  2012-10-08 20:46 ` glisse at gcc dot gnu.org
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: glisse at gcc dot gnu.org @ 2012-09-03 10:22 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54400

--- Comment #3 from Marc Glisse <glisse at gcc dot gnu.org> 2012-09-03 10:21:48 UTC ---
(In reply to comment #2)
> The basic-block vectorizer does not handle reductions and it would be another
> natural place to perform this optimization.

I thought about turning a PLUS_EXPR of BIT_FIELD_REF into a REDUC_PLUS_EXPR (in
forwprop), but that wouldn't handle the MINUS_EXPR case (can still be worth
doing though, especially if the code is common to the other reductions).


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug middle-end/54400] recognize haddpd
  2012-08-29  6:10 [Bug target/54400] New: recognize haddpd glisse at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2012-09-03 10:22 ` glisse at gcc dot gnu.org
@ 2012-10-08 20:46 ` glisse at gcc dot gnu.org
  2012-10-08 21:03 ` [Bug middle-end/54400] recognize vector reductions glisse at gcc dot gnu.org
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: glisse at gcc dot gnu.org @ 2012-10-08 20:46 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54400

--- Comment #4 from Marc Glisse <glisse at gcc dot gnu.org> 2012-10-08 20:46:04 UTC ---
Author: glisse
Date: Mon Oct  8 20:45:56 2012
New Revision: 192223

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=192223
Log:
2012-10-08  Marc Glisse  <marc.glisse@inria.fr>

gcc/
    PR target/54400
    * config/i386/i386.md (type attribute): Add sseadd1.
    (unit attribute): Add support for sseadd1.
    (memory attribute): Likewise.
    * config/i386/athlon.md: Likewise.
    * config/i386/core2.md: Likewise.
    * config/i386/atom.md: Likewise.
    * config/i386/ppro.md: Likewise.
    * config/i386/bdver1.md: Likewise.
    * config/i386/sse.md (sse3_h<plusminus_insn>v2df3): split into...
    (sse3_haddv2df3): ... expander.
    (*sse3_haddv2df3): ... define_insn. Accept permuted operands.
    (sse3_hsubv2df3): ... define_insn.
    (*sse3_haddv2df3_low): New define_insn.
    (*sse3_hsubv2df3_low): New define_insn.

gcc/testsuite/
    PR target/54400
    * gcc.target/i386/pr54400.c: New testcase.


Added:
    trunk/gcc/testsuite/gcc.target/i386/pr54400.c   (with props)
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/athlon.md
    trunk/gcc/config/i386/atom.md
    trunk/gcc/config/i386/bdver1.md
    trunk/gcc/config/i386/core2.md
    trunk/gcc/config/i386/i386.md
    trunk/gcc/config/i386/ppro.md
    trunk/gcc/config/i386/sse.md
    trunk/gcc/testsuite/ChangeLog

Propchange: trunk/gcc/testsuite/gcc.target/i386/pr54400.c
            ('svn:eol-style' added)

Propchange: trunk/gcc/testsuite/gcc.target/i386/pr54400.c
            ('svn:keywords' added)


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug middle-end/54400] recognize vector reductions
  2012-08-29  6:10 [Bug target/54400] New: recognize haddpd glisse at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2012-10-08 20:46 ` glisse at gcc dot gnu.org
@ 2012-10-08 21:03 ` glisse at gcc dot gnu.org
  2012-10-25  8:40 ` vincenzo.innocente at cern dot ch
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: glisse at gcc dot gnu.org @ 2012-10-08 21:03 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54400

Marc Glisse <glisse at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|recognize haddpd            |recognize vector reductions

--- Comment #5 from Marc Glisse <glisse at gcc dot gnu.org> 2012-10-08 21:02:56 UTC ---
Renaming since the specific x86 case is done, and this is now a middle-end PR.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug middle-end/54400] recognize vector reductions
  2012-08-29  6:10 [Bug target/54400] New: recognize haddpd glisse at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2012-10-08 21:03 ` [Bug middle-end/54400] recognize vector reductions glisse at gcc dot gnu.org
@ 2012-10-25  8:40 ` vincenzo.innocente at cern dot ch
  2021-06-08 13:36 ` rguenth at gcc dot gnu.org
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: vincenzo.innocente at cern dot ch @ 2012-10-25  8:40 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54400

vincenzo Innocente <vincenzo.innocente at cern dot ch> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vincenzo.innocente at cern
                   |                            |dot ch

--- Comment #6 from vincenzo Innocente <vincenzo.innocente at cern dot ch> 2012-10-25 08:40:27 UTC ---
*** Bug 55071 has been marked as a duplicate of this bug. ***


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug middle-end/54400] recognize vector reductions
  2012-08-29  6:10 [Bug target/54400] New: recognize haddpd glisse at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2012-10-25  8:40 ` vincenzo.innocente at cern dot ch
@ 2021-06-08 13:36 ` rguenth at gcc dot gnu.org
  2021-06-08 22:04 ` glisse at gcc dot gnu.org
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-06-08 13:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54400

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
Hmm, with vectorizer support we get

  <bb 2> [local count: 1073741824]:
  _7 = .REDUC_PLUS (v_3(D)); [tail call]
  return _7;

but

f:
.LFB5669:
        .cfi_startproc
        movapd  %xmm0, %xmm1
        unpckhpd        %xmm0, %xmm0
        addpd   %xmm1, %xmm0
        ret

(note avoiding hadd in the reduc pattern was intended).  Not sure if
two element reduction vectorization is worthwhile - my current patch
bails w/o -fassociative-math (which is supposedly unnecessary for
two elements).

But mine anyway.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug middle-end/54400] recognize vector reductions
  2012-08-29  6:10 [Bug target/54400] New: recognize haddpd glisse at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2021-06-08 13:36 ` rguenth at gcc dot gnu.org
@ 2021-06-08 22:04 ` glisse at gcc dot gnu.org
  2021-06-17  7:52 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: glisse at gcc dot gnu.org @ 2021-06-08 22:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54400

--- Comment #8 from Marc Glisse <glisse at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #7)
> (note avoiding hadd in the reduc pattern was intended).

Indeed. Except with -Os, or if a processor with a fast hadd appears,
vectorising this doesn't bring anything. It doesn't hurt either though.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug middle-end/54400] recognize vector reductions
  2012-08-29  6:10 [Bug target/54400] New: recognize haddpd glisse at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2021-06-08 22:04 ` glisse at gcc dot gnu.org
@ 2021-06-17  7:52 ` cvs-commit at gcc dot gnu.org
  2021-06-17  8:00 ` rguenth at gcc dot gnu.org
  2021-09-11 14:19 ` pinskia at gcc dot gnu.org
  10 siblings, 0 replies; 12+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-06-17  7:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54400

--- Comment #9 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:3dfa4fe9f1a089b2b3906c83e22a1b39c49d937c

commit r12-1551-g3dfa4fe9f1a089b2b3906c83e22a1b39c49d937c
Author: Richard Biener <rguenther@suse.de>
Date:   Tue Jun 8 15:10:45 2021 +0200

    Vectorization of BB reductions

    This adds a simple reduction vectorization capability to the
    non-loop vectorizer.  Simple meaning it lacks any of the fancy
    ways to generate the reduction epilogue but only supports
    those we can handle via a direct internal function reducing
    a vector to a scalar.  One of the main reasons is to avoid
    massive refactoring at this point but also that more complex
    epilogue operations are hardly profitable.

    Mixed sign reductions are for now fend off and I'm not finally
    settled with whether we want an explicit SLP node for the
    reduction epilogue operation.  Handling mixed signs could be
    done by multiplying with a { 1, -1, .. } vector.  Fend off
    are also reductions with non-internal operands (constants
    or register parameters for example).

    Costing is done by accounting the original scalar participating
    stmts for the scalar cost and log2 permutes and operations for
    the vectorized epilogue.

    --

    SPEC CPU 2017 FP with rate workload measurements show (picked
    fastest runs of three) regressions for 507.cactuBSSN_r (1.5%),
    508.namd_r (2.5%), 511.povray_r (2.5%), 526.blender_r (0.5) and
    527.cam4_r (2.5%) and improvements for 510.parest_r (5%) and
    538.imagick_r (1.5%).  This is with -Ofast -march=znver2 on a Zen2.

    Statistics on CPU 2017 shows that the overwhelming number of seeds
    we find are reductions of two lanes (well - that's basically every
    associative operation).  That means we put a quite high pressure
    on the SLP discovery process this way.

    In total we find 583218 seeds we put to SLP discovery out of which
    66205 pass that and only 6185 of those make it through
    code generation checks. 796 of those are discarded because the reduction
    is part of a larger SLP instance.  4195 of the remaining
    are deemed not profitable to vectorize and 1194 are finally
    vectorized.  That's a poor 0.2% rate.

    Of the 583218 seeds 486826 (83%) have two lanes, 60912 have three (10%),
    28181 four (5%), 4808 five, 909 six and there are instances up to 120
    lanes.

    There's a set of 54086 candidate seeds we reject because
    they contain a constant or invariant (not implemented yet) but still
    have two or more lanes that could be put to SLP discovery.

    2021-06-16  Richard Biener   <rguenther@suse.de>

            PR tree-optimization/54400
            * tree-vectorizer.h (enum slp_instance_kind): Add
            slp_inst_kind_bb_reduc.
            (reduction_fn_for_scalar_code): Declare.
            * tree-vect-data-refs.c (vect_slp_analyze_instance_dependence):
            Check SLP_INSTANCE_KIND instead of looking at the
            representative.
            (vect_slp_analyze_instance_alignment): Likewise.
            * tree-vect-loop.c (reduction_fn_for_scalar_code): Export.
            * tree-vect-slp.c (vect_slp_linearize_chain): Split out
            chain linearization from vect_build_slp_tree_2 and generalize
            for the use of BB reduction vectorization.
            (vect_build_slp_tree_2): Adjust accordingly.
            (vect_optimize_slp): Elide permutes at the root of BB reduction
            instances.
            (vectorizable_bb_reduc_epilogue): New function.
            (vect_slp_prune_covered_roots): Likewise.
            (vect_slp_analyze_operations): Use them.
            (vect_slp_check_for_constructors): Recognize associatable
            chains for BB reduction vectorization.
            (vectorize_slp_instance_root_stmt): Generate code for the
            BB reduction epilogue.

            * gcc.dg/vect/bb-slp-pr54400.c: New testcase.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug middle-end/54400] recognize vector reductions
  2012-08-29  6:10 [Bug target/54400] New: recognize haddpd glisse at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2021-06-17  7:52 ` cvs-commit at gcc dot gnu.org
@ 2021-06-17  8:00 ` rguenth at gcc dot gnu.org
  2021-09-11 14:19 ` pinskia at gcc dot gnu.org
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-06-17  8:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54400

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
Declaring fixed.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug middle-end/54400] recognize vector reductions
  2012-08-29  6:10 [Bug target/54400] New: recognize haddpd glisse at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2021-06-17  8:00 ` rguenth at gcc dot gnu.org
@ 2021-09-11 14:19 ` pinskia at gcc dot gnu.org
  10 siblings, 0 replies; 12+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-09-11 14:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54400

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |12.0

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2021-09-11 14:19 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-29  6:10 [Bug target/54400] New: recognize haddpd glisse at gcc dot gnu.org
2012-09-01  9:40 ` [Bug target/54400] " glisse at gcc dot gnu.org
2012-09-03 10:13 ` [Bug middle-end/54400] " rguenth at gcc dot gnu.org
2012-09-03 10:22 ` glisse at gcc dot gnu.org
2012-10-08 20:46 ` glisse at gcc dot gnu.org
2012-10-08 21:03 ` [Bug middle-end/54400] recognize vector reductions glisse at gcc dot gnu.org
2012-10-25  8:40 ` vincenzo.innocente at cern dot ch
2021-06-08 13:36 ` rguenth at gcc dot gnu.org
2021-06-08 22:04 ` glisse at gcc dot gnu.org
2021-06-17  7:52 ` cvs-commit at gcc dot gnu.org
2021-06-17  8:00 ` rguenth at gcc dot gnu.org
2021-09-11 14:19 ` pinskia at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).