[Bug target/50829] New: avx extra copy for _mm256_insertf128

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug target/50829] New: avx extra copy for _mm256_insertf128_pd
@ 2011-10-22 14:15 marc.glisse at normalesup dot org
  2011-10-22 16:10 ` [Bug target/50829] " ubizjak at gmail dot com
                   ` (13 more replies)
  0 siblings, 14 replies; 15+ messages in thread
From: marc.glisse at normalesup dot org @ 2011-10-22 14:15 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

             Bug #: 50829
           Summary: avx extra copy for _mm256_insertf128_pd
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: target
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: marc.glisse@normalesup.org
            Target: x86_64-linux-gnu


With -Ofast -mavx (or -Os -mavx), this code:

__m256d concat(__m128d x){
    __m256d z=_mm256_castpd128_pd256(x);
    return _mm256_insertf128_pd(z,x,1);
}

is compiled (by a snapshot from Oct 10) to:

    .cfi_startproc
    pushq    %rbp
    .cfi_def_cfa_offset 16
    vmovapd    %xmm0, %xmm1
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    andq    $-32, %rsp
    addq    $16, %rsp
    vinsertf128    $0x1, %xmm0, %ymm1, %ymm0
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc

Apart from all the fun with stack manipulation, this boils down to:
    vmovapd    %xmm0, %xmm1
    vinsertf128    $0x1, %xmm0, %ymm1, %ymm0

when it looks like this would be enough (and I tested it):
    vinsertf128    $0x1, %xmm0, %ymm0, %ymm0

I am not sure if gcc thinks that vinsertf128 shouldn't use the same register
for everything, or if it doesn't realize that it doesn't need to zero the upper
128 bits of the ymm register before calling insert. I understand that the avx
support is young, but avxintrin.h contains a comment saying that
_mm256_castpd128_pd256 "shouldn't generate any extra moves".

(I am not using broadcast because going through memory looks like a waste).


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/50829] avx extra copy for _mm256_insertf128_pd
  2011-10-22 14:15 [Bug target/50829] New: avx extra copy for _mm256_insertf128_pd marc.glisse at normalesup dot org
@ 2011-10-22 16:10 ` ubizjak at gmail dot com
  2011-10-22 17:02 ` marc.glisse at normalesup dot org
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: ubizjak at gmail dot com @ 2011-10-22 16:10 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

Uros Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |ra

--- Comment #1 from Uros Bizjak <ubizjak at gmail dot com> 2011-10-22 16:10:12 UTC ---
This looks similar to PR 34283, a RA problem.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/50829] avx extra copy for _mm256_insertf128_pd
  2011-10-22 14:15 [Bug target/50829] New: avx extra copy for _mm256_insertf128_pd marc.glisse at normalesup dot org
  2011-10-22 16:10 ` [Bug target/50829] " ubizjak at gmail dot com
@ 2011-10-22 17:02 ` marc.glisse at normalesup dot org
  2011-10-23  8:21 ` marc.glisse at normalesup dot org
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: marc.glisse at normalesup dot org @ 2011-10-22 17:02 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

--- Comment #2 from Marc Glisse <marc.glisse at normalesup dot org> 2011-10-22 17:01:51 UTC ---
(In reply to comment #1)
> This looks similar to PR 34283, a RA problem.

Ah, indeed. I also had a function that ended with:
    vmovapd    %xmm1, %xmm0
    vmaxpd    %xmm2, %xmm0, %xmm0
    vzeroupper
    ret

which looks related too. Thanks for the link.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/50829] avx extra copy for _mm256_insertf128_pd
  2011-10-22 14:15 [Bug target/50829] New: avx extra copy for _mm256_insertf128_pd marc.glisse at normalesup dot org
  2011-10-22 16:10 ` [Bug target/50829] " ubizjak at gmail dot com
  2011-10-22 17:02 ` marc.glisse at normalesup dot org
@ 2011-10-23  8:21 ` marc.glisse at normalesup dot org
  2011-10-23  8:33 ` [Bug rtl-optimization/50829] " ubizjak at gmail dot com
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: marc.glisse at normalesup dot org @ 2011-10-23  8:21 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

--- Comment #3 from Marc Glisse <marc.glisse at normalesup dot org> 2011-10-23 08:20:50 UTC ---
(In reply to comment #1)
> This looks similar to PR 34283, a RA problem.

PR 48037 too. I didn't find all of those before reporting because I was looking
for something AVX-specific, sorry.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug rtl-optimization/50829] avx extra copy for _mm256_insertf128_pd
  2011-10-22 14:15 [Bug target/50829] New: avx extra copy for _mm256_insertf128_pd marc.glisse at normalesup dot org
                   ` (2 preceding siblings ...)
  2011-10-23  8:21 ` marc.glisse at normalesup dot org
@ 2011-10-23  8:33 ` ubizjak at gmail dot com
  2011-11-24  3:48 ` vmakarov at redhat dot com
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: ubizjak at gmail dot com @ 2011-10-23  8:33 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

Uros Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2011-10-23
                 CC|                            |law at gcc dot gnu.org,
                   |                            |vmakarov at gcc dot gnu.org
          Component|target                      |rtl-optimization
     Ever Confirmed|0                           |1

--- Comment #4 from Uros Bizjak <ubizjak at gmail dot com> 2011-10-23 08:32:37 UTC ---
Adding some CCs that might be interested in this RA problem(s).


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug rtl-optimization/50829] avx extra copy for _mm256_insertf128_pd
  2011-10-22 14:15 [Bug target/50829] New: avx extra copy for _mm256_insertf128_pd marc.glisse at normalesup dot org
                   ` (3 preceding siblings ...)
  2011-10-23  8:33 ` [Bug rtl-optimization/50829] " ubizjak at gmail dot com
@ 2011-11-24  3:48 ` vmakarov at redhat dot com
  2011-11-24  5:26 ` [Bug target/50829] " pinskia at gcc dot gnu.org
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: vmakarov at redhat dot com @ 2011-11-24  3:48 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

Vladimir Makarov <vmakarov at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vmakarov at redhat dot com

--- Comment #5 from Vladimir Makarov <vmakarov at redhat dot com> 2011-11-24 03:29:09 UTC ---
The following code is generated before RA:
...
(insn 7 3 11 2 (set (reg:V4DF 63)
        (unspec:V4DF [
                (reg/v:V2DF 62 [ x ])
            ] UNSPEC_CAST)) ./include/avxintrin.h:1413 1960 {avx_pd256_pd}
     (nil))

(insn 11 7 17 2 (set (reg:V4DF 65)
        (vec_concat:V4DF (vec_select:V2DF (reg:V4DF 63)
                (parallel [
                        (const_int 0 [0])
                        (const_int 1 [0x1])
                    ]))
            (reg/v:V2DF 62 [ x ]))) ./include/avxintrin.h:715 1933
{vec_set_hi_v4df}
     (expr_list:REG_DEAD (reg:V4DF 63)
        (expr_list:REG_DEAD (reg/v:V2DF 62 [ x ])
            (nil))))
...

First of all unspec in insn 7 hides that 63 and 62 has the same value.  But
even if the unspec were absent, IRA as most other RAs finds conflicts based on
live ranges not on the value of in the pseudos.  The finding conflicts based on
GVN is very expensive and gives nothing on the most code (I did GVN based
conflict recognition about 8 years ago, it is described in the 2nd GCC summit
proceedings if I remember correctly).

As 62 and 63 conflicts they get different hard registers.

I guess that the right RTL generation (using one pseudo for 62 and 63) should
be done somewhere outside IRA.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/50829] avx extra copy for _mm256_insertf128_pd
  2011-10-22 14:15 [Bug target/50829] New: avx extra copy for _mm256_insertf128_pd marc.glisse at normalesup dot org
                   ` (4 preceding siblings ...)
  2011-11-24  3:48 ` vmakarov at redhat dot com
@ 2011-11-24  5:26 ` pinskia at gcc dot gnu.org
  2011-11-24  7:20 ` vmakarov at redhat dot com
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: pinskia at gcc dot gnu.org @ 2011-11-24  5:26 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|rtl-optimization            |target

--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> 2011-11-24 03:43:46 UTC ---
The move that is generated in the split should be in the bigger mode.  And then
apply the following patch:
        * regrename.c (maybe_mode_change): Return the reg in the
        new mode if the copy was done in the same mode size.
Index: regrename.c
===================================================================
--- regrename.c    (revision 55954)
+++ regrename.c    (revision 55955)
@@ -1322,6 +1322,15 @@ maybe_mode_change (enum machine_mode ori
            enum machine_mode new_mode, unsigned int regno,
            unsigned int copy_regno ATTRIBUTE_UNUSED)
 {
+  /*  If we are using the register in the copy mode (if the number of hard
+      registers is the same), just used the reg with the new mode.  */
+  if (GET_MODE_SIZE (copy_mode) == GET_MODE_SIZE (new_mode)
+      && hard_regno_nregs[copy_regno][copy_mode] ==
+     hard_regno_nregs[copy_regno][new_mode]
+      && hard_regno_nregs[regno][copy_mode] ==
+     hard_regno_nregs[copy_regno][new_mode])
+    return gen_rtx_raw_REG (new_mode, regno);
+
   if (GET_MODE_SIZE (copy_mode) < GET_MODE_SIZE (orig_mode)
       && GET_MODE_SIZE (copy_mode) < GET_MODE_SIZE (new_mode))
     return NULL_RTX;

This will at least remove it with -frename-registers which we most likely
should have on by default at -O2 and above now anyways.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/50829] avx extra copy for _mm256_insertf128_pd
  2011-10-22 14:15 [Bug target/50829] New: avx extra copy for _mm256_insertf128_pd marc.glisse at normalesup dot org
                   ` (5 preceding siblings ...)
  2011-11-24  5:26 ` [Bug target/50829] " pinskia at gcc dot gnu.org
@ 2011-11-24  7:20 ` vmakarov at redhat dot com
  2011-11-24  7:23 ` pinskia at gcc dot gnu.org
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: vmakarov at redhat dot com @ 2011-11-24  7:20 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

--- Comment #7 from Vladimir Makarov <vmakarov at redhat dot com> 2011-11-24 03:45:24 UTC ---
As for stack allocation.  crtl->stack_realign_needed == 1 results in
frame_pointer_needed:=1 in ira.c::ira_setup_eliminable_regset.  I don't
remember the origin of the code.  Probably, it is from HJ's stack aligning
work.  Sorry, if I am wrong.

I guess we should re-evaluate frame_pointer_needed at the end of RA if we don't
allocate any memory in all RA.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/50829] avx extra copy for _mm256_insertf128_pd
  2011-10-22 14:15 [Bug target/50829] New: avx extra copy for _mm256_insertf128_pd marc.glisse at normalesup dot org
                   ` (6 preceding siblings ...)
  2011-11-24  7:20 ` vmakarov at redhat dot com
@ 2011-11-24  7:23 ` pinskia at gcc dot gnu.org
  2012-12-01 16:30 ` glisse at gcc dot gnu.org
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: pinskia at gcc dot gnu.org @ 2011-11-24  7:23 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> 2011-11-24 03:47:06 UTC ---
I should note that have used that trick (and the same regrename.c patch) on two
different targets (PowerPC and MIPS) while at two different companies.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/50829] avx extra copy for _mm256_insertf128_pd
  2011-10-22 14:15 [Bug target/50829] New: avx extra copy for _mm256_insertf128_pd marc.glisse at normalesup dot org
                   ` (7 preceding siblings ...)
  2011-11-24  7:23 ` pinskia at gcc dot gnu.org
@ 2012-12-01 16:30 ` glisse at gcc dot gnu.org
  2012-12-01 19:50 ` glisse at gcc dot gnu.org
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: glisse at gcc dot gnu.org @ 2012-12-01 16:30 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

Marc Glisse <glisse at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |glisse at gcc dot gnu.org

--- Comment #9 from Marc Glisse <glisse at gcc dot gnu.org> 2012-12-01 16:30:23 UTC ---
The code with intrinsics still has the extra move, but note that this version
without intrinsics generates perfect code:

#include <x86intrin.h>
__m256d concat(__m128d x){
  __m256d z={x[0],x[1],x[0],x[1]};
  return z;
}


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/50829] avx extra copy for _mm256_insertf128_pd
  2011-10-22 14:15 [Bug target/50829] New: avx extra copy for _mm256_insertf128_pd marc.glisse at normalesup dot org
                   ` (8 preceding siblings ...)
  2012-12-01 16:30 ` glisse at gcc dot gnu.org
@ 2012-12-01 19:50 ` glisse at gcc dot gnu.org
  2012-12-01 20:26 ` hjl.tools at gmail dot com
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: glisse at gcc dot gnu.org @ 2012-12-01 19:50 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

--- Comment #10 from Marc Glisse <glisse at gcc dot gnu.org> 2012-12-01 19:50:28 UTC ---
Created attachment 28846
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28846
Use subreg

Hmm, I don't understand why we use UNSPEC_CAST. I tried the attached patch to
use a subreg instead. It passed bootstrap+testsuite and generates optimal code
for the testcase of this PR.

So, any idea what advantage the unspec has over a subreg? And if none, what is
the best way to use a subreg?


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/50829] avx extra copy for _mm256_insertf128_pd
  2011-10-22 14:15 [Bug target/50829] New: avx extra copy for _mm256_insertf128_pd marc.glisse at normalesup dot org
                   ` (9 preceding siblings ...)
  2012-12-01 19:50 ` glisse at gcc dot gnu.org
@ 2012-12-01 20:26 ` hjl.tools at gmail dot com
  2012-12-01 22:22 ` hjl.tools at gmail dot com
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: hjl.tools at gmail dot com @ 2012-12-01 20:26 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

H.J. Lu <hjl.tools at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |areg.melikadamyan at gmail
                   |                            |dot com, hjl.tools at gmail
                   |                            |dot com

--- Comment #11 from H.J. Lu <hjl.tools at gmail dot com> 2012-12-01 20:26:32 UTC ---
(In reply to comment #10)
> Created attachment 28846 [details]
> Use subreg
> 
> Hmm, I don't understand why we use UNSPEC_CAST. I tried the attached patch to
> use a subreg instead. It passed bootstrap+testsuite and generates optimal code
> for the testcase of this PR.
> 
> So, any idea what advantage the unspec has over a subreg? And if none, what is
> the best way to use a subreg?

subreg didn't work before.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/50829] avx extra copy for _mm256_insertf128_pd
  2011-10-22 14:15 [Bug target/50829] New: avx extra copy for _mm256_insertf128_pd marc.glisse at normalesup dot org
                   ` (10 preceding siblings ...)
  2012-12-01 20:26 ` hjl.tools at gmail dot com
@ 2012-12-01 22:22 ` hjl.tools at gmail dot com
  2013-03-30 10:13 ` glisse at gcc dot gnu.org
  2020-08-10 17:31 ` glisse at gcc dot gnu.org
  13 siblings, 0 replies; 15+ messages in thread
From: hjl.tools at gmail dot com @ 2012-12-01 22:22 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

--- Comment #12 from H.J. Lu <hjl.tools at gmail dot com> 2012-12-01 22:22:28 UTC ---
Also see PR 44551.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/50829] avx extra copy for _mm256_insertf128_pd
  2011-10-22 14:15 [Bug target/50829] New: avx extra copy for _mm256_insertf128_pd marc.glisse at normalesup dot org
                   ` (11 preceding siblings ...)
  2012-12-01 22:22 ` hjl.tools at gmail dot com
@ 2013-03-30 10:13 ` glisse at gcc dot gnu.org
  2020-08-10 17:31 ` glisse at gcc dot gnu.org
  13 siblings, 0 replies; 15+ messages in thread
From: glisse at gcc dot gnu.org @ 2013-03-30 10:13 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

--- Comment #13 from Marc Glisse <glisse at gcc dot gnu.org> 2013-03-30 10:13:46 UTC ---
(In reply to comment #10)
> Created attachment 28846 [details]
> Use subreg

The patch was submitted at
http://gcc.gnu.org/ml/gcc-patches/2013-03/msg00683.html and rejected, see the
discussion in that thread. We need a way to completely remove the pattern.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/50829] avx extra copy for _mm256_insertf128_pd
  2011-10-22 14:15 [Bug target/50829] New: avx extra copy for _mm256_insertf128_pd marc.glisse at normalesup dot org
                   ` (12 preceding siblings ...)
  2013-03-30 10:13 ` glisse at gcc dot gnu.org
@ 2020-08-10 17:31 ` glisse at gcc dot gnu.org
  13 siblings, 0 replies; 15+ messages in thread
From: glisse at gcc dot gnu.org @ 2020-08-10 17:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

Marc Glisse <glisse at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
      Known to work|                            |10.1.0
         Resolution|---                         |FIXED
      Known to fail|                            |9.3.0

--- Comment #14 from Marc Glisse <glisse at gcc dot gnu.org> ---
This was fixed (by Jakub I think).

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2020-08-10 17:31 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-10-22 14:15 [Bug target/50829] New: avx extra copy for _mm256_insertf128_pd marc.glisse at normalesup dot org
2011-10-22 16:10 ` [Bug target/50829] " ubizjak at gmail dot com
2011-10-22 17:02 ` marc.glisse at normalesup dot org
2011-10-23  8:21 ` marc.glisse at normalesup dot org
2011-10-23  8:33 ` [Bug rtl-optimization/50829] " ubizjak at gmail dot com
2011-11-24  3:48 ` vmakarov at redhat dot com
2011-11-24  5:26 ` [Bug target/50829] " pinskia at gcc dot gnu.org
2011-11-24  7:20 ` vmakarov at redhat dot com
2011-11-24  7:23 ` pinskia at gcc dot gnu.org
2012-12-01 16:30 ` glisse at gcc dot gnu.org
2012-12-01 19:50 ` glisse at gcc dot gnu.org
2012-12-01 20:26 ` hjl.tools at gmail dot com
2012-12-01 22:22 ` hjl.tools at gmail dot com
2013-03-30 10:13 ` glisse at gcc dot gnu.org
2020-08-10 17:31 ` glisse at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).