public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/53101] New: Recognize casts to sub-vectors
@ 2012-04-24 10:18 marc.glisse at normalesup dot org
  2012-04-24 11:14 ` [Bug target/53101] " rguenth at gcc dot gnu.org
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: marc.glisse at normalesup dot org @ 2012-04-24 10:18 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53101

             Bug #: 53101
           Summary: Recognize casts to sub-vectors
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: target
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: marc.glisse@normalesup.org
            Target: x86_64-linux-gnu


Hello,

starting from an AVX __m256d vector x, getting its first element is best done
with *(double*)&x, which is what x[0] internally does, and which generates no
instruction (well, the following has vzeroupper, but let's forget that).
However, *(__m128d*)&x generates 2 movs and I have to explicitly use
_mm256_extractf128_pd to get the proper nop. Could the compiler be taught to
recognize the casts between pointers to vectors of the same object type the
same way it recognizes casts to pointers to that object type?

#include <x86intrin.h>
#if 0
typedef double T;
#else
typedef __m128d T;
#endif
T f(__m256d x){
  return *(T*)&x;
}

The closest report I found is PR 44551, which is quite different. PR 29881
shows that using a union is not an interesting alternative. I marked this one
as target, but it may very well be that the recognition should be in the
middle-end, or even that the front-end should mark the cast somehow.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug target/53101] Recognize casts to sub-vectors
  2012-04-24 10:18 [Bug target/53101] New: Recognize casts to sub-vectors marc.glisse at normalesup dot org
@ 2012-04-24 11:14 ` rguenth at gcc dot gnu.org
  2012-05-01 15:10 ` marc.glisse at normalesup dot org
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-04-24 11:14 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53101

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2012-04-24
     Ever Confirmed|0                           |1

--- Comment #1 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-04-24 11:13:33 UTC ---
We get MEM[(T * {ref-all})&x] for the casting (not a BIT_FIELD_REF for
example).
This gets expanded to

(insn 6 5 7 (set (reg:OI 63)
        (subreg:OI (reg/v:V4DF 61 [ x ]) 0)) t.c:8 -1
     (nil))

(insn 7 6 8 (set (reg:V2DF 60 [ <retval> ])
        (subreg:V2DF (reg:OI 63) 0)) t.c:8 -1
     (nil))

but that should be perfectly optimizable.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug target/53101] Recognize casts to sub-vectors
  2012-04-24 10:18 [Bug target/53101] New: Recognize casts to sub-vectors marc.glisse at normalesup dot org
  2012-04-24 11:14 ` [Bug target/53101] " rguenth at gcc dot gnu.org
@ 2012-05-01 15:10 ` marc.glisse at normalesup dot org
  2012-05-01 17:18 ` marc.glisse at normalesup dot org
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: marc.glisse at normalesup dot org @ 2012-05-01 15:10 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53101

--- Comment #2 from Marc Glisse <marc.glisse at normalesup dot org> 2012-05-01 15:10:26 UTC ---
(In reply to comment #1)
> We get MEM[(T * {ref-all})&x] for the casting (not a BIT_FIELD_REF for
> example).
> This gets expanded to
> 
> (insn 6 5 7 (set (reg:OI 63)
>         (subreg:OI (reg/v:V4DF 61 [ x ]) 0)) t.c:8 -1
>      (nil))
> 
> (insn 7 6 8 (set (reg:V2DF 60 [ <retval> ])
>         (subreg:V2DF (reg:OI 63) 0)) t.c:8 -1
>      (nil))
> 
> but that should be perfectly optimizable.

A bit hard for me (never touched those md files before)... This obviously
incorrect code does the transformation:

(define_peephole2
[
(set
 (match_operand:V8SF 2 "memory_operand")
 (match_operand:V8SF 1 "register_operand")
)
(set
 (match_operand:V4SF 0 "register_operand")
 (match_operand:V4SF 3 "memory_operand")
)
]
  "TARGET_AVX"
[(const_int 0)]
{
  emit_insn (gen_vec_extract_lo_v8sf (operands[0], operands[1]));
  DONE;
})

(the code in this experiment uses __v4sf and __v8sf instead of __m128d/__m256d
in the description above)

but operands[2] and operands[3] don't compare equal with rtx_equal_p, and
trying a match_dup refuses to compile because of the mode mismatch, so I don't
know how to constrain 2 and 3 to be "the same". I tried adding some (subreg:
...) in there, but it didn't match, and looking at the rtl peephole dump, there
isn't any subreg there.

Then maybe peephole isn't the right place, but that's the only one where I
managed to get something that compiles and is executed by the compiler on this
testcase.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug target/53101] Recognize casts to sub-vectors
  2012-04-24 10:18 [Bug target/53101] New: Recognize casts to sub-vectors marc.glisse at normalesup dot org
  2012-04-24 11:14 ` [Bug target/53101] " rguenth at gcc dot gnu.org
  2012-05-01 15:10 ` marc.glisse at normalesup dot org
@ 2012-05-01 17:18 ` marc.glisse at normalesup dot org
  2012-05-03 19:20 ` marc.glisse at normalesup dot org
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: marc.glisse at normalesup dot org @ 2012-05-01 17:18 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53101

--- Comment #3 from Marc Glisse <marc.glisse at normalesup dot org> 2012-05-01 17:17:42 UTC ---
(In reply to comment #2)
> but operands[2] and operands[3] don't compare equal with rtx_equal_p, and
> trying a match_dup refuses to compile because of the mode mismatch, so I don't
> know how to constrain 2 and 3 to be "the same".

rtx_equal_p (XEXP (operands[2], 0), XEXP (operands[3], 0))

seems to give the right answer in the 3 manual tests I did. Currently checking
if the testsuite finds something. It is very likely not the right way to do it,
but I didn't find any inspiring pattern in the .md files.

Then I'll see if I understand how the fancy macros make it possible to have a
single piece of code for all modes, and if instead of calling
gen_vec_extract_lo_v8sf I shouldn't give a replacement pattern like (set
(match_dup 0) (vec_select (match_dup 1) (const_int 0))).


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug target/53101] Recognize casts to sub-vectors
  2012-04-24 10:18 [Bug target/53101] New: Recognize casts to sub-vectors marc.glisse at normalesup dot org
                   ` (2 preceding siblings ...)
  2012-05-01 17:18 ` marc.glisse at normalesup dot org
@ 2012-05-03 19:20 ` marc.glisse at normalesup dot org
  2012-05-06 20:31 ` glisse at gcc dot gnu.org
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: marc.glisse at normalesup dot org @ 2012-05-03 19:20 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53101

--- Comment #4 from Marc Glisse <marc.glisse at normalesup dot org> 2012-05-03 19:19:00 UTC ---
(define_peephole2
  [(set (mem:VI8F_256 (match_operand 2))
        (match_operand:VI8F_256 1 "register_operand"))
   (set (match_operand:<ssehalfvecmode> 0 "register_operand")
        (mem:<ssehalfvecmode> (match_dup 2)))]
  "TARGET_AVX"
  [(set (match_dup 0)
        (vec_select:<ssehalfvecmode> (match_dup 1)
                                     (parallel [(const_int 0) (const_int
1)])))]
)

(and similar for VI4F_256) is much less hackish than the XEXP stuff. I was
quite sure I'd tested exactly this and it didn't work, but now it looks like it
does :-/

Except that following http://gcc.gnu.org/ml/gcc-patches/2012-05/msg00197.html ,
this is not the right place to try and add such logic. That's a good thing
because it is way too fragile, another instruction can easily squeeze between
the two sets and disable the peephole.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug target/53101] Recognize casts to sub-vectors
  2012-04-24 10:18 [Bug target/53101] New: Recognize casts to sub-vectors marc.glisse at normalesup dot org
                   ` (3 preceding siblings ...)
  2012-05-03 19:20 ` marc.glisse at normalesup dot org
@ 2012-05-06 20:31 ` glisse at gcc dot gnu.org
  2012-11-11 22:18 ` glisse at gcc dot gnu.org
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: glisse at gcc dot gnu.org @ 2012-05-06 20:31 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53101

Marc Glisse <glisse at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |glisse at gcc dot gnu.org

--- Comment #5 from Marc Glisse <glisse at gcc dot gnu.org> 2012-05-06 19:55:58 UTC ---
Getting a BIT_FIELD_REF in gimple is simple enough. If I remember correctly,
the patch below was enough (it also fixes the fact the current code will
generate a BIT_FIELD_REF for true elements of the vector and one past the end,
but not further, and I am not sure why that one-past-the-end behavior would be
wanted).

I did it because I thought it would be easier to convince the compiler to
expand that to a vec_select, and I tried patching get_inner_reference and
extract_bit_field_1, but no luck, I give up for now.



--- gimplify.c    (revision 187205)
+++ gimplify.c    (working copy)
@@ -4262,11 +4262,16 @@ gimple_fold_indirect_ref (tree t)
       else if (TREE_CODE (optype) == COMPLEX_TYPE
                && useless_type_conversion_p (type, TREE_TYPE (optype)))
         return fold_build1 (REALPART_EXPR, type, op);
       /* *(foo *)&vectorfoo => BIT_FIELD_REF<vectorfoo,...> */
       else if (TREE_CODE (optype) == VECTOR_TYPE
-               && useless_type_conversion_p (type, TREE_TYPE (optype)))
+               && (useless_type_conversion_p (type, TREE_TYPE (optype))
+           || (TREE_CODE (type) == VECTOR_TYPE
+               && useless_type_conversion_p (TREE_TYPE (type),
+                                         TREE_TYPE (optype))
+               && TYPE_VECTOR_SUBPARTS (type)
+                  < TYPE_VECTOR_SUBPARTS (optype))))
         {
           tree part_width = TYPE_SIZE (type);
           tree index = bitsize_int (0);
           return fold_build3 (BIT_FIELD_REF, type, op, part_width, index);
         }
@@ -4283,22 +4288,29 @@ gimple_fold_indirect_ref (tree t)
       STRIP_NOPS (addr);
       addrtype = TREE_TYPE (addr);

       /* ((foo*)&vectorfoo)[1] -> BIT_FIELD_REF<vectorfoo,...> */
       if (TREE_CODE (addr) == ADDR_EXPR
-      && TREE_CODE (TREE_TYPE (addrtype)) == VECTOR_TYPE
-      && useless_type_conversion_p (type, TREE_TYPE (TREE_TYPE (addrtype)))
-      && host_integerp (off, 1))
+      && host_integerp (off, 1)
+      && ((TREE_CODE (TREE_TYPE (addrtype)) == VECTOR_TYPE
+           && useless_type_conversion_p (type,
+            TREE_TYPE (TREE_TYPE (addrtype))))
+          || (TREE_CODE (type) == VECTOR_TYPE
+              && useless_type_conversion_p (TREE_TYPE (type),
+               TREE_TYPE (TREE_TYPE (addrtype)))
+          && TYPE_VECTOR_SUBPARTS (type)
+             < TYPE_VECTOR_SUBPARTS (TREE_TYPE (addrtype)))))
     {
           unsigned HOST_WIDE_INT offset = tree_low_cst (off, 1);
           tree part_width = TYPE_SIZE (type);
           unsigned HOST_WIDE_INT part_widthi
-            = tree_low_cst (part_width, 0) / BITS_PER_UNIT;
+            = tree_low_cst (part_width, 0);
+          unsigned HOST_WIDE_INT orig_widthi
+            = tree_low_cst (TYPE_SIZE (TREE_TYPE (addrtype)), 0);
           unsigned HOST_WIDE_INT indexi = offset * BITS_PER_UNIT;
           tree index = bitsize_int (indexi);
-          if (offset / part_widthi
-              <= TYPE_VECTOR_SUBPARTS (TREE_TYPE (addrtype)))
+          if (indexi + part_widthi <= orig_widthi)
             return fold_build3 (BIT_FIELD_REF, type, TREE_OPERAND (addr, 0),
                                 part_width, index);
     }

       /* ((foo*)&complexfoo)[1] -> __imag__ complexfoo */


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug target/53101] Recognize casts to sub-vectors
  2012-04-24 10:18 [Bug target/53101] New: Recognize casts to sub-vectors marc.glisse at normalesup dot org
                   ` (4 preceding siblings ...)
  2012-05-06 20:31 ` glisse at gcc dot gnu.org
@ 2012-11-11 22:18 ` glisse at gcc dot gnu.org
  2012-11-16 23:04 ` glisse at gcc dot gnu.org
  2021-08-19 19:49 ` pinskia at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: glisse at gcc dot gnu.org @ 2012-11-11 22:18 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53101

--- Comment #6 from Marc Glisse <glisse at gcc dot gnu.org> 2012-11-11 22:18:13 UTC ---
PR 48037 seems related (it was the scalar case).


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug target/53101] Recognize casts to sub-vectors
  2012-04-24 10:18 [Bug target/53101] New: Recognize casts to sub-vectors marc.glisse at normalesup dot org
                   ` (5 preceding siblings ...)
  2012-11-11 22:18 ` glisse at gcc dot gnu.org
@ 2012-11-16 23:04 ` glisse at gcc dot gnu.org
  2021-08-19 19:49 ` pinskia at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: glisse at gcc dot gnu.org @ 2012-11-16 23:04 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53101

--- Comment #7 from Marc Glisse <glisse at gcc dot gnu.org> 2012-11-16 23:03:47 UTC ---
Created attachment 28713
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28713
Tweak on the patch of PR48037

This is a slight extension of Richard's patch for PR 48037. It needs testing
and doesn't solve the real problem (which is at the RTL level).

I am not sure if BIT_FIELD_REF (a mention of that tree in doc/generic.texi
would be nice) is allowed to refer to an unaligned subvector, the patch allows
it as long as the elements are aligned. See PR 55359 for a related issue.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug target/53101] Recognize casts to sub-vectors
  2012-04-24 10:18 [Bug target/53101] New: Recognize casts to sub-vectors marc.glisse at normalesup dot org
                   ` (6 preceding siblings ...)
  2012-11-16 23:04 ` glisse at gcc dot gnu.org
@ 2021-08-19 19:49 ` pinskia at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-08-19 19:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53101

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
   Target Milestone|---                         |8.0
             Status|NEW                         |RESOLVED

--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
GCC 5+ just produces a stack increment/decrement but no stores (unlike 4.9.x).
GCC 6+ is able to remove the stack increment/decrement.
GCC 7+ uses BIT_FIELD_REF on the gimple level.
GCC 8+ expands the BIT_FIELD_REF to use vec_select directly and get:
(insn 6 5 7 (set (reg:TI 90)
        (vec_select:TI (subreg:V2TI (reg/v:V4DF 88 [ x ]) 0)
            (parallel [
                    (const_int 0 [0])
                ]))) "/app/example.cpp":9 -1
     (nil))

So this is fully fixed for GCC 8+ with incremental steps along the way.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-08-19 19:49 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-24 10:18 [Bug target/53101] New: Recognize casts to sub-vectors marc.glisse at normalesup dot org
2012-04-24 11:14 ` [Bug target/53101] " rguenth at gcc dot gnu.org
2012-05-01 15:10 ` marc.glisse at normalesup dot org
2012-05-01 17:18 ` marc.glisse at normalesup dot org
2012-05-03 19:20 ` marc.glisse at normalesup dot org
2012-05-06 20:31 ` glisse at gcc dot gnu.org
2012-11-11 22:18 ` glisse at gcc dot gnu.org
2012-11-16 23:04 ` glisse at gcc dot gnu.org
2021-08-19 19:49 ` pinskia at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).