* [Bug target/53101] Recognize casts to sub-vectors
2012-04-24 10:18 [Bug target/53101] New: Recognize casts to sub-vectors marc.glisse at normalesup dot org
@ 2012-04-24 11:14 ` rguenth at gcc dot gnu.org
2012-05-01 15:10 ` marc.glisse at normalesup dot org
` (6 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-04-24 11:14 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53101
Richard Guenther <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |missed-optimization
Status|UNCONFIRMED |NEW
Last reconfirmed| |2012-04-24
Ever Confirmed|0 |1
--- Comment #1 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-04-24 11:13:33 UTC ---
We get MEM[(T * {ref-all})&x] for the casting (not a BIT_FIELD_REF for
example).
This gets expanded to
(insn 6 5 7 (set (reg:OI 63)
(subreg:OI (reg/v:V4DF 61 [ x ]) 0)) t.c:8 -1
(nil))
(insn 7 6 8 (set (reg:V2DF 60 [ <retval> ])
(subreg:V2DF (reg:OI 63) 0)) t.c:8 -1
(nil))
but that should be perfectly optimizable.
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug target/53101] Recognize casts to sub-vectors
2012-04-24 10:18 [Bug target/53101] New: Recognize casts to sub-vectors marc.glisse at normalesup dot org
2012-04-24 11:14 ` [Bug target/53101] " rguenth at gcc dot gnu.org
@ 2012-05-01 15:10 ` marc.glisse at normalesup dot org
2012-05-01 17:18 ` marc.glisse at normalesup dot org
` (5 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: marc.glisse at normalesup dot org @ 2012-05-01 15:10 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53101
--- Comment #2 from Marc Glisse <marc.glisse at normalesup dot org> 2012-05-01 15:10:26 UTC ---
(In reply to comment #1)
> We get MEM[(T * {ref-all})&x] for the casting (not a BIT_FIELD_REF for
> example).
> This gets expanded to
>
> (insn 6 5 7 (set (reg:OI 63)
> (subreg:OI (reg/v:V4DF 61 [ x ]) 0)) t.c:8 -1
> (nil))
>
> (insn 7 6 8 (set (reg:V2DF 60 [ <retval> ])
> (subreg:V2DF (reg:OI 63) 0)) t.c:8 -1
> (nil))
>
> but that should be perfectly optimizable.
A bit hard for me (never touched those md files before)... This obviously
incorrect code does the transformation:
(define_peephole2
[
(set
(match_operand:V8SF 2 "memory_operand")
(match_operand:V8SF 1 "register_operand")
)
(set
(match_operand:V4SF 0 "register_operand")
(match_operand:V4SF 3 "memory_operand")
)
]
"TARGET_AVX"
[(const_int 0)]
{
emit_insn (gen_vec_extract_lo_v8sf (operands[0], operands[1]));
DONE;
})
(the code in this experiment uses __v4sf and __v8sf instead of __m128d/__m256d
in the description above)
but operands[2] and operands[3] don't compare equal with rtx_equal_p, and
trying a match_dup refuses to compile because of the mode mismatch, so I don't
know how to constrain 2 and 3 to be "the same". I tried adding some (subreg:
...) in there, but it didn't match, and looking at the rtl peephole dump, there
isn't any subreg there.
Then maybe peephole isn't the right place, but that's the only one where I
managed to get something that compiles and is executed by the compiler on this
testcase.
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug target/53101] Recognize casts to sub-vectors
2012-04-24 10:18 [Bug target/53101] New: Recognize casts to sub-vectors marc.glisse at normalesup dot org
2012-04-24 11:14 ` [Bug target/53101] " rguenth at gcc dot gnu.org
2012-05-01 15:10 ` marc.glisse at normalesup dot org
@ 2012-05-01 17:18 ` marc.glisse at normalesup dot org
2012-05-03 19:20 ` marc.glisse at normalesup dot org
` (4 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: marc.glisse at normalesup dot org @ 2012-05-01 17:18 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53101
--- Comment #3 from Marc Glisse <marc.glisse at normalesup dot org> 2012-05-01 17:17:42 UTC ---
(In reply to comment #2)
> but operands[2] and operands[3] don't compare equal with rtx_equal_p, and
> trying a match_dup refuses to compile because of the mode mismatch, so I don't
> know how to constrain 2 and 3 to be "the same".
rtx_equal_p (XEXP (operands[2], 0), XEXP (operands[3], 0))
seems to give the right answer in the 3 manual tests I did. Currently checking
if the testsuite finds something. It is very likely not the right way to do it,
but I didn't find any inspiring pattern in the .md files.
Then I'll see if I understand how the fancy macros make it possible to have a
single piece of code for all modes, and if instead of calling
gen_vec_extract_lo_v8sf I shouldn't give a replacement pattern like (set
(match_dup 0) (vec_select (match_dup 1) (const_int 0))).
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug target/53101] Recognize casts to sub-vectors
2012-04-24 10:18 [Bug target/53101] New: Recognize casts to sub-vectors marc.glisse at normalesup dot org
` (2 preceding siblings ...)
2012-05-01 17:18 ` marc.glisse at normalesup dot org
@ 2012-05-03 19:20 ` marc.glisse at normalesup dot org
2012-05-06 20:31 ` glisse at gcc dot gnu.org
` (3 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: marc.glisse at normalesup dot org @ 2012-05-03 19:20 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53101
--- Comment #4 from Marc Glisse <marc.glisse at normalesup dot org> 2012-05-03 19:19:00 UTC ---
(define_peephole2
[(set (mem:VI8F_256 (match_operand 2))
(match_operand:VI8F_256 1 "register_operand"))
(set (match_operand:<ssehalfvecmode> 0 "register_operand")
(mem:<ssehalfvecmode> (match_dup 2)))]
"TARGET_AVX"
[(set (match_dup 0)
(vec_select:<ssehalfvecmode> (match_dup 1)
(parallel [(const_int 0) (const_int
1)])))]
)
(and similar for VI4F_256) is much less hackish than the XEXP stuff. I was
quite sure I'd tested exactly this and it didn't work, but now it looks like it
does :-/
Except that following http://gcc.gnu.org/ml/gcc-patches/2012-05/msg00197.html ,
this is not the right place to try and add such logic. That's a good thing
because it is way too fragile, another instruction can easily squeeze between
the two sets and disable the peephole.
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug target/53101] Recognize casts to sub-vectors
2012-04-24 10:18 [Bug target/53101] New: Recognize casts to sub-vectors marc.glisse at normalesup dot org
` (3 preceding siblings ...)
2012-05-03 19:20 ` marc.glisse at normalesup dot org
@ 2012-05-06 20:31 ` glisse at gcc dot gnu.org
2012-11-11 22:18 ` glisse at gcc dot gnu.org
` (2 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: glisse at gcc dot gnu.org @ 2012-05-06 20:31 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53101
Marc Glisse <glisse at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |glisse at gcc dot gnu.org
--- Comment #5 from Marc Glisse <glisse at gcc dot gnu.org> 2012-05-06 19:55:58 UTC ---
Getting a BIT_FIELD_REF in gimple is simple enough. If I remember correctly,
the patch below was enough (it also fixes the fact the current code will
generate a BIT_FIELD_REF for true elements of the vector and one past the end,
but not further, and I am not sure why that one-past-the-end behavior would be
wanted).
I did it because I thought it would be easier to convince the compiler to
expand that to a vec_select, and I tried patching get_inner_reference and
extract_bit_field_1, but no luck, I give up for now.
--- gimplify.c (revision 187205)
+++ gimplify.c (working copy)
@@ -4262,11 +4262,16 @@ gimple_fold_indirect_ref (tree t)
else if (TREE_CODE (optype) == COMPLEX_TYPE
&& useless_type_conversion_p (type, TREE_TYPE (optype)))
return fold_build1 (REALPART_EXPR, type, op);
/* *(foo *)&vectorfoo => BIT_FIELD_REF<vectorfoo,...> */
else if (TREE_CODE (optype) == VECTOR_TYPE
- && useless_type_conversion_p (type, TREE_TYPE (optype)))
+ && (useless_type_conversion_p (type, TREE_TYPE (optype))
+ || (TREE_CODE (type) == VECTOR_TYPE
+ && useless_type_conversion_p (TREE_TYPE (type),
+ TREE_TYPE (optype))
+ && TYPE_VECTOR_SUBPARTS (type)
+ < TYPE_VECTOR_SUBPARTS (optype))))
{
tree part_width = TYPE_SIZE (type);
tree index = bitsize_int (0);
return fold_build3 (BIT_FIELD_REF, type, op, part_width, index);
}
@@ -4283,22 +4288,29 @@ gimple_fold_indirect_ref (tree t)
STRIP_NOPS (addr);
addrtype = TREE_TYPE (addr);
/* ((foo*)&vectorfoo)[1] -> BIT_FIELD_REF<vectorfoo,...> */
if (TREE_CODE (addr) == ADDR_EXPR
- && TREE_CODE (TREE_TYPE (addrtype)) == VECTOR_TYPE
- && useless_type_conversion_p (type, TREE_TYPE (TREE_TYPE (addrtype)))
- && host_integerp (off, 1))
+ && host_integerp (off, 1)
+ && ((TREE_CODE (TREE_TYPE (addrtype)) == VECTOR_TYPE
+ && useless_type_conversion_p (type,
+ TREE_TYPE (TREE_TYPE (addrtype))))
+ || (TREE_CODE (type) == VECTOR_TYPE
+ && useless_type_conversion_p (TREE_TYPE (type),
+ TREE_TYPE (TREE_TYPE (addrtype)))
+ && TYPE_VECTOR_SUBPARTS (type)
+ < TYPE_VECTOR_SUBPARTS (TREE_TYPE (addrtype)))))
{
unsigned HOST_WIDE_INT offset = tree_low_cst (off, 1);
tree part_width = TYPE_SIZE (type);
unsigned HOST_WIDE_INT part_widthi
- = tree_low_cst (part_width, 0) / BITS_PER_UNIT;
+ = tree_low_cst (part_width, 0);
+ unsigned HOST_WIDE_INT orig_widthi
+ = tree_low_cst (TYPE_SIZE (TREE_TYPE (addrtype)), 0);
unsigned HOST_WIDE_INT indexi = offset * BITS_PER_UNIT;
tree index = bitsize_int (indexi);
- if (offset / part_widthi
- <= TYPE_VECTOR_SUBPARTS (TREE_TYPE (addrtype)))
+ if (indexi + part_widthi <= orig_widthi)
return fold_build3 (BIT_FIELD_REF, type, TREE_OPERAND (addr, 0),
part_width, index);
}
/* ((foo*)&complexfoo)[1] -> __imag__ complexfoo */
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug target/53101] Recognize casts to sub-vectors
2012-04-24 10:18 [Bug target/53101] New: Recognize casts to sub-vectors marc.glisse at normalesup dot org
` (4 preceding siblings ...)
2012-05-06 20:31 ` glisse at gcc dot gnu.org
@ 2012-11-11 22:18 ` glisse at gcc dot gnu.org
2012-11-16 23:04 ` glisse at gcc dot gnu.org
2021-08-19 19:49 ` pinskia at gcc dot gnu.org
7 siblings, 0 replies; 9+ messages in thread
From: glisse at gcc dot gnu.org @ 2012-11-11 22:18 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53101
--- Comment #6 from Marc Glisse <glisse at gcc dot gnu.org> 2012-11-11 22:18:13 UTC ---
PR 48037 seems related (it was the scalar case).
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug target/53101] Recognize casts to sub-vectors
2012-04-24 10:18 [Bug target/53101] New: Recognize casts to sub-vectors marc.glisse at normalesup dot org
` (5 preceding siblings ...)
2012-11-11 22:18 ` glisse at gcc dot gnu.org
@ 2012-11-16 23:04 ` glisse at gcc dot gnu.org
2021-08-19 19:49 ` pinskia at gcc dot gnu.org
7 siblings, 0 replies; 9+ messages in thread
From: glisse at gcc dot gnu.org @ 2012-11-16 23:04 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53101
--- Comment #7 from Marc Glisse <glisse at gcc dot gnu.org> 2012-11-16 23:03:47 UTC ---
Created attachment 28713
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28713
Tweak on the patch of PR48037
This is a slight extension of Richard's patch for PR 48037. It needs testing
and doesn't solve the real problem (which is at the RTL level).
I am not sure if BIT_FIELD_REF (a mention of that tree in doc/generic.texi
would be nice) is allowed to refer to an unaligned subvector, the patch allows
it as long as the elements are aligned. See PR 55359 for a related issue.
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug target/53101] Recognize casts to sub-vectors
2012-04-24 10:18 [Bug target/53101] New: Recognize casts to sub-vectors marc.glisse at normalesup dot org
` (6 preceding siblings ...)
2012-11-16 23:04 ` glisse at gcc dot gnu.org
@ 2021-08-19 19:49 ` pinskia at gcc dot gnu.org
7 siblings, 0 replies; 9+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-08-19 19:49 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53101
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Target Milestone|--- |8.0
Status|NEW |RESOLVED
--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
GCC 5+ just produces a stack increment/decrement but no stores (unlike 4.9.x).
GCC 6+ is able to remove the stack increment/decrement.
GCC 7+ uses BIT_FIELD_REF on the gimple level.
GCC 8+ expands the BIT_FIELD_REF to use vec_select directly and get:
(insn 6 5 7 (set (reg:TI 90)
(vec_select:TI (subreg:V2TI (reg/v:V4DF 88 [ x ]) 0)
(parallel [
(const_int 0 [0])
]))) "/app/example.cpp":9 -1
(nil))
So this is fully fixed for GCC 8+ with incremental steps along the way.
^ permalink raw reply [flat|nested] 9+ messages in thread