public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/40073] New: Vector short/char shifts generate sub-optimal code
From: meissner at linux dot vnet dot ibm dot com @ 2009-05-08 16:57 UTC
To: gcc-bugs

On machines such as x86_64/i386 with the -msse2 option, or powerpc with the
-maltivec option, that support vector 8-bit/16-bit shift instructions, GCC
generates suboptimal code for variable shifts. Rather than generating the
native instruction, the compiler converts the vector to a V4SI vector, does
the shift, and then converts the vector back to V16QI/V8HI mode.

I speculate that this is due to the normal binary operator rules being
applied to bring both sides to the same type. Shifts and rotates are
different in that the right-hand side is an int type.

--
           Summary: Vector short/char shifts generate sub-optimal code
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: meissner at linux dot vnet dot ibm dot com
 GCC build triplet: x86_64-unknown-linux-gnu, powerpc64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu, powerpc64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu, powerpc64-unknown-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40073
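For orientation, a loop of roughly the following shape exercises the path described in the report. This is a hypothetical sketch, not the attached test case: the array names `a` and `b` and the length 1024 are assumptions chosen to resemble the GIMPLE dump in comment #6, and the function name `shift_left_u16` is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define N 1024            /* assumed trip count, as in the comment #6 dump */

uint16_t a[N], b[N];      /* assumed 16-bit element arrays */

/* Each b[i] is promoted to int, shifted, and truncated back to 16 bits.
   Callers are expected to pass 0 <= j <= 15 so the int shift cannot
   overflow. */
void shift_left_u16(int j)
{
    for (size_t i = 0; i < N; i++)
        a[i] = (uint16_t)(b[i] << j);
}
```

Compiling such a loop with -O3 and a vector option (for example -msse2 or -maltivec, as in the report) and inspecting the assembly is one way to see whether the shift is carried out on 16-bit or 32-bit vector elements.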
* [Bug tree-optimization/40073] Vector short/char shifts generate sub-optimal code
From: meissner at linux dot vnet dot ibm dot com @ 2009-05-08 16:59 UTC
To: gcc-bugs

------- Comment #1 from meissner at linux dot vnet dot ibm dot com  2009-05-08 16:59 -------
Created an attachment (id=17827)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17827&action=view)
test case

This test case is an example of code for which GCC generates sub-optimal
code.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40073
* [Bug tree-optimization/40073] Vector short/char shifts generate sub-optimal code
From: meissner at linux dot vnet dot ibm dot com @ 2009-05-08 17:02 UTC
To: gcc-bugs

------- Comment #2 from meissner at linux dot vnet dot ibm dot com  2009-05-08 17:02 -------
Created an attachment (id=17828)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17828&action=view)
Replacement test case that doesn't need -DTYPE to show the bug

Replacement test case.

--
meissner at linux dot vnet dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #17827|0                           |1
        is obsolete|                            |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40073
* [Bug tree-optimization/40073] Vector short/char shifts generate sub-optimal code
From: meissner at linux dot vnet dot ibm dot com @ 2009-05-08 17:03 UTC
To: gcc-bugs

------- Comment #3 from meissner at linux dot vnet dot ibm dot com  2009-05-08 17:03 -------
Created an attachment (id=17829)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17829&action=view)
Powerpc example code

This code was compiled with -O3 -maltivec.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40073
* [Bug tree-optimization/40073] Vector short/char shifts generate sub-optimal code
From: meissner at linux dot vnet dot ibm dot com @ 2009-05-08 17:05 UTC
To: gcc-bugs

------- Comment #4 from meissner at linux dot vnet dot ibm dot com  2009-05-08 17:04 -------
Created an attachment (id=17830)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17830&action=view)
X86_64 example code

This code was compiled with an x86_64 compiler with the -O3 -msse3 options.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40073
* [Bug tree-optimization/40073] Vector short/char shifts generate sub-optimal code
From: meissner at linux dot vnet dot ibm dot com @ 2009-05-08 17:06 UTC
To: gcc-bugs

------- Comment #5 from meissner at linux dot vnet dot ibm dot com  2009-05-08 17:06 -------
Created an attachment (id=17831)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17831&action=view)
Vectorizer verbose output

This is the output from the Powerpc compiler with -fdump-tree-vect
-ftree-vectorizer-verbose=10 -fdump-tree-vect

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40073
* [Bug tree-optimization/40073] Vector short/char shifts generate sub-optimal code
From: rguenth at gcc dot gnu dot org @ 2009-05-08 20:39 UTC
To: gcc-bugs

--
rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu dot org
           Severity|normal                      |enhancement
           Keywords|                            |missed-optimization

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40073
[parent not found: <bug-40073-4@http.gcc.gnu.org/bugzilla/>]
* [Bug tree-optimization/40073] Vector short/char shifts generate sub-optimal code
From: glisse at gcc dot gnu.org @ 2014-04-26 7:00 UTC
To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40073

Marc Glisse <glisse at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2014-04-26
     Ever confirmed|0                           |1
      Known to fail|                            |4.9.0
* [Bug tree-optimization/40073] Vector short/char shifts generate sub-optimal code
From: alalaw01 at gcc dot gnu.org @ 2015-06-12 9:56 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40073

alalaw01 at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |alalaw01 at gcc dot gnu.org
            Version|4.5.0                       |6.0

--- Comment #6 from alalaw01 at gcc dot gnu.org ---
Same problem on AArch64 (at -O3). We've decided the shift is to be done on
integers, widening the arguments and then truncating, before we hit the
vectorizer:

  int i;
  short unsigned int _4;
  int _5;
  int _8;
  short unsigned int _9;
  int pretmp_18;
  unsigned int ivtmp_21;
  unsigned int ivtmp_22;

  <bb 2>:
  pretmp_18 = (int) j_6(D);

  <bb 3>:
  # i_14 = PHI <i_11(4), 0(2)>
  # ivtmp_22 = PHI <ivtmp_21(4), 1024(2)>
  _4 = b[i_14];
  _5 = (int) _4;
  _8 = _5 << pretmp_18;
  _9 = (short unsigned int) _8;
  a[i_14] = _9;
  i_11 = i_14 + 1;
  ivtmp_21 = ivtmp_22 - 1;
  if (ivtmp_21 == 0)
* [Bug tree-optimization/40073] Vector short/char shifts generate sub-optimal code
From: pinskia at gcc dot gnu.org @ 2015-06-12 10:08 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40073

--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to alalaw01 from comment #6)
> Same problem on AArch64 (at -O3). We've decided the shift is to be done on
> integers, widening the arguments and then truncating, before we hit the
> vectorizer:

This is needed for correctness due to the C/C++ promotion rules.

>
>   int i;
>   short unsigned int _4;
>   int _5;
>   int _8;
>   short unsigned int _9;
>   int pretmp_18;
>   unsigned int ivtmp_21;
>   unsigned int ivtmp_22;
>
>   <bb 2>:
>   pretmp_18 = (int) j_6(D);
>
>   <bb 3>:
>   # i_14 = PHI <i_11(4), 0(2)>
>   # ivtmp_22 = PHI <ivtmp_21(4), 1024(2)>
>   _4 = b[i_14];
>   _5 = (int) _4;
>   _8 = _5 << pretmp_18;
>   _9 = (short unsigned int) _8;
>   a[i_14] = _9;
>   i_11 = i_14 + 1;
>   ivtmp_21 = ivtmp_22 - 1;
>   if (ivtmp_21 == 0)
* [Bug tree-optimization/40073] Vector short/char shifts generate sub-optimal code
From: alalaw01 at gcc dot gnu.org @ 2015-06-12 11:39 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40073

--- Comment #8 from alalaw01 at gcc dot gnu.org ---
Is there a case where the result is different with vs. without all the
extending/truncating?

It seems we should need the extending/truncating on vectors if and only if
we need it on scalars?
* [Bug tree-optimization/40073] Vector short/char shifts generate sub-optimal code
From: glisse at gcc dot gnu.org @ 2015-06-12 12:01 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40073

Marc Glisse <glisse at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |law at gcc dot gnu.org

--- Comment #9 from Marc Glisse <glisse at gcc dot gnu.org> ---
(In reply to alalaw01 from comment #8)
> Is there a case where the result is different with vs without all the
> extending/truncating?
>
> It seems we should need the extending/truncating on vectors exactly iff we
> need it on scalars?

The extending/truncating is what the standard requires. Then we can start
optimizing (since indeed in many cases it isn't necessary), and Jeff is
working on exactly that (shortening). This seems like a rather
straightforward case, but who knows...
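To make the distinction concrete, here is a small sketch (not taken from the bug report) that compares the widen/shift/truncate sequence with a hypothetical native 16-bit shift. The helper names `shl_promoted`, `shl_narrow_mod16` and `count_mismatches` are made up for illustration. The modulo-16 handling of the count is an assumption modelling hardware that only looks at the low four bits of the shift count; other targets may treat out-of-range counts differently. Unsigned 32-bit arithmetic stands in for `int` so the example stays free of signed-overflow undefined behaviour.

```c
#include <stdint.h>

/* Source-level semantics: widen to 32 bits, shift, truncate to 16 bits.
   The count must be below 32. */
static uint16_t shl_promoted(uint16_t x, unsigned count)
{
    return (uint16_t)((uint32_t)x << count);
}

/* Hypothetical native 16-bit shift that uses only the low 4 bits of the
   count (an assumption, see the text above). */
static uint16_t shl_narrow_mod16(uint16_t x, unsigned count)
{
    return (uint16_t)((uint32_t)x << (count & 15u));
}

/* Number of 16-bit inputs for which the two shifts disagree. */
static unsigned count_mismatches(unsigned count)
{
    unsigned bad = 0;
    for (uint32_t x = 0; x <= 0xFFFFu; x++)
        if (shl_promoted((uint16_t)x, count)
            != shl_narrow_mod16((uint16_t)x, count))
            bad++;
    return bad;
}
```

Under these assumptions the two forms agree for every input when the count is between 0 and 15, and disagree once the count reaches 16, where the promoted form yields 0. That suggests the narrowing is safe only when the range of the shift count is known, or when the target's handling of out-of-range counts happens to match.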
* [Bug tree-optimization/40073] Vector short/char shifts generate sub-optimal code
From: law at gcc dot gnu.org @ 2022-03-08 17:25 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40073

--- Comment #19 from Jeffrey A. Law <law at gcc dot gnu.org> ---
I stumbled over this as well at some point.

One thing I started playing with, but had to set aside, was making
vect_get_range_info smarter. In particular, in the case I was looking at,
VAR would have a single use that was a narrowing conversion. Taking
advantage of that narrowing conversion would tend to allow us to use VxQI
and VxHI shifts more often. It's just something we noticed, but we never
chased down whether it was important in terms of real-world code generation.

I see two patches in my stash. No idea the state on either one, but they
might point you at something useful...

diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index 803de3fc287..43369eb8f4e 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -58,6 +58,37 @@ vect_get_range_info (tree var, wide_int *min_value, wide_int *max_value)
   value_range_kind vr_type = get_range_info (var, min_value, max_value);
   wide_int nonzero = get_nonzero_bits (var);
   signop sgn = TYPE_SIGN (TREE_TYPE (var));
+
+  /* If VAR has a single use in a narrowing conversion, then we may be
+     able to use the narrowing conversion to get a tighter range.  */
+  gimple *use_stmt;
+  use_operand_p use;
+  if (vr_type == VR_VARYING
+      && single_imm_use (var, &use, &use_stmt)
+      && is_gimple_assign (use_stmt)
+      && gimple_assign_rhs_code (use_stmt) == NOP_EXPR)
+    {
+      /* We know VAR has a single use that is a conversion.  Now check
+         if it is a narrowing conversion.  */
+      tree lhs = gimple_assign_lhs (use_stmt);
+      unsigned HOST_WIDE_INT orig_size = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)));
+      unsigned HOST_WIDE_INT lhs_size = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (lhs)));
+
+      if (lhs_size < orig_size)
+        {
+          /* The single use of VAR was a narrowing conversion.
+             Use the nonzero bits from the narrower type and
+             the min/max values of VAR's type.
+
+             This allows the intersect call below to work in the expected way.  */
+          nonzero = get_nonzero_bits (lhs);
+          sgn = TYPE_SIGN (TREE_TYPE (lhs));
+          *min_value = wi::to_wide (vrp_val_min (TREE_TYPE (lhs)));
+          *max_value = wi::to_wide (vrp_val_min (TREE_TYPE (lhs)));
+          vr_type = VR_RANGE;
+        }
+    }
+
   if (intersect_range_with_nonzero_bits (vr_type, min_value, max_value,
                                          nonzero, sgn) == VR_RANGE)
     {

And another variant:

@@ -74,6 +74,38 @@ vect_get_range_info (tree var, wide_int *min_value, wide_int *max_value)
     }
   else
     {
+      /* Try a bit harder to get a narrowed range.  If VAR has a single use that
+         is a conversion, see if we can use the converted range.  */
+      gimple *stmt;
+      use_operand_p use;
+      if (single_imm_use (var, &use, &stmt)
+          && is_gimple_assign (stmt)
+          && gimple_assign_rhs_code (stmt) == NOP_EXPR)
+        {
+          /* If this is a narrowing conversion, then we win as it
+             narrows the range of VAR.  */
+          tree lhs = gimple_assign_lhs (stmt);
+          unsigned HOST_WIDE_INT orig_size = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)));
+          unsigned HOST_WIDE_INT lhs_size = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (lhs)));
+          if (lhs_size < orig_size)
+            {
+              *min_value = wi::to_wide (TYPE_MIN_VALUE (TREE_TYPE (lhs)));
+              *max_value = wi::to_wide (TYPE_MAX_VALUE (TREE_TYPE (lhs)));
+              if (dump_enabled_p ())
+                {
+                  dump_generic_expr_loc (MSG_NOTE, vect_location, TDF_SLIM, var);
+                  dump_printf (MSG_NOTE, " has range [");
+                  dump_hex (MSG_NOTE, *min_value);
+                  dump_printf (MSG_NOTE, ", ");
+                  dump_hex (MSG_NOTE, *max_value);
+                  dump_printf (MSG_NOTE, "]\n");
+                }
+              return true;
+            }
+
+
+        }
+
       if (dump_enabled_p ())
         {
           dump_generic_expr_loc (MSG_NOTE, vect_location, TDF_SLIM, var);