[Bug tree-optimization/40073] New: Vector short/char shifts generate sub-optimal code

public inbox for gcc-bugs@sourceware.org

* [Bug tree-optimization/40073]  New: Vector short/char shifts generate sub-optimal code
@ 2009-05-08 16:57 meissner at linux dot vnet dot ibm dot com
  2009-05-08 16:59 ` [Bug tree-optimization/40073] " meissner at linux dot vnet dot ibm dot com
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: meissner at linux dot vnet dot ibm dot com @ 2009-05-08 16:57 UTC (permalink / raw)
  To: gcc-bugs

On machines like x86_64/i386 with the -msse2 option, or powerpc with the
-maltivec option, which support vector 8-bit/16-bit shift instructions, GCC
generates suboptimal code for variable shifts.  Rather than generating the
native instruction, the compiler converts the vector to a V4SI vector, does the
shift, and then converts the result back to V16QI/V8HI mode.  I speculate that
this is due to the usual binary operator rules being applied to bring both
operands to the same type.  Shifts and rotates are different in that the
right-hand side is an int type.
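
For readers without access to the attachments, the kind of loop at issue
probably looks something like the sketch below.  This is not the attached test
case: the array names, the element count and the function name shift_all are
assumptions chosen to resemble the GIMPLE dump quoted later in the thread.

```c
#include <stddef.h>

#define N 1024                      /* assumed trip count */

unsigned short a[N], b[N];          /* hypothetical arrays of 16-bit elements */

/* Shift every 16-bit element of b by the same run-time count j.
   Because of the integer promotions, b[i] << j is computed in int and
   only narrowed again by the store to a[i].  */
void shift_all(int j)
{
    for (size_t i = 0; i < N; i++)
        a[i] = (unsigned short)(b[i] << j);
}
```

Compiling a loop of this shape at -O3 with -msse2 or -maltivec and inspecting
the assembly should show whether the shift is done on V8HI or on V4SI.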


-- 
           Summary: Vector short/char shifts generate sub-optimal code
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: meissner at linux dot vnet dot ibm dot com
 GCC build triplet: x86_64-unknown-linux-gnu, powerpc64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu, powerpc64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu, powerpc64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40073


* [Bug tree-optimization/40073] Vector short/char shifts generate sub-optimal code
  2009-05-08 16:57 [Bug tree-optimization/40073] New: Vector short/char shifts generate sub-optimal code meissner at linux dot vnet dot ibm dot com
@ 2009-05-08 16:59 ` meissner at linux dot vnet dot ibm dot com
  2009-05-08 17:02 ` meissner at linux dot vnet dot ibm dot com
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: meissner at linux dot vnet dot ibm dot com @ 2009-05-08 16:59 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from meissner at linux dot vnet dot ibm dot com  2009-05-08 16:59 -------
Created an attachment (id=17827)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17827&action=view)
test case

This is an example of code for which GCC generates sub-optimal code.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40073


* [Bug tree-optimization/40073] Vector short/char shifts generate sub-optimal code
  2009-05-08 16:57 [Bug tree-optimization/40073] New: Vector short/char shifts generate sub-optimal code meissner at linux dot vnet dot ibm dot com
  2009-05-08 16:59 ` [Bug tree-optimization/40073] " meissner at linux dot vnet dot ibm dot com
@ 2009-05-08 17:02 ` meissner at linux dot vnet dot ibm dot com
  2009-05-08 17:03 ` meissner at linux dot vnet dot ibm dot com
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: meissner at linux dot vnet dot ibm dot com @ 2009-05-08 17:02 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from meissner at linux dot vnet dot ibm dot com  2009-05-08 17:02 -------
Created an attachment (id=17828)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17828&action=view)
Replacement test case that doesn't need -DTYPE to show the bug

Replacement test case.


-- 

meissner at linux dot vnet dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #17827|0                           |1
        is obsolete|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40073


* [Bug tree-optimization/40073] Vector short/char shifts generate sub-optimal code
  2009-05-08 16:57 [Bug tree-optimization/40073] New: Vector short/char shifts generate sub-optimal code meissner at linux dot vnet dot ibm dot com
  2009-05-08 16:59 ` [Bug tree-optimization/40073] " meissner at linux dot vnet dot ibm dot com
  2009-05-08 17:02 ` meissner at linux dot vnet dot ibm dot com
@ 2009-05-08 17:03 ` meissner at linux dot vnet dot ibm dot com
  2009-05-08 17:05 ` meissner at linux dot vnet dot ibm dot com
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: meissner at linux dot vnet dot ibm dot com @ 2009-05-08 17:03 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from meissner at linux dot vnet dot ibm dot com  2009-05-08 17:03 -------
Created an attachment (id=17829)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17829&action=view)
Powerpc example code

This code was compiled with -O3 -maltivec.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40073


* [Bug tree-optimization/40073] Vector short/char shifts generate sub-optimal code
  2009-05-08 16:57 [Bug tree-optimization/40073] New: Vector short/char shifts generate sub-optimal code meissner at linux dot vnet dot ibm dot com
                   ` (2 preceding siblings ...)
  2009-05-08 17:03 ` meissner at linux dot vnet dot ibm dot com
@ 2009-05-08 17:05 ` meissner at linux dot vnet dot ibm dot com
  2009-05-08 17:06 ` meissner at linux dot vnet dot ibm dot com
  2009-05-08 20:39 ` rguenth at gcc dot gnu dot org
  5 siblings, 0 replies; 13+ messages in thread
From: meissner at linux dot vnet dot ibm dot com @ 2009-05-08 17:05 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from meissner at linux dot vnet dot ibm dot com  2009-05-08 17:04 -------
Created an attachment (id=17830)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17830&action=view)
X86_64 example code

This code was compiled with an x86_64 compiler with the -O3 -msse3 options.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40073


* [Bug tree-optimization/40073] Vector short/char shifts generate sub-optimal code
  2009-05-08 16:57 [Bug tree-optimization/40073] New: Vector short/char shifts generate sub-optimal code meissner at linux dot vnet dot ibm dot com
                   ` (3 preceding siblings ...)
  2009-05-08 17:05 ` meissner at linux dot vnet dot ibm dot com
@ 2009-05-08 17:06 ` meissner at linux dot vnet dot ibm dot com
  2009-05-08 20:39 ` rguenth at gcc dot gnu dot org
  5 siblings, 0 replies; 13+ messages in thread
From: meissner at linux dot vnet dot ibm dot com @ 2009-05-08 17:06 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from meissner at linux dot vnet dot ibm dot com  2009-05-08 17:06 -------
Created an attachment (id=17831)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17831&action=view)
Vectorizer verbose output

This is the output from the PowerPC compiler with -fdump-tree-vect
-ftree-vectorizer-verbose=10


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40073


* [Bug tree-optimization/40073] Vector short/char shifts generate sub-optimal code
  2009-05-08 16:57 [Bug tree-optimization/40073] New: Vector short/char shifts generate sub-optimal code meissner at linux dot vnet dot ibm dot com
                   ` (4 preceding siblings ...)
  2009-05-08 17:06 ` meissner at linux dot vnet dot ibm dot com
@ 2009-05-08 20:39 ` rguenth at gcc dot gnu dot org
  5 siblings, 0 replies; 13+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-05-08 20:39 UTC (permalink / raw)
  To: gcc-bugs



-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu dot
                   |                            |org
           Severity|normal                      |enhancement
           Keywords|                            |missed-optimization


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40073


[parent not found: <bug-40073-4@http.gcc.gnu.org/bugzilla/>]

* [Bug tree-optimization/40073] Vector short/char shifts generate sub-optimal code
       [not found] <bug-40073-4@http.gcc.gnu.org/bugzilla/>
@ 2014-04-26  7:00 ` glisse at gcc dot gnu.org
  2015-06-12  9:56 ` alalaw01 at gcc dot gnu.org
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: glisse at gcc dot gnu.org @ 2014-04-26  7:00 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40073

Marc Glisse <glisse at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2014-04-26
     Ever confirmed|0                           |1
      Known to fail|                            |4.9.0


* [Bug tree-optimization/40073] Vector short/char shifts generate sub-optimal code
       [not found] <bug-40073-4@http.gcc.gnu.org/bugzilla/>
  2014-04-26  7:00 ` glisse at gcc dot gnu.org
@ 2015-06-12  9:56 ` alalaw01 at gcc dot gnu.org
  2015-06-12 10:08 ` pinskia at gcc dot gnu.org
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: alalaw01 at gcc dot gnu.org @ 2015-06-12  9:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40073

alalaw01 at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |alalaw01 at gcc dot gnu.org
            Version|4.5.0                       |6.0

--- Comment #6 from alalaw01 at gcc dot gnu.org ---
Same problem on AArch64 (at -O3). We've decided the shift is to be done on
integers, widening the arguments and then truncating, before we hit the
vectorizer:

  int i;
  short unsigned int _4;
  int _5;
  int _8;
  short unsigned int _9;
  int pretmp_18;
  unsigned int ivtmp_21;
  unsigned int ivtmp_22;

  <bb 2>:
  pretmp_18 = (int) j_6(D);

  <bb 3>:
  # i_14 = PHI <i_11(4), 0(2)>
  # ivtmp_22 = PHI <ivtmp_21(4), 1024(2)>
  _4 = b[i_14];
  _5 = (int) _4;
  _8 = _5 << pretmp_18;
  _9 = (short unsigned int) _8;
  a[i_14] = _9;
  i_11 = i_14 + 1;
  ivtmp_21 = ivtmp_22 - 1;
  if (ivtmp_21 == 0)


* [Bug tree-optimization/40073] Vector short/char shifts generate sub-optimal code
       [not found] <bug-40073-4@http.gcc.gnu.org/bugzilla/>
  2014-04-26  7:00 ` glisse at gcc dot gnu.org
  2015-06-12  9:56 ` alalaw01 at gcc dot gnu.org
@ 2015-06-12 10:08 ` pinskia at gcc dot gnu.org
  2015-06-12 11:39 ` alalaw01 at gcc dot gnu.org
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: pinskia at gcc dot gnu.org @ 2015-06-12 10:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40073

--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to alalaw01 from comment #6)
> Same problem on AArch64 (at -O3). We've decided the shift is to be done on
> integers, widening the arguments and then truncating, before we hit the
> vectorizer:

This is needed for correctness due to the C/C++ promotion rules.

> 
>   int i;
>   short unsigned int _4;
>   int _5;
>   int _8;
>   short unsigned int _9;
>   int pretmp_18;
>   unsigned int ivtmp_21;
>   unsigned int ivtmp_22;
> 
>   <bb 2>:
>   pretmp_18 = (int) j_6(D);
> 
>   <bb 3>:
>   # i_14 = PHI <i_11(4), 0(2)>
>   # ivtmp_22 = PHI <ivtmp_21(4), 1024(2)>
>   _4 = b[i_14];
>   _5 = (int) _4;
>   _8 = _5 << pretmp_18;
>   _9 = (short unsigned int) _8;
>   a[i_14] = _9;
>   i_11 = i_14 + 1;
>   ivtmp_21 = ivtmp_22 - 1;
>   if (ivtmp_21 == 0)
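
To make the promotion rules mentioned in this reply concrete, here is a small
scalar sketch.  The function names are made up for illustration, and the
example assumes a target where int is 32 bits; _Generic needs C11.

```c
/* Returns 1 if the shift expression has type int, i.e. the
   unsigned short operand was promoted before shifting.  */
static int shift_type_is_int(void)
{
    unsigned short s = 1;
    return _Generic(s << 1, int: 1, default: 0);
}

/* The promoted shift keeps the bits that move above bit 15
   (this corresponds to _8 in the dump).  */
static int shift_promoted(unsigned short s, int n)
{
    return s << n;                  /* evaluated as (int) s << n */
}

/* Converting back to unsigned short drops those bits again
   (this corresponds to _9 in the dump).  */
static unsigned short shift_truncated(unsigned short s, int n)
{
    return (unsigned short)(s << n);
}
```

With a 32-bit int, shift_promoted (0x8000, 1) yields 0x10000, while
shift_truncated (0x8000, 1) yields 0, which is the widen/shift/narrow sequence
the front end hands to the vectorizer.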


* [Bug tree-optimization/40073] Vector short/char shifts generate sub-optimal code
       [not found] <bug-40073-4@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2015-06-12 10:08 ` pinskia at gcc dot gnu.org
@ 2015-06-12 11:39 ` alalaw01 at gcc dot gnu.org
  2015-06-12 12:01 ` glisse at gcc dot gnu.org
  2022-03-08 17:25 ` law at gcc dot gnu.org
  5 siblings, 0 replies; 13+ messages in thread
From: alalaw01 at gcc dot gnu.org @ 2015-06-12 11:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40073

--- Comment #8 from alalaw01 at gcc dot gnu.org ---
Is there a case where the result is different with vs without all the
extending/truncating?

It seems we should need the extending/truncating on vectors exactly iff we need
it on scalars?


* [Bug tree-optimization/40073] Vector short/char shifts generate sub-optimal code
       [not found] <bug-40073-4@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2015-06-12 11:39 ` alalaw01 at gcc dot gnu.org
@ 2015-06-12 12:01 ` glisse at gcc dot gnu.org
  2022-03-08 17:25 ` law at gcc dot gnu.org
  5 siblings, 0 replies; 13+ messages in thread
From: glisse at gcc dot gnu.org @ 2015-06-12 12:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40073

Marc Glisse <glisse at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |law at gcc dot gnu.org

--- Comment #9 from Marc Glisse <glisse at gcc dot gnu.org> ---
(In reply to alalaw01 from comment #8)
> Is there a case where the result is different with vs without all the
> extending/truncating?
> 
> It seems we should need the extending/truncating on vectors exactly iff we
> need it on scalars?

The extending/truncating is what the standard requires. Then we can start
optimizing (since indeed in many cases it isn't necessary), and Jeff is working
on exactly that (shortening). This seems like a rather straightforward case,
but who knows...
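
As a rough illustration of when the extending/truncating matters, the sketch
below compares the widened scalar semantics with a hypothetical 16-bit lane
shift that only looks at the low four bits of the count.  The lane model and
all function names are assumptions for illustration, not a description of any
particular target.  uint32_t stands in for int so that large counts stay well
defined.

```c
#include <stdint.h>

/* Scalar semantics after promotion: widen, shift, truncate.
   Requires n < 32.  */
static uint16_t shl_widened(uint16_t x, unsigned n)
{
    return (uint16_t)((uint32_t)x << n);
}

/* Hypothetical 16-bit lane shift that uses only the low 4 bits of n.  */
static uint16_t shl_lane_mod16(uint16_t x, unsigned n)
{
    return (uint16_t)((uint32_t)x << (n & 15u));
}

/* Returns 1 if both models agree for every 16-bit x at shift count n.  */
static int models_agree(unsigned n)
{
    for (uint32_t x = 0; x <= 0xFFFFu; x++)
        if (shl_widened((uint16_t)x, n) != shl_lane_mod16((uint16_t)x, n))
            return 0;
    return 1;
}
```

Under these assumptions the two models agree for every count from 0 to 15 and
diverge from 16 upwards, so a narrow vector shift would only be a safe
replacement when the count is known to be smaller than the element width.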


* [Bug tree-optimization/40073] Vector short/char shifts generate sub-optimal code
       [not found] <bug-40073-4@http.gcc.gnu.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2015-06-12 12:01 ` glisse at gcc dot gnu.org
@ 2022-03-08 17:25 ` law at gcc dot gnu.org
  5 siblings, 0 replies; 13+ messages in thread
From: law at gcc dot gnu.org @ 2022-03-08 17:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40073

--- Comment #19 from Jeffrey A. Law <law at gcc dot gnu.org> ---
I stumbled over this as well at some point.  One thing I started playing with,
but had to set aside, was making vect_get_range_info smarter.

In particular, in the case I was looking at, VAR would have a single use that
was a narrowing conversion.  Taking advantage of that narrowing conversion
would tend to allow us to use VxQI and VxHI shifts more often.

It's just something we noticed, but we never chased down whether it was
important in terms of real-world code generation.  I see two patches in my
stash.  No idea of the state of either one, but they might point you at
something useful...


diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index 803de3fc287..43369eb8f4e 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -58,6 +58,37 @@ vect_get_range_info (tree var, wide_int *min_value, wide_int *max_value)
   value_range_kind vr_type = get_range_info (var, min_value, max_value);
   wide_int nonzero = get_nonzero_bits (var);
   signop sgn = TYPE_SIGN (TREE_TYPE (var));
+
+  /* If VAR has a single use in a narrowing conversion, then we may be
+     able to use the narrowing conversion to get a tighter range.  */
+  gimple *use_stmt;
+  use_operand_p use;
+  if (vr_type == VR_VARYING
+      && single_imm_use (var, &use, &use_stmt)
+      && is_gimple_assign (use_stmt)
+      && gimple_assign_rhs_code (use_stmt) == NOP_EXPR)
+    {
+      /* We know VAR has a single use that is a conversion.  Now check
+        if it is a narrowing conversion.  */
+      tree lhs = gimple_assign_lhs (use_stmt);
+      unsigned HOST_WIDE_INT orig_size = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)));
+      unsigned HOST_WIDE_INT lhs_size = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (lhs)));
+
+      if (lhs_size < orig_size)
+       {
+         /* The single use of VAR was a narrowing conversion.
+            Use the nonzero bits from the narrower type and
+            the min/max values of VAR's type.
+
+            This allows the intersect call below to work in the expected way.  */
+         nonzero = get_nonzero_bits (lhs);
+         sgn = TYPE_SIGN (TREE_TYPE (lhs));
+         *min_value = wi::to_wide (vrp_val_min (TREE_TYPE (lhs)));
+         *max_value = wi::to_wide (vrp_val_max (TREE_TYPE (lhs)));
+         vr_type = VR_RANGE;
+       }
+    }
+
   if (intersect_range_with_nonzero_bits (vr_type, min_value, max_value,
                                         nonzero, sgn) == VR_RANGE)
     {


And another variant:

@@ -74,6 +74,38 @@ vect_get_range_info (tree var, wide_int *min_value, wide_int *max_value)
     }
   else
     {
+      /* Try a bit harder to get a narrowed range.  If VAR has a single use that
+        is a conversion, see if we can use the converted range.  */
+      gimple *stmt;
+      use_operand_p use;
+      if (single_imm_use (var, &use, &stmt)
+         && is_gimple_assign (stmt)
+         && gimple_assign_rhs_code (stmt) == NOP_EXPR)
+       {
+         /* If this is a narrowing conversion, then we win as it
+            narrows the range of VAR.  */
+         tree lhs = gimple_assign_lhs (stmt);
+         unsigned HOST_WIDE_INT orig_size = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)));
+         unsigned HOST_WIDE_INT lhs_size = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (lhs)));
+         if (lhs_size < orig_size)
+           {
+             *min_value = wi::to_wide (TYPE_MIN_VALUE (TREE_TYPE (lhs)));
+             *max_value = wi::to_wide (TYPE_MAX_VALUE (TREE_TYPE (lhs)));
+             if (dump_enabled_p ())
+               {
+                 dump_generic_expr_loc (MSG_NOTE, vect_location, TDF_SLIM, var);
+                 dump_printf (MSG_NOTE, " has range [");
+                 dump_hex (MSG_NOTE, *min_value);
+                 dump_printf (MSG_NOTE, ", ");
+                 dump_hex (MSG_NOTE, *max_value);
+                 dump_printf (MSG_NOTE, "]\n");
+               }
+             return true;
+           }
+
+
+       }
+
       if (dump_enabled_p ())
        {
          dump_generic_expr_loc (MSG_NOTE, vect_location, TDF_SLIM, var);
