public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/40589]  New: efficiency problem with V2HI add
@ 2009-06-29 15:06 Tom dot de dot Vries at Nxp dot com
  2009-06-29 15:40 ` [Bug middle-end/40589] " rguenth at gcc dot gnu dot org
From: Tom dot de dot Vries at Nxp dot com @ 2009-06-29 15:06 UTC
  To: gcc-bugs

I am working on a port for the TriMedia processor family, and I was experimenting
with the following example (extracted from
gcc.c-torture/execute/simd-2.c) to see how well our port takes advantage of the
addv2hi3 instruction pattern of our tm3271 processor.

test.c:
...
typedef short __attribute__((vector_size (N))) vecint;
vecint i, j, k;
void f () {  k = i + j; }
...

test.c.016t.veclower, N=4. This looks good: addv2hi3 has been used once.
...
  vector short int k.2;
  vector short int j.1;
  vector short int i.0;
  i.0 = i;
  j.1 = j;
  k.2 = i.0 + j.1;
  k = k.2;
}
...

test.c.016t.veclower, N=32. This also looks good: addv2hi3 has been used 8 times.
...
  vector short unsigned int D.1445;
  vector short unsigned int D.1444;
  vector short unsigned int D.1443;
  vector short unsigned int D.1442;
  vector short unsigned int D.1441;
  vector short unsigned int D.1440;
  vector short unsigned int D.1439;
  vector short unsigned int D.1438;
  vector short unsigned int D.1437;
  vector short unsigned int D.1436;
  vector short unsigned int D.1435;
  vector short unsigned int D.1434;
  vector short unsigned int D.1433;
  vector short unsigned int D.1432;
  vector short unsigned int D.1431;
  vector short unsigned int D.1430;
  vector short unsigned int D.1429;
  vector short unsigned int D.1428;
  vector short unsigned int D.1427;
  vector short unsigned int D.1426;
  vector short unsigned int D.1425;
  vector short unsigned int D.1424;
  vector short unsigned int D.1423;
  vector short unsigned int D.1422;
  vector short int k.2;
  vector short int j.1;
  vector short int i.0;
  i.0 = i;
  j.1 = j;
  D.1422 = BIT_FIELD_REF <i.0, 32, 0>;
  D.1423 = BIT_FIELD_REF <j.1, 32, 0>;
  D.1424 = D.1422 + D.1423;
  D.1425 = BIT_FIELD_REF <i.0, 32, 32>;
  D.1426 = BIT_FIELD_REF <j.1, 32, 32>;
  D.1427 = D.1425 + D.1426;
  D.1428 = BIT_FIELD_REF <i.0, 32, 64>;
  D.1429 = BIT_FIELD_REF <j.1, 32, 64>;
  D.1430 = D.1428 + D.1429;
  D.1431 = BIT_FIELD_REF <i.0, 32, 96>;
  D.1432 = BIT_FIELD_REF <j.1, 32, 96>;
  D.1433 = D.1431 + D.1432;
  D.1434 = BIT_FIELD_REF <i.0, 32, 128>;
  D.1435 = BIT_FIELD_REF <j.1, 32, 128>;
  D.1436 = D.1434 + D.1435;
  D.1437 = BIT_FIELD_REF <i.0, 32, 160>;
  D.1438 = BIT_FIELD_REF <j.1, 32, 160>;
  D.1439 = D.1437 + D.1438;
  D.1440 = BIT_FIELD_REF <i.0, 32, 192>;
  D.1441 = BIT_FIELD_REF <j.1, 32, 192>;
  D.1442 = D.1440 + D.1441;
  D.1443 = BIT_FIELD_REF <i.0, 32, 224>;
  D.1444 = BIT_FIELD_REF <j.1, 32, 224>;
  D.1445 = D.1443 + D.1444;
  k.2 = {D.1424, D.1427, D.1430, D.1433, D.1436, D.1439, D.1442, D.1445};
  k = k.2;
...

test.c.016t.veclower, N=8. This does not look good: addv2hi3 has not been
used. Instead, addsi3 has been used 4 times, whereas 2 addv2hi3 instructions
would have sufficed.
...
  short int D.1431;
  short int D.1430;
  short int D.1429;
  short int D.1428;
  short int D.1427;
  short int D.1426;
  short int D.1425;
  short int D.1424;
  short int D.1423;
  short int D.1422;
  short int D.1421;
  short int D.1420;
  vector short int k.2;
  vector short int j.1;
  vector short int i.0;
  i.0 = i;
  j.1 = j;
  D.1420 = BIT_FIELD_REF <i.0, 16, 0>;
  D.1421 = BIT_FIELD_REF <j.1, 16, 0>;
  D.1422 = D.1420 + D.1421;
  D.1423 = BIT_FIELD_REF <i.0, 16, 16>;
  D.1424 = BIT_FIELD_REF <j.1, 16, 16>;
  D.1425 = D.1423 + D.1424;
  D.1426 = BIT_FIELD_REF <i.0, 16, 32>;
  D.1427 = BIT_FIELD_REF <j.1, 16, 32>;
  D.1428 = D.1426 + D.1427;
  D.1429 = BIT_FIELD_REF <i.0, 16, 48>;
  D.1430 = BIT_FIELD_REF <j.1, 16, 48>;
  D.1431 = D.1429 + D.1430;
  k.2 = {D.1422, D.1425, D.1428, D.1431};
  k = k.2;
...

The following loop counts the lines containing '+' in the veclower dump for
several values of N, and shows that the problem only occurs for N=8 and N=16:
...
$ for N in 4 8 16 32 64; do \
  rm -f *.c.* ; \
  cc1 test.c -quiet -march=tm3271 -O2 -DN=${N} \
      -fdump-rtl-all -fdump-tree-all \
  && grep -c '+' test.c.016t.veclower ; \
done
1
4
8
8
16
...

So why does the problem occur? Let's look at TYPE_MODE (type) in
expand_vector_operations_1() for different values of N:
...
N=4  V2HI
N=8  DImode
N=16 TImode
N=32 BLKmode
N=64 BLKmode
...

For DImode and TImode we don't generate efficient code, because of the test
against BLKmode:
...
  /* For very wide vectors, try using a smaller vector mode.  */
  compute_type = type;
  if (TYPE_MODE (type) == BLKmode && op)
...
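in expand_vector_operations_1(). Because DImode and TImode are not BLKmode,
the search for a smaller vector mode is skipped and the addition is lowered
element by element, as in the N=8 dump. For my target, which has a native
addv2hi3 pattern, DImode/TImode vectors should also be treated as 'wide vectors'.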

With this patch, addv2hi3 is also generated for N=8 and N=16:
...
Index: tree-vect-generic.c
===================================================================
--- tree-vect-generic.c (revision 14)
+++ tree-vect-generic.c (working copy)
@@ -462,7 +462,7 @@

   /* For very wide vectors, try using a smaller vector mode.  */
   compute_type = type;
-  if (TYPE_MODE (type) == BLKmode && op)
+  if (op)
     {
       tree vector_compute_type
         = type_for_widest_vector_mode (TYPE_MODE (TREE_TYPE (type)), op,
...


Furthermore, I think this patch (in the style of
expmed.c:extract_bit_field_1()) could be useful:
...
Index: tree-vect-generic.c
===================================================================
--- tree-vect-generic.c (revision 14)
+++ tree-vect-generic.c (working copy)
@@ -35,6 +35,7 @@
 #include "tree-pass.h"
 #include "flags.h"
 #include "ggc.h"
+#include "target.h"


 /* Build a constant of type TYPE, made of VALUE's bits replicated
@@ -369,6 +370,7 @@
   for (; mode != VOIDmode; mode = GET_MODE_WIDER_MODE (mode))
     if (GET_MODE_INNER (mode) == inner_mode
         && GET_MODE_NUNITS (mode) > best_nunits
+       && targetm.vector_mode_supported_p(mode)
        && optab_handler (op, mode)->insn_code != CODE_FOR_nothing)
       best_mode = mode, best_nunits = GET_MODE_NUNITS (mode);
...
It automatically disables an addv4hi3 pattern if V4HImode is rejected by
TARGET_VECTOR_MODE_SUPPORTED_P.


-- 
           Summary: efficiency problem with V2HI add
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: Tom dot de dot Vries at Nxp dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40589



* [Bug middle-end/40589] efficiency problem with V2HI add
@ 2009-06-29 15:40 ` rguenth at gcc dot gnu dot org
From: rguenth at gcc dot gnu dot org @ 2009-06-29 15:40 UTC
  To: gcc-bugs



------- Comment #1 from rguenth at gcc dot gnu dot org  2009-06-29 15:40 -------
Good observations.  Patches should be sent to gcc-patches@gcc.gnu.org together
with a changelog entry following existing practice and a note on how you tested
the patch.  See gcc.gnu.org/contribute.html.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40589



* [Bug middle-end/40589] efficiency problem with V2HI add
@ 2021-08-29 22:49 ` pinskia at gcc dot gnu.org
From: pinskia at gcc dot gnu.org @ 2021-08-29 22:49 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40589

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED
   Target Milestone|---                         |4.9.1

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Fixed for GCC 5 by r5-1564 and backported to GCC 4.9.1.

The code now looks like:
  /* For very wide vectors, try using a smaller vector mode.  */
  tree compute_type = type;
  if (op
      && (!VECTOR_MODE_P (TYPE_MODE (type))
          || optab_handler (op, TYPE_MODE (type)) == CODE_FOR_nothing))
    {

This solved this exact issue, for the modes listed in the original report:

N=4  V2HI
N=8  DImode
N=16 TImode
N=32 BLKmode
N=64 BLKmode

