[Bug target/54236] New: [SH] Improve addc and subc insn utilization

public inbox for gcc-bugs@sourceware.org

* [Bug target/54236] New: [SH] Improve addc and subc insn utilization
@ 2012-08-12 22:25 olegendo at gcc dot gnu.org
  2012-08-12 22:35 ` [Bug target/54236] " olegendo at gcc dot gnu.org
                   ` (17 more replies)
  0 siblings, 18 replies; 19+ messages in thread
From: olegendo at gcc dot gnu.org @ 2012-08-12 22:25 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54236

             Bug #: 54236
           Summary: [SH] Improve addc and subc insn utilization
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: target
        AssignedTo: olegendo@gcc.gnu.org
        ReportedBy: olegendo@gcc.gnu.org
            Target: sh*-*-*


There are currently a couple of cases where it would be better if addc or subc
insns were used.  For example:

int test00 (int a, int b)
{
  return a + b + 1;
}


gets compiled to:

        mov     r4,r0   ! MT
        add     r5,r0   ! EX
        rts
        add     #1,r0   ! EX

could be better as:

        mov     r4,r0   ! MT
        sett            ! MT (SH4)
        rts
        addc    r5,r0   ! EX
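
For reference, a rough C model of what the sett/addc pair computes.  The
helper names addc and add_plus_one are made up for illustration and are not
part of GCC or the patch.  Unsigned types are used to keep the wrap-around
well defined in C.

```c
#include <stdint.h>

/* Model of the SH "addc Rm,Rn" insn: Rn = Rn + Rm + T, T = carry out.  */
static uint32_t addc (uint32_t rm, uint32_t rn, unsigned *t)
{
  uint64_t wide = (uint64_t)rn + rm + (*t & 1u);
  *t = (unsigned)(wide >> 32);   /* carry out goes back into T */
  return (uint32_t)wide;
}

/* a + b + 1 written as "sett; addc": set T, then add with carry-in.  */
static uint32_t add_plus_one (uint32_t a, uint32_t b)
{
  unsigned t = 1;                /* sett */
  return addc (b, a, &t);        /* addc r5,r0 */
}
```

The constant 1 is carried by the T bit, so no separate add #1 is needed.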

As a proof of concept, I've applied the following to handle the above case:

Index: gcc/config/sh/sh.md
===================================================================
--- gcc/config/sh/sh.md    (revision 190326)
+++ gcc/config/sh/sh.md    (working copy)
@@ -1465,7 +1465,7 @@

 (define_insn "addc"
   [(set (match_operand:SI 0 "arith_reg_dest" "=r")
-    (plus:SI (plus:SI (match_operand:SI 1 "arith_reg_operand" "0")
+    (plus:SI (plus:SI (match_operand:SI 1 "arith_reg_operand" "%0")
               (match_operand:SI 2 "arith_reg_operand" "r"))
          (reg:SI T_REG)))
    (set (reg:SI T_REG)
@@ -1516,6 +1516,24 @@
   "add    %2,%0"
   [(set_attr "type" "arith")])

+(define_insn_and_split "*addsi3_compact"
+  [(set (match_operand:SI 0 "arith_reg_dest" "")
+    (plus:SI (plus:SI (match_operand:SI 1 "arith_reg_operand" "")
+               (match_operand:SI 2 "arith_reg_operand" ""))
+         (const_int 1)))
+   (clobber (reg:SI T_REG))]
+  "TARGET_SH1"
+  "#"
+  "&& 1"
+  [(set (reg:SI T_REG) (const_int 1))
+   (parallel [(set (match_dup 0)
+           (plus:SI (plus:SI (match_dup 1)
+                     (match_dup 2))
+                (reg:SI T_REG)))
+          (set (reg:SI T_REG)
+           (ltu:SI (plus:SI (match_dup 1) (match_dup 2))
+               (match_dup 1)))])])
+
 ;; -------------------------------------------------------------------------
 ;; Subtraction instructions
 ;; -------------------------------------------------------------------------

.. and looked at the code generated for the CSiBE set with -O2 -m4-single -ml
-mpretend-cmove.  It doesn't affect code size that much (some increases and
decreases here and there), but more importantly it does this
(libmpeg2/motion_comp.c):

_MC_avg_o_16_c:                         -->
        mov.b   @r5,r1                  mov.b   @r5,r2
.L16:                                   .L16:
        mov.b   @r4,r2                  sett
        extu.b  r1,r1                   mov.b   @r4,r1
        extu.b  r2,r2                   extu.b  r2,r2
        add     r2,r1                   extu.b  r1,r1
        add     #1,r1                   addc    r2,r1
        shar    r1                      shar    r1
        mov.b   r1,@r4                  mov.b   r1,@r4
        mov.b   @(1,r5),r0              sett
        extu.b  r0,r1                   mov.b   @(1,r5),r0
        mov.b   @(1,r4),r0              extu.b  r0,r1
        extu.b  r0,r0                   mov.b   @(1,r4),r0
        add     r0,r1                   extu.b  r0,r0
        add     #1,r1                   addc    r1,r0
        shar    r1                      shar    r0
        mov     r1,r0                   mov.b   r0,@(1,r4)
        mov.b   r0,@(1,r4)

In such cases, the sett,addc sequence can be scheduled much better and in most
cases the sett insn can be executed in parallel with some other insn.
Unfortunately, on SH4A the sett insn has been moved from the MT group to the EX
group, but it still seems beneficial.  I've also seen a couple of places where
sett-subc sequences would be better.
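
The per-byte operation in the listing above is a rounding average.  A minimal
C sketch of it follows, assuming the source expression has the shape
(a + b + 1) >> 1.  The libmpeg2 source itself is not reproduced here and the
function name avg_round is hypothetical.

```c
#include <stdint.h>

/* Rounding average of two bytes, with the "+ 1" written as a carry-in.  */
static uint8_t avg_round (uint8_t a, uint8_t b)
{
  unsigned t = 1;                       /* sett */
  unsigned sum = (unsigned)a + b + t;   /* extu.b, extu.b, addc */
  return (uint8_t)(sum >> 1);           /* shar, mov.b */
}
```

Since the sett does not depend on the loaded values, it can be placed anywhere
before the addc, which is what gives the scheduler the extra freedom.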



* [Bug target/54236] [SH] Improve addc and subc insn utilization
  2012-08-12 22:25 [Bug target/54236] New: [SH] Improve addc and subc insn utilization olegendo at gcc dot gnu.org
@ 2012-08-12 22:35 ` olegendo at gcc dot gnu.org
  2012-08-16 23:20 ` olegendo at gcc dot gnu.org
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: olegendo at gcc dot gnu.org @ 2012-08-12 22:35 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54236

Oleg Endo <olegendo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2012-08-12
     Ever Confirmed|0                           |1



* [Bug target/54236] [SH] Improve addc and subc insn utilization
  2012-08-12 22:25 [Bug target/54236] New: [SH] Improve addc and subc insn utilization olegendo at gcc dot gnu.org
  2012-08-12 22:35 ` [Bug target/54236] " olegendo at gcc dot gnu.org
@ 2012-08-16 23:20 ` olegendo at gcc dot gnu.org
  2012-09-19 17:46 ` olegendo at gcc dot gnu.org
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: olegendo at gcc dot gnu.org @ 2012-08-16 23:20 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54236

--- Comment #1 from Oleg Endo <olegendo at gcc dot gnu.org> 2012-08-16 23:19:51 UTC ---
Author: olegendo
Date: Thu Aug 16 23:19:45 2012
New Revision: 190459

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190459
Log:
    PR target/54236
    * config/sh/sh.md (addc): Add commutative modifier.
    (*addc, *minus_plus_one, *subc, *negc): New insns and splits.

    PR target/54236
    * gcc.target/sh/pr54236-1.c: New.


Added:
    trunk/gcc/testsuite/gcc.target/sh/pr54236-1.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh.md
    trunk/gcc/testsuite/ChangeLog



* [Bug target/54236] [SH] Improve addc and subc insn utilization
  2012-08-12 22:25 [Bug target/54236] New: [SH] Improve addc and subc insn utilization olegendo at gcc dot gnu.org
  2012-08-12 22:35 ` [Bug target/54236] " olegendo at gcc dot gnu.org
  2012-08-16 23:20 ` olegendo at gcc dot gnu.org
@ 2012-09-19 17:46 ` olegendo at gcc dot gnu.org
  2013-09-22 22:22 ` olegendo at gcc dot gnu.org
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: olegendo at gcc dot gnu.org @ 2012-09-19 17:46 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54236

--- Comment #2 from Oleg Endo <olegendo at gcc dot gnu.org> 2012-09-19 17:45:53 UTC ---
Author: olegendo
Date: Wed Sep 19 17:45:37 2012
New Revision: 191489

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=191489
Log:
    PR target/54236
    * config/sh/sh.md (*addc): Add pattern to handle one bit left shifts.

    PR target/54236
    * gcc.target/sh/pr54236-1.c (test_08): Add one bit left shift case.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh.md
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/sh/pr54236-1.c



* [Bug target/54236] [SH] Improve addc and subc insn utilization
  2012-08-12 22:25 [Bug target/54236] New: [SH] Improve addc and subc insn utilization olegendo at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2012-09-19 17:46 ` olegendo at gcc dot gnu.org
@ 2013-09-22 22:22 ` olegendo at gcc dot gnu.org
  2013-10-29 20:46 ` olegendo at gcc dot gnu.org
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: olegendo at gcc dot gnu.org @ 2013-09-22 22:22 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54236

--- Comment #3 from Oleg Endo <olegendo at gcc dot gnu.org> ---
Some more addc candidates I ran into:

int test00 (int a, int b, int c, int d)
{
  return (d & 1) + a + b;
}

-O2 -m4:
        mov     r7,r0
        and     #1,r0
        add     r4,r0
        rts
        add     r5,r0

better:
        shlr    r7      // T = r7 & 1
        addc    r4,r5
        rts
        mov     r5,r0



int test01 (int a, int b, int c, int d)
{
  return ((d >> 31) & 1) + a + b;
}

-O2 -m4:
        shll    r7
        movt    r0
        add     r4,r0
        rts
        add     r5,r0

better:
        shll    r7      // T = (r7 >> 31) & 1
        addc    r4,r5
        rts
        mov     r5,r0
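
Both candidates rely on a shift insn leaving the interesting bit in T.  A
small C model of that, as a sketch only: the function names are illustrative,
and unsigned types are used so that wrap-around is well defined.

```c
#include <stdint.h>

/* shlr: logical shift right by one, T receives the bit shifted out (bit 0).  */
static uint32_t shlr (uint32_t r, unsigned *t) { *t = r & 1u; return r >> 1; }

/* shll: logical shift left by one, T receives the bit shifted out (bit 31).  */
static uint32_t shll (uint32_t r, unsigned *t) { *t = r >> 31; return r << 1; }

/* (d & 1) + a + b as "shlr r7; addc r4,r5".  */
static uint32_t add_lsb (uint32_t a, uint32_t b, uint32_t d)
{
  unsigned t;
  (void)shlr (d, &t);      /* only the T bit is needed, the shifted value is dead */
  return a + b + t;
}

/* ((d >> 31) & 1) + a + b as "shll r7; addc r4,r5".  */
static uint32_t add_msb (uint32_t a, uint32_t b, uint32_t d)
{
  unsigned t;
  (void)shll (d, &t);
  return a + b + t;
}
```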



* [Bug target/54236] [SH] Improve addc and subc insn utilization
  2012-08-12 22:25 [Bug target/54236] New: [SH] Improve addc and subc insn utilization olegendo at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2013-09-22 22:22 ` olegendo at gcc dot gnu.org
@ 2013-10-29 20:46 ` olegendo at gcc dot gnu.org
  2013-11-02 10:20 ` olegendo at gcc dot gnu.org
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: olegendo at gcc dot gnu.org @ 2013-10-29 20:46 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54236

--- Comment #4 from Oleg Endo <olegendo at gcc dot gnu.org> ---
Author: olegendo
Date: Tue Oct 29 20:45:56 2013
New Revision: 204180

URL: http://gcc.gnu.org/viewcvs?rev=204180&root=gcc&view=rev
Log:
    PR target/54236
    * config/sh/sh.md (*addc): Rename existing variations to ...
    (*addc_r_r_1, *addc_2r_1, *addc_r_1): ... these.
    (*addc_r_lsb, *addc_r_r_lsb, *addc_r_lsb_r, *addc_2r_lsb, *addc_r_msb,
    *addc_r_r_msb, *addc_2r_msb): New insn_and_split patterns.
    * config/sh/sh.c (addsubcosts): Handle some addc special cases.

    PR target/54236
    * gcc.target/sh/pr54236-2.c: New.
    * gcc.target/sh/pr54089-6.c: Add another rotl special case.


Added:
    trunk/gcc/testsuite/gcc.target/sh/pr54236-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh.c
    trunk/gcc/config/sh/sh.md
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/sh/pr54089-6.c



* [Bug target/54236] [SH] Improve addc and subc insn utilization
  2012-08-12 22:25 [Bug target/54236] New: [SH] Improve addc and subc insn utilization olegendo at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2013-10-29 20:46 ` olegendo at gcc dot gnu.org
@ 2013-11-02 10:20 ` olegendo at gcc dot gnu.org
  2013-11-03 17:50 ` olegendo at gcc dot gnu.org
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: olegendo at gcc dot gnu.org @ 2013-11-02 10:20 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54236

--- Comment #5 from Oleg Endo <olegendo at gcc dot gnu.org> ---
Created attachment 31141
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=31141&action=edit
widening unsigned addition

The addc insn can also be used to implement widening 32 bit -> 64 bit unsigned
addition, and to do some sort of unsigned overflow checks.  For example:

unsigned long long
test_1 (unsigned int a, unsigned int b)
{
  // 1x addc, 1x clrt, 1x movt, no tst
  return (unsigned long long)a + (unsigned long long)b;
}

minimal code:
        mov     r5,r1
        clrt
        addc    r4,r1
        rts
        movt    r0
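
Why a single addc plus movt is enough can be seen from a word-level C sketch.
The function name widening_add is made up for illustration.

```c
#include <stdint.h>

/* Widening 32 -> 64 bit unsigned add: the low word comes from the add,
   the high word is just the carry out.  */
static uint64_t widening_add (uint32_t a, uint32_t b)
{
  uint32_t lo = a + b;      /* clrt; addc r4,r1 */
  uint32_t hi = lo < a;     /* movt r0: carry is 1 iff the sum wrapped */
  return ((uint64_t)hi << 32) | lo;
}
```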


unsigned int
test_2 (unsigned int a, unsigned int b)
{
  // 1x addc, 1x clrt, no movt, no tst
  unsigned long long x = (unsigned long long)a + (unsigned long long)b;
  if (x >> 32)
    return 0xFFFFFFFF;
  else
    return (unsigned int)x;
}

minimal code (branchy, generic):
        clrt
        addc    r4,r5
        bt/s    .L6
        mov     #-1,r0
        mov     r5,r0
.L6:
        rts    
        nop

minimal code (branchless, special case for constant 0xFFFFFFFF):
        clrt
        addc    r4,r5
        subc    r1,r1     // T = 0: r1 = 0x00000000
                          // T = 1: r1 = 0xFFFFFFFF
        or      r1,r5
        rts
        mov     r5,r0
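
The branchless variant can be written in C roughly as follows.  This is only
a sketch of the idea; saturating_add is an illustrative name.

```c
#include <stdint.h>

/* Saturating unsigned add without a branch.  */
static uint32_t saturating_add (uint32_t a, uint32_t b)
{
  uint32_t sum = a + b;        /* clrt; addc r4,r5 */
  uint32_t t = sum < a;        /* carry out left in T */
  uint32_t mask = 0u - t;      /* subc r1,r1: r1 - r1 - T = 0 or 0xFFFFFFFF */
  return sum | mask;           /* or r1,r5 */
}
```

The subc trick works for any initial value of r1, because r1 - r1 cancels out
and only the borrow from T remains.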


The attached patch makes the compiler output the minimal code sequences, except
for the branchless version of test_2.  That version would require folding basic
blocks, which combine won't do.
The patch enables the "zero_extendsidi2" insn on non-SHMedia.  Without that
insn, the following is expanded:

(insn 22 4 7 2 (set (reg:DI 167 [ D.1680 ])
        (const_int 0 [0])) sh_tmp.cpp:22 -1
     (nil))
(insn 7 22 8 2 (set (subreg:SI (reg:DI 167 [ D.1680 ]) 4)
        (reg:SI 4 r4 [ a ])) sh_tmp.cpp:22 257 {movsi_ie}
     (expr_list:REG_DEAD (reg:SI 4 r4 [ a ])
        (nil)))
(insn 8 7 23 2 (set (subreg:SI (reg:DI 167 [ D.1680 ]) 0)
        (const_int 0 [0])) sh_tmp.cpp:22 257 {movsi_ie}
     (nil))
(insn 23 8 9 2 (set (reg:DI 168 [ D.1680 ])
        (const_int 0 [0])) sh_tmp.cpp:22 -1
     (nil))
(insn 9 23 10 2 (set (subreg:SI (reg:DI 168 [ D.1680 ]) 4)
        (reg:SI 5 r5 [ b ])) sh_tmp.cpp:22 257 {movsi_ie}
     (expr_list:REG_DEAD (reg:SI 5 r5 [ b ])
        (nil)))
(insn 10 9 11 2 (set (subreg:SI (reg:DI 168 [ D.1680 ]) 0)
        (const_int 0 [0])) sh_tmp.cpp:22 257 {movsi_ie}
     (nil))

... and combine doesn't seem to understand this and thus will never try out
things such as

(plus:DI (zero_extend:DI (match_operand:SI 1 "arith_reg_operand"))
         (zero_extend:DI (match_operand:SI 2 "arith_reg_operand")))

The patch has a downside, though.  It makes 64 bit arithmetic slightly worse in
some cases (according to CSiBE).  As far as I can see it, there are mainly two
issues:
- loads of constant #0 are not hoisted/CSE'd
- some 64 bit highword optimizations are not applied anymore, in particular
comparisons of zero extended unsigned SImode and unsigned DImode.

The constant #0 issue can be addressed by making the "zero_extendsidi2" pattern
'reserve' the constant #0 load, as it is done for some other insns:

-(define_insn "zero_extendsidi2"
+(define_expand "zero_extendsidi2"
+  [(set (match_operand:DI 0 "arith_reg_dest")
+    (zero_extend:DI (match_operand:SI 1 "extend_reg_operand")))]
+  "TARGET_SHMEDIA || TARGET_SH1"
+{
+  if (TARGET_SH1)
+    {
+      if (can_create_pseudo_p ())
+    {
+      rtx zero_val = force_reg (SImode, gen_int_mode (0, SImode));
+      emit_insn (gen_zero_extendsidi2_compact (operands[0], operands[1],
+                           zero_val));
+      DONE;
+    }
+      else
+    FAIL;
+    }
+})
+
+(define_insn "*zero_extendsidi2"
   [(set (match_operand:DI 0 "arith_reg_dest" "=r")
     (zero_extend:DI (match_operand:SI 1 "extend_reg_operand" "r")))]
   "TARGET_SHMEDIA"
@@ -5905,6 +5943,20 @@
   [(set_attr "type" "arith_media")
    (set_attr "highpart" "extend")])

+(define_insn_and_split "zero_extendsidi2_compact"
+  [(set (match_operand:DI 0 "arith_reg_dest")
+    (zero_extend:DI (match_operand:SI 1 "extend_reg_operand")))
+   (use (match_operand:SI 2 "arith_reg_operand"))]
+  "TARGET_SH1"
+  "#"
+  "&& can_create_pseudo_p ()"
+  [(const_int 0)]
+{
+  emit_move_insn (gen_lowpart (SImode, operands[0]), operands[1]);
+  emit_move_insn (gen_highpart (SImode, operands[0]), operands[2]);
+  DONE;
+})
+

However, maybe it'd be better to have a more generic solution to the constant
load/sharing problem after combine (see PR 51708 comment #3).

As for the comparisons, additional patterns would be required, which could be
part of PR 52898.



* [Bug target/54236] [SH] Improve addc and subc insn utilization
  2012-08-12 22:25 [Bug target/54236] New: [SH] Improve addc and subc insn utilization olegendo at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2013-11-02 10:20 ` olegendo at gcc dot gnu.org
@ 2013-11-03 17:50 ` olegendo at gcc dot gnu.org
  2013-11-03 20:44 ` olegendo at gcc dot gnu.org
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: olegendo at gcc dot gnu.org @ 2013-11-03 17:50 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54236

--- Comment #6 from Oleg Endo <olegendo at gcc dot gnu.org> ---
Created attachment 31144
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=31144&action=edit
stitching addc insns

The attached patch is an example that shows how widening additions can be
stitched together.  One application would be arithmetic on user defined integer
types with an arbitrary number of bits.
For example (requires c++11):

template <unsigned int Bits> class integer
{
public:
  typedef unsigned int word_type;
  typedef unsigned long long ext_word_type;

  static constexpr unsigned int bit_count = Bits;
  static constexpr unsigned int word_bit_count = sizeof (word_type) * 8;
  static constexpr unsigned int word_count = (bit_count + word_bit_count - 1)
                                              / word_bit_count;

private:
  word_type word[word_count];

public:
  friend integer
  operator + (const integer& a, const integer& b)
  {
    integer result;

    word_type carry = 0;
    for (unsigned int i = 0; i < word_count; ++i)
    {
      auto sum = (ext_word_type)a.word[i] + (ext_word_type)b.word[i] + carry;
      result.word[i] = (word_type)sum;
      carry = (sum >> word_bit_count) == 0 ? 0 : 1;
    }

    return result;
  }
};

With this patch the following examples, compiled with -funroll-all-loops -O2
demonstrate the effect:

--------------------------

integer<64> test_02 (const integer<64>& a, const integer<64>& b)
{
  return a + b;
}

        mov.l   @r5,r1
        mov.l   @r4,r0
        clrt
        mov.l   @(4,r5),r2
        addc    r1,r0
        mov.l   @(4,r4),r1
        rts
        addc    r2,r1

this is the same as a 'native' 64 bit addition.

--------------------------

integer<80> test_03 (const integer<80>& a, const integer<80>& b)
{
  return a + b;
}

        mov.l   @r5,r3
        mov.l   @r4,r1
        clrt
        mov.l   @(4,r5),r0
        mov.l   @(4,r4),r6
        addc    r3,r1
        mov.l   @(8,r5),r5
        mov.l   @(8,r4),r4
        addc    r0,r6
        mov.l   r1,@r2
        mov     r2,r0
        addc    r5,r4
        mov.l   r6,@(4,r2)
        rts
        mov.l   r4,@(8,r2)

80 bits are rounded up to 96 in the template 'integer', thus 3 addc insns are
required to do the 96 bit addition.
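
A plain C rendering of the same word loop may help when reading the asm.  It
is a sketch that mirrors operator + of the template above; the function name
add_words and the least-significant-word-first layout are assumptions made
for the illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* result = a + b over n 32 bit words, least significant word first.
   Returns the carry out of the most significant word.  */
static uint32_t add_words (uint32_t *result, const uint32_t *a,
                           const uint32_t *b, size_t n)
{
  uint32_t carry = 0;
  for (size_t i = 0; i < n; ++i)
    {
      /* Widening add of two words plus the carry from the previous word.  */
      uint64_t sum = (uint64_t)a[i] + b[i] + carry;
      result[i] = (uint32_t)sum;
      carry = (uint32_t)(sum >> 32);
    }
  return carry;
}
```

With n = 3 this corresponds to the 96 bit case: one clrt followed by three
addc insns once the loop is fully unrolled.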

However, when compiling without loop unrolling, it doesn't work because there
is no mechanism to feed back the carry variable inside a loop.  The test_03
function becomes:

        mov.l   r8,@-r15
        mov.l   r9,@-r15
        mov.l   r10,@-r15
        mov.l   r11,@-r15
        mov     #0,r0
        mov     #0,r10
.L4:
        mov.l   @(r0,r4),r3
        clrt
        mov.l   @(r0,r5),r1
        mov     r10,r7     // r7 = carry from previous iteration
        mov     #0,r6
        mov     r1,r9
        addc    r3,r9      // r9 = a[i] + b[i] (lsw of 64 bit result)
        movt    r10        // r10 = carry (msw of 64 bit result)
        clrt
        mov     r9,r11
        addc    r7,r11     // r11 = r9 + previous carry (lsw of 64 bit result)
        addc    r6,r10     // r10 = carry for next iteration
        mov.l   r11,@(r0,r2)
        add     #4,r0
        cmp/eq  #12,r0
        bf      .L4

        mov.l   @r15+,r11
        mov     r2,r0
        mov.l   @r15+,r10
        mov.l   @r15+,r9
        rts
        mov.l   @r15+,r8


it could be something like this:

        mov     #0,r0
        mov     #0,r6
.L4:
        mov.l   @(r0,r4),r3
        mov.l   @(r0,r5),r1
        cmp/pl  r6         // T = r6 > 0 (get carry into T bit)
        addc    r3,r1      // r1 = a[i] + b[i] + T (lsw of 64 bit result)
        movt    r6         // r6 = new carry (msw of 64 bit result)
        mov.l   r1,@(r0,r2)
        add     #4,r0
        cmp/eq  #12,r0
        bf      .L4

        rts
        mov     r2,r0

However, that would require some loop analysis in order to discover the T bit
feedback opportunity.
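
The desired loop body can be modelled in C like this.  It is only a sketch of
the T bit feedback; the function name add_words_tbit is hypothetical, and the
variable r6 is named after the register in the asm above.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiword add where the carry lives in a register between iterations
   and is moved into T right before the addc.  */
static void add_words_tbit (uint32_t *result, const uint32_t *a,
                            const uint32_t *b, size_t n)
{
  uint32_t r6 = 0;                            /* mov #0,r6 */
  for (size_t i = 0; i < n; ++i)
    {
      unsigned t = r6 > 0;                    /* cmp/pl r6 */
      uint32_t partial = a[i] + b[i];
      uint32_t sum = partial + t;             /* addc r3,r1 */
      /* Carry out of the three-input add; at most one of the two wraps.  */
      t = (partial < a[i]) | (sum < partial);
      r6 = t;                                 /* movt r6 */
      result[i] = sum;                        /* mov.l r1,@(r0,r2) */
    }
}
```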



* [Bug target/54236] [SH] Improve addc and subc insn utilization
  2012-08-12 22:25 [Bug target/54236] New: [SH] Improve addc and subc insn utilization olegendo at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2013-11-03 17:50 ` olegendo at gcc dot gnu.org
@ 2013-11-03 20:44 ` olegendo at gcc dot gnu.org
  2014-05-21  8:06 ` olegendo at gcc dot gnu.org
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: olegendo at gcc dot gnu.org @ 2013-11-03 20:44 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54236

--- Comment #7 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to Oleg Endo from comment #6)
> Created attachment 31144 [details]
> stitching addc insns
> 
> The attached patch is an example that shows how widening additions can be
> stitched together.  One application would be arithmetic on user defined
> integer types with an arbitrary number of bits.
> For example (requires c++11):
> 
> template <unsigned int Bits> class integer
> {
> public:
>   typedef unsigned int word_type;
>   typedef unsigned long long ext_word_type;
> 
>   static constexpr unsigned int bit_count = Bits;
>   static constexpr unsigned int word_bit_count = sizeof (word_type) * 8;
>   static constexpr unsigned int word_count = (bit_count + word_bit_count - 1)
>                                               / word_bit_count;
> 
> private:
>   word_type word[word_count];
> 
> public:
>   friend integer
>   operator + (const integer& a, const integer& b)
>   {
>     integer result;
> 
>     word_type carry = 0;
>     for (unsigned int i = 0; i < word_count; ++i)
>     {
>       auto sum = (ext_word_type)a.word[i] + (ext_word_type)b.word[i] + carry;
>       result.word[i] = (word_type)sum;
>       carry = (sum >> word_bit_count) == 0 ? 0 : 1;
>     }
> 
>     return result;
>   }
> };
> 

I forgot to mention that the patch works only if the type of the carry variable
is 'unsigned int' or 'unsigned long long'.  If a signed or boolean type is
used, some weird things seem to happen: sign extensions show up somewhere
around the T bit.



* [Bug target/54236] [SH] Improve addc and subc insn utilization
  2012-08-12 22:25 [Bug target/54236] New: [SH] Improve addc and subc insn utilization olegendo at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2013-11-03 20:44 ` olegendo at gcc dot gnu.org
@ 2014-05-21  8:06 ` olegendo at gcc dot gnu.org
  2014-09-27 17:42 ` olegendo at gcc dot gnu.org
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: olegendo at gcc dot gnu.org @ 2014-05-21  8:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54236

--- Comment #8 from Oleg Endo <olegendo at gcc dot gnu.org> ---
Author: olegendo
Date: Wed May 21 08:06:06 2014
New Revision: 210682

URL: http://gcc.gnu.org/viewcvs?rev=210682&root=gcc&view=rev
Log:
gcc/
        PR target/54236
        * config/sh/sh.md (*addc_r_1): Rename to addc_t_r.  Remove empty
        constraints.
        (*addc_r_t): Add new insn_and_split.

gcc/testsuite/
        PR target/54236
        * gcc.target/sh/pr54236-3.c: New.

Added:
    trunk/gcc/testsuite/gcc.target/sh/pr54236-3.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh.md
    trunk/gcc/testsuite/ChangeLog



* [Bug target/54236] [SH] Improve addc and subc insn utilization
  2012-08-12 22:25 [Bug target/54236] New: [SH] Improve addc and subc insn utilization olegendo at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2014-05-21  8:06 ` olegendo at gcc dot gnu.org
@ 2014-09-27 17:42 ` olegendo at gcc dot gnu.org
  2015-01-24 13:05 ` olegendo at gcc dot gnu.org
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: olegendo at gcc dot gnu.org @ 2014-09-27 17:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54236

--- Comment #9 from Oleg Endo <olegendo at gcc dot gnu.org> ---
The following function compiled with -O2

unsigned int check (unsigned int x)
{
  return x == 0 ? 1 : x;
}

results in:
        tst     r4,r4
        bt/s    .L7
        mov     #1,r0
        mov     r4,r0
.L7:
        rts
        nop


Writing it as:
unsigned int check (unsigned int x)
{
  return x + (x == 0);
}

results in:
        tst     r4,r4
        mov     #0,r0
        rts
        addc    r4,r0

It seems that ifcvt is trying to utilize the 'add<mode>cc' standard name
pattern.  If the 2nd operand of the conditional addition is a constant 1 or -1
the addcc insn can be implemented via addc or subc without a branch.
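
The equivalence of the two source forms, and how the addc sequence computes
it, can be checked with a short C sketch.  The function names are made up for
illustration.

```c
#include <stdint.h>

/* Branchy form from the first example.  */
static uint32_t check_branchy (uint32_t x)
{
  return x == 0 ? 1 : x;
}

/* Conditional add of 1, modelled after "tst; mov #0; addc".  */
static uint32_t check_addc (uint32_t x)
{
  unsigned t = (x == 0);    /* tst r4,r4: T = 1 iff x is zero */
  return 0u + x + t;        /* mov #0,r0 ; addc r4,r0 */
}
```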

If the comparison can't be done in one insn (e.g. x != 0) and requires T bit
negation/inversion:

unsigned int check (unsigned int x)
{
  return x + (x != 0);
}

results in (non-SH2A):
        tst     r4,r4
        mov     #-1,r0
        negc    r0,r0
        rts
        add     r4,r0

slightly better (one arith insn less):
        tst     r4,r4
        mov     #-1,r0
        subc    r0,r4    // r4 = r4 - (-1) - T = r4 + 1 - T
        rts
        mov     r4,r0

and on SH2A (already minimal):
        tst     r4,r4
        movrt   r0
        rts
        add     r4,r0



subc case:

unsigned int check (unsigned int x)
{
  return x - (x != 0);
}

non-SH2A:

        tst     r4,r4
        mov     #-1,r1
        mov     r4,r0
        negc    r1,r1
        rts
        sub     r1,r0    // r0 = r4 - (1 - T) = r4 - 1 + T

SH2A:
        tst     r4,r4
        movrt   r1
        mov     r4,r0
        rts
        sub     r1,r0

better (non-SH2A and SH2A):
        tst    r4,r4
        mov    #-1,r0
        rts
        addc   r4,r0     // r0 = -1 + r4 + T = r4 - 1 + T


The equivalent conditional version should result in the same (currently it
isn't):

unsigned int check (unsigned int x)
{
  return x != 0 ? (x - 1) : x;
}
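
The two inverted-T cases above avoid the explicit T bit negation by folding it
into a constant -1 operand.  A C sketch of the arithmetic, with illustrative
function names:

```c
#include <stdint.h>

/* x + (x != 0) via subc with -1: x - (-1) - T, where T = (x == 0).  */
static uint32_t inc_if_nonzero (uint32_t x)
{
  uint32_t t = (x == 0);          /* tst r4,r4 */
  return x - 0xFFFFFFFFu - t;     /* mov #-1,r0 ; subc r0,r4 */
}

/* x - (x != 0) via addc with -1: -1 + x + T, where T = (x == 0).  */
static uint32_t dec_if_nonzero (uint32_t x)
{
  uint32_t t = (x == 0);          /* tst r4,r4 */
  return 0xFFFFFFFFu + x + t;     /* mov #-1,r0 ; addc r4,r0 */
}
```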

Since ifcvt is also run after combine, combine won't get the chance to combine
patterns such as:

(set (reg:SI 165 [ D.1635 ])
    (plus:SI (xor:SI (reg:SI 147 t)
            (const_int 1 [0x1]))
        (reg/v:SI 164 [ x ])))

which would occur after emitting a comparison, negating the result (T bit) and
then adding/subtracting the T bit via addc/subc.  If that happens, the combine
must be done manually in the split pass as it's done with some of the other
insns.



* [Bug target/54236] [SH] Improve addc and subc insn utilization
  2012-08-12 22:25 [Bug target/54236] New: [SH] Improve addc and subc insn utilization olegendo at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2014-09-27 17:42 ` olegendo at gcc dot gnu.org
@ 2015-01-24 13:05 ` olegendo at gcc dot gnu.org
  2015-01-25 16:42 ` olegendo at gcc dot gnu.org
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: olegendo at gcc dot gnu.org @ 2015-01-24 13:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54236

--- Comment #10 from Oleg Endo <olegendo at gcc dot gnu.org> ---
Author: olegendo
Date: Sat Jan 24 13:04:53 2015
New Revision: 220081

URL: https://gcc.gnu.org/viewcvs?rev=220081&root=gcc&view=rev
Log:
gcc/
    PR target/49263
    PR target/53987
    PR target/64345
    PR target/59533
    PR target/52933
    PR target/54236
    PR target/51244
    * config/sh/sh-protos.h
    (sh_extending_set_of_reg::can_use_as_unextended_reg,
    sh_extending_set_of_reg::use_as_unextended_reg,
    sh_is_nott_insn, sh_movt_set_dest, sh_movrt_set_dest, sh_is_movt_insn,
    sh_is_movrt_insn, sh_insn_operands_modified_between_p,
    sh_reg_dead_or_unused_after_insn, sh_in_recog_treg_set_expr,
    sh_recog_treg_set_expr, sh_split_treg_set_expr): New functions.
    (sh_treg_insns): New class.
    * config/sh/sh.c (TARGET_LEGITIMATE_COMBINED_INSN): Define target hook.
    (scope_counter): New class.
    (sh_legitimate_combined_insn, sh_is_nott_insn, sh_movt_set_dest,
    sh_movrt_set_dest, sh_reg_dead_or_unused_after_insn,
    sh_extending_set_of_reg::can_use_as_unextended_reg,
    sh_extending_set_of_reg::use_as_unextended_reg, sh_recog_treg_set_expr,
    sh_in_recog_treg_set_expr, sh_try_split_insn_simple,
    sh_split_treg_set_expr): New functions.
    (addsubcosts): Handle treg_set_expr.
    (sh_rtx_costs): Handle IF_THEN_ELSE and ZERO_EXTRACT.
    (sh_rtx_costs): Use arith_reg_operand in SIGN_EXTEND and ZERO_EXTEND.
    (sh_rtx_costs): Handle additional bit test patterns in EQ and AND cases.
    (sh_insn_operands_modified_between_p): Make non-static.
    * config/sh/predicates.md (zero_extend_movu_operand): Allow
    simple_mem_operand in addition to displacement_mem_operand.
    (zero_extend_operand): Don't allow zero_extend_movu_operand.
    (treg_set_expr, treg_set_expr_not_const01,
    arith_reg_or_treg_set_expr): New predicates.
    * config/sh/sh.md (tstsi_t): Use arith_reg_operand and
    arith_or_int_operand instead of logical_operand.  Convert to
    insn_and_split.  Try to optimize constant operand in splitter.
    (tsthi_t, tstqi_t): Fold into *tst<mode>_t.  Convert to insn_and_split.
    (*tstqi_t_zero): Delete.
    (*tst<mode>_t_subregs): Add !sh_in_recog_treg_set_expr split condition.
    (tstsi_t_and_not): Delete.
    (tst<mode>_t_zero_extract_eq): Rename to *tst<mode>_t_zero_extract.
    Convert to insn_and_split.
    (unnamed split, tstsi_t_zero_extract_xor,
    tstsi_t_zero_extract_subreg_xor_little,
    tstsi_t_zero_extract_subreg_xor_big): Delete.
    (*tstsi_t_shift_mask): New insn_and_split.
    (cmpeqsi_t, cmpgesi_t): Add new split for const_int 0 operands and try
    to recombine with surrounding insns when splitting.
    (*negtstsi): Add !sh_in_recog_treg_set_expr condition.
    (cmp_div0s_0, cmp_div0s_1, *cmp_div0s_0, *cmp_div0s_1): Rewrite as ...
    (cmp_div0s, *cmp_div0s_1, *cmp_div0s_2, *cmp_div0s_3, *cmp_div0s_4,
    *cmp_div0s_5, *cmp_div0s_6): ... these new insn_and_split patterns.
    (*cbranch_div0s): Delete.
    (*addc): Convert to insn_and_split.  Use treg_set_expr as 3rd operand.
    Try to recombine with surrounding insns when splitting.  Add operand
    order variants.
    (*addc_t_r, *addc_r_t): Use treg_set_expr_not_const01.
    (*addc_r_r_1, *addc_r_lsb, *addc_r_r_lsb, *addc_r_lsb_r, *addc_r_msb,
    *addc_r_r_msb, *addc_2r_msb): Delete.
    (*addc_2r_lsb): Rename to *addc_2r_t.  Use treg_set_expr.  Add operand
    order variant.
    (*addc_negreg_t): New insn_and_split.
    (*subc): Convert to insn_and_split.  Use treg_set_expr as 3rd operand.
    Try to recombine with surrounding insns when splitting.
    Add operand order variants.  
    (*subc_negt_reg, *subc_negreg_t, *reg_lsb_t, *reg_msb_t): New
    insn_and_split patterns.
    (*rotcr): Use arith_reg_or_treg_set_expr.  Try to recombine with
    surrounding insns when splitting.
    (unnamed rotcr split): Use arith_reg_or_treg_set_expr.
    (*rotcl): Likewise.  Add zero_extract variant.
    (*ashrsi2_31): New insn_and_split.
    (*negc): Convert to insn_and_split.  Use treg_set_expr.
    (*zero_extend<mode>si2_disp_mem): Update comment.
    (movrt_negc, *movrt_negc, nott): Add !sh_in_recog_treg_set_expr split
    condition.
    (*mov_t_msb_neg, mov_neg_si_t): Use treg_set_expr.  Try to recombine
    with surrounding insns when splitting.
    (any_treg_expr_to_reg): New insn_and_split.
    (*neg_zero_extract_0, *neg_zero_extract_1, *neg_zero_extract_2,
    *neg_zero_extract_3, *neg_zero_extract_4, *neg_zero_extract_5,
    *neg_zero_extract_6, *zero_extract_0, *zero_extract_1,
    *zero_extract_2): New single bit zero extract patterns.
    (bld_reg, *bld_regqi): Fold into bld<mode>_reg.
    (*get_thread_pointersi, store_gbr, *mov<mode>_gbr_load,
    *mov<mode>_gbr_load, *mov<mode>_gbr_load, *mov<mode>_gbr_load,
    *movdi_gbr_load): Use arith_reg_dest instead of register_operand for
    set destination.
    (set_thread_pointersi, load_gbr): Use arith_reg_operand instead of
    register_operand for set source.

gcc/testsuite/
    PR target/49263
    PR target/53987
    PR target/64345
    PR target/59533
    PR target/52933
    PR target/54236
    PR target/51244
    * gcc.target/sh/pr64345-1.c: New.
    * gcc.target/sh/pr64345-2.c: New.
    * gcc.target/sh/pr59533-1.c: New.
    * gcc.target/sh/pr49263.c: Adjust matching of expected insns.
    * gcc.target/sh/pr52933-2.c: Likewise.
    * gcc.target/sh/pr54089-1.c: Likewise.
    * gcc.target/sh/pr54236-1.c: Likewise.
    * gcc.target/sh/pr51244-20-sh2a.c: Likewise.
    * gcc.target/sh/pr49263-1.c: Remove xfails.
    * gcc.target/sh/pr49263-2.c: Likewise.
    * gcc.target/sh/pr49263-3.c: Likewise.
    * gcc.target/sh/pr53987-1.c: Likewise.
    * gcc.target/sh/pr52933-1.c: Adjust matching of expected insns.
    (test_24, test_25, test_26, test_27, test_28, test_29, test_30): New.
    * gcc.target/sh/pr51244-12.c: Adjust matching of expected insns.
    (test05, test06, test07, test08, test09, test10, test11, test12): New.
    * gcc.target/sh/pr54236-3.c: Adjust matching of expected insns.
    (test_002, test_003, test_004, test_005, test_006, test_007, test_008,
    test_009): New.
    * gcc.target/sh/pr51244-4.c: Adjust matching of expected insns.
    (test_02): New.

Added:
    trunk/gcc/testsuite/gcc.target/sh/pr59533-1.c
    trunk/gcc/testsuite/gcc.target/sh/pr64345-1.c
    trunk/gcc/testsuite/gcc.target/sh/pr64345-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/predicates.md
    trunk/gcc/config/sh/sh-protos.h
    trunk/gcc/config/sh/sh.c
    trunk/gcc/config/sh/sh.md
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/sh/pr49263-1.c
    trunk/gcc/testsuite/gcc.target/sh/pr49263-2.c
    trunk/gcc/testsuite/gcc.target/sh/pr49263-3.c
    trunk/gcc/testsuite/gcc.target/sh/pr49263.c
    trunk/gcc/testsuite/gcc.target/sh/pr51244-12.c
    trunk/gcc/testsuite/gcc.target/sh/pr51244-20-sh2a.c
    trunk/gcc/testsuite/gcc.target/sh/pr51244-4.c
    trunk/gcc/testsuite/gcc.target/sh/pr52933-1.c
    trunk/gcc/testsuite/gcc.target/sh/pr52933-2.c
    trunk/gcc/testsuite/gcc.target/sh/pr53987-1.c
    trunk/gcc/testsuite/gcc.target/sh/pr54089-1.c
    trunk/gcc/testsuite/gcc.target/sh/pr54236-1.c
    trunk/gcc/testsuite/gcc.target/sh/pr54236-3.c


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug target/54236] [SH] Improve addc and subc insn utilization
  2012-08-12 22:25 [Bug target/54236] New: [SH] Improve addc and subc insn utilization olegendo at gcc dot gnu.org
                   ` (10 preceding siblings ...)
  2015-01-24 13:05 ` olegendo at gcc dot gnu.org
@ 2015-01-25 16:42 ` olegendo at gcc dot gnu.org
  2015-05-17 21:02 ` olegendo at gcc dot gnu.org
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: olegendo at gcc dot gnu.org @ 2015-01-25 16:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54236

--- Comment #11 from Oleg Endo <olegendo at gcc dot gnu.org> ---
Author: olegendo
Date: Sun Jan 25 16:41:25 2015
New Revision: 220093

URL: https://gcc.gnu.org/viewcvs?rev=220093&root=gcc&view=rev
Log:
gcc/testsuite/
    PR target/54236
    * gcc.target/sh/pr54236-4.c: New.

Added:
    trunk/gcc/testsuite/gcc.target/sh/pr54236-4.c
Modified:
    trunk/gcc/testsuite/ChangeLog


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug target/54236] [SH] Improve addc and subc insn utilization
  2012-08-12 22:25 [Bug target/54236] New: [SH] Improve addc and subc insn utilization olegendo at gcc dot gnu.org
                   ` (11 preceding siblings ...)
  2015-01-25 16:42 ` olegendo at gcc dot gnu.org
@ 2015-05-17 21:02 ` olegendo at gcc dot gnu.org
  2015-05-19  8:01 ` olegendo at gcc dot gnu.org
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: olegendo at gcc dot gnu.org @ 2015-05-17 21:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54236

--- Comment #12 from Oleg Endo <olegendo at gcc dot gnu.org> ---
Author: olegendo
Date: Sun May 17 21:01:29 2015
New Revision: 223274

URL: https://gcc.gnu.org/viewcvs?rev=223274&root=gcc&view=rev
Log:
gcc/
        PR target/54236
        * config/sh/sh.md (*addc_2r_t): Use ashift instead of mult.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh.md


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug target/54236] [SH] Improve addc and subc insn utilization
  2012-08-12 22:25 [Bug target/54236] New: [SH] Improve addc and subc insn utilization olegendo at gcc dot gnu.org
                   ` (12 preceding siblings ...)
  2015-05-17 21:02 ` olegendo at gcc dot gnu.org
@ 2015-05-19  8:01 ` olegendo at gcc dot gnu.org
  2015-05-21 12:37 ` olegendo at gcc dot gnu.org
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: olegendo at gcc dot gnu.org @ 2015-05-19  8:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54236

--- Comment #13 from Oleg Endo <olegendo at gcc dot gnu.org> ---
Author: olegendo
Date: Tue May 19 08:00:41 2015
New Revision: 223346

URL: https://gcc.gnu.org/viewcvs?rev=223346&root=gcc&view=rev
Log:
gcc/
        PR target/54236
        * config/sh/sh.md (*round_int_even): New insn_and_split and
        accompanying new unnamed split.

gcc/testsuite/
        PR target/54236
        * gcc.target/sh/pr54236-2.c: Adjust expected insn counts.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh.md
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/sh/pr54236-2.c


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug target/54236] [SH] Improve addc and subc insn utilization
  2012-08-12 22:25 [Bug target/54236] New: [SH] Improve addc and subc insn utilization olegendo at gcc dot gnu.org
                   ` (13 preceding siblings ...)
  2015-05-19  8:01 ` olegendo at gcc dot gnu.org
@ 2015-05-21 12:37 ` olegendo at gcc dot gnu.org
  2015-07-19  5:32 ` olegendo at gcc dot gnu.org
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: olegendo at gcc dot gnu.org @ 2015-05-21 12:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54236

--- Comment #14 from Oleg Endo <olegendo at gcc dot gnu.org> ---
Author: olegendo
Date: Thu May 21 12:36:35 2015
New Revision: 223479

URL: https://gcc.gnu.org/viewcvs?rev=223479&root=gcc&view=rev
Log:
gcc/
        PR target/54236
        * config/sh/sh.md (*round_int_even): Reject pattern if operands[0] and
        operands[1] are the same.

gcc/testsuite/
        PR target/54236
        * gcc.target/sh/pr54236-2.c: Fix typo in comment.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh.md
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/sh/pr54236-2.c


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug target/54236] [SH] Improve addc and subc insn utilization
  2012-08-12 22:25 [Bug target/54236] New: [SH] Improve addc and subc insn utilization olegendo at gcc dot gnu.org
                   ` (14 preceding siblings ...)
  2015-05-21 12:37 ` olegendo at gcc dot gnu.org
@ 2015-07-19  5:32 ` olegendo at gcc dot gnu.org
  2015-07-19  6:47 ` olegendo at gcc dot gnu.org
  2015-09-28 14:01 ` olegendo at gcc dot gnu.org
  17 siblings, 0 replies; 19+ messages in thread
From: olegendo at gcc dot gnu.org @ 2015-07-19  5:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54236

--- Comment #15 from Oleg Endo <olegendo at gcc dot gnu.org> ---
The following shows missed subc cases when there are constants involved.  Addc
cases can be constructed in the same way.

int fun (int x)
{
  return x - 1 - (x > 100);
}

-O2 -m4:
        mov     #100,r1
        mov     r4,r0
        cmp/gt  r1,r4
        movt    r1
        sett
        rts     
        subc    r1,r0

better:
        mov     #100,r1
        mov     r4,r0
        cmp/gt  r1,r4
        mov     #1,r1
        rts
        subc    r1,r0

Combine is looking for a pattern like:
(parallel [
        (set (reg:SI 168 [ D.1659 ])
            (plus:SI (not:SI (gt:SI (reg/v:SI 167 [ x ])
                        (reg:SI 172)))
                (reg/v:SI 167 [ x ])))
        (clobber (reg:SI 147 t))
    ])

Constants -1 and -2 are represented as not (0) and not (1) in this case.
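
For reference, a minimal C sketch of why a single subc is enough for this
case.  It assumes the usual SH semantics of subc (Rn - Rm - T -> Rn, borrow
-> T).  The helper names sh_subc, fun_ref, fun_subc and fun_not are made up
for illustration and are not part of GCC.  It also models the not (...) form
from the combine dump: ~0 is -1 and ~1 is -2, so x + ~t is x - 1 - t.

```c
#include <stdint.h>

/* Model of the SH 'subc Rm,Rn' insn: Rn - Rm - T -> Rn, borrow -> T.
   Hypothetical helper, for illustration only.  */
static uint32_t
sh_subc (uint32_t rn, uint32_t rm, unsigned int *t)
{
  uint64_t wide = (uint64_t) rn - rm - *t;
  *t = (unsigned int) ((wide >> 32) & 1);	/* borrow out */
  return (uint32_t) wide;
}

/* x - 1 - (x > 100), done in unsigned arithmetic so that wrap-around is
   well defined.  */
static uint32_t
fun_ref (int32_t x)
{
  return (uint32_t) x - 1u - (uint32_t) (x > 100);
}

/* The "better" sequence: cmp/gt r1,r4 ; mov #1,r1 ; subc r1,r0.  */
static uint32_t
fun_subc (int32_t x)
{
  unsigned int t = x > 100;			/* cmp/gt  r1,r4 */
  return sh_subc ((uint32_t) x, 1u, &t);	/* mov #1,r1 ; subc r1,r0 */
}

/* The shape combine sees: (plus (not (gt x 100)) x).  */
static uint32_t
fun_not (int32_t x)
{
  return (uint32_t) x + ~(uint32_t) (x > 100);
}
```

All three functions are expected to agree for every 32 bit input; the T bit
left behind by subc is simply dead in this example.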


int fun (int x)
{
  return x - 10 - (x > 100);
}

-O2 -m4:
        mov     #100,r1
        mov     r4,r0
        cmp/gt  r1,r4
        add     #-10,r0
        mov     #0,r1
        rts     
        subc    r1,r0

better:
        mov     #100,r1
        mov     r4,r0
        cmp/gt  r1,r4
        mov     #10,r1
        rts     
        subc    r1,r0

Combine is looking for a pattern like the following, but fails to match it:
(parallel [
        (set (reg:SI 168 [ D.1659 ])
            (plus:SI (minus:SI (reg/v:SI 167 [ x ])
                    (gt:SI (reg/v:SI 167 [ x ])
                        (reg:SI 172)))
                (const_int -10 [0xfffffffffffffff6])))
        (clobber (reg:SI 147 t))
    ])

There is already a similar pattern, *subc_negreg_t.
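
A small C sketch of the x - 10 case, comparing the current and the proposed
insn sequences with the expression shape from the combine dump.  Only the
result value of subc is modelled (Rn - Rm - T); the borrow written back to T
is ignored.  All function names here are hypothetical and exist only for this
illustration.

```c
#include <stdint.h>

/* Result value of 'subc Rm,Rn', i.e. Rn - Rm - T.  The updated T bit is
   not modelled.  Hypothetical helper.  */
static uint32_t
subc_value (uint32_t rn, uint32_t rm, unsigned int t)
{
  return rn - rm - t;
}

/* Current -O2 -m4 output: add #-10,r0 ; mov #0,r1 ; subc r1,r0.  */
static uint32_t
fun10_current (int32_t x)
{
  unsigned int t = x > 100;			/* cmp/gt  r1,r4 */
  uint32_t r0 = (uint32_t) x + (uint32_t) -10;	/* add     #-10,r0 */
  return subc_value (r0, 0u, t);		/* mov #0,r1 ; subc r1,r0 */
}

/* Proposed output: mov #10,r1 ; subc r1,r0.  */
static uint32_t
fun10_better (int32_t x)
{
  unsigned int t = x > 100;			/* cmp/gt  r1,r4 */
  return subc_value ((uint32_t) x, 10u, t);	/* mov #10,r1 ; subc r1,r0 */
}

/* Shape from the combine dump:
   (plus (minus x (gt x 100)) (const_int -10)).  */
static uint32_t
fun10_combine_form (int32_t x)
{
  return ((uint32_t) x - (uint32_t) (x > 100)) + (uint32_t) -10;
}
```

The point is that the constant can be moved into the subc source operand, so
the separate add #-10 is not needed.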


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug target/54236] [SH] Improve addc and subc insn utilization
  2012-08-12 22:25 [Bug target/54236] New: [SH] Improve addc and subc insn utilization olegendo at gcc dot gnu.org
                   ` (15 preceding siblings ...)
  2015-07-19  5:32 ` olegendo at gcc dot gnu.org
@ 2015-07-19  6:47 ` olegendo at gcc dot gnu.org
  2015-09-28 14:01 ` olegendo at gcc dot gnu.org
  17 siblings, 0 replies; 19+ messages in thread
From: olegendo at gcc dot gnu.org @ 2015-07-19  6:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54236

--- Comment #16 from Oleg Endo <olegendo at gcc dot gnu.org> ---
Created attachment 36012
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36012&action=edit
addsicc pattern

(In reply to Oleg Endo from comment #9)
> The following function compiled with -O2
> 
> unsigned int check (unsigned int x)
> {
>   return x == 0 ? 1 : x;
> }
> 
> results in:
>         tst     r4,r4
>         bt/s    .L7
>         mov     #1,r0
>         mov     r4,r0
> .L7:
>         rts
>         nop
> 
> 
> Writing it as:
> unsigned int check (unsigned int x)
> {
>   return x + (x == 0);
> }
> 
> results in:
>         tst     r4,r4
>         mov     #0,r0
>         rts
>         addc    r4,r0
> 
> It seems that ifcvt is trying to utilize the 'add<mode>cc' standard name
> pattern.  If the 2nd operand of the conditional addition is a constant 1 or
> -1 the addcc insn can be implemented via addc or subc without a branch.

The attached patch adds support for the addsicc pattern and a few other
improvements. However, the first case above doesn't see any improvement.  It
seems that it's a missed ifcvt optimization.
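
To make the equivalence of the two forms explicit, here is a minimal C
sketch.  It assumes the usual SH semantics of addc (Rn + Rm + T -> Rn, carry
-> T).  The names sh_addc, check_select, check_addc and add1_if are
hypothetical helpers for this illustration and do not correspond to anything
in the attached patch.

```c
#include <stdint.h>

/* Model of the SH 'addc Rm,Rn' insn: Rn + Rm + T -> Rn, carry -> T.
   Hypothetical helper, for illustration only.  */
static uint32_t
sh_addc (uint32_t rn, uint32_t rm, unsigned int *t)
{
  uint64_t wide = (uint64_t) rn + rm + *t;
  *t = (unsigned int) (wide >> 32);	/* carry out */
  return (uint32_t) wide;
}

/* First form from comment #9, which currently ends up as a branch.  */
static uint32_t
check_select (uint32_t x)
{
  return x == 0 ? 1 : x;
}

/* Second form: tst r4,r4 ; mov #0,r0 ; addc r4,r0.  */
static uint32_t
check_addc (uint32_t x)
{
  unsigned int t = (x == 0);	/* tst     r4,r4 */
  return sh_addc (0u, x, &t);	/* mov #0,r0 ; addc r4,r0 */
}

/* Conditional add of constant 1, as an addcc-style operation:
   cond ? x + 1 : x  ==  x + 0 + T.  */
static uint32_t
add1_if (uint32_t x, int cond)
{
  unsigned int t = cond != 0;
  return sh_addc (x, 0u, &t);
}
```

check_select and check_addc return the same value for every x, because the
only input where the select picks 1 is x == 0, and there x + (x == 0) is
also 1.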


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug target/54236] [SH] Improve addc and subc insn utilization
  2012-08-12 22:25 [Bug target/54236] New: [SH] Improve addc and subc insn utilization olegendo at gcc dot gnu.org
                   ` (16 preceding siblings ...)
  2015-07-19  6:47 ` olegendo at gcc dot gnu.org
@ 2015-09-28 14:01 ` olegendo at gcc dot gnu.org
  17 siblings, 0 replies; 19+ messages in thread
From: olegendo at gcc dot gnu.org @ 2015-09-28 14:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54236

--- Comment #17 from Oleg Endo <olegendo at gcc dot gnu.org> ---
Author: olegendo
Date: Mon Sep 28 14:00:44 2015
New Revision: 228202

URL: https://gcc.gnu.org/viewcvs?rev=228202&root=gcc&view=rev
Log:
gcc/
        PR target/54236
        * config/sh/predicates.md (t_reg_operand, negt_reg_operand): Allow
        and handle ne and eq codes.
        * config/sh/sh.c (sh_rtx_costs): Adjust matching of tst #imm,r0 insn.
        (sh_recog_treg_set_expr): Early accept negt_reg_operand.  Early reject
        CONST_INT_P.  Use reverse_condition.
        (sh_split_treg_set_expr): Likewise.

gcc/testsuite/
        PR target/54236
        * gcc.target/sh/pr54236-1.c (test_09, test_10, test_11): New.
        * gcc.target/sh/pr59533-1.c (test_23, test_24, test_25, test_26,
        test_27): New.
        * gcc.target/sh/pr54236-5.c: New.
        * gcc.target/sh/pr54236-6.c: New.

Added:
    trunk/gcc/testsuite/gcc.target/sh/pr54236-5.c
    trunk/gcc/testsuite/gcc.target/sh/pr54236-6.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/predicates.md
    trunk/gcc/config/sh/sh.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/sh/pr54236-1.c
    trunk/gcc/testsuite/gcc.target/sh/pr59533-1.c


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2015-09-28 14:01 UTC | newest]

Thread overview: 19+ messages
2012-08-12 22:25 [Bug target/54236] New: [SH] Improve addc and subc insn utilization olegendo at gcc dot gnu.org
2012-08-12 22:35 ` [Bug target/54236] " olegendo at gcc dot gnu.org
2012-08-16 23:20 ` olegendo at gcc dot gnu.org
2012-09-19 17:46 ` olegendo at gcc dot gnu.org
2013-09-22 22:22 ` olegendo at gcc dot gnu.org
2013-10-29 20:46 ` olegendo at gcc dot gnu.org
2013-11-02 10:20 ` olegendo at gcc dot gnu.org
2013-11-03 17:50 ` olegendo at gcc dot gnu.org
2013-11-03 20:44 ` olegendo at gcc dot gnu.org
2014-05-21  8:06 ` olegendo at gcc dot gnu.org
2014-09-27 17:42 ` olegendo at gcc dot gnu.org
2015-01-24 13:05 ` olegendo at gcc dot gnu.org
2015-01-25 16:42 ` olegendo at gcc dot gnu.org
2015-05-17 21:02 ` olegendo at gcc dot gnu.org
2015-05-19  8:01 ` olegendo at gcc dot gnu.org
2015-05-21 12:37 ` olegendo at gcc dot gnu.org
2015-07-19  5:32 ` olegendo at gcc dot gnu.org
2015-07-19  6:47 ` olegendo at gcc dot gnu.org
2015-09-28 14:01 ` olegendo at gcc dot gnu.org
