s390: Avoid CAS boolean output inefficiency

From: Richard Henderson <rth@redhat.com>
To: Ulrich Weigand <uweigand@de.ibm.com>
Cc: gcc-patches@gcc.gnu.org
Subject: s390: Avoid CAS boolean output inefficiency
Date: Mon, 06 Aug 2012 22:40:00 -0000
Message-ID: <502047E4.50304@redhat.com>
In-Reply-To: <201208061834.q76IY8HS013445@d06av02.portsmouth.uk.ibm.com>

On 08/06/2012 11:34 AM, Ulrich Weigand wrote:
> There is one particular inefficiency I have noticed.  This code:
> 
>   if (!__atomic_compare_exchange_n (&v, &expected, max, 0 , 0, 0))
>     abort ();
> 
> from atomic-compare-exchange-3.c gets compiled into:
> 
>         l       %r3,0(%r2)
>         larl    %r1,v
>         cs      %r3,%r4,0(%r1)
>         ipm     %r1
>         sra     %r1,28
>         st      %r3,0(%r2)
>         ltr     %r1,%r1
>         jne     .L3
> 
> which is extremely inefficient; it converts the condition code into
> an integer using the slow ipm, sra sequence, just so that it can
> convert the integer back into a condition code via ltr and branch
> on it ...
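
For reference, the source shape under discussion can be sketched as below.
This is a minimal illustration, not the contents of
atomic-compare-exchange-3.c: the helper name cas_branch_only is hypothetical,
and the two trailing 0 arguments from the quoted call are spelled as
__ATOMIC_RELAXED, which has the value 0 in GCC.  The point is that the
boolean result is consumed only by a branch, so ideally it never needs to
exist as an integer at all.

```c
#include <stdbool.h>

static int v;  /* shared variable; the name follows the quoted assembly */

/* Hypothetical helper: strong CAS whose result is only ever tested.
   Returns true when v held *expected and was replaced by desired.
   On failure, *expected is updated with the value found in v.  */
static bool cas_branch_only(int *expected, int desired)
{
    return __atomic_compare_exchange_n(&v, expected, desired,
                                       0 /* strong */,
                                       __ATOMIC_RELAXED, __ATOMIC_RELAXED);
}
```

A caller would write "if (!cas_branch_only (&expected, max)) abort ();",
which is the pattern that produced the ipm/sra/ltr sequence quoted above.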

This was caused (or perhaps abetted) by the representation of EQ
as NE ^ 1.  With the subsequent truncation and zero-extend, I
think combine reached its insn limit of 3 before seeing everything
it needed to see.

I'm able to fix this problem by representing EQ as EQ before reload.
For extimm targets this results in identical code; for older targets
it requires avoidance of the constant pool, i.e. LHI+XR instead of X.
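
To make the arithmetic concrete, here is a small C model of how the boolean
is derived from the condition code.  It is a sketch under stated
assumptions: the function names are hypothetical, the program-mask bits that
IPM also inserts are treated as zero, and the CC is assumed to be 0 (swap
performed) or 1 (compare failed), as after CS.  IPM places the CC in bits
28-29 of the 32-bit register, counting from the least significant bit.

```c
#include <stdint.h>

/* Model of IPM: the 2-bit condition code lands in bits 28-29.
   Program-mask bits are ignored (assumed zero) in this sketch.  */
static uint32_t model_ipm(unsigned cc)
{
    return (uint32_t)(cc & 3u) << 28;
}

/* NE: shift the CC down to bit 0.  With CC in {0, 1} this is the
   "compare failed" flag.  */
static uint32_t model_sne(unsigned cc)
{
    return model_ipm(cc) >> 28;
}

/* EQ expressed as NE ^ 1, the representation discussed above.  */
static uint32_t model_seq(unsigned cc)
{
    return model_sne(cc) ^ 1u;
}
```

In this model the shift has to go right: shifting the IPM result left by 28
would move the CC bits out of the 32-bit register entirely.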

        l       %r2,0(%r3)
        larl    %r1,v
        cs      %r2,%r5,0(%r1)
        st      %r2,0(%r3)
        jne     .L3

That fixed, we see the second CAS in that file:

        .loc 1 27 0
        cs      %r2,%r2,0(%r1)
        ipm     %r5
        sll     %r5,28
        lhi     %r0,1
        xr      %r5,%r0
        st      %r2,0(%r3)
        ltr     %r5,%r5
        je      .L20

This happens because CSE notices the cbranch vs 0, and sets r116
to zero along the path to

     32   if (!__atomic_compare_exchange_n (&v, &expected, 0, STRONG,
                                            __ATOMIC_RELEASE, __ATOMIC_ACQUIRE))

at which point CSE decides that it would be cheaper to "re-use"
the zero already in r116 instead of loading another constant 0 here.
After that, combine is hamstrung because r116 is not dead.

I'm not quite sure of the best way to fix this, since rtx_costs already
has all constants cost 0.  CSE ought not believe that r116 is better
than a plain constant.  CSE also shouldn't be extending the life of
pseudos this way.
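
The source shape involved can be sketched roughly as follows.  This is an
illustration only, not the test file: the function name two_cas and its
return codes are hypothetical, and whether any given compiler version
reproduces the CSE behaviour from it is not something the sketch establishes.
The first CAS result is compared against zero by a branch, and the second CAS
then wants a literal 0 as its new value, which is where the already-known
zero gets re-used.

```c
static int v;  /* shared variable; the name follows the assembly above */

/* Hypothetical reduction: returns 0 if both CAS operations succeed,
   1 if the first fails, 2 if the second fails.  */
static int two_cas(int *expected, int max)
{
    /* The boolean result is branched on, i.e. compared against 0.  */
    if (!__atomic_compare_exchange_n(&v, expected, max, 0,
                                     __ATOMIC_RELAXED, __ATOMIC_RELAXED))
        return 1;

    /* The desired value here is the constant 0.  */
    if (!__atomic_compare_exchange_n(&v, expected, 0, 0,
                                     __ATOMIC_RELEASE, __ATOMIC_ACQUIRE))
        return 2;

    return 0;
}
```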

A short-term possibility is to have the CAS insns accept general_operand,
so that the 0 gets merged.  With reload inheritance and post-reload cse,
that might produce code that is "good enough".  Certainly it's effective
for the atomic-compare-exchange-3.c testcase.  I'm less than happy with
that, since the non-optimization of CAS depends on following code that
is totally unrelated.

This patch ought to be independent of any other patch so far.


r~

[-- Attachment #2: z --]
[-- Type: text/plain, Size: 2305 bytes --]

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 0e43e51..bed6b79 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -5325,12 +5325,15 @@
 	    (match_operand 3 "const0_operand")]))
      (clobber (reg:CC CC_REGNUM))])]
   ""
-  "emit_insn (gen_sne (operands[0], operands[2]));
-   if (GET_CODE (operands[1]) == EQ)
-     emit_insn (gen_xorsi3 (operands[0], operands[0], const1_rtx));
-   DONE;")
+{
+  if (!TARGET_EXTIMM && GET_CODE (operands[1]) == EQ)
+    {
+      emit_insn (gen_seq_neimm (operands[0], operands[2]));
+      DONE;
+    }
+})
 
-(define_insn_and_split "sne"
+(define_insn_and_split "*sne"
   [(set (match_operand:SI 0 "register_operand" "=d")
 	(ne:SI (match_operand:CCZ1 1 "register_operand" "0")
 	       (const_int 0)))
@@ -5342,6 +5345,48 @@
     [(set (match_dup 0) (ashiftrt:SI (match_dup 0) (const_int 28)))
      (clobber (reg:CC CC_REGNUM))])])
 
+(define_insn_and_split "*seq"
+  [(set (match_operand:SI 0 "register_operand" "=d")
+	(eq:SI (match_operand:CCZ1 1 "register_operand" "0")
+	       (const_int 0)))
+   (clobber (reg:CC CC_REGNUM))]
+  "TARGET_EXTIMM"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  rtx op0 = operands[0];
+  emit_insn (gen_lshrsi3 (op0, op0, GEN_INT (28)));
+  emit_insn (gen_xorsi3 (op0, op0, const1_rtx));
+  DONE;
+})
+
+;; ??? Ideally we'd USE a const1_rtx, properly reloaded, but that makes
+;; things more difficult for combine (which can only insert clobbers).
+;; But perhaps it would be better still to have simply used a branch around
+;; constant load instead of beginning with the IPM?
+;;
+;; What about LOCR for Z196?  That's a more general question about cstore
+;; being decomposed into movcc...
+
+(define_insn_and_split "seq_neimm"
+  [(set (match_operand:SI 0 "register_operand" "=d")
+	(eq:SI (match_operand:CCZ1 1 "register_operand" "0")
+	       (const_int 0)))
+   (clobber (match_scratch:SI 2 "=&d"))
+   (clobber (reg:CC CC_REGNUM))]
+  "!TARGET_EXTIMM"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  rtx op0 = operands[0];
+  rtx op2 = operands[2];
+  emit_insn (gen_ashlsi3 (op0, op0, GEN_INT (28)));
+  emit_move_insn (op2, const1_rtx);
+  emit_insn (gen_xorsi3 (op0, op0, op2));
+  DONE;
+})
 
 ;;
 ;; - Conditional move instructions (introduced with z196)
