Message-Id: <201208061834.q76IY8HS013445@d06av02.portsmouth.uk.ibm.com>
Subject: Re: [PATCH 2/2] s390: Convert from sync to atomic optabs
From: "Ulrich Weigand"
To: rth@redhat.com (Richard Henderson)
Cc: gcc-patches@gcc.gnu.org
Date: Mon, 06 Aug 2012 18:34:00 -0000
In-Reply-To: <1343687574-3244-3-git-send-email-rth@redhat.com> from "Richard Henderson" at Jul 30, 2012 03:32:54 PM

Richard Henderson wrote:

Some more comments on this patch.

> +; Different from movdi_31 in that we have no splitters.
> +(define_insn "atomic_loaddi_1"
> +  [(set (match_operand:DI 0 "register_operand" "=d,d,!*f,!*f")
> +	(unspec:DI [(match_operand:DI 1 "s_operand" "Q,S,Q,m")]

The constraint line doesn't look right.  ld(y) accept base+index, so
they should get R and T constraints, respectively.

In fact, I wouldn't use "s_operand" as a predicate, since this excludes
valid addresses for those cases, and since in general I've found it to
be preferable to have reload fix up addresses.  So I'd just use
"memory_operand" here, and actually in all the other patterns as
well ...  (This also simplifies the expander.)

> +		   UNSPEC_MOVA))]
> +  "!TARGET_ZARCH"
> +  "@
> +   lm\t%0,%M0,%S1
> +   lmy\t%0,%M0,%S1
> +   ld\t%0,%1
> +   ldy\t%0,%1"
> +  [(set_attr "op_type" "RS,RSY,RS,RSY")
> +   (set_attr "type" "lm,lm,floaddf,floaddf")])
> +
> +(define_insn "atomic_loadti_1"
> +  [(set (match_operand:TI 0 "register_operand" "=r")
> +	(unspec:TI [(match_operand:TI 1 "memory_operand" "m")]

"m" is wrong here (and basically everywhere), since it includes
SYMBOL_REFs for lrl etc. on z10.  For lpq we need "RT".

> +  "lpq\t%0,%1"
> +  [(set_attr "op_type" "RXY")])

LPQ and STPQ probably should get a "type" of "other".  This may not
model the pipeline exactly, but it's better than just assuming
"integer" by default.  These instructions are special (and slow).
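To make that concrete: with "memory_operand" as the predicate and R/T
for the ld/ldy alternatives, the operand line would look something like
the following.  (An untested sketch of the direction I mean, not a
drop-in replacement.)

```
(define_insn "atomic_loaddi_1"
  [(set (match_operand:DI 0 "register_operand" "=d,d,!*f,!*f")
	(unspec:DI [(match_operand:DI 1 "memory_operand" "Q,S,R,T")]
		   UNSPEC_MOVA))]
  "!TARGET_ZARCH"
  ...)
```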
> +(define_expand "atomic_compare_and_swap<mode>"
> +  [(match_operand:SI 0 "register_operand")	;; bool success output
> +   (match_operand:DGPR 1 "register_operand")	;; oldval output
> +   (match_operand:DGPR 2 "s_operand")		;; memory

memory_operand probably better again.

> +   (match_operand:DGPR 3 "register_operand")	;; expected input
> +   (match_operand:DGPR 4 "register_operand")	;; newval input
> +   (match_operand:SI 5 "const_int_operand")	;; is_weak
> +   (match_operand:SI 6 "const_int_operand")	;; success model
> +   (match_operand:SI 7 "const_int_operand")]	;; failure model
> +  ""
> +{
> +  rtx cc, cmp;
> +  emit_insn (gen_atomic_compare_and_swap<mode>_internal
> +	     (operands[1], operands[2], operands[3], operands[4]));
> +  cc = gen_rtx_REG (CCZ1mode, CC_REGNUM);
> +  cmp = gen_rtx_NE (SImode, cc, const0_rtx);

As already pointed out, this needs to be EQ.

> -  ""
> -  "cds\t%0,%3,%S1"
> -  [(set_attr "op_type" "RS")
> +  "TARGET_ZARCH"
> +  "csg\t%0,%3,%S1"
> +  [(set_attr "op_type" "RXY")

Should be "RSY".

> +  "@
> +   cds\t%0,%3,%S1
> +   cdsy\t%0,%3,%S1"
> +  [(set_attr "op_type" "RX,RXY")

Should be "RS,RSY".

> +  "@
> +   cs\t%0,%3,%S1
> +   csy\t%0,%3,%S1"
> +  [(set_attr "op_type" "RX,RXY")

Likewise.

> +(define_insn "atomic_fetch__iaf"
> +  [(set (match_operand:GPR 0 "register_operand" "=d")
> +	(match_operand:GPR 1 "s_operand" "+S"))
> +   (set (match_dup 1)
> +	(unspec_volatile:GPR
> +	  [(ATOMIC_Z196:GPR (match_dup 1)
> +			    (match_operand:GPR 2 "general_operand" "d"))]
> +	  UNSPECV_ATOMIC_OP))
> +   (clobber (reg:CC CC_REGNUM))]
>    "TARGET_Z196"
> -  "la\t%0,%2,%1")
> +  "la\t%0,%2,%1"
> +  [(set_attr "op_type" "RXY")

Again RSY.

There is one particular inefficiency I have noticed.
This code:

  if (!__atomic_compare_exchange_n (&v, &expected, max, 0, 0, 0))
    abort ();

from atomic-compare-exchange-3.c gets compiled into:

        l       %r3,0(%r2)
        larl    %r1,v
        cs      %r3,%r4,0(%r1)
        ipm     %r1
        sra     %r1,28
        st      %r3,0(%r2)
        ltr     %r1,%r1
        jne     .L3

which is extremely inefficient; it converts the condition code into an
integer using the slow ipm, sra sequence, just so that it can convert
the integer back into a condition code via ltr and branch on it ...

Now, with the old code this sequence used to be optimized away by
combine and we'd just have the cs followed by a branch.  This is not
done any more because we have the store to "expected" in between.
This is because of this code in combine:

      /* Make sure that the value that is to be substituted for the register
	 does not use any registers whose values alter in between.  However,
	 If the insns are adjacent, a use can't cross a set even though we
	 think it might (this can happen for a sequence of insns each setting
	 the same destination; last_set of that register might point to
	 a NOTE).  If INSN has a REG_EQUIV note, the register is always
	 equivalent to the memory so the substitution is valid even if there
	 are intervening stores.  Also, don't move a volatile asm or
	 UNSPEC_VOLATILE across any other insns.  */
      || (! all_adjacent
	  && (((!MEM_P (src)
		|| ! find_reg_note (insn, REG_EQUIV, src))
	       && use_crosses_set_p (src, DF_INSN_LUID (insn)))
	      || (GET_CODE (src) == ASM_OPERANDS && MEM_VOLATILE_P (src))
	      || GET_CODE (src) == UNSPEC_VOLATILE))

Note that we have exactly the case mentioned, where a series of
instructions to be combined all set the same (CC) register.  If they're
all adjacent, this still gets optimized away -- but they no longer are
due to the store.

Is there a way of structuring the atomic optabs/expander so that the
store gets done either before or after all the CC operations?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com