From mboxrd@z Thu Jan  1 00:00:00 1970
Message-Id: <201207311817.q6VIHH8e000866@d06av02.portsmouth.uk.ibm.com>
Subject: Re: [PATCH 0/2] Convert s390 to atomic optabs, v2
To: rth@redhat.com (Richard Henderson)
Date: Tue, 31 Jul 2012 18:36:00 -0000
From: "Ulrich Weigand" <uweigand@de.ibm.com>
Cc: gcc-patches@gcc.gnu.org, rguenther@suse.de
In-Reply-To: <1343687574-3244-1-git-send-email-rth@redhat.com> from "Richard Henderson" at Jul 30, 2012 03:32:52 PM
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
x-cbid: 12073118-1948-0000-0000-0000028C5F79
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
X-SW-Source: 2012-07/txt/msg01584.txt.bz2

Richard Henderson wrote:

> I've had a go at generating better code in the HQImode CAS
> loop for aligned memory, but I don't know that I'd call it
> the most efficient thing ever.

Thanks for having a look at this!

>   (3) Support for IC, and ICM via the insv pattern is lacking.
>       I've added a tiny bit of support here, in the form of using
>       the existing strict_low_part patterns, but most definitely we
>       could do better.

This doesn't look correct:
+      /* Emit a strict_low_part pattern if possible.  */
+      if (bitpos == 0 && GET_MODE_BITSIZE (smode) == bitsize)

On a big-endian platform, bitpos == 0 means we need to insert
into the *high* part, not the low part.  This is probably what
causes the incorrect code below:
         icm     %r5,3,0(%r12)
We'd need icm mask 12, not 3, to load into the two upper bytes.
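
To make the mask arithmetic concrete, here is a small C model (a sketch
only, not code from the patch).  It assumes a 32-bit register, a
byte-aligned field, and bitpos counted from the most significant bit;
the names icm_mask_for_field and icm_model are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: ICM mask selecting the bytes covered by a
   byte-aligned field of BITSIZE bits starting at BITPOS, where BITPOS
   counts from the most significant bit of a 32-bit register
   (big-endian numbering).  */
static unsigned
icm_mask_for_field (unsigned bitpos, unsigned bitsize)
{
  unsigned first = bitpos / 8;    /* index of first byte, from the left */
  unsigned nbytes = bitsize / 8;  /* number of bytes in the field */
  return ((1u << nbytes) - 1) << (4 - first - nbytes);
}

/* Simplified model of ICM on a 32-bit register: the mask bits 8, 4, 2, 1
   select register bytes from left to right, and each selected byte is
   replaced by the next successive byte from MEM.  The condition code
   is not modelled.  */
static uint32_t
icm_model (uint32_t reg, unsigned mask, const uint8_t *mem)
{
  for (int i = 0; i < 4; i++)
    if (mask & (8u >> i))
      {
        int shift = 8 * (3 - i);
        reg = (reg & ~(UINT32_C (0xff) << shift))
              | ((uint32_t) *mem++ << shift);
      }
  return reg;
}
```

Under these assumptions, a 16-bit field at bitpos 0 yields mask 12, and
only a field at bitpos 16 yields mask 3.  Feeding the two bytes 0xAA 0xBB
into a register holding 0x11223344 gives 0xAABB3344 with mask 12 but
0x1122AABB with mask 3.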

[ This is also probably causing the testing failures I'm seeing
with the patch as-is.  I haven't looked into them in detail yet.  ]

>   (4) The *sethighpartsi and *sethighpartdi_64 patterns ought to be
>       more different.  As is, we can't insert into bits 48-56 of a
>       DImode quantity, because we don't generate ICM for DImode,
>       only ICMH.
> 
>   (5) Missing support for RISBGZ in the form of an extv/z expander.
>       The existing *extv/z splitters probably ought to be conditionalized
>       on !Z10.
> 
>   (6) The strict_low_part patterns should allow registers for at
>       least Z10.  The SImode strict_low_part can use LR everywhere.
> 
>   (7) RISBGZ could be used for a 3-address constant lshrsi3 before
>       srlk is available.

Good points, agreed with all of that.  None of that ought to be
a prerequisite for the atomic patch, of course ...

>    * Given that we're having to zap the mask in %r1 for the second
>      compare anyway, I wonder if RISBG is really beneficial over OR.
>      Is RISBG (or ICM for that matter) any faster (or even smaller)?

A plain OR is preferable to a RISBG.  I guess the point of the
RISBG is that it lets you avoid the extra shift ...  But if that shift
can be moved ahead of the loop, avoiding it may not be much of a
win.  On the other hand, these loops hopefully don't iterate very often
unless there is a lot of contention ...
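
For illustration, the loop shape I have in mind looks roughly like the
C sketch below.  This is not the s390 expansion from the patch, just a
generic model using the GCC __atomic builtins; the function name and the
explicit "shift" parameter (bit offset of the halfword within its
aligned word) are assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: compare-and-swap a 16-bit value by looping on a
   CAS of the aligned 32-bit word that contains it.  SHIFT is the bit
   offset of the halfword within *WORD.  */
static bool
cas_u16_via_word (uint32_t *word, unsigned shift,
                  uint16_t expected, uint16_t desired)
{
  const uint32_t mask = UINT32_C (0xffff) << shift;
  /* Both shifts are done once, ahead of the loop.  */
  const uint32_t exp_bits = (uint32_t) expected << shift;
  const uint32_t new_bits = (uint32_t) desired << shift;

  uint32_t old = __atomic_load_n (word, __ATOMIC_RELAXED);
  for (;;)
    {
      /* Fail if the halfword itself does not match.  */
      if ((old & mask) != exp_bits)
        return false;
      /* Merge with a plain AND/OR; no shift inside the loop.  */
      uint32_t repl = (old & ~mask) | new_bits;
      if (__atomic_compare_exchange_n (word, &old, repl, false,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
        return true;
      /* The failed CAS reloaded OLD.  We get here only when the word
         changed underneath us, so go around again and recheck.  */
    }
}
```

With the shifts hoisted like this, the body of the loop is just a mask,
an OR and the word-sized CAS, and it repeats only when the containing
word changed between the load and the CAS.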

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com