From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <meissner@linux.ibm.com>
Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com
 [148.163.158.5])
 by sourceware.org (Postfix) with ESMTPS id 86C06385842C
 for <gcc-patches@gcc.gnu.org>; Wed,  8 Sep 2021 15:31:14 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 86C06385842C
Received: from pps.filterd (m0127361.ppops.net [127.0.0.1])
 by mx0a-001b2d01.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id
 188F3IQ0185994; Wed, 8 Sep 2021 11:31:10 -0400
Received: from pps.reinject (localhost [127.0.0.1])
 by mx0a-001b2d01.pphosted.com with ESMTP id 3axrd4mww6-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Wed, 08 Sep 2021 11:31:09 -0400
Received: from m0127361.ppops.net (m0127361.ppops.net [127.0.0.1])
 by pps.reinject (8.16.0.43/8.16.0.43) with SMTP id 188FJAIM066369;
 Wed, 8 Sep 2021 11:31:09 -0400
Received: from ppma04dal.us.ibm.com (7a.29.35a9.ip4.static.sl-reverse.com
 [169.53.41.122])
 by mx0a-001b2d01.pphosted.com with ESMTP id 3axrd4mwvr-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Wed, 08 Sep 2021 11:31:09 -0400
Received: from pps.filterd (ppma04dal.us.ibm.com [127.0.0.1])
 by ppma04dal.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 188FI1WW024108;
 Wed, 8 Sep 2021 15:31:08 GMT
Received: from b03cxnp08027.gho.boulder.ibm.com
 (b03cxnp08027.gho.boulder.ibm.com [9.17.130.19])
 by ppma04dal.us.ibm.com with ESMTP id 3axcnj3c75-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Wed, 08 Sep 2021 15:31:08 +0000
Received: from b03ledav005.gho.boulder.ibm.com
 (b03ledav005.gho.boulder.ibm.com [9.17.130.236])
 by b03cxnp08027.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id
 188FV6Yj15139268
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);
 Wed, 8 Sep 2021 15:31:06 GMT
Received: from b03ledav005.gho.boulder.ibm.com (unknown [127.0.0.1])
 by IMSVA (Postfix) with ESMTP id 7C830BE06D;
 Wed,  8 Sep 2021 15:31:06 +0000 (GMT)
Received: from b03ledav005.gho.boulder.ibm.com (unknown [127.0.0.1])
 by IMSVA (Postfix) with ESMTP id C4C62BE05B;
 Wed,  8 Sep 2021 15:31:05 +0000 (GMT)
Received: from toto.the-meissners.org (unknown [9.160.139.21])
 by b03ledav005.gho.boulder.ibm.com (Postfix) with ESMTPS;
 Wed,  8 Sep 2021 15:31:05 +0000 (GMT)
Date: Wed, 8 Sep 2021 11:31:03 -0400
From: Michael Meissner <meissner@linux.ibm.com>
To: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Michael Meissner <meissner@linux.ibm.com>, gcc-patches@gcc.gnu.org,
 David Edelsohn <dje.gcc@gmail.com>, Bill Schmidt <wschmidt@linux.ibm.com>,
 Peter Bergner <bergner@linux.ibm.com>,
 Will Schmidt <will_schmidt@vnet.ibm.com>
Subject: Re: [PATCH] Fix SFmode subreg of DImode and TImode
Message-ID: <YTjXN8SGpFXHYOfM@toto.the-meissners.org>
Mail-Followup-To: Michael Meissner <meissner@linux.ibm.com>,
 Segher Boessenkool <segher@kernel.crashing.org>,
 gcc-patches@gcc.gnu.org, David Edelsohn <dje.gcc@gmail.com>,
 Bill Schmidt <wschmidt@linux.ibm.com>,
 Peter Bergner <bergner@linux.ibm.com>,
 Will Schmidt <will_schmidt@vnet.ibm.com>
References: <YTcQ5BZr4NQvH7DY@toto.the-meissners.org>
 <20210907230730.GM1583@gate.crashing.org>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="Rw4F8Ozu2gBrB4Nr"
Content-Disposition: inline
In-Reply-To: <20210907230730.GM1583@gate.crashing.org>
X-TM-AS-GCONF: 00
X-Proofpoint-GUID: a0av7c2msK43EoMk--S_kKriBapHJ7O5
X-Proofpoint-ORIG-GUID: beRcbCqNMzLgwD8XWf3T_MRMqaE9qRrd
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.790
 definitions=2021-09-08_06:2021-09-07,
 2021-09-08 signatures=0
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0
 mlxscore=0 adultscore=0
 phishscore=0 malwarescore=0 lowpriorityscore=0 spamscore=0 mlxlogscore=999
 clxscore=1015 priorityscore=1501 bulkscore=0 suspectscore=0
 impostorscore=0 classifier=spam adjust=0 reason=mlx scancount=1
 engine=8.12.0-2109030001 definitions=main-2109080094
X-Spam-Status: No, score=-4.1 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_EF, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE,
 SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Wed, 08 Sep 2021 15:31:16 -0000


--Rw4F8Ozu2gBrB4Nr
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Tue, Sep 07, 2021 at 06:07:30PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Sep 07, 2021 at 03:12:36AM -0400, Michael Meissner wrote:
> > [PATCH] Fix SFmode subreg of DImode and TImode
> > 
> > This patch fixes the breakage in the PowerPC due to a recent change in SUBREG
> > behavior.
> 
> But what was that change?  And was that intentional?  If so, why wasn't
> it documented, was the existing behaviour considered buggy?  But the
> documentation agrees with the previous behaviour afaics.

I haven't investigated which machine-independent change suddenly started
allowing:

	(subreg:SF (reg:TI ...))

This patch just allows these subregs to exist instead of failing with an "insn
not found" message.  An alternative approach would be to allow wider subregs in
the insn matching and deal with getting the right physical register when
splitting.

> > While it is arguable that the patch that caused the breakage should
> > be reverted, this patch should be a bandage to prevent these changes from
> > happening again.
> 
> NAK.  This patch will likely cause us to generate worse code.  If that
> is not the case it will need a long, in-depth explanation of why not.

In trying to replicate the problem, I get exactly the same code as with an
older compiler.  The test program is attached.

> Sorry.
> 
> > I first noticed it in building the Spec 2017 wrf_r and blender_r
> > benchmarks.  Once I applied this patch, I also noticed several of the
> > tests now pass.
> > 
> > The core of the problem is we need to treat SUBREG's of SFmode and SImode
> > specially on the PowerPC.  This is due to the fact that SFmode values that are
> > in the vector and floating point registers are represented as DFmode.  When we
> > want to do a direct move between the GPR registers and the vector registers, we
> > have to convert the value from the DFmode representation to/from the SFmode
> > representation.
> 
> The core of the problem is that subreg of pseudos has three meanings:
>   -- Paradoxical subregs;
>   -- Actual subregs;
>   -- "bit_cast" thingies: treat the same bits as something else.  Like
>      looking at the bits of a float as its memory image.
> 
> Ignoring paradoxical subregs (as well as subregs of mem, which should
> have disappeared by now), and subregs of hard registers as well (those
> have *different* semantics after all), the other two kinds can be mixed,
> and *have to* be mixed, because subregs of subregs are non-canonical.

This isn't a case of subregs of subregs.  It is a subreg of a larger integer
type: for example, when a value representing an __int128_t or a structure is
accessed by its bottom or top element.  Instead of doing an extract, the
compiler generates a subreg.

Previously, the machine-independent part of the compiler would not generate
these subregs; now it does.
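
As a rough source-level picture of the access pattern described above (not of
the RTL that GCC actually generates), the sketch below takes one 32-bit word
of a wider integer and reinterprets it as a float.  The helper names are
hypothetical, it uses a 64-bit integer where the TImode case would use a
128-bit one, and it assumes float is IEEE single precision.  Unlike the union
in the attached test, it selects words by significance, so the result does not
depend on endianness.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret 32 bits as a float without changing them.  memcpy is the
   portable way to express this in C.  */
static float
bits_to_float (uint32_t bits)
{
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* Hypothetical helper: return 32-bit word WORD (0 = least significant,
   1 = most significant) of VALUE, viewed as a float.  */
float
float_word_of_u64 (uint64_t value, unsigned word)
{
  return bits_to_float ((uint32_t) (value >> (32 * (word & 1))));
}

/* Self-check, assuming IEEE single precision: 0x3f800000 is 1.0f and
   0x40000000 is 2.0f.  */
void
check_float_word_of_u64 (void)
{
  uint64_t v = ((uint64_t) 0x40000000u << 32) | 0x3f800000u;

  assert (float_word_of_u64 (v, 0) == 1.0f);
  assert (float_word_of_u64 (v, 1) == 2.0f);
}
```

The shift here plays the role of the explicit extract; the subreg form asks
for the same bits directly from the wider register.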

> 
> Is there any reason why not to allow this kind of subreg?
> 
> If we want to not allow mixing bit_cast with subregs, we should make it
> its own RTL code.
> 
> > +      /* In case we are given a SUBREG for a larger type, reduce it to
> > +	 SImode.  */
> > +      if (mode == SFmode && GET_MODE_SIZE (inner_mode) > 4)
> > +	{
> > +	  rtx tmp = gen_reg_rtx (SImode);
> > +	  emit_move_insn (tmp, gen_lowpart (SImode, source));
> > +	  emit_insn (gen_movsf_from_si (dest, tmp));
> > +	  return true;
> > +	}
> 
> This makes it two separate insns.  Is that always optimised to code that
> is at least as good as before?

Even if it generates an extra move, the result is likely to be better, because
the default would be to do the transfer through memory: a store from one
register bank followed by a load into the other.
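
For context on why a plain bit copy between the register banks is not enough:
as the quoted patch description says, SFmode values in the vector and floating
point registers are held in the DFmode representation.  The sketch below only
illustrates, at the C level, that the same value has different bit patterns in
the two formats.  It assumes IEEE single and double precision, uses
hypothetical helper names, and does not model the actual register contents or
the code the patch emits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the bit pattern of a float (assumed IEEE single precision).  */
uint32_t
float_bits (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  return bits;
}

/* Return the bit pattern of a double (assumed IEEE double precision).  */
uint64_t
double_bits (double d)
{
  uint64_t bits;
  memcpy (&bits, &d, sizeof bits);
  return bits;
}

/* The value 1.0 has a different encoding in each format, so moving a
   single-precision value between the two views needs a conversion, not
   just a copy of the low 32 bits.  */
void
check_sf_df_bits (void)
{
  assert (float_bits (1.0f) == 0x3f800000u);
  assert (double_bits ((double) 1.0f) == 0x3ff0000000000000ull);
}
```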

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

--Rw4F8Ozu2gBrB4Nr
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="foo-subreg.c"

union u_128 {
  __int128_t i128;
  long l[2];
  int i[4];
  double d[2];
  float f[4];
};

union u_64 {
  long l;
  int i[2];
  double d;
  float f[2];
};

float ret_u128_f0 (long a, long b, long x, long y)
{
  union u_128 u;

  u.l[0] = a ^ x;
  u.l[1] = b ^ y;

  return u.f[0];
}

float ret_u128_f1 (long a, long b, long x, long y)
{
  union u_128 u;

  u.l[0] = a ^ x;
  u.l[1] = b ^ y;

  return u.f[1];
}

float ret_u128_f2 (long a, long b, long x, long y)
{
  union u_128 u;

  u.l[0] = a ^ x;
  u.l[1] = b ^ y;

  return u.f[2];
}

float ret_u128_f3 (long a, long b, long x, long y)
{
  union u_128 u;

  u.l[0] = a ^ x;
  u.l[1] = b ^ y;

  return u.f[3];
}

float ret_u64_f0 (long a, long x)
{
  union u_64 u;

  u.l = a ^ x;

  return u.f[0];
}

float ret_u64_f1 (long a, long x)
{
  union u_64 u;

  u.l = a ^ x;

  return u.f[1];
}

--Rw4F8Ozu2gBrB4Nr--