From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <iii@linux.ibm.com>
Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com
 [148.163.158.5])
 by sourceware.org (Postfix) with ESMTPS id 766F0394481E
 for <gcc-patches@gcc.gnu.org>; Wed,  4 Nov 2020 22:13:03 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 766F0394481E
Received: from pps.filterd (m0098414.ppops.net [127.0.0.1])
 by mx0b-001b2d01.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id
 0A4M2jej171908
 for <gcc-patches@gcc.gnu.org>; Wed, 4 Nov 2020 17:13:03 -0500
Received: from ppma05fra.de.ibm.com (6c.4a.5195.ip4.static.sl-reverse.com
 [149.81.74.108])
 by mx0b-001b2d01.pphosted.com with ESMTP id 34kv449g33-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT)
 for <gcc-patches@gcc.gnu.org>; Wed, 04 Nov 2020 17:13:02 -0500
Received: from pps.filterd (ppma05fra.de.ibm.com [127.0.0.1])
 by ppma05fra.de.ibm.com (8.16.0.42/8.16.0.42) with SMTP id 0A4M7tJZ020699
 for <gcc-patches@gcc.gnu.org>; Wed, 4 Nov 2020 22:13:01 GMT
Received: from b06cxnps3074.portsmouth.uk.ibm.com
 (d06relay09.portsmouth.uk.ibm.com [9.149.109.194])
 by ppma05fra.de.ibm.com with ESMTP id 34h01qteu1-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT)
 for <gcc-patches@gcc.gnu.org>; Wed, 04 Nov 2020 22:13:00 +0000
Received: from d06av22.portsmouth.uk.ibm.com (d06av22.portsmouth.uk.ibm.com
 [9.149.105.58])
 by b06cxnps3074.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id
 0A4MCw6N3342888
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);
 Wed, 4 Nov 2020 22:12:58 GMT
Received: from d06av22.portsmouth.uk.ibm.com (unknown [127.0.0.1])
 by IMSVA (Postfix) with ESMTP id 300314C046;
 Wed,  4 Nov 2020 22:12:58 +0000 (GMT)
Received: from d06av22.portsmouth.uk.ibm.com (unknown [127.0.0.1])
 by IMSVA (Postfix) with ESMTP id F05294C040;
 Wed,  4 Nov 2020 22:12:57 +0000 (GMT)
Received: from sig-9-145-1-210.uk.ibm.com (unknown [9.145.1.210])
 by d06av22.portsmouth.uk.ibm.com (Postfix) with ESMTP;
 Wed,  4 Nov 2020 22:12:57 +0000 (GMT)
Message-ID: <18e1d3edb99d7656fcf04134a94b491abeffda6f.camel@linux.ibm.com>
Subject: Re: [PATCH 3/4] IBM Z: Store long doubles in vector registers when
 possible
From: Ilya Leoshkevich <iii@linux.ibm.com>
To: Andreas Krebbel <krebbel@linux.ibm.com>
Cc: gcc-patches@gcc.gnu.org
Date: Wed, 04 Nov 2020 23:12:57 +0100
In-Reply-To: <fcc058d4-40d0-1f0c-8057-671cd98724d4@linux.ibm.com>
References: <20201103213637.1876906-1-iii@linux.ibm.com>
 <20201103214547.1877703-1-iii@linux.ibm.com>
 <fcc058d4-40d0-1f0c-8057-671cd98724d4@linux.ibm.com>
Content-Type: text/plain; charset="UTF-8"
User-Agent: Evolution 3.34.4 (3.34.4-1.fc31) 
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-TM-AS-GCONF: 00
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.312, 18.0.737
 definitions=2020-11-04_15:2020-11-04,
 2020-11-04 signatures=0
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0
 spamscore=0
 priorityscore=1501 mlxscore=0 bulkscore=0 impostorscore=0 malwarescore=0
 suspectscore=4 lowpriorityscore=0 phishscore=0 clxscore=1015 adultscore=0
 mlxlogscore=999 classifier=spam adjust=0 reason=mlx scancount=1
 engine=8.12.0-2009150000 definitions=main-2011040156
X-Spam-Status: No, score=-4.1 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_EF, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,
 SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Wed, 04 Nov 2020 22:13:05 -0000

On Wed, 2020-11-04 at 18:16 +0100, Andreas Krebbel wrote:
> On 03.11.20 22:45, Ilya Leoshkevich wrote:
> > On z14+, there are instructions for working with 128-bit floats (long
> > doubles) in vector registers.  It's beneficial to use them instead of
> > instructions that operate on floating point register pairs, because
> > this allows storing 4 times more data in registers at a time,
> > relieving register pressure.  The performance of the new instructions
> > is almost the same.
> > 
> > Implement by storing TFmode values in vector registers on z14+.  Since
> > not all operations are available with the new instructions, keep the
> > old ones using the new FPRX2 mode, and convert between it and TFmode
> > when necessary (these are called "forwarder" expanders below).  Change
> > the existing TFmode expanders to call either new- or old-style ones
> > depending on whether we are on z14+ or older machines ("dispatcher"
> > expanders).
> > 
> > gcc/ChangeLog:
> > 
> > 2020-11-03  Ilya Leoshkevich  <iii@linux.ibm.com>
> > 
> > 	* config/s390/s390-modes.def (FPRX2): New mode.
> > 	* config/s390/s390-protos.h (s390_fma_allowed_p): New function.
> > 	* config/s390/s390.c (s390_fma_allowed_p): Likewise.
> > 	(s390_build_signbit_mask): Support 128-bit masks.
> > 	(print_operand): Support printing the second word of a TFmode
> > 	operand as vector register.
> > 	(constant_modes): Add FPRX2mode.
> > 	(s390_class_max_nregs): Return 1 for TFmode on z14+.
> > 	(s390_is_fpr128): New function.
> > 	(s390_is_vr128): Likewise.
> > 	(s390_can_change_mode_class): Use s390_is_fpr128 and
> > 	s390_is_vr128 in order to determine whether mode refers to a FPR
> > 	pair or to a VR.
> > 	* config/s390/s390.h (EXPAND_MOVTF): New macro.
> > 	(EXPAND_TF): Likewise.
> > 	* config/s390/s390.md (PFPO_OP_TYPE_FPRX2): PFPO_OP_TYPE_TF
> > 	alias.
> > 	(ALL): Add FPRX2.
> > 	(FP_ALL): Add FPRX2 for z14+, restrict TFmode to z13-.
> > 	(FP): Likewise.
> > 	(FP_ANYTF): New mode iterator.
> > 	(BFP): Add FPRX2 for z14+, restrict TFmode to z13-.
> > 	(TD_TF): Likewise.
> > 	(xde): Add FPRX2.
> > 	(nBFP): Likewise.
> > 	(nDFP): Likewise.
> > 	(DSF): Likewise.
> > 	(DFDI): Likewise.
> > 	(SFSI): Likewise.
> > 	(DF): Likewise.
> > 	(SF): Likewise.
> > 	(fT0): Likewise.
> > 	(bt): Likewise.
> > 	(_d): Likewise.
> > 	(HALF_TMODE): Likewise.
> > 	(tf_fpr): New mode_attr.
> > 	(type): New mode_attr.
> > 	(*cmp<mode>_ccz_0): Use type instead of mode with fsimp.
> > 	(*cmp<mode>_ccs_0_fastmath): Likewise.
> > 	(*cmptf_ccs): New pattern for wfcxb.
> > 	(*cmptf_ccsfps): New pattern for wfkxb.
> > 	(mov<mode>): Rename to mov<mode><tf_fpr>.
> > 	(signbit<mode>2): Rename to signbit<mode>2<tf_fpr>.
> > 	(isinf<mode>2): Rename to isinf<mode>2<tf_fpr>.
> > 	(*TDC_insn_<mode>): Use type instead of mode with fsimp.
> > 	(fixuns_trunc<FP:mode><GPR:mode>2): Rename to
> > 	fixuns_trunc<FP:mode><GPR:mode>2<FP:tf_fpr>.
> > 	(fix_trunctf<mode>2): Rename to fix_trunctf<mode>2_fpr.
> > 	(floatdi<mode>2): Rename to floatdi<mode>2<tf_fpr>, use type
> > 	instead of mode with itof.
> > 	(floatsi<mode>2): Rename to floatsi<mode>2<tf_fpr>, use type
> > 	instead of mode with itof.
> > 	(*floatuns<GPR:mode><FP:mode>2): Use type instead of mode for
> > 	itof.
> > 	(floatuns<GPR:mode><FP:mode>2): Rename to
> > 	floatuns<GPR:mode><FP:mode>2<tf_fpr>.
> > 	(trunctf<mode>2): Rename to trunctf<mode>2_fpr, use type instead
> > 	of mode with fsimp.
> > 	(extend<DSF:mode><BFP:mode>2): Rename to
> > 	extend<DSF:mode><BFP:mode>2<BFP:tf_fpr>.
> > 	(<FPINT:fpint_name><BFP:mode>2): Rename to
> > 	<FPINT:fpint_name><BFP:mode>2<BFP:tf_fpr>, use type instead of
> > 	mode with fsimp.
> > 	(rint<BFP:mode>2): Rename to rint<BFP:mode>2<BFP:tf_fpr>, use
> > 	type instead of mode with fsimp.
> > 	(<FPINT:fpint_name><DFP:mode>2): Use type instead of mode for
> > 	fsimp.
> > 	(rint<DFP:mode>2): Likewise.
> > 	(trunc<BFP:mode><DFP_ALL:mode>2): Rename to
> > 	trunc<BFP:mode><DFP_ALL:mode>2<BFP:tf_fpr>.
> > 	(trunc<DFP_ALL:mode><BFP:mode>2): Rename to
> > 	trunc<DFP_ALL:mode><BFP:mode>2<BFP:tf_fpr>.
> > 	(extend<BFP:mode><DFP_ALL:mode>2): Rename to
> > 	extend<BFP:mode><DFP_ALL:mode>2<BFP:tf_fpr>.
> > 	(extend<DFP_ALL:mode><BFP:mode>2): Rename to
> > 	extend<DFP_ALL:mode><BFP:mode>2<BFP:tf_fpr>.
> > 	(add<mode>3): Rename to add<mode>3<tf_fpr>, use type instead of
> > 	mode with fsimp.
> > 	(*add<mode>3_cc): Use type instead of mode with fsimp.
> > 	(*add<mode>3_cconly): Likewise.
> > 	(sub<mode>3): Rename to sub<mode>3<tf_fpr>, use type instead of
> > 	mode with fsimp.
> > 	(*sub<mode>3_cc): Use type instead of mode with fsimp.
> > 	(*sub<mode>3_cconly): Likewise.
> > 	(mul<mode>3): Rename to mul<mode>3<tf_fpr>, use type instead of
> > 	mode with fsimp.
> > 	(fma<mode>4): Restrict using s390_fma_allowed_p.
> > 	(fms<mode>4): Restrict using s390_fma_allowed_p.
> > 	(div<mode>3): Rename to div<mode>3<tf_fpr>, use type instead of
> > 	mode with fdiv.
> > 	(neg<mode>2): Rename to neg<mode>2<tf_fpr>.
> > 	(*neg<mode>2_cc): Use type instead of mode with fsimp.
> > 	(*neg<mode>2_cconly): Likewise.
> > 	(*neg<mode>2_nocc): Likewise.
> > 	(*neg<mode>2): Likewise.
> > 	(abs<mode>2): Rename to abs<mode>2<tf_fpr>, use type instead of
> > 	mode with fdiv.
> > 	(*abs<mode>2_cc): Use type instead of mode with fsimp.
> > 	(*abs<mode>2_cconly): Likewise.
> > 	(*abs<mode>2_nocc): Likewise.
> > 	(*abs<mode>2): Likewise.
> > 	(*negabs<mode>2_cc): Likewise.
> > 	(*negabs<mode>2_cconly): Likewise.
> > 	(*negabs<mode>2_nocc): Likewise.
> > 	(*negabs<mode>2): Likewise.
> > 	(sqrt<mode>2): Rename to sqrt<mode>2<tf_fpr>, use type instead
> > 	of mode with fsqrt.
> > 	(cbranch<mode>4): Use FP_ANYTF instead of FP.
> > 	(copysign<mode>3): Rename to copysign<mode>3<tf_fpr>, use type
> > 	instead of mode with fsimp.
> > 	* config/s390/s390.opt (flag_vx_long_double_fma): New
> > 	undocumented option.
> > 	* config/s390/vector.md (V_HW): Add TF for z14+.
> > 	(V_HW2): Likewise.
> > 	(VFT): Likewise.
> > 	(VF_HW): Likewise.
> > 	(V_128): Likewise.
> > 	(tf_vr): New mode_attr.
> > 	(tointvec): Add TF.
> > 	(mov<mode>): Rename to mov<mode><tf_vr>.
> > 	(movetf): New dispatcher.
> > 	(*vec_tf_to_v1tf): Rename to *vec_tf_to_v1tf_fpr, restrict to
> > 	z13-.
> > 	(*vec_tf_to_v1tf_vr): New pattern for z14+.
> > 	(*fprx2_to_tf): Likewise.
> > 	(*mov_tf_to_fprx2_0): Likewise.
> > 	(*mov_tf_to_fprx2_1): Likewise.
> > 	(add<mode>3): Rename to add<mode>3<tf_vr>.
> > 	(addtf3): New dispatcher.
> > 	(sub<mode>3): Rename to sub<mode>3<tf_vr>.
> > 	(subtf3): New dispatcher.
> > 	(mul<mode>3): Rename to mul<mode>3<tf_vr>.
> > 	(multf3): New dispatcher.
> > 	(div<mode>3): Rename to div<mode>3<tf_vr>.
> > 	(divtf3): New dispatcher.
> > 	(sqrt<mode>2): Rename to sqrt<mode>2<tf_vr>.
> > 	(sqrttf2): New dispatcher.
> > 	(fma<mode>4): Restrict using s390_fma_allowed_p.
> > 	(fms<mode>4): Likewise.
> > 	(neg_fma<mode>4): Likewise.
> > 	(neg_fms<mode>4): Likewise.
> > 	(neg<mode>2): Rename to neg<mode>2<tf_vr>.
> > 	(negtf2): New dispatcher.
> > 	(abs<mode>2): Rename to abs<mode>2<tf_vr>.
> > 	(abstf2): New dispatcher.
> > 	(float<mode>tf2_vr): New forwarder.
> > 	(float<mode>tf2): New dispatcher.
> > 	(floatuns<mode>tf2_vr): New forwarder.
> > 	(floatuns<mode>tf2): New dispatcher.
> > 	(fix_trunctf<mode>2_vr): New forwarder.
> > 	(fix_trunctf<mode>2): New dispatcher.
> > 	(fixuns_trunctf<mode>2_vr): New forwarder.
> > 	(fixuns_trunctf<mode>2): New dispatcher.
> > 	(<FPINT:fpint_name><VF_HW:mode>2<VF_HW:tf_vr>): New pattern.
> > 	(<FPINT:fpint_name>tf2): New forwarder.
> > 	(rint<mode>2<tf_vr>): New pattern.
> > 	(rinttf2): New forwarder.
> > 	(*trunctfdf2_vr): New pattern.
> > 	(trunctfdf2_vr): New forwarder.
> > 	(trunctfdf2): New dispatcher.
> > 	(trunctfsf2_vr): New forwarder.
> > 	(trunctfsf2): New dispatcher.
> > 	(extenddftf2_vr): New pattern.
> > 	(extenddftf2): New dispatcher.
> > 	(extendsftf2_vr): New forwarder.
> > 	(extendsftf2): New dispatcher.
> > 	(signbittf2_vr): New forwarder.
> > 	(signbittf2): New dispatchers.
> > 	(isinftf2_vr): New forwarder.
> > 	(isinftf2): New dispatcher.
> > 	* config/s390/vx-builtins.md (*vftci<mode>_cconly): Use VF_HW
> > 	instead of VECF_HW, add missing constraint, add vw support.
> > 	(vftci<mode>_intcconly): Use VF_HW instead of VECF_HW.
> > 	(*vftci<mode>): Rename to vftci<mode>, use VF_HW instead of
> > 	VECF_HW, add vw support.
> > 	(vftci<mode>_intcc): Use VF_HW instead of VECF_HW.
> 
> ...
> 
> > +; VX: TFmode in VR: use wfcxb
> > +(define_insn "*cmptf_ccs"
> > +  [(set (reg CC_REGNUM)
> > +	(compare (match_operand:TF 0 "register_operand" "v")
> > +		 (match_operand:TF 1 "general_operand"  "v")))]
> 
> Is it really beneficial to allow general_operands here? Everything
> except registers needs to be reloaded anyway.  In my experience it is
> helpful to emit the extra moves as early as possible to let the other
> optimizers work with them.

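The rtxes recognized by this pattern are initially generated by the
generic cbranch expander, which accepts general_operands and thus
doesn't force them into registers right away.  If this pattern doesn't
accept general_operands as well, the rtxes generated by cbranch will be
unrecognizable; the "v" constraint still makes sure that the operand is
reloaded into a vector register later.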
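
To illustrate the mismatch, here is a toy model in plain Python.  It is
not GCC code: the operand kinds, the recog() helper and the two pattern
tuples are made up for this sketch, and the real general_operand also
checks modes and address validity.  It only shows that an insn pattern
whose predicates are stricter than those of the expander feeding it
rejects rtxes that the expander may legitimately emit.

```python
# Toy model of predicate matching; names mirror GCC's but are hypothetical.
REG, MEM, CONST = "reg", "mem", "const"


def register_operand(op):
    """Accept only register operands."""
    return op == REG


def general_operand(op):
    """Accept registers, memory references and constants (simplified)."""
    return op in (REG, MEM, CONST)


def recog(operands, predicates):
    """Return True if every operand satisfies the matching predicate."""
    return all(pred(op) for op, pred in zip(operands, predicates))


# A compare that an expander with (register_operand, general_operand)
# predicates is allowed to emit: the second operand is still in memory.
compare_from_cbranch = (REG, MEM)

# Insn pattern with register_operand on both operands.
strict_pattern = (register_operand, register_operand)
# Insn pattern as posted: general_operand on operand 1.
relaxed_pattern = (register_operand, general_operand)

strict_ok = recog(compare_from_cbranch, strict_pattern)
relaxed_ok = recog(compare_from_cbranch, relaxed_pattern)
```

In this model strict_ok is False and relaxed_ok is True.  Getting the
memory operand into a register is left to the constraints, which the
model does not cover.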