From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <krebbel@linux.ibm.com>
Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com
 [148.163.158.5])
 by sourceware.org (Postfix) with ESMTPS id 515833850430
 for <gcc-patches@gcc.gnu.org>; Thu,  5 Nov 2020 07:27:02 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 515833850430
Received: from pps.filterd (m0098414.ppops.net [127.0.0.1])
 by mx0b-001b2d01.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id
 0A572p0V123213
 for <gcc-patches@gcc.gnu.org>; Thu, 5 Nov 2020 02:27:02 -0500
Received: from ppma01fra.de.ibm.com (46.49.7a9f.ip4.static.sl-reverse.com
 [159.122.73.70])
 by mx0b-001b2d01.pphosted.com with ESMTP id 34m5dbbbjq-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT)
 for <gcc-patches@gcc.gnu.org>; Thu, 05 Nov 2020 02:27:01 -0500
Received: from pps.filterd (ppma01fra.de.ibm.com [127.0.0.1])
 by ppma01fra.de.ibm.com (8.16.0.42/8.16.0.42) with SMTP id 0A57N0BE011770
 for <gcc-patches@gcc.gnu.org>; Thu, 5 Nov 2020 07:27:00 GMT
Received: from b06cxnps3074.portsmouth.uk.ibm.com
 (d06relay09.portsmouth.uk.ibm.com [9.149.109.194])
 by ppma01fra.de.ibm.com with ESMTP id 34jbytsqvv-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT)
 for <gcc-patches@gcc.gnu.org>; Thu, 05 Nov 2020 07:26:59 +0000
Received: from d06av23.portsmouth.uk.ibm.com (d06av23.portsmouth.uk.ibm.com
 [9.149.105.59])
 by b06cxnps3074.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id
 0A57QvLj5243434
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);
 Thu, 5 Nov 2020 07:26:57 GMT
Received: from d06av23.portsmouth.uk.ibm.com (unknown [127.0.0.1])
 by IMSVA (Postfix) with ESMTP id 005A4A4051;
 Thu,  5 Nov 2020 07:26:57 +0000 (GMT)
Received: from d06av23.portsmouth.uk.ibm.com (unknown [127.0.0.1])
 by IMSVA (Postfix) with ESMTP id B81A1A4040;
 Thu,  5 Nov 2020 07:26:56 +0000 (GMT)
Received: from [9.145.174.26] (unknown [9.145.174.26])
 by d06av23.portsmouth.uk.ibm.com (Postfix) with ESMTP;
 Thu,  5 Nov 2020 07:26:56 +0000 (GMT)
Subject: Re: [PATCH 3/4] IBM Z: Store long doubles in vector registers when
 possible
To: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: gcc-patches@gcc.gnu.org
References: <20201103213637.1876906-1-iii@linux.ibm.com>
 <20201103214547.1877703-1-iii@linux.ibm.com>
 <fcc058d4-40d0-1f0c-8057-671cd98724d4@linux.ibm.com>
 <18e1d3edb99d7656fcf04134a94b491abeffda6f.camel@linux.ibm.com>
From: Andreas Krebbel <krebbel@linux.ibm.com>
Message-ID: <ce8d8f59-24f9-e7b8-ccd4-76ae7be27d30@linux.ibm.com>
Date: Thu, 5 Nov 2020 08:26:56 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101
 Thunderbird/68.10.0
MIME-Version: 1.0
In-Reply-To: <18e1d3edb99d7656fcf04134a94b491abeffda6f.camel@linux.ibm.com>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
X-TM-AS-GCONF: 00
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.312, 18.0.737
 definitions=2020-11-05_02:2020-11-05,
 2020-11-05 signatures=0
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0
 adultscore=0 mlxlogscore=999
 clxscore=1015 phishscore=0 lowpriorityscore=0 suspectscore=0 spamscore=0
 priorityscore=1501 bulkscore=0 mlxscore=0 impostorscore=0 malwarescore=0
 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2009150000
 definitions=main-2011050051
X-Spam-Status: No, score=-3.4 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_EF, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,
 SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Thu, 05 Nov 2020 07:27:04 -0000

On 04.11.20 23:12, Ilya Leoshkevich wrote:
> On Wed, 2020-11-04 at 18:16 +0100, Andreas Krebbel wrote:
>> On 03.11.20 22:45, Ilya Leoshkevich wrote:
>>> On z14+, there are instructions for working with 128-bit floats
>>> (long
>>> doubles) in vector registers.  It's beneficial to use them instead
>>> of
>>> instructions that operate on floating point register pairs, because
>>> it
>>> allows to store 4 times more data in registers at a time,
>>> relieving
>>> register pressure.  The performance of new instructions is almost
>>> the
>>> same.
>>>
>>> Implement by storing TFmode values in vector registers on
>>> z14+.  Since
>>> not all operations are available with the new instructions, keep
>>> the old
>>> ones using the new FPRX2 mode, and convert between it and TFmode
>>> when
>>> necessary (this is called "forwarder" expanders below).  Change the
>>> existing TFmode expanders to call either new- or old-style ones
>>> depending on whether we are on z14+ or older machines ("dispatcher"
>>> expanders).
>>>
>>> gcc/ChangeLog:
>>>
>>> 2020-11-03  Ilya Leoshkevich  <iii@linux.ibm.com>
>>>
>>> 	* config/s390/s390-modes.def (FPRX2): New mode.
>>> 	* config/s390/s390-protos.h (s390_fma_allowed_p): New function.
>>> 	* config/s390/s390.c (s390_fma_allowed_p): Likewise.
>>> 	(s390_build_signbit_mask): Support 128-bit masks.
>>> 	(print_operand): Support printing the second word of a TFmode
>>> 	operand as vector register.
>>> 	(constant_modes): Add FPRX2mode.
>>> 	(s390_class_max_nregs): Return 1 for TFmode on z14+.
>>> 	(s390_is_fpr128): New function.
>>> 	(s390_is_vr128): Likewise.
>>> 	(s390_can_change_mode_class): Use s390_is_fpr128 and
>>> 	s390_is_vr128 in order to determine whether mode refers to a
>>> FPR
>>> 	pair or to a VR.
>>> 	* config/s390/s390.h (EXPAND_MOVTF): New macro.
>>> 	(EXPAND_TF): Likewise.
>>> 	* config/s390/s390.md (PFPO_OP_TYPE_FPRX2): PFPO_OP_TYPE_TF
>>> 	alias.
>>> 	(ALL): Add FPRX2.
>>> 	(FP_ALL): Add FPRX2 for z14+, restrict TFmode to z13-.
>>> 	(FP): Likewise.
>>> 	(FP_ANYTF): New mode iterator.
>>> 	(BFP): Add FPRX2 for z14+, restrict TFmode to z13-.
>>> 	(TD_TF): Likewise.
>>> 	(xde): Add FPRX2.
>>> 	(nBFP): Likewise.
>>> 	(nDFP): Likewise.
>>> 	(DSF): Likewise.
>>> 	(DFDI): Likewise.
>>> 	(SFSI): Likewise.
>>> 	(DF): Likewise.
>>> 	(SF): Likewise.
>>> 	(fT0): Likewise.
>>> 	(bt): Likewise.
>>> 	(_d): Likewise.
>>> 	(HALF_TMODE): Likewise.
>>> 	(tf_fpr): New mode_attr.
>>> 	(type): New mode_attr.
>>> 	(*cmp<mode>_ccz_0): Use type instead of mode with fsimp.
>>> 	(*cmp<mode>_ccs_0_fastmath): Likewise.
>>> 	(*cmptf_ccs): New pattern for wfcxb.
>>> 	(*cmptf_ccsfps): New pattern for wfkxb.
>>> 	(mov<mode>): Rename to mov<mode><tf_fpr>.
>>> 	(signbit<mode>2): Rename to signbit<mode>2<tf_fpr>.
>>> 	(isinf<mode>2): Rename to isinf<mode>2<tf_fpr>.
>>> 	(*TDC_insn_<mode>): Use type instead of mode with fsimp.
>>> 	(fixuns_trunc<FP:mode><GPR:mode>2): Rename to
>>> 	fixuns_trunc<FP:mode><GPR:mode>2<FP:tf_fpr>.
>>> 	(fix_trunctf<mode>2): Rename to fix_trunctf<mode>2_fpr.
>>> 	(floatdi<mode>2): Rename to floatdi<mode>2<tf_fpr>, use type
>>> 	instead of mode with itof.
>>> 	(floatsi<mode>2): Rename to floatsi<mode>2<tf_fpr>, use type
>>> 	instead of mode with itof.
>>> 	(*floatuns<GPR:mode><FP:mode>2): Use type instead of mode for
>>> 	itof.
>>> 	(floatuns<GPR:mode><FP:mode>2): Rename to
>>> 	floatuns<GPR:mode><FP:mode>2<tf_fpr>.
>>> 	(trunctf<mode>2): Rename to trunctf<mode>2_fpr, use type
>>> instead
>>> 	of mode with fsimp.
>>> 	(extend<DSF:mode><BFP:mode>2): Rename to
>>> 	extend<DSF:mode><BFP:mode>2<BFP:tf_fpr>.
>>> 	(<FPINT:fpint_name><BFP:mode>2): Rename to
>>> 	<FPINT:fpint_name><BFP:mode>2<BFP:tf_fpr>, use type instead of
>>> 	mode with fsimp.
>>> 	(rint<BFP:mode>2): Rename to rint<BFP:mode>2<BFP:tf_fpr>, use
>>> 	type instead of mode with fsimp.
>>> 	(<FPINT:fpint_name><DFP:mode>2): Use type instead of mode for
>>> 	fsimp.
>>> 	(rint<DFP:mode>2): Likewise.
>>> 	(trunc<BFP:mode><DFP_ALL:mode>2): Rename to
>>> 	trunc<BFP:mode><DFP_ALL:mode>2<BFP:tf_fpr>.
>>> 	(trunc<DFP_ALL:mode><BFP:mode>2): Rename to
>>> 	trunc<DFP_ALL:mode><BFP:mode>2<BFP:tf_fpr>.
>>> 	(extend<BFP:mode><DFP_ALL:mode>2): Rename to
>>> 	extend<BFP:mode><DFP_ALL:mode>2<BFP:tf_fpr>.
>>> 	(extend<DFP_ALL:mode><BFP:mode>2): Rename to
>>> 	extend<DFP_ALL:mode><BFP:mode>2<BFP:tf_fpr>.
>>> 	(add<mode>3): Rename to add<mode>3<tf_fpr>, use type instead of
>>> 	mode with fsimp.
>>> 	(*add<mode>3_cc): Use type instead of mode with fsimp.
>>> 	(*add<mode>3_cconly): Likewise.
>>> 	(sub<mode>3): Rename to sub<mode>3<tf_fpr>, use type instead of
>>> 	mode with fsimp.
>>> 	(*sub<mode>3_cc): Use type instead of mode with fsimp.
>>> 	(*sub<mode>3_cconly): Likewise.
>>> 	(mul<mode>3): Rename to mul<mode>3<tf_fpr>, use type instead of
>>> 	mode with fsimp.
>>> 	(fma<mode>4): Restrict using s390_fma_allowed_p.
>>> 	(fms<mode>4): Restrict using s390_fma_allowed_p.
>>> 	(div<mode>3): Rename to div<mode>3<tf_fpr>, use type instead of
>>> 	mode with fdiv.
>>> 	(neg<mode>2): Rename to neg<mode>2<tf_fpr>.
>>> 	(*neg<mode>2_cc): Use type instead of mode with fsimp.
>>> 	(*neg<mode>2_cconly): Likewise.
>>> 	(*neg<mode>2_nocc): Likewise.
>>> 	(*neg<mode>2): Likewise.
>>> 	(abs<mode>2): Rename to abs<mode>2<tf_fpr>, use type instead of
>>> 	mode with fdiv.
>>> 	(*abs<mode>2_cc): Use type instead of mode with fsimp.
>>> 	(*abs<mode>2_cconly): Likewise.
>>> 	(*abs<mode>2_nocc): Likewise.
>>> 	(*abs<mode>2): Likewise.
>>> 	(*negabs<mode>2_cc): Likewise.
>>> 	(*negabs<mode>2_cconly): Likewise.
>>> 	(*negabs<mode>2_nocc): Likewise.
>>> 	(*negabs<mode>2): Likewise.
>>> 	(sqrt<mode>2): Rename to sqrt<mode>2<tf_fpr>, use type instead
>>> 	of mode with fsqrt.
>>> 	(cbranch<mode>4): Use FP_ANYTF instead of FP.
>>> 	(copysign<mode>3): Rename to copysign<mode>3<tf_fpr>, use type
>>> 	instead of mode with fsimp.
>>> 	* config/s390/s390.opt (flag_vx_long_double_fma): New
>>> 	undocumented option.
>>> 	* config/s390/vector.md (V_HW): Add TF for z14+.
>>> 	(V_HW2): Likewise.
>>> 	(VFT): Likewise.
>>> 	(VF_HW): Likewise.
>>> 	(V_128): Likewise.
>>> 	(tf_vr): New mode_attr.
>>> 	(tointvec): Add TF.
>>> 	(mov<mode>): Rename to mov<mode><tf_vr>.
>>> 	(movetf): New dispatcher.
>>> 	(*vec_tf_to_v1tf): Rename to *vec_tf_to_v1tf_fpr, restrict to
>>> 	z13-.
>>> 	(*vec_tf_to_v1tf_vr): New pattern for z14+.
>>> 	(*fprx2_to_tf): Likewise.
>>> 	(*mov_tf_to_fprx2_0): Likewise.
>>> 	(*mov_tf_to_fprx2_1): Likewise.
>>> 	(add<mode>3): Rename to add<mode>3<tf_vr>.
>>> 	(addtf3): New dispatcher.
>>> 	(sub<mode>3): Rename to sub<mode>3<tf_vr>.
>>> 	(subtf3): New dispatcher.
>>> 	(mul<mode>3): Rename to mul<mode>3<tf_vr>.
>>> 	(multf3): New dispatcher.
>>> 	(div<mode>3): Rename to div<mode>3<tf_vr>.
>>> 	(divtf3): New dispatcher.
>>> 	(sqrt<mode>2): Rename to sqrt<mode>2<tf_vr>.
>>> 	(sqrttf2): New dispatcher.
>>> 	(fma<mode>4): Restrict using s390_fma_allowed_p.
>>> 	(fms<mode>4): Likewise.
>>> 	(neg_fma<mode>4): Likewise.
>>> 	(neg_fms<mode>4): Likewise.
>>> 	(neg<mode>2): Rename to neg<mode>2<tf_vr>.
>>> 	(negtf2): New dispatcher.
>>> 	(abs<mode>2): Rename to abs<mode>2<tf_vr>.
>>> 	(abstf2): New dispatcher.
>>> 	(float<mode>tf2_vr): New forwarder.
>>> 	(float<mode>tf2): New dispatcher.
>>> 	(floatuns<mode>tf2_vr): New forwarder.
>>> 	(floatuns<mode>tf2): New dispatcher.
>>> 	(fix_trunctf<mode>2_vr): New forwarder.
>>> 	(fix_trunctf<mode>2): New dispatcher.
>>> 	(fixuns_trunctf<mode>2_vr): New forwarder.
>>> 	(fixuns_trunctf<mode>2): New dispatcher.
>>> 	(<FPINT:fpint_name><VF_HW:mode>2<VF_HW:tf_vr>): New pattern.
>>> 	(<FPINT:fpint_name>tf2): New forwarder.
>>> 	(rint<mode>2<tf_vr>): New pattern.
>>> 	(rinttf2): New forwarder.
>>> 	(*trunctfdf2_vr): New pattern.
>>> 	(trunctfdf2_vr): New forwarder.
>>> 	(trunctfdf2): New dispatcher.
>>> 	(trunctfsf2_vr): New forwarder.
>>> 	(trunctfsf2): New dispatcher.
>>> 	(extenddftf2_vr): New pattern.
>>> 	(extenddftf2): New dispatcher.
>>> 	(extendsftf2_vr): New forwarder.
>>> 	(extendsftf2): New dispatcher.
>>> 	(signbittf2_vr): New forwarder.
>>> 	(signbittf2): New dispatchers.
>>> 	(isinftf2_vr): New forwarder.
>>> 	(isinftf2): New dispatcher.
>>> 	* config/s390/vx-builtins.md (*vftci<mode>_cconly): Use VF_HW
>>> 	instead of VECF_HW, add missing constraint, add vw support.
>>> 	(vftci<mode>_intcconly): Use VF_HW instead of VECF_HW.
>>> 	(*vftci<mode>): Rename to vftci<mode>, use VF_HW instead of
>>> 	VECF_HW, and vw support.
>>> 	(vftci<mode>_intcc): Use VF_HW instead of VECF_HW.
>>
>> ...
>>
>>> +; VX: TFmode in VR: use wfcxb
>>> +(define_insn "*cmptf_ccs"
>>> +  [(set (reg CC_REGNUM)
>>> +	(compare (match_operand:TF 0 "register_operand" "v")
>>> +		 (match_operand:TF 1 "general_operand"  "v")))]
>>
>> Is it really beneficial to allow general_operands here? Everything
>> except registers needs to be reloaded anyway.  In my experience it is
>> helpful to emit the extra moves as early as possible to let the other
>> optimizers work with them.
> 
> The rtxes recognized by this pattern are initially generated by the
> generic cbranch expander, which allows general_operands and thus
> doesn't immediately reload.  If we don't allow general_operands here,
> rtxes generated by cbranch will be unrecognizable.

Yes, the expander would have to change as well. It would need to force the operand into a register
whenever TFmode values are supposed to live in VRs. It would be nice to see whether that does any
good, but I'm OK with the pattern as is.
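
A rough, untested sketch of that idea, written as a fragment for the preparation statements of the
cbranch<mode>4 expander. Using TARGET_VXE as the test for "TFmode lives in a VR" is an assumption
here, as is the exact placement; the patch may well express that condition differently:

```
  /* Hypothetical addition to the cbranch<mode>4 preparation statements.
     Assumption: TARGET_VXE identifies the z14+ case in which TFmode
     values are kept in vector registers.  */
  if (<MODE>mode == TFmode
      && TARGET_VXE
      && !register_operand (operands[2], TFmode))
    /* Emit the move now, so that the early RTL optimizers see it,
       instead of leaving it to reload.  */
    operands[2] = force_reg (TFmode, operands[2]);
```

With something like this in place, operand 1 of *cmptf_ccs could use register_operand instead of
general_operand.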

One other thing I noticed is that the new "float<mode>tf2" expander in vector.md is missing
TARGET_ZARCH in its insn condition. Did you have multilibs enabled when running the regression tests?

Andreas