From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <18e1d3edb99d7656fcf04134a94b491abeffda6f.camel@linux.ibm.com>
Subject: Re: [PATCH 3/4] IBM Z: Store long doubles in vector registers when possible
From: Ilya Leoshkevich
To: Andreas Krebbel
Cc: gcc-patches@gcc.gnu.org
Date: Wed, 04 Nov 2020 23:12:57 +0100
References: <20201103213637.1876906-1-iii@linux.ibm.com> <20201103214547.1877703-1-iii@linux.ibm.com>
Content-Type: text/plain; charset="UTF-8"
User-Agent: Evolution 3.34.4 (3.34.4-1.fc31)
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
List-Id: Gcc-patches mailing list

On Wed, 2020-11-04 at 18:16 +0100, Andreas Krebbel wrote:
> On 03.11.20 22:45, Ilya Leoshkevich wrote:
> > On z14+, there are instructions for working with 128-bit floats (long
> > doubles) in vector registers. It's beneficial to use them instead of
> > instructions that operate on floating point register pairs, because it
> > allows storing 4 times more data in registers at a time, relieving
> > register pressure. The performance of the new instructions is almost
> > the same.
> >
> > Implement by storing TFmode values in vector registers on z14+. Since
> > not all operations are available with the new instructions, keep the
> > old ones using the new FPRX2 mode, and convert between it and TFmode
> > when necessary (this is called "forwarder" expanders below). Change the
> > existing TFmode expanders to call either new- or old-style ones
> > depending on whether we are on z14+ or older machines ("dispatcher"
> > expanders).
> >
> > gcc/ChangeLog:
> >
> > 2020-11-03  Ilya Leoshkevich
> >
> > 	* config/s390/s390-modes.def (FPRX2): New mode.
> > 	* config/s390/s390-protos.h (s390_fma_allowed_p): New function.
> > 	* config/s390/s390.c (s390_fma_allowed_p): Likewise.
> > 	(s390_build_signbit_mask): Support 128-bit masks.
> > 	(print_operand): Support printing the second word of a TFmode
> > 	operand as vector register.
> > 	(constant_modes): Add FPRX2mode.
> > 	(s390_class_max_nregs): Return 1 for TFmode on z14+.
> > 	(s390_is_fpr128): New function.
> > 	(s390_is_vr128): Likewise.
> > 	(s390_can_change_mode_class): Use s390_is_fpr128 and
> > 	s390_is_vr128 in order to determine whether mode refers to a FPR
> > 	pair or to a VR.
> > 	* config/s390/s390.h (EXPAND_MOVTF): New macro.
> > 	(EXPAND_TF): Likewise.
> > 	* config/s390/s390.md (PFPO_OP_TYPE_FPRX2): PFPO_OP_TYPE_TF
> > 	alias.
> > 	(ALL): Add FPRX2.
> > 	(FP_ALL): Add FPRX2 for z14+, restrict TFmode to z13-.
> > 	(FP): Likewise.
> > 	(FP_ANYTF): New mode iterator.
> > 	(BFP): Add FPRX2 for z14+, restrict TFmode to z13-.
> > 	(TD_TF): Likewise.
> > 	(xde): Add FPRX2.
> > 	(nBFP): Likewise.
> > 	(nDFP): Likewise.
> > 	(DSF): Likewise.
> > 	(DFDI): Likewise.
> > 	(SFSI): Likewise.
> > 	(DF): Likewise.
> > 	(SF): Likewise.
> > 	(fT0): Likewise.
> > 	(bt): Likewise.
> > 	(_d): Likewise.
> > 	(HALF_TMODE): Likewise.
> > 	(tf_fpr): New mode_attr.
> > 	(type): New mode_attr.
> > 	(*cmp_ccz_0): Use type instead of mode with fsimp.
> > 	(*cmp_ccs_0_fastmath): Likewise.
> > 	(*cmptf_ccs): New pattern for wfcxb.
> > 	(*cmptf_ccsfps): New pattern for wfkxb.
> > 	(mov): Rename to mov.
> > 	(signbit2): Rename to signbit2.
> > 	(isinf2): Rename to isinf2.
> > 	(*TDC_insn_): Use type instead of mode with fsimp.
> > 	(fixuns_trunc2): Rename to
> > 	fixuns_trunc2.
> > 	(fix_trunctf2): Rename to fix_trunctf2_fpr.
> > 	(floatdi2): Rename to floatdi2, use type
> > 	instead of mode with itof.
> > 	(floatsi2): Rename to floatsi2, use type
> > 	instead of mode with itof.
> > 	(*floatuns2): Use type instead of mode for
> > 	itof.
> > 	(floatuns2): Rename to
> > 	floatuns2.
> > 	(trunctf2): Rename to trunctf2_fpr, use type
> > 	instead of mode with fsimp.
> > 	(extend2): Rename to
> > 	extend2.
> > 	(2): Rename to
> > 	2, use type instead of
> > 	mode with fsimp.
> > 	(rint2): Rename to rint2, use
> > 	type instead of mode with fsimp.
> > 	(2): Use type instead of mode for
> > 	fsimp.
> > 	(rint2): Likewise.
> > 	(trunc2): Rename to
> > 	trunc2.
> > 	(trunc2): Rename to
> > 	trunc2.
> > 	(extend2): Rename to
> > 	extend2.
> > 	(extend2): Rename to
> > 	extend2.
> > 	(add3): Rename to add3, use type instead of
> > 	mode with fsimp.
> > 	(*add3_cc): Use type instead of mode with fsimp.
> > 	(*add3_cconly): Likewise.
> > 	(sub3): Rename to sub3, use type instead of
> > 	mode with fsimp.
> > 	(*sub3_cc): Use type instead of mode with fsimp.
> > 	(*sub3_cconly): Likewise.
> > 	(mul3): Rename to mul3, use type instead of
> > 	mode with fsimp.
> > 	(fma4): Restrict using s390_fma_allowed_p.
> > 	(fms4): Restrict using s390_fma_allowed_p.
> > 	(div3): Rename to div3, use type instead of
> > 	mode with fdiv.
> > 	(neg2): Rename to neg2.
> > 	(*neg2_cc): Use type instead of mode with fsimp.
> > 	(*neg2_cconly): Likewise.
> > 	(*neg2_nocc): Likewise.
> > 	(*neg2): Likewise.
> > 	(abs2): Rename to abs2, use type instead of
> > 	mode with fdiv.
> > 	(*abs2_cc): Use type instead of mode with fsimp.
> > 	(*abs2_cconly): Likewise.
> > 	(*abs2_nocc): Likewise.
> > 	(*abs2): Likewise.
> > 	(*negabs2_cc): Likewise.
> > 	(*negabs2_cconly): Likewise.
> > 	(*negabs2_nocc): Likewise.
> > 	(*negabs2): Likewise.
> > 	(sqrt2): Rename to sqrt2, use type instead
> > 	of mode with fsqrt.
> > 	(cbranch4): Use FP_ANYTF instead of FP.
> > 	(copysign3): Rename to copysign3, use type
> > 	instead of mode with fsimp.
> > 	* config/s390/s390.opt (flag_vx_long_double_fma): New
> > 	undocumented option.
> > 	* config/s390/vector.md (V_HW): Add TF for z14+.
> > 	(V_HW2): Likewise.
> > 	(VFT): Likewise.
> > 	(VF_HW): Likewise.
> > 	(V_128): Likewise.
> > 	(tf_vr): New mode_attr.
> > 	(tointvec): Add TF.
> > 	(mov): Rename to mov.
> > 	(movetf): New dispatcher.
> > 	(*vec_tf_to_v1tf): Rename to *vec_tf_to_v1tf_fpr, restrict to
> > 	z13-.
> > 	(*vec_tf_to_v1tf_vr): New pattern for z14+.
> > 	(*fprx2_to_tf): Likewise.
> > 	(*mov_tf_to_fprx2_0): Likewise.
> > 	(*mov_tf_to_fprx2_1): Likewise.
> > 	(add3): Rename to add3.
> > 	(addtf3): New dispatcher.
> > 	(sub3): Rename to sub3.
> > 	(subtf3): New dispatcher.
> > 	(mul3): Rename to mul3.
> > 	(multf3): New dispatcher.
> > 	(div3): Rename to div3.
> > 	(divtf3): New dispatcher.
> > 	(sqrt2): Rename to sqrt2.
> > 	(sqrttf2): New dispatcher.
> > 	(fma4): Restrict using s390_fma_allowed_p.
> > 	(fms4): Likewise.
> > 	(neg_fma4): Likewise.
> > 	(neg_fms4): Likewise.
> > 	(neg2): Rename to neg2.
> > 	(negtf2): New dispatcher.
> > 	(abs2): Rename to abs2.
> > 	(abstf2): New dispatcher.
> > 	(floattf2_vr): New forwarder.
> > 	(floattf2): New dispatcher.
> > 	(floatunstf2_vr): New forwarder.
> > 	(floatunstf2): New dispatcher.
> > 	(fix_trunctf2_vr): New forwarder.
> > 	(fix_trunctf2): New dispatcher.
> > 	(fixuns_trunctf2_vr): New forwarder.
> > 	(fixuns_trunctf2): New dispatcher.
> > 	(2): New pattern.
> > 	(tf2): New forwarder.
> > 	(rint2): New pattern.
> > 	(rinttf2): New forwarder.
> > 	(*trunctfdf2_vr): New pattern.
> > 	(trunctfdf2_vr): New forwarder.
> > 	(trunctfdf2): New dispatcher.
> > 	(trunctfsf2_vr): New forwarder.
> > 	(trunctfsf2): New dispatcher.
> > 	(extenddftf2_vr): New pattern.
> > 	(extenddftf2): New dispatcher.
> > 	(extendsftf2_vr): New forwarder.
> > 	(extendsftf2): New dispatcher.
> > 	(signbittf2_vr): New forwarder.
> > 	(signbittf2): New dispatchers.
> > 	(isinftf2_vr): New forwarder.
> > 	(isinftf2): New dispatcher.
> > 	* config/s390/vx-builtins.md (*vftci_cconly): Use VF_HW
> > 	instead of VECF_HW, add missing constraint, add vw support.
> > 	(vftci_intcconly): Use VF_HW instead of VECF_HW.
> > 	(*vftci): Rename to vftci, use VF_HW instead of
> > 	VECF_HW, and vw support.
> > 	(vftci_intcc): Use VF_HW instead of VECF_HW.
>
> ...
>
> > +; VX: TFmode in VR: use wfcxb
> > +(define_insn "*cmptf_ccs"
> > +  [(set (reg CC_REGNUM)
> > +        (compare (match_operand:TF 0 "register_operand" "v")
> > +                 (match_operand:TF 1 "general_operand" "v")))]
>
> Is it really beneficial to allow general_operands here? Everything
> except registers needs to be reloaded anyway. In my experience it is
> helpful to emit the extra moves as early as possible to let the other
> optimizers work with them.

The rtxes recognized by this pattern are initially generated by the
generic cbranch expander, which allows general_operands and thus
doesn't immediately reload. If we don't allow general_operands here,
rtxes generated by cbranch will be unrecognizable.
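
To illustrate the mismatch, here is a toy model in Python rather than
machine description syntax. It is only a sketch: Operand, recog and the
two predicate functions are made-up stand-ins that mimic how operand
predicates gate recognition, not GCC code, and the operand kinds are
simplified to "reg", "mem" and "const".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operand:
    """Hypothetical stand-in for an rtx operand."""
    kind: str  # "reg", "mem" or "const"


def register_operand(op: Operand) -> bool:
    # Toy predicate: accepts registers only.
    return op.kind == "reg"


def general_operand(op: Operand) -> bool:
    # Toy predicate: accepts registers, memory and constants.
    return op.kind in ("reg", "mem", "const")


def recog(predicates, operands) -> bool:
    """Toy recognizer: an insn matches only if every operand passes
    the predicate at the same position in the pattern."""
    return len(predicates) == len(operands) and all(
        pred(op) for pred, op in zip(predicates, operands)
    )


# An expander whose operand 1 is a general_operand may emit a compare
# of a register with a memory operand, without reloading it first.
emitted = (Operand("reg"), Operand("mem"))

# Insn pattern with register_operand for both operands.
strict = (register_operand, register_operand)
# Insn pattern as in the patch: general_operand for operand 1.
relaxed = (register_operand, general_operand)

print(recog(strict, emitted))   # False: the insn would be unrecognizable
print(recog(relaxed, emitted))  # True: recognized, operand reloaded later
```

In this model the relaxed pattern only gets the insn recognized; the
"v" constraint would still have the operand moved into a vector
register afterwards.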