Subject: Re: [PATCH v4] LoongArch:Implement 128-bit floating point functions in gcc.
From: Xi Ruoyao
To: chenxiaolong, gcc-patches@gcc.gnu.org
Cc: i@xen0n.name, xuchenghua@loongson.cn, chenglulu@loongson.cn
Date: Thu, 31 Aug 2023 12:32:26 +0800

On Thu, 2023-08-31 at 10:46 +0800, chenxiaolong wrote:
> +;; Implement __builtin_fabs128 function.
> +
> +(define_expand "abstf2"
> +  [(match_operand:TF 0 "register_operand")
> +   (match_operand:TF 1 "register_operand")]
> +  "TARGET_64BIT"
> +{
> +  loongarch_emit_move (operands[0], operands[1]);
> +  emit_insn (gen_abstf_local (operands[0]));
> +  DONE;
> +})
> +
> +(define_insn "abstf_local"
> +  [(set (match_operand:TF 0 "register_operand" "+r")
> +       (abs:TF (match_dup 0)))]
> +  "TARGET_64BIT"
> +{
> +  operands[0] = gen_rtx_REG (DImode, REGNO (operands[0]) + 1);
> +  return "bstrins.d\t%0,$r0,0x3f,0x3f";
> +})

This should be removed because the "generic" expand works fine:

$ cat t.c
_Float128 fabsf128 (_Float128 in)
{
  return __builtin_fabsf128 (in);
}
$ cc t.c -S -O2 -o-
fabsf128:
.LFB0 = .
	.cfi_startproc
	bstrpick.d	$r5,$r5,62,0
	jr	$r1
	.cfi_endproc

It does not work with -O0, but -O0 means "not optimized" anyway.

> +;; Implement __builtin_copysignf128 function.
> +
> +(define_insn_and_split "copysigntf3"
> +  [(set (match_operand:TF 0 "register_operand" "=&r")
> +       (unspec:TF [(match_operand:TF 1 "register_operand" "r")
> +                   (match_operand:TF 2 "register_operand" "r")]
> +                   UNSPEC_COPYSIGNF128))]
> +  "TARGET_64BIT"
> +  "#"
> +  "reload_completed"
> + [(const_int 0)]
> +{
> +  rtx op0_lo = gen_rtx_REG (DImode,REGNO (operands[0]) + 0);
> +  rtx op0_hi = gen_rtx_REG (DImode,REGNO (operands[0]) + 1);
> +  rtx op1_lo = gen_rtx_REG (DImode,REGNO (operands[1]) + 0);
> +  rtx op1_hi = gen_rtx_REG (DImode,REGNO (operands[1]) + 1);
> +  rtx op2_hi = gen_rtx_REG (DImode,REGNO (operands[2]) + 1);
> +
> +  if (REGNO (operands[1]) == REGNO (operands[2]))
> +    {
> +      loongarch_emit_move (operands[0], operands[1]);
> +      DONE;
> +    }
> +  else
> +    {
> +      loongarch_emit_move (op0_hi, op2_hi);
> +      loongarch_emit_move (op0_lo, op1_lo);
> +      emit_insn (gen_insvdi (op0_hi, GEN_INT (63), GEN_INT (0), op1_hi));
> +      DONE;
> +    }
> +})

Hmm... The generic implementation does work, but the result is sub-optimal:

copysignf128:
.LFB0 = .
	.cfi_startproc
	or	$r12,$r0,$r0
	lu52i.d	$r12,$r12,0x8000000000000000>>52
	and	$r7,$r7,$r12
	bstrpick.d	$r5,$r5,62,0
	or	$r5,$r5,$r7
	jr	$r1
	.cfi_endproc

But there seems to be a general issue with cases like

int test(int a, int b)
{
  return (a & ~0x10) | (b & 0x10);
}

It's compiled to:

test:
.LFB0 = .
	.cfi_startproc
	addi.w	$r12,$r0,-17			# 0xffffffffffffffef
	and	$r12,$r12,$r4
	andi	$r5,$r5,16
	or	$r12,$r12,$r5
	slli.w	$r4,$r12,0
	jr	$r1
	.cfi_endproc

But the optimal implementation should be (extract bit 4 of b, then insert it into bit 4 of a, which is already in the return register):

	bstrpick.w	$r5,$r5,4,4
	bstrins.w	$r4,$r5,4,4

So I think we should fix the general case instead.  Please hold off on this part (you can commit the rest of the patch without the loongarch.md change for now), and I'll try to fix the general case.

I've created https://gcc.gnu.org/PR111252 to track the issue.

-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University