From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id 2852F3858C5F for ; Thu, 8 Feb 2024 00:35:56 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 2852F3858C5F Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linux.ibm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 2852F3858C5F Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=148.163.156.1 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1707352558; cv=none; b=dPFveK76td3GdVr+Jy2PcpQYGINRiQkDTJhecuhcAyJhkFxQ0oLgnueGdiuupvOhz/JA/0ZsPnWtSwcLLYv2EijcqARh9HouQNDyXbeI27//CY9nhS8IbbyuVTlPK+Q0piIjvZtPzpK2QoNxlWedlI/Vmg5iYbUGRG3oX93w9gs= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1707352558; c=relaxed/simple; bh=g1RJcmt+3THW38iJfCn5E5oDWV2HtXflW/W6ywX7qIY=; h=DKIM-Signature:Date:From:To:Subject:Message-ID:MIME-Version; b=DJ2rF+ETX37tGxdryn1Sh0Ol1/Mdyf9+57ZXtbxjeO+EAQqqHcCpDYFgruqghxpTW1BC1grnnKTLcaUxzaRiooe2NK3Dxy/d5UaeubKdR0mTgFHv0zZ4jmtv8yd6sy8i/zgeY8BKc6z7AthV+8cN7jNlfeESTRcvBnEJakckF5g= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from pps.filterd (m0353726.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 4180RGtW029332; Thu, 8 Feb 2024 00:35:55 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=date : from : to : cc : subject : message-id : references : mime-version : content-type : in-reply-to; s=pp1; bh=i9zXEyJY5X4zfFj0Z3I998Grr6rq2vpsnojPvpMaM0A=; b=M08zYj1++cdbSlUi/2IdGafXMZzZT+r/Wke+7vrzns63KPH5l9agFXJj1yf+7pO8aAGI nv5OiyFmh7rPvMfa2xa3MOfPZMXGSS0iu5twOYHqDfCIPXoz/vKh7KPHeBJyi8pAa9Uy Occ1zZPnSXQq8SWHM8v1zNPgzI9qDhWPRLB0fWzgYIk0AopN1Y3ePxrDC0ZPTthuX621 l/xZasYKmTmDT3CasY8VvJZ5dvICpWUa5cH+81DElhTR4A0NpjjwgLZctXGuCoystxcq Uto/QsRLmmQ/2onrEUfoGU1ExcRjjhgKNew0g18ZMlgfAuWf3g/z4/6Uh2MXyIUQSk3l BQ== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3w4me185hj-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 08 Feb 2024 00:35:54 +0000 Received: from m0353726.ppops.net (m0353726.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 4180U3nP004009; Thu, 8 Feb 2024 00:35:54 GMT Received: from ppma13.dal12v.mail.ibm.com (dd.9e.1632.ip4.static.sl-reverse.com [50.22.158.221]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3w4me185h6-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 08 Feb 2024 00:35:54 +0000 Received: from pps.filterd (ppma13.dal12v.mail.ibm.com [127.0.0.1]) by ppma13.dal12v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 417NAQNe008519; Thu, 8 Feb 2024 00:35:53 GMT Received: from smtprelay07.dal12v.mail.ibm.com ([172.16.1.9]) by ppma13.dal12v.mail.ibm.com (PPS) with ESMTPS id 3w221k8whu-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 08 Feb 2024 00:35:53 +0000 Received: from smtpav02.dal12v.mail.ibm.com (smtpav02.dal12v.mail.ibm.com [10.241.53.101]) by smtprelay07.dal12v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 4180Zqja35586542 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 8 Feb 2024 00:35:52 GMT Received: from smtpav02.dal12v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id AE37958051; Thu, 8 Feb 2024 00:35:52 +0000 (GMT) Received: from smtpav02.dal12v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 39DCB5805C; Thu, 8 Feb 2024 00:35:52 +0000 (GMT) Received: from cowardly-lion.the-meissners.org (unknown [9.61.187.161]) by smtpav02.dal12v.mail.ibm.com (Postfix) with ESMTPS; Thu, 8 Feb 2024 00:35:52 +0000 (GMT) Date: Wed, 7 Feb 2024 19:35:50 -0500 From: Michael Meissner To: "Kewen.Lin" Cc: Michael Meissner , gcc-patches@gcc.gnu.org, Segher Boessenkool , David Edelsohn , Peter Bergner Subject: Re: Repost [PATCH 6/6] PowerPC: Add support for 1,024 bit DMR registers. Message-ID: Mail-Followup-To: Michael Meissner , "Kewen.Lin" , gcc-patches@gcc.gnu.org, Segher Boessenkool , David Edelsohn , Peter Bergner References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-TM-AS-GCONF: 00 X-Proofpoint-GUID: OijaefZTRdrCOhXFxJoeiU69lEJ-JlH9 X-Proofpoint-ORIG-GUID: FtE36O5GB_GA1gPNaWH9nKYbAf1l62bM X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.272,Aquarius:18.0.1011,Hydra:6.0.619,FMLib:17.11.176.26 definitions=2024-02-07_09,2024-02-07_01,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 adultscore=0 clxscore=1015 malwarescore=0 spamscore=0 mlxscore=0 suspectscore=0 mlxlogscore=999 bulkscore=0 lowpriorityscore=0 impostorscore=0 priorityscore=1501 phishscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2311290000 definitions=main-2402080001 X-Spam-Status: No, score=-10.5 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_MSPIKE_H4,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Mon, Feb 05, 2024 at 11:58:31AM +0800, Kewen.Lin wrote: > Hi Mike, I will comment on about 1/2 of the things, and come back with the other comments. > on 2024/1/6 07:42, Michael Meissner wrote: > > This patch is a prelimianry patch to add the full 1,024 bit dense math register> (DMRs) for -mcpu=future. The MMA 512-bit accumulators map onto the top of the > > DMR register. > > > > This patch only adds the new 1,024 bit register support. It does not add > > support for any instructions that need 1,024 bit registers instead of 512 bit > > registers. > > > > I used the new mode 'TDOmode' to be the opaque mode used for 1,204 bit > > typo: 1,204 Thanks. > > +(define_insn_and_split "*movtdo" > > + [(set (match_operand:TDO 0 "nonimmediate_operand" "=wa,m,wa,wD,wD,wa") > > + (match_operand:TDO 1 "input_operand" "m,wa,wa,wa,wD,wD"))] > > + "TARGET_DENSE_MATH > > + && (gpc_reg_operand (operands[0], TDOmode) > > + || gpc_reg_operand (operands[1], TDOmode))" > > + "@ > > + # > > + # > > + # > > + # > > + dmmr %0,%1 > > + #" > > + "&& reload_completed > > + && (!dmr_operand (operands[0], TDOmode) || !dmr_operand (operands[1], TDOmode))" > > + [(const_int 0)] > > +{ > > + rtx op0 = operands[0]; > > + rtx op1 = operands[1]; > > + > > + if (REG_P (op0) && REG_P (op1)) > > + { > > + int regno0 = REGNO (op0); > > + int regno1 = REGNO (op1); > > + > > + if (DMR_REGNO_P (regno0) && VSX_REGNO_P (regno1)) > > + { > > + rtx op1_upper = gen_rtx_REG (XOmode, regno1); > > + rtx op1_lower = gen_rtx_REG (XOmode, regno1 + 4); > > + emit_insn (gen_movtdo_insert512_upper (op0, op1_upper)); > > + emit_insn (gen_movtdo_insert512_lower (op0, op0, op1_lower)); > > + DONE; > > + } > > + > > + else if (VSX_REGNO_P (regno0) && DMR_REGNO_P (regno1)) > > + { > > + rtx op0_upper = gen_rtx_REG (XOmode, regno0); > > + rtx op0_lower = gen_rtx_REG (XOmode, regno0 + 4); > > + emit_insn (gen_movtdo_extract512 (op0_upper, op1, const0_rtx)); > > + emit_insn (gen_movtdo_extract512 (op0_lower, op1, const1_rtx)); > > + DONE; > > + } > > Add an assertion like gcc_assert (VSX_REGNO_P (regno1) && VSX_REGNO_P (regno2))? Ok. > > + > > +;; Reload DMR registers from memory > > +(define_insn_and_split "reload_dmr_from_memory" > > + [(set (match_operand:TDO 0 "dmr_operand" "=wD") > > + (unspec:TDO [(match_operand:TDO 1 "memory_operand" "m")] > > + UNSPEC_DMR_RELOAD_FROM_MEMORY)) > > + (clobber (match_operand:XO 2 "vsx_register_operand" "=wa"))] > > + "TARGET_DENSE_MATH" > > + "#" > > + "&& reload_completed" > > + [(const_int 0)] > > +{ > > + rtx dest = operands[0]; > > + rtx src = operands[1]; > > + rtx tmp = operands[2]; > > + rtx mem_upper = adjust_address (src, XOmode, BYTES_BIG_ENDIAN ? 0 : 32); > > + rtx mem_lower = adjust_address (src, XOmode, BYTES_BIG_ENDIAN ? 32 : 0); > > I think the offset should be 64 rather than 32. Good catch, thanks. > > + > > + emit_move_insn (tmp, mem_upper); > > + emit_insn (gen_movtdo_insert512_upper (dest, tmp)); > > + > > + emit_move_insn (tmp, mem_lower); > > + emit_insn (gen_movtdo_insert512_lower (dest, dest, tmp)); > > + DONE; > > +} > > + [(set_attr "length" "16") > > + (set_attr "max_prefixed_insns" "2") > > + (set_attr "type" "vecload")]) > > + > > +;; Reload dense math registers to memory > > +(define_insn_and_split "reload_dmr_to_memory" > > + [(set (match_operand:TDO 0 "memory_operand" "=m") > > + (unspec:TDO [(match_operand:TDO 1 "dmr_operand" "wD")] > > + UNSPEC_DMR_RELOAD_TO_MEMORY)) > > + (clobber (match_operand:XO 2 "vsx_register_operand" "=wa"))] > > + "TARGET_DENSE_MATH" > > + "#" > > + "&& reload_completed" > > + [(const_int 0)] > > +{ > > + rtx dest = operands[0]; > > + rtx src = operands[1]; > > + rtx tmp = operands[2]; > > + rtx mem_upper = adjust_address (dest, XOmode, BYTES_BIG_ENDIAN ? 0 : 32); > > + rtx mem_lower = adjust_address (dest, XOmode, BYTES_BIG_ENDIAN ? 32 : 0); > > Ditto. Yep. > > diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc > > index 6698274031b..54868d2009c 100644 > > --- a/gcc/config/rs6000/rs6000-builtin.cc > > +++ b/gcc/config/rs6000/rs6000-builtin.cc > > @@ -495,6 +495,8 @@ const char *rs6000_type_string (tree type_node) > > return "__vector_pair"; > > else if (type_node == vector_quad_type_node) > > return "__vector_quad"; > > + else if (type_node == dmr_type_node) > > + return "__dmr"; > > > > return "unknown"; > > } > > @@ -781,6 +783,17 @@ rs6000_init_builtins (void) > > t = build_qualified_type (vector_quad_type_node, TYPE_QUAL_CONST); > > ptr_vector_quad_type_node = build_pointer_type (t); > > > > + dmr_type_node = make_node (OPAQUE_TYPE); > > + SET_TYPE_MODE (dmr_type_node, TDOmode); > > + TYPE_SIZE (dmr_type_node) = bitsize_int (GET_MODE_BITSIZE (TDOmode)); > > + TYPE_PRECISION (dmr_type_node) = GET_MODE_BITSIZE (TDOmode); > > + TYPE_SIZE_UNIT (dmr_type_node) = size_int (GET_MODE_SIZE (TDOmode)); > > + SET_TYPE_ALIGN (dmr_type_node, 512); > > why not 1024? Since we don't have a 1,024 bit load/store and have to use multiple vector pair or vector load/stores, there is no reason to ask for a 1,024 alignment. In addition, I would worry that having a larger alignment might be an issue with the stack, since I don't believe we have support for aligning the stack to 1,024 bit boundaries. > > --- a/gcc/config/rs6000/rs6000-call.cc > > +++ b/gcc/config/rs6000/rs6000-call.cc > > @@ -437,7 +437,8 @@ rs6000_return_in_memory (const_tree type, const_tree fntype ATTRIBUTE_UNUSED) > > if (cfun > > && !cfun->machine->mma_return_type_error > > && TREE_TYPE (cfun->decl) == fntype > > - && (TYPE_MODE (type) == OOmode || TYPE_MODE (type) == XOmode)) > > + && (TYPE_MODE (type) == OOmode || TYPE_MODE (type) == XOmode > > + || TYPE_MODE (type) == TDOmode)) > > May be just with OPAQUE_MODE_P (TYPE_MODE (type)) for all the cases on type mode. Basically I forgot about using OPAQUE_MODE in this case. Using OPAQUE_MODE is better. > So far only rs6000 defines OPAQUE_MODE, if we are worried that there are some generic opaque modes > some day, we can probably add one assertion somewhere to guaratee it. Or add one macro like > OPAQUE_MMA_MODE_P to ensure it only matches {OO,XO,TDO}mode. > > > { > > /* Record we have now handled function CFUN, so the next time we > > are called, we do not re-report the same error. */ > > @@ -1641,6 +1642,16 @@ rs6000_function_arg (cumulative_args_t cum_v, const function_arg_info &arg) > > return NULL_RTX; > > } > > > > + if (mode == TDOmode) > > + { > > + if (TYPE_CANONICAL (type) != NULL_TREE) > > + type = TYPE_CANONICAL (type); > > + error ("invalid use of dense math operand of type %qs as a function " > > + "parameter", > > + IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type)))); > > + return NULL_RTX; > > + } > > Can we merge this hunk into the above hunk for OOmode and XOmode? Then the code with TYPE_CANONICAL > can be shared and better to maintain. IMHO, this dense math operand is also MMA operand so the above > error message still works, if it's desired to note this dense math operand then we can use > (mode == TDOmode)? "dense math": "MMA" for the different string part. I will need to look into this later. > > + > > /* Return a marker to indicate whether CR1 needs to set or clear the > > bit that V.4 uses to say fp args were passed in registers. > > Assume that we don't need the marker for software floating point, > > diff --git a/gcc/config/rs6000/rs6000-modes.def b/gcc/config/rs6000/rs6000-modes.def > > index 094b246c834..60ebb363196 100644 > > --- a/gcc/config/rs6000/rs6000-modes.def > > +++ b/gcc/config/rs6000/rs6000-modes.def > > @@ -86,3 +86,7 @@ PARTIAL_INT_MODE (TI, 128, PTI); > > /* Modes used by __vector_pair and __vector_quad. */ > > OPAQUE_MODE (OO, 32); > > OPAQUE_MODE (XO, 64); > > + > > +/* Modes used by __dmr. */ > > Nit: s/Modes/Mode/ > > > +OPAQUE_MODE (TDO, 128); > > + > > I assumed that "TD" stands for something but I have no idea (at least not obvious to me), > could we also put some comments for it? Basically Segher and I went back and forth on the names. I would have to dig into my notes what TDO stands for. > > diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc > > index 59517c8608d..aed4b72c4ea 100644 > > --- a/gcc/config/rs6000/rs6000.cc > > +++ b/gcc/config/rs6000/rs6000.cc > > @@ -1846,7 +1846,9 @@ rs6000_hard_regno_nregs_internal (int regno, machine_mode mode) > > 128-bit floating point that can go in vector registers, which has VSX > > memory addressing. */ > > if (FP_REGNO_P (regno)) > > - reg_size = (VECTOR_MEM_VSX_P (mode) || VECTOR_ALIGNMENT_P (mode) > > + reg_size = (VECTOR_MEM_VSX_P (mode) > > + || VECTOR_ALIGNMENT_P (mode) > > + || mode == TDOmode > > Redundant change, since VECTOR_ALIGNMENT_P considers TDOmode as this patch changes. Ok. And I'll get back to the rest of the comments shortly. -- Michael Meissner, IBM PO Box 98, Ayer, Massachusetts, USA, 01432 email: meissner@linux.ibm.com