Date: Mon, 20 Nov 2023 03:56:36 -0500
From: Michael Meissner
To: Richard Biener
Cc: Michael Meissner, gcc-patches@gcc.gnu.org, Segher Boessenkool,
    "Kewen.Lin", David Edelsohn, Peter Bergner
Subject: Re: [PATCH 0/4] Add vector pair support to PowerPC attribute((vector_size(32)))

On Mon, Nov 20, 2023 at 08:24:35AM +0100, Richard Biener wrote:
> I wouldn't expose the "fake" larger modes to the vectorizer but rather
> adjust m_suggested_unroll_factor (which you already do to some extent).

Thanks.  I figure I need to fix the shuffle bytes issue first and get a
clean test run (with the flag enabled by default) before delving into
the vectorization issues.

But testing has shown that, at least in the loop I was looking at, using
vector pair instructions (either through the built-ins I had previously
posted or with these patches) is still faster, even with unrolling turned
off completely for the vector pair case, than unrolling the loop 4 times
using vector types (or auto vectorization).  Of course, the margin is
much smaller in this case.

    vector double: (a * b) + c, unroll 4                    loop time: 0.55483
    vector double: (a * b) + c, unroll default              loop time: 0.55638
    vector double: (a * b) + c, unroll 0                    loop time: 0.55686
    vector double: (a * b) + c, unroll 2                    loop time: 0.55772

    vector32, w/vector pair: (a * b) + c, unroll 4          loop time: 0.48257
    vector32, w/vector pair: (a * b) + c, unroll 2          loop time: 0.50782
    vector32, w/vector pair: (a * b) + c, unroll default    loop time: 0.50864
    vector32, w/vector pair: (a * b) + c, unroll 0          loop time: 0.52224

Of course, these being micro-benchmarks, the results don't necessarily
translate to the behavior of actual code.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com
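
For readers without the benchmark source at hand, here is a rough sketch
of the kind of (a * b) + c kernel the timings above compare, written once
with 16-byte vectors and once with 32-byte vector_size(32) vectors.  This
is not the benchmark that was actually run: the type and function names
(v2df, v4df, fma_v16, fma_v32, fma_mismatches) are made up for
illustration, and whether the 32-byte version becomes vector pair
instructions depends on the patches and target options under discussion,
which this message does not spell out.

```c
#include <stddef.h>

/* Generic GCC vector types: 16 bytes holds 2 doubles, 32 bytes holds 4.  */
typedef double v2df __attribute__ ((vector_size (16)));
typedef double v4df __attribute__ ((vector_size (32)));

/* r[i] = (a[i] * b[i]) + c[i] on n 16-byte vectors.  */
void
fma_v16 (v2df *r, const v2df *a, const v2df *b, const v2df *c, size_t n)
{
  for (size_t i = 0; i < n; i++)
    r[i] = (a[i] * b[i]) + c[i];
}

/* The same computation on n 32-byte vectors.  */
void
fma_v32 (v4df *r, const v4df *a, const v4df *b, const v4df *c, size_t n)
{
  for (size_t i = 0; i < n; i++)
    r[i] = (a[i] * b[i]) + c[i];
}

/* Hypothetical self-check: run both kernels over the same 8 doubles and
   count elements that differ from the scalar result.  The inputs are
   chosen so every result is exact in double precision.  */
int
fma_mismatches (void)
{
  enum { N = 8 };
  v2df a16[N / 2], b16[N / 2], c16[N / 2], r16[N / 2];
  v4df a32[N / 4], b32[N / 4], c32[N / 4], r32[N / 4];
  int bad = 0;

  for (int i = 0; i < N; i++)
    {
      a16[i / 2][i % 2] = a32[i / 4][i % 4] = i + 1.0;
      b16[i / 2][i % 2] = b32[i / 4][i % 4] = 2.0;
      c16[i / 2][i % 2] = c32[i / 4][i % 4] = 0.5;
    }

  fma_v16 (r16, a16, b16, c16, N / 2);
  fma_v32 (r32, a32, b32, c32, N / 4);

  for (int i = 0; i < N; i++)
    {
      double expect = ((i + 1.0) * 2.0) + 0.5;
      bad += (r16[i / 2][i % 2] != expect);
      bad += (r32[i / 4][i % 4] != expect);
    }
  return bad;
}
```

Each fma_v32 iteration covers twice as many doubles as an fma_v16
iteration, so unrolling the 32-byte loop N times is roughly comparable
to unrolling the 16-byte loop 2N times.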