From: Jiufu Guo
To: gcc-patches@gcc.gnu.org
Cc: segher@kernel.crashing.org, dje.gcc@gmail.com, linkw@gcc.gnu.org, guojiufu@linux.ibm.com, rguenth@gcc.gnu.org, pinskia@gcc.gnu.org
Subject: [RFC] propgation leap over memory copy for struct
Date: Mon, 31 Oct 2022 10:42:35 +0800
Message-Id: <20221031024235.110995-1-guojiufu@linux.ibm.com>

Hi,

We know that a struct variable assignment may be expanded as a memory copy,
and a memcpy tries to load and store as many bytes as possible at a time.
However, this may not be the best choice here:
1. Before and after a struct variable assignment, other operations may be
   performed on the variable, and it is hard for some optimizations to leap
   over the memcpy.  As a result, some struct operations may be sub-optimal,
   as in the issue in PR65421.
2. The size of a struct is usually a compile-time constant, so the memcpy is
   expanded inline.  Loading/storing in smaller units, which can execute in
   parallel, may not be slower than loading/storing in larger units (although
   more registers may be needed for the smaller pieces).

In PR65421, consider the following source code:
////////t.c
#define FN 4
typedef struct { double a[FN]; } A;

A foo (const A *a) { return *a; }
A bar (const A a) { return a; }
///////

If FN <= 2, the size of "A" fits into TImode, and this code can be optimized
(by subreg/cse/fwprop/cprop) into:
-------
foo:
.LFB0:
        .cfi_startproc
        blr

bar:
.LFB1:
        .cfi_startproc
        lfd 2,8(3)
        lfd 1,0(3)
        blr
--------
If the size of "A" is larger than any integer mode, the following RTL insns
are generated:
   13: r125:V2DI=[r112:DI+0x20]
   14: r126:V2DI=[r112:DI+0x30]
   15: [r112:DI]=r125:V2DI
   16: [r112:DI+0x10]=r126:V2DI  /// memcpy for assignment: D.3338 = arg;
   17: r127:DF=[r112:DI]
   18: r128:DF=[r112:DI+0x8]
   19: r129:DF=[r112:DI+0x10]
   20: r130:DF=[r112:DI+0x18]
------------

I'm thinking about ways to improve this.

Method1: One way may be to expand the memory copy according to the type of
the struct, if the struct is not too big.
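As a rough C-level illustration of the difference Method1 is after (a sketch only, not how GCC's expander works), copy_A_bytes and copy_A_by_fields below are hypothetical helpers for the struct A from t.c. The first moves the object as an untyped block, much as the memcpy-style expansion does. The second copies one double at a time, which has the same shape as the per-field DF-mode loads/stores. Both leave identical contents. What differs is how visible each piece is to later passes, and an optimizing compiler is of course free to expand either form its own way.

```c
#include <assert.h>
#include <string.h>

#define FN 4
typedef struct { double a[FN]; } A;   /* same type as in t.c */

/* Hypothetical helper: copy A as an untyped block of bytes, roughly
   what the memcpy-style expansion of "*dst = *src" does today.  */
static void copy_A_bytes (A *dst, const A *src)
{
  /* memcpy requires non-overlapping objects; for two objects of
     type A that means distinct addresses.  */
  assert (dst != src);
  memcpy (dst, src, sizeof (A));
}

/* Hypothetical helper sketching Method1: a copy guided by the
   struct's type, one double (DFmode-sized) element at a time.  */
static void copy_A_by_fields (A *dst, const A *src)
{
  for (int i = 0; i < FN; i++)
    dst->a[i] = src->a[i];
}
```

For a struct of four doubles the two helpers produce byte-identical results. The point of Method1 is only that the per-element form exposes each DF-sized piece to cse/fwprop/dse individually.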
This would generate insns like the following:
   13: r125:DF=[r112:DI+0x20]
   15: r126:DF=[r112:DI+0x28]
   17: r127:DF=[r112:DI+0x30]
   19: r128:DF=[r112:DI+0x38]
   14: [r112:DI]=r125:DF
   16: [r112:DI+0x8]=r126:DF
   18: [r112:DI+0x10]=r127:DF
   20: [r112:DI+0x18]=r128:DF
   21: r129:DF=[r112:DI]
   22: r130:DF=[r112:DI+0x8]
   23: r131:DF=[r112:DI+0x10]
   24: r132:DF=[r112:DI+0x18]
Then passes (cse, prop, dse...) could help to optimize the code.
Concerns with this method: we may not want to do this if the number of
fields is too large.  Also, the best type/mode for each load/store may
depend on the platform and may differ from the types of the struct's fields.
For example, for "struct {double a[3]; long long l;}" on ppc64le, DImode
may be better for assignments involving parameters.

Method2: One way may be to enhance CSE so that it can treat one large
memory slot as two (or more) combined slots:
   13: r125:V2DI#0=[r112:DI+0x20]
   13': r125:V2DI#8=[r112:DI+0x28]
   15: [r112:DI]#0=r125:V2DI#0
   15': [r112:DI]#8=r125:V2DI#8
This may seem more like a hack in CSE.

Method3: For some record types, use "PARALLEL:BLK" instead of "MEM:BLK".
To do this, "moving" between "PARALLEL<->PARALLEL" and "PARALLEL<->MEM"
may need to be enhanced.  This method may require more effort to make it
work for corner/unknown cases.

I'm wondering which approach would be more flexible for handling this issue.
Thanks for any comments and suggestions!

BR,
Jeff(Jiufu)
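To make Method2 more concrete, here is a toy model with hypothetical names. It is not GCC's cse.cc: it assumes every access is 8-byte aligned and ignores partial overlaps and register redefinition. It records each 16-byte (V2DI-like) access as two 8-byte slots, the "#0"/"#8" halves in Method2. A later DF-sized load, such as insn 18 from [r112:DI+0x8], can then be matched to r125:V2DI#8 instead of reading memory again.

```c
#include <assert.h>

#define MAX_SLOTS 16

/* One 8-byte memory slot whose value is known to live in a register half.  */
struct slot
{
  int offset;   /* byte offset from the base register (r112 above) */
  int reg;      /* pseudo register number holding the value */
  int byte;     /* which 8-byte part of REG: 0 or 8 ("#0" / "#8") */
};

struct mem_table
{
  struct slot slots[MAX_SLOTS];
  int n;
};

/* Record that the 8 bytes at OFFSET now equal REG#BYTE, replacing any
   older entry for the same offset (a store overwrites the slot).  */
static void record_slot (struct mem_table *t, int offset, int reg, int byte)
{
  for (int i = 0; i < t->n; i++)
    if (t->slots[i].offset == offset)
      {
        t->slots[i].reg = reg;
        t->slots[i].byte = byte;
        return;
      }
  assert (t->n < MAX_SLOTS);
  t->slots[t->n++] = (struct slot) { offset, reg, byte };
}

/* A 16-byte access between REG and [base+OFFSET], whether a load (13/13')
   or a store (15/15'), is recorded as two 8-byte slots.  */
static void record_16byte (struct mem_table *t, int offset, int reg)
{
  record_slot (t, offset, reg, 0);
  record_slot (t, offset + 8, reg, 8);
}

/* Look up an 8-byte load from [base+OFFSET].  Returns 1 and fills
   *REG and *BYTE if some register half is known to hold that value.  */
static int lookup_load8 (const struct mem_table *t, int offset,
                         int *reg, int *byte)
{
  for (int i = 0; i < t->n; i++)
    if (t->slots[i].offset == offset)
      {
        *reg = t->slots[i].reg;
        *byte = t->slots[i].byte;
        return 1;
      }
  return 0;
}
```

Replaying insns 13 to 16 from the FN=4 dump through record_16byte makes the loads in 17 to 20 resolvable to r125#0, r125#8, r126#0 and r126#8. A real implementation in CSE would also have to invalidate entries on overlapping stores and register redefinitions, which is part of why it may feel like a hack.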