From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from userp2130.oracle.com (userp2130.oracle.com [156.151.31.86]) by sourceware.org (Postfix) with ESMTPS id B7A93386F033 for ; Thu, 17 Sep 2020 14:40:41 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org B7A93386F033 Received: from pps.filterd (userp2130.oracle.com [127.0.0.1]) by userp2130.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 08HENfX6061226; Thu, 17 Sep 2020 14:40:36 GMT Received: from aserp3020.oracle.com (aserp3020.oracle.com [141.146.126.70]) by userp2130.oracle.com with ESMTP id 33gnrr9sgn-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Thu, 17 Sep 2020 14:40:36 +0000 Received: from pps.filterd (aserp3020.oracle.com [127.0.0.1]) by aserp3020.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 08HEPaL8099665; Thu, 17 Sep 2020 14:40:35 GMT Received: from aserv0122.oracle.com (aserv0122.oracle.com [141.146.126.236]) by aserp3020.oracle.com with ESMTP id 33h88bhywb-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 17 Sep 2020 14:40:35 +0000 Received: from abhmp0001.oracle.com (abhmp0001.oracle.com [141.146.116.7]) by aserv0122.oracle.com (8.14.4/8.14.4) with ESMTP id 08HEeSQN029796; Thu, 17 Sep 2020 14:40:29 GMT Received: from dhcp-10-154-150-155.vpn.oracle.com (/10.154.150.155) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Thu, 17 Sep 2020 14:40:28 +0000 From: Qing Zhao Message-Id: <2F962748-69FF-4B27-B579-857FE18D462D@ORACLE.COM> Mime-Version: 1.0 (Mac OS X Mail 12.4 \(3445.104.15\)) Subject: Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all] Date: Thu, 17 Sep 2020 09:40:26 -0500 In-Reply-To: Cc: Segher Boessenkool , Kees Cook , Kees Cook via Gcc-patches , Jakub Jelinek , Uros Bizjak , "Rodriguez Bahena, Victor" To: Richard Sandiford References: <2AF34264-D8FD-49F9-AB06-E1243DA9DB8A@ORACLE.COM> <59551C24-AAA9-4C2E-85EF-E204C35796BD@ORACLE.COM> <20200915194121.GS28786@gate.crashing.org> <5457F886-7DB4-46C0-A7EB-0A12D633FEC1@ORACLE.COM> <20200915230942.GY28786@gate.crashing.org> <20200916103535.GA28786@gate.crashing.org> X-Mailer: Apple Mail (2.3445.104.15) X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9746 signatures=668679 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 mlxscore=0 spamscore=0 adultscore=0 suspectscore=4 phishscore=0 malwarescore=0 bulkscore=0 mlxlogscore=999 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2006250000 definitions=main-2009170113 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9747 signatures=668679 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 phishscore=0 spamscore=0 lowpriorityscore=0 malwarescore=0 mlxscore=0 bulkscore=0 suspectscore=4 clxscore=1015 mlxlogscore=999 adultscore=0 priorityscore=1501 impostorscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2006250000 definitions=main-2009170113 X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, HTML_MESSAGE, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_PASS, TXREP, UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.29 X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Sep 2020 14:40:43 -0000 > On Sep 17, 2020, at 1:17 AM, Richard Sandiford = wrote: >=20 > Qing Zhao writes: >> Segher and Richard,=20 >>=20 >> Now there are two major concerns from the discussion so far: >>=20 >> 1. (=46rom Richard): Inserting zero insns should be done after = pass_thread_prologue_and_epilogue since later passes (for example, = pass_regrename) might introduce new used caller-saved registers.=20 >> So, we should do this in the beginning of pass_late_compilation = (some targets wouldn=E2=80=99t cope with doing it later).=20 >>=20 >> 2. (=46rom Segher): The inserted zero insns should stay together with = the return, no other insns should move in-between zero insns and return = insns. Otherwise, a valid gadget could be formed.=20 >>=20 >> I think that both of the above 2 concerns are important and should be = addressed for the correct implementation.=20 >>=20 >> In order to support 1, we cannot implementing it in = =E2=80=9Ctargetm.gen_return()=E2=80=9D and = =E2=80=9Ctargetm.gen_simple_return()=E2=80=9D since = =E2=80=9Ctargetm.gen_return()=E2=80=9D and = =E2=80=9Ctargetm.gen_simple_return()=E2=80=9D are called during = pass_thread_prologue_and_epilogue, at that time, the use information = still not correct.=20 >>=20 >> In order to support 2, enhancing EPILOGUE_USES to include the zeroed = registgers is NOT enough to prevent all the zero insns from moving = around. >=20 > Right. The purpose of EPILOGUE_USES was instead to stop the moves = from > being deleted as dead. >=20 >> More restrictions need to be added to these new zero insns. (I think = that marking these new zeroed registers as =E2=80=9Cunspec_volatile=E2=80=9D= at RTL level is necessary to prevent them from deleting from moving = around).=20 >>=20 >>=20 >> So, based on the above, I propose the following approach that will = resolve the above 2 concerns: >>=20 >> 1. Add 2 new target hooks: >> A. targetm.pro_epilogue_use (reg) >> This hook should return a UNSPEC_VOLATILE rtx to mark a register in = use to >> prevent deleting register setting instructions in prologue and = epilogue. >>=20 >> B. targetm.gen_zero_call_used_regs(need_zeroed_hardregs) >> This hook will gen a sequence of zeroing insns that zero the = registers that specified in NEED_ZEROED_HARDREGS. >>=20 >> A default handler of =E2=80=9Cgen_zero_call_used_regs=E2=80=9D = could be defined in middle end, which use mov insns to zero registers, = and then use =E2=80=9Ctargetm.pro_epilogue_use(reg)=E2=80=9D to mark = each zeroed registers.=20 >=20 > This sounds like you're going back to using: >=20 > (insn 18 16 19 2 (set (reg:SI 1 dx) > (const_int 0 [0])) "t10.c":11:1 -1 > (nil)) > (insn 19 18 20 2 (unspec_volatile [ > (reg:SI 1 dx) > ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1 > (nil)) >=20 > This also doesn't prevent the zeroing from being moved around. Like = the > EPILOGUE_USES approach, it only prevents the clearing from being = removed > as dead. I still think that the EPILOGUE_USES approach is the better > way of doing that. The following is what I see from i386.md: (I didn=E2=80=99t look at how = =E2=80=9CUNSPEC_volatile=E2=80=9D is used in data flow analysis in GCC = yet) ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers = and ;; all of memory. This blocks insns from being moved across this point. I am not very familiar with how the unspec_volatile actually works in = gcc=E2=80=99s data flow analysis, my understanding from the above is, = the RTL insns marked with UNSPEC_volatile would be served as a barrier = that no other insns can move across this point. At the same time, since = the marked RTL insns is considered to use and clobber all hard registers = and memory, it cannot be deleted either.=20 So, I thought that =E2=80=9CUNSPEC_volatile=E2=80=9D should be stronger = than =E2=80=9CEPILOGUE_USES=E2=80=9D. And it can serve the purpose of = preventing zeroing insns from deleting and moving.=20 >=20 > In other words: the use insns themselves are volatile and so can't be > moved relative to each other and to other volatile insns. But the = uses > are fake instructions that don't do anything. The preceding zeroing > instructions are just normal instructions that can be moved around > freely before their respective uses. But since the UNSPEC_volatile insns is considered as a barrier, no other = insns can move across them, then the zero insns cannot be moved around = too, right? >=20 > I don't think there's a foolproof way of preventing an unknown target > machine_reorg pass from moving the instructions around. But since we > don't have unknown machine_reorgs (at least not in-tree), I think > instead we should be prepared to patch machine_reorgs where necessary > to ensure that they do the right thing. >=20 > If you want to increase the chances that machine_reorgs don't need to = be > patched, you could either: >=20 > (a) to make the zeroing instructions themselves volatile or > (b) to insert a volatile reference to the register before (rather than > after) the zeroing instruction >=20 > IMO (b) is the way to go, because it avoids the need to define special > volatile move patterns for each type of register. (b) would be needed > on top of (rather than instead of) the EPILOGUE_USES thing. >=20 Okay, will take approach b.=20 But I still don=E2=80=99t quite understand why we still need = =E2=80=9CEPILOUGE_USES=E2=80=9D? What=E2=80=99s the additional benefit = from EPILOGUE_USES? > I don't think we need a new target-specific unspec_volatile code to do = (b). > We can just use an automatically-generated volatile asm to clobber the > registers first. See e.g. how expand_asm_memory_blockage handles = memory > scheduling barriers. /* Generate asm volatile("" : : : "memory") as the memory blockage. */ static void expand_asm_memory_blockage (void) { rtx asm_op, clob; asm_op =3D gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0, rtvec_alloc (0), rtvec_alloc (0), rtvec_alloc (0), UNKNOWN_LOCATION); MEM_VOLATILE_P (asm_op) =3D 1; clob =3D gen_rtx_SCRATCH (VOIDmode); clob =3D gen_rtx_MEM (BLKmode, clob); clob =3D gen_rtx_CLOBBER (VOIDmode, clob); emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, asm_op, clob))); } As the following?=20 /* Generate asm volatile("" : : : =E2=80=9Cregno") for REGNO. */ static void expand_asm_reg_volatile (machine_mode mode, unsigned int regno) { rtx asm_op, clob; asm_op =3D gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0, rtvec_alloc (0), rtvec_alloc (0), rtvec_alloc (0), UNKNOWN_LOCATION); MEM_VOLATILE_P (asm_op) =3D 1; clob =3D gen_rtx_REG (mode, regno); clob =3D gen_rtx_CLOBBER (VOIDmode, clob); emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, asm_op, clob))); } Is the above correct?=20 thanks. Qing >=20 > Thanks, > Richard