From: Richard Sandiford
To: Qing Zhao
Cc: Segher Boessenkool, Kees Cook, Kees Cook via Gcc-patches, Jakub Jelinek, Uros Bizjak, "Rodriguez Bahena\, Victor"
Subject: Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
Date: Mon, 14 Sep 2020 20:20:30 +0100
In-Reply-To: <59551C24-AAA9-4C2E-85EF-E204C35796BD@ORACLE.COM> (Qing Zhao's message of "Mon, 14 Sep 2020 13:50:05 -0500")
List-Id: Gcc-patches mailing list

Qing Zhao writes:
>> On Sep 14, 2020, at 11:33 AM, Richard Sandiford wrote:
>>
>> Qing Zhao writes:
>>>> Like I mentioned earlier though, passes that run after
>>>> pass_thread_prologue_and_epilogue can use call-clobbered registers that
>>>> weren't previously used.  For example, on x86_64, the function might
>>>> not use %r8 when the prologue, epilogue and returns are generated,
>>>> but pass_regrename might later introduce a new use of %r8.  AIUI,
>>>> the “used” version of the new command-line option is supposed to clear
>>>> %r8 in these circumstances, but it wouldn't do so if the data was
>>>> collected at the point that the return is generated.
>>>
>>> Thanks for the information.
>>>
>>>>
>>>> That's why I think it's more robust to do this later (at the beginning
>>>> of pass_late_compilation) and insert the zeroing before returns that
>>>> already exist.
>>>
>>> Yes, looks like it’s not correct to insert the zeroing at the time when prologue, epilogue and return are generated.
>>> As I also checked, “return” might be also generated as late as pass “pass_delay_slots”, So, shall we move the
>>> New pass as late as possible?
>>
>> If we insert the zeroing before pass_delay_slots and describe the
>> result correctly, pass_delay_slots should do the right thing.
>>
>> Describing the result correctly includes ensuring that the cleared
>> registers are treated as live on exit from the function, so that the
>> zeroing doesn't get deleted again, or skipped by pass_delay_slots.
>
> In the current implementation for x86, when we generating a zeroing insn as the following:
>
> (insn 18 16 19 2 (set (reg:SI 1 dx)
>         (const_int 0 [0])) "t10.c":11:1 -1
>      (nil))
> (insn 19 18 20 2 (unspec_volatile [
>             (reg:SI 1 dx)
>         ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>      (nil))
>
> i.e, after each zeroing insn, the register that is zeroed is marked as “UNSPECV_PRO_EPILOGUE_USE”,
> By doing this, we can avoid this zeroing insn from being deleted or skipped.
>
> Is doing this enough to describe the result correctly?
> Is there other thing we need to do in addition to this?

I guess that works, but I think it would be better to abstract
EPILOGUE_USES into a new target-independent wrapper function that
(a) returns true if EPILOGUE_USES itself returns true and (b) returns
true for registers that need to be zero on return, if the zeroing
instructions have already been inserted.  The places that currently
test EPILOGUE_USES should then test this new wrapper function instead.

After inserting the zeroing instructions, the pass should recompute the
live-out sets based on this.

>>>> But the dataflow information has to be correct between
>>>> pass_thread_prologue_and_epilogue and pass_free_cfg, otherwise
>>>> any pass in that region could clobber the registers in the same way.
>>>
>>> You mean, the data flow information will be not correct after pass_free_cfg?
>>> “pass_delay_slots” is after “pass_free_cfg”, and there might be new “return” generated in “pass_delay_slots”,
>>> If we want to generate zeroing for the new “return” which was generated in “pass_delay_slots”, can we correctly to do so?
>>
>> …the zeroing has to be done before pass_free_cfg, because the information
>> isn't reliable after that point.  I think it would make sense to do it
>> before pass_compute_alignments, because inserting the zeros will affect
>> alignment.
>
> Okay.
>
> Then there is another problem: what about the new “return”s that are generated at pass_delay_slots?
>
> Should we generate the zeroing for these new returns? Since the data flow information might not be correct at this pass,
> It looks like that there is no correct way to add the zeroing insn for these new “return”, then, what should we do about this?

pass_delay_slots isn't a problem.  It doesn't change *what* happens
on each return path, it just changes how the instructions to achieve
it are arranged.

So e.g. if every path through the function clears register R before
pass_delay_slots, and if that clearing is represented as being necessary,
then every path through the function will clear register R after the pass
as well.

>> For extra safety, you could/should also check targetm.hard_regno_scratch_ok
>> to see whether there's a target-specific reason why a register can't
>> be clobbered.
>
> /* Return true if is OK to use a hard register REGNO as scratch register
>    in peephole2.  */
> DEFHOOK
> (hard_regno_scratch_ok,
>
>
> Is this checking only valid for pass_peephole2?

No, that comment looks out of date.  The hook is already used in
postreload, for example.

Thanks,
Richard
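
For illustration, the wrapper around EPILOGUE_USES described above could
look roughly like the following.  This is only a toy model written as
standalone C, not GCC code: every toy_* name is hypothetical, the
register file is reduced to a 16-bit mask, and toy_epilogue_uses is a
made-up stand-in for the real target macro.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names are GCC's real definitions.  */
#define TOY_NUM_HARD_REGS 16u

/* Hypothetical stand-in for the target macro EPILOGUE_USES (regno).
   In this toy target only register 7 is used by the epilogue.  */
static bool
toy_epilogue_uses (unsigned regno)
{
  return regno == 7u;
}

/* Bit N set means hard register N must be zero on return.  */
static unsigned toy_zeroed_regs;

/* True once the zeroing instructions have been inserted.  */
static bool toy_zeroing_inserted;

/* Called by the (hypothetical) zeroing pass after it has emitted the
   zeroing instructions for the registers in MASK.  */
void
toy_record_zeroing (unsigned mask)
{
  toy_zeroed_regs = mask;
  toy_zeroing_inserted = true;
}

/* The wrapper: true if (a) the target's own epilogue-use test is true,
   or (b) REGNO must be zero on return and the zeroing instructions
   have already been inserted.  */
bool
toy_epilogue_uses_or_zeroed_p (unsigned regno)
{
  assert (regno < TOY_NUM_HARD_REGS);
  if (toy_epilogue_uses (regno))
    return true;
  return toy_zeroing_inserted && ((toy_zeroed_regs >> regno) & 1u) != 0;
}

/* Recompute the set of registers live on exit from the function,
   based on the wrapper rather than on the raw target test.  */
unsigned
toy_exit_live_out (void)
{
  unsigned live = 0;
  for (unsigned regno = 0; regno < TOY_NUM_HARD_REGS; regno++)
    if (toy_epilogue_uses_or_zeroed_p (regno))
      live |= 1u << regno;
  return live;
}
```

In this model, calling toy_record_zeroing and then toy_exit_live_out
corresponds to inserting the zeroing instructions and recomputing the
live-out sets: the zeroed registers become live on exit, so later
passes have no grounds to delete or skip the zeroing.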