From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 41224 invoked by alias); 16 Aug 2015 19:24:35 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 41209 invoked by uid 89); 16 Aug 2015 19:24:34 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.0 required=5.0 tests=AWL,BAYES_50,KAM_LAZY_DOMAIN_SECURITY,KAM_STOCKGEN,RCVD_IN_DNSWL_LOW,RP_MATCHES_RCVD autolearn=no version=3.3.2 X-HELO: smtp.ispras.ru Received: from smtp.ispras.ru (HELO smtp.ispras.ru) (83.149.199.79) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Sun, 16 Aug 2015 19:24:32 +0000 Received: from [10.10.3.121] (unknown [83.149.199.91]) by smtp.ispras.ru (Postfix) with ESMTP id 732E520516; Sun, 16 Aug 2015 22:24:29 +0300 (MSK) Date: Sun, 16 Aug 2015 19:39:00 -0000 From: Alexander Monakov To: "H.J. Lu" cc: gcc-patches@gcc.gnu.org, Uros Bizjak , Sriraman Tallam Subject: Re: RFC: [PATCH] PR target/67215: -fno-plt needs improvements for x86 In-Reply-To: <20150816183618.GA10898@gmail.com> Message-ID: References: <20150816183618.GA10898@gmail.com> User-Agent: Alpine 2.20 (LNX 67 2015-01-07) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-SW-Source: 2015-08/txt/msg00868.txt.bz2 On Sun, 16 Aug 2015, H.J. Lu wrote: > prepare_call_address in calls.c is the wrong place to handle -fno-plt. > We shoudn't force function address into register and hope that load > function address via GOT and indirect call via register will be folded > into indirect call via GOT, which doesn't always happen. When the address load initially exists separately from the indirect call, the load can be scheduled or be subject to loop invariant motion. What is your reason to have them fused right from the start? In PR 67215, when you asked for an -O1 testcase, the reporter responded with a case that demonstrates loop invariant motion on the call address. Why should that be avoided? (I think it shouldn't, at least not generally) > Allso non-PIC > case can only be handled in backend. Instead, backend should expand > external function call into indirect call via GOT for -fno-plt. > > This patch reverts -fno-plt in prepare_call_address and handles it in > ix86_expand_call. Other backends may need similar changes to support > -fno-plt. Alternately, we can introduce a target hook to indicate > whether an external function should be called via register for -fno-plt > so that i386 backend can disable it in prepare_call_address. > > Any comments? Initially my patch was x86-only, but then I opted for the generic calls.c change, expecting that it would work fine in a machine-independent manner (which didn't work out, as ARM and AArch64 experience demonstrated). My initial i386 backend patch was much smaller: diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 3263656..cd5f246 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -25577,15 +25578,23 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1, /* Static functions and indirect calls don't need the pic register. */ if (flag_pic && (!TARGET_64BIT + || !flag_plt || (ix86_cmodel == CM_LARGE_PIC && DEFAULT_ABI != MS_ABI)) && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0))) { - use_reg (&use, gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM)); - if (ix86_use_pseudo_pic_reg ()) - emit_move_insn (gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM), - pic_offset_table_rtx); + if (flag_plt) + { + use_reg (&use, gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM)); + if (ix86_use_pseudo_pic_reg ()) + emit_move_insn (gen_rtx_REG (Pmode, + REAL_PIC_OFFSET_TABLE_REGNUM), + pic_offset_table_rtx); + } + else + fnaddr = gen_rtx_MEM (QImode, + legitimize_pic_address (XEXP (fnaddr, 0), 0)); } } (it doesn't apply to current trunk and doesn't handle all cases your patch handles, but at least it shows how do achieve the goal for "unfused" codegen) Couple more comments on your patch below. > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c > index 4a0986c..bf8a21d 100644 > --- a/gcc/config/i386/i386.c > +++ b/gcc/config/i386/i386.c > @@ -25650,21 +25650,50 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1, > /* Static functions and indirect calls don't need the pic register. Also, > check if PLT was explicitly avoided via no-plt or "noplt" attribute, making > it an indirect call. */ > + rtx addr = XEXP (fnaddr, 0); > if (flag_pic > - && (!TARGET_64BIT > - || (ix86_cmodel == CM_LARGE_PIC > - && DEFAULT_ABI != MS_ABI)) > - && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF > - && !SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)) > - && flag_plt > - && (SYMBOL_REF_DECL ((XEXP (fnaddr, 0))) == NULL_TREE > - || !lookup_attribute ("noplt", > - DECL_ATTRIBUTES (SYMBOL_REF_DECL (XEXP (fnaddr, 0)))))) > + && GET_CODE (addr) == SYMBOL_REF > + && !SYMBOL_REF_LOCAL_P (addr)) > { > - use_reg (&use, gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM)); > - if (ix86_use_pseudo_pic_reg ()) > - emit_move_insn (gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM), > - pic_offset_table_rtx); > + if (flag_plt > + && (SYMBOL_REF_DECL (addr) == NULL_TREE > + || !lookup_attribute ("noplt", > + DECL_ATTRIBUTES (SYMBOL_REF_DECL (addr))))) > + { Under what circumstances can SYMBOL_REF_DECL be NULL here? For libcalls? (I realize that your patch doesn't change the treatment; I just want to know) > + if (!TARGET_64BIT > + || (ix86_cmodel == CM_LARGE_PIC > + && DEFAULT_ABI != MS_ABI)) > + { > + use_reg (&use, gen_rtx_REG (Pmode, > + REAL_PIC_OFFSET_TABLE_REGNUM)); > + if (ix86_use_pseudo_pic_reg ()) > + emit_move_insn (gen_rtx_REG (Pmode, > + REAL_PIC_OFFSET_TABLE_REGNUM), > + pic_offset_table_rtx); > + } > + } > + else if (!TARGET_PECOFF && !TARGET_MACHO) > + { > + if (TARGET_64BIT) > + { > + fnaddr = gen_rtx_UNSPEC (Pmode, > + gen_rtvec (1, addr), > + UNSPEC_GOTPCREL); > + fnaddr = gen_rtx_CONST (Pmode, fnaddr); > + } > + else > + { > + fnaddr = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, addr), > + UNSPEC_GOT); > + fnaddr = gen_rtx_CONST (Pmode, fnaddr); > + fnaddr = gen_rtx_PLUS (Pmode, pic_offset_table_rtx, > + fnaddr); > + } > + fnaddr = gen_const_mem (Pmode, fnaddr); > + if (GET_MODE (fnaddr) != word_mode) > + fnaddr = gen_rtx_ZERO_EXTEND (word_mode, fnaddr); > + fnaddr = gen_rtx_MEM (QImode, fnaddr); > + } > } > } > > @@ -25686,9 +25715,13 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1, > && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF > && !local_symbolic_operand (XEXP (fnaddr, 0), VOIDmode)) > fnaddr = gen_rtx_MEM (QImode, construct_plt_address (XEXP (fnaddr, 0))); > - else if (sibcall > - ? !sibcall_insn_operand (XEXP (fnaddr, 0), word_mode) > - : !call_insn_operand (XEXP (fnaddr, 0), word_mode)) > + else if (!(TARGET_X32 > + && MEM_P (fnaddr) > + && GET_CODE (XEXP (fnaddr, 0)) == ZERO_EXTEND > + && GOT_memory_operand (XEXP (XEXP (fnaddr, 0), 0), Pmode)) > + && (sibcall > + ? !sibcall_insn_operand (XEXP (fnaddr, 0), word_mode) > + : !call_insn_operand (XEXP (fnaddr, 0), word_mode))) > { > fnaddr = convert_to_mode (word_mode, XEXP (fnaddr, 0), 1); > fnaddr = gen_rtx_MEM (QImode, copy_to_mode_reg (word_mode, fnaddr)); Perhaps add a comment that GOT slots are 64-bit on x32? Thanks. Alexander