From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTPS id 459A93858029 for ; Fri, 2 Feb 2024 16:10:17 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 459A93858029 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 459A93858029 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1706890221; cv=none; b=VBK1t1tEJ5eSahsBka6j2uA+8prND1BhlJSw4QcMdSSXfIAujXpEwK/mhukC6jqogCmTATxFceEhiG0yXZ9+ZQSprpttDPwZi+x4IV50USlTCdWmUmhURs5qeuAuUxIimb7/cqGgzWBFiT9R6gLm8qgiB/aT9mGuoZSZQRAF5Oo= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1706890221; c=relaxed/simple; bh=mU4Q+e2FfhunRFX1mwdNzCuKwAxsFzSH/6dY+zyeKfQ=; h=DKIM-Signature:Date:From:To:Subject:Message-ID:MIME-Version; b=KQGbwsTPuNYBSRjW75ZU0gHhO3M8ln5RvVVHBAW0puBwB4mvcURTT4sPJMcoKW1/rAHy1SsrBiOVeI45ubjufiYtSYktkvcSZRhWc6JEewh9EV83JzqMJCDV0CcMopF+9l0bf2RQS4G/e2iveYGlUYhgb/Emin8/j9OUocjEsXk= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1706890216; h=from:from:reply-to:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:in-reply-to:in-reply-to: references:references; bh=vKW2J1kPOU9qIAfJJKP5++Ku2pTsbeAItdmQ3Tq6aS0=; b=VCLNIaa3rxeHj9HwiWg5OlWGla2Af7Wndk6oibmM4LpRxefoQikT6Fw5regjyBfQ19LOA9 a2jKxdGcHbbCGdeo0QcFu5AhaRpMFaa21WX9Y1Yvd8rfgVtXvAK1M2eSormjYAcowev1Q6 e9NwO6PV7PYcO+HWcS31DjYhkjejH98= Received: from mimecast-mx02.redhat.com (mx-ext.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-177-i4JxTylUPka9zFRyQPn5MA-1; Fri, 02 Feb 2024 11:10:13 -0500 X-MC-Unique: i4JxTylUPka9zFRyQPn5MA-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 7D3A129AC00E; Fri, 2 Feb 2024 16:10:13 +0000 (UTC) Received: from tucnak.zalov.cz (unknown [10.39.192.70]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 32D922166B31; Fri, 2 Feb 2024 16:10:13 +0000 (UTC) Received: from tucnak.zalov.cz (localhost [127.0.0.1]) by tucnak.zalov.cz (8.17.1/8.17.1) with ESMTPS id 412GA6tP2738317 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT); Fri, 2 Feb 2024 17:10:06 +0100 Received: (from jakub@localhost) by tucnak.zalov.cz (8.17.1/8.17.1/Submit) id 412GA60U2738316; Fri, 2 Feb 2024 17:10:06 +0100 Date: Fri, 2 Feb 2024 17:10:05 +0100 From: Jakub Jelinek To: "H.J. Lu" , Uros Bizjak Cc: gcc-patches@gcc.gnu.org, rep.dot.nop@gmail.com Subject: Re: [PATCH v4] x86-64: Find a scratch register for large model profiling Message-ID: Reply-To: Jakub Jelinek References: <20240202154200.306107-1-hjl.tools@gmail.com> MIME-Version: 1.0 In-Reply-To: <20240202154200.306107-1-hjl.tools@gmail.com> X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.6 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Spam-Status: No, score=-10.8 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Fri, Feb 02, 2024 at 07:42:00AM -0800, H.J. Lu wrote: > --- a/gcc/config/i386/i386.cc > +++ b/gcc/config/i386/i386.cc > @@ -22749,6 +22749,39 @@ current_fentry_section (const char **name) > return true; > } > > +/* Return an caller-saved register, which isn't live, at entry for > + profile. */ > + > +static int > +x86_64_select_profile_regnum (bool r11_ok ATTRIBUTE_UNUSED) > +{ > + /* Use %r10 if it isn't used by DRAP. */ > + if (!crtl->drap_reg || REGNO (crtl->drap_reg) != R10_REG) I'd really like to see flag_fentry != 0 || here, if the profiler is emitted before the prologue (so before initializing the drap register), %r10 is a fine choice. > + return R10_REG; > + > + bitmap reg_live = df_get_live_out (ENTRY_BLOCK_PTR_FOR_FN (cfun)); I meant at pro_and_epilogue time, but perhaps doing it here can discover arguments of the function which are used somewhere in the body too. > + int i; > + for (i = 0; i < FIRST_PSEUDO_REGISTER; i++) > + if (GENERAL_REGNO_P (i) > + && i != R10_REG > +#ifdef NO_PROFILE_COUNTERS > + && (r11_ok || i != R11_REG) > +#else > + && i != R11_REG > +#endif > + && (!REX2_INT_REGNO_P (i) || TARGET_APX_EGPR) > + && !fixed_regs[i] > + && call_used_regs[i] I wonder if this shouldn't be && (call_used_regs[i] || X) where X would cover registers known to be saved in the prologue which aren't live from the prologue to the body (stuff like hard frame pointer if used). Because if the prologue say saves %r12 or %rbx to stack but doesn't yet set it to something, why couldn't the profiler use it? I'd expect cfun->machine contains something what has been saved there. > + && !REGNO_REG_SET_P (reg_live, i)) > + return i; > + > + sorry ("no register available for profiling %<-mcmodel=large%s%>", > + ix86_cmodel == CM_LARGE_PIC ? " -fPIC" : ""); > + > + return INVALID_REGNUM; > +} > + > /* Output assembler code to FILE to increment profiler label # LABELNO > for profiling a function entry. */ > void > @@ -22783,42 +22816,60 @@ x86_function_profiler (FILE *file, int labelno ATTRIBUTE_UNUSED) > fprintf (file, "\tleaq\t%sP%d(%%rip), %%r11\n", LPREFIX, labelno); > #endif > > + int scratch; > + const char *reg_prefix; > + const char *reg; > + > if (!TARGET_PECOFF) > { > switch (ix86_cmodel) > { > case CM_LARGE: > - /* NB: R10 is caller-saved. Although it can be used as a > - static chain register, it is preserved when calling > - mcount for nested functions. */ > + scratch = x86_64_select_profile_regnum (true); > + reg = hi_reg_name[scratch]; > + reg_prefix = LEGACY_INT_REGNO_P (scratch) ? "r" : ""; > if (ASSEMBLER_DIALECT == ASM_INTEL) > - fprintf (file, "1:\tmovabs\tr10, OFFSET FLAT:%s\n" > - "\tcall\tr10\n", mcount_name); > + fprintf (file, > + "1:\tmovabs\t%s%s, OFFSET FLAT:%s\n" > + "\tcall\t%s%s\n", > + reg_prefix, reg, mcount_name, reg_prefix, reg); > else > - fprintf (file, "1:\tmovabsq\t$%s, %%r10\n\tcall\t*%%r10\n", > - mcount_name); > + fprintf (file, > + "1:\tmovabsq\t$%s, %%%s%s\n\tcall\t*%%%s%s\n", > + mcount_name, reg_prefix, reg, reg_prefix, reg); > break; > case CM_LARGE_PIC: > #ifdef NO_PROFILE_COUNTERS > + scratch = x86_64_select_profile_regnum (false); > + reg = hi_reg_name[scratch]; > + reg_prefix = LEGACY_INT_REGNO_P (scratch) ? "r" : ""; > if (ASSEMBLER_DIALECT == ASM_INTEL) > { > fprintf (file, "1:movabs\tr11, " > "OFFSET FLAT:_GLOBAL_OFFSET_TABLE_-1b\n"); > - fprintf (file, "\tlea\tr10, 1b[rip]\n"); > - fprintf (file, "\tadd\tr10, r11\n"); > + fprintf (file, "\tlea\t%s%s, 1b[rip]\n", > + reg_prefix, reg); > + fprintf (file, "\tadd\t%s%s, r11\n", > + reg_prefix, reg); > fprintf (file, "\tmovabs\tr11, OFFSET FLAT:%s@PLTOFF\n", > mcount_name); > - fprintf (file, "\tadd\tr10, r11\n"); > - fprintf (file, "\tcall\tr10\n"); > + fprintf (file, "\tadd\t%s%s, r11\n", > + reg_prefix, reg); > + fprintf (file, "\tcall\t%s%s\n", > + reg_prefix, reg); > break; > } > fprintf (file, > "1:\tmovabsq\t$_GLOBAL_OFFSET_TABLE_-1b, %%r11\n"); > - fprintf (file, "\tleaq\t1b(%%rip), %%r10\n"); > - fprintf (file, "\taddq\t%%r11, %%r10\n"); > + fprintf (file, "\tleaq\t1b(%%rip), %%%s%s\n", > + reg_prefix, reg); > + fprintf (file, "\taddq\t%%r11, %%%s%s\n", > + reg_prefix, reg); > fprintf (file, "\tmovabsq\t$%s@PLTOFF, %%r11\n", mcount_name); > - fprintf (file, "\taddq\t%%r11, %%r10\n"); > - fprintf (file, "\tcall\t*%%r10\n"); > + fprintf (file, "\taddq\t%%r11, %%%s%s\n", > + reg_prefix, reg); > + fprintf (file, "\tcall\t*%%%s%s\n", > + reg_prefix, reg); > #else > sorry ("profiling %<-mcmodel=large%> with PIC is not supported"); > #endif > diff --git a/gcc/testsuite/gcc.target/i386/pr113689-1.c b/gcc/testsuite/gcc.target/i386/pr113689-1.c > new file mode 100644 > index 00000000000..c32445e0fc4 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr113689-1.c > @@ -0,0 +1,49 @@ > +/* { dg-do run { target { lp64 && fpic } } } */ > +/* { dg-options "-O2 -fno-pic -fprofile -mcmodel=large" } */ > + > +#include > + > +__attribute__((noipa,noclone,noinline)) Just noipa instead of noipa,noclone,noinline, the former implies noclone,noinline too. In all tests. And, eventually you want Uros to review this too. Jakub