From: Uros Bizjak
Date: Tue, 4 Jul 2023 08:18:52 +0200
Subject: Re: [PATCH V2] i386: Inline function with default arch/tune to caller
To: Hongyu Wang
Cc: gcc-patches@gcc.gnu.org, hongtao.liu@intel.com
In-Reply-To: <20230704031244.1074834-1-hongyu.wang@intel.com>

On Tue, Jul 4, 2023 at 5:12 AM Hongyu Wang wrote:
>
> Hi,
>
> For functions with different target attributes, the current logic
> refuses to inline the callee when any arch or tune is mismatched.
> Relax the condition to allow a callee with default arch/tune to be
> inlined.
>
> Bootstrapped/regtested on x86-64-linux-gnu{-m32,}.
>
> Ok for trunk?
>
> gcc/ChangeLog:
>
>         * config/i386/i386.cc (ix86_can_inline_p): If callee has
>         default arch=x86-64 and tune=generic, do not block the
>         inlining to its caller.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/inline_target_clones.c: New test.

OK.

In a follow-up patch, can you please document inlining rules involving
-march and -mtune in the "x86 Function Attributes" section?
Currently, the inlining rules at the end of the "target function
attribute" section do not even mention -march and -mtune. Maybe a
subsubsection "Inlining rules" should be added (like AArch64 has) to
mention that only default arch and tune are inlined by default (but
inlining can be forced with always_inline for different mtune flags).

Looking at the above, perhaps inlining of different arches can also be
forced with always_inline? This would allow developers some control of
inlining, and would not be surprising.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386.cc                       | 22 +++++++++++------
>  .../gcc.target/i386/inline_target_clones.c    | 24 +++++++++++++++++++
>  2 files changed, 39 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/inline_target_clones.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 8989985700a..4741c9b5364 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -605,13 +605,6 @@ ix86_can_inline_p (tree caller, tree callee)
>                != (callee_opts->x_target_flags & ~always_inline_safe_mask))
>      ret = false;
>
> -  /* See if arch, tune, etc. are the same.  */
> -  else if (caller_opts->arch != callee_opts->arch)
> -    ret = false;
> -
> -  else if (!always_inline && caller_opts->tune != callee_opts->tune)
> -    ret = false;
> -
>    else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath
>             /* If the calle doesn't use FP expressions differences in
>                ix86_fpmath can be ignored.  We are called from FEs
> @@ -622,6 +615,21 @@ ix86_can_inline_p (tree caller, tree callee)
>                 || ipa_fn_summaries->get (callee_node)->fp_expressions))
>      ret = false;
>
> +  /* At this point we cannot identify whether arch or tune setting
> +     comes from target attribute or not. So the most conservative way
> +     is to allow the callee that uses default arch and tune string to
> +     be inlined.  */
> +  else if (!strcmp (callee_opts->x_ix86_arch_string, "x86-64")
> +           && !strcmp (callee_opts->x_ix86_tune_string, "generic"))
> +    ret = true;
> +
> +  /* See if arch, tune, etc. are the same.  */
> +  else if (caller_opts->arch != callee_opts->arch)
> +    ret = false;
> +
> +  else if (!always_inline && caller_opts->tune != callee_opts->tune)
> +    ret = false;
> +
>    else if (!always_inline
>             && caller_opts->branch_cost != callee_opts->branch_cost)
>      ret = false;
> diff --git a/gcc/testsuite/gcc.target/i386/inline_target_clones.c b/gcc/testsuite/gcc.target/i386/inline_target_clones.c
> new file mode 100644
> index 00000000000..53db1600ce5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/inline_target_clones.c
> @@ -0,0 +1,24 @@
> +/* { dg-do compile } */
> +/* { dg-require-ifunc "" } */
> +/* { dg-options "-O3 -march=x86-64" } */
> +/* { dg-final { scan-assembler-not "call\[ \t\]+callee" } } */
> +
> +float callee (float a, float b, float c, float d,
> +              float e, float f, float g, float h)
> +{
> +  return a * b + c * d + e * f + g + h + a * c + b * c
> +    + a * d + b * e + a * f + c * h +
> +    b * (a - 0.4f) * (c + h) * (b + e * d) - a / f * h;
> +}
> +
> +__attribute__((target_clones("default","arch=icelake-server")))
> +void caller (int n, float *a,
> +             float c1, float c2, float c3,
> +             float c4, float c5, float c6,
> +             float c7)
> +{
> +  for (int i = 0; i < n; i++)
> +    {
> +      a[i] = callee (a[i], c1, c2, c3, c4, c5, c6, c7);
> +    }
> +}
> --
> 2.31.1
>
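
For readers following the thread, the order of the arch/tune checks
after the patch can be sketched as standalone C. This is only an
illustration: struct opts_sketch and arch_tune_allows_inline are
hypothetical names invented here, not GCC's real option structure or
hook, and every other check in ix86_can_inline_p (ISA flags, fpmath,
branch_cost and so on) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the few option fields the patch looks at;
   this is NOT GCC's real target option structure.  */
struct opts_sketch
{
  int arch;                 /* processor value selected by -march */
  int tune;                 /* processor value selected by -mtune */
  const char *arch_string;  /* e.g. "x86-64", "icelake-server" */
  const char *tune_string;  /* e.g. "generic" */
};

/* Arch/tune part of the inlining decision only, in the order the
   patch establishes.  */
static bool
arch_tune_allows_inline (const struct opts_sketch *caller,
                         const struct opts_sketch *callee,
                         bool always_inline)
{
  assert (caller && callee);

  /* A callee that uses the default arch and tune strings is accepted
     before any comparison with the caller.  */
  if (!strcmp (callee->arch_string, "x86-64")
      && !strcmp (callee->tune_string, "generic"))
    return true;

  /* Otherwise the arch must match exactly ...  */
  if (caller->arch != callee->arch)
    return false;

  /* ... and the tune must match unless the callee is always_inline.  */
  if (!always_inline && caller->tune != callee->tune)
    return false;

  return true;
}
```

In terms of the new test, the arch=icelake-server clone of caller is
the caller and the plain callee carries the default strings, so the
first branch returns true. A non-default callee with a different arch
is still rejected here even when always_inline is set, which is the
case the question about forcing inlining of different arches refers to.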