From: Uros Bizjak
Date: Wed, 28 Jun 2023 10:39:25 +0200
Subject: Re: [PATCH] i386: Relax inline requirement for functions with different target attrs
To: Hongyu Wang
Cc: Hongyu Wang, gcc-patches@gcc.gnu.org, hongtao.liu@intel.com
References: <20230626023408.33758-1-hongyu.wang@intel.com>

On Wed, Jun 28, 2023 at 10:20 AM Hongyu Wang wrote:
>
> > If the user specified a different arch for callee than the caller,
> > then the compiler will switch on different ISAs (-march is just a
> > shortcut for different ISA packs), and the programmer is aware that
> > inlining isn't intended here (we have -mtune, which is not as strong
> > as -march, but even functions with different -mtune are not inlined
> > without always_inline attribute).
> > This is documented as:
>
> The original issue comes from a case like
>
> float callee (float a, float b, float c, float d,
>               float e, float f, float g, float h)
> {
>   return a * b + c * d + e * f + g + h + a * c + b * c
>     + a * d + b * e + a * f + c * h +
>     b * (a - 0.4f) * (c + h) * (b + e * d) - a / f * h;
> }
>
> __attribute__((target_clones("default","arch=icelake-server")))
> void caller (int n, float *a,
>              float c1, float c2, float c3,
>              float c4, float c5, float c6,
>              float c7)
> {
>   for (int i = 0; i < n; i++)
>     {
>       a[i] = callee (a[i], c1, c2, c3, c4, c5, c6, c7);
>     }
> }
>
> For current gcc, the .icelake_server clone fails to inline callee due
> to target specific option mismatch, while the .default clone
> succeeds and the loop gets vectorized. I think it is not reasonable
> that the specific clone with higher arch cannot produce better code.
> So I think at least we can decide to inline those callees without any
> arch/tune specified, but for now they are rejected by the strict arch=
> and tune= check.

Yes, I think it is reasonable to inline a callee without an arch/tune
specified. We expect a "default" callee to have properties that allow
inlining it into all callers, independent of the caller's arch/tune
target attribute.

Uros.

>
> On Wed, Jun 28, 2023 at 14:43, Uros Bizjak wrote:
> >
> > On Wed, Jun 28, 2023 at 3:56 AM Hongyu Wang wrote:
> > >
> > > > I don't think this is desirable. If we inline something with different
> > > > ISAs, we get some strange mix of ISAs when the function is inlined.
> > > > OTOH - we already inline with mismatched tune flags if the function is
> > > > marked with always_inline.
> > >
> > > Previously ix86_can_inline_p has
> > >
> > > if (((caller_opts->x_ix86_isa_flags & callee_opts->x_ix86_isa_flags)
> > >      != callee_opts->x_ix86_isa_flags)
> > >     || ((caller_opts->x_ix86_isa_flags2 & callee_opts->x_ix86_isa_flags2)
> > >         != callee_opts->x_ix86_isa_flags2))
> > >   ret = false;
> > >
> > > It makes sure the caller ISA is a superset of the callee's, and the
> > > inlined one should follow the caller's ISA specification.
> > >
> > > IMHO I cannot give a real example where the caller's performance
> > > gets harmed after inlining. I added PVW since there might
> > > be some callee that wants to limit its vector size while the caller
> > > has a larger preferred vector size. At least with the current change
> > > we get more optimization opportunity for different target_clones.
> > >
> > > But I agree the tuning setting may be a factor that affects the
> > > performance. One possible choice is that if the
> > > tune for callee is unspecified or default, just inline it to the
> > > caller with specified arch and tune.
> >
> > If the user specified a different arch for callee than the caller,
> > then the compiler will switch on different ISAs (-march is just a
> > shortcut for different ISA packs), and the programmer is aware that
> > inlining isn't intended here (we have -mtune, which is not as strong
> > as -march, but even functions with different -mtune are not inlined
> > without always_inline attribute). This is documented as:
> >
> > --q--
> > On the x86, the inliner does not inline a function that has different
> > target options than the caller, unless the callee has a subset of the
> > target options of the caller. For example a function declared with
> > target("sse3") can inline a function with target("sse2"), since -msse3
> > implies -msse2.
> > --/q--
> >
> > I don't think arch=skylake can be considered as a subset of arch=icelake-server.
> >
> > I agree that the compiler should reject functions with different PVW.
> > This is also in accordance with the documentation.
> >
> > Uros.
> >
> > >
> > > On Tue, Jun 27, 2023 at 17:16, Uros Bizjak via Gcc-patches wrote:
> > > >
> > > > On Mon, Jun 26, 2023 at 4:36 AM Hongyu Wang wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > For functions with different target attributes, current logic refuses to
> > > > > inline the callee when any arch or tune is mismatched. Relax the
> > > > > condition to honor just prefer_vector_width_type and other flags that
> > > > > may cause safety issues so the caller can get more optimization opportunity.
> > > >
> > > > I don't think this is desirable. If we inline something with different
> > > > ISAs, we get some strange mix of ISAs when the function is inlined.
> > > > OTOH - we already inline with mismatched tune flags if the function is
> > > > marked with always_inline.
> > > >
> > > > Uros.
> > > >
> > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,}
> > > > >
> > > > > Ok for trunk?
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > >         * config/i386/i386.cc (ix86_can_inline_p): Do not check arch or
> > > > >         tune directly, just check prefer_vector_width_type and make sure
> > > > >         not to inline if they mismatch.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > >         * gcc.target/i386/inline-target-attr.c: New test.
> > > > > ---
> > > > >  gcc/config/i386/i386.cc                       | 11 +++++----
> > > > >  .../gcc.target/i386/inline-target-attr.c      | 24 +++++++++++++++++++
> > > > >  2 files changed, 30 insertions(+), 5 deletions(-)
> > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/inline-target-attr.c
> > > > >
> > > > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > > > > index 0761965344b..1d86384ac06 100644
> > > > > --- a/gcc/config/i386/i386.cc
> > > > > +++ b/gcc/config/i386/i386.cc
> > > > > @@ -605,11 +605,12 @@ ix86_can_inline_p (tree caller, tree callee)
> > > > >         != (callee_opts->x_target_flags & ~always_inline_safe_mask))
> > > > >      ret = false;
> > > > >
> > > > > -  /* See if arch, tune, etc. are the same.  */
> > > > > -  else if (caller_opts->arch != callee_opts->arch)
> > > > > -    ret = false;
> > > > > -
> > > > > -  else if (!always_inline && caller_opts->tune != callee_opts->tune)
> > > > > +  /* Do not inline when specified perfer-vector-width mismatched between
> > > > > +     callee and caller.  */
> > > > > +  else if ((callee_opts->x_prefer_vector_width_type != PVW_NONE
> > > > > +           && caller_opts->x_prefer_vector_width_type != PVW_NONE)
> > > > > +          && callee_opts->x_prefer_vector_width_type
> > > > > +             != caller_opts->x_prefer_vector_width_type)
> > > > >      ret = false;
> > > > >
> > > > >    else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/inline-target-attr.c b/gcc/testsuite/gcc.target/i386/inline-target-attr.c
> > > > > new file mode 100644
> > > > > index 00000000000..995502165f0
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/inline-target-attr.c
> > > > > @@ -0,0 +1,24 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-options "-O2" } */
> > > > > +/* { dg-final { scan-assembler-not "call\[ \t\]callee" } } */
> > > > > +
> > > > > +__attribute__((target("arch=skylake")))
> > > > > +int callee (int n)
> > > > > +{
> > > > > +  int sum = 0;
> > > > > +  for (int i = 0; i < n; i++)
> > > > > +    {
> > > > > +      if (i % 2 == 0)
> > > > > +       sum +=i;
> > > > > +      else
> > > > > +       sum += (i - 1);
> > > > > +    }
> > > > > +  return sum + n;
> > > > > +}
> > > > > +
> > > > > +__attribute__((target("arch=icelake-server")))
> > > > > +int caller (int n)
> > > > > +{
> > > > > +  return callee (n) + n;
> > > > > +}
> > > > > +
> > > > > --
> > > > > 2.31.1
> > > > >
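
For reference, the rule the thread seems to converge on can be modelled
in a few lines of standalone C. This is only a rough sketch under stated
assumptions, not the code in i386.cc: struct target_opts, its fields,
the PVW_128/PVW_256/PVW_512 enumerators and can_inline() are
hypothetical names made up for illustration (only PVW_NONE and the ISA
subset test appear in the thread), and the always_inline handling of
tune mismatches is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of per-function target options.
   These are NOT GCC's real data structures.  */
enum pvw { PVW_NONE, PVW_128, PVW_256, PVW_512 };

struct target_opts
{
  uint64_t isa_flags;            /* one bit per enabled ISA extension */
  bool arch_specified;           /* was arch= given explicitly?  */
  int arch;                      /* opaque arch id, valid if arch_specified */
  bool tune_specified;           /* was tune= given explicitly?  */
  int tune;                      /* opaque tune id, valid if tune_specified */
  enum pvw prefer_vector_width;  /* PVW_NONE means "not specified" */
};

/* Return true if CALLEE may be inlined into CALLER under the rule
   sketched in the thread.  */
bool
can_inline (const struct target_opts *caller,
            const struct target_opts *callee)
{
  /* The callee's ISA set must be a subset of the caller's.  */
  if ((caller->isa_flags & callee->isa_flags) != callee->isa_flags)
    return false;

  /* Reject when both sides specify a preferred vector width and the
     two widths differ.  */
  if (callee->prefer_vector_width != PVW_NONE
      && caller->prefer_vector_width != PVW_NONE
      && callee->prefer_vector_width != caller->prefer_vector_width)
    return false;

  /* A callee with an explicit arch or tune must match the caller;
     a "default" callee (nothing specified) passes these checks.  */
  if (callee->arch_specified
      && (!caller->arch_specified || caller->arch != callee->arch))
    return false;
  if (callee->tune_specified
      && (!caller->tune_specified || caller->tune != callee->tune))
    return false;

  return true;
}
```

With this model, a callee that specifies no arch or tune is accepted by
a caller with any arch as long as the ISA subset and PVW checks pass,
while a callee with its own explicit arch is rejected by a caller with a
different arch even when the callee's ISA bits are a subset of the
caller's.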