From: Hongyu Wang
Date: Wed, 28 Jun 2023 09:49:38 +0800
Subject: Re: [PATCH] i386: Relax inline requirement for functions with different target attrs
To: Uros Bizjak
Cc: Hongyu Wang, gcc-patches@gcc.gnu.org, hongtao.liu@intel.com
References: <20230626023408.33758-1-hongyu.wang@intel.com>

> I don't think this is desirable. If we inline something with different
> ISAs, we get some strange mix of ISAs when the function is inlined.
> OTOH - we already inline with mismatched tune flags if the function is
> marked with always_inline.

ix86_can_inline_p already has:

  if (((caller_opts->x_ix86_isa_flags & callee_opts->x_ix86_isa_flags)
       != callee_opts->x_ix86_isa_flags)
      || ((caller_opts->x_ix86_isa_flags2 & callee_opts->x_ix86_isa_flags2)
          != callee_opts->x_ix86_isa_flags2))
    ret = false;

It makes sure the caller's ISA is a superset of the callee's, so the
inlined code follows the caller's ISA specification.
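
To make the superset rule concrete, here is a minimal sketch of the same
mask-and-compare test in isolation. The ISA_* bits and the helper name
isa_is_superset are hypothetical stand-ins chosen for illustration; they are
not GCC's real flag definitions, and only the shape of the comparison mirrors
the condition quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ISA bits for illustration only; GCC's real masks differ.  */
#define ISA_SSE2    (UINT64_C (1) << 0)
#define ISA_AVX2    (UINT64_C (1) << 1)
#define ISA_AVX512F (UINT64_C (1) << 2)

/* Return true when every ISA bit set in CALLEE is also set in CALLER,
   i.e. the caller's ISA set is a superset of the callee's.  */
static bool
isa_is_superset (uint64_t caller, uint64_t callee)
{
  return (caller & callee) == callee;
}
```

Under these assumed bits, a caller with ISA_SSE2 | ISA_AVX2 passes the test
for an ISA_SSE2-only callee, while the reverse direction fails, so the
callee is never inlined into a caller that lacks one of its ISAs.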

IMHO I cannot give a real example where the caller's performance is harmed
after inlining. I added the PVW check since some callees may want to limit
their vector size while the caller has a larger preferred vector size. At
least with the current change we get more optimization opportunities for
different target_clones.

But I agree that the tuning setting may be a factor that affects
performance. One possible choice: if the tune for the callee is unspecified
or default, just inline it into a caller that has a specified arch and tune.

Uros Bizjak via Gcc-patches wrote on Tue, Jun 27, 2023 at 17:16:
>
> On Mon, Jun 26, 2023 at 4:36 AM Hongyu Wang wrote:
> >
> > Hi,
> >
> > For function with different target attributes, current logic rejects to
> > inline the callee when any arch or tune is mismatched. Relax the
> > condition to honor just prefer_vector_width_type and other flags that
> > may cause safety issue so caller can get more optimization opportunity.
>
> I don't think this is desirable. If we inline something with different
> ISAs, we get some strange mix of ISAs when the function is inlined.
> OTOH - we already inline with mismatched tune flags if the function is
> marked with always_inline.
>
> Uros.
>
> > Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,}
> >
> > Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> >         * config/i386/i386.cc (ix86_can_inline_p): Do not check arch or
> >         tune directly, just check prefer_vector_width_type and make sure
> >         not to inline if they mismatch.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.target/i386/inline-target-attr.c: New test.
> > ---
> >  gcc/config/i386/i386.cc                       | 11 +++++----
> >  .../gcc.target/i386/inline-target-attr.c      | 24 +++++++++++++++++++
> >  2 files changed, 30 insertions(+), 5 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/inline-target-attr.c
> >
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index 0761965344b..1d86384ac06 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -605,11 +605,12 @@ ix86_can_inline_p (tree caller, tree callee)
> >            != (callee_opts->x_target_flags & ~always_inline_safe_mask))
> >      ret = false;
> >
> > -  /* See if arch, tune, etc. are the same.  */
> > -  else if (caller_opts->arch != callee_opts->arch)
> > -    ret = false;
> > -
> > -  else if (!always_inline && caller_opts->tune != callee_opts->tune)
> > +  /* Do not inline when specified prefer-vector-width mismatched between
> > +     callee and caller.  */
> > +  else if ((callee_opts->x_prefer_vector_width_type != PVW_NONE
> > +            && caller_opts->x_prefer_vector_width_type != PVW_NONE)
> > +           && callee_opts->x_prefer_vector_width_type
> > +              != caller_opts->x_prefer_vector_width_type)
> >      ret = false;
> >
> >    else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath
> > diff --git a/gcc/testsuite/gcc.target/i386/inline-target-attr.c b/gcc/testsuite/gcc.target/i386/inline-target-attr.c
> > new file mode 100644
> > index 00000000000..995502165f0
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/inline-target-attr.c
> > @@ -0,0 +1,24 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2" } */
> > +/* { dg-final { scan-assembler-not "call\[ \t\]callee" } } */
> > +
> > +__attribute__((target("arch=skylake")))
> > +int callee (int n)
> > +{
> > +  int sum = 0;
> > +  for (int i = 0; i < n; i++)
> > +    {
> > +      if (i % 2 == 0)
> > +        sum += i;
> > +      else
> > +        sum += (i - 1);
> > +    }
> > +  return sum + n;
> > +}
> > +
> > +__attribute__((target("arch=icelake-server")))
> > +int caller (int n)
> > +{
> > +  return callee (n) + n;
> > +}
> > +
> > --
> > 2.31.1
> >
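
For reference, the prefer-vector-width condition from the quoted patch can
be read in isolation roughly as follows. This is only a sketch under stated
assumptions: PVW_NONE is the one enumerator that appears in the patch, the
other enumerators are assumed here for illustration, and the helper name
pvw_allows_inline is hypothetical rather than anything in i386.cc.

```c
#include <stdbool.h>

/* Stand-in for the prefer-vector-width setting.  Only PVW_NONE appears in
   the patch; the remaining enumerators are assumed for illustration.  */
enum prefer_vector_width
{
  PVW_NONE,
  PVW_AVX128,
  PVW_AVX256,
  PVW_AVX512
};

/* Hypothetical helper: reject inlining only when both caller and callee
   set an explicit width and the two widths differ.  */
static bool
pvw_allows_inline (enum prefer_vector_width caller,
                   enum prefer_vector_width callee)
{
  if (caller != PVW_NONE && callee != PVW_NONE && caller != callee)
    return false;
  return true;
}
```

With this reading, a callee that leaves the width unspecified is never
blocked by this particular test, and neither is a caller that leaves it
unspecified; only two explicit, conflicting settings prevent inlining.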