From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-488373-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 72543 invoked by alias); 26 Oct 2018 08:42:18 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 36550 invoked by uid 89); 26 Oct 2018 08:40:12 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-2.7 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.2 spammy=HX-Received:sk:h2-v6mr, CALL_EXPR_ARG, call_expr_arg
X-HELO: mail-lf1-f66.google.com
Received: from mail-lf1-f66.google.com (HELO mail-lf1-f66.google.com) (209.85.167.66) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 26 Oct 2018 08:40:10 +0000
Received: by mail-lf1-f66.google.com with SMTP id c16so297032lfj.8        for <gcc-patches@gcc.gnu.org>; Fri, 26 Oct 2018 01:40:10 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;        d=gmail.com; s=20161025;        h=mime-version:references:in-reply-to:from:date:message-id:subject:to         :cc;        bh=ULdcfAS58m4aRwK4zFPmaMH21gtGZqWHICLWQx3QDQQ=;        b=RI2zNEi8icSbDomnm5jXfcXSKhfIeRSq8Yb3amTQnVUdrr7S8uJzMC9FPx30SQS3iI         1lEPpgB/jtf3KectoRkOwNluJF2zN3tYtenlcR3d7CngCFtPOqB8z0hhvb2dWqwdGIXv         zaPY5dYyogyw+ctS83yh7IC/Yy7nk5p1/vH/r+NZpPjmb+c5X9QtK12bf8HlQlA8ql0A         pBk5A6PqqBg9nmFgLVSlR/FtyOKGfUa4PGVrn+J6aNHxJFgWrFooihBWF4HLRTdBt1Oq         clmmWY73h+EKadQVkDOWmlRhy39+W+t3f83wJijGaNblEXgT69Fk9pvLe4xt9l4FZSl3         dMmA==
MIME-Version: 1.0
References: <CAELXzTP6SOzBp-sZBk2TX0P_20gqyCM+yFOTmZjmkxrHDCZAtA@mail.gmail.com> <c57076c6-5e37-3522-3168-0b61028c2a35@redhat.com>
In-Reply-To: <c57076c6-5e37-3522-3168-0b61028c2a35@redhat.com>
From: Richard Biener <richard.guenther@gmail.com>
Date: Fri, 26 Oct 2018 09:18:00 -0000
Message-ID: <CAFiYyc1iic-871sYUOk_sOB1YC7y7YqW=UFwSYcJOa6X4Lkf+A@mail.gmail.com>
Subject: Re: [RFC][PR87528][PR86677] Disable builtin popcount detection when back-end does not define it
To: Jeff Law <law@redhat.com>, Richard Sandiford <rdsandiford@googlemail.com>
Cc: Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org>, 	GCC Patches <gcc-patches@gcc.gnu.org>
Content-Type: text/plain; charset="UTF-8"
X-IsSubscribed: yes
X-SW-Source: 2018-10/txt/msg01677.txt.bz2

On Fri, Oct 26, 2018 at 4:55 AM Jeff Law <law@redhat.com> wrote:
>
> On 10/25/18 4:33 PM, Kugan Vivekanandarajah wrote:
> > Hi,
> >
> > PR87528 showed a case where libgcc generated popcount is causing
> > regression for Skylake.
> > We also have PR86677 where kernel build is failing because the kernel
> > does not use the libgcc (when backend is not defining popcount
> > pattern).  While I agree that the kernel should implement its own
> > functionality when it is not using the libgcc, I am afraid that the
> > implementation can have the same performance issues reported for
> > Skylake in PR87528.
> >
> > Therefore, I would like to propose that we disable popcount detection
> > when we don't have a pattern for that. The attached patch (based on
> > previous discussions) does this.
> >
> > Bootstrapped and regression tested on x86_64-linux-gnu with no new
> > regressions. We need to disable the popcount* testcases. I will have
> > to define a effective_target_with_popcount in
> > gcc/testsuite/lib/target-supports.exp if this patch is OK?
> > Thanks,
> > Kugan
> >
> >
> > gcc/ChangeLog:
> >
> > 2018-10-25  Kugan Vivekanandarajah  <kuganv@linaro.org>
> >
> >     * tree-scalar-evolution.c (expression_expensive_p): Make BUILTIN POPCOUNT
> >     as expensive when backend does not define it.
> >
> >
> > gcc/testsuite/ChangeLog:
> >
> > 2018-10-25  Kugan Vivekanandarajah  <kuganv@linaro.org>
> >
> >     * gcc.target/aarch64/popcount4.c: New test.
> >
> FWIW, I've been disabling by checking direct_optab_handler elsewhere
> (number_of_iterations_popcount) in my tester.  It may in fact be an old
> patch from you.
>
> Richi argued that it's the kernel team's responsibility to provide a
> popcount since they don't link with libgcc.  And I'm generally in
> agreement with that position, though it does tend to generate some
> friction with the kernel developers.  We also run the real risk of GCC 9
> not being able to build the kernel which, IMHO, would be a disaster from
> a PR standpoint.
>
> I'd like to hear from others here.  I fully realize we're beyond the
> realm of what is strictly technically correct here from a review standpoint.

As said final value replacement to a library call is probably not wanted
for optimization purpose, so adjusting expression_expensive_p is OK with
me.  It might not fully solve the (non-)issue in case another optimization pass
chooses to materialize niter computation result.

Few comments on the patch:

+      tree fndecl = get_callee_fndecl (expr);
+
+      if (fndecl && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
+       {
+         combined_fn cfn = as_combined_fn (DECL_FUNCTION_CODE (fndecl));

  combined_fn cfn = gimple_call_combined_fn (expr);
  switch (cfn)
    {
...

cfn will be CFN_LAST for a non-builtin/internal call.  I know Richard is mostly
offline but eventually he knows whether there is a better way to query

+           CASE_CFN_POPCOUNT:
+             /* Check if opcode for popcount is available.  */
+             if (optab_handler (popcount_optab,
+                                TYPE_MODE (TREE_TYPE (CALL_EXPR_ARG
(expr, 0))))
+                 == CODE_FOR_nothing)
+               return true;

note that we currently generate builtin calls rather than IFN calls
(when a direct
optab is supported).

Another comment on the patch is that you probably have to adjust existing
popcount testcases to add architecture specific flags enabling suport for
the instructions, otherwise you won't see loop replacement.

Also I think that the expression is only expensive (for final value
replacement!)
if you consider optimizing for speed.  When optimizing for size getting rid of
the loop is probably beneificial unconditionally.  That would leave the
possibility to switch said testcases to -Os.  It would require adding a
bool size_p flag to expression_expensive and passing down
optimize_loop_for_size_p ().

_NOTE_ that expression_expensive_p is also used by IVOPTs and there
replacing sth with an expression based on the niter analysis result doesn't
mean we get rid of the loop (but only of an IV), so maybe that reasoning
doesn't apply there.

Richard.

> Jeff
>