From: Uros Bizjak
Date: Thu, 01 Aug 2019 09:38:00 -0000
Subject: Re: [PATCH][RFC][x86] Fix PR91154, add SImode smax, allow SImode add in SSE regs
To: Richard Biener
Cc: Martin Jambor, gcc-patches@gcc.gnu.org, Jakub Jelinek, Vladimir Makarov

On Thu, Aug 1, 2019 at 11:28 AM Richard Biener wrote:
>
> > > So you unconditionally add a smaxdi3 pattern - indeed this looks
> > > necessary even when going the STV route.  The actual regression
> > > for the testcase could also be solved by turning the smaxsi3
> > > back into a compare and jump rather than a conditional move sequence.
> > > So I wonder how you'd do that, given that there's pass_if_after_reload
> > > after pass_split_after_reload, and I'm not sure we can split
> > > as late as pass_split_before_sched2 (there's also a split _after_
> > > sched2 on x86, it seems).
> > >
> > > So how would you go about implementing {s,u}{min,max}{si,di}3 for the
> > > case where STV doesn't end up doing any transform?
> >
> > If STV doesn't transform the insn, then a pre-reload splitter splits
> > the insn back to compare+cmove.
>
> OK, that would work.  But there's no way to force a jumpy sequence then,
> which we know is faster than compare+cmove, because later RTL
> if-conversion passes happily re-discover the smax (or conditional move)
> sequence.
>
> > However, considering that the SImode move
> > from/to an int/xmm register is relatively cheap, the cost function should
> > be tuned so that STV always converts the smaxsi3 pattern.
>
> Note that on both Zen and even more so bdverN the int/xmm transition
> makes it not just unprofitable but a _lot_ slower than the cmp/cmov
> sequence (for the loop in hmmer, which is the only one where I see
> any effect from any of my patches).  So identifying chains that
> start/end in memory is important for cost reasons.

Please note that the cost function also considers the cost of moves
from/to xmm registers, so the cost of the whole chain would disable
the transformation.
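
For illustration, the pre-reload splitter I mention above would look
roughly like the following (a sketch only, not the actual pattern; the
insn condition, the omitted constraints and the use of
ix86_expand_compare to emit the flags-setting compare are placeholders
here):

(define_insn_and_split "smaxsi3"
  [(set (match_operand:SI 0 "register_operand")
        (smax:SI (match_operand:SI 1 "register_operand")
                 (match_operand:SI 2 "register_operand")))
   (clobber (reg:CC FLAGS_REG))]
  "TARGET_STV && TARGET_SSE4_1 && can_create_pseudo_p ()"
  ;; Never output as-is: either STV converts the whole chain to
  ;; vector pmaxsd, or the pre-reload split pass takes it apart again.
  "#"
  "&& 1"
  [(set (match_dup 0)
        (if_then_else:SI (match_dup 3)
                         (match_dup 1)
                         (match_dup 2)))]
{
  /* Emit the flags-setting compare; operands[3] becomes the GT test
     on the flags register that the existing cmove patterns expect.  */
  operands[3] = ix86_expand_compare (GT, operands[1], operands[2]);
})

If STV converts the chain, the insn never reaches the splitter;
otherwise split1 takes it apart before register allocation and we are
back to the usual cmp+cmov sequence.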
> So I think the splitting has to happen after the last if-conversion
> pass (and thus we may need to allocate a scratch register for this
> purpose?)

I really hope that the underlying issue will be solved by a
machine-dependent pass inserted somewhere after the pre-reload split.
This way, we can split the unconverted smax to a cmove, and this later
pass would handle jcc and cmove instructions.

Until then... yes, your proposed approach is one of the ways to avoid
unwanted if-conversion, although sometimes we would like to split to a
cmove instead.

Uros.