Date: Fri, 1 Jul 2022 17:07:21 -0600
Subject: Re: [PATCH]middle-end Add optimized float addsub without needing VEC_PERM_EXPR.
To: gcc-patches@gcc.gnu.org
From: Jeff Law

On 6/20/2022 5:56 AM, Richard Biener via Gcc-patches wrote:
>
>
> Note one option would be to emit a multiply with { 1, -1, 1, -1 } on
> GIMPLE where then targets could opt-in to handle this via a DFmode
> negate via a combine pattern?  Not sure if this can be even done
> starting from the vec-perm RTL IL.
FWIW, FP multiply is the same cost as FP add/sub on our target.

>
> I fear whether (neg:V2DF (subreg:V2DF (reg:V4SF))) is a good idea
> will heavily depend on the target CPU (not only the ISA).  For RISC-V
> for example I think the DF lanes do not overlap with two SF lanes
> (so same with gcn I think).
Absolutely.  I've regularly seen the introduction of subregs like that ultimately result in the SUBREG_REG object getting dumped into memory rather than being allocated to a register.  It could well be a problem with our port; I haven't started chasing it down yet.

One such case where that came up recently was the addition of something like this to simplify-rtx.
Basically in some cases we can turn a VEC_SELECT into a SUBREG, so I had this little hack in simplify-rtx that I was playing with:

> +      /* If we have a VEC_SELECT of a SUBREG try to change the SUBREG so
> +         that we eliminate the VEC_SELECT.  */
> +      if (GET_CODE (op0) == SUBREG
> +          && subreg_lowpart_p (op0)
> +          && VECTOR_MODE_P (GET_MODE (op0))
> +          && GET_MODE_INNER (GET_MODE (op0)) == mode
> +          && XVECLEN (trueop1, 0) == 1
> +          && CONST_INT_P (XVECEXP (trueop1, 0, 0)))
> +        {
> +          return simplify_gen_subreg (mode, SUBREG_REG (op0),
> +                                      GET_MODE (SUBREG_REG (op0)),
> +                                      INTVAL (XVECEXP (trueop1, 0, 0)) * 8);
> +        }

Seemed like a no-brainer win, but in reality it made things worse pretty consistently.

jeff
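
For anyone following along, here is a minimal C sketch of why the (neg:V2DF (subreg:V2DF (reg:V4SF))) form discussed above has the same effect as the { 1, -1, 1, -1 } multiply. It is not from the patch, and the function names are made up for illustration. It assumes IEEE-754 binary32/binary64 and a little-endian lane layout, so that the sign bit of each 64-bit chunk is also the sign bit of the odd SF lane. NaN inputs are left out of consideration, since the sign of a NaN product is not pinned down.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reference form: multiply the four SF lanes by { 1, -1, 1, -1 }.  */
static void
negate_odd_lanes_mul (const float in[4], float out[4])
{
  static const float signs[4] = { 1.0f, -1.0f, 1.0f, -1.0f };
  for (int i = 0; i < 4; i++)
    out[i] = in[i] * signs[i];
}

/* Subreg-style form: view the four SF lanes as two 64-bit chunks and
   flip bit 63 of each, which is what negating a DF value does to its
   bit pattern.  ASSUMPTION: little-endian layout, so bit 63 of each
   chunk is the sign bit of the odd-numbered SF lane.  */
static void
negate_odd_lanes_subreg (const float in[4], float out[4])
{
  uint64_t wide[2];

  /* Two SF lanes must exactly fill one 64-bit chunk.  */
  assert (sizeof (uint64_t) == 2 * sizeof (float));

  memcpy (wide, in, sizeof wide);
  for (int i = 0; i < 2; i++)
    wide[i] ^= UINT64_C (1) << 63;
  memcpy (out, wide, sizeof wide);
}
```

Feeding the same non-NaN input to both functions should give bit-identical results on such a target. Whether the second form is actually cheaper is a separate question and depends on how the target lays out and allocates the DF and SF lanes, which is the concern raised in the thread.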