From: "H.J. Lu"
Date: Wed, 19 Jun 2024 04:37:49 +0800
Subject: Re: [PATCH 6/6] x86: optimize {,V}PEXTR{D,Q} with immediate of 0
To: "Jiang, Haochen"
Cc: "Beulich, Jan", "Cui, Lili", Binutils

-Os should optimize for code size. Other optimizations should take
performance into account.

On Tue, Jun 18, 2024, 2:23 PM Jiang, Haochen wrote:

> > >> Wait. While the compiler may use PSRLDQ here, based on knowing assumptions
> > >> made elsewhere, the assembler can't: The replacement insn must generate the
> > >> exact same result in the destination register. PSRLDQ with an immediate of
> > >> 0 (which effectively you're suggesting to use here) doesn't alter the
> > >> destination register at all, though. When really we want the upper bits of
> > >> the register cleared.
> > >
> > > pextrd/q also doesn't clear them at all. For vpextrd/q and vpsrldq, they will
> > > both clear higher bits. So they will be the same.
> >
> > Wait - your suggestion is even more confusing: The destination of PSRLDQ is
> > an XMM register, whereas the destination of PEXTR* is a GPR or memory. This
> > is properly expressed in the constraints in the compiler, but clearly we
> > can't replace insns like this in the assembler.
>
> Yes, I realized that I am wrong here, there are no constraints. vmovd/q would be
> definitely better and doable here if we would like to do something.
>
> > >>> Also, I suppose the optimization related to latency should not be done in
> > >>> assembler.
> > >>
> > >> Why? We have -O, -O1, and -O2 alongside -Os for a reason.
> > >
> > > I am quite conservative on the optimization in assembler. If we are also going to
> > > optimize hand-written code, the optimization could work.
> > >
> > > However, when they hand write some code, are we supposed to change it?
> >
> > Well, if we aren't to, people simply don't pass -O.
> >
> > > For -Os, we could give them all the optimizations we have, but for -O, I am not
> > > that sure.
> > >
> > > And I suppose we might add too much burden for the assembler if we are going
> > > to add too many optimizations related to latency. It will become another compiler.
> > > Are we supposed to copy all the optimizations from compiler?
> >
> > Probably not all (and many aren't at the insn level anyway, nor do we - so
> > far at least - optimize for latency/throughput at the expense of code size).
> > But yes - this specific aspect is why I keep raising questions on what
> > optimizations are worth it vs where we'd better leave code alone.
>
> H.J., what is your opinion on that?
>
> Thx,
> Haochen
>
> >
> > Jan
> >
> > > IMO, optimization to
> > > codesize is ok, but for latency, I am a little concerned.
> > >
> > > Thx,
> > > Haochen
> > >
> > >>
> > >> Jan
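
To make the equivalence argument in the quoted thread concrete, here is a minimal Python sketch that models a 128-bit XMM register as a plain integer. The helper names (pextrd, pextrq, movd, movq, psrldq) are hypothetical models written only for this illustration, not binutils code. Treating MOVD/MOVQ as the replacement is an assumption based on the vmovd/q suggestion quoted above, and the byte strings are the legacy SSE encodings of the register forms with %xmm0 and %eax only.

```python
# Hypothetical model: an XMM register is a 128-bit unsigned integer.
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def pextrd(xmm, imm8):
    """Model of PEXTRD: copy dword number (imm8 & 3) to a GPR."""
    return (xmm >> (32 * (imm8 & 3))) & MASK32


def pextrq(xmm, imm8):
    """Model of PEXTRQ: copy qword number (imm8 & 1) to a GPR."""
    return (xmm >> (64 * (imm8 & 1))) & MASK64


def movd(xmm):
    """Model of MOVD xmm -> r32: copy the low dword to a GPR."""
    return xmm & MASK32


def movq(xmm):
    """Model of MOVQ xmm -> r64: copy the low qword to a GPR."""
    return xmm & MASK64


def psrldq(xmm, imm8):
    """Model of PSRLDQ: byte-wise right shift; the result stays in an XMM register."""
    return (xmm >> (8 * min(imm8, 16))) & MASK128


# Legacy SSE encodings of the two register forms (assumed operands: %xmm0, %eax).
ENC_PEXTRD = bytes.fromhex("660f3a16c000")  # pextrd $0, %xmm0, %eax
ENC_MOVD = bytes.fromhex("660f7ec0")        # movd %xmm0, %eax

XMM = 0x00112233445566778899AABBCCDDEEFF

# With an immediate of 0, the extract forms give the same GPR value as MOVD/MOVQ.
assert pextrd(XMM, 0) == movd(XMM)
assert pextrq(XMM, 0) == movq(XMM)

# PSRLDQ with an immediate of 0 leaves the full 128-bit value untouched,
# so it neither clears upper bits nor targets a GPR.
assert psrldq(XMM, 0) == XMM

print(len(ENC_PEXTRD) - len(ENC_MOVD), "bytes saved per instruction")
```

In this model the immediate-0 extract and the plain move produce the same GPR value while the move encoding is shorter, which is the code-size case for -Os. PSRLDQ with an immediate of 0 is not a candidate replacement, because it changes nothing and its destination is an XMM register.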