From mboxrd@z Thu Jan  1 00:00:00 1970
From: Hongtao Liu
Date: Fri, 10 May 2024 15:56:53 +0800
Subject: Re: [x86 PATCH] Improve V[48]QI shifts on AVX512
To: Roger Sayle
Cc: gcc-patches@gcc.gnu.org, Hongtao Liu, Uros Bizjak
References: <009601daa25f$f2a73c50$d7f5b4f0$@nextmovesoftware.com> <004701daa2ad$6b78c610$426a5230$@nextmovesoftware.com>
In-Reply-To: <004701daa2ad$6b78c610$426a5230$@nextmovesoftware.com>
Content-Type: text/plain; charset="UTF-8"

On Fri, May 10, 2024 at 3:41 PM Roger Sayle wrote:
>
>
> Many thanks for the speedy review and correction/improvement.
> It's interesting that you spotted the ternlog "spill"...
> I have a patch that rewrites ternlog handling that's been
> waiting for stage 1, that would also fix this mem operand
> issue.  I hope to submit it for review this weekend.
I opened a PR for that.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115021

>
> Thanks again,
> Roger
>
> > From: Hongtao Liu
> > On Fri, May 10, 2024 at 6:26 AM Roger Sayle
> > wrote:
> > >
> > >
> > > The following one line patch improves the code generated for V8QI and
> > > V4QI shifts when AVX512BW and AVX512VL functionality is available.
> > +      /* With AVX512 it's cheaper to do vpmovsxbw/op/vpmovwb.  */
> > +      && !(TARGET_AVX512BW && TARGET_AVX512VL && TARGET_SSE4_1)
> >        && ix86_expand_vec_shift_qihi_constant (code, qdest, qop1, qop2))
> > I think TARGET_SSE4_1 alone is enough: with SSE4.1 and above it's
> > always better not to go into ix86_expand_vec_shift_qihi_constant.
> > Others LGTM.
> > >
> > > For the testcase (from gcc.target/i386/vect-shiftv8qi.c):
> > >
> > > typedef signed char v8qi __attribute__ ((__vector_size__ (8)));
> > > v8qi foo (v8qi x) { return x >> 5; }
> > >
> > > GCC with -O2 -march=cascadelake currently generates:
> > >
> > > foo:    movl    $67372036, %eax
> > >         vpsraw  $5, %xmm0, %xmm2
> > >         vpbroadcastd    %eax, %xmm1
> > >         movl    $117901063, %eax
> > >         vpbroadcastd    %eax, %xmm3
> > >         vmovdqa %xmm1, %xmm0
> > >         vmovdqa %xmm3, -24(%rsp)
> > >         vpternlogd      $120, -24(%rsp), %xmm2, %xmm0
> > It looks like a missed optimization under AVX512, but it's a separate
> > issue.
> > >         vpsubb  %xmm1, %xmm0, %xmm0
> > >         ret
> > >
> > > with this patch we now generate the much improved:
> > >
> > > foo:    vpmovsxbw       %xmm0, %xmm0
> > >         vpsraw  $5, %xmm0, %xmm0
> > >         vpmovwb %xmm0, %xmm0
> > >         ret
> > >
> > > This patch also fixes the FAILs of gcc.target/i386/vect-shiftv[48]qi.c
> > > when run with the additional -march=cascadelake flag, by splitting
> > > these tests into two; one form testing code generation with -msse2
> > > (and -mno-avx512vl) as originally intended, and the other testing
> > > AVX512 code generation with an explicit -march=cascadelake.
> > >
> > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > > and make -k check, both with and without --target_board=unix{-m32}
> > > with no new failures.  Ok for mainline?
> > >
> > >
> > > 2024-05-09  Roger Sayle
> > >
> > > gcc/ChangeLog
> > >         * config/i386/i386-expand.cc (ix86_expand_vecop_qihi_partial):
> > >         Don't attempt ix86_expand_vec_shift_qihi_constant on AVX512.
> > >
> > > gcc/testsuite/ChangeLog
> > >         * gcc.target/i386/vect-shiftv4qi.c: Specify -mno-avx512vl.
> > >         * gcc.target/i386/vect-shiftv8qi.c: Likewise.
> > >         * gcc.target/i386/vect-shiftv4qi-2.c: New test case.
> > >         * gcc.target/i386/vect-shiftv8qi-2.c: Likewise.
> > >
> > >
> > > Thanks in advance,
> > > Roger
> > > --
> > >
> > --
> > BR,
> > Hongtao
>
-- 
BR,
Hongtao
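
For anyone comparing the two sequences quoted in this thread, here is a
minimal scalar sketch in C of what each one appears to compute per 16-bit
lane for "x >> 5". It is only a model under stated assumptions, not GCC
code: the helper names (sra16, as_signed, shift_widen, shift_masked,
check_all) are made up for illustration, and reading vpternlogd $120 as
A ^ (B & C) with A = the 0x04 mask, B = the shifted words and C = the
0x07 mask is my interpretation of the immediate. The two movl constants
are simply 0x04040404 and 0x07070707.

```c
#include <assert.h>
#include <stdint.h>

/* The immediates in the quoted assembly are byte-replicated masks.  */
_Static_assert (67372036 == 0x04040404, "sign bit of a 3-bit field");
_Static_assert (117901063 == 0x07070707, "low 3 bits of each byte");

/* Arithmetic right shift of one 16-bit lane (modelling vpsraw), done on
   unsigned values to avoid implementation-defined C behaviour.
   Valid for 0 < n < 16.  */
static uint16_t sra16 (uint16_t w, int n)
{
  uint16_t r = (uint16_t) (w >> n);
  if (w & 0x8000u)
    r |= (uint16_t) (0xFFFFu << (16 - n));
  return r;
}

/* Reinterpret a byte as a signed value in [-128, 127].  */
static int as_signed (uint8_t b)
{
  return b < 128 ? b : b - 256;
}

/* Model of the new sequence for one byte.  */
static uint8_t shift_widen (uint8_t x)
{
  uint16_t w = (uint16_t) as_signed (x);  /* vpmovsxbw: sign-extend */
  w = sra16 (w, 5);                       /* vpsraw $5 */
  return (uint8_t) w;                     /* vpmovwb: keep low byte */
}

/* Model of the old sequence for one 16-bit lane holding bytes lo, hi.  */
static void shift_masked (uint8_t lo, uint8_t hi, uint8_t out[2])
{
  uint16_t w = sra16 ((uint16_t) ((hi << 8) | lo), 5);  /* vpsraw $5 */
  uint8_t b[2] = { (uint8_t) w, (uint8_t) (w >> 8) };
  for (int i = 0; i < 2; i++)
    {
      /* vpternlogd $120, assumed to be 0x04 ^ (b & 0x07).  */
      uint8_t t = (uint8_t) ((b[i] & 0x07) ^ 0x04);
      /* vpsubb: subtracting 0x04 sign-extends the 3-bit field.  */
      out[i] = (uint8_t) (t - 0x04);
    }
}

/* Exhaustively check that both models agree for every byte pair.  */
static void check_all (void)
{
  for (int lo = 0; lo < 256; lo++)
    for (int hi = 0; hi < 256; hi++)
      {
        uint8_t out[2];
        shift_masked ((uint8_t) lo, (uint8_t) hi, out);
        assert (out[0] == shift_widen ((uint8_t) lo));
        assert (out[1] == shift_widen ((uint8_t) hi));
      }
}
```

Calling check_all () from a test harness walks all 65536 byte pairs. The
model says nothing about instruction cost; it only shows that the
mask/xor/subtract form and the widen/shift/narrow form produce the same
bytes.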