Subject: Re: [PATCH] Optimize comparison between result of us_minus and 0.
To: Hongtao Liu, GCC Patches, Kirill Yukhin
From: Jeff Law
Date: Mon, 16 Nov 2020 17:08:58 -0700

On 9/3/20 3:39 AM, Hongtao Liu via Gcc-patches wrote:
> Hi:
>   Add define_peephole2 to perform optimization like below:
>
> +/* Optimize for TARGET_AVX512F
> +  vpsubusw op1, op2, dst1;
> +  vxorps xmm, xmm, dst2;      ---->  vpcmpleuw op1, op2, dst3
> +  vpcmpeqw dst1, dst2, dst3  */
>
> and
>
> +/* Optimize for target above TARGET_SSE4_1
> +  vpsubusw op1, op2, dst1;           vpminuw op1, op2, dst1
> +  vpxor xmm, xmm, dst2;       ---->  vpcmpeqw op1, dst1, dst3
> +  vpcmpeqw dst1, dst2, dst3  */
>
> Bootstrap is ok, regression test is ok for i386/x86-64 backend.
> Ok for trunk?
>
> gcc/ChangeLog:
>         PR target/96906
>         * config/i386/sse.md (VI12_128_256): New mode iterator.
>         (define_peephole2): Optimize comparison between result of
>         us_minus and 0, it could be optimized to "vpcmplequ" for
>         AVX512 or "pminu + cmpeq" for target above TARGET_SSE4_1.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/avx2-pr96906-1.c: New test.
>         * gcc.target/i386/avx512f-pr96906-1.c: New test.
>         * gcc.target/i386/sse2-pr96906.c: New test.
>         * gcc.target/i386/sse4_1-pr96906-1.c: New test.

I'd look to see if a combiner pattern could help with these too rather
than using a peep2.

jeff
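
For readers following the thread, the identity that both quoted peepholes
rely on can be checked with a scalar model. The sketch below is an
illustration only and is not part of the patch: the helper names
us_minus_u8, umin_u8 and count_mismatches are hypothetical, and it models a
single unsigned 8-bit lane rather than the vector instructions themselves.
It compares "saturating subtract equals zero" against "a <= b" (the AVX512
form) and against "min(a, b) == a" (the SSE4.1 form) for every input pair.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one lane of an unsigned saturating
   subtract (us_minus): the result clamps at 0 instead of wrapping.  */
static uint8_t us_minus_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* Hypothetical scalar model of one lane of an unsigned minimum.  */
static uint8_t umin_u8(uint8_t a, uint8_t b)
{
    return a < b ? a : b;
}

/* Exhaustively compare the three formulations over all 8-bit pairs.
   Returns the number of pairs on which they disagree.  */
static unsigned count_mismatches(void)
{
    unsigned bad = 0;
    for (unsigned a = 0; a < 256; a++) {
        for (unsigned b = 0; b < 256; b++) {
            uint8_t x = (uint8_t)a, y = (uint8_t)b;
            int original = us_minus_u8(x, y) == 0;  /* subtract, then compare with 0 */
            int as_le    = x <= y;                  /* unsigned compare-less-or-equal */
            int as_min   = umin_u8(x, y) == x;      /* unsigned min, then compare-equal */
            if (original != as_le || original != as_min)
                bad++;
        }
    }
    return bad;
}
```

Calling count_mismatches() from a small main() and checking that it returns
0 confirms the equivalence for byte lanes. The same argument applies to
16-bit lanes, although an exhaustive loop over those is far slower.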