From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
	id 6D8123858D28; Fri, 22 Mar 2024 05:33:27 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 6D8123858D28
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1711085607;
	bh=CYmnb0hkd3CKNXwEJ6WwiVb804XcrlKcCsQ3bqhen/k=;
	h=From:To:Subject:Date:From;
	b=wbs31pbLWbEIlo9gzJVP8v+UVVpjMJkRkFu6O8+uAEvWgXloWDrW1bWfQIbxpvWhZ
	 KCIB5TzP+WG9cw+Lld4pcB2nEW/Id2Mh0VpjukuolqKKJmSXTpHtIKAFL9BGiCorhE
	 A7RpD6QuAxSvU2iXeCVCaKlnYB/h+E/838OzEQxA=
From: "liuhongt at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/114428] New: [x86] psrad xmm, xmm, 16 and pand xmm,
 const_vector (0xffff x4) can be optimized to psrld
Date: Fri, 22 Mar 2024 05:33:26 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: liuhongt at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status
 bug_severity priority component assigned_to reporter target_milestone
Message-ID:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114428

            Bug ID: 114428
           Summary: [x86] psrad xmm, xmm, 16 and pand xmm, const_vector
                    (0xffff x4) can be optimized to psrld
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: liuhongt at gcc dot gnu.org
  Target Milestone: ---

typedef unsigned short uint16_t;
typedef short int16_t;

#define QUANT_ONE( coef, mf, f ) \
{ \
    if( (coef) > 0 ) \
        (coef) = (f + (coef)) * (mf) >> 16; \
    else \
        (coef) = - ((f - (coef)) * (mf) >> 16); \
    nz |= (coef); \
}

int quant_4x4( int16_t dct[16], uint16_t mf[16], uint16_t bias[16] )
{
    int nz = 0;
    for( int i = 0; i < 16; i++ )
        QUANT_ONE( dct[i], mf[i], bias[i] );
    return !!nz;
}

gcc -O2 -march=x86-64-v3 -S

        mov     edx, 65535
        vmovd   xmm4, edx
        vpbroadcastd    ymm4, xmm4
        ...
        vpsrad  ymm2, ymm2, 16
        vpsrad  ymm6, ymm6, 16
        vpsrad  ymm0, ymm0, 16
        vpand   ymm2, ymm4, ymm2
        vpsrad  ymm1, ymm1, 16
        vpand   ymm6, ymm4, ymm6
        vpand   ymm0, ymm4, ymm0
        vpand   ymm4, ymm4, ymm1
        vpackusdw       ymm2, ymm2, ymm6
        vpackusdw       ymm0, ymm0, ymm4
        vpermq  ymm2, ymm2, 216
        vpermq  ymm0, ymm0, 216
        ...

This can be optimized to the following:

        vpsrld  ymm2, ymm2, 16
        vpsrld  ymm6, ymm6, 16
        vpsrld  ymm0, ymm0, 16
        vpsrld  ymm1, ymm1, 16
        vpackusdw       ymm2, ymm2, ymm6
        vpackusdw       ymm0, ymm0, ymm1
        vpermq  ymm2, ymm2, 216
        vpermq  ymm0, ymm0, 216

The vpand clears exactly the upper 16 bits that vpsrad fills with copies of
the sign bit, so each vpsrad + vpand pair is equivalent to a single vpsrld.

The optimization opportunity is exposed after vec_pack_trunc_expr is expanded
to vpand + vpackusdw.
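
The equivalence behind the proposed rewrite can be checked per 32-bit lane
with a small scalar sketch. It is not GCC code: the helper names
(sra16_then_mask, srl16, pack_usat, check_identity) are made up for this
illustration, and it assumes that >> on a negative int32_t is an arithmetic
shift, as GCC implements it. It also shows why vpackusdw acts as a plain
truncation once the upper 16 bits of each lane are zero.

```c
#include <stdint.h>

/* Models vpsrad by 16 followed by vpand with 0xffff on one 32-bit lane.
   Assumption: >> on a negative int32_t is an arithmetic shift (true for GCC). */
static uint32_t sra16_then_mask(int32_t x)
{
    return (uint32_t)(x >> 16) & 0xffffu;
}

/* Models vpsrld by 16 on one 32-bit lane (logical shift, zero fill). */
static uint32_t srl16(int32_t x)
{
    return (uint32_t)x >> 16;
}

/* Models the per-lane behaviour of vpackusdw: a signed 32-bit input is
   narrowed to 16 bits with unsigned saturation. */
static uint16_t pack_usat(int32_t x)
{
    if (x < 0)
        return 0;
    if (x > 0xffff)
        return 0xffff;
    return (uint16_t)x;
}

/* Returns 1 if, for every sample, the two shift sequences agree and the
   saturating pack of the result equals a plain 16-bit truncation. */
static int check_identity(void)
{
    static const int32_t samples[] = {
        0, 1, -1, 0x7fff, 0x8000, 0xffff, 0x10000, -0x10000,
        0x12345678, -0x12345678, INT32_MAX, INT32_MIN
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        uint32_t masked  = sra16_then_mask(samples[i]);
        uint32_t logical = srl16(samples[i]);
        if (masked != logical)
            return 0;
        /* The lane now fits in 16 bits, so saturation never triggers. */
        if (pack_usat((int32_t)logical) != (uint16_t)logical)
            return 0;
    }
    return 1;
}
```

The samples include both extremes of the int32_t range and values on either
side of the 16-bit boundary. The sketch only spot-checks the identity; the
general argument is that the mask keeps bits 16..31 of the input in the low
half and zeroes the rest, which is what a logical shift by 16 does.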