From: "fabio at cannizzo dot net"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c++/110202] New: _mm512_ternarylogic_epi64 generates unnecessary operations
Date: Sat, 10 Jun 2023 10:37:34 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110202

            Bug ID: 110202
           Summary: _mm512_ternarylogic_epi64 generates unnecessary operations
           Product: gcc
           Version: 13.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: fabio at cannizzo dot net
  Target Milestone: ---

Consider the following two alternative implementations of a bitwise complement
of an avx512 register.

#include <immintrin.h>

__m512i negate1(const __m512i *a)
{
    __m512i res;
    res = _mm512_ternarylogic_epi64(res, res, *a, 0x55);
    return res;
}

__m512i negate2(const __m512i *a)
{
    __m512i res;
    res = _mm512_xor_si512(*a, _mm512_set1_epi32(-1));
    return res;
}

which, compiled with "-O3 -mavx512f", generates the following asm listings (see
godbolt: https://godbolt.org/z/jvrxEjW65):

negate1(long long __vector(8) const*):
        vpxor   xmm0, xmm0, xmm0
        vpternlogq      zmm0, zmm0, ZMMWORD PTR [rdi], 85
        ret
negate2(long long __vector(8) const*):
        vpternlogd      zmm0, zmm0, ZMMWORD PTR [rdi], 0x55
        ret

negate1 introduces an unnecessary xor operation. This is probably because GCC
does not recognize that, when vpternlogd is used with code 0x55, it only uses
the third zmm argument.
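
To illustrate why code 0x55 depends only on the third operand, here is a minimal scalar sketch of the ternary-logic truth-table lookup for a single 64-bit lane. It is only a model for reasoning about the immediate, not how GCC implements anything; the names ternlog64 and ignores_a_and_b are hypothetical helpers introduced for this example. It assumes the documented semantics that, for each bit position, the three source bits form an index (first operand as the most significant bit, third operand as the least significant) that selects one bit of the 8-bit immediate.

```cpp
#include <cstdint>

// Hypothetical scalar model of one 64-bit lane of vpternlogq.
// For every bit position, (a_bit, b_bit, c_bit) forms a 3-bit index into imm8,
// with a as the most significant bit and c as the least significant bit.
constexpr std::uint64_t ternlog64(std::uint64_t a, std::uint64_t b,
                                  std::uint64_t c, std::uint8_t imm8)
{
    std::uint64_t r = 0;
    for (int bit = 0; bit < 64; ++bit) {
        const unsigned idx = static_cast<unsigned>(
            (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) | ((c >> bit) & 1));
        r |= static_cast<std::uint64_t>((imm8 >> idx) & 1) << bit;
    }
    return r;
}

// Hypothetical helper: true when the truth table encoded by imm8 gives the
// same answer for every combination of a and b, i.e. only c can matter.
constexpr bool ignores_a_and_b(std::uint8_t imm8)
{
    for (unsigned idx = 0; idx < 8; ++idx) {
        // Compare each row with the row that has a = b = 0 and the same c.
        if (((imm8 >> idx) & 1) != ((imm8 >> (idx & 1)) & 1))
            return false;
    }
    return true;
}

// 0x55 is 01010101b: the selected bit is 1 exactly when c_bit is 0, so the
// result is ~c whatever a and b hold.
static_assert(ignores_a_and_b(0x55), "0x55 depends only on the third operand");
static_assert(!ignores_a_and_b(0xF0), "0xF0 selects the first operand");
static_assert(ternlog64(0, 0, 0x0123456789ABCDEFull, 0x55)
                  == ~0x0123456789ABCDEFull, "0x55 computes ~c");
static_assert(ternlog64(0xDEADBEEFull, 0xCAFEF00Dull, 0x0123456789ABCDEFull, 0x55)
                  == ~0x0123456789ABCDEFull, "a and b do not affect the result");
```

Under this model the first two operands of negate1 never influence the result, which is why zeroing zmm0 beforehand (the vpxor) should not be needed for correctness.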