Message-ID: <5982A5DF4D694B4EA971B2597E833FC6@H270>
From: "Stefan Kanthak"
Subject: Will GCC eventually learn to use BSR or even TZCNT on AMD/Intel processors?
Date: Mon, 5 Jun 2023 12:17:43 +0200

--- failure.c ---
int _clz(unsigned long long argument) {
    return __builtin_clzll(argument);
}

int _ctz(unsigned long long argument) {
    return __builtin_ctzll(argument);
}
--- EOF ---

GCC 13.1    -m32 -mabm -mbmi -mlzcnt -O3 failure.c

_clz(unsigned long long):
        mov     edx, DWORD PTR [esp+8]
        xor     ecx, ecx
        xor     eax, eax
        lzcnt   eax, DWORD PTR [esp+4]
        add     eax, 32
        lzcnt   ecx, edx
        test    edx, edx
        cmovne  eax, ecx
        ret
_ctz(unsigned long long):
        sub     esp, 20
        push    DWORD PTR [esp+28]
        push    DWORD PTR [esp+28]
        call    __ctzdi2
        add     esp, 28
        ret

OUCH: although EXPLICITLY enabled via -mabm (for AMD processors) and -mbmi
      (for Intel processors), GCC generates slow-motion code that calls
      __ctzdi2() instead of using the TZCNT instruction, which has been
      available for 10 (in words: TEN) years.
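For comparison, here is a minimal C sketch of the split that would avoid the library call. It mirrors what GCC already emits for _clz above: count in the low half, and fall back to the high half plus 32 only when the low half is zero. The helper name ctz64_via_halves is hypothetical and is not part of GCC or libgcc. That each 32-bit __builtin_ctz call maps to a single BSF or TZCNT is an assumption about the code generator, not something shown in the listings.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from GCC or libgcc): count the trailing zero
   bits of a 64-bit value using only 32-bit operations.
   As with __builtin_ctzll, the result is undefined for argument == 0. */
static int ctz64_via_halves(uint64_t argument)
{
    uint32_t low  = (uint32_t)argument;          /* bits  0..31 */
    uint32_t high = (uint32_t)(argument >> 32);  /* bits 32..63 */

    if (low != 0)
        return __builtin_ctz(low);       /* lowest set bit is in the low half */
    return 32 + __builtin_ctz(high);     /* otherwise it is in the high half */
}
```

For any nonzero argument this is expected to agree with __builtin_ctzll(argument), so it could stand in for the body of _ctz in failure.c.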

GCC 13.1    -m32 -march=i386 -O3 failure.c

_clz(unsigned long long):
        mov     edx, DWORD PTR [esp+4]
        mov     eax, DWORD PTR [esp+8]
        test    eax, eax
        je      .L2
        bsr     eax, eax
        xor     eax, 31
        ret
.L2:
        bsr     eax, edx
        xor     eax, 31
        lea     eax, [eax+32]
        ret
_ctz(unsigned long long):
        sub     esp, 20
        push    DWORD PTR [esp+28]
        push    DWORD PTR [esp+28]
        call    __ctzdi2
        add     esp, 28
        ret

OUCH²: the BSF/BSR instructions were introduced 38 (in words: THIRTY-EIGHT)
       years ago with the i386 processor. GCC uses BSR for _clz, but fails
       to use BSF for _ctz; a real shame!

OUCH³: an optimising compiler would of course generate "JMP __ctzdi2"
       instead of code fiddling with the stack!

Stefan Kanthak