From: "Stefan Kanthak"
To: "Jakub Jelinek"
Cc: "Jonathan Wakely", "Andrew Pinski"
References: <51071A92918346ABBC6B5703179F5174@H270> <896EB515110646CEBAA84E98E273E4B8@H270> <4BD5D8BA8E0F45098CC3E2B188A216E6@H270>
Subject: Re: Will GCC eventually support SSE2 or SSE4.1?
Date: Fri, 26 May 2023 13:36:01 +0200

"Jakub Jelinek" wrote:

> On Fri, May 26, 2023 at 10:59:03AM +0200, Stefan Kanthak wrote:
>> 3) SSE4.1 is supported since Core2, but -march=core2 fails to enable it.
>>    That's bad, REALITY CHECK, please!
>
> You're wrong.
> SSE4.1 first appeared in the 45nm versions of Core2, the 65nm versions
> didn't have it.

That's correct, I failed to see this difference.

> The supported CPU names don't distinguish between core2 submodels,
> so if you have core2 with sse4.1, you should either be using -march=native
> if compiling on such a machine, or use -march=core2 -msse4.1,

This is one of the combinations I didn't test until now; with it (and with
-m32 -msse4.1 too) GCC generates SSE4.1 instructions, but FAILS to optimise:

# Compilation provided by Compiler Explorer at https://godbolt.org/
ispowerof2(unsigned long long):
        movq    xmm1, QWORD PTR [esp+4]
        pcmpeqd xmm0, xmm0
        xor     eax, eax
        paddq   xmm0, xmm1
        pand    xmm0, xmm1              # SUPERFLUOUS!
        punpcklqdq      xmm0, xmm0      # SUPERFLUOUS!
        ptest   xmm0, xmm0              # ptest xmm0, xmm1
        sete    al
        ret

9 instructions in 36 bytes instead of 7 instructions in 26 bytes.

JFTR: the documentation of MOVQ specifies

| when the destination operand is an XMM register, the quadword is
| stored to the low quadword of the register, and the high quadword
| is cleared to all 0s.

> there is no -march={conroe,allendale,wolfdale,merom,penryn,...}.
>
>> 4) If the documenation is right, then the behaviour of GCC is wrong: it
>>    doesn't allow to use SSE4.1 without SSE4.2!
>
> If you aren't able to read the documentation, it is hard to argue.

When the documentation is wrong or incomplete it's hard to trust it!

| -m32
...
| The -m32 option sets int, long, and pointer types to 32 bits, and
| generates code that runs on any i386 system.
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

OUCH: as shown in https://godbolt.org/z/b43cjGdY9 -m32 ALONE but generates
      SSE2 instructions which DONT run on ANY i386 system!

OOPS: as shown above, -m32 -msse4.1 (or another -msse*) also generates code
      that does NOT run on ANY i386 system!

Where is the precedence of the different -m* feature options for the CPU
type documented?
Where is their influence on each other documented?

Why does the documentation FAIL to specify that CPU features given by -m*
override -m32 or enables them in ADDITION to those enabled by -march=?

| -march=cpu-type
...
| Specifying -march=cpu-type implies -mtune=cpu-type, except where noted
| otherwise.
...
| -mtune=cpu-type
...
| the compiler does not generate any code that cannot run on the default
| machine type unless you use a -march=cpu-type option.

Why is the "default machine type" not mentioned/specified with -march=?

Stefan
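
For readers following the assembly listing: this message quotes only the compiler output, not the source of ispowerof2. The sketch below is an assumption reconstructed from the listing (paddq with an all-ones register computes x - 1, and the pand/ptest pair checks whether x & (x - 1) is zero); the actual source used on Compiler Explorer may differ.

```c
#include <stdbool.h>

/* Hypothetical source, inferred from the quoted assembly; the real
   function body is not shown in this message. */
bool ispowerof2(unsigned long long x)
{
    /* x - 1 corresponds to "paddq xmm0, xmm1" with xmm0 = all ones.
       The comparison against zero corresponds to ptest + sete.
       Note that x == 0 also yields true with this formulation. */
    return (x & (x - 1)) == 0;
}
```

Compiling a function of this shape with the options discussed above (-m32 -msse4.1, or -march=core2 -msse4.1) should make it possible to compare the emitted instruction sequence against the 7-instruction variant proposed in the listing's comments.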