Message-ID: <07FDB7C375CD46C1B6955B66338DD58E@H270>
From: "Stefan Kanthak"
To: "Jonathan Wakely"
Cc: "Jakub Jelinek", "Andrew Pinski"
References: <51071A92918346ABBC6B5703179F5174@H270> <896EB515110646CEBAA84E98E273E4B8@H270> <4BD5D8BA8E0F45098CC3E2B188A216E6@H270> <7D6327CDEBFD4331B6FFBD67E4B514FD@H270>
Subject: Re: Will GCC eventually support SSE2 or SSE4.1?
Date: Fri, 26 May 2023 14:03:36 +0200
Organization: Me, myself & IT

"Jonathan Wakely" wrote:

> On Fri, 26 May 2023 at 12:29, Stefan Kanthak wrote:
>>
>> "Jakub Jelinek" wrote:
>>
>> > On Fri, May 26, 2023 at 10:59:03AM +0200, Stefan Kanthak wrote:
>> >> 3) SSE4.1 is supported since Core2, but -march=core2 fails to enable it.
>> >>    That's bad, REALITY CHECK, please!
>> >
>> > You're wrong.
>> > SSE4.1 first appeared in the 45nm versions of Core2, the 65nm versions
>> > didn't have it.
>>
>> That's correct, I failed to see this difference.
>
> REALITY CHECK please!

Dumbass check please!

>> > The supported CPU names don't distinguish between core2 submodels,
>> > so if you have core2 with sse4.1, you should either be using -march=native
>> > if compiling on such a machine, or use -march=core2 -msse4.1,
>>
>> This is one of the combinations I didn't test until now; with it (and with
>> -m32 -msse4.1 too) GCC generates SSE4.1 instructions, but FAILS to optimise:
>>
>> # Compilation provided by Compiler Explorer at https://godbolt.org/
>> ispowerof2(unsigned long long):
>>         movq    xmm1, QWORD PTR [esp+4]
>>         pcmpeqd xmm0, xmm0
>>         xor     eax, eax
>>         paddq   xmm0, xmm1
>>         pand    xmm0, xmm1          # SUPERFLUOUS!
>>         punpcklqdq xmm0, xmm0       # SUPERFLUOUS!
>>         ptest   xmm0, xmm0          # ptest xmm0, xmm1
>>         sete    al
>>         ret
>>
>> 9 instructions in 36 bytes instead of 7 instructions in 26 bytes.

No comment here?

>> JFTR: the documentation of MOVQ specifies
>>
>> | when the destination operand is an XMM register, the quadword is
>> | stored to the low quadword of the register, and the high quadword
>> | is cleared to all 0s.
>>
>> > there is no -march={conroe,allendale,wolfdale,merom,penryn,...}.
>> >
>> >> 4) If the documenation is right, then the behaviour of GCC is wrong: it
>> >>    doesn't allow to use SSE4.1 without SSE4.2!
>> >
>> > If you aren't able to read the documentation, it is hard to argue.
>>
>> When the documentation is wrong or incomplete it's hard to trust it!
>
> Just like when you make incorrect statements and assume everybody else is wrong.

Do I assume that? Or did you just make this up?

> The documentation isn't perfect, but you should not just ignore it and
> assume you know better in all cases.
>
>> | -m32
>> ...
>> | The -m32 option sets int, long, and pointer types to 32 bits, and
>> | generates code that runs on any i386 system.
>>   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> OUCH: as shown in https://godbolt.org/z/b43cjGdY9 -m32 ALONE but
>> generates SSE2 instructions which DONT run on ANY i386 system!
>
> That's https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109954

I posted this here some years ago; see for example

Ignorance is bliss?!

>> OOPS: as shown above, -m32 -msse4.1 (or another -msse*) also generates
>> code that does NOT run on ANY i386 system!
>>
>> Where is the precedence of the different -m* options for the CPU type
>> documented?
>> Where is their influence on each other documented?
>
> -march enables the instructions listed for the relevant cpu family,
> then using -mxxx or -mno-xxx adds or removes particular instruction
> sets from the ones enabled by -march.

ADD THIS TO THE DOCUMENTATION!

> If you give an option twice, e.g. -march=core2 -march=nehalem, then
> the second one wins. If you use -msse2 -mno-sse2 then the second one
> wins.

ARGH: not repetitions of ONE particular option or its negation, stupid!

> You can check this using e.g.
>
> gcc -Q --help=target -march=core2 -msse2
>
>> | -march=cpu-type
>> ...
>> | Specifying -march=cpu-type implies -mtune=cpu-type, except where noted
>> | otherwise.
>> ...
>> | -mtune=cpu-type
>> ...
>> | the compiler does not generate any code that cannot run on the default
>> | machine type unless you use a -march=cpu-type option.
>>
>> Why is the "default machine type" not mentioned/specified with -march=?
>
> Using -march overrides it. The default is set during configure.

And exactly this is missing in the documentation for -march=!
Guess why I cited the documentation for -mtune= where it is mentioned?

> Adding -v to the compilation will show what -march option is used by cc1 by
> default.

Not reliable unless documented elsewhere!

Stefan
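
For readers who want to reproduce the ispowerof2 listing quoted above: the C source is not part of this message, so the following is only a guess at it, read back from the paddq/pand/ptest/sete sequence (add all-ones, i.e. subtract 1, AND with the original value, test for zero). The function body and the self-check helper are assumptions for illustration, not the code that was actually fed to Compiler Explorer.

```c
/* Hypothetical reconstruction: the original source is not in this thread. */
#include <assert.h>

/* Returns 1 when at most one bit of x is set.
   Note that x == 0 also yields 1, which matches the quoted assembly:
   it only tests whether (x + ~0) & x is zero. */
int ispowerof2(unsigned long long x)
{
    return (x & (x - 1)) == 0;
}

/* Hypothetical helper: a few spot checks of the function above. */
void ispowerof2_selfcheck(void)
{
    assert(ispowerof2(1ULL));
    assert(ispowerof2(1ULL << 63));
    assert(!ispowerof2(6ULL));
    assert(ispowerof2(0ULL));   /* zero passes this particular test */
}
```

Compiling such a file with -m32 -march=core2 -msse4.1 and optimisation enabled should give a listing comparable to the one quoted; the optimisation level used for the original listing is not stated in the thread.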