Message-ID: <4BD5D8BA8E0F45098CC3E2B188A216E6@H270>
From: "Stefan Kanthak"
To: "Jonathan Wakely"
Cc: , "Andrew Pinski"
Subject: Re: Will GCC eventually support SSE2 or SSE4.1?
Date: Fri, 26 May 2023 10:59:03 +0200
Organization: Me, myself & IT

"Jonathan Wakely" wrote:

> On Fri, 26 May 2023 at 09:00, Stefan Kanthak wrote:
>>
>> "Jonathan Wakely" wrote:
>>
>> > On Fri, 26 May 2023, 08:01 Andrew Pinski via Gcc, wrote:
>> >
>> >> On Thu, May 25, 2023 at 11:56 PM Stefan Kanthak
>> >> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> compile the following function on a system with Core2 processor
>> >>> (released January 2008) for the 32-bit execution environment:
>> >>>
>> >>> --- demo.c ---
>> >>> int ispowerof2(unsigned long long argument)
>> >>> {
>> >>>     return (argument & argument - 1) == 0;
>> >>> }
>> >>> --- EOF ---
>> >>>
>> >>> GCC 13.3: gcc -m32 -O3 demo.c
>> >>>
>> >>> NOTE: -mtune=native is the default!
>> >>
>> >> You need to use -march=native and not -mtune=native .... to turn on
>> >> the architecture features.
>>
>> (Un)fortunately this changes nothing!
>>
>> STOP: that's wrong, it makes it even WORSE!
>>
>> # Compilation provided by Compiler Explorer at https://godbolt.org/
>> ispowerof2(unsigned long long):
>>         vmovq       xmm1, QWORD PTR [esp+4]
>>         vpcmpeqd    xmm0, xmm0, xmm0
>>         xor         eax, eax
>>         vpaddq      xmm0, xmm1, xmm0
>>         vpand       xmm0, xmm0, xmm1
>>         vpunpcklqdq xmm0, xmm0, xmm0
>>         vptest      xmm0, xmm0
>>         sete        al
>>         ret
>>
>> That's what I call a REALLY EPIC FAILURE!
>>
>> Compare this inefficient BLOAT to the SSE4.1 code from my original post!
>>
>> > Yes this is just user error. You didn't use the right options to say you
>> > want SSE2.
>>
>> ARGH: please read CAREFULLY what I wrote!
>
> You wrote "Now add the -mtune=core2 option to EXPLICITLY enable the
> NATIVE SSE4.1
> alias "Penryn New Instruction Set" of the Core2 processor" which is
> wrong, that's not what -mtune does.
>
> Read the docs CAREFULLY: https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html

3) SSE4.1 is supported since Core2, but -march=core2 fails to enable it.
   That's bad. REALITY CHECK, please!

4) If the documentation is right, then GCC's behaviour is wrong: it
   doesn't allow using SSE4.1 without SSE4.2!

5) Compile the function with -march=nehalem (which according to the
   documentation enables support for BOTH SSE4.1 and SSE4.2) and notice
   that GCC fails to use SSE4.1!

>> 1) I didn't tell GCC to use SSE at all (I DON'T want any compiler to use
>> SSE per default, especially when the generated code is SLOWER and BIGGER
>> than conventional code using the general purpose registers)!
>>
>> 2) GCC uses SSE2 on its own, but doesn't support it well: it FAILS to use
>> PMOVMSKB here, despite -O3!
>
> So report a bug to bugzilla, not via an email to the wrong list.
>
>>
>> 3) -march=core2 doesn't help either, GCC fails to use SSE4.1 at all!
>
> core2 doesn't enable SSE4.1, as clearly shown in the docs:
> https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
>
> If you send emails full of confused mistakes, don't be surprised if
> the replies aren't what you want.
>
> If you think GCC is generating bad code, file a bug. But make sure
> you're actually using the right options to enable the right
> instruction sets before complaining about the instructions used.

See above: GCC fails to use SSE4.1, despite -march=nehalem.

And (if the documentation is right, then) GCC fails to support SSE4.1
without SSE4.2.

Stefan
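For comparison, here is an editorial sketch that does not come from the thread. It writes the predicate from demo.c out over two 32-bit halves. This is roughly the shape that code restricted to the general-purpose registers must take for a 64-bit argument under -m32. The helper names ispowerof2_halves and check_ispowerof2_halves are hypothetical. The sketch does not claim to reproduce what GCC actually emits for either option set.

```c
#include <assert.h>
#include <stdint.h>

/* The predicate from demo.c, with explicit parentheses
   (binary '-' already binds tighter than '&').
   Like the original, it returns 1 for argument == 0. */
int ispowerof2(unsigned long long argument)
{
    return (argument & (argument - 1)) == 0;
}

/* Hypothetical helper: the same test done on two 32-bit halves,
   as 32-bit general-purpose-register code has to do it. */
int ispowerof2_halves(unsigned long long argument)
{
    uint32_t lo = (uint32_t)argument;
    uint32_t hi = (uint32_t)(argument >> 32);

    /* argument - 1 borrows from the high half exactly when lo == 0. */
    uint32_t borrow = (lo == 0);
    uint32_t lo_m1 = lo - 1;      /* wraps to 0xFFFFFFFF when lo == 0 */
    uint32_t hi_m1 = hi - borrow;

    /* The 64-bit AND is zero iff both 32-bit ANDs are zero. */
    return ((lo & lo_m1) | (hi & hi_m1)) == 0;
}

/* Hypothetical cross-check of the split version against the direct one. */
void check_ispowerof2_halves(unsigned long long argument)
{
    assert(ispowerof2_halves(argument) == ispowerof2(argument));
}
```

The only cross-half dependency is the borrow out of the low word. That is why the split version needs one compare and one subtract-with-borrow step rather than a full 64-bit subtraction.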