From: Jonathan Wakely
Date: Fri, 26 May 2023 10:22:13 +0100
Subject: Re: Will GCC eventually support SSE2 or SSE4.1?
To: Stefan Kanthak
Cc: gcc@gnu.org, Andrew Pinski

On Fri, 26 May 2023 at 10:06, Stefan Kanthak wrote:
>
> "Jonathan Wakely" wrote:
>
> > On Fri, 26 May 2023 at 09:00, Stefan Kanthak wrote:
> >>
> >> "Jonathan Wakely" wrote:
> >>
> >> > On Fri, 26 May 2023, 08:01 Andrew Pinski via Gcc, wrote:
> >> >
> >> >> On Thu, May 25, 2023 at 11:56 PM Stefan Kanthak
> >> >> wrote:
> >> >>>
> >> >>> Hi,
> >> >>>
> >> >>> compile the following function on a system with Core2 processor
> >> >>> (released January 2008) for the 32-bit execution environment:
> >> >>>
> >> >>> --- demo.c ---
> >> >>> int ispowerof2(unsigned long long argument)
> >> >>> {
> >> >>>     return (argument & argument - 1) == 0;
> >> >>> }
> >> >>> --- EOF ---
> >> >>>
> >> >>> GCC 13.3: gcc -m32 -O3 demo.c
> >> >>>
> >> >>> NOTE: -mtune=native is the default!
> >> >>
> >> >> You need to use -march=native and not -mtune=native .... to turn on
> >> >> the architecture features.
> >>
> >> (Un)fortunately this changes nothing!
> >>
> >> STOP: that's wrong, it makes it even WORSE!
> >>
> >> # Compilation provided by Compiler Explorer at https://godbolt.org/
> >> ispowerof2(unsigned long long):
> >>         vmovq       xmm1, QWORD PTR [esp+4]
> >>         vpcmpeqd    xmm0, xmm0, xmm0
> >>         xor         eax, eax
> >>         vpaddq      xmm0, xmm1, xmm0
> >>         vpand       xmm0, xmm0, xmm1
> >>         vpunpcklqdq xmm0, xmm0, xmm0
> >>         vptest      xmm0, xmm0
> >>         sete        al
> >>         ret
> >>
> >> That's what I call a REALLY EPIC FAILURE!
> >>
> >> Compare this inefficient BLOAT to the SSE4.1 code from my original post!
> >>
> >> > Yes this is just user error. You didn't use the right options to say you
> >> > want SSE2.
> >>
> >> ARGH: please read CAREFULLY what I wrote!
> >
> > You wrote "Now add the -mtune=core2 option to EXPLICITLY enable the
> > NATIVE SSE4.1 alias "Penryn New Instruction Set" of the Core2
> > processor" which is wrong, that's not what -mtune does.
> >
> > Read the docs CAREFULLY: https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
>
> 3) SSE4.1 is supported since Core2, but -march=core2 fails to enable it.
>    That's bad, REALITY CHECK, please!

Are you sure about that? My understanding is that Core2 introduced
SSSE3 and Penryn introduced SSE4.1.
The list at https://en.wikipedia.org/wiki/List_of_Intel_Core_2_processors
shows a lot of Core2 processors without SSE4.1, is it wrong?
e.g. Intel Core2 E6400 doesn't support SSE4.1

>
> 4) If the documentation is right, then the behaviour of GCC is wrong: it
>    doesn't allow to use SSE4.1 without SSE4.2!

It's not "wrong", it just means GCC has chosen not to add customized
behaviour for the models that only support SSE4.1 and not SSE4.2.
That's not "wrong" unless it's leaving real performance on the floor
for real hardware used by real users. How common are those models, and
is there any significant performance benefit in adding yet another
arch option for those models?

> 5) Compile the function with -march=nehalem (which according to the
>    documentation enables support for BOTH SSE4.1 and SSE4.2) and notice
>    that GCC fails to use SSE4.1!

If you think the code would perform better with SSE4.1 instructions
and GCC doesn't use them for -march=nehalem, PLEASE FILE A BUG.

Stop yelling about it on the mailing list, it just makes you look like
a troll who isn't actually interested in improving anything, just
complaining.

If you think there's something that should be fixed in GCC, file a bug.
File a bug. File a bug. Did anybody mention yet that you should file a
bug?
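
The -march versus -mtune distinction argued over above can be checked
from C itself. What follows is only a minimal sketch, assuming GCC's
usual predefined macros __SSE4_1__ and __SSSE3__, which are defined when
the corresponding instruction set is enabled for code generation. The
helper names sse41_enabled() and ssse3_enabled() are hypothetical and
exist purely for illustration; ispowerof2() is the function from the
quoted demo.c.

```c
/* Sketch only: sse41_enabled() and ssse3_enabled() are hypothetical
   helper names, not part of GCC or any library. */

/* Function from the original post. The extra parentheses are for
   readability; the unparenthesised form parses the same way because
   '-' binds tighter than '&'. Note that 0 is also reported as a
   power of two by this test. */
int ispowerof2(unsigned long long argument)
{
    return (argument & (argument - 1)) == 0;
}

/* Returns 1 if this translation unit was compiled with SSE4.1 enabled
   for code generation, else 0. The macro is set by ISA options such as
   -msse4.1 or a -march= value that implies it, not by -mtune=. */
int sse41_enabled(void)
{
#ifdef __SSE4_1__
    return 1;
#else
    return 0;
#endif
}

/* Same check for SSSE3, which -march=core2 is documented to enable. */
int ssse3_enabled(void)
{
#ifdef __SSSE3__
    return 1;
#else
    return 0;
#endif
}
```

Built with "gcc -m32 -O3 -mtune=core2", sse41_enabled() should return 0,
because tuning options do not change which instructions the compiler may
emit. Adding -msse4.1 on top of -march=core2 enables SSE4.1 without
SSE4.2. The full set of predefined macros for a given option set can be
listed with "gcc -march=core2 -dM -E -x c /dev/null".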