From: Andrew Pinski
Date: Fri, 26 May 2023 00:00:01 -0700
Subject: Re: Will GCC eventually support SSE2 or SSE4.1?
To: Stefan Kanthak
Cc: gcc@gnu.org

On Thu, May 25, 2023 at 11:56 PM Stefan Kanthak wrote:
>
> Hi,
>
> compile the following function on a system with Core2 processor
> (released January 2008) for the 32-bit execution environment:
>
> --- demo.c ---
> int
> ispowerof2(unsigned long long argument)
> {
>     return (argument & argument - 1) == 0;
> }
> --- EOF ---
>
> GCC 13.3: gcc -m32 -O3 demo.c
>
> NOTE: -mtune=native is the default!

You need to use -march=native and not -mtune=native .... to turn on
the architecture features.

Thanks,
Andrew

>
> # https://godbolt.org/z/b43cjGdY9
> ispowerof2(unsigned long long):
>         movq    xmm1, [esp+4]
>         pcmpeqd xmm0, xmm0
>         paddq   xmm0, xmm1
>         pand    xmm0, xmm1
>         movd    edx, xmm0        #    pxor     xmm1, xmm1
>         psrlq   xmm0, 32         #    pcmpeqb  xmm0, xmm1
>         movd    eax, xmm0        #    pmovmskb eax, xmm0
>         or      edx, eax         #    cmp      al, 255
>         sete    al               #    sete     al
>         movzx   eax, al          #
>         ret
>
> 11 instructions in 40 bytes      # 10 instructions in 36 bytes
>
> OOPS: why does GCC (ab)use the SSE2 alias "Willamette New Instruction Set"
>       here instead of the native SSE4.1 alias "Penryn New Instruction Set"
>       of the Core2 (and all later processors)?
>
> OUCH: why does it FAIL to REALLY use SSE2, as shown in the comments on the
>       right side?
>
>
> Now add the -mtune=core2 option to EXPLICITLY enable the NATIVE SSE4.1
> alias "Penryn New Instruction Set" of the Core2 processor:
>
> GCC 13.3: gcc -m32 -mtune=core2 -O3 demo.c
>
> # https://godbolt.org/z/svhEoYT11
> ispowerof2(unsigned long long):
>                                  #    xor     eax, eax
>         movq    xmm1, [esp+4]    #    movq    xmm1, [esp+4]
>         pcmpeqd xmm0, xmm0       #    pcmpeqq xmm0, xmm0
>         paddq   xmm0, xmm1       #    paddq   xmm0, xmm1
>         pand    xmm0, xmm1       #    ptest   xmm0, xmm1
>         movd    edx, xmm0        #
>         psrlq   xmm0, 32         #
>         movd    eax, xmm0        #
>         or      edx, eax         #
>         sete    al               #    sete    al
>         movzx   eax, al          #
>         ret                      #    ret
>
> 11 instructions in 40 bytes      # 7 instructions in 26 bytes
>
> OUCH: GCC FAILS to use SSE4.1 as shown in the comments on the right side.
>       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Last compile with -mtune=i386 for the i386 processor:
>
> GCC 13.3: gcc -m32 -mtune=i386 -O3 demo.c
>
> # https://godbolt.org/z/e76W6dsMj
> ispowerof2(unsigned long long):
>         push    ebx              #
>         mov     ecx, [esp+8]     #    mov     eax, [esp+4]
>         mov     ebx, [esp+12]    #    mov     edx, [esp+8]
>         mov     eax, ecx         #
>         mov     edx, ebx         #
>         add     eax, -1          #    add     eax, -1
>         adc     edx, -1          #    adc     edx, -1
>         and     eax, ecx         #    and     eax, [esp+4]
>         and     edx, ebx         #    and     edx, [esp+8]
>         or      eax, edx         #    or      eax, edx
>         sete    al               #    neg     eax
>         movzx   eax, al          #    sbb     eax, eax
>         pop     ebx              #    inc     eax
>         ret                      #    ret
>
> 14 instructions in 33 bytes      # 11 instructions in 32 bytes
>
> OUCH: why does GCC abuse EBX (and ECX too) and performs a superfluous
>       memory write?
>
>
> Stefan Kanthak
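
For anyone who wants to check the -march versus -mtune point locally, here is a minimal sketch, not anything posted in the thread. It repeats the demo.c expression and adds three hypothetical helpers (sse2_enabled, sse41_enabled, report_isa) that read the predefined macros __SSE2__ and __SSE4_1__, which GCC defines only when the corresponding instruction set is enabled for code generation. Note that the expression also returns 1 for an argument of 0, because 0 - 1 wraps to all ones in unsigned arithmetic and 0 & ~0 is 0.

```c
#include <stdio.h>

/* Same expression as demo.c in the thread.  Returns 1 for powers of two
   and, as a side effect of unsigned wraparound, also for 0. */
int ispowerof2(unsigned long long argument)
{
    return (argument & argument - 1) == 0;
}

/* Hypothetical helper: 1 if this translation unit was compiled with
   SSE2 code generation enabled, 0 otherwise. */
int sse2_enabled(void)
{
#ifdef __SSE2__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: 1 if this translation unit was compiled with
   SSE4.1 code generation enabled, 0 otherwise. */
int sse41_enabled(void)
{
#ifdef __SSE4_1__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: print which of the two instruction sets the
   compiler was allowed to use for this build. */
void report_isa(void)
{
    printf("SSE2 enabled:   %d\n", sse2_enabled());
    printf("SSE4.1 enabled: %d\n", sse41_enabled());
}
```

Calling report_isa() from a main() and building the file twice, once with "gcc -m32 -O3 -mtune=core2" and once with "gcc -m32 -O3 -msse4.1" (or with -march=native on a host that has SSE4.1), should show the SSE4.1 line change only in the second build, since -mtune adjusts scheduling and cost decisions without enabling any instructions. Which -march= names include SSE4.1 is best checked against the GCC x86 options page rather than assumed from the processor's marketing name.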