Date: Fri, 26 May 2023 09:33:21 -0400
Subject: Re: Will GCC eventually support SSE2 or SSE4.1?
To: gcc@gcc.gnu.org
From: Nicholas Vinson
In-Reply-To: <2D3BCE2E82544ACD95352C72BE944C59@H270>

On 5/26/23 08:42, Stefan Kanthak wrote:
>
> I could have added PROPERLY, because that's where it CLEARLY fails, as
> shown by the generated unoptimised code.

From what I've seen so far, I find your arguments unconvincing.
In this thread alone, you've shown that you don't know how to properly control gcc via its command-line flags, and that you don't know how to properly generate assembly code for your own C example (properly in this case meaning that it exhibits the behavior the ISO C standard requires). That makes it hard for me to accept your claims at face value. (Your C example is also logically incorrect, but that's not important to this discussion.)

That said, assuming that your "optimized assembly" examples (with the exception of the first) are correct, all you've done is show that your versions are slightly smaller in both instruction count and size, and then declare your examples "proper". The optimization flag -O3 (like most of the -On flags) optimizes for speed over all else, and it has been shown that the faster code isn't necessarily the code with the fewest instructions or the smallest size (see the RISC vs. CISC debate).

To accept that your suggestions are the proper way to generate code using SSE4.1 instructions at -O3, I insist on data that clearly demonstrates that your suggestions are at least as performant as what GCC currently generates.
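
For what it's worth, here is a rough sketch of the kind of measurement I have in mind. It is only an illustration under stated assumptions: variant_a and variant_b are hypothetical stand-ins (they are not the code from this thread), time_variant is a helper name I made up, and the timing relies on POSIX clock_gettime. A real comparison would place the two assembly sequences in separate files so the compiler cannot inline or rewrite them, and would repeat the runs on the target hardware.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-ins for the two code sequences being compared.
 * Both compute 3*x modulo 2^32; only the instruction mix differs. */
typedef uint32_t (*variant_fn)(uint32_t);

static uint32_t variant_a(uint32_t x) { return x * 3u; }
static uint32_t variant_b(uint32_t x) { return (x << 1) + x; }

/* Returns the elapsed wall-clock time in nanoseconds for `iters` calls
 * of `fn`. Each call depends on the previous result, and the final
 * accumulator is stored through `sink`, so the loop cannot be
 * discarded as dead code. */
static double time_variant(variant_fn fn, uint32_t iters, uint32_t *sink)
{
    struct timespec t0, t1;
    uint32_t acc = 0;

    /* Check that the candidate matches the reference before timing it. */
    assert(fn(12345u) == variant_a(12345u));

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (uint32_t i = 0; i < iters; i++)
        acc += fn(acc + i);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *sink = acc;
    return (double)(t1.tv_sec - t0.tv_sec) * 1e9
         + (double)(t1.tv_nsec - t0.tv_nsec);
}
```

Calling time_variant once per variant with the same iteration count, and repeating that several times, gives numbers that can be compared; a single run says very little because of timer resolution and scheduling noise.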