Message-ID: <28dcdd4f-31fd-7dc6-7eca-c0bdf1470b1c@linaro.org>
Date: Wed, 22 Dec 2021 09:49:26 -0300
From: Adhemerval Zanella
To: "H.J. Lu", GNU C Library, "Pandey, Sunil K", "Cornea, Marius", Florian Weimer
Subject: Re: x86-64 assembly codes vs C source codes
List-Id: Libc-alpha mailing list

On 21/12/2021 13:18, H.J. Lu via Libc-alpha wrote:
> Hi,
>
> There are 421 x86-64 assembly source files. Majority of
> them have generic versions in C. We added many x86-64
> assembly source files for performance. Most of x86-64
> assembly source files started from the generic version in C.
> They were compiled into assembly and optimized by hand.
> Intel is committed to support x86-64 assembly source files
> to improve performance and fix any bugs.
>
> For 2.35, we'd like to add more x86-64 assembly source
> files to libmvec. The x86-64 assembly source files are the
> preferred form for performance and accuracy today.
>
> We will evaluate the generic alternative in the future
> if it has similar performance and accuracy as the
> assembly version, like:

I understand this route could be the most straightforward way to
provide optimized code, since you do not need to handle multiple
compilers or tune the many possible options, nor handle different
architectures, ABIs, or chips.

However, if you check that list you will note that many of the
assembly routines could be removed because better implementations
avoided optimized assembly by using generic C source code and
leveraging compiler support with minimal arch-specific tuning. This
is more costly initially, but imho it does pay off in the long term.

For instance, the math implementations from ARM optimized routines [1]
provided a better implementation for all architectures, beating even
the x86 assembly ones (as you noted in the list). Similarly, the new
hypot implementation I created with Wilco is faster for most
architectures, especially when FMA is available (for instance for
x86-64-v3).

ARM and other developers could have just crafted an assembly routine
and optimized it solely for ARM. That would have required additional
effort and time from other maintainers to check and craft an optimized
version for each architecture.

I think assembly does make sense for the string routines, which rely
on architecture- and chip-specific tuning that is hard to express in
generic C code. However, I really think that with current compiler
support we should avoid it for math code, especially because such code
is already not easy to follow, it usually results in large code that
may become unmaintainable over time (or at least require a lot more
effort), it might hinder some compiler support improvements, and it is
confined to a specific architecture (even when the code might be
reused on different ones).

So if you could improve the libmvec support to make it more generic,
it would be really useful.
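
To make the generic-C argument concrete, here is a minimal sketch of
the kind of code I mean. It is not taken from glibc or from the ARM
optimized routines; the names poly_horner and exp_taylor4 and the
Taylor coefficients are assumptions chosen only for illustration. The
point is that each Horner step has the shape a*b + c, so a compiler
may contract it into one fused multiply-add on targets that have FMA,
depending on the -ffp-contract setting, while the same source still
builds everywhere else.

```c
#include <stddef.h>

/* Hypothetical helper, not a glibc function: evaluate
   c[0] + c[1]*x + ... + c[n-1]*x^(n-1) with Horner's rule.  */
double
poly_horner (const double *c, size_t n, double x)
{
  double acc = 0.0;
  /* Walk the coefficients from the highest degree down.  Each step
     is a multiply followed by an add, which the compiler is allowed
     to emit as a single FMA instruction when the target has one.  */
  for (size_t i = n; i-- > 0;)
    acc = acc * x + c[i];
  return acc;
}

/* Hypothetical kernel: degree-4 Taylor polynomial of exp(x) around 0.
   Only a rough approximation, and only for small |x|; real libm code
   uses range reduction and carefully fitted coefficients.  */
double
exp_taylor4 (double x)
{
  static const double c[] = { 1.0, 1.0, 1.0 / 2, 1.0 / 6, 1.0 / 24 };
  return poly_horner (c, sizeof c / sizeof c[0], x);
}
```

Building this with something like "gcc -O2 -march=x86-64-v3" lets the
compiler choose FMA instructions, and building it for a baseline
target gives plain multiplies and adds from the same file. No
per-architecture source is needed, which is the maintenance benefit
argued for above.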

>
> ba4b8fab20 x86-64: Remove s_sincosf-sse2.S
> 4ca945e9c5 x86-64: Remove sysdeps/x86_64/fpu/s_cosf.S
> 9574c7b68d x86-64: Remove sysdeps/x86_64/fpu/s_sinf.S
> e1f59bebd8 x86-64: Replace assembly versions of e_expf with generic e_expf.c
> 8537e0f6cf x86-64: Implement libmathvec IFUNC selectors in C
> 10a87ca476 x86-64: Implement libm IFUNC selectors in C
> 11ffcacb64 x86-64: Implement strcmp family IFUNC selectors in C
> 70fe2eb794 x86-64: Implement strcspn/strpbrk/strspn IFUNC selectors in C
> 9f4254b8bd x86-64: Implement wcscpy IFUNC selector in C
> 9ed0aa15d3 x86-64: Implement strcat family IFUNC selectors in C
> b91a52d0d7 x86-64: Implement memcmp family IFUNC selectors in C
> 93e46f8773 x86-64: Implement memset family IFUNC selectors in C
> 5c3e322d3b x86-64: Implement memmove family IFUNC selectors in C
> 5a103908c0 x86-64: Implement strcpy family IFUNC selectors in C
>
> Thanks.
>

[1] https://github.com/ARM-software/optimized-routines