From: Adhemerval Zanella Netto
Organization: Linaro
To: Xi Ruoyao, Junxian Zhu, Jiaxun Yang
Cc: libc-alpha@sourceware.org
Subject: Re: [PATCH 2/2] MIPS: Hard-float rounding instructions support
Date: Wed, 27 Dec 2023 10:25:04 -0300
Message-ID: <760ae346-56ab-40a5-9800-eae9bebe5617@linaro.org>
References: <20231225103548.1615-2-zhujunxian@oss.cipunited.com>
 <20231225103548.1615-4-zhujunxian@oss.cipunited.com>
 <61ecc506-3796-49e1-a4f3-7a39807a1fc3@linaro.org>

On 26/12/23 19:50, Xi Ruoyao wrote:
> On Wed, 2023-12-27 at 05:50 +0800, Xi Ruoyao wrote:
>> On Tue, 2023-12-26 at 17:12 -0300, Adhemerval Zanella Netto wrote:
>>> Also, I see no point in implementing this optimizations with assembly where
>>> a C implementation would be way simpler and generate similar code. Similar
>>> to what I did for powerpc with sysdeps/powerpc/fpu/round_to_integer.h, I
>>> implemented a similar approach for MIPS [1].  The resulting code should be
>>> similar to the assembly implementation, taking in consideration the correct
>>> fix to save/restore floating-point exceptions. I did see no math regression
>>> on cfarm23 with a glibc built with -mabi=64 -mips64r2.
>>
>> Is there a micro-benchmark result on the cfarm machine?
>> AFAIK the FCSR setting instruction may be much more slower than normal
>> instructions, so I'm not sure if this is really a win.
>
> Add Jiaxun who knows MIPS much better than me.
>

This is a good question. We do have a trunc benchmark, which should give us a
hint whether this strategy might show any performance gain. At least on
cfarm230 (Cavium Octeon III V0.2 FPU V0.0) my version is actually worse than
the generic one (mean of 44.17 versus 22.95, roughly 1.9x slower):

- Using master/generic:

  $ ./ld.so --library-path . ./bench-trunc
    "trunc": {
     "": {
      "duration": 9.83902e+08,
      "iterations": 4.2864e+07,
      "max": 10027.9,
      "min": 21.329,
      "mean": 22.954
     }
    }

- Using trunc.l.d:

  $ ./ld.so --library-path . ./bench-trunc
    "trunc": {
     "": {
      "duration": 9.91892e+08,
      "iterations": 2.2458e+07,
      "max": 10048.9,
      "min": 38.326,
      "mean": 44.1666
     }
    }

The system does not have perf, but I guess that libc_fesetenv_mips is really
bad performance-wise, since it has to read the current state to flush the FPU
pipeline and then set the previous one (which ends up being two cfc1).

In any case, I updated my branch [1] with a single-float implementation, if
anyone wants to experiment with it. At least this experiment raised the issue
that some ceil, floor, round, and/or trunc implementations might still be
raising inexact floating-point exceptions due to a lack of proper testing.

PS: I had to disable the vDSO on this system, since clock_gettime was
returning bogus values that made the bench tests unusable.

[1] https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/azanella/mips-hw-fp-round