From: Joel Sherrill
Reply-To: joel@rtems.org
Date: Fri, 4 Jun 2021 13:59:55 -0500
Subject: Re: incorrectly rounded square root
To: Jeff Johnston
Cc: Paul Zimmermann, Newlib
List-Id: Newlib mailing list

On Fri, Jun 4, 2021, 1:44 PM Jeff Johnston wrote:

> Ok, I now know exactly what is happening.
>
> The compiler is optimizing out the rounding check in ef_sqrt.c, probably
> due to the operation using two constants.
>
> 86          ix += (m <<23);
> (gdb) list
> 81              else
> 82                  q += (q&1);
>
> When I debug, it always does the else at line 81 without performing the
> one-tiny operation.  The difference in the mxcsr register is the PE bit
> which I believe gets set when you do the one-tiny operation.
> Since we aren't doing it, it never gets set on and the difference of 0x20
> in the mxcsr register is explained.
>
> By making the constants volatile, I am able to get the code working as it
> should.  I have pushed a patch for this.
>

Awesome catch Paul and great eye to spot the problem Jeff!

--joel

>
> -- Jeff J.
>
> On Fri, Jun 4, 2021 at 3:14 AM Paul Zimmermann wrote:
>
>> Hi Jeff,
>>
>> > I figured the values were off when I had to hard-code them in my own
>> > test_sqrt.c but forgot to include that info in my note.
>> >
>> > Now, that said, using the code I attached earlier, I am seeing the exact
>> > values you are quoting above for glibc for the mxcsr register and the
>> > round is working.  Have you tried running that code?
>>
>> yes it works as expected, but it doesn't work with Newlib's fenv.h and
>> libm.a (see below).
>>
>> > The mxcsr values you are seeing that are different are not due to the
>> > fesetround code.  The code is shifting the round value 13 bits
>> > and for 3, that ends up being 0x6000.  It is masking mxcsr with
>> > 0xffff9fff first so when you start with 0x1fxx and end up with 0x7fxx,
>> > the code is doing what it is supposed to do.
>> > The difference in values above is 0x20 (e.g. 0x7fa0 vs 0x7f80) which is
>> > a bit in the last 2 hex digits which isn't touched by the code logic.
>>
>> here is how to reproduce the issue:
>>
>> tar xf newlib-4.1.0.tar.gz
>> cd newlib-4.1.0
>> mkdir build
>> cd build
>> ../configure --prefix=/tmp --disable-multilib --target=x86_64
>> make -j4
>> make install
>>
>> $ cat test_sqrt_2.c
>> #include <stdio.h>
>> #include <math.h>
>> #include <fenv.h>
>>
>> #ifdef NEWLIB
>> /* RedHat's libm claims:
>>    undefined reference to `__errno' in j1f/y1f */
>> int errno;
>> int* __errno () { return &errno; }
>> #endif
>>
>> int main()
>> {
>>   int rnd[4] = { FE_TONEAREST, FE_TOWARDZERO, FE_UPWARD, FE_DOWNWARD };
>>   char Rnd[4] = "NZUD";
>>   float x = 0x1.ff07fep+127f;
>>   float y;
>>   for (int i = 0; i < 4; i++)
>>   {
>>     unsigned short cw;
>>     unsigned int mxcsr = 0;
>>     fesetround (rnd[i]);
>>     __asm__ volatile ("fnstcw %0" : "=m" (cw) : );
>>     __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr) : );
>>     y = sqrtf (x);
>>     printf ("RND%c: %a cw=%u mxcsr=%u\n", Rnd[i], y, cw, mxcsr);
>>   }
>> }
>>
>> With GNU libc:
>> $ gcc -fno-builtin test_sqrt_2.c -lm
>> $ ./a.out
>> RNDN: 0x1.ff83fp+63 cw=895 mxcsr=8064
>> RNDZ: 0x1.ff83eep+63 cw=3967 mxcsr=32672
>> RNDU: 0x1.ff83fp+63 cw=2943 mxcsr=24480
>> RNDD: 0x1.ff83eep+63 cw=1919 mxcsr=16288
>>
>> With Newlib:
>> $ gcc -I/tmp/x86_64/include -DNEWLIB -fno-builtin test_sqrt_2.c /tmp/libm.a
>> $ ./a.out
>> RNDN: 0x1.ff83fp+63 cw=895 mxcsr=8064
>> RNDZ: 0x1.ff83fp+63 cw=3967 mxcsr=32640
>> RNDU: 0x1.ff83fp+63 cw=2943 mxcsr=24448
>> RNDD: 0x1.ff83fp+63 cw=1919 mxcsr=16256
>>
>> Can you reproduce that on x86_64 Linux?
>>
>> Paul
>>
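
The decimal mxcsr values printed in the thread are easier to compare once they are split into fields. What follows is only a minimal sketch and not code from Newlib or from anyone in the thread: the helper names (mxcsr_rc, mxcsr_inexact, mxcsr_set_rc, mxcsr_describe, mxcsr_self_check) are hypothetical. It assumes the usual x86 SSE layout of MXCSR, with bits 13-14 as the rounding-control field and bit 5 (0x20) as the precision (inexact) flag PE, which matches the 13-bit shift, the 0xffff9fff mask and the 0x20 difference discussed above.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed MXCSR layout (x86 SSE): bit 5 is the precision (inexact)
   exception flag PE, bits 13-14 are the rounding-control field RC. */
#define MXCSR_PE       0x20u
#define MXCSR_RC_SHIFT 13
#define MXCSR_RC_MASK  (3u << MXCSR_RC_SHIFT)   /* 0x6000 */

/* Hypothetical helper: extract the 2-bit rounding-control field. */
unsigned mxcsr_rc(unsigned mxcsr)
{
    return (mxcsr & MXCSR_RC_MASK) >> MXCSR_RC_SHIFT;
}

/* Hypothetical helper: 1 if the inexact (PE) flag is set, else 0. */
int mxcsr_inexact(unsigned mxcsr)
{
    return (mxcsr & MXCSR_PE) != 0;
}

/* Hypothetical helper: replace the RC field as described in the thread:
   clear it with ~0x6000 (0xffff9fff for 32 bits), then OR in rc << 13. */
unsigned mxcsr_set_rc(unsigned mxcsr, unsigned rc)
{
    return (mxcsr & ~MXCSR_RC_MASK) | ((rc & 3u) << MXCSR_RC_SHIFT);
}

/* Hypothetical helper: print one mxcsr value split into its fields. */
void mxcsr_describe(unsigned mxcsr)
{
    static const char *const names[4] = {
        "to nearest", "downward", "upward", "toward zero"
    };
    unsigned rc = mxcsr_rc(mxcsr);
    printf("mxcsr=%u (0x%04x): RC=%u (%s), PE=%d\n",
           mxcsr, mxcsr, rc, names[rc], mxcsr_inexact(mxcsr));
}

/* Checks against the values reported for the RNDZ line of the test. */
void mxcsr_self_check(void)
{
    /* glibc printed 32672 (0x7fa0), Newlib printed 32640 (0x7f80):
       same rounding mode, only the PE bit differs. */
    assert(mxcsr_rc(32672u) == 3u && mxcsr_rc(32640u) == 3u);
    assert(mxcsr_inexact(32672u) && !mxcsr_inexact(32640u));
    assert((32672u ^ 32640u) == MXCSR_PE);
    /* Starting from 0x1f80 (8064), selecting mode 3 yields 0x7f80. */
    assert(mxcsr_set_rc(8064u, 3u) == 32640u);
}
```

Calling mxcsr_describe on each value from the two runs shows the same rounding-control field in both, with the PE flag present only in the glibc numbers. Note that the test program reads mxcsr before each sqrtf call, so the flag seen on a given line was left by the previous iteration.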