public inbox for fortran@gcc.gnu.org
* [PATCH] Make integer output faster in libgfortran
From: FX @ 2021-12-25 13:03 UTC
  To: fortran; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3420 bytes --]

Hi,

Integer output in libgfortran is done by passing values as the largest available integer type. Our gfc_itoa() function, which converts to decimal form, also uses that type and performs a series of divisions by 10. On targets with a 128-bit integer type (which nowadays means most targets), that division is slow, because it is implemented in software and requires a call to a libgcc function.

We can speed this up in two easy ways:
- If the value fits into 64 bits, use a simple 64-bit itoa() function, which performs the series of divisions by 10 in hardware. In real life, most I/O falls into this case, unless you’re printing very large 128-bit integers.
- If the value does not fit into 64 bits, perform only one slow division, by 10^19, and use two calls to the 64-bit function to output each part (the low part needs zero-padding to 19 digits).
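The two cases above can be sketched as a small stand-alone C program. This is a minimal sketch, not the patch itself: it assumes GCC or Clang on a 64-bit target (for the unsigned __int128 extension), the function names are hypothetical, and unlike the patch it adds a third step so that it is also correct for unsigned values above 2^127.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 10^19 is the largest power of ten that fits in uint64_t.  */
#define TEN19 ((unsigned __int128) 10000000000000000000ULL)

/* Write the digits of N backwards, ending just before P, and return a
   pointer to the first digit.  Division by 10 is a 64-bit hardware op.  */
static char *
u64_to_dec (uint64_t n, char *p)
{
  do
    {
      *--p = (char) ('0' + n % 10);
      n /= 10;
    }
  while (n != 0);
  return p;
}

/* Same, but always write exactly 19 digits (zero-padded low part).  */
static char *
u64_to_dec_pad19 (uint64_t n, char *p)
{
  for (int k = 0; k < 19; k++)
    {
      *--p = (char) ('0' + n % 10);
      n /= 10;
    }
  return p;
}

/* Convert N into BUF (LEN >= 40: up to 39 digits plus '\0') and return
   a pointer to the first digit inside BUF.  */
static char *
u128_to_dec (unsigned __int128 n, char *buf, size_t len)
{
  assert (len >= 40);
  char *p = buf + len - 1;
  *p = '\0';

  /* Case 1: the value fits in 64 bits, so no 128-bit division at all.  */
  if (n <= UINT64_MAX)
    return u64_to_dec ((uint64_t) n, p);

  /* Case 2: one 128-bit division (quotient and remainder) by 10^19.  */
  unsigned __int128 high = n / TEN19;
  p = u64_to_dec_pad19 ((uint64_t) (n % TEN19), p);
  if (high <= UINT64_MAX)
    return u64_to_dec ((uint64_t) high, p);

  /* Extra step, not in the patch: unsigned values above about 1.8e38,
     which signed Fortran integers never produce, need a third chunk.  */
  p = u64_to_dec_pad19 ((uint64_t) (high % TEN19), p);
  return u64_to_dec ((uint64_t) (high / TEN19), p);
}
```

For HUGE(KIND=16) = 2^127 - 1 = 170141183460469231731687303715884105727, the split gives a high part of 17014118346046923173 (below UINT64_MAX = 18446744073709551615) and a low part of 1687303715884105727, so two 64-bit calls suffice.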


What is the speed-up? It really depends on the exact nature of the I/O done. For the most common case, list-directed I/O with no special format, the patch neither speeds up nor slows down small values (such as 0 or 1049), but it speeds up larger values, starting with HUGE(KIND=4). For very large 128-bit values, it can cut the I/O time in half.

I have attached my own timing code to this email. Results before the patch (but with my previous itoa patch applied):

 Timing for INTEGER(KIND=1)
 Value 0, time:  0.191409990    
 Value HUGE(KIND=1), time:  0.173687011    
 Timing for INTEGER(KIND=4)
 Value 0, time:  0.171809018    
 Value 1049, time:  0.177439988    
 Value HUGE(KIND=4), time:  0.217984974    
 Timing for INTEGER(KIND=8)
 Value 0, time:  0.178072989    
 Value HUGE(KIND=4), time:  0.214841008    
 Value HUGE(KIND=8), time:  0.276726007    
 Timing for INTEGER(KIND=16)
 Value 0, time:  0.175235987    
 Value HUGE(KIND=4), time:  0.217689037    
 Value HUGE(KIND=8), time:  0.280257106    
 Value HUGE(KIND=16), time:  0.420036077    

Results after the patch:

 Timing for INTEGER(KIND=1)
 Value 0, time:  0.194633007    
 Value HUGE(KIND=1), time:  0.172436997    
 Timing for INTEGER(KIND=4)
 Value 0, time:  0.167517006    
 Value 1049, time:  0.176503003    
 Value HUGE(KIND=4), time:  0.172892988    
 Timing for INTEGER(KIND=8)
 Value 0, time:  0.171101034    
 Value HUGE(KIND=4), time:  0.174461007    
 Value HUGE(KIND=8), time:  0.180289030    
 Timing for INTEGER(KIND=16)
 Value 0, time:  0.175765991    
 Value HUGE(KIND=4), time:  0.181162953    
 Value HUGE(KIND=8), time:  0.186082959    
 Value HUGE(KIND=16), time:  0.207401991    

Times are CPU times in seconds, for one million integer writes into a buffer string. With the patch, integer decimal output takes almost the same time whatever the value written, which means that the I/O library overhead, not the decimal conversion, is now dominant. For this reason, I don’t think we need a faster implementation of the 64-bit itoa; we can keep the current approach of repeated division by 10.

---------------

This patch applies on top of my previous itoa-related patch at https://gcc.gnu.org/pipermail/fortran/2021-December/057218.html

The patch has been bootstrapped and regtested on two 64-bit targets: aarch64-apple-darwin21 (development branch) and x86_64-pc-linux-gnu. I would like it to be tested on a 32-bit target without a 128-bit integer type. Does anyone have access to one?

Once tested on a 32-bit target, OK to commit?

FX


[-- Attachment #2: itoa-faster.patch --]
[-- Type: application/octet-stream, Size: 19259 bytes --]

commit 4526dd52ebc76de63a8386767eda2f02d8b0a27b
Author: Francois-Xavier Coudert <fxcoudert@gmail.com>
Date:   2021-12-25 13:42:25 +0100

    Fortran: speed up decimal output of integers
    
    libgfortran/ChangeLog:
    
            PR libfortran/98076
            * runtime/string.c (itoa64, itoa64_pad19): New helper functions.
            (gfc_itoa): On targets with 128-bit integers, call fast
            64-bit functions to avoid many slow divisions.
    
    gcc/testsuite/ChangeLog:
    
            PR libfortran/98076
            * gfortran.dg/pr98076.f90: New test.

diff --git a/gcc/testsuite/gfortran.dg/pr98076.f90 b/gcc/testsuite/gfortran.dg/pr98076.f90
new file mode 100644
index 00000000000..d1288a41fef
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr98076.f90
@@ -0,0 +1,293 @@
+! { dg-do run }
+! { dg-require-effective-target fortran_large_int }
+!
+! Check that we can print large integer values
+
+program test
+  implicit none
+  ! 128-bit integer kind
+  integer, parameter :: k = selected_int_kind(38)
+
+  character(len=39) :: s
+  character(len=100) :: buffer
+  integer(kind=k) :: n
+  integer :: i
+
+  ! Random checks
+  do i = 1, 1000
+    call random_digits(s)
+    read(s,*) n
+    write(buffer,'(I0.38)') n
+    print *, s
+    print *, buffer
+    if (adjustl(buffer) /= adjustl(s)) stop 2
+  end do
+
+  ! Systematic check
+  call check(0_k, "0")
+  call check(1_k, "1")
+  call check(9_k, "9")
+  call check(10_k, "10")
+  call check(11_k, "11")
+  call check(99_k, "99")
+  call check(100_k, "100")
+  call check(101_k, "101")
+  call check(999_k, "999")
+  call check(1000_k, "1000")
+  call check(1001_k, "1001")
+  call check(9999_k, "9999")
+  call check(10000_k, "10000")
+  call check(10001_k, "10001")
+  call check(99999_k, "99999")
+  call check(100000_k, "100000")
+  call check(100001_k, "100001")
+  call check(999999_k, "999999")
+  call check(1000000_k, "1000000")
+  call check(1000001_k, "1000001")
+  call check(9999999_k, "9999999")
+  call check(10000000_k, "10000000")
+  call check(10000001_k, "10000001")
+  call check(99999999_k, "99999999")
+  call check(100000000_k, "100000000")
+  call check(100000001_k, "100000001")
+  call check(999999999_k, "999999999")
+  call check(1000000000_k, "1000000000")
+  call check(1000000001_k, "1000000001")
+  call check(9999999999_k, "9999999999")
+  call check(10000000000_k, "10000000000")
+  call check(10000000001_k, "10000000001")
+  call check(99999999999_k, "99999999999")
+  call check(100000000000_k, "100000000000")
+  call check(100000000001_k, "100000000001")
+  call check(999999999999_k, "999999999999")
+  call check(1000000000000_k, "1000000000000")
+  call check(1000000000001_k, "1000000000001")
+  call check(9999999999999_k, "9999999999999")
+  call check(10000000000000_k, "10000000000000")
+  call check(10000000000001_k, "10000000000001")
+  call check(99999999999999_k, "99999999999999")
+  call check(100000000000000_k, "100000000000000")
+  call check(100000000000001_k, "100000000000001")
+  call check(999999999999999_k, "999999999999999")
+  call check(1000000000000000_k, "1000000000000000")
+  call check(1000000000000001_k, "1000000000000001")
+  call check(9999999999999999_k, "9999999999999999")
+  call check(10000000000000000_k, "10000000000000000")
+  call check(10000000000000001_k, "10000000000000001")
+  call check(99999999999999999_k, "99999999999999999")
+  call check(100000000000000000_k, "100000000000000000")
+  call check(100000000000000001_k, "100000000000000001")
+  call check(999999999999999999_k, "999999999999999999")
+  call check(1000000000000000000_k, "1000000000000000000")
+  call check(1000000000000000001_k, "1000000000000000001")
+  call check(9999999999999999999_k, "9999999999999999999")
+  call check(10000000000000000000_k, "10000000000000000000")
+  call check(10000000000000000001_k, "10000000000000000001")
+  call check(99999999999999999999_k, "99999999999999999999")
+  call check(100000000000000000000_k, "100000000000000000000")
+  call check(100000000000000000001_k, "100000000000000000001")
+  call check(999999999999999999999_k, "999999999999999999999")
+  call check(1000000000000000000000_k, "1000000000000000000000")
+  call check(1000000000000000000001_k, "1000000000000000000001")
+  call check(9999999999999999999999_k, "9999999999999999999999")
+  call check(10000000000000000000000_k, "10000000000000000000000")
+  call check(10000000000000000000001_k, "10000000000000000000001")
+  call check(99999999999999999999999_k, "99999999999999999999999")
+  call check(100000000000000000000000_k, "100000000000000000000000")
+  call check(100000000000000000000001_k, "100000000000000000000001")
+  call check(999999999999999999999999_k, "999999999999999999999999")
+  call check(1000000000000000000000000_k, "1000000000000000000000000")
+  call check(1000000000000000000000001_k, "1000000000000000000000001")
+  call check(9999999999999999999999999_k, "9999999999999999999999999")
+  call check(10000000000000000000000000_k, "10000000000000000000000000")
+  call check(10000000000000000000000001_k, "10000000000000000000000001")
+  call check(99999999999999999999999999_k, "99999999999999999999999999")
+  call check(100000000000000000000000000_k, "100000000000000000000000000")
+  call check(100000000000000000000000001_k, "100000000000000000000000001")
+  call check(999999999999999999999999999_k, "999999999999999999999999999")
+  call check(1000000000000000000000000000_k, "1000000000000000000000000000")
+  call check(1000000000000000000000000001_k, "1000000000000000000000000001")
+  call check(9999999999999999999999999999_k, "9999999999999999999999999999")
+  call check(10000000000000000000000000000_k, "10000000000000000000000000000")
+  call check(10000000000000000000000000001_k, "10000000000000000000000000001")
+  call check(99999999999999999999999999999_k, "99999999999999999999999999999")
+  call check(100000000000000000000000000000_k, "100000000000000000000000000000")
+  call check(100000000000000000000000000001_k, "100000000000000000000000000001")
+  call check(999999999999999999999999999999_k, "999999999999999999999999999999")
+  call check(1000000000000000000000000000000_k, "1000000000000000000000000000000")
+  call check(1000000000000000000000000000001_k, "1000000000000000000000000000001")
+  call check(9999999999999999999999999999999_k, "9999999999999999999999999999999")
+  call check(10000000000000000000000000000000_k, "10000000000000000000000000000000")
+  call check(10000000000000000000000000000001_k, "10000000000000000000000000000001")
+  call check(99999999999999999999999999999999_k, "99999999999999999999999999999999")
+  call check(100000000000000000000000000000000_k, "100000000000000000000000000000000")
+  call check(100000000000000000000000000000001_k, "100000000000000000000000000000001")
+  call check(999999999999999999999999999999999_k, "999999999999999999999999999999999")
+  call check(1000000000000000000000000000000000_k, "1000000000000000000000000000000000")
+  call check(1000000000000000000000000000000001_k, "1000000000000000000000000000000001")
+  call check(9999999999999999999999999999999999_k, "9999999999999999999999999999999999")
+  call check(10000000000000000000000000000000000_k, "10000000000000000000000000000000000")
+  call check(10000000000000000000000000000000001_k, "10000000000000000000000000000000001")
+  call check(99999999999999999999999999999999999_k, "99999999999999999999999999999999999")
+  call check(100000000000000000000000000000000000_k, "100000000000000000000000000000000000")
+  call check(100000000000000000000000000000000001_k, "100000000000000000000000000000000001")
+  call check(999999999999999999999999999999999999_k, "999999999999999999999999999999999999")
+  call check(1000000000000000000000000000000000000_k, "1000000000000000000000000000000000000")
+  call check(1000000000000000000000000000000000001_k, "1000000000000000000000000000000000001")
+  call check(9999999999999999999999999999999999999_k, "9999999999999999999999999999999999999")
+  call check(10000000000000000000000000000000000000_k, "10000000000000000000000000000000000000")
+  call check(10000000000000000000000000000000000001_k, "10000000000000000000000000000000000001")
+  call check(99999999999999999999999999999999999999_k, "99999999999999999999999999999999999999")
+  call check(100000000000000000000000000000000000000_k, "100000000000000000000000000000000000000")
+  call check(100000000000000000000000000000000000001_k, "100000000000000000000000000000000000001")
+  call check(109999999999999999999999999999999999999_k, "109999999999999999999999999999999999999")
+
+  call check(-1_k, "-1")
+  call check(-9_k, "-9")
+  call check(-10_k, "-10")
+  call check(-11_k, "-11")
+  call check(-99_k, "-99")
+  call check(-100_k, "-100")
+  call check(-101_k, "-101")
+  call check(-999_k, "-999")
+  call check(-1000_k, "-1000")
+  call check(-1001_k, "-1001")
+  call check(-9999_k, "-9999")
+  call check(-10000_k, "-10000")
+  call check(-10001_k, "-10001")
+  call check(-99999_k, "-99999")
+  call check(-100000_k, "-100000")
+  call check(-100001_k, "-100001")
+  call check(-999999_k, "-999999")
+  call check(-1000000_k, "-1000000")
+  call check(-1000001_k, "-1000001")
+  call check(-9999999_k, "-9999999")
+  call check(-10000000_k, "-10000000")
+  call check(-10000001_k, "-10000001")
+  call check(-99999999_k, "-99999999")
+  call check(-100000000_k, "-100000000")
+  call check(-100000001_k, "-100000001")
+  call check(-999999999_k, "-999999999")
+  call check(-1000000000_k, "-1000000000")
+  call check(-1000000001_k, "-1000000001")
+  call check(-9999999999_k, "-9999999999")
+  call check(-10000000000_k, "-10000000000")
+  call check(-10000000001_k, "-10000000001")
+  call check(-99999999999_k, "-99999999999")
+  call check(-100000000000_k, "-100000000000")
+  call check(-100000000001_k, "-100000000001")
+  call check(-999999999999_k, "-999999999999")
+  call check(-1000000000000_k, "-1000000000000")
+  call check(-1000000000001_k, "-1000000000001")
+  call check(-9999999999999_k, "-9999999999999")
+  call check(-10000000000000_k, "-10000000000000")
+  call check(-10000000000001_k, "-10000000000001")
+  call check(-99999999999999_k, "-99999999999999")
+  call check(-100000000000000_k, "-100000000000000")
+  call check(-100000000000001_k, "-100000000000001")
+  call check(-999999999999999_k, "-999999999999999")
+  call check(-1000000000000000_k, "-1000000000000000")
+  call check(-1000000000000001_k, "-1000000000000001")
+  call check(-9999999999999999_k, "-9999999999999999")
+  call check(-10000000000000000_k, "-10000000000000000")
+  call check(-10000000000000001_k, "-10000000000000001")
+  call check(-99999999999999999_k, "-99999999999999999")
+  call check(-100000000000000000_k, "-100000000000000000")
+  call check(-100000000000000001_k, "-100000000000000001")
+  call check(-999999999999999999_k, "-999999999999999999")
+  call check(-1000000000000000000_k, "-1000000000000000000")
+  call check(-1000000000000000001_k, "-1000000000000000001")
+  call check(-9999999999999999999_k, "-9999999999999999999")
+  call check(-10000000000000000000_k, "-10000000000000000000")
+  call check(-10000000000000000001_k, "-10000000000000000001")
+  call check(-99999999999999999999_k, "-99999999999999999999")
+  call check(-100000000000000000000_k, "-100000000000000000000")
+  call check(-100000000000000000001_k, "-100000000000000000001")
+  call check(-999999999999999999999_k, "-999999999999999999999")
+  call check(-1000000000000000000000_k, "-1000000000000000000000")
+  call check(-1000000000000000000001_k, "-1000000000000000000001")
+  call check(-9999999999999999999999_k, "-9999999999999999999999")
+  call check(-10000000000000000000000_k, "-10000000000000000000000")
+  call check(-10000000000000000000001_k, "-10000000000000000000001")
+  call check(-99999999999999999999999_k, "-99999999999999999999999")
+  call check(-100000000000000000000000_k, "-100000000000000000000000")
+  call check(-100000000000000000000001_k, "-100000000000000000000001")
+  call check(-999999999999999999999999_k, "-999999999999999999999999")
+  call check(-1000000000000000000000000_k, "-1000000000000000000000000")
+  call check(-1000000000000000000000001_k, "-1000000000000000000000001")
+  call check(-9999999999999999999999999_k, "-9999999999999999999999999")
+  call check(-10000000000000000000000000_k, "-10000000000000000000000000")
+  call check(-10000000000000000000000001_k, "-10000000000000000000000001")
+  call check(-99999999999999999999999999_k, "-99999999999999999999999999")
+  call check(-100000000000000000000000000_k, "-100000000000000000000000000")
+  call check(-100000000000000000000000001_k, "-100000000000000000000000001")
+  call check(-999999999999999999999999999_k, "-999999999999999999999999999")
+  call check(-1000000000000000000000000000_k, "-1000000000000000000000000000")
+  call check(-1000000000000000000000000001_k, "-1000000000000000000000000001")
+  call check(-9999999999999999999999999999_k, "-9999999999999999999999999999")
+  call check(-10000000000000000000000000000_k, "-10000000000000000000000000000")
+  call check(-10000000000000000000000000001_k, "-10000000000000000000000000001")
+  call check(-99999999999999999999999999999_k, "-99999999999999999999999999999")
+  call check(-100000000000000000000000000000_k, "-100000000000000000000000000000")
+  call check(-100000000000000000000000000001_k, "-100000000000000000000000000001")
+  call check(-999999999999999999999999999999_k, "-999999999999999999999999999999")
+  call check(-1000000000000000000000000000000_k, "-1000000000000000000000000000000")
+  call check(-1000000000000000000000000000001_k, "-1000000000000000000000000000001")
+  call check(-9999999999999999999999999999999_k, "-9999999999999999999999999999999")
+  call check(-10000000000000000000000000000000_k, "-10000000000000000000000000000000")
+  call check(-10000000000000000000000000000001_k, "-10000000000000000000000000000001")
+  call check(-99999999999999999999999999999999_k, "-99999999999999999999999999999999")
+  call check(-100000000000000000000000000000000_k, "-100000000000000000000000000000000")
+  call check(-100000000000000000000000000000001_k, "-100000000000000000000000000000001")
+  call check(-999999999999999999999999999999999_k, "-999999999999999999999999999999999")
+  call check(-1000000000000000000000000000000000_k, "-1000000000000000000000000000000000")
+  call check(-1000000000000000000000000000000001_k, "-1000000000000000000000000000000001")
+  call check(-9999999999999999999999999999999999_k, "-9999999999999999999999999999999999")
+  call check(-10000000000000000000000000000000000_k, "-10000000000000000000000000000000000")
+  call check(-10000000000000000000000000000000001_k, "-10000000000000000000000000000000001")
+  call check(-99999999999999999999999999999999999_k, "-99999999999999999999999999999999999")
+  call check(-100000000000000000000000000000000000_k, "-100000000000000000000000000000000000")
+  call check(-100000000000000000000000000000000001_k, "-100000000000000000000000000000000001")
+  call check(-999999999999999999999999999999999999_k, "-999999999999999999999999999999999999")
+  call check(-1000000000000000000000000000000000000_k, "-1000000000000000000000000000000000000")
+  call check(-1000000000000000000000000000000000001_k, "-1000000000000000000000000000000000001")
+  call check(-9999999999999999999999999999999999999_k, "-9999999999999999999999999999999999999")
+  call check(-10000000000000000000000000000000000000_k, "-10000000000000000000000000000000000000")
+  call check(-10000000000000000000000000000000000001_k, "-10000000000000000000000000000000000001")
+  call check(-99999999999999999999999999999999999999_k, "-99999999999999999999999999999999999999")
+  call check(-100000000000000000000000000000000000000_k, "-100000000000000000000000000000000000000")
+  call check(-100000000000000000000000000000000000001_k, "-100000000000000000000000000000000000001")
+  call check(-109999999999999999999999999999999999999_k, "-109999999999999999999999999999999999999")
+
+contains
+
+  subroutine check (i, str)
+    implicit none
+    integer(kind=k), intent(in), value :: i
+    character(len=*), intent(in) :: str
+
+    character(len=100) :: buffer
+    write(buffer,*) i
+    if (adjustl(buffer) /= adjustl(str)) stop 1
+  end subroutine
+
+  subroutine random_digits (str)
+    implicit none
+    integer, parameter :: l = 38
+    character(len=l+1) :: str
+    real :: r
+    integer :: i, d
+
+    str = ""
+    do i = 2, l+1
+      call random_number(r)
+      d = floor(r * 10)
+      str(i:i) = achar(48 + d)
+    end do
+
+    call random_number(r)
+    if (r > 0.5) then
+      str(1:1) = '-'
+    end if
+  end subroutine
+end
diff --git a/libgfortran/runtime/string.c b/libgfortran/runtime/string.c
index 835027a7cd6..0ccd731852a 100644
--- a/libgfortran/runtime/string.c
+++ b/libgfortran/runtime/string.c
@@ -23,6 +23,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 <http://www.gnu.org/licenses/>.  */
 
 #include "libgfortran.h"
+#include <assert.h>
 #include <string.h>
 #include <strings.h>
 
@@ -169,6 +170,38 @@ find_option (st_parameter_common *cmp, const char *s1, gfc_charlen_type s1_len,
 }
 
 
+/* Fast helper function for a positive value that fits in uint64_t.  */
+
+static inline char *
+itoa64 (uint64_t n, char *p)
+{
+  while (n != 0)
+    {
+      *--p = '0' + (n % 10);
+      n /= 10;
+    }
+  return p;
+}
+
+
+#if defined(HAVE_GFC_INTEGER_16)
+# define TEN19 ((GFC_UINTEGER_LARGEST) 1000000 * (GFC_UINTEGER_LARGEST) 1000000 * (GFC_UINTEGER_LARGEST) 10000000)
+
+/* Same as itoa64(), with zero padding of 19 digits.  */
+
+static inline char *
+itoa64_pad19 (uint64_t n, char *p)
+{
+  for (int k = 0; k < 19; k++)
+    {
+      *--p = '0' + (n % 10);
+      n /= 10;
+    }
+  return p;
+}
+#endif
+
+
 /* Integer to decimal conversion.
 
    This function is much more restricted than the widespread (but
@@ -195,11 +228,33 @@ gfc_itoa (GFC_UINTEGER_LARGEST n, char *buffer, size_t len)
   p = buffer + GFC_ITOA_BUF_SIZE - 1;
   *p = '\0';
 
-  while (n != 0)
+#if defined(HAVE_GFC_INTEGER_16)
+  /* On targets that have a 128-bit integer type, division in that type
+     is slow, because it occurs through a function call. We avoid that.  */
+
+  if (n <= UINT64_MAX)
+    /* If the value fits in uint64_t, use the fast function. */
+    return itoa64 (n, p);
+  else
     {
-      *--p = '0' + (n % 10);
-      n /= 10;
+      /* Otherwise, break down into smaller bits by division. Two calls to
+	 the uint64_t function are not sufficient for all 128-bit unsigned
+	 integers (we would need three calls), but they do suffice for all
+	 values up to 2^127, which is the largest that Fortran can produce
+	 (-HUGE(0_16)-1) with its signed integer types.  */
+      static_assert(sizeof(GFC_UINTEGER_LARGEST) <= 2 * sizeof(uint64_t));
+
+      GFC_UINTEGER_LARGEST r;
+      r = n % TEN19;
+      n = n / TEN19;
+      assert (r <= UINT64_MAX);
+      p = itoa64_pad19 (r, p);
+
+      assert(n <= UINT64_MAX);
+      return itoa64 (n, p);
     }
-
-  return p;
+#else
+  /* On targets where the largest integer is 64-bit, just use that.  */
+  return itoa64 (n, p);
+#endif
 }
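The comment in gfc_itoa claims that two uint64_t conversions cover every value up to 2^127, but not the whole unsigned 128-bit range. The arithmetic behind that claim can be checked in isolation; this is a sketch assuming GCC or Clang with unsigned __int128, and itoa64_calls_needed is a hypothetical helper, not part of libgfortran.

```c
#include <assert.h>
#include <stdint.h>

/* The split relies on unsigned __int128 being exactly two uint64_t wide
   (checked with the C11 two-argument static_assert).  */
static_assert (sizeof (unsigned __int128) == 2 * sizeof (uint64_t),
               "unexpected __int128 size");

/* Number of itoa64-style calls the gfc_itoa split would need for N.  */
static int
itoa64_calls_needed (unsigned __int128 n)
{
  const unsigned __int128 ten19 = 10000000000000000000ULL;

  if (n <= UINT64_MAX)
    return 1;                  /* fast path: a single itoa64 call */
  if (n / ten19 <= UINT64_MAX)
    return 2;                  /* itoa64_pad19 (low) + itoa64 (high) */
  return 3;                    /* the quotient itself overflows uint64_t */
}
```

Two calls are enough for every value up to 2^64 * 10^19 - 1 (about 1.84e38). That bound is above 2^127 (about 1.70e38), the largest magnitude a signed 128-bit Fortran integer can have, but below 2^128 - 1 (about 3.40e38).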

[-- Attachment #3: timing.f90 --]
[-- Type: application/octet-stream, Size: 2815 bytes --]

program test
  implicit none
  integer, parameter :: n = 100000
  real :: t1, t2
  character(len=100) :: s
  integer :: i

  integer(kind=1) :: x1(n) 
  integer(kind=4) :: x4(n) 
  integer(kind=8) :: x8(n) 
  integer(kind=16) :: x16(n) 

  print *, "Timing for INTEGER(KIND=1)"

  x1(:) = 0
  call cpu_time(t1)
  do i = 1, 10
    call output1(s)
  end do
  call cpu_time(t2)
  write(*,*) "Value 0, time:", t2 - t1

  x1(:) = huge(x1)
  call cpu_time(t1)
  do i = 1, 10
    call output1(s)
  end do
  call cpu_time(t2)
  write(*,*) "Value HUGE(KIND=1), time:", t2 - t1

  print *, "Timing for INTEGER(KIND=4)"

  x4(:) = 0
  call cpu_time(t1)
  do i = 1, 10
    call output4(s)
  end do
  call cpu_time(t2)
  write(*,*) "Value 0, time:", t2 - t1

  x4(:) = 1049
  call cpu_time(t1)
  do i = 1, 10
    call output4(s)
  end do
  call cpu_time(t2)
  write(*,*) "Value 1049, time:", t2 - t1

  x4(:) = huge(x4)
  call cpu_time(t1)
  do i = 1, 10
    call output4(s)
  end do
  call cpu_time(t2)
  write(*,*) "Value HUGE(KIND=4), time:", t2 - t1

  print *, "Timing for INTEGER(KIND=8)"

  x8(:) = 0
  call cpu_time(t1)
  do i = 1, 10
    call output8(s)
  end do
  call cpu_time(t2)
  write(*,*) "Value 0, time:", t2 - t1

  x8(:) = huge(x4)
  call cpu_time(t1)
  do i = 1, 10
    call output8(s)
  end do
  call cpu_time(t2)
  write(*,*) "Value HUGE(KIND=4), time:", t2 - t1

  x8(:) = huge(x8)
  call cpu_time(t1)
  do i = 1, 10
    call output8(s)
  end do
  call cpu_time(t2)
  write(*,*) "Value HUGE(KIND=8), time:", t2 - t1

  print *, "Timing for INTEGER(KIND=16)"

  x16(:) = 0
  call cpu_time(t1)
  do i = 1, 10
    call output16(s)
  end do
  call cpu_time(t2)
  write(*,*) "Value 0, time:", t2 - t1

  x16(:) = huge(x4)
  call cpu_time(t1)
  do i = 1, 10
    call output16(s)
  end do
  call cpu_time(t2)
  write(*,*) "Value HUGE(KIND=4), time:", t2 - t1

  x16(:) = huge(x8)
  call cpu_time(t1)
  do i = 1, 10
    call output16(s)
  end do
  call cpu_time(t2)
  write(*,*) "Value HUGE(KIND=8), time:", t2 - t1

  x16(:) = huge(x16)
  call cpu_time(t1)
  do i = 1, 10
    call output16(s)
  end do
  call cpu_time(t2)
  write(*,*) "Value HUGE(KIND=16), time:", t2 - t1

contains
  subroutine output1(s)
    implicit none
    character(len=100) :: s
    integer :: i

    do i = 1, n
      write(s,*) x1(i)
    end do
  end subroutine

  subroutine output4(s)
    implicit none
    character(len=100) :: s
    integer :: i

    do i = 1, n
      write(s,*) x4(i)
    end do
  end subroutine

  subroutine output8(s)
    implicit none
    character(len=100) :: s
    integer :: i

    do i = 1, n
      write(s,*) x8(i)
    end do
  end subroutine

  subroutine output16(s)
    implicit none
    character(len=100) :: s
    integer :: i

    do i = 1, n
      write(s,*) x16(i)
    end do
  end subroutine
end

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Make integer output faster in libgfortran
From: Thomas Koenig @ 2021-12-25 14:35 UTC
  To: FX, fortran; +Cc: gcc-patches

Hi FX,

> The patch has been bootstrapped and regtested on two 64-bit targets: aarch64-apple-darwin21 (development branch) and x86_64-pc-linux-gnu. I would like it to be tested on a 32-bit target without a 128-bit integer type. Does anyone have access to one?

There are two possibilities: either use gcc45 on the Compile Farm, or
run it with

make -k -j8 check-fortran RUNTESTFLAGS="--target_board=unix'{-m32,-m64}'"

which is the magic incantation to also run the tests as -m32 binaries.
You'll need 32-bit support on your Linux system, of course (which you
can check quickly with a "hello world" kind of program built with -m32).
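A slightly extended hello world can also report whether a given build would take the new 128-bit branch in gfc_itoa. This sketch assumes that GCC's predefined __SIZEOF_INT128__ macro tracks libgfortran's HAVE_GFC_INTEGER_16, which is an approximation, not the configure test libgfortran actually uses; compile it once with -m32 and once with -m64.

```c
#include <assert.h>
#include <stdio.h>

/* 1 if the compiler provides __int128 for this target.  GCC predefines
   __SIZEOF_INT128__ only where the type exists.  */
static int
has_int128 (void)
{
#ifdef __SIZEOF_INT128__
  return 1;
#else
  return 0;
#endif
}

/* Print what this build targets; on x86, -m32 gives 4-byte pointers
   and -m64 gives 8-byte pointers.  */
static void
report (void)
{
  assert (sizeof (void *) == 4 || sizeof (void *) == 8);
  printf ("pointer size: %zu bytes, __int128 available: %s\n",
          sizeof (void *), has_int128 () ? "yes" : "no");
}
```

On x86_64-pc-linux-gnu, the -m32 build is expected to report no __int128, which is the configuration the patch still needs testing on.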

Regards

	Thomas


* Re: [PATCH] Make integer output faster in libgfortran
From: FX @ 2021-12-25 14:50 UTC
  To: Thomas Koenig; +Cc: fortran, gcc-patches

Hi Thomas,

> There are two possibilities: Either use gcc45 on the compile farm, or
> run it with
> make -k -j8 check-fortran RUNTESTFLAGS="--target_board=unix'{-m32,-m64}'"

Thanks. Right now I don’t have a Linux system with 32-bit support. I’ll see how I can connect to gcc45, but if someone who is already set up to do so can fire off a quick regtest, that would be great ;)

FX


* Re: [PATCH] Make integer output faster in libgfortran
From: Thomas Koenig @ 2021-12-25 22:07 UTC
  To: FX; +Cc: gcc-patches, fortran

Hi FX,

> right now I don’t have a Linux system with 32-bit support. I’ll see how I can connect to gcc45, but if someone who is already set up to do so can fire off a quick regtest, that would be great ;)

I tested this on x86_64-pc-linux-gnu with

make -k -j8 check-fortran RUNTESTFLAGS="--target_board=unix'{-m32,-m64}'"

and didn't see any problems.

So, OK for trunk.

(We could also do something like that for a 32-bit system, but
that is another kettle of fish).

Thanks for taking this up!

Best regards

	Thomas



* Re: [PATCH] Make integer output faster in libgfortran
From: FX @ 2021-12-26 11:14 UTC
  To: Thomas Koenig; +Cc: gcc-patches, fortran

Hi,

> I tested this on x86_64-pc-linux-gnu with
> make -k -j8 check-fortran RUNTESTFLAGS="--target_board=unix'{-m32,-m64}'"
> and didn't see any problems.

Thanks Thomas! Pushed.


> (We could also do something like that for a 32-bit system, but
> that is another kettle of fish).

We probably wouldn’t get a speed-up that big. Even on 32-bit targets (at least common ones), the 64-bit type and its operations (notably division) are implemented via CPU instructions, not library calls.

At this point, the output of integers is probably bound by the many layers of indirection of libgfortran's I/O system (which are necessary because of the rich I/O formatting allowed by the standard).

Best,
FX


* Re: [PATCH] Make integer output faster in libgfortran
From: Thomas Koenig @ 2021-12-26 16:15 UTC
  To: FX; +Cc: gcc-patches, fortran

Hi FX,


>> (We could also do something like that for a 32-bit system, but
>> that is another kettle of fish).
> 
> We probably wouldn’t get a speed-up that big. Even on 32-bit targets
> (at least common ones), the 64-bit type and its operations (notably
> division) are implemented via CPU instructions, not library calls.

I'll look at this a bit more closely and report :-)

>  At this point, the output of integers is probably bound by the
> many layers of indirection of libgfortran's I/O system (which
> are necessary because of the rich I/O formatting allowed by
> the standard).

There are a few things we could do.  Getting an LTO-capable version
of libgfortran would be a huge step, because the compiler could
then strip out all of these layers.  The speed of

   character :: c
   write (*,'(A)', advance="no") c

could stand some improvement :-)

Regards

	Thomas


* Re: [PATCH] Make integer output faster in libgfortran
From: FX @ 2021-12-27 21:01 UTC
  To: Thomas Koenig; +Cc: gcc-patches, fortran, ro

[-- Attachment #1: Type: text/plain, Size: 369 bytes --]

I have committed a follow-up patch, after my use of the one-argument form of static_assert() broke bootstrap on Solaris (sorry, Rainer!).
The one-argument form is new in C23, while the Solaris <assert.h> only supports the two-argument form from C11.

I have confirmed that other target libraries use the two-arg form, and bootstrapped the attached patch on x86_64-pc-linux-gnu.

FX


[-- Attachment #2: static_assert.diff --]
[-- Type: application/octet-stream, Size: 1030 bytes --]

commit 3430132f3e8289f1b789a1a91206c44c47fb032c
Author: Francois-Xavier Coudert <fxcoudert@gmail.com>
Date:   2021-12-27 21:32:08 +0100

    Fortran: fix use of static_assert() to conform to C11
    
    libgfortran/ChangeLog:
    
            PR libfortran/98076
            * runtime/string.c (gfc_itoa): Use two args for static_assert().

diff --git a/libgfortran/runtime/string.c b/libgfortran/runtime/string.c
index 0ccd731852a..21585f48dc9 100644
--- a/libgfortran/runtime/string.c
+++ b/libgfortran/runtime/string.c
@@ -242,7 +242,8 @@ gfc_itoa (GFC_UINTEGER_LARGEST n, char *buffer, size_t len)
 	 integers (we would need three calls), but they do suffice for all
 	 values up to 2^127, which is the largest that Fortran can produce
 	 (-HUGE(0_16)-1) with its signed integer types.  */
-      static_assert(sizeof(GFC_UINTEGER_LARGEST) <= 2 * sizeof(uint64_t));
+      static_assert(sizeof(GFC_UINTEGER_LARGEST) <= 2 * sizeof(uint64_t),
+		    "integer too large");
 
       GFC_UINTEGER_LARGEST r;
       r = n % TEN19;


Thread overview: 7 messages
2021-12-25 13:03 [PATCH] Make integer output faster in libgfortran FX
2021-12-25 14:35 ` Thomas Koenig
2021-12-25 14:50   ` FX
2021-12-25 22:07     ` Thomas Koenig
2021-12-26 11:14       ` FX
2021-12-26 16:15         ` Thomas Koenig
2021-12-27 21:01           ` FX
