Hi,

Integer output in libgfortran is done by passing values as the largest integer type available. Our gfc_itoa() function for conversion to decimal form uses that type as well, performing a series of divisions by 10. On targets with a 128-bit integer type (which is most targets, really, nowadays), that division is slow, because it is implemented in software and requires a call to a libgcc function.

We can speed this up in two easy ways:

- If the value fits into 64 bits, use a simple 64-bit itoa() function, which does the series of divisions by 10 in hardware. Most I/O will actually fall into that case in real life, unless you’re printing very big 128-bit integers.
- If the value does not fit into 64 bits, perform only one slow division, by 10^19, and use two calls to the 64-bit function to output each part (the low part needing zero-padding).

What is the speed-up? It really depends on the exact nature of the I/O done. For the most common case, list-directed I/O with no special format, the patch does not speed up (or slow down!) things for values up to HUGE(KIND=4), but it does speed things up for larger values. For very large 128-bit values, it can cut the I/O time in half.

I attach my own timing code to this email.
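For illustration, the two cases described above can be sketched in C roughly as follows. This is only a minimal sketch, not the patch itself: every name in it (sketch_itoa64, sketch_itoa64_pad19, sketch_itoa128, SKETCH_TEN19, SKETCH_BUFLEN) is hypothetical, it relies on the GCC/Clang unsigned __int128 extension, and it assumes the magnitude to print is at most 2^127, so that the quotient by 10^19 still fits in 64 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; this is not the libgfortran code. */
#define SKETCH_BUFLEN 40 /* 39 digits for 2^127, plus the terminating NUL */

/* 10^19 is the largest power of ten that fits in an unsigned 64-bit integer. */
static const uint64_t SKETCH_TEN19 = UINT64_C(10000000000000000000);

/* Write the decimal digits of n so that they end just before `end`,
   using only 64-bit divisions. Returns a pointer to the first digit. */
static char *sketch_itoa64(uint64_t n, char *end)
{
    char *p = end;
    do {
        *--p = (char)('0' + n % 10);
        n /= 10;
    } while (n != 0);
    return p;
}

/* Same, but always produce exactly 19 digits, zero-padded on the left.
   Requires n < 10^19. */
static char *sketch_itoa64_pad19(uint64_t n, char *end)
{
    char *first = end - 19;
    memset(first, '0', 19);      /* pre-fill with the padding zeros */
    (void)sketch_itoa64(n, end); /* overwrite the trailing digits */
    return first;
}

/* Convert n to decimal at the end of buffer[0..len-1] and return a pointer
   to the first digit. Assumption: n <= 2^127 (the largest magnitude of a
   signed 128-bit value), so n / 10^19 fits in 64 bits. */
const char *sketch_itoa128(unsigned __int128 n, char *buffer, size_t len)
{
    assert(len >= SKETCH_BUFLEN);
    char *p = buffer + len - 1;
    *p = '\0';

    /* Fast path: the value fits into 64 bits, no 128-bit division at all. */
    if (n <= UINT64_MAX)
        return sketch_itoa64((uint64_t)n, p);

    /* Slow path: a single 128-bit division; the remainder is recovered
       with a multiplication and a subtraction instead of a second one. */
    uint64_t high = (uint64_t)(n / SKETCH_TEN19);
    uint64_t low = (uint64_t)(n - (unsigned __int128)high * SKETCH_TEN19);

    p = sketch_itoa64_pad19(low, p); /* low part: exactly 19 digits */
    return sketch_itoa64(high, p);   /* high part: no padding */
}
```

With a 40-byte buffer buf, a call such as sketch_itoa128(((unsigned __int128)1 << 127) - 1, buf, sizeof buf) returns a pointer to the 39-digit decimal string for HUGE(KIND=16). The sign is left out here and would have to be handled by the caller.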
Results before the patch (with previous itoa-patch applied, though):

Timing for INTEGER(KIND=1)
  Value 0, time: 0.191409990
  Value HUGE(KIND=1), time: 0.173687011
Timing for INTEGER(KIND=4)
  Value 0, time: 0.171809018
  Value 1049, time: 0.177439988
  Value HUGE(KIND=4), time: 0.217984974
Timing for INTEGER(KIND=8)
  Value 0, time: 0.178072989
  Value HUGE(KIND=4), time: 0.214841008
  Value HUGE(KIND=8), time: 0.276726007
Timing for INTEGER(KIND=16)
  Value 0, time: 0.175235987
  Value HUGE(KIND=4), time: 0.217689037
  Value HUGE(KIND=8), time: 0.280257106
  Value HUGE(KIND=16), time: 0.420036077

Results after the patch:

Timing for INTEGER(KIND=1)
  Value 0, time: 0.194633007
  Value HUGE(KIND=1), time: 0.172436997
Timing for INTEGER(KIND=4)
  Value 0, time: 0.167517006
  Value 1049, time: 0.176503003
  Value HUGE(KIND=4), time: 0.172892988
Timing for INTEGER(KIND=8)
  Value 0, time: 0.171101034
  Value HUGE(KIND=4), time: 0.174461007
  Value HUGE(KIND=8), time: 0.180289030
Timing for INTEGER(KIND=16)
  Value 0, time: 0.175765991
  Value HUGE(KIND=4), time: 0.181162953
  Value HUGE(KIND=8), time: 0.186082959
  Value HUGE(KIND=16), time: 0.207401991

Times are CPU times in seconds, for one million integer writes into a buffer string. With the patch, we see that the time for integer decimal output is almost independent of the value written, meaning that the I/O library overhead is dominant, not the decimal conversion. For this reason, I don’t think we really need a faster implementation of the 64-bit itoa, and we can keep the current series-of-divisions-by-10 approach.

---------------

This patch applies on top of my previous itoa-related patch at https://gcc.gnu.org/pipermail/fortran/2021-December/057218.html

The patch has been bootstrapped and regtested on two 64-bit targets: aarch64-apple-darwin21 (development branch) and x86_64-pc-linux-gnu. I would like it to be tested on a 32-bit target without a 128-bit integer type. Does someone have access to that?

Once tested on a 32-bit target, OK to commit?

FX