* [Patch, libfortran] Improve performance of byte swapped IO
@ 2013-01-04 22:15 Janne Blomqvist
2013-01-04 22:35 ` Andreas Schwab
0 siblings, 1 reply; 10+ messages in thread
From: Janne Blomqvist @ 2013-01-04 22:15 UTC (permalink / raw)
To: GCC Patches, Fortran List
[-- Attachment #1: Type: text/plain, Size: 5653 bytes --]
Hi,
currently byte swapped unformatted IO can be quite slow compared to
the same code with no byte swapping. There are two major reasons for
this:
1) The byte swapping code path resorts to transferring data element by
element, leading to a lot of overhead in the IO library.
2) The function used for the actual byte swapping, reverse_memcpy ,
while able to handle general element sizes, is not particularly fast,
especially considering that many CPU's have fast byte swapping
instructions (e.g. BSWAP on x86). In order to access these fast byte
swapping instructions, gcc provides the __builtin_bswap{16,32,64}
builtins, falling back to libgcc code for targets that lack support.
The attached patch fixes these issues. For issue (1), the read path
uses in-place byte swapping of the data that has been read into the
user buffer, while the write path uses a larger temporary buffer
(since we are not allowed to modify the user supplied data in this
case). For issue(2), the patch uses __builtin_bswap{16,32,64} where
appropriate, only falling back to reverse_memcpy for other sizes.
With the attached test program run on a tmpfs filesystem to avoid
doing actual disk IO, I get the following:
- With no byte swapping:
Unformatted sequential write/read performance test
Record size Write MB/s Read MB/s
==========================================================
4 52.723842817422202 72.721158943820441
8 77.508296890856386 97.237815640377221
16 110.26209495334321 143.80831184546381
32 173.94872143231535 221.89704881197937
64 282.19818562682684 373.77854583735541
128 442.22084579742244 628.80041029142183
256 636.69620860705299 966.37723642576316
512 826.05968840738080 1380.8835166612221
1024 987.18686465197561 1763.5990036057208
2048 1047.6721544191710 2058.0875622043550
4096 1115.5817147134801 2251.8731832850176
8192 1191.5021150996590 2283.8893409728184
16384 1417.6110909519391 2441.0530373866482
32768 1570.4413479046018 2543.0836384048471
65536 1673.0378706502966 2651.2182395008308
131072 1697.4944246188445 2688.2398923155783
262144 1669.6329862145872 2735.6611118973292
524288 1594.4669935231552 2697.7208298823243
- Before patch, with byte swapping:
Unformatted sequential write/read performance test
Record size Write MB/s Read MB/s
==========================================================
4 50.572812893689793 68.858701306591627
8 58.688513300690317 81.591733130441327
16 73.551188480607820 96.638995590227665
32 91.593767813989018 116.65817140076214
64 107.41379323761915 128.32512066346368
128 121.33499652432221 147.80777892360237
256 128.99627771476628 155.91619889220266
512 135.02742063670030 161.30042382365372
1024 137.02276709585524 164.11267056940963
2048 138.62774254302394 165.22456826188971
4096 139.27695763341924 166.34707691429571
8192 147.64584950575932 166.59526981475742
16384 147.91235479266419 166.77890398940283
32768 150.77029430529927 166.90834867503827
65536 151.59474472614465 166.84075600288520
131072 155.75202672623249 166.96550283835097
262144 155.36506626794849 166.78075976148853
524288 155.64305086921487 167.44468828946083
- After patch, with byte swapping:
Unformatted sequential write/read performance test
Record size Write MB/s Read MB/s
==========================================================
4 49.414771776821361 70.808060042286343
8 72.918156402459772 93.234093684373946
16 102.72461544178078 136.21700026949074
32 160.57240200649090 205.97612602315186
64 249.32082957447636 331.85515010907363
128 385.71299236810387 522.06354804855266
256 535.40608912076459 766.59668706247294
512 669.47864120368524 1006.4275938227961
1024 742.90538895500265 1187.9846039167674
2048 789.71340557340523 1333.8411634622269
4096 826.44253204731683 1395.5536995933605
8192 832.93540316116662 1361.4621716558986
16384 897.95081977010113 1469.0940087507722
32768 961.18736308033317 1533.7736812111871
65536 989.41384908496832 1564.7013916917260
131072 1003.6113762068040 1597.4063253370084
262144 980.03067664324396 1602.3188995993287
524288 985.82645661078755 1568.9537807626730
Regtested on x86_64-unknown-linux-gnu, Ok for trunk?
2013-01-04 Janne Blomqvist <jb@gcc.gnu.org>
* io/file_pos.c (unformatted_backspace): Use __builtin_bswapXX
instead of reverse_memcpy.
* io/io.h (reverse_memcpy): Remove prototype.
* io/transfer.c (reverse_memcpy): Make static, move towards
beginning of file.
(bswap_array): New function.
(unformatted_read): Use bswap_array to byte swap the data
in-place.
(unformatted_write): Use a larger temp buffer and bswap_array.
(us_read): Use __builtin_bswapXX instead of reverse_memcpy.
(write_us_marker): Likewise.
--
Janne Blomqvist
[-- Attachment #2: us_perf2.f90 --]
[-- Type: application/octet-stream, Size: 1826 bytes --]
! Test performance of unformatted sequential with different sized records.
! Janne Blomqvist 2013
program us_perf
implicit none
integer, parameter :: d = 8
integer, parameter :: i64 = selected_int_kind(18)
integer :: ii
real(d) :: wspeed, rspeed
print *, 'Unformatted sequential write/read performance test'
print *, 'Record size Write MB/s Read MB/s'
print *, '=========================================================='
ii = 1
do
call run_us_test (ii, wspeed, rspeed)
print *, ii*4, wspeed, rspeed
if (ii > 100000) then
exit
end if
ii = ii * 2
end do
contains
subroutine run_us_test (n, ws, rs)
integer, intent(in) :: n
real(d), intent(out) :: ws, rs
integer, allocatable :: data(:)
real(d) :: t1, t2
integer :: ii, loops
integer, parameter :: nsize = 10000000 ! 10 MB
! Write nsize * log(n + 1) bytes, each record is n elements of 4 bytes each
! + two 4 byte record markers
loops = nsize * log(n + 1._d) / (n*4._d + 8._d)
allocate(data(n))
data = 123
open(10, file="usperf.dat", form='unformatted', access='sequential', status='replace')
call wtime(t1)
do ii = 1, loops
write (10) data
end do
call wtime(t2)
close(10)
ws = nsize * log(n+1._d) / 1024**2 / (t2-t1)
open(10, file="usperf.dat", form='unformatted', access='sequential', status='old')
call wtime(t1)
do ii = 1, loops
read (10) data
end do
call wtime(t2)
close(10, status='delete')
deallocate(data)
rs = nsize * log(n+1._d) / 1024**2 / (t2-t1)
end subroutine run_us_test
subroutine wtime(t)
real(d) :: t
integer(i64):: count, rate
call system_clock(count, rate)
t = real(count, d) / rate
end subroutine wtime
end program us_perf
[-- Attachment #3: bswap.diff --]
[-- Type: application/octet-stream, Size: 7619 bytes --]
diff --git a/libgfortran/io/file_pos.c b/libgfortran/io/file_pos.c
index c8ecc3a..bf2250a 100644
--- a/libgfortran/io/file_pos.c
+++ b/libgfortran/io/file_pos.c
@@ -140,15 +140,21 @@ unformatted_backspace (st_parameter_filepos *fpp, gfc_unit *u)
}
else
{
+ uint32_t u32;
+ uint64_t u64;
switch (length)
{
case sizeof(GFC_INTEGER_4):
- reverse_memcpy (&m4, p, sizeof (m4));
+ memcpy (&u32, p, sizeof (u32));
+ u32 = __builtin_bswap32 (u32);
+ m4 = *(GFC_INTEGER_4*)&u32;
m = m4;
break;
case sizeof(GFC_INTEGER_8):
- reverse_memcpy (&m8, p, sizeof (m8));
+ memcpy (&u64, p, sizeof (u64));
+ u64 = __builtin_bswap64 (u64);
+ m8 = *(GFC_INTEGER_8*)&u64;
m = m8;
break;
diff --git a/libgfortran/io/io.h b/libgfortran/io/io.h
index 43aeafd..f17de19 100644
--- a/libgfortran/io/io.h
+++ b/libgfortran/io/io.h
@@ -649,9 +649,6 @@ internal_proto(init_loop_spec);
extern void next_record (st_parameter_dt *, int);
internal_proto(next_record);
-extern void reverse_memcpy (void *, const void *, size_t);
-internal_proto (reverse_memcpy);
-
extern void st_wait (st_parameter_wait *);
export_proto(st_wait);
diff --git a/libgfortran/io/transfer.c b/libgfortran/io/transfer.c
index 6dda1df..eb77df8a 100644
--- a/libgfortran/io/transfer.c
+++ b/libgfortran/io/transfer.c
@@ -878,50 +878,91 @@ write_buf (st_parameter_dt *dtp, void *buf, size_t nbytes)
}
-/* Master function for unformatted reads. */
+/* Reverse memcpy - used for byte swapping. */
static void
-unformatted_read (st_parameter_dt *dtp, bt type,
- void *dest, int kind, size_t size, size_t nelems)
+reverse_memcpy (void *dest, const void *src, size_t n)
{
- if (likely (dtp->u.p.current_unit->flags.convert == GFC_CONVERT_NATIVE)
- || kind == 1)
+ char *d, *s;
+ size_t i;
+
+ d = (char *) dest;
+ s = (char *) src + n - 1;
+
+ /* Write with ascending order - this is likely faster
+ on modern architectures because of write combining. */
+ for (i=0; i<n; i++)
+ *(d++) = *(s--);
+}
+
+
+/* Utility function for byteswapping an array, using the bswap
+ builtins if possible. dest and src can overlap. */
+
+static void
+bswap_array (void *dest, const void *src, size_t size, size_t nelems)
+{
+ char buffer[16];
+ const char *ps;
+ char *pd;
+
+ switch (size)
{
- if (type == BT_CHARACTER)
- size *= GFC_SIZE_OF_CHAR_KIND(kind);
- read_block_direct (dtp, dest, size * nelems);
+ case 1:
+ break;
+ case 2:
+ for (size_t i = 0; i < nelems; i++)
+ ((uint16_t*)dest)[i] = __builtin_bswap16 (((uint16_t*)src)[i]);
+ break;
+ case 4:
+ for (size_t i = 0; i < nelems; i++)
+ ((uint32_t*)dest)[i] = __builtin_bswap32 (((uint32_t*)src)[i]);
+ break;
+ case 8:
+ for (size_t i = 0; i < nelems; i++)
+ ((uint64_t*)dest)[i] = __builtin_bswap64 (((uint64_t*)src)[i]);
+ break;
+ default:
+ ps = src;
+ pd = dest;
+ for (size_t i = 0; i < nelems; i++)
+ {
+ reverse_memcpy (buffer, ps, size);
+ memcpy (pd, buffer, size);
+ ps += size;
+ pd += size;
+ }
}
- else
- {
- char buffer[16];
- char *p;
- size_t i;
+}
- p = dest;
+/* Master function for unformatted reads. */
+
+static void
+unformatted_read (st_parameter_dt *dtp, bt type,
+ void *dest, int kind, size_t size, size_t nelems)
+{
+ if (type == BT_CHARACTER)
+ size *= GFC_SIZE_OF_CHAR_KIND(kind);
+ read_block_direct (dtp, dest, size * nelems);
+
+ if (unlikely (dtp->u.p.current_unit->flags.convert == GFC_CONVERT_SWAP)
+ && kind != 1)
+ {
/* Handle wide chracters. */
- if (type == BT_CHARACTER && kind != 1)
- {
- nelems *= size;
- size = kind;
- }
+ if (type == BT_CHARACTER)
+ {
+ nelems *= size;
+ size = kind;
+ }
/* Break up complex into its constituent reals. */
- if (type == BT_COMPLEX)
- {
- nelems *= 2;
- size /= 2;
- }
-
- /* By now, all complex variables have been split into their
- constituent reals. */
-
- for (i = 0; i < nelems; i++)
- {
- read_block_direct (dtp, buffer, size);
- reverse_memcpy (p, buffer, size);
- p += size;
- }
+ else if (type == BT_COMPLEX)
+ {
+ nelems *= 2;
+ size /= 2;
+ }
+ bswap_array (dest, dest, size, nelems);
}
}
@@ -945,9 +986,10 @@ unformatted_write (st_parameter_dt *dtp, bt type,
}
else
{
- char buffer[16];
+#define BSWAP_BUFSZ 512
+ char buffer[BSWAP_BUFSZ];
char *p;
- size_t i;
+ size_t nrem;
p = source;
@@ -968,12 +1010,21 @@ unformatted_write (st_parameter_dt *dtp, bt type,
/* By now, all complex variables have been split into their
constituent reals. */
- for (i = 0; i < nelems; i++)
+ nrem = nelems;
+ do
{
- reverse_memcpy(buffer, p, size);
- p += size;
- write_buf (dtp, buffer, size);
+ size_t nc;
+ if (size * nrem > BSWAP_BUFSZ)
+ nc = BSWAP_BUFSZ / size;
+ else
+ nc = nrem;
+
+ bswap_array (buffer, p, size, nc);
+ write_buf (dtp, buffer, size * nc);
+ p += size * nc;
+ nrem -= nc;
}
+ while (nrem > 0);
}
}
@@ -2153,15 +2204,22 @@ us_read (st_parameter_dt *dtp, int continued)
}
}
else
+ {
+ uint32_t u32;
+ uint64_t u64;
switch (nr)
{
case sizeof(GFC_INTEGER_4):
- reverse_memcpy (&i4, &i, sizeof (i4));
+ memcpy (&u32, &i, sizeof (u32));
+ u32 = __builtin_bswap32 (u32);
+ i4 = *(GFC_INTEGER_4*)&u32;
i = i4;
break;
case sizeof(GFC_INTEGER_8):
- reverse_memcpy (&i8, &i, sizeof (i8));
+ memcpy (&u64, &i, sizeof (u64));
+ u64 = __builtin_bswap64 (u64);
+ i8 = *(GFC_INTEGER_8*)&u64;
i = i8;
break;
@@ -2169,6 +2227,7 @@ us_read (st_parameter_dt *dtp, int continued)
runtime_error ("Illegal value for record marker");
break;
}
+ }
if (i >= 0)
{
@@ -3036,7 +3095,6 @@ write_us_marker (st_parameter_dt *dtp, const gfc_offset buf)
size_t len;
GFC_INTEGER_4 buf4;
GFC_INTEGER_8 buf8;
- char p[sizeof (GFC_INTEGER_8)];
if (compile_options.record_marker == 0)
len = sizeof (GFC_INTEGER_4);
@@ -3065,18 +3123,20 @@ write_us_marker (st_parameter_dt *dtp, const gfc_offset buf)
}
else
{
+ uint32_t u32;
+ uint64_t u64;
switch (len)
{
case sizeof (GFC_INTEGER_4):
buf4 = buf;
- reverse_memcpy (p, &buf4, sizeof (GFC_INTEGER_4));
- return swrite (dtp->u.p.current_unit->s, p, len);
+ u32 = __builtin_bswap32 (*(uint32_t*)&buf4);
+ return swrite (dtp->u.p.current_unit->s, &u32, len);
break;
case sizeof (GFC_INTEGER_8):
buf8 = buf;
- reverse_memcpy (p, &buf8, sizeof (GFC_INTEGER_8));
- return swrite (dtp->u.p.current_unit->s, p, len);
+ u64 = __builtin_bswap64 (*(uint64_t*)&buf8);
+ return swrite (dtp->u.p.current_unit->s, &u64, len);
break;
default:
@@ -3713,22 +3773,6 @@ st_set_nml_var_dim (st_parameter_dt *dtp, GFC_INTEGER_4 n_dim,
GFC_DIMENSION_SET(nml->dim[n],lbound,ubound,stride);
}
-/* Reverse memcpy - used for byte swapping. */
-
-void reverse_memcpy (void *dest, const void *src, size_t n)
-{
- char *d, *s;
- size_t i;
-
- d = (char *) dest;
- s = (char *) src + n - 1;
-
- /* Write with ascending order - this is likely faster
- on modern architectures because of write combining. */
- for (i=0; i<n; i++)
- *(d++) = *(s--);
-}
-
/* Once upon a time, a poor innocent Fortran program was reading a
file, when suddenly it hit the end-of-file (EOF). Unfortunately
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Patch, libfortran] Improve performance of byte swapped IO
2013-01-04 22:15 [Patch, libfortran] Improve performance of byte swapped IO Janne Blomqvist
@ 2013-01-04 22:35 ` Andreas Schwab
2013-01-05 15:35 ` Richard Biener
0 siblings, 1 reply; 10+ messages in thread
From: Andreas Schwab @ 2013-01-04 22:35 UTC (permalink / raw)
To: Janne Blomqvist; +Cc: GCC Patches, Fortran List
Janne Blomqvist <blomqvist.janne@gmail.com> writes:
> diff --git a/libgfortran/io/file_pos.c b/libgfortran/io/file_pos.c
> index c8ecc3a..bf2250a 100644
> --- a/libgfortran/io/file_pos.c
> +++ b/libgfortran/io/file_pos.c
> @@ -140,15 +140,21 @@ unformatted_backspace (st_parameter_filepos *fpp, gfc_unit *u)
> }
> else
> {
> + uint32_t u32;
> + uint64_t u64;
> switch (length)
> {
> case sizeof(GFC_INTEGER_4):
> - reverse_memcpy (&m4, p, sizeof (m4));
> + memcpy (&u32, p, sizeof (u32));
> + u32 = __builtin_bswap32 (u32);
> + m4 = *(GFC_INTEGER_4*)&u32;
Isn't that an aliasing violation?
Andreas.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Patch, libfortran] Improve performance of byte swapped IO
2013-01-04 22:35 ` Andreas Schwab
@ 2013-01-05 15:35 ` Richard Biener
2013-01-05 21:13 ` Janne Blomqvist
0 siblings, 1 reply; 10+ messages in thread
From: Richard Biener @ 2013-01-05 15:35 UTC (permalink / raw)
To: Andreas Schwab; +Cc: Janne Blomqvist, GCC Patches, Fortran List
On Fri, Jan 4, 2013 at 11:35 PM, Andreas Schwab <schwab@linux-m68k.org> wrote:
> Janne Blomqvist <blomqvist.janne@gmail.com> writes:
>
>> diff --git a/libgfortran/io/file_pos.c b/libgfortran/io/file_pos.c
>> index c8ecc3a..bf2250a 100644
>> --- a/libgfortran/io/file_pos.c
>> +++ b/libgfortran/io/file_pos.c
>> @@ -140,15 +140,21 @@ unformatted_backspace (st_parameter_filepos *fpp, gfc_unit *u)
>> }
>> else
>> {
>> + uint32_t u32;
>> + uint64_t u64;
>> switch (length)
>> {
>> case sizeof(GFC_INTEGER_4):
>> - reverse_memcpy (&m4, p, sizeof (m4));
>> + memcpy (&u32, p, sizeof (u32));
>> + u32 = __builtin_bswap32 (u32);
>> + m4 = *(GFC_INTEGER_4*)&u32;
>
> Isn't that an aliasing violation?
It looks like one. Why not simply do
m4 = (GFC_INTEGER_4) u32;
? I suppose GFC_INTEGER_4 is always the same size as uint32_t but signed?
Richard.
>
> Andreas.
>
> --
> Andreas Schwab, schwab@linux-m68k.org
> GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
> "And now for something completely different."
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Patch, libfortran] Improve performance of byte swapped IO
2013-01-05 15:35 ` Richard Biener
@ 2013-01-05 21:13 ` Janne Blomqvist
2013-01-06 11:33 ` Richard Biener
2013-01-11 20:41 ` Janne Blomqvist
0 siblings, 2 replies; 10+ messages in thread
From: Janne Blomqvist @ 2013-01-05 21:13 UTC (permalink / raw)
To: Richard Biener; +Cc: Andreas Schwab, GCC Patches, Fortran List
[-- Attachment #1: Type: text/plain, Size: 1562 bytes --]
On Sat, Jan 5, 2013 at 5:35 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Fri, Jan 4, 2013 at 11:35 PM, Andreas Schwab <schwab@linux-m68k.org> wrote:
>> Janne Blomqvist <blomqvist.janne@gmail.com> writes:
>>
>>> diff --git a/libgfortran/io/file_pos.c b/libgfortran/io/file_pos.c
>>> index c8ecc3a..bf2250a 100644
>>> --- a/libgfortran/io/file_pos.c
>>> +++ b/libgfortran/io/file_pos.c
>>> @@ -140,15 +140,21 @@ unformatted_backspace (st_parameter_filepos *fpp, gfc_unit *u)
>>> }
>>> else
>>> {
>>> + uint32_t u32;
>>> + uint64_t u64;
>>> switch (length)
>>> {
>>> case sizeof(GFC_INTEGER_4):
>>> - reverse_memcpy (&m4, p, sizeof (m4));
>>> + memcpy (&u32, p, sizeof (u32));
>>> + u32 = __builtin_bswap32 (u32);
>>> + m4 = *(GFC_INTEGER_4*)&u32;
>>
>> Isn't that an aliasing violation?
>
> It looks like one. Why not simply do
>
> m4 = (GFC_INTEGER_4) u32;
>
> ? I suppose GFC_INTEGER_4 is always the same size as uint32_t but signed?
Yes, GFC_INTEGER_4 is a typedef for int32_t. As for why I didn't do
the above, C99 6.3.1.3(3) says that if the unsigned value is outside
the range of the signed variable, the result is
implementation-defined. Though I suppose the sensible
"implementation-defined behavior" in this case on a two's complement
target is to just do a bitwise copy.
Anyway, to be really safe one could use memcpy instead; the compiler
optimizes small fixed size memcpy's just fine. Updated patch attached.
--
Janne Blomqvist
[-- Attachment #2: bswap2.diff --]
[-- Type: application/octet-stream, Size: 7690 bytes --]
diff --git a/libgfortran/io/file_pos.c b/libgfortran/io/file_pos.c
index c8ecc3a..22d19dd 100644
--- a/libgfortran/io/file_pos.c
+++ b/libgfortran/io/file_pos.c
@@ -140,15 +140,21 @@ unformatted_backspace (st_parameter_filepos *fpp, gfc_unit *u)
}
else
{
+ uint32_t u32;
+ uint64_t u64;
switch (length)
{
case sizeof(GFC_INTEGER_4):
- reverse_memcpy (&m4, p, sizeof (m4));
+ memcpy (&u32, p, sizeof (u32));
+ u32 = __builtin_bswap32 (u32);
+ memcpy (&m4, &u32, sizeof (m4));
m = m4;
break;
case sizeof(GFC_INTEGER_8):
- reverse_memcpy (&m8, p, sizeof (m8));
+ memcpy (&u64, p, sizeof (u64));
+ u64 = __builtin_bswap64 (u64);
+ memcpy (&m8, &u64, sizeof (m8));
m = m8;
break;
diff --git a/libgfortran/io/io.h b/libgfortran/io/io.h
index 43aeafd..f17de19 100644
--- a/libgfortran/io/io.h
+++ b/libgfortran/io/io.h
@@ -649,9 +649,6 @@ internal_proto(init_loop_spec);
extern void next_record (st_parameter_dt *, int);
internal_proto(next_record);
-extern void reverse_memcpy (void *, const void *, size_t);
-internal_proto (reverse_memcpy);
-
extern void st_wait (st_parameter_wait *);
export_proto(st_wait);
diff --git a/libgfortran/io/transfer.c b/libgfortran/io/transfer.c
index 6dda1df..5cefc38 100644
--- a/libgfortran/io/transfer.c
+++ b/libgfortran/io/transfer.c
@@ -878,50 +878,91 @@ write_buf (st_parameter_dt *dtp, void *buf, size_t nbytes)
}
-/* Master function for unformatted reads. */
+/* Reverse memcpy - used for byte swapping. */
static void
-unformatted_read (st_parameter_dt *dtp, bt type,
- void *dest, int kind, size_t size, size_t nelems)
+reverse_memcpy (void *dest, const void *src, size_t n)
{
- if (likely (dtp->u.p.current_unit->flags.convert == GFC_CONVERT_NATIVE)
- || kind == 1)
+ char *d, *s;
+ size_t i;
+
+ d = (char *) dest;
+ s = (char *) src + n - 1;
+
+ /* Write with ascending order - this is likely faster
+ on modern architectures because of write combining. */
+ for (i=0; i<n; i++)
+ *(d++) = *(s--);
+}
+
+
+/* Utility function for byteswapping an array, using the bswap
+ builtins if possible. dest and src can overlap. */
+
+static void
+bswap_array (void *dest, const void *src, size_t size, size_t nelems)
+{
+ char buffer[16];
+ const char *ps;
+ char *pd;
+
+ switch (size)
{
- if (type == BT_CHARACTER)
- size *= GFC_SIZE_OF_CHAR_KIND(kind);
- read_block_direct (dtp, dest, size * nelems);
+ case 1:
+ break;
+ case 2:
+ for (size_t i = 0; i < nelems; i++)
+ ((uint16_t*)dest)[i] = __builtin_bswap16 (((uint16_t*)src)[i]);
+ break;
+ case 4:
+ for (size_t i = 0; i < nelems; i++)
+ ((uint32_t*)dest)[i] = __builtin_bswap32 (((uint32_t*)src)[i]);
+ break;
+ case 8:
+ for (size_t i = 0; i < nelems; i++)
+ ((uint64_t*)dest)[i] = __builtin_bswap64 (((uint64_t*)src)[i]);
+ break;
+ default:
+ ps = src;
+ pd = dest;
+ for (size_t i = 0; i < nelems; i++)
+ {
+ reverse_memcpy (buffer, ps, size);
+ memcpy (pd, buffer, size);
+ ps += size;
+ pd += size;
+ }
}
- else
- {
- char buffer[16];
- char *p;
- size_t i;
+}
- p = dest;
+/* Master function for unformatted reads. */
+
+static void
+unformatted_read (st_parameter_dt *dtp, bt type,
+ void *dest, int kind, size_t size, size_t nelems)
+{
+ if (type == BT_CHARACTER)
+ size *= GFC_SIZE_OF_CHAR_KIND(kind);
+ read_block_direct (dtp, dest, size * nelems);
+
+ if (unlikely (dtp->u.p.current_unit->flags.convert == GFC_CONVERT_SWAP)
+ && kind != 1)
+ {
/* Handle wide chracters. */
- if (type == BT_CHARACTER && kind != 1)
- {
- nelems *= size;
- size = kind;
- }
+ if (type == BT_CHARACTER)
+ {
+ nelems *= size;
+ size = kind;
+ }
/* Break up complex into its constituent reals. */
- if (type == BT_COMPLEX)
- {
- nelems *= 2;
- size /= 2;
- }
-
- /* By now, all complex variables have been split into their
- constituent reals. */
-
- for (i = 0; i < nelems; i++)
- {
- read_block_direct (dtp, buffer, size);
- reverse_memcpy (p, buffer, size);
- p += size;
- }
+ else if (type == BT_COMPLEX)
+ {
+ nelems *= 2;
+ size /= 2;
+ }
+ bswap_array (dest, dest, size, nelems);
}
}
@@ -945,9 +986,10 @@ unformatted_write (st_parameter_dt *dtp, bt type,
}
else
{
- char buffer[16];
+#define BSWAP_BUFSZ 512
+ char buffer[BSWAP_BUFSZ];
char *p;
- size_t i;
+ size_t nrem;
p = source;
@@ -968,12 +1010,21 @@ unformatted_write (st_parameter_dt *dtp, bt type,
/* By now, all complex variables have been split into their
constituent reals. */
- for (i = 0; i < nelems; i++)
+ nrem = nelems;
+ do
{
- reverse_memcpy(buffer, p, size);
- p += size;
- write_buf (dtp, buffer, size);
+ size_t nc;
+ if (size * nrem > BSWAP_BUFSZ)
+ nc = BSWAP_BUFSZ / size;
+ else
+ nc = nrem;
+
+ bswap_array (buffer, p, size, nc);
+ write_buf (dtp, buffer, size * nc);
+ p += size * nc;
+ nrem -= nc;
}
+ while (nrem > 0);
}
}
@@ -2153,15 +2204,22 @@ us_read (st_parameter_dt *dtp, int continued)
}
}
else
+ {
+ uint32_t u32;
+ uint64_t u64;
switch (nr)
{
case sizeof(GFC_INTEGER_4):
- reverse_memcpy (&i4, &i, sizeof (i4));
+ memcpy (&u32, &i, sizeof (u32));
+ u32 = __builtin_bswap32 (u32);
+ memcpy (&i4, &u32, sizeof (i4));
i = i4;
break;
case sizeof(GFC_INTEGER_8):
- reverse_memcpy (&i8, &i, sizeof (i8));
+ memcpy (&u64, &i, sizeof (u64));
+ u64 = __builtin_bswap64 (u64);
+ memcpy (&i8, &u64, sizeof (i8));
i = i8;
break;
@@ -2169,6 +2227,7 @@ us_read (st_parameter_dt *dtp, int continued)
runtime_error ("Illegal value for record marker");
break;
}
+ }
if (i >= 0)
{
@@ -3036,7 +3095,6 @@ write_us_marker (st_parameter_dt *dtp, const gfc_offset buf)
size_t len;
GFC_INTEGER_4 buf4;
GFC_INTEGER_8 buf8;
- char p[sizeof (GFC_INTEGER_8)];
if (compile_options.record_marker == 0)
len = sizeof (GFC_INTEGER_4);
@@ -3065,18 +3123,22 @@ write_us_marker (st_parameter_dt *dtp, const gfc_offset buf)
}
else
{
+ uint32_t u32;
+ uint64_t u64;
switch (len)
{
case sizeof (GFC_INTEGER_4):
buf4 = buf;
- reverse_memcpy (p, &buf4, sizeof (GFC_INTEGER_4));
- return swrite (dtp->u.p.current_unit->s, p, len);
+ memcpy (&u32, &buf4, sizeof (u32));
+ u32 = __builtin_bswap32 (u32);
+ return swrite (dtp->u.p.current_unit->s, &u32, len);
break;
case sizeof (GFC_INTEGER_8):
buf8 = buf;
- reverse_memcpy (p, &buf8, sizeof (GFC_INTEGER_8));
- return swrite (dtp->u.p.current_unit->s, p, len);
+ memcpy (&u64, &buf8, sizeof (u64));
+ u64 = __builtin_bswap64 (u64);
+ return swrite (dtp->u.p.current_unit->s, &u64, len);
break;
default:
@@ -3713,22 +3775,6 @@ st_set_nml_var_dim (st_parameter_dt *dtp, GFC_INTEGER_4 n_dim,
GFC_DIMENSION_SET(nml->dim[n],lbound,ubound,stride);
}
-/* Reverse memcpy - used for byte swapping. */
-
-void reverse_memcpy (void *dest, const void *src, size_t n)
-{
- char *d, *s;
- size_t i;
-
- d = (char *) dest;
- s = (char *) src + n - 1;
-
- /* Write with ascending order - this is likely faster
- on modern architectures because of write combining. */
- for (i=0; i<n; i++)
- *(d++) = *(s--);
-}
-
/* Once upon a time, a poor innocent Fortran program was reading a
file, when suddenly it hit the end-of-file (EOF). Unfortunately
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Patch, libfortran] Improve performance of byte swapped IO
2013-01-05 21:13 ` Janne Blomqvist
@ 2013-01-06 11:33 ` Richard Biener
2013-01-11 20:41 ` Janne Blomqvist
1 sibling, 0 replies; 10+ messages in thread
From: Richard Biener @ 2013-01-06 11:33 UTC (permalink / raw)
To: Janne Blomqvist; +Cc: Andreas Schwab, GCC Patches, Fortran List
On Sat, Jan 5, 2013 at 10:13 PM, Janne Blomqvist
<blomqvist.janne@gmail.com> wrote:
> On Sat, Jan 5, 2013 at 5:35 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Fri, Jan 4, 2013 at 11:35 PM, Andreas Schwab <schwab@linux-m68k.org> wrote:
>>> Janne Blomqvist <blomqvist.janne@gmail.com> writes:
>>>
>>>> diff --git a/libgfortran/io/file_pos.c b/libgfortran/io/file_pos.c
>>>> index c8ecc3a..bf2250a 100644
>>>> --- a/libgfortran/io/file_pos.c
>>>> +++ b/libgfortran/io/file_pos.c
>>>> @@ -140,15 +140,21 @@ unformatted_backspace (st_parameter_filepos *fpp, gfc_unit *u)
>>>> }
>>>> else
>>>> {
>>>> + uint32_t u32;
>>>> + uint64_t u64;
>>>> switch (length)
>>>> {
>>>> case sizeof(GFC_INTEGER_4):
>>>> - reverse_memcpy (&m4, p, sizeof (m4));
>>>> + memcpy (&u32, p, sizeof (u32));
>>>> + u32 = __builtin_bswap32 (u32);
>>>> + m4 = *(GFC_INTEGER_4*)&u32;
>>>
>>> Isn't that an aliasing violation?
>>
>> It looks like one. Why not simply do
>>
>> m4 = (GFC_INTEGER_4) u32;
>>
>> ? I suppose GFC_INTEGER_4 is always the same size as uint32_t but signed?
>
> Yes, GFC_INTEGER_4 is a typedef for int32_t. As for why I didn't do
> the above, C99 6.3.1.3(3) says that if the unsigned value is outside
> the range of the signed variable, the result is
> implementation-defined. Though I suppose the sensible
> "implementation-defined behavior" in this case on a two's complement
> target is to just do a bitwise copy.
As libgfortran is a target library and thus always compiled by GCC you
can rely on GCCs documented implementation-defined behavior here
(which is to do bitwise re-interpretation). No need to obfuscate the
code more than necessary.
Richard.
> Anyway, to be really safe one could use memcpy instead; the compiler
> optimizes small fixed size memcpy's just fine. Updated patch attached.
>
>
> --
> Janne Blomqvist
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Patch, libfortran] Improve performance of byte swapped IO
2013-01-05 21:13 ` Janne Blomqvist
2013-01-06 11:33 ` Richard Biener
@ 2013-01-11 20:41 ` Janne Blomqvist
2013-01-13 22:44 ` Janne Blomqvist
1 sibling, 1 reply; 10+ messages in thread
From: Janne Blomqvist @ 2013-01-11 20:41 UTC (permalink / raw)
To: Richard Biener; +Cc: Andreas Schwab, GCC Patches, Fortran List
[-- Attachment #1: Type: text/plain, Size: 4491 bytes --]
PING.
Slightly updated patch attached, which further improves the generic
size fallback that is used when the element size is not 2/4/8 bytes.
Changing the us_perf benchmark to use real(10), with the v2 patch the
performance is:
Unformatted sequential write/read performance test
Record size Write MB/s Read MB/s
==========================================================
4 59.028550429522085 86.019754350948787
8 79.028327063130590 95.803502000733374
16 99.980457395413296 138.68367462874946
32 122.56886206338788 180.05609910155042
64 152.00478266944486 212.69931319407567
128 197.74137934940202 235.19728791956828
256 155.36245780017779 244.60578379215929
512 157.13385845966246 245.07467397691480
1024 177.26553799130201 260.44908357795623
2048 208.22852888945587 260.21587143113527
4096 222.88410474980634 262.66162209490591
8192 226.71167580652920 265.81191407123663
16384 206.51818241747065 263.59395165591724
32768 230.18707026455866 265.88990325026526
65536 229.19783089391504 268.04485112932684
131072 231.12215662044449 267.40543904427710
262144 230.72012123598142 267.60086931504122
524288 230.48959460456055 268.78750211303725
With the new v3 patch I get
Unformatted sequential write/read performance test
Record size Write MB/s Read MB/s
==========================================================
4 59.779061121239941 92.777125264010024
8 92.727504266051341 126.64775563782673
16 128.94793911163904 184.69194300482837
32 169.78916283536847 267.06752001266767
64 209.50296476919556 341.60515130910238
128 236.36709738360679 416.73212655882151
256 251.79029695383340 465.46804746749740
512 259.62269939828633 500.87346060356265
1024 265.08842337586458 508.95530627428275
2048 268.71795530051884 532.12211365683640
4096 280.86546884821030 546.88907054369884
8192 286.96049684823578 569.60958187426183
16384 292.04368984868103 608.11503416324865
32768 292.96677387959392 629.80651297065833
65536 291.69098580137114 624.27103478079641
131072 292.75666234956418 605.99766136491496
262144 291.35520038228975 611.59061455535834
524288 292.15446100501691 623.76232623081580
On Sat, Jan 5, 2013 at 11:13 PM, Janne Blomqvist
<blomqvist.janne@gmail.com> wrote:
> On Sat, Jan 5, 2013 at 5:35 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Fri, Jan 4, 2013 at 11:35 PM, Andreas Schwab <schwab@linux-m68k.org> wrote:
>>> Janne Blomqvist <blomqvist.janne@gmail.com> writes:
>>>
>>>> diff --git a/libgfortran/io/file_pos.c b/libgfortran/io/file_pos.c
>>>> index c8ecc3a..bf2250a 100644
>>>> --- a/libgfortran/io/file_pos.c
>>>> +++ b/libgfortran/io/file_pos.c
>>>> @@ -140,15 +140,21 @@ unformatted_backspace (st_parameter_filepos *fpp, gfc_unit *u)
>>>> }
>>>> else
>>>> {
>>>> + uint32_t u32;
>>>> + uint64_t u64;
>>>> switch (length)
>>>> {
>>>> case sizeof(GFC_INTEGER_4):
>>>> - reverse_memcpy (&m4, p, sizeof (m4));
>>>> + memcpy (&u32, p, sizeof (u32));
>>>> + u32 = __builtin_bswap32 (u32);
>>>> + m4 = *(GFC_INTEGER_4*)&u32;
>>>
>>> Isn't that an aliasing violation?
>>
>> It looks like one. Why not simply do
>>
>> m4 = (GFC_INTEGER_4) u32;
>>
>> ? I suppose GFC_INTEGER_4 is always the same size as uint32_t but signed?
>
> Yes, GFC_INTEGER_4 is a typedef for int32_t. As for why I didn't do
> the above, C99 6.3.1.3(3) says that if the unsigned value is outside
> the range of the signed variable, the result is
> implementation-defined. Though I suppose the sensible
> "implementation-defined behavior" in this case on a two's complement
> target is to just do a bitwise copy.
>
> Anyway, to be really safe one could use memcpy instead; the compiler
> optimizes small fixed size memcpy's just fine. Updated patch attached.
>
>
> --
> Janne Blomqvist
--
Janne Blomqvist
[-- Attachment #2: bswap3.diff --]
[-- Type: application/octet-stream, Size: 8093 bytes --]
diff --git a/libgfortran/io/file_pos.c b/libgfortran/io/file_pos.c
index c8ecc3a..22d19dd 100644
--- a/libgfortran/io/file_pos.c
+++ b/libgfortran/io/file_pos.c
@@ -140,15 +140,21 @@ unformatted_backspace (st_parameter_filepos *fpp, gfc_unit *u)
}
else
{
+ uint32_t u32;
+ uint64_t u64;
switch (length)
{
case sizeof(GFC_INTEGER_4):
- reverse_memcpy (&m4, p, sizeof (m4));
+ memcpy (&u32, p, sizeof (u32));
+ u32 = __builtin_bswap32 (u32);
+ memcpy (&m4, &u32, sizeof (m4));
m = m4;
break;
case sizeof(GFC_INTEGER_8):
- reverse_memcpy (&m8, p, sizeof (m8));
+ memcpy (&u64, p, sizeof (u64));
+ u64 = __builtin_bswap64 (u64);
+ memcpy (&m8, &u64, sizeof (m8));
m = m8;
break;
diff --git a/libgfortran/io/io.h b/libgfortran/io/io.h
index 43aeafd..f17de19 100644
--- a/libgfortran/io/io.h
+++ b/libgfortran/io/io.h
@@ -649,9 +649,6 @@ internal_proto(init_loop_spec);
extern void next_record (st_parameter_dt *, int);
internal_proto(next_record);
-extern void reverse_memcpy (void *, const void *, size_t);
-internal_proto (reverse_memcpy);
-
extern void st_wait (st_parameter_wait *);
export_proto(st_wait);
diff --git a/libgfortran/io/transfer.c b/libgfortran/io/transfer.c
index 6dda1df..d812fc1 100644
--- a/libgfortran/io/transfer.c
+++ b/libgfortran/io/transfer.c
@@ -878,50 +878,111 @@ write_buf (st_parameter_dt *dtp, void *buf, size_t nbytes)
}
-/* Master function for unformatted reads. */
+/* Reverse memcpy - used for byte swapping. */
static void
-unformatted_read (st_parameter_dt *dtp, bt type,
- void *dest, int kind, size_t size, size_t nelems)
+reverse_memcpy (void *dest, const void *src, size_t n)
{
- if (likely (dtp->u.p.current_unit->flags.convert == GFC_CONVERT_NATIVE)
- || kind == 1)
+ char *d, *s;
+ size_t i;
+
+ d = (char *) dest;
+ s = (char *) src + n - 1;
+
+ /* Write with ascending order - this is likely faster
+ on modern architectures because of write combining. */
+ for (i=0; i<n; i++)
+ *(d++) = *(s--);
+}
+
+
+/* Utility function for byteswapping an array, using the bswap
+ builtins if possible. dest and src can overlap completely, or then
+ they must point to separate objects; partial overlaps are not
+ allowed. */
+
+static void
+bswap_array (void *dest, const void *src, size_t size, size_t nelems)
+{
+ const char *ps;
+ char *pd;
+
+ switch (size)
{
- if (type == BT_CHARACTER)
- size *= GFC_SIZE_OF_CHAR_KIND(kind);
- read_block_direct (dtp, dest, size * nelems);
+ case 1:
+ break;
+ case 2:
+ for (size_t i = 0; i < nelems; i++)
+ ((uint16_t*)dest)[i] = __builtin_bswap16 (((uint16_t*)src)[i]);
+ break;
+ case 4:
+ for (size_t i = 0; i < nelems; i++)
+ ((uint32_t*)dest)[i] = __builtin_bswap32 (((uint32_t*)src)[i]);
+ break;
+ case 8:
+ for (size_t i = 0; i < nelems; i++)
+ ((uint64_t*)dest)[i] = __builtin_bswap64 (((uint64_t*)src)[i]);
+ break;
+ default:
+ pd = dest;
+ if (dest != src)
+ {
+ ps = src;
+ for (size_t i = 0; i < nelems; i++)
+ {
+ reverse_memcpy (pd, ps, size);
+ ps += size;
+ pd += size;
+ }
+ }
+ else
+ {
+ /* In-place byte swap. */
+ for (size_t i = 0; i < nelems; i++)
+ {
+ char tmp, *low = pd, *high = pd + size - 1;
+ for (size_t j = 0; j < size/2; j++)
+ {
+ tmp = *low;
+ *low = *high;
+ *high = tmp;
+ low++;
+ high--;
+ }
+ pd += size;
+ }
+ }
}
- else
- {
- char buffer[16];
- char *p;
- size_t i;
+}
- p = dest;
+/* Master function for unformatted reads. */
+
+static void
+unformatted_read (st_parameter_dt *dtp, bt type,
+ void *dest, int kind, size_t size, size_t nelems)
+{
+ if (type == BT_CHARACTER)
+ size *= GFC_SIZE_OF_CHAR_KIND(kind);
+ read_block_direct (dtp, dest, size * nelems);
+
+ if (unlikely (dtp->u.p.current_unit->flags.convert == GFC_CONVERT_SWAP)
+ && kind != 1)
+ {
/* Handle wide chracters. */
- if (type == BT_CHARACTER && kind != 1)
- {
- nelems *= size;
- size = kind;
- }
+ if (type == BT_CHARACTER)
+ {
+ nelems *= size;
+ size = kind;
+ }
/* Break up complex into its constituent reals. */
- if (type == BT_COMPLEX)
- {
- nelems *= 2;
- size /= 2;
- }
-
- /* By now, all complex variables have been split into their
- constituent reals. */
-
- for (i = 0; i < nelems; i++)
- {
- read_block_direct (dtp, buffer, size);
- reverse_memcpy (p, buffer, size);
- p += size;
- }
+ else if (type == BT_COMPLEX)
+ {
+ nelems *= 2;
+ size /= 2;
+ }
+ bswap_array (dest, dest, size, nelems);
}
}
@@ -945,9 +1006,10 @@ unformatted_write (st_parameter_dt *dtp, bt type,
}
else
{
- char buffer[16];
+#define BSWAP_BUFSZ 512
+ char buffer[BSWAP_BUFSZ];
char *p;
- size_t i;
+ size_t nrem;
p = source;
@@ -968,12 +1030,21 @@ unformatted_write (st_parameter_dt *dtp, bt type,
/* By now, all complex variables have been split into their
constituent reals. */
- for (i = 0; i < nelems; i++)
+ nrem = nelems;
+ do
{
- reverse_memcpy(buffer, p, size);
- p += size;
- write_buf (dtp, buffer, size);
+ size_t nc;
+ if (size * nrem > BSWAP_BUFSZ)
+ nc = BSWAP_BUFSZ / size;
+ else
+ nc = nrem;
+
+ bswap_array (buffer, p, size, nc);
+ write_buf (dtp, buffer, size * nc);
+ p += size * nc;
+ nrem -= nc;
}
+ while (nrem > 0);
}
}
@@ -2153,15 +2224,22 @@ us_read (st_parameter_dt *dtp, int continued)
}
}
else
+ {
+ uint32_t u32;
+ uint64_t u64;
switch (nr)
{
case sizeof(GFC_INTEGER_4):
- reverse_memcpy (&i4, &i, sizeof (i4));
+ memcpy (&u32, &i, sizeof (u32));
+ u32 = __builtin_bswap32 (u32);
+ memcpy (&i4, &u32, sizeof (i4));
i = i4;
break;
case sizeof(GFC_INTEGER_8):
- reverse_memcpy (&i8, &i, sizeof (i8));
+ memcpy (&u64, &i, sizeof (u64));
+ u64 = __builtin_bswap64 (u64);
+ memcpy (&i8, &u64, sizeof (i8));
i = i8;
break;
@@ -2169,6 +2247,7 @@ us_read (st_parameter_dt *dtp, int continued)
runtime_error ("Illegal value for record marker");
break;
}
+ }
if (i >= 0)
{
@@ -3036,7 +3115,6 @@ write_us_marker (st_parameter_dt *dtp, const gfc_offset buf)
size_t len;
GFC_INTEGER_4 buf4;
GFC_INTEGER_8 buf8;
- char p[sizeof (GFC_INTEGER_8)];
if (compile_options.record_marker == 0)
len = sizeof (GFC_INTEGER_4);
@@ -3065,18 +3143,22 @@ write_us_marker (st_parameter_dt *dtp, const gfc_offset buf)
}
else
{
+ uint32_t u32;
+ uint64_t u64;
switch (len)
{
case sizeof (GFC_INTEGER_4):
buf4 = buf;
- reverse_memcpy (p, &buf4, sizeof (GFC_INTEGER_4));
- return swrite (dtp->u.p.current_unit->s, p, len);
+ memcpy (&u32, &buf4, sizeof (u32));
+ u32 = __builtin_bswap32 (u32);
+ return swrite (dtp->u.p.current_unit->s, &u32, len);
break;
case sizeof (GFC_INTEGER_8):
buf8 = buf;
- reverse_memcpy (p, &buf8, sizeof (GFC_INTEGER_8));
- return swrite (dtp->u.p.current_unit->s, p, len);
+ memcpy (&u64, &buf8, sizeof (u64));
+ u64 = __builtin_bswap64 (u64);
+ return swrite (dtp->u.p.current_unit->s, &u64, len);
break;
default:
@@ -3713,22 +3795,6 @@ st_set_nml_var_dim (st_parameter_dt *dtp, GFC_INTEGER_4 n_dim,
GFC_DIMENSION_SET(nml->dim[n],lbound,ubound,stride);
}
-/* Reverse memcpy - used for byte swapping. */
-
-void reverse_memcpy (void *dest, const void *src, size_t n)
-{
- char *d, *s;
- size_t i;
-
- d = (char *) dest;
- s = (char *) src + n - 1;
-
- /* Write with ascending order - this is likely faster
- on modern architectures because of write combining. */
- for (i=0; i<n; i++)
- *(d++) = *(s--);
-}
-
/* Once upon a time, a poor innocent Fortran program was reading a
file, when suddenly it hit the end-of-file (EOF). Unfortunately
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Patch, libfortran] Improve performance of byte swapped IO
2013-01-11 20:41 ` Janne Blomqvist
@ 2013-01-13 22:44 ` Janne Blomqvist
2013-01-18 22:30 ` Janne Blomqvist
0 siblings, 1 reply; 10+ messages in thread
From: Janne Blomqvist @ 2013-01-13 22:44 UTC (permalink / raw)
To: Richard Biener; +Cc: Andreas Schwab, GCC Patches, Fortran List
[-- Attachment #1: Type: text/plain, Size: 7582 bytes --]
PING**1.2
Yet another slightly updated patch attached. Compared to the previous
version, now with specializations for size 12 and 16 as well. For the
real(10) benchmark, with the previous v3 patch (please disregard the
absolute values in the post quoted below, there were wrong due to a
bug):
Unformatted sequential write/read performance test
Record size Write MB/s Read MB/s
==========================================================
4 80.578833140738340 127.33074266188656
8 137.61682156650559 184.49033790407984
16 202.72871312800621 275.98801561061816
32 275.33538767460863 413.43956672052303
64 341.04488670485119 555.13744525826564
128 384.77917051919820 671.44655208024699
256 410.97208129045833 763.97660513918527
512 425.76619227779878 826.41086693364593
1024 430.77035999730009 840.30757120448550
2048 438.30318459339475 885.50033810296600
4096 455.79422809097599 919.78265920652086
8192 465.74499205886326 959.06963983370918
16384 472.48133493971142 991.11244162081744
32768 471.00024619567603 1015.7428144049615
65536 474.91235280949985 1021.2150519080892
131072 475.18664487440901 1006.3701982554830
262144 478.00435092846868 985.17141300594039
524288 476.72837201590363 991.74226579987987
With the new v4 patch:
Unformatted sequential write/read performance test
Record size Write MB/s Read MB/s
==========================================================
4 87.353141847504133 145.09410391177835
8 166.95093628370549 223.60877830048437
16 272.20937208187746 364.91673986840277
32 415.26016354252715 599.41744252952310
64 592.97676703528009 900.53345964312450
128 748.27218547147686 1189.7131837787238
256 874.83098506714384 1561.3649529261234
512 935.69494481144284 1823.1760143164879
1024 983.51689491813215 1931.8773088107300
2048 1009.5491761651396 1971.6978586130062
4096 1115.5862027658552 2119.4151169997808
8192 1172.9400229568287 2184.1403983641089
16384 1222.6659284153168 2258.5490449229878
32768 1242.2417626697293 2251.8159046253918
65536 1227.9967555594396 2313.4106672387143
131072 1204.4295656544052 2129.1309150039478
262144 1135.7905614378458 2154.7146453789856
524288 1075.5769074402640 2170.5151501933169
On Fri, Jan 11, 2013 at 10:41 PM, Janne Blomqvist
<blomqvist.janne@gmail.com> wrote:
> PING.
>
> Slightly updated patch attached, which further improves the generic
> size fallback that is used when the element size is not 2/4/8 bytes.
> Changing the us_perf benchmark to use real(10), with the v2 patch the
> performance is:
>
> Unformatted sequential write/read performance test
> Record size Write MB/s Read MB/s
> ==========================================================
> 4 59.028550429522085 86.019754350948787
> 8 79.028327063130590 95.803502000733374
> 16 99.980457395413296 138.68367462874946
> 32 122.56886206338788 180.05609910155042
> 64 152.00478266944486 212.69931319407567
> 128 197.74137934940202 235.19728791956828
> 256 155.36245780017779 244.60578379215929
> 512 157.13385845966246 245.07467397691480
> 1024 177.26553799130201 260.44908357795623
> 2048 208.22852888945587 260.21587143113527
> 4096 222.88410474980634 262.66162209490591
> 8192 226.71167580652920 265.81191407123663
> 16384 206.51818241747065 263.59395165591724
> 32768 230.18707026455866 265.88990325026526
> 65536 229.19783089391504 268.04485112932684
> 131072 231.12215662044449 267.40543904427710
> 262144 230.72012123598142 267.60086931504122
> 524288 230.48959460456055 268.78750211303725
>
> With the new v3 patch I get
>
> Unformatted sequential write/read performance test
> Record size Write MB/s Read MB/s
> ==========================================================
> 4 59.779061121239941 92.777125264010024
> 8 92.727504266051341 126.64775563782673
> 16 128.94793911163904 184.69194300482837
> 32 169.78916283536847 267.06752001266767
> 64 209.50296476919556 341.60515130910238
> 128 236.36709738360679 416.73212655882151
> 256 251.79029695383340 465.46804746749740
> 512 259.62269939828633 500.87346060356265
> 1024 265.08842337586458 508.95530627428275
> 2048 268.71795530051884 532.12211365683640
> 4096 280.86546884821030 546.88907054369884
> 8192 286.96049684823578 569.60958187426183
> 16384 292.04368984868103 608.11503416324865
> 32768 292.96677387959392 629.80651297065833
> 65536 291.69098580137114 624.27103478079641
> 131072 292.75666234956418 605.99766136491496
> 262144 291.35520038228975 611.59061455535834
> 524288 292.15446100501691 623.76232623081580
>
>
> On Sat, Jan 5, 2013 at 11:13 PM, Janne Blomqvist
> <blomqvist.janne@gmail.com> wrote:
>> On Sat, Jan 5, 2013 at 5:35 PM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Fri, Jan 4, 2013 at 11:35 PM, Andreas Schwab <schwab@linux-m68k.org> wrote:
>>>> Janne Blomqvist <blomqvist.janne@gmail.com> writes:
>>>>
>>>>> diff --git a/libgfortran/io/file_pos.c b/libgfortran/io/file_pos.c
>>>>> index c8ecc3a..bf2250a 100644
>>>>> --- a/libgfortran/io/file_pos.c
>>>>> +++ b/libgfortran/io/file_pos.c
>>>>> @@ -140,15 +140,21 @@ unformatted_backspace (st_parameter_filepos *fpp, gfc_unit *u)
>>>>> }
>>>>> else
>>>>> {
>>>>> + uint32_t u32;
>>>>> + uint64_t u64;
>>>>> switch (length)
>>>>> {
>>>>> case sizeof(GFC_INTEGER_4):
>>>>> - reverse_memcpy (&m4, p, sizeof (m4));
>>>>> + memcpy (&u32, p, sizeof (u32));
>>>>> + u32 = __builtin_bswap32 (u32);
>>>>> + m4 = *(GFC_INTEGER_4*)&u32;
>>>>
>>>> Isn't that an aliasing violation?
>>>
>>> It looks like one. Why not simply do
>>>
>>> m4 = (GFC_INTEGER_4) u32;
>>>
>>> ? I suppose GFC_INTEGER_4 is always the same size as uint32_t but signed?
>>
>> Yes, GFC_INTEGER_4 is a typedef for int32_t. As for why I didn't do
>> the above, C99 6.3.1.3(3) says that if the unsigned value is outside
>> the range of the signed variable, the result is
>> implementation-defined. Though I suppose the sensible
>> "implementation-defined behavior" in this case on a two's complement
>> target is to just do a bitwise copy.
>>
>> Anyway, to be really safe one could use memcpy instead; the compiler
>> optimizes small fixed size memcpy's just fine. Updated patch attached.
>>
>>
>> --
>> Janne Blomqvist
>
>
>
> --
> Janne Blomqvist
--
Janne Blomqvist
[-- Attachment #2: bswap4.diff --]
[-- Type: application/octet-stream, Size: 8772 bytes --]
diff --git a/libgfortran/io/file_pos.c b/libgfortran/io/file_pos.c
index c8ecc3a..22d19dd 100644
--- a/libgfortran/io/file_pos.c
+++ b/libgfortran/io/file_pos.c
@@ -140,15 +140,21 @@ unformatted_backspace (st_parameter_filepos *fpp, gfc_unit *u)
}
else
{
+ uint32_t u32;
+ uint64_t u64;
switch (length)
{
case sizeof(GFC_INTEGER_4):
- reverse_memcpy (&m4, p, sizeof (m4));
+ memcpy (&u32, p, sizeof (u32));
+ u32 = __builtin_bswap32 (u32);
+ memcpy (&m4, &u32, sizeof (m4));
m = m4;
break;
case sizeof(GFC_INTEGER_8):
- reverse_memcpy (&m8, p, sizeof (m8));
+ memcpy (&u64, p, sizeof (u64));
+ u64 = __builtin_bswap64 (u64);
+ memcpy (&m8, &u64, sizeof (m8));
m = m8;
break;
diff --git a/libgfortran/io/io.h b/libgfortran/io/io.h
index 43aeafd..f17de19 100644
--- a/libgfortran/io/io.h
+++ b/libgfortran/io/io.h
@@ -649,9 +649,6 @@ internal_proto(init_loop_spec);
extern void next_record (st_parameter_dt *, int);
internal_proto(next_record);
-extern void reverse_memcpy (void *, const void *, size_t);
-internal_proto (reverse_memcpy);
-
extern void st_wait (st_parameter_wait *);
export_proto(st_wait);
diff --git a/libgfortran/io/transfer.c b/libgfortran/io/transfer.c
index 6dda1df..020b6f0 100644
--- a/libgfortran/io/transfer.c
+++ b/libgfortran/io/transfer.c
@@ -878,50 +878,138 @@ write_buf (st_parameter_dt *dtp, void *buf, size_t nbytes)
}
-/* Master function for unformatted reads. */
+/* Reverse memcpy - used for byte swapping. */
static void
-unformatted_read (st_parameter_dt *dtp, bt type,
- void *dest, int kind, size_t size, size_t nelems)
+reverse_memcpy (void *dest, const void *src, size_t n)
{
- if (likely (dtp->u.p.current_unit->flags.convert == GFC_CONVERT_NATIVE)
- || kind == 1)
+ char *d, *s;
+ size_t i;
+
+ d = (char *) dest;
+ s = (char *) src + n - 1;
+
+ /* Write with ascending order - this is likely faster
+ on modern architectures because of write combining. */
+ for (i=0; i<n; i++)
+ *(d++) = *(s--);
+}
+
+
+/* Utility function for byteswapping an array, using the bswap
+ builtins if possible. dest and src can overlap completely, or then
+ they must point to separate objects; partial overlaps are not
+ allowed. */
+
+static void
+bswap_array (void *dest, const void *src, size_t size, size_t nelems)
+{
+ const char *ps;
+ char *pd;
+
+ switch (size)
{
- if (type == BT_CHARACTER)
- size *= GFC_SIZE_OF_CHAR_KIND(kind);
- read_block_direct (dtp, dest, size * nelems);
+ case 1:
+ break;
+ case 2:
+ for (size_t i = 0; i < nelems; i++)
+ ((uint16_t*)dest)[i] = __builtin_bswap16 (((uint16_t*)src)[i]);
+ break;
+ case 4:
+ for (size_t i = 0; i < nelems; i++)
+ ((uint32_t*)dest)[i] = __builtin_bswap32 (((uint32_t*)src)[i]);
+ break;
+ case 8:
+ for (size_t i = 0; i < nelems; i++)
+ ((uint64_t*)dest)[i] = __builtin_bswap64 (((uint64_t*)src)[i]);
+ break;
+ case 12:
+ ps = src;
+ pd = dest;
+ for (size_t i = 0; i < nelems; i++)
+ {
+ uint32_t tmp;
+ memcpy (&tmp, ps, 4);
+ *(uint32_t*)pd = __builtin_bswap32 (*(uint32_t*)(ps + 8));
+ *(uint32_t*)(pd + 4) = __builtin_bswap32 (*(uint32_t*)(ps + 4));
+ *(uint32_t*)(pd + 8) = __builtin_bswap32 (tmp);
+ ps += size;
+ pd += size;
+ }
+ break;
+ case 16:
+ ps = src;
+ pd = dest;
+ for (size_t i = 0; i < nelems; i++)
+ {
+ uint64_t tmp;
+ memcpy (&tmp, ps, 8);
+ *(uint64_t*)pd = __builtin_bswap64 (*(uint64_t*)(ps + 8));
+ *(uint64_t*)(pd + 8) = __builtin_bswap64 (tmp);
+ ps += size;
+ pd += size;
+ }
+ break;
+ default:
+ pd = dest;
+ if (dest != src)
+ {
+ ps = src;
+ for (size_t i = 0; i < nelems; i++)
+ {
+ reverse_memcpy (pd, ps, size);
+ ps += size;
+ pd += size;
+ }
+ }
+ else
+ {
+ /* In-place byte swap. */
+ for (size_t i = 0; i < nelems; i++)
+ {
+ char tmp, *low = pd, *high = pd + size - 1;
+ for (size_t j = 0; j < size/2; j++)
+ {
+ tmp = *low;
+ *low = *high;
+ *high = tmp;
+ low++;
+ high--;
+ }
+ pd += size;
+ }
+ }
}
- else
- {
- char buffer[16];
- char *p;
- size_t i;
+}
+
- p = dest;
+/* Master function for unformatted reads. */
+
+static void
+unformatted_read (st_parameter_dt *dtp, bt type,
+ void *dest, int kind, size_t size, size_t nelems)
+{
+ if (type == BT_CHARACTER)
+ size *= GFC_SIZE_OF_CHAR_KIND(kind);
+ read_block_direct (dtp, dest, size * nelems);
+ if (unlikely (dtp->u.p.current_unit->flags.convert == GFC_CONVERT_SWAP)
+ && kind != 1)
+ {
/* Handle wide chracters. */
- if (type == BT_CHARACTER && kind != 1)
- {
- nelems *= size;
- size = kind;
- }
+ if (type == BT_CHARACTER)
+ {
+ nelems *= size;
+ size = kind;
+ }
/* Break up complex into its constituent reals. */
- if (type == BT_COMPLEX)
- {
- nelems *= 2;
- size /= 2;
- }
-
- /* By now, all complex variables have been split into their
- constituent reals. */
-
- for (i = 0; i < nelems; i++)
- {
- read_block_direct (dtp, buffer, size);
- reverse_memcpy (p, buffer, size);
- p += size;
- }
+ else if (type == BT_COMPLEX)
+ {
+ nelems *= 2;
+ size /= 2;
+ }
+ bswap_array (dest, dest, size, nelems);
}
}
@@ -945,9 +1033,10 @@ unformatted_write (st_parameter_dt *dtp, bt type,
}
else
{
- char buffer[16];
+#define BSWAP_BUFSZ 512
+ char buffer[BSWAP_BUFSZ];
char *p;
- size_t i;
+ size_t nrem;
p = source;
@@ -968,12 +1057,21 @@ unformatted_write (st_parameter_dt *dtp, bt type,
/* By now, all complex variables have been split into their
constituent reals. */
- for (i = 0; i < nelems; i++)
+ nrem = nelems;
+ do
{
- reverse_memcpy(buffer, p, size);
- p += size;
- write_buf (dtp, buffer, size);
+ size_t nc;
+ if (size * nrem > BSWAP_BUFSZ)
+ nc = BSWAP_BUFSZ / size;
+ else
+ nc = nrem;
+
+ bswap_array (buffer, p, size, nc);
+ write_buf (dtp, buffer, size * nc);
+ p += size * nc;
+ nrem -= nc;
}
+ while (nrem > 0);
}
}
@@ -2153,15 +2251,22 @@ us_read (st_parameter_dt *dtp, int continued)
}
}
else
+ {
+ uint32_t u32;
+ uint64_t u64;
switch (nr)
{
case sizeof(GFC_INTEGER_4):
- reverse_memcpy (&i4, &i, sizeof (i4));
+ memcpy (&u32, &i, sizeof (u32));
+ u32 = __builtin_bswap32 (u32);
+ memcpy (&i4, &u32, sizeof (i4));
i = i4;
break;
case sizeof(GFC_INTEGER_8):
- reverse_memcpy (&i8, &i, sizeof (i8));
+ memcpy (&u64, &i, sizeof (u64));
+ u64 = __builtin_bswap64 (u64);
+ memcpy (&i8, &u64, sizeof (i8));
i = i8;
break;
@@ -2169,6 +2274,7 @@ us_read (st_parameter_dt *dtp, int continued)
runtime_error ("Illegal value for record marker");
break;
}
+ }
if (i >= 0)
{
@@ -3036,7 +3142,6 @@ write_us_marker (st_parameter_dt *dtp, const gfc_offset buf)
size_t len;
GFC_INTEGER_4 buf4;
GFC_INTEGER_8 buf8;
- char p[sizeof (GFC_INTEGER_8)];
if (compile_options.record_marker == 0)
len = sizeof (GFC_INTEGER_4);
@@ -3065,18 +3170,22 @@ write_us_marker (st_parameter_dt *dtp, const gfc_offset buf)
}
else
{
+ uint32_t u32;
+ uint64_t u64;
switch (len)
{
case sizeof (GFC_INTEGER_4):
buf4 = buf;
- reverse_memcpy (p, &buf4, sizeof (GFC_INTEGER_4));
- return swrite (dtp->u.p.current_unit->s, p, len);
+ memcpy (&u32, &buf4, sizeof (u32));
+ u32 = __builtin_bswap32 (u32);
+ return swrite (dtp->u.p.current_unit->s, &u32, len);
break;
case sizeof (GFC_INTEGER_8):
buf8 = buf;
- reverse_memcpy (p, &buf8, sizeof (GFC_INTEGER_8));
- return swrite (dtp->u.p.current_unit->s, p, len);
+ memcpy (&u64, &buf8, sizeof (u64));
+ u64 = __builtin_bswap64 (u64);
+ return swrite (dtp->u.p.current_unit->s, &u64, len);
break;
default:
@@ -3713,22 +3822,6 @@ st_set_nml_var_dim (st_parameter_dt *dtp, GFC_INTEGER_4 n_dim,
GFC_DIMENSION_SET(nml->dim[n],lbound,ubound,stride);
}
-/* Reverse memcpy - used for byte swapping. */
-
-void reverse_memcpy (void *dest, const void *src, size_t n)
-{
- char *d, *s;
- size_t i;
-
- d = (char *) dest;
- s = (char *) src + n - 1;
-
- /* Write with ascending order - this is likely faster
- on modern architectures because of write combining. */
- for (i=0; i<n; i++)
- *(d++) = *(s--);
-}
-
/* Once upon a time, a poor innocent Fortran program was reading a
file, when suddenly it hit the end-of-file (EOF). Unfortunately
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Patch, libfortran] Improve performance of byte swapped IO
2013-01-13 22:44 ` Janne Blomqvist
@ 2013-01-18 22:30 ` Janne Blomqvist
2013-01-22 22:32 ` Thomas Koenig
0 siblings, 1 reply; 10+ messages in thread
From: Janne Blomqvist @ 2013-01-18 22:30 UTC (permalink / raw)
To: Richard Biener; +Cc: Andreas Schwab, GCC Patches, Fortran List
PING**2
On Mon, Jan 14, 2013 at 12:44 AM, Janne Blomqvist
<blomqvist.janne@gmail.com> wrote:
> PING**1.2
>
> Yet another slightly updated patch attached. Compared to the previous
> version, now with specializations for size 12 and 16 as well. For the
> real(10) benchmark, with the previous v3 patch (please disregard the
> absolute values in the post quoted below, there were wrong due to a
> bug):
>
> Unformatted sequential write/read performance test
> Record size Write MB/s Read MB/s
> ==========================================================
> 4 80.578833140738340 127.33074266188656
> 8 137.61682156650559 184.49033790407984
> 16 202.72871312800621 275.98801561061816
> 32 275.33538767460863 413.43956672052303
> 64 341.04488670485119 555.13744525826564
> 128 384.77917051919820 671.44655208024699
> 256 410.97208129045833 763.97660513918527
> 512 425.76619227779878 826.41086693364593
> 1024 430.77035999730009 840.30757120448550
> 2048 438.30318459339475 885.50033810296600
> 4096 455.79422809097599 919.78265920652086
> 8192 465.74499205886326 959.06963983370918
> 16384 472.48133493971142 991.11244162081744
> 32768 471.00024619567603 1015.7428144049615
> 65536 474.91235280949985 1021.2150519080892
> 131072 475.18664487440901 1006.3701982554830
> 262144 478.00435092846868 985.17141300594039
> 524288 476.72837201590363 991.74226579987987
>
> With the new v4 patch:
>
> Unformatted sequential write/read performance test
> Record size Write MB/s Read MB/s
> ==========================================================
> 4 87.353141847504133 145.09410391177835
> 8 166.95093628370549 223.60877830048437
> 16 272.20937208187746 364.91673986840277
> 32 415.26016354252715 599.41744252952310
> 64 592.97676703528009 900.53345964312450
> 128 748.27218547147686 1189.7131837787238
> 256 874.83098506714384 1561.3649529261234
> 512 935.69494481144284 1823.1760143164879
> 1024 983.51689491813215 1931.8773088107300
> 2048 1009.5491761651396 1971.6978586130062
> 4096 1115.5862027658552 2119.4151169997808
> 8192 1172.9400229568287 2184.1403983641089
> 16384 1222.6659284153168 2258.5490449229878
> 32768 1242.2417626697293 2251.8159046253918
> 65536 1227.9967555594396 2313.4106672387143
> 131072 1204.4295656544052 2129.1309150039478
> 262144 1135.7905614378458 2154.7146453789856
> 524288 1075.5769074402640 2170.5151501933169
>
>
> On Fri, Jan 11, 2013 at 10:41 PM, Janne Blomqvist
> <blomqvist.janne@gmail.com> wrote:
>> PING.
>>
>> Slightly updated patch attached, which further improves the generic
>> size fallback that is used when the element size is not 2/4/8 bytes.
>> Changing the us_perf benchmark to use real(10), with the v2 patch the
>> performance is:
>>
>> Unformatted sequential write/read performance test
>> Record size Write MB/s Read MB/s
>> ==========================================================
>> 4 59.028550429522085 86.019754350948787
>> 8 79.028327063130590 95.803502000733374
>> 16 99.980457395413296 138.68367462874946
>> 32 122.56886206338788 180.05609910155042
>> 64 152.00478266944486 212.69931319407567
>> 128 197.74137934940202 235.19728791956828
>> 256 155.36245780017779 244.60578379215929
>> 512 157.13385845966246 245.07467397691480
>> 1024 177.26553799130201 260.44908357795623
>> 2048 208.22852888945587 260.21587143113527
>> 4096 222.88410474980634 262.66162209490591
>> 8192 226.71167580652920 265.81191407123663
>> 16384 206.51818241747065 263.59395165591724
>> 32768 230.18707026455866 265.88990325026526
>> 65536 229.19783089391504 268.04485112932684
>> 131072 231.12215662044449 267.40543904427710
>> 262144 230.72012123598142 267.60086931504122
>> 524288 230.48959460456055 268.78750211303725
>>
>> With the new v3 patch I get
>>
>> Unformatted sequential write/read performance test
>> Record size Write MB/s Read MB/s
>> ==========================================================
>> 4 59.779061121239941 92.777125264010024
>> 8 92.727504266051341 126.64775563782673
>> 16 128.94793911163904 184.69194300482837
>> 32 169.78916283536847 267.06752001266767
>> 64 209.50296476919556 341.60515130910238
>> 128 236.36709738360679 416.73212655882151
>> 256 251.79029695383340 465.46804746749740
>> 512 259.62269939828633 500.87346060356265
>> 1024 265.08842337586458 508.95530627428275
>> 2048 268.71795530051884 532.12211365683640
>> 4096 280.86546884821030 546.88907054369884
>> 8192 286.96049684823578 569.60958187426183
>> 16384 292.04368984868103 608.11503416324865
>> 32768 292.96677387959392 629.80651297065833
>> 65536 291.69098580137114 624.27103478079641
>> 131072 292.75666234956418 605.99766136491496
>> 262144 291.35520038228975 611.59061455535834
>> 524288 292.15446100501691 623.76232623081580
>>
>>
>> On Sat, Jan 5, 2013 at 11:13 PM, Janne Blomqvist
>> <blomqvist.janne@gmail.com> wrote:
>>> On Sat, Jan 5, 2013 at 5:35 PM, Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>>> On Fri, Jan 4, 2013 at 11:35 PM, Andreas Schwab <schwab@linux-m68k.org> wrote:
>>>>> Janne Blomqvist <blomqvist.janne@gmail.com> writes:
>>>>>
>>>>>> diff --git a/libgfortran/io/file_pos.c b/libgfortran/io/file_pos.c
>>>>>> index c8ecc3a..bf2250a 100644
>>>>>> --- a/libgfortran/io/file_pos.c
>>>>>> +++ b/libgfortran/io/file_pos.c
>>>>>> @@ -140,15 +140,21 @@ unformatted_backspace (st_parameter_filepos *fpp, gfc_unit *u)
>>>>>> }
>>>>>> else
>>>>>> {
>>>>>> + uint32_t u32;
>>>>>> + uint64_t u64;
>>>>>> switch (length)
>>>>>> {
>>>>>> case sizeof(GFC_INTEGER_4):
>>>>>> - reverse_memcpy (&m4, p, sizeof (m4));
>>>>>> + memcpy (&u32, p, sizeof (u32));
>>>>>> + u32 = __builtin_bswap32 (u32);
>>>>>> + m4 = *(GFC_INTEGER_4*)&u32;
>>>>>
>>>>> Isn't that an aliasing violation?
>>>>
>>>> It looks like one. Why not simply do
>>>>
>>>> m4 = (GFC_INTEGER_4) u32;
>>>>
>>>> ? I suppose GFC_INTEGER_4 is always the same size as uint32_t but signed?
>>>
>>> Yes, GFC_INTEGER_4 is a typedef for int32_t. As for why I didn't do
>>> the above, C99 6.3.1.3(3) says that if the unsigned value is outside
>>> the range of the signed variable, the result is
>>> implementation-defined. Though I suppose the sensible
>>> "implementation-defined behavior" in this case on a two's complement
>>> target is to just do a bitwise copy.
>>>
>>> Anyway, to be really safe one could use memcpy instead; the compiler
>>> optimizes small fixed size memcpy's just fine. Updated patch attached.
>>>
>>>
>>> --
>>> Janne Blomqvist
>>
>>
>>
>> --
>> Janne Blomqvist
>
>
>
> --
> Janne Blomqvist
--
Janne Blomqvist
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Patch, libfortran] Improve performance of byte swapped IO
2013-01-18 22:30 ` Janne Blomqvist
@ 2013-01-22 22:32 ` Thomas Koenig
2013-01-23 21:57 ` Janne Blomqvist
0 siblings, 1 reply; 10+ messages in thread
From: Thomas Koenig @ 2013-01-22 22:32 UTC (permalink / raw)
To: Janne Blomqvist; +Cc: Richard Biener, Andreas Schwab, GCC Patches, Fortran List
Hi Janne,
> PING**2
this is OK. Thanks a lot for the work you put into this!
Thomas
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Patch, libfortran] Improve performance of byte swapped IO
2013-01-22 22:32 ` Thomas Koenig
@ 2013-01-23 21:57 ` Janne Blomqvist
0 siblings, 0 replies; 10+ messages in thread
From: Janne Blomqvist @ 2013-01-23 21:57 UTC (permalink / raw)
To: Thomas Koenig; +Cc: Richard Biener, Andreas Schwab, GCC Patches, Fortran List
On Wed, Jan 23, 2013 at 12:32 AM, Thomas Koenig <tkoenig@netcologne.de> wrote:
> Hi Janne,
>
>> PING**2
>
>
> this is OK. Thanks a lot for the work you put into this!
Thanks for the review; committed as r195413.
--
Janne Blomqvist
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2013-01-23 21:57 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-01-04 22:15 [Patch, libfortran] Improve performance of byte swapped IO Janne Blomqvist
2013-01-04 22:35 ` Andreas Schwab
2013-01-05 15:35 ` Richard Biener
2013-01-05 21:13 ` Janne Blomqvist
2013-01-06 11:33 ` Richard Biener
2013-01-11 20:41 ` Janne Blomqvist
2013-01-13 22:44 ` Janne Blomqvist
2013-01-18 22:30 ` Janne Blomqvist
2013-01-22 22:32 ` Thomas Koenig
2013-01-23 21:57 ` Janne Blomqvist
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).