public inbox for gcc-bugs@sourceware.org
* [Bug c/109326] New: Bad assembler code generation for valid C on x86-64
@ 2023-03-29 1:04 susurrus.of.qualia at gmail dot com
2023-03-29 1:23 ` [Bug middle-end/109326] " pinskia at gcc dot gnu.org
` (5 more replies)
0 siblings, 6 replies; 7+ messages in thread
From: susurrus.of.qualia at gmail dot com @ 2023-03-29 1:04 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109326
Bug ID: 109326
Summary: Bad assembler code generation for valid C on x86-64
Product: gcc
Version: og10 (devel/omp/gcc-10)
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: susurrus.of.qualia at gmail dot com
Target Milestone: ---
Created attachment 54782
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54782&action=edit
compiler output
I have a bit of code here that is compiling without warnings and producing what
appear to be gross errors in the assembler output for some functions.
Pertinent info:
$ gcc10.4 -v
Using built-in specs.
COLLECT_GCC=gcc10.4
COLLECT_LTO_WRAPPER=/home/stevet/libexec/gcc/x86_64-pc-linux-gnu/10.4.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-10.4.0/configure --prefix=/home/stevet
--program-suffix=10.4 --enable-shared --enable-linker-build-id
--without-included-gettext --enable-threads=posix --enable-nls
--enable-bootstrap --enable-clocale=gnu --with-tune=generic
--enable-languages=c --disable-multilib
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.4.0 (GCC)
uname -a
Linux mx 5.18.0-4mx-amd64 #1 SMP PREEMPT_DYNAMIC Debian 5.18.16-1~mx21+1
(2022-08-22) x86_64 GNU/Linux
Unit compilation command:
gcc10.4 -c -D_POSIX_C_SOURCE=200112L -DOLOCK_192 -DARCH_64 -DLINUX -I./ -I./
-pthread -m64 -std=c99 -Wall -Wextra -Wno-implicit-fallthrough -Werror
-falign-functions=16 -falign-loops=1 -falign-jumps=1
-fno-inline-small-functions -fdiagnostics-color=never -fverbose-asm
--save-temps -O3 -ggdb -o olock.o olock.c
It should be noted that the bad code generation seems lessened, but not
eliminated, at -O2. Similarly, the problems were slightly different between
gcc-10.2.1 and the most recent 10.x release.
First thing to note is the assembler generated for the relatively simple
olock_reset_op() function. Near as I can tell, the asm bears exactly zero
relation to the C code of that function. The mystery constant $0xa06 seems
notable and also appears in init_olock_op_element_struct().
init_olock_op_struct() begins with an access to %fs:0x0, which is then
clobbered by an add $0x0, %rax shortly thereafter. Perhaps this is normal.
olock_fsm_event() doesn't look good either. There are three callq *%reg
instances where there should be at most one.
I'm not sure about olock_op_allocator(). olock_opcode_acqs() looks suspicious,
but I'm not that well versed in x86 so I could be wrong.
If I knew that the dynamic linker would fix up the %fs:0x0 references to
something normal I'd have more confidence about the rest of the code, but it
looks like about half the functions aren't correct at this point.
I've not tested any of this code yet; it is still subject to revision while
I clean it up. With this type of algorithm it is unfortunately necessary to
have mostly correct code before even thinking about testing it. This version
is close to that point.
Since I can only attach one file, I'll include the assembler output for the
troublesome olock_reset_op() function for reference.
0000000000000290 <olock_reset_op>:
290: 0f b7 57 10 movzwl 0x10(%rdi),%edx
294: 66 85 d2 test %dx,%dx
297: 0f 84 f4 04 00 00 je 791 <olock_reset_op+0x501>
29d: 8d 42 ff lea -0x1(%rdx),%eax
2a0: 66 83 f8 0e cmp $0xe,%ax
2a4: 0f 86 e8 04 00 00 jbe 792 <olock_reset_op+0x502>
2aa: 89 d1 mov %edx,%ecx
2ac: 48 8d 47 2c lea 0x2c(%rdi),%rax
2b0: 66 c1 e9 04 shr $0x4,%cx
2b4: 83 e9 01 sub $0x1,%ecx
2b7: 0f b7 c9 movzwl %cx,%ecx
2ba: 48 8d 0c 49 lea (%rcx,%rcx,2),%rcx
2be: 48 c1 e1 07 shl $0x7,%rcx
2c2: 48 8d 8c 0f ac 01 00 lea 0x1ac(%rdi,%rcx,1),%rcx
2c9: 00
2ca: 41 b9 06 0a 00 00 mov $0xa06,%r9d
2d0: c7 40 f4 00 00 00 00 movl $0x0,-0xc(%rax)
2d7: 41 ba 06 0a 00 00 mov $0xa06,%r10d
2dd: 41 bb 06 0a 00 00 mov $0xa06,%r11d
2e3: c7 40 0c 00 00 00 00 movl $0x0,0xc(%rax)
2ea: be 06 0a 00 00 mov $0xa06,%esi
2ef: 41 b8 06 0a 00 00 mov $0xa06,%r8d
2f5: 48 05 80 01 00 00 add $0x180,%rax
2fb: c7 80 a4 fe ff ff 00 movl $0x0,-0x15c(%rax)
302: 00 00 00
305: c7 80 bc fe ff ff 00 movl $0x0,-0x144(%rax)
30c: 00 00 00
30f: c7 80 d4 fe ff ff 00 movl $0x0,-0x12c(%rax)
316: 00 00 00
319: c7 80 ec fe ff ff 00 movl $0x0,-0x114(%rax)
320: 00 00 00
323: c7 80 04 ff ff ff 00 movl $0x0,-0xfc(%rax)
32a: 00 00 00
32d: c7 80 1c ff ff ff 00 movl $0x0,-0xe4(%rax)
334: 00 00 00
337: c7 80 34 ff ff ff 00 movl $0x0,-0xcc(%rax)
33e: 00 00 00
341: c7 80 4c ff ff ff 00 movl $0x0,-0xb4(%rax)
348: 00 00 00
34b: c7 80 64 ff ff ff 00 movl $0x0,-0x9c(%rax)
352: 00 00 00
355: c7 80 7c ff ff ff 00 movl $0x0,-0x84(%rax)
35c: 00 00 00
35f: c7 40 94 00 00 00 00 movl $0x0,-0x6c(%rax)
366: c7 40 ac 00 00 00 00 movl $0x0,-0x54(%rax)
36d: c7 40 c4 00 00 00 00 movl $0x0,-0x3c(%rax)
374: c7 40 dc 00 00 00 00 movl $0x0,-0x24(%rax)
37b: c6 80 7c fe ff ff 00 movb $0x0,-0x184(%rax)
382: c6 80 94 fe ff ff 00 movb $0x0,-0x16c(%rax)
389: c6 80 ac fe ff ff 00 movb $0x0,-0x154(%rax)
390: c6 80 c4 fe ff ff 00 movb $0x0,-0x13c(%rax)
397: c6 80 dc fe ff ff 00 movb $0x0,-0x124(%rax)
39e: c6 80 f4 fe ff ff 00 movb $0x0,-0x10c(%rax)
3a5: c6 80 0c ff ff ff 00 movb $0x0,-0xf4(%rax)
3ac: c6 80 24 ff ff ff 00 movb $0x0,-0xdc(%rax)
3b3: c6 80 3c ff ff ff 00 movb $0x0,-0xc4(%rax)
3ba: c6 80 54 ff ff ff 00 movb $0x0,-0xac(%rax)
3c1: c6 80 6c ff ff ff 00 movb $0x0,-0x94(%rax)
3c8: c6 40 84 00 movb $0x0,-0x7c(%rax)
3cc: c6 40 9c 00 movb $0x0,-0x64(%rax)
3d0: c6 40 b4 00 movb $0x0,-0x4c(%rax)
3d4: c6 40 cc 00 movb $0x0,-0x34(%rax)
3d8: c6 40 e4 00 movb $0x0,-0x1c(%rax)
3dc: 66 44 89 88 80 fe ff mov %r9w,-0x180(%rax)
3e3: ff
3e4: 41 b9 06 0a 00 00 mov $0xa06,%r9d
3ea: 66 44 89 90 98 fe ff mov %r10w,-0x168(%rax)
3f1: ff
3f2: 41 ba 06 0a 00 00 mov $0xa06,%r10d
3f8: 66 44 89 98 b0 fe ff mov %r11w,-0x150(%rax)
3ff: ff
400: 41 bb 06 0a 00 00 mov $0xa06,%r11d
406: 66 89 b0 c8 fe ff ff mov %si,-0x138(%rax)
40d: be 06 0a 00 00 mov $0xa06,%esi
412: 66 44 89 80 e0 fe ff mov %r8w,-0x120(%rax)
419: ff
41a: 41 b8 06 0a 00 00 mov $0xa06,%r8d
420: 66 44 89 88 f8 fe ff mov %r9w,-0x108(%rax)
427: ff
428: 41 b9 06 0a 00 00 mov $0xa06,%r9d
42e: 66 44 89 90 10 ff ff mov %r10w,-0xf0(%rax)
435: ff
436: 41 ba 06 0a 00 00 mov $0xa06,%r10d
43c: 66 44 89 98 28 ff ff mov %r11w,-0xd8(%rax)
443: ff
444: 41 bb 06 0a 00 00 mov $0xa06,%r11d
44a: 66 89 b0 40 ff ff ff mov %si,-0xc0(%rax)
451: be 06 0a 00 00 mov $0xa06,%esi
456: 66 44 89 80 58 ff ff mov %r8w,-0xa8(%rax)
45d: ff
45e: 41 b8 06 0a 00 00 mov $0xa06,%r8d
464: 66 44 89 88 70 ff ff mov %r9w,-0x90(%rax)
46b: ff
46c: 41 b9 06 0a 00 00 mov $0xa06,%r9d
472: 66 44 89 50 88 mov %r10w,-0x78(%rax)
477: 66 44 89 58 a0 mov %r11w,-0x60(%rax)
47c: 66 89 70 b8 mov %si,-0x48(%rax)
480: 66 44 89 40 d0 mov %r8w,-0x30(%rax)
485: 66 44 89 48 e8 mov %r9w,-0x18(%rax)
48a: 48 39 c8 cmp %rcx,%rax
48d: 0f 85 37 fe ff ff jne 2ca <olock_reset_op+0x3a>
493: 89 d0 mov %edx,%eax
495: 83 e0 f0 and $0xfffffff0,%eax
498: f6 c2 0f test $0xf,%dl
49b: 0f 84 f8 02 00 00 je 799 <olock_reset_op+0x509>
4a1: 0f b7 f0 movzwl %ax,%esi
4a4: 8d 48 01 lea 0x1(%rax),%ecx
4a7: 48 8d 34 76 lea (%rsi,%rsi,2),%rsi
4ab: 48 c1 e6 03 shl $0x3,%rsi
4af: 4c 8d 04 37 lea (%rdi,%rsi,1),%r8
4b3: 41 c7 40 20 00 00 00 movl $0x0,0x20(%r8)
4ba: 00
4bb: 41 c6 40 28 00 movb $0x0,0x28(%r8)
4c0: 41 b8 06 0a 00 00 mov $0xa06,%r8d
4c6: 66 44 89 44 37 2c mov %r8w,0x2c(%rdi,%rsi,1)
4cc: 66 39 d1 cmp %dx,%cx
4cf: 0f 83 bc 02 00 00 jae 791 <olock_reset_op+0x501>
4d5: 0f b7 c9 movzwl %cx,%ecx
4d8: 41 bb 06 0a 00 00 mov $0xa06,%r11d
4de: 8d 70 02 lea 0x2(%rax),%esi
4e1: 48 8d 0c 49 lea (%rcx,%rcx,2),%rcx
4e5: 48 c1 e1 03 shl $0x3,%rcx
4e9: 4c 8d 04 0f lea (%rdi,%rcx,1),%r8
4ed: 41 c7 40 20 00 00 00 movl $0x0,0x20(%r8)
4f4: 00
4f5: 41 c6 40 28 00 movb $0x0,0x28(%r8)
4fa: 66 44 89 5c 0f 2c mov %r11w,0x2c(%rdi,%rcx,1)
500: 66 39 d6 cmp %dx,%si
503: 0f 83 88 02 00 00 jae 791 <olock_reset_op+0x501>
509: 0f b7 f6 movzwl %si,%esi
50c: 41 ba 06 0a 00 00 mov $0xa06,%r10d
512: 8d 48 03 lea 0x3(%rax),%ecx
515: 48 8d 34 76 lea (%rsi,%rsi,2),%rsi
519: 48 c1 e6 03 shl $0x3,%rsi
51d: 4c 8d 04 37 lea (%rdi,%rsi,1),%r8
521: 41 c7 40 20 00 00 00 movl $0x0,0x20(%r8)
528: 00
529: 41 c6 40 28 00 movb $0x0,0x28(%r8)
52e: 66 44 89 54 37 2c mov %r10w,0x2c(%rdi,%rsi,1)
534: 66 39 ca cmp %cx,%dx
537: 0f 86 54 02 00 00 jbe 791 <olock_reset_op+0x501>
53d: 0f b7 c9 movzwl %cx,%ecx
540: 41 b9 06 0a 00 00 mov $0xa06,%r9d
546: 8d 70 04 lea 0x4(%rax),%esi
549: 48 8d 0c 49 lea (%rcx,%rcx,2),%rcx
54d: 48 c1 e1 03 shl $0x3,%rcx
551: 4c 8d 04 0f lea (%rdi,%rcx,1),%r8
555: 41 c7 40 20 00 00 00 movl $0x0,0x20(%r8)
55c: 00
55d: 41 c6 40 28 00 movb $0x0,0x28(%r8)
562: 66 44 89 4c 0f 2c mov %r9w,0x2c(%rdi,%rcx,1)
568: 66 39 f2 cmp %si,%dx
56b: 0f 86 20 02 00 00 jbe 791 <olock_reset_op+0x501>
571: 0f b7 f6 movzwl %si,%esi
574: 8d 48 05 lea 0x5(%rax),%ecx
577: 48 8d 34 76 lea (%rsi,%rsi,2),%rsi
57b: 48 c1 e6 03 shl $0x3,%rsi
57f: 4c 8d 04 37 lea (%rdi,%rsi,1),%r8
583: 41 c7 40 20 00 00 00 movl $0x0,0x20(%r8)
58a: 00
58b: 41 c6 40 28 00 movb $0x0,0x28(%r8)
590: 41 b8 06 0a 00 00 mov $0xa06,%r8d
596: 66 44 89 44 37 2c mov %r8w,0x2c(%rdi,%rsi,1)
59c: 66 39 ca cmp %cx,%dx
59f: 0f 86 ec 01 00 00 jbe 791 <olock_reset_op+0x501>
5a5: 0f b7 c9 movzwl %cx,%ecx
5a8: 41 bb 06 0a 00 00 mov $0xa06,%r11d
5ae: 8d 70 06 lea 0x6(%rax),%esi
5b1: 48 8d 0c 49 lea (%rcx,%rcx,2),%rcx
5b5: 48 c1 e1 03 shl $0x3,%rcx
5b9: 4c 8d 04 0f lea (%rdi,%rcx,1),%r8
5bd: 41 c7 40 20 00 00 00 movl $0x0,0x20(%r8)
5c4: 00
5c5: 41 c6 40 28 00 movb $0x0,0x28(%r8)
5ca: 66 44 89 5c 0f 2c mov %r11w,0x2c(%rdi,%rcx,1)
5d0: 66 39 f2 cmp %si,%dx
5d3: 0f 86 b8 01 00 00 jbe 791 <olock_reset_op+0x501>
5d9: 0f b7 f6 movzwl %si,%esi
5dc: 41 ba 06 0a 00 00 mov $0xa06,%r10d
5e2: 8d 48 07 lea 0x7(%rax),%ecx
5e5: 48 8d 34 76 lea (%rsi,%rsi,2),%rsi
5e9: 48 c1 e6 03 shl $0x3,%rsi
5ed: 4c 8d 04 37 lea (%rdi,%rsi,1),%r8
5f1: 41 c7 40 20 00 00 00 movl $0x0,0x20(%r8)
5f8: 00
5f9: 41 c6 40 28 00 movb $0x0,0x28(%r8)
5fe: 66 44 89 54 37 2c mov %r10w,0x2c(%rdi,%rsi,1)
604: 66 39 ca cmp %cx,%dx
607: 0f 86 84 01 00 00 jbe 791 <olock_reset_op+0x501>
60d: 0f b7 c9 movzwl %cx,%ecx
610: 41 b9 06 0a 00 00 mov $0xa06,%r9d
616: 8d 70 08 lea 0x8(%rax),%esi
619: 48 8d 0c 49 lea (%rcx,%rcx,2),%rcx
61d: 48 c1 e1 03 shl $0x3,%rcx
621: 4c 8d 04 0f lea (%rdi,%rcx,1),%r8
625: 41 c7 40 20 00 00 00 movl $0x0,0x20(%r8)
62c: 00
62d: 41 c6 40 28 00 movb $0x0,0x28(%r8)
632: 66 44 89 4c 0f 2c mov %r9w,0x2c(%rdi,%rcx,1)
638: 66 39 f2 cmp %si,%dx
63b: 0f 86 50 01 00 00 jbe 791 <olock_reset_op+0x501>
641: 0f b7 f6 movzwl %si,%esi
644: 8d 48 09 lea 0x9(%rax),%ecx
647: 48 8d 34 76 lea (%rsi,%rsi,2),%rsi
64b: 48 c1 e6 03 shl $0x3,%rsi
64f: 4c 8d 04 37 lea (%rdi,%rsi,1),%r8
653: 41 c7 40 20 00 00 00 movl $0x0,0x20(%r8)
65a: 00
65b: 41 c6 40 28 00 movb $0x0,0x28(%r8)
660: 41 b8 06 0a 00 00 mov $0xa06,%r8d
666: 66 44 89 44 37 2c mov %r8w,0x2c(%rdi,%rsi,1)
66c: 66 39 ca cmp %cx,%dx
66f: 0f 86 1c 01 00 00 jbe 791 <olock_reset_op+0x501>
675: 0f b7 c9 movzwl %cx,%ecx
678: 41 bb 06 0a 00 00 mov $0xa06,%r11d
67e: 8d 70 0a lea 0xa(%rax),%esi
681: 48 8d 0c 49 lea (%rcx,%rcx,2),%rcx
685: 48 c1 e1 03 shl $0x3,%rcx
689: 4c 8d 04 0f lea (%rdi,%rcx,1),%r8
68d: 41 c7 40 20 00 00 00 movl $0x0,0x20(%r8)
694: 00
695: 41 c6 40 28 00 movb $0x0,0x28(%r8)
69a: 66 44 89 5c 0f 2c mov %r11w,0x2c(%rdi,%rcx,1)
6a0: 66 39 f2 cmp %si,%dx
6a3: 0f 86 e8 00 00 00 jbe 791 <olock_reset_op+0x501>
6a9: 0f b7 f6 movzwl %si,%esi
6ac: 41 ba 06 0a 00 00 mov $0xa06,%r10d
6b2: 8d 48 0b lea 0xb(%rax),%ecx
6b5: 48 8d 34 76 lea (%rsi,%rsi,2),%rsi
6b9: 48 c1 e6 03 shl $0x3,%rsi
6bd: 4c 8d 04 37 lea (%rdi,%rsi,1),%r8
6c1: 41 c7 40 20 00 00 00 movl $0x0,0x20(%r8)
6c8: 00
6c9: 41 c6 40 28 00 movb $0x0,0x28(%r8)
6ce: 66 44 89 54 37 2c mov %r10w,0x2c(%rdi,%rsi,1)
6d4: 66 39 ca cmp %cx,%dx
6d7: 0f 86 b4 00 00 00 jbe 791 <olock_reset_op+0x501>
6dd: 0f b7 c9 movzwl %cx,%ecx
6e0: 41 b9 06 0a 00 00 mov $0xa06,%r9d
6e6: 8d 70 0c lea 0xc(%rax),%esi
6e9: 48 8d 0c 49 lea (%rcx,%rcx,2),%rcx
6ed: 48 c1 e1 03 shl $0x3,%rcx
6f1: 4c 8d 04 0f lea (%rdi,%rcx,1),%r8
6f5: 41 c7 40 20 00 00 00 movl $0x0,0x20(%r8)
6fc: 00
6fd: 41 c6 40 28 00 movb $0x0,0x28(%r8)
702: 66 44 89 4c 0f 2c mov %r9w,0x2c(%rdi,%rcx,1)
708: 66 39 f2 cmp %si,%dx
70b: 0f 86 80 00 00 00 jbe 791 <olock_reset_op+0x501>
711: 0f b7 f6 movzwl %si,%esi
714: 8d 48 0d lea 0xd(%rax),%ecx
717: 48 8d 34 76 lea (%rsi,%rsi,2),%rsi
71b: 48 c1 e6 03 shl $0x3,%rsi
71f: 4c 8d 04 37 lea (%rdi,%rsi,1),%r8
723: 41 c7 40 20 00 00 00 movl $0x0,0x20(%r8)
72a: 00
72b: 41 c6 40 28 00 movb $0x0,0x28(%r8)
730: 41 b8 06 0a 00 00 mov $0xa06,%r8d
736: 66 44 89 44 37 2c mov %r8w,0x2c(%rdi,%rsi,1)
73c: 66 39 ca cmp %cx,%dx
73f: 76 50 jbe 791 <olock_reset_op+0x501>
741: 0f b7 c9 movzwl %cx,%ecx
744: 83 c0 0e add $0xe,%eax
747: 48 8d 0c 49 lea (%rcx,%rcx,2),%rcx
74b: 48 c1 e1 03 shl $0x3,%rcx
74f: 48 8d 34 0f lea (%rdi,%rcx,1),%rsi
753: c7 46 20 00 00 00 00 movl $0x0,0x20(%rsi)
75a: c6 46 28 00 movb $0x0,0x28(%rsi)
75e: be 06 0a 00 00 mov $0xa06,%esi
763: 66 89 74 0f 2c mov %si,0x2c(%rdi,%rcx,1)
768: 66 39 c2 cmp %ax,%dx
76b: 76 24 jbe 791 <olock_reset_op+0x501>
76d: 0f b7 c0 movzwl %ax,%eax
770: 48 8d 04 40 lea (%rax,%rax,2),%rax
774: 48 c1 e0 03 shl $0x3,%rax
778: 48 8d 14 07 lea (%rdi,%rax,1),%rdx
77c: c7 42 20 00 00 00 00 movl $0x0,0x20(%rdx)
783: c6 42 28 00 movb $0x0,0x28(%rdx)
787: ba 06 0a 00 00 mov $0xa06,%edx
78c: 66 89 54 07 2c mov %dx,0x2c(%rdi,%rax,1)
791: c3 retq
792: 31 c0 xor %eax,%eax
794: e9 08 fd ff ff jmpq 4a1 <olock_reset_op+0x211>
799: c3 retq
79a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
There seems to be some structure in the above, but in comparison to the source
it doesn't seem the slightest bit relevant.
* [Bug middle-end/109326] Bad assembler code generation for valid C on x86-64
From: pinskia at gcc dot gnu.org @ 2023-03-29 1:23 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109326
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Ever confirmed|0 |1
Status|UNCONFIRMED |WAITING
Last reconfirmed| |2023-03-29
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
init_olock_op_element_struct asm output looks fine to me:
movzwl .LC0(%rip), %eax
movq $0, (%rdi)
movq $0, 8(%rdi)
movl $0, 16(%rdi)
movw %ax, 20(%rdi)
LC0 is:
.LC0:
.byte 6
.byte 10
olock_fsm_event is fine too as it is just duplicating those basic blocks (the
calls).
init_olock_op_struct looks fine really:
movq %fs:0, %rax
pxor %xmm0, %xmm0
movups %xmm0, (%rdi)
addq $olock_tparams@tpoff, %rax
In Intel asm syntax:
mov rax, QWORD PTR fs:0
pxor xmm0, xmm0
movups XMMWORD PTR [rdi], xmm0
add rax, OFFSET FLAT:olock_tparams@tpoff
it is basically moving the TLS pointer to rax and then adding the offset for
the variable.
I don't understand what exactly you are complaining about really.
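For readers unfamiliar with the pattern, here is a minimal C sketch of the kind of source that tends to produce such a sequence. The names tparams_demo, tparams_addr and tparams_selfcheck are hypothetical stand-ins, not the reporter's olock_tparams code, and the exact instructions depend on the TLS model GCC picks for the build.

```c
#include <assert.h>

/* Hypothetical stand-in for a thread-local object such as olock_tparams. */
static __thread int tparams_demo[4];

/*
 * Taking the address of a thread-local object. With the local-exec TLS
 * model on x86-64 Linux, GCC typically loads the thread pointer from
 * %fs:0 and then adds the variable's @tpoff offset. In an unlinked object
 * file that offset is still a relocation, so a plain "objdump -d" shows
 * it as "add $0x0,%rax".
 */
int *tparams_addr(void)
{
    return tparams_demo;
}

/* Within one thread the address is stable and the storage starts zeroed. */
void tparams_selfcheck(void)
{
    int *p = tparams_addr();

    assert(p == tparams_addr());
    assert(p[0] == 0);
    p[0] = 42;
    assert(tparams_addr()[0] == 42);
    p[0] = 0; /* restore the initial state */
}
```

Compiling this with -O2 -S and looking for %fs:0 in the output is a quick way to compare against the sequence quoted above.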
* [Bug middle-end/109326] Bad assembler code generation for valid C on x86-64
From: pinskia at gcc dot gnu.org @ 2023-03-29 1:38 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109326
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note that if you are disassembling the object file with objdump -d, you might
want to add the -r option so that the relocations are dumped as well. In the
init_olock_op_struct case you miss the relocation in the object file because
of that.
* [Bug middle-end/109326] Bad assembler code generation for valid C on x86-64
From: susurrus.of.qualia at gmail dot com @ 2023-03-29 2:26 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109326
--- Comment #3 from Steve Thompson <susurrus.of.qualia at gmail dot com> ---
(In reply to Andrew Pinski from comment #1)
> init_olock_op_element_struct asm output looks fine to me:
>
> movzwl .LC0(%rip), %eax
> movq $0, (%rdi)
> movq $0, 8(%rdi)
> movl $0, 16(%rdi)
> movw %ax, 20(%rdi)
>
> LC0 is:
> .LC0:
> .byte 6
> .byte 10
>
> olock_fsm_event is fine too as it is just duplicating those basic blocks
> (the calls).
>
> init_olock_op_struct looks fine really:
> movq %fs:0, %rax
> pxor %xmm0, %xmm0
> movups %xmm0, (%rdi)
> addq $olock_tparams@tpoff, %rax
>
> In Intel asm syntax:
>
> mov rax, QWORD PTR fs:0
> pxor xmm0, xmm0
> movups XMMWORD PTR [rdi], xmm0
> add rax, OFFSET FLAT:olock_tparams@tpoff
>
> it is basically moving the TLS pointer to rax and then adding the offset for
> the variable.
>
> I don't understand what exactly you are complaining about really.
OK, I wasn't sure about the TLS accesses; adding -r to objdump helped clear
that up. However I don't understand why olock_reset_op() is so large. It's a
trivial initializer for a descriptor with an array of olock_op_element
structures appended. There's no way it should look like what I quoted. I'd be
happy if I am experiencing a fever-dream over nothing due to ignorance, but I
am not convinced that that is the case. If I am wrong I will be very
disappointed.
* [Bug middle-end/109326] Bad assembler code generation for valid C on x86-64
From: pinskia at gcc dot gnu.org @ 2023-03-29 2:35 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109326
--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Steve Thompson from comment #3)
> However I don't understand why olock_reset_op() is so large. It's
> a trivial initializer for a descriptor with an array of olock_op_element
> structures appended. There's no way it should look like what I quoted. I'd
> be happy if I am experiencing a fever-dream over nothing due to ignorance,
> but I am not convinced that that is the case. If I am wrong I will be very
> disappointed.
GCC unrolled the loop via vectorizing it.
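To make the listing easier to follow, here is a hedged reconstruction of the kind of loop that could produce it. The structure layout is only inferred from the offsets in the quoted disassembly (count at 0x10, 24-byte elements starting at 0x18, a zeroed 32-bit field, a zeroed byte, and the bytes 6 and 10 that form $0xa06). All type and field names are invented for illustration and the real olock.c may look different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical element layout, guessed from the offsets in the listing. */
struct op_element {
    uint64_t word0;       /* offset 0, not written by the reset loop */
    uint32_t counter;     /* offset 8, target of the movl $0x0 stores */
    uint32_t unused0;     /* offset 12 */
    uint8_t  flag;        /* offset 16, target of the movb $0x0 stores */
    uint8_t  unused1[3];
    uint8_t  tag[2];      /* offset 20, bytes 6 and 10, i.e. 0x0a06 */
    uint8_t  unused2[2];
};

/* Hypothetical descriptor with the element array appended. */
struct op {
    uint8_t  header[16];
    uint16_t nr_elements;          /* offset 0x10, read by movzwl 0x10(%rdi) */
    uint8_t  unused[6];
    struct op_element elements[];  /* offset 0x18 */
};

/* A simple loop of the sort that -O3 may expand as shown in the listing. */
void reset_op(struct op *op)
{
    for (uint16_t i = 0; i < op->nr_elements; i++) {
        op->elements[i].counter = 0;
        op->elements[i].flag = 0;
        op->elements[i].tag[0] = 6;
        op->elements[i].tag[1] = 10;
    }
}

/* Checks the guessed layout and that reset_op() writes what the listing
   writes. Returns 1 on success, 0 if the allocation fails. */
int reset_op_selfcheck(uint16_t n)
{
    assert(sizeof(struct op_element) == 24);
    assert(offsetof(struct op, nr_elements) == 0x10);
    assert(offsetof(struct op, elements) == 0x18);
    assert(offsetof(struct op_element, tag) == 20);

    size_t bytes = sizeof(struct op) + (size_t)n * sizeof(struct op_element);
    struct op *op = malloc(bytes);
    if (op == NULL)
        return 0;
    memset(op, 0xff, bytes);   /* poison so untouched fields stay visible */
    op->nr_elements = n;
    reset_op(op);
    for (uint16_t i = 0; i < n; i++) {
        assert(op->elements[i].counter == 0);
        assert(op->elements[i].flag == 0);
        assert(op->elements[i].tag[0] == 6 && op->elements[i].tag[1] == 10);
        assert(op->elements[i].word0 == UINT64_MAX); /* left alone */
    }
    free(op);
    return 1;
}
```

With this layout element i's counter sits at 0x20 + 24*i, which matches the 0x18 stride between consecutive movl stores in the listing.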
* [Bug middle-end/109326] Sub-optimal assembler code generation for valid C on x86-64
From: susurrus.of.qualia at gmail dot com @ 2023-03-30 2:18 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109326
--- Comment #5 from Steve Thompson <susurrus.of.qualia at gmail dot com> ---
(In reply to Andrew Pinski from comment #4)
> (In reply to Steve Thompson from comment #3)
> > However I don't understand why olock_reset_op() is so large. It's
> > a trivial initializer for a descriptor with an array of olock_op_element
> > structures appended. There's no way it should look like what I quoted. I'd
> > be happy if I am experiencing a fever-dream over nothing due to ignorance,
> > but I am not convinced that that is the case. If I am wrong I will be very
> > disappointed.
>
> GCC unrolled the loop via vectorizing it.
OMG did it ever. It seems that I'm an idiot and must apologise for wasting
everyone's time.
I fixed up some remaining support code and dug into it with gdb and determined
that it does, in fact, work. There appear to be distinct paths for particular
array ranges and logic to take care of odd numbers, sort of like memcpy
handling large blocks.
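The shape described here can be written out by hand. The following is a rough C analogue, assuming a hypothetical element type: a main loop that handles 16 elements per pass, then a tail loop for the remaining n % 16 elements. It illustrates the control flow only and is not a claim about what GCC does internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element type; only the shape of the loops matters here. */
struct elem {
    uint32_t counter;
    uint8_t  flag;
    uint8_t  tag[2];
};

static void reset_one(struct elem *e)
{
    e->counter = 0;
    e->flag = 0;
    e->tag[0] = 6;
    e->tag[1] = 10;
}

/* Straightforward version, one element per iteration. */
void reset_simple(struct elem *e, size_t n)
{
    for (size_t i = 0; i < n; i++)
        reset_one(&e[i]);
}

/* Blocked version: 16 elements per pass, then a tail of up to 15. */
void reset_blocked(struct elem *e, size_t n)
{
    size_t full = n & ~(size_t)15; /* compare "and $0xfffffff0,%eax" */
    size_t i = 0;

    for (; i < full; i += 16)
        for (size_t k = 0; k < 16; k++) /* straight-line code in the listing */
            reset_one(&e[i + k]);

    for (; i < n; i++)                  /* the "odd numbers" */
        reset_one(&e[i]);
}

/* Returns 1 if both versions leave identical arrays for a given n <= 64. */
int reset_versions_agree(size_t n)
{
    struct elem a[64], b[64];

    assert(n <= 64);
    for (size_t i = 0; i < 64; i++) {
        a[i].counter = b[i].counter = 0xffffffffu;
        a[i].flag = b[i].flag = 0xff;
        a[i].tag[0] = b[i].tag[0] = 0xff;
        a[i].tag[1] = b[i].tag[1] = 0xff;
    }
    reset_blocked(a, n);
    reset_simple(b, n);
    for (size_t i = 0; i < 64; i++) {
        if (a[i].counter != b[i].counter || a[i].flag != b[i].flag ||
            a[i].tag[0] != b[i].tag[0] || a[i].tag[1] != b[i].tag[1])
            return 0;
    }
    return 1;
}
```

In the quoted listing the tail is not a loop but fifteen copies of the body, each guarded by its own compare and branch, which is where much of the size comes from.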
But I have to say that I really don't like it, and obviously I can work around
it by making the while() block similar to what is done in olock_init_op().
That gives me two functions with a combined text of 64 bytes if there is no
padding. Compare this to the 1.2KB of the original disassembly for a generous
factor of 20 code expansion. That seems like a great way to bloat code.
I realize that -Os is available, but it eliminates a bunch of supposed inline
functions leading to linker errors for the missing symbols. I'm not about to
try finding out why for the time being as I don't really need it.
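Besides restructuring the loop or switching to -Os, GCC has per-function and per-loop controls that may be worth trying. The sketch below is an assumption about what could help, not what was done in olock.c: it uses the optimize function attribute (which the GCC manual describes as intended for debugging rather than production use) and #pragma GCC unroll, on a hypothetical element type. Whether either one actually shrinks the output for a given GCC version is something to confirm with objdump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element type, not taken from olock.c. */
struct elem {
    uint32_t counter;
    uint8_t  flag;
    uint8_t  tag[2];
};

#if defined(__GNUC__) && !defined(__clang__)
/* GCC only: compile this one function as if -fno-tree-vectorize were given. */
#define NO_VECTORIZE __attribute__((optimize("no-tree-vectorize")))
#else
#define NO_VECTORIZE
#endif

NO_VECTORIZE
void reset_compact(struct elem *e, size_t n)
{
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 8
#pragma GCC unroll 1 /* ask GCC not to unroll the loop that follows */
#endif
    for (size_t i = 0; i < n; i++) {
        e[i].counter = 0;
        e[i].flag = 0;
        e[i].tag[0] = 6;
        e[i].tag[1] = 10;
    }
}

/* Returns 1 if the first 19 of 20 elements were reset and the last one
   was left untouched. */
int reset_compact_selfcheck(void)
{
    struct elem e[20];

    for (size_t i = 0; i < 20; i++) {
        e[i].counter = 0xffffffffu;
        e[i].flag = 0xff;
        e[i].tag[0] = 0xff;
        e[i].tag[1] = 0xff;
    }
    reset_compact(e, 19);
    for (size_t i = 0; i < 19; i++) {
        if (e[i].counter != 0 || e[i].flag != 0 ||
            e[i].tag[0] != 6 || e[i].tag[1] != 10)
            return 0;
    }
    return e[19].counter == 0xffffffffu && e[19].flag == 0xff;
}
```

The same effect can be requested for a whole translation unit with -fno-tree-vectorize on the command line, at the cost of losing vectorization where it does pay off.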
For fun I built a short test program and measured the latency across
olock_reset_op for various array lengths:
1 8 16 32
64B code:
1.2K code:
* [Bug middle-end/109326] Sub-optimal assembler code generation for valid C on x86-64
From: susurrus.of.qualia at gmail dot com @ 2023-03-30 2:32 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109326
--- Comment #6 from Steve Thompson <susurrus.of.qualia at gmail dot com> ---
(In reply to Steve Thompson from comment #5)
> 1 8 16 32
> 64B code:
>
> 1.2K code:
Sorry, my touchpad glitched and sent prematurely.
For the overlarge vectorized version I hate:
[28] nr_ops=1 nr_samples=1000000(0) min=1 avg=5 max=12248
[28] nr_ops=8 nr_samples=1000000(0) min=1 avg=6 max=13022
[28] nr_ops=16 nr_samples=1000000(0) min=8 avg=11 max=9548
[28] nr_ops=32 nr_samples=1000000(0) min=26 avg=33 max=8126
[28] nr_ops=64 nr_samples=1000000(0) min=62 avg=73 max=11186
[28] nr_ops=128 nr_samples=1000000(0) min=134 avg=153 max=14426
[28] nr_ops=256 nr_samples=1000000(0) min=296 avg=312 max=12608
[28] nr_ops=1024 nr_samples=1000000(0) min=1250 avg=1269 max=23858
And the compact, esthetically pleasing version I like:
[28] nr_ops=1 nr_samples=1000000(0) min=1 avg=5 max=7910
[28] nr_ops=8 nr_samples=1000000(0) min=1 avg=7 max=20150
[28] nr_ops=16 nr_samples=1000000(0) min=8 avg=24 max=11402
[28] nr_ops=32 nr_samples=1000000(0) min=62 avg=74 max=20582
[28] nr_ops=64 nr_samples=1000000(0) min=152 avg=153 max=12482
[28] nr_ops=128 nr_samples=1000000(0) min=296 avg=313 max=33884
[28] nr_ops=256 nr_samples=1000000(0) min=620 avg=632 max=22940
[28] nr_ops=1024 nr_samples=1000000(0) min=2528 avg=2546 max=25064
(System is an AMD Ryzen 5700U laptop; the [28] is the measured cycle latency of
the RDTSCP operation; the number in parentheses counts bad samples, which occur
occasionally.)
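For anyone wanting to repeat this kind of measurement, the following is a minimal sketch of an RDTSCP-based harness. It is not the test program used above, which was not posted; the names read_ticks, timer_overhead, measure and latency_demo_ok are hypothetical. It assumes GCC or Clang on x86, where __rdtscp() is provided by <x86intrin.h>, and falls back to clock() elsewhere so that it still builds.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Read the time-stamp counter with RDTSCP. */
static uint64_t read_ticks(void)
{
    unsigned int aux; /* receives IA32_TSC_AUX, ignored here */
    return __rdtscp(&aux);
}
#else
#include <time.h>
/* Coarse portable fallback so the sketch still builds on other targets. */
static uint64_t read_ticks(void)
{
    return (uint64_t)clock();
}
#endif

struct latency {
    uint64_t min, avg, max;
};

/* Smallest observed cost of two back-to-back timer reads. */
uint64_t timer_overhead(unsigned tries)
{
    uint64_t best = UINT64_MAX;

    for (unsigned i = 0; i < tries; i++) {
        uint64_t t0 = read_ticks();
        uint64_t t1 = read_ticks();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}

/* Time fn(arg) nr_samples times, subtracting the timer overhead. */
struct latency measure(void (*fn)(void *), void *arg,
                       uint64_t nr_samples, uint64_t overhead)
{
    struct latency s = { UINT64_MAX, 0, 0 };
    uint64_t total = 0;

    assert(nr_samples > 0);
    for (uint64_t i = 0; i < nr_samples; i++) {
        uint64_t t0 = read_ticks();
        fn(arg);
        uint64_t d = read_ticks() - t0;

        d = d > overhead ? d - overhead : 0;
        if (d < s.min)
            s.min = d;
        if (d > s.max)
            s.max = d;
        total += d;
    }
    s.avg = total / nr_samples;
    return s;
}

/* Trivial workload used only to exercise the harness. */
static void touch(void *arg)
{
    volatile int *p = arg;
    *p = 0;
}

/* Returns 1 if the collected statistics are internally consistent. */
int latency_demo_ok(void)
{
    int x = 1;
    struct latency s = measure(touch, &x, 10000, timer_overhead(1000));

    return s.min <= s.avg && s.avg <= s.max;
}
```

The large max values in the tables above are typical for this approach, since interrupts and migrations between cores land inside some samples. On current CPUs the counter usually ticks at a fixed reference rate rather than the core clock, so the numbers are best read as relative.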
As it turns out, there are no advantages to the vectorized version until arrays
of 16; after that it is approximately twice as fast. Some will be happy to pay
that cost for the extra performance I suppose, but it still seems wasteful.
Again, sorry for being an idiot.
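The "approximately twice as fast" figure can be checked directly from the avg columns of the two tables. The small C sketch below does only that arithmetic; the data are copied from the tables above and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Average latencies copied from the two tables above. */
struct sample {
    unsigned nr_ops;
    unsigned avg_vectorized; /* the 1.2K version */
    unsigned avg_compact;    /* the 64B version */
};

static const struct sample samples[] = {
    {    1,    5,    5 },
    {    8,    6,    7 },
    {   16,   11,   24 },
    {   32,   33,   74 },
    {   64,   73,  153 },
    {  128,  153,  313 },
    {  256,  312,  632 },
    { 1024, 1269, 2546 },
};

#define NR_SAMPLES (sizeof(samples) / sizeof(samples[0]))

/* Ratio of compact to vectorized average latency for one table row. */
double speedup(size_t idx)
{
    assert(idx < NR_SAMPLES);
    return (double)samples[idx].avg_compact / samples[idx].avg_vectorized;
}

/* Print one line per array length. */
void print_speedups(void)
{
    for (size_t i = 0; i < NR_SAMPLES; i++)
        printf("nr_ops=%-5u speedup=%.2f\n", samples[i].nr_ops, speedup(i));
}
```

The ratio is 1.0 at one element, about 1.17 at eight, and settles at roughly 2.0 to 2.2 from sixteen elements upward, in line with the conclusion drawn above.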