From: "alexander.grund@tu-dresden.de"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c++/112513] New: Misoptimization of argument
Date: Mon, 13 Nov 2023 13:33:05 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112513

            Bug ID: 112513
           Summary: Misoptimization of argument
           Product: gcc
           Version: 12.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: alexander.grund@tu-dresden.de
  Target Milestone: ---

Created attachment 56569
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56569&action=edit
Preprocessed source

In the NVIDIA NCCL library
(https://github.com/NVIDIA/nccl) I came across a SIGSEGV in __strncmp_sse42
that happens "sometimes" when compiled with GCC 12 at -O2 and higher, but
doesn't happen at lower optimization levels or with GCC 11.

The original stacktrace looks like this:

#0  0x00002aaaabd82e3a in __strcmp_sse42 () from /lib64/libc.so.6
#1  0x00002aab18d83a6e in xmlGetAttrIndex (index=, attrName=0x2aab18e2820c "familyid", node=0x2aae9c108160) at graph/xml.h:67
#2  xmlSetAttrInt (value=143, attrName=0x2aab18e2820c "familyid", node=0x2aae9c108160) at graph/xml.h:167
#3  ncclTopoGetXmlFromCpu (cpuNode=cpuNode@entry=0x2aae9c108160, xml=xml@entry=0x2aae9c0d1f20) at graph/xml.cc:436

Moving the `strncmp(key, attrName, MAX_STR_LEN) == 0` out into a separate
function to see the arguments in the debugger shows this backtrace:

#0  0x00002aaaabd83c00 in __strncmp_sse42 () from /lib64/libc.so.6
#1  0x00002aab18d75a9f in cmpFromXml (attrName=0x89300800 , key=0x2aaeac107eb0 "numaid") at graph/xml.h:65
#2  xmlGetAttrIndex (index=, attrName=0x89300800 , node=0x2aaeac107db0) at graph/xml.h:73
#3  xmlSetAttrInt (node=node@entry=0x2aaeac107db0, attrName=attrName@entry=0x89300800 , value=143) at graph/xml.h:174
#4  0x00002aab18d77de4 in ncclTopoGetXmlFromCpu (cpuNode=cpuNode@entry=0x2aaeac107db0, xml=xml@entry=0x2aaeac0d1b70) at graph/xml.cc:437

So it looks like the `attrName` parameter gets corrupted somehow. The call
site of `xmlSetAttrInt` is
`NCCLCHECK(xmlSetAttrInt(cpuNode, "familyid", familyId));`, so that parameter
is a string constant that was already used earlier by
`NCCLCHECK(xmlGetAttrIndex(cpuNode, "familyid", &index));`.
I suspect the `index` parameter to be involved.

Many modifications cause the bug to disappear, such as removing the
`NCCLCHECK` macro (basically an `if(error) return error;`-wrapper) or adding
fprintf statements to xmlGetAttrIndex or cmpFromXml.

The compile command is
`g++ -fPIC -fvisibility=hidden -std=c++11 -O2 -g -ggdb3 -c graph/xml.cc`;
the preprocessed source is attached.

This needs minimization, but as it only happens when compiled into a library
used by a Python package from a script, I don't know how to do that. So I
hope that there will be something obvious for someone familiar with the
optimizations in GCC.
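
For orientation, the call pattern described in this report can be sketched roughly as follows. This is a hypothetical reconstruction, not the NCCL source: the struct layout, the `MAX_STR_LEN` and `MAX_ATTR_COUNT` values, the simplified `ncclResult_t`, the `index == -1` check and the `setFamilyId` wrapper are all assumptions made for illustration. Only the function names, the `NCCLCHECK` early-return idea and the `strncmp` comparison come from the text above. It is not expected to reproduce the crash, since the report notes that small changes make the problem disappear.

```cpp
// Hypothetical sketch of the code shape described in the report.
// NOT the NCCL source; sizes, struct layout and result codes are assumed.
#include <cstdio>
#include <cstring>

#define MAX_STR_LEN 255    // assumed value
#define MAX_ATTR_COUNT 16  // assumed value

enum ncclResult_t { ncclSuccess = 0, ncclInternalError = 1 };  // simplified

// Early-return wrapper, i.e. the `if (error) return error;` idea.
#define NCCLCHECK(call)                     \
  do {                                      \
    ncclResult_t res_ = (call);             \
    if (res_ != ncclSuccess) return res_;   \
  } while (0)

struct ncclXmlNode {
  struct {
    char key[MAX_STR_LEN + 1];
    char value[MAX_STR_LEN + 1];
  } attrs[MAX_ATTR_COUNT];
  int nAttrs;
};

// The comparison the reporter moved into its own function for debugging.
static bool cmpFromXml(const char* key, const char* attrName) {
  return strncmp(key, attrName, MAX_STR_LEN) == 0;
}

// Writes the position of attrName into *index, or -1 if it is absent.
static ncclResult_t xmlGetAttrIndex(ncclXmlNode* node, const char* attrName,
                                    int* index) {
  *index = -1;
  for (int i = 0; i < node->nAttrs; i++) {
    if (cmpFromXml(node->attrs[i].key, attrName)) {
      *index = i;
      return ncclSuccess;
    }
  }
  return ncclSuccess;
}

// Sets (or appends) an integer attribute, stored as a decimal string.
static ncclResult_t xmlSetAttrInt(ncclXmlNode* node, const char* attrName,
                                  int value) {
  int index;
  NCCLCHECK(xmlGetAttrIndex(node, attrName, &index));
  if (index == -1) {
    if (node->nAttrs >= MAX_ATTR_COUNT) return ncclInternalError;
    index = node->nAttrs++;
    strncpy(node->attrs[index].key, attrName, MAX_STR_LEN);
    node->attrs[index].key[MAX_STR_LEN] = '\0';
  }
  snprintf(node->attrs[index].value, MAX_STR_LEN + 1, "%d", value);
  return ncclSuccess;
}

// Hypothetical stand-in for the relevant part of ncclTopoGetXmlFromCpu:
// the same string constant is passed to both calls.
static ncclResult_t setFamilyId(ncclXmlNode* cpuNode, int familyId) {
  int index;
  NCCLCHECK(xmlGetAttrIndex(cpuNode, "familyid", &index));
  if (index == -1) {  // assumed condition
    NCCLCHECK(xmlSetAttrInt(cpuNode, "familyid", familyId));
  }
  return ncclSuccess;
}
```

A standalone driver could zero-initialize an `ncclXmlNode`, add a "numaid" attribute as seen in the backtrace, and call `setFamilyId(&node, 143)`. Something of this shape might serve as a starting point for a reduced test case, with the caveat that the real trigger probably depends on inlining decisions in the full translation unit.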