From mboxrd@z Thu Jan 1 00:00:00 1970
From: paul.clements@steeleye.com
To: gcc-gnats@gcc.gnu.org
Subject: c/4104: structure misalignment problem causing kernel failures
Date: Thu, 23 Aug 2001 13:06:00 -0000
Message-id: <20010823200222.6450.qmail@sourceware.cygnus.com>
X-SW-Source: 2001-08/msg00604.html
List-Id:

>Number:         4104
>Category:       c
>Synopsis:       structure misalignment problem causing kernel failures
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    unassigned
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Aug 23 13:06:02 PDT 2001
>Closed-Date:
>Last-Modified:
>Originator:     Paul Clements
>Release:
>Organization:
>Environment:
i386, Red Hat 7.1, 2.4.3-12 RedHat kernel
>Description:
We have been experiencing some problems trying to use kernel modules
with kernels that are compiled with different versions of gcc. On our
kernel build machine (where we compile our kernel modules) we have
gcc 2.91.66; RedHat 7.1 ships with gcc 2.96. Now, the problem is that
RedHat also apparently compiles its kernels (at least the newer ones)
with gcc 2.96. Unfortunately, there appears to be a structure
misalignment problem in gcc 2.96.

One particular instance of this problem that we are running into is in
the raid1.o module in the 2.4.3 kernel. The structure alignment problem
is causing our gcc 2.91.66-compiled raid1 module to malfunction.
(raid1.o compiled from the same source with gcc 2.96 works fine.)

We've traced the problem down to the following assembly code generated
by gcc 2.96 and gcc 2.91.66, respectively:

(assembly code for parameter setup and call to __alloc_pages
(within raid1_grow_buffers))

2.96:

    movl $contig_page_data_Rsmp_cef82582+3800, %eax
    call __alloc_pages_Rsmp_decacc2f

2.91.66:

    movl $contig_page_data_Rsmp_cef82582+3884,%eax
    call __alloc_pages_Rsmp_decacc2f

gcc 2.91.66 is padding out the zone_t structure by 28 bytes. With an
array of 3 of those before our field in question, that accounts for the
84-byte difference in offset in the above assembler code.
The 28 byte padding is because gcc 2.91.66 is trying to 32 byte align
this structure. The reason for this is that the first submember of
zone_t (a per_cpu_t) is explicitly defined as 32 byte aligned. So,
gcc 2.91.66 is (properly) aligning the per_cpu_t structure on a 32 byte
boundary as specified by the __attribute__((aligned(32))) directive in
that structure's definition:

(gdb) p &((pg_data_t *)0)->node_zones[1].cpu_pages[0]
$22 = (per_cpu_t *) 0x4e0
(gdb) p &((pg_data_t *)0)->node_zones[1].cpu_pages[1]
$23 = (per_cpu_t *) 0x500
(gdb) p 0x500 % 32
$24 = 0
(gdb) p 0x4e0 % 32
$25 = 0

gcc 2.96 is not properly aligning this structure:

(gdb) p &((pg_data_t *)0)->node_zones[1].cpu_pages[0]
$32 = (per_cpu_t *) 0x4c4
(gdb) p &((pg_data_t *)0)->node_zones[1].cpu_pages[1]
$33 = (per_cpu_t *) 0x4e4
(gdb) p 0x4c4 % 32
$34 = 4
(gdb) p 0x4e4 % 32
$35 = 4

So, in order for our raid1 modules to work properly with a kernel
compiled by gcc 2.96, we must also use (the broken) 2.96 to compile
our module.
>How-To-Repeat:
Compile the raid1.o kernel module with gcc v2.91.66 and attempt to
install the module on the (gcc 2.96 compiled) RedHat 7.1 2.4.3-12 kernel.

The module will fail to perform correctly. Data will never be resynced.
(Root problem is structure misalignment which causes failure in the
__alloc_pages kernel function call.)
>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted:
 Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/2.96/specs; gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-81)