
all: reduce type, align for 64-bit, using autopadding memholes after swap fields #108


Open: wants to merge 1 commit into base `master`

GermanAizek commented May 31, 2025

@tkanteck, @hpax

Running the pahole struct/class layout analyzer (from Red Hat, https://linux.die.net/man/1/pahole) on object files after compilation shows structures whose layout wastes memory and CPU cache space because of padding holes. C/C++ compilers do not reorder fields to fill these alignment holes: members are laid out in declaration order, and choosing that order is left to the programmer, because changing a structure's layout can break code that depends on it. Removing padding entirely requires explicit packing keywords or attributes.
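A minimal sketch of what pahole reports as a hole, assuming a typical LP64 target (x86-64 or AArch64 Linux) where pointers are 8 bytes and 8-byte aligned. The struct and function names are hypothetical; `__attribute__((packed))` is the GCC/Clang extension for packing, which removes padding at the cost of misaligned member accesses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Natural layout: members stay in declaration order, and the compiler
 * inserts padding before `ptr` so that it meets its alignment. */
struct natural_layout {
    int32_t index;  /* offset 0 */
    void *ptr;      /* offset 8 on LP64, after a 4-byte hole */
};

/* GCC/Clang extension: no padding, so `ptr` sits at offset 4 and may be
 * accessed unaligned (slower, or unsupported on some targets). */
struct __attribute__((packed)) packed_layout {
    int32_t index;  /* offset 0 */
    void *ptr;      /* offset 4 */
};

/* Padding bytes the compiler placed between `index` and `ptr`. */
size_t natural_hole_bytes(void) {
    return offsetof(struct natural_layout, ptr) - sizeof(int32_t);
}

/* Layout checks; the LP64-specific ones are skipped on other targets. */
void check_layouts(void) {
    assert(offsetof(struct natural_layout, index) == 0);
    assert(offsetof(struct packed_layout, ptr) == sizeof(int32_t));
    assert(sizeof(struct packed_layout) == sizeof(int32_t) + sizeof(void *));
    if (sizeof(void *) == 8) {
        assert(natural_hole_bytes() == 4);
        assert(sizeof(struct natural_layout) == 16);
    }
}
```

Reordering fields, as this PR does, removes holes without the unaligned-access cost that packing introduces.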

I also tried to keep the previous field order where possible, so as not to break the existing code style.

I did not find instructions in the documentation for running a NASM benchmark to compare this PR with master. In any case, removing alignment holes and shrinking structures reduces memory usage, and smaller structures let more data fit into each CPU cache line.

I'm happy to run a benchmark if someone tells me how. Build speed might differ measurably on especially large projects.

Reduce type sizes:

  • section-align int -> int16_t
  • section-fileindex uint32_t -> uint16_t
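A sketch of why narrowing members can shrink a struct, assuming an LP64 target. The two structs are hypothetical stand-ins, not NASM's actual `section` definition: with `align` as `int16_t` and `fileindex` as `uint16_t`, both fit in one 4-byte slot and the tail padding disappears. Narrowing is only safe if every stored value still fits the new type, so the sketch also shows hypothetical range checks and a checked conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical "before" layout. */
struct wide_section {
    void *data;          /* offset 0, 8 bytes on LP64 */
    uint32_t flags;      /* offset 8 */
    int align;           /* offset 12 */
    uint32_t fileindex;  /* offset 16, then 4 bytes tail padding */
};                       /* 24 bytes on LP64 */

/* Hypothetical "after" layout with the narrowed types. */
struct narrow_section {
    void *data;          /* offset 0 */
    uint32_t flags;      /* offset 8 */
    int16_t align;       /* offset 12 */
    uint16_t fileindex;  /* offset 14, no padding */
};                       /* 16 bytes on LP64 */

/* Narrowing is only safe if every stored value still fits. */
int align_fits_int16(long align) {
    return align >= INT16_MIN && align <= INT16_MAX;
}

int fileindex_fits_uint16(unsigned long fileindex) {
    return fileindex <= UINT16_MAX;
}

/* Checked conversion: trips an assertion instead of silently truncating. */
int16_t narrow_align(long align) {
    assert(align_fits_int16(align));
    return (int16_t)align;
}
```

The trade-off is range: an `int16_t` alignment tops out at 32767, and a `uint16_t` file index at 65535.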

Reduce structure sizes:

  • section: 168 -> 160 bytes, saved 8 bytes (via the type size reductions)
  • coff_Section: 112 -> 96 bytes, saved 16 bytes
  • cv8_symbol: 40 -> 32 bytes, saved 8 bytes
  • cv8_state: 152 -> 144 bytes, saved 8 bytes
  • Symbol: 56 -> 48 bytes, saved 8 bytes
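The cache-line effect can be made concrete with a little arithmetic. This sketch assumes 64-byte cache lines (typical on x86-64 and many AArch64 cores) and, for the array case, objects stored back to back from a line-aligned base; NASM may allocate these structures individually, in which case allocator rounding matters instead. With the `coff_Section` sizes in the list above, a single line-aligned object spans 2 lines either way, but 1000 contiguous objects need 1500 lines instead of 1750. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u /* assumed cache-line size in bytes */

/* Cache lines touched by an object of `size` bytes that starts `start`
 * bytes after a cache-line-aligned base address. */
size_t lines_spanned(size_t start, size_t size) {
    if (size == 0)
        return 0;
    size_t first = start / CACHE_LINE;
    size_t last = (start + size - 1) / CACHE_LINE;
    return last - first + 1;
}

/* Cache lines occupied by `n` objects of `size` bytes laid out
 * contiguously from a cache-line-aligned base. */
size_t lines_for_array(size_t n, size_t size) {
    assert(size == 0 || n <= SIZE_MAX / size); /* guard against overflow */
    return (n * size + CACHE_LINE - 1) / CACHE_LINE;
}
```

The second element of such an array shows the difference: `lines_spanned(112, 112)` is 3, whereas `lines_spanned(96, 96)` is 2.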

Example pahole output for struct coff_Section:

  • The comment /* XXX {n} bytes hole, try to pack */ marks where the struct can be shrunk, either by reordering its fields or by moving small fields to the end of the struct so that they fill the tail padding the compiler adds anyway.
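The general rule behind the `try to pack` hint can be shown with a textbook example. The struct names are hypothetical, and the exact sizes assume `double` is 8-byte aligned, as on common 64-bit ABIs: ordering members from largest to smallest alignment and grouping the small ones at the end lets them share a single padded slot.

```c
#include <assert.h>
#include <stddef.h>

/* Small members interleaved with a large one: two padding areas. */
struct interleaved {
    char a;    /* offset 0, followed by a hole */
    double b;  /* offset 8 when double is 8-byte aligned */
    char c;    /* followed by tail padding */
};

/* Largest alignment first, small members grouped at the end so they
 * share the single tail-padding area. */
struct grouped {
    double b;  /* offset 0 */
    char a;    /* offset 8 */
    char c;    /* offset 9, then tail padding */
};

/* Bytes saved by the reordering on the current target. */
size_t reorder_savings(void) {
    return sizeof(struct interleaved) - sizeof(struct grouped);
}

void check_reorder(void) {
    assert(sizeof(struct grouped) <= sizeof(struct interleaved));
    if (offsetof(struct interleaved, b) == 8) { /* 8-byte aligned double */
        assert(sizeof(struct interleaved) == 24);
        assert(sizeof(struct grouped) == 16);
        assert(reorder_savings() == 8);
    }
}
```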

Master branch

debian@debian:~/GIT/nasm$ ~/GIT/dwarves/build/pahole --class_name=coff_Section */*.o
struct coff_Section {
        struct SAA *               data;                 /*     0     8 */
        uint32_t                   len;                  /*     8     4 */
        int                        nrelocs;              /*    12     4 */
        int32_t                    index;                /*    16     4 */

        /* XXX 4 bytes hole, try to pack */

        struct coff_Reloc *        head;                 /*    24     8 */
        struct coff_Reloc * *      tail;                 /*    32     8 */
        uint32_t                   flags;                /*    40     4 */
        uint32_t                   align_flags;          /*    44     4 */
        uint32_t                   sectalign_flags;      /*    48     4 */

        /* XXX 4 bytes hole, try to pack */

        char *                     name;                 /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        int32_t                    namepos;              /*    64     4 */
        int32_t                    pos;                  /*    68     4 */
        int32_t                    relpos;               /*    72     4 */

        /* XXX 4 bytes hole, try to pack */

        int64_t                    pass_last_seen;       /*    80     8 */
        char *                     comdat_name;          /*    88     8 */
        uint32_t                   checksum;             /*    96     4 */
        int8_t                     comdat_selection;     /*   100     1 */
        int8_t                     comdat_symbol;        /*   101     1 */

        /* XXX 2 bytes hole, try to pack */

        int32_t                    comdat_associated;    /*   104     4 */

        /* size: 112, cachelines: 2, members: 19 */
        /* sum members: 94, holes: 4, sum holes: 14 */
        /* padding: 4 */
        /* last cacheline: 48 bytes */
};

PR

debian@debian:~/GIT/nasm$ ~/GIT/dwarves/build/pahole --reorganize -S --class_name=coff_Section */*.o
struct coff_Section {
        struct SAA *               data;                 /*     0     8 */
        struct coff_Reloc *        head;                 /*     8     8 */
        struct coff_Reloc * *      tail;                 /*    16     8 */
        uint32_t                   len;                  /*    24     4 */
        int                        nrelocs;              /*    28     4 */
        int32_t                    index;                /*    32     4 */
        uint32_t                   flags;                /*    36     4 */
        uint32_t                   align_flags;          /*    40     4 */
        uint32_t                   sectalign_flags;      /*    44     4 */
        char *                     name;                 /*    48     8 */
        int64_t                    pass_last_seen;       /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        int32_t                    namepos;              /*    64     4 */
        int32_t                    pos;                  /*    68     4 */
        int32_t                    relpos;               /*    72     4 */
        uint32_t                   checksum;             /*    76     4 */
        char *                     comdat_name;          /*    80     8 */
        int32_t                    comdat_associated;    /*    88     4 */
        int8_t                     comdat_selection;     /*    92     1 */
        int8_t                     comdat_symbol;        /*    93     1 */

        /* size: 96, cachelines: 2, members: 19 */
        /* padding: 2 */
        /* last cacheline: 32 bytes */
};
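To double-check the pahole figures without building NASM, the two field orders can be reproduced in a standalone file. This is a sketch: the member list is copied from the pahole output above, `struct SAA` and `struct coff_Reloc` are left as incomplete types (only pointers to them are stored), the tags `coff_Section_master`/`coff_Section_pr` and the check function are hypothetical, and the expected numbers assume an LP64 target where `int64_t` and pointers are 8-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct SAA;        /* incomplete: only pointers are stored */
struct coff_Reloc; /* incomplete: only pointers are stored */

/* Field order as reported for the master branch. */
struct coff_Section_master {
    struct SAA *data;
    uint32_t len;
    int nrelocs;
    int32_t index;
    struct coff_Reloc *head;
    struct coff_Reloc **tail;
    uint32_t flags;
    uint32_t align_flags;
    uint32_t sectalign_flags;
    char *name;
    int32_t namepos;
    int32_t pos;
    int32_t relpos;
    int64_t pass_last_seen;
    char *comdat_name;
    uint32_t checksum;
    int8_t comdat_selection;
    int8_t comdat_symbol;
    int32_t comdat_associated;
};

/* Field order as reported for the PR branch. */
struct coff_Section_pr {
    struct SAA *data;
    struct coff_Reloc *head;
    struct coff_Reloc **tail;
    uint32_t len;
    int nrelocs;
    int32_t index;
    uint32_t flags;
    uint32_t align_flags;
    uint32_t sectalign_flags;
    char *name;
    int64_t pass_last_seen;
    int32_t namepos;
    int32_t pos;
    int32_t relpos;
    uint32_t checksum;
    char *comdat_name;
    int32_t comdat_associated;
    int8_t comdat_selection;
    int8_t comdat_symbol;
};

/* Compares both layouts with the pahole figures (LP64 only). */
void check_coff_section_layouts(void) {
    if (sizeof(void *) != 8 || _Alignof(int64_t) != 8)
        return; /* the figures below assume LP64 */
    assert(offsetof(struct coff_Section_master, head) == 24);
    assert(offsetof(struct coff_Section_master, pass_last_seen) == 80);
    assert(offsetof(struct coff_Section_master, comdat_associated) == 104);
    assert(sizeof(struct coff_Section_master) == 112);
    assert(offsetof(struct coff_Section_pr, comdat_symbol) == 93);
    assert(sizeof(struct coff_Section_pr) == 96);
}
```

On x86-64 Linux, calling `check_coff_section_layouts()` from a test program should confirm the 112 -> 96 byte reduction; on other ABIs it returns without checking.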

References:

https://hpc.rz.rptu.de/Tutorials/AVX/alignment.shtml

https://wr.informatik.uni-hamburg.de/_media/teaching/wintersemester_2013_2014/epc-14-haase-svenhendrik-alignmentinc-presentation.pdf

https://en.wikipedia.org/wiki/Data_structure_alignment

https://stackoverflow.com/a/20882083

https://zijishi.xyz/post/optimization-technique/learning-to-use-data-alignment/

…swap fields


Reduce type sizes:
section-align  int -> int16_t
section-fileindex  uint32_t -> uint16_t

Reduce structure sizes (bytes, before -> after):
section         168 -> 160    saved 8 bytes (using type size reduction)
itemplate        80 -> 72     saved 8 bytes
coff_Section    112 -> 96     saved 16 bytes
cv8_symbol       40 -> 32     saved 8 bytes
cv8_state       152 -> 144    saved 8 bytes
Symbol           56 -> 48     saved 8 bytes