Contributor

saghul commented Sep 10, 2024

No description provided.

Contributor

bnoordhuis left a comment

Is this intended as an optimization for page-size-or-bigger allocations, because of the expectation that calloc will mmap the memory?

assert(count != 0 && size != 0);

/* When malloc_limit is 0 (unlimited), malloc_limit - 1 will be SIZE_MAX. */
if (unlikely(s->malloc_size + (count * size) > s->malloc_limit - 1))

Contributor

count * size can overflow; you should probably handle that:

if (size > 0) // I know you have an assert a few lines up but just in case
    if (count != (count * size) / size)
        return NULL;

Contributor

An interesting tidbit... I was going to suggest doing the overflow check like this:

if (size > 0)
    if (count > SIZE_MAX / size)
        return NULL;

...on the assumption it's faster because it's replacing a variable (count) with a constant (SIZE_MAX) but... it's not!

gcc at -O3 compiles it to mul + jo (jump if overflow) in both cases, but in the SIZE_MAX code it prepends a test + jz (jump if zero; shown as je in the disassembly). That must be a missed optimization, because omitting the test + jz doesn't affect correctness: when n is 0 the product is 0 and mul never sets the overflow flag.

Even more interestingly: gcc replaces malloc+memset with calloc calls!

Code:

#include <stdlib.h>
#include <string.h>

void *alloc(size_t m, size_t n) {
#ifdef DIVIDE_CONSTANT
        if (n > 0 && m > (size_t)-1 / n) return 0;
#else
        if (n > 0 && m != (m * n) / n) return 0;
#endif
        void *p = malloc(m*n);
        if (p) memset(p, 0, m*n);
        return p;
}

Assembly:

// -DDIVIDE_CONSTANT
0000000000001170 <alloc>:
    1170:       f3 0f 1e fa             endbr64 
    1174:       48 85 f6                test   %rsi,%rsi
    1177:       74 08                   je     1181 <alloc+0x11>
    1179:       48 89 f8                mov    %rdi,%rax
    117c:       48 f7 e6                mul    %rsi
    117f:       70 0e                   jo     118f <alloc+0x1f>
    1181:       48 0f af fe             imul   %rsi,%rdi
    1185:       be 01 00 00 00          mov    $0x1,%esi
    118a:       e9 c1 fe ff ff          jmp    1050 <calloc@plt>
    118f:       31 c0                   xor    %eax,%eax
    1191:       c3                      ret    

// no -DDIVIDE_CONSTANT
0000000000001170 <alloc>:
    1170:       f3 0f 1e fa             endbr64 
    1174:       48 89 f0                mov    %rsi,%rax
    1177:       48 f7 e7                mul    %rdi
    117a:       48 89 c7                mov    %rax,%rdi
    117d:       70 0a                   jo     1189 <alloc+0x19>
    117f:       be 01 00 00 00          mov    $0x1,%esi
    1184:       e9 c7 fe ff ff          jmp    1050 <calloc@plt>
    1189:       31 c0                   xor    %eax,%eax
    118b:       c3                      ret    

assert(count != 0 && size != 0);

/* When malloc_limit is 0 (unlimited), malloc_limit - 1 will be SIZE_MAX. */
if (unlikely(s->malloc_size + (count * size) > s->malloc_limit - 1))

Contributor

Ditto.

Contributor Author

saghul commented Sep 10, 2024

Some (non-scientific) results:

macOS before

kraken-1.0/ai-astar-data.js (0.0487 seconds)
kraken-1.0/ai-astar.js (1.4193 seconds)
kraken-1.0/audio-beat-detection-data.js (0.0050 seconds)
kraken-1.0/audio-beat-detection.js (1.1712 seconds)
kraken-1.0/audio-dft-data.js (0.0041 seconds)
kraken-1.0/audio-dft.js (0.7772 seconds)
kraken-1.0/audio-fft-data.js (0.0041 seconds)
kraken-1.0/audio-fft.js (1.0665 seconds)
kraken-1.0/audio-oscillator-data.js (0.0040 seconds)
kraken-1.0/audio-oscillator.js (0.8044 seconds)
kraken-1.0/imaging-gaussian-blur-data.js (0.0896 seconds)
kraken-1.0/imaging-gaussian-blur.js (8.5139 seconds)
kraken-1.0/imaging-darkroom-data.js (0.0890 seconds)
kraken-1.0/imaging-darkroom.js (1.0774 seconds)
kraken-1.0/imaging-desaturate-data.js (0.0900 seconds)
kraken-1.0/imaging-desaturate.js (1.9366 seconds)
kraken-1.0/json-parse-financial-data.js (0.0007 seconds)
kraken-1.0/json-parse-financial.js (0.0744 seconds)
kraken-1.0/json-stringify-tinderbox-data.js (0.0130 seconds)
kraken-1.0/json-stringify-tinderbox.js (0.0596 seconds)
kraken-1.0/stanford-crypto-aes-data.js (0.0129 seconds)
kraken-1.0/stanford-crypto-aes.js (0.3453 seconds)
kraken-1.0/stanford-crypto-ccm-data.js (0.0141 seconds)
kraken-1.0/stanford-crypto-ccm.js (0.2583 seconds)
kraken-1.0/stanford-crypto-pbkdf2-data.js (0.0020 seconds)
kraken-1.0/stanford-crypto-pbkdf2.js (0.7996 seconds)
kraken-1.0/stanford-crypto-sha256-iterative-data.js (0.0018 seconds)
kraken-1.0/stanford-crypto-sha256-iterative.js (0.2399 seconds)

Total (ms): 18922.3160

kraken-1.1/ai-astar-data.js (0.0396 seconds)
kraken-1.1/ai-astar.js (1.4209 seconds)
kraken-1.1/audio-beat-detection-data.js (0.0050 seconds)
kraken-1.1/audio-beat-detection.js (1.1287 seconds)
kraken-1.1/audio-dft-data.js (0.0042 seconds)
kraken-1.1/audio-dft.js (0.7840 seconds)
kraken-1.1/audio-fft-data.js (0.0041 seconds)
kraken-1.1/audio-fft.js (1.0157 seconds)
kraken-1.1/audio-oscillator-data.js (0.0040 seconds)
kraken-1.1/audio-oscillator.js (0.8113 seconds)
kraken-1.1/imaging-gaussian-blur-data.js (0.0905 seconds)
kraken-1.1/imaging-gaussian-blur.js (8.5465 seconds)
kraken-1.1/imaging-darkroom-data.js (0.0896 seconds)
kraken-1.1/imaging-darkroom.js (1.0832 seconds)
kraken-1.1/imaging-desaturate-data.js (0.0888 seconds)
kraken-1.1/imaging-desaturate.js (1.9378 seconds)
kraken-1.1/json-parse-financial-data.js (0.0004 seconds)
kraken-1.1/json-parse-financial.js (0.0746 seconds)
kraken-1.1/json-stringify-tinderbox-data.js (0.0130 seconds)
kraken-1.1/json-stringify-tinderbox.js (0.0611 seconds)
kraken-1.1/stanford-crypto-aes-data.js (0.0130 seconds)
kraken-1.1/stanford-crypto-aes.js (0.3446 seconds)
kraken-1.1/stanford-crypto-ccm-data.js (0.0148 seconds)
kraken-1.1/stanford-crypto-ccm.js (0.2606 seconds)
kraken-1.1/stanford-crypto-pbkdf2-data.js (0.0022 seconds)
kraken-1.1/stanford-crypto-pbkdf2.js (0.7967 seconds)
kraken-1.1/stanford-crypto-sha256-iterative-data.js (0.0017 seconds)
kraken-1.1/stanford-crypto-sha256-iterative.js (0.2415 seconds)

Total (ms): 18878.1750

sunspider-1.0/3d-cube.js (0.0338 seconds)
sunspider-1.0/3d-morph.js (0.0215 seconds)
sunspider-1.0/3d-raytrace.js (0.0174 seconds)
sunspider-1.0/access-binary-trees.js (0.0129 seconds)
sunspider-1.0/access-fannkuch.js (0.0491 seconds)
sunspider-1.0/access-nbody.js (0.0114 seconds)
sunspider-1.0/access-nsieve.js (0.0189 seconds)
sunspider-1.0/bitops-3bit-bits-in-byte.js (0.0071 seconds)
sunspider-1.0/bitops-bits-in-byte.js (0.0152 seconds)
sunspider-1.0/bitops-bitwise-and.js (0.0257 seconds)
sunspider-1.0/bitops-nsieve-bits.js (0.0137 seconds)
sunspider-1.0/controlflow-recursive.js (0.0064 seconds)
sunspider-1.0/crypto-aes.js (0.0220 seconds)
sunspider-1.0/crypto-md5.js (0.0065 seconds)
sunspider-1.0/crypto-sha1.js (0.0064 seconds)
sunspider-1.0/date-format-tofte.js (0.0383 seconds)
sunspider-1.0/date-format-xparb.js (0.0176 seconds)
sunspider-1.0/math-cordic.js (0.0230 seconds)
sunspider-1.0/math-partial-sums.js (0.0120 seconds)
sunspider-1.0/math-spectral-norm.js (0.0091 seconds)
sunspider-1.0/regexp-dna.js (0.1049 seconds)
sunspider-1.0/string-base64.js (0.0188 seconds)
sunspider-1.0/string-fasta.js (0.0515 seconds)
sunspider-1.0/string-tagcloud.js (0.0528 seconds)
sunspider-1.0/string-unpack-code.js (0.0954 seconds)
sunspider-1.0/string-validate-input.js (0.0748 seconds)

Total (ms): 766.0570

Richards : 1253
DeltaBlue : 1230
Crypto : 1234
RayTrace : 1324
EarleyBoyer : 2385
RegExp : 275
Splay : 2477
SplayLatency : 9185
NavierStokes : 2195
PdfJS : 4301
Mandreel : 1041
MandreelLatency : 7071
Gameboy : 9104
CodeLoad : 17352
Box2D : 5384
zlib : ReferenceError: print is not defined
Typescript : 18187

Score (version 9): 2988


macOS after

kraken-1.0/ai-astar-data.js (0.0517 seconds)
kraken-1.0/ai-astar.js (1.4119 seconds)
kraken-1.0/audio-beat-detection-data.js (0.0052 seconds)
kraken-1.0/audio-beat-detection.js (1.1600 seconds)
kraken-1.0/audio-dft-data.js (0.0043 seconds)
kraken-1.0/audio-dft.js (0.7544 seconds)
kraken-1.0/audio-fft-data.js (0.0040 seconds)
kraken-1.0/audio-fft.js (1.0513 seconds)
kraken-1.0/audio-oscillator-data.js (0.0042 seconds)
kraken-1.0/audio-oscillator.js (0.8084 seconds)
kraken-1.0/imaging-gaussian-blur-data.js (0.0895 seconds)
kraken-1.0/imaging-gaussian-blur.js (8.4772 seconds)
kraken-1.0/imaging-darkroom-data.js (0.0908 seconds)
kraken-1.0/imaging-darkroom.js (1.0795 seconds)
kraken-1.0/imaging-desaturate-data.js (0.0912 seconds)
kraken-1.0/imaging-desaturate.js (1.9241 seconds)
kraken-1.0/json-parse-financial-data.js (0.0005 seconds)
kraken-1.0/json-parse-financial.js (0.0744 seconds)
kraken-1.0/json-stringify-tinderbox-data.js (0.0132 seconds)
kraken-1.0/json-stringify-tinderbox.js (0.0598 seconds)
kraken-1.0/stanford-crypto-aes-data.js (0.0130 seconds)
kraken-1.0/stanford-crypto-aes.js (0.3396 seconds)
kraken-1.0/stanford-crypto-ccm-data.js (0.0142 seconds)
kraken-1.0/stanford-crypto-ccm.js (0.2588 seconds)
kraken-1.0/stanford-crypto-pbkdf2-data.js (0.0020 seconds)
kraken-1.0/stanford-crypto-pbkdf2.js (0.7788 seconds)
kraken-1.0/stanford-crypto-sha256-iterative-data.js (0.0017 seconds)
kraken-1.0/stanford-crypto-sha256-iterative.js (0.2386 seconds)

Total (ms): 18802.1750

kraken-1.1/ai-astar-data.js (0.0363 seconds)
kraken-1.1/ai-astar.js (1.4230 seconds)
kraken-1.1/audio-beat-detection-data.js (0.0054 seconds)
kraken-1.1/audio-beat-detection.js (1.1241 seconds)
kraken-1.1/audio-dft-data.js (0.0041 seconds)
kraken-1.1/audio-dft.js (0.7393 seconds)
kraken-1.1/audio-fft-data.js (0.0040 seconds)
kraken-1.1/audio-fft.js (0.9918 seconds)
kraken-1.1/audio-oscillator-data.js (0.0040 seconds)
kraken-1.1/audio-oscillator.js (0.8018 seconds)
kraken-1.1/imaging-gaussian-blur-data.js (0.0895 seconds)
kraken-1.1/imaging-gaussian-blur.js (8.4916 seconds)
kraken-1.1/imaging-darkroom-data.js (0.0896 seconds)
kraken-1.1/imaging-darkroom.js (1.0723 seconds)
kraken-1.1/imaging-desaturate-data.js (0.0915 seconds)
kraken-1.1/imaging-desaturate.js (1.9257 seconds)
kraken-1.1/json-parse-financial-data.js (0.0003 seconds)
kraken-1.1/json-parse-financial.js (0.0742 seconds)
kraken-1.1/json-stringify-tinderbox-data.js (0.0132 seconds)
kraken-1.1/json-stringify-tinderbox.js (0.0611 seconds)
kraken-1.1/stanford-crypto-aes-data.js (0.0133 seconds)
kraken-1.1/stanford-crypto-aes.js (0.3414 seconds)
kraken-1.1/stanford-crypto-ccm-data.js (0.0141 seconds)
kraken-1.1/stanford-crypto-ccm.js (0.2551 seconds)
kraken-1.1/stanford-crypto-pbkdf2-data.js (0.0020 seconds)
kraken-1.1/stanford-crypto-pbkdf2.js (0.7774 seconds)
kraken-1.1/stanford-crypto-sha256-iterative-data.js (0.0016 seconds)
kraken-1.1/stanford-crypto-sha256-iterative.js (0.2362 seconds)

Total (ms): 18683.8060

sunspider-1.0/3d-cube.js (0.0407 seconds)
sunspider-1.0/3d-morph.js (0.0228 seconds)
sunspider-1.0/3d-raytrace.js (0.0178 seconds)
sunspider-1.0/access-binary-trees.js (0.0128 seconds)
sunspider-1.0/access-fannkuch.js (0.0479 seconds)
sunspider-1.0/access-nbody.js (0.0115 seconds)
sunspider-1.0/access-nsieve.js (0.0190 seconds)
sunspider-1.0/bitops-3bit-bits-in-byte.js (0.0067 seconds)
sunspider-1.0/bitops-bits-in-byte.js (0.0150 seconds)
sunspider-1.0/bitops-bitwise-and.js (0.0251 seconds)
sunspider-1.0/bitops-nsieve-bits.js (0.0131 seconds)
sunspider-1.0/controlflow-recursive.js (0.0062 seconds)
sunspider-1.0/crypto-aes.js (0.0219 seconds)
sunspider-1.0/crypto-md5.js (0.0062 seconds)
sunspider-1.0/crypto-sha1.js (0.0060 seconds)
sunspider-1.0/date-format-tofte.js (0.0382 seconds)
sunspider-1.0/date-format-xparb.js (0.0176 seconds)
sunspider-1.0/math-cordic.js (0.0227 seconds)
sunspider-1.0/math-partial-sums.js (0.0119 seconds)
sunspider-1.0/math-spectral-norm.js (0.0091 seconds)
sunspider-1.0/regexp-dna.js (0.1059 seconds)
sunspider-1.0/string-base64.js (0.0191 seconds)
sunspider-1.0/string-fasta.js (0.0515 seconds)
sunspider-1.0/string-tagcloud.js (0.0551 seconds)
sunspider-1.0/string-unpack-code.js (0.0958 seconds)
sunspider-1.0/string-validate-input.js (0.0774 seconds)

Total (ms): 777.0140

Richards : 1255
DeltaBlue : 1166
Crypto : 1241
RayTrace : 1308
EarleyBoyer : 2382
RegExp : 274
Splay : 2494
SplayLatency : 9338
NavierStokes : 2181
PdfJS : 4312
Mandreel : 1050
MandreelLatency : 7051
Gameboy : 8866
CodeLoad : 16892
Box2D : 5422
zlib : ReferenceError: print is not defined
Typescript : 18304

Score (version 9): 2973

Linux before

kraken-1.0/ai-astar-data.js (0.0453 seconds)
kraken-1.0/ai-astar.js (3.4035 seconds)
kraken-1.0/audio-beat-detection-data.js (0.0082 seconds)
kraken-1.0/audio-beat-detection.js (2.0010 seconds)
kraken-1.0/audio-dft-data.js (0.0084 seconds)
kraken-1.0/audio-dft.js (1.8161 seconds)
kraken-1.0/audio-fft-data.js (0.0075 seconds)
kraken-1.0/audio-fft.js (1.8986 seconds)
kraken-1.0/audio-oscillator-data.js (0.0075 seconds)
kraken-1.0/audio-oscillator.js (1.6619 seconds)
kraken-1.0/imaging-gaussian-blur-data.js (0.1260 seconds)
kraken-1.0/imaging-gaussian-blur.js (15.9569 seconds)
kraken-1.0/imaging-darkroom-data.js (0.1208 seconds)
kraken-1.0/imaging-darkroom.js (1.9736 seconds)
kraken-1.0/imaging-desaturate-data.js (0.1177 seconds)
kraken-1.0/imaging-desaturate.js (3.5737 seconds)
kraken-1.0/json-parse-financial-data.js (0.0005 seconds)
kraken-1.0/json-parse-financial.js (0.1080 seconds)
kraken-1.0/json-stringify-tinderbox-data.js (0.0176 seconds)
kraken-1.0/json-stringify-tinderbox.js (0.0876 seconds)
kraken-1.0/stanford-crypto-aes-data.js (0.0189 seconds)
kraken-1.0/stanford-crypto-aes.js (0.9771 seconds)
kraken-1.0/stanford-crypto-ccm-data.js (0.0191 seconds)
kraken-1.0/stanford-crypto-ccm.js (0.6871 seconds)
kraken-1.0/stanford-crypto-pbkdf2-data.js (0.0028 seconds)
kraken-1.0/stanford-crypto-pbkdf2.js (2.1654 seconds)
kraken-1.0/stanford-crypto-sha256-iterative-data.js (0.0022 seconds)
kraken-1.0/stanford-crypto-sha256-iterative.js (0.6629 seconds)

Total (ms): 37476.0620

kraken-1.1/ai-astar-data.js (0.0450 seconds)
kraken-1.1/ai-astar.js (3.3833 seconds)
kraken-1.1/audio-beat-detection-data.js (0.0082 seconds)
kraken-1.1/audio-beat-detection.js (1.9347 seconds)
kraken-1.1/audio-dft-data.js (0.0085 seconds)
kraken-1.1/audio-dft.js (1.8085 seconds)
kraken-1.1/audio-fft-data.js (0.0075 seconds)
kraken-1.1/audio-fft.js (1.8428 seconds)
kraken-1.1/audio-oscillator-data.js (0.0075 seconds)
kraken-1.1/audio-oscillator.js (1.6660 seconds)
kraken-1.1/imaging-gaussian-blur-data.js (0.1253 seconds)
kraken-1.1/imaging-gaussian-blur.js (16.0328 seconds)
kraken-1.1/imaging-darkroom-data.js (0.1211 seconds)
kraken-1.1/imaging-darkroom.js (1.9935 seconds)
kraken-1.1/imaging-desaturate-data.js (0.1175 seconds)
kraken-1.1/imaging-desaturate.js (3.8197 seconds)
kraken-1.1/json-parse-financial-data.js (0.0005 seconds)
kraken-1.1/json-parse-financial.js (0.1067 seconds)
kraken-1.1/json-stringify-tinderbox-data.js (0.0176 seconds)
kraken-1.1/json-stringify-tinderbox.js (0.0871 seconds)
kraken-1.1/stanford-crypto-aes-data.js (0.0189 seconds)
kraken-1.1/stanford-crypto-aes.js (0.9816 seconds)
kraken-1.1/stanford-crypto-ccm-data.js (0.0191 seconds)
kraken-1.1/stanford-crypto-ccm.js (0.6846 seconds)
kraken-1.1/stanford-crypto-pbkdf2-data.js (0.0029 seconds)
kraken-1.1/stanford-crypto-pbkdf2.js (2.1630 seconds)
kraken-1.1/stanford-crypto-sha256-iterative-data.js (0.0022 seconds)
kraken-1.1/stanford-crypto-sha256-iterative.js (0.6628 seconds)

Total (ms): 37668.7920

sunspider-1.0/3d-cube.js (0.0461 seconds)
sunspider-1.0/3d-morph.js (0.0440 seconds)
sunspider-1.0/3d-raytrace.js (0.0314 seconds)
sunspider-1.0/access-binary-trees.js (0.0190 seconds)
sunspider-1.0/access-fannkuch.js (0.0988 seconds)
sunspider-1.0/access-nbody.js (0.0233 seconds)
sunspider-1.0/access-nsieve.js (0.0394 seconds)
sunspider-1.0/bitops-3bit-bits-in-byte.js (0.0327 seconds)
sunspider-1.0/bitops-bits-in-byte.js (0.0481 seconds)
sunspider-1.0/bitops-bitwise-and.js (0.0534 seconds)
sunspider-1.0/bitops-nsieve-bits.js (0.0560 seconds)
sunspider-1.0/controlflow-recursive.js (0.0167 seconds)
sunspider-1.0/crypto-aes.js (0.0387 seconds)
sunspider-1.0/crypto-md5.js (0.0185 seconds)
sunspider-1.0/crypto-sha1.js (0.0190 seconds)
sunspider-1.0/date-format-tofte.js (0.0499 seconds)
sunspider-1.0/date-format-xparb.js (0.0210 seconds)
sunspider-1.0/math-cordic.js (0.0463 seconds)
sunspider-1.0/math-partial-sums.js (0.0253 seconds)
sunspider-1.0/math-spectral-norm.js (0.0214 seconds)
sunspider-1.0/regexp-dna.js (0.1253 seconds)
sunspider-1.0/string-base64.js (0.0312 seconds)
sunspider-1.0/string-fasta.js (0.0741 seconds)
sunspider-1.0/string-tagcloud.js (0.0706 seconds)
sunspider-1.0/string-unpack-code.js (0.1337 seconds)
sunspider-1.0/string-validate-input.js (0.1145 seconds)

Total (ms): 1298.2530

Richards : 707
DeltaBlue : 741
Crypto : 446
RayTrace : 911
EarleyBoyer : 1184
RegExp : 244
Splay : 1674
SplayLatency : 5053
NavierStokes : 1061
PdfJS : 2351
Mandreel : 378
MandreelLatency : 2917
Gameboy : 3877
CodeLoad : 13242
Box2D : 2650
zlib : ReferenceError: print is not defined
Typescript : 8840

Score (version 9): 1593


Linux after

kraken-1.0/ai-astar-data.js (0.0455 seconds)
kraken-1.0/ai-astar.js (3.3884 seconds)
kraken-1.0/audio-beat-detection-data.js (0.0083 seconds)
kraken-1.0/audio-beat-detection.js (1.9994 seconds)
kraken-1.0/audio-dft-data.js (0.0085 seconds)
kraken-1.0/audio-dft.js (1.8215 seconds)
kraken-1.0/audio-fft-data.js (0.0076 seconds)
kraken-1.0/audio-fft.js (1.9149 seconds)
kraken-1.0/audio-oscillator-data.js (0.0076 seconds)
kraken-1.0/audio-oscillator.js (1.6853 seconds)
kraken-1.0/imaging-gaussian-blur-data.js (0.1261 seconds)
kraken-1.0/imaging-gaussian-blur.js (15.8976 seconds)
kraken-1.0/imaging-darkroom-data.js (0.1226 seconds)
kraken-1.0/imaging-darkroom.js (1.9713 seconds)
kraken-1.0/imaging-desaturate-data.js (0.1189 seconds)
kraken-1.0/imaging-desaturate.js (3.6046 seconds)
kraken-1.0/json-parse-financial-data.js (0.0005 seconds)
kraken-1.0/json-parse-financial.js (0.1067 seconds)
kraken-1.0/json-stringify-tinderbox-data.js (0.0178 seconds)
kraken-1.0/json-stringify-tinderbox.js (0.0910 seconds)
kraken-1.0/stanford-crypto-aes-data.js (0.0194 seconds)
kraken-1.0/stanford-crypto-aes.js (0.9704 seconds)
kraken-1.0/stanford-crypto-ccm-data.js (0.0192 seconds)
kraken-1.0/stanford-crypto-ccm.js (0.6874 seconds)
kraken-1.0/stanford-crypto-pbkdf2-data.js (0.0028 seconds)
kraken-1.0/stanford-crypto-pbkdf2.js (2.1759 seconds)
kraken-1.0/stanford-crypto-sha256-iterative-data.js (0.0022 seconds)
kraken-1.0/stanford-crypto-sha256-iterative.js (0.6657 seconds)

Total (ms): 37487.0360

kraken-1.1/ai-astar-data.js (0.0475 seconds)
kraken-1.1/ai-astar.js (3.3670 seconds)
kraken-1.1/audio-beat-detection-data.js (0.0083 seconds)
kraken-1.1/audio-beat-detection.js (1.9800 seconds)
kraken-1.1/audio-dft-data.js (0.0087 seconds)
kraken-1.1/audio-dft.js (1.8232 seconds)
kraken-1.1/audio-fft-data.js (0.0076 seconds)
kraken-1.1/audio-fft.js (1.8520 seconds)
kraken-1.1/audio-oscillator-data.js (0.0076 seconds)
kraken-1.1/audio-oscillator.js (1.6612 seconds)
kraken-1.1/imaging-gaussian-blur-data.js (0.1263 seconds)
kraken-1.1/imaging-gaussian-blur.js (16.1703 seconds)
kraken-1.1/imaging-darkroom-data.js (0.1225 seconds)
kraken-1.1/imaging-darkroom.js (2.0144 seconds)
kraken-1.1/imaging-desaturate-data.js (0.1186 seconds)
kraken-1.1/imaging-desaturate.js (3.6067 seconds)
kraken-1.1/json-parse-financial-data.js (0.0005 seconds)
kraken-1.1/json-parse-financial.js (0.1072 seconds)
kraken-1.1/json-stringify-tinderbox-data.js (0.0179 seconds)
kraken-1.1/json-stringify-tinderbox.js (0.0912 seconds)
kraken-1.1/stanford-crypto-aes-data.js (0.0195 seconds)
kraken-1.1/stanford-crypto-aes.js (0.9667 seconds)
kraken-1.1/stanford-crypto-ccm-data.js (0.0193 seconds)
kraken-1.1/stanford-crypto-ccm.js (0.6849 seconds)
kraken-1.1/stanford-crypto-pbkdf2-data.js (0.0028 seconds)
kraken-1.1/stanford-crypto-pbkdf2.js (2.1731 seconds)
kraken-1.1/stanford-crypto-sha256-iterative-data.js (0.0022 seconds)
kraken-1.1/stanford-crypto-sha256-iterative.js (0.6675 seconds)

Total (ms): 37674.6470

sunspider-1.0/3d-cube.js (0.0415 seconds)
sunspider-1.0/3d-morph.js (0.0426 seconds)
sunspider-1.0/3d-raytrace.js (0.0313 seconds)
sunspider-1.0/access-binary-trees.js (0.0189 seconds)
sunspider-1.0/access-fannkuch.js (0.1004 seconds)
sunspider-1.0/access-nbody.js (0.0237 seconds)
sunspider-1.0/access-nsieve.js (0.0396 seconds)
sunspider-1.0/bitops-3bit-bits-in-byte.js (0.0327 seconds)
sunspider-1.0/bitops-bits-in-byte.js (0.0480 seconds)
sunspider-1.0/bitops-bitwise-and.js (0.0540 seconds)
sunspider-1.0/bitops-nsieve-bits.js (0.0563 seconds)
sunspider-1.0/controlflow-recursive.js (0.0167 seconds)
sunspider-1.0/crypto-aes.js (0.0384 seconds)
sunspider-1.0/crypto-md5.js (0.0185 seconds)
sunspider-1.0/crypto-sha1.js (0.0189 seconds)
sunspider-1.0/date-format-tofte.js (0.0508 seconds)
sunspider-1.0/date-format-xparb.js (0.0208 seconds)
sunspider-1.0/math-cordic.js (0.0459 seconds)
sunspider-1.0/math-partial-sums.js (0.0257 seconds)
sunspider-1.0/math-spectral-norm.js (0.0214 seconds)
sunspider-1.0/regexp-dna.js (0.1258 seconds)
sunspider-1.0/string-base64.js (0.0312 seconds)
sunspider-1.0/string-fasta.js (0.0722 seconds)
sunspider-1.0/string-tagcloud.js (0.0706 seconds)
sunspider-1.0/string-unpack-code.js (0.1335 seconds)
sunspider-1.0/string-validate-input.js (0.1167 seconds)

Total (ms): 1296.0790

Richards : 722
DeltaBlue : 739
Crypto : 445
RayTrace : 896
EarleyBoyer : 1206
RegExp : 242
Splay : 1662
SplayLatency : 4979
NavierStokes : 1044
PdfJS : 2366
Mandreel : 378
MandreelLatency : 2917
Gameboy : 3845
CodeLoad : 13152
Box2D : 2604
zlib : ReferenceError: print is not defined
Typescript : 8728

Score (version 9): 1586

Not sure this is worth it. Any idea how we could better measure the impact, @bnoordhuis?

Contributor Author

saghul commented Sep 10, 2024

Is this intended as an optimization for page-size-or-bigger allocations, because of the expectation that calloc will mmap the memory?

Kinda, yeah: since at least one such allocation is a potential hot path (object shape allocation), maybe this would help. My results (see my previous comment) don't seem conclusive, though!

gengjiawen commented Sep 11, 2024

Looks promising.

| Benchmark (higher scores are better) | QuickJS | QuickJS calloc (this PR) | V8 --jitless | V8 | JSC |
|---|---:|---:|---:|---:|---:|
| Richards | 458 | 451 | 924 | 23003 | 24200 |
| DeltaBlue | 634 | 697 | 1433 | 99666 | 47069 |
| Crypto | 651 | 733 | 1085 | 43373 | 46701 |
| RayTrace | 670 | 1041 | 3597 | 89317 | 75627 |
| EarleyBoyer | 1230 | 1531 | 5326 | 68625 | 62693 |
| RegExp | 188 | 227 | 2761 | 8381 | 15396 |
| Splay | 1242 | 1664 | 6316 | 32213 | 29100 |
| NavierStokes | 1139 | 1228 | 1854 | 39544 | 35580 |
| Score | 672 | 801 | 2337 | 39964 | 37630 |

Contributor Author

saghul commented Sep 11, 2024

Thank you!

bnoordhuis
Contributor

at least one such allocation is a potential hot path (object shape allocation) [..] don't seem conclusive though

The way libcs implement calloc can deceive short-lived benchmarks. Canonical implementation:

void *calloc(size_t sz, size_t n) {
  // overflow checking elided; isreused() and ismmapped() stand for the allocator's own bookkeeping
  void *p = malloc(sz * n);
  if (p && (isreused(p) || !ismmapped(p))) memset(p, 0, sz * n);
  return p;
}

That is, libcs know when the allocation comes from a fresh all-zeroes page and omit the memset. Short-lived benchmarks are going to hit that condition often, making calloc look faster than it really is vs. malloc+memset.

Short-lived programs are a good use case for quickjs though, so it's likely still a worthwhile optimization. Bursty programs (allocate a lot, release a lot, in cycles) probably stand to benefit, too, if libc unmaps the memory between cycles.

Contributor Author

saghul commented Sep 11, 2024

Thanks for the explainer! I'll address the suggestions then!

saghul force-pushed the calloc branch 2 times, most recently from 0f261df to 0437e0b September 11, 2024 11:29
saghul marked this pull request as ready for review September 11, 2024 11:29
Contributor Author

saghul commented Sep 11, 2024

@bnoordhuis Updated!

Collaborator

chqrlie commented Sep 11, 2024

This is good investigative work. We should document the implementation of js_trace_calloc and friends with precise insight and relevant information, explaining the choices made and the measured advantages.

Contributor Author

saghul commented Sep 11, 2024

I'll add a link to this issue as a comment.

saghul merged commit fb70e09 into master Sep 11, 2024
50 checks passed
saghul deleted the calloc branch September 11, 2024 20:09

4 participants