Support conversion of hexidecimal values #173

marcpaterno · 2025-03-16T20:32:00Z

This small addition allows as.integer64.character to handle the hexadecimal printed representation of large integers, in the style of as.integer in base R.

MichaelChirico · 2025-03-16T20:45:46Z

thanks for the PR! the idea seems great!

would you mind:

- filing an issue that motivates the change, including the current failure mode / (optional) known workarounds
- including a test that actually touches the 64-bit range, i.e. that wouldn't fit in 32-bit integers (or I guess that's covered already? I never remember what the exact boundary is... maybe make it at least 2^33 for clarity)
- including a few more tests, e.g. of negative #s, hex #s that don't fit in 64 bits, any edge cases you can think of

MichaelChirico · 2025-03-16T23:26:20Z

WDYT about just letting strtoll() handle this by setting base=0 instead?

https://en.cppreference.com/w/c/string/byte/strtol

src/integer64.c

MichaelChirico · 2025-03-16T23:50:29Z

I think the issue with just doing base=0 is that it will not be back-compatible (cf. as.integer64('011'), which will be 11LL on main but 9LL with base=0).

We can easily imagine there's ppl working with long-string identifiers relying on the old behavior.
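
To make the compatibility concern concrete, here is a minimal C sketch of how strtoll treats a leading zero under base 10 versus base 0. The helper names parse_base10 and parse_auto_base are hypothetical, used only for illustration; they are not part of bit64.

```c
#include <stdlib.h>

/* Hypothetical helper: fixed base 10, so a leading zero is just a digit. */
long long parse_base10(const char *str) {
  return strtoll(str, NULL, 10);
}

/* Hypothetical helper: base 0 lets strtoll infer the base from the prefix
 * ("0x"/"0X" means hex, a leading "0" means octal, otherwise decimal). */
long long parse_auto_base(const char *str) {
  return strtoll(str, NULL, 0);
}
```

With these helpers, "011" parses as 11 under base 10 but as 9 (octal) under base 0, while "0x11" parses as 0 under base 10 (parsing stops at the x) and as 17 under base 0.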

That said, I think we can do this?

-ret[i] = strtoll(str, &endpointer, 10);
+int negative=str[0] == '-';
+if (!strncmp(str+negative, "0x", 2) || !strncmp(str+negative,"0X", 2)) {
+  ret[i] = strtoll(str, &endpointer, 16);
+} else {
+  ret[i] = strtoll(str, &endpointer, 10);
+}
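
As a rough, self-contained sketch of the prefix check in the diff above, the logic could be wrapped in a helper like the following. The function name parse_dec_or_hex is hypothetical; the actual change lives inline in src/integer64.c and may differ.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: use base 16 only when an explicit 0x/0X prefix
 * follows an optional minus sign; otherwise keep the base-10 parse, so
 * strings such as "011" still read as decimal 11. */
long long parse_dec_or_hex(const char *str, char **endpointer) {
  int negative = (str[0] == '-');
  const char *digits = str + negative;  /* skip the sign, if any */
  if (!strncmp(digits, "0x", 2) || !strncmp(digits, "0X", 2)) {
    /* strtoll accepts the optional sign and 0x prefix itself in base 16 */
    return strtoll(str, endpointer, 16);
  }
  return strtoll(str, endpointer, 10);
}
```

Passing the whole string (sign included) to strtoll keeps the sign handling in the C library rather than reimplementing it.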

marcpaterno · 2025-03-17T13:11:42Z

> I think the issue with just doing base=0 is that it will not be back-compatible (cf. as.integer64('011'), which will be 11LL on main but 9LL with base=0).
>
> We can easily imagine there's ppl working with long-string identifiers relying on the old behavior.
>
> That said, I think we can do this?
>
> -ret[i] = strtoll(str, &endpointer, 10);
> +int negative=str[0] == '-';
> +if (!strncmp(str+negative, "0x", 2) || !strncmp(str+negative,"0X", 2)) {
> +  ret[i] = strtoll(str, &endpointer, 16);
> +} else {
> +  ret[i] = strtoll(str, &endpointer, 10);
> +}

I have tested this code and it yields different behavior in one edge case. I thought I had picked the appropriate behavior, but you can of course correct me.

In the case of handling the string "-0x8000000000000000", my original code yielded -9223372036854775807, the first value in the result of lim.integer64(). Modified as you propose above, the result is "NA".

I will add the new testing (that includes the handling of input "0") to the PR, and await your decision on the desired behavior for handling "-0x8000000000000000".

MichaelChirico · 2025-03-17T15:49:43Z

Let's pick whatever strtoll("-0x8000000000000000", &endpointer, 16) does. I think it's best to be consistent with, and as close as possible to, the well-tested underlying C library routine.

MichaelChirico · 2025-03-17T15:50:08Z

please add a NEWS entry btw!

tests/testthat/test-integer64.R

marcpaterno · 2025-03-17T23:39:28Z

I do not think it is possible to match the behavior of strtoll with the current representation of integer64.
strtoll will read "-9223372036854775808" and represent it exactly. This value is 1 smaller than what lim.integer64() indicates can be represented by integer64. strtoll will read "-9223372036854775809" and yield the value -9223372036854775808.
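
The boundary behavior described here can be checked with a small C sketch. The helper parse_checked is hypothetical and assumes a 64-bit long long; it reports whether strtoll set errno to ERANGE alongside the parsed value.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse `str` in `base` and store the result in *out.
 * Returns 1 if strtoll flagged the input as out of range (ERANGE), else 0. */
int parse_checked(const char *str, int base, long long *out) {
  errno = 0;  /* strtoll only sets errno on failure, so clear it first */
  *out = strtoll(str, NULL, base);
  return errno == ERANGE;
}
```

Under that assumption, "-9223372036854775808" yields LLONG_MIN with no range error, "-9223372036854775809" is clamped to LLONG_MIN with ERANGE set, and "-0x8000000000000000" in base 16 yields LLONG_MIN with no range error. The returned value alone therefore cannot distinguish an exact LLONG_MIN from an underflow; only errno can.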

How would you like to proceed?

MichaelChirico · 2025-03-18T06:11:47Z

Oh, great point. That's because -922...808 is how NA_integer64_ is represented (just as NA_integer_ is represented as -214...648):

bit64/src/integer64.h

Line 28 in 62cd4ee

#define NA_INTEGER64 LLONG_MIN

In fact you've found what I think is kind of buggy behavior in as.integer64.character for decimal string input, too: #175.

Part of the issue is, there's no 128-bit double to cast into with full fidelity to work with these out-of-range values before casting back down to 64 bits; that's what's done in the 32-bit case by R:

https://github.com/r-devel/r-svn/blob/c8ab68478c5d2322956e19764c20c092916133cd/src/main/util.c#L2112
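
For illustration only, the "parse wide, range-check, then narrow" idea for the 32-bit case might look like the sketch below. This is not R's actual code: the helper name parse_int32_via_double and the NA_INT32 macro are hypothetical, and the sketch assumes a 32-bit int.

```c
#include <limits.h>
#include <stdlib.h>

#define NA_INT32 INT_MIN /* hypothetical stand-in for R's NA_integer_ */

/* Hypothetical helper: parse into a double, which represents every 32-bit
 * integer exactly, do the range check there, then narrow to int. */
int parse_int32_via_double(const char *str) {
  char *end;
  double x = strtod(str, &end);
  if (end == str || *end != '\0') return NA_INT32; /* not fully numeric */
  if (x != x) return NA_INT32;                     /* NaN */
  /* INT_MIN itself is the NA bit pattern, so it is excluded as well */
  if (x > INT_MAX || x <= INT_MIN) return NA_INT32;
  return (int) x;
}
```

The same trick does not carry over to 64 bits, because a double cannot hold every 64-bit integer exactly, which is the gap the comment above points at.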

Anyway, for now, I think we should aim to be consistent between hex & decimal strings, and with bitstrings, and return NA_integer64_ for -9223372036854775808:

# as.bitstring(lim.integer64())
as.integer64(structure("1000000000000000000000000000000000000000000000000000000000000000", class='bitstring'))
# integer64
# [1] <NA>

Support conversion of hexidecimal values

ebede66

marcpaterno and others added 2 commits March 16, 2025 17:30

Fix handling of negative values and add more tests

0f8413e

style tweaks

232fec3

MichaelChirico reviewed Mar 16, 2025

src/integer64.c

MichaelChirico reviewed Mar 16, 2025

src/integer64.c

marcpaterno added 2 commits March 17, 2025 08:16

Test small decimal negatives and zero

6019ed5

Merge branch 'main' into work

4da129d

# Conflicts: # tests/testthat/test-integer64.R

MichaelChirico reviewed Mar 17, 2025

tests/testthat/test-integer64.R

bad copy-paste

c247898

MichaelChirico reviewed Mar 17, 2025

tests/testthat/test-integer64.R

restore more diff, make another test cover <0

73847bb
