-
-
Notifications
You must be signed in to change notification settings - Fork 5.7k
faster String allocation
#19449
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
faster String allocation
#19449
Conversation
9b881ff to
72ffe2d
Compare
| d = s.data | ||
| i = length(d) | ||
| @inbounds while i > 0 && is_valid_continuation(d[i]) | ||
| p = pointer(s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a gcroot intrinsic with a semantic similar to ccall to make this usage gc safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that would be good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any idea what would be a good api? I'm mainly not sure about how to express the scope a gcroot applies to.....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now we should make the scope the whole enclosing function.
base/boot.jl
Outdated
| data::Array{UInt8,1} | ||
| # required to make String("foo") work (#15120): | ||
| String(d::Array{UInt8,1}) = new(d) | ||
| type String <: AbstractString |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not immutable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume so that the compiler would pass it correctly. (It'll be a isbits type otherwise)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct; the presence of the string data is not reflected in the layout of the type, so it would not be passed correctly. Of course that's only the beginning of the problems caused by a hack like this, requiring special cases e.g. in deepcopy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume you addressed the aforementioned "problems" including the special deepcopy implementation?
base/strings/basic.jl
Outdated
|
|
||
| convert(::Type{Vector{UInt8}}, s::AbstractString) = String(s).data | ||
| convert(::Type{Array{UInt8}}, s::AbstractString) = String(s).data | ||
| #convert(::Type{Vector{UInt8}}, s::AbstractString) = String(s).data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will there be some mechanism to get a vector of bytes equivalent to a copy of the string after this is done?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that functionality is still pretty useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can definitely still have that. It might even be possible for an array to share string data in-place.
yuyichao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this also needs a special case in the gc push_root before the pointerfree case. so that gc_setmark will see the right size.
|
Some interesting issues so far:
|
|
Next issue: this representation does not support |
133d41d to
d72ffe2
Compare
String allocationString allocation
d72ffe2 to
8ae0e85
Compare
|
This has passed CI and is ready for review. The last commit implements the following scheme for copy-avoidance when converting between strings and vectors:
I'm seeing >=30% speedups for #19141, and the spellcheck benchmark. We need to decide what to do about |
|
Is there a deprecation mechanism for |
base/LineEdit.jl
Outdated
| seek(buf, pos - length(r)) | ||
| end | ||
| splice!(buf.data, r + 1, ins.data) # position(), etc, are 0-indexed | ||
| splice!(buf.data, r + 1, convert(Vector{UInt8},ins)) # position(), etc, are 0-indexed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be more concise if we could just do Vector(ins).
base/boot.jl
Outdated
| function write(io::IO, x::String) | ||
| nb = sizeof(x.data) | ||
| unsafe_write(io, ccall(:jl_array_ptr, Ptr{UInt8}, (Any,), x.data), nb) | ||
| nb = x.len |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presumably sizeof(x) is defined and would be cleaner than accessing the .len field directly?
base/datafmt.jl
Outdated
| (!allow_comments || isascii(cchar))) | ||
| if T === String && all_ascii | ||
| return dlm_parse(dbuff.data, eol % UInt8, dlm % UInt8, qchar % UInt8, cchar % UInt8, | ||
| if all_ascii && !(D <: UInt8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
D != UInt8? Why this additional condition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know. The whole all_ascii business at the top of this function can probably be removed.
base/strings/util.jl
Outdated
|
|
||
| function ascii(s::String) | ||
| for (i, b) in enumerate(s.data) | ||
| for (i, b) in enumerate(bytes(s)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is bytes? I thought we were now using convert(Vector{UInt8}, s) or maybe Vector(s) or Vector{UInt8}(s)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's an iterator over bytes that I was trying out. I guess it's not really necessary.
base/util.jl
Outdated
| function crc32c end | ||
| crc32c(a::Array{UInt8}, crc::UInt32=0x00000000) = ccall(:jl_crc32c, UInt32, (UInt32, Ptr{UInt8}, Csize_t), crc, a, sizeof(a)) | ||
| crc32c(s::String, crc::UInt32=0x00000000) = crc32c(s.data, crc) | ||
| crc32c(s::String, crc::UInt32=0x00000000) = ccall(:jl_crc32c, UInt32, (UInt32, Ptr{UInt8}, Csize_t), crc, s, sizeof(s)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just use Union{Array{UInt8},String} for the first argument in a single method?
|
Needs NEWS and probably doc updates. |
src/array.c
Outdated
| tsz = JL_ARRAY_ALIGN(tsz, JL_CACHE_BYTE_ALIGNMENT); // align whole object | ||
| a = (jl_array_t*)jl_gc_alloc(ptls, tsz, atype); | ||
| JL_GC_PUSH1(&a); | ||
| data = jl_gc_alloc_buf(ptls, tot); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the how needs to be initialized to 0 first before calling into the GC again so that the marking code won't try to trace a dangling pointer.
src/julia_internal.h
Outdated
| { | ||
| return jl_gc_alloc(ptls, sz, (void*)jl_buff_tag); | ||
| void **buf = (void**)jl_gc_alloc(ptls, sz + sizeof(void*), (void*)jl_buff_tag); | ||
| return (void*)(buf + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't this mis-aligning the buf pointer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I suppose so. I guess we should only do this for elsz==1.
|
Should accessing the array be a method of convert, reinterpret, or transcode? There was some resistance to using convert between strings and arrays-of-integers during the last string refactoring. |
|
We've had a conversion from |
|
Ah, I see; using |
|
We could define a generic function |
81e4251 to
8ea9828
Compare
|
Does anybody understand the OSX failure here? |
8ea9828 to
30a7437
Compare
|
segfaulting with |
|
That's weird. I can't reproduce that on linux. |
30a7437 to
a72588a
Compare
|
Looks like it was a stack overflow loading the system image. I'll try a different approach to serializing the new String type. |
a72588a to
c01a2cf
Compare
|
OK, that didn't help. |
|
Should be good to merge once appveyor's happy, but since I think it's idle, let's see if anything recent made much difference: @nanosoldier |
|
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels |
|
Looking good. Away we go! |
| arraylist_push(&backref_list, v); | ||
| jl_deserialize_struct(s, v, 1); | ||
| ((jl_typemap_entry_t*)v)->next = (jl_typemap_entry_t*)jl_nothing; | ||
| *pn = v; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing gc_wb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the gc is off here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to create a gc stack but it is still necessary to use write barrier since there can still be old and young objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't a collection have to happen for an object to become old?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, so if you are sure that the parent is always allocated in the same "deserialization session" it should be enough with a comment about it. Looks like it should be fine here since *pn is pointing to either the stack or a v allocated in a previous iteration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
This replaces the
ArrayinsideStringwith a length field, plus directly asking the GC to allocate space for the string data in-line.This is very incomplete so far, and is only meant to demonstrate the basic idea.The following gist has a hacked-up working version that can be used within an existing build: https://gist.github.com/JeffBezanson/6b7f1785bb7f2509cbd5d4ff1380556dThis makes string allocation ~2x faster, use 1 object instead of 2, and use around half the space for short strings. It's not pretty, but I think we should go with something like this to fix the immediate performance problem. Also removing the
.datafield as soon as possible will make it easier to change the representation in the future if we come up with something nicer.