WIP: Clean up arraylist_t handling [ci skip][av skip] #11751
@JeffBezanson when you have a roundtuit (which might be never, I know you're very busy!), I'd love it if you could look over this change and let me know what you think... I'm not sure whether you, @carnaval, or someone else is the best expert on this part of the code...
Force-pushed from 9889e0f to 88b6135.
I have to say this seems like a lot of effort to avoid something that's not actually a problem.
Yes, I have to agree that this is a lot of code noise for no benefit. OK, you like your C type punning served differently, but at the end of the day it's the same, with some extra calls to …
Code noise? The problem with the current code is that the type punning means changes in the sizes of things could silently break things (which is why I started to look into it in the first place, because of the PR that just "cast" away legitimate warnings). There are a ton of articles on the evils of type punning, and, as I've shown here, it's not necessary at all. The current code has 3 issues that are considered bad programming practice in C...
The code is a lot more self-documenting, and doing this exercise turned up issues like the fact that

```c
// in case we didn't free it last time we got here (for example, if we threw an error)
jlgensym_to_flisp.len = 0;
```

So, @JeffBezanson, do you still feel that there is no benefit to this change?
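As a rough illustration of the alternative being argued for here (a sketch, not the PR's actual code), the list below declares its element type, so the compiler sees the real type of every value pushed instead of a cast to `void*`. A mismatched argument, such as a pointer where a `size_t` is expected, is diagnosed rather than hidden behind a cast. The names `size_list_t` and `size_list_*` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical typed growable list: the element type is part of the
   declaration, so no integer is ever stored in a void* slot. */
typedef struct {
    size_t len;
    size_t max;
    size_t *items;
} size_list_t;

static void size_list_init(size_list_t *l)
{
    l->len = 0;
    l->max = 0;
    l->items = NULL;
}

/* Returns 0 on success, -1 if the allocation failed. */
static int size_list_push(size_list_t *l, size_t v)
{
    if (l->len == l->max) {
        size_t newmax = l->max ? 2 * l->max : 8;
        size_t *p = realloc(l->items, newmax * sizeof *p);
        if (p == NULL)
            return -1;
        l->items = p;
        l->max = newmax;
    }
    l->items[l->len++] = v;
    return 0;
}

static size_t size_list_pop(size_list_t *l)
{
    assert(l->len > 0);
    return l->items[--l->len];
}

/* Release the heap buffer and leave the list empty and reusable. */
static void size_list_free(size_list_t *l)
{
    free(l->items);
    size_list_init(l);
}
```

The trade-off is one small list type per element type, instead of one `void*` list reused everywhere with casts at each call site.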
@StefanKarpinski Whether or not it was much effort for me is my problem, not yours. It actually was not difficult at all; I just did it while watching TV with my wife... to me it is relaxing. It was also a good way to learn some of the GC code and to demonstrate that type punning between integers and pointers is not at all necessary, and I turned up what look like some serious memory leaks to boot.
OK, if you can get this to stop crashing, we can consider it. As usual, I would recommend making these changes in small, easily digestible pieces. If you think your string changes were given close scrutiny, large, pervasive changes to the C core of the language will make that look like a walk in the park.
You raise some good points, but I think they can be addressed much more simply and surgically.
Yes, @vtjnash confirms that's not strictly necessary at the moment, and we could just remove it.
Yes, we should add an assert and a comment.
I think …
But are there really memory leaks? If so, we can just fix the leak where it happens. However, this point is largely unrelated to the rest of this discussion. It would be good to address all of these items, but it seems it could be done with only a tiny diff. I don't know why this branch is crashing, and I'm sure you could fix it in 10 minutes, but that does illustrate the problem with changes like this: with more code churn, you run more risk of introducing problems.
I'm not at all worried about the trivial amount of extra code. I'd much rather have simple idiomatic …
I think there are memory leaks any place where 1) more than AL_N_INLINE entries have been pushed, …
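To make the first condition concrete, here is a simplified, hypothetical re-implementation of the inline-buffer idea behind `arraylist_t`; the names and the small `SMALL_N_INLINE` value are assumptions, not Julia's code. Pushing past the inline capacity moves the items to the heap, so a list that has spilled must be freed, or that block is lost when the struct is re-initialized or goes out of scope. The init function sets every field because it cannot assume the memory it is given has been zeroed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SMALL_N_INLINE 4 /* hypothetical; stands in for AL_N_INLINE */

typedef struct {
    size_t len;
    size_t max;
    void **items;                 /* points at _space or at heap memory */
    void *_space[SMALL_N_INLINE]; /* inline storage for small lists */
} small_list_t;

/* Initialize every field: the caller's memory may not be zeroed. */
static void small_list_init(small_list_t *a)
{
    a->len = 0;
    a->max = SMALL_N_INLINE;
    a->items = a->_space;
}

static int small_list_on_heap(const small_list_t *a)
{
    return a->items != a->_space;
}

/* Returns 0 on success, -1 if the allocation failed. */
static int small_list_push(small_list_t *a, void *elt)
{
    if (a->len == a->max) {
        size_t newmax = 2 * a->max;
        void **p;
        if (!small_list_on_heap(a)) {
            /* First overflow: copy from the inline buffer to the heap. */
            p = malloc(newmax * sizeof *p);
            if (p == NULL)
                return -1;
            memcpy(p, a->_space, a->len * sizeof *p);
        } else {
            p = realloc(a->items, newmax * sizeof *p);
            if (p == NULL)
                return -1;
        }
        a->items = p;
        a->max = newmax;
    }
    a->items[a->len++] = elt;
    return 0;
}

/* Required once the list has spilled to the heap; always safe to call.
   Re-initializing a spilled list without this leaks the heap block. */
static void small_list_free(small_list_t *a)
{
    if (small_list_on_heap(a))
        free(a->items);
    small_list_init(a);
}
```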
About the type punning thing, I don't want to argue about the definition, but even if arraylist_t knows the element size, it is still not type-safe (as a polymorphic implementation would be), because you can still push the wrong struct into the wrong arraylist. It is more resilient to changes in the element layout, as you said, but in the int example, wouldn't a typedef do the same thing? I.e., if you have, schematically:

```c
int a;
push((void*)a);
...
int b = (int)pop();
```

which is, as you said, unsafe if you change the type of … About the finalizer thing, I think it's quite clear with the even/odd layout, but I don't have any strong opinion. About cosmetics now, you might want to keep the _t suffix convention for struct typedefs. I also don't think _str is commonly read as the abbreviation for struct, but rather for string, so it might be a good idea to change it.
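A small sketch of the even/odd layout mentioned above next to a struct-per-pair alternative, assuming each entry is an (object, finalizer) pair of data pointers. `fin_pair_t` and the helper functions are hypothetical. Both walks compute the same thing; the struct version puts the pairing in the type instead of in the index arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Even/odd layout in a flat void* array: slot 2k holds an object,
   slot 2k+1 the finalizer associated with it. */
static void *flat_obj(void **slots, size_t k) { return slots[2 * k]; }
static void *flat_fin(void **slots, size_t k) { return slots[2 * k + 1]; }

/* Count entries that have a finalizer; n_slots must be even. */
static size_t count_with_fin(void **slots, size_t n_slots)
{
    assert(n_slots % 2 == 0);
    size_t n = 0;
    for (size_t i = 0; i < n_slots; i += 2)
        if (slots[i + 1] != NULL)
            n++;
    return n;
}

/* Alternative: one struct per pair, so the pairing is in the type. */
typedef struct {
    void *obj;
    void *fin;
} fin_pair_t;

static size_t count_with_fin_pairs(const fin_pair_t *pairs, size_t n_pairs)
{
    size_t n = 0;
    for (size_t k = 0; k < n_pairs; k++)
        if (pairs[k].fin != NULL)
            n++;
    return n;
}
```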
This blog is referring to a form of type punning that is problematic for TBAA, and it only matters if you are going to dereference the pointer (which wouldn't be valid here, because the pointer holds an integer value instead).
As my comment there explains, this place might be reached only if we threw an error. In general, the arraylist_new constructor cannot assume that the memory it was given has been zeroed, and thus it cannot do anything with the …
Accessing a pointer out of a static struct should be "free" in any sane C compiler.
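For comparison, here is a minimal sketch of keeping integers in pointer slots while addressing the size concern, along the lines of the typedef suggestion above. The stored integer type lives in one typedef, a C11 `_Static_assert` turns a size mismatch into a compile error, and the value is only ever converted back, never dereferenced, so no object is read through the wrong type. `slot_int_t`, `int_to_slot` and `slot_to_int` are hypothetical names. Integer-to-pointer conversion is implementation-defined in ISO C, and `uintptr_t` is optional, but both behave as expected on mainstream platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: the one place that names the integer type kept in a
   void* slot. Changing it here changes every call site consistently. */
typedef size_t slot_int_t;

/* Compile-time guard: fails the build if the type stops fitting. */
_Static_assert(sizeof(slot_int_t) <= sizeof(uintptr_t),
               "slot_int_t must fit in a pointer-sized integer");

/* Round-trip through uintptr_t; the resulting pointer is never
   dereferenced, only converted back with slot_to_int. */
static inline void *int_to_slot(slot_int_t v)
{
    return (void *)(uintptr_t)v;
}

static inline slot_int_t slot_to_int(void *p)
{
    return (slot_int_t)(uintptr_t)p;
}
```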
Since C largely ignores mismatches between the declared struct type and the { } initializer, it's not clear to me how much this helps with element-wise type-safety either.
OK, that was one of the first articles that came up on Google when I looked for type punning; I should have read it more thoroughly. However, for what it's worth, I've found over the years that type punning should be avoided if at all possible (and I've never found a case where the code couldn't be written better without type punning). You are missing the point about the memory leakage. What about the case I mentioned:

```c
if (len) {
    JL_GC_PUSH1(&arr);
    // Why doesn't this grow the array all at once?
    // jl_array_grow_end(arr, len);
    jl_module_t **usingp = (jl_module_t **)mod->usings.items;
    do {
        jl_array_grow_end(arr, 1);
        jl_cellset(arr, jl_array_dim0(arr) - 1, *usingp++);
    } while (--len);
    JL_GC_POP();
}
```

What is the cost of the …
If there's a memory leak, it should obviously be fixed, but why is that fix coupled with the rest of this change to avoid type punning?
You can consider jl_gc_push/pop free except in the tightest loops; it is essentially storing a couple of pointers into a global structure.
Especially since the failure mode of skipping a gc_push/pop where it was needed is someone spending up to a few hours in gdb pulling their hair out. Better not to try to be smart about this.
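A simplified, hypothetical sketch of why a GC frame push/pop is cheap: a shadow stack where pushing fills in a frame that lives on the C stack and links it into a global list, and popping restores the previous pointer. Julia's real `JL_GC_PUSH` macros differ in detail; `gcframe_t`, `gc_top`, `GC_PUSH1` and `GC_POP` here are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shadow-stack frame: a root count, a link to the previous
   frame, and the addresses of local variables to treat as roots. */
typedef struct gcframe {
    size_t nroots;
    struct gcframe *prev;
    void **roots[2];
} gcframe_t;

/* Top of the root stack (a single global in this sketch). */
static gcframe_t *gc_top = NULL;

/* Push: a handful of stores into a stack-allocated frame plus one
   store to the global; no allocation happens. */
#define GC_PUSH1(frame, a)        \
    do {                          \
        (frame)->nroots = 1;      \
        (frame)->roots[0] = (a);  \
        (frame)->prev = gc_top;   \
        gc_top = (frame);         \
    } while (0)

/* Pop: restore the previous top. */
#define GC_POP(frame) (gc_top = (frame)->prev)

/* What a collector would do: walk every frame and visit each root. */
static size_t count_live_roots(void)
{
    size_t n = 0;
    for (gcframe_t *f = gc_top; f != NULL; f = f->prev)
        for (size_t i = 0; i < f->nroots; i++)
            if (*f->roots[i] != NULL)
                n++;
    return n;
}
```

Because the frame stores the address of the local rather than its value, reassigning the local after the push is still seen by the collector.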
Reflection code is rarely in the performance-critical loop.
@vtjnash I don't buy that argument when the code could just as easily have been written efficiently.
@carnaval That's why I asked the question; I had no idea how costly those are... and since I've seen that adding a GC frame can kill performance, I was concerned.
If you really wanted to optimize this function, there is a far clearer option:

```c
jl_array_t *arr = jl_alloc_array_1d(jl_array_any_type, mod->usings.len);
memcpy(jl_array_data(arr), mod->usings.items, sizeof(void*) * mod->usings.len);
return (jl_value_t*)arr;
```
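A self-contained sketch of the difference, using a hypothetical `ptrvec_t` in place of `jl_array_t`: one version grows by exactly one slot per element, the other allocates the final size once and copies with a single `memcpy`. The quoted loop calls `jl_array_grow_end(arr, 1)` per element; how often that actually reallocates depends on Julia's growth policy, which this sketch does not model.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable vector of pointers. Both fill functions
   expect dst to start empty: { 0, NULL }. */
typedef struct {
    size_t len;
    void **data;
} ptrvec_t;

/* Grow by exactly one slot per element (one realloc each time). */
static int fill_one_by_one(ptrvec_t *dst, void *const *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        void **p = realloc(dst->data, (dst->len + 1) * sizeof *p);
        if (p == NULL)
            return -1;
        dst->data = p;
        dst->data[dst->len++] = src[i];
    }
    return 0;
}

/* Allocate the final size once, then copy every pointer in one call. */
static int fill_bulk(ptrvec_t *dst, void *const *src, size_t n)
{
    if (n == 0)
        return 0;
    void **p = malloc(n * sizeof *p);
    if (p == NULL)
        return -1;
    memcpy(p, src, n * sizeof *p);
    dst->data = p;
    dst->len = n;
    return 0;
}
```

Both produce the same contents; the bulk version also needs no per-element bookkeeping, which is why it reads more clearly.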
Yes, that's very nice... that's why I put the question in the comment... I'll put that in!
@ScottPJones, before you continue to modify / work on this, please consider splitting it up into separate logical commits that each address as specific an issue as possible.
Sure... np!
@vtjnash You will get a warning from the compiler (like the one that was covered up by the PR that made me curious about this in the first place) if you try to stick something into the structure that won't fit.
I don't think Jameson was arguing that it was meant to be this way for any particular reason, just that care was not applied because it is far from performance-critical. It could definitely be changed. I would side with Jeff on the for-loop argument, though: after some time writing C you get good "brain pattern matching" on for loops, so we should probably use them whenever possible. About this particular GC frame: it is safe, yes. I'm arguing more about the general case, where you should not have to think about whether it is safe or not: I'm all for redundant and useless GC rooting unless an actual performance increase can be demonstrated by being more clever. The problem with making assumptions like "this variable is already rooted by this one" is that they can be broken by refactoring. Nothing to do with this simple case, though.
I just thought of something so that this change can make things safer... Nice, huh? 😀
@carnaval I can go ahead and change those couple of loops to for loops if it makes @JeffBezanson happier, but it really comes down to what your brain is used to pattern-matching... maybe you could say that my brain became ossified on a certain style that was absolutely necessary with the dumb C compilers of the '80s and even the '90s.
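For reference, the two loop shapes being discussed, applied to a simple hypothetical sum: the do/while form needs the surrounding `if (len)` guard, as in the quoted code, while the for form handles `len == 0` in the loop test itself.

```c
#include <assert.h>
#include <stddef.h>

/* do/while form: the body runs once before --len is tested, so without
   the guard, len == 0 would read past the array and wrap len around. */
static size_t sum_do_while(const size_t *p, size_t len)
{
    size_t s = 0;
    if (len) {
        do {
            s += *p++;
        } while (--len);
    }
    return s;
}

/* Equivalent for loop: the zero-length case needs no separate guard. */
static size_t sum_for(const size_t *p, size_t len)
{
    size_t s = 0;
    for (size_t i = 0; i < len; i++)
        s += p[i];
    return s;
}
```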
@carnaval Actually, the _t suffix is reserved by POSIX... Julia is in severe violation of that...
|
|
Again, that's a separate issue. Please fix one thing at a time.
@StefanKarpinski You mean the …
Here's the thing: you introduce several new structs using your own convention for struct naming, a trailing …
@StefanKarpinski Remember, this is an early WIP that I did while watching TV to relax...
Many of mine? This is currently just a WIP; I wanted to get some good feedback on the code, which is happening. Yes, #11004 was too big, but I don't think any of my subsequent changes have been like that... if so, please give me an example... I even broke that one down into pieces as small as I thought would work. Where do you think my changes (mostly bug fixes and added tests) have left things in an inconsistent state?
Fair enough about the WIP, but please keep in mind that if this is all one big commit including N conceptually different changes, and any one of those N changes is contentious, the whole thing is going to end up getting blocked. There are at least four different changes in this so far:
|
|
|
About 2), even though technically, keeping the …
About 1), I've been playing around with a separate PR to fix that... when it gets done will depend on how much TV watching I'm doing 😀
OK, thanks for the clarifications. I think option 2.b) seems reasonable, but how about opening an issue and getting some consensus on it?
This compiles cleanly, but it crashes; I haven't had time to look into it yet, but I thought it might give people an idea of what I was saying about changing the code to never depend on sticking integers into pointer values (at least for now, for things using arraylist_t).
My digging into this has also exposed some things I'd like to ask about, where it looks like there is potential for memory leaks...