Skip to content

Conversation

@ScottPJones
Copy link

This compiles cleanly, but it crashes, and I haven't had time to look into it yet, but I thought it might give people an idea of just what I was saying, about changing the code to not depend ever on sticking integers into pointer values (at least for right now, for things using arraylist_t).
My digging into this has also exposed some things I'd like to ask about, where it looks like there is potential for memory leaks...

@ScottPJones ScottPJones changed the title WIP Clean up arraylist_t handling [ci skip] WIP: Clean up arraylist_t handling [ci skip][av skip] Jun 18, 2015
@ScottPJones
Copy link
Author

@JeffBezanson when you have a roundtuit (which might be never, I know you're very busy!), I'd love it if you could look over this change, and let me know what you think... I'm not sure if you, or @carnaval, or who, are the best experts on this part of the code...
Thanks in advance to anybody who might take a look at this!

@StefanKarpinski
Copy link
Member

I have to say this seems like a lot of effort to avoid something that's not actually a problem.

@JeffBezanson
Copy link
Member

Yes, I have to agree that this is a lot of code noise for no benefit. Ok, you like your C type punning served differently, but at the end of the day it's the same, with some extra calls to memcpy thrown in. I also find the style of writing if (len) { do { } while (); } kind of verbose and ugly.

@ScottPJones
Copy link
Author

Code noise? The problem with the way the current code is, the type punning means that changes in the sizes of things could silently break things (which was I started to look into it in the first place, because of the PR to just "cast" away legitimate warnings).

There are a ton of articles on the evils of type punning, and, as I've shown here, it's not necessary at all.
http://www.cocoawithlove.com/2008/04/using-pointers-to-recast-in-c-is-bad.html.
Note: converting between (void *) and different pointer types is not what I (or other C programmers) would mean when they talk about "type punning", that's really more where you play tricks like converting integers to floats or to pointers and vice versa.

The current code has 3 issues that are considered bad programming practice in C...

  1. punning between integer and pointer types (I cringed when I saw the use of ptrint_t & uptrint_t,
    esp. as they weren't used consistently).
  2. defining structures by code execution (i.e. first push this, then that, etc).
  3. requiring casts to avoid conversion errors

The if (len) { } avoids unnecessary code, like setting up the pointer to walk quickly through the elements of the array, or calling things like JL_GC_PUSH1 and JL_GC_POP. I don't know how expensive those are, but why be doing unnecessary work?
The memcpy is because the arraylist_t structure has now been made general, and can take any size of structure, and the code is no longer dependent on type punning... (what I did there, using a structure, and then copying it with memcpy is not type punning...).
The cases that just pushed a pointer, remain the same, and don't use memcpy, the new more general calls are differentiated with _str.

The code is a lot more self-documenting, and doing this exercise turned up issues like the fact that

  1. the second element of the reinit_list arraylist was only set to 1
  2. the first element was written out as a signed 32-bit integer, which, depending on how many
    items could be on the list, could overflow after 2^30 items.
    (if the size could never get near that, then an assert and some internal documentation would have
    made that clear, and safer).
  3. some things commented as size_t were actually assigned negative values (i.e. flagref_list).
  4. it looks like this code is rife for memory leaks.
    The code calls arraylist_new in many places, but the arraylist_new code doesn't check to see if some memory had already been allocated to the arraylist.
    It seems that somebody understood that that sort of thing could happen, but nothing's been done about it.
        // in case we didn't free it last time we got here (for example, if we threw an error)
        jlgensym_to_flisp.len = 0;

So, @JeffBezanson do you still feel that there is no benefit to this change?

@ScottPJones
Copy link
Author

have to say this seems like a lot of effort to avoid something that's not actually a problem.

@StefanKarpinski Whether or not it was much effort for me is my problem, not yours.
The only concern here should be: do the changes fix certain potential problems (yes),
and do they make the code more self-documenting (yes).

It actually was not difficult at all, I just did it while watching TV with my wife... to me it is relaxing... and it was a good way to learn some of the GC code, to demonstrate that type punning between integers and pointers was not at all necessary, and I turned up what look like some serious issues of memory leakage to boot)

@StefanKarpinski
Copy link
Member

Ok, if you can get this to stop crashing, we can consider it. As usual, I would recommend making these changes in small, easily digestible pieces. If you think your string changes were given close scrutiny, large, pervasive changes to the C core of the language will make that look like a walk in the park.

@JeffBezanson
Copy link
Member

You raise some good points, but I think they can be addressed much more simply and surgically.

  1. the second element of the reinit_list arraylist was only set to 1

Yes, @vtjnash confirms that's not strictly necessary at the moment and we could just remove it.

if the size could never get near that, then an assert and some internal documentation

Yes, we should add an assert and a comment.

some things commented as size_t were actually assigned negative values

I think (size_t)-1 is a generally accepted idiom, but it would be good to replace this with a constant like FLAGREF_NONE.

it looks like this code is rife for memory leaks

But are there really memory leaks? If so, we can just fix the leak where it happens. However this point is largely unrelated to the rest of this discussion.

It would be good to address all of these items, but it seems it could be done with only a tiny diff. I don't know why this branch is crashing, and I'm sure you could fix it in 10 minutes, but that does illustrate the problem with changes like this. With more code churn you run more risk of introducing problems.

@JeffBezanson
Copy link
Member

The if (len) { } avoids unnecessary code, like setting up the pointer

I'm not at all worried about the trivial amount of extra code. I'd much rather have simple idiomatic for loops.

@ScottPJones
Copy link
Author

I think there are memory leaks any place where 1) more than AL_N_INLINE entries have been pushed,
2) an error occurs, which, from the comment I found in the code, is a distinct possibility.
That raises a question, just how are errors (like ^C) handled when this code is running?

@carnaval
Copy link
Contributor

About the type punning thing, I don't want to argue on the definition but even if arraylist_t knows the element size it is still not type safe (as a polymorphic implementation would be) because you can still push the wrong struct in the wrong arraylist. It is more resilient to change in the element layout as you said, but in the int example wouldn't a typedef do the same thing ?

I.e., if you have schematically

int a;
push((void*)a);
...
int b = (int)pop();

which is, as you said unsafe if you change the type of a to, say, long, you could achieve this kind of safety by using a typedef int my_size and use my_size everywhere ?

About the finalizer thing, I think it's quite clear with the even/odd layout but I don't have any strong opinion. About cosmetics now, you might want to keep with the _t suffix convention for struct typedef. I also don't think _str is commonly thought as the abbreviation for struct but rather for string, so it might be a good thing to change it.

@vtjnash
Copy link
Member

vtjnash commented Jun 18, 2015

There are a ton of articles on the evils of type punning, and, as I've shown here, it's not necessary at all.

This blog is referring to a form of type punning that is problematic for TBAA, and it only matters if you are going to dereference the pointer (which wouldn't be valid here, because the pointer holds an integer value instead).

  1. it looks like this code is rife for memory leaks.
    The code calls arraylist_new in many places, but the arraylist_new code doesn't check to see if some memory had already been allocated to the arraylist.
    It seems that somebody understood that that sort of thing could happen, but nothing's been done about it.
   // in case we didn't free it last time we got here (for example, if we threw an error)
    jlgensym_to_flisp.len = 0;

as my comment there explains, this place might get reached only if we threw an error. in general, the arraylist_new constructor cannot assume that the memory it was given has been zeroed, and thus it cannot do anything with the len pointer.

The if (len) { } avoids unnecessary code, like setting up the pointer

accessing a pointer out of a static struct should be "free" in any sane C compiler.

@vtjnash
Copy link
Member

vtjnash commented Jun 18, 2015

since C largely ignores mismatches in the declared struct type and the { } initializer, it's not clear to me how much this helps with element-wise type-safety either.

@ScottPJones
Copy link
Author

OK, that was one of the first articles that came up on google when I looked for type punning, I should have read it more thoroughly. However, for what it's worth, I've found over the years that type punning should be avoided if at all possible (and I've never found a case where the code couldn't be better written without type punning).

You are missing the point about the memory leakage.
If an error occurs, before the function calls arraylist_free, and then it comes back and calls arraylist_new, it will lose the memory. That's what I've been talking about.

What about the case I mentioned:

    if (len) {
        JL_GC_PUSH1(&arr);
        // Why doesn't this grow the array all at once?
        // jl_array_grow_end(arr, len);
        jl_module_t **usingp = (jl_module_t **)mod->usings.items;
        do {
            jl_array_grow_end(arr, 1);
            jl_cellset(arr,jl_array_dim0(arr)-1, *usingp++);
        } while (--len);
        JL_GC_POP();
    }

What is the cost of the JL_GC_PUSH1 and JL_GC_POP? What about my question I added in the comment, why does it grow the jl_array structure one at a time, instead of all at once?

@StefanKarpinski
Copy link
Member

If there's a memory leak, it should obviously be fixed, but why is that fix coupled with the rest of this change to avoid type punning?

@carnaval
Copy link
Contributor

You can consider jl_gc_push/pop free except in the tightest loops, it is essentially storing a couple pointers into a global structure.

@carnaval
Copy link
Contributor

especially since the failure mode of avoiding a gc_push/pop where it was needed is someone spending up to a few hours in gdb pulling some hairs out. better not try to be smart about this.

@vtjnash
Copy link
Member

vtjnash commented Jun 18, 2015

What about my question I added in the comment, why does it grow the jl_array structure one at a time, instead of all at once?

reflection code is rarely in the critical performance loop

@ScottPJones
Copy link
Author

reflection code is rarely in the critical performance loop

@vtjnash I don't buy that argument, when the code could just as easily been written efficiently.
Since there is no internal documentation, somebody seeing that ends up wondering just why it was done inefficiently, for example, because something called in the loop is changing the size of the array also.

@ScottPJones
Copy link
Author

@carnaval That's why I asked the question, I had no idea how costly those are... and since I've seen that adding a GC frame can kill performance, I was concerned.
Also, I don't think that the place above where I avoided the push/pop could lead to what you are describing... it's totally within the bounds of the if statement.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you wanted to really optimize this function, there is a far more clear option:

jl_array_t *arr = jl_alloc_array_1d(jl_array_any_type, mod->usings.len);
memcpy(jl_array_data(arr), mod->usings.items, sizeof(void*) * mod->usings.len);
return (jl_value_t*)arr;

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's very nice... that's why I was put question in the comment... I'll put that in!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ScottPJones, before you continue to modify / work on this, please consider splitting it up into separate logical commits that each address as specific an issue as possible.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure... np!

@ScottPJones
Copy link
Author

since C largely ignores mismatches in the declared struct type and the { } initializer, it's not clear to me how much this helps with element-wise type-safety either.

@vtjnash You will get a warning from the compiler (like the one that was covered up by the PR that made me curious about this in the first place) if you try to stick something into the structure that won't fit.
It's not perfect, no, but it's about the best you can do in C.

@carnaval
Copy link
Contributor

I don't think Jameson was arguing that it was meant to be this way for any particular reason, just that care was not applied because it is far for performance critical. It could definitely be changed. I would side on Jeff's side on the for loop argument though, after some time of C you get good "brain pattern matching" on for loops so we should probably use them whenever possible.

About this particular gc frame it is safe, yes. I'm more arguing about the general case where you should not have to think whether it is safe or not : I'm all for redundant and useless gc rooting unless actual performance increase can be demonstrated by being more clever. The problem with making assumptions about "this variable is already rooted by this one" is that they can be broken by refactoring. Nothing to do with this simple case though.

@ScottPJones
Copy link
Author

since C largely ignores mismatches in the declared struct type and the { } initializer, it's not clear to me how much this helps with element-wise type-safety either.

I just thought of something, so that this change can make things safer...
For a debug build, the arraylist_push* and arraylist_pop can be replaced by macros,
that add assert(sizeof(val) == arr->elesz).

Nice, huh? 😀

@ScottPJones
Copy link
Author

after some time of C you get good "brain pattern matching" on for loops so we should probably use them whenever possible.

@carnaval I can go ahead and change those couple of loops to for loops, if it makes @JeffBezanson happier, but it really comes down to what your brain is used to pattern matching... maybe you could say that my brain became ossified on a certain style that was absolutely necessary with the dumb C compilers in the 80s and even 90s.

@ScottPJones
Copy link
Author

About the finalizer thing, I think it's quite clear with the even/odd layout but I don't have any strong opinion. About cosmetics now, you might want to keep with the _t suffix convention for struct typedef. I also don't think _str is commonly thought as the abbreviation for struct but rather for string, so it might be

@carnaval Actually, the _t suffix is reserved by Posix... Julia is in severe violation of that...

Some additional classes of identifier names are reserved for future extensions to the C language or the POSIX.1 environment. While using these names for your own purposes right now might not cause a problem, they do raise the possibility of conflict with future versions of the C or POSIX standards, so you should avoid these names.

  • Names beginning with a capital ‘E’ followed a digit or uppercase letter may be used for additional error code names. See Error Reporting.
  • Names that begin with either ‘is’ or ‘to’ followed by a lowercase letter may be used for additional character testing and conversion functions. See Character Handling.
  • Names that begin with ‘LC_’ followed by an uppercase letter may be used for additional macros specifying locale attributes. See Locales.
  • Names of all existing mathematics functions (see Mathematics) suffixed with ‘f’ or ‘l’ are reserved for corresponding functions that operate on float and long double arguments, respectively.
  • Names that begin with ‘SIG’ followed by an uppercase letter are reserved for additional signal names. See Standard Signals.
  • Names that begin with ‘SIG_’ followed by an uppercase letter are reserved for additional signal actions. See Basic Signal Handling.
  • Names beginning with ‘str’, ‘mem’, or ‘wcs’ followed by a lowercase letter are reserved for additional string and array functions. See String and Array Utilities.
  • Names that end with ‘_t’ are reserved for additional type names.

@StefanKarpinski
Copy link
Member

Again, that's a separate issue. Please fix one thing at a time.

@ScottPJones
Copy link
Author

@StefanKarpinski You mean the _t issue?
I wasn't intending on doing anything about it, at least not at the moment!
I have seen people get bitten by it, and it was something I banned in ISC's C style guide...
It doesn't change frequently, but it does happen, like when char16_t and char32_t got added...

@StefanKarpinski
Copy link
Member

Here's the thing – you introduce several new structs using your own convention for struct naming – trailing _str instead of _t. Perhaps that is more correct, but with your change the Julia source has a mix of _t and _str struct names. Instead of just doing what you think is better only for your struct names, you should follow the current convention, regardless of whether you think it's a good idea or not – because that's not what this pull request is about. If you'd like to fix the current _t convention, you can make a separate pull request that changes all of the struct names at once, with a message that makes the case for the change. As it stands, this pull request, as many of yours are, is about at least three different things, all mixed together and leaving the codebase in a fairly incoherent state afterwards.

@ScottPJones
Copy link
Author

@StefanKarpinski Remember, this is an early WIP, that I did while watching TV to relax me...
I'd already gotten changed the _str, I just haven't had a chance to push it yet.

@ScottPJones
Copy link
Author

As it stands, this pull request, as many of yours are, is about at least three different things, all mixed together and leaving the codebase in a fairly incoherent state afterwards.

Many of mine? This currently is just a WIP, I wanted to get some good feedback on the code, which is happening. Yes, #11004 was too big, but I don't think any of my subsequent changes have been like that... if so, please give me an example... I broke that down into as small pieces as I thought would work even. Where do you think my changes (mostly bug fixes and adding testing) have left things in an inconsistent state?

@StefanKarpinski
Copy link
Member

Fair enough about the WIP, but please keep in mind that if this is all one big commit including N conceptually different changes, if any one of those N changes is contentious, the whole thing is going to end up getting blocked. There are at least four different changes in this so far:

  1. Fixing a potential memory leak in the way that arraylist_t is used. To quote you, this can happen when "more than AL_N_INLINE entries have been pushed and an error occurs". If there really is a potentially memory leak, I can assure that it will get fixed posthaste. Please open a separate issue and/or pull request describing the circumstances under which you believe the memory leak can occur, ideally with reproducing code.
  2. Changing (partially) the naming convention for struct typedefs from ending in _t, which as you say the POSIX standard claims for itself, to ending in _str. It's probably best to open an issue to discuss this first, since there are many ways this can go. We could, e.g. decide that a) it's fine because the chances that POSIX will introduce clashing names is tiny; b) it's fine to end in _t but we should prefix with jl_ also; c) we should pick a different naming convention. Just changing it in some of the code that you've introduced is not a good way to address this issue.
  3. Introduce unions to avoid "type punning". This change should arguably be broken down even further into individual commits introducing each new union type, together with changes to all the places where that union can be used instead of a type pun. This means that each struct you introduce should have its own commit. This makes the code easier to review and easier to revert if it introduces a problem.
  4. Changing the API to arraylist_t. I'm least clear on this one, but it seems pretty clear that it's very different from the other issues. If there's some reason why it must be coupled with one or more of the other changes, then that should be explained.

@ScottPJones
Copy link
Author

  1. I brought up the potential memory leak, but this PR doesn't do anything about it
  2. I never said that I planned on changing partially or fully the naming convention for struct typedefs,
    I don't even have anything ending in _str any more, that was just in the first revision.
  3. I never introduced unions - where are you getting that from?
    I introduced only 3 typedef structs, for flagref_list, finalizer_list, and jlgensym_to_lisp
    (for some reason, that last's name isn't consistent with all the other arraylist_t structures,
    which all end in _list. (reinit_list had already been changed to just be a int32_t
    It will be rather a pain to try to split those into separate commits, but if that's what's necessary...
  4. The change of the API to arraylist_t is to eliminate totally the need for the sort of dangerous type punning and definition of structure by order of operation that was being done, which was why I started playing around with this in the first place. 3 & 4 go together, and are what make up this PR.

@ScottPJones
Copy link
Author

About 2), even though technically, keeping the _t is against the Posix rules, I think your option b) is probably the easiest to do, in practice won't ever conflict with Posix, and probably makes the most sense to other core julia developers...
Maybe during some future TV show I'll query/search replace my way to a PR that adds the jl_ to those typedefs. Does that sound reasonable?

@ScottPJones
Copy link
Author

About 1) I've been playing around with a separate PR to fix that... will depend on how much TV watching I'm doing as to when it gets done 😀

@StefanKarpinski
Copy link
Member

Ok, thanks for the clarifications. I think option 2.b) seems reasonable, but how about opening an issue and getting some consensus on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants