-
Notifications
You must be signed in to change notification settings - Fork 597
toke.c: S_intuit_more: Add more commentary #23708
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: blead
Are you sure you want to change the base?
Conversation
4c36ccf
to
f0a5a44
Compare
d1fffd6
to
3407c5f
Compare
3407c5f
to
c77f0b2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed solely for spelling and grammar issues.
/* Examine each character in the construct */ | ||
/* That this knows nothing of UTF-8 can lead to opposite results if the | ||
* text is encoded in UTF-8 or not; another relic of the Unicode Bug. | ||
* Suppose a string consistis of various un-repeated code points between |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/consistis/consists/ in above
* (Some do it though as a mnemonic that is meaningful to | ||
* them.) But generally, repeated characters makes things | ||
* more likely to be a charclass. But we have here that | ||
* this an identifier so likely a subscript. Its spelling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sentence beginning with But
lacks a verb, is confusing.
This function is described in its comments as 'terrifying', and by its original author, Larry Wall, as "truly awful". As a result, it has been mostly untouched since its introduction in 1993. That means it has not been updated as new language features have been added. As an example, it does not know about lexical variables, so the code it has for globals just doesn't work on the vast majority of modern day coding practices. Another example is it knows nothing of UTF-8, and as a result simply changing the input encoding from Latin1 to UTF-8 can result in its outcome being the opposite result. And it is buggy. An example of how hard this can be to get right is this fairly common use in our test suite: [$A-Z] That looks like a character class matching 27 characters. But wait, what if there exists a $A and a parameterless subroutine 'Z'. Then this could instead be an expression for a subcript. A few years ago, I set out to try to understand it. I added commentary and simplified some overly complicated expressions, but left its behavior unchanged. Now, I set out to make some changes, and found many more issues than I had earlier. This commit adds commentary about those. Hopefully this will lead to some discussion and a consensus on the way forward.
c77f0b2
to
6ef8978
Compare
This function is described in its comments as 'terrifying', and by its original author, Larry Wall, as "truly awful". As a result, it has been mostly untouched since its introduction in 1993. That means it has not been updated as new language features have been added.
As an example, it does not know about lexical variables, so the code it has for globals just doesn't work on the vast majority of modern day coding practices.
Another example is it knows nothing of UTF-8, and as a result simply changing the input encoding from Latin1 to UTF-8 can result in its outcome being the opposite result.
And it is buggy.
A few years ago, I set out to try to understand it. I added commentary and simplified some overly complicated expressions, but left its behavior unchanged.
Now, I set out to make some changes, and found many more issues than I had earlier. This commit adds commentary about those. Hopefully this will lead to some discussion and a consensus on the way forward.