perlretut: Grammar, clarifications, white-space

khwilliamson · jkeenan · commit 05423e5e6831 · 2021-01-31T14:33:01.000-05:00
diff --git a/pod/perlretut.pod b/pod/perlretut.pod
@@ -20,17 +20,20 @@ expressions will allow you to manipulate text with surprising ease.
 What is a regular expression?  At its most basic, a regular expression
 is a template that is used to determine if a string has certain
 characteristics.  The string is most often some text, such as a line,
-sentence, web page, or even a whole book, but less commonly it could be
-some binary data as well.
+sentence, web page, or even a whole book, but it doesn't have to be.  It
+could be binary data, for example.  Biologists often use Perl to look
+for patterns in long DNA sequences.
+
 Suppose we want to determine if the text in variable, C<$var> contains
 the sequence of characters S<C<m u s h r o o m>>
 (blanks added for legibility).  We can write in Perl
 
  $var =~ m/mushroom/
 
 The value of this expression will be TRUE if C<$var> contains that
-sequence of characters, and FALSE otherwise.  The portion enclosed in
-C<'E<sol>'> characters denotes the characteristic we are looking for.
+sequence of characters anywhere within it, and FALSE otherwise.  The
+portion enclosed in C<'E<sol>'> characters denotes the characteristic we
+are looking for.
 We use the term I<pattern> for it.  The process of looking to see if the
 pattern occurs in the string is called I<matching>, and the C<"=~">
 operator along with the C<m//> tell Perl to try to match the pattern
@@ -60,7 +63,7 @@ many examples.  The first part of the tutorial will progress from the
 simplest word searches to the basic regular expression concepts.  If
 you master the first part, you will have all the tools needed to solve
 about 98% of your needs.  The second part of the tutorial is for those
-comfortable with the basics and hungry for more power tools.  It
+comfortable with the basics, and hungry for more power tools.  It
 discusses the more advanced regular expression operators and
 introduces the latest cutting-edge innovations.
 
@@ -135,7 +138,7 @@ And finally, the C<//> default delimiters for a match can be changed
 to arbitrary delimiters by putting an C<'m'> out front:
 
     "Hello World" =~ m!World!;   # matches, delimited by '!'
-    "Hello World" =~ m{World};   # matches, note the matching '{}'
+    "Hello World" =~ m{World};   # matches, note the paired '{}'
     "/usr/bin/perl" =~ m"/perl"; # matches after '/usr/bin',
                                  # '/' becomes an ordinary char
 
@@ -151,7 +154,7 @@ Let's consider how different regexps would match C<"Hello World">:
     "Hello World" =~ /oW/;     # doesn't match
     "Hello World" =~ /World /; # doesn't match
 
-The first regexp C<world> doesn't match because regexps are
+The first regexp C<world> doesn't match because regexps are by default
 case-sensitive.  The second regexp matches because the substring
 S<C<'o W'>> occurs in the string S<C<"Hello World">>.  The space
 character C<' '> is treated like any other character in a regexp and is
@@ -169,8 +172,8 @@ always match at the earliest possible point in the string:
     "That hat is red" =~ /hat/; # matches 'hat' in 'That'
 
 With respect to character matching, there are a few more points you
-need to know about.   First of all, not all characters can be used "as
-is" in a match.  Some characters, called I<metacharacters>, are
+need to know about.   First of all, not all characters can be used
+"as-is" in a match.  Some characters, called I<metacharacters>, are
 generally reserved for use in regexp notation.  The metacharacters are
 
     {}[]()^$.|*+?-#\
@@ -832,8 +835,8 @@ Counting the opening parentheses to get the correct number for a
 backreference is error-prone as soon as there is more than one
 capturing group.  A more convenient technique became available
 with Perl 5.10: relative backreferences. To refer to the immediately
-preceding capture group one now may write C<\g{-1}>, the next but
-last is available via C<\g{-2}>, and so on.
+preceding capture group one now may write C<\g-1> or C<\g{-1}>, the next but
+last is available via C<\g-2> or C<\g{-2}>, and so on.
 
 Another good reason in addition to readability and maintainability
 for using relative backreferences is illustrated by the following example,
@@ -1989,10 +1992,11 @@ C<\x>I<XY> (without curly braces and I<XY> are two hex digits) doesn't
 go further than 255.  (Starting in Perl 5.14, if you're an octal fan,
 you can also use C<\o{oct}>.)
 
-    /\x{263a}/;  # match a Unicode smiley face :)
+    /\x{263a}/;   # match a Unicode smiley face :)
+    /\x{ 263a }/; # Same
 
 B<NOTE>: In Perl 5.6.0 it used to be that one needed to say C<use
-utf8> to use any Unicode features.  This is no more the case: for
+utf8> to use any Unicode features.  This is no longer the case: for
 almost all Unicode processing, the explicit C<utf8> pragma is not
 needed.  (The only case where it matters is if your Perl script is in
 Unicode and encoded in UTF-8, then an explicit C<use utf8> is needed.)
@@ -2070,16 +2074,16 @@ C<\p{Mark}>, meaning things like accent marks.
 
 The Unicode C<\p{Script}> and C<\p{Script_Extensions}> properties are
 used to categorize every Unicode character into the language script it
-is written in.  (C<Script_Extensions> is an improved version of
-C<Script>, which is retained for backward compatibility, and so you
-should generally use C<Script_Extensions>.)
-For example,
+is written in.  For example,
 English, French, and a bunch of other European languages are written in
 the Latin script.  But there is also the Greek script, the Thai script,
-the Katakana script, I<etc>.  You can test whether a character is in a
-particular script (based on C<Script_Extensions>) with, for example
-C<\p{Latin}>, C<\p{Greek}>, or C<\p{Katakana}>.  To test if it isn't in
-the Balinese script, you would use C<\P{Balinese}>.
+the Katakana script, I<etc>.  (C<Script> is an older, less advanced,
+form of C<Script_Extensions>, retained only for backwards
+compatibility.)  You can test whether a character is in a particular
+script  with, for example C<\p{Latin}>, C<\p{Greek}>, or
+C<\p{Katakana}>.  To test if it isn't in the Balinese script, you would
+use C<\P{Balinese}>.  (These all use C<Script_Extensions> under the
+hood, as that gives better results.)
 
 What we have described so far is the single form of the C<\p{...}> character
 classes.  There is also a compound form which you may run into.  These
@@ -2459,7 +2463,7 @@ substring delimited by parentheses.  The problem with this regexp is
 that it is pathological: it has nested indeterminate quantifiers
 of the form C<(a+|b)+>.  We discussed in Part 1 how nested quantifiers
 like this could take an exponentially long time to execute if there
-was no match possible.  To prevent the exponential blowup, we need to
+is no match possible.  To prevent the exponential blowup, we need to
 prevent useless backtracking at some point.  This can be done by
 enclosing the inner quantifier as an independent subexpression:
 
@@ -2645,8 +2649,8 @@ section L</"Pragmas and debugging"> below.
 
 More fun with C<?{}>:
 
-    $x =~ /(?{print "Hi Mom!";})/;       # matches,
-                                         # prints 'Hi Mom!'
+    $x =~ /(?{print "Hi Mom!";})/;         # matches,
+                                           # prints 'Hi Mom!'
     $x =~ /(?{$c = 1;})(?{print "$c";})/;  # matches,
                                            # prints '1'
     $x =~ /(?{$c = 1;})(?{print "$^R";})/; # matches,
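
Reviewer note: the following is a minimal sketch, not part of the commit, that exercises a few of the behaviors the reworded passages describe: alternate match delimiters, default case-sensitivity, matching at the earliest possible point, and the two spellings of relative backreferences (C<\g-1> and C<\g{-1}>). It assumes Perl 5.10 or later, since that is where the text says relative backreferences became available. The variable names and the sample string "the the cat" are illustrative assumptions and do not appear in perlretut.

```perl
use strict;
use warnings;

# Alternate delimiters: m!...!, m{...} and m"..." behave like /.../.
# With m"...", the '/' is an ordinary character and needs no escaping.
my @delims = (
    "Hello World"   =~ m!World! ? 1 : 0,
    "Hello World"   =~ m{World} ? 1 : 0,
    "/usr/bin/perl" =~ m"/perl" ? 1 : 0,
);

# Regexps are case-sensitive by default; the /i modifier relaxes that.
my $case_default = "Hello World" =~ /world/  ? 1 : 0;
my $case_i       = "Hello World" =~ /world/i ? 1 : 0;

# A match is found at the earliest possible point in the string:
# 'hat' matches inside 'That' (offset 1), not the later word 'hat'.
my $earliest = "That hat is red" =~ /hat/ ? $-[0] : -1;

# Relative backreferences: \g-1 and \g{-1} both refer to the
# immediately preceding capture group.
my $doubled = "the the cat";    # hypothetical sample input
my $bare    = $doubled =~ /(\w+) \g-1/   ? $1 : undef;
my $braced  = $doubled =~ /(\w+) \g{-1}/ ? $1 : undef;

# \g-2 is the group before that one: here (a), with \g-1 being (b).
my $abab = "abab" =~ /(a)(b)\g-2\g-1/ ? 1 : 0;

print "delimiters: @delims\n";
print "case: default=$case_default, with /i=$case_i\n";
print "earliest offset: $earliest\n";
print "relative backrefs: bare=$bare, braced=$braced, abab=$abab\n";
```

The braced form is the safer choice when the backreference is followed directly by a digit, since the braces make it unambiguous where the group number ends.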