The Puzzling Quirks of Regular Expressions

  1. Acknowledgments
  2. Rights of (Wo)Man
  3. Credits
  4. Preface
  5. Quantifiers and Special Sub-Patterns
    1. Wildcard Scope
    2. Words and Sequences
    3. Endpoint Classes
    4. A Configuration Format
    5. The Human Genome
  6. Pitfalls and Sand in the Gears
    1. Catastrophic Backtracking
    2. Playing Dominoes
    3. Advanced Dominoes
    4. Sensor Art
  7. Creating Functions using Regexen
    1. Reimplementing str.count()
    2. Reimplementing str.count() (stricter)
    3. Finding a Name for a Function
    4. Playing Poker (Part 1)
    5. Playing Poker (Part 2)
    6. Playing Poker (Part 3)
    7. Playing Poker (Part 4)
    8. Playing Poker (Part 5)
  8. Easy, Difficult, and Impossible Tasks
    1. Identifying Equal Counts
    2. Matching Before Duplicate Words
    3. Testing an IPv4 Address
    4. Matching a Numeric Sequence
    5. Matching the Fibonacci Sequence
    6. Matching the Prime Numbers
    7. Matching Relative Prime Numbers

Support the author!
Lulu Editions
Paypal Donation
Other Publications

Preface

Regular expressions—sometimes given the playful back formation regexen or more neutrally regex—are a powerful and compact way of describing patterns in text. Many tutorials and “cheat sheets” exist to understand their syntax and semantics in a formally correct manner. I encourage you to read some of those, if you have not already.

These puzzles begin at a certain point where the formal descriptions leave off. As you work with regexen, you will find subtle pitfalls. A pattern that seems like it should obviously match one thing actually matches something slightly different than you intended. Or perhaps a match pattern has “pathological” behavior and takes far too long. Or sometimes it is simply that a more concise pattern would be clearer in describing what you actually wish to match.

A great many programming languages, libraries, and tools support regular expressions, with relatively minor variations in the syntax used. Such software includes [efr]?grep, sed, awk, Perl, Java, .NET, JavaScript, Julia, XML Schema, or indeed, pretty much every other programming language via a library.

For this book, we will use Python to pose these puzzles. In particular, we will use the standard library module re. Often code samples are used in puzzles and in explanation; where I wish to show the output from code, the example emulates the Python shell with lines starting with >>> (or continuing with ...). Outputs are echoed without a prompt in this case. Where code defines a function that is not necessarily executed in the mention, only the plain code is shown.

While you are reading this book, I strongly encourage you to keep open an interactive Python environment. Many tools enable this, such as the Python REPL (read-evaluate-print-loop) itself, or the IPython enhanced REPL, or Jupyter notebooks, or the IDLE editor that comes with Python, or indeed most modern code editors and IDEs (integrated development environments). A number of online regular expression testers are also available, although those will not capture the the Python calling details. Explanations will follow each puzzle, but trying to work it out in code before reading it is worthwhile. C’mon, not thinking about a puzzle before reading the solution is a cop-out.