The Puzzling Quirks of Regular Expressions


A Configuration Format

This exercise requires just a little bit of Python itself, but is mainly about choosing the right regular expression. Let’s suppose you have a configuration format that looks like this:

config = """
3 = foobar
14=baz
9= fizzbuzz
21=more_stuff,here
"""

With a little bit of code, and using a regular expression, you wish to convert text in this format to a dictionary mapping the numbers to the left of the equal sign to the strings to the right. For instance, the configuration above would produce:

{3: 'foobar', 14: 'baz', 9: 'fizzbuzz', 21: 'more_stuff,here'}

Before you turn the page…

Remember that shapes have edges.

As the example shows, the format seems flexible about spaces on either side of the equal sign. We should probably assume zero or more spaces are permitted on each side. The format is slightly surprising, since most formats put words on the left and numbers on the right; but it is well-defined enough, and we can stipulate it has a purpose.

The easiest way to capture the relevant information is probably by using groups for each side, which will be exposed by re.findall() and other regular expression functions. We almost get the right answer with this:

>>> import re
>>> dict(re.findall(r'^(\d+) *= *(.*)$', config, re.MULTILINE))
{'3': 'foobar', '14': 'baz', '9': 'fizzbuzz',
'21': 'more_stuff,here'}

Notice that we needed the “multiline” modifier, re.MULTILINE, so that ^ and $ match at the start and end of every line rather than only at the ends of the whole string. The one remaining problem is that the puzzle asked for the keys as numbers, not as strings of digits. There are several ways to achieve that in Python; an easy one is a dictionary comprehension:

>>> {int(k): v for k, v in
...     re.findall(r'^(\d+) *= *(.*)$', config, re.MULTILINE)}
{3: 'foobar', 14: 'baz', 9: 'fizzbuzz',
21: 'more_stuff,here'}
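
The solution can also be packaged as a small function, and it is worth seeing what happens when the multiline flag is left out. The following is only a sketch: the names PATTERN and parse_config are hypothetical, chosen for illustration, and the pattern is the same one used above.

```python
import re

# Hypothetical names: PATTERN and parse_config are illustrative only.
PATTERN = r'^(\d+) *= *(.*)$'

config = """
3 = foobar
14=baz
9= fizzbuzz
21=more_stuff,here
"""

def parse_config(text):
    """Map the integer left of each '=' to the string on its right."""
    pairs = re.findall(PATTERN, text, re.MULTILINE)
    return {int(key): value for key, value in pairs}

# Without re.MULTILINE, ^ matches only at the very start of the string.
# That position holds a newline here, so no match is found at all.
no_flag = re.findall(PATTERN, config)

parsed = parse_config(config)
print(no_flag)
print(parsed)
```

The first print shows an empty list, and the second shows the dictionary the puzzle asked for. One caveat of this pattern is that (.*) is greedy, so any trailing spaces on a line would be kept as part of the value.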