The Puzzling Quirks of Regular Expressions

  1. Acknowledgments
  2. Rights of (Wo)Man
  3. Credits
  4. Preface
  5. Quantifiers and Special Sub-Patterns
    1. Wildcard Scope
    2. Words and Sequences
    3. Endpoint Classes
    4. A Configuration Format
    5. The Human Genome
  6. Pitfalls and Sand in the Gears
    1. Catastrophic Backtracking
    2. Playing Dominoes
    3. Advanced Dominoes
    4. Sensor Art
  7. Creating Functions using Regexen
    1. Reimplementing str.count()
    2. Reimplementing str.count() (stricter)
    3. Finding a Name for a Function
    4. Playing Poker (Part 1)
    5. Playing Poker (Part 2)
    6. Playing Poker (Part 3)
    7. Playing Poker (Part 4)
    8. Playing Poker (Part 5)
  8. Easy, Difficult, and Impossible Tasks
    1. Identifying Equal Counts
    2. Matching Before Duplicate Words
    3. Testing an IPv4 Address
    4. Matching a Numeric Sequence
    5. Matching the Fibonacci Sequence
    6. Matching the Prime Numbers
    7. Matching Relative Prime Numbers
Naive_Scribble_Recto

Support the author!
Lulu Editions
Paypal Donation
Other Publications

Matching the Fibonacci Sequence

Here we get to something harder than the last puzzle. It is not obvious whether regular expressions are powerful enough to express this sequence. Think about your solution, or the reasons it is impossible, before you turn the page.

The Fibonacci sequence is a famous recursive relationship, in which each number in the sequence is the sum of the prior two numbers. Hence, the first few Fibonacci numbers are:

1 1 2 3 5 8 13 21 34 55 89 144

In fact, the Fibonacci sequence is only one of an infinite number of similar recursive sequences, known generally as Lucas sequences. The Lucas numbers are one such sequence in which the initial elements are 2 and 1 (rather than 1 and 1). We are actually interested here in matching “Fibonacci-like” sequences, where given two elements, the next one is the sum of those prior two.

As in the last puzzle, we represent numeric sequences by a number of repetitions of the @ symbol followed by spaces. For example:

# Match: 1 1 2 3 5 8
fibs = "@ @ @@ @@@ @@@@@ @@@@@@@@ "
# Match: 2 1 3 4 7 11
lucas = "@@ @ @@@ @@@@ @@@@@@@ @@@@@@@@@@@ "
# Match: 3 1 4 5 9 14
fib2 = "@@@ @ @@@@ @@@@@ @@@@@@@@@ @@@@@@@@@@@@@@ "
# Fail: 1 1 3 4 7 11
wrong1 = "@ @ @@@ @@@@ @@@@@@@ @@@@@@@@@@@ "
# Fail: 1 1 2 3 4 7
wrong2 = "@ @ @@ @@@ @@@@ @@@@@@@ "

Can you create a regular expression that matches only Fibonacci-like sequences within encoded strings?

Before you turn the page…

The Golden Spiral beautifully generalizes Fibonacci Numbers.

It turns out that matching properly encoded Fibonacci-like sequences is within the power of regular expressions. Adding together two prior elements is actually a lot like simply doubling the one prior element as we did in the last puzzle.

The main difference in the solution to this puzzle versus the last one is that we need to backreference two groups in the lookahead pattern rather than just one. Study the explanation of the last puzzle before looking at the solution to this one.

>>> pat1 = r"^(((@+) (@+) )(?=$|\3\4 ))+(\3\4)?$"
>>> pat2 = r"^@+ (((@+) (@+) )(?=\3\4 ))+"
>>> for s in (fibs, lucas, fib2, wrong1, wrong2):
...     match = re.search(pat1, s)
...     if match and re.search(pat2, s):
...         print("VALID", match.group())
...     else:
...         print("INVALID", s)
...
VALID @ @ @@ @@@ @@@@@ @@@@@@@@
VALID @@ @ @@@ @@@@ @@@@@@@ @@@@@@@@@@@
VALID @@@ @ @@@@ @@@@@ @@@@@@@@@ @@@@@@@@@@@@@@
INVALID @ @ @@@ @@@@ @@@@@@@ @@@@@@@@@@@
INVALID @ @ @@ @@@ @@@@ @@@@@@@

Actually, there are two extra caveats here. We assume in this solution that an even number of numbers are represented in the string. The lookahead only evaluates the one next number (that must be the sum of the current two numbers). However, this means that we match two different ‘@’ sequences at a time; and hence that there must be an even number if we match to the end.

The second issue is that since we stride two-by-two through the “numbers,” we need to use a second regular expression to make sure the sequence predicts the next element when offset by one element as well. We see that problem in wrong1. If we only utilized pat1 it would incorrectly match as Fibonacci-like. Since pat1 already collects the final “number,” there is no need for pat2 to do so as well; the lookahead suffices.