The Puzzling Quirks of Regular Expressions

  1. Acknowledgments
  2. Rights of (Wo)Man
  3. Credits
  4. Preface
  5. Quantifiers and Special Sub-Patterns
    1. Wildcard Scope
    2. Words and Sequences
    3. Endpoint Classes
    4. A Configuration Format
    5. The Human Genome
  6. Pitfalls and Sand in the Gears
    1. Catastrophic Backtracking
    2. Playing Dominoes
    3. Advanced Dominoes
    4. Sensor Art
  7. Creating Functions using Regexen
    1. Reimplementing str.count()
    2. Reimplementing str.count() (stricter)
    3. Finding a Name for a Function
    4. Playing Poker (Part 1)
    5. Playing Poker (Part 2)
    6. Playing Poker (Part 3)
    7. Playing Poker (Part 4)
    8. Playing Poker (Part 5)
  8. Easy, Difficult, and Impossible Tasks
    1. Identifying Equal Counts
    2. Matching Before Duplicate Words
    3. Testing an IPv4 Address
    4. Matching a Numeric Sequence
    5. Matching the Fibonacci Sequence
    6. Matching the Prime Numbers
    7. Matching Relative Prime Numbers

Support the author!
Lulu Editions
Paypal Donation
Other Publications

Matching Relative Prime Numbers

If you read the last puzzle, you saw the subtle reason why a regular expression cannot match an initial sequence of primes. Think finite automaton. If you skipped that puzzle, at least go back and refresh your understanding of the Sieve of Eratosthenes.

Mathematics has a concept of relative primes which is slightly weaker than primality. All prime numbers are relatively prime—also called coprime—with each other, but other pairs are as well. Two coprime numbers have no common divisors other than 1. This is certainly true of prime numbers; for example, 11 and 53 are relatively prime since neither have any divisors other than themselves and 1. But likewise 10 and 21 are coprime since the divisors of the first are 2 and 5, but those of the second are 3 and 7, which do not overlap.

So the question for this puzzle is whether you can create an expression that will identify all and only sequences of ascending natural numbers where all of them are relatively prime to each other. Trivially, any sequence of ascending primes qualifies here, but so do other sequences.

As in the last three puzzles, we encode numeric sequences using a number of contiguous @ symbols, with each “number” separated by spaces. For example:

# Match: 2 3 5 7 11
primes5 = "@@ @@@ @@@@@ @@@@@@@ @@@@@@@@@@@ "
# Match: 2 5 7 9 11
relprime1 = "@@ @@@@@ @@@@@@@ @@@@@@@@@ @@@@@@@@@@@ "
# Match: 3 4 7 11
relprime2 = "@@@ @@@@ @@@@@@@ @@@@@@@@@@@ "
# Match: 9 16
startbig = "@@@@@@@@@ @@@@@@@@@@@@@@@@ "
# Fail: 2 3 4 5 7  (2, 4 relatively composite)
fail1 = "@@ @@@ @@@@ @@@@@ @@@@@@@ "
# Fail: 5 7 2 3 11 (all primes, non-ascending)
fail2 = "@@@@@ @@@@@@@ @@ @@@ @@@@@@@@@@@ "

Are relative primes consigned to the same fate as primes?

Before you turn the page…

Nothing is either true or false but thinking makes it so.

There are a couple of issues to consider in this solution. It turns out that such a solution is indeed possible, using much the same style as the Sieve of Eratosthenes, but not an identical technique. That is, as discussed in the last puzzle, we are perfectly well able to reject a string based on a future multiple of a given number.

The trick is that we do not need to reject infinitely many if we do not assume that a string needs to contain all the initial primes. Instead, we can focus just on a single number at a time, and rule out its multiples. We might miss some primes in our sequence, or indeed have some relatively prime composite numbers. But that satisfies the current puzzle.

However, for this “striking through” to work, we need also to enforce the rule that sequences are ascending. Otherwise, we might encounter, e.g. @@@@@@@@ @@@@ @@ (i.e. ‘8 4 2’). Those are definitely not mutually coprime. However, “stricking out” multiples of 8 does not help reject 4 later in the string. Python only allows fixed length lookbehind assertions, but some other regex implementation could technically relax this ascending sequence restriction (however, a library that did so would quickly face catastrophic exponential complexity in this case).

pat = r'^((@@+) (?=\2@)(?!.* \2{2,} ))+'

Here we first identify a group of 2 or more @ symbols. Then we do a postive lookahead to ensure that the next group of @ symbols has at least one more symbol.

The real crux of this is the negative lookahead assertion that we never later see a (space delimited) sequence of two or more copies of the group. This pattern does not capture the final “number” in the sequence; it is just used to provide a true or false answer to whether the sequence matches.