regular expression matching without a state machine

Over the past few days I’ve been brushing up on my compiler theory, and in the course of my readings happened across an interesting article in the Scala blogs. The author presented a very concise (albeit not complete; see the comments following the article) implementation of a regex matching engine. I was intrigued and thought I’d try my hand at writing it in Common Lisp. My naive implementation:

(defclass regex () ())

(defclass phi (regex) ())

(defclass empty (regex) ())

(defclass l (regex) ((letter :initarg :letter)))

(defclass choice (regex)
  ((left :initarg :left)
   (right :initarg :right)))

(defclass seq (regex)
  ((left :initarg :left)
   (right :initarg :right)))

(defclass star (regex) ((expr :initarg :exp)))

(defgeneric accepts-empty (re))

(defmethod accepts-empty ((re phi)) nil)

(defmethod accepts-empty ((re empty)) t)

(defmethod accepts-empty ((re choice))
  (or (accepts-empty (slot-value re 'left))
      (accepts-empty (slot-value re 'right))))

(defmethod accepts-empty ((re seq))
  (and (accepts-empty (slot-value re 'left))
       (accepts-empty (slot-value re 'right))))

(defmethod accepts-empty ((re star)) t)

(defmethod accepts-empty ((re l)) nil)

(defgeneric part-deriv (regex item))

(defmethod part-deriv ((re phi) atom)
  (make-instance 'phi))

(defmethod part-deriv ((re empty) atom)
  (make-instance 'phi))

(defmethod part-deriv ((re l) atom)
  (if (equalp atom (slot-value re 'letter))
      (make-instance 'empty)
      (make-instance 'phi)))

(defmethod part-deriv ((re choice) atom)
  (with-slots ((left left) (right right)) re
    (make-instance 'choice
                   :left (part-deriv left atom)
                   :right (part-deriv right atom))))

(defmethod part-deriv ((re seq) atom)
  (with-slots ((left left) (right right)) re
    (let ((rn (make-instance 'seq :left (part-deriv left atom) :right right)))
      (if (accepts-empty left)
          (make-instance 'choice :left rn :right (part-deriv right atom))
          rn))))

(defmethod part-deriv ((re star) atom)
  (make-instance 'seq
                 :left (part-deriv (slot-value re 'expr) atom)
                 :right re))

(defgeneric match-regexp (re word))

(defmethod match-regexp ((re regex) word)
  (if (eq 0 (length word))
      (accepts-empty re)
      (match-regexp (part-deriv re (first word)) (cdr word))))

; testing

(match-regexp (make-instance 'choice
                             :left (make-instance 'seq
                                                  :left (make-instance 'l :letter #a)
                                                  :right (make-instance 'l :letter #c))
                             :right (make-instance 'star :exp (make-instance 'l :letter #b)))
               (list #b #b))

It’s definately not as concise as the Scala version, partly because of my decision to use generic functions to imitate Scala’s case-class matching. If any Lispers happen across this, I’d really appreciate any comments about how to improve on this.

0 Responses to “regular expression matching without a state machine”


  1. No Comments

Leave a Reply

You must login to post a comment.