Friday, October 16, 2009

Beginning Clojure Macros - Part 1

I finally got around to learning the macro system in Clojure. When I'm picking up a new language, one of the first things I do is write an app that parses log files; I keep some around just for that. There's a lot on the web about learning Clojure, but not as much about its macro system, or about macros in general.

What's a macro?


According to this tutorial on which I've been leaning heavily:
Macros are used to add new constructs to the language. They are code that generates code at read-time.

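To further oversimplify, macros are some really smart text substitution. Sort of. (Strictly speaking, they work on the data structures the reader builds from your source text rather than on the raw text, and ordinary macros run while the code is being compiled.)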

You use macros to re-use code structure. You probably already do this with generic algorithms as part of your abstraction, but macros let you generate that code at compile time instead of writing it out by hand. Applied appropriately, they result in cleaner code.
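As a small illustration of "adding a new construct", here is a sketch of a hypothetical unless macro (the name and the example are mine, not from the tutorial). A macro is essentially a function that receives its arguments as unevaluated code and returns new code, built here with a plain list call:

```clojure
;; Hypothetical macro: evaluate `then` only when `test` is falsey.
;; The macro body builds the form (if test nil then) as a plain list.
(defmacro unless [test then]
  (list 'if test nil then))

;; The test is falsey, so the second form is evaluated.
(unless false :ran)

;; The test is truthy, so the second form is never evaluated:
;; no exception is thrown and the result is nil.
(unless true (throw (Exception. "never reached")))
```

A function couldn't skip that second form, because a function's arguments are all evaluated before the call happens.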

What is macro expansion?


When the compiler takes your source code, one of the first things it does is look for macro calls. It "expands" each call into the code its defmacro produces, replacing one with the other.
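You can watch an expansion happen at the REPL with macroexpand-1. This sketch uses when, a macro that ships with clojure.core; the form being expanded (and the ready? symbol in it) is made up for illustration:

```clojure
;; `when` is a macro from clojure.core. The leading quote keeps the
;; form from being evaluated, so ready? does not need to be defined.
(def expansion
  (macroexpand-1 '(when ready? (println "go"))))

;; Print the expanded form.
(prn expansion)
;; (if ready? (do (println "go")))
```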

Consider the text "The quick brown fox jumps over the lazy dog". There's a structure to that sentence, "The ___ jumps over the lazy dog" (and more that we could abstract away, but we're keeping it simple here). If our macro stands in for the blank, and we've defined it as "gazelle", then the resulting sentence would be "The gazelle jumps over the lazy dog".

Simple enough, right?
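Here's roughly what that sentence example could look like as real code. The macro name jumper is hypothetical; it just builds a str call with the argument dropped into the sentence:

```clojure
;; Hypothetical macro for the sentence template: the argument is
;; dropped into the sentence, and the rest of the structure is fixed.
(defmacro jumper [animal]
  (list 'str "The " animal " jumps over the lazy dog"))

;; What the call expands to, before anything is evaluated:
(prn (macroexpand-1 '(jumper "gazelle")))
;; (str "The " "gazelle" " jumps over the lazy dog")

;; What the expanded code evaluates to:
(println (jumper "gazelle"))
;; The gazelle jumps over the lazy dog
```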

An actual code example



Log files have a bunch of different fields, and they all have their little peculiarities to deal with. To that end, I've got a couple of filter-ish functions: parse-ip and parse-datetime.

The two share a lot of similarities. They both check to see if the data is valid, and if it is, then update the current data and return the results, otherwise return the original results. Inside of the update, we update the map, either creating a new entry with a value of "1", or incrementing the existing entry.

I've added a bunch of comments to help explain the various pieces of a macro.

(defn parse-ip [ip current]
  ;;; accepts the IP address field as a string
  ;;; the current hashmap is updated and returned

  ;;; is it actually an ip address?
  (if
    ;;; probably should use the network lib to be as accurate as possible...
    ;;; but I won't
    (re-matches #"[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+" ip)

    ;;; increment the count
    (assoc current ip
      (if (contains? current ip)
        (+ 1 (get current ip))
        1))

    ;;; not an ip address
    current))

(defn parse-datetime [datetime current]
  (if
    ;;; since we're splitting on spaces, we get back "[19/Jan/2006:04:30:26"
    (re-matches #"\[.{20}" datetime)

    ;;; keep just the date part, e.g. "19/Jan/2006"
    (let [dt (subs datetime 1 12)]
      (assoc current dt
        (if (contains? current dt)
          (+ 1 (get current dt))
          1)))
    current))


We'll take care of the hash update part first, and replace that whole assoc form with a macro.

;;; our macro takes two parameters, like a function
(defmacro inc-summary-counter [hset nkey]
  ;;; see the little "`" at the beginning?
  ;;; that's a syntax quote: "don't evaluate this form, just return it as code"
  `(let
     ;;; there are two "weird" things in this let form:
     ;;; the "#" after the first "hset"
     ;;; and the "~" before the second "hset"
     ;;;
     ;;; the suffix "#" means "generate a symbol for this",
     ;;; essentially a name which is guaranteed unique to this macro expansion
     ;;;
     ;;; the prefix "~" (unquote) means "insert this macro argument here"
     ;;; "~hset" will be replaced with the code that was passed as "hset"
     ;;;
     ;;; the reason for this particular little trick is to make sure that
     ;;; the code passed as "hset" is evaluated only once, and its result
     ;;; kept in a binding
     [hset# ~hset
      nkey# ~nkey]

     ;;; all of this becomes part of the generated code
     (assoc hset# nkey#
       (if (contains? hset# nkey#)
         (+ 1 (get hset# nkey#))
         1))))
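If you want to see what the "#" and "~" actually do, macroexpand-1 will show you the generated code. This is a sketch: the macro is repeated so the snippet runs on its own, and the calls counter and the sample map are made up to check the "evaluated only once" claim.

```clojure
;; Same macro as above, repeated without comments so this runs on its own.
(defmacro inc-summary-counter [hset nkey]
  `(let [hset# ~hset
         nkey# ~nkey]
     (assoc hset# nkey#
       (if (contains? hset# nkey#)
         (+ 1 (get hset# nkey#))
         1))))

;; Look at the generated code. Each "#" name comes back as a unique
;; symbol of the form hset__<number>__auto__, and the number varies.
(prn (macroexpand-1 '(inc-summary-counter current ip)))

;; Hypothetical check that the first argument is evaluated only once:
;; the argument bumps a counter every time it runs.
(def calls (atom 0))
(def result
  (inc-summary-counter (do (swap! calls inc) {"10.0.0.1" 1}) "10.0.0.1"))

(prn result)  ; {"10.0.0.1" 2}
(prn @calls)  ; 1
```

Had the macro pasted ~hset into the body three times instead of binding it once, the counter would have been bumped three times.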


Well, that was a lot to "save time", wasn't it? Here's our updated filter code, now using inc-summary-counter:

(defn parse-ip [ip current]
  (if
    ;;; probably should use the network lib to be as accurate as possible...
    ;;; but I won't
    (re-matches #"[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+" ip)
    (inc-summary-counter current ip)
    current))

(defn parse-datetime [datetime current]
  (if
    ;;; since we're splitting on spaces, we get back "[19/Jan/2006:04:30:26"
    (re-matches #"\[.{20}" datetime)
    (inc-summary-counter current (subs datetime 1 12))
    current))

Much prettier, wouldn't you say? It's nothing you couldn't do with a function, but what matters here is what is happening under the hood.
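For comparison, here is a sketch of how the same counter might look as a plain function, along with one way to run parse-ip over a list of fields. The name inc-summary-counter-fn and the sample fields are made up for illustration:

```clojure
;; Hypothetical function version of inc-summary-counter.
;; (get m k 0) supplies 0 for a missing key, so inc yields 1.
(defn inc-summary-counter-fn [m k]
  (assoc m k (inc (get m k 0))))

;; Sample fields, invented for this example.
(def fields ["10.0.0.1" "not-an-ip" "10.0.0.2" "10.0.0.1"])

(defn parse-ip [ip current]
  (if (re-matches #"[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+" ip)
    (inc-summary-counter-fn current ip)
    current))

;; Fold every field into one summary map, starting from an empty map.
(def summary
  (reduce (fn [current ip] (parse-ip ip current)) {} fields))

(prn summary)  ; {"10.0.0.1" 2, "10.0.0.2" 1}
```

The function does its work when the program runs; the macro version does the same work, but the assoc form is written into parse-ip before it is compiled.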

Everywhere you see inc-summary-counter and its parameters, that expression is replaced with the code its defmacro generates.

Here's parse-ip again, with the macro pseudo-expanded.

(defn parse-ip [ip current]
  (if
    ;;; probably should use the network lib to be as accurate as possible...
    ;;; but I won't
    (re-matches #"[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+" ip)
    ;;; inc-summary-counter was here
    (let
      ;;; the ~hset and ~nkey have been replaced
      ;;; (in the real expansion, hset# and nkey# become generated unique names)
      [hset# current
       nkey# ip]
      (assoc hset# nkey#
        (if (contains? hset# nkey#)
          (+ 1 (get hset# nkey#))
          1)))
    current))

Remember that macro expansion happens before the rest of compilation? After the macros are expanded, the code above is what the compiler actually sees. I'll have a more powerful example next time.
