better string.match

Extend string.match to test and capture patterns

Pattern matching is a powerful feature of Lua’s standard string library. I use it often to automate text file conversion and reporting. For instance, I use pattern matching in a Lua script to format and email my weekly report.

I have maintained a weekly diary of projects I’ve worked on, accomplishments, meetings attended, plans, business travel, and other information since 1988. Each week is logged in a simple text file. This means that despite the evolution of text editors and file media over the decades, I can still read my original diary files. The Lua script that prepares my weekly report scans my latest diary file and uses pattern matches to extract and format an email that adheres to the current reporting standards where I work.

The string.match function is useful both for testing whether a pattern exists in a string and for extracting the substrings matched by the parts of the pattern enclosed in parentheses. Lua calls these substrings “captures”. Often I want to do both at once: test for a pattern in an if statement and capture substrings. For instance, in my weekly report generator, I have a bit of code:

if line:match("^Weekly Report.+(%d%d)/(%d%d)/(%d%d%d%d)") then
    local month = _1
    local day   = _2
    local year  = _3
    email:setSubject(("John Powers - %s-%s-%s Weekly Report"):format(year, month, day))
end

The first line checks whether the variable line starts with the text “Weekly Report” and contains a date. If it matches, as a side effect it also sets the global variables _1, _2, and _3 to the captures, i.e. the month, day, and year captured from the string line. The built-in definition of string.match does not have this side effect; it only returns the captures. But we can extend string.match to gain this new capability.
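For comparison, here is a minimal sketch of the same test written against the stock string.match, which returns its captures rather than storing them anywhere. The sample line and the subject variable are made up for illustration; only the pattern comes from the snippet above.

```lua
-- Sketch using only the built-in string.match (no globals involved).
-- The sample line is a made-up diary heading, not real data.
local line = "Weekly Report for week ending 03/14/2008"

-- string.match returns the captures in order, or nil if there is no match.
local month, day, year = line:match("^Weekly Report.+(%d%d)/(%d%d)/(%d%d%d%d)")

local subject  -- hypothetical stand-in for the email subject
if month then
    subject = ("%s-%s-%s Weekly Report"):format(year, month, day)
end
print(subject)
```

The price is an extra assignment line before the if statement, which is exactly what the side-effect version avoids.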

Here is the code I used to modify string.match.

do
    local smatch = string.match             -- keep the original definition of string.match
    local unpack = table.unpack or unpack   -- unpack moved to table.unpack in Lua 5.2

    -- String matching function
    -- Same results as string:match but as a side effect
    -- places the captures in global variables _1, _2, ...
    -- (on a failed match the previous values of _1, _2, ... are left untouched)
    function string:match(pat, init)
        local matches = {smatch(self, pat, init)}  -- call the original match to do the work
        for i = 1, #matches do                 -- #matches == 0 if there was no match
            _G["_" .. i] = matches[i]          -- assign captures to global variables
        end
        return unpack(matches)                 -- return original results
    end
end

Note the use of a do … end block. It limits the scope of the local variable smatch, so only the new function string.match can call the original.
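To see the extension at work, the following self-contained sketch repeats the override and runs it on two made-up lines. It uses table.unpack or unpack so that it runs on Lua 5.1 as well as later versions, and it passes the optional third argument of string.match (init) through to the original. It also illustrates a caveat worth keeping in mind: a failed match leaves the earlier values of _1, _2, ... in place, so they should be read only inside the if statement that tested the match.

```lua
-- Self-contained sketch: the override plus two sample calls.
do
    local smatch = string.match
    local unpack = table.unpack or unpack   -- works on Lua 5.1 and on 5.2+

    function string:match(pat, init)
        local matches = {smatch(self, pat, init)}
        for i = 1, #matches do
            _G["_" .. i] = matches[i]
        end
        return unpack(matches)
    end
end

local pat = "^Weekly Report.+(%d%d)/(%d%d)/(%d%d%d%d)"

-- A made-up line that matches: the captures land in _1, _2, _3.
local hit = ("Weekly Report for 03/14/2008"):match(pat) ~= nil
local first = { _1, _2, _3 }

-- A made-up line that does not match: the test fails, but the
-- globals still hold the captures from the previous call.
local miss = ("Meeting notes"):match(pat) ~= nil
local stale = _1

print(hit, first[1], first[2], first[3])
print(miss, stale)
```

If stale values are a concern, one option is to set the old globals to nil before each call, at the cost of remembering how many captures the previous match produced.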

Placing captures into global variables is nothing new. Anyone familiar with the AWK, Perl, and Ruby scripting languages will recognize this feature right away.
