6.5 A Final Look at Pattern Matching

We conclude this chapter by presenting sample tasks that involve complex pattern-matching concepts. Rather than solve the problems right away, we'll work toward the solutions step by step.

6.5.1 Deleting an Unknown Block of Text

Suppose you have a few lines with this general form:

the best of times; the worst of times:  moving
The coolest of times; the worst of times:  moving

The lines that you're concerned with always end with moving, but you never know what the first two words might be. You want to change any line that ends with moving to read:

The greatest of times; the worst of times:  moving

Since the changes must occur on certain lines, you need to specify a context-sensitive global replacement. Using :g/moving$/ will match lines that end with moving. Next, you realize that your search pattern could be any number of any character, so the metacharacters .* come to mind. But these will match the whole line unless you somehow restrict the match. Here's your first attempt:

:g/moving$/s/.*of/The greatest of/

This search string, you decide, will match from the beginning of the line to the first of. Since you needed to specify the word of to restrict the search, you simply repeat it in the replacement. Here's the resulting line:

The greatest of times:  moving

Something went wrong. The replacement gobbled the line up to the second of instead of the first. Here's why. When given a choice, the action of "match any number of any character" will match as much text as possible. In this case, since the word of appears twice, your search string finds:

the best of times; the worst of

rather than:

the best of

Your search pattern needs to be more restrictive:

:g/moving$/s/.*of times;/The greatest of times;/

Now the .* will match all characters up to the phrase of times;. Since that phrase occurs only once on the line, the match has to stop at the first of.
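The greedy behavior is easy to reproduce outside the editor. The following is a minimal sketch in Python, assuming its re module as a stand-in for ex (the .* metacharacters are greedy in both); the helper name fix is made up for this illustration.

```python
import re

line = "the best of times; the worst of times:  moving"

# Greedy: .* runs all the way to the LAST "of" on the line.
greedy = re.search(r".*of", line).group(0)

# Restricted: "of times;" occurs only once, so .* has to stop before it.
restricted = re.search(r".*of times;", line).group(0)

def fix(text):
    """Rough equivalent of :g/moving$/s/.*of times;/The greatest of times;/"""
    if re.search(r"moving$", text):          # the :g/moving$/ test
        return re.sub(r".*of times;", "The greatest of times;", text, count=1)
    return text                              # other lines are left alone

print(greedy)
print(restricted)
print(fix(line))
```

The first print shows the text gobbled by the unrestricted pattern, the second shows the shorter match, and the third shows the corrected line.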

There are cases, though, when it is inconvenient, or even incorrect, to use the .* metacharacters. For example, you might find yourself typing many words to restrict your search pattern, or you might be unable to restrict the pattern by specific words (if the text in the lines varies widely). The next section presents such a case.

6.5.2 Switching Items in a Database

Suppose you want to switch the order of all last names and first names in a (text) database. The lines look like this:

Name: Feld, Ray; Areas: PC, UNIX; Phone: 123-4567
Name: Joy, Susan S.; Areas: Graphics; Phone: 999-3333

The name of each field ends with a colon, and each field is separated by a semicolon. Using the top line as an example, you want to change Feld, Ray to Ray Feld. We'll present some commands that look promising but don't work. After each command, we show you the line the way it looked before the change and after the change.

:%s/: \(.*\), \(.*\);/: \2 \1;/

Name: Feld, Ray; Areas: PC, UNIX; Phone: 123-4567	Before
Name: UNIX Feld, Ray; Areas: PC; Phone: 123-4567	After

In this attempt, the first hold buffer contains Feld, Ray; Areas: PC and the second hold buffer contains UNIX. Note that the first hold buffer contains more than you want. Since it was not sufficiently restricted by the pattern that follows it, the hold buffer was able to save up to the second comma. Now you try to restrict the contents of the first hold buffer:

:%s/: \(....\), \(.*\);/: \2 \1;/

Name: Feld, Ray; Areas: PC, UNIX; Phone: 123-4567	Before
Name: Ray; Areas: PC, UNIX Feld; Phone: 123-4567	After

Here you've managed to save the last name in the first hold buffer, but now the second hold buffer will save anything up to the last semicolon on the line. Now you restrict the second hold buffer, too:

:%s/: \(....\), \(...\);/: \2 \1;/

Name: Feld, Ray; Areas: PC, UNIX; Phone: 123-4567	Before
Name: Ray Feld; Areas: PC, UNIX; Phone: 123-4567	After

This gives you what you want, but only in the specific case of a four-letter last name and a three-letter first name. (The previous attempt included the same mistake.) Why not just return to the first attempt, but this time be more selective about the end of the search pattern?

:%s/: \(.*\), \(.*\); Area/: \2 \1; Area/

Name: Feld, Ray; Areas: PC, UNIX; Phone: 123-4567	Before
Name: Ray Feld; Areas: PC, UNIX; Phone: 123-4567	After

This works, but we'll continue the discussion by introducing an additional concern. Suppose that the Areas field isn't always present or isn't always the second field. The above command won't work on such lines.
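To see both the over-eager hold buffers and the weakness of anchoring on literal text, here is a small sketch in Python. It assumes Python's re module in place of ex, so groups are written ( ) rather than \( \). The record without an Areas field is a hypothetical line invented for the illustration.

```python
import re

line = "Name: Feld, Ray; Areas: PC, UNIX; Phone: 123-4567"

# First attempt: both groups are greedy, so group 1 swallows everything
# up to the last ", " that is still followed by a ";".
first_try = re.search(r": (.*), (.*);", line).groups()

# Anchoring on the literal text "; Area" rescues this particular line...
anchored = re.compile(r": (.*), (.*); Area")
fixed = anchored.sub(r": \2 \1; Area", line, count=1)

# ...but a hypothetical record with no Areas field is left untouched.
no_areas = "Name: Feld, Ray; Phone: 123-4567"
unchanged = anchored.sub(r": \2 \1; Area", no_areas, count=1)

print(first_try)
print(fixed)
print(unchanged)
```

The last line printed is identical to no_areas: the pattern never matches, so nothing is switched.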

We introduce this problem to make a point. Whenever you rethink a pattern match, it's usually better to work toward refining the variables (the metacharacters), rather than using specific text to restrict patterns. The more variables you use in your patterns, the more powerful your commands will be.

In the current example, think again about the patterns you want to switch. Each word starts with an uppercase letter and is followed by any number of lowercase letters, so you can match the names like this:

[A-Z][a-z]*

A last name might also have more than one uppercase letter (McFly, for example), so you'd want to search for this possibility in the second and succeeding letters:

[A-Z][A-Za-z]*

It doesn't hurt to use this for the first name, too (you never know when McGeorge Bundy will turn up). Your command now becomes:

:%s/: \([A-Z][A-Za-z]*\), \([A-Z][A-Za-z]*\);/: \2 \1;/

Quite forbidding, isn't it? It still doesn't cover the case of a name like Joy, Susan S. Since the first-name field might include a middle initial, you need to add a space and a period within the last pair of brackets (giving [A-Za-z .]*). But enough is enough. Sometimes, specifying exactly what you want is more difficult than specifying what you don't want. In your sample database, the last names end with a comma, so a last-name field can be thought of as a string of characters that are not commas:

[^,]*

This pattern matches characters up until the first comma. Similarly, the first-name field is a string of characters that are not semicolons:

[^;]*

Putting these more efficient patterns back into your previous command, you get:

:%s/: \([^,]*\), \([^;]*\);/: \2 \1;/
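As a rough check, the sketch below runs both versions of the pattern over the two sample records. It is written in Python on the assumption that its re module behaves like ex for these patterns (groups are ( ) instead of \( \)); the variable names are made up for the example.

```python
import re

records = [
    "Name: Feld, Ray; Areas: PC, UNIX; Phone: 123-4567",
    "Name: Joy, Susan S.; Areas: Graphics; Phone: 999-3333",
]

# "What you want": a capital letter followed by letters.
letters = re.compile(r": ([A-Z][A-Za-z]*), ([A-Z][A-Za-z]*);")

# "What you don't want": anything that isn't the field's delimiter.
negated = re.compile(r": ([^,]*), ([^;]*);")

by_letters = [letters.sub(r": \2 \1;", r, count=1) for r in records]
by_negation = [negated.sub(r": \2 \1;", r, count=1) for r in records]

for before, after in zip(by_letters, by_negation):
    print(before)
    print(after)
```

The letter-based pattern leaves the Joy, Susan S. record unchanged, because the space and period in the first name stop the match. The negated classes switch both records.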

The same command could also be entered as a context-sensitive replacement. If all lines begin with Name, you can say:

:g/^Name/s/: \([^,]*\), \([^;]*\);/: \2 \1;/

You can also add an asterisk after the first space, in order to match a colon that has extra spaces (or no spaces) after it:

:g/^Name/s/: *\([^,]*\), \([^;]*\);/: \2 \1;/
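The effect of the ^Name test and the added asterisk can be sketched in Python as follows. This assumes Python's re module as a stand-in for ex; the function swap_name and the three input lines are hypothetical, chosen only to exercise the spacing and the line selection.

```python
import re

# ": *" accepts a colon followed by zero or more spaces.
swap = re.compile(r": *([^,]*), ([^;]*);")

def swap_name(line):
    # Like :g/^Name/, act only on lines that begin with "Name".
    if not line.startswith("Name"):
        return line
    return swap.sub(r": \2 \1;", line, count=1)

tight = swap_name("Name:Feld, Ray; Phone: 123-4567")      # no space
loose = swap_name("Name:   Feld, Ray; Phone: 123-4567")   # extra spaces
other = swap_name("Note: red, green; blue")               # not a Name line

print(tight)
print(loose)
print(other)
```

Because the replacement text always writes a colon and one space, both Name lines come out with the same spacing, while the line that doesn't begin with Name is returned as it was.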

6.5.3 Using :g to Repeat a Command

So far, we've used the :g command to select lines that are then edited by the command that follows it on the same command line. For example, we select lines with g and then make substitutions on them, or select them and delete them:

:g/mg[ira]box/s/box/square/g
:g/^$/d

However, in his two-part tutorial in UNIX World,[9] Walter Zintz makes an interesting point about the g command. This command selects lines -- but the associated editing commands need not actually affect the lines that are selected.

[9] Part 1, "vi Tips for Power Users," appears in the April 1990 issue of UNIX World. Part 2, "Using vi to Automate Complex Edits," appears in the May 1990 issue. The examples presented are from Part 2.

Instead, he demonstrates a technique by which you can repeat ex commands some arbitrary number of times. For example, suppose you want to place ten copies of lines 12 through 17 of your file at the end of your current file. You could type:

:1,10g/^/ 12,17t$

This is a very unexpected use of g, but it works! The pattern /^/ matches the start of every line, so g selects each of lines 1 through 10 in turn. It selects line 1 and executes the specified t command, then goes on to line 2 to execute the next copy command. Once line 10 has been processed, ex will have made ten copies.
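If it helps to think of this as a loop, the following Python sketch models the file as a list of lines. It is only an analogy for what ex does, not a description of how ex is implemented; the function repeat_copy and the 20-line sample file are invented for the illustration.

```python
def repeat_copy(lines, times, first, last):
    """Append `times` copies of lines first..last (1-based, inclusive)."""
    result = list(lines)
    # The copies land at the end of the file, so lines 12-17 never move
    # and the block can be sliced once.
    block = result[first - 1:last]
    for _ in range(times):       # one pass per line selected by g
        result.extend(block)     # the t$ step: copy the block to the end
    return result

# A hypothetical 20-line file, numbered so the copies are easy to spot.
doc = [f"line {n}" for n in range(1, 21)]
out = repeat_copy(doc, times=10, first=12, last=17)

print(len(out))
print(out[20:26])
```

The selected lines themselves are never touched. They serve only as a counter, which is the point Zintz makes.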

6.5.4 Collecting Lines

Here's another advanced g example, again building on suggestions provided in Zintz's article. Suppose you're editing a document that consists of several parts. Part 2 of this file is shown below, using ellipses to show omitted text and displaying line numbers for reference:

301  Part 2
302  Capability Reference
303  .LP
304  Chapter 7
305  Introduction to the Capabilities
306  This and the next three chapters ...

400  ... and a complete index at the end.
401  .LP
402  Chapter 8
403  Screen Dimensions
404  Before you can do anything useful
405  on the screen, you need to know ...

555  .LP
556  Chapter 9
557  Editing the Screen
558  This chapter discusses ...

821  .LP
822  Part 3:
823  Advanced Features
824  .LP
825  Chapter 10

The chapter numbers appear on one line, their titles appear on the line below, and the chapter text begins on the line below that. The first thing you'd like to do is copy the beginning line of each chapter, sending it to an already existing file called begin.

Here's the command that does this:

:g /^Chapter/ .+2w >> begin

You must be at the top of your file before issuing this command. First you search for Chapter at the start of a line, but then you want to run the command on the beginning line of each chapter -- the second line below Chapter. Because a line beginning with Chapter is now selected as the current line, the line address .+2 will indicate the second line below it. The equivalent line addresses +2 or ++ work as well. You want to write these lines to an existing file named begin, so you issue the w command with the append operator >>.

Suppose you want to send the beginnings of chapters that are only within Part 2. You need to restrict the lines selected by g, so you change your command to this:

:/^Part 2/,/^Part 3/g /^Chapter/ .+2w >> begin

Here, the g command selects the lines that begin with Chapter, but it searches only that portion of the file from a line starting with Part 2 through a line starting with Part 3. If you issue the above command, the last lines of the file begin will read as follows:

This and the next three chapters ...
Before you can do anything useful
This chapter discusses ...

These are the lines that begin Chapters 7, 8, and 9.
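The selection logic can be sketched in Python, treating the document as a list of lines. This is a simplification under stated assumptions: the sample list is a condensed version of the listing above, the lines are returned rather than appended to a file, and the end of the range is simply the first matching line after the start, which glosses over how ex resolves search addresses from the current line. The helper names are made up.

```python
import re

# Condensed version of the sample document (omitted text left out).
doc = [
    "Part 2", "Capability Reference", ".LP",
    "Chapter 7", "Introduction to the Capabilities",
    "This and the next three chapters ...",
    ".LP", "Chapter 8", "Screen Dimensions",
    "Before you can do anything useful",
    ".LP", "Chapter 9", "Editing the Screen",
    "This chapter discusses ...",
    ".LP", "Part 3:", "Advanced Features", ".LP", "Chapter 10",
]

def first_index(lines, pattern, start=0):
    # re.match anchors at the start of the string, like ^ in ex.
    return next(i for i in range(start, len(lines))
                if re.match(pattern, lines[i]))

def chapter_openers(lines, start_pat, end_pat):
    lo = first_index(lines, start_pat)
    hi = first_index(lines, end_pat, lo + 1)
    # For each selected line, take the second line below it (.+2).
    return [lines[i + 2] for i in range(lo, hi + 1)
            if re.match(r"Chapter", lines[i]) and i + 2 < len(lines)]

openers = chapter_openers(doc, r"Part 2", r"Part 3")
print("\n".join(openers))
```

Chapter 10 falls after the line starting with Part 3, so it is never selected.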

In addition to the lines you've just sent, you'd like to copy chapter titles to the end of the document, in preparation for making a table of contents. You can use the vertical bar to tack a second command after your first command, like so:

:/^Part 2/,/^Part 3/g /^Chapter/ .+2w >> begin | +t$

Remember that with any subsequent command, line addresses are relative to the previous command. The first command has marked lines (within Part 2) that start with Chapter, and the chapter titles appear on a line below such lines. Therefore, to access chapter titles in the second command, the line address is + (or the equivalents +1 or .+1). Then use t$ to copy the chapter titles to the end of the file.
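The pairing of two commands per selected line can be modeled in Python like this. The sketch leaves out the Part 2 range restriction to stay short, returns the collected lines instead of writing to the file begin, and uses a hypothetical function name and a trimmed sample document.

```python
import re

def collect(lines):
    """Return (lines bound for begin, new document) for each Chapter line."""
    begin, result = [], list(lines)
    # g marks all matching lines first, so titles copied to the end later
    # are not themselves selected.
    selected = [i for i, text in enumerate(lines)
                if re.match(r"Chapter", text)]
    for i in selected:
        if i + 2 < len(lines):
            begin.append(lines[i + 2])    # .+2w >> begin
        if i + 1 < len(lines):
            result.append(lines[i + 1])   # +t$
    return begin, result

doc = [
    "Chapter 7", "Introduction to the Capabilities",
    "This and the next three chapters ...",
    ".LP", "Chapter 8", "Screen Dimensions",
    "Before you can do anything useful",
]
begin, new_doc = collect(doc)

print(begin)
print(new_doc[len(doc):])
```

Both offsets are counted from the selected Chapter line, since writing lines to a file does not move the current line.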

As these examples illustrate, thought and experimentation may lead you to some unusual editing solutions. Don't be afraid to try things! Just be sure to back up your file first!

