Manipulate text with sed


Sed is a handy and powerful little text manipulator. The name is short for "stream editor," and its job is to filter and transform text. Typically, sed is used "in transit": you pipe the output of one command into sed, which modifies and reformats that output and emits the result. You can also run sed on a text file; it sends the transformed text to standard output, which can then be redirected into another file.
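As a minimal sketch of the file-based usage, the commands below run sed on a file and redirect the result into a second file. The file names notes.txt and notes.new, the sample text, and the temporary directory are all made up for illustration.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
printf 'hello world\nhello sed\n' > "$workdir/notes.txt"

# sed reads notes.txt and writes the transformed text to standard output;
# the shell redirects that output into notes.new.
sed -e 's/hello/goodbye/' "$workdir/notes.txt" > "$workdir/notes.new"

# Show the result; notes.txt itself is left unchanged.
cat "$workdir/notes.new"
```

Because sed only writes to standard output here, the original file is left as it was.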

The best way to illustrate the power of sed is to provide a few examples:

$ printf "line one\nline two\n" | sed -e 's/.*/( & )/'

( line one )

( line two )

In this example, printf outputs two lines, and sed wraps each one in parentheses. It does this by matching a pattern and replacing the match. The expression takes the form s/[pattern]/[replacement]/. You can use other characters as delimiters; in this case I used the forward slash (/), but you can also use a comma (,) or a pipe (|).

In the above expression, the pattern to match is ".*" (everything); the replacement expression uses the ampersand (&) as a placeholder to indicate all matched text. In this case, it's the entire line, so the replacement text is ( [text] ).
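As a quick sketch of the delimiter point, the commands below repeat the substitution with a comma as the delimiter, then use a pipe for a pattern that itself contains slashes. The /usr/local path and the variable names are made up for illustration.

```shell
# Same wrap-in-parentheses substitution, using a comma as the delimiter.
wrapped=$(printf "line one\nline two\n" | sed -e 's,.*,( & ),')
printf '%s\n' "$wrapped"

# With | as the delimiter, the slashes in the path need no escaping.
moved=$(printf "/usr/local/bin\n" | sed -e 's|/usr/local|/opt|')
printf '%s\n' "$moved"
```

The first command prints the same two parenthesized lines as before; the second prints /opt/bin.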

You can also use sed to transpose text. Assume you had a file with two words per line, but you wished to have the second word displayed first, then the first, separated by a comma:

$ printf "line one\nline two\n" | sed -e 's/\(.*\) \(.*\)/\2,\1/'

one,line

two,line

Here the line "line one" is transformed into "one,line". The pattern uses escaped parentheses to create matching blocks. In other words, the expression \(.*\) \(.*\) matches one string, a space, then another string. Each of these strings is saved in a buffer, referenced as \1 for the first and \2 for the second. (These are usually called back-references; they are not the same thing as sed's hold space.) The replacement expression then uses these references to place the text in the format we want: second string, comma, first string.
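One thing worth knowing about this pattern is that .* is greedy, which matters once a line has more than two words. The sketch below uses a made-up three-word input to show the difference between .* and [^ ]* in the first group; the variable names are illustrative only.

```shell
# .* is greedy, so with three words the first group grabs "one two".
greedy=$(printf "one two three\n" | sed -e 's/\(.*\) \(.*\)/\2,\1/')
printf '%s\n' "$greedy"        # three,one two

# [^ ]* stops at the first space, so the first group is just "one".
first_word=$(printf "one two three\n" | sed -e 's/\([^ ]*\) \(.*\)/\2,\1/')
printf '%s\n' "$first_word"    # two three,one
```

If you want to split on the first space rather than the last, the second form is probably the one you are after.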

You can use sed to do some very interesting things, such as create a command to rename certain files:

$ ls -1 *.txt | sed -e 's/.*/mv & &.old/' >execute; sh execute && rm -f execute

This chain of commands takes the output of ls -1 *.txt and has sed turn each file name into a command; list.txt, for example, becomes mv list.txt list.txt.old. The result is redirected into a file called execute. Once this is complete, execute is run by the sh shell, which performs the mv command on each listed file; when sh has completed successfully, the execute file is removed.
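A slightly more cautious variant, sketched below, prints the generated commands for review before running them and wraps each file name in double quotes so that names containing spaces survive. The temporary directory and the two sample file names are made up for illustration, and the quoting is still only an assumption-level safeguard: names containing double quotes, dollar signs, backticks, or newlines would still break it.

```shell
# Throwaway directory with two sample files, one with a space in its name.
workdir=$(mktemp -d)
touch "$workdir/list.txt" "$workdir/my notes.txt"

(
  cd "$workdir" || exit 1

  # Step 1: only print the generated mv commands so they can be reviewed.
  ls -1 *.txt | sed -e 's/.*/mv "&" "&.old"/'

  # Step 2: once the output looks right, pipe the same commands into sh.
  ls -1 *.txt | sed -e 's/.*/mv "&" "&.old"/' | sh
)

# Both files now carry the .old suffix.
ls -1 "$workdir"
```

Piping straight into sh also avoids the intermediate execute file, at the cost of not keeping a record of what was run.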

This has just scratched the surface of using sed. It is extremely powerful and has many interesting uses, and is definitely worth a closer look.

UPDATE: 11:40 a.m. (ET) on 8/14/07: When first posted there was a problem with the formatting that ignored the escape characters as noted in the comments below. This problem has been corrected. I apologize for the confusion. We will work on resolving this problem for the future.

About

Vincent Danen works on the Red Hat Security Response Team and lives in Canada. He has been writing about and developing on Linux for over 10 years and is a veteran Mac user.

23 comments
arran.price

and basically if I was going to learn sed, grep or awk that I shouldn't. Anything that can be done with any of these tools can be done with perl on the command line (without writing a script). It gives you more power, only means you need to learn the syntax of one tool, and is typically a much more useful tool to learn. That's not to say you need to learn all of perl, just the sed/awk/grep or whatever part of it. They are all fairly similar and hence it's not harder to learn. I gave the same advice to someone who was already very proficient with sed and awk and now uses perl exclusively for the same things. My 2c

lefty.crupps

Maybe I'm just not enough of a programmer to follow this article well, or maybe my Kubuntu is broke (but it's not)... but here are the results of my commands:

eljefe@home:~$ printf "line onenline twon" | sed -e 's/.*/( & )/'

( line onenline twon )eljefe@home:~$

eljefe@home:~$ printf "line onenline twon" | sed -e 's/(.*) (.*)/2,1/'

line onenline twoneljefe@home:~$

Basically nothing was broken down like you showed that it would; it just became a part of the next line, before my prompt. Part of the reason, in my head, is that you didn't give a single example of what was supposed to be the delimiter in each example. How would the command know that some random letter 'n' would mean 'line break' (it wasn't a \n in your example, but maybe it was supposed to be... editors?). The -e flag, for those who have to look (me), means there is a script involved, or so says the help screen. Where is the script? I feel like I am missing a lot here.

The final example, although the most difficult, seemed to me to be the most accurately explained. This is one that I would like to test a few times, and then I would use it often if it works better than the first two. Thanks for the article, but please write it so that we can all understand it, with real examples, not variables in place of variables in place of a useful example (yes I feel that you're two levels away from reality here). But maybe it's just me, that's been known to happen also.

flhtc

Throw in a little grep and awk and you got some real power!

Ajax4Hire

chain several sed commands together. Some stream edits can be quite long and involved. To ease the complexity and increase the understandability, chain several sed commands together. For example: ls -alF | sed -e 's/Jan /01-/' | sed -e 's/Feb /02-/' | ... to replace Jan with 01-, Feb with 02- and so on. Trivial example but illustrates the point.

CharlieSpencer

What's the purpose of the 'n' character in "line onenline twon"? Is it some sort of text separator? How does the sed command know to interpret the 'n' after 'one' and 'two', but not the 'n' in the two occurrences of the word 'line'?

jynyl

The examples don't work.

mark.howe

For the first example to work you need backslashes before each n (\n).

Bernard.L.duBreuil

Also, do you really want a -l flag on the ls command in the last example?

vdanen

That should be "line one\nline two"... let's see if the comments print it out properly. I've just sent an email to my editor to see if those examples can be fixed.

mark.howe

Each 'n' should actually be the new line sequence \n (i.e. n preceded by the escape character).

SnoopDoug

This kind of issue--code in a submission is whacked unrecognizable--is endemic to online publishing. If you belong to any Yahoo group I'm sure you've seen URLs with random spaces. The best solution would be if online publishers gave their authors a "staging" area where the publisher would upload the article and the author could confirm everything works as advertised. This will never happen as all of this information is provided gratis. So folks, you get what you pay for. You want free code snippets, you get an occasional bug (and misspelling). Rather than rag the author, whose code probably worked in the original manuscript, get out your editor and fix it yourself. One additional tidbit would be to add how to put these sed commands into a shell script. There are some additional escape sequences you must use to make this work. Any non-trivial sed command takes a while to get it to run well. If it is useful then you will want to keep it in your bag o' tricks. I must have a dozen or so SH files in my bin directory. doug in Seattle

Selena Frye

Sorry for the confusion. For some reason, the escapes are not being rendered properly in the post. I am working on getting this resolved ASAP.

Andrew T. Fry

All the examples are missing backslashes everywhere; it's as if someone ran s/\\//g over the examples.

vdanen

That's the number one, not the letter "l".

CharlieSpencer

That would explain why the output wasn't ( li e o e li e two ) or ( li ) ( e ) ( o ) ( e ) ( li ) ( e ) ( two ) depending on how I misinterpreted the character :-)

CharlieSpencer

"Rather than rag the author, whose code probably worked in the original manuscript, get out your editor and fix it yourself." This assumes I know what's wrong and how to fix it. I don't think anybody here was ragging the author. Some of us were just confused.

mark.howe

It all looks great now, thanks. As someone else has pointed out elsewhere, it would be best to delete all the posts referring to the (now non-existent) errors; they'll only be confusing to future readers.

vdanen

That must be a symptom of editing... =( Or the software is sanitizing things, because that's not how it was submitted.

vdanen

The formatting of the platform is what wrecked things. It should be ok now. I'm not able to delete posts, but the examples are properly showing the "\n" bit now, so hopefully it will be easier to read.

CharlieSpencer

This is a potentially useful article, but either the formatting or the editing rendered it less effective than it could have been. If it gets cleaned up, it may be worth deleting the posts about the formatting problem since they'll just confuse future readers.

CharlieSpencer

I'm grateful for it. If I'm mocking anything, it's the original article. I'm -very- new to Linux and was not familiar with the \n line separator before your reply. Since I didn't know it existed, I didn't realize the '\' was missing, so the 'n' characters confused the 'l' out of me.

mark.howe

You're absolutely right to mock my reply to your question. I should have written it more carefully, but you've obviously more than understood what I meant to put. Perhaps I should have just corrected the line: printf "line one\nline two\n" | sed -e 's/.*/( & )/'