Keith Devens .com

Tuesday, January 17, 2006

Unix command line scripting question

I'm wondering if it's possible to do the following just using unix command line utilities. Given a text file with duplicate lines, give a printout of all unique lines in the file and a count of how many copies there are of each.

Example file:

foo
bar
foo
baz
bar

Example output:

foo    2
bar    2
baz    1

It'd be trivial for me to write a script to accomplish this. I'm just wondering if there's a simple way to do it with sort, uniq, and awk or whatever.

Update: Solutions in the comments (thanks!).

Also, that should probably be "bar 2, foo 2, baz 1" for that last example, since "bar" comes before "foo" alphabetically. Ironically, though, the order I gave is what you get with the straightforward sort -rn.

Comments

Jeff (http://www.scraprap.com) wrote:

sort file | uniq -c
2 bar
1 baz
2 foo

More work is needed to put the count after the line, though...

∴ Jeff | 17-Jan-2006 3:16pm est | http://www.scraprap.com | #9010

Joseph Scott (http://joseph.randomnetworks.com/) wrote:

To get the list sorted by count:

sort strings | uniq -c | sort -rn
2 foo
2 bar
1 baz

I agree with Jeff, getting the numbers after the string takes more work. A little bit of awk will do the trick:

sort strings | uniq -c | sort -rn | awk '{print $2 " " $1}'
foo 2
bar 2
baz 1

∴ Joseph Scott | 17-Jan-2006 3:52pm est | http://joseph.randomnetworks.com/ | #9012

Jeff (http://www.scraprap.com) wrote:

Ah yes, the final sort and awk. I rarely have reason to use awk and should have spent a minute in the man page. Swapping fields is the second example. :-)

Joseph, apparently you don't need to spell out the space between the fields: awk '{print $2, $1}' gives the same output. The comma tells print to separate the two values with awk's output field separator (OFS), which defaults to a single space.

∴ Jeff | 17-Jan-2006 10:17pm est | http://www.scraprap.com | #9014

Keith (http://keithdevens.com/) wrote:

You guys rock.

Keith | 18-Jan-2006 6:43am est | http://keithdevens.com/ | #9015

Keith (http://keithdevens.com/) wrote:

Hmm, something's not quite right.

$ sort strings.txt | uniq -c | sort -rn
      2 foo
      2 bar
      1 baz

$ sort strings.txt | uniq -c | sort -rn | awk '{print $2, $1}'  
 2o
 2r
 1z

That's weird. (Note: I'm using cygwin).

Also, any clue how to get it to sort reverse numerically, then alphabetically, so it's "bar 2, foo 2, baz 1"? :-)

Keith | 18-Jan-2006 7:21am est | http://keithdevens.com/ | #9016
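
A possible explanation for the chopped output, offered as an assumption rather than something the thread confirms: if strings.txt was saved with Windows (CRLF) line endings, each line ends in a carriage return that sort and uniq pass through untouched. awk then sees "foo\r" as the second field, so it prints "foo", the carriage return sends the cursor back to column 0, and " 2" overwrites the first two characters. The Python sketch below only simulates that effect; `awk_swap` and `render` are hypothetical helper names, and `render` is a rough model of a terminal, not a real one.

```python
import re


def awk_swap(record: str) -> str:
    """Mimic awk '{print $2, $1}' for one record.

    Fields are split on spaces and tabs only, so a trailing carriage
    return stays attached to the last field (an assumption about how
    awk treated the input here).
    """
    fields = re.split(r"[ \t]+", record.strip(" \t"))
    return f"{fields[1]} {fields[0]}"


def render(text: str) -> str:
    """Roughly simulate how a terminal draws one line containing '\\r'."""
    cells = []  # characters currently visible on the line
    col = 0     # cursor column
    for ch in text:
        if ch == "\r":
            col = 0  # carriage return: jump back to the start of the line
        elif col < len(cells):
            cells[col] = ch  # overwrite whatever is already drawn there
            col += 1
        else:
            cells.append(ch)
            col += 1
    return "".join(cells)


if __name__ == "__main__":
    # One line of `uniq -c` output from a file with CRLF endings (assumed).
    print(repr(render(awk_swap("      2 foo\r"))))  # ' 2o'
    # The same line from a file with plain LF endings.
    print(repr(render(awk_swap("      2 foo"))))    # 'foo 2'
```

If that is the cause, stripping the carriage returns before sorting (for example by starting the pipeline with tr -d '\r' < strings.txt) should bring back "foo 2".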

Davd wrote:

I'd love to see this in Perl and Python. Why? Because I've done this manually a lot and my brain hurts this morning.

∴ Davd | 18-Jan-2006 9:21am est | #9022

Keith (http://keithdevens.com/) wrote:

Ok, well here's a very short and dirty Python script that does it:

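import sys
from collections import Counter

# Count each distinct line, ignoring trailing whitespace and the newline.
with open(sys.argv[1]) as f:
    counts = Counter(line.rstrip() for line in f)
# Highest count first; ties are broken alphabetically.
for line, count in sorted(counts.items(), key=lambda item: (-item[1], item[0])):
    print(line, count, sep="\t")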

Probably fairly straightforward to translate into Perl.

Keith | 18-Jan-2006 10:47am est | http://keithdevens.com/ | #9024

Jeff (http://www.scraprap.com) wrote:

I don't know why it chops it that way on cygwin. You could try Joseph's awk syntax awk '{print $2 " " $1}' and see if that makes a difference.

I tested on OS X, FreeBSD, and RedHat with the same results.

On my systems, this does the sorting the way you want:
sort strings.txt | uniq -c | sort -t " " -k1rn -k2 | awk '{print $2, $1}'
bar 2
foo 2
baz 1

∴ Jeff | 18-Jan-2006 11:24am est | http://www.scraprap.com | #9025

Keith (http://keithdevens.com/) wrote:

> You could try Joseph's awk syntax awk '{print $2 " " $1}' and see if that makes a difference.

Sorry, I should have mentioned that I already had and it didn't. It's weird.

And, thanks for the lesson in shell-fu.

Keith | 18-Jan-2006 11:38am est | http://keithdevens.com/ | #9026
