Thirteen Useful Tools for Working with Text on the Command Line

GNU/Linux distributions include a wealth of programs for handling text, most of which are provided by the GNU core utilities. There is a bit of a learning curve, but these utilities are efficient and very useful once you know how to use them.

Here are thirteen powerful text manipulation tools every command-line user should know.

1. cat

Cat was designed to concatenate files but is most often used to display a single file. Without any arguments, cat reads standard input until it reaches end of file: Ctrl + D when typing at a terminal, or the end of another program's output when reading from a pipe. Standard input can also be specified explicitly with a -.

Cat has a number of useful options, notably:

  • -A prints “$” at the end of each line and displays non-printing characters using caret notation.
  • -n numbers all lines.
  • -b numbers lines that are not blank.
  • -s reduces a series of blank lines to a single blank line.

In the following example, we are concatenating and numbering the contents of file1, standard input, and file3.
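
A minimal sketch of such a command follows. The contents of file1 and file3 are made up for illustration, and standard input is fed by printf through a pipe rather than typed at the terminal.

```shell
# Sample files for illustration only; the contents are made up.
tmp=$(mktemp -d)
printf 'first line of file1\n' > "$tmp/file1"
printf 'only line of file3\n' > "$tmp/file3"

# "-" stands for standard input, here fed by printf through the pipe.
# -n numbers every output line across all three sources.
printf 'typed on standard input\n' | cat -n "$tmp/file1" - "$tmp/file3"
```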

2. sort

As its name suggests, sort sorts the lines of a file. It compares lines as text by default and sorts them numerically when given the -n option.
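
The difference between text and numeric ordering is easiest to see with a few numbers. This is only a sketch, and the input values are arbitrary.

```shell
# Default sort compares lines as text, so "10" and "100" come before "9".
printf '10\n9\n100\n' | sort

# -n compares the numeric values instead, giving 9, 10, 100.
printf '10\n9\n100\n' | sort -n
```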

3. uniq

Uniq removes duplicate lines from a file. It only compares adjacent lines, so the input should be sorted first; for that reason, it is often chained with sort in a single command.
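
Here is a small sketch of the sort and uniq combination, using a made-up file called fruit.txt.

```shell
# Sample input with repeated lines, made up for illustration.
tmp=$(mktemp -d)
printf 'pear\napple\npear\nfig\napple\npear\n' > "$tmp/fruit.txt"

# Sorting first puts identical lines next to each other so uniq can drop them.
sort "$tmp/fruit.txt" | uniq

# -c prefixes each remaining line with the number of times it occurred.
sort "$tmp/fruit.txt" | uniq -c
```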

4. comm

Comm is used to compare two sorted files, line by line. It outputs three columns: the first two columns contain lines unique to the first and second file respectively, and the third displays those found in both files.
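
As a rough illustration, assume two already sorted files, a.txt and b.txt, whose names and contents are invented for this example.

```shell
tmp=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$tmp/a.txt"
printf 'banana\ncherry\ndate\n' > "$tmp/b.txt"

# Three tab-indented columns: only in a.txt, only in b.txt, in both.
comm "$tmp/a.txt" "$tmp/b.txt"

# -1 and -2 suppress the first two columns, leaving only the shared lines.
comm -12 "$tmp/a.txt" "$tmp/b.txt"
```

The -1, -2, and -3 options can be combined to hide whichever columns you do not need.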

5. cut

Cut is used to retrieve specific sections of lines, based on characters, fields, or bytes. It can read from a file or from standard input if no file is specified.

Cutting by character position

The -c option specifies a single character position or one or more ranges of characters.

For example:

  • -c 3: the 3rd character.
  • -c 3-5: from the 3rd to the 5th character.
  • -c -5 or -c 1-5: from the 1st to the 5th character.
  • -c 5-: from the 5th character to the end of the line.
  • -c 3,5-7: the 3rd and from the 5th to the 7th character.
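
The selections listed above can be tried on a short string. The input is arbitrary, and the comments show the result of each command.

```shell
# Positions are counted from 1.
echo 'abcdefgh' | cut -c 3        # c
echo 'abcdefgh' | cut -c 3-5      # cde
echo 'abcdefgh' | cut -c -5       # abcde
echo 'abcdefgh' | cut -c 5-       # efgh
echo 'abcdefgh' | cut -c 3,5-7    # cefg
```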

Cutting by field

Fields are separated by a single-character delimiter, which is specified with the -d option and defaults to a tab. The -f option selects a field position or one or more ranges of fields using the same format as above.
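
As a sketch, here is field selection on a made-up, passwd-style record that uses ":" as its delimiter.

```shell
# A hypothetical record; the user name and paths are invented.
record='alice:x:1000:1000:/home/alice:/bin/bash'

echo "$record" | cut -d ':' -f 1      # alice
echo "$record" | cut -d ':' -f 1,5    # alice:/home/alice
echo "$record" | cut -d ':' -f 5-     # /home/alice:/bin/bash
```

When several fields are selected, cut joins them with the same delimiter it used to split the line.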

6. dos2unix

GNU/Linux and Unix usually terminate text lines with a line feed (LF), while Windows uses carriage return and line feed (CRLF). Compatibility issues can arise when handling CRLF text on Linux, which is where dos2unix comes in. It converts CRLF terminators to LF.

In the following example, the file command is used to check the text format before and after using dos2unix.
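
A sketch of that check might look like the following. The file name windows.txt is made up, and because dos2unix and file are not installed on every system, the commands are guarded.

```shell
tmp=$(mktemp -d)
# Create a small sample file with CRLF line endings.
printf 'first line\r\nsecond line\r\n' > "$tmp/windows.txt"

if command -v dos2unix >/dev/null 2>&1 && command -v file >/dev/null 2>&1; then
    file "$tmp/windows.txt"       # typically reports CRLF line terminators
    dos2unix "$tmp/windows.txt"   # converts the file in place
    file "$tmp/windows.txt"       # the CRLF note should be gone
else
    echo "dos2unix or file is not installed; skipping" >&2
fi
```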

7. fold

To make long lines of text easier to read and handle, you can use fold, which wraps lines to a specified width.

Fold strictly matches the specified width by default, breaking words where necessary.
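
For instance, wrapping an arbitrary sample sentence at 12 columns with the -w option splits words wherever the limit falls:

```shell
# The first output line is "The quick br"; "brown" is cut in two.
echo 'The quick brown fox jumps over the lazy dog' | fold -w 12
```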

If breaking words is undesirable, you can use the -s option to break at spaces.
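
A quick sketch with an arbitrary sentence and a width of 12 columns:

```shell
# -s moves each break back to the last space that fits within the width,
# so every word stays whole and no line exceeds 12 columns.
echo 'The quick brown fox jumps over the lazy dog' | fold -s -w 12
```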

8. iconv

This tool converts text from one encoding to another, which is very useful when dealing with unusual encodings. Its basic usage is iconv -f input_encoding -t output_encoding -o output_file input_file, where:

  • “input_encoding” is the encoding you are converting from.
  • “output_encoding” is the encoding you are converting to.
  • “output_file” is the filename iconv will save to.
  • “input_file” is the filename iconv will read from.

Note: you can list the available encodings with iconv -l.
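
As an illustration, the sketch below converts a small ISO-8859-1 file to UTF-8. The file names are made up, and the -o option is assumed to be available, as it is in the iconv program shipped with glibc.

```shell
tmp=$(mktemp -d)
# Write "café" in ISO-8859-1, where é is the single byte 0xE9 (octal 351).
printf 'caf\351\n' > "$tmp/latin1.txt"

# Convert to UTF-8, where é becomes the two bytes 0xC3 0xA9.
iconv -f ISO-8859-1 -t UTF-8 -o "$tmp/utf8.txt" "$tmp/latin1.txt"

# Dump the bytes in hex to confirm the change.
od -An -tx1 "$tmp/utf8.txt"
```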

9. sed

sed is a powerful and flexible stream editor, most commonly used to find and replace strings with a command of the form sed 's/pattern/replacement/g' file.

This command reads from the specified file (or from standard input if no file is given), replaces each part of the text that matches the regular expression pattern with the replacement string, and prints the result to the terminal.
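
A short sketch of a substitution on made-up input read from standard input:

```shell
# With the trailing g, every match on the line is replaced.
echo 'one cat, two cats' | sed 's/cat/dog/g'    # one dog, two dogs

# Without it, only the first match on each line changes.
echo 'one cat, two cats' | sed 's/cat/dog/'     # one dog, two cats
```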

To modify the original file instead, you can use the -i flag.
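
For example, assuming GNU sed and a made-up file called notes.txt:

```shell
tmp=$(mktemp -d)
printf 'colour one\ncolour two\n' > "$tmp/notes.txt"

# -i rewrites notes.txt itself instead of printing to the terminal.
# This form is GNU sed; BSD and macOS sed expect a backup suffix after -i.
sed -i 's/colour/color/g' "$tmp/notes.txt"

cat "$tmp/notes.txt"
```

Since -i overwrites the file, it is worth testing the expression without it first.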

10. wc

The wc utility prints the number of bytes, characters, words, or lines in a file.
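
Each count has its own option, as this sketch on a small made-up file shows.

```shell
tmp=$(mktemp -d)
printf 'one two\nthree\n' > "$tmp/sample.txt"

wc -l "$tmp/sample.txt"    # lines: 2
wc -w "$tmp/sample.txt"    # words: 3
wc -c "$tmp/sample.txt"    # bytes: 14
wc -m "$tmp/sample.txt"    # characters: 14 here, since the text is plain ASCII

# Reading from standard input leaves the file name out of the output.
wc -l < "$tmp/sample.txt"
```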

11. split

You can use split to divide a file into smaller files by number of lines, by size, or into a specific number of files.

Splitting by number of lines
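
A minimal sketch using the -l option; the five-line file data.txt and the output prefix part_ are arbitrary choices for this example.

```shell
tmp=$(mktemp -d)
printf '1\n2\n3\n4\n5\n' > "$tmp/data.txt"

# -l 2 puts at most two lines in each piece: part_aa, part_ab, part_ac.
split -l 2 "$tmp/data.txt" "$tmp/part_"
ls "$tmp"
```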

Splitting by bytes
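
A minimal sketch using the -b option; the 25-byte file data.bin and the output prefix chunk_ are made up.

```shell
tmp=$(mktemp -d)
printf 'abcdefghijklmnopqrstuvwxy' > "$tmp/data.bin"

# -b 10 caps each piece at 10 bytes, giving pieces of 10, 10, and 5 bytes.
split -b 10 "$tmp/data.bin" "$tmp/chunk_"
wc -c "$tmp"/chunk_*
```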

Splitting to a specific number of files
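
A minimal sketch using the -n option of GNU split; the 30-byte file data.bin and the output prefix third_ are made up.

```shell
tmp=$(mktemp -d)
printf 'abcdefghijklmnopqrstuvwxyz0123' > "$tmp/data.bin"

# -n 3 divides the file into three pieces by size.
split -n 3 "$tmp/data.bin" "$tmp/third_"
ls "$tmp"

# Concatenating the pieces in order reproduces the original file.
cat "$tmp"/third_* | cmp - "$tmp/data.bin"
```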

12. tac

Tac, whose name is cat spelled backwards, does what that suggests: it displays files with the lines in reverse order.
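
A quick sketch with three made-up lines read from standard input:

```shell
printf 'first\nsecond\nthird\n' | tac
# third
# second
# first
```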

13. tr

The tr tool is used to translate or delete sets of characters.

A set of characters is usually either a string or ranges of characters. For instance:

  • “A-Z”: all uppercase letters
  • “a-z0-9”: lowercase letters and digits
  • “\n[:punct:]”: newline and punctuation characters

Refer to the tr manual page for more details.

To translate the characters of one set into those of another, pass both sets as arguments: tr SET1 SET2.

For instance, to replace lowercase characters with their uppercase equivalent, you can use the following:
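
A minimal sketch, with an arbitrary input string:

```shell
# Each character in the first set is replaced by the character at the
# same position in the second set; everything else passes through.
echo 'hello, world' | tr 'a-z' 'A-Z'    # HELLO, WORLD
```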

To delete a set of characters, use the -d flag.
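
For example, stripping punctuation from a made-up string:

```shell
# Every character in the [:punct:] class is removed.
echo 'hello, world!' | tr -d '[:punct:]'    # hello world
```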

To delete the complement of a set of characters (i.e. everything except the set), use -dc.
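
For example, keeping only the digits of a made-up string:

```shell
# The set lists what to keep: digits, plus the newline so the output
# still ends with a line break. Everything else is deleted.
echo 'Order #4521, qty 3' | tr -dc '0-9\n'    # 45213
```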

Conclusion

There is plenty to learn when it comes to the Linux command line. Hopefully, the commands above will help you deal with text more effectively on the command line.

Karl Wakim

Karl Wakim is a technical author and Linux systems administrator.

3 comments

  1. IMO, pv is one of the most underrated tools and I use it all the time. It’s basically the equivalent of cat, but it gives you a progress bar along with an estimated time of completion. I find this incredibly helpful to know if I’m going to have to wait 5 seconds or 5 minutes for something to finish. Of course it’s only a guess, but it’s better than any guess I could make on my own.

    Unfortunately I don’t know of any distribution which provides pv by default, but most have a package available in the package manager.

    https://linux.die.net/man/1/pv

  2. Is there a CLI utility that can convert the text portion of a pdf file to another text file containing only the text without any extraneous images and/or coding?

  3. Not to be pedantic, but when you sort you can simply use sort -u, which gives you unique lines in one step.
    And take your cat to the vet: cat -vet is a great command for showing all hidden control characters.
