5 Grep Tools for Linux

As every Linux user surely knows, grep is a reliable command-line tool for in-depth file searching. Still, many beginners avoid it because they dislike the terminal. The apps presented in this article aren’t exactly alternatives to grep because in some usage scenarios grep is truly irreplaceable. Instead, let’s call them visual upgrades for grep because they extend grep’s functionality and wrap it in a full-fledged graphical interface.

Regexxer is a practical file search tool that lets you edit files directly from its interface. You can search for files and folders by name and look inside text-based files (including HTML and XML files). The left side of the window lets you select the target folder and pattern (put * for all files or *.txt just for text files). Regexxer can perform recursive search in subfolders of any selected folder and include hidden files in the results.

The right side of the window lets you perform “search and replace” on a selected file. Here you can replace a single instance of a found phrase or all of them automatically. You can also replace the selected phrase in all found files, which is useful for batch editing.
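
For comparison, the same kind of search and batch replace can be approximated in the terminal with grep and sed. Treat this as a rough sketch, not a description of how Regexxer works internally: the demo folder, the file names and the words “colour” and “color” are invented for the example, and the -i and --include options are used the way GNU sed and GNU grep understand them.

```shell
# Hypothetical demo folder so the commands have something to work on.
demo=$(mktemp -d)
mkdir -p "$demo/sub"
printf 'colour one\ncolour two\n' > "$demo/a.txt"
printf 'one colour here\n' > "$demo/sub/b.txt"
printf 'colour in html\n' > "$demo/c.html"

# Recursive search limited to *.txt, similar to a file pattern in a GUI tool.
# -r: descend into subfolders, -l: list only the names of matching files.
matches=$(grep -rl --include='*.txt' 'colour' "$demo" | sort)
echo "$matches"

# Replace every instance in all found files (GNU sed assumed for -i).
echo "$matches" | while IFS= read -r f; do
    sed -i 's/colour/color/g' "$f"
done

# Count matching lines per file: the .txt files now contain "color",
# while the .html file was never touched.
grep -rc 'color' "$demo" | sort
```

Unlike a GUI, sed -i gives you no preview and no undo, so it is worth running the grep step on its own first to see which files would be changed.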

Back in the day Searchmonkey was very popular. At some point, the development of the Linux version ceased, and now the website offers new downloads for Windows only. Still, the old version can be installed from the repositories of nearly every Linux distribution. Perhaps surprisingly, it works great and it’s really fast. You can use Searchmonkey to find files and folders by name, or to look through their contents and search for phrases using regular expressions.

Searchmonkey helps you build complex queries with the File Expression Wizard (activated by clicking the Expression Builder button) and an option called Test Regular Expression (in the Extras menu). It can search for files recursively, and you can set the search depth (how many subfolders it should look into) and filter files by size and date. In the “Options” tab, you can limit the number of files in the results and choose how many lines of context you want to see.
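
The same kinds of filters (search depth, file size, modification date and lines of context) can also be combined on the command line with find and grep. This is only an illustrative sketch of the idea, not what Searchmonkey runs under the hood; the demo folder, the file names and the “error:” pattern are made up, and -maxdepth assumes GNU find.

```shell
# Hypothetical demo tree: one file near the top, one buried deeper.
demo=$(mktemp -d)
mkdir -p "$demo/one/two/three"
printf 'alpha\nerror: disk full\nomega\n' > "$demo/one/log.txt"
printf 'error: too deep\n' > "$demo/one/two/three/deep.txt"

# -maxdepth 2 : go at most two levels below the start folder
# -size -100k : skip large files (keeps files under roughly 100 KiB)
# -mtime -7   : only files modified within the last 7 days
# grep -E     : extended regular expression
# grep -H -n  : print the file name and line number of each match
# grep -C 1   : print one line of context before and after each match
result=$(find "$demo" -maxdepth 2 -type f -name '*.txt' -size -100k -mtime -7 \
    -exec grep -H -n -C 1 -E 'error: [a-z]+' {} +)
echo "$result"
```

Here log.txt is found along with its neighbouring lines, while deep.txt is skipped because it sits below the depth limit.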

Instead of searching your filesystem directly, DocFetcher asks you to build an index and then runs your queries only against the indexed files. It offers a portable version (just unpack it and run the .sh file from the terminal) for both 32- and 64-bit systems. To build an index, right-click in the “Search Scope” area on the left.

You can add folders to the index, pause index creation and continue it later, index archive files (ZIP, TAR) as folders, and exclude selected files from the index with the help of regular expressions.

DocFetcher has a built-in HTML renderer which lets you preview HTML files complete with formatting and images. It offers a privacy-conscious option to remove search history and lets you search for and within files using wildcards, Boolean operators, fuzzy search (which finds similar words), proximity search (which limits how far apart the words may be in the text), and more. DocFetcher supports an impressive number of formats, including Microsoft Office and LibreOffice files (DOC, DOCX, ODT, OTP …), PDF and EPUB, HTML and XML, Outlook PST email files, and audio and image metadata.

Regain is a search engine for your desktop: something like Google, but for your files and folders. It’s written in Java, so it works on Linux, OS X and Windows, provided that you have Java properly installed and configured. The installation file is available on the project website; simply extract it into a folder, open that folder in the terminal and run java -jar regain.jar to start the application. (The file “regain.jar” has to be executable.) Regain will run in your default web browser.

To search for your files and folders, Regain must first crawl your system and build a search index. In the “Preferences” form, you add the folders which you want indexed. If you don’t want to include particular files in the index, blacklist them in the “CrawlerConfiguration.xml” file. Once you start using Regain, it will search the index instead of scanning the entire hard drive. This saves system resources and makes searching faster.

Of all the tools on this list, PDFgrep is the most similar to the original grep, but it’s also “the odd one out”, because it’s a command-line tool. Several distributions offer PDFgrep in their repositories, but the newest version (currently 1.3.2) has to be compiled from source.

While grep can print the number of the line on which the search string appears, PDFgrep shows you the page number instead, which is more useful for PDF files as we tend to read them like books, not analyze them line by line. PDFgrep works only on PDF files, and they must contain actual text: either they were converted from text documents or they have been run through OCR. PDFs that consist only of scanned images won’t work.

To search for a word in a PDF file, type:

pdfgrep word filename.pdf

To ignore the case, use the option -i:

pdfgrep -i word filename.pdf

This will find “Word”, “word”, “WORD” and other possible combinations. If you’re looking for a phrase, enclose it in quotation marks. Some useful options are:

  • -n: outputs the page number for every match
  • -c: prints only the number of matches in a file
  • -p: shows the number of matches per page
  • -C NUMBER: prints the selected number of characters around every match for context. Instead of a number, you can write “line” and PDFgrep will print the entire line.

PDFgrep can recursively search in all subfolders of an active folder and look through multiple PDF files. It also supports regular expressions, and options can be combined:

pdfgrep -nH "Linux world" file1.pdf file2.pdf /home/user/Desktop/newfile.pdf

This prints the page number (because of the -n option) and the filename (because of the -H option) for each match.
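
The recursive search mentioned above has no example of its own, so here is one possible way to wrap it in a small shell function. The function name search_pdfs and the folder in the sample call are made up for illustration and are not part of PDFgrep; the sketch also assumes a PDFgrep version that supports the -r option.

```shell
# Hypothetical helper (not part of PDFgrep): search every PDF below a folder.
search_pdfs() {
    pattern=$1
    folder=${2:-.}   # default to the current folder

    # Fail with a clear message if PDFgrep is missing.
    if ! command -v pdfgrep >/dev/null 2>&1; then
        echo "pdfgrep is not installed" >&2
        return 127
    fi

    # -r: recurse into subfolders, -i: ignore case,
    # -n: print page numbers, -H: print file names
    pdfgrep -r -i -n -H "$pattern" "$folder"
}

# Example call; ~/Documents is only a placeholder folder.
# search_pdfs "Linux world" ~/Documents
```

Putting the function into your shell’s startup file would make it available in every new terminal session.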

Which Linux tools and commands do you use to find files? Share your favorites in the comments below.
