How to Check Text for Plagiarism

Plagiarism has always been an issue for teachers, writers, editors, and others who deal with words and ideas on a regular basis, and it’s only gotten worse thanks to the advent of the Internet and the copy-paste function. Plagiarism-checking software can help, but not every program has a large database or an accurate algorithm. Some of the sketchier checkers may even use submitted content for their own purposes. Even the best checkers don’t have a 100-percent success rate. But knowing how the tools that check text for plagiarism work will help you decide which ones are worth your time.

How do plagiarism-checkers work?

Every piece of text-matching software has its own approach. Most work on the same basic principle: check entered content against a database of source material and look for similarities. Considering the vast amount of content that may potentially be plagiarized, though, this is not a trivial task. A simple line-by-line search would take forever and be impractically resource-intensive.

That’s why most tools that check text for plagiarism use fingerprinting. For every piece of text in the database and every piece of text they check, they extract a set of samples and run each one through a hashing algorithm, which produces a compact identifier that is, for practical purposes, unique to that input.

If a fingerprint from a paper is identical to one in the database, both were produced from the same sample of text, and the paper may contain plagiarism. Because fingerprinting compares samples rather than full texts, it unavoidably gives up some accuracy. A good fingerprinting algorithm, though, can choose its samples so that it detects not only exact matches but also plagiarism where some of the content has been altered, such as by a spinning program.
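To make the idea concrete, here is a minimal Python sketch of fingerprinting. It is an illustration, not the algorithm any particular checker uses: the five-word sample size, the SHA-1 hash, and the function names (normalize, fingerprints, overlap) are all assumptions chosen for this example.

```python
import hashlib
import re


def normalize(text):
    """Lowercase the text and keep only words, so punctuation and case don't matter."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fingerprints(text, k=5):
    """Hash every run of k consecutive words into a short identifier."""
    words = normalize(text)
    samples = (" ".join(words[i:i + k]) for i in range(len(words) - k + 1))
    return {hashlib.sha1(s.encode("utf-8")).hexdigest()[:16] for s in samples}


def overlap(suspect, source, k=5):
    """Fraction of the suspect's fingerprints that also appear in the source."""
    a, b = fingerprints(suspect, k), fingerprints(source, k)
    return len(a & b) / len(a) if a else 0.0


source = "The quick brown fox jumps over the lazy dog near the quiet river bank"
reworded = source.replace("lazy", "sleepy")

print(overlap(source, source))    # 1.0: every sample matches
print(overlap(reworded, source))  # 0.5: samples that avoid the changed word still match
```

Because the samples overlap, changing a single word only breaks the samples that contain it. That is why a text with a few swapped words can still produce plenty of fingerprint hits.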

If the program finds a fingerprint match, it may simply flag a possible instance of plagiarism and call it a day. Higher-quality software, though, will often then use direct string matching to check the texts line by line. That task becomes much lighter computationally once the database has been narrowed down to a handful of candidate sources. It helps confirm the initial fingerprint hits and provides a lot more data for the humans making the ultimate decision.
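The confirmation step can be sketched with Python's standard difflib module. This is a rough illustration under my own assumptions, not how any of the tools below actually work: the function name matching_passages and the six-word minimum are hypothetical, and the comparison is done word by word rather than line by line.

```python
from difflib import SequenceMatcher


def matching_passages(suspect, source, min_words=6):
    """Return runs of at least min_words consecutive words shared by both texts."""
    a = suspect.lower().split()
    b = source.lower().split()
    # autojunk=False stops difflib from ignoring very common words in long texts.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        " ".join(a[m.a:m.a + m.size])
        for m in matcher.get_matching_blocks()
        if m.size >= min_words
    ]


source = "plagiarism has always been an issue for teachers writers and editors"
suspect = "honestly plagiarism has always been an issue for teachers everywhere"

for passage in matching_passages(suspect, source):
    print(passage)
```

A comparison like this is too slow to run against an entire database, but it is cheap when run only against the few sources that the fingerprint stage has already flagged. The matched passages are the kind of detail a human reviewer can check by eye.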

Things to Look for in a Good Plagiarism Checker

A plagiarism checker should have:

  1. Strong privacy policy (e.g., they don’t store/sell your content)
  2. Large database
  3. Good algorithm

Privacy policy

Many free (or, more often, freemium) plagiarism checkers are legitimate, making money through ads or by selling a premium version. However, some of the less-scrupulous ones may actually be taking the writing you check and using it for their own purposes. It may end up being used as content on a study website or being run through a “spinner” to change its wording and be put up as an article to generate traffic. It’s a good idea to check the privacy policy and do a quick check on the site’s reputation, especially if the service seems a bit sketchy or too good to be true.

Database

If a plagiarism checker doesn’t have access to the right source material, it won’t be able to tell when that material is plagiarized. This is typically the biggest thing that separates lower-quality plagiarism checkers from their premium counterparts. Getting access to collections of books, articles, and other content that is owned by someone else isn’t free or easy, so many tools can only check the Internet. That is where a lot of plagiarism happens, though, so having access to books, journal articles, or other private materials is most important if you’re checking for plagiarism that someone might have put a bit more work into.

Algorithm

Most plagiarism-checkers don’t explicitly reveal their algorithm, but the quality and accuracy of the results are a good indicator of how well-built it is. This can be difficult to measure directly, but looking at how much detail a checker returns, reading user reviews, and testing whether it can detect material you copy from other sources can give you a good idea of how comprehensively it searches. If the free version fails to pick up a copy-paste from a Wikipedia article, for example, you probably can’t expect the paid version to be very thorough.

The Best Plagiarism Checker

Professional-level plagiarism checkers almost all come at a price, and most of the free options available are either worse than Google or have privacy policies that imply they might be using your content for their own purposes. The best you’re likely to get for free is either a few trial pages or a basic report that only tells you whether plagiarism is present. The latter can still be useful since it gives you a quick way to assess whether you should use a more in-depth tool or go through a paper manually.

I tested each of the tools below using several texts (articles I’ve written, Wikipedia entries, and news sources) and all of them were able to accurately identify plagiarized content, along with the sources. I tested quite a few completely free sites, but many of them were unable to identify passages from my articles and even failed to catch copy-pastes from the BBC and Wikipedia, despite a quick Google search popping up with the plagiarized content immediately.

1. Google

If there’s a specific piece of text you suspect of having been plagiarized, Google is actually a great first stop. You can only search for 32 words at a time, but that can often be enough to turn up the website, paper, or book that someone has copied from, even if they’ve altered a few words.
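If you want to check a longer passage this way, you can split it into pieces that fit under the limit and search for each one in turn. Here is a minimal sketch; the function name search_queries is my own, and the default of 32 words simply reflects the limit mentioned above.

```python
def search_queries(text, max_words=32):
    """Split text into consecutive chunks of at most max_words words each."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


# Placeholder text standing in for a 70-word suspect passage.
passage = " ".join(f"word{i}" for i in range(70))

for query in search_queries(passage):
    print(len(query.split()), "words:", query[:40], "...")
```

Each chunk can then be pasted into the search box on its own. The sketch only prepares the text and does not contact Google.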

2. Grammarly

Grammarly requires a subscription to its editing service for you to get full plagiarism results, but there’s no charge for the initial check, which tells you whether there’s likely to be plagiarism or not. That’s more than you get with a lot of other apps, and I found it correctly flagged plagiarism most of the time, making it a good first-line free option.

3. SearchEngineReports

SearchEngineReports is basically a Google wrapper, but it’s free and actually works better than a lot of other free options. It got most of what I put into it correct. It lets you check up to 2,000 words of text per search (with no upper limit on the number of searches) and runs the text through Google piece by piece, telling you which sentences produce hits. It also gives you the option to rewrite plagiarized content to avoid future detection, which I don’t advise.

4. Copyleaks

Copyleaks gives you 2,500 words, or about 10 pages, of free checking. It’s pretty widely used, has a user-friendly interface, and includes a large database of academic and scientific work to check against. If you need to go beyond Internet content, this is a reliable place to start. It got all the online content I threw at it.

5. Quetext

You get three 500-word checks for free, and after that you have to subscribe. Quetext has a good reputation for accuracy and thoroughness, though, and it lived up to that reputation in my tests. Its database includes a lot of books and articles as well as Internet content. If you’re looking for something comprehensive but cheaper than Copyleaks, Quetext is a good place to start.

6. PlagScan

PlagScan has an extensive database of books, articles, and other texts and returns a detailed analysis that, for me, identified just about all of the plagiarized sources. The free trial is good for 2,000 words, after which you’ll have to buy credits to continue. If you don’t have a huge amount of text to check, the system of buying credits to check a certain number of words could end up being cheaper than the subscription options offered by most other plagiarism checkers.

There’s no magic bullet

Plagiarism checkers, especially the budget ones, almost certainly won’t be able to catch everything. If a plagiarist uses obscure sources or rewrites enough, there’s not much a machine can do to flag them, and even knowledgeable humans can be fooled. Checkers can still be a good first line of defense, though, and can at least deter low-effort plagiarism.

Andrew Braun

Andrew Braun is a lifelong tech enthusiast with a wide range of interests, including travel, economics, math, data analysis, fitness, and more. He is an advocate of cryptocurrencies and other decentralized technologies, and hopes to see new generations of innovation continue to outdo each other.
