Verify File Integrity with Hashdeep

File integrity checking confirms that the files on your system have not been modified since you last generated checksums for them. Checksums are often published alongside downloads so you can confirm that the file you received downloaded completely and is identical to the one being offered. The same technique can be used on your server to alert you whenever a file has been modified.

There are a few limitations when using file integrity applications. First, once a system is compromised, can you actually trust the results of the integrity check? Storing your checksum / hash results file off the server is a good start, but applications and the kernel can be modified to return inaccurate results and hide the presence of any modifications. Second, file integrity checks don’t prevent attacks; they can help show what the attacker’s intentions were and alert you after the fact that an attack took place. Finally, files change for legitimate reasons: applications, updates, and users will sometimes modify a file.

Generating Checksum File using Hashdeep

The first step in verifying file integrity requires foresight and should be done when you first set up your server and are sure it hasn’t been compromised. To check the integrity of files, we need to generate a file that contains the hashes of every file on the system. Hashdeep offers a few different algorithms for generating checksums, but I prefer SHA-256 since other algorithms such as MD5 and SHA-1 have published collision attacks.

The command below will create checksums for all regular files in most of the top-level directories, leaving out “/proc”, “/lost+found”, “/media”, “/sys”, and “/mnt”. If you try to hash files in “/proc”, the program will “hang” and never complete. The command saves the hashes in the hashdeep default format (similar to a CSV file), but it’s also possible to save the results as a DFXML file.

$ hashdeep -c sha256 -r -o f /bin /boot /dev /etc /home /lib /lib64 /opt /root /sbin /srv /usr /var > ~/file_hashes

Once all the hashes are computed (about a minute on a new system), you should save the hash file onto a separate system so that the file used to perform audits can’t be altered. If the hash file ends up being modified, you can no longer trust the results of hashdeep.
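One way to act on this before each audit is to compare the hash file on the server against the copy kept elsewhere. The sketch below assumes the trusted copy has already been fetched to a local path; the function name and both paths are hypothetical, and `cmp` only tells you whether the two files differ, not which one is wrong.

```shell
#!/bin/sh
# Succeeds only if the working hash file is byte-for-byte identical to the
# trusted copy that was stored off the server.
# $1 = hash file on the server, $2 = trusted copy (both hypothetical paths).
hash_file_matches_trusted() {
    cmp -s "$1" "$2"
}
```

For example, `hash_file_matches_trusted ~/file_hashes /mnt/trusted/file_hashes || echo "hash file differs, do not trust the audit"`.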

If you are running hashdeep and it hasn’t completed in a reasonable time (about 10 seconds per gigabyte of stored files), you should cancel the operation (CTRL + C). Then rerun the command without redirecting the output to a file and add the -e option, which shows a progress indicator with an estimated time for each file. Watch the terminal and you will eventually see the file that hashdeep gets stuck on and that is preventing it from finishing.

Auditing a System using Hashdeep

Now that the hashes have been generated and the hash file has been stored on a different, secure system, we can run an audit. When running an audit, you need to use the same directories you used when generating your hashes (see the tricks section below) along with the same algorithm.

$ hashdeep -c sha256 -k ~/file_hashes -s -x -r -o f /bin /boot /dev /etc /home /lib /lib64 /opt /root /sbin /srv /usr /var

The above command will output a list of files that have been modified or added to the system. You can replace the -x option with -a to perform an audit that simply reports whether the audit passed or failed; this mode also takes deleted files into account. Adding the option -v or -v -v makes the output more verbose and displays the number of matched, deleted, moved, and new files found.

$ hashdeep -c sha256 -k ~/file_hashes -s -a -v -r -o f /bin /boot /dev /etc /home /lib /lib64 /opt /root /sbin /srv /usr /var
hashdeep: Audit failed
          Files matched: 14425
Files partially matched: 0
            Files Moved: 6
        New files found: 100
  Known files not found: 104
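
The summary above can be turned into a simple pass/fail signal for a script. This is a minimal sketch that only inspects the text hashdeep prints; the function name `audit_result` is hypothetical, and the “Audit passed” wording is an assumption, since only the “Audit failed” line appears in the output above.

```shell
#!/bin/sh
# Classify hashdeep audit output read from stdin.
# "Audit failed" is taken from the sample output above;
# "Audit passed" is an assumed wording for the success case.
audit_result() {
    output=$(cat)
    case $output in
        *"Audit failed"*) echo "FAILED"; return 1 ;;
        *"Audit passed"*) echo "PASSED"; return 0 ;;
        *) echo "UNKNOWN"; return 2 ;;
    esac
}
```

You could pipe the audit command into it, for example `hashdeep -c sha256 -k ~/file_hashes -a -v -r -o f /etc 2>&1 | audit_result`, and send yourself an alert whenever the result is not PASSED.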

Hashdeep Tricks

There are a few tricks to using hashdeep that can greatly help you when creating automated scripts to check the integrity of files on your server. The first trick is to look at the file that was generated by hashdeep if you’re unable to remember the parameters and options you used when creating the hashes. The file’s header will contain the directory and command that you used to generate the file.

%%%% HASHDEEP-1.0
%%%% size,sha256,filename
## Invoked from: /root
## # hashdeep -c sha256 -r -o f /bin /boot /dev /etc /home /lib /lib64 /opt /root /sbin /srv /usr /var
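
If you want a script to read those values back, something like the following could work. It is a rough sketch that assumes the header layout shown above, where the command line begins with “## # ” (root prompt) or “## $ ” (regular user prompt); the function names are hypothetical.

```shell
#!/bin/sh
# Print the command recorded in a hashdeep hash file header.
# Assumes the command line starts with "## # " or "## $ " as in the
# header shown above.
recorded_command() {
    sed -n 's/^## [#$] //p' "$1" | head -n 1
}

# Print the directory hashdeep was invoked from.
recorded_directory() {
    sed -n 's/^## Invoked from: //p' "$1" | head -n 1
}
```

Running `recorded_command ~/file_hashes` would then print the original hashdeep invocation, which you can adapt for the audit.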

The next trick is to create a file listing the paths of the files you want hashed. Using the find command to build that list gives you much finer control over which files get hashed. To use this feature, pass hashdeep the parameter -f <file> and omit the directories at the end of the command.
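
As a rough illustration of that workflow, the sketch below builds the list with find and hands it to hashdeep. The function names and every path are hypothetical, and the list is one path per line, so file names containing newlines would not survive it.

```shell
#!/bin/sh
# Write a list of regular files under the given directories, one path per
# line. -xdev keeps find on each starting directory's own filesystem, so
# other filesystems mounted beneath it are not descended into.
# $1 = list file to write, remaining arguments = directories to scan.
build_file_list() {
    list=$1
    shift
    find "$@" -xdev -type f > "$list"
}

# Hash only the files named in the list (requires hashdeep to be installed).
# $1 = list file, $2 = hash file to write.
hash_from_list() {
    hashdeep -c sha256 -f "$1" > "$2"
}
```

For example, `build_file_list ~/file_list /etc /usr/bin` followed by `hash_from_list ~/file_list ~/file_hashes`. You can add further find tests, such as -name or -mtime, to narrow the list.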

Alternative Methods to Hashdeep

Other solutions for verifying file integrity exist. The most notable options are OSSEC, AIDE, and Tripwire. These alternatives offer features beyond just file integrity checks, such as log and system monitoring.

Another option is to roll your own shell script using the commands md5sum or sha256sum. Because hashdeep is such a small application, it’s probably easier to use hashdeep inside of shell scripts if you need additional customization.
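
For comparison, a bare-bones version of the roll-your-own approach might look like this. It is only a sketch, assuming GNU coreutils sha256sum; the function names and paths are hypothetical, and unlike a hashdeep audit it will not notice newly added files.

```shell
#!/bin/sh
# Record SHA-256 checksums for every regular file under a directory.
# $1 = directory to scan, $2 = manifest file to write (keep it outside $1).
make_manifest() {
    find "$1" -type f -exec sha256sum {} + > "$2"
}

# Re-check the files in the manifest. Prints only mismatched or missing
# files and returns non-zero if any are found.
check_manifest() {
    sha256sum --check --quiet "$1"
}
```

For example, `make_manifest /etc ~/etc_manifest` once, then `check_manifest ~/etc_manifest` on a schedule.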

One other option that roughly replicates the results of file integrity checks is to perform frequent backups using rsync. With rsync, you will be able to see the files that have been modified since the last backup you performed.
