By Matt Gaidica, June 7, 2012, in Uncategorized

Creating a hash (checksum) for an external file in Ruby

The Problem

We have a file on a server and want to create a hash (or checksum) of it, so that down the road we can compare it to the hashes of other files and see whether the files are the same.

The Solution

Ruby has a class called “Tempfile” that creates a file in a temporary location with a unique name already assigned. It supports normal file operations, and the file is cleaned up when the object is garbage collected. Since we only care about storing the hash, we download the external file with the net/http library, write it to the Tempfile, and unlink (delete) the file when we are done. The digest library provides the MD5 algorithm, which produces a hash from the file's contents, read in as a string. The final hash is then stored, most likely in a database record that references or belongs to the external file.
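The steps described above could be sketched roughly as follows. This is a minimal illustration, not the original listing: the method names `md5_via_tempfile` and `remote_file_md5` and the `'external_file'` prefix are made up for this example, and it assumes the file is small enough to hold in memory and that the URL responds directly without a redirect.

```ruby
require 'net/http'
require 'tempfile'
require 'digest'
require 'uri'

# Hypothetical helper: write `data` to a Tempfile, read it back as a
# string, and return the MD5 hex digest of that string.
def md5_via_tempfile(data)
  tempfile = Tempfile.new('external_file') # Ruby picks a unique name for us
  tempfile.binmode                         # treat the contents as raw bytes
  begin
    tempfile.write(data)
    tempfile.rewind                        # move back to the start before reading
    Digest::MD5.hexdigest(tempfile.read)
  ensure
    tempfile.close
    tempfile.unlink                        # delete the file now instead of waiting for GC
  end
end

# Hypothetical helper: fetch the body at `url` with net/http and hash it.
def remote_file_md5(url)
  md5_via_tempfile(Net::HTTP.get(URI.parse(url)))
end
```

Calling `remote_file_md5` with the address of your file returns a 32-character hex string, which is the value you would save alongside the record for that file.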

Why use a hash?

A hashing algorithm behaves like a lossy type of data compression (yes, it loses data), but it is a formidable way to give a file a fingerprint. It is possible for two different files to generate the same hash; however, the likelihood of that happening by chance is astronomically small. Hashes and checksums are commonly used to check the integrity of files you download, ensuring that the file a website intended to serve is the file that ended up on your computer.
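To make the fingerprint idea concrete, here is a small sketch using only the digest library. The helper names `fingerprint` and `same_content?` are invented for this example, and plain strings stand in for file contents.

```ruby
require 'digest'

# Hypothetical helper: the MD5 fingerprint of some content.
def fingerprint(content)
  Digest::MD5.hexdigest(content)
end

# Hypothetical helper: two pieces of content are treated as the same
# when their fingerprints match.
def same_content?(a, b)
  fingerprint(a) == fingerprint(b)
end
```

Identical content always yields the same fingerprint, a one-character change yields a different one, and the fingerprint stays 32 hex characters long no matter how large the input is, which is the sense in which data is lost.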
