Creating a hash (checksum) for an external file in Ruby
The Problem
We have a file on a server and want to create a hash (or checksum) of it, so we can compare it to the hashes of other files down the road and see whether the files are the same.
The Solution
Ruby has a class called “Tempfile”, which creates a temporary file with a unique name that supports normal file operations and is removed when the object is garbage collected. Since we only care about storing the hash, we download the file with the net/http library, write it to the Tempfile, and unlink (delete) the Tempfile when we are done. The digest library provides an MD5 hash algorithm, which we use to produce a hash from the file's contents, read in as a string. The final hash is stored, most likely in a database record that references or belongs to the external file.
Why use a hash?
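A hashing algorithm is lossy (yes, it loses data): it maps a file of any size to a short, fixed-length value, so the original contents cannot be recovered from it. It is nonetheless an effective way to give a file a fingerprint. Two different files can produce the same hash, but the likelihood of that happening by accident is astronomically small. (MD5 collisions can be constructed deliberately, so it is better suited to detecting accidental changes than tampering.) Hashes and checksums are commonly used to check the integrity of files you download, ensuring that the file a website intended to serve is the file that ends up on your computer.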