Comparing two files via MD5 hash on Amazon S3 using Ruby
This technique is helpful if you use Amazon S3 as a file repository and want to detect duplicate files as they are uploaded to your application. Amazon S3 gives each object an ETag property. For an object uploaded in a single PUT request, the ETag is the MD5 hash of the file's contents. For multipart uploads (which S3 requires for files larger than 5 GB) the ETag is not a plain MD5 of the file, so the comparison shown here does not apply. Let's get started with a new directory, a file, and the aws-s3 gem.
> mkdir amazon-compare && cd amazon-compare
> touch compare.rb
> sudo gem install aws-s3
The aws-s3 gem is a community library (written by Marcel Molina Jr., not by Amazon) that connects to the S3 REST API, and it comes with great documentation. Make sure you have set up an S3 bucket and have access to your API credentials. Open "compare.rb" and use the following code.
require 'digest/md5'
require 'aws/s3'

# set your AWS credentials
AWS::S3::Base.establish_connection!(
  :access_key_id     => 'XXX',
  :secret_access_key => 'XXX'
)

# get the S3 file (object)
object = AWS::S3::S3Object.find('02185773dcb5a468df6b.pdf', 'your_bucket')

# read the ETag from the object's metadata and strip the surrounding quotes
etag = object.about['etag'].gsub('"', '')

# hash the local file (binread avoids newline and encoding translation)
f = '/Users/matt/Desktop/02185773dcb5a468df6b.pdf'
digest = Digest::MD5.hexdigest(File.binread(f))

# let's see them both
puts digest + ' vs ' + etag

# a string comparison to finish it off
if digest == etag
  puts 'same file!'
else
  puts 'different files.'
end
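The comparison only makes sense when the ETag really is an MD5 digest. Here is a minimal sketch of a guard, assuming the commonly observed format in which multipart ETags carry a `-<part count>` suffix after the hex string. `plain_md5_etag?` is a hypothetical helper name and is not part of the aws-s3 gem.

```ruby
# Hypothetical helper: true when the ETag looks like a bare MD5 hex digest
# (32 hex characters), false for anything else, such as a multipart-style
# ETag with a "-N" suffix. Assumes Ruby 2.4+ for String#match?.
def plain_md5_etag?(etag)
  etag.delete('"').match?(/\A[0-9a-f]{32}\z/i)
end

puts plain_md5_etag?('"d41d8cd98f00b204e9800998ecf8427e"')   # single-part style
puts plain_md5_etag?('"d41d8cd98f00b204e9800998ecf8427e-3"') # multipart style
```

If the guard returns false, it is safer to treat the files as "unknown" rather than "different", since the ETag cannot tell you either way.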
As you can see, the script is just a simple string comparison of two MD5 hashes. You can run the program using the ruby command.
> ruby compare.rb
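To apply the same idea to duplicate detection at upload time, you could compare an incoming file against a list of ETags you have already collected from your bucket. The sketch below is self-contained and does not talk to S3. `normalize_etags` and `duplicate?` are hypothetical helper names, and the list of ETags is assumed to come from wherever your application stores them.

```ruby
require 'digest/md5'
require 'set'
require 'tempfile'

# Hypothetical helper: strip quotes and lowercase each ETag, then build a Set.
def normalize_etags(etags)
  etags.map { |e| e.delete('"').downcase }.to_set
end

# True when the local file's MD5 matches one of the known ETags.
# Digest::MD5.file reads the file in chunks, so large files are not
# loaded into memory all at once.
def duplicate?(path, known_etags)
  normalize_etags(known_etags).include?(Digest::MD5.file(path).hexdigest)
end

# Usage, with a throwaway file standing in for an upload
Tempfile.create('upload') do |file|
  file.write('hello')
  file.flush
  etags = ['"5d41402abc4b2a76b9719d911017c592"'] # MD5 of "hello"
  puts duplicate?(file.path, etags)
end
```

A Set keeps the lookup cheap as the number of stored ETags grows. This only catches duplicates of objects whose ETag is a plain MD5.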