
Making files public in Amazon S3 and the owner problem

The most common way to make files publicly accessible for an Amazon S3 bucket is to add a bucket policy (Bucket Properties -> Add Bucket Policy).

{
  "Version": "2008-10-17",
  "Statement": [{
    "Sid": "AllowPublicRead",
    "Effect": "Allow",
    "Principal": {
      "AWS": "*"
    },
    "Action": ["s3:GetObject"],
    "Resource": ["arn:aws:s3:::bucket/*"]
  }]
}

As you would expect, you replace “bucket” in the Resource line with the name of your own bucket. The syntax is quite familiar: the asterisk acts as a wildcard that matches all files inside the specified bucket.

This still leaves some people in trouble though, because the bucket policy only applies to files that are owned by the bucket’s administrator. If an external application uploads files to your bucket, the policy does not apply, and you can be left with private/inaccessible files. There is a Stack Overflow post that explains this in a bit more depth.
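
If you control the application doing the uploads, another option is to ask for public read access at upload time so the owner problem never comes up. Here is a minimal sketch using the same aws-s3 gem; the file and bucket names are just placeholders, and I believe the :access option is the gem’s way of setting the ACL on store.

#store the file with a public_read ACL so it is readable without a grant afterwards
AWS::S3::S3Object.store(
  'golden_gate_bridge.png',
  File.open('golden_gate_bridge.png'),
  'my_photo_bucket',
  :access => :public_read
)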

Since my application is already using the Ruby library for Amazon S3, the easiest solution was to change the policy on the file itself. The approach is not very clear or elegant in the library’s documentation, so here is the best way.

#get the Amazon S3 object
amazon_object = AWS::S3::S3Object.find('golden_gate_bridge.png', 'my_photo_bucket')
#add a public_read grant to the object's access control list
amazon_object.acl.grants << AWS::S3::ACL::Grant.grant(:public_read)
#write the modified policy back to the object
amazon_object.acl(amazon_object.acl)
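
To sanity check the change, you can list the grants on the object and print its plain, unsigned URL (a quick sketch; if I remember the gem correctly, passing :authenticated => false to url leaves off the signature parameters).

#list the grants now attached to the object
amazon_object.acl.grants.each { |grant| puts grant }
#print the plain URL, which should now be publicly readable
puts amazon_object.url(:authenticated => false)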

The last way to get public access to a private file is to create a public URL for the object. The URL defaults to expiring after 5 minutes, but it can be set to a time far enough in the future that it is likely beyond the needs of your application. The documentation outlines a “doomsday” example.

doomsday = Time.mktime(2038, 1, 18).to_i
url = amazon_object.url(:expires => doomsday)

Using this method exposes three URL parameters in the public URL: “AWSAccessKeyId”, “Expires”, and “Signature”, which junks it up a bit. Check up on the docs for more, and if you are just getting started with Ruby and Amazon S3, one of my previous posts might be of some help.

Comparing two files via MD5 hash on Amazon S3 using Ruby

This technique is helpful if you are using Amazon S3 as a file repository and want to detect duplicate files as they are uploaded to your application. Amazon S3 gives each file an ETag property, which is an MD5 hash of the file, although in some cases this is not true (multipart uploads and files over 5GB, it seems). Let’s get started with a new directory, a file, and the Amazon S3 gem.

> mkdir amazon-compare && cd amazon-compare
> touch compare.rb
> sudo gem i aws-s3

The aws-s3 gem connects to the S3 REST API and comes with great documentation. Make sure you have set up an S3 bucket and have access to your API credentials. Open “compare.rb” and use the following code.

require 'digest/md5'
require 'aws/s3'

#set your AWS credentials
AWS::S3::Base.establish_connection!(
  :access_key_id     => 'XXX',
  :secret_access_key => 'XXX'
)

#get the S3 file (object)
object = AWS::S3::S3Object.find('02185773dcb5a468df6b.pdf', 'your_bucket')
#get the ETag and strip the surrounding quotation marks
etag = object.about['etag'].gsub('"', '')

#get the local file
f = '/Users/matt/Desktop/02185773dcb5a468df6b.pdf'
digest = Digest::MD5.hexdigest(File.read(f))

#let's see them both
puts digest + ' vs ' + etag

#a string comparison to finish it off
if digest.eql? etag
  puts 'same file!'
else
  puts 'different files.'
end

As you can see, we are just doing a simple comparison of two MD5 hashes. You can run the program using the ruby command.

> ruby compare.rb
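
One caveat worth guarding against, per the multipart note above: ETags for multipart uploads are not plain MD5 hashes and contain a dash followed by the part count, so the comparison is meaningless in that case. A small guard you could drop into compare.rb before the final comparison (a sketch, not part of the gist):

#multipart ETags look like "<hash>-<part count>" rather than a plain MD5,
#so bail out before comparing in that case
abort 'multipart ETag, cannot compare via MD5' if etag.include?('-')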

View the GitHub Gist