Looping through an S3 Bucket and performing actions using the AWS-CLI and BASH
In this blog post I will go over how to interact with S3 objects via the AWS-CLI. Specifically, I will walk through the steps I took to move objects into a different organizational structure to support a CloudFront distribution standard.
Check your credentials and policy
The first step in working with the aws-cli is to set up and verify that you have the security credentials for any operations you plan to perform. To do this, go to the IAM Management Console and head over to Users. Click on the User you plan on using and take a glance at the Policy (if you do not have a User or a policy, you will need to make one using the provided tools in the console).
Note: You should not edit your bucket policy for these operations. Editing your bucket policy is like setting global permissions; you only want to grant very specific permissions to a specific user.
For S3 operations, you will need a policy with a statement like this:
{
  "Effect": "Allow",
  "Condition": {
    "Bool": {
      "aws:MultiFactorAuthPresent": "true"
    }
  },
  "Action": [
    "s3:*"
  ],
  "Resource": [
    "arn:aws:s3:::bucket-name",
    "arn:aws:s3:::bucket-name/*"
  ]
}
Action: This defines all of the actions that are allowed with this policy. In this case, the wildcard s3:* states that all S3 actions are allowed. If you want to minimize what is allowed, list specific S3 actions instead (for example s3:GetObject, s3:PutObject and s3:ListBucket).
Resource: Resources must point specifically to the S3 bucket you want to work with; list each bucket the user should be able to perform the actions on. Note that if you only have “arn:aws:s3:::bucket-name/*”, you will not be able to perform actions on the bucket itself (calling listObjects on the bucket would not be allowed), which is why having both entries here is important.
Condition: Any special conditions are listed here. In my case, 2-factor authentication is required for this policy to be used. More on how to handle that with the AWS CLI later.
Using the AWS-CLI
AWS Configure
After installing the aws-cli (I personally used brew), you need to configure it. Simply type aws configure in the terminal, then enter the Access Key ID and Secret Access Key that you got when you set up your user, your region name, and your preferred output format (probably json).
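A quick sanity check that the CLI picked up the right credentials (both are standard AWS CLI commands; the output will vary by account):
aws configure list          # shows the access key, profile and region the CLI will use
aws sts get-caller-identity # returns the account and user ARN behind those credentials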
2-Factor Auth in the cli
If 2-factor authentication is required in the policy (and it really should be), it adds an extra step every time you use the cli:
aws sts get-session-token --serial-number arn:aws:iam::12345678:mfa/<username> --token-code 796568
The serial number is listed in the IAM Management Console under Assigned MFA device on the user you are looking at. The token code is whatever the 2-factor authentication app gives you.
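If you prefer the terminal, the same serial number can also be looked up with the CLI (assuming your long-term credentials are allowed to call IAM; <username> is a placeholder as above):
aws iam list-mfa-devices --user-name <username>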
Once you put these fields in correctly, it will return an object like this:
{
  "Credentials": {
    "SecretAccessKey": "secretAccessKeyString",
    "SessionToken": "Session Token",
    "Expiration": "2017-03-13T15:27:29Z",
    "AccessKeyId": "Access Key ID"
  }
}
You now need to make these credentials available by exporting the following variables in your terminal:
export AWS_ACCESS_KEY_ID=accessKeyID
export AWS_SECRET_ACCESS_KEY=secretAccessKey
export AWS_SESSION_TOKEN=sessionToken
Note: because these are exported only in your current terminal session, if you open up a new terminal window you will have to re-export these values.
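Re-exporting these by hand gets tedious, so it can be scripted; here is a minimal sketch, assuming jq is installed (the account number, username and token code are placeholders, as above):
creds=$(aws sts get-session-token --serial-number arn:aws:iam::12345678:mfa/<username> --token-code 123456)
export AWS_ACCESS_KEY_ID=$(echo "$creds" | jq -r '.Credentials.AccessKeyId')
export AWS_SECRET_ACCESS_KEY=$(echo "$creds" | jq -r '.Credentials.SecretAccessKey')
export AWS_SESSION_TOKEN=$(echo "$creds" | jq -r '.Credentials.SessionToken')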
At this point you should have the ability to perform any s3 actions on the buckets stated in the policy
Performing S3 Actions
In the case of my task, I needed to export some information from our database and convert it into a set of old-to-new ID mappings so I could reorganize the structure of our s3 objects. To do this, I needed an associative array, which is not supported in the version of Bash that ships with Macs. So first I had to install Bash 4.0 or higher.
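One way to do that on macOS is with Homebrew (a sketch; the /usr/local/bin path and the myscript.sh name are assumptions that depend on your Homebrew prefix and how you save the script):
brew install bash
/usr/local/bin/bash --version    # should report version 4.x or newer
/usr/local/bin/bash myscript.sh  # run the reorganization script with the newer Bash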
Importing JSON into bash
declare -A myArray
while IFS="=" read -r key value
do
  myArray[$key]="$value"
done < <(jq -r "to_entries|map(\"\(.key)=\(.value)\")|.[]" ~/Desktop/profiles.json)
This converts a simple JSON object into a Bash associative array.
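For reference, the jq expression above assumes ~/Desktop/profiles.json is a flat JSON object mapping old IDs to new IDs. A hypothetical example of such a file (the IDs are made up):
cat > ~/Desktop/profiles.json <<'EOF'
{
  "old-id-1": "new-id-1",
  "old-id-2": "new-id-2"
}
EOF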
Validate the import
It wouldn't hurt to make sure no mistakes were made in this process. Take a minute to loop through the array and make sure everything looks good before you start mutating s3 objects.
count=0
for key in "${!myArray[@]}"
do
  let count=count+1
  echo "$key = ${myArray[$key]}"
done
echo $count
Looping through an s3 Bucket
origin="bucket-name/path/to/folder/"
count=0
for path in $(aws s3 ls $origin);
do
  oldID=${path%/}            # strip the trailing slash to get the old ID
  newID=${myArray[$oldID]}   # look up the new ID in the associative array
  if [[ "$newID" != "" ]]; then
    destination="s3://$origin$newID/$path"
    aws s3 cp "s3://$origin$path" "$destination" --recursive
    let count=count+1
    echo "transferred $count images"
  fi
done
aws s3 ls bucket-name
lists the items at that path location; the for loop then iterates over each whitespace-separated word of that output.
oldID=${path%/}
saves the relative path of the file and removes the trailing slash
newID=${myArray[$oldID]}
retrieves the new ID from the previously created associative array
if [[ "$newID" != "" ]]; then
This ensures that none of the operations are performed on empty paths. Some form of validation is required for these scripts; otherwise you will perform your actions on the other tokens that show up when you call ls, such as PRE, which is just the marker the listing prints in front of each prefix (folder) and does not point to an actual s3 object.
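To illustrate, the raw output of aws s3 ls looks roughly like this (a hypothetical listing; the prefixes and file are made up), and the for loop receives every whitespace-separated word of it:
                           PRE old-id-1/
                           PRE old-id-2/
2017-03-13 10:15:23    48291 stray-file.jpg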
aws s3 cp "s3://$origin$path" "$destination" --recursive
Finally we call the command. In this case we are only copying instead of moving, so reverting in the case of an error is much easier and safer. The --recursive flag copies every file under the source prefix to the destination, preserving the same relative path information.
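Two related safeguards worth knowing about: the s3 commands accept a --dryrun flag that prints what would be transferred without actually doing it, and once the copies have been verified the old prefixes can be removed. The path in the rm example below is hypothetical:
aws s3 cp "s3://$origin$path" "$destination" --recursive --dryrun
aws s3 rm "s3://bucket-name/path/to/folder/old-id-1/" --recursive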
More information on the s3 commands you can run with the AWS-CLI: http://docs.aws.amazon.com/cli/latest/userguide/using-s3-commands.html