Connecting to LTS¶
LTS is not available as a mounted filesystem on local computers or Cheaha. You must use an interface to transfer data between LTS and whichever machine you are using. There are a variety of interfaces available; some of them are listed here.
Globus¶
Globus is a general file transfer system that operates through a web browser and is recommended for most file transfer needs. UAB has an S3 connector for Globus that can transfer data to and from LTS as long as the user has access to the desired buckets.
To connect to the LTS endpoint in Globus, search for UAB Research Computing LTS in the search bar.
Command Line¶
Linux has very few free, workable GUIs for accessing S3 storage, so almost all tools for transferring data from Cheaha to LTS are command line interfaces (CLIs). The upside is that CLIs offer researchers a much broader range of functionality for managing their LTS buckets.
There are a few different CLIs available to researchers on Cheaha: rclone, s3cmd, and the AWS CLI. This documentation will show how to perform each function using all three tools where possible and will give a comparison chart contrasting what each tool is useful for.
rclone and AWS CLI are available as modules under the rclone and awscli module names. s3cmd should be installed via Anaconda.
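For example, the modules can be loaded on Cheaha like this (exact module versions may vary; check module avail if the default does not load):

module load rclone
module load awscli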
Note
All of these tools are also available for Windows and Mac if you are comfortable using command line interfaces on those platforms. Installation instructions can be found on each tool's website.
Configuration¶
No matter which CLI you use, you will need to perform some configuration before accessing LTS from the command line. This sets up the remote connection from your local machine (or your Cheaha profile) to LTS. If you choose to use multiple CLIs, you will need to perform configuration for each one separately.
rclone¶
The instructions for setting up a remote connection with rclone can be found in our main rclone documentation. In the following examples, uablts is used as the name of the remote connection.
s3cmd¶
s3cmd can be easily installed via an Anaconda environment. Create the environment, activate it, then install using:
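# a minimal sketch; the Anaconda module name, environment name, and Python version are placeholders
module load Anaconda3
conda create -n s3cmd python=3.11
conda activate s3cmd
pip install s3cmd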
Note
Depending on how Anaconda chooses to install the package, the actual s3cmd script may be in your $HOME/.local/bin folder. This folder can be added to your path using PATH=$PATH:$HOME/.local/bin, and you will have access to the s3cmd script after that.
Once you have s3cmd downloaded, you can start the configuration process like so:
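# start the interactive configuration; the optional -c flag writes credentials to a named profile file instead of $HOME/.s3cfg
s3cmd --configure [-c <profile_name>]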
You can run the configuration either with or without the -c option. If you use it, a file named profile_name will be created in your home directory with your login credentials and other information; this can be helpful if you have multiple S3 profiles to manage. If you omit the -c option, a file called $HOME/.s3cfg will be created by default. If you use UAB LTS as your only S3 storage platform, it is suggested to omit the -c option.
Note
After configuration, the s3cmd command will default to using the .s3cfg file for credentials if it exists. If you create a separate named profile file, you will need to add that to the s3cmd call each time you run it.
During configuration, you will be asked to enter some information. You can follow the example below, inputting your user-specific information where required. Values you need to enter are shown in angle brackets.
Access key and Secret key are your identifiers for Amazon S3. Leave them empty for using the env variables.
Access Key: <access key>
Secret Key: <secret key>
Default Region [US]: <leave blank>
Use "s3.amazonaws.com" for S3 Endpoint and not modify it to the target Amazon S3.
S3 Endpoint [s3.amazonaws.com]: s3.lts.rc.uab.edu
Use "%(bucket)s.s3.amazonaws.com" to the target Amazon S3. "%(bucket)s" and "%(location)s" vars can be used if the target S3 system supports dns based buckets.
DNS-style bucket+hostname:port template for accessing a bucket [%(bucket)s.s3.amazonaws.com]: %(bucket)s.s3.lts.rc.uab.edu
Encryption password is used to protect your files from reading by unauthorized persons while in transfer to S3
Encryption password: <leave blank or enter password>
Path to GPG program [/usr/bin/gpg]: <leave blank>
When using secure HTTPS protocol all communication with Amazon S3 servers is protected from 3rd party eavesdropping. This method is slower than plain HTTP, and can only be proxied with Python 2.7 or newer
Use HTTPS protocol [Yes]: <leave blank>
On some networks all internet access must go through a HTTP proxy. Try setting it here if you can't connect to S3 directly
HTTP Proxy server name: <leave blank>
New settings:
Access Key: <access key>
Secret Key: <secret key>
Default Region: US
S3 Endpoint: s3.lts.rc.uab.edu
DNS-style bucket+hostname:port template for accessing a bucket: %(bucket)s.s3.lts.rc.uab.edu
Encryption password:
Path to GPG program: $HOME/bin/gpg
Use HTTPS protocol: True
HTTP Proxy server name:
HTTP Proxy server port: 0
Test access with supplied credentials? [Y/n] Y
Please wait, attempting to list all buckets...
Success. Your access key and secret key worked fine :-)
Now verifying that encryption works...
Not configured. Never mind.
Save settings? [y/N] y
If your test access succeeded, you are now ready to use s3cmd.
AWS CLI¶
Setting up a remote connection using the AWS CLI is fairly straightforward. After loading the awscli module, run aws configure in the terminal.
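For example:

module load awscli
aws configure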
It will ask you to enter four pieces of information; fill them out like so:
AWS Access Key [none]: <access key>
AWS Secret Access Key [none]: <secret key>
Default region name [none]: <Press Enter>
Default output format [none]: json
Your access key and secret key should have been given to you by research computing when you requested your account. Copy-paste these into the terminal when requested.
Important
AWS CLI assumes you are using the AWS service for S3 storage. In order to access UAB LTS, for all AWS CLI commands you will need to add the --endpoint-url https://s3.lts.rc.uab.edu option to the aws function call.
General Command Structure¶
RClone:
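The general shape of an rclone call is roughly the following (a sketch; uablts is the remote name configured earlier):

rclone <subcommand> [options] uablts:<bucket/path>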
Important
For all commands, replace everything in <> with the necessary values. Do not include the <> symbols in your command.
To see a list of all subcommands available to rclone, you can use rclone --help. You can also use the --help option with any subcommand to see a detailed explanation of what it does plus any options you may want or need to set when calling it.
s3cmd:
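A general s3cmd call looks roughly like this (a sketch; the bracketed profile file is optional):

s3cmd [-c <profile_file>] <command> [options] s3://<bucket/path>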
As noted previously, the [-c <profile_file>] option is only required if you are NOT using credentials saved in the $HOME/.s3cfg file. Otherwise, you can leave it out.
To see a list of commands available, use s3cmd --help. Additionally, if you want to test an action without actually running it (i.e. it prints all actions that would be performed), you can add the -n or --dry-run option.
AWS CLI:
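A general AWS CLI call looks roughly like this (a sketch; note the required endpoint option from the note above):

aws <command> <subcommand> [options] --endpoint-url https://s3.lts.rc.uab.edu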
The <command> for most commonly used functions will be either s3 or s3api. You can use the help option to view available commands, subcommands, and options for AWS.
Additionally, when running nearly any AWS CLI command, you can include the --dryrun option to see the exact actions that will be performed without actually performing them. This is useful for things like deleting files and folders, to make sure you are not performing an unwanted action.
Important
If you want to perform actions on a specific directory in S3, it is imperative to add the / at the end of the directory name. For more information on this, see this ask.ci FAQ.
Make a Bucket¶
Buckets are essentially the root folder of a filesystem where you are storing your data. You will need to create a bucket before being able to copy data to LTS. Bucket names are unique across LTS; see avoiding duplicate names for more details.
RClone:
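To create a bucket with rclone:

rclone mkdir uablts:<bucket>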
s3cmd:
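To create a bucket with s3cmd:

s3cmd mb s3://<bucket>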
AWS CLI:
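To create a bucket with the AWS CLI:

aws s3api create-bucket --bucket <bucket> --endpoint-url https://s3.lts.rc.uab.edu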
Listing Buckets and Contents¶
RClone:
To list all buckets you have available, use the lsd subcommand with only the remote specified:
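# list all buckets on the uablts remote
rclone lsd uablts: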
To list all contents inside a bucket, use the ls subcommand with the remote and bucket specified. You can also be specific about the path to the directory you want to list.
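For example:

rclone ls uablts:<bucket/path/>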
This outputs all files along with their directory path recursively. So if you only specify the main bucket, it will output every file in the bucket no matter how deep in the directory tree.
To only list files and folders in a given directory, you can use the lsf subcommand:
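# list only the top level of the given path
rclone lsf uablts:<bucket/path/>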
s3cmd:
With s3cmd, you can use the ls command to list both buckets and their contents:
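# list all buckets
s3cmd ls
# list the contents of a bucket or folder
s3cmd ls s3://<bucket/path/>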
You can add the --recursive option to list all files in the given path. By default, it only lists objects or folders at the top level of the path.
AWS CLI:
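Listing with the AWS CLI uses the s3 ls subcommand. For example:

# list all buckets
aws s3 ls --endpoint-url https://s3.lts.rc.uab.edu
# list the contents of a bucket or folder
aws s3 ls s3://<bucket/path/> --endpoint-url https://s3.lts.rc.uab.edu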
If you would like to list all objects recursively, you can add the --recursive tag. A couple of other helpful options are --summarize and --human-readable, which will give a total number of objects and their size and make the size output more easily readable, respectively.
Check Bucket or Folder Size¶
s3cmd:
With s3cmd, you can use the du command to list the size of a bucket or folder within a bucket:
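# report the total size of a bucket or folder
s3cmd du s3://<bucket/path/>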
By default, the output will be in bytes, but you can add the -H option to output in a human-readable format, rounding to the nearest MB, GB, or TB.
AWS CLI:
By default, the ls subcommand will output the size of any objects in the path given (see the ls section). Unlike the other tools, AWS CLI will not output the total size of folders. However, the total combined size of all objects in a folder can be calculated using the --summarize option. You can also convert the output to a readable size by using --human-readable as well.
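Put together, a call to total up a folder might look like this:

aws s3 ls s3://<bucket/path/> --recursive --summarize --human-readable --endpoint-url https://s3.lts.rc.uab.edu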
Uploading Files and Folders¶
RClone:
Uploading files and folders can be done a couple of ways. The first is by using the copy subcommand. This will add files from the source to the destination:
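# copy files from a local source into a bucket
rclone copy <source> uablts:<bucket/path/destination>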
The second method is using the sync subcommand. This subcommand makes the destination identical to the source. The -i option can be added to make it interactive, asking you whether to copy or delete each file:
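# make the destination identical to the source; -i asks before each copy or delete
rclone sync [-i] <source> uablts:<bucket/path/destination/>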
Danger
Be extremely cautious using sync. If there are files in the destination that are not in the source, it will delete those files in addition to adding files to the destination. If data is deleted from LTS, it is not recoverable.
s3cmd:
s3cmd distinguishes between transferring files from local to S3, from S3 to local, and between two S3 locations using three different commands.
# transfer from local to S3
s3cmd put <source> s3://<bucket/path/destination/>
# transfer from S3 to local
s3cmd get s3://<bucket/path/source/> <destination>
# transfer between two S3 locations
s3cmd cp s3://<bucket/path/> s3://<bucket/path/>
If you are transferring an entire folder from S3 to either another S3 location or a local destination, you will need to add the --recursive option, otherwise you will get an error.
Like rclone and the AWS CLI, s3cmd also has a sync command.
# sync a local source to an S3 location
s3cmd sync <source> s3://<bucket/path/destination>
# sync an S3 location to a local destination
s3cmd sync s3://<bucket/path/source> <destination>
AWS CLI:
Copying files and directories can be managed using the cp subcommand, which has the same behavior as rclone's copy.
aws s3 cp <source> s3://<bucket/path/destination> --endpoint-url https://s3.lts.rc.uab.edu [--recursive]
If you are copying a directory, you will need to add the --recursive option.
If you want to copy data down from LTS to your local machine or Cheaha, just reverse the positions of the source and destination in the function call.
Like rclone, AWS also has a sync subcommand that performs the same functionality. You can use it like so:
aws s3 sync <source> s3://<bucket/path/destination> --endpoint-url https://s3.lts.rc.uab.edu [--delete]
sync has the added benefit that only files that do not exist in the destination or that have changed in the source will be transferred, whereas cp copies everything whether or not it already exists in the destination. By default, sync DOES NOT delete files in the destination that are not in the source like rclone sync does. If you want this functionality, you can add the --delete tag at the end of the function call.
Danger
Be extremely cautious using --delete. Only use it if you are sure any data deleted is not important. Data deleted from LTS is not recoverable.
Deleting Files and Directories¶
RClone:
File deletion is performed using the delete subcommand:
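# delete a single file
rclone delete uablts:<bucket/path/file>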
Directory deletion is handled using the purge subcommand. Be very cautious with this, as it deletes all files and subdirectories within the directory as well.
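For example:

# delete a folder and everything inside it
rclone purge uablts:<bucket/path/>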
s3cmd:
File and directory deletion is handled by the rm command:
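# delete a single file
s3cmd rm s3://<bucket/path/file>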
If you want to delete a directory, you will need to add the --recursive option:
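# delete a folder and its contents
s3cmd rm s3://<bucket/path/> --recursive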
To delete an entire bucket, use the rb command:
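# remove an entire bucket
s3cmd rb s3://<bucket>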
AWS CLI:
The subcommand for deleting files and folders from LTS is rm:
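# delete a single file
aws s3 rm s3://<bucket/path/file> --endpoint-url https://s3.lts.rc.uab.edu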
rm can delete both files and folders. If you want to delete a folder and everything in it, you will need to add the --recursive option. Like with sync, be very cautious using rm and make sure you know what you are deleting before you do so.
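For example, removing a folder and its contents:

aws s3 rm s3://<bucket/path/> --recursive --endpoint-url https://s3.lts.rc.uab.edu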
To delete an entire bucket, you will need to use the s3api command paired with the delete-bucket subcommand. An example of this looks like:
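aws s3api delete-bucket --bucket <bucket> --endpoint-url https://s3.lts.rc.uab.edu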
Command Comparison Chart¶
Note
For brevity, the chart excludes the --endpoint-url option from the AWS CLI commands, but it will need to be included if you choose to use that tool.
Action | rclone | s3cmd | AWS CLI
---|---|---|---
Make Bucket | rclone mkdir uablts:<bucket> | s3cmd mb s3://<bucket> | aws s3api create-bucket --bucket <bucket>
List Buckets | rclone lsd uablts: | s3cmd ls | aws s3 ls
List Files | rclone lsf uablts:<bucket/path/> | s3cmd ls s3://<bucket/path/> | aws s3 ls s3://<bucket/path/>
Full Upload | rclone copy <source> uablts:<bucket/destination> | s3cmd put <source> s3://<bucket/destination/> | aws s3 cp <source> s3://<bucket/destination>
Download | rclone copy uablts:<bucket/source/> <destination> | s3cmd get s3://<bucket/source/> <destination> | aws s3 cp s3://<bucket/source/> <destination>
Sync | rclone sync [-i] <source> uablts:<bucket/destination/> | s3cmd sync <source> s3://<bucket/destination/> | aws s3 sync <source> s3://<bucket/destination/> [--delete]
Delete File | rclone delete uablts:<bucket/path/file> | s3cmd rm s3://<bucket/path/file> | aws s3 rm s3://<bucket/path/file>
Delete Folder | rclone purge uablts:<bucket/path/> | s3cmd rm s3://<bucket/path/> --recursive | aws s3 rm s3://<bucket/path/> --recursive
Delete Bucket | rclone purge uablts:<bucket> | s3cmd rb s3://<bucket> | aws s3api delete-bucket --bucket <bucket>
Cyberduck¶
To access LTS from Windows and Mac, we suggest using the Cyberduck GUI, which is free to download.
Once you have it installed and open, Cyberduck will look like this:
Creating a Connection¶
First, download the UAB CyberDuck Connection Profile. After it's downloaded, double click the file to open it in Cyberduck. It will open the following connection creation window:
Input your Access Key and Secret Access Key sent to you by Research Computing after account creation in their appropriate fields. Once you've entered these keys you can close the connection creation window. This connection with the keys you entered is now saved as a bookmark for easy access in the future. Double click the created bookmark to open the connection to LTS.
Creating a Bucket¶
In order to create a bucket, click File > New Folder... and then name the bucket you would like to create. Once the bucket is created, it will appear in the File window. An example could look like:
The bucket will have the symbol of a hard disk with an Amazon "a" logo on it. This is the root of the file system for that bucket. You can then double click into it to open that file system.
Uploading and Downloading Data¶
Once you're inside the bucket, files can be uploaded easily by dragging and dropping from your local machine into the GUI. You can also use the Upload button in the toolbar to open a file browser and choose what to upload.
Downloading files from the bucket can be done by first selecting the file(s)/folder(s) to download and then clicking the Actions button in the toolbar. In that dropdown will be a Download option. You can also get to this dropdown through the File menu or by right-clicking.
Alternative Interfaces¶
In addition to Cyberduck, there are other GUI-based programs for interfacing with UAB LTS. S3 Browser is an easy-to-use program for uploading and downloading files. However, more sophisticated features, such as setting file permissions, are hidden behind a paywall, and the tool is only available on Windows. Researchers can investigate this tool if desired; however, Research Computing will not provide direct support for this program.