Connecting to LTS¶
LTS is not available as a mounted filesystem on local computers or Cheaha. You must use an interface to transfer data between LTS and whichever machine you are using. There are a variety of interfaces available, some of which are listed here.
Globus is a general file transfer system that operates through a web browser and is recommended for most file transfer needs. UAB has an S3 connector for Globus that can transfer data to and from LTS as long as the user has access to the desired buckets.
To connect to the LTS endpoint in Globus, search for UAB Research Computing LTS in the search bar.
Linux has very few free, workable GUIs capable of accessing S3 storage, so almost all tools for transferring data from Cheaha to LTS are command line interfaces (CLIs). The upside is that CLIs offer researchers a much broader range of functionality for managing their LTS buckets.
There are a few different CLIs available to researchers on Cheaha. The CLIs currently available on Cheaha are rclone, s3cmd, and the AWS CLI. This documentation shows how to perform each function using all three tools where possible and gives a comparison chart contrasting what each tool is useful for.
rclone and the AWS CLI are available as modules under the rclone and awscli module names, respectively. s3cmd should be installed via Anaconda.
Of note, all of these tools are available for Windows and Mac as well if you are comfortable using command line interfaces on those platforms. There are installation instructions for each of these tools on their respective websites.
In order to access LTS through the command line, no matter which CLI you use, you will need to perform some configuration. This will set up the remote connection from your local machine (or your Cheaha profile) to LTS. If you choose to use multiple CLIs, you will need to perform the configuration for each one separately.
The instructions for setting up a remote connection with rclone can be found in our main rclone documentation. In the following examples, uablts is used as the name of the remote connection.
s3cmd can be easily installed via an Anaconda environment. Create the environment, activate it, then install using:
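A minimal sketch of this setup (the environment name s3 and the Anaconda3 module name are assumptions; adjust to your setup):

```shell
# load Anaconda, then create and activate a fresh environment
module load Anaconda3
conda create -n s3 python
conda activate s3

# install s3cmd into the active environment via pip
pip install s3cmd
```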
Depending on how Anaconda chooses to install the package, the actual s3cmd script may be in your $HOME/.local/bin folder. This folder can be added to your path using export PATH=$PATH:$HOME/.local/bin, after which the s3cmd script will be accessible.
Once you have s3cmd downloaded, you can start the configuration process like so:
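A sketch of the invocation (the optional -c flag names a separate profile file, as described below):

```shell
# start interactive configuration; omit -c to write to $HOME/.s3cfg
s3cmd --configure [-c <profile_name>]
```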
You can run the configuration either with or without the -c option. If you use it, a file named profile_name will be created in your home directory with your login credentials and other information. If you omit the -c option, a file called $HOME/.s3cfg will be created by default. A named profile file can be helpful if you have multiple S3 profiles you are using. If you use UAB LTS as your only S3 storage platform, it's suggested to omit the -c option.
After configuration, the s3cmd command will default to using the .s3cfg file for credentials if it exists. If you create a separate named profile file, you will need to add it to the s3cmd call each time you run it.
During configuration, you will be asked to enter some information. You can follow the example below, inputting your user-specific information where required. Lines requiring user input are highlighted.
```text
Access key and Secret key are your identifiers for Amazon S3. Leave them empty for using the env variables.
Access Key: <access key>
Secret Key: <secret key>
Default Region [US]: <leave blank>

Use "s3.amazonaws.com" for S3 Endpoint and not modify it to the target Amazon S3.
S3 Endpoint [s3.amazonaws.com]: s3.lts.rc.uab.edu

Use "%(bucket)s.s3.amazonaws.com" to the target Amazon S3. "%(bucket)s" and "%(location)s" vars can be used if the target S3 system supports dns based buckets.
DNS-style bucket+hostname:port template for accessing a bucket [%(bucket)s.s3.amazonaws.com]: %(bucket).s3.lts.rc.uab.edu

Encryption password is used to protect your files from reading by unauthorized persons while in transfer to S3
Encryption password: <leave blank or enter password>
Path to GPG program [/usr/bin/gpg]: <leave blank>

When using secure HTTPS protocol all communication with Amazon S3 servers is protected from 3rd party eavesdropping. This method is slower than plain HTTP, and can only be proxied with Python 2.7 or newer
Use HTTPS protocol [Yes]: <leave blank>

On some networks all internet access must go through a HTTP proxy. Try setting it here if you can't connect to S3 directly
HTTP Proxy server name: <leave blank>

New settings:
  Access Key: <access key>
  Secret Key: <secret key>
  Default Region: US
  S3 Endpoint: s3.lts.rc.uab.edu
  DNS-style bucket+hostname:port template for accessing a bucket: %(bucket).s3.lts.rc.uab.edu
  Encryption password:
  Path to GPG program: $HOME/bin/gpg
  Use HTTPS protocol: True
  HTTP Proxy server name:
  HTTP Proxy server port: 0

Test access with supplied credentials? [Y/n] Y
Please wait, attempting to list all buckets...
Success. Your access key and secret key worked fine :-)

Now verifying that encryption works...
Not configured. Never mind.

Save settings? [y/N] y
```
If your test access succeeded, you are now ready to use s3cmd.
Setting up a remote connection using the AWS CLI is fairly straightforward. After loading the awscli module, run aws configure in the terminal.
It will ask you to enter four pieces of information; fill them out like so:
```text
AWS Access Key [none]: <access key>
AWS Secret Access Key [none]: <secret key>
Default region name [none]: <Press Enter>
Default output format [none]: json
```
Your access key and secret key should have been given to you by Research Computing when you requested your account. Copy and paste these into the terminal when requested.
The AWS CLI assumes you are using the AWS service for S3 storage. In order to access UAB LTS, you will need to add the --endpoint-url https://s3.lts.rc.uab.edu option to every aws command.
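For example, listing your buckets would look something like:

```shell
# point the AWS CLI at the UAB LTS endpoint instead of AWS
aws s3 ls --endpoint-url https://s3.lts.rc.uab.edu
```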
General Command Structure¶
For all commands, replace everything in <> with the necessary values. Do not include the <> symbols in your command.
To see a list of all subcommands available to rclone, use rclone --help. You can also use the --help option with any subcommand to see a detailed explanation of what it does plus any options you may want or need to set when calling it.
As noted previously, the -c profile_file option is only required if you are NOT using credentials saved in the $HOME/.s3cfg file. Otherwise, you can leave it out.
To see a list of available commands, use s3cmd --help. Additionally, if you want to test an action without actually running it (i.e. it prints all actions that would be performed), you can add the --dry-run option.
The <command> for the most commonly used functions will be either s3 or s3api. You can use the help option to view available commands, subcommands, and options for the AWS CLI.
Additionally, when running nearly any AWS CLI command, you can include the --dryrun option to see the exact actions that would be performed without actually performing them. This is useful for things like deleting files and folders, to make sure you are not performing an unwanted action.
If you want to perform actions on a specific directory in S3, it is imperative to add the / at the end of the directory name. For more information on this, see this ask.ci FAQ.
Make a Bucket¶
Buckets are essentially the root folder of a filesystem where you are storing your data. You will need to create a bucket before you can copy data to LTS. Bucket names are unique across LTS; see avoiding duplicate names for more details.
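As a sketch, assuming the rclone remote is named uablts as configured above, a bucket can be created with any of the three tools:

```shell
# rclone: create a bucket at the top level of the remote
rclone mkdir uablts:<bucket>

# s3cmd: make a bucket (add [-c profile_file] if using a named profile)
s3cmd mb s3://<bucket>

# AWS CLI: make a bucket, pointing at the LTS endpoint
aws s3 mb s3://<bucket> --endpoint-url https://s3.lts.rc.uab.edu
```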
Listing Buckets and Contents¶
To list all buckets you have available, use the lsd subcommand with only the remote specified:
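A minimal example, assuming the remote is named uablts:

```shell
# list all buckets (top-level "directories") on the remote
rclone lsd uablts:
```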
To list all contents inside a bucket, use the ls subcommand with the remote and bucket specified. You can also be specific about the path to the directory you want to list.
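For example, assuming the uablts remote:

```shell
# list all files under the given bucket or path
rclone ls uablts:<bucket>/<path/to/directory>
```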
This outputs all files along with their directory path recursively. So if you only specify the main bucket, it will output every file in the bucket no matter how deep in the directory tree.
To only list files and folders in a given directory, you can use the lsf subcommand instead.
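A sketch of a non-recursive listing, assuming the uablts remote:

```shell
# list only the files and folders at the top level of the given path
rclone lsf uablts:<bucket>/<path/to/directory>
```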
With s3cmd, you can use the ls command to list both buckets and their contents.
You can add the --recursive option to list all files in the given path. By default, it only lists objects or folders at the top level of the path.
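For example:

```shell
# list all buckets
s3cmd ls

# list the contents of a path; add --recursive to descend into subfolders
s3cmd ls s3://<bucket>/<path>/ [--recursive]
```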
If you would like to list all objects recursively, you can add the --recursive tag. A couple of other helpful options are --summarize and --human-readable, which will give a total number of objects and their size, and make the size output more easily readable, respectively.
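Putting these together, a listing might look like:

```shell
# recursively list a path with a total object count and readable sizes
aws s3 ls s3://<bucket>/<path>/ --recursive --summarize --human-readable \
    --endpoint-url https://s3.lts.rc.uab.edu
```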
Check Bucket or Folder Size¶
With s3cmd, you can use the du command to list the size of a bucket or a folder within a bucket.
By default, the output will be in bytes, but you can add the -H option to output in a human-readable format, rounded to the nearest MB, GB, or TB.
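For example:

```shell
# report the total size of a bucket or folder; -H for readable units
s3cmd du [-H] s3://<bucket>/<path>/
```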
By default, the ls subcommand will output the size of any objects in the path given (see the ls section). Unlike the other tools, the AWS CLI will not output the total size of folders. However, the total combined size of all objects in a folder can be calculated using the --summarize option. You can also convert the output to a readable size using --human-readable.
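A sketch of computing a folder's total size this way:

```shell
# the "Total Objects" and "Total Size" lines at the end of the output
# give the combined size of everything under the path
aws s3 ls s3://<bucket>/<path>/ --recursive --summarize --human-readable \
    --endpoint-url https://s3.lts.rc.uab.edu
```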
Uploading Files and Folders¶
Uploading files and folders can be done a couple of ways. The first is by using the copy subcommand. This will add files from the source to the destination.
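For example, assuming the uablts remote:

```shell
# copy files from a local source into an LTS bucket
rclone copy <source> uablts:<bucket/path/destination>
```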
The second method is using the sync subcommand. This subcommand makes the destination identical to the source. The -i option can be added to make it interactive, asking you whether to copy or delete each file.
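For example:

```shell
# make the destination identical to the source; -i prompts before each change
rclone sync [-i] <source> uablts:<bucket/path/destination>
```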
Be extremely cautious using sync. If there are files in the destination that are not in the source, it will delete those files in addition to adding files to the destination. If data is deleted from LTS, it is not recoverable.
s3cmd distinguishes between moving files between a local source and S3 versus moving files between two S3 locations, using three different commands.
```shell
# transfer from local to S3
s3cmd put <source> s3://<bucket/path/destination/>

# transfer from S3 to local
s3cmd get s3://<bucket/path/source/> <destination>

# transfer between two S3 locations
s3cmd cp s3://<bucket/path/> s3://<bucket/path/>
```
If you are transferring an entire folder from S3 to either another S3 location or a local destination, you will need to add the
--recursive option, otherwise you will get an error.
Like rclone and the AWS CLI, s3cmd also has a sync command.
```shell
# sync a local source to an S3 location
s3cmd sync <source> s3://<bucket/path/destination>

# sync an S3 location to a local destination
s3cmd sync s3://<bucket/path/source> <destination>
```
Copying files and directories can be managed using the cp subcommand, which has the same behavior as rclone's copy subcommand.
```shell
aws s3 cp <source> s3://<bucket/path/destination> --endpoint-url https://s3.lts.rc.uab.edu [--recursive]
```
If you are copying a directory, you will need to add the --recursive option.
If you want to copy data down from LTS to your local machine or Cheaha, just reverse the positions of the source and destination in the function call.
Like rclone, the AWS CLI also has a sync subcommand that performs the same functionality. You can use it like so:
```shell
aws s3 sync <source> s3://<bucket/path/destination> --endpoint-url https://s3.lts.rc.uab.edu [--delete]
```
sync has an added benefit: only files that do not exist in the destination, or files that have changed in the source, will be transferred, whereas cp copies everything whether or not it already exists in the destination. By default, sync DOES NOT delete files in the destination that are not in the source the way rclone sync does. If you want this functionality, you can add the --delete option at the end of the function call.
Be extremely cautious using --delete. Only use it if you are sure any deleted data is not important. Data deleted from LTS is not recoverable.
Deleting Files and Directories¶
File deletion is performed using the delete subcommand.
Directory deletion is handled using the purge subcommand. Be very cautious with this, as it deletes all files and subdirectories within the directory as well.
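A sketch of both operations, assuming the uablts remote:

```shell
# delete the files under a path (leaves the directory structure intact)
rclone delete uablts:<bucket/path>

# remove a directory and ALL of its contents, including subdirectories
rclone purge uablts:<bucket/path>
```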
File and directory deletion is handled by the rm command. If you want to delete a directory, you will need to add the --recursive option. To delete an entire bucket, use the rb command.
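A sketch of these deletion commands:

```shell
# delete a single file
s3cmd rm s3://<bucket/path/file>

# delete a directory and everything in it
s3cmd rm --recursive s3://<bucket/path/>

# remove an entire bucket
s3cmd rb s3://<bucket>
```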
The subcommand for deleting files and folders from LTS is rm. rm can delete both files and folders. If you want to delete a folder and everything in it, you will need to add the --recursive option. Like with sync, be very cautious using rm and make sure you know what you are deleting before you do so.
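For example:

```shell
# delete a file or, with --recursive, a folder and its contents;
# --dryrun previews the deletions without performing them
aws s3 rm s3://<bucket/path> --endpoint-url https://s3.lts.rc.uab.edu [--recursive] [--dryrun]
```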
To delete an entire bucket, you will need to use the s3api command paired with the delete-bucket subcommand. An example of this looks like:
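```shell
# permanently remove an entire bucket from LTS
aws s3api delete-bucket --bucket <bucket> --endpoint-url https://s3.lts.rc.uab.edu
```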
Command Comparison Chart¶
For brevity, the chart excludes the --endpoint-url option from the AWS CLI commands, but it will need to be included if you choose to use that tool.
To access LTS from Windows and Mac, we suggest using the Cyberduck GUI which is free to download.
Once you have it installed and open, Cyberduck will look like this:
Creating a Connection¶
First, download the UAB CyberDuck Connection Profile. After it's downloaded, double click the file to open it in Cyberduck. It will open the following connection creation window:
Input your Access Key and Secret Access Key sent to you by Research Computing after account creation in their appropriate fields. Once you've entered these keys you can close the connection creation window. This connection with the keys you entered is now saved as a bookmark for easy access in the future. Double click the created bookmark to open the connection to LTS.
Creating a Bucket¶
In order to create a bucket, click File > New Folder... and then name the bucket you would like to create. Once the bucket is created, it will appear in the file window. An example could look like:
The bucket will have the symbol of a hard disk with an Amazon A brand on it. This is the root of the file system for that bucket. You can then double click into it to open that file system.
Uploading and Downloading Data¶
Once you're inside the bucket, files can be uploaded easily by dragging and dropping from your local machine into the GUI. You can also use the Upload button in the toolbar to open a file browser and choose what to upload.
Downloading files from the bucket can be done by first selecting the file(s)/folder(s) to download and then clicking the Actions button in the toolbar. In that dropdown will be a Download option. You can also get to this dropdown through the File menu or by right-clicking.
In addition to Cyberduck, there are other GUI-based programs for interfacing with UAB LTS. S3 Browser is an easy-to-use program for uploading and downloading files. However, more sophisticated features, such as setting file permissions, are hidden behind a paywall, and the tool is only available on Windows. Researchers can investigate this tool if desired; however, Research Computing will not provide direct support for this program.