
Data transfer

Introduction

There are multiple ways to access your data from different locations, and there are also numerous storage solutions for your data. For some of these solutions, brief instructions are given on how to use them on your computer (HPC, server, desktop/laptop, smartphone, etc.).

HPC/server

For accessing data on a server there is no distinction between plain servers and HPC servers.

SFTP, SCP

If you can access a server with ssh (see ssh manual), you can transfer data between your computer and this server with either sftp or scp. SFTP presents you with an interactive interface (like FTP), while SCP transfers files directly from the command line (like cp). Both programs use the same username and password as your ssh login.

You can use the command-line interface (CLI), as explained under command-line, or a GUI such as FileZilla.

command-line

Some examples are given below for transferring data to and from the HPC server hpc24.tudelft.net (see HPC servers). In these examples we create a test file test.txt (first command), copy it from your computer to the server and then copy it back from the server to your computer.

SFTP

Specify your username by prefixing it to the server name with an @ (at) character: <username>@<server-name>

$ echo "This is a test file" > test.txt
$ sftp <username>@hpc24.tudelft.net
<username>@hpc24.tudelft.net's password:
Connected to hpc24.tudelft.net.
sftp> ls
Desktop           Documents         Downloads         Music             Pictures
Public            Templates         Videos
sftp> mkdir test
sftp> cd test
sftp> ls
sftp> put test.txt
Uploading test.txt to /home/<username>/test/test.txt
test.txt                                                    100%   20     1.2KB/s   00:00
sftp> ls
test.txt
sftp> get test.txt
Fetching /home/<username>/test/test.txt to test.txt
/home/<username>/test/test.txt                              100%   20     0.6KB/s   00:00
sftp> exit
$

Tip

Read the man page for more information: $ man sftp

SCP

Specify your username by prefixing it to the server name with an @ (at) character, and append the filename to the server name with a : (colon) character: <username>@<server-name>:<filename>

$ echo "This is a test file" > test.txt
$ scp test.txt <username>@hpc24.tudelft.net:test.txt
<username>@hpc24.tudelft.net's password:
test.txt                                                    100%   20     1.0KB/s   00:00
$ scp <username>@hpc24.tudelft.net:test.txt test.txt
<username>@hpc24.tudelft.net's password:
test.txt                                                    100%   20     1.4KB/s   00:00
$

Tip

Read the man page for more information: $ man scp

FileZilla

The open-source FileZilla program provides a GUI for file transfers. You can use it to make an SFTP connection to a server. FileZilla is available for all major operating systems and can be downloaded from the FileZilla project website.

In the video below, a short tutorial on the use of FileZilla is given. After an SFTP connection has been made to the server (hpc24.tudelft.net), a test file (test.txt) is first uploaded to the server and then downloaded from it.

SSHFS

On macOS and Linux, the SSH connection can also be used to mount storage from a server on your computer with SSHFS (SSH Filesystem). With this method you create a mount point, a subdirectory on your computer. When you enter this directory you will find the files and directories of the remote storage, and you can access them as if they resided on your computer.

The commands below set up an SSHFS mount. In this example we want to access our home directory on hpc24:

$ mkdir <mount-point> # the mount-point must be a directory
# mount your home-directory on the hpc24 to your mount-point
$ sshfs -ouser=<local_username> <remote_username>@hpc24.tudelft.net:/home/<remote_username> <mount-point>
$ cd <mount-point>
# ... access data from your home directory on hpc24
$ fusermount -u <mount-point> # unmount your SSHFS connection to hpc24 (on macOS: umount <mount-point>)

Some important notes:

  1. If the directory you mount the SSHFS connection on already contains files, the mount will still proceed, but the original content is not available for the duration of the connection. After unmounting, the original data can be accessed again.
  2. Note the colon (:) separating the server name from the remote directory you want to mount. In this example we mount the remote directory /home/<remote_username> from the server hpc24.tudelft.net.
  3. In many cases the remote username on the server is the same as the local username on your computer. In that case you can leave out the <remote_username>@ part.
  4. Do not leave out the -ouser=<local_username> part! If you do, you will not be able to unmount your SSHFS connection without administrator rights. If you do not have these rights, you will need to ask your system administrator for help.
  5. The connection stays up even after you log out from the machine on which you created the mount (e.g. the awi-cloud). You can disconnect with the fusermount command as shown in the example above. The connection is also lost after a reboot of the server or the client.
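Notes 1 and 5 suggest two precautions: mount only on an empty directory, and unmount explicitly when you are done. The sketch below wraps the commands above in two small helper functions. The function names sshfs_mount_checked and sshfs_unmount, and the example mount point ~/hpc24, are hypothetical; the fallback to umount assumes that fusermount is absent on macOS, where umount is used instead.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around the sshfs/fusermount commands shown above.

# Mount REMOTE (user@host:/path) on MNT, but only if MNT is empty, so that
# no existing local files get hidden behind the mount (see note 1).
sshfs_mount_checked() {
    local remote=$1 mnt=$2
    mkdir -p "$mnt"
    if [ -n "$(ls -A "$mnt")" ]; then
        echo "refusing to mount on non-empty directory: $mnt" >&2
        return 1
    fi
    # -ouser=<local_username> as recommended in note 4
    sshfs -ouser="$(id -un)" "$remote" "$mnt"
}

# Unmount: fusermount -u on Linux, umount as the fallback (e.g. on macOS).
sshfs_unmount() {
    local mnt=$1
    if command -v fusermount >/dev/null 2>&1; then
        fusermount -u "$mnt"
    else
        umount "$mnt"
    fi
}

# Usage (hypothetical mount point ~/hpc24):
#   sshfs_mount_checked <remote_username>@hpc24.tudelft.net:/home/<remote_username> ~/hpc24
#   sshfs_unmount ~/hpc24
```

Refusing a non-empty mount point is a deliberately conservative choice: it turns the silent "hidden files" situation from note 1 into an explicit error.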

This method can be used not only to access storage on a server, but also to access the TU Delft network storage, e.g. staff-bulk and staff-groups. On servers without a system mount point for these network storages, such as the awi-cloud, you can use SSHFS to access them. The server name for accessing the TU Delft network storage is linux-bastion.tudelft.nl.

Below is an example of how to access the hpc directory on the bulk storage of the group MI (formerly AK):

$ sshfs -ouser=<local_username> <netid>@linux-bastion.tudelft.nl:/tudelft.net/staff-bulk/tnw/ist/AK/hpc <mount-point>
  1. You must log in on linux-bastion.tudelft.nl with your netid. Again, if this is the same as your local username, you can leave out the <netid>@ part.
  2. You can only access the files if you have permissions for the remote directory. If you do not have access, please contact your system administrator.

Remote directories for the different storage types:

Remote directory              Description
/tudelft.net/staff-groups     group storage (staff)
/tudelft.net/staff-bulk       bulk storage (staff)
/tudelft.net/staff-umbrella   project storage (staff)
/tudelft.net/student-groups   group storage (students)
/tudelft.net/student-bulk     bulk storage (students)

The specific locations for MI and CI (formerly AK and QI) have not been set up yet. Until then, you will find the data for these groups at the following locations:

  • /tudelft.net/staff-groups/tnw/ist/ak
  • /tudelft.net/staff-groups/tnw/ist/qi
  • /tudelft.net/staff-bulk/tnw/ist/AK
  • /tudelft.net/staff-bulk/tnw/ist/qi

rsync

The rsync command-line program synchronizes files between two locations, which may also be on different computers on the network. It is available on most operating systems, such as Linux, macOS and Windows (via WSL).

The general construct is:

$ rsync [options] <source> <destination>

Synchronize files between two directories on one computer

$ rsync -av --progress sourcedir destdir

If sourcedir does not yet exist inside destdir, it is created and all files in sourcedir are copied to destdir/sourcedir/. If it already exists, all files from sourcedir are synchronized with the files in destdir/sourcedir. The options -av mean archive and verbose. The archive option copies directories recursively and preserves the modification times, ownership and access rights of the files; because rsync by default skips files whose size and modification time are unchanged, later runs only copy the changed files. Verbose prints information about each synchronized file. The --progress option adds some nice eye candy in the form of a progress bar.

Important

Pay attention to whether you add or omit the slash (/) at the end of the source directory. If it is omitted, the source directory itself is copied into the destination directory, including all files inside it. If it is added, only the files inside the source directory are copied, directly into the destination directory. A slash at the end of the destination directory makes no difference.

Synchronize files between two computers

$ rsync -av --progress --no-perms -e ssh <sourcedir> \
    <netid>@linux-bastion.tudelft.nl:/tudelft.net/staff-bulk/tnw/ist/QI/users/<netid>

In this example, all files in sourcedir on your local computer are synchronized with your user directory on the bulk storage (see TU storage). The destination path consists of a login name <netid>@ (the @ character marks the login name), a host name linux-bastion.tudelft.nl: (the : character separates the host name from the path) and a directory name /tudelft.net/staff-bulk/tnw/ist/QI/users/<netid>. This directory is where the TU bulk storage is mounted on the TU bastion host linux-bastion (see the last part of Data Transfer→SSHFS). The -e ssh option tells rsync to use the SSH protocol for the connection and the transport of the data; SSH uses the login name to log in on the host. Because of the underlying filesystem of the bulk storage, some file permissions of the source cannot be applied on the destination, so you need to provide the --no-perms option to stop rsync from transferring them.

Backup files with archive of removed files

rsync can also delete files from the destination that are no longer in the source, so that the destination is always an exact copy of the source. Combined with the option to move the files deleted from the destination into a separate directory, this makes it possible to restore your data from any moment at which a backup was made. This approach resembles the technique macOS uses with Time Machine.

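To perform this backup on a regular schedule, create a crontab job with the following command (in the crontab file the whole command must be on a single line; it is split here for readability):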

$ rsync -av --delete --backup\
    --backup-dir=/data_backup/DELETED_FILES/$(date '+\%Y\%m\%d\%H\%M\%S')\
    --exclude /DELETED_FILES/ /data/ /data_backup/

In the example above, the source is /data/ and the destination is /data_backup/ (note that both are on the same computer). Also note the trailing slash on the source directory (see the note above). The options are explained here:

--delete
delete all files in the destination that are not in the source
--backup
keep a copy of every destination file that is deleted or overwritten
--backup-dir=<backup_directory>
store these copies in backup_directory; with --delete, this is where the deleted files end up
--exclude <exclude_pattern>
skip files matching the pattern; on the destination, matching files are also protected from deletion by the --delete option, so the archive in /DELETED_FILES/ is never removed. Here the option and the pattern are separated by a space; --exclude=/DELETED_FILES/ works as well

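To date the archived deleted files, a timestamp is created with the command $(date '+\%Y\%m\%d\%H\%M\%S'). For more information on date, check the man page ($ man date). The construct $(<command>) is bash command substitution: it executes the command in a subshell and is replaced by the output of that command. For more information see: https://tldp.org/LDP/abs/html/subshells.html. The backslashes before the % characters are needed because cron treats an unescaped % in a crontab entry as a newline; when you run the command directly in a shell, leave them out: $(date '+%Y%m%d%H%M%S').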

Much more information about the use of rsync can be found on the internet, starting with its man page ($ man rsync). Use your search engine to explore the numerous possibilities of this powerful application.


Last update: 2021-05-26