Scientists using Science Gateways often upload large input files to run experiments. These uploads are made through the Science Gateway Portals over HTTP. This architecture has two inherent problems.
The first problem is reliability. Because the uploads go over HTTP, they depend on the client machine keeping a continuous internet connection. Network disruptions and connectivity problems are common, and when one occurs the upload fails. Users then have to retry the upload manually and wait until an attempt succeeds.
The second problem is storage. An uploaded file usually has to be staged somewhere until it is picked up for further processing, which may take a considerable amount of time. Often these files are staged on the same web server that hosts the Science Gateway Portal. If multiple large files are uploaded at the same time, the host machine may run out of disk space, which can have adverse effects on the Portal's performance.
Also, multiple Science Gateway Portals may be hosted on the same web server. In such a setting, a Portal that fills the server's disk with large files can degrade the performance of the other Portals on that host as well. In short, the file-upload functionality should not affect Portal performance.
To address the first issue, unreliable HTTP connections, retries must happen automatically. Along with automatic retries, the data transferred between client and server should be minimized by not re-uploading chunks that were already uploaded before the connection failed. Several JavaScript libraries already implement this feature. Notable among these are:
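The retry-and-resume behaviour described above can be sketched in a few lines. This is a minimal Python sketch of the general technique, not the API of any particular JavaScript library: `FlakyServer` is a hypothetical in-memory stand-in for an upload endpoint that drops every third request, and `resumable_upload` is a hypothetical client loop. Before each attempt the client asks the server how many bytes it already holds and sends only the next chunk from that point, so a dropped connection costs one chunk rather than the whole file.

```python
import time


class FlakyServer:
    """Hypothetical in-memory upload target that drops every Nth request."""

    def __init__(self, fail_every=3):
        self.received = bytearray()  # bytes the server has stored so far
        self.calls = 0
        self.fail_every = fail_every

    def offset(self):
        # How many bytes of the file the server already holds.
        return len(self.received)

    def append(self, offset, chunk):
        self.calls += 1
        if self.calls % self.fail_every == 0:
            raise ConnectionError("simulated network drop")
        if offset != len(self.received):
            raise ValueError("client and server disagree on the offset")
        self.received.extend(chunk)


def resumable_upload(data, server, chunk_size=4, max_retries=5, backoff=0.0):
    """Upload data in chunks, retrying automatically and resuming from the server's offset."""
    retries = 0
    while server.offset() < len(data):
        start = server.offset()  # resume point: skip bytes already uploaded
        chunk = data[start:start + chunk_size]
        try:
            server.append(start, chunk)
            retries = 0  # progress was made, so reset the retry budget
        except ConnectionError:
            retries += 1
            if retries > max_retries:
                raise
            time.sleep(backoff * 2 ** (retries - 1))  # exponential backoff
    return server.offset()


server = FlakyServer(fail_every=3)
payload = b"0123456789abcdefghij"  # 20 bytes -> 5 chunks of 4 bytes
resumable_upload(payload, server, chunk_size=4)
print(server.calls, bytes(server.received) == payload)  # 7 True
```

With two simulated drops, the upload needs two extra requests (seven in total), but no byte is ever stored twice and the user never has to restart anything by hand.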
All the JavaScript libraries mentioned above can perform resumable uploads. However, each implements it differently, and there is no common, standardized server implementation for handling uploads from these libraries.
tus.io establishes an open protocol that aims to solve the problem of unreliable file uploads once and for all. In essence, tus.io specifies a set of rules; any client and server implementations that follow them together form a reliable, resumable file-upload system. Concretely, the server records how many bytes of each upload it has received (the offset), so a client that loses its connection can ask for that offset and continue from there.
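To make the protocol's rules concrete, the sketch below models how a tus server decides on each request for the core protocol plus the Creation extension, in memory and without an HTTP layer. It is an illustration, not tusd or any other real server: the `TusStore` class, its `handle(method, path, headers, body)` signature and the `/files/` URL prefix are assumptions made for this example. The header names and status codes follow the tus 1.0.0 specification: `POST` with `Upload-Length` creates an upload and returns its URL in `Location`, `HEAD` reports the current `Upload-Offset`, and `PATCH` appends bytes only if its `Upload-Offset` matches what the server holds, otherwise it answers `409 Conflict`.

```python
import uuid

TUS_VERSION = "1.0.0"
OCTET_STREAM = "application/offset+octet-stream"


class TusStore:
    """Hypothetical in-memory sketch of tus 1.0.0 core + Creation decisions."""

    def __init__(self):
        # upload id -> {"length": declared total size, "data": bytes received so far}
        self.uploads = {}

    def handle(self, method, path, headers, body=b""):
        """Return (status_code, response_headers) for one request."""
        if method == "OPTIONS":
            # Discovery request: no Tus-Resumable header required.
            return 204, {"Tus-Version": TUS_VERSION, "Tus-Extension": "creation"}
        if headers.get("Tus-Resumable") != TUS_VERSION:
            # Missing or unsupported protocol version.
            return 412, {"Tus-Version": TUS_VERSION}
        common = {"Tus-Resumable": TUS_VERSION}

        if method == "POST":
            length = headers.get("Upload-Length", "")
            if not length.isdigit():
                return 400, common  # this sketch does not support Upload-Defer-Length
            upload_id = uuid.uuid4().hex
            self.uploads[upload_id] = {"length": int(length), "data": bytearray()}
            return 201, {**common, "Location": f"/files/{upload_id}"}

        upload = self.uploads.get(path.rsplit("/", 1)[-1])
        if upload is None:
            return 404, common

        if method == "HEAD":
            # Tell the client where to resume; the response must not be cached.
            return 200, {
                **common,
                "Upload-Offset": str(len(upload["data"])),
                "Upload-Length": str(upload["length"]),
                "Cache-Control": "no-store",
            }

        if method == "PATCH":
            if headers.get("Content-Type") != OCTET_STREAM:
                return 415, common
            offset = headers.get("Upload-Offset", "")
            if not offset.isdigit():
                return 400, common
            if int(offset) != len(upload["data"]):
                # Client and server disagree on progress: client should HEAD and retry.
                return 409, common
            if int(offset) + len(body) > upload["length"]:
                return 413, common  # this sketch rejects bytes beyond the declared length
            upload["data"].extend(body)
            return 204, {**common, "Upload-Offset": str(len(upload["data"]))}

        return 405, common
```

A client that loses its connection mid-upload sends `HEAD` to the upload URL, reads `Upload-Offset`, and issues the next `PATCH` from that byte. Because these rules live in the protocol rather than in one library, any client that follows them can resume against any conforming server.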
The second problem, storage pressure from uploaded content, can be solved by running the upload server implementation on a dedicated, separate remote server instance. Large uploads are then directed to, and staged on, the upload server, so the Application Server hosting the Science Gateway Portals is not affected.
These remote upload servers should be secured the same way the Portals are, i.e. through an Identity and Access Management system. The tus.io reference server (tusd) supports hooks, which let the server execute code at certain stages of the upload process. The pre-create hook runs before a new upload is created and can be used to perform authentication and authorization checks on the upload server. This prevents unauthorized clients from uploading files and keeps the upload server as secure as the Portals it serves.
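The decision a pre-create hook makes can be sketched as a plain function. This Python is a minimal sketch under stated assumptions: the hook is assumed to receive a JSON description of the incoming request whose headers map names to lists of values (loosely modelled on Go's `http.Header`), and `is_valid_token` is a hypothetical callable standing in for a call to the Gateway's Identity and Access Management system, for example token introspection. The exact payload format, and how an accept or reject decision is reported back to the tus server, depend on the server and its hook mechanism, so that wiring is not shown.

```python
import json
from typing import Callable, Tuple


def pre_create_check(payload_json: str, is_valid_token: Callable[[str], bool]) -> Tuple[bool, str]:
    """Decide whether an upload may be created, based on its bearer token.

    Assumed payload shape (hypothetical):
        {"HTTPRequest": {"Header": {"Authorization": ["Bearer <token>"]}}}
    """
    try:
        headers = json.loads(payload_json)["HTTPRequest"]["Header"]
        values = headers.get("Authorization") or [""]
        scheme, _, token = values[0].partition(" ")
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError, IndexError):
        return False, "malformed hook payload"
    if scheme.lower() != "bearer" or not token:
        return False, "missing bearer token"
    # Delegate the real decision to the identity provider (hypothetical callable).
    if not is_valid_token(token):
        return False, "token rejected by identity provider"
    return True, "ok"


# Usage with a stand-in token set instead of a real IAM system:
valid_tokens = {"abc123"}
request = {"HTTPRequest": {"Header": {"Authorization": ["Bearer abc123"]}}}
print(pre_create_check(json.dumps(request), valid_tokens.__contains__))  # (True, 'ok')
```

Keeping the check as a pure function of the request makes it easy to test, and the same token the Portal already issues can be forwarded by the upload client, so users do not need separate credentials for the upload server.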
tus.io offers a standardized protocol, which has advantages over individual upload libraries. The main advantage of relying on a protocol is that it removes the dependency on a specific language and library, allowing Gateway Portals written in different languages, such as PHP, Java, and Python, to implement their own file-upload functionality that conforms to the tus.io protocol specification.
Because of this decoupling, multiple Science Gateway Portals, such as desktop, web, and native portals, can connect to a single tus.io-compliant server run by the Science Gateway and upload files securely without affecting Portal performance.
Though the JavaScript libraries offer a quick short-term solution, the tus.io client-server implementation emerges as the better long-term choice for Science Gateways: a more scalable, flexible, and maintainable file-upload implementation.