Parallel Resource-Optimized Provisioning of End-to-End Requests Project

Project grant DE-SC0012115TDD (June 2014-2017)

DOE ASCR Big-Data Networking Program

Project Summary

Data is being generated and archived at increasing rates due to the deployment of large-scale cyberinfrastructure in a wide range of applications, including bioinformatics, materials informatics, and the automation, control, operation, and health monitoring of physical infrastructures. Big Data has become a key basis of competition, underpinning new waves of productivity, growth, and innovation. Petascale and exascale scientific applications will be required by teams of research scientists who are distributed not only across a university campus, but also across national and international laboratories. Experimental facilities such as the Large Hadron Collider (LHC), whose experiments the Department of Energy supports, are expected to generate data in the petabyte and exabyte ranges. Sophisticated collaborative analytics on the huge volumes of data generated by these facilities can enable significant advances in science.

This data is not only used locally, but also needs to be transferred, processed, and analyzed at remote locations. Further, storage and computation for petascale and exascale applications are expensive, and economics dictates that these facilities will have to be shared among many experiments and experimental facilities. Optical Wavelength Division Multiplexing (WDM) networks have proven to be well suited to the growing needs of scientific experiments that require large volumes of data to be distributed to, and analyzed by, a large number of geographically dispersed users. Examples of such experiments include Large Hadron Collider experiments (e.g., ATLAS) and neutron scattering experiments at the Spallation Neutron Source. Such large experiments are expected to dominate science for the foreseeable future.

A reliable and flexible Big Data storage and retrieval service is needed that uses the full potential of parallel architectures. Currently, data is stored in parallel, processed in parallel, and analyzed in parallel. The underlying core network, however, is not used to harness parallelism, and instead necessitates serial resource usage. When a user requests data from a remote site, the data may be replicated at several storage sites, but the user can most likely retrieve it from only one of them. The data is then transferred in a best-effort manner across the core network and handled locally at the destination. A discrepancy therefore exists between the availability and the actual use of parallel technologies. The data is typically available at several sites (for performance and backup reasons), where it is stored in parallel storage arrays, and data processing is most likely also parallelized. The network is likewise capable of parallel transfers: at the physical layer it operates with parallel wavelengths (WDM), and at higher layers, through software-defined networking, with parallel slices and virtual circuits. These parallelisms, however, are not yet fully usable by researchers in today's networks, such as ESnet, due to the lack of a storage-resource-aware, intelligent, and flexible scheduler.

To facilitate Big Data transfers, we propose a Parallel Resource-Optimized Provisioning of End-to-End Requests (PROPER) framework for ubiquitous parallel use of network and storage resources. The included Co-Scheduler will coordinate storage and virtual circuit reservations. We will provide flexibility by introducing a negotiation-based What-If capability in our network and storage Co-Scheduler. The proposed What-If engine considers spatial and temporal flexibility in reservations in order to provide a list of recommendations to users. This results in high availability, ease of use, and a reduction in service blocking. The proposed services extend the parallel architectures of data processing and storage throughout the network infrastructure using multi-path provisioning, multi-sourced data retrieval, and parallel replicated data storage. In addition, to prevent the loss of unique scientific data and the repetition of experiments, data storage and retrieval operations are implemented with protection for increased reliability. We will implement the PROPER framework in conjunction with the OSCARS network provisioning software and the BeStMan storage management software.
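
As an illustration only, the What-If idea can be sketched in a few lines of Python. This is not the PROPER implementation and does not use the OSCARS or BeStMan interfaces. The names Request, what_if, and multi_source_split, the slotted time model, and the per-source free-bandwidth table are all assumptions made for this sketch. It shows temporal flexibility (trying later start slots), spatial flexibility (trying alternative replica sites), and a simple multi-source split of one request across replicas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical transfer request; all fields are assumptions of this sketch."""
    sources: tuple   # candidate replica sites holding the data
    start: int       # preferred start slot
    duration: int    # number of consecutive slots needed
    bandwidth: int   # bandwidth needed in every slot (e.g., Gb/s)
    max_shift: int   # how many slots of delay the user will accept


def what_if(request, free_bw):
    """List feasible (source, start) options, earliest start first.

    free_bw maps each source to the free bandwidth per time slot on its
    path to the destination (a made-up input for illustration).
    """
    options = []
    for shift in range(request.max_shift + 1):       # temporal flexibility
        start = request.start + shift
        for source in request.sources:               # spatial flexibility
            slots = free_bw[source][start:start + request.duration]
            if len(slots) == request.duration and min(slots) >= request.bandwidth:
                options.append((source, start))
    return options


def multi_source_split(request, free_bw, start):
    """Split the demand across several sources at one start slot.

    Returns a {source: bandwidth} mapping, or None if the sources
    together cannot carry the requested bandwidth.
    """
    remaining = request.bandwidth
    split = {}
    for source in request.sources:
        slots = free_bw[source][start:start + request.duration]
        if len(slots) < request.duration:
            continue                                 # window runs past the table
        share = min(min(slots), remaining)
        if share > 0:
            split[source] = share
            remaining -= share
        if remaining <= 0:
            return split
    return None


# Example data: two replica sites, six time slots each.
free_bw = {
    "siteA": [10, 2, 2, 10, 10, 10],
    "siteB": [5, 5, 10, 10, 10, 10],
}
req = Request(sources=("siteA", "siteB"), start=0, duration=2,
              bandwidth=7, max_shift=3)

recommendations = what_if(req, free_bw)
parallel_now = multi_source_split(req, free_bw, req.start)
print(recommendations)  # [('siteB', 2), ('siteA', 3), ('siteB', 3)]
print(parallel_now)     # {'siteA': 2, 'siteB': 5}
```

In this example no single site can serve the request at the preferred start time, so the sketch recommends later start slots, while the multi-source split shows that the two replicas together could serve it immediately. A real co-scheduler would also have to account for storage availability, path diversity, and protection, which this sketch omits.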