The main focus of ICRAR’s Data Intensive Astronomy (DIA) program is the development of the data management and processing technologies required for the SKA. The program is divided into five streams:
- SKA Activities;
- Australian SKA Regional Centre Project (AusSRC);
- Survey Science Support Project (SSS);
- Data Intensive Research Project;
- Operational Support and Maintenance.
The ICRAR DIA team, located at the Centre’s UWA node, comprises researchers with backgrounds in astronomy and industry who have been involved in, and have led, the development of data and operations systems for billion-Euro astronomical infrastructure in Europe and South America, as well as industry projects in various domains. The team is involved in many diverse projects, including:
- Square Kilometre Array pre-construction and construction activities
- Australian SKA Regional Centre
- The CHILES project
- The ASKAP DINGO survey project
- The ASKAP WALLABY survey project
- AstroQuest Citizen Science project
- The International Virtual Observatory Alliance (IVOA)
As part of these projects, the team develops software and systems as well as expert systems (machine learning), and provides support for operational systems, including the in-house ICRAR compute lab. We are also heavy users of the Pawsey Supercomputing Centre and of a number of other supercomputers around the world.
We maintain active collaborations with many other academic and non-academic institutions, both in astronomy and in other fields. In addition, we run a number of projects with industry partners ranging from small startups to globally operating companies. Our students, who range from summer students through Masters to PhD students, work on challenging, cutting-edge projects in pure computer science and at the intersection of computer science, software engineering and astronomy. We are always open to discussing new ideas and trends in computer science, astronomical data reduction and software, and we run teaching, training, industry and science workshops both in-house and externally.
Prior to 2014, the DIA program was called the Information and Communication Technology (ICT) program, but that name was deemed too generic to convey the program’s focus. ICRAR’s ICT program focused on developing core capabilities through the development of a Data Intensive Research Pathfinder, Conceptual Design Studies for the SKA, and High Performance Computing (supercomputing). The Data Intensive Research Pathfinder involved the implementation and commissioning of dataflow and data archiving infrastructure for the Murchison Widefield Array (MWA). The MWA archive has been fully operational since July 2013 and receives data from the array at a rate of up to 1000 MB/s. The system migrates data to various partner institutions around the world at the same rate. In addition, it currently serves user requests at a rate exceeding the input data rate. Implementation of the MWA archive helped ICRAR secure leadership of the Data Layer task in the Science Data Processing (SDP) Consortium for the SKA.
Data Intensive Astronomy Focuses
SKA Activities
DIA has a strategic partnership with CSIRO to undertake SKA bridging design and development work, supported by an initial $2M ad-hoc grant from DIIS. The goals include supporting the SKA critical design review milestone and upgrading the JACAL software so that it is ready to support the early SKA commissioning phase and to perform science data processing tasks as soon as the first antennas are linked up.
The software will evolve from existing precursors and SKA prototypes, which have a very well characterized development risk. As in the past, the joint ICRAR and CSIRO team will continue to collaborate with AARNET and industrial partners (e.g. Nyriad) to adapt evolving technologies and pursue further funding and industrial co-design opportunities.
A gradual scope shift during the earlier pre-construction phase gave rise to the SRC project described below. The two activities are synergistic. In tandem, and thanks to the capability to deploy JACAL both inside and outside the observatory boundaries, they will provide the regional science community and its supporting industry with the best possible access to data products for post-processing and analysis.
Australian SKA Regional Centre Project (AusSRC)
ICRAR DIA, together with CSIRO, leads the Australian SKA Regional Centre Design Study Project, a joint effort of ICRAR DIA, ASKAP, CASS CS, Pawsey and MWA to design and prototype the architecture of the future Australian SKA Regional Centre. The effort involves national and international collaboration and coordination in SRC development. The AusSRC will play an important role during SKA construction in defining and implementing the interfaces between the SKA and the SRCs. Regional SRC development will continue through the ERIDANUS project. The overall project will have 6.4 FTE in total, deployed across three institutions: CSIRO, UWA and Curtin University. This ICRAR DIA project will coordinate the effort.
Survey Science Support Project (SSS)
This project covers activities directly supporting the science projects of ICRAR research groups or individual researchers. Due to limited resource availability, these projects and their requirements need to be identified and resourced on a case-by-case basis. Past and current examples include the CHILES, GLEAM, IMAGINE and GLASS surveys and GAMA, as well as some smaller-scale support for individual researchers. Nominally, this also includes our contribution to ASTRO-3D.
We will continue to follow our current operational model, which is to hold an annual review of requirements with ICRAR staff. Additional themes within the project can then be created outside this review in response to demand (as for the Machine Learning theme last year). This maintains the project’s agility and its ability to respond rapidly to ICRAR staff demands. SSS in ICRAR II had 13 themes covering most of the interests of the ICRAR research staff. This level of commitment proved maintainable and will be carried forward. Because resources are limited, each theme is expected to deliver against multiple projects. For example, in ICRAR II the following themes delivered against multiple projects: direction-dependent calibration and deep spectral surveys are two issues of particular importance to SDP; VO services and data distribution are vital considerations for the regional centres; and CHILES was the platform used to develop much of the functionality of the workflow manager DaLiuGE. Machine Learning has now been spun out into a Critical Mass Research Group led by the DIRG.
Data Intensive Research Project
This project covers computer science and data science research projects as well as activities in teaching (HPC and ML) and student supervision. The project also hosts concrete collaborative data science research projects with industry or other research organisations. The project will also generate most of the research papers produced by the DIA team.
Operational Support and Maintenance
This project covers translation and impact as well as software development and maintenance. It will include the maintenance of the NGAS software, exploiting opportunities provided by Australia’s new associate membership of the European Southern Observatory (ESO), as well as upcoming development and/or support opportunities such as the FAST HI pipeline, or specific, directly funded software implementation projects with other research institutes or industry partners. The project will also include hardware/software co-design activities, such as those we are currently undertaking with Nyriad, and projects such as the ASVO middleware software and the VOSpace implementations.
The ICRAR Fairway compute lab is a major computing and storage resource for ICRAR staff and students. It currently hosts about 2 PB of storage and about 25 servers (compute and storage), with and without GPUs, as well as a few more experimental systems, all connected via 10 Gigabit Ethernet and/or 40 Gigabit InfiniBand. This project covers the maintenance, upgrades and system administration of all the resources in the lab, as well as in-house networking, backups and the migration of services to university and cloud systems. In addition, it maintains cloud resources in support of various ICRAR projects. The project also covers the in-house remote observing facility at ICRAR Fairway.
As the number of supported systems has more than doubled since the start of ICRAR II, a focus for ICRAR III will be to move internal compute and storage services to the cloud wherever this is logistically and economically possible, and to consolidate the internal systems that cannot be migrated onto fewer nodes.