Project Background

Science is becoming increasingly data-intensive, and this requires a new, data-focused approach. The prime example is the Square Kilometre Array (SKA) project, one of the world’s newest large-scale global scientific endeavours, which will be co-hosted by Australia and South Africa and is set to produce orders of magnitude more data than any previous scientific endeavour. Just one of the SKA Phase 1 science projects, such as the HI survey, will produce derived data on the order of several terabytes per second, and the second phase of the SKA project will produce at least an order of magnitude more.

The project

A PhD project is being offered in Data Intensive Astronomy (DIA) that will bridge the data-driven approach of computer science and its science applications. Data-intensive science has become fundamental to delivering modern, cutting-edge science, and ICRAR is a recognised world leader in its astronomical applications. This position bridges the technical issues and the astronomical requirements. The PhD project will provide engagement with industry and other partners and a unique training environment, working at the cutting edge of radio astronomy, computer science, and commercial business and scientific systems.

The Computer Science element will cover: 1) profiling basic algorithms to measure compute and other performance metrics, and creating data-slicing helper functions based on the measured metrics; and 2) characterising the transitions between compute-intensive and I/O-intensive phases, since balancing these is central to achieving the best performance. The student will interact primarily with Prof. Wicenec.
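
The two Computer Science tasks above can be illustrated with a small Python sketch. It is not the project's tooling: the function names (`profile`, `classify_phase`, `slice_by_budget`), the use of the CPU-time to wall-time ratio to tell compute-intensive from I/O-intensive phases, and the 0.5 threshold are all assumptions chosen for illustration. The idea is to measure a function once, use the measured cost per item to choose a chunk size, and label a phase by how busy the CPU was while it ran.

```python
import time
from typing import Callable, List, Sequence, Tuple


def profile(func: Callable[[Sequence], object], data: Sequence) -> Tuple[float, float]:
    """Run func(data) once and return (wall_seconds, cpu_seconds).

    process_time() counts CPU time for the whole process, so for
    multi-threaded code the CPU figure can exceed the wall-clock figure.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func(data)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def classify_phase(wall: float, cpu: float, threshold: float = 0.5) -> str:
    """Label a measured phase as "compute" or "io".

    If the CPU was busy for most of the elapsed time the phase is treated as
    compute-bound; a low CPU/wall ratio means the process spent most of its
    time waiting, typically on I/O. The threshold is an illustrative choice.
    """
    if wall <= 0.0:
        return "compute"
    return "compute" if cpu / wall >= threshold else "io"


def slice_by_budget(data: Sequence, seconds_per_item: float,
                    budget_seconds: float) -> List[Sequence]:
    """Split data into chunks expected to take about budget_seconds each.

    seconds_per_item would come from a profiling run; the chunk size is
    rounded down and is never smaller than one item.
    """
    if seconds_per_item <= 0.0:
        size = max(1, len(data))
    else:
        size = max(1, int(budget_seconds / seconds_per_item))
    return [data[i:i + size] for i in range(0, len(data), size)]


if __name__ == "__main__":
    def sum_of_squares(xs: Sequence[float]) -> float:
        return sum(x * x for x in xs)

    sample = [float(i) for i in range(200_000)]
    wall, cpu = profile(sum_of_squares, sample)
    print("phase:", classify_phase(wall, cpu))
    # Use the measured cost per item to pick chunks of roughly 10 ms each.
    chunks = slice_by_budget(sample, wall / len(sample), budget_seconds=0.01)
    print("chunks:", len(chunks))
```

The CPU/wall ratio is only a rough proxy; a fuller characterisation would probably also record bytes read and written and memory use for each phase.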

The practical aspect will be to work on some of the most extreme datasets observed in radio astronomy to date, which will provide a perfect test bed for the data-driven paradigm. The data will come from the Australian SKA Pathfinder (ASKAP), in particular its DINGO survey. The student will investigate these ideas and methods, demonstrating new approaches on frontier data products. The student will be supervised primarily by Dr. Dodson.

Your background and interest

We are interested in hearing from potential candidates from any STEM background, as the range of skill sets required (and to be developed) cannot be limited to one traditional field of study. The candidate would join an active multidisciplinary group with many opportunities for scientific and commercial cross-fertilisation.

The results

We cannot yet show results from this project, but we can show some results from our prototyping. The aim of this project is to achieve something similar on a much larger scale and in an optimised way.

SKA Science Data Processor (SDP) prototyping workflow, based on the one used for the deep HI project CHILES. The daily observations are split into many small frequency sub-bands that can be imaged and cleaned in parallel and then recombined into the final data product: an image cube covering redshifts between 0 and 0.5, which allows us to explore the local Universe in HI and to discover the most distant HI galaxies (Fernández 2016).
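
The split, image-in-parallel and recombine pattern described in this caption can be sketched in Python. This is a minimal illustration under stated assumptions, not the SDP prototype or the CHILES pipeline: `image_subband` is a hypothetical placeholder for the real imaging and cleaning step, the sub-bands are assumed to have equal widths, and threads stand in for what would in practice be separate processes or cloud nodes. The band edges are derived from the redshift range of 0 to 0.5 using the HI rest frequency of 1420.405751768 MHz.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

HI_REST_MHZ = 1420.405751768  # rest frequency of the 21 cm HI line, in MHz

Band = Tuple[float, float]


def observed_freq_mhz(z: float) -> float:
    """Frequency at which HI emitted at redshift z is observed."""
    return HI_REST_MHZ / (1.0 + z)


def split_band(f_low: float, f_high: float, n_sub: int) -> List[Band]:
    """Divide [f_low, f_high] into n_sub contiguous, equal-width sub-bands."""
    width = (f_high - f_low) / n_sub
    return [(f_low + i * width, f_low + (i + 1) * width) for i in range(n_sub)]


def image_subband(band: Band) -> Dict[str, float]:
    """Hypothetical stand-in for imaging and cleaning one sub-band.

    A real pipeline would grid, image and deconvolve the visibilities for
    this frequency range; here we only record the range it covers.
    """
    lo, hi = band
    return {"f_low": lo, "f_high": hi, "centre": 0.5 * (lo + hi)}


def build_cube(f_low: float, f_high: float, n_sub: int,
               workers: int = 4) -> List[Dict[str, float]]:
    """Image every sub-band in parallel and stack the results in frequency order."""
    bands = split_band(f_low, f_high, n_sub)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order, so the planes come back
        # already sorted by frequency and can be stacked directly into a cube.
        return list(pool.map(image_subband, bands))


if __name__ == "__main__":
    # Redshifts 0.5 to 0 map onto roughly 947 MHz to 1420 MHz.
    cube = build_cube(observed_freq_mhz(0.5), observed_freq_mhz(0.0), n_sub=16)
    print(len(cube), "planes, first centred at", round(cube[0]["centre"], 1), "MHz")
```

Because each sub-band is independent until the final stacking step, the work scales out simply by adding workers, which is what makes this pattern attractive on cloud resources.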

Image of a single velocity channel from an image cube made in a highly distributed fashion using Amazon Web Services (AWS) cloud computing. The hydrogen line emission from a single galaxy is clearly visible, with flux levels of around 1 mJy/beam. The inset spectrum shows the integrated flux across the galaxy as a function of frequency (in GHz), showing that the emission is confined to a fraction of a MHz. From the observed frequency we calculate the distance to this galaxy to be 30 Mpc.
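
The distance quoted in this caption follows from the redshift of the HI line. Here is a minimal Python sketch, assuming the low-redshift form of Hubble's law and a Hubble constant of 70 km/s/Mpc; the caption gives neither the value of H0 used nor the exact observed frequency, so both are assumptions here. The redshift is z = f_rest / f_obs - 1 and the distance is d ≈ c z / H0.

```python
C_KM_S = 299_792.458          # speed of light, km/s
HI_REST_MHZ = 1420.405751768  # rest frequency of the 21 cm HI line, MHz


def redshift_from_freq(f_obs_mhz: float) -> float:
    """Redshift z = f_rest / f_obs - 1 for the HI line."""
    return HI_REST_MHZ / f_obs_mhz - 1.0


def hubble_distance_mpc(f_obs_mhz: float, h0: float = 70.0) -> float:
    """Distance from Hubble's law, d = c z / H0, with H0 in km/s/Mpc (assumed 70).

    Only valid for z << 1, and it ignores the galaxy's peculiar velocity.
    """
    return C_KM_S * redshift_from_freq(f_obs_mhz) / h0


if __name__ == "__main__":
    f_obs = 1410.53  # illustrative value in MHz, not read from the figure
    print(f"z = {redshift_from_freq(f_obs):.4f}, d = {hubble_distance_mpc(f_obs):.1f} Mpc")
```

Under these assumptions, an observed frequency near 1410.5 MHz (about 1.41 GHz) corresponds to a distance of roughly 30 Mpc, consistent with the value in the caption.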