
An artist’s impression of the Murchison Widefield Array (MWA) detecting a pulsar signal (Credit: Dilpreet Kaur, CSIRO). Processing the data collected for SMART to find such pulsars is like finding a needle in a haystack. A robust database linked to our processing workflows is critical for making this a reality, and that is where this project comes in!
The Southern-Sky MWA Rapid Two-Metre (SMART) pulsar survey is an ambitious project that is moving into its next stage of processing, with 16 discoveries so far and many more around the corner. Given the large data volume and the number of linked processing tasks, the Nextflow-based SMART pipeline manages and stores a vast amount of metadata, along with information about the status of each stage of analysis at any given moment. There is also a large number of “false” pulsar candidates that we need to sift through — an ideal task for machine learning (ML). We have developed a basic ML classification scheme and a database that can nominally handle this kind of information. One of the missing links is having the database and the workflows (including, but not limited to, the ML classifier) interact in real time!

In this project, the student will develop and integrate a series of software tools that gather data from, and send data to, the database as the automated SMART workflow crunches through four petabytes of pulsar search data. This will streamline the processing workflow enormously, and thus accelerate the rate of pulsar discoveries and the science that can be extracted from them.
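To give a flavour of the kind of real-time interaction involved, here is a minimal sketch of workflow-facing database helpers. The schema, table, and function names are all hypothetical illustrations (the actual SMART database structure is not shown here), and SQLite stands in for whatever backend is deployed:

```python
import sqlite3

# Hypothetical schema -- a stand-in for the real SMART candidate tables.
SCHEMA = """
CREATE TABLE IF NOT EXISTS candidates (
    id INTEGER PRIMARY KEY,
    obs_id TEXT NOT NULL,
    stage TEXT NOT NULL DEFAULT 'folded',
    ml_score REAL
)
"""

def connect(path=":memory:"):
    """Open the database and make sure the (mock) schema exists."""
    con = sqlite3.connect(path)
    con.execute(SCHEMA)
    return con

def record_stage(con, cand_id, stage):
    """Called by a workflow task to report that a candidate reached a stage."""
    con.execute("UPDATE candidates SET stage = ? WHERE id = ?", (stage, cand_id))
    con.commit()

def pending_classification(con):
    """Fetch candidates that have been folded but not yet ML-scored."""
    rows = con.execute(
        "SELECT id, obs_id FROM candidates "
        "WHERE stage = 'folded' AND ml_score IS NULL"
    )
    return rows.fetchall()
```

A workflow task would call `record_stage` as it finishes each candidate, while the ML classifier polls `pending_classification` for new work — the pattern, not the particular schema, is the point.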
| Student attributes | |
| --- | --- |
| Academic background | Enrolment in any Computing or Data Science course is appropriate. An interest in astrophysics is expected, but enrolment in the astronomy stream is not required. |
| Computing skills | Unix operating systems (required), Python (required), machine-learning techniques (required), relational databases (ideal), workflow management (ideal) |
| Training requirement | Nextflow, using supercomputing systems |
| Project timeline | |
| --- | --- |
| Week 1 | Inductions and project introduction |
| Week 2 | Initial presentation |
| Week 3 | Familiarisation with the current ML scheme and pulsar candidate data |
| Week 4 | Familiarisation with the current database structure + creating a mock database |
| Week 5 | Generating “fake” data (e.g., ML scores, pulsar candidate information) and identifying the required interactions with the mock database |
| Week 6 | Writing/testing Python functions that automatically create and execute database calls |
| Week 7 | Testing/updating Python functions to interact with the “deployed” database |
| Week 8 | Integrating scripts into SMART workflows |
| Week 9 | Final presentation |
| Week 10 | Final report |
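The Week 5–6 tasks above — generating “fake” data and loading it into a mock database — might be sketched as follows. The column names and value ranges are invented for illustration and do not reflect the real candidate tables:

```python
import random
import sqlite3

def make_mock_db():
    """Create an in-memory mock database with a hypothetical candidates table."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE candidates (id INTEGER PRIMARY KEY, obs_id TEXT, "
        "period_ms REAL, dm REAL, ml_score REAL)"
    )
    return con

def fake_candidates(n, seed=0):
    """Generate fake pulsar-candidate rows; all values are placeholders."""
    rng = random.Random(seed)  # seeded so test data is reproducible
    return [
        (
            i,
            f"obs_{rng.randrange(10**9, 2 * 10**9)}",  # mock observation ID
            rng.uniform(1.0, 5000.0),  # spin period (ms), arbitrary range
            rng.uniform(1.0, 1000.0),  # dispersion measure, arbitrary range
            rng.random(),              # mock ML classifier score in [0, 1)
        )
        for i in range(n)
    ]

def load_candidates(con, rows):
    """Bulk-insert generated rows; return how many the table now holds."""
    con.executemany("INSERT INTO candidates VALUES (?, ?, ?, ?, ?)", rows)
    con.commit()
    return con.execute("SELECT COUNT(*) FROM candidates").fetchone()[0]
```

Once functions like these work against the mock database, the same calls can be pointed at the “deployed” database in Week 7 with only the connection step changing.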