In September 2015, a new partnership between UCT, the University of the Western Cape and North-West University was launched, with the aim of developing crucial capacity for big data management and analysis, particularly for Square Kilometre Array (SKA) astronomy. Somehow I managed to miss this, but to me this is BIG NEWS, and I'm really excited to see how it pans out.
A data science institute for Australian astronomers (and multi-disciplinary research in general) is something I've been championing for the past couple of years. While there is definitely a lot of interest in the community, it's still been a bit of a hard sell. In the 2016–2025 Australian Astronomy Decadal Plan, the issues around data-intensive research (including HPC) and e-Science were highlighted, but compared to other areas they weren't considered a high priority – more like a "nice to have". However, since then, various groups within the astronomy community have been working hard to come up with solutions (locally and nationally) that could benefit the whole of Australian astronomy. As with any new initiative, it's difficult to please everybody, particularly when resources are limited, so this has taken some time.
Having said that, I can't help but feel like everyone else is beating us to it. Various universities in the US and UK have launched multi-disciplinary data science institutes (e.g. the Moore–Sloan Data Science Environment: BIDS, NYU and UW). Over the past year, STScI, LSST and individual astronomy research centres (e.g. ICG Portsmouth) have all committed to funding astronomy/data science fellowships. It certainly hasn't helped that our Government has slashed Australia's scientific research budget more than once in the past few years, but I think there are still some cultural?/traditional? barriers we need to break through.
The address below was given by Naledi Pandor MP, Minister of Science and Technology, at the launch of the Inter-University Institute for Data Intensive Astronomy (IDIA) at the South African Astronomical Observatory, Cape Town. I've taken the liberty of copying it verbatim from the South African Government website, and bolded some of my favourite bits.
I'm somewhat biased towards the power of data science training and digital literacy, and partnering with tech companies to do research. That's partly because of the direction I want my career to take, but mainly because I think this has enormous potential to flow into the broader research (and non-research) community. I'm a firm believer in the idea that big data (and technology in general) can help the developing world beat poverty, and that solutions have to come from within those countries. This idea is summed up nicely in this article by Datafloq.
Prof Russ Taylor
Prof Tyrone Pretorius, UWC Vice-Chancellor,
Dr Max Price, UCT Vice-Chancellor,
Dr Bernie Fanaroff
Prof Frik van Niekerk, NWU Deputy Vice-Chancellor,
Directors of the SKA project
"Rising to the Big Data Challenge of the SKA"
The SKA is not simply an astronomy project. Or a big science project. Or an infrastructure project. It's certainly a global infrastructure project and there will be activities in some 20 countries on 5 continents. Total project costs will run into billions of Euros, with much being spent on relaying, storing and analysing the data captured by the antennae - a task that will require processing power estimated to be equal to several millions of today’s fastest computers.
Professor John Womersley (SKA board) once said that “SKA is to some extent an IT project with an astronomy question as a driver.”
It's an IT project of the kind that pushes the boundaries of global technology. Big tech companies like IBM and Cisco are already involved because they know it will allow them to develop the knowledge and technologies that will keep them at the leading edge of computing. This in turn will benefit computer users in many spheres from finance to government through industry and medicine to other science researchers.
SKA challenges big data to the extreme. All science pushes the boundaries of knowledge but big science like SKA has the ambition to push those boundaries on the largest scale imaginable.
Our challenge in Africa is to use big data to find answers to big science questions. To do that, we have to develop capacity in Africa. To date no South African university has offered a comprehensive dedicated data-science degree programme. However, given the nature of data science most universities have graduated students in disciplines such as computer science, statistics, high-performance computing and databases and data processing. The Department of Science and Technology supports postgraduate students through grants by institutions like the Centre for High Performance Computing (CHPC). For the past three years, an average of 15 postgraduate (masters and doctoral) students per year graduated from CHPC supported programmes.
The newly established Sol Plaatje University made history by being the first institute in the country (and the continent) to introduce, in 2014, a dedicated undergraduate degree in data science. The current intake is about 30 students. Other universities have recognised the urgent need to develop programmes in the area of big data to be globally competitive in SKA research and are starting up programmes at postgraduate level and appointing senior staff with data science backgrounds.
The IDIA initiative is therefore a timely intervention. The new Institute plans to provide training in SKA-driven data-science research for up to 100 young data scientists over the next 5 years. SKA SA itself has a significant HCD programme that is starting to focus more on supporting work in the area of big data.
Besides the targeted SKA activities to promote big data and the continued support to the CHPC, the DST funded National Integrated Cyber Infrastructure System (NICIS), through its Data Intensive Research Initiative for South Africa (DIRISA), will support and facilitate the development of data science across the entire national research and innovation space. This will be done by enabling and facilitating data-intensive research activities in and between higher education and research institutions. Data-intensive research-capacity development programmes will be established at two institutions during this year and this initiative will be expanded to other institutions.
Negotiations with tech companies, such as IBM, are aimed at preparing Massive Open Online Courses (MOOCs) in various topics in Big Data Science that can be included in coursework programmes of academic institutions. The NICIS ‘Skills and Training’ component will facilitate and promote the training of all types of data professionals. Apart from the significant investments (R200 million per year) to date in supporting infrastructures for big-data research, the DST will invest about R100 million over the next three years in the establishment of DIRISA. In addition, work with European partners (and funding) in developing training initiatives are underway. Efforts have been initiated at a national level to better coordinate various research community efforts and infrastructures in support of developing big data skills and projects.
A significant focus and investment in big data in South Africa is not only due, but is probably crucial if South Africa is to play a significant role in the world economy in the coming decades. However, it is now a fact that the human expertise to capture and analyse big data is both the most expensive and the most constraining factor for most organisations pursuing big data initiatives.
The McKinsey Global Institute (2011 report), a business and economics research organisation, predicts that by 2018 in the United States alone there will be a shortfall of 140,000 to 190,000 people with deep analytics skills and 1.5 million managers and analysts who know how to make effective decisions using analysis of big data. Scaled to the population of South Africa, we would need 23,000 to 31,000 specialists with deep-analytic big-data skills. Initiatives in broadband rollout for all communities, and provision of ready access to e-learning can also play a role here in preparing younger generations for the coming wave of data and opportunities that this represents.
It's clear that the current data challenge requires a wide range of new skills, policies and practices, technologies and legal frameworks. We also need people-focused big-data systems. This area involves looking at data that is geared towards solving problems such as disease* in Africa like the H3Africa Bioinformatics project. The project collects data and looks at solving problems like TB and Malaria.
The other big-data important project is in the agricultural sector, where we need to have efficient data collection and analysis to assist farmers in better farming methods and also food production.
IDIA will make a significant and broad contribution to the research enterprise in South Africa. Through a focused research and training program in data-intensive science, IDIA will drive innovation in big data solutions that will have impact beyond astronomy. We will be working proactively to transfer knowledge and expertise to benefit a broad range of data challenged domains in science, humanities and commerce.
*Incidentally, South Africa has one of the highest rates of active TB and it continues to be the leading cause of death in South Africa. WHO gives a figure of 25,000 deaths from TB in South Africa in 2011, but this excludes those people who had both TB and HIV infection when they died. These people are internationally considered to have died of HIV. Source: http://www.tbfacts.org/tb-statistics-south-africa/
... and here is the press release from the University of Cape Town (September 2015)