D3.js

From Astrophysics to Data Science

 

Visualising the career paths of 200 astronomers turned data scientists.

The first in a series of blog posts that explore why astronomers are leaving academia.

 

The Science to Data Science (S2DS) and Insight Data Science fellowships are 5–7 week intensive post-doctoral training fellowships that bridge the gap between academia and data science in industry. This interactive visualisation was created to get a better sense of the stage in their career at which astronomers move into data science.

  • How many PhD students forego postdoctoral research in favour of moving straight into data science?

  • How many professional astronomers are moving into data science?

  • At what stage of their career do they do this?

  • How many postdocs do they have on their CV before deciding to leave?

  • Are tenured astronomers moving into data science?

  • How many go through a data science fellowship program?

  • How many do industry internships?

  • How many go back to complete a Masters in Data Science or other similarly formal data science education?

  • How many make the transition without a data science fellowship?

  • Have any moved back and forth between academia and data science?

  • Why do they move into data science?

The data: comes from the LinkedIn profiles of 116 astronomers who moved from astronomy to data science at some point in their career.

The visualisation: was created using the d3.js sunburst template, HTML, and CSS.
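As a rough illustration of how profile data could be prepared for a sunburst of this kind, the sketch below counts how many people share each career sequence and writes them as "sequence,count" rows. It assumes the template reads hyphen-separated sequences followed by a count; the stage names and career paths are invented for the example and are not taken from the real data set.

```python
from collections import Counter

# Hypothetical career paths, one list of stages per person (not real data).
career_paths = [
    ["PhD", "Postdoc", "Fellowship", "Data Science"],
    ["PhD", "Fellowship", "Data Science"],
    ["PhD", "Postdoc", "Fellowship", "Data Science"],
    ["PhD", "Postdoc", "Postdoc", "Data Science"],
]


def count_sequences(paths):
    """Count how many people share each exact sequence of career stages."""
    return Counter("-".join(path) for path in paths)


def to_csv_rows(counts):
    """Format counts as 'sequence,count' lines, most common first."""
    return [f"{sequence},{n}" for sequence, n in counts.most_common()]


rows = to_csv_rows(count_sequences(career_paths))
for row in rows:
    print(row)
```

Joining stages with a hyphen only works if no stage name contains a hyphen itself, so a different separator would be needed for labels such as "Post-doc".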

Wrap-up of the first two-day Unix/Python workshop

 

Earlier this week we ran our first two-day Unix/Python workshop (Twitter #swinpython) for PhD students and research staff interested in getting started with Python. The workshop was led by Swinburne University's newly minted Software Carpentry instructors, Dr. Ewan Barr and soon-to-be Dr. Genevieve Shattow, with help from a small team of Unix and Python experts from the Centre for Astrophysics and Supercomputing.

Workshop participants

Following the workshop I put together a quick D3.js interactive visualisation, profiling the workshop participants.

Thirty people attended part or all of the workshop. Roughly half the participants were women, with an even split between PhD students (light grey - outer ring) and established research postdocs and staff (dark grey - outer ring). The script (participants.html) for generating the above plot, and example data (participants_template.csv), can be downloaded from this resources repository. The template was provided by @evilangelpixie and is a modified version of the Sequences Sunburst template.
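For readers curious about what happens to the CSV rows before they are drawn, here is a minimal Python sketch of the nesting step that a sunburst needs: each hyphen-separated sequence becomes a path in a tree, with inner rings as parents and outer rings as children. The real template does this in JavaScript; the function name, the group labels, and the counts below are all made up for illustration and do not reproduce the contents of participants_template.csv.

```python
import json


def build_hierarchy(sequence_counts):
    """Nest hyphen-separated sequences into a name/children/size tree."""
    root = {"name": "root", "children": []}
    for sequence, size in sequence_counts:
        node = root
        parts = sequence.split("-")
        for depth, part in enumerate(parts):
            children = node.setdefault("children", [])
            # Reuse an existing child with this name, or create a new one.
            child = next((c for c in children if c["name"] == part), None)
            if child is None:
                child = {"name": part}
                children.append(child)
            # Only the last step of a sequence carries the count.
            if depth == len(parts) - 1:
                child["size"] = child.get("size", 0) + size
            node = child
    return root


# Hypothetical rows with made-up counts: inner ring first, outer ring second.
rows = [
    ("Women-PhD student", 4),
    ("Women-Staff", 3),
    ("Men-PhD student", 2),
    ("Men-Staff", 5),
]
tree = build_hierarchy(rows)
print(json.dumps(tree, indent=2))
```

This sketch assumes every sequence has the same number of steps. If one sequence could be a prefix of another, a terminating marker would be needed so that a node is never both a leaf and a parent.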



The Sessions

Since this workshop was designed to teach general Python skills, participants came from a variety of research disciplines. Consequently there was a large spread in programming experience and ability (as well as operating systems). We chose to install software locally rather than use a pre-configured cloud instance. While this takes a little more effort, it meant that participants could set up their own GitHub accounts, Git Bash shell and Python (Anaconda) installation for use later on. It also gives them a better understanding of how their laptop operates. Fortunately this went rather smoothly. We anticipated that the first half of the first session would be spent troubleshooting Linux, Windows and Mac OS X issues, but this wasn't the case. A few days before the workshop we held an hour-long helper meeting where participants could come and get help setting up their laptops. This turned out to be a really good idea and we'll definitely be offering it next time. For larger workshops we might think about setting up a pre-configured instance using NeCTAR's Research Cloud.

The first session, on the Unix shell, was a little hit and miss I think, mainly because of the material. If you've used the shell throughout your PhD/postdoc I don't think you get much out of this session, unless you want to brush up on your scripting skills. If you've never used it (or even heard of it) it can be quite baffling. Genevieve did an excellent job teaching this session, which is more conceptual, with less guided material than the rest of the workshop. Based on a number of reviews of Python workshops held at other institutions, a favoured option is to present the Unix shell material as optional pre-workshop homework, thereby allowing more time for Python and SQL.

Python was our language of choice, because it’s free, intuitive and popular, especially in the physical sciences. It's also a useful (and often expected) language for those who want to move into the tech industry. This did not mean that only Python users (or potential users) would benefit from this particular workshop. Since the focus is on learning general programming skills that are transferable to any language, learning Python basics is a good place to start. We covered the most useful elements of programming: manipulating data, plotting, using loops and conditional statements ("for", "if", "else" etc.) and writing functions.
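To give a flavour of those building blocks, here is a small sketch that combines a function, a "for" loop and an "if"/"else" chain. It is not taken from the workshop material; the function names and the numbers are invented for illustration.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def classify(value, threshold):
    """Label a value relative to a threshold using if/elif/else."""
    if value > threshold:
        return "above"
    elif value < threshold:
        return "below"
    else:
        return "equal"


# Hypothetical measurements, invented for illustration.
readings = [2.0, 4.0, 6.0]
average = mean(readings)

# Loop over the data and label each reading against the average.
labels = []
for reading in readings:
    labels.append(classify(reading, average))

print(average, labels)
```

The same pattern of a small function called inside a loop carries over to most other languages, which is the sense in which these skills are transferable.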

The structure of the workshop was a mix of guided teaching with challenges thrown in along the way. Participants were encouraged to work in small groups and discuss ideas. Of course we spent more time on each session than originally planned, but this seems to be typical of Software Carpentry courses - too much (great!) material, too little time.

From Software Carpentry at Swinburne - Workshop resources for Swinburne researchers.