Data Visualisation

From Astrophysics to Data Science

 

Visualising the career paths of 200 astronomers turned data scientists.

The first in a series of blog posts that explore how and when astronomers transition into data science careers.

 

The Science to Data Science (S2DS) and Insight Data Science fellowships are 5–7 week intensive post-doctoral training fellowships that bridge the gap between academia and data science in industry. This interactive visualisation was created to get a better sense of the career stage at which astronomers move into data science.

  • How many PhD students forgo postdoctoral research in favour of moving straight into data science?

  • How many professional astronomers are moving into data science?

  • At what stage of their career do they do this?

  • How many postdocs do they have on their CV before deciding to leave?

  • Are tenured astronomers moving into data science?

  • How many go through a data science fellowship program?

  • How many do industry internships?

  • How many go back to complete a Masters in Data Science or other similarly formal data science education?

  • How many make the transition without a data science fellowship?

  • Have any moved back and forth between academia and data science?

  • Why do they move into data science?

The data: drawn from the LinkedIn profiles of 116 astronomers who moved from astronomy to data science at some point in their career.

The visualisation: created using the d3.js sunburst template, HTML, and CSS.
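A sunburst chart needs hierarchical input, so a list of career paths has to be nested before it can be drawn. The Python sketch below is only an illustration of that step, not the code behind the visualisation above: the career paths and counts are made up, `build_hierarchy` is a hypothetical helper, and the `name`/`children`/`value` keys are an assumption about what the charting code would read.

```python
import json

# Made-up example paths (ordered career stages) with head counts.
# These are placeholders, NOT the real LinkedIn data.
paths = [
    (["PhD", "Data science"], 3),
    (["PhD", "Postdoc", "Data science"], 5),
    (["PhD", "Postdoc", "Postdoc", "Fellowship", "Data science"], 2),
]


def build_hierarchy(paths):
    """Nest (stages, count) pairs into a name/children/value tree."""
    root = {"name": "root", "children": []}
    for stages, count in paths:
        node = root
        for depth, stage in enumerate(stages):
            children = node.setdefault("children", [])
            # Reuse the child for this stage if an earlier path created it.
            child = next((c for c in children if c["name"] == stage), None)
            if child is None:
                child = {"name": stage}
                children.append(child)
            # Only the final stage of a path carries the head count.
            if depth == len(stages) - 1:
                child["value"] = child.get("value", 0) + count
            node = child
    return root


tree = build_hierarchy(paths)
print(json.dumps(tree, indent=2))
```

Each ring of the sunburst then corresponds to one level of nesting, and the width of a segment to the summed counts beneath it. The printed JSON could be saved to a file and loaded by the page that draws the chart.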

Introducing Swinburne Hacker Within

The Hacker Within (THW) began as a student organisation at the University of Wisconsin-Madison, and is now reborn as a collection of such chapters around the world. Active chapters include Wisconsin, Berkeley, Yale and Melbourne. Each chapter convenes a community of researchers, at all levels of their education and training, to share their knowledge and best practices in scientific computing. I first heard of The Hacker Within after visiting the Berkeley Institute for Data Science (@UCBIDS) in December 2014. There I met Katy Huff, a BIDS Data Science Fellow (with a background in Nuclear Engineering) and one of the founding members of THW. Katy now leads the charge at Berkeley.

Although it took a while to get started, implementing Swinburne Hacker Within (SHW) was a no-brainer. It's run as a weekly, multi-disciplinary meet-up for Swinburne PhD students, technical staff, and researchers involved in big-data projects at all levels and from all disciplines. The goal of the project is to introduce researchers – primarily from the social sciences, biosciences, astronomy, and economics – to the plethora of open-source tools that can be exploited to increase productivity and enhance existing projects, and to encourage the development of off-shoot projects and contributions to community-developed research tools. There are two reasons why this is important:

  1. To prepare researchers for alternative careers in the technology industry. The rise of fellowship programs, for example Insight Data Fellows and Science to Data Science, enables scientists to learn the industry-specific skills needed to work in the growing field of big data at leading companies. With new skills in data science and software development, scientists with analytical backgrounds are now in great demand on the European and US job markets and are being offered jobs at leading tech companies.
  2. To protect the tenets of scientific research. Data control, reproducibility, and peer review all suffer in projects that fail to make use of current development tools such as version control, testing, and comprehensive/automatic documentation. To avoid these pitfalls, the numerous Hacker Within chapters exist for the purpose of sharing skills and best practices for computational scientific applications.

It’s also about learning new web skills, gaining experience with programming languages, troubleshooting existing problems, and helping one another to create innovative research tools and projects to enhance your research (or other interests). And it's about having fun, sharing ideas, and finding the "hacker within". Why change something that clearly works?

No previous programming/hacking experience is necessary. Each session will be based around a theme and may include a short talk or demonstration to generate ideas. Researchers are welcome to follow along, or just work independently or in groups on their own projects. Participants are encouraged to work together, share ideas and skills, and propose topics for future sessions. It's pretty much a win-win for everyone.

Initial sessions will focus on data visualisation tools for research, creating research websites and blogs, and learning about software repositories like GitHub and useful tools that make programming easier, e.g. IPython Notebook. More advanced topics may be thrown in from time to time. In terms of project hacking, I'll admit I'm biased towards .Astronomy-like projects. This is somewhat deliberate. I'm on the organising committee for the .Astronomy7 conference in Sydney later this year, so I see SHW as one way of generating interest. There are also clear benefits to researchers at Swinburne's Centre for Astrophysics and Supercomputing (CAS) in terms of promoting their research products.

You can follow the Swinburne and the Berkeley chapters on Twitter: @hackerwithin

Python for bioinformatics and gut microbiology.

This afternoon I had a really interesting conversation with Professor Linda Blackall about data visualisation and some of the data-intensive research problems that crop up within her field. Linda is also one of Swinburne's Academic Directors (for Research and Training) in the Faculty of Science and Technology. Not surprisingly, we spent quite a while lamenting the lack of university-wide research software training (mainly Python and R), and brainstormed ideas to address this. We also spent quite a long time discussing the ‘big-data’ and data analysis challenges facing PhD students and postdocs within her field. I also learnt a lot about gut microbiology (which I knew absolutely nothing about) and the concept of the body's 'second brain'. As in most scientific research disciplines, microbiology PhD students and early career researchers (ECRs) learn advanced statistics, data analysis and computing skills in an ad-hoc manner, often relying on other students and research colleagues within their individual groups. Fortunately, the bioinformatics, microbiology and psychology groups at Swinburne have a significant number of proactive researchers and students eager to take advantage of any opportunity that comes their way, for example by enrolling in workshops at other institutions (Software Carpentry) or starting their own informal coding groups (e.g. the NeuralCode group). Still, the lack of formal training is a tricky problem that will take quite a lot of effort and resources to solve.

.Astronomy6 Director's Lunch talk (slides)

Here are the slides from the .Astronomy6 wrap-up talk I recently gave in the Swinburne astronomy group's Director's Lunch talk series. The talk was based on my previous blog post and it generated quite a buzz among the younger researchers. Only about a quarter of the astronomy group were aware of the .Astronomy community, but hopefully we can change that by hosting the 2015 conference in Australia. I also advertised a new initiative - The Hacker Within: Swinburne - that I hope to get up and running in a couple of weeks. The idea is to start building up software skills by working on .Astronomy hack projects throughout the year, rather than just relying on online courses such as Coursera, Codecademy and Software and Data Carpentry. My gut feeling is that a project-based approach will enable researchers to learn more quickly, give them actual projects to showcase their efforts, potentially improve or complement their existing research projects, better prepare them for .Astronomy7, and help build up the skills required for alternative career paths. Plus, why should .Astronomy (fun, fun, fun!) be restricted to once a year? It's also a cunning way to give myself more time to work on new projects and ideas.

These slides can also be downloaded from Speaker Deck.

 

 

Chicago: City of Big Data

Chicago is a brilliant city. I spent nearly a week in Chicago after the .Astronomy6 conference, mainly checking out skyscrapers and Frank Lloyd Wright houses (of course), hanging out in the Ukrainian Village and Wicker Park, and being dazzled by the Bowie exhibition at the Museum of Contemporary Art. Where do I start with the love? Well, the Chicago Architecture Foundation (CAF) is probably a good start. As well as being the first port of call for architectural tours of the city and heritage sites, they have an excellent permanent exhibition called Chicago - City of Big Data. Through interactive displays and recreated sections of Chicago, the exhibition reveals the potential of urban data and offers a new perspective on Chicago and cities everywhere. It's damn impressive and I think I spent a good hour learning about all the data. Fortuitously, I also ended up chatting with the curator for another 45 minutes about how urban data is collected, who analyses it in Chicago, and the rising trend of urban planning/smart cities/community-driven data mining (Chicago has a similar event to Australia's GovHack) for assessing the sustainability of major cities around the world. I also took a stack of photos and videos. Here are just a few:


The Chicago - City of Big Data highlights website has a pretty good explanation of the main data sets: reports of potholes, rats, energy consumption, and how personal data is collected. I'd love to see a similar exhibition for Melbourne. They also have an architectural Lego centre. Nice.