creative coding & astrophysics in a data science context

Over the past year I started creating small projects and writing them up as Chasing Telescopes blog posts. This is still a work in progress. The projects mainly serve as demonstrations, or tutorials for other researchers, showing what can be created with just a few extra tools. Most astronomers are really good at coding: they've dealt with complex data since the very start of their PhDs, and they are experts in deriving insights. After all, astronomy research is about discovering, understanding, and explaining the unknown. But astronomers rarely create projects that showcase or communicate research, unless of course they work specifically in science communication and outreach. Web development and building small projects are not things astronomers do regularly.

Building small projects is also an excellent way to learn a new programming language, library, or tool, and a great way to explore different data analysis techniques. Rigorous data analysis requires a good understanding of both the data and the appropriate techniques. To gain a deeper understanding of Python's scikit-learn machine learning package, I created a tutorial (by adapting existing code on GitHub) that uses k-means clustering to determine the colour palette of Planet Labs images. Given a set of data points, k-means partitions them into k clusters, essentially by iteratively moving the cluster centroids until each point sits as close as possible to its assigned centroid. Prior to this I mainly used K-nearest neighbour (KNN) algorithms in my research, for example to find the number of physical sub-clusters within a larger cluster of galaxies, which is a similar minimisation/optimisation problem.

I also enjoy creating visualisations to communicate science to the public and to tell compelling stories. There are a number of tools that make it really easy to create interactive stories, and they require very little coding. These include Carto, TimelineJS, and Odyssey.js.
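To make the k-means description above a little more concrete, here is a minimal sketch of the idea in plain Python. It is not the code from the tutorial and it does not use scikit-learn (whose `KMeans` class provides a tested implementation and exposes the fitted centroids as `cluster_centers_`). The function names (`squared_distance`, `initial_centroids`, `kmeans_palette`) and the pixel values are made up for illustration, and the farthest-point starting guess is a simplifying assumption chosen to keep the example deterministic; scikit-learn defaults to k-means++ initialisation instead.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two RGB tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def initial_centroids(pixels, k):
    """Hypothetical helper: start from the first pixel, then repeatedly add
    the pixel that lies farthest from every centroid chosen so far."""
    centroids = [pixels[0]]
    while len(centroids) < k:
        farthest = max(
            pixels,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(farthest)
    return centroids


def kmeans_palette(pixels, k, max_iterations=50):
    """Group RGB pixels into k clusters and return the k centroid colours."""
    centroids = initial_centroids(pixels, k)
    for _ in range(max_iterations):
        # Assignment step: attach each pixel to its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in pixels:
            distances = [squared_distance(p, c) for c in centroids]
            groups[distances.index(min(distances))].append(p)

        # Update step: move each centroid to the mean colour of its group.
        new_centroids = []
        for group, old in zip(groups, centroids):
            if group:
                mean = tuple(sum(channel) / len(group) for channel in zip(*group))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid

        if new_centroids == centroids:  # nothing moved, so we have converged
            break
        centroids = new_centroids
    return centroids


# Made-up pixels: three reddish and three bluish RGB values.
pixels = [
    (250, 10, 10), (240, 20, 20), (245, 15, 5),
    (10, 10, 250), (20, 20, 240), (5, 15, 245),
]
palette = kmeans_palette(pixels, k=2)
print(palette)
```

For a real image, the pixel list would come from flattening the image array into rows of RGB values, and the returned centroids are the palette colours. Like any k-means run, the result depends on the starting centroids, so a library implementation that tries several initialisations is the safer choice for real work.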

My Data Science Toolkit:


The majority of my astronomy research was done using IDL (a vectorised numerical language whose syntax borrows constructs from Fortran and C), UNIX shell scripting (e.g. awk and sed), Fortran, and standard astronomy data analysis packages and tools (e.g. SExtractor, IRAF/PyRAF, and TOPCAT). Since then I've become proficient in Python (scikit-learn, SciPy, NumPy, pandas, matplotlib, astroML), git version control (using GitHub and Bitbucket), HTML and CSS, and Jupyter Notebooks. I'm a big fan of the D3.js and dimple.js JavaScript libraries; these are fantastic tools for visualising data. I've also started using SQL (including SQLite), and I'm familiar with XML/JSON, Hadoop, and APIs, although I haven't used them extensively. Right now Natural Language Processing (NLP) is at the top of my list of tools to master.