I'm a data scientist working at the intersection of technology and design. Reformed astrophysicist & former e-Research/data consultant.

Last month I was in Chicago for the annual .Astronomy6 conference, arguably one of the most exciting and productive conferences I’ve encountered so far. Like many of the other 'first timers', I wasn’t exactly sure what to expect. I knew it would be relatively informal, I knew there would be a hackathon, and I knew there would be some discussion about the Zooniverse and other citizen science projects... perhaps?

I think it’s fair to say that the broader astronomy community views .Astronomy (dotAstronomy) as a bit of an enigma. Many astronomers don't really know what it is (I suspect some aren't aware that it even exists), and those who have heard of it suspect that the conferences are pretty great but are still not quite sure what it’s all about, or how it relates to their research. My answer to this is: find a way to go. It's amazing, it will blow your mind, and it will challenge any traditional ideas you have about astronomy careers.

The fact that the conference program is pretty much put together at the conference says a lot; it's dynamic, exciting, and participant driven. You won't see the same old green valley/galaxy formation [insert other astro topic] review or discovery talk that you saw the year before.

The average age of participants is younger than at most academic conferences. While some have tenured positions, the majority are post-docs, many of whom are in the process of re-assessing their current career path or looking to create their own alternative astronomy careers, and some have already made the decision to transition to industry. Nearly all are active GitHub or Twitter users, and those that aren't seem to approach their research as if they worked for Google or a tech start-up. In fact, the crossover between traditional astronomy research and tech industry "data science" is one of the most appealing aspects of dotAstronomy.

For those who require some background: DotAstronomy was created back in 2008 by Robert Simpson, along with Alasdair Allan, Sarah Kendrew, Chris Lintott, Stuart Lowe, Carolina Odman Govender and Arfon Smith. I like to think that it started out as a small group of renegade astronomers who lamented the lack of blue sky thinking and did something about it. It's now grown into a fairly large yet tight-knit community. These folks are also responsible for organising the astronomy hack days at the National Astronomy Meetings in the UK (NAMhack) and the annual American Astronomical Society conference (check out #AAS225 & #hackass).

So what happened at .Astronomy6? What follows is a long list of the Top 5 things I took away from the conference. It also includes an attempt to consolidate other participants' blog posts, Hack Day ideas, required skills, Google docs, project outcomes, and GitHub repositories. You should also take a look at the Official Live Blogs: Day 1 - Unconference, Day 2 - Hack Day, Day 3 - Unconference/Projects.

Brooke Simmons (aka @vrooje), Meredith Rawls (aka @merrdiff), and Elisabeth Newton (aka @EllieInSpace) did a really great job summarising everything that happened each day (and participating in the StarTrekRewatch podcast - Amusing musings). I don't think I could have done that. My brain was split all over the place (when not thinking about future data science projects). Meanwhile my eyes were following no fewer than six Twitter #hashtags at a time. Exhausting. Seriously. Perhaps next time.

Heidi Tebbe also wrote a nice little post on her blog: An Idea Of Happiness.

#1 There is a strong overlap between the .Astronomy and Data Science communities.

This year's conference was supported by GitHub and the Alfred P. Sloan Foundation. The keynote talk was given by Arfon Smith, a former astronomer now working at GitHub, so perhaps not surprisingly he talked about GitHub for Science, in particular the value of:

  • Open (not necessarily public) and collaborative coding, community learning and improving (and testing) code,
  • Version control for research and software citation,
  • Forking code. GitHub makes forking the norm. A foreign concept to most researchers and still a bit of a mystery.
  • Building up your research profile to include "software developer". (As astronomers we are used to writing our own data analysis routines, but unfortunately most academic environments of today do not reward tool builders - seems to be true in other disciplines as well).
  • Citing other people's code - building a community based on trust, just like publication citation, and
  • Requesting to review code/methodology when reviewing a paper - enforces better research practices.

What really surprised me, and everyone else it seemed, was the increasing number of open software repositories on GitHub (~4 million in 2012, ~20 million in 2014) and the increasing trend of using Python for science (~300K repos in 2012, 1.4M repos in 2014). It was also nice to see a slow but steady increase in Fortran (7000 in 2014), LaTeX (50,000 in 2014), C++ (~615K in 2014) and IDL code (~50 repos in 2012, 2000 repos in 2014 - inc. my own).
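To put those keynote numbers in perspective, the growth factors take only a few lines of Python to work out. This is a back-of-the-envelope sketch using the approximate counts quoted above; the dictionary and function names are made up for illustration.

```python
# Approximate repository counts quoted from the keynote: (2012, 2014).
repo_counts = {
    "Open repositories": (4_000_000, 20_000_000),
    "Python": (300_000, 1_400_000),
    "IDL": (50, 2_000),
}


def growth_factor(start, end):
    """Return how many times larger `end` is than `start`."""
    return end / start


growth = {name: growth_factor(a, b) for name, (a, b) in repo_counts.items()}

for name, factor in growth.items():
    print(f"{name}: roughly {factor:.1f}x growth between 2012 and 2014")
```

Only languages with counts for both years are included, since a growth factor needs a starting value.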

Given the nature of the conference there was also a great deal of discussion of the recent trend called "Data Science", the number of astronomers moving into the tech industry (via the Insight and S2DS programs, as freelancers or through start-ups) and the lack of alternative career paths for technical/data academics. The first mention of the Moore-Sloan Foundation's Data-Driven Discovery Initiative (which I happily champion - look out Swinburne! Great things are about to happen) occurred within the first hour or so of the conference. Perhaps this shouldn't have been a surprise given the participant list, which includes a number of data science champions (inc. David Hogg - Deputy Director of NYU's Centre for Data Science, Alberto Pepe - Co-founder of Authorea, Jonathan Fay - architect of WorldWide Telescope @ Microsoft Research, and many others).

#2 A great forum for candidly discussing things we tend not to talk about at conferences.

Throughout the conference we talked a lot about the current academic culture: the good, the bad and the ugly. We also talked about what could be done, what should be done, and what we can do now. Spending time on these kinds of discussions and on Hack Day projects is generally frowned upon in academia. Despite the fact that the astronomy community will happily support these projects, there is still an expectation that they are done in your own time. So what happens when 50+ excited researchers get to talk freely about anything and everything? Four (yes four!) unconference streams are cobbled together, each made up of four sessions, each with #hashtags so that everyone could be in no less than four places at once. At this point I learned the true value of Tweetdeck.

Here is a summary (edited from the Day 1 blog) of what we talked about including the collaborative Google docs we put together during each session.

Unconference sessions (note: bold hashtags weren't really used):

  • #dotpy (Google notes: dotpy): Astropy updates! Impending release of Version 1.0. Most of the time was spent talking about Astropy tutorials, which are a high priority for researchers.
  • #dotvoice: We discussed the state of academia for young researchers and what people can do to improve it. Big shift towards alternate career paths.
  • #hackJ (Google notes: hackj): Journals are antiques. They’re still following the same model of publishing as they were 100 years ago. This needs to change, and we don't think it's an impossible task. Change is already happening, but how can researchers influence progress? Authorea (see article: Paper of the Future) is one example.
  • #astrocult (Google notes: AstroCult): What are the #dotastro cultural themes that work and that we’d like to see extended? What should future #dotastro conferences look like? How does this fit in with sharing data science skills in general?
  • osss: Open-Source Sky Survey. It doesn’t exist yet. Could we build it? Should we build it? If we did, what would it look like?
  • starchive: Starchive will not only be open access but will also be open source.
  • #teachrepo (Google notes: TeachingRepos): How can we create and organise a teaching repository?
  • #astrogames (Google notes: AstroGames): Should you make games explicitly educational? Or should you try to hide the fact you’re teaching people things?
  • #openwwt (Google notes: WWTsession): WWT is now going open source; we discussed some ideas for how to adapt it.
  • #dotmuse & #dotall (Google notes: dotMuse): What does it mean to try and be a science museum? How to develop successful outreach programmes for people who don’t go to museums?
  • #dotmeme: We made some gifs, we made a meme, we had a nice relaxing time, it was lovely.
  • #softcred (Google notes: Softcred): Improving software citation and best practices for researchers. The role of DOIs for GitHub repositories and the Astrophysics Source Code Library (ASCL).
  • js101: We had an Introductory JavaScript 101 session for those who wanted to get started.
  • #dotphone (Google notes: DotPhone): Smartphones as Citizen Science Detectors
  • dotimprov: Discussion/workshop on Astro Improv talks
  • #briefideas: We talked about the new Journal of Brief Ideas: http://ideas.theoj.org (see Hack Projects)
  • dotcomms: Communicating DotAstro to the wider astronomy community, and ideas for developing similar workshops in other disciplines (dotBiology etc.)
  • astrotrain: Software training for astronomers - practical action items
  • arxiv & peerreview: Proof of concept peer review - arXiv with GitHub.
  • ascl: Improving the Astrophysics Source Code Library
  • hubot101: Developing web robots - Hubot 101

#3 A mind-boggling Hack-a-thon that kicked off around 9am and ended at 3:30am (apparently...) 

Here is a nearly complete list of all the Hack Day projects. Many were pitched at the start of Day 2, but this list also includes some of the projects proposed prior to the conference: https://sites.google.com/a/dotastronomy.com/wiki/dotastro6/hack-day-ideas (you can also see projects from previous years). I've taken the liberty of grouping them by theme and including notes on required skills/tools. Not all hack projects were completed. Those labelled (Demo) were presented on Day 3. I've tried to include all the URLs and GitHub repositories. Apologies if any are missing. I'll try and hunt them down later.

Visualisation & Software/Tools for Astronomers

  • (Demo) Visualize the AAS job register using location to pin jobs on a map to see which are relevant to you. [URL: http://www.physics.usyd.edu.au/~vmoss/jobvis/]
  • Make a visualization of SDSS data where people can “fly through” the Hubble UDF.
  • Image hosting, fully tiled, astrometry.net - coordinate info - put in cloud and share (requires? JavaScript, WebGL, d3js)
  • (Demo) Unclockify: Chrome extension to find and replace sexagesimal coordinates on Wikipedia with coordinates in decimal degrees (requires? web plugin skills) [Code: https://github.com/adrn/unclockify]
  • trillionverse.org - get a tonne of astro data and put it in one place - a lot of people here work on astro data and want to put it on the web - let’s not reinvent the wheel - let’s do it together
  • Interactive imaging data sets - simple lightweight interface to zoom in and zoom out etc., catalogue overlays - ability to interact with full data set - but visually more appealing than Aladin (requires? Django and JavaScript, people who are willing to reinvent the wheel - a bit)
  • Web front end for astropy co-ordinate system scripts - changing between coordinates.
  • Astronomy image format support to the Modular File Renderer (MFR) developed by the Center for Open Science.
  • Development of the Open Access Stellar Archive - has ADAP funding. (requires? PHP, SQL, Java, visualisation)
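Several of these hacks (Unclockify and the coordinate front end in particular) come down to converting sexagesimal coordinates into decimal degrees. A minimal Python sketch of that arithmetic is below. It is not the Unclockify code, and the function names are hypothetical; it only shows the conversion itself.

```python
def hms_to_degrees(hours, minutes, seconds):
    """Right ascension in hours/minutes/seconds -> decimal degrees."""
    # 24 hours of right ascension span 360 degrees, so 1 hour = 15 degrees.
    return 15.0 * (hours + minutes / 60.0 + seconds / 3600.0)


def dms_to_degrees(sign, degrees, arcminutes, arcseconds):
    """Declination as a sign (+1 or -1) plus d/m/s magnitudes -> decimal degrees."""
    # The sign is passed separately so that values such as -00d 30m keep their sign.
    return sign * (degrees + arcminutes / 60.0 + arcseconds / 3600.0)


# Example: RA 05h 30m 00s, Dec -00d 30m 00s
ra_deg = hms_to_degrees(5, 30, 0)
dec_deg = dms_to_degrees(-1, 0, 30, 0)
print(ra_deg, dec_deg)
```

For real work, a library such as Astropy would be the more robust choice, since it also handles parsing and validation.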

Software, Journals and Metrics

Research Output and Social Media

  • Automatically post each day’s APOD to instagram, tumblr, and/or reddit
  • Improve digital museum collections with social media / self-curation
  • Crowd-source images during natural disasters and find important, otherwise-missed content
  • (Demo) Connect the Chinese zodiac statues at the Adler to modern astronomy with a mobile website (Requires Odyssey or similar? Google notes: AstroCurate) URL: http://shouldbedoingsomethingelse.com/adler-zodiac-story/
  • Mobile website to learn connection between zodiac and modern astronomy (requires app skills)
  • Create new astronomy-themed game(s) using WWT / augmented reality / Exo-Flappy Bird
  • Make the podcast aggregator astronomy.fm more accessible.
  • (Demo) Build an astronomer twitter bot that answers your astronomical questions (requires JavaScript) [Code:  https://github.com/ttfnrob/botastro]
  • Set up a scavenger hunt from various locations within the Adler Planetarium.
  • IAU allowed us to name exoplanets - let’s register a #dotastro group and name an exoplanet.
  • Chasing lions in the Serengeti: build an iPython querying tool on the Snapshot Serengeti dataset - useful for all Zooniverse projects, not just astronomy (requires iPython, SQLAlchemy)
  • Visualisation for Galaxy Zoo data - help getting into a form that can be easily deployed to a web server
  • Make a Haiku of every paper on arXiv - do they get more citations if the Haiku is better?
  • 15 Zooniverse projects are translatable (requires? native speakers of languages other than English)
  • (Demo) Chromoscope.net now has a trippy version: kaleidochromescope (Best using Chrome)
  • Determine what type of astronomy images are most popular on APOD (analysing Twitter, Facebook & Google Stats/Scraping)
  • Hubble Universe fly-through

Data Analysis and Statistics

  • Write a comprehensive statistics tutorial based on David Hogg and Dustin Lang’s 2010 paper "How to Fit a Line".

Academic Culture

  • Create an institution rating website/survey for astronomers to report the culture and climate of their department.
  • Use ADS to track the career trajectories of individuals (requires? API).
  • Make a website for kids and career advisors with links to people in the field.
  • Alternative metrics including social media research output (requires? API).

#4 - An incomplete list of tools for the .Astronomer and aspiring data scientist.

Figuring out what tools are out there, what to use for your Hack Day project, and when to use it seems to be the hardest part. I spent quite a while figuring out which of Leaflet, CartoDB or D3js would work best for what I wanted to do. In the end I think all would work. The trickiest bit was trying to get SQLAlchemy talking to my iPython notebook. Not at all intuitive for someone like me. I think this is where mind maps or a handy whiteboard really help. This is just the start of my list of .Astronomer and aspiring data scientist tools. I know there are infinitely more out there, and at some point I plan on putting together some sort of .Astronomy Hack Day catalogue including previous years' projects.

  • GitHub (& Zenodo: DOIs for software citation)
  • Astrophysics Source Code Library (ASCL)
  • iPython Notebook (& iPython)
  • SQLAlchemy: the Python SQL toolkit and Object Relational Mapper that gives application developers the full power and flexibility of SQL.
  • JavaScript
  • WebGL is a JavaScript API for rendering interactive 3D graphics and 2D graphics within any compatible web browser without the use of plug-ins.
  • HTML & CSS
  • OpenRefine (formerly Google Refine) - great for cleaning up messy spreadsheet data, e.g. adding latitude and longitude for a list of places (telescopes, cities in the AAS Job Register, historical astronomy data).
  • APIs and Web Scraping: it’s useful to know what they are and how they work.
  • Tableau Public: free tool for visualising data
  • CartoDB: free tool for visualising data
  • Leaflet: an open-source JavaScript library for mobile-friendly interactive maps
  • OdysseyJS: an open-source tool that allows you to combine maps, narratives, and other multimedia into a beautiful story (from the makers of CartoDB) http://cartodb.github.io/odyssey.js/ (.Astro6 Hack: Adler Zodiac Story)
  • TimelineJS: an open source tool for building interactive timelines (.Astro6 Hack: History of Sundials)
  • D3js: Data-driven documents - JavaScript library.
  • Omeka is an open source web publishing platform for creating online exhibitions.
  • Omeka CSV  plugin allows users to import items from a simple CSV file, and then map the CSV column data to multiple elements, files, and/or tags.
  • OpenScience Framework.
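As a small example of how a couple of these tools fit together: once a list of places has latitudes and longitudes (say, after a pass through OpenRefine), it can be written out as GeoJSON, a format that mapping libraries such as Leaflet can read. The sketch below uses only the Python standard library. The place list and its coordinates are approximate values for illustration only, and the function name is hypothetical.

```python
import json

# Illustrative rows only: (label, latitude, longitude) in decimal degrees.
# Coordinates are approximate and not taken from any real dataset.
places = [
    ("Adler Planetarium, Chicago", 41.8663, -87.6068),
    ("Swinburne University, Melbourne", -37.8221, 145.0389),
]


def to_feature(label, lat, lon):
    """Build one GeoJSON point feature. Note GeoJSON orders coordinates as [lon, lat]."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"name": label},
    }


feature_collection = {
    "type": "FeatureCollection",
    "features": [to_feature(*row) for row in places],
}

geojson_text = json.dumps(feature_collection, indent=2)
print(geojson_text)
```

The longitude-first ordering is the detail most likely to trip you up: swap it and your pins end up on the wrong side of the planet.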

#5 Final thoughts and suggestions for .Astronomy7

If you've never participated in a Hack Day, they can be a little intimidating. Everyone comes together with different ideas and skill sets, and unless you already know what you want to work on it can be a little daunting. My suggestion for making the most of the Hack Days is to join a group of really clever people and eavesdrop on whatever they are doing. Also, getting your laptop ready before you come is really useful. At the very least you should have Twitter, Tweetdeck, a basic GitHub profile, Python/iPython and iPython Notebooks set up.

I also think it would be fantastic to have an extra day set aside BEFORE the conference for first timers - a sort of DotAstronomy 101. The idea would be to have a short session on 'Tools for DotAstronomy', with examples of previous Hack Day projects and details of the tools that were required/used. The rest of the day could be made up of a number of basic tutorials: for example JavaScript 101, D3js, iPython Notebooks, or setting up a basic SQL database and making it work with an iPython Notebook. Or participants could work through their own Codecademy tutorials. This would give first timers a better idea of what will happen on the Hack Day and some basic skills (fresh in their mind) so they can hit the ground running. I tried to do all of this before I came, but it does take time and you forget a lot of things if you aren't actively working on a project. Often the hardest part is just figuring out which tools are the most useful to know and which are most commonly used. DotAstronomy veterans could arrive the next day for the start of the conference or hold similar sessions for more advanced software developers.
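To give a flavour of what the "basic SQL database in a notebook" tutorial might cover, here is a minimal sketch that runs in any Python session or notebook cell. It deliberately uses the standard-library sqlite3 module rather than SQLAlchemy, to avoid any installation step. The table and its rows are hypothetical, loosely modelled on the Hack Day list.

```python
import sqlite3

# An in-memory database disappears when the connection closes;
# pass a filename instead of ":memory:" to keep the data on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hacks (title TEXT, theme TEXT, demoed INTEGER)")

# Hypothetical rows for illustration.
rows = [
    ("Job register map", "Visualisation", 1),
    ("Twitter bot", "Social media", 1),
    ("Statistics tutorial", "Data analysis", 0),
]
conn.executemany("INSERT INTO hacks VALUES (?, ?, ?)", rows)
conn.commit()

# Parameterised query: titles of the projects that were demoed.
demoed = [
    title
    for (title,) in conn.execute(
        "SELECT title FROM hacks WHERE demoed = ? ORDER BY title", (1,)
    )
]
print(demoed)
conn.close()
```

Once this much makes sense, moving to SQLAlchemy is mostly a matter of swapping the connection for an engine and letting it handle the database-specific details.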

Right.... now back to work!
