I'm a data scientist working at the intersection of technology and design. Reformed astrophysicist & former e-Research/data consultant.

A few weeks ago I attended Astro Hack Week at the Berkeley Institute for Data Science (BIDS). It was a seriously fantastic week. It’s going to take a while to wrap my head around all the great discussions, ideas, projects, lectures, tips ’n’ tricks, and my rapidly growing to-do list. This relatively free-format conference exceeded expectations in nearly every possible way, except that I didn't finish everything I wanted to and I didn't immediately retain the vast amount of information that flowed into my brain. An utterly exhausting and satisfying week. Honestly, I don’t even know where to start with this blog post, other than by thanking the organisers profusely for such a fantastic week, for getting GitHub involved, and for the financial support that enabled me to attend.

The focus of Astro Hack Week

On one level, it isn't really about astronomy at all. With a focus on

  • effective computing & statistics,
  • machine learning, Bayesian statistics,
  • pair coding, and
  • code optimization & sampling,

the content presented would likely be useful to researchers from many other scientific disciplines.       

We also discussed:

  • mixture models,
  • hierarchical models,
  • probabilistic graphical models,
  • Gaussian processes, 
  • Jupyter notebooks,
  • parallel programming in Python,
  • natural language processing,
  • failing efficiently,
  • career transitions, and
  • imposter syndrome.

These topics are nicely summarised in a retrospective* by astronomer turned public health scientist @piccolomud.

*at the end of this blog post.

Notable highlights

The hack projects were pretty fantastic and there was an impressively large number of them. At face value, many of them appeared intimidating, perhaps more so than the projects pitched at other unconferences, for example .Astronomy (pronounced "dot astronomy"). But when you get past the research and statistics lingo and the specific science drivers, they weren't so daunting – at least that's what I kept telling myself. In some ways they were just more prescriptive, and the people who pitched them had specific research goals they wanted to achieve. The only downside of this approach is that a hack project may ultimately benefit only one or two people, and so this type of hack might not lend itself to collaborative coding (pair coding, maybe). The Hackpad contains pretty much everything that was proposed and “completed” during the week. Not all hacks made it to Friday, and that's really important to note. The "fail fast and fail effectively" mantra proved to be successful, and I'll talk more about that in a bit. At a later date I plan on going back and revisiting the lectures, the tutorials, the other hacks, and of course finishing my own projects.

Fortunately, best hack practice dictated (gently) that everyone create a Google Doc, or an IPython or Jupyter Notebook, to document everything that was done, a suggestion Phil Marshall (@drphilmarshall) made at the start of the week. Experience has shown that GitHub repos, Issues and Jupyter Notebooks are very effective at preserving projects. Communication tools like Hackpad and Slack work really well for these types of conferences. Gitter too, although I must admit I rarely checked into the AHW Gitter account (I'm really not a fan of online chatter). As with the .Astronomy conferences, archiving and documenting discussions and projects is also important to the future success of these events. The archives facilitate community building and future collaboration, and they provide a way for organisers to demonstrate outcomes, derive impact, and showcase projects and themes to potential sponsors.

On the first day Daniela Huppenkothen (@Tiana_Athriel) gave a really great talk about Imposter Syndrome. For participant-driven conferences it's really useful to address this at the start of the week. I’m constantly surprised by the number of astronomers who, despite all their years of research and university training, still feel like they don’t make the grade. Ironically, I am one of them. I can say with certainty that my own imposter syndrome has led to self-sabotage and missed opportunities. It's something that I try not to dwell on, but it's always with me. During morning coffee on the first day at least two people (highly accomplished and well respected researchers, I might add) commented that they weren't entirely sure whether they should be at this conference – of course by day two their fears were mostly behind them. I'm sure they were not the only ones. I suspect that, like .Astronomy, Astro Hack Week is a bit of an unknown. You're never quite sure what you've got yourself into. I later learned that the conference organisers took on board feedback from previous years, and took into consideration discussions from .Astronomy8, specifically the excellent blog post The Horror of Hack Days, written by Aleks Scholz (@Dalcash_Dvinsky) from the University of St Andrews.

Better still, imposter syndrome was monitored throughout the conference, particularly when more mature hacks (which began well before Astro Hack Week) were discussed or presented. Phil Marshall did a really excellent job of pointing out his own imposter syndrome, which I think put a lot of people at ease. Failing effectively and discussing failures was also encouraged (even AHW veterans failed!) and this really helped to set up a "safe" environment.

Since Astro Hack Week tends to focus on more advanced programming, high-level statistics and computational algorithms, the related projects are somewhat more intimidating, particularly so for us old-timer observational astronomers who deal with small sample sizes and global parameters, and have never really thought Bayesian. The organisers were clearly aware of this, and at some point many hack projects morphed into useful tutorials for the community. I couldn’t help but feel a collective relaxation around halfway through the week, when more general tutorials, social hacks, and, dare I say it, more whimsical hack projects were pitched.

One of my favourites was the custom queried colormaps hack by Adrian Price-Whelan (@adrianpw), Dan Foreman-Mackey (@exoplaneteer), and Ben Nelson. The example they presented was for cities at night. I couldn't help but think of the Apollo Project image gallery. I'd love to create custom palettes based on those (now added to ...).

Machine Learning was probably the most anticipated topic of the week. Joshua Bloom gave an excellent lecture and guided tutorial on machine learning in science, which covered not only algorithm theory, but also how to determine when it should be used, and how to tell when it’s not giving useful information. There is such a buzz about machine learning in the technology, business, and research sectors that it’s often assumed to be the most appropriate and most informative methodology to use. In many cases it isn’t, and figuring out when it should be used is a bit of a dark art. Fortunately, as astronomers, we spend the majority of our time evaluating instrumentation, raw data, processed data and software pipelines, and translating the underlying physics into useful code. We are very much used to reserving judgement about the usefulness of techniques until they are fully explored. Incidentally, this is probably why researchers are highly sought after as data scientists. A flurry of machine learning hacks resulted from this session. I have yet to create my own. Working through Josh's tutorial is a good way to start.

Working at GitHub HQ in San Francisco

The #1 highlight of the week was definitely spending Wednesday at GitHub HQ in San Francisco. Honestly, the place is amazing. It's such an attractive work environment, and we got the impression that GitHub really values and takes care of its employees. I know I wasn’t the only one asking "what the hell are we still doing in academia?".

Again, in his role as unofficial Master of Ceremonies, Phil gave an excellent talk to our GitHub hosts that showed how researchers, and specifically large collaborations like the Large Synoptic Survey Telescope (LSST), use their software. His talk has definitely influenced the way I now use GitHub.

Jonathan Whitmore (@jbwhitmore), research astronomer turned data scientist now based at Silicon Valley Data Science, also gave an excellent tutorial on getting the most out of Jupyter Notebooks. He coolly wowed us with his notebook prowess while we all scrambled to jot down keystroke commands. Jonathan is an excellent speaker and a great person to talk to if you're considering applying for an Insight Data Science Fellowship or moving into data science.

Half-day hacking spread over a week worked incredibly well, amounting to roughly two and a half days of hacking. Unlike 24-hour hacks, spreading the load takes the pressure off and enables one to ditch projects that aren’t worth pursuing (or to pursue them at a later date). It also gives you space to mull over ideas and work more effectively on multiple hacks. Of course it's far more relaxed (made more so by the open bar and cocktails at GitHub) and it does lose some of the frenzied, crazy competitiveness that I love about .Astronomy. The only real downside is the need to constantly switch brain gears: coffee, lecture, discussion, hack project #1, more discussion, dinner, a pint, cocktail anyone? coffee, hack project #2. You get the picture. For the scatterbrained and attention deficit like me, you can spend too much time figuring out where you got to the first time and where you think you were heading. For multi-day, multi-project hacking, GitHub is a godsend. Even if your hack doesn't involve a lot of coding, a repo enables you to organise your team (and yourself), archive documentation (e.g. using Jupyter Notebooks), and keep on top of your hacking to-do list. The GitHub Issues feature is perfect for this, even if it ends up being used in a way that wasn't intended.

My hack projects

I had two hack projects in mind before I set off to San Francisco. The first was a Twitter bot. Why? Because they are fun and we often take ourselves far too seriously. Twitter bots remind us that useful skills can be learned with whimsy – and I'm all for whimsy. I also wanted to do a hack that required some level of web scraping and using APIs to make calls. Both are really useful skills to have, and I thought they would make a good hack to document as a tutorial. The second hack was going to be something to do with interactive plotting of published research figures, using either mpld3, dimple.js, D3.js, or GlueViz. Something similar to the project Ruth Angus started at .Astronomy8.

Adam Becker (@freelanceastro) was also keen to create a Twitter bot, and I think this is something we'll do together at a later date. We spent the first day putting some ideas together; we wanted it to be useful and engaging, and to involve images of some sort. We also wanted it to be responsive (or "active"), triggered by some action, rather than a passive bot that would just tweet to the world. As the days progressed we both ended up working on other hack projects, so this ended up moving down the list of priorities. One day...

I spent most of my time creating a set of database tutorials, Simple Databases for Pythonic Astronomers, with Usman Khan (National University of Sciences & Technology – NUST, Pakistan), Phil Marshall (SLAC National Accelerator Laboratory), and Jen Sobeck (University of Virginia). This is an ongoing project.

I also did a 5-minute hack. I created a Made at AstroHackWeek badge for GitHub repos.

Blog Posts & Retrospectives

Although no live blogging took place during the week, I know that a few people have plans to write up their experiences. For posterity, I’ll add them as I find them.




