This lesson is being piloted (Beta version)

Foundations of Astronomical Data Science

The Foundations of Astronomical Data Science curriculum covers a range of core concepts necessary to efficiently study the ever-growing datasets developed in modern astronomy. In particular, this curriculum teaches learners to perform database operations (SQL queries, joins, filtering) and to create publication-quality data visualisations. Learners will use software packages common to the general and astronomy-specific data science communities (Pandas, Astropy, Astroquery) combined with two astronomical datasets: the large, all-sky, multi-dimensional dataset from the Gaia satellite, which measures the positions, motions, and distances of approximately a billion stars in our Milky Way galaxy with unprecedented accuracy and precision; and the Pan-STARRS photometric survey, which precisely measures light output and distribution from many stars. Together, the software and datasets are used to reproduce part of the analysis from the article “Off the beaten path: Gaia reveals GD-1 stars outside of the main stream” by Drs. Adrian M. Price-Whelan and Ana Bonaca. This lesson shows how to identify and visualize the GD-1 stellar stream, which is a globular cluster that has been tidally stretched by the Milky Way.

This lesson can be taught in approximately 10 hours; the topics it covers are listed in the schedule below.


This lesson assumes you have a working knowledge of Python and some previous exposure to the Bash shell. These requirements can be fulfilled by:
a) completing a Software Carpentry Python workshop or
b) completing a Data Carpentry Ecology workshop (with Python) and a Data Carpentry Genomics workshop or
c) independent exposure to both Python and the Bash shell.

If you’re unsure whether you have enough experience to participate in this workshop, please read over this detailed list, which gives all of the functions, operators, and other concepts you will need to be familiar with.

In addition, this lesson assumes that learners have some familiarity with astronomical concepts, including reference frames, proper motion, color-magnitude diagrams, globular clusters, and isochrones. Participants should bring their own laptops and plan to participate actively.


Setup Download files required for the lesson
00:00 1. Basic queries How can we select and download the data we want from the Gaia server?
01:30 2. Coordinate Transformations How do we transform celestial coordinates from one frame to another and save a subset of the results in files?
03:05 3. Plotting and Tabular Data How do we make scatter plots in Matplotlib?
How do we store data in a Pandas DataFrame?
04:00 4. Plotting and Pandas How do we efficiently explore our data and identify appropriate filters to produce a clean sample (in this case of GD-1 stars)?
05:05 5. Transform and Select When should we use the database server for computation?
When should we download the data from the database server and compute locally?
06:15 6. Join How do we use JOIN to combine information from multiple tables?
How can we make a selection within a joined table?
How should we save the result?
07:45 7. Photometry How do we use Matplotlib to define a polygon and select points that fall inside it?
08:25 8. Visualization What elements make a compelling visualization that authentically reports scientific results and is ready for scientific presentation and publication?
What tools and techniques are available to save time on creating presentation and publication-ready figures?
10:35 Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.