Foundations of Astronomical Data Science: Instructor Notes



This lesson guides learners through analyzing data from a large database. Scientifically, we are identifying stars in GD-1, a stellar stream in the Milky Way (reproducing Figure 1 in “Off the beaten path: Gaia reveals GD-1 stars outside of the main stream” by Adrian Price-Whelan and Ana Bonaca). The first part of this lesson (episodes 1-6) shows learners how to prototype a query. We start by querying a subset of the sky we ultimately want, then build up progressively stronger filters locally. With our filters in place, episode 7 runs the full query on the remote database, giving us a dataset to visualize in episode 8. Episode 8 demonstrates best practices, tips, and tricks for visualizing data efficiently and effectively.
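The "prototype small, then run in full" pattern can be sketched with a single parameterized query string. This is a minimal, hypothetical illustration, not the lesson's own code. The column names and the `gaiadr2.gaia_source` table are real Gaia DR2 names, but the `build_query` helper, the `parallax < 1` filter, and both region strings are placeholders chosen only to show the idea. The same query runs on a small patch of sky (optionally capped with `TOP`) while filters are being developed, and on the full region once they are settled.

```python
# Hypothetical query template: real Gaia DR2 column names, but the exact query
# and the region strings below are illustrative placeholders only.
QUERY_TEMPLATE = """SELECT {top}source_id, ra, dec, pmra, pmdec, parallax
FROM gaiadr2.gaia_source
WHERE parallax < 1
  AND 1 = CONTAINS(POINT(ra, dec), POLYGON({polygon}))"""


def build_query(polygon, top=None):
    """Return an ADQL query for a sky region given as 'ra1, dec1, ra2, dec2, ...'.

    Passing `top` caps the number of returned rows, which keeps prototype
    queries fast while the filters are still changing.
    """
    top_clause = f"TOP {top} " if top is not None else ""
    return QUERY_TEMPLATE.format(top=top_clause, polygon=polygon)


# Placeholder regions in degrees: a small patch for prototyping and a larger
# region for the final run (not the lesson's actual GD-1 polygon).
prototype_region = "146.3, 19.6, 147.3, 19.6, 147.3, 20.6, 146.3, 20.6"
full_region = "135.3, 8.4, 126.4, 19.1, 163.9, 40.4, 172.2, 28.6"

prototype_query = build_query(prototype_region, top=10)  # quick, small result
full_query = build_query(full_region)                    # the final, full query
print(prototype_query)
```

Keeping one template means the filters tested on the prototype are exactly the ones sent in the final remote query; only the region (and the row cap) changes.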

Because this lesson follows a single dataset throughout, it's easy for learners (and instructors) to lose sight of the bigger picture and focus instead on the scientific goals or individual commands. At the beginning of each episode, we recommend that the instructor discuss the scientific goal of the episode (with frequent references to Figure 1) and highlight the big-picture skills we hope each learner takes away from the episode, beyond the specific science case. At the end of each episode, the instructor should recap the same information, highlighting the best practices covered.

We have an onboarding video and accompanying slides available to prepare Instructors to teach this lesson. After watching this video, please contact so that we can record your status as an onboarded Instructor. Instructors who have completed onboarding will be given priority status for teaching at Centrally-Organised Data Carpentry Foundations of Astronomical Data Science workshops.


Unfortunately, these episodes are highly cumulative, and there is not much that can be cut along the way. If you are running short on time, we recommend eliminating or condensing these sections:

Decisions Made


Episode 1: Queries

Episode 2: Coordinates and units

Episode 3: Tabular Data and Transformations

Episode 4: Proper motion

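x = ...  # define the x-axis quantity immediately before plotting
y = ...  # define the y-axis quantity immediately before plotting
plt.plot(x, y)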

This idiom violates the recommendation not to repeat variable names, but since x and y are defined and used immediately, it should be OK. It also simplifies the final plot expression, making it easier to read.
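A minimal sketch of the idiom, assuming matplotlib is installed. The proper-motion values are made up for illustration; in the lesson they come from the query results. The `Agg` backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Hypothetical proper-motion values (mas/yr), for illustration only.
pm_phi1 = [-7.1, -6.8, -2.3, 0.4, 1.9]
pm_phi2 = [-0.5, 0.1, 1.2, -3.0, 2.2]

# Name the plotted quantities once, immediately before use...
x = pm_phi1
y = pm_phi2

# ...so the plot call itself stays short and readable.
lines = plt.plot(x, y, "ko", markersize=2, alpha=0.5)
plt.xlabel("Proper motion phi1 (mas/yr)")
plt.ylabel("Proper motion phi2 (mas/yr)")
```

Switching the plot to a different pair of columns then means changing only the two assignment lines, while the `plt.plot` call stays the same.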

Episode 5: Coordinate transformation and selection

Episode 6: Joining tables

Episode 7: Photometry

Episode 8: Visualization