Summary and Schedule
The Foundations of Astronomical Data Science curriculum covers a range of core concepts necessary to efficiently study the ever-growing datasets developed in modern astronomy. In particular, this curriculum teaches learners to perform database operations (SQL queries, joins, filtering) and to create publication-quality data visualisations. Learners will use software packages common to the general and astronomy-specific data science communities (Pandas, Astropy, Astroquery) combined with two astronomical datasets: the large, all-sky, multi-dimensional dataset from the Gaia satellite, which measures the positions, motions, and distances of approximately a billion stars in our Milky Way galaxy with unprecedented accuracy and precision; and the Pan-STARRS photometric survey, which precisely measures how much light many stars emit and how that light is distributed across several wavelength bands. Together, the software and datasets are used to reproduce part of the analysis from the article “Off the beaten path: Gaia reveals GD-1 stars outside of the main stream” by Drs. Adrian M. Price-Whelan and Ana Bonaca. This lesson shows how to identify and visualize the GD-1 stellar stream, which is a globular cluster that has been tidally stretched by the Milky Way.
GD-1 is a stellar stream around the Milky Way. This means it is a collection of stars that we believe was once part of a bound clump, but the gravitational influence of the Milky Way has torn it apart and spread it over an arc that traces out its orbit on the sky. This is interesting because, if the original bound clump was a dwarf galaxy, understanding its orbit with sufficient precision allows us to measure the mass of the Milky Way, which is very important for understanding the past and future of the Milky Way as a whole. That measurement is much easier with a coordinate system aligned with the stream, because fitting the locations of the stars becomes mathematically simpler: in such a frame the stream's track is close to a straight line rather than a complicated curve. This stream is also especially interesting because it has “gaps”, which have a natural interpretation as being caused by small clumps of dark matter passing near the stream. Knowing the typical rate of these gaps tells us about the typical size and density of these clumps, which turns out to be one of the best probes we have of the fine structure of dark matter.
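To make the idea of a stream-aligned coordinate system concrete, here is a minimal plain-Python sketch. It is an illustration, not the lesson's own code (the lesson uses Astropy's coordinate tools for transformations). It rotates a star's (ra, dec) into a frame whose equator is a chosen great circle. The function names are hypothetical, and no pole or origin values for GD-1 are assumed here; both are inputs. Stars lying along that great circle end up with a latitude `phi2` near zero, so their track becomes a nearly straight line in (`phi1`, `phi2`).

```python
import math


def unit_vector(ra_deg, dec_deg):
    """Convert right ascension and declination (degrees) to a Cartesian unit vector."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return tuple(vi / norm for vi in v)


def to_stream_frame(ra_deg, dec_deg, pole, origin):
    """Return (phi1, phi2) in degrees for one sky position (hypothetical helper).

    pole:   (ra, dec) of the great circle's pole; this becomes the frame's z axis.
    origin: (ra, dec) of a point defining phi1 = 0; it must not coincide with the pole.
    """
    z = unit_vector(*pole)
    o = unit_vector(*origin)
    # Project the origin onto the great-circle plane so that x is perpendicular to z.
    x = normalize(tuple(oi - dot(o, z) * zi for oi, zi in zip(o, z)))
    y = cross(z, x)  # completes a right-handed (x, y, z) basis
    v = unit_vector(ra_deg, dec_deg)
    # phi1: longitude measured along the great circle from the origin.
    phi1 = math.degrees(math.atan2(dot(v, y), dot(v, x)))
    # phi2: angular distance from the great circle; clamp guards against rounding.
    phi2 = math.degrees(math.asin(max(-1.0, min(1.0, dot(v, z)))))
    return phi1, phi2
```

As a sanity check, putting the pole at the north celestial pole and the origin at (0, 0) reduces the frame to ordinary (ra, dec): `to_stream_frame(30, 10, pole=(0, 90), origin=(0, 0))` returns approximately `(30.0, 10.0)`.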
This lesson can be taught in approximately 10 hours and covers the following topics:
- Incremental creation of complex ADQL and SQL queries.
- Using Astroquery to query a remote server in Python.
- Transforming coordinates between common coordinate systems using Astropy units and coordinates.
- Working with common astronomical file formats, including FITS, HDF5, and CSV.
- Managing your data with Pandas DataFrames and Astropy Tables.
- Writing functions to make your work less error-prone and more reproducible.
- Creating a reproducible workflow that brings the computation to the data.
- Customising all elements of a plot and creating complex, multi-panel, publication-quality graphics.
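One way to build a query incrementally, as in the first topic above, is to assemble the ADQL string from its parts in Python and check it at each step. This is a minimal sketch: the helper `build_adql` is hypothetical, and the table `gaiadr2.gaia_source` and its columns are used only for illustration. The resulting string could then be submitted with Astroquery, for example through `Gaia.launch_job_async` in `astroquery.gaia`.

```python
def build_adql(table, columns, conditions=(), top=None):
    """Assemble an ADQL SELECT statement from its parts (hypothetical helper)."""
    select = "SELECT"
    if top is not None:
        # TOP limits the number of rows returned, which keeps trial queries fast.
        select += f" TOP {int(top)}"
    query = f"{select} {', '.join(columns)}\nFROM {table}"
    if conditions:
        # Each extra condition narrows the selection; they are combined with AND.
        query += "\nWHERE " + "\n  AND ".join(conditions)
    return query


# Start small, then add conditions one at a time and re-check the result.
query = build_adql("gaiadr2.gaia_source",
                   ["source_id", "ra", "dec", "parallax"],
                   conditions=["parallax < 1"],
                   top=10)
print(query)
```

Keeping `TOP` while developing, and removing it only once the query returns what you expect, is one way to avoid accidentally requesting a very large result.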
Interested in teaching these materials? We have an onboarding video and accompanying slides available to prepare Instructors to teach this lesson. After watching this video, please contact team@carpentries.org so that we can record your status as an onboarded Instructor. Instructors who have completed onboarding will be given priority status for teaching at Centrally-Organised Data Carpentry Foundations of Astronomical Data Science workshops.
Prerequisites
This lesson assumes you have a working knowledge of Python and some previous exposure to the Bash shell. These requirements can be fulfilled by:
a) completing a Software Carpentry Python workshop, or
b) completing a Data Carpentry Ecology workshop (with Python) and a Data Carpentry Genomics workshop, or
c) independent exposure to both Python and the Bash shell.
If you’re unsure whether you have enough experience to participate in this workshop, please read over this detailed list, which gives all of the functions, operators, and other concepts you will need to be familiar with.
In addition, this lesson assumes that learners have some familiarity with astronomical concepts, including reference frames, proper motion, color-magnitude diagrams, globular clusters, and isochrones. Participants should bring their own laptops and plan to participate actively.
Start | Episode | Questions |
Before the workshop | Setup Instructions | Download files required for the lesson |
00h 00m | 1. Basic Queries | How can we select and download the data we want from the Gaia server? |
01h 30m | 2. Coordinate Transformations | How do we transform celestial coordinates from one frame to another and save a subset of the results in files? |
03h 05m | 3. Plotting and Tabular Data | How do we make scatter plots in Matplotlib? How do we store data in a Pandas DataFrame? |
04h 00m | 4. Plotting and Pandas | How do we efficiently explore our data and identify appropriate filters to produce a clean sample (in this case, of GD-1 stars)? |
05h 05m | 5. Transform and Select | When should we use the database server for computation? When should we download the data from the database server and compute locally? |
06h 15m | 6. Join | How do we use JOIN to combine information from multiple tables? How can we make a selection within a joined table? How should we save the result? |
07h 45m | 7. Photometry | How do we use Matplotlib to define a polygon and select points that fall inside it? |
08h 25m | 8. Visualization | What elements make a compelling visualization that authentically reports scientific results ready for scientific presentation and publication? What tools and techniques are available to save time on creating presentation- and publication-ready figures? |
10h 35m | Finish | |
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
Overview
This workshop is designed to be run on your local machine. First, you will need to download the data we use in the workshop. Then, you need to set up your machine to use the required software. Lastly, you will download and run a Jupyter Notebook that contains test code to check that your installation was successful.
Optional: you may also be interested in reading the journal article that we will explore during the workshop, “Off the Beaten Path: Gaia Reveals GD-1 Stars outside of the Main Stream” by Adrian M. Price-Whelan and Ana Bonaca.
Data
To start the installation process, download this zip file.
Move the downloaded file to your Desktop. If your file does not automatically unzip into a directory called student_download, you can unzip it with the following steps:
- Mac: navigate in a Finder window to your Desktop and double-click student_download.zip.
- Windows: navigate in a File Explorer window to your Desktop, right-click the student_download.zip file, and select Extract All.
- Linux: open a terminal and navigate to your Desktop. Type
unzip student_download.zip
You should now have a directory called student_download.
In this directory you will find files that you will use during the workshop, as well as files that you will use in the remainder of the setup process.
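If you prefer to unzip from Python on any operating system, the standard-library `zipfile` module can do the same job as the steps above. This is a minimal sketch; the helper name `unzip` is hypothetical, and it assumes, as the steps above suggest, that the archive unpacks into a single `student_download` directory. The returned list of top-level names lets you confirm that.

```python
import zipfile
from pathlib import Path


def unzip(archive, destination):
    """Extract every member of a zip archive into destination; return the top-level names."""
    archive = Path(archive)
    destination = Path(destination)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(destination)
        # Zip member names always use "/", so the first component is the top-level entry.
        return sorted({name.split("/")[0] for name in zf.namelist()})
```

For example, assuming your Desktop is at `~/Desktop` (on some Windows setups it lives under OneDrive instead), you could call `unzip(Path.home() / "Desktop" / "student_download.zip", Path.home() / "Desktop")` and expect `["student_download"]` back.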
Software
You will need to install Python, Jupyter, and some additional libraries. Python is a popular language for scientific computing, and great for general-purpose programming as well. For this workshop we use Python version 3.x. Installing all of its scientific packages individually can be a bit difficult, so we recommend an all-in-one installer. We will use Anaconda.
Anaconda
Download and install Anaconda.
To create a new Conda environment, which includes the additional packages we will be using in this workshop, you will need the environment file (environment.yml) you downloaded in the data section.
In a Terminal or Anaconda Prompt, make sure you are in the student_download directory, where environment.yml is stored, and run:
conda env create -f environment.yml
Then, to activate the environment you just created, run:
conda activate AstronomicalData
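To confirm from Python that the activation worked, you can check the `CONDA_DEFAULT_ENV` environment variable, which `conda activate` sets to the name of the active environment. This is a minimal sketch with a hypothetical helper name; running `conda env list`, which marks the active environment with an asterisk, is an equally good check.

```python
import os


def active_conda_env(environ=None):
    """Return the name of the active Conda environment, or None if none is active."""
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV")


if active_conda_env() == "AstronomicalData":
    print("AstronomicalData is active.")
else:
    print("Run `conda activate AstronomicalData` first.")
```

Run it with `python` from the same terminal in which you ran `conda activate`; a different terminal window will not see the activation.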
Jupyter
We will test our environment setup using a test notebook (test_setup.ipynb) that you downloaded in the data section.
In a Terminal or Anaconda Prompt, make sure you are in the student_download directory. To start Jupyter, make sure you have activated your new conda environment, then run:
jupyter notebook
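Jupyter should open automatically in your browser. If it does not, or if you wish to use a different browser, open this link: http://localhost:8888. If the page asks for a password or token, copy the full URL (including the token) that Jupyter prints in the terminal.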
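If the browser shows nothing at that address, a quick way to check whether a server is listening on the port is a plain socket connection. This sketch assumes the default port 8888; if that port is already in use, Jupyter tries another one and prints the actual URL in the terminal, so use the port shown there. The helper `is_listening` is hypothetical.

```python
import socket


def is_listening(host="localhost", port=8888, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


print("Jupyter reachable:", is_listening())
```

A `False` result usually means Jupyter is not running in that terminal, or is running on a different port.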
Now open the notebook you downloaded, test_setup.ipynb, and read through the instructions there. Make sure to run the cells that contain import statements. If they work and you get no error messages, you are ready for the workshop.
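The idea behind the import check can be sketched in a few lines: try each import and collect the ones that fail. This is not the contents of test_setup.ipynb; the package list below is an assumption based on the libraries named in this lesson, plus NumPy, and the helper `failed_imports` is hypothetical.

```python
import importlib

# Assumed package list; the real test notebook may check more or different modules.
PACKAGES = ["numpy", "pandas", "matplotlib", "astropy", "astroquery"]


def failed_imports(names):
    """Return the names from `names` that cannot be imported in this environment."""
    failed = []
    for name in names:
        try:
            importlib.import_module(name)
        except Exception:
            # Any error counts: a missing package raises ImportError, but a
            # broken installation can fail with other exception types.
            failed.append(name)
    return failed


failed = failed_imports(PACKAGES)
print("All imports succeeded." if not failed else f"Failed: {', '.join(failed)}")
```

If anything is listed as failed, the steps under "Why didn't the imports work?" are the next thing to try.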
Why didn’t the imports work?
Occasionally learners will need to take one additional step to make Jupyter run within the environment we have created. If your imports fail, close Jupyter by closing its terminal, and, with the AstronomicalData environment activated, run the following from your Anaconda Prompt (Terminal or otherwise):
python -m ipykernel install --user --name=AstronomicalData
Then start Jupyter up again:
jupyter notebook
This time, when you open your notebook, go to the Kernel menu, choose Change Kernel, and select AstronomicalData. This will ensure that the relevant packages are all available. You can seek installation help if this looks confusing!
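After switching kernels, you can confirm from a notebook cell which Python the kernel is running. Conda environments normally live in a directory named after the environment (for example `.../envs/AstronomicalData`), so the last component of `sys.prefix` should be `AstronomicalData`. This is a minimal sketch; the helper `kernel_env_name` is hypothetical.

```python
import os
import sys


def kernel_env_name(prefix=None):
    """Return the final directory name of the interpreter prefix (the env name for Conda envs)."""
    prefix = sys.prefix if prefix is None else prefix
    return os.path.basename(os.path.normpath(prefix))


print("Kernel is running from:", sys.executable)
print("Environment:", kernel_env_name())
```

If this prints something other than `AstronomicalData` (for example `anaconda3`, the base installation), the notebook is still using a different kernel.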
Please contact your instructors if you experience any problems with these installation instructions. If you are working through these materials independently, let us know about any problems you encounter by filing an issue on the lesson’s GitHub repository or emailing team@carpentries.org.