Data Analysis and Visualization in Python for Ecologists
- Python is an open source and platform independent programming
language.
- Jupyter Notebook and the Spyder IDE are great tools to code in and
interact with Python. With the large Python community it is easy to find
help on the internet.
- Python is an interpreted language which can be used interactively
(executing one command at a time) or in scripting mode (executing a
series of commands saved in a file).
- One can assign a value to a variable in Python. Those variables can
be of several types, such as string, integer, floating point and complex
numbers.
- Lists and tuples are similar in that they are ordered sequences of
elements; they differ in that a tuple is immutable (cannot be
changed).
- Dictionaries are data structures that provide mappings between keys
and values.
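
As a minimal sketch of the points above about variables, lists, tuples, and dictionaries, the snippet below uses only built-in Python. The variable names and values are made up for illustration.

```python
# Variables of several built-in types (illustrative values)
species = "Dipodomys merriami"   # string
count = 12                       # integer
weight_g = 43.5                  # floating point
z = 2 + 3j                       # complex number

# A list is mutable: an element can be replaced in place
weights = [43.5, 40.0, 48.2]
weights[0] = 44.0

# A tuple is immutable: item assignment raises a TypeError
plot_ids = (1, 2, 3)
try:
    plot_ids[0] = 99
    tuple_changed = True
except TypeError:
    tuple_changed = False

# A dictionary maps keys to values; new pairs can be added by key
species_codes = {"DM": "Dipodomys merriami", "PF": "Perognathus flavus"}
species_codes["NL"] = "Neotoma albigula"
```

After this runs, `weights[0]` holds the new value, while `plot_ids` is unchanged because the assignment was rejected.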
- Libraries enable us to extend the functionality of Python.
- Pandas is a popular library for working with data.
- A DataFrame is a Pandas data structure that allows one to access
data by column (name or index) or row.
- Aggregating data using the groupby() function enables you to
generate useful summaries of data quickly.
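
A small sketch of aggregation with `groupby()` might look like the following. The `surveys` table, its column names, and its values are hypothetical and exist only to make the example self-contained.

```python
import pandas as pd

# Hypothetical survey records (made-up values for illustration)
surveys = pd.DataFrame({
    "species_id": ["DM", "DM", "PF", "PF", "PF"],
    "sex": ["F", "M", "F", "F", "M"],
    "weight": [40.0, 44.0, 8.0, 7.0, 9.0],
})

# Mean weight for each species
mean_weight = surveys.groupby("species_id")["weight"].mean()

# Number of records for each combination of species and sex
counts = surveys.groupby(["species_id", "sex"]).size()
```

Grouping by one column returns a Series indexed by that column; grouping by two columns returns a Series with a two-level index.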
- Plots can be created from DataFrames or subsets of data that have
been generated with
- In Python, portions of data can be accessed using indices, slices,
column headings, and condition-based subsetting.
- Python uses 0-based indexing, in which the first element in a list,
tuple or any other sequence has an index of 0.
- Pandas enables common data exploration steps such as data indexing,
slicing and conditional subsetting.
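
The different ways of accessing portions of data can be sketched as below. The table is hypothetical; `iloc` and `loc` are the position-based and label-based indexers that Pandas provides, and the choice of which to show here is ours.

```python
import pandas as pd

# 0-based indexing on a plain Python list
letters = ["a", "b", "c"]
first_letter = letters[0]

# Hypothetical survey records (made-up values for illustration)
surveys = pd.DataFrame({
    "record_id": [1, 2, 3, 4],
    "species_id": ["DM", "PF", "DM", "NL"],
    "weight": [40.0, 8.0, 44.0, 180.0],
})

# Select one column by its heading
species_column = surveys["species_id"]

# Slice rows by position: rows 0 and 1 (the end of the slice is excluded)
first_two_rows = surveys[0:2]

# Position-based selection of rows 0-1 and columns 1-2
by_position = surveys.iloc[0:2, 1:3]

# Label-based selection: with loc, the end label 2 is included
by_label = surveys.loc[0:2, ["species_id", "weight"]]

# Condition-based subsetting: keep rows where weight exceeds 40
heavy = surveys[surveys["weight"] > 40]
```

Note the difference between the two indexers: the `iloc` slice stops before position 2, while the `loc` slice includes the row labelled 2.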
- Pandas uses different names for data types than Python does, for
example: object for textual data.
- A column in a DataFrame can only have one data type.
- The data type in a DataFrame’s single column can be checked using
- Make conscious decisions about how to manage missing data.
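
As a hedged illustration of inspecting data types and handling missing values, the sketch below uses the `dtypes` attribute together with `isna()`, `dropna()`, and `fillna()`. The table is hypothetical, and replacing missing weights with the mean is only one possible choice, not a recommendation.

```python
import pandas as pd

# Hypothetical records with one missing weight (made-up values)
surveys = pd.DataFrame({
    "species_id": ["DM", "PF", "DM"],
    "plot_id": [1, 2, 3],
    "weight": [40.0, float("nan"), 44.0],
})

# One data type per column; text columns are typically reported as
# object (newer pandas versions may report a dedicated string type)
column_types = surveys.dtypes

# Count missing values before deciding what to do with them
n_missing = surveys["weight"].isna().sum()

# Option 1: drop rows where weight is missing
dropped = surveys.dropna(subset=["weight"])

# Option 2: fill missing weights with the mean of the observed weights
filled = surveys.assign(
    weight=surveys["weight"].fillna(surveys["weight"].mean())
)
```

The two options give different data sets (two rows versus three), which is why the decision should be made deliberately and recorded.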
- A DataFrame can be saved to a CSV file using the
- concat can be used to combine subsets of a DataFrame, or even data
from different files.
- The join function combines DataFrames based on index or column.
- Joining two DataFrames can be done in multiple ways (left, right,
and inner) depending on what data must be in the final DataFrame.
- to_csv can be used to write out DataFrames in CSV format.
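
Combining and exporting DataFrames could be sketched as follows. Both tables are hypothetical, and the CSV is written to an in-memory buffer rather than a file so that the example leaves nothing on disk.

```python
import io

import pandas as pd

# Hypothetical tables (made-up values); "XX" has no match in species
surveys = pd.DataFrame({
    "record_id": [1, 2, 3],
    "species_id": ["DM", "PF", "XX"],
})
species = pd.DataFrame({
    "species_id": ["DM", "PF"],
    "genus": ["Dipodomys", "Perognathus"],
})

# concat: stack two subsets of the same DataFrame row-wise
stacked = pd.concat([surveys.head(2), surveys.tail(1)], ignore_index=True)

# join: match the species_id column against the index of the other table
species_by_id = species.set_index("species_id")
left = surveys.join(species_by_id, on="species_id", how="left")
inner = surveys.join(species_by_id, on="species_id", how="inner")

# to_csv: write the joined table in CSV format (here to a text buffer)
buffer = io.StringIO()
left.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

The left join keeps all three survey records and leaves `genus` empty for the unmatched one; the inner join keeps only the two records with a matching species.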
- Loops help automate repetitive tasks over sets of items.
- Loops combined with functions provide a way to process data more
efficiently than we could by hand.
- Conditional statements enable execution of different operations on
different data.
- Functions enable code reuse.
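
The interplay of loops, conditional statements, and functions can be sketched with plain Python. The function name `classify_weight`, the 50 g threshold, and the yearly weights are all hypothetical.

```python
def classify_weight(weight_g, threshold_g=50.0):
    """Label a weight as 'light' or 'heavy' relative to a threshold."""
    if weight_g < threshold_g:
        return "light"
    else:
        return "heavy"


# Hypothetical weights grouped by survey year (made-up values)
weights_by_year = {1977: [40.0, 8.0], 1978: [180.0, 44.0, 52.0]}

# Loop over the years and reuse the function for every weight
heavy_per_year = {}
for year, weights in weights_by_year.items():
    labels = [classify_weight(w) for w in weights]
    heavy_per_year[year] = labels.count("heavy")
```

Because the classification lives in a function, changing the threshold or the rule later only requires editing one place.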
- aes variables and a geometry are the main elements of a plotnine
graph.
- With the + operator, additional elements are added.
- Matplotlib is the engine behind plotnine and Pandas plots.
- The object-based nature of matplotlib plots enables their detailed
customization after they have been created.
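
As a rough sketch of customizing a plot after it has been created, the example below builds a Matplotlib figure and then modifies its objects. The data values are made up, and the `Agg` backend is selected only so the snippet runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; no window is opened

import matplotlib.pyplot as plt

# Create the plot first: a figure, an axes, and one line (made-up data)
fig, ax = plt.subplots()
(line,) = ax.plot([1977, 1978, 1979], [12, 18, 15])

# Customize the existing objects afterwards
ax.set_title("Records per year")
ax.set_xlabel("Year")
ax.set_ylabel("Number of records")
line.set_color("darkgreen")
line.set_linewidth(2)
```

Each element of the plot (figure, axes, line) is an object with its own setter methods, so it can be adjusted at any point before the figure is shown or exported.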
- Export plots to a file using the
- sqlite3 provides a SQL-like interface to read, query, and write SQL
databases from Python.
- sqlite3 can be used with Pandas to read SQL data into the familiar
DataFrame.
- Pandas and sqlite3 can also be used to transfer between the CSV and
SQL formats.
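
A minimal sketch of moving data between CSV, SQL, and a DataFrame is shown below. It assumes an in-memory SQLite database and a hypothetical `surveys` table built from made-up CSV text; `to_sql` and `read_sql_query` are the Pandas functions used for the transfer here.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical CSV content (made-up values for illustration)
csv_text = "record_id,species_id,weight\n1,DM,40.0\n2,PF,8.0\n3,NL,180.0\n"

# CSV -> DataFrame
surveys = pd.read_csv(io.StringIO(csv_text))

# DataFrame -> SQL table in an in-memory SQLite database
con = sqlite3.connect(":memory:")
surveys.to_sql("surveys", con, index=False)

# Query with sqlite3 directly
n_rows = con.execute("SELECT COUNT(*) FROM surveys").fetchone()[0]

# SQL query -> DataFrame
heavy = pd.read_sql_query(
    "SELECT record_id, weight FROM surveys WHERE weight > 40 ORDER BY record_id",
    con,
)

con.close()
```

The query result arrives as an ordinary DataFrame, so the indexing, grouping, and export tools summarized above apply to it unchanged.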