{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Cruise data analysis in Python\n", "\n", "## WCOA 2013 data set\n", "\n", "To download the data used in this tutorial, use the following command in the Terminal (Mac) or Git Bash (Windows).\n", "\n", "```\n", "git clone https://github.com/mlmldata2022/wcoa_cruise.git\n", "```\n", "\n", "The data comes from the West Coast Ocean Acidification (WCOA) cruise in 2013. The goal of this NOAA-supported research cruise is to collect data to help understand the effects of coastal upwelling on ocean acidification, and the impacts of ocean acidification on organisms and ecosystems. [This video](https://www.youtube.com/watch?v=Eesi6e03Yx0&t=134s) gives an idea of life aboard the ship and the type of science operations conducted.\n", "\n", "In this part of the tutorial, we will go over the basics of working with dates in Pandas and Numpy, make some exploratory plots and start a regression analysis. The data exploration will be largely guided by student interest." ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "from matplotlib import pyplot as plt\n", "from scipy import stats" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Introduction to Pandas dataframes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We use Pandas to import the csv data file. \n", "\n", "Here, there is an optional `parse_dates` argument. The numbers in double brackets `[[8,9]]` indicate which columns to interpret as dates." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "filename = 'data/wcoa_cruise/WCOA2013_hy1.csv'\n", "df = pd.read_csv(filename,header=31,na_values=-999,\n", " parse_dates=[[8,9]])" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | DATE_TIME | EXPOCODE | SECT_ID | LEG | LINE | STNNBR | CASTNO | BTLNBR | BTLNBR_FLAG_W | LATITUDE | ... | TCARBN | TCARBN_FLAG_W | ALKALI | ALKALI_FLAG_W | PH_TOT | PH_TOT_FLAG_W | PH_TMP | CO32 | CO32__FLAG_W | CHLORA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2013-08-05 02:12:20 | 317W20130803 | WCOA2013 | 1 | 2 | 11 | 1 | 1 | 2 | 48.2 | ... | 2370.2 | 2 | 2369.0 | 2 | 7.294 | 2 | 25.0 | NaN | 9 | NaN |
| 1 | 2013-08-05 02:12:53 | 317W20130803 | WCOA2013 | 1 | 2 | 11 | 1 | 2 | 2 | 48.2 | ... | NaN | 9 | NaN | 9 | 7.295 | 2 | 25.0 | NaN | 9 | NaN |
| 2 | 2013-08-05 02:19:58 | 317W20130803 | WCOA2013 | 1 | 2 | 11 | 1 | 3 | 2 | 48.2 | ... | 2349.6 | 2 | 2343.7 | 2 | 7.282 | 2 | 25.0 | 43.521 | 3 | NaN |
| 3 | 2013-08-05 02:27:01 | 317W20130803 | WCOA2013 | 1 | 2 | 11 | 1 | 4 | 2 | 48.2 | ... | 2318.7 | 2 | 2311.9 | 2 | 7.287 | 2 | 25.0 | 45.641 | 2 | NaN |
| 4 | 2013-08-05 02:30:53 | 317W20130803 | WCOA2013 | 1 | 2 | 11 | 1 | 5 | 2 | 48.2 | ... | 2300.0 | 2 | 2299.7 | 2 | 7.308 | 2 | 25.0 | 47.741 | 2 | NaN |
5 rows × 42 columns
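
The `DATE_TIME` column above is the result of `parse_dates=[[8,9]]` merging two columns while the file is read. Newer pandas releases may emit a deprecation warning for this nested-list form, so it can help to know how to build the same column by hand. The following is a minimal sketch on a small synthetic CSV, not the cruise file itself. It assumes the two source columns are named `DATE` and `TIME` and are formatted as `YYYYMMDD` and `HH:MM:SS`; the name `demo` and the sample values are for illustration only.

```python
import io

import pandas as pd

# Synthetic stand-in for the cruise file (assumed column names and formats).
csv_text = """STNNBR,DATE,TIME,CTDTMP
11,20130805,02:12:20,3.6883
11,20130805,02:12:53,3.6902
11,20130805,02:19:58,-999
"""

# Read DATE and TIME as strings so they can be joined before parsing.
demo = pd.read_csv(
    io.StringIO(csv_text),
    na_values=-999,
    dtype={"DATE": str, "TIME": str},
)

# Join the two text columns and parse them with an explicit format.
demo["DATE_TIME"] = pd.to_datetime(
    demo["DATE"] + " " + demo["TIME"], format="%Y%m%d %H:%M:%S"
)

# Drop the original columns now that the combined column exists.
demo = demo.drop(columns=["DATE", "TIME"])
print(demo.dtypes)
```

Giving an explicit `format` avoids relying on pandas to guess the date layout, which matters for compact dates such as `20130805`.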
| | DATE_TIME | EXPOCODE | SECT_ID | LEG | LINE | STNNBR | CASTNO | BTLNBR | BTLNBR_FLAG_W | LATITUDE | ... | TCARBN_FLAG_W | ALKALI | ALKALI_FLAG_W | PH_TOT | PH_TOT_FLAG_W | PH_TMP | CO32 | CO32__FLAG_W | CHLORA | CTDTMP_F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2013-08-05 02:12:20 | 317W20130803 | WCOA2013 | 1 | 2 | 11 | 1 | 1 | 2 | 48.2 | ... | 2 | 2369.0 | 2 | 7.294 | 2 | 25.0 | NaN | 9 | NaN | 38.63894 |
| 1 | 2013-08-05 02:12:53 | 317W20130803 | WCOA2013 | 1 | 2 | 11 | 1 | 2 | 2 | 48.2 | ... | 9 | NaN | 9 | 7.295 | 2 | 25.0 | NaN | 9 | NaN | 38.64236 |
| 2 | 2013-08-05 02:19:58 | 317W20130803 | WCOA2013 | 1 | 2 | 11 | 1 | 3 | 2 | 48.2 | ... | 2 | 2343.7 | 2 | 7.282 | 2 | 25.0 | 43.521 | 3 | NaN | 39.85916 |
3 rows × 43 columns
| | DATE_TIME | EXPOCODE | SECT_ID | LEG | LINE | STNNBR | CASTNO | BTLNBR | BTLNBR_FLAG_W | LATITUDE | ... | TCARBN_FLAG_W | ALKALI | ALKALI_FLAG_W | PH_TOT | PH_TOT_FLAG_W | PH_TMP | CO32 | CO32__FLAG_W | CHLORA | CTDTMP_F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2013-08-05 02:12:20 | 317W20130803 | WCOA2013 | 1 | 2 | 11 | 1 | 1 | 2 | 48.2 | ... | 2 | 2369.0 | 2 | 7.294 | 2 | 25.0 | NaN | 9 | NaN | 38.63894 |
| 1 | 2013-08-05 02:12:53 | 317W20130803 | WCOA2013 | 1 | 2 | 11 | 1 | 2 | 2 | 48.2 | ... | 9 | NaN | 9 | 7.295 | 2 | 25.0 | NaN | 9 | NaN | 38.64236 |
| 2 | 2013-08-05 02:19:58 | 317W20130803 | WCOA2013 | 1 | 2 | 11 | 1 | 3 | 2 | 48.2 | ... | 2 | 2343.7 | 2 | 7.282 | 2 | 25.0 | 43.521 | 3 | NaN | 39.85916 |
3 rows × 43 columns
| | DATE_TIME | EXPOCODE | SECT_ID | LEG | LINE | STNNBR | CASTNO | BTLNBR | BTLNBR_FLAG_W | LATITUDE | ... | TCARBN_FLAG_W | ALKALI | ALKALI_FLAG_W | PH_TOT | PH_TOT_FLAG_W | PH_TMP | CO32 | CO32__FLAG_W | CHLORA | CTDTMP_F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2013-08-05 02:12:20 | 317W20130803 | WCOA2013 | 1 | 2 | 11 | 1 | 1 | 2 | 48.2 | ... | 2 | 2369.0 | 2 | 7.294 | 2 | 25.0 | NaN | 9 | NaN | 38.63894 |
| 1 | 2013-08-05 02:12:53 | 317W20130803 | WCOA2013 | 1 | 2 | 11 | 1 | 2 | 2 | 48.2 | ... | 9 | NaN | 9 | 7.295 | 2 | 25.0 | NaN | 9 | NaN | 38.64236 |
| 2 | 2013-08-05 02:19:58 | 317W20130803 | WCOA2013 | 1 | 2 | 11 | 1 | 3 | 2 | 48.2 | ... | 2 | 2343.7 | 2 | 7.282 | 2 | 25.0 | 43.521 | 3 | NaN | 39.85916 |
| 3 | 2013-08-05 02:27:01 | 317W20130803 | WCOA2013 | 1 | 2 | 11 | 1 | 4 | 2 | 48.2 | ... | 2 | 2311.9 | 2 | 7.287 | 2 | 25.0 | 45.641 | 2 | NaN | 41.05328 |
4 rows × 43 columns
| | CTDTMP | CTDPRS |
|---|---|---|
| 0 | 3.6883 | 999.5 |
| 1 | 3.6902 | 1000.8 |
| 2 | 4.3662 | 749.0 |
| 3 | 5.0296 | 503.9 |
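
The two-column table above is the kind of result you get by passing a list of column names to a dataframe. Its `CTDTMP` values also line up with the `CTDTMP_F` column in the wider tables: 3.6883 × 9/5 + 32 = 38.63894. The following is a minimal sketch of both operations, assuming `CTDTMP` is in degrees Celsius and that `CTDTMP_F` was derived with the standard Celsius-to-Fahrenheit formula. The dataframe `demo` is a small stand-in built from the four rows shown, not the full cruise data.

```python
import pandas as pd

# Stand-in dataframe using the four rows displayed in the table above.
demo = pd.DataFrame(
    {
        "CTDTMP": [3.6883, 3.6902, 4.3662, 5.0296],
        "CTDPRS": [999.5, 1000.8, 749.0, 503.9],
    }
)

# Assigning to a new column name adds a derived column to the dataframe.
# Assumption: CTDTMP is in degrees Celsius.
demo["CTDTMP_F"] = demo["CTDTMP"] * 9 / 5 + 32

# Double brackets (a list of names) select a subset of columns
# and return a new dataframe.
subset = demo[["CTDTMP", "CTDPRS"]]

print(demo)
print(subset)
```

Single brackets with one name, such as `demo["CTDTMP"]`, return a Series instead of a dataframe.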
| | DATE_TIME | EXPOCODE | SECT_ID | LEG | LINE | STNNBR | CASTNO | BTLNBR | BTLNBR_FLAG_W | LATITUDE | ... | TCARBN_FLAG_W | ALKALI | ALKALI_FLAG_W | PH_TOT | PH_TOT_FLAG_W | PH_TMP | CO32 | CO32__FLAG_W | CHLORA | CTDTMP_F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2013-08-05 02:12:20 | 317W20130803 | WCOA2013 | 1 | 2 | 11 | 1 | 1 | 2 | 48.2 | ... | 2 | 2369.0 | 2 | 7.294 | 2 | 25.0 | NaN | 9 | NaN | 38.63894 |
| 1 | 2013-08-05 02:12:53 | 317W20130803 | WCOA2013 | 1 | 2 | 11 | 1 | 2 | 2 | 48.2 | ... | 9 | NaN | 9 | 7.295 | 2 | 25.0 | NaN | 9 | NaN | 38.64236 |
| 2 | 2013-08-05 02:19:58 | 317W20130803 | WCOA2013 | 1 | 2 | 11 | 1 | 3 | 2 | 48.2 | ... | 2 | 2343.7 | 2 | 7.282 | 2 | 25.0 | 43.521 | 3 | NaN | 39.85916 |
| 3 | 2013-08-05 02:27:01 | 317W20130803 | WCOA2013 | 1 | 2 | 11 | 1 | 4 | 2 | 48.2 | ... | 2 | 2311.9 | 2 | 7.287 | 2 | 25.0 | 45.641 | 2 | NaN | 41.05328 |
| 4 | 2013-08-05 02:30:53 | 317W20130803 | WCOA2013 | 1 | 2 | 11 | 1 | 5 | 2 | 48.2 | ... | 2 | 2299.7 | 2 | 7.308 | 2 | 25.0 | 47.741 | 2 | NaN | 41.82404 |
5 rows × 43 columns
| | DATE_TIME | EXPOCODE | SECT_ID | LEG | LINE | STNNBR | CASTNO | BTLNBR | BTLNBR_FLAG_W | LATITUDE | ... | TCARBN_FLAG_W | ALKALI | ALKALI_FLAG_W | PH_TOT | PH_TOT_FLAG_W | PH_TMP | CO32 | CO32__FLAG_W | CHLORA | CTDTMP_F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 770 | 2013-08-25 21:23:35 | 32P020130821 | WCOA2013 | 2 | 10 | 133 | 1 | 1 | 2 | 37.67 | ... | 6 | 2430.4 | 6 | 7.493 | 2 | 25.0 | 73.346 | 2 | NaN | 35.12876 |
| 771 | 2013-08-25 21:32:13 | 32P020130821 | WCOA2013 | 2 | 10 | 133 | 1 | 2 | 2 | 37.67 | ... | 2 | 2427.6 | 2 | 7.467 | 2 | 25.0 | 69.812 | 2 | NaN | 35.30336 |
| 772 | 2013-08-25 21:40:44 | 32P020130821 | WCOA2013 | 2 | 10 | 133 | 1 | 3 | 2 | 37.67 | ... | 2 | 2424.3 | 2 | 7.442 | 2 | 25.0 | 65.502 | 2 | NaN | 35.55914 |
| 773 | 2013-08-25 21:49:45 | 32P020130821 | WCOA2013 | 2 | 10 | 133 | 1 | 4 | 2 | 37.67 | ... | 2 | 2413.7 | 2 | 7.400 | 2 | 25.0 | 60.170 | 2 | NaN | 36.10526 |
| 774 | 2013-08-25 21:57:55 | 32P020130821 | WCOA2013 | 2 | 10 | 133 | 1 | 5 | 2 | 37.67 | ... | 2 | 2401.4 | 2 | 7.374 | 2 | 25.0 | 57.739 | 2 | NaN | 36.92894 |
5 rows × 43 columns
| | DATE_TIME | EXPOCODE | SECT_ID | LEG | LINE | STNNBR | CASTNO | BTLNBR | BTLNBR_FLAG_W | LATITUDE | ... | TCARBN_FLAG_W | ALKALI | ALKALI_FLAG_W | PH_TOT | PH_TOT_FLAG_W | PH_TMP | CO32 | CO32__FLAG_W | CHLORA | CTDTMP_F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 18 | 2013-08-05 03:00:52 | 317W20130803 | WCOA2013 | 1 | 2 | 11 | 1 | 19 | 4 | 48.20 | ... | 9 | NaN | 9 | NaN | 9 | NaN | NaN | 9 | NaN | 56.54660 |
| 19 | 2013-08-05 03:01:10 | 317W20130803 | WCOA2013 | 1 | 2 | 11 | 1 | 20 | 2 | 48.20 | ... | 2 | 2189.1 | 6 | 7.983 | 3 | 25.0 | 155.043 | 2 | NaN | 56.54570 |
| 38 | 2013-08-05 06:37:22 | 317W20130803 | WCOA2013 | 1 | 2 | 12 | 1 | 19 | 2 | 48.30 | ... | 9 | 2180.0 | 6 | 7.980 | 3 | 25.0 | 152.868 | 2 | NaN | 57.72218 |
| 39 | 2013-08-05 06:37:42 | 317W20130803 | WCOA2013 | 1 | 2 | 12 | 1 | 20 | 2 | 48.30 | ... | 2 | NaN | 9 | 7.981 | 2 | 25.0 | NaN | 5 | NaN | 57.72290 |
| 58 | 2013-08-05 10:41:19 | 317W20130803 | WCOA2013 | 1 | 2 | 13 | 1 | 19 | 2 | 48.37 | ... | 2 | 2178.7 | 2 | 7.931 | 2 | 25.0 | 142.629 | 2 | NaN | 55.69844 |
5 rows × 43 columns
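
The index labels in the table above (18, 19, 38, 39, 58) are not consecutive. That is what a selection with a boolean condition looks like: rows that pass the condition keep their original labels. The condition behind that table is not shown here, so the following sketch uses a hypothetical pressure threshold (`shallow_limit`) on a small synthetic dataframe to illustrate the pattern.

```python
import pandas as pd

# Synthetic data for illustration only (not values from the cruise file).
demo = pd.DataFrame(
    {
        "STNNBR": [11, 11, 11, 12, 12],
        "CTDPRS": [999.5, 30.2, 3.1, 501.0, 2.8],
    }
)

# Hypothetical threshold: keep samples shallower than this pressure.
shallow_limit = 10.0

# A comparison on a column returns a boolean Series, one value per row.
mask = demo["CTDPRS"] < shallow_limit

# Indexing with the mask keeps only rows where the mask is True.
shallow = demo[mask]

print(shallow)
print(shallow.index.tolist())
```

If consecutive labels are needed afterwards, `shallow.reset_index(drop=True)` renumbers the rows from zero.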