Example: comparing temperature in three regions
Contents
Example: comparing temperature in three regions¶
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
import statsmodels.formula.api as smf
import pingouin as pg
import PyCO2SYS as pyco2
/Users/tomconnolly/programs/miniconda3/envs/data-book/lib/python3.8/site-packages/outdated/utils.py:14: OutdatedPackageWarning: The package pingouin is out of date. Your version is 0.5.0, the latest is 0.5.1.
Set the environment variable OUTDATED_IGNORE=1 to disable these warnings.
return warn(
Load the data. Note: you may need to change the path to the data already downloaded last week.
filename07 = 'data/wcoa_cruise_2007/32WC20070511.exc.csv'
df07 = pd.read_csv(filename07,header=29,na_values=-999,parse_dates=[[6,7]])
Use the PyCO2SYS package to calculate seawater carbon chemistry parameters.
https://pyco2sys.readthedocs.io/en/latest/
c07 = pyco2.sys(df07['ALKALI'], df07['TCARBN'], 1, 2,
salinity=df07['CTDSAL'], temperature=df07['CTDTMP'],
pressure=df07['CTDPRS'])
df07['OmegaA'] = c07['saturation_aragonite']
/Users/tomconnolly/programs/miniconda3/envs/data-book/lib/python3.8/site-packages/autograd/tracer.py:48: RuntimeWarning: invalid value encountered in sqrt
return f_raw(*args, **kwargs)
/Users/tomconnolly/programs/miniconda3/envs/data-book/lib/python3.8/site-packages/PyCO2SYS/equilibria/p1atm.py:99: RuntimeWarning: invalid value encountered in sqrt
lnKF = 1590.2 / TempK - 12.641 + 1.525 * IonS ** 0.5
/Users/tomconnolly/programs/miniconda3/envs/data-book/lib/python3.8/site-packages/PyCO2SYS/equilibria/p1atm.py:577: RuntimeWarning: overflow encountered in power
K1 = 10.0 ** -(pK1)
/Users/tomconnolly/programs/miniconda3/envs/data-book/lib/python3.8/site-packages/PyCO2SYS/equilibria/p1atm.py:583: RuntimeWarning: overflow encountered in power
K2 = 10.0 ** -(pK2)
/Users/tomconnolly/programs/miniconda3/envs/data-book/lib/python3.8/site-packages/PyCO2SYS/equilibria/p1atm.py:603: RuntimeWarning: overflow encountered in power
K1 = 10.0 ** -pK1
/Users/tomconnolly/programs/miniconda3/envs/data-book/lib/python3.8/site-packages/PyCO2SYS/equilibria/p1atm.py:611: RuntimeWarning: overflow encountered in power
K2 = 10.0 ** -pK2
/Users/tomconnolly/programs/miniconda3/envs/data-book/lib/python3.8/site-packages/PyCO2SYS/equilibria/p1atm.py:636: RuntimeWarning: overflow encountered in power
K1 = 10.0 ** -pK1
/Users/tomconnolly/programs/miniconda3/envs/data-book/lib/python3.8/site-packages/PyCO2SYS/equilibria/p1atm.py:641: RuntimeWarning: overflow encountered in power
K2 = 10.0 ** -pK2
/Users/tomconnolly/programs/miniconda3/envs/data-book/lib/python3.8/site-packages/PyCO2SYS/equilibria/p1atm.py:653: RuntimeWarning: overflow encountered in power
K1 = 10.0 ** -pK1
/Users/tomconnolly/programs/miniconda3/envs/data-book/lib/python3.8/site-packages/PyCO2SYS/equilibria/p1atm.py:658: RuntimeWarning: overflow encountered in power
K2 = 10.0 ** -pK2
/Users/tomconnolly/programs/miniconda3/envs/data-book/lib/python3.8/site-packages/PyCO2SYS/equilibria/p1atm.py:715: RuntimeWarning: overflow encountered in power
K2 = 10.0 ** -pK2
/Users/tomconnolly/programs/miniconda3/envs/data-book/lib/python3.8/site-packages/PyCO2SYS/solubility.py:41: RuntimeWarning: overflow encountered in power
KAr = 10.0 ** logKAr # this is in (mol/kg-SW)^2
/Users/tomconnolly/programs/miniconda3/envs/data-book/lib/python3.8/site-packages/PyCO2SYS/solubility.py:25: RuntimeWarning: overflow encountered in power
KCa = 10.0 ** logKCa # this is in (mol/kg-SW)^2 at zero pressure
Create a subset of good data in the upper 10m (near surface).
iisurf07 = ((df07['CTDPRS'] <= 10) &
(df07['NITRAT_FLAG_W'] == 2) & (df07['PHSPHT_FLAG_W'] == 2)
& (df07['CTDOXY_FLAG_W'] == 2) & (df07['CTDSAL_FLAG_W'] == 2)
& (df07['ALKALI_FLAG_W'] == 2) & (df07['TCARBN_FLAG_W'] == 2))
df07surf = df07[iisurf07]
Comparing three sample means¶
In this example, we are interested in comparing the observed surface temperature in three regions of the California Current System.
Create a categorical variable for the region¶
We will divide the data into three categories: north if the Columbia River, between the Columbia River and Golden Gate, and south of the Golden Gate.
# create a new variable called "region" with no values
df07surf = df07surf.assign(region = [None]*len(df07surf))
df07surf['region']
22 None
46 None
71 None
95 None
151 None
...
2275 None
2287 None
2310 None
2334 None
2335 None
Name: region, Length: 138, dtype: object
# assign string values to region based on latitude
northern = (df07surf['LATITUDE'] > 46.2)
central = (df07surf['LATITUDE'] <= 46.2) & (df07surf['LATITUDE'] >= 37.8)
southern = (df07surf['LATITUDE'] < 37.8)
df07surf.loc[northern,'region'] = 'north'
df07surf.loc[central,'region'] = 'central'
df07surf.loc[southern,'region'] = 'south'
df07surf['region']
22 north
46 north
71 north
95 north
151 north
...
2275 south
2287 south
2310 south
2334 south
2335 south
Name: region, Length: 138, dtype: object
Box plot¶
plt.figure()
plt.boxplot([df07surf['CTDTMP'][df07surf['region']=='north'],
df07surf['CTDTMP'][df07surf['region']=='central'],
df07surf['CTDTMP'][df07surf['region']=='south']],
labels=['north','central','south'],showmeans=True,notch=True);
plt.title('CTDTMP - upper 10m 2007')
plt.ylabel('[deg C]')
plt.xlabel('region')
Text(0.5, 0, 'region')
Analysis¶
What test (or tests) would you use to determine whether there is a statistically significant difference in surface temperature between these regions?
What possible alternative tests did you rule out?
What is the null hypothesis being tested?
What are the underlying assumptions?
Try to implement the analysis with pingouin. For reference, see the pingouin documentation.