Merge pull request #161 from holukas/rolling-zscore
v0.78.0
holukas committed Aug 18, 2024
2 parents 60e6623 + 4bb4122 commit db999b1
Showing 33 changed files with 12,964 additions and 5,025 deletions.
68 changes: 68 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,74 @@

![DIIVE](images/logo_diive1_256px.png)

## v0.78.0 | 18 Aug 2024

### New features

- Added new class for outlier removal, based on the rolling z-score. It can also be used in step-wise outlier detection
and during meteoscreening from the
database. (`diive.pkgs.outlierdetection.zscore.zScoreRolling`, `diive.pkgs.outlierdetection.stepwiseoutlierdetection.StepwiseOutlierDetection`, `diive.pkgs.qaqc.meteoscreening.StepwiseMeteoScreeningDb`).
- Added Hampel filter for outlier removal (`diive.pkgs.outlierdetection.hampel.Hampel`)
- Added Hampel filter (separate daytime, nighttime) for outlier
removal (`diive.pkgs.outlierdetection.hampel.HampelDaytimeNighttime`)
- Added function to plot daytime and nighttime outliers during outlier
tests (`diive.core.plotting.outlier_dtnt.outlier_daytime_nighttime`)
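
The rolling z-score idea behind the new outlier test can be sketched as follows. This is a minimal, dependency-free illustration and not the code of `zScoreRolling`: the function name, the parameters `winsize` and `thres_zscore`, and the handling of short or constant windows are assumptions made for this example. The flag values follow the `0=OK`, `2=outlier` convention used for single outlier tests in the README.

```python
import math
from statistics import mean, stdev


def rolling_zscore_flags(values, winsize=21, thres_zscore=3.5):
    """Flag values whose z-score within a centered rolling window exceeds a threshold.

    Returns a list of flags: 0 = OK, 2 = outlier.
    All names and defaults here are hypothetical, chosen for illustration.
    """
    half = winsize // 2
    flags = []
    for i, value in enumerate(values):
        # Centered window, truncated at the start and end of the series
        window = values[max(0, i - half): i + half + 1]
        if len(window) < 3:
            flags.append(0)  # too few records to judge
            continue
        sd = stdev(window)
        if sd == 0:
            flags.append(0)  # constant window, nothing stands out
            continue
        zscore = abs(value - mean(window)) / sd
        flags.append(2 if zscore > thres_zscore else 0)
    return flags


# Smooth synthetic signal with a single spike at index 20
series = [math.sin(2 * math.pi * i / 47) for i in range(48)]
series[20] = 50.0
flags = rolling_zscore_flags(series)
```

Because the spike is part of its own window, it inflates the local mean and standard deviation. The largest z-score a single value can reach in a window of `n` records is `(n - 1) / sqrt(n)`, so a small window combined with a high threshold can never flag anything.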

### Changes

- Flux processing chain:
    - Several changes to the flux processing chain to make sure it can also work with data files not directly output by
      EddyPro. The class `FluxProcessingChain` can now handle files that have a different format than the two EddyPro
      output files `EDDYPRO-FLUXNET-CSV-30MIN` and `EDDYPRO-FULL-OUTPUT-CSV-30MIN`. See the following notes.
    - Removed option to process EddyPro `_full_output_` files, since it is an older format and its variables do not
      follow FLUXNET conventions.
    - Removed keyword `filetype` in class `FluxProcessingChain`. It is now assumed that the variable names follow the
      FLUXNET convention. Variables used in FLUXNET are
      listed [here](https://fluxnet.org/data/fluxnet2015-dataset/fullset-data-product/) (`diive.pkgs.fluxprocessingchain.fluxprocessingchain.FluxProcessingChain`)
- When detecting the base variable from which a flux variable was calculated, the variables defined for
filetype `EDDYPRO-FLUXNET-CSV-30MIN` are now assumed by default. (`diive.pkgs.flux.common.detect_basevar`)
- Renamed function that detects the base variable that was used to calculate the respective
flux (`diive.pkgs.flux.common.detect_fluxbasevar`)
- Renamed `gas` in functions related to completeness tests to `fluxbasevar` to better reflect that the completeness
test does not necessarily require a gas (e.g. `T_SONIC` is used to calculate the completeness for sensible heat
flux) (`flag_fluxbasevar_completeness_eddypro_test`)
- Removing the radiation offset now uses `0.001` (W m-2) instead of `50` as the threshold value to flag nighttime values
for the correction (`diive.pkgs.corrections.offsetcorrection.remove_radiation_zero_offset`)
- The database tag for meteo data screened with `diive` is
now `meteoscreening_diive` (`diive.pkgs.qaqc.meteoscreening.StepwiseMeteoScreeningDb.resample`)
- During noise generation, the function now uses the absolute values of the min/max of a series to calculate minimum noise
and maximum noise (`diive.pkgs.createvar.noise.add_impulse_noise`)
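
To illustrate why the nighttime threshold in the radiation offset entry matters, here is a rough sketch. It assumes that the threshold is compared against potential radiation and that the correction subtracts the mean nighttime value. Both are assumptions made for this example, and the function below is not `remove_radiation_zero_offset`.

```python
from statistics import mean

NIGHTTIME_THRESHOLD = 0.001  # W m-2, the value named in the changelog entry


def remove_zero_offset(measured, potential, threshold=NIGHTTIME_THRESHOLD):
    """Subtract the mean nighttime value from a radiation series (illustrative only).

    A record counts as nighttime when potential radiation is below the threshold.
    """
    is_night = [p < threshold for p in potential]
    night_values = [m for m, night in zip(measured, is_night) if night]
    if not night_values:
        return list(measured)  # no nighttime records, nothing to correct
    offset = mean(night_values)
    return [m - offset for m in measured]


# Made-up half-hourly values in W m-2
potential = [0.0, 0.0, 30.0, 400.0, 30.0, 0.0]
measured = [-2.0, -2.0, 25.0, 390.0, 27.0, -2.0]

corrected = remove_zero_offset(measured, potential)
corrected_old = remove_zero_offset(measured, potential, threshold=50)
```

With the threshold of `50`, the two twilight records (potential radiation of 30 W m-2) are treated as nighttime and pull the estimated offset upwards. With `0.001`, only records with no potential radiation contribute to the offset.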

### Notebooks

- Added new notebook for outlier detection using class `zScore` (`notebooks/OutlierDetection/zScore.ipynb`)
- Added new notebook for outlier detection using
class `zScoreDaytimeNighttime` (`notebooks/OutlierDetection/zScoreDaytimeNighttime.ipynb`)
- Added new notebook for outlier removal using trimming (`notebooks/OutlierDetection/TrimLow.ipynb`)
- Updated notebook (`notebooks/MeteoScreening/StepwiseMeteoScreeningFromDatabase_v7.0.ipynb`)
- When uploading screened meteo data to the database using the notebook `StepwiseMeteoScreeningFromDatabase`, variables
with the same name, measurement and data version as the screened variable(s) are now deleted from the database before
the new data are uploaded. This is implemented in the Python package `dbc-influxdb` to avoid duplicates in the database. Such
duplicates can occur when one of the tags of an otherwise identical variable changed, e.g., when one of the tags of
the originally uploaded data was wrong and needed correction. The database `InfluxDB` stores a new time series
alongside the previous time series when one of the tags is different in an otherwise identical time series.
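
The duplicate problem described above can be shown with a toy model in which a series is identified by its measurement plus its complete tag set. This is a plain-Python sketch, not the `dbc-influxdb` or InfluxDB client API. The tag names (`varname`, `data_version`, `units`) and the `upload` helper are hypothetical.

```python
def series_key(measurement, tags):
    """A series is identified by its measurement plus the full tag set."""
    return (measurement, frozenset(tags.items()))


def upload(store, measurement, tags, records, delete_first=False):
    """Write records to the toy store, optionally deleting matching series first."""
    if delete_first:
        # Drop every series that shares measurement, variable name and data
        # version, regardless of what its other tags say
        for key in list(store):
            key_measurement, key_tags = key[0], dict(key[1])
            if (key_measurement == measurement
                    and key_tags.get("varname") == tags["varname"]
                    and key_tags.get("data_version") == tags["data_version"]):
                del store[key]
    store.setdefault(series_key(measurement, tags), []).extend(records)


records = [("2024-08-18T00:00", 12.3), ("2024-08-18T00:30", 12.1)]
wrong = {"varname": "TA_T1_2_1", "data_version": "example_version", "units": "K"}
fixed = {**wrong, "units": "degC"}  # same variable, one tag corrected

# Without deletion: the corrected upload lands next to the old series
store = {}
upload(store, "TA", wrong, records)
upload(store, "TA", fixed, records)
n_without_delete = len(store)

# With deletion: the old series is removed before the new data are written
store = {}
upload(store, "TA", wrong, records)
upload(store, "TA", fixed, records, delete_first=True)
n_with_delete = len(store)
```

Matching on name, measurement and data version only, rather than on the full tag set, is what allows the delete step to catch series whose other tags were wrong.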

### Tests

- Added test case for `Hampel` filter (`tests.test_outlierdetection.TestOutlierDetection.test_hampel_filter`)
- Added test case for `HampelDaytimeNighttime`
filter (`tests.test_outlierdetection.TestOutlierDetection.test_hampel_filter_daytime_nighttime`)
- Added test case for `zScore` (`tests.test_outlierdetection.TestOutlierDetection.test_zscore`)
- Added test case for `TrimLow` (`tests.test_outlierdetection.TestOutlierDetection.test_trim_low_nt`)
- Added test case
for `zScoreDaytimeNighttime` (`tests.test_outlierdetection.TestOutlierDetection.test_zscore_daytime_nighttime`)
- 33/33 unittests ran successfully

### Environment

- Added package [sktime](https://www.sktime.net/en/stable/index.html), a unified framework for machine learning with
time series.

## v0.77.0 | 11 Jun 2024

### Additions
107 changes: 39 additions & 68 deletions README.md
@@ -16,20 +16,16 @@ Recent releases: [Releases](https://github.com/holukas/diive/releases)

## Overview of example notebooks

- For many examples see notebooks here: [Notebook overview](https://github.com/holukas/diive/blob/main/notebooks/OVERVIEW.ipynb)
- More notebooks are added constantly.

## Current Features

### Analyses

- Calculate z-aggregates in quantiles (classes) of x and y ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/Analyses/CalculateZaggregatesInQuantileClassesOfXY.ipynb))
- Daily correlation ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/Analyses/DailyCorrelation.ipynb))
- Decoupling: Sorting bins method ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/Analyses/DecouplingSortingBins.ipynb))
- Find data gaps ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/Analyses/GapFinder.ipynb))
- Histogram ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/Analyses/Histogram.ipynb))
- Optimum range
@@ -39,18 +35,14 @@ Recent releases: [Releases](https://github.com/holukas/diive/releases)

- Offset correction
- Set to threshold
- Wind direction offset detection and correction ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/Corrections/WindDirectionOffset.ipynb))

### Create variable

- Calculate time since last occurrence, e.g. since last precipitation ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/CalculateVariable/TimeSince.ipynb))
- Calculate daytime flag, nighttime flag and potential radiation from latitude and longitude ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/CalculateVariable/Daytime_and_nighttime_flag.ipynb))
- Day/night flag from sun angle
- VPD from air temperature and RH ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/CalculateVariable/Calculate_VPD_from_TA_and_RH.ipynb))

### Eddy covariance high-resolution

@@ -62,12 +54,9 @@ Recent releases: [Releases](https://github.com/holukas/diive/releases)

- Detect expected and unexpected (irregular) files in a list of files
- Split multiple files into smaller parts and export them as (compressed) CSV files
- Read single data file with parameters ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/ReadFiles/Read_single_EddyPro_fluxnet_output_file_with_DataFileReader.ipynb))
- Read single data file with pre-defined filetype ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/ReadFiles/Read_single_EddyPro_fluxnet_output_file_with_ReadFileType.ipynb))
- Read multiple data files with pre-defined filetype ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/ReadFiles/Read_multiple_EddyPro_fluxnet_output_files_with_MultiDataFileReader.ipynb))

### Fits

@@ -81,39 +70,30 @@ Recent releases: [Releases](https://github.com/holukas/diive/releases)

### Flux processing chain

For info about the Swiss FluxNet flux levels, see [here](https://www.swissfluxnet.ethz.ch/index.php/data/ecosystem-fluxes/flux-processing-chain/).

- Flux processing chain ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/FluxProcessingChain/FluxProcessingChain.ipynb))
    - The notebook example shows the application of:
        - Level-2 quality flags
        - Level-3.1 storage correction
        - Level-3.2 outlier removal
- Quick flux processing chain ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/FluxProcessingChain/QuickFluxProcessingChain.ipynb))

### Formats

Format data to specific formats

- Convert EddyPro fluxnet output files for upload to FLUXNET database ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/Formats/FormatEddyProFluxnetFileForUpload.ipynb))
- Load and save parquet files ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/Formats/LoadSaveParquetFile.ipynb))

### Gap-filling

Fill gaps in time series with various methods

- XGBoostTS ([notebook example (minimal)](https://github.com/holukas/diive/blob/main/notebooks/GapFilling/XGBoostGapFillingMinimal.ipynb), [notebook example (more extensive)](https://github.com/holukas/diive/blob/main/notebooks/GapFilling/XGBoostGapFillingExtensive.ipynb))
- RandomForestTS ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/GapFilling/RandomForestGapFilling.ipynb))
- Linear interpolation ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/GapFilling/LinearInterpolation.ipynb))
- Quick random forest gap-filling ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/GapFilling/QuickRandomForestGapFilling.ipynb))

### Outlier Detection

@@ -125,57 +105,48 @@ RandomForestTS ([notebook example](https://github.com/holukas/diive/blob/main/no

Single outlier tests create a flag where `0=OK` and `2=outlier`.

- Absolute limits ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/OutlierDetection/AbsoluteLimits.ipynb))
- Absolute limits, separately defined for daytime and nighttime data ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/OutlierDetection/AbsoluteLimitsDaytimeNighttime.ipynb))
- Hampel filter: based on Median Absolute Deviation (MAD) in a moving window ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/OutlierDetection/Hampel.ipynb))
- Hampel filter, separately for daytime and nighttime data ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/OutlierDetection/HampelDaytimeNighttime.ipynb))
- Incremental z-score: Identify outliers based on the z-score of double increments ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/OutlierDetection/zScoreIncremental.ipynb))
- Local standard deviation: Identify outliers based on the local standard deviation from a running median ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/OutlierDetection/LocalSD.ipynb))
- Local outlier factor: Identify outliers based on local outlier factor, across all data
- Local outlier factor: Identify outliers based on local outlier factor, daytime nighttime separately
- Manual removal: Remove time periods (from-to) or single records from time series
- Missing values: Simply creates a flag that indicates available and missing data in a time series
- Trimming: Remove values below threshold and remove an equal amount of records from high end of data ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/OutlierDetection/TrimLow.ipynb))
- z-score: Identify outliers based on the z-score across all time series data ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/OutlierDetection/zScore.ipynb))
- z-score: Identify outliers based on the z-score, separately for daytime and nighttime ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/OutlierDetection/zScoreDaytimeNighttime.ipynb))
- z-score: Identify outliers based on the rolling z-score
- z-score: Identify outliers based on max z-scores in the interquartile range data
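
As a rough illustration of the Hampel filter entry above, the sketch below flags values that deviate from the moving-window median by more than a multiple of the scaled Median Absolute Deviation (MAD). It is a textbook-style version written with the standard library only and is not the code of `diive.pkgs.outlierdetection.hampel.Hampel`. The names `winsize` and `n_sigma` are assumptions made for this example.

```python
from statistics import median

MAD_SCALE = 1.4826  # makes the MAD comparable to the SD for normally distributed data


def hampel_flags(values, winsize=11, n_sigma=3.0):
    """Flag outliers with a moving-window Hampel filter.

    Returns a list of flags: 0 = OK, 2 = outlier.
    """
    half = winsize // 2
    flags = []
    for i, value in enumerate(values):
        # Centered window, truncated at the start and end of the series
        window = values[max(0, i - half): i + half + 1]
        med = median(window)
        mad = median([abs(v - med) for v in window])
        limit = n_sigma * MAD_SCALE * mad
        flags.append(2 if abs(value - med) > limit else 0)
    return flags


# Linear ramp with a single spike at index 15
series = [float(i) for i in range(30)]
series[15] = 100.0
flags = hampel_flags(series)
```

Since median and MAD are barely affected by a single extreme value, the spike does not mask itself the way it can with a mean and standard deviation. One caveat of this simple version: if more than half of a window is identical, the MAD is zero and any value differing from the median is flagged.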

### Plotting

- Diel cycle per month ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/Plotting/DielCycle.ipynb))
- Heatmap showing values (z) of time series as date (y) vs time (x) ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/Plotting/HeatmapDateTime.ipynb))
- Heatmap showing values (z) of time series as year (y) vs month (x) ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/Plotting/HeatmapYearMonth.ipynb))
- Long-term anomalies per year ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/Plotting/LongTermAnomalies.ipynb))
- Simple (interactive) time series plot ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/Plotting/TimeSeries.ipynb))
- ScatterXY plot ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/Plotting/ScatterXY.ipynb))
- Various classes to generate heatmaps, bar plots, time series plots and scatter plots, among others

### Quality control

- Stepwise MeteoScreening from database ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/MeteoScreening/StepwiseMeteoScreeningFromDatabase.ipynb))

### Resampling

- Calculate diel cycle per month ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/Resampling/ResamplingDielCycle.ipynb))

### Stats

- Time series stats ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/Stats/TimeSeriesStats.ipynb))

### Timestamps

- Create continuous timestamp based on number of records in the file and the file duration
- Detect time resolution from data ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/TimeStamps/Detect_time_resolution.ipynb))
- Insert additional timestamps in various formats

## Installation
2 changes: 2 additions & 0 deletions diive/core/base/flagbase.py
@@ -179,3 +179,5 @@ def defaultplot(self, n_iterations: int = 1):
f"n_outliers = {n_outliers}")
fig.suptitle(plottitle, fontsize=theme.FIGHEADER_FONTSIZE)
fig.show()

