Last updated: 2021-06-25
Knit directory: polymeRID/
This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(20190729) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version dd36d67. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Rproj.user/
Ignored: polymeRID.Rproj
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/index.Rmd) and HTML (docs/index.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | dd36d67 | Darius Görgen | 2021-06-25 | new meta image |
html | 2a6017d | Darius Görgen | 2021-06-25 | Build site. |
Rmd | fb4c38b | Darius Görgen | 2021-06-25 | changed image path |
html | 330fb55 | Darius Görgen | 2021-06-25 | Build site. |
Rmd | 4e5b039 | Darius Görgen | 2021-06-25 | added meta and lib |
html | 4e5b039 | Darius Görgen | 2021-06-25 | added meta and lib |
html | 9a8b23c | goergen95 | 2019-09-19 | Build site. |
html | 75bc270 | goergen95 | 2019-09-05 | Build site. |
html | bc4055d | goergen95 | 2019-09-05 | Build site. |
html | 32dd5af | goergen95 | 2019-09-05 | Build site. |
Rmd | dc6b5f5 | goergen95 | 2019-09-05 | wflow_publish("analysis/index.Rmd") |
html | 070e93f | goergen95 | 2019-08-22 | Build site. |
Rmd | 0bdf12a | goergen95 | 2019-08-21 | updated index.html |
html | 0bdf12a | goergen95 | 2019-08-21 | updated index.html |
html | f2ee83c | goergen95 | 2019-08-19 | Build site. |
html | d960dc2 | goergen95 | 2019-08-19 | included calibration |
html | b846f0b | goergen95 | 2019-08-19 | Build site. |
Rmd | de84a71 | goergen95 | 2019-08-19 | large update for website |
html | de84a71 | goergen95 | 2019-08-19 | large update for website |
html | 2385fbc | goergen95 | 2019-08-14 | republish for layout change |
Rmd | 5d28ce0 | goergen95 | 2019-08-14 | changed citation note |
html | 5d28ce0 | goergen95 | 2019-08-14 | changed citation note |
Rmd | c52182b | goergen95 | 2019-08-13 | rebuid website |
html | c52182b | goergen95 | 2019-08-13 | rebuid website |
html | 6e92d01 | goergen95 | 2019-08-13 | Build site. |
html | 6cfd689 | goergen95 | 2019-08-13 | Build site. |
Rmd | 5774923 | goergen95 | 2019-08-13 | included preparation |
html | cbbd5b4 | goergen95 | 2019-08-13 | Build site. |
html | 471e893 | goergen95 | 2019-08-13 | Build site. |
Rmd | b07b0a6 | goergen95 | 2019-08-13 | update of index file |
html | e8c8be2 | goergen95 | 2019-08-13 | Build site. |
html | 342cd44 | goergen95 | 2019-08-13 | Build site. |
Rmd | 15bd467 | goergen95 | 2019-08-13 | update of index file |
Rmd | 3b99a1b | goergen95 | 2019-08-08 | fixed typos in index |
html | 32109b9 | goergen95 | 2019-08-07 | Build site. |
Rmd | 7a32e1f | goergen95 | 2019-08-07 | included images in index |
html | 99906c8 | goergen95 | 2019-08-07 | Build site. |
Rmd | 7b2bbac | goergen95 | 2019-08-07 | changed theme |
html | 5f2ca49 | goergen95 | 2019-08-07 | Build site. |
Rmd | caf89e2 | goergen95 | 2019-08-07 | wflow_publish(c("analysis/index.Rmd")) |
html | 348ad0a | goergen95 | 2019-08-05 | Build site. |
Rmd | 5b8a2e6 | goergen95 | 2019-08-05 | wflow_publish(c("analysis/index.Rmd")) |
Rmd | 6c813f4 | goergen95 | 2019-07-29 | implemented workflowr |
Rmd | d525cc2 | goergen95 | 2019-07-29 | Start workflowr project. |
Here I present the results of my work for a master’s seminar at the University of Marburg concerned with microplastic in the environment.
Photo of two sediment separators taken by Sarah Brüning
Microplastic pollution of the environment has been in the public focus for some time now. Scientific efforts to analyze the occurrence of particles in the environment and their effects on ecosystems and human health are manifold, yet there is a lack of consensus on methods for sampling, sample handling, analysis and identification, especially for samples from aquatic ecosystems. Some of the most urgent research questions concerning microplastic are its effects on living organisms (Zhang et al., 2019) and its movement through and distribution in marine environments (Auta et al., 2017) as well as in freshwater ecosystems (Li et al., 2018).
Different research questions demand different methodologies for sampling, sample handling and laboratory analysis. What links these research domains, however, is that any analysis of microplastic in the environment needs a robust identification method so that scientists can make meaningful recommendations to the public and to decision makers.
A broad spectrum of polymer identification strategies exists (Löder and Gerdts, 2015; Rocha-Santos and Duarte, 2015; Shim et al., 2017), ranging from traditional microscopy and spectroscopy to destructive methods of thermal analysis. These strategies differ in the extent to which the identification process is automated. Recently, several approaches to automating polymer classification, either for individual particles or for a whole collection of samples simultaneously, have been reported (Lorenzo-Navarro et al., 2018; Masoumi et al., 2012; Primpke et al., 2019, 2017; Zhang et al., 2018).
This project aims to ease the cumbersome process of classifying individual particles by hand based on their spectral reflectance. The idea is that up-to-date machine learning models applied to the high-dimensional spectral data of particles found in environmental samples can minimize the need for human intervention and thus significantly speed up classification. Other studies have reported high accuracies when applying different kinds of machine learning algorithms, such as hierarchical clustering (Primpke et al., 2017), support vector machines (V. Bianco P. Memmolo, 2019), random forests (Hufnagl et al., 2019) and convolutional neural networks (Liu et al., 2017), to classify the spectra of microplastic and other materials found in environmental samples.
This project was divided into working steps designed to make the workflows as reproducible as possible and to allow alterations of the code and extensions of the database. These working steps are:
Preparation: First, a comprehensive database of reference spectra had to be established to allow the application of machine learning models. We used an open-source database published by Primpke et al. (2018). For potential future extensions, we created a workflow of spectral resampling and baseline correction for reference polymers and other particles to ensure the consistency of the database.
Exploration: Different pre-processing techniques were assessed with a cross-validation approach in which different representations of the data were presented to a selection of machine learning models, and their ability to correctly classify the spectra in the database was recorded. Additionally, increasing levels of noise were added to the data to identify the models and pre-processing techniques that classify polymer spectra most robustly.
Calibration: After the exploration stage, the best-performing models were chosen to create a decision fusion model. A workflow was created to recalibrate these models when the database changes. This step is crucial for future use of the work presented here, e.g. when the reference database is extended or the spectral resolution changes.
Classification: At the final stage of the project, a workflow was created to classify real environmental samples in a user-friendly way. Accuracy values of the classification are reported to the user, together with plots for visual confirmation of the results. This keeps the results easily accessible and lets a human assess the quality of the classification.
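To make the working steps above more concrete, the following is a minimal sketch in base R of three of the ideas they mention: resampling a spectrum onto a common grid, adding noise for robustness tests, and fusing the decisions of several models. It is not the project's actual code. The function names (resample_spectrum, add_noise, fuse_votes), the toy spectrum, the use of linear interpolation, the Gaussian noise model and the majority vote are all assumptions chosen for illustration.

```r
# Hypothetical helpers for illustration only; not the project's API.

# Resample a spectrum onto a common wavenumber grid.
# Assumption: plain linear interpolation is sufficient.
resample_spectrum <- function(wavenumber, intensity, target_grid) {
  stats::approx(x = wavenumber, y = intensity, xout = target_grid, rule = 2)$y
}

# Add zero-mean Gaussian noise whose standard deviation is a fraction
# (level) of the spectrum's own standard deviation (assumed noise model).
add_noise <- function(intensity, level) {
  noise_sd <- level * stats::sd(intensity)
  intensity + stats::rnorm(length(intensity), mean = 0, sd = noise_sd)
}

# Fuse the class predictions of several models by majority vote.
# predictions: character matrix, one row per sample, one column per model.
# Ties are resolved in favour of the alphabetically first class.
fuse_votes <- function(predictions) {
  apply(predictions, 1, function(votes) {
    counts <- table(votes)
    names(counts)[which.max(counts)]
  })
}

set.seed(20190729)

# Toy spectrum with a single absorption band (synthetic data).
wn   <- seq(4000, 400, length.out = 50)
spec <- exp(-((wn - 1700) / 150)^2)

# Bring the spectrum onto a shared grid, then perturb it.
grid      <- seq(500, 3900, by = 100)
resampled <- resample_spectrum(wn, spec, grid)
noisy     <- add_noise(resampled, level = 0.1)

# Made-up predictions of three models for three particles.
votes <- cbind(
  rf  = c("PE", "PP", "PS"),
  svm = c("PE", "PS", "PS"),
  cnn = c("PP", "PP", "PS")
)
fused <- fuse_votes(votes)
print(fused)
```

In a robustness test along these lines, add_noise would be called with increasing values of level before the spectra are passed to each model, and the fused prediction would be compared with the known class of each reference spectrum.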
sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=de_DE.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] tensorflow_2.5.0 abind_1.4-5 e1071_1.7-7 plotly_4.9.4.1
[5] keras_2.4.0 workflowr_1.6.2 baseline_1.3-1 gridExtra_2.3
[9] stringr_1.4.0 prospectr_0.2.1 openxlsx_4.2.4 magrittr_2.0.1
[13] ggplot2_3.3.4 reshape2_1.4.4 dplyr_1.0.7 metathis_1.0.3
loaded via a namespace (and not attached):
[1] Rcpp_1.0.6 lattice_0.20-44 tidyr_1.1.3 class_7.3-19
[5] png_0.1-7 zeallot_0.1.0 rprojroot_2.0.2 digest_0.6.27
[9] foreach_1.5.1 utf8_1.2.1 R6_2.5.0 plyr_1.8.6
[13] evaluate_0.14 httr_1.4.2 pillar_1.6.1 tfruns_1.5.0
[17] rlang_0.4.11 lazyeval_0.2.2 data.table_1.14.0 SparseM_1.81
[21] limSolve_1.5.6 whisker_0.4 Matrix_1.3-4 reticulate_1.20
[25] rmarkdown_2.9 mathjaxr_1.4-0 htmlwidgets_1.5.3 munsell_0.5.0
[29] proxy_0.4-26 compiler_4.1.0 httpuv_1.6.1 xfun_0.24
[33] pkgconfig_2.0.3 base64enc_0.1-3 htmltools_0.5.1.1 tidyselect_1.1.1
[37] tibble_3.1.2 lpSolve_5.6.15 quadprog_1.5-8 codetools_0.2-18
[41] viridisLite_0.4.0 fansi_0.5.0 crayon_1.4.1 withr_2.4.2
[45] later_1.2.0 MASS_7.3-54 grid_4.1.0 jsonlite_1.7.2
[49] gtable_0.3.0 lifecycle_1.0.0 git2r_0.28.0 scales_1.1.1
[53] zip_2.2.0 stringi_1.6.2 fs_1.5.0 promises_1.2.0.1
[57] ellipsis_0.3.2 generics_0.1.0 vctrs_0.3.8 iterators_1.0.13
[61] tools_4.1.0 glue_1.4.2 purrr_0.3.4 yaml_2.2.1
[65] colorspace_2.0-2 knitr_1.33