-
Notifications
You must be signed in to change notification settings - Fork 0
/
03_range_maps.qmd
406 lines (313 loc) · 11.3 KB
/
03_range_maps.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
---
title: Range Maps
cache: true
format:
html: default
typst:
toc: true
margin:
x: 1.5cm
y: 1.5cm
format-links: false
---
::::{.content-hidden when-format="typst"}
In this step we decide **Which receivers do we keep?**
- We keep receivers which are within ranges of the species in question
- We'll keep runs which have acceptable receivers for a given species (future steps)
So first we need to get a list of receivers that are within different species
ranges. We'll use this list to filter the Motus runs in later steps.
We'll acquire the range maps from eBird Status and Trends
:::{.callout-note}
eBird data requires attributions, citations, and disclaimers
- See (2) <https://science.ebird.org/en/status-and-trends/products-access-terms-of-use>
We also have limitations on how many species we can use for non-peer-reviewed publications (50),
but can use any number for scientific publications or grant requests.
Bear this in mind if sharing these range maps anywhere.
In general, we'll keep it below 50 for now, and ask for permission as required.
:::
## Setup
Load packages, connect to databases, and create species lists.
See [Setup](XX_setup.html) for details.
```{r}
#| message: false
#| cache: false
source("XX_setup.R")
```
### Get a map of the Americas
First we'll get a map of the Americas so we can filter out receivers outside this
area and as a base map to give context to the plots we'll create later on.
```{r}
americas <- ne_countries(continent = c("North America", "South America"),
returnclass = "sf") |>
pull(name) |>
ne_states(returnclass = "sf") |>
# Omit Hawaii
filter(name != "Hawaii") |>
st_make_valid() |>
group_by(admin) |>
summarize() |>
st_transform(crs = 3347)
```
### Get the species list
Next we'll create a combined species list base on the existing species in the data
bases (see [Setup](XX_setup.qmd)) and the eBird species list (`ebirdst_runs`).
:::{.panel-tabset}
### Code
```{r}
sp_ebird <- select(ebirdst_runs, "species_code", "scientific_name") |>
filter(species_code != "yebsap-example") |> # Remove eBird example species
right_join(sp, by = c("scientific_name" = "scientific")) |>
arrange(english)
# Get a list of codes
sp_codes <- sp_ebird$species_code
```
### Species list
```{r}
gt(sp_ebird, rownames_to_stub = TRUE) |>
gt_theme() |>
tab_options(container.height = px(600),
container.overflow.y = "auto")
```
:::
## Get Ranges
Now we'll download the species range maps from eBird at a resolution of 27 km (high resolution, raw data).
> Note: If the data is already downloaded, it will be skipped.
```{r}
#| message: false
walk(sp_codes, \(x) {
if(!any(str_detect(list.files(ebirdst_data_dir(), recursive = TRUE), x))) {
ebirdst_download_status(x, download_ranges = TRUE,
pattern = "range\\_raw\\_27km")
}
})
```
Combine the ranges into a single spatial dataset, and create another set of
ranges with a **100km** buffer.
The idea here is that we may want to include receivers that are near the edges of
a species range, as observations at those stations may be legitimate.
:::{.callout-tip}
### Question
Is a 100km buffer sufficient? Good enough, too much?
:::
```{r}
ranges <- map(sp_codes,
\(x) load_ranges(x, resolution = "27km", smoothed = FALSE)) |>
bind_rows() |>
group_by(species_code, scientific_name, common_name, prediction_year) |>
summarize(.groups = "drop") |>
st_make_valid() |>
st_transform(crs = 3347)
# Use a 100km buffer around the range
buffer <- set_units(100, "km")
ranges_buffer <- st_buffer(ranges, dist = buffer)
# Use a generous 1250km buffer around the Americas
americas_buffer <- st_union(americas) |>
st_buffer(dist = set_units(1250, "km"))
# Get the total region covered by at least one species ranges
any_range <- summarize(ranges_buffer)
```
## Get Receivers
Now we'll collect a list of receivers that exist in our data.
We'll start by doing some preliminary filtering:
- omit receivers which don't overlap with any species' range
- omit receivers labelled "test", etc.
### Filtering
Get list of receivers in the databases. We only need to do to one databases, as the
`metadata()` data step in [Download Data](02_download.html) data adds full
receiver lists to all databases (i.e. they are the same in each one).
```{r}
recvs_full <- tbl(dbs[[1]], "recvDeps") |>
select("recvDeployID" = "deployID", "recvDeviceID" = "deviceID",
"name", "recvDeployLon" = "longitude", "recvDeployLat" = "latitude",
"isMobile") |>
distinct() |>
collect()
```
There are receivers which we can't / don't want to use.
These are receivers missing coordinates, or those which are test setups.
Let's keep track of those which are problematic.
- Those labelled "test"
- Mobile stations
- Those with missing lat/lon
- Those with no overlap with any species range
- Those outside of the Americas
We'll first assess the spatial-problems (i.e. where the receiver is), then the
regular problems (what the receiver is or it's metadata).
```{r}
recvs_spatial_problems <- recvs_full |>
drop_na(recvDeployLat, recvDeployLon) |>
st_as_sf(coords = c("recvDeployLon", "recvDeployLat"), crs = 4326) |>
st_transform(crs = 3347) |>
mutate(
# No overlap at least one species
no_species = !st_intersects(geometry, any_range, sparse = FALSE),
# Not in the Americas (i.e. European receivers)
no_americas = !st_intersects(geometry, americas_buffer, sparse = FALSE)) |>
st_drop_geometry() |>
select("recvDeployID", "no_species", "no_americas")
recvs_problems <- recvs_full |>
left_join(recvs_spatial_problems, by = "recvDeployID") |>
mutate(problem = case_when(
# Missing lat/lon
is.na(recvDeployLon) | is.na(recvDeployLat) ~ "missing coords",
# Mobile receivers
isMobile == 1 ~ "mobile station",
# Receivers labelled "test"
str_detect(tolower(name), "teststation|test_sg") ~ "test station",
no_americas ~ "out of americas",
no_species ~ "out of all species ranges",
TRUE ~ "no problem")
) |>
filter(problem != "no problem") |>
select("recvDeployID", "recvDeviceID", "name", "recvDeployLon", "recvDeployLat", "problem")
```
Clean up master receivers list
- omit receivers we don't want
- convert to spatial
```{r}
recvs <- recvs_full |>
select(-"isMobile") |>
# Omit problems
anti_join(recvs_problems, by = "recvDeployID") |>
# Transform to a spatial data set
drop_na(recvDeployLat, recvDeployLon) |>
st_as_sf(coords = c("recvDeployLon", "recvDeployLat"), crs = 4326) |>
st_transform(crs = 3347)
```
### Check Removed Receivers
:::{.panel-tabset}
#### Table {style="height:500px; overflow-y:auto"}
```{r}
recvs_problems |>
arrange(recvDeployID) |>
gt() |>
gt_theme(container.height = 500)
```
#### Figures
```{r}
americas_buffer2 <- st_transform(americas_buffer, crs = "+proj=laea +lon_0=-75.23 +lat_0=15.23 +datum=WGS84 +units=m +no_defs")
r <- recvs |>
group_by(recvDeployID) |>
slice(1)
p <- recvs_problems |>
drop_na(recvDeployLat, recvDeployLon) |>
st_as_sf(coords = c("recvDeployLon", "recvDeployLat"), crs = 4326) |>
arrange(problem)
g <- ggplot() +
theme(legend.position = "top") +
geom_sf(data = americas_buffer2, fill = "blue", colour = NA, alpha = 0.2) +
geom_sf(data = americas, fill = "white") +
geom_sf(data = r, colour = alpha("black", 0.2), size = 0.5) +
geom_sf(data = p, aes(fill = problem), shape = 21, size = 2) +
scale_fill_viridis_d(direction = -1, option = "plasma") +
labs(caption = "Pale blue indicates a 1250km buffer around the Americas within which to include receivers\nSmall black points are included receivers, colourful points are those to be omitted")
g
g + coord_sf(xlim = c(-5000000, 3000000), ylim = c(0, 7000000))
```
:::
## Range/Receiver Overlap
Now we'll calculate overlaps with individual species ranges.
Here we use the *buffered* range; that is, the species range with an added
`r buffer` buffer around it. This way we include stations which are on the edge
of a species range and which might very well include valid species observations
even if they're not technically inside the range calculated by eBird.
:::{.panel-tabset}
### Code
```{r}
sp_range_intersections <- recvs |>
st_intersects(ranges_buffer, sparse = FALSE) |>
as_tibble(.name_repair = ~ranges_buffer$species_code)
recvs <- bind_cols(recvs, sp_range_intersections) |>
pivot_longer(cols = -c("recvDeployID", "recvDeviceID", "name", "geometry"),
names_to = "species_code", values_to = "in_range") |>
left_join(select(sp_ebird, "species_code", "id", "english"), by = "species_code")
```
### Receivers List Preview
```{r}
recvs |>
st_drop_geometry() |>
slice(1:50) |>
gt() |>
gt_theme(container.height = 500)
```
:::
::::
## Plots
::::{.content-hidden when-format="typst"}
### Setup
Next we'll plot each species range and the overlapping receivers for quality
control.
First we'll make sure our map of the Americas only includes countries for which
we have at least one station.
```{r}
americas <- st_filter(americas, recvs)
```
Next, we'll prepare a data frame with both ranges and buffered ranges, so we
can show on the map the true range but also show that we include stations which
are within `r buffer` of the true range.
```{r}
ranges <- bind_rows(
bind_cols(ranges, type = "Range"),
bind_cols(ranges_buffer, type = "Buffered Range"))
```
::::
### Create range plots
Now we're ready to plot!
:::{.callout-important}
### No plots online
There are temporarily no plots in this version of the page online as we have
not yet received permission from eBird to share these in a semi-open manner.
:::
```{r}
#| fig-height: 10
#| fig-width: 12
#| out-width: 100%
#| results: asis
#| code-fold: true
for(i in unique(ranges$species_code)) {
r <- filter(ranges, species_code == i)
# Print title to add to TOC
cat("\n\n### ", r$common_name[1], "\n\n")
g <- ggplot() +
theme_bw() +
theme(legend.position = "bottom") +
geom_sf(data = americas) +
geom_sf(data = r, aes(fill = type, alpha = type)) +
geom_sf(data = filter(recvs, species_code == r$species_code[1]),
aes(colour = in_range), size = 0.5) +
scale_colour_viridis_d(end = 0.9) +
scale_fill_manual(values = c("blue", "blue")) +
scale_alpha_manual(values = c(0.25, 0.4)) +
labs(colour = "Reciever within\nbuffered species range",
title = "Range Map & Motus Receivers",
subtitle = r$common_name[1],
fill = "", alpha = "") +
guides(alpha = guide_legend(override.aes = list(alpha = c(0.25, 0.65))),
colour = guide_legend(override.aes = list(size = 3)))
print(g)
}
```
::::{.content-hidden when-format="typst"}
## Save Outputs
First we'll add in all the receivers we originally omitted.
This way we have the reason each receiver is not `in_range` when we go to
examine the filters in the next step.
```{r}
p <- select(recvs_problems, "recvDeployID", "recvDeviceID", "name", "problem") |>
expand_grid(select(sp_ebird, "species_code", "id", "english")) |>
mutate(in_range = FALSE)
recvs |>
st_drop_geometry() |>
full_join(p, by = c("recvDeployID", "recvDeviceID", "name", "species_code",
"id", "english", "in_range")) |>
mutate(problem = replace_na(problem, "none")) |>
write_csv("Data/02_Datasets/receivers.csv")
```
## Wrap up
Disconnect from the databases
```{r}
walk(dbs, dbDisconnect)
```
::::
## Reproducibility
{{< include _reproducibility.qmd >}}