---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# austraits.portal
<!-- badges: start -->
<!-- badges: end -->
The goal of austraits.portal is to create a code-free interface for users to access the AusTraits Plant Trait database. The portal is visible at
<https://unsw.shinyapps.io/austraits-portal/>
## About AusTraits
The AusTraits database is a comprehensive collection of plant trait data for Australian flora. It includes measurements of various traits such as leaf area, plant height, seed mass, and more, collected from a wide range of sources including published literature, field studies, and herbarium records. The database is designed to support research in ecology, evolution, and conservation by providing standardized trait data for Australia's 30,000 plant species.
The AusTraits Data Portal offers an additional interface to explore and download data from the AusTraits database, complementing our primary access point, versioned releases on [Zenodo](https://zenodo.org/records/15718081), most easily accessed via the [`austraits`](https://github.com/traitecoevo/austraits) R package. As detailed on Zenodo, AusTraits has been released under an open source licence (CC-BY 4.0), enabling re-use by the community.
The database exists because of data submitted by more than 300 contributors from across Australia and the world. Without their efforts to collect, curate and contribute their data, AusTraits could not exist, and we express our gratitude to all researchers and institutions who are part of the AusTraits Project. The project is jointly led by Dr Daniel Falster (UNSW Sydney), Dr Elizabeth Wenk (UNSW Sydney), Dr Rachael Gallagher (Western Sydney University), and Dr Hervé Sauquet (Royal Botanic Gardens and Domain Trust Sydney).
AusTraits has been supported by investment from the Australian Research Data Commons (ARDC), via their "Transformative data collections" (https://doi.org/10.47486/TD044), "Data Partnerships" (https://doi.org/10.47486/DP720, https://doi.org/10.47486/DP720A), and "Planet Research Data Commons" programs; and by grants from the Australian Research Council (FT160100113, DE170100208, FT100100910) and Macquarie University. The ARDC is enabled by the National Collaborative Research Infrastructure Strategy (NCRIS).
Learn more about the AusTraits project on our website: <https://austraits.org/>.
## To open the data portal locally
```{r example, eval=FALSE}
pkgload::load_all()
shiny::shinyApp(ui = app_ui, server = app_server)
```
Note that the app relies on prepared data files.
By default, only a small dataset is included in the repo. To use the full dataset, you will need to download the latest version of the AusTraits database and prepare it for use in the portal (see below).
## Data Preparation
### Lite version
The portal is designed to work with two versions of the AusTraits database: a "lite" version containing a subset of core traits for demonstration purposes, and a "full" version containing all available traits and observations.
The lite version is included in the repository at `inst/extdata/austraits/austraits-5.0.0-lite`. To prepare this data for use in the portal, run the following code:
```{r, eval = FALSE}
austraits:::austraits_5.0.0_lite |>
  prepare_data_for_portal("inst/extdata/austraits/austraits-5.0.0-lite", overwrite = TRUE)
```
To prepare the full version, run the following code, which loads AusTraits 7.0.0 and prepares it for use in the portal:
```{r, eval = FALSE}
# Load AusTraits v7.0.0, using inst/extdata/austraits as the data path
austraits_7.0.0 <-
  austraits::load_austraits(version = "7.0.0", path = "inst/extdata/austraits", update = FALSE)

# A small fix for AusTraits v7.0.0 (to be deleted in future versions):
# missing source_primary_key values cause problems for the portal.
# Bryant_2021 is the only dataset affected, so fill its values in from dataset_id.
austraits_7.0.0$methods <- austraits_7.0.0$methods |>
  dplyr::mutate(
    source_primary_key = ifelse(grepl("Bryant_2021", dataset_id), dataset_id, source_primary_key)
  )

austraits_7.0.0 |>
  prepare_data_for_portal("inst/extdata/austraits/austraits-7.0.0-full", overwrite = TRUE)
```
## Deploying to shinyapps.io
The app is deployed at <https://unsw.shinyapps.io/austraits-portal/>, with configuration
details stored at `rsconnect/shinyapps.io/unsw/austraits.portal.dcf`.
To update the deployment, open the project in RStudio and run:
``` r
rsconnect::deployApp()
```
Dependencies are managed via the file `manifest.json`. To update dependencies, run:
```{r, eval = FALSE}
rsconnect::writeManifest()
```
Note that successful installation requires that the `austraits.portal` package itself first be installed from GitHub:
```{r, eval = FALSE}
remotes::install_github("traitecoevo/austraits.portal@develop")
```
## App Design Overview
### Architecture
The AusTraits Data Portal is built as a modular Shiny application inspired by the {golem} framework. The application provides an interactive interface to explore and download trait data from the AusTraits database.
#### Core Components
**Data Layer**
- **Storage**: Data stored as Parquet files for efficient querying
- **Query Engine**: DuckDB in-memory database for fast filtering and aggregation
- **Two Datasets**:
  - Raw observations (individual measurements)
  - Species averages (aggregated means per species)
- **Arrow Integration**: Arrow datasets registered with DuckDB for zero-copy data access
- **Precomputed Metadata**: Cached trait definitions, dropdown values, and flora links for fast access (via `prepare_data_for_portal`).
**UI Structure** (`app_ui.R`)
- **Sidebar**: Filtering controls via `mod_filters_ui`
- **Main Panel**: Tabbed interface with 5 views:
  - Data Preview (`mod_data_table`)
  - App Information (`mod_app_info`)
  - Taxon View (`mod_taxon_view`)
  - Trait View (`mod_trait_view`)
  - Citations (`mod_citations`)
**Server Logic** (`app_server.R`)
- Reactive data flow coordinating filters, queries, and display
- Debounced filtering to reduce computational overhead
- Lazy data loading (100 rows initially, load more on demand)
- Cached computations using `memoise` for performance
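
As a rough illustration of the debounced filtering listed above, the sketch below uses `shiny::debounce()` to hold back a reactive until the input has been idle for 500 ms. The function name `filter_server`, the input ID `taxon_name`, the output ID `applied_filter`, and the 500 ms delay are all assumptions made for the example; the portal's own server code may be organised quite differently.

```r
# Hypothetical server function; not the portal's actual app_server().
filter_server <- function(input, output, session) {
  # Raw reactive: changes on every keystroke in the (assumed) text input.
  taxon_raw <- shiny::reactive(input$taxon_name)

  # Debounced reactive: only updates once the input has been idle for 500 ms.
  taxon_settled <- shiny::debounce(taxon_raw, millis = 500)

  # Downstream work (for example a database query) depends on the settled value,
  # so it is not re-run while the user is still typing.
  output$applied_filter <- shiny::renderText({
    paste("Filtering on:", taxon_settled())
  })
}
```

To try it, this server could be paired with a UI containing `shiny::textInput("taxon_name", "Taxon name")` and `shiny::textOutput("applied_filter")`.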
### Key Modules
| Module | Purpose | Key Features |
|--------|---------|--------------|
| **mod_filters** | User filter controls | Taxonomy, traits, location, custom filters |
| **mod_data_table** | Interactive data table | Sortable, paginated, DT with truncated cells |
| **mod_taxon_view** | Taxon profile pages | Species info, trait summary, external links |
| **mod_trait_view** | Trait profile pages | Trait definitions, distributions, maps |
| **mod_citations** | Citation information | Dynamic reference generation for filtered data |
| **mod_app_info** | Portal documentation | Usage guide, attribution, telemetry metrics |
### Data Processing Pipeline
1. **Filter Parsing** (`fct_filter_parsing.R`)
   - Converts UI inputs into structured filter objects
   - Validates and normalizes filter values
2. **Filter Application** (`fct_filter_application.R`)
   - Applies filters to DuckDB queries
   - Optimized single-pass filtering
   - Supports regex patterns for text searches
3. **Query Execution**
   - Initial load: First 100 rows for display
   - Background: Count total matching rows
   - On-demand: Load additional batches as needed
4. **Data Display**
   - Format columns for presentation
   - Generate interactive visualizations
   - Cache expensive computations
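
The pipeline above can be sketched in base R as plain SQL string building. This is a minimal sketch, not the code in `fct_filter_parsing.R` or `fct_filter_application.R`: the helpers `sql_quote()`, `build_where()` and `build_queries()`, the filter structure (`values`, `regex`), and the table name `traits` are hypothetical. `regexp_matches()` is DuckDB's regular-expression predicate.

```r
# Hypothetical helper: escape a value as a SQL string literal.
sql_quote <- function(x) paste0("'", gsub("'", "''", x, fixed = TRUE), "'")

# Hypothetical helper: turn a named list of filters into a WHERE clause.
# Each filter is assumed to look like list(values = <character>, regex = <logical>).
build_where <- function(filters) {
  filters <- Filter(function(f) length(f$values) > 0, filters)  # drop empty filters
  if (length(filters) == 0) return("")
  clauses <- vapply(names(filters), function(col) {
    f <- filters[[col]]
    if (isTRUE(f$regex)) {
      sprintf("regexp_matches(%s, %s)", col, sql_quote(f$values[1]))
    } else {
      sprintf("%s IN (%s)", col, paste(sql_quote(f$values), collapse = ", "))
    }
  }, character(1))
  paste("WHERE", paste(clauses, collapse = " AND "))
}

# Build the two queries described above: one batch of rows, plus a total count.
build_queries <- function(filters, table = "traits", batch = 1, batch_size = 100) {
  where <- build_where(filters)
  offset <- (batch - 1) * batch_size
  page <- c(
    sprintf("SELECT * FROM %s", table),
    where,
    sprintf("LIMIT %d OFFSET %d", batch_size, offset)
  )
  count <- c(sprintf("SELECT COUNT(*) AS n FROM %s", table), where)
  list(
    page  = paste(page[nzchar(page)], collapse = " "),
    count = paste(count[nzchar(count)], collapse = " ")
  )
}

queries <- build_queries(list(
  taxon_name = list(values = "Eucalyptus globulus", regex = FALSE),
  trait_name = list(values = "^leaf", regex = TRUE)
))
queries$page
queries$count
```

Given a DuckDB connection `con`, the resulting strings could be run with `DBI::dbGetQuery(con, queries$page)`. In production code, parameterised queries would be a safer choice than string interpolation, and column names would need validating against a known list.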
### Performance Optimizations
- **DuckDB**: High-performance analytical queries on Parquet files
- **Lazy Loading**: Only loads visible data (100 rows at a time)
- **Memoization**: Caches repeated computations (trait groups, dropdown values)
- **Debouncing**: Delays filter execution until user input settles (300-1000 ms)
- **Precomputed Dropdowns**: All filter options cached at startup
- **Efficient String Matching**: DuckDB's optimized regex for filtering
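
To show what memoization buys, here is a small base R stand-in for `memoise::memoise()`. It is a sketch of the idea only: `memoise_simple()` and `trait_group()` are hypothetical, and the portal itself uses the `memoise` package rather than this hand-rolled cache.

```r
# Hypothetical stand-in for memoise::memoise(): cache results by argument value.
memoise_simple <- function(f) {
  cache <- new.env(parent = emptyenv())
  function(x) {
    key <- paste(as.character(x), collapse = "\r")
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, f(x), envir = cache)  # first call: compute and store
    }
    get(key, envir = cache, inherits = FALSE)  # later calls: reuse stored result
  }
}

calls <- 0
# Toy "expensive" computation: derive a group label from a trait name.
trait_group <- function(trait_name) {
  calls <<- calls + 1  # count how often the real work runs
  sub("_.*$", "", trait_name)
}

cached_trait_group <- memoise_simple(trait_group)
first  <- cached_trait_group("leaf_area")
second <- cached_trait_group("leaf_area")  # served from the cache
calls
```

After both calls, `calls` is 1, because the second lookup never reaches `trait_group()`.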
### Data Preparation
The `prepare_data_for_portal()` function (in `fct_prepare_data.R`) processes the raw AusTraits database for portal use:
- Flattens nested database structure
- Computes species-level averages for core traits
- Exports to Parquet format (display and full versions)
- Generates cached metadata (definitions, sources, trait groups, dropdown values)
- Pre-processes flora links and state/territory distributions
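
The species-level averaging step can be illustrated with base R's `aggregate()`. This is a sketch under assumptions: the toy data frame below uses made-up values, and `prepare_data_for_portal()` may compute its averages differently (for example with extra grouping columns or unit handling).

```r
# Toy observations (made-up values, for illustration only).
observations <- data.frame(
  taxon_name = c("Acacia aneura", "Acacia aneura", "Eucalyptus globulus"),
  trait_name = c("leaf_area", "leaf_area", "leaf_area"),
  value      = c(100, 140, 2000)
)

# One mean per taxon-by-trait combination.
species_means <- aggregate(
  value ~ taxon_name + trait_name,
  data = observations,
  FUN  = mean
)
species_means
```

A table like this could then be written to disk with `arrow::write_parquet()`, which matches the Parquet export step listed above.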
### Telemetry
Optional analytics tracking via Supabase or local SQLite:
- Session starts
- Search events (when filters applied)
- Download events
- Displayed in real-time on App Information tab
### URL Parameters
The app supports deep linking via URL query parameters:
- `?taxon_name=Eucalyptus+globulus`
- `?trait_name=leaf_area`
- `?tab=Trait+View`
- All filter states can be encoded in URLs for sharing
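
For readers curious how such parameters are decoded, the base R sketch below parses a query string into a named list. The helper `parse_query()` is hypothetical and is not part of the portal; inside a Shiny server the usual route is `shiny::parseQueryString(session$clientData$url_search)`, which does the same job.

```r
# Hypothetical helper: parse "?key=value&key2=value2" into a named list.
parse_query <- function(query) {
  query <- sub("^\\?", "", query)  # drop the leading "?"
  if (!nzchar(query)) return(list())
  pairs  <- strsplit(strsplit(query, "&", fixed = TRUE)[[1]], "=", fixed = TRUE)
  keys   <- vapply(pairs, function(p) p[1], character(1))
  values <- vapply(pairs, function(p) if (length(p) > 1) p[2] else "", character(1))
  # "+" encodes a space in query strings; percent-escapes are decoded afterwards.
  decode <- function(x) {
    vapply(gsub("+", " ", x, fixed = TRUE), utils::URLdecode, character(1), USE.NAMES = FALSE)
  }
  stats::setNames(as.list(decode(values)), decode(keys))
}

params <- parse_query("?taxon_name=Eucalyptus+globulus&tab=Trait+View")
params$taxon_name
params$tab
```

Here `params$taxon_name` is `"Eucalyptus globulus"` and `params$tab` is `"Trait View"`, which the app could then use to set filters and select a tab.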