extract_data.Rmd
First, install wranglEHR from github, or download the dev release directly.
Establish a database connection. WranglEHR is optimised for EMAP, but you can technically connect it to any OMOP CDM version 5.3.1.
library(wranglEHR)
ctn <- ctn <- DBI::dbConnect(
RPostgres::Postgres(),
host = "****", # Host target for the UDS
port = 5432,
user = "****",
password = rstudioapi::askForPassword(),
dbname = "uds")
Enter the fields you want to extract using OMOP concept ids. See Athena for more information on these codes. Alternately, the mdata
object from wranglEHR contains all the information on what data is currently expressed in EMAP. There are additional concept tables inside the database itself.
I write this here as a tribble to keep everything in the same document. If you want to rename column names then it is important to keep names and code names together and in the same order. A column containing a summary function is added. This summary function will come in to action when there are more data points than rows, and thus a summary is necessary.
my_codes <- tibble::tribble(
~concept_codes, ~short_name, ~summary_func
3013502, "spo2", min,
44809212, "spo2_target", min
)
ltb <- extract(
connection = ctn,
target_schema = "ops_dev",
visit_occurrence_ids = 600000:600005,
concept_names = my_codes$concept_codes,
relabel = my_codes$short_name,
coalesce_rows = my_codes$summary_func,
chunk_size = 5000,
cadance = 1
)
ltb
This will return a table with 1 row per patient per time unit specified in the cadance option. For example, a cadance of 1 will return a table with 1 row per patient per hour. This does not create a row if nothing was recorded, so the rows might run: 1, 2, 4, if nothing was recorded in hour 3. Note how a summary function can be provided to coalese_rows
. This determines the behaviour when collapsing dataitems down into rows below the natural cadance of recording.
Depending upon how much data you want to pull into a long form for analysis, this can take considerable time. Choose carefully, and do a test run on a small number of cases first. The chunk_size
option can be useful if you are encountering performance issues.