Calculates distribution of track expressions' values over the given set of bins.

emr_dist(
  ...,
  include.lowest = FALSE,
  right = TRUE,
  stime = NULL,
  etime = NULL,
  iterator = NULL,
  keepref = FALSE,
  filter = NULL,
  dataframe = FALSE,
  names = NULL
)

Arguments

...

pairs of [expr, breaks], where 'expr' is a track expression and 'breaks' is either a vector of breaks that determine its bins or 'NULL' (see Details).

include.lowest

if 'TRUE', the lowest (or highest, for 'right = FALSE') value of the range determined by breaks is included

right

if 'TRUE' the intervals are closed on the right (and open on the left), otherwise vice versa.

stime

start time scope

etime

end time scope

iterator

track expression iterator. If 'NULL' iterator is determined implicitly based on track expressions. See also 'iterator' section.

keepref

If 'TRUE' references are preserved in the iterator.

filter

Iterator filter.

dataframe

return a data frame instead of an N-dimensional vector.

names

names for track expressions in the returned dataframe (only relevant when dataframe == TRUE)

Value

N-dimensional vector where N is the number of 'expr'-'breaks' pairs. If dataframe == TRUE, a data frame with a column for each track expression and an additional column 'n' with counts.

Details

This function calculates the distribution of values of the numeric track expressions over the given set of bins.

The range of bins is determined by 'breaks' argument. For example: 'breaks=c(x1, x2, x3, x4)' represents three different intervals (bins): (x1, x2], (x2, x3], (x3, x4].
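
The bin labels that 'emr_dist' prints, such as (0,15], look like those produced by base R's `cut()`, which has arguments with the same names ('breaks', 'right', 'include.lowest'). The following is a minimal base R sketch of those semantics, assuming the two behave alike. It does not call 'emr_dist', and the `values` vector is made up for illustration.

```r
# Hypothetical values standing in for what a track expression might yield.
values <- c(0, 5, 15, 18, 20, 27, 50)
breaks <- c(0, 15, 20, 30, 40, 50)

# Default: right-closed bins (x1, x2]; the value 0 falls outside every bin.
counts_default <- table(cut(values, breaks, right = TRUE, include.lowest = FALSE))

# include.lowest = TRUE closes the first bin on the left, so 0 is counted.
counts_lowest <- table(cut(values, breaks, right = TRUE, include.lowest = TRUE))

# right = FALSE gives left-closed bins [x1, x2); now 50 falls outside.
counts_left <- table(cut(values, breaks, right = FALSE, include.lowest = FALSE))

print(counts_default)
print(counts_lowest)
print(counts_left)
```

Comparing the three tables shows which boundary value is dropped or kept under each combination of 'right' and 'include.lowest'.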

If the track expression consists of a categorical track, or of a virtual track whose source is a categorical track, 'breaks' is allowed to be 'NULL', meaning that the breaks are derived implicitly from the unique values of the underlying track.

'emr_dist' can work with any number of dimensions. If more than one 'expr'-'breaks' pair is passed, the result is a multidimensional vector, and an individual value can be accessed by [i1,i2,...,iN] notation, where 'i1' indexes the bins of the first track expression and 'iN' indexes the bins of the last track expression.
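
To make the [i1,i2,...,iN] indexing concrete, here is a two-dimensional analogue built with base R's `cut()` and `table()`. It is only an illustration of the shape and indexing of such a result, not a call to 'emr_dist'. The vectors `x` and `y` are hypothetical paired observations.

```r
# Hypothetical paired observations (not real track data).
x <- c(5, 12, 18, 25, 25, 45)
y <- c(1, 3, 3, 1, 7, 9)

# One set of breaks per dimension, as with 'expr'-'breaks' pairs.
x_bins <- cut(x, breaks = c(0, 15, 20, 30, 40, 50))
y_bins <- cut(y, breaks = c(0, 2, 5, 10))

# A 5 x 3 table of counts: the first dimension follows x, the second follows y.
counts <- table(x_bins, y_bins)

print(dim(counts))
# Count of observations with x in (20,30] and y in (0,2].
print(counts[3, 1])
```

The first index selects a bin of the first variable and the second index a bin of the second, which is the same ordering the paragraph above describes for track expressions.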

iterator

There are a few types of iterators:

  • Track iterator: Track iterator returns the points (including the reference) from the specified track. Track name is specified as a string. If `keepref=FALSE` the reference of each point is set to `-1`.
    Example:

    # Returns the level of glucose one hour after the insulin shot was made
    emr_vtrack.create("glucose", "glucose_track", func="avg", time.shift=1)
    emr_extract("glucose", iterator="insulin_shot_track")

  • Id-Time Points Iterator: Id-Time points iterator generates points from an *id-time points table*. If `keepref=FALSE` the reference of each point is set to `-1`.
    Example:

    # Returns the level of glucose one hour after the insulin shot was made
    emr_vtrack.create("glucose", "glucose_track", func = "avg", time.shift = 1)
    r <- emr_extract("insulin_shot_track") # <-- implicit iterator is used here
    emr_extract("glucose", iterator = r)

  • Ids Iterator: Ids iterator generates points with ids taken from an *ids table* and times that run from `stime` to `etime` with a step of 1. If `keepref=TRUE` for each id-time pair the iterator generates 255 points with references running from `0` to `254`. If `keepref=FALSE` only one point is generated for the given id and time, and its reference is set to `-1`.
    Example:

    stime <- emr_date2time(1, 1, 2016, 0)
    etime <- emr_date2time(31, 12, 2016, 23)
    emr_extract("glucose", iterator = data.frame(id = c(2, 5)), stime = stime, etime = etime)

  • Time Intervals Iterator: *Time intervals iterator* generates points for all the ids that appear in the 'patients.dob' track, with times taken from a *time intervals table* (see: Appendix). Each time starts at the beginning of the time interval and runs to the end of it with a step of 1. Points that lie outside of the `[stime, etime]` range are skipped.
    If `keepref=TRUE` for each id-time pair the iterator generates 255 points with references running from `0` to `254`. If `keepref=FALSE` only one point is generated for the given id and time, and its reference is set to `-1`.
    Example:
    # Returns the level of hangover for all patients on the day after New Year's Eve for the years 2015 and 2016
    stime1 <- emr_date2time(1, 1, 2015, 0)
    etime1 <- emr_date2time(1, 1, 2015, 23)
    stime2 <- emr_date2time(1, 1, 2016, 0)
    etime2 <- emr_date2time(1, 1, 2016, 23)
    emr_extract("alcohol_level_track", iterator = data.frame(
      stime = c(stime1, stime2),
      etime = c(etime1, etime2)
    ))

  • Id-Time Intervals Iterator: *Id-Time intervals iterator* generates, for each id, points that cover the `[stime, etime]` time range specified in the *id-time intervals table* (see: Appendix). Each time starts at the beginning of the time interval and runs to the end of it with a step of 1. Points that lie outside of the `[stime, etime]` range are skipped.
    If `keepref=TRUE` for each id-time pair the iterator generates 255 points with references running from `0` to `254`. If `keepref=FALSE` only one point is generated for the given id and time, and its reference is set to `-1`.

  • Beat Iterator: *Beat Iterator* generates a "time beat" at the given period for each id that appears in the 'patients.dob' track. The period is always given in hours.
    Example:
    emr_extract("glucose_track", iterator=10, stime=1000, etime=2000)
    This will create a beat iterator with a period of 10 hours starting at `stime` up until `etime` is reached. If, for example, `stime` equals `1000` then the beat iterator will create for each id iterator points at times: 1000, 1010, 1020, ...
    If `keepref=TRUE` for each id-time pair the iterator generates 255 points with references running from `0` to `254`. If `keepref=FALSE` only one point is generated for the given id and time, and its reference is set to `-1`.

  • Extended Beat Iterator: *Extended beat iterator* is, as its name suggests, a variation on the beat iterator. It works by the same principle of creating time points with the given period. However, instead of basing the time count on `stime`, it accepts an additional parameter, a track or an *Id-Time Points table*, that specifies the initial time point for each of the ids. The two parameters (period and mapping) should come in a list. Each id is required to appear only once, and if a certain id does not appear at all, it is skipped by the iterator.
    In any case, points that lie outside of the `[stime, etime]` range are not generated.
    Example:
    # Returns the maximal weight of patients at one year span starting from their birthdays
    emr_vtrack.create("weight", "weight_track", func = "max", time.shift = c(0, year()))
    emr_extract("weight", iterator = list(year(), "birthday_track"), stime = 1000, etime = 2000)

  • Periodic Iterator: periodic iterator goes over every year/month. You can use it by running emr_monthly_iterator or emr_yearly_iterator.
    Example:
    iter <- emr_yearly_iterator(emr_date2time(1, 1, 2002), emr_date2time(1, 1, 2017))
    emr_extract("dense_track", iterator = iter, stime = 1, etime = 3)
    iter <- emr_monthly_iterator(emr_date2time(1, 1, 2002), n = 15)
    emr_extract("dense_track", iterator = iter, stime = 1, etime = 3)

  • Implicit Iterator: The iterator is set implicitly if its value remains `NULL` (which is the default). In that case the track expression is analyzed and searched for track names. If all the track variables or virtual track variables point to the same track, this track is used as a source for a track iterator. If more than one track appears in the track expression, an error message is printed reporting the ambiguity.
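
The arithmetic behind the beat iterator and the `keepref` expansion described in the list above can be sketched in base R. This does not call any package function. It assumes the closed range `[stime, etime]` and a single id, and the variable names are illustrative only.

```r
# Values taken from the beat iterator example: period of 10 hours.
stime <- 1000
etime <- 2000
period <- 10

# Time points a beat would visit for one id: 1000, 1010, 1020, ...
# (assumption: etime itself is included when it falls on a beat).
beat_times <- seq(stime, etime, by = period)
print(head(beat_times, 3))
print(length(beat_times))

# With keepref = TRUE each id-time pair yields 255 points (references 0..254);
# with keepref = FALSE it yields a single point with reference -1.
refs_keepref <- 0:254
points_per_id_keepref <- length(beat_times) * length(refs_keepref)
points_per_id_noref <- length(beat_times)
print(points_per_id_keepref)
```

The comparison between the two counts gives a feel for how much `keepref=TRUE` multiplies the number of iterator points.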

Revealing Current Iterator Time: During the evaluation of a track expression one can access a specially defined variable named `EMR_TIME` (Python: `TIME`). This variable contains a vector (`numpy.ndarray` in Python) of current iterator times. The length of the vector matches the length of the track variable (which is a vector too).
Note that some values in `EMR_TIME` might be set to 0. Skip those entries, along with the values of the track variables at the corresponding indices.
# Returns times of the current iterator as a day of month
emr_extract("emr_time2dayofmonth(EMR_TIME)", iterator = "sparse_track")
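
Skipping the zero entries amounts to masking both vectors with the same logical index. The sketch below uses plain base R vectors as hypothetical stand-ins for `EMR_TIME` and a track variable. The numbers, including the placeholder `99`, are invented for illustration.

```r
# Hypothetical stand-ins for EMR_TIME and a track variable of equal length.
times  <- c(1000, 0, 1010, 0, 1020)
values <- c(5.1, 99, 6.3, 99, 7.2)  # 99 marks positions that must be ignored

# Keep only the positions whose iterator time is set (non-zero).
valid <- times != 0
valid_times  <- times[valid]
valid_values <- values[valid]

print(valid_times)
print(valid_values)
```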

Examples


emr_db.init_examples()
#> NULL
emr_dist("sparse_track", c(0, 15, 20, 30, 40, 50), keepref = TRUE)
#>  (0,15] (15,20] (20,30] (30,40] (40,50] 
#>       4       1       4       1       0 
#> attr(,"breaks")
#> attr(,"breaks")[[1]]
#> [1]  0 15 20 30 40 50
#> 
emr_dist("sparse_track", c(0, 15, 20, 30, 40, 50), keepref = TRUE, dataframe = TRUE)
#>   sparse_track n
#> 1       (0,15] 4
#> 2      (15,20] 1
#> 3      (20,30] 4
#> 4      (30,40] 1
#> 5      (40,50] 0