Initializes connection with Naryn Database

emr_db.connect(db_dirs = NULL, load_on_demand = NULL, do_reload = FALSE)

emr_db.init(
  global.dir = NULL,
  user.dir = NULL,
  global.load.on.demand = TRUE,
  user.load.on.demand = TRUE,
  do.reload = FALSE
)

emr_db.ls()

Arguments

db_dirs

vector of db directories

load_on_demand

vector of booleans, the same length as db_dirs. If load_on_demand[i] is FALSE, tracks from db_dirs[i] are pre-loaded. A single 'TRUE' or 'FALSE' sets load_on_demand for all the databases. If NULL is passed, load_on_demand is set to TRUE for all the databases.

do_reload

If TRUE, rebuilds DB index files.

global.dir, user.dir, global.load.on.demand, user.load.on.demand, do.reload

old parameters of the deprecated function emr_db.init

Value

None.

Details

Call `emr_db.connect` to establish access to the tracks in the db_dirs. `emr_db.connect` requires at least one db dir. Optionally, it accepts additional db dirs, which can contain additional tracks.

When two or more db dirs contain the same track name (namespace collision), the track is taken from the db dir that was passed *last* in the order of connections.

For example, if two db dirs /db1 and /db2 both contain a track named track1, the call emr_db.connect(c('/db1', '/db2')) results in Naryn using track1 from /db2. The override applies consistently, not only to the track's data but also to any other Naryn entity using or pointing to the track.
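The "last db dir wins" rule can be pictured with a small base R sketch. This is only an illustration of the precedence rule described above, not Naryn's actual implementation; `tracks_by_dir` and `resolve_tracks` are hypothetical names, and the track lists are made up.

```r
# Hypothetical data: track names found in each db dir, in connection order
tracks_by_dir <- list(
  "/db1" = c("track1", "track2"),
  "/db2" = c("track1", "track3")
)

# Hypothetical helper: map every track name to the db dir it is taken from
resolve_tracks <- function(tracks_by_dir) {
  owner <- character(0)
  for (dir in names(tracks_by_dir)) {
    # A later db dir overwrites earlier entries that share a track name
    owner[tracks_by_dir[[dir]]] <- dir
  }
  owner
}

resolved <- resolve_tracks(tracks_by_dir)
print(resolved)
```

In this made-up setup, track1 resolves to /db2 because /db2 comes last in the connection order.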

Although all the db dirs may contain track files, they serve different purposes. All the db dirs except the last one in the order of connections are mainly read-only. The directory connected last in the order, also known as the *user dir*, is intended to store volatile data such as the results of intermediate calculations.

New tracks can be created only in the db dir that is last in the order of connections, using emr_track.import or emr_track.create. To write tracks to a db dir that is not last in the connection order, the user must explicitly reconnect with the required db dir set as the last in order. This should be done only for a well-justified reason.
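A minimal sketch of such a reconnect, assuming two hypothetical db dirs /db1 and /db2. The helper `make_last` is not part of Naryn; it only reorders the vector so that the chosen directory comes last. The actual `emr_db.connect` call is guarded so that it runs only when the package is installed and the directories exist.

```r
# Hypothetical helper: move `target` to the end of `db_dirs`
# so that it becomes the db dir new tracks are written to
make_last <- function(db_dirs, target) {
  stopifnot(target %in% db_dirs)
  c(setdiff(db_dirs, target), target)
}

db_dirs <- c("/db1", "/db2")             # hypothetical paths
new_order <- make_last(db_dirs, "/db1")  # /db1 should become the last dir
print(new_order)

# Reconnect only if Naryn is installed and the directories actually exist
if (requireNamespace("naryn", quietly = TRUE) && all(dir.exists(new_order))) {
  naryn::emr_db.connect(db_dirs = new_order)
}
```

Note that reordering also changes which db dir wins a track name collision, so it is worth restoring the original order once the write is done.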

When the package is attached, it internally calls 'emr_db.init_examples', which sets a single example db dir: 'PKGDIR/naryndb/test' ('PKGDIR' is the directory where the package is installed).

Physical files in the database are supposed to be managed exclusively by Naryn itself. Track files may be modified, added or deleted manually, but such changes must be ratified by running 'emr_db.reload'. Some of these manual changes, however (like moving a track from global space to user or vice versa), might cause 'emr_db.connect' to fail. 'emr_db.reload' cannot be invoked then, as it requires the connection to the DB to be established first. To break the deadlock, use the 'do_reload=TRUE' parameter within 'emr_db.connect'. This connects to the DB and rebuilds the DB index files in one step.
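One possible way to handle this situation in a script is to try a plain connect first and fall back to `do_reload = TRUE` only on failure. The sketch below is an assumption about usage, not a pattern prescribed by Naryn: `connect_or_rebuild` is a hypothetical wrapper, and `fake_connect` is a stand-in that imitates a failing connection so the example runs without a real database.

```r
# Hypothetical wrapper: retry with do_reload = TRUE if the plain connect fails.
# `connect` defaults to naryn::emr_db.connect and is only evaluated when used.
connect_or_rebuild <- function(db_dirs, connect = naryn::emr_db.connect) {
  tryCatch(
    {
      connect(db_dirs = db_dirs)
      "connected"
    },
    error = function(e) {
      message("Plain connect failed: ", conditionMessage(e))
      message("Retrying with do_reload = TRUE")
      connect(db_dirs = db_dirs, do_reload = TRUE)
      "connected after rebuild"
    }
  )
}

# Stand-in for emr_db.connect that fails until the index is rebuilt
fake_connect <- function(db_dirs, do_reload = FALSE) {
  if (!do_reload) stop("index files are out of date")
  invisible(NULL)
}

status <- connect_or_rebuild(c("/db1", "/db2"), connect = fake_connect)
print(status)
```

With a real database, the `connect` argument would be left at its default. A failed connect can have other causes than stale index files, so the retry is only a convenience.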

If 'load_on_demand' is 'TRUE', a track is loaded into memory only when it is accessed, and it is unloaded from memory when the R session ends or the package is unloaded.

If the 'load_on_demand' parameter is 'FALSE', all the tracks from the specified space (global / user) are pre-loaded into memory, making subsequent track access significantly faster. As loaded tracks reside in shared memory, other R sessions running on the same machine may also enjoy a significant run-time boost. On the flip side, pre-loading all the tracks prolongs the execution of 'emr_db.connect' and requires enough memory to accommodate all the data.

Choosing between the two modes depends on the specific needs. While 'load_on_demand=TRUE' seems to be a solid default choice, in an environment with frequent short-lived R sessions, each accessing a track, one might opt for running a "daemon": an additional permanent R session. The daemon would pre-load all the tracks in advance and stay alive, thus boosting the run-time of sessions that start later.
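The way 'load_on_demand' is interpreted per db dir can be sketched in base R. `expand_load_on_demand` is a hypothetical helper that mirrors the rule documented for the argument; it is not part of Naryn, and the paths are made up. The real `emr_db.connect` call is guarded so it runs only when the package is installed and the directories exist.

```r
# Hypothetical helper mirroring the documented rule for `load_on_demand`
expand_load_on_demand <- function(db_dirs, load_on_demand = NULL) {
  n <- length(db_dirs)
  if (is.null(load_on_demand)) {
    return(rep(TRUE, n))            # NULL: load on demand for every db dir
  }
  if (length(load_on_demand) == 1) {
    return(rep(load_on_demand, n))  # single value applies to every db dir
  }
  stopifnot(length(load_on_demand) == n)
  load_on_demand
}

db_dirs <- c("/db1", "/db2")  # hypothetical paths

# Pre-load tracks from the first db dir, load the last one on demand
modes <- expand_load_on_demand(db_dirs, c(FALSE, TRUE))
print(setNames(modes, db_dirs))

if (requireNamespace("naryn", quietly = TRUE) && all(dir.exists(db_dirs))) {
  naryn::emr_db.connect(db_dirs = db_dirs, load_on_demand = c(FALSE, TRUE))
}
```

A "daemon" session as described above would presumably make the same kind of call with 'load_on_demand = FALSE' and then simply stay alive.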

Upon completion, the connection with the database is established and a few variables are added to the .naryn environment. These variables should not be modified by the user!

.naryn$EMR_GROOT

First db dir of tracks in the order of connections

.naryn$EMR_UROOT

Last db dir of tracks in the order of connections (user dir)

.naryn$EMR_ROOTS

Vector of directories (db_dirs)
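As a rough illustration, the values these variables are described as holding can be derived from the connection order. The sketch assumes two hypothetical db dirs and only restates the descriptions above in base R; it does not read the .naryn environment itself. The guarded part uses `emr_db.ls` to list the connected databases when a real connection is possible.

```r
db_dirs <- c("/db1", "/db2")  # hypothetical paths, in connection order

# Values implied by the descriptions above for a two-dir connection
expected <- list(
  EMR_GROOT = db_dirs[1],                # first db dir
  EMR_UROOT = db_dirs[length(db_dirs)],  # last db dir (user dir)
  EMR_ROOTS = db_dirs                    # all db dirs
)
str(expected)

# With a real database, list what is actually connected
if (requireNamespace("naryn", quietly = TRUE) && all(dir.exists(db_dirs))) {
  naryn::emr_db.connect(db_dirs = db_dirs)
  print(naryn::emr_db.ls())
}
```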

emr_db.init is the old version of this function which is now deprecated.

emr_db.ls lists all the currently connected databases.