Initializes connection with Naryn Database
emr_db.connect(db_dirs = NULL, load_on_demand = NULL, do_reload = FALSE)
emr_db.init(
  global.dir = NULL,
  user.dir = NULL,
  global.load.on.demand = TRUE,
  user.load.on.demand = TRUE,
  do.reload = FALSE
)
emr_db.ls()
db_dirs: vector of db directories.
load_on_demand: vector of booleans, same length as db_dirs. If load_on_demand[i] is FALSE, tracks from db_dirs[i] are pre-loaded. A single TRUE or FALSE sets load_on_demand for all the databases. If NULL is passed, load_on_demand is set to TRUE for all the databases.
do_reload: if TRUE, rebuilds DB index files.
global.dir, user.dir, global.load.on.demand, user.load.on.demand, do.reload: old parameters of the deprecated function emr_db.init.
None.
Call `emr_db.connect` to establish access to the tracks in the db_dirs. `emr_db.connect` requires at least one db dir. Optionally, it accepts additional db dirs, which can contain additional tracks.
In a case where 2 or more db dirs contain the same track name (namespace collision), the track will be taken from the db dir which was passed *last* in the order of connections.
For example, if we have 2 db dirs, `/db1` and `/db2`, which both contain a track named `track1`, the call `emr_db.connect(c('/db1', '/db2'))` will result in Naryn using `track1` from `/db2`. As you might expect, the overriding is consistent not only for the track's data, but also for any other Naryn entity using or pointing to the track.
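The override rule above can be illustrated with a small sketch in plain R. It does not call Naryn and is not Naryn's implementation; `resolve_tracks` and the track lists are hypothetical, made up only to show that the db dir passed last wins a name collision.

```r
# Hypothetical helper: given a named list mapping each db dir (in connection
# order) to the track names it contains, return the db dir each track name
# resolves to. Later dirs override earlier ones.
resolve_tracks <- function(tracks_by_dir) {
  winner <- character(0)
  for (dir in names(tracks_by_dir)) {
    # Assigning by name overwrites any earlier owner of the same track.
    winner[tracks_by_dir[[dir]]] <- dir
  }
  winner
}

# Assumed contents of the two db dirs from the example above.
dbs <- list(
  "/db1" = c("track1", "track2"),
  "/db2" = c("track1", "track3")
)
owners <- resolve_tracks(dbs)
print(owners)
```

Here `track1` resolves to `/db2` because `/db2` comes last in the order, while `track2` is still served from `/db1`.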
Even though all the db dirs may contain track files, their designation is different. All the db dirs except the last dir in the order of connections are mainly read-only. The directory which was connected last in the order, also known as *user dir*, is intended to store volatile data like the results of intermediate calculations.
New tracks can be created only in the db dir which was last in the order of connections, using `emr_track.import` or `emr_track.create`. In order to write tracks to a db dir which is not last in the connection order, the user must explicitly reconnect and set the required db dir as the last in order. This should be done only for a well-justified reason.
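As a minimal sketch of the reconnect step, the connection order can be rearranged so that the target db dir comes last. The helper `make_last` is hypothetical and the paths are the illustrative `/db1` and `/db2` used earlier; the actual Naryn call is left as a comment because it needs real database directories.

```r
# Hypothetical helper: move `target` to the end of the connection order so
# that it becomes the last (writable) db dir.
make_last <- function(db_dirs, target) {
  stopifnot(target %in% db_dirs)
  c(setdiff(db_dirs, target), target)
}

new_order <- make_last(c("/db1", "/db2"), "/db1")
print(new_order)

# With naryn attached and both directories present, reconnecting would be:
#   emr_db.connect(db_dirs = new_order)
# after which emr_track.create / emr_track.import write into /db1.
```

Note that reordering also changes which db dir wins a track name collision, so it affects reads as well as writes.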
When the package is attached it internally calls 'emr_db.init_examples' which sets a single example db dir - 'PKGDIR/naryndb/test'. ('PKGDIR' is the directory where the package is installed).
Physical files in the database are supposed to be managed exclusively by Naryn itself. Manual modification, addition or deletion of track files may be done, yet it must be ratified by running 'emr_db.reload'. Some of these manual changes, however (like moving a track from one db dir to another), might cause 'emr_db.connect' to fail. 'emr_db.reload' cannot be invoked then, as it requires that the connection to the DB be established first. To break the deadlock, pass 'do_reload=TRUE' to 'emr_db.connect'. This will connect to the DB and rebuild the DB index files in one step.
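One way to apply this is to try a plain connect first and fall back to 'do_reload=TRUE' only when it fails. The wrapper below is a sketch, not part of Naryn: `connect_with_reload` and `fake_connect` are hypothetical names, and `fake_connect` merely stands in for 'emr_db.connect' so the example runs without a real database.

```r
# Hypothetical wrapper: try a plain connect first; if it fails, retry once
# with do_reload = TRUE so the index files are rebuilt while connecting.
# `connect` defaults to naryn's emr_db.connect and can be swapped for testing.
connect_with_reload <- function(db_dirs, connect = naryn::emr_db.connect) {
  tryCatch(
    {
      connect(db_dirs = db_dirs)
      "connected"
    },
    error = function(e) {
      message("Plain connect failed: ", conditionMessage(e))
      connect(db_dirs = db_dirs, do_reload = TRUE)
      "connected after reload"
    }
  )
}

# Stand-in for emr_db.connect that fails unless do_reload = TRUE. It exists
# only to simulate the stale-index situation described above.
fake_connect <- function(db_dirs, load_on_demand = NULL, do_reload = FALSE) {
  if (!do_reload) stop("stale index")
  invisible(NULL)
}

status <- connect_with_reload(c("/db1", "/db2"), connect = fake_connect)
print(status)
```

Rebuilding the index on every connect would work too, but it makes each connection slower, so the sketch reserves it for the failure path.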
If 'load_on_demand' is 'TRUE', a track is loaded into memory only when it is accessed, and it is unloaded from memory when the R session ends or the package is unloaded.
If 'load_on_demand' is 'FALSE', all the tracks from the corresponding db dir are pre-loaded into memory, making subsequent track access significantly faster. As loaded tracks reside in shared memory, other R sessions running on the same machine may also enjoy a significant run-time boost. On the flip side, pre-loading all the tracks prolongs the execution of 'emr_db.connect' and requires enough memory to accommodate all the data.
Choosing between the two modes depends on the specific needs. While 'load_on_demand=TRUE' is a solid default choice, in an environment with frequent short-lived R sessions, each accessing a track, one might opt for running a "daemon" - an additional permanent R session. The daemon would pre-load all the tracks in advance and stay alive, thus boosting the run-time of the sessions that start later.
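The documented rules for 'load_on_demand' (NULL, a single value, or one value per db dir) can be spelled out in a short plain-R sketch. `expand_load_on_demand` is a hypothetical helper written for illustration; it mirrors the argument description and is not taken from Naryn's source.

```r
# Hypothetical helper mirroring the documented rules for load_on_demand:
# NULL -> TRUE for every db dir; a single value -> applied to every db dir;
# otherwise one value per db dir.
expand_load_on_demand <- function(db_dirs, load_on_demand = NULL) {
  n <- length(db_dirs)
  if (is.null(load_on_demand)) {
    load_on_demand <- TRUE
  }
  stopifnot(is.logical(load_on_demand), length(load_on_demand) %in% c(1L, n))
  stats::setNames(rep_len(load_on_demand, n), db_dirs)
}

dirs <- c("/db1", "/db2")
print(expand_load_on_demand(dirs))                   # default: on demand everywhere
print(expand_load_on_demand(dirs, FALSE))            # pre-load every db dir
print(expand_load_on_demand(dirs, c(FALSE, TRUE)))   # pre-load /db1 only

# A daemon session that pre-loads only /db1 would then call, with naryn attached:
#   emr_db.connect(db_dirs = dirs, load_on_demand = c(FALSE, TRUE))
```

Pre-loading only the large, stable db dir while leaving the last (user) dir on demand keeps the daemon's startup cost and memory use tied to the data that benefits most from sharing.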
Upon completion the connection is established with the database and a few variables are added to the .naryn environment. These variables should not be modified by the user!
.naryn$EMR_GROOT | First db dir of tracks in the order of connections
.naryn$EMR_UROOT | Last db dir of tracks in the order of connections (user dir)
.naryn$EMR_ROOTS | Vector of directories (db_dirs)
emr_db.init is the old version of this function, which is now deprecated.
emr_db.ls lists all the currently connected databases.