Class_EngineDVFileJson
Johann Petrak edited this page Apr 16, 2018 · 8 revisions
This page describes the inner workings of the class EngineDVFileJson, which implements the engine for using external algorithms with dense vectors represented in JSON format (and handled out-of-memory).
Currently (as of 2018-04-16), the invocation protocol for engines is a bit complex. The required protocol depends on the situation the engine gets used in (training versus application).
When training:
- The engine class gets selected in the PR based on the `trainingAlgorithm` runtime parameter
- `Engine.createEngine(trainingAlgorithm, algorithmParameters, featureInfo, TargetType, dataDirectory)` is called:
  - this executes the non-static `initializeAlgorithm(algorithm,parms)` method (overridden but empty for EngineDVFileJson)
  - then runs method `initWhenCreating(directory, algorithm, parms, featureInfo, targetType)`: for EngineDVFileJson, this essentially creates the instance of the appropriate corpus representation and sets the mode to "adding"
  - creates and initializes the Info instance
  - returns the Engine instance
- document processing uses the corpus representation retrieved from the engine to add new instances
- After all documents have been processed, the engine's info gets updated
- Then `engine.trainModel(dataDir, instanceAnnotationType, algoParms)` gets called:
  - turns off adding for the corpus representation
  - updates the info
  - copies the whole wrapper software unless already there (based on `WRAPPER_NAME`)
  - creates the command to invoke the training script, also using the settings in the config file `WRAPPER_NAME.yaml`, which is treated as a key/value map
    - this optionally uses the settings `shellcmd` and `shellparms` for running the shell script
    - TODO: this should also allow configuring the python path and python location
  - before running the command, sets the environment variable `WRAPPER_HOME`, which points to a subdirectory of the data directory
  - runs the command
  - updates the info and saves it
  - saves the featureInfo (NOTE: this is currently done again later in the saveEngine method)
- Finally `engine.saveEngine(dataDir)` gets called (from base class Engine), which:
  - saves the feature info using `featureInfo.save(dir)`
  - invokes the engine-specific `saveModel(dir)` method; in this case, this does nothing, since the model gets saved by the scripts we call
  - invokes the engine-specific `saveCorpusRepresentation(dir)` method, which in this case does nothing, since the corpus representation is already out-of-memory and stored in a file
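
The command-construction part of `trainModel` described above can be illustrated with a minimal sketch. This is not the actual EngineDVFileJson code: the class name `WrapperCommandSketch`, the methods `buildCommand` and `prepare`, and the argument order are all hypothetical. The `settings` map stands in for the parsed `WRAPPER_NAME.yaml`, the variable name `WRAPPER_HOME` is used literally as written above, and the wrapper subdirectory is assumed to be named after the wrapper.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the real engine implementation.
class WrapperCommandSketch {

  // Builds the command line for the training script.
  // `settings` is assumed to be the key/value map read from WRAPPER_NAME.yaml.
  static List<String> buildCommand(Map<String, String> settings,
                                   File script, File dataDir, String algoParms) {
    List<String> cmd = new ArrayList<>();
    // Optional: run the script through an explicit shell (setting "shellcmd"),
    // with optional whitespace-separated shell arguments (setting "shellparms").
    String shellcmd = settings.get("shellcmd");
    if (shellcmd != null && !shellcmd.trim().isEmpty()) {
      cmd.add(shellcmd.trim());
      String shellparms = settings.get("shellparms");
      if (shellparms != null && !shellparms.trim().isEmpty()) {
        cmd.addAll(Arrays.asList(shellparms.trim().split("\\s+")));
      }
    }
    // Assumed argument order: script, data directory, algorithm parameters.
    cmd.add(script.getPath());
    cmd.add(dataDir.getPath());
    if (algoParms != null && !algoParms.trim().isEmpty()) {
      cmd.addAll(Arrays.asList(algoParms.trim().split("\\s+")));
    }
    return cmd;
  }

  // Prepares (but does not start) the process and sets WRAPPER_HOME
  // to a subdirectory of the data directory.
  static ProcessBuilder prepare(List<String> cmd, File dataDir, String wrapperName) {
    ProcessBuilder pb = new ProcessBuilder(cmd);
    File wrapperHome = new File(dataDir, wrapperName); // assumed subdirectory name
    pb.environment().put("WRAPPER_HOME", wrapperHome.getAbsolutePath());
    pb.inheritIO(); // show the script's output on the caller's console
    return pb;
  }
}
```

Calling `prepare(...).start()` followed by `waitFor()` on the returned process would correspond to the "runs the command" step. Keeping command construction separate from process start makes the resulting command easy to log or inspect before anything is executed.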
Brought to you by the GATE team