Class_EngineDVFileJson

Johann Petrak edited this page Apr 16, 2018 · 8 revisions

Class EngineDVFileJson

This page describes the inner workings of class EngineDVFileJson which implements the engine for using external algorithms with dense vectors represented in JSON format (and handled out-of-memory).

Protocol of use

Currently (as of 2018-04-16), the invocation protocol for engines is a bit complex. The required protocol depends on the situation the engine gets used in (training versus application).

When training:

  • The engine class gets selected in the PR based on the trainingAlgorithm runtime parameter
  • Engine.createEngine(trainingAlgorithm, algorithmParameters, featureInfo, TargetType, dataDirectory) is called
    • this executes the non-static initializeAlgorithm(algorithm,parms) method (overridden but empty for EngineDVFileJson)
    • then runs method initWhenCreating(directory, algorithm, parms, featureInfo, targetType): for EngineDVFileJson, this essentially creates the instance of the appropriate corpus representation and sets the mode to "adding".
    • creates and initializes the Info instance
    • returns the Engine instance
  • document processing uses the corpus representation retrieved from the engine to add new instances
  • After all documents have been processed, the engine's info gets updated
  • Then engine.trainModel(dataDir, instanceAnnotationType, algoParms) gets called:
    • turns off adding for the corpus representation
    • updates the info
    • copies the whole wrapper software unless it is already there (the check is based on WRAPPER_NAME)
    • creates the command to invoke the training script, also using the settings in the config file WRAPPER_NAME.yaml which is treated as a key/value map
    • this optionally uses settings shellcmd and shellparms for running the shell script
    • TODO: this should also allow configuring the Python path and Python location
    • before running the command, sets the environment variable WRAPPER_HOME, which points to a subdirectory of the data directory
    • runs the command
    • updates the info and saves it
    • saves the featureInfo (NOTE: this is currently done again later in the saveEngine method)
  • Finally, engine.saveEngine(dataDir) gets called (from base class Engine), which:
    • saves the feature info using featureInfo.save(dir)
    • invokes the engine-specific saveModel(dir) method, which in this case does nothing, since the model gets saved by the scripts we call
    • invokes the engine-specific saveCorpusRepresentation(dir) method, which in this case does nothing, since the corpus representation is already out-of-memory and stored in a file
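
The command-building part of trainModel described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual EngineDVFileJson code: the class name WrapperCommandSketch, the method buildTrainCommand, the script name train.sh, the wrapper name my-wrapper, and the choice of naming the WRAPPER_HOME subdirectory after the wrapper are all hypothetical. Only the settings shellcmd and shellparms and the environment variable WRAPPER_HOME come from the description above. The config file is represented as an already-parsed key/value map, and the process is built but not started.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not part of the real engine implementation.
class WrapperCommandSketch {

  /**
   * Build (but do not start) the process that would run a training script.
   *
   * @param dataDir     the data directory passed to trainModel
   * @param wrapperName assumed name of the wrapper subdirectory
   * @param config      settings from WRAPPER_NAME.yaml, already parsed into a map
   * @param scriptName  assumed name of the training script inside the wrapper
   */
  static ProcessBuilder buildTrainCommand(
      File dataDir, String wrapperName,
      Map<String, String> config, String scriptName) {
    // Assumption: WRAPPER_HOME is the subdirectory named after the wrapper.
    File wrapperHome = new File(dataDir, wrapperName);

    List<String> cmd = new ArrayList<>();
    // Optional settings: shell command and its parameters.
    String shellcmd = config.get("shellcmd");
    if (shellcmd != null && !shellcmd.trim().isEmpty()) {
      cmd.add(shellcmd.trim());
      String shellparms = config.get("shellparms");
      if (shellparms != null && !shellparms.trim().isEmpty()) {
        // Split on whitespace; quoting is not handled in this sketch.
        for (String parm : shellparms.trim().split("\\s+")) {
          cmd.add(parm);
        }
      }
    }
    // The script to run, followed by the data directory as its argument.
    cmd.add(new File(wrapperHome, scriptName).getPath());
    cmd.add(dataDir.getPath());

    ProcessBuilder builder = new ProcessBuilder(cmd);
    // Set WRAPPER_HOME before the command is run.
    builder.environment().put("WRAPPER_HOME", wrapperHome.getPath());
    return builder;
  }

  public static void main(String[] args) {
    Map<String, String> config = new LinkedHashMap<>();
    config.put("shellcmd", "bash");
    config.put("shellparms", "-e -u");
    ProcessBuilder builder =
        buildTrainCommand(new File("data"), "my-wrapper", config, "train.sh");
    System.out.println("command: " + builder.command());
    System.out.println("WRAPPER_HOME: " + builder.environment().get("WRAPPER_HOME"));
  }
}
```

Treating the config as a plain map keeps the optional settings easy to handle: when shellcmd is absent, the script path itself becomes the first element of the command.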