Interface CollectionProcessingManager
A CollectionProcessingManager (CPM) manages the application of an
AnalysisEngine to a collection of artifacts. For text analysis applications, this will be
a collection of documents. The analysis results are then delivered to one or more
CasConsumers.
The CPM is configured with an Analysis Engine and CAS Consumers by calling its
setAnalysisEngine(AnalysisEngine) and addCasConsumer(CasConsumer) methods.
Collection processing is then initiated by calling the process(CollectionReader) or
process(CollectionReader,int) methods.
The process methods take a CollectionReader object as an argument. The
Collection Reader retrieves each artifact from the collection as a
CAS object.
Listeners can register with the CPM by calling the
addStatusCallbackListener(StatusCallbackListener) method. These listeners receive status
callbacks during the processing. At any time, performance and progress reports are available from
the getPerformanceReport() and getProgress() methods.
A CPM implementation may choose to implement parallelization of the processing, but this is not a requirement of the architecture.
Note that a CPM only supports processing one collection at a time. Attempting to reconfigure a
CPM or start a new processing job while a previous processing job is occurring will result in a
UIMA_IllegalStateException. Processing multiple collections
simultaneously is done by instantiating and configuring multiple instances of the CPM.
A CollectionProcessingManager instance can be obtained by calling
UIMAFramework.newCollectionProcessingManager().
-
Method Summary
Modifier and Type | Method | Description
void | addCasConsumer(CasConsumer aCasConsumer) | Adds a CasConsumer to this CPM.
void | addStatusCallbackListener(StatusCallbackListener aListener) | Registers a listener to receive status callbacks.
AnalysisEngine | getAnalysisEngine() | Gets the AnalysisEngine that is assigned to this CPM.
CasConsumer[] | getCasConsumers() | Gets the CasConsumers assigned to this CPM.
ProcessTrace | getPerformanceReport() | Gets a performance report for the processing that is currently occurring or has just completed.
Progress[] | getProgress() | Gets a progress report for the processing that is currently occurring or has just completed.
boolean | isPaused() | Determines whether this CPM's processing is currently paused.
boolean | isPauseOnException() | Gets whether this CPM will automatically pause processing if an exception occurs.
boolean | isProcessing() | Determines whether this CPM is currently processing.
boolean | isSerialProcessingRequired() | Gets whether this CPM is required to process the collection's elements serially (as opposed to performing parallelization).
void | pause() | Pauses processing.
void | process(CollectionReader aCollectionReader) | Initiates processing of a collection.
void | process(CollectionReader aCollectionReader, int aBatchSize) | Initiates processing of a collection.
void | removeCasConsumer(CasConsumer aCasConsumer) | Removes a CasConsumer from this CPM.
void | removeStatusCallbackListener(StatusCallbackListener aListener) | Unregisters a status callback listener.
void | resume() | Resumes processing that has been paused.
void | resume(boolean aRetryFailed) | Resumes processing that has been paused.
void | setAnalysisEngine(AnalysisEngine aAnalysisEngine) | Sets the AnalysisEngine that is assigned to this CPM.
void | setPauseOnException(boolean aPause) | Sets whether this CPM will automatically pause processing if an exception occurs.
void | setSerialProcessingRequired(boolean aRequired) | Sets whether this CPM is required to process the collection's elements serially (as opposed to performing parallelization).
void | stop() | Stops processing.
-
Method Details
-
getAnalysisEngine
AnalysisEngine getAnalysisEngine()
Gets the AnalysisEngine that is assigned to this CPM.
Returns:
- the AnalysisEngine that this CPM will use to analyze each CAS in the collection.
-
setAnalysisEngine
void setAnalysisEngine(AnalysisEngine aAnalysisEngine) throws ResourceConfigurationException
Sets the AnalysisEngine that is assigned to this CPM.
Parameters:
- aAnalysisEngine - the AnalysisEngine that this CPM will use to analyze each CAS in the collection.
Throws:
- ResourceConfigurationException - if this CPM is currently processing
-
getCasConsumers
CasConsumer[] getCasConsumers()
Gets the CasConsumers assigned to this CPM.
Returns:
- an array of CasConsumers
-
addCasConsumer
void addCasConsumer(CasConsumer aCasConsumer) throws ResourceConfigurationException
Adds a CasConsumer to this CPM.
Parameters:
- aCasConsumer - a CasConsumer to add
Throws:
- ResourceConfigurationException - if this CPM is currently processing
-
removeCasConsumer
void removeCasConsumer(CasConsumer aCasConsumer)
Removes a CasConsumer from this CPM.
Parameters:
- aCasConsumer - the CasConsumer to remove
Throws:
- UIMA_IllegalStateException - if this CPM is currently processing
-
isSerialProcessingRequired
boolean isSerialProcessingRequired()
Gets whether this CPM is required to process the collection's elements serially (as opposed to performing parallelization). Note that a value of false does not guarantee that parallelization is performed; this is left up to the CPM implementation.
Returns:
- true if and only if serial processing is required
-
setSerialProcessingRequired
void setSerialProcessingRequired(boolean aRequired)
Sets whether this CPM is required to process the collection's elements serially (as opposed to performing parallelization). If this method is not called, the default is false. Note that a value of false does not guarantee that parallelization is performed; this is left up to the CPM implementation.
Parameters:
- aRequired - true if and only if serial processing is required
Throws:
- UIMA_IllegalStateException - if this CPM is currently processing
-
isPauseOnException
boolean isPauseOnException()
Gets whether this CPM will automatically pause processing if an exception occurs. If processing is paused, it can be resumed by calling the resume(boolean) method.
Returns:
- true if and only if this CPM will pause on exception
-
setPauseOnException
void setPauseOnException(boolean aPause)
Sets whether this CPM will automatically pause processing if an exception occurs. If processing is paused, it can be resumed by calling the resume(boolean) method.
Parameters:
- aPause - true if and only if this CPM should pause on exception
Throws:
- UIMA_IllegalStateException - if this CPM is currently processing
-
addStatusCallbackListener
void addStatusCallbackListener(StatusCallbackListener aListener)
Registers a listener to receive status callbacks.
Parameters:
- aListener - the listener to add
-
removeStatusCallbackListener
void removeStatusCallbackListener(StatusCallbackListener aListener)
Unregisters a status callback listener.
Parameters:
- aListener - the listener to remove
-
process
void process(CollectionReader aCollectionReader) throws ResourceInitializationException
Initiates processing of a collection. The CollectionReader initializes each CAS with a document from the collection. This method starts the processing in another thread and returns immediately. The status of the processing can be obtained by registering a listener with the addStatusCallbackListener(StatusCallbackListener) method.
A CPM can only process one collection at a time. If this method is called while a previous processing request has not yet completed, a UIMA_IllegalStateException will result. To find out whether a CPM is free to begin another processing request, call the isProcessing() method.
Parameters:
- aCollectionReader - the CollectionReader from which to obtain the entities to be processed
Throws:
- ResourceInitializationException - if an error occurs during initialization
- UIMA_IllegalStateException - if this CPM is currently processing
-
process
void process(CollectionReader aCollectionReader, int aBatchSize) throws ResourceInitializationException
Initiates processing of a collection. This method works in the same way as process(CollectionReader), but it breaks the processing up into batches of a size determined by the aBatchSize parameter. Each CasConsumer will be notified at the end of each batch.
Parameters:
- aCollectionReader - the CollectionReader from which to obtain the entities to be processed
- aBatchSize - the size of the batch.
Throws:
- ResourceInitializationException - if an error occurs during initialization
- UIMA_IllegalStateException - if this CPM is currently processing
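The batching behaviour described for process(CollectionReader, int) can be illustrated with a small stand-alone model. This is only a sketch, not UIMA code: the class BatchNotificationSketch and its method batchBoundaries are hypothetical names, and the handling of a trailing partial batch is an assumption, since the description above does not say whether consumers are notified for it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of batch notifications; it does not use or reproduce UIMA classes.
class BatchNotificationSketch {

    /**
     * Returns the 1-based entity counts after which a batch-complete
     * notification would fire for the given batch size.
     * Assumption: a trailing partial batch produces no batch notification here.
     */
    static List<Integer> batchBoundaries(int entityCount, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<Integer> boundaries = new ArrayList<>();
        for (int processed = 1; processed <= entityCount; processed++) {
            if (processed % batchSize == 0) {
                boundaries.add(processed); // end of a full batch
            }
        }
        return boundaries;
    }

    public static void main(String[] args) {
        // 10 entities in batches of 4: notifications after entities 4 and 8.
        System.out.println(batchBoundaries(10, 4)); // prints [4, 8]
    }
}
```

In this model, each entry in the returned list marks a point at which every registered consumer would be told that a batch has ended.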
-
isProcessing
boolean isProcessing()
Determines whether this CPM is currently processing. This means that a processing request has been submitted and has not yet completed or been stopped by a call to stop(). If processing is paused, this method will still return true.
Returns:
- true if and only if this CPM is currently processing.
-
pause
void pause()
Pauses processing. Processing can later be resumed by calling the resume(boolean) method.
Throws:
- UIMA_IllegalStateException - if no processing is currently occurring
-
isPaused
boolean isPaused()
Determines whether this CPM's processing is currently paused.
Returns:
- true if and only if this CPM's processing is currently paused.
-
resume
void resume(boolean aRetryFailed)
Resumes processing that has been paused.
Parameters:
- aRetryFailed - if processing was paused because an exception occurred (see setPauseOnException(boolean)), setting a value of true for this parameter will cause the failed entity to be retried. A value of false (the default) will cause processing to continue with the next entity after the failure.
Throws:
- UIMA_IllegalStateException - if processing is not currently paused
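The effect of aRetryFailed can be pictured with a small stand-alone simulation. It is a sketch under stated assumptions rather than CPM code: ResumeRetrySketch and run are hypothetical names, the "exception" is simulated by a single failing index, and the failed entity is assumed to succeed when retried.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical simulation of resume(aRetryFailed); it does not call UIMA.
class ResumeRetrySketch {

    /**
     * Processes entities in order. The entity at failIndex fails on its first
     * attempt, which stands in for "pause on exception". Processing then
     * resumes with the given retry flag. Returns the entities that completed.
     */
    static List<String> run(List<String> entities, int failIndex, boolean retryFailed) {
        List<String> completed = new ArrayList<>();
        boolean alreadyFailed = false;
        int i = 0;
        while (i < entities.size()) {
            if (i == failIndex && !alreadyFailed) {
                alreadyFailed = true;   // simulated exception: processing pauses here
                if (!retryFailed) {
                    i++;                // resume(false): continue with the next entity
                }
                continue;               // resume(true): the same entity is tried again
            }
            completed.add(entities.get(i));
            i++;
        }
        return completed;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("a", "b", "c");
        System.out.println(run(docs, 1, true));  // prints [a, b, c]
        System.out.println(run(docs, 1, false)); // prints [a, c]
    }
}
```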
-
resume
void resume()
Resumes processing that has been paused.
Throws:
- UIMA_IllegalStateException - if processing is not currently paused
-
stop
void stop()
Stops processing.
Throws:
- UIMA_IllegalStateException - if no processing is currently occurring
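The state rules documented for process, isProcessing, pause, isPaused, resume and stop can be summarised as a small state machine. The following is a minimal model, not the CPM implementation: CpmStateSketch is a hypothetical class, the standard IllegalStateException stands in for UIMA_IllegalStateException, and allowing pause() on an already paused instance is an assumption, since the text only forbids pausing when no processing is occurring.

```java
// Hypothetical model of the documented CPM state rules; not UIMA code.
class CpmStateSketch {

    enum State { IDLE, PROCESSING, PAUSED }

    private State state = State.IDLE;

    // Models process(...): only one collection can be processed at a time.
    void process() {
        if (state != State.IDLE) {
            throw new IllegalStateException("a processing job is already in progress");
        }
        state = State.PROCESSING;
    }

    // Models isProcessing(): remains true while paused.
    boolean isProcessing() {
        return state != State.IDLE;
    }

    boolean isPaused() {
        return state == State.PAUSED;
    }

    // Models pause(): illegal when no processing is occurring.
    void pause() {
        if (state == State.IDLE) {
            throw new IllegalStateException("no processing is currently occurring");
        }
        state = State.PAUSED; // assumption: pausing twice is harmless
    }

    // Models resume(): illegal unless processing is paused.
    void resume() {
        if (state != State.PAUSED) {
            throw new IllegalStateException("processing is not currently paused");
        }
        state = State.PROCESSING;
    }

    // Models stop(): illegal when no processing is occurring.
    void stop() {
        if (state == State.IDLE) {
            throw new IllegalStateException("no processing is currently occurring");
        }
        state = State.IDLE;
    }

    public static void main(String[] args) {
        CpmStateSketch cpm = new CpmStateSketch();
        cpm.process();
        cpm.pause();
        System.out.println(cpm.isProcessing()); // prints true
        cpm.resume();
        cpm.stop();
        System.out.println(cpm.isProcessing()); // prints false
    }
}
```

The model leaves out normal completion of a job, which would also return the instance to the idle state.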
-
getPerformanceReport
ProcessTrace getPerformanceReport()
Gets a performance report for the processing that is currently occurring or has just completed.
Returns:
- an object containing performance statistics
-
getProgress
Progress[] getProgress()
Gets a progress report for the processing that is currently occurring or has just completed.
Returns:
- an array of Progress objects, each of which represents the progress in a different set of units (for example number of entities or bytes)
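A caller typically walks the returned array and reports one figure per unit. The sketch below shows that pattern with a hypothetical stand-in type: UnitProgress and its fields (completed, total, unit) are assumptions made for illustration and are not the UIMA Progress interface.

```java
// Hypothetical stand-in for a per-unit progress report; not the UIMA Progress type.
class ProgressSketch {

    static final class UnitProgress {
        final long completed;
        final long total;
        final String unit;

        UnitProgress(long completed, long total, String unit) {
            this.completed = completed;
            this.total = total;
            this.unit = unit;
        }
    }

    // Formats one "completed/total unit" entry per array element.
    static String describe(UnitProgress[] report) {
        StringBuilder sb = new StringBuilder();
        for (UnitProgress p : report) {
            if (sb.length() > 0) {
                sb.append("; ");
            }
            sb.append(p.completed).append('/').append(p.total).append(' ').append(p.unit);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        UnitProgress[] report = {
            new UnitProgress(3, 10, "entities"),
            new UnitProgress(2048, 8192, "bytes")
        };
        System.out.println(describe(report)); // prints 3/10 entities; 2048/8192 bytes
    }
}
```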
-