public interface CollectionProcessingManager
A CollectionProcessingManager (CPM) manages the application of an
AnalysisEngine to a collection of artifacts. For text analysis applications, this will be
a collection of documents. The analysis results will then be delivered to one or more
CasConsumers.
The CPM is configured with an Analysis Engine and CAS Consumers by calling its
setAnalysisEngine(AnalysisEngine) and addCasConsumer(CasConsumer) methods.
Collection processing is then initiated by calling the process(CollectionReader) or
process(CollectionReader,int) methods.
The process methods take a CollectionReader object as an argument. The
Collection Reader retrieves each artifact from the collection as a
CAS object.
Listeners can register with the CPM by calling the
addStatusCallbackListener(StatusCallbackListener) method. These listeners receive status
callbacks during the processing. At any time, performance and progress reports are available from
the getPerformanceReport() and getProgress() methods.
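The listener registration described above follows the usual observer pattern. The following is a minimal, self-contained sketch of that pattern only. SketchStatusListener, SketchNotifier, CountingListener, and the callback names entityProcessed and collectionComplete are hypothetical stand-ins invented for illustration; they are not the UIMA types, and the real StatusCallbackListener callbacks are not listed on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for StatusCallbackListener (assumption, not the UIMA interface).
interface SketchStatusListener {
    void entityProcessed(int entitiesSoFar);
    void collectionComplete();
}

// Hypothetical stand-in that models only the listener bookkeeping of a CPM.
class SketchNotifier {
    private final List<SketchStatusListener> listeners = new ArrayList<>();

    void addStatusCallbackListener(SketchStatusListener listener) {
        listeners.add(listener);
    }

    void removeStatusCallbackListener(SketchStatusListener listener) {
        listeners.remove(listener);
    }

    /** Simulates processing `size` entities: notifies after each one, then once at the end. */
    void run(int size) {
        for (int i = 1; i <= size; i++) {
            for (SketchStatusListener listener : listeners) {
                listener.entityProcessed(i);
            }
        }
        for (SketchStatusListener listener : listeners) {
            listener.collectionComplete();
        }
    }
}

// A listener that simply records what it was told.
class CountingListener implements SketchStatusListener {
    int entities = 0;
    boolean done = false;

    @Override
    public void entityProcessed(int entitiesSoFar) {
        entities = entitiesSoFar;
    }

    @Override
    public void collectionComplete() {
        done = true;
    }
}
```

In this sketch a listener that has been removed before the run receives no callbacks, which mirrors the intent of removeStatusCallbackListener(StatusCallbackListener).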
A CPM implementation may choose to implement parallelization of the processing, but this is not a requirement of the architecture.
Note that a CPM only supports processing one collection at a time. Attempting to reconfigure a
CPM or start a new processing job while a previous processing job is occurring will result in a
UIMA_IllegalStateException. Processing multiple collections
simultaneously is done by instantiating and configuring multiple instances of the CPM.
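The one-collection-at-a-time rule, together with the exceptions documented for pause(), resume(), and stop(), amounts to a small state machine. The sketch below models only that contract. SketchCpm is a hypothetical class written for illustration, not the UIMA implementation, and the standard java.lang.IllegalStateException stands in for UIMA_IllegalStateException.

```java
// Illustrative stand-in only (assumption): models the documented state rules, nothing else.
class SketchCpm {
    private enum State { IDLE, RUNNING, PAUSED }

    private State state = State.IDLE;

    /** Starts a job; rejected while an earlier job is still running or paused. */
    void process() {
        if (isProcessing()) {
            throw new IllegalStateException("already processing a collection");
        }
        state = State.RUNNING;
    }

    /** True from process() until the job is stopped, including while paused. */
    boolean isProcessing() {
        return state != State.IDLE;
    }

    boolean isPaused() {
        return state == State.PAUSED;
    }

    /** Pausing requires a job in progress. */
    void pause() {
        if (!isProcessing()) {
            throw new IllegalStateException("no processing is currently occurring");
        }
        state = State.PAUSED;
    }

    /** Resuming requires the job to be paused. */
    void resume() {
        if (!isPaused()) {
            throw new IllegalStateException("processing is not currently paused");
        }
        state = State.RUNNING;
    }

    /** Stopping requires a job in progress; afterwards a new job may start. */
    void stop() {
        if (!isProcessing()) {
            throw new IllegalStateException("no processing is currently occurring");
        }
        state = State.IDLE;
    }
}
```

A caller that wants to avoid the exception can check isProcessing() before starting a new job, and can create a second instance when two collections must be processed at the same time.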
A CollectionProcessingManager instance can be obtained by calling
UIMAFramework.newCollectionProcessingManager().
| Modifier and Type | Method and Description |
|---|---|
| void | addCasConsumer(CasConsumer aCasConsumer) - Adds a CasConsumer to this CPM. |
| void | addStatusCallbackListener(StatusCallbackListener aListener) - Registers a listener to receive status callbacks. |
| AnalysisEngine | getAnalysisEngine() - Gets the AnalysisEngine that is assigned to this CPM. |
| CasConsumer[] | getCasConsumers() - Gets the CasConsumers assigned to this CPM. |
| ProcessTrace | getPerformanceReport() - Gets a performance report for the processing that is currently occurring or has just completed. |
| Progress[] | getProgress() - Gets a progress report for the processing that is currently occurring or has just completed. |
| boolean | isPaused() - Determines whether this CPM's processing is currently paused. |
| boolean | isPauseOnException() - Gets whether this CPM will automatically pause processing if an exception occurs. |
| boolean | isProcessing() - Determines whether this CPM is currently processing. |
| boolean | isSerialProcessingRequired() - Gets whether this CPM is required to process the collection's elements serially (as opposed to performing parallelization). |
| void | pause() - Pauses processing. |
| void | process(CollectionReader aCollectionReader) - Initiates processing of a collection. |
| void | process(CollectionReader aCollectionReader, int aBatchSize) - Initiates processing of a collection. |
| void | removeCasConsumer(CasConsumer aCasConsumer) - Removes a CasConsumer from this CPM. |
| void | removeStatusCallbackListener(StatusCallbackListener aListener) - Unregisters a status callback listener. |
| void | resume() - Resumes processing that has been paused. |
| void | resume(boolean aRetryFailed) - Resumes processing that has been paused. |
| void | setAnalysisEngine(AnalysisEngine aAnalysisEngine) - Sets the AnalysisEngine that is assigned to this CPM. |
| void | setPauseOnException(boolean aPause) - Sets whether this CPM will automatically pause processing if an exception occurs. |
| void | setSerialProcessingRequired(boolean aRequired) - Sets whether this CPM is required to process the collection's elements serially (as opposed to performing parallelization). |
| void | stop() - Stops processing. |

AnalysisEngine getAnalysisEngine()
Gets the AnalysisEngine that is assigned to this CPM.
Returns:
the AnalysisEngine that this CPM will use to analyze each CAS in the collection.

void setAnalysisEngine(AnalysisEngine aAnalysisEngine) throws ResourceConfigurationException
Sets the AnalysisEngine that is assigned to this CPM.
Parameters:
aAnalysisEngine - the AnalysisEngine that this CPM will use to analyze each CAS in the collection.
Throws:
ResourceConfigurationException - if this CPM is currently processing

CasConsumer[] getCasConsumers()
Gets the CasConsumers assigned to this CPM.
Returns:
an array of CasConsumers

void addCasConsumer(CasConsumer aCasConsumer) throws ResourceConfigurationException
Adds a CasConsumer to this CPM.
Parameters:
aCasConsumer - a CasConsumer to add
Throws:
ResourceConfigurationException - if this CPM is currently processing

void removeCasConsumer(CasConsumer aCasConsumer)
Removes a CasConsumer from this CPM.
Parameters:
aCasConsumer - the CasConsumer to remove
Throws:
UIMA_IllegalStateException - if this CPM is currently processing

boolean isSerialProcessingRequired()
Gets whether this CPM is required to process the collection's elements serially (as opposed to
performing parallelization). Note that a value of false does not guarantee that
parallelization is performed; this is left up to the CPM implementation.

void setSerialProcessingRequired(boolean aRequired)
Sets whether this CPM is required to process the collection's elements serially (as opposed to
performing parallelization). The default is false.
Note that a value of false does not guarantee that parallelization is performed;
this is left up to the CPM implementation.
Parameters:
aRequired - true if and only if serial processing is required
Throws:
UIMA_IllegalStateException - if this CPM is currently processing

boolean isPauseOnException()
Gets whether this CPM will automatically pause processing if an exception occurs. Paused
processing can be resumed by calling the resume(boolean) method.

void setPauseOnException(boolean aPause)
Sets whether this CPM will automatically pause processing if an exception occurs. Paused
processing can be resumed by calling the resume(boolean) method.
Parameters:
aPause - true if and only if this CPM should pause on exception
Throws:
UIMA_IllegalStateException - if this CPM is currently processing

void addStatusCallbackListener(StatusCallbackListener aListener)
Registers a listener to receive status callbacks.
Parameters:
aListener - the listener to add

void removeStatusCallbackListener(StatusCallbackListener aListener)
Unregisters a status callback listener.
Parameters:
aListener - the listener to remove

void process(CollectionReader aCollectionReader) throws ResourceInitializationException
Initiates processing of a collection. Status callbacks can be received by registering a listener
with the addStatusCallbackListener(StatusCallbackListener) method.
A CPM can only process one collection at a time. If this method is called while a previous
processing request has not yet completed, a UIMA_IllegalStateException will
result. To find out whether a CPM is free to begin another processing request, call the
isProcessing() method.
Parameters:
aCollectionReader - the CollectionReader from which to obtain the Entities to be processed
Throws:
ResourceInitializationException - if an error occurs during initialization
UIMA_IllegalStateException - if this CPM is currently processing

void process(CollectionReader aCollectionReader, int aBatchSize) throws ResourceInitializationException
Initiates processing of a collection. This method works in the same way as
process(CollectionReader), but it breaks the processing up into batches of a size
determined by the aBatchSize parameter. Each CasConsumer will be
notified at the end of each batch.
Parameters:
aCollectionReader - the CollectionReader from which to obtain the Entities to be processed
aBatchSize - the size of the batch.
Throws:
ResourceInitializationException - if an error occurs during initialization
UIMA_IllegalStateException - if this CPM is currently processing

boolean isProcessing()
Determines whether this CPM is currently processing, that is, whether a processing request has
been started and has not yet completed or been stopped by a call to stop(). If processing is paused,
this method will still return true.

void pause()
Pauses processing. Processing can later be resumed by calling the resume(boolean)
method.
Throws:
UIMA_IllegalStateException - if no processing is currently occurring

boolean isPaused()
Determines whether this CPM's processing is currently paused.

void resume(boolean aRetryFailed)
Resumes processing that has been paused.
Parameters:
aRetryFailed - if processing was paused because an exception occurred (see
setPauseOnException(boolean)), setting a value of true for
this parameter will cause the failed entity to be retried. A value of
false (the default) will cause processing to continue with the next
entity after the failure.
Throws:
UIMA_IllegalStateException - if processing is not currently paused

void resume()
Resumes processing that has been paused.
Throws:
UIMA_IllegalStateException - if processing is not currently paused

void stop()
Stops processing.
Throws:
UIMA_IllegalStateException - if no processing is currently occurring

ProcessTrace getPerformanceReport()
Gets a performance report for the processing that is currently occurring or has just completed.

Progress[] getProgress()
Gets a progress report for the processing that is currently occurring or has just completed.
Returns:
an array of Progress objects, each of which represents the progress in a
different set of units (for example number of entities or bytes)

Copyright © 2006–2017 The Apache Software Foundation. All rights reserved.