HRCapture Application Integration Manual

General System Architecture

Capture enables customers to visually correct and complete semantically extracted HR documents. Documents may be CVs, application attachments, or job offers (ads).

Capture constitutes one step in importing an application or job offer into a customer's HR system. Usually, data is transferred from an external source to the customer's database either directly (as a binary or entered via a web form) or via a semantic extraction system.

In both cases, corrections and additions have to be made manually, without the visual support of a view of the original document. Capture is integrated between the arrival of external documents and their insertion into the customer's database. It replaces both the manual entry forms formerly used to complete the extracted information and the semantic extraction process.

The Capture module has to be integrated asynchronously. The customer's application has to communicate with Capture three times:

  1. When new data arrives, whether by online application or crawling

  2. When the visual completion process of the document is to be started

  3. When the data input has been completed and submitted within Capture

A new document is transferred to Capture via a SOAP web service synchronously, including a callback-URL. Capture semantically extracts the document, generates a unique ID for the document and returns it to the caller.

When the visual capture process shall start, the Capture browser interface is called using the unique ID of the document to be processed. No time limit is given for the manual correction.

Because it cannot be determined in advance when the human input will complete the capture process, Capture has to call back the customer's application to signal that the process is complete.
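One way for the host application to match an incoming callback to its own record is to encode an internal reference in the callback-URL it passes on upload. The following is only a sketch under a stated assumption: that Capture requests the callback-URL exactly as it was supplied, including its query string, which this manual does not confirm. The function names, the “doc” parameter and the example host are hypothetical.

```javascript
// Hypothetical helper: appends the host application's own document
// reference to the callback endpoint before the URL is passed to "upload".
function buildCallbackUrl(endpoint, internalId) {
  const url = new URL(endpoint);
  url.searchParams.set("doc", internalId); // "doc" is an illustrative name
  return url.toString();
}

// Hypothetical helper: reads the reference back when the callback arrives.
// Assumes Capture calls the URL unchanged; returns null if "doc" is absent.
function readCallbackReference(requestUrl) {
  return new URL(requestUrl).searchParams.get("doc");
}

const callbackUrl = buildCallbackUrl("https://hr.example.com/capture-done", "app-42");
console.log(callbackUrl);
console.log(readCallbackReference(callbackUrl));
```

If Capture turns out to alter or strip the query string, the same idea can be moved into the path of the callback-URL.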

After input completion, the integrating application has to collect the completed extraction result from the Capture database for further processing. Altogether, the steps for a document to be imported to, processed in, and exported from Capture are as follows:

  1. Document import (SOAP) ⇒ Method “upload”

  2. Capture start (Browser) ⇒ Call of SemanticGui.html?token=XXX

  3. Capture finish (REST) (callback) ⇒ Click on “finish” in the GUI

  4. Document export (SOAP) ⇒ Method “downloadStandardXML”

Different interface technologies are used because implementing the callback in SOAP would significantly complicate the integration into the client's application.

The following sequence diagram illustrates the communication and data exchange flow when Capture is integrated in a customer application:

Server Interfaces

The Capture SOAP interface is located at

and offers the following methods (Java syntax):

/**
 * Processes a given document for the user corresponding to the given
 * identity hash. Stores the result for later manual processing and
 * download.
 *
 * @param identityhash
 *            The authentication hash corresponding to a specific user
 * @param document
 *            Binary document
 * @param format
 *            Document format (postfix)
 * @param callbackUrl
 *            The callback URL which is accessed for signaling a document
 *            being downloadable (optional)
 * @return The unique document hash
 */
public String upload(String identityhash, byte[] document, String format, String callbackUrl);

/**
 * Provides the document as schema-verified XML based on the given
 * schema
 *
 * @param documenthash
 *            Hash of the document to be exported
 * @param schema
 *            XML schema similar to CVlizer semantic service
 * @param targetlang
 *            The codeNamePair target language (independent from document
 *            language!)
 * @param doctype
 *            Type of the document as delivered by HRClassifier. Important
 *            for attachments.
 * @return Schema-verified XML
 */
public String downloadStandardXML(String documenthash, String schema, String targetlang, String doctype);

/**
 * Provides the document corresponding to the given hash for download
 * (customized XML)
 *
 * @param documenthash
 *            Hash of the document to be downloaded
 * @return XML
 */
public String download(String documenthash);

Method: upload

If the document has been extracted successfully, this method returns a unique, cryptographically secure hash of 32 bytes, also referred to as the “token”.

Capture calls the callbackUrl provided on upload when the user has finished the manual completion process in the web browser by clicking “Finish” in the application. The URL may be “null”, in which case the host application has to take over control of the application via JavaScript (see next chapter).

Identity Hash for default Capture usage

If you are using our capture module with no customization, the “Identity Hash” parameter must contain the provided CVlizer/JOBolizer access credentials using the following syntax:


where model refers to the respective CVlizer or JOBolizer model (for example, “cvlizer_3_0”), as in the downloadStandardXML method.

Identity Hash for customized Capture usage

If you are using our Capture module with customizations (such as custom CSS or a custom extraction model), the “Identity Hash” parameter must contain only the provided Capture customization token:

This customization token is currently not available in our service portal. If you have any questions on this issue, please contact

Method: downloadStandardXML

This method returns the data as JoinVision semantic application default XML. “doctype” has to be stated for attachments only (feature in preparation), otherwise it may remain empty. Please refer to the CVlizer web interface description and the respective XML schemas for further information.

Currently, the following models are supported:

  • cvlizer_3_0

  • jobolizer_3_0

Method: download

This method returns the data as simplified, non-schema-validated XML. It should be used if, and only if, the extraction model or data mapping has been customized, making the resulting data structure incompatible with the JoinVision default XML schema.

Application Interface

The Capture-interface is loaded by accessing the following URL:

… where “TOKEN” must be replaced by the token returned on document upload and “ISOCODE” must be replaced by the desired display language of the GUI. The user interface is then displayed and control is given to the user.

“language=ISOCODE” is optional. If it is not provided, “de” (German) is used as the default. Capture ships with three languages: “de”, “en” and “fr”.
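As a minimal sketch, the URL could be assembled as shown below. The page name “SemanticGui.html” and the “token” parameter are taken from the process overview, and “language” from the paragraph above. The function name, the “baseUrl” argument, the “extra” option and the example host are illustrative assumptions; the real base address is the one of your Capture installation.

```javascript
// Builds the URL that opens the Capture GUI for one uploaded document.
// baseUrl: placeholder for the Capture host and path; it should end with
//          "/" so that the page name is appended rather than substituted.
// options.language: optional ISO code ("de", "en" or "fr").
// options.extra: optional further GET parameters as key/value pairs.
function buildCaptureUrl(baseUrl, token, options = {}) {
  const url = new URL("SemanticGui.html", baseUrl);
  url.searchParams.set("token", token);
  if (options.language) {
    url.searchParams.set("language", options.language);
  }
  for (const [key, value] of Object.entries(options.extra || {})) {
    url.searchParams.set(key, value);
  }
  return url.toString();
}

console.log(buildCaptureUrl("https://capture.example.com/app/", "abc123", { language: "en" }));
```

Using URL and URLSearchParams rather than string concatenation ensures that the token and any other values are percent-encoded correctly.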

Embedding Capture

Usually, the integrating host application does not need to communicate directly with the Capture Ajax application. If, for some reason, a notification via callback-URL doesn't fit the customer's requirements (e.g. when Capture is embedded in another web application), alternative ways to communicate with the Capture browser frame are provided.

Catching the End-Of-Editing Event

We recommend embedding Capture in an IFrame in order to catch the end-of-editing event, which occurs when the user clicks on “Record completed”. To catch this event in the parent window, the following options can be used:

  • A JavaScript Window.postMessage is sent to the parent browser window carrying the message “capture.submitted”. This message can be caught in the parent window as shown in the following example:

  • As an alternative for older browser versions that do not support postMessage, a popup window is shown that contains an element with the HTML ID “save_successful”. By checking for this ID in the document's DOM, the host application can determine when the commit of the document has finished.

Important: In these cases, the callback-URL must be an empty string on document upload!
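The postMessage option could be handled in the parent window roughly as follows. This is a sketch, not the vendor's reference code: it assumes the message payload is the plain string “capture.submitted”, and the function name, the origin check and the example origin are illustrative additions.

```javascript
// Returns a "message" event handler that calls onSubmitted once Capture
// reports the end of editing. captureOrigin is the origin (scheme, host,
// port) of the URL loaded in the IFrame.
function createSubmitHandler(captureOrigin, onSubmitted) {
  return function handleMessage(event) {
    // Ignore messages that do not come from the Capture frame's origin.
    if (event.origin !== captureOrigin) {
      return false;
    }
    // Assumption: the payload is the plain string itself.
    if (event.data !== "capture.submitted") {
      return false;
    }
    onSubmitted();
    return true;
  };
}

// Wiring in the parent page (browser only; the origin is a placeholder):
// window.addEventListener("message",
//   createSubmitHandler("https://capture.example.com", function () {
//     /* e.g. close the IFrame and fetch the result via SOAP */
//   }));
```

Checking event.origin prevents other frames on the page from faking the end-of-editing signal.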

As an alternative to the use of IFrames, Capture can redirect the browser window (or tab) it is running in to the callback-URL that was passed as a parameter to the upload method. To achieve this, in addition to specifying the desired callback-URL, the GET parameter “redirect=true” has to be set when loading Capture itself.

A code example for running capture embedded in an IFrame can be found here

Catching the Document-Dirty Status Events

Capture allows the host application to take over warning the user about unsaved changes. To enable this mode, the GET parameter “mode=capture” has to be set in the Capture URL.

This mode causes the following:

  • Capture will no longer warn the user when they try to close the window while unsaved data is present.

  • As soon as unsaved data is present, Capture will trigger a post message with the value “capture.isDirty”.

  • As soon as the user has saved/reloaded the document, Capture will trigger a post message with the value “capture.isClean”.

For more information about how to catch post messages, please have a look at Catching the End-Of-Editing event.
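A host page that takes over the unsaved-changes warning needs to remember the most recent status message. The sketch below keeps that state in a small tracker. It assumes the payloads are the plain strings “capture.isDirty” and “capture.isClean”; the tracker itself and its method names are hypothetical.

```javascript
// Tracks whether Capture currently holds unsaved data, based on the
// status messages it posts while running with "mode=capture".
function createDirtyTracker() {
  let dirty = false;
  return {
    // Feed the data of every post message received from Capture.
    // Unrelated messages leave the state unchanged.
    handle(message) {
      if (message === "capture.isDirty") {
        dirty = true;
      } else if (message === "capture.isClean") {
        dirty = false;
      }
      return dirty;
    },
    isDirty() {
      return dirty;
    }
  };
}

// Wiring in the parent page (browser only):
// const tracker = createDirtyTracker();
// window.addEventListener("message", (event) => tracker.handle(event.data));
// window.addEventListener("beforeunload", (event) => {
//   if (tracker.isDirty()) event.preventDefault(); // browser asks to confirm
// });
```

In production the message listener should also verify event.origin before passing the data to the tracker.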

Catching on-Load Errors

Currently there are two types of errors that can be caught via post messages:

  • Errors due to an invalid or expired document token will trigger a post message with the value “capture.error.invalidToken”

  • All other kinds of on-load-errors will trigger a post message with the value “capture.error.general”
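The two error messages could be told apart in the host page as sketched here. The message values are the ones listed above; the function name and the returned labels “invalidToken” and “general” are illustrative, and how the host application reacts to each is left to the integrator.

```javascript
// Maps an on-load error message from Capture to a short label, or returns
// null if the message is not one of the documented error values.
function classifyLoadError(message) {
  switch (message) {
    case "capture.error.invalidToken":
      return "invalidToken"; // token invalid or expired
    case "capture.error.general":
      return "general"; // any other on-load error
    default:
      return null; // not an on-load error message
  }
}

// Wiring in the parent page (browser only):
// window.addEventListener("message", (event) => {
//   const kind = classifyLoadError(event.data);
//   if (kind !== null) { /* show an error to the user */ }
// });
```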


Triggering the End-Of-Editing Event externally

Another option is to provide an external button (e.g. outside the IFrame) that triggers the same action as clicking the “Record Complete” button. To achieve this, a Window.postMessage has to be sent from the parent frame to the client IFrame Capture is running in. This can be done as shown in the following example, where 'ifr' refers to the ID of the IFrame and the address “” refers to the host of the IFrame's target URL:

When providing a button for an external commit, the default commit button has to be hidden so that users cannot use it. To do so, the GET parameter “hidemenu=true” has to be added to the URL when loading Capture itself.
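Sending a message from the parent frame to the Capture IFrame might look like the sketch below. The command string that triggers the commit is not stated in this section, so it is left as a parameter and must be taken from the vendor's example. The function name and the error handling are assumptions.

```javascript
// Posts a command message from the parent page to the Capture IFrame.
// iframeElement:  the <iframe> element, e.g. document.getElementById("ifr")
// commandMessage: the commit command expected by Capture (not specified
//                 in this section, therefore passed in by the caller)
// captureOrigin:  origin of the IFrame's target URL
function sendToCapture(iframeElement, commandMessage, captureOrigin) {
  if (!iframeElement || !iframeElement.contentWindow) {
    throw new Error("Capture IFrame is not available");
  }
  // The target origin restricts delivery to the Capture frame.
  iframeElement.contentWindow.postMessage(commandMessage, captureOrigin);
}

// Wiring in the parent page (browser only; the button ID is a placeholder):
// document.getElementById("commit").addEventListener("click", () => {
//   sendToCapture(document.getElementById("ifr"), COMMIT_COMMAND, CAPTURE_ORIGIN);
// });
```

Passing the concrete origin instead of “*” ensures the command is delivered only if the IFrame still shows the Capture application.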