
This page is intended to give you a jumpstart on accessing the CVlizer web service and to show you how easy it is to convert an arbitrary binary CV into XML.

0. Acquire Credentials

To use the CVlizer service, you need user credentials to authenticate your calls against the web service. SOAP calls use a combination of username and token, whereas REST calls require only a token. You can find your token here.

1. Choose a Web Service Technology

The CVlizer web service can be used with SOAP or REST. With REST, the extraction result can be formatted as JSON or XML, whereas SOAP returns results only as XML. REST authenticates with the token alone, while SOAP requires you to pass both the username and the token.
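The following Python 3 snippet is a rough sketch of how the choice of output format affects a REST call. It builds the endpoint URL and request headers. The base URL and the `Bearer` header come from the REST example in step 2. The helper name `rest_endpoint` and its format check are our own illustration and are not part of the CVlizer API.

```python
# Sketch (Python 3): build the REST endpoint and headers for a chosen output format.
# The base URL and the Bearer header follow the REST example on this page;
# the helper itself is hypothetical and only illustrates the two options.
BASE_URL = 'https://cvlizer.joinvision.com:443/cvlizer/rest/v1/extract/'


def rest_endpoint(outformat, token):
    """Return (url, headers) for a REST extraction call; outformat is 'xml' or 'json'."""
    if outformat not in ('xml', 'json'):
        raise ValueError("outformat must be 'xml' or 'json' (SOAP offers XML only)")
    url = BASE_URL + outformat  # the output format is selected by the URL suffix
    headers = {
        'Content-Type': 'application/json',  # request body is JSON, whatever the output format
        'Authorization': 'Bearer ' + token,  # REST authenticates with the token alone
    }
    return url, headers


url, headers = rest_endpoint('json', 'foo')
print(url)  # https://cvlizer.joinvision.com:443/cvlizer/rest/v1/extract/json
```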

2. Call Web Service

This example uses Python 2 as the client language, because it is widely used and its standard library already ships with everything needed for the call. It uses REST as the web service technology and XML as the output format. The CV location is passed to the script as a command-line argument. Let's assume your access token is 'foo'.

(You can find a Python 3 example here)

So, create a new Python file, for example program.py, and enter the following lines of code:

    #!/usr/bin/python
    import urllib2
    import json
    import base64
    from argparse import ArgumentParser
    import os.path
    
    outformat = 'xml' # xml or json
    url = 'https://cvlizer.joinvision.com:443/cvlizer/rest/v1/extract/' + outformat # REST url for semantic extraction
    token = 'foo' # your access token
    model = 'cvlizer_3_0' # extraction model, see http://helpdesk.joinvision.com/cvlizer/api/params for other models
    language = 'EN' # language of the xml or json tags, either EN or DE
    
    # ARGUMENTS
    # checks for a valid file, returns the filename if true
    def is_valid_file(parser, arg):
        if not os.path.isfile(arg):
            parser.error("The file %s does not exist or is not a file!" % arg)  # exits the script
        return arg
    
    # command line arguments parsing
    parser = ArgumentParser(description="Extract CV to XML with CVlizer via REST.")
    parser.add_argument("-i", dest="filename", required=True,
        help="input file to be extracted", metavar="FILE",
        type=lambda x: is_valid_file(parser, x))
    args = parser.parse_args()
    # /ARGUMENTS
    
    infile = args.filename
    
    # read the file and encode its binary content as a Base64 string
    # (b64encode adds no line breaks and does not overwrite the base64 module)
    with open(infile, 'rb') as f:
        content = f.read()
    data = base64.b64encode(content)
    
    # call web service; passing a request body makes urllib2 send a POST request
    req = urllib2.Request(url)
    req.add_header("Content-Type", "application/json")
    req.add_header("Authorization", 'Bearer ' + token)
    parameters = {'model': model, 'data': data, 'language': language, 'filename': os.path.basename(infile)}
    res = urllib2.urlopen(req, json.dumps(parameters))
    result = res.read()
    
    # save result as a new file next to the input file
    with open(infile + '.' + outformat, 'wb') as outfile:
        outfile.write(result)

The 'ARGUMENTS' section was added for convenience. It takes the file location as a parameter and checks whether it is a valid file. If the check fails, the script aborts; otherwise the passed value is assigned to the variable infile. You may omit this section and assign a value to infile any other way, for example by hardcoding it.

To run this script from a command line interface do as follows:

    python program.py -i /PATH/TO/cv.pdf

And that's it! After the script has run, an XML file with the extraction result can be found at /PATH/TO/cv.pdf.xml.
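This page also links to a separate Python 3 example. As a rough equivalent, the sketch below builds the same POST request with the Python 3 standard library (`urllib.request`, `base64`, `json`). It only constructs the request, so you can inspect it without network access. Sending it with `urllib.request.urlopen` and saving the response works as in the Python 2 script. The function name `build_extract_request` is hypothetical, and the default model, language and token values are copied from the example above.

```python
# Python 3 sketch of the request built by program.py (hypothetical helper name).
import base64
import json
import os.path
import urllib.request

URL = 'https://cvlizer.joinvision.com:443/cvlizer/rest/v1/extract/xml'


def build_extract_request(content, filename, token,
                          model='cvlizer_3_0', language='EN'):
    """Return a urllib.request.Request carrying the CV bytes as Base64 in a JSON body."""
    parameters = {
        'model': model,
        'data': base64.b64encode(content).decode('ascii'),  # bytes -> Base64 text
        'language': language,
        'filename': os.path.basename(filename),
    }
    body = json.dumps(parameters).encode('utf-8')  # urllib needs a bytes body
    return urllib.request.Request(
        URL,
        data=body,  # a body makes this a POST request
        headers={'Content-Type': 'application/json',
                 'Authorization': 'Bearer ' + token},
    )


req = build_extract_request(b'%PDF-1.4 ...', '/PATH/TO/cv.pdf', 'foo')
print(req.get_method())  # POST
# To actually call the service and save the result:
# with urllib.request.urlopen(req) as res, open('/PATH/TO/cv.pdf.xml', 'wb') as out:
#     out.write(res.read())
```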

For a full description of the XML structure and the tag values used, please refer to our XML schema and semantic value domains definition, both of which can be found in the resources section: Resources.

3. Further Parsing of Result

The variable result contains all the information extracted from the CV as a structured XML string. All XML tags the service can return are listed and explained in the XML schema file, which can be found in the resources section: Resources. Adding the following lines to the end of the script prints the candidate's first name to the command line (this requires the third-party lxml package):

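    import lxml.etree as et  # third-party package: pip install lxml
    
    tree = et.fromstring(result)  # parse the XML string returned by the service
    namespace = tree.xpath('namespace-uri(.)')  # namespace URI of the root element
    element = tree.find(".//{%s}firstname" % namespace)  # first <firstname> in the tree
    if element is not None:
        print 'candidate: %s' % element.text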
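If you would rather not install lxml, you can do a similar lookup with the standard library's `xml.etree.ElementTree` (Python 3). ElementTree has no `namespace-uri()` XPath function, so this sketch reads the namespace from the root element's `{uri}tag` name instead. The sample XML and its namespace URI are made up for illustration. Only the `firstname` tag is taken from the example above.

```python
# Python 3 sketch: find <firstname> with the standard library instead of lxml.
# The sample document and namespace below are hypothetical, not real CVlizer output.
import xml.etree.ElementTree as ET


def first_name(xml_bytes):
    """Return the text of the first <firstname> element, or None if absent."""
    root = ET.fromstring(xml_bytes)
    # root.tag looks like '{namespace-uri}localname' when a namespace is set
    namespace = root.tag[1:].split('}', 1)[0] if root.tag.startswith('{') else ''
    path = './/{%s}firstname' % namespace if namespace else './/firstname'
    element = root.find(path)
    return element.text if element is not None else None


sample = b'<cv xmlns="urn:example:cv"><person><firstname>Jane</firstname></person></cv>'
print('candidate: %s' % first_name(sample))  # candidate: Jane
```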

You can access any other information in the parsed CV just as easily. For a full description of the XML structure, please refer to our XML schema definition. To find out what else you can use our web services for, have a look at our API description. You can find code examples for your preferred language and framework in the examples section.
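Before writing specific lookups, you can get an overview of what a parsed CV contains. One option is to walk the tree and collect every element that carries text, keyed by its local tag name. This is a generic Python 3 sketch using `xml.etree.ElementTree`. The tag names in the sample input (`lastname`, `skill`) are hypothetical, so consult the XML schema for the names the service actually returns.

```python
# Python 3 sketch: list text-bearing elements of an XML result by local tag name.
# The sample input is hypothetical; real tag names are defined in the XML schema.
import xml.etree.ElementTree as ET
from collections import defaultdict


def text_values(xml_bytes):
    """Map each local tag name to the list of non-empty texts found under that tag."""
    values = defaultdict(list)
    for element in ET.fromstring(xml_bytes).iter():
        local_name = element.tag.rsplit('}', 1)[-1]  # strip '{namespace}' prefix
        if element.text and element.text.strip():
            values[local_name].append(element.text.strip())
    return dict(values)


sample = (b'<cv xmlns="urn:example:cv"><firstname>Jane</firstname>'
          b'<lastname>Doe</lastname><skill>Python</skill><skill>SQL</skill></cv>')
print(text_values(sample))
# {'firstname': ['Jane'], 'lastname': ['Doe'], 'skill': ['Python', 'SQL']}
```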
