Frequently Asked Questions

How do I parse the result?

All possible result elements for the current extraction model “cvlizer_3_0” are described in the XML Schema Definition (XSD) file “jv_hr_3_0.xsd”, which can be downloaded from the resources section. The XSD file can be used to generate the classes or data structures needed to process the extraction results automatically, or at least to map them. What to generate, and how, depends on the tools and programming language you use. For Java, the xjc tool generates fully annotated Java classes (it was bundled with the JDK up to Java 10 and is now distributed with the Jakarta XML Binding reference implementation); for C#, Visual Studio ships with the “xsd.exe” tool for the same purpose.
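If you do not (yet) generate classes from the XSD, a quick way to see what a result contains is to walk the XML generically. The following is a minimal sketch using only Python's standard library `xml.etree.ElementTree`; it lists every element that carries text together with its path. The sample document and its element names are hypothetical; the real element names are defined in “jv_hr_3_0.xsd”.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def flatten(xml_text: str) -> list[tuple[str, str]]:
    """Return (path, text) pairs for every element that carries non-blank text."""
    root = ET.fromstring(xml_text)
    pairs: list[tuple[str, str]] = []

    def walk(elem: ET.Element, path: str) -> None:
        text = (elem.text or "").strip()
        if text:
            pairs.append((path, text))
        for child in elem:
            walk(child, f"{path}/{local_name(child.tag)}")

    walk(root, local_name(root.tag))
    return pairs


# Hypothetical result fragment; the real structure is defined in jv_hr_3_0.xsd.
SAMPLE = """<Profile>
  <Personal>
    <firstname>Anna</firstname>
    <lastname>Muster</lastname>
  </Personal>
</Profile>"""

for path, value in flatten(SAMPLE):
    print(path, "=", value)
```

Namespace prefixes are dropped from the paths, so the output stays readable even if the result document declares a default namespace. For production code, classes generated from the XSD give you typed access and are usually the better choice.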

Please note that some result element values are given as abbreviations or codes, for example the ISO code “AT” for “Austria” as the country value. A ZIP archive containing CSV files with these codes and abbreviations can be downloaded from the resources section under “Domains”.
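To display readable labels, the coded values can be looked up in the domain CSV files. The sketch below builds a code-to-label dictionary with Python's `csv` module. It assumes a layout with the code in the first column and the label in the second, separated by semicolons; the actual file names, column order, delimiter, header row and encoding are assumptions and must be checked against the files in the “Domains” archive.

```python
import csv
import io


def load_domain(csv_text: str, delimiter: str = ";", skip_header: bool = False) -> dict[str, str]:
    """Build a code -> label mapping from a CSV whose first two columns are code and label."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    if skip_header:
        next(reader, None)  # discard a header row such as "code;name"
    mapping: dict[str, str] = {}
    for row in reader:
        # Ignore blank or malformed rows
        if len(row) >= 2 and row[0].strip():
            mapping[row[0].strip()] = row[1].strip()
    return mapping


# Hypothetical excerpt of a country domain; check the real layout in the "Domains" archive.
COUNTRIES = "AT;Austria\nDE;Germany\nCH;Switzerland\n"

countries = load_domain(COUNTRIES)
print(countries.get("AT", "AT"))  # unknown codes fall back to the raw code
```

For a real file, read it with `open(path, newline="", encoding=...)` and pass the text to `load_domain`; falling back to the raw code keeps the output usable if a domain file is older than the result.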

Which input file formats are supported?

Our extraction services support the following plaintext and binary input formats, regardless of the semantic extraction mode applied:

  • txt (Plaintext)

  • html (HTML5, including JavaScript and CSS3)

  • rtf (Rich Text Format)

  • odt (Open Document Format, Writer)

  • doc (Microsoft Office binary format, Word)

  • docx (Microsoft Office Open XML, Word)

  • pdf (Adobe Portable Document Format, also including embedded images)

  • png (Scanned images)

  • jp(e)g (Scanned images)

  • url (for HTML pages, a suitable filter such as the Cutter Module is applied)

  • eml (Standard mail format, may contain a text body and multiple attached files)

  • msg (Microsoft Outlook mail messages)

  • zip (Compressed set of files, may contain all other file types)

All word-processing formats and PDF files may also contain scanned images, which are detected and converted.

These file types are supported by the “categorize” (REST/SOAP) and “merge” (REST) / “mergeToXML” (SOAP) methods. Embedded containers (e.g. messages attached to mails, or .zip files attached to mails) are parsed recursively.
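Before uploading an archive, it can be useful to check on the client side which of its files have a supported extension. The following minimal sketch uses Python's `zipfile` module and descends into nested ZIP archives, mirroring the recursive handling described above. It is a client-side pre-check under stated assumptions, not the service's own logic: matching is by file extension only, “jp(e)g” is taken as `.jpg`/`.jpeg`, “url” is omitted because it is not a file type, and attachments inside .eml/.msg files are not opened.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Extensions from the list above; ".zip" is handled separately by recursion.
SUPPORTED_EXTENSIONS = {
    ".txt", ".html", ".rtf", ".odt", ".doc", ".docx", ".pdf",
    ".png", ".jpg", ".jpeg", ".eml", ".msg",
}


def supported_members(data: bytes, prefix: str = "") -> list[str]:
    """List supported files inside a ZIP archive, descending into nested ZIPs."""
    found: list[str] = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            name = prefix + info.filename
            suffix = PurePosixPath(info.filename).suffix.lower()  # case-insensitive match
            if suffix == ".zip":
                # Nested archive: report its members as "outer.zip/inner.ext"
                found.extend(supported_members(archive.read(info), name + "/"))
            elif suffix in SUPPORTED_EXTENSIONS:
                found.append(name)
    return found


def build_zip(files: dict[str, bytes]) -> bytes:
    """Create an in-memory ZIP archive (used here only to demonstrate the check)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buf.getvalue()


inner = build_zip({"cv.pdf": b"%PDF-1.4", "notes.xyz": b"?"})
outer = build_zip({"letter.docx": b"", "attachments.zip": inner})
print(supported_members(outer))
```

Files with unlisted extensions (here `notes.xyz`) are simply left out of the result, so the list shows what the services are expected to process.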

Which languages are supported?

The JoinVision extraction services support the following output languages for structured information (codes):

  • DE (German)

  • EN (English)

This does not affect any non-transformed information, because JoinVision does not provide a translation service: non-transformed information remains in the original language of the provided document. Only data of type “codeNamePair”, or of a type derived from it, is provided in one of the languages stated above, selected by the parameter passed to the web service.
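When results are stored or compared across requests made with different output languages, it can help to key on the code rather than the localized name. The sketch below assumes, as the ISO country example in this FAQ suggests, that a codeNamePair consists of a language-independent code plus a name in the requested output language. The `CodeNamePair` class, its field names and the German label are hypothetical illustrations, not the generated classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeNamePair:
    """Hypothetical client-side mirror of a codeNamePair value."""

    code: str  # standardized code, e.g. the ISO country code "AT"
    name: str  # label in the requested output language (DE or EN)


def same_value(a: CodeNamePair, b: CodeNamePair) -> bool:
    """Compare by code only, so values returned in DE and EN are treated as equal."""
    return a.code == b.code


# Assumed example: the same country as it might be returned for each output language.
country_en = CodeNamePair(code="AT", name="Austria")
country_de = CodeNamePair(code="AT", name="Österreich")
print(same_value(country_en, country_de))  # True
```

Treating the name as a display label only keeps stored data independent of the output language chosen for a given request.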

The JoinVision extraction services support the following input languages (document languages):

  • Afrikaans

  • Albanian

  • Arabic

  • Belarusian

  • Bulgarian

  • Catalan

  • Chinese (simplified and traditional)

  • Croatian

  • Czech

  • Danish

  • Dutch

  • English

  • Estonian

  • Finnish

  • French

  • German

  • Greek

  • Hebrew

  • Hindi

  • Hungarian

  • Indonesian

  • Irish

  • Italian

  • Japanese

  • Korean

  • Latvian

  • Lithuanian

  • Macedonian

  • Norwegian

  • Persian

  • Polish

  • Portuguese

  • Romanian

  • Russian

  • Serbian

  • Slovak

  • Slovenian

  • Spanish

  • Swahili

  • Swedish

  • Thai

  • Turkish

  • Ukrainian

  • Vietnamese

What information is extracted?

The extracted fields depend on the extraction model used.

CVlizer extracts the following information from CVs (without attachments):

  • Personal Information: First name, Last name, Gender*, Academic Titles, International Education Level (ISCED)*, Birthday*, Birth Place*, Nationality*, Civil Status*, Children*, Full Address (Street, Number, Postcode, State, Country*), Phone/Cell Numbers, Fax Numbers, E-Mail Addresses, Personal Homepage, Social Media Links

  • All Work and Project Phases: Date From, Date To (both Strict* and Fuzzy), Duration in Months*, First Order and Sub Phases*, Company Name, Function/Position Title, Position Level*, Type of Employment/Contract*, Work Time/Volume of Work*, Project Focused Work*, Project Topic, Work Location (City, Postcode, State, Country*), Skills*, Operation Areas/Operational Fields*, Industry/Branch* (NACE), Customer-Specific Codes*, Free Text Comments, Full Phase Text, Internet Resources/URLs

  • All Educational Phases: Date From, Date To (both Strict* and Fuzzy), Duration in Months*, First Order and Sub Phases, School/Academic Institute Name, Graduation/Degree, Achieved International Education Level (ISCED)*, Education Successfully Completed (y/n)*, Topic, Education Location (City, Postcode, State, Country*), Achieved Skills*, Trained Operation Areas/Operational Fields*, Trained Industry/Branch* (NACE), Customer-Specific Codes*, Free Text Comments, Full Phase Text, Internet Resources/URLs

  • List of Academic Publications: Date*, Title/Topic, Academic Institute, Proceedings, List of Authors, Conference Location (City, Postcode, State, Country*), Skills mentioned*, Operation Areas mentioned*, Branch* (NACE), Customer-Specific Codes*, Free Text Comments, Full Phase Text, Internet Resources/URLs

  • Other Information: Language Skills* (including Skill Level/CEFR*), Driving Licenses, Military/Civil Service Attendance*, List of Additional Competences, Personal Interests, Additional Textual Information like References, Additional Skills*, Additional Operation Area Experiences*, Additional Industry Experience* (NACE), Additional Internet Resources

  • Statistics: Skills/Competences*, Operation Areas*, Industry Experience*, International Experience*

  • Job Objectives: Salary, Availability Date*, Function, Position Level*, Type of Employment/Contract*, Work Time/Volume of Work*, Industry/Branch* (NACE), Required Skills* and Operation Fields*, Preferred Location (City, Postcode, State, Country*)

*Standardized values
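Because work phases can be split into first-order phases and sub phases, aggregating values such as “Duration in Months” needs some care. The sketch below shows a hypothetical client-side model for a few of the work-phase fields listed above and sums the durations of first-order phases only. It assumes that sub phases lie within their first-order phase, so including them would count the same time twice. The class, field names and sample data are illustrative; the authoritative names and types come from “jv_hr_3_0.xsd”.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class WorkPhase:
    """Hypothetical subset of the work-phase fields; real names come from the XSD."""

    company: str
    position_title: str
    date_from: Optional[date] = None        # strict start date, if available
    date_to: Optional[date] = None          # None may mean "ongoing" or "unknown"
    duration_months: Optional[int] = None   # standardized value (*)
    first_order: bool = True                # False for sub phases
    skill_codes: list[str] = field(default_factory=list)  # standardized values (*)


def total_months(phases: list[WorkPhase]) -> int:
    """Sum durations of first-order phases, skipping sub phases and missing durations.

    Overlapping first-order phases (e.g. two part-time jobs) are still added up.
    """
    return sum(
        p.duration_months
        for p in phases
        if p.first_order and p.duration_months is not None
    )


phases = [
    WorkPhase("Example GmbH", "Developer", duration_months=24),
    WorkPhase("Example GmbH", "Team Lead", duration_months=6, first_order=False),
    WorkPhase("Sample AG", "Analyst", duration_months=12),
]
print(total_months(phases))  # 36
```

If overlapping first-order phases should not be double-counted, the strict dates would have to be merged into intervals instead of summing the durations.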