MyScript’s recognition technology is very flexible. While the default configurations support common use cases, this page explains how you can fine-tune them to address specific needs.
MyScript Developer Portal lets you download recognition assets to support a wide range of languages, as well as math and diagram use cases. Each pack comes with a ready-to-use configuration that will work in most cases.
However, there are a few situations where you may want to adapt these provided configurations:
- You need the engine to recognize some vocabulary that is not included within the default MyScript lexicons, like proper names. In this case, you may build and attach a custom lexicon.
- You target different education levels with a math application and want to restrict the number of symbols that MyScript can recognize: this will reduce some possible ambiguities (many math symbols are very similar) and improve the overall user experience. In this case, you can build and attach a custom math grammar.
- You are building a form application and want to restrict some fields to accept only certain types of symbols, such as alphanumerical symbols, digits or even capital letters. In this case, consider building and attaching a subset knowledge.
- You need more or fewer recognition candidates to be made available to the end user, or you plan to index the recognition results for search purposes and want to consider only the top n candidates. You may edit the configuration accordingly.
As explained in the runtime part of the guide, iink SDK consumes configuration files, a standardized way to provide the right parameters and knowledge to recognize a specific type of content.
To deploy and use a configuration, you need to:
- Deploy the *.conf file with your application, along with all the resource files that it references (make sure that all paths are correct).
- Add the folder containing the *.conf file to the paths stored in the engine configuration.
- Depending on the content type, set the right configuration keys. For instance, to recognize text (in “Text”, “Diagram” and “Text Document” parts), you will need to ensure that the values of the text.configuration.bundle and text.configuration.name keys match your text configuration bundle and configuration item name (see example below).
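Because a wrong path is an easy deployment mistake, a small pre-flight check can help. The sketch below is not part of iink SDK: missing_resources is a hypothetical helper that scans a *.conf file for AddResDir and AddResource commands (both described later on this page) and reports the resources it cannot find on disk. It assumes that AddResDir folders are resolved relative to the folder containing the *.conf file and that they accumulate in file order; check both assumptions against your SDK version.

```python
import sys
from pathlib import Path


def missing_resources(conf_path):
    """List the AddResource arguments of a *.conf file that match no file on disk."""
    conf_path = Path(conf_path)
    base = conf_path.parent          # assumption: relative paths start from the *.conf folder
    res_dirs, missing = [], []
    for raw in conf_path.read_text(encoding="utf-8").splitlines():
        parts = raw.split(None, 1)   # "<command> <argument>"; other lines are skipped
        if len(parts) != 2:
            continue
        command, arg = parts[0], parts[1].strip()
        if command == "AddResDir":
            res_dirs.append(base / arg)
        elif command == "AddResource":
            # Look in every declared resource folder, or next to the file if none was declared.
            search = res_dirs or [base]
            if not any((folder / arg).is_file() for folder in search):
                missing.append(arg)
    return missing


if __name__ == "__main__":
    # Usage: python check_conf.py path/to/bundle.conf [...]
    for path in sys.argv[1:]:
        for name in missing_resources(path):
            print(f"{path}: missing resource {name}")
```

Running such a check as part of packaging catches a missing or misplaced resource file before the engine is asked to load the configuration.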
A configuration file is a text file with a *.conf extension. It is composed of a header (identifying a configuration bundle) and one or more named configuration items (defining configuration names) separated by empty lines.
Here is an example:
# Bundle header
Bundle-Version: 1.2
Bundle-Name: en_US
Configuration-Script:
 AddResDir ../resources/

# Configuration item #1
Name: text
Type: Text
Configuration-Script:
 AddResource en_US/en_US-ak-cur.res
 AddResource en_US/en_US-lk-text.res
 SetTextListSize 1
 SetWordListSize 5
 SetCharListSize 1

# Configuration item #2
Name: text-no-candidate
Type: Text
Configuration-Script:
 AddResource en_US/en_US-ak-cur.res
 AddResource en_US/en_US-lk-text.res
 SetTextListSize 1
 SetWordListSize 1
 SetCharListSize 1
- Lines starting with # are considered comments and are ignored.
- Lines starting with a space are continuation lines. Here, several commands are gathered under Configuration-Script.
- The value provided as Bundle-Name is the name of the bundle. This is what iink SDK expects as a possible value for the text.configuration.bundle configuration key. In this example, it would be en_US.
- The value provided as Name defines a configuration item. This is one of the names that iink SDK expects as a possible value for the text.configuration.name configuration key. In this example, it could be text or text-no-candidate. A given engine can only be configured with a single configuration item for each type of recognizer at any point in time.
- Possible values for the Type key include Text and Analyzer. They correspond to the types of content that the core MyScript technology is able to recognize.
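To make the structure concrete, here is a minimal Python sketch of a reader for this format. It is not the parser used by iink SDK: parse_conf is a hypothetical helper that assumes # starts a comment (as in the example file), that empty lines separate the header from the configuration items, and that every other line is either a Key: value line or a continuation line starting with whitespace.

```python
SAMPLE = """\
# Bundle header
Bundle-Version: 1.2
Bundle-Name: en_US
Configuration-Script:
 AddResDir ../resources/

# Configuration item #1
Name: text
Type: Text
Configuration-Script:
 AddResource en_US/en_US-ak-cur.res
 AddResource en_US/en_US-lk-text.res
 SetTextListSize 1
 SetWordListSize 5
 SetCharListSize 1
"""


def parse_conf(text):
    """Return (header, items); each is a dict mapping a key to its list of values."""
    blocks, current, last_key = [], {}, None
    for raw in text.splitlines():
        if raw.startswith("#"):
            continue                          # comment line: ignored
        if not raw.strip():
            if current:                       # empty line: closes the current block
                blocks.append(current)
            current, last_key = {}, None
            continue
        if raw[0] in " \t":                   # continuation of the previous key
            if last_key is None:
                raise ValueError(f"continuation line without a key: {raw!r}")
            current[last_key].append(raw.strip())
            continue
        key, sep, value = raw.partition(":")
        if not sep:
            raise ValueError(f"expected 'Key: value', got {raw!r}")
        last_key = key.strip()
        current[last_key] = [value.strip()] if value.strip() else []
    if current:
        blocks.append(current)
    if not blocks:
        raise ValueError("empty configuration file")
    return blocks[0], blocks[1:]              # first block is the bundle header


header, items = parse_conf(SAMPLE)
print(header["Bundle-Name"][0])               # en_US
print([item["Name"][0] for item in items])    # ['text']
print(items[0]["Configuration-Script"])       # the five commands of the item
```

The two printed names are the values that would go into the text.configuration.bundle and text.configuration.name keys for this sample.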
The following table lists the types of configuration items that you need to provide for iink SDK to support its different content types:
|Content type||Required configuration item types|
The table below lists some possible configuration commands (to be placed under Configuration-Script):
|Configuration item type||Syntax||Argument|
|||AddResDir||Folder that the engine shall consider when resolving relative resource file paths|
|||AddResource||Name of an individual resource file to attach|
|||SetCharListSize||An integer between 1 and 20, representing the number of character candidates to keep|
|||SetWordListSize||An integer between 1 and 20, representing the number of word candidates to keep|
|||SetTextListSize||An integer between 1 and 20, representing the number of text candidates to keep|
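As an illustration of how these commands combine, the following Python sketch renders a Text configuration item with chosen candidate list sizes and rejects values outside the documented 1 to 20 range. The text_item function and the item name text-top3 are hypothetical and only serve the example; the commands and resource paths are the ones shown in the sample file above.

```python
def text_item(name, resources, text_n=1, word_n=5, char_n=1):
    """Render one Text configuration item as *.conf text."""
    for label, n in (("text", text_n), ("word", word_n), ("char", char_n)):
        if not isinstance(n, int) or not 1 <= n <= 20:
            raise ValueError(
                f"{label} list size must be an integer between 1 and 20, got {n!r}"
            )
    script = [f"AddResource {path}" for path in resources]
    script += [
        f"SetTextListSize {text_n}",
        f"SetWordListSize {word_n}",
        f"SetCharListSize {char_n}",
    ]
    lines = [f"Name: {name}", "Type: Text", "Configuration-Script:"]
    lines += [" " + command for command in script]   # leading space marks a continuation line
    return "\n".join(lines) + "\n"


# Keep the top 3 word candidates, for instance to index them for search.
print(text_item(
    "text-top3",
    ["en_US/en_US-ak-cur.res", "en_US/en_US-lk-text.res"],
    word_n=3,
))
```

The generated item can be appended to a bundle file after an empty line and then selected through the text.configuration.name key.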
Resources are pieces of knowledge that can be attached to the recognition engine to make it able to recognize a given language or content.
They are attached in the Configuration-Script part of the configuration items by using the AddResource command.
For example, in the case of an en_US AK, you would write:
 AddResource en_US/en_US-ak-cur.res
An Alphabet knowledge (AK) is a resource that enables the engine to recognize individual characters for a given language and a given writing style. Default configurations include a cursive AK for each supported language.
A Linguistic knowledge (LK) is a resource that provides the engine with linguistic information for a given language. It allows the recognition engine to improve its accuracy by favoring words from its lexicon that are the most likely to occur. Default configurations include an LK for each supported language.
An LK is not mandatory, but not attaching one often results in a significant accuracy drop. Omitting it may still be relevant if you do not expect users to write full meaningful words, for instance if you plan to filter a list with a few letters.
A lexicon is a resource that lists words that can be recognized in addition to what is included in linguistic knowledge resources.
You can build and attach your own custom lexicons.
A subset knowledge (SK) is a resource that restricts the number of text characters that the engine shall attempt to recognize. It thus corresponds to a restriction of an AK resource. It can be useful in a form application, for example, to restrict the authorized characters of an email field to alphanumerical characters, @ and a few allowed punctuation signs.
You can build and attach your own custom subset knowledge.
A math grammar is a resource that restricts the number of math symbols and rules that the engine shall be able to process. In education use cases, it can prove very useful to adapt the recognition to a given math level (for instance, only digits and basic operators for pupils).
You can build and attach your own custom math grammars.
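Since custom lexicons, subset knowledge and math grammars are all resources, attaching one should come down to an extra AddResource line in the relevant configuration item. The Python sketch below shows one way to patch an item accordingly. Both add_resource and the file name custom/my-lexicon.res are hypothetical, and the sketch assumes that placing the custom resource after the existing ones is acceptable; building the resource file itself is not covered here.

```python
ITEM = """\
Name: text
Type: Text
Configuration-Script:
 AddResource en_US/en_US-ak-cur.res
 AddResource en_US/en_US-lk-text.res
 SetTextListSize 1
"""


def add_resource(item_text, resource):
    """Return item_text with an extra AddResource line after the existing ones."""
    lines = item_text.splitlines()
    positions = [
        i for i, line in enumerate(lines)
        if line.strip().startswith("AddResource ")
    ]
    if positions:
        insert_at = positions[-1] + 1
    else:
        # No resource yet: place the command right after the Configuration-Script key.
        keys = [
            i for i, line in enumerate(lines)
            if line.startswith("Configuration-Script:")
        ]
        if not keys:
            raise ValueError("item has no Configuration-Script section")
        insert_at = keys[0] + 1
    lines.insert(insert_at, f" AddResource {resource}")   # leading space: continuation line
    return "\n".join(lines) + "\n"


# Hypothetical custom lexicon placed in a "custom" folder under the resource directory.
print(add_resource(ITEM, "custom/my-lexicon.res"))
```

Keeping the patched item under a new name next to the default one makes it easy to switch between them through the configuration keys.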