Image OCR

Use the Image OCR integration to extract text from images. The integration uses the open-source Tesseract OCR engine.

Use Cases

  • Extract text from images included in emails during a phishing investigation.
  • Extract text from images included in an HTML page.

Configure Image OCR on Demisto

  1. Navigate to Settings > Integrations > Servers & Services.
  2. Search for Image OCR.
  3. Click Add instance to create and configure a new integration instance.
    • Name: a textual name for the integration instance.
    • Languages: a CSV list of language codes to use for OCR. Leave empty to use the default language, English.
  4. Click Test to validate the configuration.

Note: The default language used for OCR is English. To configure additional languages, specify a CSV list of language codes in the Languages option. For example, to set the integration to English and French, set this value: eng,fra. To see all supported language codes, run the following command:

!image-ocr-list-languages
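Tesseract's command-line `-l` option expects multiple languages joined with `+` (for example `eng+fra`), not commas. The following minimal Python sketch assumes the integration passes the Languages CSV to the tesseract binary and shows how that conversion could look; `csv_to_tesseract_langs`, `build_tesseract_cmd`, and `run_ocr` are hypothetical helpers, not the integration's actual code.

```python
import subprocess
from typing import List, Optional

DEFAULT_LANGS = "eng"  # documented default: English


def csv_to_tesseract_langs(csv_langs: Optional[str]) -> str:
    """Convert a CSV such as "eng,fra" into Tesseract's "eng+fra" form.

    Empty or missing input falls back to English.
    """
    codes = [c.strip() for c in (csv_langs or "").split(",") if c.strip()]
    return "+".join(codes) if codes else DEFAULT_LANGS


def build_tesseract_cmd(image_path: str, csv_langs: Optional[str]) -> List[str]:
    # "stdout" as the output base makes tesseract print the text
    # instead of writing it to a .txt file
    return ["tesseract", image_path, "stdout", "-l", csv_to_tesseract_langs(csv_langs)]


def run_ocr(image_path: str, csv_langs: Optional[str] = None) -> str:
    # Requires the tesseract binary and the requested language data to be installed
    result = subprocess.run(build_tesseract_cmd(image_path, csv_langs),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `build_tesseract_cmd("scan.png", "eng,fra")` returns `["tesseract", "scan.png", "stdout", "-l", "eng+fra"]`.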

Commands

You can execute these commands from the Demisto CLI, as part of an automation, or in a playbook. After you successfully execute a command, a DBot message appears in the War Room with the command details.

  1. Get a list of supported OCR languages: image-ocr-list-languages
  2. Extract text from an image: image-ocr-extract-text

1. Get a list of supported OCR languages


Lists the languages the integration supports for text extraction.

Base Command

image-ocr-list-languages

Input

There are no input arguments for this command.

Context Output

There is no context output for this command.

Command Example

!image-ocr-list-languages

Human Readable Output

Image OCR Supported Languages

  • ara
  • chi_sim
  • chi_sim_vert
  • chi_tra
  • chi_tra_vert
  • deu
  • eng
  • fra
  • heb
  • ita
  • jpn
  • jpn_vert
  • osd
  • rus
  • spa
  • tur
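Tesseract itself reports its installed language data with `tesseract --list-langs`. As a hedged sketch, assuming the list above is derived from that output (how the integration actually builds it is not stated here), the output could be parsed as follows; `parse_list_langs` and `list_languages` are hypothetical names. Note that `osd` is Tesseract's orientation and script detection data rather than a natural language.

```python
import subprocess
from typing import List


def parse_list_langs(output: str) -> List[str]:
    """Turn `tesseract --list-langs` output into a sorted list of codes.

    The first line is a header such as "List of available languages (3):";
    each remaining non-empty line holds one code.
    """
    codes = []
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("List of available languages"):
            continue
        codes.append(line)
    return sorted(codes)


def list_languages() -> List[str]:
    # Needs the tesseract binary on PATH; assumes the list is printed to stdout
    result = subprocess.run(["tesseract", "--list-langs"],
                            capture_output=True, text=True, check=True)
    return parse_list_langs(result.stdout)
```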

2. Extract text from an image


Extracts text from an image.

Base Command

image-ocr-extract-text

Input
Argument Name | Description | Required
entryid | Entry ID of the image file to process. | Required
langs | A CSV list of language codes to use for OCR. Overrides the default languages. | Optional
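The override rule for `langs` can be sketched in Python. This sketch assumes the command argument replaces, rather than merges with, the instance's Languages setting, and that unsupported codes are rejected before OCR runs; whether the integration validates codes this way is an assumption, and `resolve_langs` is a hypothetical helper.

```python
from typing import Iterable, List, Optional

DEFAULT_LANG = "eng"  # documented default: English


def resolve_langs(arg_langs: Optional[str], instance_langs: Optional[str],
                  supported: Iterable[str]) -> List[str]:
    """Return the language codes to use for one image-ocr-extract-text run.

    Assumed precedence: the command's langs argument, then the instance's
    Languages setting, then English.
    """
    chosen = arg_langs or instance_langs or DEFAULT_LANG
    codes = [c.strip() for c in chosen.split(",") if c.strip()]
    if not codes:
        codes = [DEFAULT_LANG]
    supported_set = set(supported)
    unsupported = [c for c in codes if c not in supported_set]
    if unsupported:
        raise ValueError("Unsupported language code(s): " + ", ".join(unsupported))
    return codes
```

For example, `resolve_langs("fra", "eng,deu", ["eng", "fra", "deu"])` returns `["fra"]`, because the argument overrides the instance setting.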

Context Output
Path | Type | Description
File.Text | String | Text extracted from the image file.

Command Example

!image-ocr-extract-text entryid="922@e84104f7-b235-4d82-860a-ea09f5dc0559"

Context Example
{
    "File": {
        "Text": "The quick brown fox\njumped over the 5\nlazy dogs!\n\f", 
        "EntryID": "922@e84104f7-b235-4d82-860a-ea09f5dc0559"
    }
}

Human Readable Output

Image OCR Extracted Text

The quick brown fox
jumped over the 5
lazy dogs!
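In the Context Example, File.Text keeps Tesseract's raw output, which ends with a form-feed (`\f`) page separator, while the Human Readable Output shows the text without it. A minimal Python sketch of producing both from one OCR result, assuming that split; `format_ocr_result` is a hypothetical helper, not the integration's code.

```python
from typing import Any, Dict, Tuple


def format_ocr_result(entry_id: str, raw_text: str) -> Tuple[Dict[str, Any], str]:
    """Build the context entry and the human-readable text for one OCR run.

    The context keeps the raw text, as in the Context Example. The
    human-readable version drops trailing whitespace, including the
    "\\f" page separator that Tesseract appends.
    """
    context = {"File": {"Text": raw_text, "EntryID": entry_id}}
    readable = "Image OCR Extracted Text\n\n" + raw_text.rstrip()
    return context, readable
```

Using `str.rstrip()` works here because Python treats the form feed as whitespace, so a trailing `"\n\f"` is removed in one call.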