Image Recognition

Understanding how Qontinui identifies UI elements visually

How Image Recognition Works

Qontinui uses template matching to find UI elements on screen. Instead of relying on element IDs, CSS selectors, or hardcoded coordinates, Qontinui visually compares screenshot images against the current screen to locate buttons, dialogs, and other elements.

Why Visual Recognition?

• Resolution independent: With multi-scale search enabled, works across different screen sizes and DPI settings
• UI framework agnostic: Works with any application (web, desktop, games)
• Resilient to minor changes: Similarity thresholds tolerate small visual variations
• Simple to configure: Just capture screenshots, no need to inspect the DOM or learn APIs

The Template Matching Process

1. Qontinui captures the current screen (or a region)
2. Your pattern image is compared pixel-by-pixel using OpenCV
3. A similarity score (0.0-1.0) is calculated based on how closely pixels match
4. If the score exceeds your threshold, the element is considered "found"
5. The location (x, y coordinates) is returned for clicking or verification
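The five steps above can be sketched in a few lines of dependency-free Python. This is only an illustration of the idea, not Qontinui's implementation (which, per the text above, uses OpenCV): the `similarity` and `find` names, the grayscale list-of-rows image type, and the mean-absolute-difference score are all assumptions made for the example. It also treats a score equal to the threshold as a match, which is an assumption.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # grayscale image as rows of 0-255 values (assumed format)


def similarity(screen: Image, pattern: Image, top: int, left: int) -> float:
    """Score in [0.0, 1.0]; 1.0 means every compared pixel is identical."""
    total_diff = 0
    for r, row in enumerate(pattern):
        for c, value in enumerate(row):
            total_diff += abs(screen[top + r][left + c] - value)
    pixel_count = len(pattern) * len(pattern[0])
    return 1.0 - total_diff / (255 * pixel_count)


def find(screen: Image, pattern: Image,
         threshold: float = 0.85) -> Optional[Tuple[int, int, float]]:
    """Slide the pattern over the screen; return (x, y, score) or None."""
    best = None
    ph, pw = len(pattern), len(pattern[0])
    for top in range(len(screen) - ph + 1):
        for left in range(len(screen[0]) - pw + 1):
            score = similarity(screen, pattern, top, left)
            if best is None or score > best[2]:
                best = (left, top, score)
    # Assumption: a score equal to the threshold counts as "found".
    if best is not None and best[2] >= threshold:
        return best
    return None


# Step 1: the "captured" screen (a tiny hypothetical 5x4 image).
screen = [
    [0, 0, 0, 0, 0],
    [0, 0, 200, 250, 0],
    [0, 0, 250, 200, 0],
    [0, 0, 0, 0, 0],
]
# Steps 2-5: compare the pattern, score it, apply the threshold, return x/y.
pattern = [[200, 250], [250, 200]]
match = find(screen, pattern)
print(match)  # (2, 1, 1.0)
```

The returned x and y are the top-left corner of the match; a click target would typically be offset to the center of the pattern.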

Key Concepts

Similarity Threshold

Controls how closely the screen must match your pattern image

0.70 - 0.80: Fuzzy Matching

More tolerant of variations. Use for elements that change slightly (e.g., different themes, slight color shifts).

0.85 - 0.90: Standard Matching (Recommended)

Balanced accuracy. Works well for most UI elements with minor anti-aliasing or compression differences.

0.95+: Exact Matching

Very strict. Only use when elements must match perfectly (e.g., pixel-perfect logos, fixed graphics).
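To make the three bands concrete, the sketch below checks one hypothetical similarity score against a representative threshold from each band. The `is_found` helper and the score of 0.88 are illustrative assumptions, not part of Qontinui's API.

```python
def is_found(score: float, threshold: float) -> bool:
    # Assumption: a score equal to the threshold counts as found.
    return score >= threshold


# One representative threshold from each band described above.
bands = {"fuzzy": 0.75, "standard": 0.85, "exact": 0.95}

# Hypothetical score, e.g. a button rendered with slightly different anti-aliasing.
score = 0.88

results = {name: is_found(score, threshold) for name, threshold in bands.items()}
print(results)  # {'fuzzy': True, 'standard': True, 'exact': False}
```

The same on-screen element is therefore "found" or "not found" depending only on the band you choose, which is why the exact band suits only elements that never change.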

Patterns

Individual image templates within a StateImage

Each StateImage can contain multiple patterns representing different variations of the same element. This enables matching across:

• Light and dark themes
• Different languages (button text changes)
• Hover/active/disabled states
• Different resolutions or zoom levels
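One way to picture a StateImage with several patterns is "try every variation, keep the best score". The sketch below assumes exactly that; `PatternSketch`, `StateImageSketch`, and `best_pattern` are hypothetical names, and the flattened four-pixel "images" stand in for real screenshots.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PatternSketch:
    name: str
    pixels: Tuple[int, ...]  # flattened grayscale values (illustrative only)


@dataclass
class StateImageSketch:
    name: str
    patterns: List[PatternSketch] = field(default_factory=list)


def score(region: Tuple[int, ...], pattern: PatternSketch) -> float:
    """Similarity in [0.0, 1.0] based on mean absolute pixel difference."""
    diff = sum(abs(a - b) for a, b in zip(region, pattern.pixels))
    return 1.0 - diff / (255 * len(pattern.pixels))


def best_pattern(region: Tuple[int, ...], state_image: StateImageSketch,
                 threshold: float = 0.85) -> Optional[Tuple[str, float]]:
    """Return (pattern name, score) of the best variation, or None."""
    scored = [(p.name, score(region, p)) for p in state_image.patterns]
    if not scored:
        return None
    name, best = max(scored, key=lambda item: item[1])
    return (name, best) if best >= threshold else None


save_button = StateImageSketch("save_button", [
    PatternSketch("light_theme", (240, 240, 30, 240)),
    PatternSketch("dark_theme", (20, 20, 220, 20)),
])

# What is currently on screen: the dark-theme button, slightly off in two pixels.
dark_region = (25, 20, 215, 20)
result = best_pattern(dark_region, save_button)
print(result[0])  # dark_theme
```

Neither pattern has to match every appearance; the element counts as found when any one variation clears the threshold.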

Masks

Optional masks that define which pixels to compare

Masks allow you to ignore certain parts of an image during matching. White pixels (255) are compared, black pixels (0) are ignored. Use masks to:

• Ignore dynamic text within buttons
• Match partial elements (e.g., icon only, not surrounding area)
• Handle elements with variable content
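The effect of a mask can be shown with a small scoring function that skips masked-out pixels. This is a sketch under assumptions: `masked_similarity` is a hypothetical helper, images are grayscale lists of rows, and any mask value other than 255 is treated as "ignore" (the text above only defines 255 and 0).

```python
from typing import List

Image = List[List[int]]


def masked_similarity(region: Image, pattern: Image, mask: Image) -> float:
    """Compare only the pixels where the mask is white (255)."""
    total_diff = 0
    compared = 0
    for region_row, pattern_row, mask_row in zip(region, pattern, mask):
        for r, p, m in zip(region_row, pattern_row, mask_row):
            if m == 255:  # white: compare; anything else: ignore (assumption)
                total_diff += abs(r - p)
                compared += 1
    if compared == 0:
        raise ValueError("mask excludes every pixel")
    return 1.0 - total_diff / (255 * compared)


# A button whose middle column holds dynamic text.
pattern = [[10, 200, 10], [10, 200, 10]]
region = [[10, 0, 10], [10, 0, 10]]   # same button, different text

ignore_text = [[255, 0, 255], [255, 0, 255]]
compare_all = [[255, 255, 255], [255, 255, 255]]

masked = masked_similarity(region, pattern, ignore_text)
unmasked = masked_similarity(region, pattern, compare_all)
print(masked)              # 1.0
print(round(unmasked, 3))  # 0.739
```

With the text column masked out the button matches perfectly; without the mask the same region would fall below a 0.85 threshold.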

Search Regions

Rectangular areas that limit where to search for images

Search regions improve performance and accuracy by restricting template matching to specific screen areas. Benefits include:

• Faster execution (less area to search)
• Avoid false matches from similar elements elsewhere
• Multi-monitor support (search specific monitor)
• Focus on relevant UI sections (sidebars, toolbars, content areas)
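A search region amounts to cropping the screen before matching and then translating the result back to full-screen coordinates. The sketch below assumes that model; `Region`, `crop`, `find_exact`, and `find_in_region` are hypothetical names, and exact pixel equality is used instead of a similarity score to keep the example short.

```python
from typing import List, NamedTuple, Optional, Tuple

Image = List[List[int]]


class Region(NamedTuple):
    x: int
    y: int
    width: int
    height: int


def crop(screen: Image, region: Region) -> Image:
    rows = screen[region.y:region.y + region.height]
    return [row[region.x:region.x + region.width] for row in rows]


def find_exact(image: Image, pattern: Image) -> Optional[Tuple[int, int]]:
    """Return (x, y) of the first exact occurrence, scanning top to bottom."""
    ph, pw = len(pattern), len(pattern[0])
    for top in range(len(image) - ph + 1):
        for left in range(len(image[0]) - pw + 1):
            if all(image[top + r][left:left + pw] == pattern[r] for r in range(ph)):
                return (left, top)
    return None


def find_in_region(screen: Image, pattern: Image,
                   region: Region) -> Optional[Tuple[int, int]]:
    local = find_exact(crop(screen, region), pattern)
    if local is None:
        return None
    # Translate region-local coordinates back to screen coordinates.
    return (local[0] + region.x, local[1] + region.y)


# The same two-pixel icon appears twice: top-left and bottom-right.
screen = [
    [7, 9, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 7, 9],
]
icon = [[7, 9]]

whole_screen_hit = find_exact(screen, icon)
right_half = Region(x=3, y=0, width=3, height=3)
region_hit = find_in_region(screen, icon, right_half)
print(whole_screen_hit)  # (0, 0)
print(region_hit)        # (4, 2)
```

Searching the whole screen returns the first occurrence, which may be the wrong one; restricting the search to the right half both shrinks the area scanned and selects the intended icon.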

Best Practices for Capturing Images

Crop tightly to the essential element

Capture just the button, icon, or label, not the entire screen or large surrounding areas. Smaller images match faster and more accurately.

Use PNG format for best quality

PNG preserves exact pixel values without compression artifacts. JPEG compression can cause template matching failures.

Capture at the same resolution you'll run automation

If possible, capture screenshots at the same screen resolution and DPI settings where automation will execute. Enable multi-scale search if resolution varies.

Include unique visual features

Capture elements with distinctive colors, shapes, or text. Avoid generic elements that appear multiple times on screen.

Use multiple patterns for dynamic elements

If an element changes appearance (theme, language, state), capture multiple patterns and add them all to the StateImage.

Avoid blurry or low-contrast images

Ensure screenshots are crisp and clear. Motion blur, poor lighting, or low contrast reduces matching accuracy.

Don't capture animated elements mid-animation

Wait for animations to complete before capturing. Animated elements should be captured in their stable, final state.

Be careful with text-based elements

Text can vary by font rendering, anti-aliasing, and locale. Consider using masks to ignore text or capture icon-only portions.

Troubleshooting Recognition Issues

Problem: Image not found (false negative)

Possible Solutions:

• Lower the similarity threshold (try 0.8 instead of 0.9)
• Check if the UI element has changed appearance slightly
• Verify the pattern image matches what's currently on screen
• Add a search region if searching the entire screen
• Enable multi-scale search if resolution differs
• Capture a new screenshot if the element has been updated

Problem: Wrong element matched (false positive)

Possible Solutions:

• Raise the similarity threshold (try 0.95 instead of 0.85)
• Capture a more specific image with unique visual features
• Use search regions to limit matching to the correct screen area
• Make the pattern image larger to include more context
• Avoid matching very generic elements (plain buttons, white boxes)
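A quick way to see why generic patterns cause false positives is to count how many screen positions clear the threshold. The sketch below is a diagnostic idea rather than a Qontinui feature; `all_matches` is a hypothetical helper using a mean-absolute-difference score on grayscale lists.

```python
from typing import List, Tuple

Image = List[List[int]]


def all_matches(screen: Image, pattern: Image,
                threshold: float) -> List[Tuple[int, int, float]]:
    """Return every (x, y, score) whose score meets the threshold."""
    ph, pw = len(pattern), len(pattern[0])
    hits = []
    for top in range(len(screen) - ph + 1):
        for left in range(len(screen[0]) - pw + 1):
            diff = sum(
                abs(screen[top + r][left + c] - pattern[r][c])
                for r in range(ph)
                for c in range(pw)
            )
            score = 1.0 - diff / (255 * ph * pw)
            if score >= threshold:
                hits.append((left, top, score))
    return hits


# Two near-white boxes side by side; only the left one is the real target.
screen = [
    [255, 255, 0, 250, 250],
    [255, 255, 0, 250, 250],
]
plain_box = [[255, 255], [255, 255]]

standard_hits = all_matches(screen, plain_box, 0.85)
strict_hits = all_matches(screen, plain_box, 0.95)
exact_hits = all_matches(screen, plain_box, 0.99)
print(len(standard_hits), len(strict_hits), len(exact_hits))  # 2 2 1
```

In this example the off-white box scores about 0.98, so raising the threshold to 0.95 is not enough to exclude it; only a near-exact threshold or a more distinctive pattern leaves a single candidate.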

Problem: Matching is too slow

Possible Solutions:

• Use search regions to limit the search area
• Make pattern images smaller (crop to essential features)
• Disable multi-scale search if resolution doesn't vary
• Switch to grayscale color space (faster but less accurate)
• Mark static elements as fixed=true for optimization

Problem: Works on one machine but not another

Possible Solutions:

• Verify screen resolution and DPI settings match
• Enable multi-scale search for resolution independence
• Check if theme or appearance settings differ (light/dark mode)
• Add multiple patterns for each visual variation
• Lower similarity threshold to tolerate minor rendering differences

Advanced Recognition Settings

multi_scale_search

Boolean

Search for images at multiple scales/resolutions. Enables resolution-independent matching but is significantly slower, because the search is repeated at each scale.

Default: true
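As a rough picture of what multi-scale search involves, the sketch below resizes one pattern to several scales with nearest-neighbor sampling. The scale set (0.5, 1.0, 2.0), the `resize_nearest` and `scaled_patterns` helpers, and the resampling method are all assumptions; the scales and interpolation Qontinui actually uses are not stated here.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]


def resize_nearest(image: Image, scale: float) -> Image:
    """Resize a grayscale image by sampling the nearest source pixel."""
    src_h, src_w = len(image), len(image[0])
    dst_h = max(1, round(src_h * scale))
    dst_w = max(1, round(src_w * scale))
    return [
        [image[min(src_h - 1, int(r / scale))][min(src_w - 1, int(c / scale))]
         for c in range(dst_w)]
        for r in range(dst_h)
    ]


def scaled_patterns(pattern: Image,
                    scales: Sequence[float] = (0.5, 1.0, 2.0)
                    ) -> List[Tuple[float, Image]]:
    """One resized copy of the pattern per scale (hypothetical scale set)."""
    return [(scale, resize_nearest(pattern, scale)) for scale in scales]


pattern = [[10, 20], [30, 40]]
variants = scaled_patterns(pattern)
for scale, scaled in variants:
    print(scale, len(scaled[0]), "x", len(scaled))
# 0.5 1 x 1
# 1.0 2 x 2
# 2.0 4 x 4
```

Each resized copy needs its own full matching pass over the screen, so the cost grows with the number of scales tried.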

color_space

String

Color space for comparison: 'rgb' (most accurate), 'grayscale' (fastest), 'hsv' (lighting independent).

Default: rgb
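The trade-offs between the three color spaces can be illustrated with Python's standard `colorsys` module. This is a sketch, not Qontinui's conversion code: `to_grayscale` uses the common ITU-R BT.601 luma weights as an assumption, and the sample colors are made up.

```python
import colorsys
from typing import Tuple

RGB = Tuple[int, int, int]


def to_grayscale(rgb: RGB) -> int:
    """Single-channel value using BT.601 luma weights (assumed convention)."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def to_hsv(rgb: RGB) -> Tuple[float, float, float]:
    """Hue, saturation, value, each in [0.0, 1.0]."""
    r, g, b = (channel / 255 for channel in rgb)
    return colorsys.rgb_to_hsv(r, g, b)


# Grayscale is cheaper (one channel) but can confuse different colors.
red = (255, 0, 0)
green = (0, 130, 0)
print(to_grayscale(red), to_grayscale(green))  # 76 76

# HSV separates color from brightness: the same red under dimmer lighting
# keeps its hue and saturation, and only the value channel changes.
bright = (200, 40, 40)
dim = (100, 20, 20)
bright_hsv = to_hsv(bright)
dim_hsv = to_hsv(dim)
print(bright_hsv[0] == dim_hsv[0])  # True (same hue)
```

This is the sense in which 'grayscale' trades accuracy for speed and 'hsv' tolerates lighting changes, while 'rgb' compares all three channels directly.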

edge_detection

Boolean

Use edge detection preprocessing. Helps when lighting conditions vary but may reduce accuracy for color-based matching.

Default: false
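To show why edge preprocessing helps under varying lighting, the sketch below uses a very simple forward-difference gradient. This is a stand-in chosen for brevity, not the edge detector Qontinui uses (which is not specified here); `edge_map` is a hypothetical helper.

```python
from typing import List

Image = List[List[int]]


def edge_map(image: Image) -> Image:
    """Sum of horizontal and vertical forward differences at each pixel."""
    height, width = len(image), len(image[0])
    edges = []
    for r in range(height - 1):
        row = []
        for c in range(width - 1):
            dx = abs(image[r][c + 1] - image[r][c])
            dy = abs(image[r + 1][c] - image[r][c])
            row.append(dx + dy)
        edges.append(row)
    return edges


# A bright square on a dark background, then the same scene uniformly brighter.
original = [
    [10, 10, 10],
    [10, 90, 90],
    [10, 90, 90],
]
brighter = [[value + 40 for value in row] for row in original]

original_edges = edge_map(original)
print(original_edges)                        # [[0, 80], [80, 0]]
print(original_edges == edge_map(brighter))  # True
```

A uniform brightness shift changes every raw pixel value but leaves the edge map identical. The cost is that two elements with the same shape and different colors can produce similar edges, which is the accuracy loss mentioned above.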

default_threshold

Float (0.0-1.0)

Global default threshold for all images. Individual StateImages can override this value.

Default: 0.85
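The override behavior described for default_threshold can be sketched as a simple fallback. The `StateImageSettings` class and `effective_threshold` function are hypothetical names for illustration; only the 0.85 default and the 0.0-1.0 range come from the settings above.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_THRESHOLD = 0.85  # global default from the settings above


@dataclass
class StateImageSettings:
    name: str
    threshold: Optional[float] = None  # None means "use the global default"


def effective_threshold(image: StateImageSettings,
                        default: float = DEFAULT_THRESHOLD) -> float:
    """Per-image threshold if set, otherwise the global default."""
    value = default if image.threshold is None else image.threshold
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"threshold for {image.name!r} must be in 0.0-1.0")
    return value


logo = StateImageSettings("company_logo", threshold=0.95)  # strict override
save_button = StateImageSettings("save_button")            # inherits default

print(effective_threshold(logo))         # 0.95
print(effective_threshold(save_button))  # 0.85
```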

Note: Most users should keep default settings. Only adjust these if you're experiencing specific recognition issues or need to optimize performance.

Next Steps

Working with States

Learn how to use images for state identification

Action Types

Use FIND, EXISTS, and VANISH actions for recognition
