WebRobot is designed to be extensible so teams can ship vertical capabilities without forking the core engine.
This guide explains what can be extended and how to plan a supported plugin integration, while intentionally not exposing core engine implementation details.
Important (confidentiality / stability): We do not publish implementation-level examples yet. Until the plugin system can target a more abstract, stable integration interface, this documentation stays at the conceptual contract level (what you can do + how it behaves from the user/API perspective).
Pipeline plugins extend what a YAML pipeline can do by adding:
- Stages: new `stage: <name>` entries usable in `pipeline:`
- Attribute resolvers: new `method: <name>` resolvers usable inside `extract`/`flatSelect`
- Custom actions: new `fetch.traces[].action` entries (browser/action layer)
This is the mechanism used to add domain primitives (e.g. image scoring, clustering, specialized parsers), while keeping pipelines declarative.
API plugins add new REST endpoints that wrap and productize pipelines.
Typical responsibilities:
- expose a simplified API (e.g. `upload/execute/status/query/images`)
- orchestrate jobs and handle scheduling
- apply tenant/org rules and credentials injection
- provide domain-specific validation and defaults
Example: the EAN plugin (see guides/ean-image-sourcing.md).
Python extensions enable controlled, rapid iteration by registering `python_row_transform:<name>` functions at runtime. They are ideal for data cleaning, normalization, and enrichment logic that changes frequently.
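As a minimal sketch of how a registered function might appear from the pipeline side, assume the `python_row_transform:<name>` identifier is referenced directly in a `stage:` entry. Both that usage and the function name `normalize_price` are assumptions made for illustration only.

```yaml
# Illustrative sketch only: `normalize_price` is a hypothetical registered function,
# and referencing it through `stage:` is an assumption, not a documented syntax.
pipeline:
  - stage: "python_row_transform:normalize_price"
    args: ["price"]   # hypothetical argument: the field to normalize
```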
User-facing contract for stages:
- a stage has a stable name
- it accepts a list of args
- it transforms the current dataset or navigation plan
Documentation requirements:
- stage name + supported aliases
- `args` schema (positional / map), defaults, validation rules
- input/output schema changes
- operational constraints (requires browser, requires credentials, rate limits, etc.)
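At the YAML level, the stage contract above might look like the following minimal sketch. The stage name `image_score`, its arguments, and the exact shape of the map form are hypothetical placeholders, not names or layouts shipped by any plugin.

```yaml
# Illustrative sketch only: `image_score` and its args are hypothetical.
pipeline:
  # Positional args: order and meaning would be defined by the plugin's docs.
  - stage: image_score
    args: ["image_url", 0.8]

  # Map-style args (assumed shape): named options with documented defaults.
  - stage: image_score
    args:
      - { field: "image_url", min_score: 0.8 }
```

Documenting both forms side by side makes it clear which options are positional and which have defaults.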
User-facing contract for attribute resolvers:
- used from `extract`/`flatSelect` as `method: "<resolver>"`
- may accept optional `args: [...]`
- returns a value (scalar/string/number/map/list) assigned to `as: "<field>"`
Documentation requirements:
- resolver name + expected input (selector vs field)
- output type(s)
- optional args + examples at YAML level (no code)
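A YAML-level example for a resolver could look like the sketch below. The resolver name `image_quality`, the `selector` key, and the way `extract` takes its entries are assumptions for illustration; only `method`, `args`, and `as` come from the contract above.

```yaml
# Illustrative sketch only: resolver name, selector, and field names are hypothetical.
pipeline:
  - stage: extract
    args:
      - selector: "img.product-photo"   # assumed input key (selector-based resolver)
        method: "image_quality"         # hypothetical plugin-provided resolver
        args: ["strict"]                # optional resolver args
        as: "photo_quality"             # field that receives the returned value
```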
User-facing contract for custom actions:
- used under `fetch.traces` as `{ action: "<name>", params: { ... } }`
- executed in order before the pipeline starts (or as part of navigation flows)
Documentation requirements:
- action name
- required/optional `params`
- safety considerations (timeouts, idempotency, rate limiting)
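The trace contract can be sketched as follows. The action names, their `params`, and the `url` key are hypothetical and shown only to illustrate ordering and explicit timeouts.

```yaml
# Illustrative sketch only: action names and params are hypothetical.
fetch:
  url: "https://example.com/catalog"    # assumed key, included for context
  traces:
    # Actions run in the order listed.
    - { action: "accept_cookies", params: { timeout_ms: 5000 } }
    - { action: "scroll_to_bottom", params: { max_scrolls: 10, pause_ms: 500 } }
```

Bounding each action with an explicit timeout or iteration limit is one way to address the safety considerations listed above.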
User-facing contract for API plugins:
- stable endpoint paths under `/webrobot/api/<plugin>/...`
- request/response schemas in OpenAPI
- auth scopes and tenant isolation
- support for CloudCredentials selection/injection (where relevant)
Recommended endpoint set (pattern):
- `POST .../upload` (ingest data)
- `POST .../execute` (run a job)
- `GET .../status` (observe last run)
- `POST .../query` (query latest dataset / filtered retrieval)
- `POST .../download` or "dataset discovery + storagePath" pattern
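To make the endpoint pattern concrete, here is a minimal OpenAPI 3 sketch for two of the endpoints. The plugin name `example`, the scope names, the token URL, and the response codes are hypothetical; OAuth2 is assumed only so that scopes have a concrete form.

```yaml
# Illustrative OpenAPI 3 fragment: plugin name, scopes, and responses are hypothetical.
openapi: 3.0.3
info:
  title: Example plugin API (sketch)
  version: 0.1.0
paths:
  /webrobot/api/example/execute:
    post:
      summary: Run a job
      security:
        - oauth2: ["example:execute"]   # write-side scope
      responses:
        "202":
          description: Job accepted
  /webrobot/api/example/status:
    get:
      summary: Observe the last run
      security:
        - oauth2: ["example:read"]      # read-side scope
      responses:
        "200":
          description: Status of the last run
components:
  securitySchemes:
    oauth2:
      type: oauth2
      flows:
        clientCredentials:
          tokenUrl: https://auth.example.com/token
          scopes:
            example:execute: Start jobs and upload data
            example:read: Read status and query datasets
```

Giving read and execute operations distinct scopes keeps query-only clients from starting jobs.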
- Semantic versioning: bump minor for backward-compatible additions; bump major for breaking changes.
- Stable names: treat `stage`, resolver `method`, and trace `action` names as public API.
- Deprecation: keep deprecated aliases for at least one minor cycle and document migration.
- No secret leakage: all credentials must be injected via CloudCredentials/secure runtime mechanisms; never hardcode keys in YAML/docs.
- Data licensing: plugins must document expected data sources and required rights (especially for training/fit datasets).
- Least privilege: enforce scopes for plugin endpoints; separate read/query from execute/upload.
- Implementation confidentiality: do not publish internal class names, registry wiring, or engine internals until the abstract integration interface is stable.
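The versioning and deprecation rules can be illustrated with a hypothetical rename. Assume a plugin release 1.4.0 introduces the stage name `image_score` and keeps the older `img_score` as a deprecated alias; the names and version numbers are invented for this sketch.

```yaml
# Hypothetical timeline: 1.4.0 adds `image_score` (backward compatible, minor bump)
# and keeps `img_score` as a deprecated alias. Removing the alias would be a
# breaking change, so it would require a major bump.

# Before migration: still accepted on 1.4.x through the alias.
pipeline:
  - stage: img_score
    args: ["image_url"]
---
# After migration: the documented replacement name.
pipeline:
  - stage: image_score
    args: ["image_url"]
```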
- Stage syntax and YAML constraints: `guides/pipeline-stages.md`
- Runnable pipelines: `guides/pipeline-examples.md`
- EAN plugin (API + dataset/images retrieval): `guides/ean-image-sourcing.md`
- Partner/technical integration overview: `guides/technical-partners.md`