Module Resolution Specification (MRS)

1. Introduction

The [ASIMOV] Platform is a polyglot development platform for trustworthy, neurosymbolic AI.

This specification defines the algorithm for resolving data source URIs to ASIMOV modules using pattern matching. The resolution process enables the platform to automatically discover and select modules that are capable of extracting and transforming specific data sources into knowledge graph datasets.

1.1. Overview

The ASIMOV Module Resolution Specification defines a standardized algorithm for matching URIs against module capability declarations to determine which modules can handle specific resources. The resolution process enables:

Automatic Module Selection: Given a URI, the platform can automatically select appropriate modules for processing
Pattern-Based Matching: Supports exact matches, prefixes, and parameterized patterns for flexible resource handling
Conflict Resolution: Provides deterministic rules for selecting modules when multiple candidates are available
Extensible Architecture: Allows modules to declare new resource types and patterns

1.2. Scope

This specification covers:

The URI tokenization and normalization process
Pattern matching algorithms for different handler types
Module selection and conflict resolution rules
The data structures and state machines used in resolution

This specification does not cover:

The format of module manifests (see [ASIMOV-MMS])
Runtime execution of selected modules
Inter-module communication protocols

1.3. Conformance

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

A conforming resolver implements the resolution algorithm defined in this specification and, for every valid input URI, returns exactly the modules that the algorithm prescribes.

2. Resolution Algorithm

2.1. URI Tokenization

The resolution process begins with tokenizing the input URI into a sequence of sections that can be matched against module patterns.

2.1.1. Tokenization Process

Given a URI, the tokenizer MUST:

Extract the scheme: The protocol portion before the first colon (e.g., https, file, near)
Parse the authority: For hierarchical URIs, extract and reverse the domain components
Extract path segments: Split the path on forward slashes, ignoring empty segments
Extract query parameters: Parse query string into name-value pairs

2.1.2. Section Types

The tokenizer produces the following section types:

Protocol

The URI scheme (e.g., https, file, near)

Domain

A single domain component; components are emitted in reverse order, TLD first (e.g., com, then example, from example.com)

Path

A single path segment (e.g., search, users from /search/users)

QueryParamName

The name of a query parameter (e.g., q from ?q=value)

QueryParamValue

The value of a query parameter (e.g., value from ?q=value)

2.1.3. Normalization Rules

During tokenization, the following normalization rules MUST be applied:

www Removal: For HTTP/HTTPS URIs, remove leading www. from the domain
Domain Reversal: Domain components are stored in reverse order (TLD first)
Empty Segment Filtering: Empty path segments are ignored
Query Parameter Ordering: Query parameters are processed in the order they appear

2.1.4. Tokenization Examples

# Input: https://example.com/search?q=test
# Output: [Protocol("https"), Domain("com"), Domain("example"),
#          Path("search"), QueryParamName("q"), QueryParamValue("test")]

# Input: near://account/alice.near
# Output: [Protocol("near"), Path("account"), Path("alice.near")]

# Input: file:///path/to/file.txt
# Output: [Protocol("file"), Path("path"), Path("to"), Path("file.txt")]
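
The tokenization steps and normalization rules above can be sketched in Python as follows. This is a minimal, non-normative illustration: the function name tokenize and the (type, value) tuple representation are assumptions, and treating the authority of a non-HTTP URI as a leading path segment is inferred from the near:// example rather than stated by this specification.

```python
from urllib.parse import parse_qsl, urlsplit


def tokenize(uri: str) -> list[tuple[str, str]]:
    """Split a URI into (section type, value) pairs."""
    parts = urlsplit(uri)
    if not parts.scheme:
        raise ValueError(f"URI has no scheme: {uri!r}")
    sections = [("Protocol", parts.scheme)]
    path = parts.path
    if parts.scheme in ("http", "https"):
        host = parts.hostname or ""
        if host.startswith("www."):
            host = host[len("www."):]  # www removal
        # Domain reversal: TLD first.
        sections += [("Domain", label) for label in reversed(host.split(".")) if label]
    elif parts.netloc:
        # Assumption: a non-HTTP authority becomes the first path segment.
        path = f"{parts.netloc}/{path}"
    # Empty segment filtering.
    sections += [("Path", segment) for segment in path.split("/") if segment]
    # parse_qsl keeps query parameters in their order of appearance.
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        sections += [("QueryParamName", name), ("QueryParamValue", value)]
    return sections
```

Note that parse_qsl also percent-decodes names and values; whether a resolver should decode before matching is not settled by this section.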

2.2. Pattern Types

Modules can declare different types of patterns for matching URIs:

2.2.1. Protocol Patterns

Protocol patterns match URIs based on their scheme. A protocol pattern matches any URI whose scheme equals the declared protocol, regardless of what follows, so it effectively acts as a prefix match.

Example:

handles:
  url_protocols:
    - near
    - ipfs

2.2.2. Prefix Patterns

Prefix patterns match URIs that begin with a specific prefix. The matching is exact up to the end of the declared prefix, and any additional path segments or query parameters are ignored.

Example:

handles:
  url_prefixes:
    - https://api.github.com/
    - https://example.com/api/v1/
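
One way to read protocol and prefix patterns is as a comparison over tokenized sections: the pattern matches when its sections form the leading run of the URI's sections. The sketch below assumes that reading; has_prefix and the hand-written section lists are illustrative names, not part of this specification.

```python
Section = tuple[str, str]  # (section type, value)


def has_prefix(prefix: list[Section], sections: list[Section]) -> bool:
    """True if `sections` starts with every section of `prefix`, in order."""
    return sections[: len(prefix)] == prefix


# Tokenized form of the prefix https://api.github.com/
github_prefix = [
    ("Protocol", "https"), ("Domain", "com"), ("Domain", "github"), ("Domain", "api"),
]
# A protocol pattern behaves like a prefix that is one section long.
near_protocol = [("Protocol", "near")]

# Tokenized form of https://api.github.com/users
github_users = github_prefix + [("Path", "users")]
# Tokenized form of near://tx/ABC123
near_tx = [("Protocol", "near"), ("Path", "tx"), ("Path", "ABC123")]

print(has_prefix(github_prefix, github_users))  # True
print(has_prefix(near_protocol, near_tx))       # True
print(has_prefix(github_prefix, near_tx))       # False
```

Because domain components are stored in reverse order, this comparison would also accept a deeper subdomain of api.github.com; whether that is intended is not settled by the text above.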

2.2.3. Parameterized Patterns

Parameterized patterns match URIs with variable components, allowing extraction of parameters from the URI structure.

2.2.3.1. Pattern Syntax

Parameterized patterns use the following syntax:

* in domain position: Matches zero or more subdomains
:name in path position: Matches any single path segment
:name in query value position: Matches any query parameter value

Example:

handles:
  url_patterns:
    - https://*.example.com/users/:id
    - https://search.example.com/?q=:query

2.2.3.2. Wildcard Matching

Wildcard domain patterns (*) match zero or more subdomain components. This enables matching of URIs with varying numbers of subdomains.

Wildcard path patterns (:name) match exactly one path segment with any value.

Wildcard query patterns (:name) match any value for a specific query parameter name.
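
The wildcard rules can be illustrated with a small recursive matcher that also returns the captured parameters. This is a sketch under assumptions: the section names WildcardDomain, WildcardPath, and WildcardQueryParamValue follow the matching function in 2.4.2, the function name match_pattern is hypothetical, and requiring the whole input to be consumed is a choice of this sketch rather than a rule stated here.

```python
from typing import Optional

Section = tuple[str, str]  # (section type, value or parameter name)


def match_pattern(pattern: list[Section], sections: list[Section]) -> Optional[dict[str, str]]:
    """Return captured parameters if `sections` matches `pattern`, else None."""
    if not pattern:
        return {} if not sections else None
    head, rest = pattern[0], pattern[1:]
    kind, name = head
    if kind == "WildcardDomain":
        # Zero subdomains: skip the wildcard entirely.
        result = match_pattern(rest, sections)
        if result is not None:
            return result
        # One more subdomain: consume a Domain section and stay on the wildcard.
        if sections and sections[0][0] == "Domain":
            return match_pattern(pattern, sections[1:])
        return None
    if not sections:
        return None
    given_kind, given_value = sections[0]
    if kind == "WildcardPath" and given_kind == "Path":
        captured = {name: given_value}
    elif kind == "WildcardQueryParamValue" and given_kind == "QueryParamValue":
        captured = {name: given_value}
    elif head == sections[0]:
        captured = {}
    else:
        return None
    tail = match_pattern(rest, sections[1:])
    return None if tail is None else {**captured, **tail}


# Tokenized form of the pattern https://*.example.com/users/:id
users_pattern = [
    ("Protocol", "https"), ("Domain", "com"), ("Domain", "example"),
    ("WildcardDomain", "*"), ("Path", "users"), ("WildcardPath", "id"),
]
# Tokenized form of https://api.example.com/users/42
api_users = [
    ("Protocol", "https"), ("Domain", "com"), ("Domain", "example"),
    ("Domain", "api"), ("Path", "users"), ("Path", "42"),
]
print(match_pattern(users_pattern, api_users))  # {'id': '42'}
```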

2.2.4. File Extension Patterns

File extension patterns match URIs with the file scheme based on the file extension. The extension is extracted from the last path segment and may span several dot-separated parts (e.g., tar.gz).

Example:

handles:
  file_extensions:
    - csv
    - json
    - tar.gz
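
Because a declared extension such as tar.gz spans more than one dot-separated part, a resolver cannot simply take the text after the last dot. The sketch below tries every suffix of the file name, longest first; the function names and the shape of the extension index are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def candidate_extensions(uri: str) -> list[str]:
    """List the extensions of a file URI's last path segment, longest first."""
    parts = urlsplit(uri)
    if parts.scheme != "file":
        return []
    filename = parts.path.rsplit("/", 1)[-1]
    pieces = filename.split(".")
    # "archive.tar.gz" yields "tar.gz" and then "gz"; pieces[0] is the base name.
    return [".".join(pieces[i:]) for i in range(1, len(pieces))]


def modules_for_file(uri: str, index: dict[str, list[str]]) -> list[str]:
    """Look up every candidate extension in a file extension index."""
    found: list[str] = []
    for extension in candidate_extensions(uri):
        for module in index.get(extension, []):
            if module not in found:
                found.append(module)
    return found


# Hypothetical index built from the declaration above.
index = {"csv": ["csv-module"], "json": ["csv-module"], "tar.gz": ["csv-module"]}
print(candidate_extensions("file:///archive.tar.gz"))     # ['tar.gz', 'gz']
print(modules_for_file("file:///archive.tar.gz", index))  # ['csv-module']
```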

2.3. Resolution State Machine

The resolution algorithm uses a finite state machine to track possible matches as it processes the tokenized URI.

2.3.1. State Representation

Each state in the resolution process is represented by a node that contains:

Transitions: A mapping from pattern sections to destination nodes
Modules: A set of modules that can handle URIs reaching this state
Free Moves: Special transitions that match any input without consuming it

2.3.2. State Transitions

The state machine processes input sections sequentially, following these rules:

Start with root states: Initialize with all root nodes whose patterns match the first input section
Process remaining input: For each subsequent input section, find all reachable states
Follow free moves: After each transition, follow any available free move transitions
Collect results: Gather all modules from states reached after processing all input

2.3.3. Free Move Semantics

Free moves are special transitions that enable:

Prefix matching: Allowing additional path segments beyond the declared prefix
Protocol matching: Treating protocols as prefixes that match any URI with that scheme
Wildcard domain repetition: Enabling * patterns to match multiple subdomain levels
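
The state machine described in this section can be sketched as a set of active node indices that is advanced one input section at a time. The sketch makes several assumptions: the names Node, step, and run are illustrative; only literal sections are handled; and a free move is read as a transition that accepts any input section and lands on the free-move target (for prefixes and protocols, the node itself). The text above leaves the exact mechanics of free moves open, so this is one possible reading.

```python
from dataclasses import dataclass, field
from typing import Optional

Section = tuple[str, str]  # (section type, value), e.g. ("Path", "users")


@dataclass
class Node:
    transitions: dict[Section, int] = field(default_factory=dict)
    modules: set[str] = field(default_factory=set)
    free_move: Optional[int] = None  # node entered on any input section


def step(nodes: list[Node], active: set[int], section: Section) -> set[int]:
    """Advance every active state by one input section."""
    reached: set[int] = set()
    for index in active:
        node = nodes[index]
        if section in node.transitions:
            reached.add(node.transitions[section])
        if node.free_move is not None:
            # As written, this also accepts further Domain sections.
            reached.add(node.free_move)
    return reached


def run(nodes: list[Node], roots: dict[Section, int], sections: list[Section]) -> set[str]:
    """Collect the modules of all states reached after the whole input."""
    if not sections or sections[0] not in roots:
        return set()
    active = {roots[sections[0]]}
    for section in sections[1:]:
        active = step(nodes, active, section)
        if not active:
            break
    return set().union(*(nodes[i].modules for i in active))


# Hand-built machine for the prefix https://api.github.com/ and the protocol near.
nodes = [
    Node(transitions={("Domain", "com"): 1}),
    Node(transitions={("Domain", "github"): 2}),
    Node(transitions={("Domain", "api"): 3}),
    Node(modules={"github-module"}, free_move=3),
    Node(modules={"near-module"}, free_move=4),
]
roots = {("Protocol", "https"): 0, ("Protocol", "near"): 4}
```

With this machine, the tokenized form of https://api.github.com/users ends in node 3 and yields github-module, while https://github.com/ stops in node 2 and yields no modules.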

2.4. Resolution Process

2.4.1. Input Processing

The resolution process follows these steps:

Tokenize URI: Convert the input URI into a sequence of sections
Handle file extensions: For file:// URIs, check file extension patterns first
Initialize state set: Find all root states that match the first input section
Process input sequence: For each remaining input section, advance the state machine
Collect results: Gather all modules from final states

2.4.2. Matching Rules

Section matching follows these precedence rules:

Exact matches: Literal sections match exactly
Wildcard matches: Wildcard sections match corresponding input types
Free moves: Always match without consuming input

The matching function for sections is defined as:

matches(pattern_section, input_section) :=
  pattern_section == input_section OR
  (pattern_section == WildcardDomain AND input_section is Domain) OR
  (pattern_section == WildcardPath AND input_section is Path) OR
  (pattern_section == WildcardQueryParamValue AND input_section is QueryParamValue) OR
  pattern_section == FreeMove
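
For readers who prefer executable notation, the matching function can be transcribed into Python roughly as follows. The Section class and the WILDCARDS table are assumptions of this sketch; the section names are taken from the definition above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    kind: str        # e.g. "Domain", "Path", "WildcardPath", "FreeMove"
    value: str = ""  # literal value, or parameter name for wildcards


# Which input section type each wildcard accepts.
WILDCARDS = {
    "WildcardDomain": "Domain",
    "WildcardPath": "Path",
    "WildcardQueryParamValue": "QueryParamValue",
}


def matches(pattern: Section, given: Section) -> bool:
    """Transcription of matches(pattern_section, input_section)."""
    if pattern.kind == "FreeMove":
        return True  # matches any input section
    if pattern.kind in WILDCARDS:
        return given.kind == WILDCARDS[pattern.kind]
    return pattern == given  # literal sections match exactly


print(matches(Section("Path", "users"), Section("Path", "users")))   # True
print(matches(Section("WildcardPath", "id"), Section("Path", "42")))  # True
print(matches(Section("WildcardPath", "id"), Section("Domain", "com")))  # False
```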

2.4.3. Conflict Resolution

When multiple modules match a URI, the resolver returns all matching modules. The selection of which module to use for processing is left to higher-level platform components.

However, for informational purposes, the following precedence rules are RECOMMENDED:

Specificity: More specific patterns take precedence over less specific ones
Pattern type precedence: Parameterized patterns > Prefix patterns > Protocol patterns
Path length: Longer paths take precedence over shorter ones
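
A higher-level component that wants to apply the recommended precedence could order the resolver's results as sketched below. The Match record, the pattern type labels, and the use of the module name as a final tie-breaker are assumptions of this sketch; it covers only the pattern type and path length rules, since "specificity" is not defined further here.

```python
from typing import NamedTuple


class Match(NamedTuple):
    module: str
    pattern_type: str  # "pattern", "prefix", or "protocol"
    path_length: int   # number of path sections in the declared pattern


# Parameterized patterns > Prefix patterns > Protocol patterns
TYPE_RANK = {"pattern": 0, "prefix": 1, "protocol": 2}


def rank(matches: list[Match]) -> list[Match]:
    """Order matches from most to least preferred."""
    return sorted(
        matches,
        key=lambda m: (TYPE_RANK[m.pattern_type], -m.path_length, m.module),
    )


# Candidates for https://example.com/api/users with one protocol,
# one prefix, and one parameterized pattern declared.
candidates = [
    Match("Module A", "protocol", 0),
    Match("Module B", "prefix", 0),
    Match("Module C", "pattern", 2),
]
print([m.module for m in rank(candidates)])  # ['Module C', 'Module B', 'Module A']
```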

3. Examples

3.1. Basic Resolution Examples

3.1.1. Protocol Resolution

# Module declares:
handles:
  url_protocols:
    - near

# Resolves:
near://account/alice.near -> [near-module]
near://tx/ABC123 -> [near-module]
near -> [near-module]

3.1.2. Prefix Resolution

# Module declares:
handles:
  url_prefixes:
    - https://api.github.com/

# Resolves:
https://api.github.com/ -> [github-module]
https://api.github.com/users -> [github-module]
https://api.github.com/repos/owner/name -> [github-module]

3.1.3. Pattern Resolution

# Module declares:
handles:
  url_patterns:
    - https://youtube.com/watch?v=:video_id

# Resolves:
https://youtube.com/watch?v=ABC123 -> [youtube-module]

3.2. Advanced Resolution Examples

3.2.1. Wildcard Domains

# Module declares:
handles:
  url_patterns:
    - https://*.example.com/api/:endpoint

# Resolves:
https://example.com/api/users -> [api-module]
https://api.example.com/api/users -> [api-module]
https://v1.api.example.com/api/users -> [api-module]

3.2.2. Multiple Handlers

# Module A declares:
handles:
  url_protocols:
    - https

# Module B declares:
handles:
  url_prefixes:
    - https://example.com/

# Module C declares:
handles:
  url_patterns:
    - https://example.com/api/:endpoint

# Resolution:
https://example.com/api/users -> [Module A, Module B, Module C]
https://example.com/page -> [Module A, Module B]
https://other.com/page -> [Module A]

3.2.3. File Extensions

# Module declares:
handles:
  file_extensions:
    - csv
    - tar.gz

# Resolves:
file:///path/to/data.csv -> [csv-module]
file:///archive.tar.gz -> [csv-module]

3.3. Complex Resolution Scenario

Consider a comprehensive example with multiple module types:

# Search module
name: search-aggregator
handles:
  url_patterns:
    - https://google.com/search?q=:query
    - https://bing.com/search?q=:query

# Social media module
name: social-scraper
handles:
  url_prefixes:
    - https://twitter.com/
    - https://x.com/
  url_patterns:
    - https://youtube.com/watch?v=:video_id

# NEAR module
name: near-integration
handles:
  url_protocols:
    - near
  url_patterns:
    - https://explorer.near.org/accounts/:account

# File processor module
name: data-processor
handles:
  file_extensions:
    - csv
    - json

Resolution results:

https://google.com/search?q=ASIMOV -> [search-aggregator]
https://x.com/username -> [social-scraper]
https://youtube.com/watch?v=ABC123 -> [social-scraper]
near://account/alice.near -> [near-integration]
https://explorer.near.org/accounts/alice.near -> [near-integration]
file:///data/export.csv -> [data-processor]

4. Implementation Considerations

4.1. Data Structures

4.1.1. Resolver State

A conforming resolver implementation MUST maintain:

Module registry: A mapping from module names to module metadata
File extension index: A mapping from file extensions to lists of capable modules
State machine nodes: A collection of nodes representing the resolution state space
Root node registry: A mapping from initial sections to starting nodes

4.1.2. Node Structure

Each node in the state machine MUST contain:

Transition table: A mapping from pattern sections to destination node identifiers
Module set: A collection of modules that can handle URIs reaching this node
Free move target: An optional reference to a node reachable via free move
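
The resolver state and node structure listed above might be laid out as in the following sketch, which also shows how a tokenized pattern could be registered. The class and method names are hypothetical, the module registry and file extension index are omitted for brevity, and giving each prefix a separate tail node for its free move is a design choice of this sketch so that an exact pattern ending at the same node does not inherit prefix behavior.

```python
from dataclasses import dataclass, field
from typing import Optional

Section = tuple[str, str]  # (section type, value)


@dataclass
class Node:
    transitions: dict[Section, int] = field(default_factory=dict)
    modules: set[str] = field(default_factory=set)
    free_move: Optional[int] = None


@dataclass
class Resolver:
    nodes: list[Node] = field(default_factory=list)
    roots: dict[Section, int] = field(default_factory=dict)  # root node registry

    def _new_node(self) -> int:
        self.nodes.append(Node())
        return len(self.nodes) - 1

    def insert(self, sections: list[Section], module: str, prefix: bool = False) -> int:
        """Register a tokenized, non-empty pattern and return its final node."""
        first, rest = sections[0], sections[1:]
        if first not in self.roots:
            self.roots[first] = self._new_node()
        current = self.roots[first]
        for section in rest:
            node = self.nodes[current]
            if section not in node.transitions:
                node.transitions[section] = self._new_node()
            current = node.transitions[section]
        end = self.nodes[current]
        end.modules.add(module)
        if prefix:
            if end.free_move is None:
                # Tail node that loops on itself for any further input.
                end.free_move = self._new_node()
                self.nodes[end.free_move].free_move = end.free_move
            self.nodes[end.free_move].modules.add(module)
        return current


resolver = Resolver()
final = resolver.insert(
    [("Protocol", "https"), ("Domain", "com"), ("Domain", "github"), ("Domain", "api")],
    "github-module",
    prefix=True,
)
```

Patterns that share leading sections share nodes, so registering a second pattern under https and com reuses the first two nodes rather than duplicating them.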

4.1.3. Memory Management

Implementations SHOULD consider:

Shared module references: Avoid duplicating module metadata across nodes
Compact node representation: Use efficient data structures for transition tables
Lazy evaluation: Only compute reachable states when needed

4.2. Performance Considerations

4.2.1. Algorithmic Complexity

The resolution algorithm has the following complexity characteristics:

Time complexity: O(n × m) where n is the number of input sections and m is the maximum number of simultaneously active states
Space complexity: O(k) where k is the total number of registered patterns
Preprocessing: O(p) where p is the number of patterns to register

4.2.2. Optimization Strategies

Implementations MAY employ:

Early termination: Stop processing when no more states are reachable
State deduplication: Merge identical states during construction
Transition caching: Cache frequently used transition computations
Batch processing: Process multiple URIs in batches to amortize setup costs

4.3. Error Handling

4.3.1. Invalid URIs

The resolver MUST handle invalid URIs gracefully:

Malformed URIs: Return an error indicating the URI cannot be parsed
Empty URIs: Return an error indicating the URI is empty
Unsupported schemes: Attempt resolution as usual; the result MAY be empty
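
The three cases above could be separated as in this sketch, which raises an error for empty or unparseable input and returns an empty list for a scheme that no module handles. ResolveError, check_uri, and the modules_by_scheme lookup are hypothetical names used only for illustration; a real resolver would run the full resolution algorithm in place of the lookup.

```python
from urllib.parse import urlsplit


class ResolveError(ValueError):
    """Raised when the input cannot be treated as a URI."""


def check_uri(uri: str) -> str:
    """Return the scheme of a parseable URI or raise ResolveError."""
    if not uri.strip():
        raise ResolveError("URI is empty")
    try:
        parts = urlsplit(uri)
    except ValueError as exc:  # e.g. an unbalanced IPv6 bracket
        raise ResolveError(f"URI cannot be parsed: {exc}") from exc
    if not parts.scheme:
        raise ResolveError("URI cannot be parsed: missing scheme")
    return parts.scheme


def resolve(uri: str, modules_by_scheme: dict[str, list[str]]) -> list[str]:
    """Stand-in for resolution: an unknown scheme yields an empty list."""
    scheme = check_uri(uri)
    return list(modules_by_scheme.get(scheme, []))
```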

4.3.2. Resolution Failures

When no modules can handle a URI:

Return empty result: The resolver SHOULD return an empty list rather than an error
Logging: Implementations MAY log unsuccessful resolution attempts for debugging
Fallback modules: Implementations MAY provide fallback modules for common cases

5. Security Considerations

5.1. Pattern Injection

Implementations MUST prevent pattern injection attacks:

Input validation: Validate all pattern strings before registration
Sanitization: Remove or escape potentially dangerous characters
Pattern limits: Impose reasonable limits on pattern complexity

5.2. Resource Consumption

Implementations of the resolution algorithm MUST protect against resource exhaustion:

State explosion: Limit the number of active states during resolution
Pattern complexity: Impose limits on pattern depth and branching factor
Memory usage: Implement bounds on memory consumption for large pattern sets

5.3. URI Validation

Input URIs SHOULD be validated before processing:

Scheme validation: Ensure schemes conform to RFC 3986
Length limits: Impose reasonable limits on URI length
Character encoding: Handle Unicode characters appropriately
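
A pre-resolution check for the first two points might look like the sketch below. The scheme expression follows the RFC 3986 grammar (a letter followed by letters, digits, "+", "-", or "."); the 2048-character limit and the function name validate_uri are assumptions, not values taken from this specification.

```python
import re

# RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")
MAX_URI_LENGTH = 2048  # hypothetical limit


def validate_uri(uri: str) -> list[str]:
    """Return a list of problems found; an empty list means the URI passed."""
    problems = []
    if len(uri) > MAX_URI_LENGTH:
        problems.append("URI exceeds length limit")
    scheme, separator, _ = uri.partition(":")
    if not separator or not SCHEME_RE.fullmatch(scheme):
        problems.append("scheme does not conform to RFC 3986")
    return problems


print(validate_uri("https://example.com/search?q=test"))  # []
print(validate_uri("1http://example.com/"))  # ['scheme does not conform to RFC 3986']
```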

6. IANA Considerations

This specification does not require any IANA registrations.

7. Acknowledgments

The editors would like to thank the ASIMOV Platform community for their contributions and feedback during the development of this specification.

8. Changes

This section will document changes between versions of this specification.

8.1. Version 1.0

Initial version of the ASIMOV Module Resolution Specification.

ASIMOV Module Resolution Specification (MRS)

Living Standard, 27 June 2025
