ASIMOV Module Resolution Specification (MRS)

Living Standard

This version:
https://asimov-specs.github.io/module-resolution/
Issue Tracking:
GitHub
Editors:
(ASIMOV Systems)
(ASIMOV Systems)

Abstract

Defines the algorithm for resolving data source URIs to ASIMOV modules using pattern matching, enabling automatic discovery and selection of modules capable of processing specific resource types.

1. Introduction

The [ASIMOV] Platform is a polyglot development platform for trustworthy, neurosymbolic AI.

This specification defines the algorithm for resolving data source URIs to ASIMOV modules using pattern matching. The resolution process enables the platform to automatically discover and select modules that are capable of extracting and transforming specific data sources into knowledge graph datasets.

1.1. Overview

The ASIMOV Module Resolution Specification defines a standardized algorithm for matching URIs against module capability declarations to determine which modules can handle specific resources. The resolution process enables:

1.2. Scope

This specification covers:

This specification does not cover:

1.3. Conformance

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

A conforming resolver is one that implements the resolution algorithm defined in this specification and produces correct results for all valid inputs.

2. Resolution Algorithm

2.1. URI Tokenization

The resolution process begins with tokenizing the input URI into a sequence of sections that can be matched against module patterns.

2.1.1. Tokenization Process

Given a URI, the tokenizer MUST:

  1. Extract the scheme: The protocol portion before the first colon (e.g., https, file, near)

  2. Parse the authority: For hierarchical URIs, extract and reverse the domain components

  3. Extract path segments: Split the path on forward slashes, ignoring empty segments

  4. Extract query parameters: Parse query string into name-value pairs

2.1.2. Section Types

The tokenizer produces the following section types:

Protocol

The URI scheme (e.g., https, file, near)

Domain

A single domain component. Components are emitted in reverse order, TLD first (e.g., com, then example, from example.com)

Path

A single path segment (e.g., search, users from /search/users)

QueryParamName

The name of a query parameter (e.g., q from ?q=value)

QueryParamValue

The value of a query parameter (e.g., value from ?q=value)

2.1.3. Normalization Rules

During tokenization, the following normalization rules MUST be applied:

  1. www Removal: For HTTP/HTTPS URIs, remove a leading www. from the domain

  2. Domain Reversal: Domain components are stored in reverse order (TLD first)

  3. Empty Segment Filtering: Empty path segments are ignored

  4. Query Parameter Ordering: Query parameters are processed in the order they appear

2.1.4. Tokenization Examples

# Input: https://example.com/search?q=test
# Output: [Protocol("https"), Domain("com"), Domain("example"),
#          Path("search"), QueryParamName("q"), QueryParamValue("test")]

# Input: near://account/alice.near
# Output: [Protocol("near"), Path("account"), Path("alice.near")]

# Input: file:///path/to/file.txt
# Output: [Protocol("file"), Path("path"), Path("to"), Path("file.txt")]
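
The tokenization steps and normalization rules above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name tokenize and the (type, value) tuple representation are hypothetical. It also assumes that only http and https authorities are split into Domain sections, and that for any other scheme the authority is emitted as a leading Path section, which is how the near:// example above reads. Ports and userinfo are dropped by this sketch.

```python
from urllib.parse import urlsplit, parse_qsl

def tokenize(uri):
    """Split a URI into (section_type, value) tuples (hypothetical representation)."""
    parts = urlsplit(uri)
    sections = [("Protocol", parts.scheme)]
    host = parts.hostname or ""
    if parts.scheme in ("http", "https"):
        # www removal, then domain reversal (TLD first)
        if host.startswith("www."):
            host = host[len("www."):]
        sections += [("Domain", label) for label in reversed(host.split(".")) if label]
    elif host:
        # Assumption: non-HTTP authorities are treated as the first path segment
        sections.append(("Path", host))
    # Empty path segments are ignored
    sections += [("Path", segment) for segment in parts.path.split("/") if segment]
    # Query parameters are emitted in the order they appear
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        sections.append(("QueryParamName", name))
        sections.append(("QueryParamValue", value))
    return sections

print(tokenize("near://account/alice.near"))
# [('Protocol', 'near'), ('Path', 'account'), ('Path', 'alice.near')]
```

Note that parse_qsl percent-decodes names and values; whether a conforming tokenizer decodes them is not stated in this specification.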

2.2. Pattern Types

Modules can declare different types of patterns for matching URIs:

2.2.1. Protocol Patterns

Protocol patterns match URIs based on their scheme. A protocol pattern matches any URI whose scheme is the specified protocol, regardless of what follows, so it effectively acts as a prefix match on the tokenized URI.

Example:

handles:
  url_protocols:
    - near
    - ipfs

2.2.2. Prefix Patterns

Prefix patterns match URIs that begin with a specific prefix. The matching is exact up to the end of the declared prefix, and any additional path segments or query parameters are ignored.

Example:

handles:
  url_prefixes:
    - https://api.github.com/
    - https://example.com/api/v1/
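
One way to read prefix matching is as a comparison on tokenized sections rather than on raw strings. The sketch below assumes that reading. The helper name has_prefix and the (type, value) tuples are hypothetical, and the section lists are tokenized by hand following Section 2.1. A protocol pattern is the special case of a one-section prefix.

```python
def has_prefix(pattern_sections, input_sections):
    """True if the input begins with every section of the declared prefix."""
    n = len(pattern_sections)
    return input_sections[:n] == pattern_sections

# https://api.github.com/ tokenized by hand (domain components reversed)
GITHUB_PREFIX = [
    ("Protocol", "https"), ("Domain", "com"), ("Domain", "github"), ("Domain", "api"),
]
# https://api.github.com/repos/owner/name
uri = GITHUB_PREFIX + [("Path", "repos"), ("Path", "owner"), ("Path", "name")]

print(has_prefix(GITHUB_PREFIX, uri))                # True
print(has_prefix([("Protocol", "https")], uri))      # True (protocol pattern)
print(has_prefix(GITHUB_PREFIX, GITHUB_PREFIX[:2]))  # False (input too short)
```

Under this section-wise reading, a URI on a subdomain of the declared host also begins with the prefix's sections, because reversed domain components come before the path. Whether that is intended is not stated here, so treat it as a property of this sketch only.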

2.2.3. Parameterized Patterns

Parameterized patterns match URIs with variable components, allowing extraction of parameters from the URI structure.

2.2.3.1. Pattern Syntax

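Parameterized patterns use the following syntax:

  1. Wildcard domain (*): In the authority, matches zero or more subdomain components

  2. Named path parameter (:name): As a path segment, matches exactly one path segment with any value

  3. Named query parameter (:name): As a query parameter value, matches any value for that query parameter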

Example:

handles:
  url_patterns:
    - https://*.example.com/users/:id
    - https://search.example.com/?q=:query

2.2.3.2. Wildcard Matching

Wildcard domain patterns (*) match zero or more subdomain components. This enables matching of URIs with varying numbers of subdomains.

Wildcard path patterns (:name) match exactly one path segment with any value.

Wildcard query patterns (:name) match any value for a specific query parameter name.
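
The three wildcard rules can be illustrated with a small recursive matcher over tokenized sections. This is a sketch under stated assumptions: the names matches_pattern and PATTERN and the tuple representation are hypothetical, the pattern must consume the whole input (the specification does not say whether trailing sections are allowed), and parameter extraction is omitted. Because domain components are reversed, the wildcard for *.example.com sits after the literal com and example sections.

```python
def matches_pattern(pattern, sections):
    """Match a list of pattern sections against a list of input sections."""
    if not pattern:
        return not sections  # assumption: no trailing input allowed
    head, rest = pattern[0], pattern[1:]
    if head[0] == "WildcardDomain":
        # Zero subdomain components: skip the wildcard entirely
        if matches_pattern(rest, sections):
            return True
        # One or more: consume one Domain section and keep the wildcard
        return (bool(sections) and sections[0][0] == "Domain"
                and matches_pattern(pattern, sections[1:]))
    if not sections:
        return False
    if head[0] == "WildcardPath":
        ok = sections[0][0] == "Path"
    elif head[0] == "WildcardQueryParamValue":
        ok = sections[0][0] == "QueryParamValue"
    else:
        ok = head == sections[0]  # literal section
    return ok and matches_pattern(rest, sections[1:])

# https://*.example.com/users/:id, tokenized by hand
PATTERN = [
    ("Protocol", "https"), ("Domain", "com"), ("Domain", "example"),
    ("WildcardDomain", None), ("Path", "users"), ("WildcardPath", None),
]
base = [("Protocol", "https"), ("Domain", "com"), ("Domain", "example")]
print(matches_pattern(PATTERN, base + [("Path", "users"), ("Path", "42")]))  # True
print(matches_pattern(PATTERN, base + [("Path", "users")]))                  # False
```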

2.2.4. File Extension Patterns

File extension patterns match URIs with the file:// scheme based on the file extension. The extension is extracted from the last path segment and may consist of several dot-separated parts (e.g., tar.gz).

Example:

handles:
  file_extensions:
    - csv
    - json
    - tar.gz
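
A simple way to support multi-part extensions such as tar.gz is to test each declared extension as a suffix of the last path segment instead of splitting on the final dot. The sketch below assumes that approach and case-sensitive comparison. The function name matching_extensions and the tuple representation are hypothetical.

```python
def matching_extensions(sections, declared_extensions):
    """Return the declared extensions matched by a tokenized file:// URI."""
    if not sections or sections[0] != ("Protocol", "file"):
        return []  # extension patterns only apply to the file scheme
    path_segments = [value for kind, value in sections if kind == "Path"]
    if not path_segments:
        return []
    filename = path_segments[-1]  # extension comes from the last path segment
    return [ext for ext in declared_extensions if filename.endswith("." + ext)]

declared = ["csv", "json", "tar.gz"]
# file:///archive.tar.gz tokenized by hand
print(matching_extensions([("Protocol", "file"), ("Path", "archive.tar.gz")], declared))
# ['tar.gz']
```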

2.3. Resolution State Machine

The resolution algorithm uses a finite state machine to track possible matches as it processes the tokenized URI.

2.3.1. State Representation

Each state in the resolution process is represented by a node that contains:

2.3.2. State Transitions

The state machine processes input sections sequentially, following these rules:

  1. Start with root states: Initialize with all root nodes whose patterns match the first input section

  2. Process remaining input: For each subsequent input section, find all reachable states

  3. Follow free moves: After each transition, follow any available free move transitions

  4. Collect results: Gather all modules from states reached after processing all input
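
The four transition rules can be sketched as a set-of-states walk, in the style of a nondeterministic automaton with free (non-consuming) transitions. Everything named here is hypothetical: the Node fields, the single virtual root standing in for the set of root nodes, and the helper names. For brevity the sketch follows literal edges only; wildcard edges are left out.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # identity-based hashing so nodes can be stored in sets
class Node:
    modules: set = field(default_factory=set)       # modules accepted at this state
    edges: dict = field(default_factory=dict)       # literal section -> next Node
    free_moves: list = field(default_factory=list)  # targets reachable without input

def follow_free_moves(nodes):
    """Return the given nodes plus everything reachable through free moves."""
    reached, pending = set(), list(nodes)
    while pending:
        node = pending.pop()
        if node not in reached:
            reached.add(node)
            pending.extend(node.free_moves)
    return reached

def resolve(root, sections):
    states = follow_free_moves([root])
    for section in sections:
        # Advance every live state on this section, then follow free moves
        advanced = [n.edges[section] for n in states if section in n.edges]
        states = follow_free_moves(advanced)
    modules = set()
    for node in states:
        modules |= node.modules
    return modules

root = Node()
root.edges[("Protocol", "near")] = Node(modules={"near-module"})
print(resolve(root, [("Protocol", "near")]))  # {'near-module'}
print(resolve(root, [("Protocol", "ipfs")]))  # set()
```

Tracking a set of live states means overlapping patterns from different modules are explored together in a single pass over the input.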

2.3.3. Free Move Semantics

Free moves are special transitions that enable:

2.4. Resolution Process

2.4.1. Input Processing

The resolution process follows these steps:

  1. Tokenize URI: Convert the input URI into a sequence of sections

  2. Handle file extensions: For file:// URIs, check file extension patterns first

  3. Initialize state set: Find all root states that match the first input section

  4. Process input sequence: For each remaining input section, advance the state machine

  5. Collect results: Gather all modules from final states

2.4.2. Matching Rules

Section matching follows these precedence rules:

  1. Exact matches: Literal sections match exactly

  2. Wildcard matches: Wildcard sections match corresponding input types

  3. Free moves: Always match without consuming input

The matching function for sections is defined as:

matches(pattern_section, input_section) :=
  pattern_section == input_section OR
  (pattern_section == WildcardDomain AND input_section is Domain) OR
  (pattern_section == WildcardPath AND input_section is Path) OR
  (pattern_section == WildcardQueryParamValue AND input_section is QueryParamValue) OR
  pattern_section == FreeMove
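
For readers who prefer executable form, the matching function above translates directly to Python. The (type, value) tuple encoding and the WILDCARD_FOR table are assumptions made for this sketch. Wildcard and free-move pattern sections carry None as their value.

```python
# Which input section type each wildcard pattern section accepts
WILDCARD_FOR = {
    "WildcardDomain": "Domain",
    "WildcardPath": "Path",
    "WildcardQueryParamValue": "QueryParamValue",
}

def matches(pattern_section, input_section):
    pattern_kind = pattern_section[0]
    if pattern_kind == "FreeMove":
        return True  # free moves always match
    if pattern_kind in WILDCARD_FOR:
        # Wildcards match on section type only, ignoring the value
        return input_section[0] == WILDCARD_FOR[pattern_kind]
    return pattern_section == input_section  # exact literal match

print(matches(("Path", "users"), ("Path", "users")))        # True
print(matches(("WildcardPath", None), ("Path", "alice")))   # True
print(matches(("WildcardPath", None), ("Domain", "com")))   # False
```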

2.4.3. Conflict Resolution

When multiple modules match a URI, the resolver returns all matching modules. The selection of which module to use for processing is left to higher-level platform components.

For components that need to select a single module, the following precedence rules are RECOMMENDED:

  1. Specificity: More specific patterns take precedence over less specific ones

  2. Pattern type precedence: Parameterized patterns > Prefix patterns > Protocol patterns

  3. Path length: Longer paths take precedence over shorter ones
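
A higher-level component could apply the recommended precedence as a sort key. The sketch below is one possible interpretation, not part of the resolver: it assumes pattern type is compared first and the number of pattern sections second, using section count as a stand-in for both specificity and path length. The names TYPE_RANK and rank and the (module, pattern_type, section_count) tuples are hypothetical.

```python
# Parameterized patterns > Prefix patterns > Protocol patterns
TYPE_RANK = {"protocol": 0, "prefix": 1, "pattern": 2}

def rank(matches):
    """Order (module, pattern_type, section_count) tuples, most preferred first."""
    return sorted(matches, key=lambda m: (TYPE_RANK[m[1]], m[2]), reverse=True)

# Matches for https://example.com/api/users, with section counts taken
# from each declared pattern after tokenization
candidates = [
    ("Module A", "protocol", 1),  # https
    ("Module B", "prefix", 3),    # https://example.com/
    ("Module C", "pattern", 5),   # https://example.com/api/:endpoint
]
print([name for name, _, _ in rank(candidates)])
# ['Module C', 'Module B', 'Module A']
```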

3. Examples

3.1. Basic Resolution Examples

3.1.1. Protocol Resolution

# Module declares:
handles:
  url_protocols:
    - near

# Resolves:
near://account/alice.near -> [near-module]
near://tx/ABC123 -> [near-module]
near -> [near-module]

3.1.2. Prefix Resolution

# Module declares:
handles:
  url_prefixes:
    - https://api.github.com/

# Resolves:
https://api.github.com/ -> [github-module]
https://api.github.com/users -> [github-module]
https://api.github.com/repos/owner/name -> [github-module]

3.1.3. Pattern Resolution

# Module declares:
handles:
  url_patterns:
    - https://youtube.com/watch?v=:video_id

# Resolves:
https://youtube.com/watch?v=ABC123 -> [youtube-module]

3.2. Advanced Resolution Examples

3.2.1. Wildcard Domains

# Module declares:
handles:
  url_patterns:
    - https://*.example.com/api/:endpoint

# Resolves:
https://example.com/api/users -> [api-module]
https://api.example.com/api/users -> [api-module]
https://v1.api.example.com/api/users -> [api-module]

3.2.2. Multiple Handlers

# Module A declares:
handles:
  url_protocols:
    - https

# Module B declares:
handles:
  url_prefixes:
    - https://example.com/

# Module C declares:
handles:
  url_patterns:
    - https://example.com/api/:endpoint

# Resolution:
https://example.com/api/users -> [Module A, Module B, Module C]
https://example.com/page -> [Module A, Module B]
https://other.com/page -> [Module A]

3.2.3. File Extensions

# Module declares:
handles:
  file_extensions:
    - csv
    - tar.gz

# Resolves:
file:///path/to/data.csv -> [csv-module]
file:///archive.tar.gz -> [csv-module]

3.3. Complex Resolution Scenario

Consider a comprehensive example with multiple module types:

# Search module
name: search-aggregator
handles:
  url_patterns:
    - https://google.com/search?q=:query
    - https://bing.com/search?q=:query

# Social media module
name: social-scraper
handles:
  url_prefixes:
    - https://twitter.com/
    - https://x.com/
  url_patterns:
    - https://youtube.com/watch?v=:video_id

# NEAR module
name: near-integration
handles:
  url_protocols:
    - near
  url_patterns:
    - https://explorer.near.org/accounts/:account

# File processor module
name: data-processor
handles:
  file_extensions:
    - csv
    - json

Resolution results:

4. Implementation Considerations

4.1. Data Structures

4.1.1. Resolver State

A conforming resolver implementation MUST maintain:

4.1.2. Node Structure

Each node in the state machine MUST contain:

4.1.3. Memory Management

Implementations SHOULD consider:

4.2. Performance Considerations

4.2.1. Algorithmic Complexity

The resolution algorithm has the following complexity characteristics:

4.2.2. Optimization Strategies

Implementations MAY employ:

4.3. Error Handling

4.3.1. Invalid URIs

The resolver MUST handle invalid URIs gracefully:

4.3.2. Resolution Failures

When no modules can handle a URI:

5. Security Considerations

5.1. Pattern Injection

Implementations MUST prevent pattern injection attacks:

5.2. Resource Consumption

The resolution algorithm MUST protect against resource exhaustion:

5.3. URI Validation

Input URIs SHOULD be validated before processing:

6. IANA Considerations

This specification does not require any IANA registrations.

7. Acknowledgments

The editors would like to thank the ASIMOV Platform community for their contributions and feedback during the development of this specification.

8. Changes

This section will document changes between versions of this specification.

8.1. Version 1.0

Initial version of the ASIMOV Module Resolution Specification.

Conformance

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

References

Normative References

[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://datatracker.ietf.org/doc/html/rfc2119

Informative References

[ASIMOV]
Arto Bendiken; et al. ASIMOV Platform Documentation. URL: https://asimov.sh
[ASIMOV-MMS]
Arto Bendiken; et al. ASIMOV Module Manifest Specification. URL: https://asimov-specs.github.io/module-manifest/