Structured Data Roadmap
This document outlines the roadmap for implementing first-class support for structured data and system interaction within the Endo shell. It builds upon the F# syntax extensions, particularly records and union types.
Vision
The goal is to evolve Endo from a shell that primarily deals with streams of text to one that can natively understand and manipulate streams of structured objects. This enables more powerful, reliable, and composable automation, akin to PowerShell but with a functional approach.
Instead of parsing text with awk, sed, and grep, a pipeline filters and maps typed records directly:
# Instead of: ps aux | grep 'endo' | awk '{print $2}'
ps |> filter (_.command == "endo") |> map _.pid
Phase 6.1: Core Infrastructure -- Complete
CoreVM Support for Records and Unions
Discriminated unions and record types are fully implemented in the CoreVM, supporting efficient creation, manipulation, and reference counting.
Structured Command Interface
An internal C++ interface (StructuredCommand) allows commands to declare that they produce structured data and to advertise the type of their output. Platform-specific details are abstracted behind the ProcessProvider interface for cross-platform support.
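The C++ interface itself is not shown in this document. As a rough illustration of the idea, the following Python sketch models a command that advertises its output type before it runs. The method names `output_type` and `run`, the `RecordType` helper, and the `EnvCommand` example are all hypothetical stand-ins, not the actual Endo API.

```python
from dataclasses import dataclass
from typing import Iterator, Protocol


@dataclass(frozen=True)
class RecordType:
    """Hypothetical description of an output record: a name plus field -> type name."""
    name: str
    fields: dict


class StructuredCommand(Protocol):
    """Python stand-in for the C++ interface; the method names are assumptions."""

    def output_type(self) -> RecordType: ...

    def run(self, args: list) -> Iterator[dict]: ...


class EnvCommand:
    """Toy command that emits one record per environment variable."""

    def __init__(self, environ: dict):
        self._environ = environ

    def output_type(self) -> RecordType:
        return RecordType("EnvEntry", {"name": "string", "value": "string"})

    def run(self, args: list) -> Iterator[dict]:
        for name, value in sorted(self._environ.items()):
            yield {"name": name, "value": value}


def describe(cmd: StructuredCommand) -> str:
    """Format the advertised type; this needs no call to run()."""
    t = cmd.output_type()
    cols = ", ".join(f"{k}: {v}" for k, v in t.fields.items())
    return f"{t.name} {{ {cols} }}"
```

Because the type is available without executing the command, a shell could in principle type-check a pipeline such as `map _.name` before any process is started.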
Structured Data Wrapper
Four data source commands provide ad-hoc structured parsing:
- open-json / open-csv -- read from files
- from-json / from-csv -- read from pipe input
These support inline record type definitions and named type references:
# Inline type definition
open-json "file.json" as { name: string; age: int }
# Named type reference
type Person = { name: string; age: int }
open-csv "people.csv" as Person
# Pipe-based source
curl api/users | from-json as { name: string; id: int } |> map _.name
Note
Output Recognition Files (Phase 6.3a) automate this for pipeline contexts by declaratively defining how to parse command output. Explicit wrappers like from-json remain available for ad-hoc use.
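To make the `as { ... }` annotation concrete, here is a minimal Python sketch of what a wrapper like from-json might do: parse the inline record type, decode the JSON, and check each declared field. The function names, the type-name mapping, and the choice to drop undeclared fields are assumptions made for illustration; the actual Endo behavior may differ.

```python
import json

# Assumed mapping from the type names used in the annotations to Python types.
TYPE_MAP = {"string": str, "int": int, "float": float, "bool": bool}


def parse_record_type(spec: str) -> dict:
    """Parse '{ name: string; age: int }' into {'name': str, 'age': int}."""
    inner = spec.strip().strip("{}")
    fields = {}
    for part in inner.split(";"):
        if part.strip():
            name, type_name = (s.strip() for s in part.split(":"))
            fields[name] = TYPE_MAP[type_name]
    return fields


def from_json(text: str, spec: str) -> list:
    """Decode a JSON array and keep only the declared fields, checking their types."""
    fields = parse_record_type(spec)
    records = []
    for obj in json.loads(text):
        record = {}
        for name, py_type in fields.items():
            value = obj[name]  # raises KeyError if a declared field is missing
            # bool is a subclass of int in Python, so reject it explicitly for int fields
            wrong_bool = py_type is int and isinstance(value, bool)
            if not isinstance(value, py_type) or wrong_bool:
                raise TypeError(f"field {name!r}: expected {py_type.__name__}")
            record[name] = value
        records.append(record)
    return records
```

With this sketch, `from_json(text, "{ name: string; id: int }")` followed by a list comprehension over `r["name"]` mirrors the `from-json as { ... } |> map _.name` pipeline above.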
Phase 6.2: Built-in Structured Commands -- Complete
ls
Returns a stream of FileInfo records.
| Field | Type | Description |
|---|---|---|
| name | string | File name |
| size | int | File size in bytes |
| mode | FileMode | File permissions |
| mtime | int | Modification time (epoch seconds) |
| isDir | bool | Whether the entry is a directory |
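As an illustration of the record shape in the table above, the following Python sketch builds equivalent records from `os.scandir`. It is not how Endo implements ls. The `FileMode` type is not modeled, so `mode` is reduced to the permission bits as an integer, and sorting by name is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class FileInfo:
    name: str
    size: int      # bytes
    mode: int      # permission bits only; stands in for FileMode
    mtime: int     # epoch seconds
    is_dir: bool   # corresponds to isDir in the table


def ls(path: str = ".") -> Iterator[FileInfo]:
    """Yield one FileInfo per directory entry, sorted by name."""
    with os.scandir(path) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            st = entry.stat()
            yield FileInfo(
                name=entry.name,
                size=st.st_size,
                mode=st.st_mode & 0o777,
                mtime=int(st.st_mtime),
                is_dir=entry.is_dir(),
            )
```

A consumer can then select on fields rather than on text columns, for example `[f.name for f in ls(".") if f.is_dir]`.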
ps
Returns a stream of ProcessInfo records.
| Field | Type | Description |
|---|---|---|
| pid | int | Process ID |
| ppid | int | Parent process ID |
| user | string | Owner |
| cpu | float | CPU usage percentage |
| mem | int | Memory usage in KB |
| command | string | Command name |
jobs
Returns a stream of JobInfo records.
| Field | Type | Description |
|---|---|---|
| id | int | Job number |
| state | string | Running, Stopped, etc. |
| command | string | Command string |
| pid | int | Process ID |
Phase 6.3: Extensible Command Discovery
Phase 6.3a: Output Recognition Files -- Complete
Declarative YAML definitions teach Endo how to parse external command output without modifying the tools themselves.
- YAML definition file format with JSON and fields parser types
- Variant matching by command arguments with priority system
- command_to_run override to redirect commands to structured-output flags
- Definition file search paths: ~/.config/endo/definitions/, system-wide, and bundled
- Pipeline integration via StructuredPipelineSourceExpr AST node
- Bundled definitions for docker ps, docker images, git log, git status
command: "docker"
variants:
  - name: "ps-json"
    matches:
      - ["ps"]
      - ["ps", "-a"]
    priority: 10
    command_to_run: "docker ps --format json {args}"
    parser:
      type: "json"
      format: "lines"
See Structured Output Recognition for the full specification.
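The following Python sketch shows one way the docker definition above could be applied: select a variant by arguments and priority, rewrite the command, and parse JSON-lines output. The matching rule (exact argument-list equality), the meaning of `{args}`, and all function names are assumptions; the full specification is authoritative. The definition is written as a Python dict so the example needs no YAML library.

```python
import json

# The docker definition from above, as a Python dict.
DEFINITION = {
    "command": "docker",
    "variants": [
        {
            "name": "ps-json",
            "matches": [["ps"], ["ps", "-a"]],
            "priority": 10,
            "command_to_run": "docker ps --format json {args}",
            "parser": {"type": "json", "format": "lines"},
        }
    ],
}


def select_variant(definition: dict, argv: list):
    """Return the highest-priority variant whose 'matches' contains the arguments.

    Exact list equality is an assumption; the real matcher may be more lenient.
    """
    if not argv or argv[0] != definition["command"]:
        return None
    args = argv[1:]
    candidates = [v for v in definition["variants"] if args in v["matches"]]
    return max(candidates, key=lambda v: v["priority"], default=None)


def rewrite_command(variant: dict, argv: list) -> str:
    """Fill {args} with whatever followed the subcommand (assumed meaning)."""
    extra = " ".join(argv[2:])
    return variant["command_to_run"].replace("{args}", extra).strip()


def parse_output(variant: dict, stdout: str) -> list:
    """Parse JSON-lines output: one JSON object per non-empty line."""
    parser = variant["parser"]
    if parser["type"] == "json" and parser.get("format") == "lines":
        return [json.loads(line) for line in stdout.splitlines() if line.strip()]
    raise NotImplementedError(parser)
```

Under these assumptions, `["docker", "ps", "-a"]` selects the ps-json variant and is rewritten to `docker ps --format json -a`, whose output is then parsed line by line into records.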
Phase 6.3b: Self-Describing Commands -- Planned
For new commands that opt into structured output:
- Discovery via my-command --endo-schema convention
- Libraries for C++, Rust, Go, Python to simplify writing structured commands
- Schema caching for performance
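Since this phase is only planned, the schema format is not defined here. The Python sketch below shows one possible shape of the convention: a command answers `--endo-schema` with a description of its records, and the caller caches that answer. The JSON payload, the `main` and `discover_schema` functions, and the in-memory cache are all hypothetical.

```python
import json

# Hypothetical schema payload; the roadmap does not define the actual format.
SCHEMA = {"record": "Greeting", "fields": {"name": "string", "length": "int"}}


def main(argv: list) -> str:
    """Return what the command would print for the given arguments."""
    if "--endo-schema" in argv:
        return json.dumps(SCHEMA)
    # Normal mode: one JSON record per argument.
    return "\n".join(json.dumps({"name": a, "length": len(a)}) for a in argv)


# Schema cache keyed by command name (in-memory only in this sketch).
_schema_cache = {}


def discover_schema(command_name: str, run) -> dict:
    """Ask a command for its schema once and reuse the answer afterwards."""
    if command_name not in _schema_cache:
        _schema_cache[command_name] = json.loads(run(["--endo-schema"]))
    return _schema_cache[command_name]
```

Here `run` stands for whatever mechanism executes the command and returns its standard output; caching avoids spawning the process again on every pipeline that uses it.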
Phase 6.4: Structured Data Pipeline Integration -- Complete
Record-Aware List Operations
Standard F# higher-order functions work directly with record-typed lists. No special-purpose verbs are needed:
| Verb Equivalent | F# Function | Example |
|---|---|---|
| where | filter | filter (_.name == "endo") |
| select | map | map _.pid |
| sort-by | sortBy | sortBy _.cpu |
| group-by | groupBy | groupBy _.user |
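For readers less familiar with these functions, the Python sketch below applies the same four operations to a list of records. The sample data is made up, the `ProcessInfo` class carries only a subset of the ps fields, and the dictionary returned for grouping is a Python convenience rather than a statement about the result shape of Endo's groupBy.

```python
from dataclasses import dataclass
from itertools import groupby
from operator import attrgetter


@dataclass(frozen=True)
class ProcessInfo:
    pid: int
    user: str
    cpu: float
    command: str


# Made-up sample data standing in for the output of ps.
procs = [
    ProcessInfo(101, "root", 0.5, "init"),
    ProcessInfo(202, "alice", 12.0, "endo"),
    ProcessInfo(303, "alice", 3.5, "vim"),
    ProcessInfo(404, "bob", 7.25, "endo"),
]

# ps |> filter (_.command == "endo") |> map _.pid
endo_pids = [p.pid for p in procs if p.command == "endo"]

# ps |> sortBy _.cpu
by_cpu = sorted(procs, key=attrgetter("cpu"))

# ps |> groupBy _.user
# itertools.groupby only groups adjacent items, so sort by the same key first.
by_user = {
    user: [p.command for p in group]
    for user, group in groupby(sorted(procs, key=attrgetter("user")), key=attrgetter("user"))
}
```

Each operation addresses a field by name, so none of them depends on column positions or whitespace in the way an awk-based pipeline would.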
Placeholder Lambda Sugar
Parser-level sugar for concise field access:
- _.field desugars to fun __x -> __x.field
- _.field == value desugars to fun __x -> __x.field == value
- _ + 1 desugars to fun __x -> __x + 1
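The rewriting rule can be sketched as a small source-to-source transformation. The Python function below works on raw text with a regular expression, which is a simplification: the actual sugar is applied by the parser, and this sketch would wrongly rewrite a standalone underscore inside a string literal. The `desugar` name is hypothetical.

```python
import re

# A standalone underscore token: not part of an identifier such as __x or my_var.
PLACEHOLDER = re.compile(r"(?<![A-Za-z0-9_])_(?![A-Za-z0-9_])")


def desugar(expr: str) -> str:
    """Rewrite an expression containing the placeholder into an explicit lambda."""
    if not PLACEHOLDER.search(expr):
        return expr  # no placeholder, nothing to rewrite
    return "fun __x -> " + PLACEHOLDER.sub("__x", expr)
```

For example, `desugar("_.field == value")` returns `fun __x -> __x.field == value`, while an expression such as `my_var + 1` is returned unchanged because its underscore is part of an identifier.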
Table Rendering
Lists of records are automatically rendered as tables when displayed:
- Auto-detect column widths with terminal-width-aware shrinking
- Three styles: Bordered (Unicode box-drawing), Compact, Plain
- Auto-style selection: Bordered with color for terminals, Plain for pipes
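The width and style rules above can be approximated in a few lines. The Python sketch below covers only a Plain-style renderer: it computes natural column widths, shrinks the widest column until the row fits, and picks a style from whether the output stream is a terminal. The shrinking strategy, the `~` truncation marker, and all function names are assumptions, not Endo's actual algorithm.

```python
import sys


def column_widths(headers, rows, max_total):
    """Natural width per column, then shrink the widest column until the row fits."""
    widths = [
        max([len(h)] + [len(r[i]) for r in rows]) for i, h in enumerate(headers)
    ]
    separators = 2 * (len(headers) - 1)  # two spaces between columns
    while sum(widths) + separators > max_total and max(widths) > 1:
        widths[widths.index(max(widths))] -= 1
    return widths


def clip(text, width):
    """Truncate text to width, marking the cut with a trailing '~'."""
    return text if len(text) <= width else text[: width - 1] + "~"


def render_plain(records, max_total=80):
    """Render a list of dicts as a Plain-style table (no borders, no color)."""
    if not records:
        return ""
    headers = list(records[0])
    rows = [[str(rec[h]) for h in headers] for rec in records]
    widths = column_widths(headers, rows, max_total)
    lines = []
    for cells in [headers] + rows:
        line = "  ".join(clip(c, w).ljust(w) for c, w in zip(cells, widths))
        lines.append(line.rstrip())
    return "\n".join(lines)


def pick_style(stream=sys.stdout):
    """Bordered for interactive terminals, Plain when output is piped."""
    return "bordered" if stream.isatty() else "plain"
```

Calling `render_plain(records, max_total=shutil.get_terminal_size().columns)` would tie the width limit to the current terminal; the Bordered and Compact styles are left out of this sketch.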
Suggested Future Structured Commands
Candidates for built-in structured output:
| Command | Fields |
|---|---|
| df | filesystem, size, used, available, mountpoint |
| netstat/ss | proto, localAddress, localPort, peerAddress, peerPort, state, pid |
| git-log | sha, author, email, date, message |
| docker-ps | id, image, status, ports, names |
| ip-addr | interface, address, netmask, family |
| history | index, timestamp, command |
| env | name, value |