agentlanguages.dev
syntactic camp · also verification

LLMLang.

Prefix-arity AST with single-character ASCII operators and De Bruijn variable indices. Linear ownership enforced at compile time. Compiler-injected OpenTelemetry spans triggered by a metadata marker. A Rust compiler emitting LLVM IR, with an OpenCL JIT for GPU map operations.

author: Paul Williams (paulprogrammer)
implementation: Rust
target: LLVM IR (then native via clang); OpenCL JIT for GPU map kernels at runtime
licence: GPL-3.0 with Runtime Exception
first seen: May 2026
maturity: working compiler
markdown: llmlang.md

The thesis.

LLMLang takes the syntactic camp’s premise — that the symbols an LLM emits cost tokens, so the language surface should minimise them — to its density extreme. The LLM_SPEC.md header is [TOKEN_OPTIMIZED: HIGH_DENSITY] and the design guide names the audience directly: “Target Audience: Large Language Models (LLMs). Non-Goal: Human readability.” Source is a prefix-arity AST written in single-character ASCII operators: + 10 20 is addition, > ^0 consumes the most-recent binding, $ ^1 borrows the next-most-recent, ? cond t f is a branch, # Point x y declares a struct-of-arrays shape, : name args body defines a function, . e1 e2 sequences. There are no parentheses, no semicolons, no infix precedence to disambiguate. Variables are referenced by their De Bruijn index in the binding stack — ^0, ^1, ^2 — rather than by names; the parser also accepts named identifiers but resolves them to indices before the AST stores anything.
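To make the prefix-arity idea concrete, here is a minimal Rust sketch (not LLMLang's parser) of why a fixed arity per operator removes the need for parentheses and precedence: the parser reads an operator, looks up how many children it takes, and recursively consumes exactly that many. ^N is resolved by counting back from the top of a binding stack, so ^0 is the most-recent binding. The operator subset, the Expr type, and the choice to treat > and $ as plain reads (ownership is not modelled) are all assumptions for illustration.

```rust
// Sketch: prefix-arity parsing plus De Bruijn lookup (hypothetical, not LLMLang's real parser).
// Assumptions: whitespace-separated tokens, integers only, and `>` / `$` read a value
// without modelling ownership.

#[derive(Debug)]
enum Expr {
    Int(i64),
    Var(usize),          // ^N: 0 = most-recent binding
    Op(char, Vec<Expr>), // operator with exactly arity(op) children
}

// Fixed arity per operator is what makes parentheses and precedence unnecessary.
fn arity(op: char) -> Option<usize> {
    match op {
        '>' | '$' => Some(1),
        '+' | '-' | '*' => Some(2),
        '?' => Some(3),
        _ => None,
    }
}

// Consume exactly one expression from the front of the token stream.
fn parse<'a, I: Iterator<Item = &'a str>>(tokens: &mut I) -> Result<Expr, String> {
    let tok = tokens.next().ok_or("unexpected end of input")?;
    if let Some(idx) = tok.strip_prefix('^') {
        return idx
            .parse::<usize>()
            .map(Expr::Var)
            .map_err(|e| format!("bad index {tok}: {e}"));
    }
    if let Ok(n) = tok.parse::<i64>() {
        return Ok(Expr::Int(n));
    }
    let mut chars = tok.chars();
    let (op, n) = match (chars.next(), chars.next()) {
        (Some(c), None) => (c, arity(c).ok_or_else(|| format!("unknown operator {c}"))?),
        _ => return Err(format!("unknown token {tok}")),
    };
    let mut args = Vec::with_capacity(n);
    for _ in 0..n {
        args.push(parse(tokens)?); // each child is itself a full prefix expression
    }
    Ok(Expr::Op(op, args))
}

// Resolve ^i by counting back from the top of the binding stack (end of the slice).
fn eval(e: &Expr, env: &[i64]) -> Result<i64, String> {
    Ok(match e {
        Expr::Int(n) => *n,
        Expr::Var(i) => *env.iter().rev().nth(*i).ok_or_else(|| format!("unbound ^{i}"))?,
        Expr::Op(op, a) => match *op {
            '>' | '$' => eval(&a[0], env)?,
            '+' => eval(&a[0], env)? + eval(&a[1], env)?,
            '-' => eval(&a[0], env)? - eval(&a[1], env)?,
            '*' => eval(&a[0], env)? * eval(&a[1], env)?,
            '?' => {
                if eval(&a[0], env)? != 0 { eval(&a[1], env)? } else { eval(&a[2], env)? }
            }
            _ => return Err(format!("no evaluator for {op}")),
        },
    })
}

// Parse one expression, reject leftover tokens, then evaluate against `env`.
fn run(src: &str, env: &[i64]) -> Result<i64, String> {
    let mut tokens = src.split_whitespace();
    let expr = parse(&mut tokens)?;
    if tokens.next().is_some() {
        return Err("trailing tokens".to_string());
    }
    eval(&expr, env)
}
```

For example, run("* $ ^0 - > ^0 1", &[5]) evaluates 5 * (5 - 1) = 20, and run("- ^1 ^0", &[10, 3]) yields 7 because ^0 is the most recently pushed value (3). Because every operator's arity is fixed, a missing or surplus operand is a parse error rather than an ambiguity.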

The distinctive move sits in two places at once. The first is the density lever: where NERD bets on English keywords because BPE tokenisers fragment punctuation, LLMLang bets the opposite — that single ASCII characters cost one token each in the right tokeniser and the win is biggest when there is no punctuation to fragment. The second is enforcement: affine ownership (> move, $ borrow, ~ mut-borrow) is verified at compile time in src/compiler/analysis/verify.rs, with a VariableState stack that issues E004 for use-after-move, E005 for double-move, E009 for branch-state mismatch, and E016 for moving a borrowed variable. The same syntactic-camp surface ships a Rust-style borrow checker rather than relying on convention, which is why the entry spans into verification — the safety story is enforced, not advisory.
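The bookkeeping behind those four codes can be modelled as a per-binding state machine. The sketch below is a hypothetical reconstruction, not the contents of verify.rs: State, Checker, the explicit release call that ends a borrow, and the rule that both arms of a branch must finish in identical states are assumptions chosen to reproduce E004, E005, E009 and E016 as the entry describes them.

```rust
// Hypothetical model of affine-ownership checking; not the real verify.rs.
#[derive(Clone, Copy, Debug, PartialEq)]
enum State {
    Owned,
    Borrowed, // a live `$` borrow exists
    Moved,    // consumed by `>`
}

#[derive(Clone, Debug)]
struct Checker {
    stack: Vec<State>, // last element is ^0
}

type Check = Result<(), &'static str>;

impl Checker {
    fn new(bindings: usize) -> Self {
        Checker { stack: vec![State::Owned; bindings] }
    }

    // Map a De Bruijn index onto the stack (^0 = top).
    // Panics on an unbound index, which a real checker would report instead.
    fn slot(&mut self, i: usize) -> &mut State {
        let top = self.stack.len() - 1;
        &mut self.stack[top - i]
    }

    // `$ ^i`: borrowing something already moved is a use-after-move.
    fn borrow(&mut self, i: usize) -> Check {
        let s = self.slot(i);
        if *s == State::Moved {
            return Err("E004");
        }
        *s = State::Borrowed;
        Ok(())
    }

    // End a borrow (explicit in this sketch; a real checker would infer it).
    fn release(&mut self, i: usize) {
        let s = self.slot(i);
        if *s == State::Borrowed {
            *s = State::Owned;
        }
    }

    // `> ^i`: consume the binding.
    fn move_var(&mut self, i: usize) -> Check {
        let s = self.slot(i);
        match *s {
            State::Moved => Err("E005"),
            State::Borrowed => Err("E016"),
            State::Owned => {
                *s = State::Moved;
                Ok(())
            }
        }
    }

    // `? c t f`: check each arm from the same starting state; they must agree.
    fn branch<T, F>(&mut self, then_arm: T, else_arm: F) -> Check
    where
        T: FnOnce(&mut Checker) -> Check,
        F: FnOnce(&mut Checker) -> Check,
    {
        let mut t = self.clone();
        then_arm(&mut t)?;
        let mut f = self.clone();
        else_arm(&mut f)?;
        if t.stack != f.stack {
            return Err("E009");
        }
        *self = t;
        Ok(())
    }
}
```

Under this toy model, a ? whose then-arm moves ^0 while the else-arm leaves it untouched is rejected with E009; whether verify.rs applies exactly this merge rule is not stated in the entry.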

What it looks like.

// Factorial. ^0 refers to the most-recent binding (the parameter n).
// Base case: consume n so both branches end in the same state, then yield 1.
: fact n ? ^0 * $ ^0 @ fact - > ^0 1 . > ^0 1

// Auto-instrumented function. The M marker triggers compiler-injected
// span entry/exit and timing around handle_request.
M "otel" "handle_request" : handle_request req + $ req 1

Every form is prefix-arity; ^0 is De Bruijn for "most-recent binding"; > consumes, $ borrows. The M metadata marker is read by the compiler in src/main.rs and routes the following definition through a code path that wraps the body in llm_otel_enter_span / llm_get_time_ns / llm_otel_emit_span / llm_otel_exit_span calls.
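Conceptually, the code path selected by the M marker brackets the function body with span bookkeeping and timestamps. A minimal Rust sketch of that shape follows, assuming a hypothetical in-process Tracer in place of the C runtime; the comments map each step to the runtime call named above, but the signatures, the use of a start and end timestamp, and how spans are exported are assumptions.

```rust
use std::time::Instant;

// Hypothetical stand-in for the runtime's span bookkeeping.
#[derive(Debug)]
pub struct Span {
    pub name: String,
    pub duration_ns: u128,
}

#[derive(Default)]
pub struct Tracer {
    pub open: Vec<String>,  // currently entered spans, innermost last
    pub emitted: Vec<Span>, // finished spans ready for export
}

impl Tracer {
    fn enter_span(&mut self, name: &str) {
        self.open.push(name.to_string());
    }
    fn emit_span(&mut self, name: &str, duration_ns: u128) {
        self.emitted.push(Span { name: name.to_string(), duration_ns });
    }
    fn exit_span(&mut self) {
        self.open.pop();
    }
}

// The bracketing an `M "otel" "name"` definition is described as receiving;
// the body itself runs unchanged between the generated calls.
pub fn instrumented<R>(tracer: &mut Tracer, name: &str, body: impl FnOnce() -> R) -> R {
    tracer.enter_span(name);                  // ~ llm_otel_enter_span
    let start = Instant::now();               // ~ llm_get_time_ns (start)
    let result = body();                      // original function body
    let elapsed = start.elapsed().as_nanos(); // ~ llm_get_time_ns (end), as a duration
    tracer.emit_span(name, elapsed);          // ~ llm_otel_emit_span
    tracer.exit_span();                       // ~ llm_otel_exit_span
    result
}
```

In this sketch, instrumented(&mut tracer, "handle_request", || 41 + 1) returns 42 and leaves exactly one emitted span named handle_request; injecting the calls at compile time means the LLM never has to write them.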

Distinctive moves.

Maturity.

LLMLang was at v0.4.0 at the time of cataloguing, with sixteen tagged releases (v0.1.0 to v0.4.0) cut between 18 and 24 May 2026 against a repository created on 18 May 2026: roughly one feature wave per day for a week, followed by consolidation commits through 27 May. The codebase is roughly 13,300 lines of Rust and C across 46 source files: src/compiler/{lexer,parser,ast,analysis,codegen}, plus a C runtime covering an HTTP client and server with picohttpparser, TLS via mbedtls, cJSON, SQLite/Redis/MongoDB drivers, an OpenCL dispatcher, an MPSC emission queue, and a libtai-baseline temporal module. Testing consists of 31 self-hosted test programs under tests/lang/ and 47 Rust unit tests in tests/compiler_tests.rs. The licence is GPLv3 with the llmlang Runtime Exception, a GCC-style carve-out that keeps the compiler copyleft but lets generated binaries link the runtime libraries into proprietary code without the licence propagating. Single author Paul Williams (paulprogrammer, Denver, Colorado, GitHub bio “Barefoot Coders”); 0 stars and 0 forks at time of cataloguing.

The README opens with the disclosure: “This entire repository has been largely vibecoded with humans acting as the product owners, and the LLM acting as the developer.” That places LLMLang in the same factual family as AILANG’s “written autonomously by AI agents” framing and Codong’s “designed for AI to write, humans to review” position — what is shipped is real engineering with real automated tests, and the catalogue notes the authorship model as context rather than judgement. MAYBE.md separates roadmap from shipped: first-class AST manipulation beyond the existing patch_symbol, formal intent-and-contract metadata nodes, and TDD/BDD scenario nodes are not yet in the compiler, with OpenTelemetry already crossed off the list. The bet is the syntactic camp’s bet intensified — that a surface compressed to single-character prefix operators with indexed variables, plus an MCP server that exposes the same AST the compiler sees, will produce more correct output per token than a conventional language plus a smarter model.

Agent tooling.

The llm-mcp binary is the primary agent surface and ships as a second cargo target alongside the compiler. It exposes seven tools over stdio: analyze_codebase walks a directory and parses every .llm file into the same AST the compiler uses; search_symbols looks up functions and shapes by name; get_definition returns the realised AST and file location of a symbol; get_diagnostics runs the parser front-end against a file and returns E00x/W00x codes; find_callers traverses the call graph; structural_search computes a SHA-256 hash of the operator-and-control-flow shape of a function body (literals and names omitted) and returns other functions sharing the same fingerprint, so an LLM can ask “what else does the same thing?” without relying on name similarity. patch_symbol accepts a JSON AST for a new function body, parses the source file, swaps the matching Define node’s body, and rewrites the file through the compiler’s own pretty-printer (PrettyExpr in src/compiler/ast/display.rs), so edits stay syntactically valid by construction. Two MCP resources back the tools: llm://spec embeds LLM_SPEC.md directly (the token-density grammar reference), and llm://agent-workflow embeds MCP_GUIDE.md (the analyse → locate → extract → patch workflow). Stable diagnostic codes (E000–E018, W001) are catalogued in DIAGNOSTICS.md so the same identifiers appear in compiler output, MCP responses, and the spec text the model receives from llm://spec.
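structural_search's fingerprint can be sketched as a tree walk that keeps operators and nesting but collapses every literal and name to a placeholder, then hashes the resulting shape string. This Rust sketch substitutes std's DefaultHasher for SHA-256 so it needs no external crates; DefaultHasher is neither cryptographic nor guaranteed stable across Rust releases, so a persisted index would want SHA-256 as the entry describes. The Node type, the decision to treat De Bruijn indices like names, and the same_shape helper are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical AST: literals, variables (index or name), and operators.
pub enum Node {
    Int(i64),
    Var(usize),
    Name(String),
    Op(char, Vec<Node>),
}

// Serialise only the shape: operators and nesting survive, leaves collapse.
fn shape(node: &Node, out: &mut String) {
    match node {
        Node::Int(_) => out.push('L'),                    // any literal
        Node::Var(_) | Node::Name(_) => out.push('N'),   // any name or index
        Node::Op(op, kids) => {
            out.push(*op);
            out.push('(');
            for k in kids {
                shape(k, out);
            }
            out.push(')');
        }
    }
}

// Stand-in for the SHA-256 fingerprint (DefaultHasher, not SHA-256).
pub fn fingerprint(body: &Node) -> u64 {
    let mut s = String::new();
    shape(body, &mut s);
    let mut h = DefaultHasher::new();
    s.hash(&mut h);
    h.finish()
}

// Names of other definitions whose body has the same fingerprint as `target`.
pub fn same_shape<'a>(target: &str, defs: &'a [(String, Node)]) -> Vec<&'a str> {
    let Some((_, body)) = defs.iter().find(|(n, _)| n == target) else {
        return Vec::new();
    };
    let fp = fingerprint(body);
    defs.iter()
        .filter(|(n, b)| n != target && fingerprint(b) == fp)
        .map(|(n, _)| n.as_str())
        .collect()
}
```

With this scheme, a body + ^0 1 and a body + x 10 both serialise to "+(NL)" and match, while * ^0 2 serialises to "*(NL)" and does not, which is the "same operation, different names" query the tool is built for.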

design DNA
  • NERD · syntactic. Closest editorial sibling on the token-efficiency axis, opposite lever. NERD swaps operators for English keywords (plus, minus, eq) on the bet that BPE tokenisers fragment punctuation; LLMLang collapses operators to single ASCII characters (+, >, $, ~) on the opposite bet that the right tokeniser maps each symbol to one token. Same camp, same diagnosis, opposite side of the symbol-vs-word spectrum.
  • Magpie · syntactic. Same camp, more extreme densification. Magpie surfaces SSA with %-prefixed typed values and accepts ~2.3× more tokens per operation for unambiguity; LLMLang strips the surface further to prefix-arity with single-character operators and indexed variables, betting on density over explicitness. Both ship structured diagnostics with stable codes.
  • Vera · verification. Cross-camp foil on De Bruijn indices. Vera uses typed slot references @T.n as a verification-camp move; the empirical case is that LLMs make naming errors faster than they make logic errors. LLMLang uses ^0, ^1 as a syntactic-camp move; the case is that names cost tokens. Same mechanism, different camp.
  • Lumen · orchestration. Also ships MCP integration, but with different positioning. Lumen's lumen-provider-mcp is one provider crate among several (alongside HTTP, Gemini, custom-model providers) inside a human-facing orchestration language; LLMLang's llm-mcp binary is the primary agent surface and exposes structural-fingerprint search and a patch_symbol tool that rewrites source via the compiler's own pretty-printer.