
Strategic Architecture

TL;DR

The BBj AI strategy centers on unified infrastructure: a single fine-tuned model, shared RAG pipeline, and an MCP server that exposes three tools -- search_bbj_knowledge, generate_bbj_code, and validate_bbj_syntax -- to any AI-powered client. The generate-validate-fix loop uses the BBj compiler as ground truth, eliminating hallucinated syntax before code ever reaches a developer. Build it once, share it everywhere, maintain it in one place.

The previous chapter established that BBj is invisible to generic LLMs and that a fine-tuned model is unavoidable. This chapter addresses the next question: how should that model -- and the infrastructure around it -- be organized?

The answer is deceptively simple. Every AI-powered BBj tool needs the same two things: a model that understands BBj syntax across all four generations, and a retrieval layer that can surface relevant documentation and code examples. Rather than letting each tool reinvent these capabilities independently, the strategy builds them once as shared infrastructure and exposes them to any consumer application through standard APIs. This chapter also defines the MCP server that makes this architecture concrete -- three tools that any AI-powered client can consume without custom integration code.

This is the architectural decision that makes the entire strategy economically viable. Without it, each new BBj AI capability would require its own model training, its own document pipeline, and its own maintenance burden.

The Case Against Point Solutions

Consider the alternative: each initiative builds its own AI stack.

The VSCode extension trains a code-completion model. The documentation chat trains a Q&A model. A future CLI assistant trains yet another model. Each team curates its own training data, manages its own RAG database, and deploys its own Ollama instance.

The result is three models that understand BBj slightly differently, three document pipelines that drift out of sync, and three maintenance burdens that grow linearly with every new capability. When a new BBj API is released, three teams update three systems. When a training data bug is found, it must be hunted down in three places.

Decision: Unified Infrastructure Over Point Solutions

Choice: Build a single shared foundation (fine-tuned model + RAG database) consumed by all BBj AI applications through standard APIs.

Rationale: Unified infrastructure means one training investment benefits every consumer, documentation updates propagate everywhere automatically, and answers remain consistent whether a developer asks via IDE or chat. The alternative -- independent AI stacks per tool -- multiplies cost and introduces inconsistency.

Alternatives considered: Per-tool models with shared training data (rejected: still duplicates compute and deployment); cloud-hosted API with no self-hosting (rejected: customers need data privacy options).

Status: Architecture defined and operational. RAG pipeline operational for internal exploration (51K+ chunks). Model fine-tuning is active research (bbjllm experiment; 14B-Base recommended).

Architecture Overview

The system is organized into two layers: a shared foundation that encapsulates all BBj AI knowledge, and an application layer where consumer tools access that knowledge through an MCP server that mediates all interactions.

Every application in the top layer connects through the MCP server, which exposes three tools that abstract the backend complexity. When the model improves through additional fine-tuning, every consumer benefits immediately. When new documentation is indexed into the RAG pipeline, every consumer can surface it in its next query.

The MCP server provides standard tool discovery and schema-based invocation, which means any MCP-compatible host -- whether it is a VS Code extension, Claude Desktop, Cursor, or a custom chat backend -- can consume all three BBj AI capabilities without custom integration code.

The Shared Foundation

Two components form the foundation layer. Each is covered in depth in its own chapter; this section explains what they are and why they exist as shared resources.

Fine-Tuned BBj Model

The core of the infrastructure is a language model fine-tuned specifically on BBj. Starting from a strong open-source code model (the current recommendation is Qwen2.5-Coder-14B-Base, selected for its benchmark performance, Apache 2.0 license, and practical size for self-hosting), the model is trained on curated BBj examples covering all four generations.

The fine-tuned model is hosted via Ollama, which provides:

  • Local deployment -- runs on commodity hardware, no cloud dependency
  • OpenAI-compatible API -- standard interface any client can consume
  • Customer self-hosting -- organizations can run ollama run bbj-coder on their own infrastructure for complete data privacy
  • Quantization support -- Q4/Q8 quantization makes even 7B+ models practical on consumer GPUs
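
Because the API is OpenAI-compatible, consuming the model needs nothing beyond an HTTP client. A minimal sketch in Python, assuming a local Ollama instance serving a model tagged bbj-coder (the model tag, prompt, and temperature value are illustrative):

```python
import json
import urllib.request

def build_chat_request(prompt: str, model: str = "bbj-coder") -> dict:
    """Build an OpenAI-style chat completion payload for Ollama."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,  # low temperature suits code generation
    }

def complete(prompt: str, base_url: str = "http://localhost:11434") -> str:
    """POST to Ollama's OpenAI-compatible endpoint and return the reply text."""
    payload = build_chat_request(prompt)
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

The same code works against any OpenAI-compatible endpoint, which is exactly the portability argument: a consumer written this way does not care whether the model runs on a developer laptop or a shared team server.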

The model develops two capabilities through training:

  1. Comprehension -- reading and explaining code from any BBj generation, supporting migration and debugging workflows
  2. Generation -- producing syntactically correct, generation-appropriate BBj code that matches the context of the surrounding codebase

Chapter 3 covers base model selection, training data structure, the QLoRA fine-tuning workflow, and deployment via Ollama in full detail.

RAG Database

The fine-tuned model provides language understanding, but it cannot memorize every API signature, every version-specific behavior, or every code example in the BBj ecosystem. That is the RAG database's job.

The RAG (Retrieval-Augmented Generation) pipeline stores BBj documentation, API references, code examples, best practices, and migration guides -- all tagged with generation metadata. When a consumer application receives a query, it retrieves relevant context from the RAG database and includes it in the prompt alongside the user's request.

The critical design choice is generation-aware tagging. Every document chunk carries metadata indicating which BBj generation(s) it applies to:

  • "all" -- universal syntax like FOR/NEXT, string functions, file I/O
  • ["bbj-gui", "dwc"] -- modern object-oriented API patterns
  • ["vpro5"] -- legacy Visual PRO/5 mnemonic-based syntax
  • ["character"] -- character terminal mnemonics

This tagging enables the retrieval layer to prioritize generation-appropriate documentation. When a developer is working in DWC code, the system surfaces modern API references first -- while still including legacy documentation when relevant for migration context.
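
To illustrate how the tags drive ranking, here is a deliberately simplified in-memory sketch; the real system ranks pgvector similarity results, but the priority ordering -- exact generation match first, universal content next, other generations last -- is the idea being shown (the chunk structure and helper names are hypothetical):

```python
def rank_chunks(chunks, target_generation, limit=5):
    """Prefer chunks tagged for the target generation, keep universal
    ("all") chunks, and demote -- but do not drop -- other generations,
    so migration queries can still surface legacy material."""
    def priority(chunk):
        tags = chunk["generations"]
        if target_generation in tags:
            return 0  # exact generation match first
        if "all" in tags:
            return 1  # universal syntax next
        return 2      # cross-generation context last
    return sorted(chunks, key=priority)[:limit]

docs = [
    {"id": "vpro5-window", "generations": ["vpro5"]},
    {"id": "for-next",     "generations": ["all"]},
    {"id": "dwc-window",   "generations": ["bbj-gui", "dwc"]},
]
print([c["id"] for c in rank_chunks(docs, "dwc")])
# -> ['dwc-window', 'for-next', 'vpro5-window']
```

Note that the legacy chunk is ranked last, not filtered out -- the migration use case depends on it remaining reachable.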

Chapter 6 covers the multi-generation document structure, embedding strategy, and retrieval algorithms in detail.

How They Work Together

When any consumer application handles a request, the shared foundation operates as a pipeline:

  1. The consumer app detects the generation context (from the code being edited, the user's question, or explicit configuration)
  2. It queries the RAG database with a generation hint, retrieving the most relevant documentation
  3. The retrieved context is assembled into a prompt alongside the user's input
  4. The fine-tuned model processes the enriched prompt and returns a generation-appropriate response
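
The four steps can be sketched end to end in Python. Every function here is a stub: detect_generation, retrieve, and the final model call stand in for the real Langium-, pgvector-, and Ollama-backed components:

```python
def detect_generation(source: str) -> str:
    """Step 1 (stubbed): crude heuristic stand-in for real generation detection."""
    if "BBjAPI()" in source:
        return "dwc"
    if "'WINDOW'" in source:
        return "vpro5"
    return "all"

def retrieve(query: str, generation: str) -> list[str]:
    """Step 2 (stubbed): stand-in for the generation-aware RAG query."""
    return [f"[{generation}] documentation relevant to: {query}"]

def build_prompt(user_input: str, context_docs: list[str]) -> str:
    """Step 3: assemble retrieved context alongside the user's input."""
    context = "\n".join(context_docs)
    return f"Context:\n{context}\n\nRequest:\n{user_input}"

def answer(user_input: str, editor_source: str) -> str:
    generation = detect_generation(editor_source)  # step 1
    docs = retrieve(user_input, generation)        # step 2
    prompt = build_prompt(user_input, docs)        # step 3
    return f"(model response to: {prompt!r})"      # step 4, stubbed

print(answer("How do I open a window?", "wnd! = BBjAPI().openSysGui(...)"))
```

The point of the sketch is the shape of the flow, not any single component: a consumer only supplies the user input and editing context, and the foundation does the rest.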

This flow is identical regardless of whether the consumer is the VSCode extension completing code, the documentation chat answering a question, or a future CLI tool suggesting fixes. The shared foundation handles the BBj-specific intelligence; the consumer apps handle their domain-specific UX.

The MCP Server: Concrete Integration Layer

The architecture overview above describes what the shared foundation provides. The Model Context Protocol (MCP) defines how applications access it. MCP is a standard protocol for connecting AI applications to external tools -- it provides standard tool discovery, schema-based invocation, and any-client compatibility. Rather than building custom REST endpoints or editor-specific plugin APIs, the BBj strategy exposes all AI capabilities through a single MCP server that any MCP-compatible host can consume.

The BBj MCP server defines three tools. Each tool maps to one component of the shared foundation and serves a distinct role in the AI-assisted development workflow.

search_bbj_knowledge

The knowledge search tool connects to the RAG database (pgvector with hybrid search) and returns ranked documentation and code examples filtered by BBj generation. When a developer asks about creating a window, the tool retrieves generation-appropriate API references -- DWC patterns for modern code, Visual PRO/5 mnemonics for legacy maintenance. Every result includes source citations so the consuming application can link back to official documentation.

{
  "name": "search_bbj_knowledge",
  "description": "Search BBj documentation and code examples with generation-aware filtering. Returns ranked results from the RAG pipeline with source citations.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "query": { "type": "string", "description": "Natural language search query" },
      "generation": { "type": "string", "enum": ["all", "character", "vpro5", "bbj-gui", "dwc"], "description": "Filter results by BBj generation. Omit for cross-generation search." },
      "limit": { "type": "integer", "default": 5, "description": "Maximum number of results to return" }
    },
    "required": ["query"]
  }
}
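
Under MCP's JSON-RPC 2.0 framing, a host invokes this tool with a tools/call request. A sketch of the message a client would send (the request id and argument values are illustrative):

```python
import json

def make_tool_call(tool: str, arguments: dict, request_id: int = 1) -> str:
    """Serialize an MCP tools/call request using JSON-RPC 2.0 framing."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

msg = make_tool_call("search_bbj_knowledge", {
    "query": "How do I create a window?",
    "generation": "dwc",  # prefer modern DWC documentation
    "limit": 3,
})
print(msg)
```

In practice an MCP SDK handles this framing for the client; the sketch just makes visible what travels over the stdio or Streamable HTTP transport.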

generate_bbj_code

The code generation tool sends enriched prompts to the fine-tuned model (Qwen2.5-Coder-14B-Base via Ollama). It accepts a natural language description, a target BBj generation, and optional surrounding code context. The tool assembles RAG-retrieved documentation into the prompt automatically, so the model always has relevant API references and examples available when generating code.

{
  "name": "generate_bbj_code",
  "description": "Generate BBj code using the fine-tuned model with RAG-enriched context. Produces generation-appropriate syntax based on the target BBj generation.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "prompt": { "type": "string", "description": "Natural language description of the code to generate" },
      "generation": { "type": "string", "enum": ["character", "vpro5", "bbj-gui", "dwc"], "description": "Target BBj generation for the generated code" },
      "context": { "type": "string", "description": "Surrounding code or additional context for more accurate generation" },
      "max_tokens": { "type": "integer", "default": 512, "description": "Maximum tokens in the generated response" }
    },
    "required": ["prompt", "generation"]
  }
}

validate_bbj_syntax

The syntax validation tool runs generated code through the BBj compiler (bbjcpl -N) for ground-truth syntax checking. This is not heuristic analysis and not LLM-based review -- it is the same compiler that would reject the code in production. The tool returns pass/fail with the compiler's exact error messages, enabling a generate-validate-fix loop where compiler errors feed back into the generation tool for correction.

{
  "name": "validate_bbj_syntax",
  "description": "Validate BBj code syntax using the BBj compiler (bbjcpl). Returns ground-truth validation results -- not heuristic, not LLM-based.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "code": { "type": "string", "description": "BBj source code to validate" },
      "classpath": { "type": "string", "description": "Optional classpath for resolving external dependencies" }
    },
    "required": ["code"]
  }
}
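
Behind the tool, validation can be little more than shelling out to the compiler. A sketch under stated assumptions -- the compiler parameter exists so the function can be exercised without a BBj installation, and bbjcpl -N is the syntax-check invocation named above:

```python
import os
import subprocess
import tempfile

def validate_bbj(code: str, compiler: str = "bbjcpl") -> tuple[bool, str]:
    """Write the source to a temp file, run it through the BBj compiler
    in syntax-check mode (-N), and return (ok, compiler_output)."""
    with tempfile.NamedTemporaryFile("w", suffix=".bbj", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run(
            [compiler, "-N", path],
            capture_output=True, text=True,
        )
        # A zero exit code is the compiler's verdict that the syntax is valid.
        return result.returncode == 0, result.stdout + result.stderr
    finally:
        os.remove(path)  # always clean up the temp file
```

Because the verdict comes from the process exit code and the error text is passed through verbatim, the tool can return the compiler's exact messages to the client for the fix step of the loop.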

Organizational Precedent

BASIS already ships a webforJ MCP server that exposes knowledge search, project scaffolding, and theme creation tools for the webforJ framework. That server needs no code generation or compiler validation tools, because Java is well understood by LLMs -- the model already knows the language. The BBj MCP server follows the same organizational approach but adds both capabilities, because LLMs do not understand BBj. This extends a proven pattern rather than starting an experiment.

Decision: MCP as the Unified Integration Protocol

Choice: Expose all BBj AI capabilities (RAG search, code generation, compiler validation) through a single MCP server that any MCP-compatible client can consume.

Rationale: MCP provides a standard protocol for connecting AI applications to external tools. Rather than building custom APIs for each consumer (IDE, chat, CLI), a single MCP server exposes three tools that any MCP-enabled host -- Claude, VS Code, Cursor, or custom applications -- can use without custom integration code. MCP shares the same lineage as the Language Server Protocol (LSP) that Langium already uses for the BBj language server, making it a natural fit for the ecosystem. The webforJ MCP server already in production validates this approach organizationally.

Alternatives considered: REST API with OpenAPI specification (requires custom client code in each consumer; no standard tool discovery); custom VS Code extension API (locks integration to one editor; excludes Claude, Cursor, and CLI workflows); language-specific plugin system (fragments the ecosystem; each editor needs its own plugin; multiplies maintenance).

Status: Two of three tools operational -- search_bbj_knowledge and validate_bbj_syntax are running via stdio and Streamable HTTP transports. generate_bbj_code is planned (requires operational fine-tuned model). Generate-validate-fix loop validated by bbjcpltool proof-of-concept.

Integration Patterns

The MCP server enables several concrete integration patterns. Each pattern demonstrates a different way clients combine the three tools to solve real development problems.

Generate-Validate-Fix

The most important pattern is the generate-validate-fix loop. A client requests code generation, validates the result with the BBj compiler, and feeds any errors back for correction. This is the key innovation in the architecture -- the BBj compiler provides ground-truth validation that eliminates hallucinated syntax before code ever reaches a developer. The loop continues until the code compiles cleanly or a maximum iteration count is reached, ensuring that generated BBj code meets the same standard as hand-written code.

This pattern was validated by the bbjcpltool v1 proof-of-concept, which confirmed that compiler feedback meaningfully improves generated code quality across iterations.
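
The loop itself is a few lines of control flow. In this sketch, generate and validate are hypothetical stand-ins for the generate_bbj_code and validate_bbj_syntax tools; the stubs simulate a compiler that rejects the first attempt and accepts the corrected second one:

```python
def generate_validate_fix(generate, validate, prompt: str, max_attempts: int = 3):
    """Request code, check it with the compiler, and feed errors back
    until it compiles cleanly or the attempt budget is exhausted."""
    errors = ""
    for attempt in range(1, max_attempts + 1):
        code = generate(prompt, errors)  # compiler errors enrich the retry prompt
        ok, errors = validate(code)
        if ok:
            return code, attempt
    raise RuntimeError(f"no clean compile after {max_attempts} attempts: {errors}")

# Stubs simulating one failed attempt followed by a successful fix.
def fake_generate(prompt, errors):
    return "rem fixed" if errors else "rem broken"

def fake_validate(code):
    ok = "broken" not in code
    return ok, "" if ok else "line 1: syntax error"

code, attempts = generate_validate_fix(fake_generate, fake_validate, "open a window")
print(code, attempts)  # -> rem fixed 2
```

The attempt cap mirrors the 3-attempt auto-fix already integrated into the web chat; without it, an unfixable prompt would loop indefinitely.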

Documentation Query

The simplest pattern -- a client sends a natural language question, the server searches the RAG database with generation-aware filtering, and returns relevant documentation with source citations. No code generation or compilation is involved. This is the pattern that the documentation chat system primarily uses, and it is the same pattern that any MCP-compatible host (Claude, Cursor, or a custom chat interface) can invoke with zero custom code by calling search_bbj_knowledge.

Code Review and Migration

For legacy codebase modernization, a client submits existing BBj code and the server combines all three tools. It uses search_bbj_knowledge to find relevant migration documentation and modern API patterns, generate_bbj_code to suggest modernized alternatives in the target BBj generation, and validate_bbj_syntax to confirm the suggestions compile. This pattern is useful for automated migration analysis -- scanning a legacy Visual PRO/5 codebase and producing DWC equivalents that are verified by the compiler before a developer reviews them.

Deployment Options

The MCP server supports two deployment modes, matching different organizational needs for privacy, performance, and team collaboration.

Local Deployment (stdio)

The MCP server runs as a local process on the developer's machine, communicating with the MCP host via stdio. All data stays local -- model inference through Ollama, RAG queries against a local pgvector instance, and compiler validation through the local BBj installation happen on the same machine. This is the default mode for individual developers and organizations that require complete data privacy, ensuring that proprietary BBj source code never leaves the developer's workstation.
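
Wiring the local server into an MCP host is a client-configuration concern. A hypothetical entry for a host that uses an mcpServers block (Claude Desktop uses this shape; the launch command, module name, and environment variable here are illustrative and depend on how the server is actually packaged):

```json
{
  "mcpServers": {
    "bbj": {
      "command": "python",
      "args": ["-m", "bbj_mcp_server"],
      "env": { "OLLAMA_HOST": "http://localhost:11434" }
    }
  }
}
```

The host launches the process and speaks MCP over its stdin/stdout; no ports are opened and nothing leaves the machine.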

Remote Deployment (Streamable HTTP)

For team environments, the MCP server can be deployed as a shared service accessible via Streamable HTTP. Multiple developers connect to the same server instance, sharing the fine-tuned model and RAG database without each needing local GPU resources or a local database. This follows the same deployment pattern as the webforJ MCP server at mcp.webforj.com. Streamable HTTP replaces the older HTTP+SSE transport in the current MCP specification (2025-11-25), providing simpler connection management and better compatibility with standard HTTP infrastructure.

Three Initiatives

The shared foundation supports three consumer applications, each acting as an MCP client that connects to the BBj MCP server. Each initiative is introduced briefly here and covered in full in its own chapter.

VSCode Extension (Chapter 4)

The IDE integration combines Langium-powered language server capabilities with AI-powered code completion to deliver a development experience comparable to what mainstream languages enjoy through Copilot.

Langium provides deterministic, 100%-correct completions for symbols, types, and keywords -- the things a parser can resolve definitively. The fine-tuned model provides generative completions for multi-line code, pattern completion, and context-aware suggestions -- the things that require understanding intent.

The extension acts as an MCP client, using generate_bbj_code for AI-powered completions and validate_bbj_syntax for compiler validation of generated suggestions. The Langium language server continues to provide deterministic completions for symbols and keywords; the MCP server handles the generative AI layer.

The extension is generation-aware: it detects whether the developer is working in character UI, Visual PRO/5, BBj GUI, or DWC code and adjusts its suggestions accordingly. A developer editing a 1990s Visual PRO/5 module receives PRINT (sysgui)'WINDOW'(...) suggestions; a developer building a new DWC application receives BBjAPI() patterns.

Documentation Chat (Chapter 5)

A conversational AI interface embedded in the BBj documentation website, allowing developers to ask natural language questions and receive accurate, cited answers.

Unlike generic documentation chat services (Algolia Ask AI, kapa.ai), which rely on base LLMs that have no BBj understanding, this system uses the shared fine-tuned model. The chat backend queries the shared RAG database for relevant documentation, assembles an enriched prompt, and streams a response with source citations.

The chat backend acts as an MCP client, primarily using search_bbj_knowledge for retrieval-augmented responses. Because the MCP server exposes a standard tool interface, any MCP-compatible host -- Claude, Cursor, or a custom application -- can serve as a documentation chat interface with zero custom integration code.

The chat system is generation-aware in the same way as the IDE extension: it detects generation hints in the user's question and prioritizes appropriate documentation. A question about "creating a window" yields different primary answers depending on whether the user mentions Visual PRO/5 or DWC context.

Future Capabilities

The unified architecture is designed to support capabilities beyond the initial two applications:

  • CLI assistant -- terminal-based BBj help for developers who prefer command-line workflows
  • Migration tooling -- automated analysis and modernization suggestions for legacy BBj codebases
  • Code review -- generation-aware review that flags deprecated patterns and suggests modern alternatives
  • Training data feedback loop -- user questions and corrections from chat and IDE usage feed back into model improvement

Each of these capabilities is just another MCP client connecting to the same server. Because the shared foundation already exposes all BBj AI capabilities through three well-defined tools, adding a new consumer application is primarily a UX problem -- designing the right interface for each workflow -- not an AI infrastructure problem.

Benefits of This Approach

The unified architecture creates different value for different stakeholders.

For leadership and decision-makers:

  • Single investment, multiple returns -- one fine-tuning effort and one RAG pipeline serve every current and future BBj AI tool
  • Consistent messaging -- the same model ensures consistent guidance regardless of which tool a developer uses
  • Predictable scaling -- adding new AI capabilities is incremental cost, not new infrastructure
  • Customer offering -- self-hosted Ollama deployment enables customers to run BBj AI tools on their own infrastructure
  • Any-client compatibility -- the same MCP server works with Claude, VS Code, Cursor, or custom applications; no vendor lock-in

For developers building with the infrastructure:

  • Standard API -- Ollama's OpenAI-compatible endpoint means any HTTP client works; no proprietary SDK needed
  • Separation of concerns -- consumer apps focus on UX, not AI/ML plumbing
  • Shared improvements -- model retraining and RAG updates benefit all consumers automatically
  • Generation context -- the foundation provides generation detection and appropriate document retrieval out of the box
  • Standard tool protocol -- MCP provides schema-based tool discovery; new clients can integrate without reading implementation code

For BBj developers using the tools:

  • Consistent answers -- the same question gets the same answer whether asked in the IDE, in documentation chat, or via a CLI tool
  • Generation awareness -- tools understand which BBj generation is in play and adapt accordingly
  • Privacy option -- self-hosted deployment means sensitive code never leaves the organization's network
  • Improving over time -- the feedback loop from usage across all tools continuously improves model quality
  • Choose your client -- use the MCP server from whichever AI tool you prefer; the BBj intelligence is the same everywhere

Current Status

Where Things Stand

The unified architecture is operational for internal exploration, with most components running and the fine-tuned model in active research.

  • Operational: bbj-language-server (v0.5.0) -- Langium-powered VS Code extension with syntax highlighting, completion, and diagnostics.
  • Operational for internal exploration: RAG knowledge system -- 51K+ documentation chunks across 7 source groups, PostgreSQL + pgvector database, REST API (search, stats, health endpoints), hybrid retrieval with source-balanced ranking.
  • Operational for internal exploration: MCP server with two tools -- search_bbj_knowledge (semantic search across documentation corpus) and validate_bbj_syntax (BBj compiler validation via bbjcpl). Available via stdio and Streamable HTTP transports.
  • Operational for internal exploration: Web chat at /chat endpoint -- Claude API backend with RAG retrieval, SSE streaming, source citations with clickable links, automatic BBj code validation in responses.
  • Operational: Compiler validation (bbjcpltool) -- integrated into MCP server and web chat with automatic syntax checking and 3-attempt auto-fix.
  • Active research: Fine-tuned BBj code model -- bbjllm experiment (9,922 ChatML examples on Qwen2.5-Coder-32B-Instruct via QLoRA/PEFT); research recommends 14B-Base with two-stage training.
  • Planned: generate_bbj_code MCP tool -- requires operational fine-tuned model.
| Component | Status | Notes |
| --- | --- | --- |
| bbj-language-server | Operational | v0.5.0 on VS Code Marketplace |
| RAG ingestion pipeline | Operational for internal exploration | 7 parsers, 51K+ chunks |
| REST retrieval API | Operational for internal exploration | POST /search, GET /stats, GET /health |
| MCP server | Operational for internal exploration | search_bbj_knowledge, validate_bbj_syntax |
| Web chat | Operational for internal exploration | Claude API + RAG + SSE streaming |
| Compiler validation | Operational | bbjcpltool, MCP tool, chat integration |
| Fine-tuned BBj model | Active research | bbjllm experiment; 14B-Base recommended |
| generate_bbj_code | Planned | Requires fine-tuned model |
| Training data repository | Operational | 2 seed examples, 7 topic directories |

The chapters that follow cover each component in implementation-level detail: model fine-tuning (Chapter 3), IDE integration (Chapter 4), documentation chat (Chapter 5), RAG database design (Chapter 6), and the implementation roadmap covering progress to date and the forward plan (Chapter 7).