Why Connector Breadth Matters
One of the most consequential decisions in any enterprise AI programme is the choice of language model — or, more accurately, the choice of which models to use, where to host them, and how to orchestrate them across different tasks and security boundaries. The AI landscape is evolving rapidly, with new models, providers and runtimes emerging on a near-weekly basis. Organisations that lock themselves into a single provider or a single deployment model risk finding themselves unable to adapt as the technology matures, as pricing shifts, or as regulatory expectations tighten.
Cloud-Dog's AI platform is built on the principle that model choice should be a configuration decision, not an architectural constraint. With support for over thirty distinct LLM connectors spanning commercial cloud APIs, open-source local runtimes, hybrid framework integrations, embedding and reranking services, and specialised emerging models, the platform enables organisations to select the right model for each task, each security context and each budget — without re-engineering their infrastructure every time the landscape changes.
This breadth is not a feature list for its own sake. It is the foundation of a deployment strategy that preserves sovereignty, optimises cost, and ensures that AI capabilities can evolve as fast as the technology itself.
Commercial Cloud APIs
For organisations that can operate within the boundaries of cloud-hosted inference, the platform provides native connectors to the major commercial LLM providers. These integrations are production-grade, supporting key-based authentication, streaming token delivery, structured response parsing and robust error handling.
OpenAI remains the most widely adopted commercial provider, and Cloud-Dog supports the full family of GPT models — from GPT-3.5 through GPT-4, GPT-4-Turbo and GPT-4o — via both REST and MCP interfaces. The integration handles authentication, token streaming and response structuring transparently, allowing agents to leverage OpenAI's capabilities without coupling application logic to provider-specific implementation details.
Anthropic Claude — including Claude 2, 3 and 3.5 variants — offers context windows scaling up to two hundred thousand tokens, making it particularly well-suited for summarisation, long-document analysis and complex reasoning tasks. The connector manages context-window optimisation automatically, ensuring that large documents are processed efficiently without exceeding token limits or degrading response quality.
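Context-window management of the kind described above can be reduced to a budgeting step before dispatch. The sketch below is illustrative only: real connectors count tokens with the provider's tokenizer, whereas this stand-in approximates one token per word, and the `fit_to_window` function and its default figures are hypothetical.

```python
# Sketch of the context-window budgeting a connector performs before sending a
# long document to a model. Real connectors use the provider's tokenizer; here
# we approximate one token per word to stay self-contained.

def fit_to_window(document, window=200_000, reserved=4_096):
    """Trim a document so prompt + document + response fit inside the window.

    `reserved` holds back room for the system prompt and the model's reply.
    """
    budget = window - reserved
    words = document.split()
    if len(words) <= budget:
        return document
    return " ".join(words[:budget])

long_doc = "word " * 250_000
trimmed = fit_to_window(long_doc)
print(len(trimmed.split()))  # 195904
```

Production connectors do smarter things than truncation (chunking, map-reduce summarisation), but the budget arithmetic is the same.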
Google Vertex AI, encompassing both PaLM and Gemini model families, integrates through Google Cloud Platform's managed authentication and quota handling. For organisations already invested in the Google Cloud ecosystem, this connector provides a natural extension of existing infrastructure into AI reasoning capabilities.
Azure OpenAI Service offers the same model capabilities as OpenAI's direct API, but hosted within Microsoft's Azure infrastructure with region-specific deployment options. For enterprises with existing Azure commitments or requirements for data residency within specific Azure regions, this connector provides OpenAI model access within a compliance framework they already understand and manage.
The platform also supports Cohere for both text generation and embedding workloads, AI21 Labs for high-context reasoning with the Jurassic and Jamba model families, and Mistral's cloud API for accessing Mixtral and Mistral models through a managed inference endpoint. Each connector follows the same integration pattern — standardised interfaces, consistent error handling, and transparent token and cost management — so that switching between providers or using multiple providers simultaneously requires configuration changes, not code changes.
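The "configuration changes, not code changes" claim is easiest to see in miniature. The backends below are stand-in stubs, not the platform's real connectors; the point is that call sites depend only on a shared prompt-to-text interface, so swapping providers touches only the config.

```python
# Illustration of config-driven provider switching. The two backends are
# hypothetical stubs; real connectors would call the providers' APIs behind
# the same callable interface.

def openai_stub(prompt):    # would call the OpenAI API in a real connector
    return f"[openai] {prompt}"

def mistral_stub(prompt):   # would call Mistral's managed endpoint
    return f"[mistral] {prompt}"

CONNECTORS = {"openai": openai_stub, "mistral": mistral_stub}

def complete(prompt, config):
    """Route a completion to whichever provider the config names."""
    return CONNECTORS[config["provider"]](prompt)

print(complete("hello", {"provider": "mistral"}))  # [mistral] hello
```

Running two providers simultaneously is then just two config entries resolved at call time.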
Open-Source and Local Models
For organisations that require data sovereignty, operate in air-gapped environments, or simply want to avoid the ongoing cost and dependency of cloud API subscriptions, the platform provides deep integration with locally hosted and open-source model runtimes. This is where Cloud-Dog's commitment to sovereign AI becomes most tangible.
Ollama is the primary runtime for local model deployment, and the platform's native connector supports launching, managing and querying any model in the Ollama ecosystem — including Llama 3, Mistral, Gemma, Qwen, Falcon, Granite and dozens of other open model families. The integration works seamlessly whether Ollama is running on a developer's workstation, a departmental server, or a production GPU cluster. Models are managed as configuration, not infrastructure — making it straightforward to test new models, roll back to previous versions, or run different models for different tasks within the same deployment.
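Ollama's own REST interface shows why wrapping it is straightforward: a one-shot completion is a single POST to its documented `/api/generate` endpoint. The sketch below builds that request body; the model name and prompt are placeholders, and the actual HTTP call is shown in comments rather than executed.

```python
# Minimal sketch of talking to a local Ollama server over its documented REST
# API (POST /api/generate). Model name and prompt are placeholders.
import json

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_request(model, prompt):
    """Assemble the JSON body Ollama expects for a one-shot completion."""
    return {"model": model, "prompt": prompt, "stream": False}

body = build_generate_request("llama3", "List three risks of vendor lock-in.")
print(json.dumps(body))

# To execute against a running Ollama instance:
#   import urllib.request
#   req = urllib.request.Request(OLLAMA_URL, json.dumps(body).encode(),
#                                {"Content-Type": "application/json"})
#   print(json.loads(urllib.request.urlopen(req).read())["response"])
```

Because the request shape is identical whether Ollama runs on a laptop or a GPU cluster, only `OLLAMA_URL` and the model name change between environments.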
vLLM provides high-performance inference for open models with dynamic batching, making it the runtime of choice for production workloads that demand throughput and concurrency. The platform's vLLM adapter handles connection management, request queuing and response parsing, allowing organisations to serve multiple concurrent users from a single GPU instance without custom engineering.
llama.cpp offers the lightest-weight option for local inference, running quantised models through GGUF format on standard hardware — including laptops and edge devices without dedicated GPU resources. For prototyping, offline operation or resource-constrained environments, the llama.cpp integration provides surprisingly capable inference at a fraction of the infrastructure cost of full-scale GPU deployment.
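The economics of quantised inference follow from simple arithmetic: a 7B-parameter model at 4 bits per weight needs roughly 3.5 GB for its weights, which is why llama.cpp fits on a laptop. The estimate below ignores KV-cache and activation overhead, so treat it as a lower bound.

```python
# Back-of-envelope memory estimate for a quantised GGUF model. Weight storage
# only; KV cache and activations add overhead on top of this.

def model_size_gb(params_billion, bits_per_weight):
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

print(round(model_size_gb(7, 4), 2))   # 3.5  (4-bit quantisation)
print(round(model_size_gb(7, 16), 2))  # 14.0 (fp16, for comparison)
```

The 4x gap between fp16 and 4-bit storage is the difference between needing a dedicated GPU and running on commodity hardware.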
Hugging Face connectivity operates at multiple levels. The Inference API connector supports both hosted and privately deployed Hugging Face models through standard REST endpoints. The Text Generation Inference (TGI) integration connects to Hugging Face's dedicated inference server, which can be deployed locally for production-grade open model serving. And the broader Hugging Face Transformers integration provides direct access to the model hub for inference, embedding and reranking tasks — enabling organisations to leverage the vast Hugging Face ecosystem without building custom integration plumbing.
A generic custom endpoint connector rounds out the local model options, supporting any LLM that exposes an OpenAPI-compatible REST interface. This ensures that proprietary, partner-hosted or experimental models can be integrated into the platform without waiting for a dedicated connector to be developed.
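A generic connector of this kind is essentially an adapter that normalises an arbitrary REST endpoint into the platform's uniform calling shape. In the sketch below the transport is injected so the adapter can be exercised with a stub; the endpoint URL and the `"input"`/`"output"` field names are hypothetical, standing in for whatever the target's OpenAPI description declares.

```python
# Sketch of a generic connector for any REST completion endpoint. The
# transport is injected, so a stub stands in for a real HTTP client here.

def make_connector(transport, endpoint):
    """Return a callable that speaks a uniform prompt -> text interface."""
    def complete(prompt):
        raw = transport(endpoint, {"input": prompt})
        return raw["output"]  # normalise the provider-specific envelope
    return complete

def stub_transport(endpoint, payload):
    # Stand-in for an HTTP POST; echoes so the adapter logic is visible.
    return {"output": f"echo from {endpoint}: {payload['input']}"}

connector = make_connector(stub_transport, "https://models.example/v1/complete")
print(connector("ping"))  # echo from https://models.example/v1/complete: ping
```

Swapping the stub for a real HTTP client changes nothing above the adapter, which is the property the generic connector relies on.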
Framework and Orchestration Integrations
Modern AI applications rarely rely on a single model call. They orchestrate multiple models, tools, data sources and reasoning steps into coherent workflows. The platform integrates natively with the leading AI orchestration frameworks, enabling complex multi-step reasoning without requiring organisations to build orchestration logic from scratch.
LangChain integration — available in both Python and JavaScript — provides full workflow orchestration capabilities, including tool-calling, retriever chaining, memory management and structured output parsing. Agents built on the Cloud-Dog platform can leverage LangChain's extensive ecosystem of tools and integrations while benefiting from Cloud-Dog's governance, audit and security layers.
LlamaIndex integration supports context retrieval, document chunking, vector memory management and index construction. For retrieval-augmented generation workflows — which form the backbone of most enterprise AI applications — the LlamaIndex connector provides the document processing and retrieval pipeline that feeds high-quality, relevant context to the reasoning models.
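The chunking step at the heart of that pipeline can be sketched in a few lines: split text into fixed-size windows that overlap, so no passage is stranded at a chunk boundary. Frameworks like LlamaIndex offer far richer splitters (sentence-aware, hierarchical); the sizes below are illustrative.

```python
# Word-window chunking with overlap, the simplest form of the document
# splitting a RAG pipeline performs before embedding and indexing.

def chunk_words(text, size=200, overlap=50):
    """Split text into word windows of `size`, each sharing `overlap`
    words with its predecessor."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, len(words), step)
            if words[i:i + size]]

chunks = chunk_words("alpha " * 450, size=200, overlap=50)
print(len(chunks))  # 3
```

Overlap trades index size for recall: a sentence split across two windows still appears whole in at least one of them.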
Haystack, from Deepset, provides an alternative pipeline orchestration framework with particular strengths in document question-answering and search-oriented workflows. The platform's Haystack adapter supports both local and cloud-hosted pipeline execution, giving organisations flexibility in how they structure their AI reasoning chains.
At the centre of Cloud-Dog's own orchestration model is the Model Context Protocol (MCP) — a standardised communication protocol that enables secure, governed message passing between AI agents, tools and services. MCP provides the backbone for multi-agent collaboration, allowing different agents to handle different stages of a task — retrieval, reasoning, data access, formatting, notification — while maintaining a consistent security context, audit trail and governance framework across every interaction.
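On the wire, MCP messages are JSON-RPC 2.0; a tool invocation uses the protocol's `tools/call` method. The sketch below builds such a request. The tool name and arguments are placeholders, and any platform-specific governance metadata that would accompany the message in a real deployment is not shown.

```python
# The wire shape of an MCP tool invocation, which is JSON-RPC 2.0 underneath.
# Tool name and arguments are illustrative placeholders.
import json

def mcp_tool_call(request_id, tool, arguments):
    """Build a JSON-RPC 2.0 `tools/call` request as used by MCP."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }

msg = mcp_tool_call(1, "search_documents", {"query": "quarterly filings"})
print(json.dumps(msg, indent=2))
```

Because every agent-to-tool interaction shares this envelope, auditing and policy enforcement can operate on the message layer without knowing anything about the individual tools.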
Embedding and Reranking
Retrieval-augmented generation is only as good as the retrieval that feeds it. The platform supports a comprehensive range of embedding and reranking models that power the vector search, semantic matching and context prioritisation capabilities underlying every RAG workflow.
OpenAI Embeddings — including the text-embedding-3-large and text-embedding-3-small models — provide high-dimensional, multilingual embedding generation through a cloud API. For organisations already using OpenAI for generation, the embedding connector provides a natural complement for indexing and retrieval.
Cohere's Embed and Rerank APIs support both semantic search and hybrid ranking strategies, enabling the platform to combine keyword and semantic matching for optimal retrieval quality.
On the local and open-source side, the platform integrates with the leading open embedding and reranking models: bge, e5, Instructor, nomic-embed-text and gte model families for embedding generation, and bge-reranker, Jina AI Reranker and cross-encoder MiniLM models for result reranking. These models run locally with minimal resource requirements, making high-quality retrieval available even in air-gapped or resource-constrained environments.
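The retrieval these embedding models enable comes down to nearest-neighbour search over vectors. The sketch below ranks stored chunks by cosine similarity to a query vector; the 3-dimensional vectors and document ids are toy stand-ins for the 768-to-3072-dimensional embeddings models like bge or text-embedding-3 actually produce.

```python
# Cosine-similarity retrieval over a toy in-memory index, the core operation
# behind vector search. Vectors and ids here are illustrative stand-ins.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, index, k=2):
    """Return the k chunk ids most similar to the query vector."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:k]]

index = [("doc-a", [1.0, 0.0, 0.0]),
         ("doc-b", [0.9, 0.1, 0.0]),
         ("doc-c", [0.0, 0.0, 1.0])]
print(top_k([1.0, 0.05, 0.0], index))  # ['doc-a', 'doc-b']
```

A reranker then rescores this candidate list with a heavier cross-encoder model, which is why retrieval and reranking are treated as separate connector categories.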
Specialised and Emerging Models
The AI model ecosystem is expanding rapidly beyond general-purpose chat and reasoning. The platform tracks this expansion with connectors for specialised model families that address specific enterprise needs.
IBM Granite models — available in both text and code variants — are Apache 2.0 licensed and enterprise-tuned for reasoning and coding tasks. Their permissive licensing and strong performance on structured enterprise tasks make them an increasingly attractive option for organisations seeking capable models without the licensing complexity of some alternatives.
DeepSeek models, including the Math, R1 and Coder variants, integrate through both Ollama and custom bridge connectors. The R1 family in particular has demonstrated remarkable reasoning capability, offering performance that approaches frontier commercial models at a fraction of the cost and with full local deployment options.
Qwen models from Alibaba Cloud — including the standard, vision-language and mixture-of-experts variants — provide strong multilingual and multimodal capabilities. The Gemma family from Google offers lightweight, multilingual models with open licensing. And locally hosted Mixtral endpoints provide a high-performance open alternative for internal RAG and reasoning workloads.
The Strategic Value of Choice
Thirty-plus connectors across REST, MCP, A2A, WebSocket and CLI interfaces. Deployment options spanning on-premises, private cloud, sovereign cloud and public cloud. Runtime and framework compatibility spanning Ollama, vLLM, llama.cpp, TGI, LangChain and LlamaIndex.
The breadth is the point. It means that an organisation can start with a cloud API for rapid prototyping, migrate to local models for production sovereignty, run different models for different sensitivity levels, and orchestrate the whole estate through a single governed platform — without rebuilding their AI infrastructure at each stage.
In a landscape where the only constant is change, the ability to choose — and to change that choice without penalty — is perhaps the most valuable capability an AI platform can offer. Cloud-Dog is built on that principle, because we believe that sovereign, flexible, governed AI is not a constraint on innovation. It is the foundation that makes sustainable innovation possible.
