Presented by:
Calvin Hendryx-Parker
CTO and Co-Founder
sixfeetup.com calvin@sixfeetup.com Fishers, Indiana
TODAY'S AGENDA
1. How to Pick an LLM
2. Making the Model Aware of your Data
3. Picking an Application Framework
4. Choosing a Deployment Strategy
5. Scaling your Application
6. Key Take-Aways
7. Q&A
HOW TO PICK AN LLM

Service or API?
Using an LLM service is simpler and quicker, but offers less custom control. An API allows deeper tuning, at the cost of more setup and maintenance. Both options come with terms of service that can limit usage, data handling, or intellectual property rights, so review them carefully before choosing.
Evaluating Trustworthiness
Trustworthy Leaderboard
Decoding Trust Overview
The Open Source Question
Open source LLMs vary in how much they share. “Open Weights”
means you can use the pretrained model files but not fully
reproduce it. Truly open means you also have the training code
and dataset, allowing you to replicate and retrain the model from
scratch.
Evaluating Performance
Open LLM Leaderboard
Chatbot Arena LLM Leaderboard
Artificial Analysis LLM Performance Leaderboard
MAKING THE MODEL AWARE OF YOUR DATA

Choose your embedding model carefully because different models have different training sets and specialties—some focus on plain English, others on multilingual content, and some even handle code or addresses. Using a default model often yields mediocre outcomes, so be sure to align your data type with the right embedding.

Pick an appropriate transformer model
Evaluate the shape of your data
Review your data
EMBEDDING MODEL LEADERBOARD
https://huggingface.co/spaces/mteb/leaderboard
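For a concrete feel of how much the choice matters, here is a minimal sketch that scores the same query against the same documents under two candidate embedding models. It assumes the sentence-transformers library; both model names are illustrative picks from the leaderboard, not endorsements.

```python
# Compare two candidate embedding models on a tiny sample of your own data.
from sentence_transformers import SentenceTransformer, util

docs = ["Invoice #123 is 30 days past due.", "Payment received for order #456."]
query = "Which invoices are unpaid?"

for name in ["all-MiniLM-L6-v2", "BAAI/bge-large-en-v1.5"]:
    model = SentenceTransformer(name)
    doc_vecs = model.encode(docs, normalize_embeddings=True)
    query_vec = model.encode(query, normalize_embeddings=True)
    print(name, util.cos_sim(query_vec, doc_vecs))  # higher = closer match
```

Run the same comparison on a sample of your real corpus; how models rank on your data matters more than the leaderboard average.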
EMBEDDING MODEL SELECTION

Use Case              | Recommended Models                            | Rationale
General Purpose RAG   | OpenAI text-embedding-3-small + BAAI reranker | Cost-effective with 70%+ accuracy via re-ranking
Financial Analysis    | Voyage-finance-2                              | 22% higher precision on SEC filings than general models
Multilingual Search   | Vectara Boomerang / Cohere-embed-v3           | Superior cross-lingual NDCG scores
Self-hosted Solutions | Nomic-embed-text / BGE-large                  | Zero API fees; 71-75% accuracy on custom corpora
FINE-TUNING MAY OR MAY NOT BE THE BEST CHOICE

DATA HUNGRY
Fine-tuning demands large amounts of data because the model must see enough varied examples to learn new patterns without forgetting its previous knowledge. Insufficient or low-quality data can lead to overfitting or underperformance, emphasizing the need for abundant, relevant training material.

BEST FOR CLASSIFICATION
Fine-tuned models are tailored to the specific domain or
task, capturing relevant patterns with higher precision. This
specialization boosts accuracy while reducing computational
overhead during inference. By focusing on the data that
matters, fine-tuning can yield more efficient processing and
potentially lower inference costs.
COSTLY
Fine-tuning is resource-intensive, requiring specialized
hardware and extensive compute time. As model sizes
grow, so do energy and infrastructure costs. Gathering
and cleaning enough data adds extra expenses, and
multiple training runs further drive up the final bill.
TIME CONSUMING
Fine-tuning takes time because it requires multiple
training and validation cycles with extensive data. Each
iteration involves careful hyperparameter adjustments,
extending the overall process.
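To make the cost and time points concrete, here is a minimal classification fine-tune sketch with Hugging Face Transformers. The dataset, base model, and hyperparameters are illustrative placeholders; a real project repeats runs like this many times over far more data.

```python
# Toy classification fine-tune: even this small run needs labeled data,
# GPU time, and multiple epochs. All names below are illustrative.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
train = load_dataset("imdb", split="train[:2000]").map(
    lambda batch: tok(batch["text"], truncation=True, padding="max_length"),
    batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetune-out", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=train,
)
trainer.train()  # each hyperparameter tweak means another run like this
```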
FIGHTING HALLUCINATIONS AND HALF-TRUTHS WITH RAG

Vectors
RAG uses vector-based retrieval to fetch relevant context from a knowledge base. By relying on semantic similarities, it helps reduce hallucinations.
Data Storage
RAG’s data storage can use either open source solutions or commercial platforms. Each option comes with trade-offs in cost, scalability, and control.
RAG DATA STORAGE OPTIONS

Open Source | Commercial
ChromaDB    | Pinecone
pg_vector   | Snowflake
Elastic     | Weaviate
Mongo       | Milvus
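As a minimal sketch of the open source path, here is ChromaDB (one of the options above) used as an in-memory vector store; the documents and ids are placeholders.

```python
# Load a tiny knowledge base into ChromaDB. By default Chroma embeds the
# documents with its built-in embedding function; swap in your own to match
# the embedding-model advice earlier in the deck.
import chromadb

client = chromadb.Client()  # in-memory; use chromadb.PersistentClient for disk
collection = client.create_collection("kb")
collection.add(
    documents=["Refunds are issued within 30 days of purchase.",
               "Support is available 24/7 via chat."],
    ids=["doc-1", "doc-2"],
)
```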
Retrieval
RAG relies on converting the user query into an embedding, then searching a knowledge base for matching vectors. It combines the retrieved context with the prompt in the model's context window and may factor in recent chat history to ensure continuity and relevance.
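Continuing the ChromaDB sketch from the storage section (reusing its `collection`), retrieval then looks roughly like this; the prompt template is illustrative, not prescriptive.

```python
# Embed the user query, fetch the closest chunks, and splice them into the
# prompt that goes into the LLM's context window.
question = "What is the refund policy?"
results = collection.query(query_texts=[question], n_results=2)
context = "\n".join(results["documents"][0])
prompt = (
    "Answer using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {question}"
)
# Prepend recent chat history here if the conversation needs continuity.
```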
RAG DEMO
Feedback Loop
RAG combats hallucinations by rating responses and refining them for accuracy. A/B testing model settings uncovers half-truths, leading to more reliable outputs.
USING GUARDRAILS WITH PRE-GENERATED ANSWERS

Pre-Generated Answers
A method where common responses are produced in advance and served quickly.

Less Creativity
PGA is less creative because it relies on fixed responses, offering limited adaptability. It can’t spontaneously craft new content or adjust to unique queries in real time.

More Control
PGA provides vetted, consistent responses, reducing the risk of unverified or unsuitable outputs. This tighter control is crucial for regulated markets like healthcare, where accuracy and compliance are paramount.
SEE BLOG POST
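A minimal sketch of the idea: semantic matching against vetted Q&A pairs, with fallback to the live model. The library, model, and threshold here are illustrative choices, not the approach from the blog post.

```python
# Serve a vetted, pre-generated answer when the query is close enough to a
# known question; otherwise signal that the live model should handle it.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
canned = {
    "How do I reset my password?": "Use the 'Forgot password' link on the sign-in page.",
    "What are your support hours?": "Support is available 24/7 via chat.",
}
questions = list(canned)
q_vecs = model.encode(questions, normalize_embeddings=True)

def answer(user_query: str, threshold: float = 0.8) -> str | None:
    vec = model.encode(user_query, normalize_embeddings=True)
    scores = util.cos_sim(vec, q_vecs)[0]
    best = int(scores.argmax())
    if float(scores[best]) >= threshold:
        return canned[questions[best]]  # vetted, pre-generated answer
    return None  # no safe match; route to the live model (with guardrails)
```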
HOW TO PICK AN APPLICATION FRAMEWORK

Agent Apps
LangChain, Haystack, and LlamaIndex are agent apps designed to orchestrate LLM operations. They simplify tasks like retrieval, data management, and chaining prompts, enabling more advanced AI-driven workflows.
Framework  | Type                | Focus                        | Strengths                               | Weaknesses                          | Best for
LangChain  | General-Purpose     | LLM Orchestration            | Flexible workflow design                | Steeper learning curve              | Customizable multi-step agent workflows
LlamaIndex | Data Framework      | RAG & Indexing               | Optimized for efficient data retrieval  | Limited to indexing/retrieval tasks | Domain-specific RAG pipelines
Haystack   | Search-Oriented NLP | Semantic Search              | Built-in document preprocessing         | Less flexible for non-search tasks  | Enterprise search applications
LangGraph  | Stateful Agents     | Complex Decision-Making      | Handles loops/human-in-the-loop         | Requires LangChain integration      | Dynamic customer support/approval systems
CrewAI     | Multi-Agents        | Collaborative Task Execution | Role-based agent collaboration          | Early-stage tooling                 | Research/data analysis teams
Conversational AI Frameworks
Simple REPL and WebUI wrappers (like ChainLit) are frameworks for quickly testing, iterating, and deploying conversational AI. They provide user-friendly interfaces and straightforward setups, making it easier to refine prompts and manage dialogue flows.
Getting to First Principles
Do you even need a framework? Are you solving a unique
problem or just following a trend? Sometimes a simpler,
framework-less approach can be more flexible and transparent.
By stripping down to first principles and building only what you
really need, you gain control and avoid unnecessary overhead.
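For comparison, the framework-less baseline is small. Here is a minimal sketch using the official openai client; the model name is illustrative.

```python
# Direct provider call with hand-rolled prompt assembly: no orchestration
# layer, full visibility into every token that goes over the wire.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
messages = [
    {"role": "system", "content": "You are a concise enterprise assistant."},
    {"role": "user", "content": "Summarize retrieval-augmented generation."},
]
resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(resp.choices[0].message.content)
```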
HOW TO CHOOSE A DEPLOYMENT STRATEGY
Service Proxy
A service proxy approach (like LiteLLM) acts as a layer between
your app and the LLM provider, granting granular control over
data handling and API usage. It helps protect sensitive data and
manage costs through features like encryption, caching, and
rate limiting.
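A minimal sketch of the pattern, assuming a LiteLLM proxy already running locally (for example via `litellm --config config.yaml`); the port, key, and model alias are illustrative.

```python
# The app talks to the proxy's OpenAI-compatible endpoint instead of the
# provider; the proxy applies routing, caching, and rate limits centrally.
from openai import OpenAI

client = OpenAI(api_key="proxy-key", base_url="http://localhost:4000")
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # alias resolved by the proxy's model list
    messages=[{"role": "user", "content": "Ping through the proxy."}],
)
print(resp.choices[0].message.content)
```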
On-Premises
On-premises deployments give you full control of your data but
demand GPUs and skilled setup. Popular inference engines like
Ollama, GPT4All, vLLM, and exo provide private hosting, but
the hardware and maintenance costs can be significant.
You will need more RAM and VRAM than you think
Additional networking is needed, not just storage and compute.
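A minimal sketch of calling a locally hosted model through Ollama's REST API; it assumes a server on the default port with a model already pulled (e.g. `ollama pull llama3`), and the model name is illustrative.

```python
# Query a local Ollama server; data never leaves your infrastructure.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Why host LLMs on-prem?", "stream": False},
    timeout=120,
)
print(resp.json()["response"])
```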
Cloud Options
Self-Hosted
Managed Hosting
CLOUD OPTIONS

Self-Hosted | Hosted Models        | Endpoints as a Service
EKS         | Azure hosted ChatGPT | Bedrock
AKS         | Fireworks            | HuggingFace Spaces
GKE         | HuggingFace Endpoint | Azure ML
Etc.        | Together.ai          | Google Cloud AI
            | Replicate            | Etc.
            | OpenRouter           |
YOUR DATA IN THE CLOUD

What terms have you signed up for? What shadow terms has your staff signed you up for?

Set up an AI AUP
Enterprises need an AI Acceptable Use Policy (AUP) to clarify permissible AI activities, ensure regulatory compliance, and reduce risks. It helps manage data handling, ethical considerations, and user rights, preventing legal or reputational pitfalls.

Perform a Threat Analysis
Conducting a threat analysis reveals any hidden or risky terms your team may have unknowingly accepted. By reviewing agreements and commitments, you can prevent unforeseen compliance or security problems.
SCALING YOUR AI APPLICATION: HIGH LATENCY - HIGH COSTS

FIRST: GET REAL METRICS
Gather key performance indicators (KPIs) through load testing and platform instrumentation to spot bottlenecks and cost drivers.
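A minimal load-testing sketch to start collecting those numbers; the endpoint and payload are placeholders for your own service.

```python
# Fire concurrent requests at the chat endpoint and report p50/p95 latency.
import concurrent.futures
import statistics
import time

import requests

def one_call(_):
    start = time.perf_counter()
    requests.post("http://localhost:8000/chat", json={"message": "ping"}, timeout=60)
    return time.perf_counter() - start

with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    latencies = sorted(pool.map(one_call, range(64)))

print("p50:", statistics.median(latencies))
print("p95:", statistics.quantiles(latencies, n=100)[94])
```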
SECOND: ADD HARDWARE
Next, add more hardware to reduce latency and
handle load while you diagnose deeper issues.
This temporary fix buys time but won’t solve
the underlying problems.
THIRD: DITCH ABSTRACTIONS
Removing frameworks and abstractions cuts overhead, exposes inefficiencies, and enables targeted optimizations for better control and performance.
INVEST IN MLOPS AND LLMOPS
Deploying AI apps is still software deployment
at its core. Continuous deployment and
monitoring remain essential for maintaining
quality and performance.
KEY TAKE-AWAYS

1. Choosing the Defaults Will Give You Very Average or Even Poor Results
2. Build Observability into your Solutions
3. Deploying AI is Deploying Software
4. Create an AI User Policy and Ethics Guide
5. Establish an AI Guild
6. Challenge your Teams to Use AI
THANK YOU
Presented by Calvin Hendryx-Parker
Come see me to talk further
sixfeetup.com calvin@sixfeetup.com Fishers, Indiana
