Practical Challenges in Deploying RAG Systems

RAG shows great potential thanks to its ability to combine external knowledge bases with large language models (LLMs). However, while a RAG prototype can be built in an hour, turning it into a stable, reliable, and trustworthy product for real business scenarios requires overcoming a series of major obstacles.

Part 1: Challenges in Retrieval (R)

The quality of retrieval directly sets the ceiling on RAG system performance. If the retrieval stage cannot supply accurate, relevant, and complete "raw materials" to the generation model, everything that follows is a castle in the air.

Problem 1: Knowledge ingestion and preprocessing

This is the most fundamental and most easily underestimated stage, yet it is also where failures are most fatal.

1) Complexity of unstructured data processing

Corporate knowledge bases are full of PDFs, Word documents, PowerPoint decks, images, and even scanned documents.

Layout parsing failures

Multi-column layouts, tables that span pages, headers and footers, charts, and formulas in PDFs can easily cause content to be scrambled or lost during automated parsing and text extraction.

The “black hole” of tables and images

Although tools and models such as MinerU and vision-language (VL) models keep getting more accurate, complex tables may still need special handling as well as manual inspection and correction. It is also best to filter out information that is irrelevant to RAG.

2) Quality and consistency of knowledge

Outdated content and version conflicts

Knowledge bases often contain multiple versions of manuals, conflicting policy documents, or defunct announcements. The system must be able to identify, or be told, which version of the knowledge is authoritative.
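One common way to make the authoritative version explicit is to store version and status metadata with every chunk and filter on it at query time. The sketch below assumes a hypothetical metadata schema (`doc_id`, `version`, `status`) and keeps only the newest non-deprecated version of each document; a real system would usually push this filter into the vector store query rather than filtering in Python.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    # Hypothetical metadata schema; field names are illustrative only.
    doc_id: str
    version: int
    status: str  # e.g. "active" or "deprecated"
    text: str


def authoritative_chunks(chunks: list[Chunk]) -> list[Chunk]:
    """Keep only chunks from the newest non-deprecated version of each document."""
    live = [c for c in chunks if c.status != "deprecated"]
    latest: dict[str, int] = {}
    for c in live:
        latest[c.doc_id] = max(latest.get(c.doc_id, c.version), c.version)
    return [c for c in live if c.version == latest[c.doc_id]]


chunks = [
    Chunk("manual-x", 1, "active", "Old reset procedure."),
    Chunk("manual-x", 2, "active", "New reset procedure."),
    Chunk("bulletin-y", 1, "deprecated", "Defunct bulletin."),
]
print([c.text for c in authoritative_chunks(chunks)])  # ['New reset procedure.']
```

This only helps if versions are recorded correctly at ingestion time, which is itself part of the challenge described above.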

Factual errors and noise

The source documents themselves may contain incorrect information, and a RAG system will faithfully reproduce and amplify these errors.

3) The "art" of document chunking

Drawbacks of fixed-rule chunking

Crudely splitting text every fixed number of characters (for example, every 1,000) or at punctuation marks can easily cut complete semantic units apart and lose context. The answer to a question may be split across two separate chunks, causing retrieval to fail.
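The failure mode is easy to reproduce. The sketch below is a plain fixed-window chunker (the window and overlap sizes are arbitrary illustrative values, in characters). Without overlap, the code "4711" is cut in half; a sliding overlap is one common mitigation, since text cut at one boundary also appears intact in the neighbouring chunk, although it cannot help when a semantic unit is longer than the overlap.

```python
def fixed_size_chunks(text: str, size: int, overlap: int = 0) -> list[str]:
    """Split text into windows of `size` characters; consecutive windows share `overlap` chars."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):  # last window already reaches the end
            break
    return chunks


text = "The reset code is 4711. Enter it twice."
print(fixed_size_chunks(text, size=20))
# ['The reset code is 47', '11. Enter it twice.']
print(fixed_size_chunks(text, size=20, overlap=8))
# ['The reset code is 47', 'de is 4711. Enter it', 'Enter it twice.']
```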

The challenge of semantic chunking

Chunking along semantic boundaries such as paragraphs and headings works better, but it requires writing complex parsing rules for each document format.
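For plain text or Markdown, a simple structure-aware chunker can split on blank lines, keep each section heading attached to the paragraphs under it, and merge paragraphs until a size budget is reached. This is a minimal sketch that assumes headings start with `#`; formats such as PDF or DOCX would first need a parser that recovers this structure, which is exactly where the complex rules come in.

```python
def section_chunks(text: str, max_chars: int = 500) -> list[str]:
    """Split on blank lines, group paragraphs under their nearest heading,
    and start a new chunk when the paragraph text would exceed max_chars.
    A single paragraph is never split, even if it alone exceeds the budget."""
    chunks: list[str] = []
    heading = ""
    buffer: list[str] = []

    def flush() -> None:
        # Emit buffered paragraphs as one chunk, prefixed by the current heading.
        if buffer:
            body = "\n\n".join(buffer)
            chunks.append(f"{heading}\n\n{body}" if heading else body)
            buffer.clear()

    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        if para.startswith("#"):  # assumption: Markdown-style headings
            flush()
            heading = para
        elif buffer and len("\n\n".join(buffer + [para])) > max_chars:
            flush()
            buffer.append(para)
        else:
            buffer.append(para)
    flush()
    return chunks


doc = "# Reset\n\nHold the button.\n\nThe code is 4711.\n\n# Warranty\n\nTwo years."
print(section_chunks(doc))
# ['# Reset\n\nHold the button.\n\nThe code is 4711.', '# Warranty\n\nTwo years.']
```

Repeating the heading in every chunk is a deliberate choice: a chunk cut from the middle of a section still tells the retriever and the LLM what it is about.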

“Finding a needle in a haystack” vs. “Seeing only the trees”

If a chunk is too large, it carries too much irrelevant "noise" that distracts the LLM and can trigger the "lost in the middle" problem (models tend to pay more attention to information at the beginning and end of the context). If a chunk is too small, it lacks enough context for the LLM to reason effectively.

4) Structure and relationships of knowledge

To capture the rich structured relationships inherent in knowledge, advanced RAG systems need to introduce a knowledge graph (KG). A knowledge graph extracts key information from unstructured text, such as entities (people, projects, departments), their attributes, and the relationships between them (belongs to, person in charge, technology stack), and stores it in structured form as a graph. This brings challenges of its own:

Extremely expensive to build

Automatically building a high-quality, business-specific knowledge graph from massive, heterogeneous documents is itself a complex NLP (natural language processing) project. It involves techniques such as named entity recognition (NER) and relation extraction (RE), and often requires extensive entity definition, relationship modeling, and verification by domain experts (in many scenarios the entities and relationships must be designed from scratch, so GraphRAG cannot be used out of the box).

The complexity of fused query and retrieval (Graph RAG)

Once a knowledge graph is introduced, retrieval is no longer a single vector search. The system needs a smarter "Query Planner" to decide whether the user's question should be answered by searching for similar text in the vector store, by querying the knowledge graph (for example, with path traversal), or by a combination of the two. For example, to answer "Who is the superior of the manager responsible for the 'Phoenix Project'?", the system must first follow the path "Phoenix Project" -> "Responsible Person" -> "Supervisor" in the graph. Designing and implementing this kind of fused retrieval is very complex.
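The multi-hop lookup in that example maps naturally onto a graph traversal. The sketch below stores a toy graph as (subject, relation) -> object edges in a Python dict and follows a path of relations; the entity and relation names are hypothetical, and a production system would issue the equivalent path query against a graph database instead.

```python
from typing import Optional

# Toy knowledge graph stored as (subject, relation) -> object edges.
# Entity and relation names are hypothetical.
GRAPH: dict[tuple[str, str], str] = {
    ("Phoenix Project", "responsible_person"): "Zhang San",
    ("Zhang San", "supervisor"): "Li Si",
}


def follow_path(start: str, relations: list[str]) -> Optional[str]:
    """Walk a chain of relations from `start`; return None if any hop is missing."""
    node: Optional[str] = start
    for rel in relations:
        node = GRAPH.get((node, rel))
        if node is None:
            return None
    return node


# "Who is the superior of the manager responsible for the 'Phoenix Project'?"
print(follow_path("Phoenix Project", ["responsible_person", "supervisor"]))  # Li Si
```

The hard part the text describes is not the traversal itself but deciding, from a free-form question, that this relation path is the one to follow.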

Life cycle management challenges

The knowledge graph must also be kept in sync with the source documents, which adds maintenance cost and data consistency challenges.

Problem 2: Entity-level ambiguity

Document content is full of entities that need to be disambiguated, which adds a whole extra dimension to the problem compared with simple paragraph matching.

1) Entity identification and linking

The system needs to be able to identify key entities in the text, such as person names, project code names, product names, departments, etc., and link them to unique entity IDs.

2) Entity disambiguation

When a user's question or a document mentions "Zhang Wei", the system needs a mechanism to tell whether it refers to Zhang Wei, the project manager in the R&D department, or Zhang Wei, the director of the finance department. Without this capability, the system may retrieve information about the wrong person.
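A very simple form of disambiguation scores each candidate entity by how many of its known context terms (department, role, and so on) appear near the mention. The sketch below is an illustrative heuristic with a hypothetical entity table and IDs; production entity linking typically relies on trained models or embedding similarity. Returning `None` on a tie or on zero evidence lets the product ask a clarifying question instead of guessing.

```python
from typing import Optional

# Hypothetical entity table: two different people share the name "Zhang Wei".
ENTITIES = [
    {"id": "E001", "name": "Zhang Wei", "context": {"r&d", "project", "manager"}},
    {"id": "E002", "name": "Zhang Wei", "context": {"finance", "director", "budget"}},
]


def link_entity(mention: str, surrounding_text: str) -> Optional[str]:
    """Pick the candidate whose context terms overlap most with the surrounding text.

    Returns None when there is no candidate, no evidence, or a tie."""
    words = set(surrounding_text.lower().replace(",", " ").split())
    scored = sorted(
        ((len(e["context"] & words), e["id"]) for e in ENTITIES if e["name"] == mention),
        reverse=True,
    )
    if not scored or scored[0][0] == 0:
        return None
    if len(scored) > 1 and scored[0][0] == scored[1][0]:
        return None
    return scored[0][1]


print(link_entity("Zhang Wei", "Zhang Wei from finance approved the budget"))  # E002
```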

3) Synonyms and aliases

Users may ask about "Project Phoenix", but the official documentation calls it "Project PXR". The system must maintain a large synonym library and an alias library for all kinds of entity names.
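At its simplest, an alias library is a mapping from every known surface form to one canonical name, applied to the query (and to documents at indexing time) before retrieval. The sketch below uses a hypothetical hand-written alias table. It replaces all aliases in a single regex pass, longest first, so that a canonical name already present in the query is not rewritten a second time.

```python
import re

# Hypothetical alias table: lower-cased surface form -> canonical name.
# The canonical name is listed too, so it is matched as a whole and left unchanged.
ALIASES = {
    "project pxr": "Project PXR",
    "project phoenix": "Project PXR",
    "phoenix project": "Project PXR",
    "pxr": "Project PXR",
}

# One alternation, longest aliases first, so "project pxr" wins over "pxr".
_ALIAS_RE = re.compile(
    r"\b(" + "|".join(re.escape(a) for a in sorted(ALIASES, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def normalize_query(query: str) -> str:
    """Replace every known alias with its canonical name in a single pass."""
    return _ALIAS_RE.sub(lambda m: ALIASES[m.group(0).lower()], query)


print(normalize_query("Who leads Project Phoenix?"))  # Who leads Project PXR?
```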

Problem 3: Retrieval recall accuracy

Even when both the user's question and the knowledge base are clear, traditional retrieval methods still hit bottlenecks.

1) Semantic gap

There is often a large gap between users' colloquial questions and the formal language or technical terminology in the documents. Relying solely on vector similarity may fail to capture this deeper connection.

2) Rigidity of keywords

For proper nouns, product model numbers, and error codes that must be matched exactly, the "fuzziness" of vector retrieval becomes a weakness.

3) Solving these problems requires a more complex retrieval architecture:

Hybrid Search

Combines the semantic power of vector search with the exact matching power of traditional keyword search (such as BM25).
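A common way to merge the two result lists is Reciprocal Rank Fusion (RRF), which needs only each retriever's ranking, not comparable scores: score(d) = Σ 1/(k + rank). In the sketch below the two input rankings are hard-coded stand-ins for BM25 and vector-search output, and k = 60 is the value commonly used in the literature; treat it as a tunable default rather than a fixed rule.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


bm25_hits = ["doc_err_E42", "doc_install", "doc_faq"]         # exact keyword matches
vector_hits = ["doc_troubleshoot", "doc_err_E42", "doc_faq"]  # semantic matches
print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
# ['doc_err_E42', 'doc_faq', 'doc_troubleshoot', 'doc_install']
```

Documents found by both paths rise to the top, which is the behaviour hybrid search is after.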

Query Rewriting/Transformation

Before retrieval, an LLM rewrites and expands the user's original question into multiple queries that are better suited to retrieval and contain more potential keywords and phrasings.

Multi-path Recall & Reranking

A large pool of candidate chunks is recalled from multiple sources (such as vector stores, keyword indexes, and graph databases). A more powerful but more resource-intensive reranking model (reranker) then re-sorts these candidates, and the best ones are passed to the generation model. This significantly increases system complexity and latency.
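Structurally, the pipeline is: gather candidates from every path, de-duplicate, score each (query, candidate) pair with a reranker, and keep the top k. In this sketch the retrieval paths and the reranker are plain callables with hard-coded, hypothetical results; `toy_rerank_score` is a word-overlap placeholder standing in for a cross-encoder or other reranking model, which would be the expensive step in practice.

```python
from typing import Callable

Retriever = Callable[[str], list[str]]


def recall_and_rerank(
    query: str,
    retrievers: list[Retriever],
    score: Callable[[str, str], float],
    top_k: int = 3,
) -> list[str]:
    """Union candidates from all retrieval paths, then keep the top_k by reranker score."""
    # dict.fromkeys de-duplicates while preserving first-seen order.
    candidates = list(dict.fromkeys(c for r in retrievers for c in r(query)))
    return sorted(candidates, key=lambda c: score(query, c), reverse=True)[:top_k]


def toy_rerank_score(query: str, passage: str) -> float:
    """Placeholder for a real reranker: share of query words that appear in the passage."""
    q = set(query.lower().split())
    return len(q & set(passage.lower().split())) / len(q) if q else 0.0


def vector_path(query: str) -> list[str]:  # hypothetical vector-store results
    return ["reset the router", "router warranty terms"]


def keyword_path(query: str) -> list[str]:  # hypothetical BM25 results
    return ["error E42 means reset the router", "reset the router"]


print(recall_and_rerank("how to reset the router", [vector_path, keyword_path], toy_rerank_score, top_k=2))
# ['reset the router', 'error E42 means reset the router']
```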

Part 2: Challenges in Generation (G)

Even if the retrieval stage provides perfect context, the behavior of the generative model (LLM) is full of uncertainty.

Problem 1: Hallucinations and factual inconsistency

"Creativity" that contradicts the context

The LLM is not 100% faithful to the provided context and instead "improvises". It may over-generalize, draw inappropriate inferences, or stitch together unrelated pieces of context to create "new knowledge" that sounds fluent but contradicts the source text.

Contamination from external knowledge

The LLM may unintentionally mix its vast internal knowledge (from its training data) with the context provided by RAG, producing contaminated answers, especially when the context is insufficient.

Problem 2: Contextual synthesis and reasoning

Failure to integrate multi-source information

LLMs may perform poorly when the answer must be assembled from multiple document fragments, especially if those fragments express slightly contradictory views. The model may focus only on the first or last fragment, or fail to build a coherent chain of logic.

The bottleneck of complex logical reasoning

Conclusions that require multi-step reasoning remain a challenge for LLMs. For example: "Based on product A's installation manual and company B's security policy, explain whether installing product A would violate company B's regulations." This places very high demands on the model's logical reasoning.

Problem 3: Answer completeness and the ability to refuse

The challenge of “knowing what you don’t know”

When the retrieved context is insufficient to fully answer the question, an ideal RAG system should explicitly decline and tell the user, "Based on the available information, I cannot answer the part about XXX." However, many LLMs tend to "do their best" and give speculative or one-sided answers based on incomplete information.
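Besides instructing the model to refuse, one lightweight guardrail is to refuse before generation when retrieval confidence is low. The sketch assumes retrieval scores are similarities in [0, 1], a hypothetical threshold of 0.5, and a stub in place of the LLM call. In practice the threshold has to be calibrated on labeled data, and this check complements, rather than replaces, prompting the model to state what it cannot answer.

```python
from typing import Callable

REFUSAL = "Based on the available information, I cannot answer this question."


def answer_or_refuse(
    question: str,
    hits: list[tuple[str, float]],
    generate: Callable[[str, list[str]], str],
    min_score: float = 0.5,  # hypothetical threshold; calibrate on real data
) -> str:
    """Call the generator only with passages whose retrieval score clears min_score."""
    context = [text for text, score in hits if score >= min_score]
    if not context:
        return REFUSAL  # refuse up front instead of letting the LLM guess
    return generate(question, context)


def stub_generate(question: str, context: list[str]) -> str:
    # Stand-in for an LLM call.
    return f"Answer drawn from {len(context)} passage(s)."


print(answer_or_refuse("What is the SLA?", [("The SLA is 99.9%.", 0.82), ("Office hours.", 0.31)], stub_generate))
# Answer drawn from 1 passage(s).
print(answer_or_refuse("Who won the match?", [("Office hours.", 0.2)], stub_generate))
# Based on the available information, I cannot answer this question.
```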

Answer bias

The LLM may unintentionally favor certain fragments when integrating information, producing an answer that is grounded in the context but still biased.

Problem 4: Decomposition and planning of complex questions

Problem description:

A single user question often contains multiple sub-questions that must be answered by separate queries (query decomposition), or requires a chain of logic in which each step depends on the result of the previous one (multi-step reasoning). A simple "retrieve-then-generate" pipeline cannot handle this type of question.

Comparison questions

"Compare the advantages and disadvantages of product A and product B." -> This needs to be broken down into two separate subqueries: "What are the advantages and disadvantages of product A?" and "What are the advantages and disadvantages of product B?".

Associative/multi-hop questions

"What is the phone number of the supervisor of the manager in charge of the 'Phoenix Project'?" -> This requires a longer chain of reasoning:

Step 1: Find the documentation about "Phoenix Project".
Step 2: Identify the project manager from the document (for example, "Zhang San").
Step 3: Search Zhang San's personnel file or the organizational chart to find his superior (for example, "Li Si").
Step 4: Finally, look up Li Si's contact information to find his phone number.

Solution and challenges:

To solve this problem, a "Planner" or "Orchestrator" needs to be introduced into the RAG system, which essentially turns it into "Agentic RAG".

The role of the planner

This higher-level LLM (or logic module) first analyzes the user's original question and decides whether to answer it directly or to break it down into a multi-step "plan".

Tool Use

The planner assigns each sub-question or reasoning step to a different "tool" for execution. The core tool is the RAG retriever discussed earlier, but tools may also include code interpreters (for calculations), web search, database queries, and so on.

State management

The system needs to maintain a "scratchpad" that records the result of each step after it runs, because later steps may depend on the outputs of earlier ones.
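The plan-execute loop with a scratchpad can be sketched in a few lines. Here the plan for the "Phoenix Project" phone-number question is hard-coded rather than produced by an LLM, each step names a tool plus an entity template filled from earlier results, and the single `lookup` tool over a hypothetical fact store stands in for the RAG retriever. The point is the data flow: each step reads from and writes to the scratchpad, and a failed step is recorded instead of being guessed past.

```python
from typing import Callable, Optional

# Hypothetical fact store standing in for what the RAG retriever would find.
FACTS = {
    ("Phoenix Project", "project manager"): "Zhang San",
    ("Zhang San", "superior"): "Li Si",
    ("Li Si", "phone"): "555-0100",
}


def lookup(entity: str, attribute: str) -> Optional[str]:
    """Stand-in retrieval tool; returns None when nothing is found."""
    return FACTS.get((entity, attribute))


TOOLS: dict[str, Callable[[str, str], Optional[str]]] = {"lookup": lookup}

# Hard-coded plan (an LLM planner would normally produce this).
# Each step: (tool, entity template filled from the scratchpad, attribute, output key).
PLAN = [
    ("lookup", "Phoenix Project", "project manager", "manager"),
    ("lookup", "{manager}", "superior", "superior"),
    ("lookup", "{superior}", "phone", "phone"),
]


def run_plan(plan: list[tuple[str, str, str, str]], tools: dict) -> dict[str, str]:
    scratchpad: dict[str, str] = {}
    for tool_name, entity_template, attribute, out_key in plan:
        entity = entity_template.format(**scratchpad)  # inject earlier results
        result = tools[tool_name](entity, attribute)
        if result is None:
            # Record the failure and stop, rather than passing a guess downstream.
            scratchpad["error"] = f"no {attribute} found for {entity}"
            break
        scratchpad[out_key] = result
    return scratchpad


print(run_plan(PLAN, TOOLS))
# {'manager': 'Zhang San', 'superior': 'Li Si', 'phone': '555-0100'}
```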

Problem 5: Fusion and synthesis of results

Problem description:

After all sub-queries or reasoning steps have run, the system holds a collection of fragmented information from different search results. Fusing these fragments intelligently, coherently, and without distortion into a single final answer is a major challenge.

The challenges:

Information conflict

Different search results may contain conflicting information, and the system needs strategies to decide which source is more authoritative or how to present the conflict to the user.

Logical coherence

Simply concatenating all the answer fragments produces stilted, confusing output. The LLM needs to understand the role each fragment plays in the overall chain of logic and organize them in fluent language.

Completeness of answers

If a sub-query fails, the system must recognize the missing information when synthesizing the final answer and clearly tell the user, "I could not find relevant information about XXX."
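One way to keep the final answer honest is to track which sub-queries came back empty and pass that list to the synthesis step explicitly, so gaps are stated rather than papered over. The sketch below assumes sub-query results arrive as a dict mapping each sub-query to its answer text, or `None` on failure; it only assembles the synthesis input, and the wording of the final answer would normally come from an LLM. The sample data is hypothetical.

```python
from typing import Optional


def build_synthesis_input(results: dict[str, Optional[str]]) -> str:
    """Assemble sub-query findings plus an explicit list of unanswered sub-queries."""
    found = [f"- {q}: {a}" for q, a in results.items() if a]
    missing = [q for q, a in results.items() if not a]
    parts = ["Findings:", *found] if found else ["Findings: none"]
    if missing:
        parts.append("Could not find relevant information about: " + "; ".join(missing))
    return "\n".join(parts)


print(build_synthesis_input({
    "Advantages of product A": "faster setup",
    "Advantages of product B": None,
}))
# Findings:
# - Advantages of product A: faster setup
# Could not find relevant information about: Advantages of product B
```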

Surge in system complexity

Introducing planning and decomposition turns the system from a linear "pipeline" into a complex "graph" or "state machine" with loops and conditional branches, which makes development, debugging, and maintenance much harder.

Error Propagation

In such a chain of calls, a small error at any step (inaccurate retrieval for one sub-query, a hallucination in one reasoning step) can be amplified and passed downstream, eventually causing the whole plan to fail or producing a final answer that looks reasonable but is completely wrong.

Compounding cost and latency

Each decomposition, each reasoning step, and the final synthesis may each require an LLM call, which multiplies the latency and API cost of the whole process.

Part 3: Product Design and Interaction

In production-grade RAG projects, the skills required of a product manager go far beyond the requirements analysis and functional design of traditional software. They must act as "super connectors" linking four major areas: user mental models, business logic, large-model capabilities, and systems engineering. This places very high demands on product managers' technical literacy and overall capabilities:

Well versed in large models (LLMs)

Understand the capabilities of large models

How large models generate output, and their capabilities in generation, reasoning, planning, summarization, and intent understanding.

Know the capability boundaries of large models

They must clearly understand LLMs' inherent "hallucinations", factual inconsistencies, and fragile logical reasoning.

Learn about relevant technology stacks

Multimodal models (their capabilities and limits), embedding models (encoder-based, decoder-based, and multimodal), agent frameworks such as LangChain or platforms such as Dify, etc.

Understand costs and latency

They need a quantitative understanding of the token consumption, API cost, and inference latency of different models. When designing a complex interaction that requires multi-step reasoning, the user value it delivers must be weighed against the heavy resource consumption behind it.

Proficient in the operational logic of RAG and Agentic architectures

Insight into the entire RAG process

Be able to clearly sketch the full flow from "data ingestion" to "final answer generation", and understand the technical details of each stage (such as chunking, embedding, retrieval, reranking, and generation) and how each affects the final result.

Understand the overall Agentic workflow

For Agentic RAG involving planning, decomposition, and tool use, product managers need to understand its internal execution logic, such as state management and error propagation (the cascading effect).

Attribute failures

When a user reports an error, product managers should work with engineers to quickly determine whether the root cause lies in retrieval (R), generation (G), or planning, and whether the design or the system's execution flow needs to change, thereby driving effective system optimization.

Product design, software systems and data engineering

Design appropriate interactions (citations, guidance, clarification, follow-up questions, confirmation) for the various situations that may arise.
Understand the complexity of ETL (extract, transform, load) for unstructured data, especially PDFs and tables, and avoid over-design (for example, trying to support every file format).
Understand the basic principles and use cases of core components such as knowledge graphs and vector databases.
Deeply understand system evaluability, maintainability, and knowledge life cycle management, and leave room for these "non-functional" requirements in the product design.

Part 4: System and Operational Challenges

These broader challenges span the entire life cycle of a RAG system.

Problem 1: Model selection, deployment, and maintenance

The dilemma of intranet/private deployment

In scenarios that require intranet deployment, powerful commercial APIs (such as DeepSeek V3.1) cannot be used. Teams must therefore:

Choose the right open source model

Among the many available models, such as Qwen and GLM, selection depends on model quality, GPU memory requirements, Chinese and English capabilities, and inference performance. If OCR is needed, an OCR model must also be selected, weighing accuracy against generation speed.

Addressing Hardware Challenges

Deploying LLMs requires choosing and mastering common inference engines, understanding optimization techniques such as quantization, configuring sensible parameters, and professional engineering skills to ensure service stability and throughput.

Fine-tuning

To improve the model's performance in specific domains (such as customer service or IT operations), fine-tuning may also be required, which calls for high-quality datasets and fine-tuning expertise.

Problem 2: Lack of an evaluation system

This is one of the core pain points in the implementation of RAG.

Evaluation complexity

How can the quality of a RAG system be evaluated scientifically and automatically? This requires a multi-dimensional evaluation framework covering at least:

Retrieval evaluation

Precision, Recall, Mean Reciprocal Rank (MRR), etc.
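For a single query with a known set of relevant chunk IDs, these retrieval metrics are straightforward to compute; MRR is the mean, over all queries, of 1/rank of the first relevant result. The sketch below uses hypothetical chunk IDs and evaluates precision and recall at a cutoff k. The hard part in practice is obtaining trustworthy relevance labels, not the arithmetic.

```python
def precision_recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> tuple[float, float]:
    """Precision@k and Recall@k for one query."""
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """MRR: average of 1/rank of the first relevant result (0 if none is retrieved)."""
    total = 0.0
    for retrieved, relevant in runs:
        for rank, doc in enumerate(retrieved, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(runs) if runs else 0.0


print(precision_recall_at_k(["c3", "c1", "c7", "c2"], {"c1", "c2"}, k=3))  # (0.3333333333333333, 0.5)
runs = [(["c3", "c1", "c7"], {"c1"}), (["c5", "c6"], {"c9"}), (["c2"], {"c2"})]
print(mean_reciprocal_rank(runs))  # 0.5
```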

Generation evaluation

Faithfulness, relevance, and harmlessness of the answer.

Challenges of automated evaluation

Although open-source evaluation frameworks such as RAGAS, ARES, and TruLens exist, building an automated evaluation pipeline that is stable, reliable, and reflects the real user experience is a complex engineering task in itself. Moreover, each customer's data distribution differs and data quality varies, so accuracy measured in one scenario may differ greatly from accuracy in another.

Problem 3: Knowledge life cycle management

Continuous updating of knowledge

Business knowledge changes constantly. How do you ensure the vector database is updated when a source document is modified or deleted? How do you handle document version control? How do you update graph data dynamically?

Cost and Efficiency

Fully re-indexing the entire knowledge base is costly and time-consuming. Achieving efficient incremental indexing and real-time updates places high demands on the system's engineering architecture.
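A common starting point for incremental indexing is to fingerprint each source document and re-embed only what changed. The sketch below compares SHA-256 content hashes against those stored from the previous run and reports which documents to add, update, or delete; the document IDs are hypothetical, and the actual embedding and vector-store calls are left out.

```python
import hashlib


def fingerprint(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_index_update(previous: dict[str, str], current_docs: dict[str, str]) -> dict[str, list[str]]:
    """previous: doc_id -> hash from the last run; current_docs: doc_id -> current text."""
    current = {doc_id: fingerprint(text) for doc_id, text in current_docs.items()}
    return {
        "add": sorted(d for d in current if d not in previous),
        "update": sorted(d for d in current if d in previous and previous[d] != current[d]),
        "delete": sorted(d for d in previous if d not in current),
    }


old = {"manual.pdf": fingerprint("v1 text"), "faq.md": fingerprint("faq"), "old.doc": fingerprint("x")}
new = {"manual.pdf": "v2 text", "faq.md": "faq", "policy.pdf": "new policy"}
print(plan_index_update(old, new))
# {'add': ['policy.pdf'], 'update': ['manual.pdf'], 'delete': ['old.doc']}
```

Document-level hashing is coarse: a one-line edit still re-embeds the whole document, so finer-grained (per-chunk) hashing is a natural next step.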

Conclusion

Although RAG technology is powerful, it is by no means a plug-and-play "silver bullet." A RAG product that truly creates value in a production environment is the result of systems engineering and requires careful design and refinement. At a minimum, it needs:

1) On the ingestion side

A robust data ETL pipeline that can handle all kinds of complex documents.

2) On the retrieval side

Advanced retrieval strategies that combine techniques such as hybrid search, query rewriting, and reranking, and that can handle entity-level ambiguity.

3) On the generation side

Sophisticated prompt engineering and fact-checking mechanisms that keep answers faithful and reliable.

4) On the operational side

A scientific evaluation framework to guide iteration, plus complete model management and knowledge update mechanisms to keep the system evolving.

Neglecting any of these areas can expose the RAG system's fragility and unreliability when it faces the complexity of the real world and the stringent requirements of users.

Topic: RAG / Embeddings

Published: 2025-08-26 14:17

WeChat account: 智能大时代
