RAG (retrieval-augmented generation) shows great potential because it combines external knowledge bases with large language models (LLMs). A RAG prototype can be built in about an hour, but turning it into a stable, reliable, and trustworthy product in a real business scenario means overcoming a series of major obstacles.
Part 1: Challenges in Retrieval (R)
Retrieval quality sets the ceiling on RAG system performance. If the retrieval stage cannot supply accurate, relevant, and complete "raw materials" to the generation model, everything built on top of it is a castle in the air.
Problem 1: Knowledge ingestion and preprocessing
This is the most basic and most easily underestimated stage, and also the one where mistakes are most damaging.
- 1) Complexity of unstructured data processing
Corporate knowledge bases are full of PDFs, Word documents, PowerPoint decks, images, and even scans.
- Layout parsing failures
Multi-column layouts, tables that span pages, headers and footers, charts, and formulas in PDFs can easily scramble content or lose information during automated parsing and text extraction.
- The “black hole” of tables and images
Tools such as MinerU and vision-language (VL) models are becoming more accurate, but complex tables may still need special handling, and manual inspection and correction are still required. It is also best to filter out information that is irrelevant to RAG.
- 2) Quality and consistency of knowledge
- Outdated content and version conflicts
Knowledge bases often contain multiple versions of manuals, conflicting policy documents, or obsolete announcements. The system must be able to identify, or be told, which version of a piece of knowledge is authoritative.
- Factual errors and noise
The source document itself may contain incorrect information. The RAG system faithfully amplifies these errors.
- 3) The "art" of document chunking
- Drawbacks of rule-based chunking
Crudely splitting text at a fixed length (such as every 1,000 characters) or at punctuation marks can easily cut through complete semantic units and lose context. The answer to a question may end up split across two separate chunks, causing retrieval to fail.
- The challenge of semantic chunking
Chunking along semantic boundaries such as paragraphs and headings works better, but it requires writing complex parsing rules for each document format.
- “Finding a needle in a haystack” vs. “Seeing only the trees”
If a chunk is too large, it carries too much irrelevant "noise", which distracts the LLM's attention and leads to the "lost in the middle" problem (large models may pay more attention to information at the beginning and end of the context). If a chunk is too small, it lacks context and the LLM cannot reason effectively.
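As a rough illustration of a middle ground between fixed-length and fully semantic splitting, the sketch below packs whole paragraphs into chunks and repeats the last paragraph of each chunk at the start of the next one. The function name `chunk_paragraphs`, the blank-line paragraph delimiter, and the default sizes are assumptions made for this example; the article does not prescribe any of them.

```python
def chunk_paragraphs(text: str, max_chars: int = 1000, overlap: int = 1) -> list[str]:
    """Pack whole paragraphs into chunks of roughly max_chars characters.

    Paragraphs are never cut in the middle. The last `overlap` paragraphs of
    a chunk are repeated at the start of the next chunk so that context
    spanning a chunk boundary is not lost.
    """
    # Assumption: paragraphs are separated by blank lines.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    for para in paragraphs:
        candidate = "\n\n".join(current + [para])
        if current and len(candidate) > max_chars:
            # Close the current chunk and carry the overlap forward.
            chunks.append("\n\n".join(current))
            current = current[-overlap:] if overlap > 0 else []
        current.append(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks


if __name__ == "__main__":
    sample = "A" * 400 + "\n\n" + "B" * 400 + "\n\n" + "C" * 400
    for chunk in chunk_paragraphs(sample):
        print(len(chunk))
```

Note that `max_chars` is only a soft limit here: a single very long paragraph, or an overlap paragraph plus a long new one, still produces an oversized chunk, so a real pipeline would need a fallback split for that case.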
- 4) Structure and relevance of knowledge
To capture the rich structured relationships inherent in knowledge, advanced RAG systems need to introduce a knowledge graph (KG). A knowledge graph extracts key information from unstructured text, such as entities (people, projects, departments), their attributes, and the relationships between them (belongs to, person in charge, technology stack), and stores it in structured form as a graph. This raises several challenges:
- Extremely expensive to build
Automatically building a high-quality knowledge graph that fits the business scenario from massive, heterogeneous documents is a complex NLP (natural language processing) project in its own right. It involves technical steps such as named entity recognition (NER) and relation extraction (RE), and usually requires extensive entity definition, relationship modeling, and verification by domain experts. In many scenarios the entities and relationships have to be designed for the domain, so GraphRAG cannot be used as is.
- Complexity of fusing query and retrieval (Graph RAG)
Once a knowledge graph is introduced, retrieval is no longer a single vector search. The system needs a smarter "Query Planner" to decide whether the user's question should be answered by searching for similar text in the vector store, by running a graph query (such as a path traversal) against the knowledge graph, or by a combination of the two. For example, to answer "Who is the supervisor of the manager responsible for the 'Phoenix Project'?", the system first has to find the path "Phoenix Project" -> "Responsible Person" -> "Supervisor" in the graph. Designing and implementing this kind of fused retrieval is very complex.
- Life cycle management challenges
The knowledge graph also needs to be updated synchronously with the source documents, which introduces additional maintenance costs and data consistency challenges.
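The path traversal mentioned for the "Phoenix Project" question can be sketched with a toy in-memory graph. The triples, the relation names, and the `follow_path` helper are all hypothetical and chosen only for illustration; a production system would more likely run a path query against a graph database.

```python
from typing import Optional

# Toy knowledge graph stored as (subject, relation) -> object.
# Every entity and relation name below is illustrative.
TRIPLES = {
    ("Phoenix Project", "person_in_charge"): "Zhang San",
    ("Zhang San", "supervisor"): "Li Si",
    ("Zhang San", "department"): "R&D",
}


def follow_path(graph: dict, start: str, relations: list[str]) -> Optional[str]:
    """Follow a chain of relations from `start`.

    Returns the entity reached after the last hop, or None as soon as
    one hop has no matching edge in the graph.
    """
    node: Optional[str] = start
    for relation in relations:
        node = graph.get((node, relation))
        if node is None:
            return None
    return node


if __name__ == "__main__":
    print(follow_path(TRIPLES, "Phoenix Project", ["person_in_charge", "supervisor"]))
```

Returning `None` on a missing hop matters in practice: it lets the caller fall back to vector search or tell the user which part of the question could not be resolved.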
Problem 2: Entity-level ambiguity
Document content is full of entities that need to be disambiguated, which adds a second dimension to the problem beyond simple passage matching.
- 1) Entity identification and linking
The system needs to be able to identify key entities in the text, such as person names, project code names, product names, departments, etc., and link them to unique entity IDs.
- 2) Entity disambiguation
When "Zhang Wei" appears in a user's question or in a document, the system needs a mechanism to tell whether this is "Zhang Wei", the project manager in the R&D department, or "Zhang Wei", the director of the finance department. Without this capability, the system may retrieve information about the wrong person.
- 3) Synonyms and aliases
Users may ask about "Project Phoenix", but the name in the official documentation is "Project PXR". The system must maintain a large synonym library and an alias table for each kind of entity name.
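A minimal sketch of alias handling, assuming a hand-maintained alias table: before retrieval, the query is expanded with the official name of any entity it mentions by alias. The entity ID `ENT-001`, the table contents, and the `expand_query` helper are hypothetical.

```python
# Hypothetical alias table: lowercase alias -> entity ID.
ALIAS_TO_ID = {
    "project phoenix": "ENT-001",
    "phoenix project": "ENT-001",
    "project pxr": "ENT-001",
}

# Hypothetical canonical names as used in the official documentation.
CANONICAL_NAME = {"ENT-001": "Project PXR"}


def expand_query(query: str) -> str:
    """Append the canonical name of every entity mentioned by an alias.

    The query is returned unchanged when it already contains the
    canonical name or mentions no known alias.
    """
    lowered = " ".join(query.lower().split())
    extra: list[str] = []
    for alias, entity_id in ALIAS_TO_ID.items():
        name = CANONICAL_NAME[entity_id]
        if alias in lowered and name.lower() not in lowered and name not in extra:
            extra.append(name)
    return query if not extra else f"{query} ({' '.join(extra)})"


if __name__ == "__main__":
    print(expand_query("What is the budget of Project Phoenix?"))
```

Plain substring matching is the weakest part of this sketch. It cannot separate two people who share a name, which is why the disambiguation step described above still needs context such as department or role.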
Problem 3: Retrieval accuracy and recall
Even when the user's question and the knowledge base are both clear, traditional retrieval methods still have bottlenecks.
- 1) Semantic gap
There is a huge difference between the user's colloquial question and the written language or professional terminology in the document. Relying solely on vector similarity may not capture this deep connection.
- 2) Rigidity of keywords
For proper nouns, product models, and error codes that must be accurately matched, the "fuzziness" of vector retrieval becomes a weakness.
- 3) To solve the above problems, a more complex retrieval architecture must be introduced:
- Hybrid Search
Combines the semantic power of vector search with the exact matching power of traditional keyword search (such as BM25).
- Query Rewriting/Transformation
Before retrieval, an LLM is used to rewrite and expand the user's original question into multiple queries that are more suitable for retrieval and contain more potential keywords and expressions.
- Multi-path Recall & Reranking
A large number of candidate chunks are recalled from multiple sources (such as vector stores, keyword indexes, and graph databases). A more powerful but more resource-intensive reranker model then re-scores these candidates, and only the best ones are handed to the generation model. This significantly increases system complexity and latency.
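The article does not say how ranked lists from different recall paths should be merged before reranking. One widely used option is reciprocal rank fusion (RRF), sketched below as an assumption rather than as the author's method. The document IDs are made up, and `k = 60` is only the conventional default.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one list.

    Each document scores 1 / (k + rank) in every list it appears in,
    with ranks starting at 1. Documents ranked well by several
    retrievers rise to the top. Ties are broken by document ID.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    vector_hits = ["d1", "d2", "d3"]   # e.g. from a vector index
    keyword_hits = ["d3", "d1", "d4"]  # e.g. from BM25
    print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
```

RRF only uses rank positions, so it avoids having to calibrate vector similarity scores against BM25 scores. The fused list would then be truncated and passed to the reranker.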
Part 2: Challenges in Generation (G)
Even if the retrieval stage provides perfect context, the behavior of the generation model (LLM) is full of uncertainty.
Problem 1: Hallucination and factual inconsistency
- "Creation" that goes against the context
The LLM is not always 100% faithful to the context it is given and may "improvise". It may overgeneralize, draw unwarranted inferences, or stitch together several unrelated pieces of context into "new knowledge" that reads smoothly but contradicts the original text.
- Contamination from knowledge outside the context
The LLM may unintentionally mix its vast internal knowledge (from training data) with the context provided by RAG, producing contaminated answers, especially when the context is insufficient.
Problem 2: Contextual synthesis and reasoning
- Multi-source information integration failures
The LLM may perform poorly when the answer has to be assembled from several document fragments, especially fragments whose views slightly contradict each other. It may focus only on the first or last fragment, or fail to build a coherent logical chain.
- The bottleneck of complex logical reasoning
Conclusions that require multi-step reasoning are still a challenge for LLMs. For example: "Based on product A's installation manual and company B's security policy, explain whether installing product A will violate company B's regulations." This demands very strong logical reasoning from the model.
Problem 3: Completeness of answers and ability to refuse
- The challenge of “knowing what you don’t know”
When the retrieved context is insufficient to fully answer the question, an ideal RAG system should explicitly decline and tell the user "Based on the available information, I cannot answer the part about XXX". However, many LLMs tend to “do their best” and give speculative or one-sided answers based on insufficient information.
- Answer bias
The LLM may unintentionally favor certain fragments when integrating information, so the final answer is grounded in the context but still biased.
Problem 4: Decomposition and planning of complex problems
- Problem description:
A single user question often contains multiple sub-questions that require independent queries to answer (query decomposition), or require a logical chain that relies on the results of the previous step to solve (multi-step reasoning). A simple "retrieve-generate" pipeline cannot handle this type of problem.
- Comparison questions
"Compare the advantages and disadvantages of product A and product B." -> This needs to be broken down into two separate subqueries: "What are the advantages and disadvantages of product A?" and "What are the advantages and disadvantages of product B?".
- Relational/multi-hop questions
"What is the phone number of the supervisor of the manager in charge of the 'Phoenix Project'?" -> This requires a longer chain of reasoning:
Step 1: Find the documentation about the "Phoenix Project".
Step 2: Identify the project manager from that documentation (for example, "Zhang San").
Step 3: Search the personal file or organizational chart for "Zhang San" to find his supervisor (for example, "Li Si").
Step 4: Finally, search for the contact information of "Li Si" and find the phone number.
- Solution and its challenges:
To solve this problem, a "Planner" or "Orchestrator" needs to be added to the RAG system, which essentially turns it into "Agentic RAG".
- The role of the planner
This higher-level LLM (or logic module) first analyzes the user's original question and decides whether it should be answered directly or if it needs to be broken down into a "plan" with multiple steps.
- Tool Use
The planner assigns each sub-problem or reasoning step to a different "tool" for execution. The core tool is the RAG retriever we discussed earlier, but it may also include code interpreters (for calculations), web searches, database queries, etc.
- State Management
The system needs to maintain a "scratchpad". After each step is executed, the execution results need to be recorded, because the execution of subsequent steps may depend on the output of the previous step.
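The planner, tool, and scratchpad roles described above can be sketched as a small plan executor. This is a sketch under stated assumptions: the plan is hard-coded (a real planner would usually be an LLM that emits it), the three lookup tools are hypothetical stand-ins for retrieval calls, and the names and phone number are made-up placeholder data.

```python
from typing import Callable, Optional

# Hypothetical tools. In a real system each would call a retriever,
# a database, or another service.
def find_manager(project: str) -> Optional[str]:
    return {"Phoenix Project": "Zhang San"}.get(project)

def find_supervisor(person: str) -> Optional[str]:
    return {"Zhang San": "Li Si"}.get(person)

def find_phone(person: str) -> Optional[str]:
    return {"Li Si": "555-0100"}.get(person)  # placeholder number

TOOLS: dict[str, Callable[[str], Optional[str]]] = {
    "find_manager": find_manager,
    "find_supervisor": find_supervisor,
    "find_phone": find_phone,
}

# A plan is an ordered list of steps. An input starting with "$"
# refers to the result of an earlier step stored in the scratchpad.
PLAN = [
    {"id": "manager", "tool": "find_manager", "input": "Phoenix Project"},
    {"id": "supervisor", "tool": "find_supervisor", "input": "$manager"},
    {"id": "phone", "tool": "find_phone", "input": "$supervisor"},
]


def run_plan(plan: list[dict], tools: dict) -> dict[str, Optional[str]]:
    """Execute steps in order, recording each result in a scratchpad."""
    scratchpad: dict[str, Optional[str]] = {}
    for step in plan:
        arg: Optional[str] = step["input"]
        if arg is not None and arg.startswith("$"):
            arg = scratchpad.get(arg[1:])
        # If an earlier step failed, record the gap instead of guessing.
        scratchpad[step["id"]] = tools[step["tool"]](arg) if arg is not None else None
    return scratchpad


if __name__ == "__main__":
    print(run_plan(PLAN, TOOLS))
```

Recording `None` for a failed step, rather than raising or inventing a value, keeps the gap visible so that the final answer can state which part of the question could not be resolved.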
Problem 5: Fusion and synthesis of results
- Problem description:
After all subqueries or inference steps have been performed, the system has a bunch of fragmented pieces of information from different search results. How to fuse these fragments intelligently, coherently, and without distortion into a final, unified answer is a huge challenge.
- The challenges:
- Information conflict
Different search results may contain conflicting information, and the system needs to have strategies to determine which is more authoritative or how to present this conflict to the user.
- Logical coherence
Simply splicing all the answer fragments together reads as stilted and confusing. The LLM needs to understand the role each fragment plays in the overall logical chain and organize them in fluent language.
- Completeness of answers
If a subquery fails, the system must be able to identify the missing information when synthesizing the final answer and clearly inform the user "I could not find relevant information about XXX."
- Surge in system complexity
The introduction of planning and decomposition mechanisms means that the system changes from a linear "pipeline" to a complex "graph" or "state machine" with loops and conditional judgments. This brings huge challenges to development, debugging and maintenance.
- Error Propagation
In such a chain of calls, a small error at any step (an inaccurate retrieval for one subquery, a hallucination in one reasoning step) can be amplified and passed on to later steps, eventually causing the whole plan to fail or producing a final answer that looks reasonable but is completely wrong.
- Compounding cost and latency
Each decomposition, each step of reasoning, and each final synthesis may require an LLM call. This multiplies the latency and API call costs of the entire process.
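To make the multiplication concrete, here is a back-of-the-envelope estimate. It assumes one LLM call for planning, one per reasoning step, and one for final synthesis, all run sequentially with the same latency and token usage. The function name and every number in the usage example are hypothetical inputs, not measurements.

```python
def estimate_pipeline(
    n_steps: int,
    latency_per_call_s: float,
    tokens_per_call: int,
    price_per_1k_tokens: float,
) -> dict[str, float]:
    """Estimate LLM calls, latency, and cost for a planned pipeline.

    Assumes 1 planning call + n_steps step calls + 1 synthesis call,
    executed one after another with uniform latency and token usage.
    """
    calls = 1 + n_steps + 1
    return {
        "llm_calls": calls,
        "latency_s": calls * latency_per_call_s,
        "cost": calls * tokens_per_call / 1000 * price_per_1k_tokens,
    }


if __name__ == "__main__":
    # Hypothetical inputs: a 4-step plan, 2 s and 2,000 tokens per call.
    print(estimate_pipeline(4, latency_per_call_s=2.0, tokens_per_call=2000,
                            price_per_1k_tokens=0.5))
```

Under these assumptions a four-step plan needs six calls, so it costs and waits roughly six times as much as a single "retrieve-generate" call. Running independent subqueries in parallel reduces latency but not cost.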
Part 3: Product Design and Interaction
In production-grade RAG projects, the demands on product managers go far beyond the requirements analysis and functional design of traditional software systems. They must become "super connectors" who link four major areas: the user's mental model, business logic, large model capabilities, and system engineering. This places extremely high demands on their technical literacy and overall ability:
Well versed in large language models (LLMs)
- Understand the capabilities of large models
Know how large models generate text, and what they can do in generation, reasoning, planning, summarization, and intent understanding.
- Be clear about the capability boundaries of large models
Have a clear understanding of LLMs' inherent hallucinations, factual inconsistencies, and fragile logical reasoning.
- Learn about relevant technology stacks
Multimodal models (their capabilities and boundaries), embedding models (built on encoder or decoder architectures, including multimodal ones), agent frameworks such as LangChain, platforms such as Dify, and so on.
- Understand cost and latency
Have a quantitative sense of the token consumption, API call cost, and inference latency of different models. When designing a complex interaction that requires multi-step reasoning, weigh the user value it brings against the heavy resource consumption behind it.
Proficient in the operational logic of RAG and Agentic architectures
- Insight into the entire RAG process
Be able to draw the entire flow from "data ingestion" to "final answer generation" clearly, and understand the technical details of each stage (such as chunking, embedding, retrieval, reranking, generation) and its impact on the final result.
- Understand the overall Agentic RAG process
For Agentic RAG that involves planning, decomposition, and tool use, product managers need to understand its internal execution logic, such as state management and error propagation (the cascade effect).
- Attribute failures correctly
When a user reports an error, product managers need to be able to work with engineers to quickly determine whether the root cause lies in the "retrieval (R)", "generation (G)", or "planning" stage, and whether the design or the system's execution flow needs to change, so that optimization effort goes where it is effective.
Product design, software systems and data engineering
- Design the interaction (citation, guidance, clarification, follow-up questions, confirmation) for each situation that may arise.
- Understand the complexity of ETL (extract, transform, load) for unstructured data, especially for PDFs and tables, and avoid over-design (for example, trying to support every file format).
- Understand the basic principles and application scenarios of core components such as knowledge graphs and vector databases.
- Have a deep understanding of system evaluability, maintainability, and knowledge life cycle management, and leave room for these "non-functional" requirements in the product design.
Part 4: System and Operational Challenges
These are larger challenges throughout the entire RAG system life cycle.
Problem 1: Model selection, deployment and maintenance
- The dilemma of intranet/private deployment
In scenarios that require intranet deployment, powerful commercial APIs (such as DeepSeek V3.1) cannot be used. Teams must:
- Choose the right open source model
Among the many models available, such as Qwen and GLM, the choice depends on model quality, GPU memory requirements, Chinese and English capability, and inference performance. If OCR is needed, an OCR model must also be selected, weighing accuracy against generation speed.
- Addressing Hardware Challenges
Deploying an LLM requires choosing and becoming proficient with a mainstream inference engine, understanding optimization techniques such as quantization, configuring sensible parameters, and having the professional engineering skills to keep the service stable and its throughput adequate.
- Fine-tuning
To improve the model's performance in specific domains (such as customer service or operations and maintenance), fine-tuning may also be needed, which requires high-quality datasets and fine-tuning expertise.
Problem 2: Lack of an evaluation system
This is one of the core pain points in putting RAG into production.
- Evaluation complexity
How can the quality of a RAG system be evaluated rigorously and automatically? This requires a multi-dimensional evaluation framework that covers at least:
- Retrieval evaluation
Precision, Recall, Mean Reciprocal Rank (MRR), etc.
- Generation evaluation
Faithfulness, Relevance, and Harmlessness of the answer.
- Challenges of automated evaluation
Although open source evaluation frameworks such as RAGAS, ARES, and TruLens exist, building an automated evaluation pipeline that is stable, reliable, and reflects the real user experience is a complex engineering task in itself. In addition, every customer's data distribution is different and data quality is uneven, so the accuracy measured in one scenario is likely to differ substantially from the accuracy in another.
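On the retrieval side, the metrics named above are simple to compute once a labeled set of relevant documents exists for each test question. A minimal sketch, assuming binary relevance labels; the function names and sample IDs are hypothetical and not taken from RAGAS, ARES, or TruLens.

```python
def precision_recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> tuple[float, float]:
    """Precision@k and Recall@k for one query with binary relevance.

    Precision divides the number of relevant hits in the top k by k;
    recall divides it by the total number of relevant documents.
    """
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    precision = hits / k if k > 0 else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """Average of 1 / rank of the first relevant document per query.

    A query with no relevant document in its result list contributes 0.
    """
    total = 0.0
    for retrieved, relevant in runs:
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(runs) if runs else 0.0


if __name__ == "__main__":
    print(precision_recall_at_k(["a", "b", "c", "d"], {"a", "d", "e"}, k=2))
    print(mean_reciprocal_rank([(["x", "a"], {"a"}), (["b", "y"], {"b"})]))
```

These numbers only describe the retrieval stage. Faithfulness and relevance of the generated answer need a separate judgment, by human reviewers or an LLM-based grader, against the retrieved context.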
Problem 3: Knowledge life cycle management
- Continuous updating of knowledge
Business knowledge changes dynamically. How to ensure that the content in the vector database is updated simultaneously when the source document is modified or deleted? How to handle version control of documents? How to dynamically update graph data?
- Cost and Efficiency
Fully re-indexing the entire knowledge base is costly and time-consuming. Achieving efficient incremental indexing and real-time updates places high demands on the system's engineering architecture.
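One common building block for incremental indexing is to store a content hash per document and re-embed only what changed. The sketch below assumes documents are available as an ID-to-text mapping; `plan_index_update` and the sample data are hypothetical.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_index_update(
    indexed_hashes: dict[str, str],
    current_docs: dict[str, str],
) -> tuple[list[str], list[str]]:
    """Compare the index state with the current source documents.

    indexed_hashes maps doc ID -> hash stored at last indexing time.
    current_docs maps doc ID -> current text.
    Returns (IDs to re-chunk and upsert, IDs to delete from the index).
    """
    to_upsert = [
        doc_id
        for doc_id, text in current_docs.items()
        if indexed_hashes.get(doc_id) != content_hash(text)
    ]
    to_delete = [doc_id for doc_id in indexed_hashes if doc_id not in current_docs]
    return sorted(to_upsert), sorted(to_delete)


if __name__ == "__main__":
    indexed = {
        "a": content_hash("v1"),
        "b": content_hash("old text"),
        "c": content_hash("removed later"),
    }
    current = {"a": "v1", "b": "new text", "d": "brand new"}
    print(plan_index_update(indexed, current))
```

Deleting a document also means deleting all of its chunks and any graph edges derived from it, so the index needs to keep a mapping from document ID to derived records.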
Conclusion
Although RAG technology is powerful, it is by no means a plug-and-play "silver bullet." A RAG product that can truly create value in a production environment must be a product of systems engineering and requires careful design and polishing. It requires at least:
- 1) On the front end
A robust data ETL pipeline that can handle all kinds of complex documents.
- 2) On the retrieval side
Advanced retrieval strategies that combine hybrid search, query rewriting, and reranking, and that can handle entity-level ambiguity.
- 3) On the generation side
Careful prompt engineering and fact-checking mechanisms to ensure that answers are faithful and reliable.
- 4) On the operational side
A rigorous evaluation framework to guide iteration, plus complete model management and knowledge update mechanisms to keep the system evolving.
Neglecting any of these links can expose the RAG system's fragility and unreliability when it meets the complexity of the real world and the stringent requirements of users.