WebWeaver
2025-09-17
WebWeaver: Structuring Web-Scale Evidence with
Dynamic Outlines for Open-Ended Deep Research
Zijian Li, Xin Guan, Bo Zhang, Shen Huang*, Houquan Zhou, Shaopeng Lai,
Ming Yan, Yong Jiang*, Pengjun Xie, Fei Huang, Jun Zhang, Jingren Zhou
Tongyi Lab, Alibaba Group
https://tongyi-agent.github.io/blog
https://github.com/Alibaba-NLP/DeepResearch
Abstract
This paper tackles open-ended deep research (OEDR), a complex challenge
where AI agents must synthesize vast web-scale information into insightful
reports. Current approaches suffer from two limitations: static research
pipelines that decouple planning from evidence acquisition, and one-shot
generation paradigms that are prone to long-context failures such as "lost
in the middle" and hallucinations. To address these challenges, we introduce
WebWeaver, a novel dual-agent framework that emulates the human research
process. The planner operates in a dynamic cycle, iteratively interleaving
evidence acquisition with outline optimization to produce a comprehensive,
source-grounded outline linking to a memory bank of evidence. The writer
then executes a hierarchical retrieval and writing process, composing the
report section by section. By performing targeted retrieval of only the necessary
evidence from the memory bank for each part, it effectively mitigates
long-context issues. Our framework establishes a new state-of-the-art across major
OEDR benchmarks, including DeepResearch Bench, DeepConsult, and
DeepResearchGym. These results validate our human-centric, iterative methodology,
demonstrating that adaptive planning and focused synthesis are crucial for
producing high-quality, reliable, and well-structured reports.
[Figure 1: bar charts with three panels: DeepResearch Bench, DeepConsult, DeepResearchGym]
Figure 1: Performance of various deep research agents on DeepResearch Bench, DeepConsult, and
DeepResearchGym. The results on DeepResearch Bench are taken from the official leaderboard. Our
proposed WebWeaver achieves state-of-the-art performance.
* Corresponding author.
1 Introduction
Large Language Models (LLMs) (OpenAI, 2025b; Qwen Team, 2025; Liu et al., 2024; DeepMind, 2025;
anthropic, 2025) have demonstrated remarkable capabilities across a wide array of well-defined tasks,
from factual question answering (Wei et al., 2025; Mialon et al., 2023) to document summarization (Zhang
et al., 2025) and code generation (Jiang et al., 2024). Their success, however, has largely been confined to
scenarios with clear instructions and ground-truth answers. The true frontier for autonomous AI lies
in transcending these structured problems to tackle the complex, open-ended challenges that define
human-level knowledge work—a process driven by curiosity, synthesis, and the discovery of novel
insights. We term this challenge open-ended deep research (OEDR). Unlike tasks with ground-truth
answers, OEDR requires an agent to independently navigate and digest a vast corpus of information,
often exceeding 100 web pages and PDFs, to form a detailed report that offers unique, synthesized
viewpoints. This represents a monumental challenge, and as shown in Fig. 1, most current agents fail
dramatically on the recent benchmarks (Du et al., 2025; Consult, 2025; Coelho et al., 2025) designed to test
this capability, highlighting a critical gap we aim to address.
Current attempts to tackle OEDR fall into two main categories: proprietary and open-source solutions.
While several powerful proprietary agents exist (OpenAI, 2025a; Research, 2025b;d;a), their prohibitively
expensive APIs and restrictive quotas create significant barriers, limiting widespread adoption and
hindering academic research. Consequently, the focus has shifted towards open-source alternatives,
which predominantly follow two paradigms. As shown in Fig. 2, the first is a straightforward
"search-then-generate" approach (Tao et al., 2025; Roucher et al., 2025), where the agent gathers all information
before directly generating a report. This method often results in low-quality, incoherent outputs because
it lacks a guiding outline to structure the synthesis. The second, more sophisticated approach first
generates a static outline and then performs targeted searches for each section (Han et al., 2025; Research,
2025e;c). However, this strategy is critically flawed: the outline is fixed upfront, relying solely on the
LLM’s internal and often outdated knowledge. This rigidity "fossilizes" the research process, preventing
the agent from exploring unexpected but valuable avenues discovered during its search. Furthermore,
feeding all retrieved materials into a single context for final generation is susceptible to well-known
issues like “lost in the middle” (Liu et al., 2023) and increased hallucinations, compromising the report’s
accuracy and depth (Bai et al., 2024; Wu et al., 2025c).
The key, we believe, lies in abandoning rigid, machine-like pipelines and instead embracing the organic
process of human intellect. Our approach is designed to do just that: it teaches the agent to research like
a person. A human expert doesn’t finalize their entire plan before starting; they allow their outline to
be a living document. We implement this principle through an agentic loop that offers two actions,
searching and outline optimization. As the agent explores the web-scale information landscape,
its discoveries continuously inform and reshape the outline. This allows for genuine exploration and
adaptation, ensuring that the research is not confined to its initial, limited understanding. Then, when it
is time to write, our agent avoids the brute-force method of "reading" everything at once. Just as a human
writer would refer to specific notes for a specific chapter, our agent composes each section by focusing
only on the most pertinent source materials. By doing so, it operates with clarity and precision, crafting
a final report that is not just a summary of data but a well-structured and deeply considered piece of
analysis.
Following this human-centric philosophy, we propose WebWeaver, a dual-agent framework
comprising a planner and a writer. As shown in Fig. 2, the planner embodies the exploratory research
phase, operating in a dynamic cycle that iteratively interleaves evidence acquisition with outline
optimization, culminating in a comprehensive, source-grounded research outline, where each section is
explicitly linked via citations to a curated memory bank of source evidence. In the writing phase,
to address the long-context and attention-management challenges, the writer executes a
memory-grounded, hierarchical synthesis process. It constructs the report section by section, performing
targeted retrieval of only the relevant evidence from a structured memory bank for each subtask. This
synergistic division of labor enables our agent to navigate complex information landscapes and produce
reports that are both comprehensive in scope and meticulous in their evidentiary grounding.
[Figure 2: three pipeline diagrams: (a) Search then generate; (b) Outline-guided search then generate; (c) WebWeaver (ours), with planner, memory bank, and writer]
Figure 2: Paradigm comparison: (a) the search-then-generate paradigm first gathers information and
then directly generates a report; (b) the paradigm initializes a static outline and then performs targeted
searches for the outline; (c) WebWeaver not only enables a dynamic research cycle where the outline
and search strategy co-evolve but allows hierarchical and attentional writing by retrieving only relevant
evidence.
Extensive experiments demonstrate that WebWeaver achieves state-of-the-art (SOTA) performance and
outperforms both the proprietary and open-source agent systems on three recent and challenging
open-ended deep research benchmarks. We then provide a detailed analysis of the effectiveness of
outline optimization and memory-grounded synthesis. Critically, to enhance the performance of the
smaller models for practical use, we construct a high-quality SFT dataset, WebWeaver-3k, generated
by our framework. The supervised finetuning experiments with WebWeaver-3k demonstrate that the
complex skills of thinking, searching, and writing can be distilled and taught, enabling smaller, accessible
models to achieve the expert-level performance previously confined to large-scale proprietary systems.
2 Preliminaries
Problem definition. We consider open-ended research questions that have no ground-truth answers.
Given an open-ended question, the agents need to search for relevant information and finally output a
report or article. To achieve this, we implement a planning agent for collecting information, a memory to
store materials, and a writing agent for report generation. For both the planning and writing agents, we
adopt ReAct (Yao et al., 2023) as the agent framework. Upon receiving a question, the agents perform
several iterations of thought-action-observation. Specifically, in each iteration, based on the existing context,
the LLM generates a thought and executes a parsable action, then waits for the environment to return an
observation. The planning and writing stages each terminate when the agent outputs the “<terminate>” token. A
complete trajectory with T iterations can be defined as
$$H_T = (\tau_0, a_0, o_0, \ldots, \tau_i, a_i, o_i, \ldots, \tau_T, a_T), \tag{1}$$
where $\tau_i$, $a_i$, and $o_i$ represent the thought, action, and observation in the $i$-th round,
respectively, sampled from the planning or writing policy based on all previous context.
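The trajectory in Eq. (1) can be illustrated with a minimal sketch of a thought-action-observation loop. The names `Step`, `run_react`, `toy_policy`, and `toy_environment` are hypothetical and only serve the illustration; a real agent would call an LLM and real tools in their place.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

TERMINATE = "<terminate>"


@dataclass
class Step:
    thought: str
    action: str
    observation: Optional[str] = None  # the final step carries no observation


def run_react(policy: Callable[[List[Step]], Tuple[str, str]],
              environment: Callable[[str], str],
              max_steps: int = 50) -> List[Step]:
    """Roll out steps until the policy emits the terminate token."""
    trajectory: List[Step] = []
    for _ in range(max_steps):
        # The policy sees all previous context (the whole trajectory so far).
        thought, action = policy(trajectory)
        step = Step(thought, action)
        trajectory.append(step)
        if action == TERMINATE:
            break  # the trajectory ends with (thought_T, action_T)
        step.observation = environment(action)
    return trajectory


# Toy stand-ins (assumptions): search twice, then terminate.
def toy_policy(history: List[Step]) -> Tuple[str, str]:
    if len(history) < 2:
        return ("need more evidence", f"search: query {len(history)}")
    return ("enough evidence", TERMINATE)


def toy_environment(action: str) -> str:
    return f"results for [{action}]"


trajectory = run_react(toy_policy, toy_environment)
```

The last step has a thought and an action but no observation, which matches the shape of Eq. (1).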
Actions. For the planner, the action space consists of search, write outline, and terminate. Given the
search queries, the search engine returns titles, snippets, and corresponding URLs. To save context space,
the search tool then uses LLMs to select relevant URLs, parse the selected pages, summarize the
query-relevant content, and extract evidence. It finally
returns the selected URLs and their corresponding summaries. The “write outline” action generates
and optimizes the outline, and the “terminate” action ends the planning process.
For the writer, the action space consists of retrieve, write, and terminate. The retrieve action fetches
evidence from the memory bank using the citations grounded in the outline, the write action composes
a section of the report, and the terminate action ends the writing process.
Memory bank. Answering an open-ended question requires long-context input of the collected
information and long-context output of the final report. To gather sufficient material, the planner often searches
and parses more than 100 web pages, totaling more than 100k tokens. The writer often outputs more than
20k tokens to produce a comprehensive report. Prior open-source deep research agents (Roucher et al.,
2025; Research, 2025e;c) include all the raw materials (e.g., web pages and PDF files) in the LLM context,
leading to quality degradation due to attentional failures like the “lost in the middle” problem, poor
coherence, and increased hallucinations (Liu et al., 2023; Li et al., 2024; Bai et al., 2024; Wu et al., 2025c).
To this end, we introduce a memory to manage context for both the planner and the writer. Only a
short summary of each web page or PDF file is included in the search context, and only the necessary raw
pages are retrieved from the memory to write the corresponding sections.
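A minimal sketch of such a memory, assuming a simple in-process store, might look as follows. The class and method names (`Evidence`, `MemoryBank`, `summary_view`, `retrieve`) are hypothetical; only the `id_N` identifier style follows the example in Fig. 3.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Evidence:
    url: str
    summary: str   # short and query-relevant; this is what enters the planner context
    raw_text: str  # full material; stays in the bank until a section needs it


class MemoryBank:
    def __init__(self) -> None:
        self._items: Dict[str, Evidence] = {}

    def add(self, evidence: Evidence) -> str:
        """Store evidence and return its citation ID."""
        evidence_id = f"id_{len(self._items) + 1}"
        self._items[evidence_id] = evidence
        return evidence_id

    def summary_view(self) -> str:
        """Compact view for the planner: one summary line per stored item."""
        return "\n".join(f"<{eid}> {ev.summary}" for eid, ev in self._items.items())

    def retrieve(self, ids: List[str]) -> List[Evidence]:
        """Full evidence for the cited IDs only; unknown IDs are skipped."""
        return [self._items[i] for i in ids if i in self._items]


bank = MemoryBank()
first = bank.add(Evidence("https://example.com/a", "Defines sports ITS.", "long page text " * 200))
second = bank.add(Evidence("https://example.com/b", "Surveys sensor fusion.", "another long page " * 200))
planner_context = bank.summary_view()
section_evidence = bank.retrieve([first])
```

The planner only ever sees `planner_context`, which stays small regardless of page length, while the writer pulls the raw text for the IDs a section cites.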
3 Method
Our methodology is embodied in a dual-agent framework, comprising a planner and a writer. The
planner is responsible for the dynamic cycle of evidence acquisition and outline optimization, while the
writer performs focused, section-by-section synthesis to construct the final report. This division of labor
directly mirrors the cognitive workflow of a human researcher.
3.1 Overview of WebWeaver
The entire workflow is visualized in Fig. 3. Tasked with evidence acquisition and outline optimization,
the planner operates in a dynamic research cycle. It iteratively interleaves evidence acquisition from
web searches with the continuous refinement and optimization of a report outline. The output of this
exploratory phase is not just a collection of sources but a comprehensive, well-structured outline where
each section is explicitly linked via citations to a curated memory bank of source evidence.
Subsequently, the writer takes over for the synthesis phase. To circumvent the pitfalls of one-shot
generation and long-context issues, the writer adopts a section-wise and memory-grounded synthesis
approach. For each section of the outline, it performs targeted retrieval of only the pertinent evidence
from the memory bank and composes the content. This division of labor ensures that the final report is
not only coherent and well-organized but also deeply source-grounded, faithfully mirroring the rigor of
human-led deep research.
3.2 Research Cycle: Iterative Evidence Acquisition and Outline Optimization
Recent deep research agents often follow an "outline-guided search" paradigm (Han et al., 2025; Research,
2025e;c). By generating a static outline before any evidence is gathered, they create a rigid research path
that is blind to emergent insights. This fundamental decoupling of planning from discovery limits the
depth and breadth of the research. We address this by proposing a dynamic research cycle where the
outline and search strategy co-evolve, allowing the agent to adapt and explore new findings.
The core of our planner’s operation is a dynamic research cycle that iteratively interleaves evidence
acquisition with outline optimization. Unlike static approaches, our planner continuously adapts its
strategy based on emergent findings. At each step, the planner selects one of three actions: search,
outline optimization, or terminate.
[Figure 3: workflow diagram for the example query "Please conduct a study and prepare a report on the 'Construction and Application of a Sports Intelligent Tutoring and Learning Guidance System Driven by Multimodal Data Fusion'". Left panel: planner (think, search, write outline, terminate) feeding the memory bank and refining a cited outline over rounds. Right panel: writer (think, retrieve, write, terminate) producing the report section by section.]
Figure 3: The workflow of WebWeaver. Left: The planner first iteratively collects evidence via the
search tool and optimizes the outline until outputting a comprehensive and citation-grounded outline.
Right: The writer performs hierarchical and attentional writing by retrieving relevant evidence with the
grounded citation in the outline.
Evidence acquisition. When there is still insufficient evidence or knowledge to make a comprehensive
outline to answer the open-ended question, the planner will continue collecting evidence by executing
the search action. Given any search queries, the planner begins by querying a web search engine, which
returns the results that contain the raw URLs with corresponding snippets and titles. To combat the
contextual noise and processing overhead from raw URLs, it employs a two-stage filtering process. First,
we prompt LLMs to select only the relevant URLs based on titles and snippets. Then, for each parsed
page of the selected URLs, we perform two critical actions: leveraging LLMs to (1) distill a query-relevant
summary, which is fed back into the planner’s context to inform subsequent search iterations, and (2)
extract verifiable, detailed evidence (e.g., quotes, data points), which is stored in a structured memory
bank for the subsequent writing.
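The two-stage filtering described above can be sketched as follows. Every callable passed in (`is_relevant`, `fetch_page`, `summarize`, `extract_evidence`) is a hypothetical stand-in for an LLM or tool call, and the toy lambdas exist only to make the sketch runnable; none of this is the authors' implementation.

```python
from typing import Callable, Dict, List, NamedTuple


class SearchHit(NamedTuple):
    url: str
    title: str
    snippet: str


def acquire_evidence(query: str,
                     hits: List[SearchHit],
                     is_relevant: Callable[[str, SearchHit], bool],
                     fetch_page: Callable[[str], str],
                     summarize: Callable[[str, str], str],
                     extract_evidence: Callable[[str, str], List[str]],
                     memory: Dict[str, List[str]]) -> List[str]:
    """Return summaries for the planner context; store evidence in memory."""
    summaries: List[str] = []
    for hit in hits:
        # Stage 1: keep only URLs judged relevant from title and snippet.
        if not is_relevant(query, hit):
            continue
        page = fetch_page(hit.url)
        # Stage 2a: the query-relevant summary goes back to the planner.
        summary = summarize(query, page)
        # Stage 2b: detailed evidence goes to the memory bank under a new ID.
        evidence_id = f"id_{len(memory) + 1}"
        memory[evidence_id] = extract_evidence(query, page)
        summaries.append(f"<{evidence_id}> {hit.url}: {summary}")
    return summaries


# Toy data and stubs (assumptions for illustration only).
pages = {
    "https://example.com/its": "Intelligent tutoring systems adapt feedback. A pilot had 120 athletes.",
    "https://example.com/news": "Local team wins the cup.",
}
hits = [
    SearchHit("https://example.com/its", "Sports tutoring systems", "multimodal tutoring overview"),
    SearchHit("https://example.com/news", "Match report", "cup final result"),
]
memory: Dict[str, List[str]] = {}
summaries = acquire_evidence(
    query="sports intelligent tutoring",
    hits=hits,
    is_relevant=lambda q, h: "tutoring" in (h.title + " " + h.snippet).lower(),
    fetch_page=pages.__getitem__,
    summarize=lambda q, page: page.split(". ")[0],
    extract_evidence=lambda q, page: [s for s in page.split(". ") if any(c.isdigit() for c in s)],
    memory=memory,
)
```

Only the one-line summary of the relevant page reaches the planner, while the extracted evidence is kept out of context in `memory`.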
Outline optimization. After acquiring some evidence, the planner revisits the report’s outline. This is not
a one-time generation step but a process of continuous refinement and optimization. The planner uses
the newly acquired information to expand sections, add new subsections, or even restructure the entire
outline to better reflect a comprehensive understanding of the topic. Crucially, it populates the outline
with citations, mapping each section to the specific evidence IDs in the memory bank. This citation
mechanism is vital for ensuring source-groundedness and enabling the hierarchical writing process in
the next stage. This iterative loop continues until the planner outputs a terminate action with a tag
“<terminate>” when the outline is sufficiently comprehensive and well-supported by evidence.
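As a small illustration of the citation mechanism, the sketch below maps outline lines to the evidence IDs they cite. The `<citation>id_N</citation>` tag format follows the example in Fig. 3; the one-heading-per-line layout, the comma-separated handling of several IDs in one tag, and the function name are assumptions.

```python
import re
from typing import Dict, List

CITATION = re.compile(r"<citation>(.*?)</citation>")


def citations_by_section(outline: str) -> Dict[str, List[str]]:
    """Map each outline heading to the evidence IDs cited on that line."""
    mapping: Dict[str, List[str]] = {}
    for line in outline.splitlines():
        line = line.strip()
        if not line:
            continue
        ids = [part.strip()
               for match in CITATION.findall(line)
               for part in match.split(",")
               if part.strip()]
        title = CITATION.sub("", line).strip()
        mapping[title] = ids
    return mapping


outline = """\
1.1 Introduction
1.1.1 Definition <citation>id_1</citation>
1.1.2 Evolution from Traditional Sports <citation>id_2</citation>
1.1.3 Role of Multimodal Data Fusion <citation>id_3</citation>
"""
mapping = citations_by_section(outline)
```

A mapping of this kind is what lets the writer later retrieve only the evidence attached to the section it is about to compose.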
3.3 Memory-Grounded Synthesis: Hierarchical Retrieval and Writing
A pivotal challenge in generating long-form reports is not just information access but attentional
management. The prevailing approach of feeding all gathered evidence into a single context window for
one-shot generation is fundamentally flawed. This brute-force method saturates the model’s attentional
capacity, leading to long-context issues such as “lost in the middle” (Liu et al., 2023), where crucial
details are overlooked, and “contextual bleeding” (Liu et al., 2025), where information from one section
incorrectly influences the synthesis of another. We argue that a successful synthesis process must mirror
human cognition by breaking down the complex task of long-context, one-step writing into manageable
subtasks of attentional writing with only relevant evidence. Therefore, we adopt a hierarchical,
divide-and-conquer strategy, where the report is constructed sequentially, with the model’s focus constrained to
only the most relevant evidence at each step.
Upon completion of the planning phase, the writer is provided with the structured, source-grounded
outline and access to the evidence memory bank. The composition of each section is not a single,
monolithic action but a deliberate, intra-sectional reasoning cycle designed to ensure both accuracy and
coherence. This cycle unfolds as follows:
First, the writer identifies its immediate subtask, such as “Let’s write the first section.” It then executes a
targeted retrieval action, pulling only the relevant evidence from the memory bank as indicated by the
outline’s citations. Upon receiving the evidence, the writer enters a crucial internal reasoning phase with
a think action. In this thinking step, it analyzes the retrieved content, synthesizes key insights, selects the
most compelling pieces of evidence, and formulates a coherent narrative structure for the section. This
internal monologue is critical for moving beyond simple summarization to genuine synthesis.
Only after this internal synthesis plan is formed does the writer proceed to the writing action, composing
the prose and encapsulating it within “<write>” tags. Once a section is complete, its corresponding source
materials are explicitly pruned from the context window and replaced with a placeholder message. This
dynamic retrieval-and-pruning mechanism is the cornerstone of our approach: it ensures the writer’s
context remains highly relevant for the next cycle, mitigates context overflow, and prevents cross-sectional
interference. This entire process repeats hierarchically for all sections until the writer outputs a final
“<terminate>” token, signaling the completion of the full report.
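The retrieve, think, write, and prune cycle can be sketched as below. This is an illustration under stated assumptions, not the authors' code: `think` and `write` stand in for LLM calls, the outline is reduced to a mapping from section title to cited IDs, and the placeholder text is invented.

```python
from typing import Callable, Dict, List, Tuple

PLACEHOLDER = "[evidence removed after the section was written]"


def write_report(outline: Dict[str, List[str]],
                 memory: Dict[str, str],
                 think: Callable[[str, List[str]], str],
                 write: Callable[[str, str], str]) -> Tuple[List[str], List[str]]:
    """Compose the report section by section, pruning evidence after each one."""
    context: List[str] = []
    report: List[str] = []
    for title, cited_ids in outline.items():
        start = len(context)
        # Retrieve: pull only the evidence cited for this section.
        context.extend(memory[i] for i in cited_ids if i in memory)
        # Think: plan the section from the freshly retrieved evidence.
        plan = think(title, context[start:])
        # Write: wrap the composed prose in <write> tags.
        report.append(f"<write>{write(title, plan)}</write>")
        # Prune: replace this section's evidence with a placeholder message.
        context[start:] = [PLACEHOLDER]
    return report, context


outline = {"1 Introduction": ["id_1", "id_2"], "2 Theoretical Foundations": ["id_3"]}
memory = {"id_1": "definition notes", "id_2": "history notes", "id_3": "framework notes"}
report, final_context = write_report(
    outline,
    memory,
    think=lambda title, evidence: f"{len(evidence)} sources",
    write=lambda title, plan: f"{title} (drafted from {plan})",
)
```

Because evidence is swapped for a placeholder after each section, the context grows by one short line per section rather than by the full evidence size.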
4 Experiments
In this section, we first evaluate WebWeaver on three recent and challenging benchmarks. We then
analyze the effectiveness of outline optimization and memory-grounded synthesis.
Furthermore, we curate a high-quality SFT dataset to improve the capabilities of
thinking, searching, and writing for a smaller model to achieve expert-level performance.
4.1 Setup
Benchmarks. To evaluate the performance of Deep Research systems, we use three open-ended
benchmark datasets:
• DeepResearch Bench (Du et al., 2025) comprises 100 PhD-level complex research tasks
meticulously formulated by domain experts across 22 distinct fields, such as Science & Technology,
Finance & Business, Software Engineering, and Art & Design.
• DeepConsult (Consult, 2025) is a specialized collection of prompts tailored for in-depth research
within the business and consulting domains. The query set encompasses a wide range of topics,
including marketing strategy, financial analysis, emerging technology trends, and business
planning.
• DeepResearchGym (Coelho et al., 2025) is used to assess performance on real-world, complex
queries. This dataset contains 100 queries sampled from the extensive Researchy Questions
dataset (Rosset et al., 2024), which includes approximately 96,000 authentic information-seeking
queries.
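RACE columns: Overall, Comp., Insight, Inst., Read. FACT columns: Eff. c., C. acc.

| Agent systems | Overall | Comp. | Insight | Inst. | Read. | Eff. c. | C. acc. |
|---|---|---|---|---|---|---|---|
| WebShaper (32B) | 34.93 | 31.58 | 26.17 | 44.81 | 40.38 | - | - |
| langchain-open-deep-research | 43.44 | 42.97 | 39.17 | 48.09 | 45.22 | | |
| doubao-research | 44.34 | 44.84 | 40.56 | 47.95 | 44.69 | | |
| kimi-research | 44.64 | 44.96 | 41.97 | 47.14 | 45.59 | | |
| Claude-research | 45.00 | 45.34 | 42.79 | 47.58 | 44.66 | | |
| openai-deepresearch | 46.45 | 46.46 | 43.73 | 49.39 | 47.22 | | |
| Gemini-2.5-pro-deepresearch | 49.71 | 49.51 | 49.45 | 50.12 | 50.00 | | |
| WebWeaver (qwen3-30b-a3b-instruct-2507) | 46.77 | 45.15 | 45.78 | 49.21 | 47.34 | 26.74 | 25.00 |
| WebWeaver (gpt-oss-120b) | 48.11 | 48.03 | 47.20 | 48.94 | 48.11 | 64.88 | 66.14 |
| WebWeaver (qwen3-235b-a22b-instruct-2507) | 50.62 | 51.29 | 51.00 | 49.98 | 48.89 | 166.73 | 78.25 |
| WebWeaver (Claude-sonnet-4-20250514) | 50.58 | 51.45 | 50.02 | 50.81 | 49.79 | 200.75 | 93.37 |

[FACT values for the six rows from langchain-open-deep-research to Gemini-2.5-pro-deepresearch survive only as column fragments with one cell missing, so their row alignment is not recoverable: Eff. c. 52.62, -, -, 39.79, 165.34; C. acc. 52.86, -, -, 75.01, 78.3.]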
Table 1: Performance of agents on DeepResearch Bench in terms of comprehensiveness (Comp.), insight,
instruction-following (Inst.), readability (Read.), effective citations (Eff. c.), and citation accuracy (C.
acc.). The best results are highlighted with green color, and the second-best results are highlighted with
underlines.
Metric. We use the official evaluation metrics with the recommended judge LLMs for each benchmark.
• DeepResearch Bench. This benchmark utilizes two suites of metrics to evaluate different aspects
of the system’s output: 1) RACE (Report Quality): It assesses the quality of the final generated
report against a reference report across four dimensions, namely Comprehensiveness (Comp.),
Insight/Depth (Insight), Instruction-Following (Inst.), and Readability (Read.). An overall score
is then calculated as a weighted summation of these components. 2) FACT (Web Retrieval):
It measures the effectiveness and reliability of the information retrieval process. This includes
Citation Accuracy (C. Acc.) and the Average Effective Citations per Task (Eff. c.). We adopt
Gemini-2.5-pro as the judgement model by following the benchmark.
• DeepConsult. Performance on this benchmark is determined through a pairwise comparison
against the openai-deepresearch baseline. The primary metrics are the win rate, tie rate, and loss
rate, which are supplemented by a reported average quality score. The judgement model is
gpt-4.1-20250414.
• DeepResearchGym. An LLM acts as a judge to assess the generated report on several quality
dimensions, including clarity, insightfulness, depth, balance, breadth, support, and an average
quality score. The judgement model is gpt-4.1-mini-20250414.
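To make the aggregate metrics concrete, the sketch below computes pairwise win, tie, and lose rates and a weighted overall score. It is not the benchmarks' official scoring code: the function names are hypothetical, the equal weights are placeholders because the actual RACE weights are not given here, and dividing by the total weight is an added assumption that changes nothing when the weights sum to 1.

```python
from collections import Counter
from typing import Dict, List


def pairwise_rates(outcomes: List[str]) -> Dict[str, float]:
    """Percentage of wins, ties, and losses against a fixed baseline."""
    counts = Counter(outcomes)
    total = len(outcomes)
    return {key: 100.0 * counts[key] / total for key in ("win", "tie", "lose")}


def weighted_overall(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of dimension scores, normalized by the total weight."""
    total_weight = sum(weights.values())
    return sum(scores[name] * weights[name] for name in weights) / total_weight


# Toy inputs (assumptions): four pairwise judgements and equal dimension weights.
rates = pairwise_rates(["win", "win", "tie", "lose"])
overall = weighted_overall(
    {"comp": 50.0, "insight": 40.0, "inst": 60.0, "read": 50.0},
    {"comp": 0.25, "insight": 0.25, "inst": 0.25, "read": 0.25},
)
```

By construction the three rates sum to 100 whenever every outcome is one of the three labels.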
Compared systems. We benchmark the performance of WebWeaver against a range of state-of-the-art
DeepResearch systems. These systems are categorized into two groups:
• Open-Source Systems: For open-source counterparts, we compare against WebShaper-32B (Tao
et al., 2025) and langchain-open-deep-research (LangChain, Inc., 2023).
• Proprietary Systems: We include several leading commercial systems: doubao-research (Research,
2025a), kimi-research (Research, 2025d), Claude-research (anthropic, 2025), openai-deepresearch
(OpenAI, 2025a), and Gemini-2.5-pro-deepresearch (Research, 2025b).
Implementation details. WebWeaver is compatible with various advanced LLMs. In the
experiments, we utilize the following models: Qwen3-30b-a3b-instruct-2507 (Yang et al., 2025), GPT-oss-120b
(Agarwal et al., 2025), Qwen3-235b-a22b-instruct-2507 (Yang et al., 2025), and Claude-sonnet-4-20250514
(anthropic, 2025). Unless otherwise stated, we adopt Claude-sonnet-4-20250514 as the default agent model
for ablation studies and discussion. We use GPT-oss-120b to select relevant URLs, produce query-relevant
summaries, and extract evidence for the search action. We present the case studies in Appendix B.

DeepConsult columns: win, tie, lose, Avg. score (DC). DeepResearchGym columns: Cla., Depth, Bal., Brea., Sup., Ins., Avg. score (DRG).

| Agent systems | win | tie | lose | Avg. score (DC) | Cla. | Depth | Bal. | Brea. | Sup. | Ins. | Avg. score (DRG) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| WebShaper (32B) | 3.25 | 3.75 | 93.00 | 1.63 | 64.70 | 63.00 | 59.30 | 66.50 | 9.40 | 59.90 | 53.80 |
| doubao-research | 29.95 | 40.35 | 29.70 | 5.42 | 68.85 | 93.12 | 83.96 | 93.33 | 84.38 | 83.12 | 84.46 |
| Claude-research | 25.00 | 38.89 | 36.11 | 4.60 | 86.67 | 96.88 | 84.41 | 96.56 | 26.77 | 90.22 | 80.25 |
| openai-deepresearch | 0.00 | 100.00 | 0.00 | 5.00 | 84.90 | 98.10 | 89.80 | 97.40 | 88.40 | 89.00 | 91.27 |
| Gemini-2.5-pro-deepresearch | 61.27 | 31.13 | 7.60 | 6.70 | 90.71 | 99.90 | 93.37 | 99.69 | 95.00 | 97.45 | 96.02 |
| WebWeaver (qwen3-30b-a3b-instruct-2507) | 28.65 | 34.90 | 36.46 | 4.57 | 71.88 | 85.51 | 75.80 | 84.78 | 63.77 | 81.88 | 77.27 |
| WebWeaver (qwen3-235b-a22b-instruct-2507) | 54.74 | 28.61 | 16.67 | 6.47 | 89.16 | 97.58 | 87.68 | 96.21 | 95.26 | 92.85 | 93.14 |
| WebWeaver (gpt-oss-120b) | 65.31 | 11.22 | 23.47 | 6.64 | 89.78 | 100.00 | 91.91 | 99.66 | 94.94 | 95.06 | 95.07 |
| WebWeaver (Claude-sonnet-4-20250514) | 66.86 | 10.47 | 22.67 | 6.96 | 90.50 | 99.87 | 94.30 | 100.00 | 98.73 | 97.22 | 96.77 |

Table 2: Performance of agents on DeepConsult in terms of win rate and average scores and on
DeepResearchGym in terms of clarity (Cla.), depth, balance (Bal.), breadth (Brea.), support (Sup.), and
insightfulness (Ins.). The best results are highlighted with green color, and the second-best results are highlighted
with underlines.
4.2 Main Results
Results on DeepResearch Bench. As presented in Table 1, our WebWeaver framework establishes a
new state-of-the-art, consistently outperforming existing agents. This superior performance is a direct
result of our dual-agent, iterative methodology. The high scores in comprehensiveness (Comp.) and
insight stem from the planner’s dynamic research cycle, which iteratively expands the report’s scope
based on emergent findings, unlike the rigid outline-first approaches. This process naturally leads to
a higher number of effective citations (Eff. c.), as the planner is intrinsically motivated to seek more
evidence to ensure that each section is well-supported. Furthermore, the remarkable citation accuracy
(C. acc.) of 93.37% is achieved by the strong synergy between our agents: the planner embeds specific
citation IDs into the outline, and the writer’s hierarchical synthesis process uses this structure for targeted
retrieval. By focusing only on relevant evidence for each section, it drastically reduces context-bleeding
and hallucinations, which also contributes to the enhanced readability (Read.) and instruction-following
(Inst.) scores. This demonstrates that by emulating human research patterns, our framework produces
not just more thorough but also significantly more reliable and well-structured reports.
Results on DeepConsult and DeepResearchGym. To validate the generalizability of our framework,
we further evaluated WebWeaver on the DeepConsult and DeepResearchGym benchmarks, with results
presented in Table 2. Our method demonstrates clear superiority on both, achieving the highest win rate
(66.86%) on DeepConsult and the top average score (96.77) on DeepResearchGym. This success is rooted
in our core design. The near-perfect scores in Depth (100.00) and Breadth (100.00) are a direct result of the
planner’s iterative research cycle, which relentlessly expands the report’s scope beyond the limits of static
planning. Concurrently, the writer’s hierarchical synthesis process ensures these comprehensive findings
are well-organized, leading to outstanding scores in balance (94.30) and support (98.73). In essence, the
quantitative dominance in structural metrics like depth and breadth on DeepResearchGym provides a
clear explanation for the qualitative victories on DeepConsult, proving that our human-inspired, iterative
process is a fundamentally more robust strategy for complex information synthesis tasks.
4.3 Analysis
Statistics of planning and writing. The statistics in Table 3 provide a compelling quantitative narrative
that not only justifies but also demonstrates the benefits of WebWeaver’s design. The planning task
involves an extensive exploratory phase with nearly 16 search steps and about 21 unique search queries,
indicating that a simple, linear search is insufficient. The critical finding is that the outline undergoes more than two
optimization cycles on average, expanding into a complex outline of roughly 4k tokens. This is empirical evidence against
static-outline approaches and shows the tangible benefit of our iterative process: it produces a richer,
more comprehensive plan that adapts to discovery. This deep planning phase amasses a large
amount of information: over 100 saved pages, amounting to about 67k evidence tokens and 15k summary
tokens. This sheer volume makes a single-context approach computationally hard, so
our memory-centric architecture with targeted retrieval is a foundational requirement, not just an
optimization. Finally, the writer composes a report of about 26k tokens in roughly 25 discrete writing steps,
which supports hierarchical synthesis as a practical way to maintain coherence over long outputs. In
essence, the statistics of searching and writing affirm that each component of WebWeaver is a necessary
and beneficial response to the inherent challenges of OEDR.

| Phase | Statistic | DeepResearch Bench | DeepResearchGym |
|---|---|---|---|
| Planning | # Search step | 15.71 | 16.65 |
| Planning | # Search query | 20.24 | 21.93 |
| Planning | # Outline optimization | 2.16 | 2.20 |
| Planning | # Outline token | 4876.21 | 3732.87 |
| Planning | # Saved page | 112.25 | 102.55 |
| Planning | # Evidence token | 67237 | 66301 |
| Planning | # Summary token | 14980 | 12543 |
| Writing | # Writing step | 24.78 | 24.71 |
| Writing | # Output token | 26127 | 26004 |

Table 3: The planning and writing statistics of Claude-sonnet-4-20250514 on DeepResearch Bench and
DeepResearchGym.
[Figure 4: two pie charts, one per benchmark (DeepResearch Bench, DeepResearchGym), showing the share of tasks with 1, 2, 3, and 4 rounds of outline optimization]
Figure 4: Statistics of outline optimization of Claude-sonnet-4-20250514 on DeepResearch Bench and
DeepResearchGym.
[Figure 5: grouped bar chart on DeepResearch Bench; x-axis "Evaluation Metrics" (overall scores, Comprehensiveness, Insight, Instruction following, Readability); series Round 1, Round 2, Round 3]
Figure 5: End-to-end scores with varying rounds of outline optimization on DeepResearch Bench.
[Figure 6: grouped bar chart on DeepResearchGym; x-axis "Evaluation Metrics" (overall scores, Clarity, Depth, Balance, Breadth, Support, Insightfulness); series Round 1, Round 2, Round 3]
Figure 6: End-to-end scores with varying rounds of outline optimization on DeepResearchGym.
End-to-end benchmark comparison for varying rounds of outlines. To isolate and quantify the benefits
of outline optimization, we conducted an ablation study that evaluates end-to-end benchmark
performance, as reported in Figs. 5 and 6. We collect the samples with three-round outline optimization from
DeepResearch Bench and DeepResearchGym, adopting the same writing strategy for them.
The benefits of this iterative refinement are evident across both benchmarks. On DeepResearch Bench,
the overall score steadily climbs, driven primarily by significant gains in comprehensiveness (48.85 →
50.82) and insight (46.33 → 48.35). This directly validates our hypothesis that each optimization round
allows the planner to build a more detailed and logically structured outline. This enhanced structure
is further reflected in DeepResearchGym’s metrics, where later rounds achieve near-perfect scores in
depth (100) and breadth (99.58), indicating a more exhaustive topic coverage. Crucially, this is not just
about adding more content; the steady rise in support (95.91 → 98.33) demonstrates that a more refined
outline creates a better-scaffolded structure, enabling the writer to more tightly link claims to evidence.
In summary, this analysis empirically demonstrates that iterative outline optimization is not a redundant
step but a critical mechanism for elevating a report from a simple summary to a deep, insightful, and
well-supported piece of research.
[Figure 7 panel: DeepResearch Bench. Series (Rounds): Round 1, Round 2, Round 3. y-axis: Scores. x-axis (Evaluation Metrics): overall scores, Instruction following, Depth, Balance, Breadth, Support, Insightfulness.]
Figure 7: LLM-judged scores for varying rounds of outline optimization on DeepResearch Bench.
[Figure 8 panel: DeepResearchGym. Series (Rounds): Round 1, Round 2, Round 3. y-axis: Scores. x-axis (Evaluation Metrics): overall scores, Instruction following, Depth, Balance, Breadth, Support, Insightfulness.]
Figure 8: LLM-judged scores for varying rounds of outline optimization on DeepResearchGym.
LLM judgement for varying rounds of outlines. To directly evaluate whether our optimization truly
improves outline quality, we utilized an LLM-as-a-judge (Zheng et al., 2023) to assess the outlines from
each of the three optimization rounds using gpt-4.1-mini-2025-04-14 in terms of instruction following,
depth, balance, breadth, support, and insightfulness. The judgment prompt is provided in Appendix A.
The results in Figs. 7 and 8 confirm the benefit of our iterative approach. On both benchmarks,
the overall score for the outline quality shows a significant, monotonic increase, jumping from 81.9 to
92.3 on DeepResearch Bench and from 77.2 to 88.2 on DeepResearchGym. This improvement is driven
by clear gains in structural quality; the near-perfect scores in Depth (up to 95.71) and Breadth (up to
98.4) provide direct evidence that each optimization cycle successfully expands the research’s scope.
Crucially, this is not mere expansion. The substantial increase in the Support score (e.g., from 51.2 to 73.6
on DeepResearchGym) is particularly revealing, indicating that later-round outlines are more effectively
grounded with a stronger mapping between planned sections and available evidence. This enhanced
grounding and structure culminate in a plan that is itself more insightful (improving by 10-15 points
on both benchmarks). Therefore, this direct assessment confirms that our iterative planner is not just
adding content but is actively forging a superior, more coherent, and better-supported blueprint—the
foundational prerequisite for a high-quality final report.
[Figure 9 panel: DeepResearch Bench. Series (Writing Type): Brute-force Writing (LongWriter), Hierarchical Writing. x-axis (Evaluation Metrics): overall scores, Comprehensiveness, Insight, Instruction following, Readability.]
Figure 9: Performance comparison between hierarchical writing and brute-force writing (LongWriter)
on DeepResearch Bench.
[Figure 10 panel: DeepResearchGym. Series (Writing Type): Brute-force Writing (LongWriter), Hierarchical Writing. x-axis (Evaluation Metrics): overall scores, Clarity, Depth, Balance, Breadth, Support, Insightfulness.]
Figure 10: Performance comparison between hierarchical writing and brute-force writing (LongWriter)
on DeepResearchGym.
Hierarchical retrieval and writing vs. brute-force writing. To empirically validate our hierarchical
writing process, we conducted a critical ablation study comparing our hierarchical writer against a
brute-force baseline that attempts to include the entire memory bank to generate the final report, which
is similar to the workflow of LongWriter (Bai et al., 2025). The results are unequivocal: our hierarchical
approach dramatically outperforms the brute-force method across every metric, confirming that a “divide
and conquer” strategy is essential. The most striking improvements are in insight (40.97 → 50.02) and
readability (42.29 → 49.79), which directly validates our hypothesis on attentional management; by
focusing the model on a curated context for each section, it can perform deeper reasoning rather than
shallow summarization. This is further substantiated by the leap in support on DeepResearchGym (91.82
→ 98.73), proving that our targeted retrieval-and-pruning mechanism effectively prevents “contextual
bleeding” and ensures claims are correctly grounded. In conclusion, these results provide definitive
evidence that emulating the human cognitive process of focused, section-by-section writing is not merely
a beneficial choice but a fundamental requirement for generating coherent, insightful, and reliable
long-form reports.
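The difference between the two writing strategies can be illustrated with a minimal Python sketch. It assumes a memory bank keyed by evidence IDs and outline sections annotated with `<citation>` tags, as in the case study outline in Appendix B.2. The names `MEMORY_BANK`, `cited_ids`, `section_context`, and `brute_force_context`, and the sample evidence strings, are hypothetical and are not taken from the WebWeaver implementation.

```python
import re

# Hypothetical memory bank: evidence ID -> stored evidence text.
MEMORY_BANK = {
    "id_1": "Evidence about disease staging.",
    "id_2": "Evidence about early motor symptoms.",
    "id_3": "Evidence about rating scales.",
    "id_4": "Evidence about post-operative care.",
}

CITATION_RE = re.compile(r"<citation>(.*?)</citation>", re.DOTALL)


def cited_ids(section: str) -> list[str]:
    """Return the evidence IDs cited in one outline section, in order, without duplicates."""
    ids: list[str] = []
    for group in CITATION_RE.findall(section):
        for raw in group.split(","):
            evidence_id = raw.strip()
            if evidence_id and evidence_id not in ids:
                ids.append(evidence_id)
    return ids


def section_context(section: str, bank: dict[str, str]) -> str:
    """Hierarchical writing: only evidence cited by this section enters the context."""
    return "\n".join(bank[i] for i in cited_ids(section) if i in bank)


def brute_force_context(bank: dict[str, str]) -> str:
    """Brute-force writing: the entire memory bank enters the context."""
    return "\n".join(bank.values())


section = "1. Motor Symptoms <citation>id_2, id_3</citation>"
focused = section_context(section, MEMORY_BANK)
full = brute_force_context(MEMORY_BANK)
print(len(focused), "characters vs", len(full), "characters")
```

In this toy setting the per-section context contains only the two cited entries, whereas the brute-force context grows with the size of the whole memory bank.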
[Figure 11: pie chart labeled WebWeaver-3k. Slices: 1 round, 2 rounds, 3 rounds, 4 rounds.]
Figure 11: Round statistics of outline optimization on WebWeaver-SFT.
[Figure 12: bar chart titled Agentic Finetuning. Series (Model Version): Qwen3-30b-a3b-Instruct, Qwen3-30b-a3b-Instruct (SFT). x-axis (Benchmark): DeepResearch Bench (RACE), Citation Accuracy, DeepConsult, DeepResearchGym. Bar values (base, SFT): 46.77, 48.11; 25.00, 85.90; 4.57, 6.09; 77.27, 90.89.]
Figure 12: Performance improvement of agentic finetuning on DeepResearch Bench.
WebWeaver-SFT
Planning statistics
# Search step: 14.67
# Outline token: 4148.57
# Outline optimization: 2.18
# Saved page: 106.65
# Search query: 18.8
Writing statistics
# Evidence token: 62637
# Summary token: 14155
# Output token: 22637
# Writing step: 22.76
Table 4: The planning and writing statistics of training data on WebWeaver-SFT.
Agentic finetuning. While 30B-scale LLMs (e.g., Qwen3-30b-a3b-instruct-2507) possess strong foundational
capabilities, they often exhibit deficiencies in stability and instruction-following when executing
complex, multi-turn tool-calling sequences over long contexts. To bridge this critical gap, we constructed
a high-quality Supervised Fine-Tuning (SFT) dataset: WebWeaver-3k. The process began by sourcing a
diverse set of queries crawled from the web, which were then processed by a powerful, top-tier teacher model,
instantiated within our WebWeaver agent framework. A stringent filtering protocol was applied to the
resulting end-to-end research trajectories, retaining only those where the agent successfully executed
the entire workflow and strictly adhered to the predefined action format. This quality control yielded a
curated dataset of 3.3k high-fidelity planning trajectories and 3.1k writing trajectories. As detailed in
Table 4 and Fig. 11, these trajectories encapsulate the profound complexity of the OEDR task, with an
average case involving approximately 15 search steps, over two outline optimizations, and the processing
of over 62,000 evidence tokens. By fine-tuning our base model on this data, we explicitly imbued it with
the requisite long-sequence reasoning and tool-use capabilities to master our framework.
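The filtering protocol described above can be sketched as follows. This is a minimal illustration, assuming a planning trajectory is stored as a single string in the format shown in Appendix B.1. The function name `is_valid_planning_trajectory` and the specific checks are assumptions, since the exact filtering rules are not spelled out in the text.

```python
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def is_valid_planning_trajectory(text: str) -> bool:
    """Keep a trajectory only if it finished and every action is well-formed."""
    # The workflow must have run to completion.
    if "<terminate>" not in text:
        return False
    # Paired tags must be balanced.
    for tag in ("think", "tool_call", "tool_response", "write_outline"):
        if text.count(f"<{tag}>") != text.count(f"</{tag}>"):
            return False
    # At least one outline must have been written.
    if "<write_outline>" not in text:
        return False
    # Every tool call must be a JSON object with a name and arguments.
    for body in TOOL_CALL_RE.findall(text):
        try:
            call = json.loads(body)
        except json.JSONDecodeError:
            return False
        if not isinstance(call, dict) or "name" not in call or "arguments" not in call:
            return False
    return True


good = (
    "<think>plan</think>"
    '<tool_call>{"name": "search", "arguments": {"query": ["q"], "goal": "g"}}</tool_call>'
    "<tool_response>result</tool_response>"
    "<write_outline>I. Intro</write_outline>"
    "<terminate>"
)
# Same trajectory with a malformed tool call (unbalanced braces).
bad = good.replace('{"query": ["q"], "goal": "g"}}', '["q"], "goal": "g"}}')

trajectories = [good, bad]
kept = [t for t in trajectories if is_valid_planning_trajectory(t)]
print(f"kept {len(kept)} of {len(trajectories)} trajectories")
```

A rule-based check like this is cheap to run over thousands of trajectories. Whether the authors also applied quality-based filtering is not stated.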
The efficacy of our SFT strategy is quantitatively demonstrated by the significant performance gains
across all benchmarks in Fig. 12, which directly reflect the model’s acquisition of our framework’s core
competencies. The most dramatic validation is the leap in citation accuracy from a nearly unusable
25% to a reliable 85.90%. This provides direct, empirical evidence that the model has mastered the
intricate mechanics of our Writer agent, learning to execute precise tool calls for evidence retrieval
and faithfully write according to the source-grounded outline. Furthermore, the substantial increase
in overall report quality, evidenced by the score on DeepConsult (4.57 → 6.09) and the massive jump
on DeepResearchGym (77.27 → 90.89), reflects the successful acquisition of the planner’s more abstract
abilities. These holistic improvements indicate that the model has learned the core loop of thinking
(iteratively optimizing the outline) and searching (adaptively acquiring evidence), which is a prerequisite
for generating a comprehensive and insightful final report. Ultimately, these results offer a powerful
dual validation: they prove that our WebWeaver framework is a potent data generation engine, capable
of deconstructing the formidable OEDR task into learnable demonstrations of thinking, searching, and
writing, thereby enabling a smaller model to achieve expert-level performance.
5 Related Work
Deep Research. Deep Research Agents have garnered significant attention for their powerful capabilities
in information seeking, integration, and reasoning. Proprietary systems, such as DeepResearch
(OpenAI, 2025a), Gemini Deep Research (Google, 2025), and Claude Research (Anthropic, 2025), have
demonstrated performance comparable to human experts in domains like fact-checking and report
writing. However, their opaque internal architectures and workflows hinder broader research and
development. In the open-source community, many studies (Li et al., 2025b; Tao et al., 2025; Su et al.,
2025; Qiao et al., 2025; Fang et al., 2025; Li et al., 2025a; Wu et al., 2025b;a) have been developed to tackle
complex research benchmarks such as BrowseComp and GAIA by exploring methods like synthetic
uncertain Question-Answering (QA) and formalized QA synthesis. Nevertheless, these solutions are
primarily tailored for short-answer research queries and lack the capability to generate comprehensive,
long-form reports on open-domain topics. Other open-source systems like OpenDeepResearch (Research,
2025e), GPT Researcher (Research, 2025c), and TTD-DR (Han et al., 2025) address long-form generation
by first drafting a static framework, then retrieving content, and finally composing the report. This
approach, characterized by a fixed structure and one-step generation, often leads to textual incoherence
and hallucinations. In contrast, our method emphasizes the outline optimization and hierarchical writing
processes to ensure the report’s fluency and factual accuracy.
Long Writing. Ensuring the coherence and accuracy of LLM-generated long-form text is a persistent
challenge. Previous work has explored methods like recursive prompting for story extension (Yang et al.,
2022) and structured task decomposition to improve consistency (Yang et al., 2023; Wang et al., 2025).
More recently, agent-based frameworks have become a mainstream solution. Systems like LongWriter
(Bai et al., 2025) and CogWriter (Wan et al., 2025) employ a "plan-then-write" strategy, where a Planner
Agent first creates an outline, and a Generation Agent then conditions on this plan to produce the full
text. However, these methods rely on a static initial plan and a brute-force writing strategy by feeding
all the evidence into LLMs. In contrast, our approach uniquely enables the outline to be dynamically
optimized in tandem with the evidence acquisition process, allowing for a comprehensive,
source-grounded research outline. Furthermore, our proposed hierarchical writing process with only relevant
evidence also mitigates the long-context issues from the brute-force writing strategy.
6 Conclusion
In this paper, we introduced WebWeaver, a novel dual-agent framework designed to overcome the
fundamental flaws of static, machine-like pipelines in open-ended deep research (OEDR). By emulating
the human cognitive process that integrates the planner’s dynamic research cycle with the writer’s
hierarchical retrieval and writing process, WebWeaver consistently outperforms both proprietary and
open-source systems, establishing a new state-of-the-art.
Beyond its superior performance, the true significance of WebWeaver lies in the new paradigm it offers
the community for tackling complex, information-intensive tasks. It reframes the intractable challenge of
long-context reasoning, demonstrating that it can be successfully deconstructed into a structured problem
of system-level information management, orchestrated through a series of precise actions. Both the
planner and writer are embodiments of this principle: they use tools to dynamically explore, structure,
and write, rather than passively processing all the information in a single pass. This work does not just present a better
agent system; it presents a new blueprint for building agent systems that master knowledge-intensive tasks
through deliberate actions, not just brute-force attention.
A Prompt Template for Outline Judgement
The detailed prompt template and judgement criteria in terms of instruction following, depth, balance,
breadth, support, and insightfulness are shown as follows:
Judgement Criteria
{ "name": "Instruction following", "description": "Evaluate how well the outline follows the user’s instructions
for an outline. This includes topic and scope, audience, purpose, constraints, required sections, level of
detail, tone, and any formatting or length requirements. Check outline-specific expectations: clear hierarchical
structure (e.g., H1/H2/H3 or bullet levels), logical ordering, consistent granularity across sections,
numbering if requested, and inclusion of requested components (e.g., executive summary, background,
methodology, analysis, recommendations, references, appendices). Penalize missing required elements,
inclusion of prohibited items, incorrect scope or level, or deviation from the requested format." },
{ "name": "Depth", "description": "Assess the comprehensiveness and analytical depth of the outline.
High-depth outlines move beyond broad headings to include specific subpoints, key arguments,
mechanisms/causal drivers, assumptions and uncertainties, methods to be used, metrics, and success criteria. They
indicate sequencing and logic (what builds on what), note dependencies and open questions, and identify
where evidence, examples, and visuals will be integrated. Shallow outlines list generic topics without
meaningful substructure, rationale, or analytical scaffolding." },
{ "name": "Balance", "description": "Evaluate the fairness and objectivity of the outline. Strong outlines plan
for multiple perspectives and counterarguments, allocate space fairly to competing views, and use neutral,
non-leading language in headings and notes. Where issues are controversial or multi-faceted, the outline
should explicitly include sections for trade-offs, limitations, and counter-evidence. Poor outlines display
bias, give disproportionate space to one side without justification, or omit salient opposing views." },
{ "name": "Breadth", "description": "Evaluate how many distinct and relevant subtopics, perspectives, or
contexts the outline covers, while staying focused on the brief. Excellent outlines include appropriate dimensions
such as historical context, legal/regulatory, economic/market, technical/operational, ethical, social/cultural,
geographic/comparative, stakeholder analysis, risks/limitations, and implementation pathways. Coverage
should be wide-ranging yet purposeful; simply presenting two sides of a debate is insufficient, and irrelevant
tangents should be avoided." },
{ "name": "Support", "description": "Evaluate the outline’s evidentiary scaffolding and sourcing plan.
Providing source URLs somewhere in the outline (e.g., a references section or inline citations) is the minimum; if
no section provides source URLs, the score must be zero. Factual accuracy is necessary but not sufficient.
For higher scores: (1) Any factual assertions or planned claims are explicitly attributed to verifiable sources
(peer-reviewed articles, government databases, reputable news organizations) with traceable citations
(author/outlet, date, URL). Vague references like “studies show” are unacceptable. (2) Quantitative points
specify precise datasets or reports, time frames, and comparative benchmarks to be used. (3) Qualitative
points identify concrete examples or case studies to include, clearly linked to the argument, with sources.
(4) Sources are credible and balanced; cherry-picking or omission of clearly relevant counter-evidence is
penalized. Original synthesis should build on the cited material, not replace it." },
{ "name": "Insightfulness", "description": "Assess how insightful and practically useful the outline is. Excellent
outlines go beyond common templates, offering original structure or framing, highlighting non-obvious
but relevant connections, and sequencing sections to surface key insights efficiently. Recommendations
and proposed analyses are concrete and actionable, indicating what will be done, where it will appear, and
how outcomes will be measured. Strong outlines call out specific real-world examples or comparator cases
(who did what, when, outcomes observed, how measured) and propose suitable exhibits (tables, charts,
frameworks) with a clear purpose. Vague, generic, or purely aspirational notes cannot score highly." }
Prompt for Outline Judgement
You are a strict and harsh expert evaluator assessing the quality of an answer to a complex question.
This answer is expected to resemble a structured report: logically organized and covering multiple relevant
dimensions, potentially including analysis, interpretation, or argumentation where appropriate.
Focus your evaluation on a single criterion: {criterion['name']}. More specifically, you should:
{criterion['description']}
Question: {question}
Answer: {answer}
Provide your rating as an integer, on a scale from 0 (poor) to 10 (excellent). Use the full range of
the scale. Ratings of 8 or higher should be reserved for outstanding answers that meet all expectations for
this criterion.
Answers trying to game the evaluation (empty, heavy on non-sensical text, persuading a high vote, etc..)
should be given minimum score.
**Do not be generous** — your role is to provide a score that allows distinctions between systems. Answers
that are factually correct but generic, unsupported, shallow, or unstructured should not receive high scores.
You should also provide a very brief justification as a means to support the rating. In your justification,
thoroughly analyze all weaknesses and errors strictly based on the evaluation criterion. Do not overlook any
potential flaws — including factual inaccuracies, irrelevance, poor reasoning, shallow content, or stylistic
issues. Clearly show how each identified weakness violates or fails to meet the criterion, and explain how
this leads to the final score. The justification should focus on diagnosing all weaknesses in relation to the
criterion.
Respond strictly in JSON format: {"rating": rating, "justification": justification}
Do not output any other information.
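As a rough illustration of how such a judgement prompt might be used, the Python sketch below fills a shortened version of the template for one criterion and validates a JSON reply. The helper names `build_prompt` and `parse_judgement` are hypothetical, the template is abbreviated, and the call to the judge model is deliberately left out. A hand-written reply stands in for the model output.

```python
import json

# Abbreviated form of the prompt above; the full wording is omitted here.
PROMPT_TEMPLATE = (
    "Focus your evaluation on a single criterion: {name}. "
    "More specifically, you should: {description}\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Provide your rating as an integer, on a scale from 0 (poor) to 10 (excellent)."
)


def build_prompt(criterion: dict, question: str, answer: str) -> str:
    """Fill the template for one criterion."""
    return PROMPT_TEMPLATE.format(
        name=criterion["name"],
        description=criterion["description"],
        question=question,
        answer=answer,
    )


def parse_judgement(reply: str) -> tuple[int, str]:
    """Parse a JSON reply and check that the rating is an integer in [0, 10]."""
    data = json.loads(reply)
    rating = data["rating"]
    if isinstance(rating, bool) or not isinstance(rating, int) or not 0 <= rating <= 10:
        raise ValueError(f"invalid rating: {rating!r}")
    return rating, str(data["justification"])


criterion = {"name": "Depth", "description": "Assess the analytical depth of the outline."}
prompt = build_prompt(criterion, "Example question", "Example outline")

# A hand-written reply standing in for the judge model's output.
reply = '{"rating": 6, "justification": "Subpoints are present but generic."}'
rating, justification = parse_judgement(reply)
print(rating, justification)
```

Validating the rating range before aggregation guards against replies that ignore the requested scale.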
B Case Studies
B.1 ReAct Trajectories
Our ReAct framework is implemented through Qwen-Agent (see footnote 1). A complete trajectory of planning follows
the format below:
Case Trajectory of planning
<think> Analyze what content has been got, what other content are needed, and think how to
improve the outline for the query </think>
<tool_call>
{"name": "tool name here", "arguments": {"parameter name here": parameter value here, "another
parameter name here": another parameter value here, ...}}
</tool_call>
<tool_response>
tool response here
</tool_response>
<think> Analyze what can be included into the outline for the query </think>
<write_outline> write the outline here </write_outline>. Must end with </write_outline>.
(more thinking processes, tool calls, tool responses and write here)
<think> Analyze what content has been got, what other content are needed, and think how to
improve the outline for the query </think>
<terminate> the writing process is terminated.
A complete trajectory of writing follows the format below:
1 https://github.com/QwenLM/Qwen-Agent/
Case Trajectory of Writing
<think> thinking which tool is needed here </think>
<tool_call>
{"name": "tool name here", "arguments": {"parameter name here": parameter value here, "another
parameter name here": another parameter value here, ...}}
</tool_call>
<tool_response>
tool response here
</tool_response>
<think> thinking what content can be utilized to answer the query here </think>
<write> write the section or paragraph here </write>
(more thinking processes, tool calls, tool responses and write here)
<terminate> the writing process is terminated.
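A trajectory in either of the formats above can be split into typed steps with a small parser. The following Python sketch is one possible approach, assuming tags do not nest and that `<terminate>` has no closing tag, as in the templates above. The function name `parse_trajectory`, the tool name `retrieve`, and its arguments are hypothetical placeholders, not part of Qwen-Agent or WebWeaver.

```python
import json
import re

# Paired tags used in the trajectory formats above; <terminate> has no closing tag.
STEP_RE = re.compile(
    r"<(think|tool_call|tool_response|write_outline|write)>(.*?)</\1>|<(terminate)>",
    re.DOTALL,
)


def parse_trajectory(text: str) -> list[tuple[str, object]]:
    """Split a trajectory into (tag, content) steps in order of appearance."""
    steps: list[tuple[str, object]] = []
    for match in STEP_RE.finditer(text):
        if match.group(3):  # <terminate> ends the trajectory
            steps.append(("terminate", None))
            break
        tag, body = match.group(1), match.group(2).strip()
        # Tool calls carry a JSON object; other steps are kept as plain text.
        steps.append((tag, json.loads(body) if tag == "tool_call" else body))
    return steps


# Hypothetical writing trajectory; the tool name and arguments are placeholders.
example = """
<think> thinking which tool is needed here </think>
<tool_call>
{"name": "retrieve", "arguments": {"ids": ["id_2", "id_5"]}}
</tool_call>
<tool_response>
evidence text
</tool_response>
<write> first section </write>
<terminate> the writing process is terminated.
"""

steps = parse_trajectory(example)
for tag, content in steps:
    print(tag, content)
```

Parsing the tool call body as JSON at this stage surfaces malformed actions early, before any tool is executed.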
B.2 Case Study of Planning
Case Trajectory of Planning in DeepResearch Bench
Question: Parkinson’s disease has a profound impact on patients. What are the potential health
warning signs associated with different stages of the disease? As family members, which specific
signs should alert us to intervene or seek medical advice regarding the patient’s condition?
Furthermore, for patients who have undergone Deep Brain Stimulation (DBS) surgery, what daily
life adjustments and support strategies can be implemented to improve their comfort and overall
well-being?
Step 1
<think>
This is a comprehensive query about Parkinson’s disease that requires detailed information on
multiple aspects:
1. Health warning signs at different stages of Parkinson’s disease
2. Specific signs that should alert family members to seek medical intervention
3. Daily life adjustments and support strategies for patients who have undergone DBS surgery
I need to gather information systematically about:
- The stages of Parkinson’s disease and their characteristic symptoms
- Early warning signs and progression indicators
- Red flag symptoms that require immediate medical attention
- DBS surgery outcomes and post-operative care
- Daily living strategies and support systems for DBS patients
Let me start by searching for information about Parkinson’s disease stages and warning signs.
</think>
<tool_call>
{"name": "search", "arguments": {"query": ["Parkinson’s disease stages symptoms warning signs",
"early signs Parkinson’s disease progression", "Parkinson’s disease Hoehn and Yahr stages"],
"goal": "Gather comprehensive information about the different stages of Parkinson’s disease and
their associated warning signs and symptoms"}}
</tool_call>
<tool_response>
<material>
<id_1>
Summary: Parkinson’s disease progresses through five defined stages: Stage 1 shows mild,
unilateral tremor with minimal impact on daily life; Stage 2 adds bilateral tremor, rigidity, and
noticeable gait/posture changes, making tasks harder; Stage 3 introduces balance problems,
frequent falls, and slowed movements, requiring assistance with dressing, eating, and bathing;
Stage 4 features severe motor impairment needing walkers or caregiver help, ending independent
living; Stage 5 is the most advanced, with inability to walk, bedridden status, and possible
cognitive disturbances. Early warning signs include a subtle tremor (often starting in a finger),
micro-handwriting, stiffness, and facial masking, which gradually intensify—tremor spreading,
rigidity, shuffling gait, swallowing difficulties, and speech changes—mirroring the stage-by-stage
deterioration.
</id_1>
...
</tool_response>
Step 2
<think>
Great! I’ve gathered comprehensive information about Parkinson’s disease stages and warning
signs. Now I need to search for specific information about:
1. Red flag symptoms that should alert family members to seek immediate medical intervention
2. Deep Brain Stimulation (DBS) surgery and post-operative care
3. Daily life adjustments and support strategies for DBS patients
Let me search for information about emergency warning signs and when family members
should seek immediate medical help.
</think>
<tool_call>
{"name": "search", "arguments": {"query": ["Parkinson’s disease emergency warning signs family
caregivers", "when to call doctor Parkinson’s disease red flags", "Parkinson’s disease complications
requiring immediate medical attention"], "goal": "Find specific warning signs and emergency
situations that should prompt family members to seek immediate medical intervention for
Parkinson’s patients"}}
</tool_call>
<tool_response>
<material>
<id_5>
Summary:
Parkinson’s disease begins with subtle, often unilateral signs such as a barely noticeable tremor,
slight stiffness, slowed movements (bradykinesia), reduced facial expression, and diminished
arm swing—these constitute the early stage. As the condition progresses, symptoms become
bilateral and more pronounced, including persistent tremor, rigid muscles, poor posture, balance
problems, loss of automatic movements, speech and writing changes, and a range of non-motor
issues (depression, anxiety, constipation, sleep disturbances, olfactory loss, fatigue). In later stages,
complications emerge: cognitive decline and dementia, emotional changes, swallowing and
chewing difficulties, severe sleep disorders, autonomic dysfunction (bladder urgency, orthostatic
hypotension), chronic pain, and sexual dysfunction. Overall, the disease follows a trajectory from
mild, often unnoticed early warnings to widespread motor and non-motor impairments that
intensify over time.
</id_5>
...
</tool_response>
Step 3
<think>
Excellent! I now have comprehensive information about warning signs and emergency situations
for family members. Now I need to search for information specifically about Deep Brain
Stimulation (DBS) surgery, post-operative care, and daily life adjustments for DBS patients.
</think>
<tool_call>
{"name": "search", "arguments": {"query": ["Deep Brain Stimulation DBS Parkinson’s post-operative
care", "DBS surgery recovery daily life adjustments Parkinson’s", "living with DBS
Parkinson’s disease support strategies"], "goal": "Gather detailed information about Deep Brain
Stimulation (DBS) surgery outcomes, post-operative care requirements, and daily life adjustments
needed for Parkinson’s patients who have undergone DBS"}}
</tool_call>
<tool_response>
<material>
<id_38>
Summary:
DBS surgery offers significant benefits for Parkinson patients, helping manage movement
disorders and enhance quality of life when combined with proper post-operative care and
follow-up. Patients receive comprehensive support—including recovery and safety guides, FAQs,
and detailed manuals—through Boston Scientific’s DBS system, and can obtain personalized
assistance from a DBS Care Specialist during business hours.
</id_38>
...
</tool_response>
Step 4
<think>
Now I have good information about DBS surgery and post-operative care. Let me search for more
specific information about daily life support strategies, comfort measures, and family caregiver
guidance for DBS patients.
</think>
<tool_call>
{"name": "search", "arguments": {"query": ["DBS patient daily living support family caregivers", "Parkinson
DBS lifestyle modifications comfort strategies", "DBS programming adjustment patient family
guide"], "goal": "Find specific daily life support strategies, comfort measures, and family guidance
for caring for Parkinson’s patients who have undergone DBS surgery"}}
</tool_call>
<tool_response>
<material>
<id_54>
Summary:
Deep-brain stimulation reliably improves motor symptoms in Parkinson’s disease for at least five
years, but the rapid symptom relief creates a "biographical disruption" that reshapes patients’
daily lives and identities. The authors describe a four-stage DBS pathway: pre-surgical decision
making; the operation with required physical support and clinical goal-setting; post-operative
changes in symptoms, behaviour and role dynamics; and finally re-engagement with life and
future outlook. Post-operative care therefore extends beyond the surgical wound to include
structured psychosocial support (counselling, triadic nurse-patient-spouse meetings, and
possibly cognitive-behavioural therapy) to help patients and partners negotiate new role
boundaries, regain autonomy, and manage the "burden of normality" that can follow sudden
functional gains. Daily-life adjustments reported by patients and spouses include shifting from a
combined caregiver-patient identity to a more balanced spousal relationship, reclaiming personal
interests, reducing over-protective behaviours, and actively addressing personal health and social
needs. Pre- and post-DBS counseling, regular multidisciplinary follow-up, and interventions that
foster positive psychology (e.g., character-strength exercises) are recommended to smooth these
transitions and support long-term quality of life.
</id_54>
...
</tool_response>
Step 5
<write_outline>
Parkinson’s Disease: Health Warning Signs, Family Alert Systems, and Post-DBS Care Strategies
I. Understanding Parkinson’s Disease Progression and Warning Signs
A. Disease Staging Systems
1. Hoehn and Yahr Scale Classifications <citation>id_2, id_6, id_9, id_12, id_13, id_14, id_15,
id_17, id_20, id_21</citation>
a. Stage 0: No detectable disease
b. Stage 1.0: Unilateral motor symptoms with minimal functional impact
c. Stage 1.5: Unilateral involvement with axial symptoms
d. Stage 2.0: Bilateral symptoms with preserved balance
e. Stage 2.5: Mild bilateral disease with normal pull-test recovery
f. Stage 3.0: Bilateral disease with postural instability but retained independence
g. Stage 4.0: Severe disability requiring assistance but can still walk/stand
h. Stage 5.0: Wheelchair-bound or bedridden without assistance
2. Movement Disorder Society-Unified Parkinson’s Disease Rating Scale (MDS-UPDRS) <citation>id_3</citation>
a. Four-domain assessment approach
b. Motor and non-motor impact evaluation
c. Replacement for older staging systems
B. Early Stage Warning Signs (Stages 1-2)
1. Motor Symptoms <citation>id_2, id_5, id_7, id_18, id_22</citation>
a. Subtle unilateral tremor (often starting in finger)
b. Micrographia (small handwriting)
c. Bradykinesia (slowed movements)
d. Mild rigidity and stiffness
e. Reduced arm swing
f. Facial masking (reduced expression)
g. Shuffling gait with irregular stride
2. Non-Motor Early Indicators <citation>id_3, id_5, id_7, id_18</citation>
a. Hyposmia (loss of smell)
b. Constipation
c. Sleep disturbances and REM sleep behavior disorder
d. Depression and anxiety
e. Subtle cognitive changes
f. Autonomic dysfunction (blood pressure fluctuations)
g. Unexplained weight loss
C. Progressive Stage Warning Signs (Stages 2-3)
1. Advancing Motor Symptoms <citation>id_5, id_6, id_18, id_22</citation>
a. Bilateral tremor and rigidity
b. Postural deterioration and stooped posture
c. Balance problems and impaired reflexes
d. Speech changes (soft, monotone voice)
e. Swallowing difficulties
f. Freezing episodes
2. Emerging Complications <citation>id_16, id_18, id_22</citation>
a. Medication wearing-off periods
b. Levodopa-induced dyskinesia
c. Increased fall risk
d. Emotional and mood changes
D. Advanced Stage Warning Signs (Stages 4-5)
1. Severe Motor Impairment <citation>id_5, id_6, id_9, id_22</citation>
a. Major mobility loss requiring assistive devices
b. Severe postural deformities
c. Inability to live independently
d. Wheelchair or bed confinement
2. Complex Non-Motor Complications <citation>id_5, id_16, id_19, id_22</citation>
a. Cognitive decline and dementia
b. Hallucinations and delusions
c. Severe autonomic dysfunction
d. Chronic pain and sexual dysfunction
II. Critical Warning Signs Requiring Immediate Medical Intervention
A. Emergency Situations for Family Members
1. Respiratory and Swallowing Emergencies <citation>id_25, id_26, id_29, id_31, id_32, id_35,
id_36</citation>
a. Sudden choking or difficulty swallowing
b. Aspiration risk with coughing while eating
c. Severe voice changes or stridor
d. Respiratory difficulty or shortness of breath
2. Neurological Crisis Signs <citation>id_24, id_26, id_28, id_32, id_33, id_35</citation>
a. Sudden confusion, delirium, or marked cognitive decline
b. New or worsening hallucinations, delusions, or paranoid thoughts
c. Abrupt neurological deficits (weakness, speech changes)
d. Loss of consciousness or seizures
3. Medication-Related Emergencies <citation>id_24, id_28, id_33</citation>
a. Abrupt stopping or alteration of medications
b. Exposure to dopamine-blocking drugs
c. Missed or delayed essential medication doses
d. Signs of serotonin syndrome
B. Cardiovascular and Autonomic Emergencies
1. Circulatory Complications <citation>id_26, id_28, id_31, id_32, id_34, id_36</citation>
a. Severe orthostatic hypotension causing fainting
b. Chest pain or cardiac symptoms
c. Signs of blood clots (leg swelling, pain, redness)
d. Marked dehydration
2. Infection and Systemic Issues <citation>id_24, id_26, id_32, id_35</citation>
a. Sudden fever with respiratory or urinary symptoms
b. Signs of urinary tract infection
c. Post-surgical complications
d. Severe constipation or bowel impaction
C. Psychiatric and Behavioral Emergencies
1. Mental Health Crises <citation>id_28, id_32, id_33, id_36</citation>
a. Severe depression or suicidal thoughts
b. Violent behaviors during REM sleep disorder
c. Severe impulse-control behaviors
d. Acute psychosis with agitation or aggression
2. Device-Related Emergencies <citation>id_28, id_34</citation>
a. Sudden malfunction of implanted devices
b. DBS system problems (infection, battery failure)
c. Unexpected sleep attacks during critical activities
III. Deep Brain Stimulation (DBS) Surgery: Comprehensive Care Framework
A. Pre-Operative Considerations and Evaluation
1. Candidate Selection Criteria <citation>id_45, id_56</citation>
a. Minimum four years of Parkinson’s disease
b. Continued medication benefit with motor complications
c. Absence of dementia or severe cognitive impairment
d. Realistic expectations about outcomes
2. Pre-Surgical Assessment Process <citation>id_39, id_45, id_56</citation>
a. Multidisciplinary team evaluation
b. Neurological and neurosurgical consultation
c. Cognitive testing and brain imaging
d. Medication review and optimization
B. Surgical Procedure and Immediate Post-Operative Care
1. Surgical Process <citation>id_45, id_49, id_51, id_56</citation>
a. Electrode implantation in target brain regions
b. Pulse generator placement under collarbone
c. Brief hospital stay (1-2 days)
d. Low mortality rate (<0.5%) and modest complication rates (4-7%)
2. Initial Recovery Phase <citation>id_41, id_47, id_49, id_51, id_52, id_65</citation>
a. Expected post-operative signs (bruising, swelling, tenderness)
b. Temporary "honeymoon" or microlesion effect
c. Activity restrictions (4-6 weeks)
d. Wound care and infection prevention
C. Device Programming and Optimization
1. Programming Timeline <citation>id_41, id_45, id_46, id_47, id_51, id_56</citation>
a. Initial activation 2-4 weeks post-surgery
b. Optimization period of 4-6 months
c. Multiple programming visits required
d. Ongoing adjustments every 6 months
2. Expected Outcomes and Adjustments <citation>id_46, id_51, id_53, id_56</citation>
a. Significant motor symptom improvement ( = 32%)
b. Medication reduction (up to 48%)
c. Functional independence gains
d. Possible side effects requiring management
IV. Daily Life Adjustments and Support Strategies for DBS Patients
A. Physical Care and Safety Measures
1. Activity Modifications <citation>id_47, id_51, id_52, id_65, id_67</citation>
a. Gradual resumption of normal activities
b. Continued use of mobility aids to prevent falls
c. Avoidance of high-risk activities (deep water diving, extreme heat)
d. Regular exercise program (minimum 2.5 hours weekly)
2. Device Safety and Maintenance <citation>id_51, id_52, id_69</citation>
a. Protection from electromagnetic interference
b. Carrying handheld controller at all times
c. Informing healthcare providers about implanted device
d. Regular battery monitoring and replacement
B. Psychosocial Support and Adaptation
1. Identity and Role Adjustment <citation>id_54, id_66, id_68</citation>
a. Managing "biographical disruption" from rapid symptom relief
b. Negotiating new caregiver-patient role boundaries
c. Addressing "burden of normality" phenomenon
d. Reclaiming personal interests and autonomy
2. Family Relationship Dynamics <citation>id_54, id_58, id_66, id_68</citation>
a. Shifting from caregiver-patient to balanced spousal relationships
b. Reducing over-protective behaviors
c. Open communication about changing abilities
d. Professional counseling and cognitive-behavioral therapy
C. Comprehensive Support Systems
1. Healthcare Team Coordination <citation>id_39, id_46, id_61, id_72</citation>
a. Multidisciplinary follow-up care
b. Specialized Parkinson’s nurses
c. Home-health nursing services
d. Telemedicine and remote programming options
2. Family and Caregiver Support <citation>id_58, id_61, id_63, id_66, id_68</citation>
a. Pre-operative education and realistic expectation setting
b. Structured psychosocial interventions
c. Self-help groups and peer support networks
d. Caregiver self-care and stress management
D. Practical Daily Living Strategies
1. Routine and Symptom Management <citation>id_63, id_65, id_67</citation>
a. Structured daily routines and symptom diaries
b. Medication schedule adherence
c. Regular physical activity (walking, tai chi, dancing)
d. Comfort measures (leg massages, voice training)
2. Nutritional and Lifestyle Support <citation>id_67</citation>
a. Mediterranean diet implementation
b. Adequate rest and sleep hygiene
c. Intellectual activities and cognitive stimulation
d. Consultation with specialized healthcare providers
E. Long-term Monitoring and Adjustment
1. Ongoing Assessment Needs <citation>id_47, id_52, id_69</citation>
a. Regular follow-up visits (monthly for first six months)
b. Symptom tracking and device optimization
c. Monitoring for complications or side effects
d. Battery replacement planning (every 3-5 years)
2. Quality of Life Optimization <citation>id_51, id_54, id_61</citation>
a. Patient satisfaction monitoring (95% would recommend DBS)
b. Positive psychology interventions
c. Character-strength exercises
d. Long-term relationship and social support maintenance
</write_outline>
Step 6
<think>
I have now created a comprehensive outline covering all aspects of the user’s query. The outline
includes:
1. Detailed information about Parkinson’s disease stages and warning signs at each stage
2. Specific emergency warning signs that should alert family members to seek immediate medical
intervention
3. Comprehensive information about DBS surgery, including pre-operative considerations,
surgical process, and post-operative care
4. Detailed daily life adjustments and support strategies for DBS patients
The outline is well-structured with proper hierarchical organization and includes citations from
the research materials. However, I should search for any additional information that might be
missing, particularly around specific comfort measures and practical tips for families caring for
DBS patients.
</think>
<tool_call>
{"name": "search", "arguments": {"query": ["Parkinson DBS patient comfort measures home care
tips", "DBS device troubleshooting family guide", "Parkinson DBS medication management post
surgery"], "goal": "Find additional specific comfort measures, troubleshooting guidance, and
medication management strategies for families caring for DBS patients"}}
</tool_call>
<tool_response>
<id_65>
Summary:
The guide provides comprehensive daily-life support for Parkinson’s patients after DBS: keep
incisions clean, dry, and allow them to heal without ointments; shower the day after surgery
but avoid baths/swimming until the wound check. Resume Parkinson’s meds immediately
and discuss timing for blood-thinner restart. Limit strenuous activity for 2-4 weeks and
incorporate gentle neck stretches to prevent bow-stringing. Expect DBS activation about 4 weeks
post-stage 1 and be patient with programming adjustments. Anticipate possible temporary
confusion—maintain a consistent routine, strict medication schedule, regular walking, and good
sleep (melatonin if needed); arrange caregiver assistance especially if living alone. Monitor for
fever, worsening incision signs, facial swelling, new neurological deficits, or worsening confusion,
and contact the surgeon promptly. These strategies together offer practical comfort measures and
clear family guidance for post-DBS care.
</id_65>
...
</tool_response>
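The `<tool_call>` payloads in this trajectory are JSON objects carrying a `name` and an `arguments` object. A minimal sketch of how such a payload might be extracted and checked, assuming only the tags and field names visible above; the function name `parse_tool_call` is hypothetical and is not taken from the WebWeaver code:

```python
import json
import re

# Matches the text between <tool_call> tags, as seen in the trajectory above.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_tool_call(step_text):
    """Return (name, arguments) for the first tool call in step_text, or None."""
    match = TOOL_CALL_RE.search(step_text)
    if match is None:
        return None
    # strict=False tolerates raw line breaks inside JSON strings,
    # which appear when a long query is wrapped across lines.
    payload = json.loads(match.group(1).strip(), strict=False)
    if not isinstance(payload, dict):
        raise ValueError("tool call must be a JSON object")
    if not isinstance(payload.get("arguments"), dict):
        raise ValueError("'arguments' must be a JSON object")
    return payload["name"], payload["arguments"]


step = (
    "<tool_call>\n"
    '{"name": "search", "arguments": {"query": ["DBS device troubleshooting '
    'family guide"], "goal": "Find troubleshooting guidance"}}\n'
    "</tool_call>"
)
name, arguments = parse_tool_call(step)
```

Rejecting a payload whose `arguments` field is not an object catches malformed calls before they reach a tool.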
Step 7
<write_outline>
Parkinson’s Disease: Health Warning Signs, Family Alert Systems, and Post-DBS Care Strategies
I. Understanding Parkinson’s Disease Progression and Warning Signs
A. Disease Staging Systems
1. Hoehn and Yahr Scale Classifications <citation>id_2, id_6, id_9, id_12, id_13, id_14, id_15,
id_17, id_20, id_21</citation>
a. Stage 0: No detectable disease
b. Stage 1.0: Unilateral motor symptoms with minimal functional impact
c. Stage 1.5: Unilateral involvement with axial symptoms
d. Stage 2.0: Bilateral symptoms with preserved balance
e. Stage 2.5: Mild bilateral disease with normal pull-test recovery
f. Stage 3.0: Bilateral disease with postural instability but retained independence
g. Stage 4.0: Severe disability requiring assistance but can still walk/stand
h. Stage 5.0: Wheelchair-bound or bedridden without assistance
2. Movement Disorder Society-Unified Parkinson’s Disease Rating Scale (MDS-UPDRS) <citation>id_3</citation>
a. Four-domain assessment approach
b. Motor and non-motor impact evaluation
c. Replacement for older staging systems
3. Disease Progression Timeline and Transition Indicators <citation>id_16, id_17, id_19</citation>
a. Median transition times between stages (20-26 months)
b. HY > 3 as critical threshold for advanced disease
c. Shift from dopaminergic to cortical pathology involvement
B. Early Stage Warning Signs (Stages 1-2)
1. Motor Symptoms <citation>id_2, id_5, id_7, id_18, id_22</citation>
a. Subtle unilateral tremor (often starting in finger)
b. Micrographia (small handwriting)
c. Bradykinesia (slowed movements)
d. Mild rigidity and stiffness
e. Reduced arm swing
f. Facial masking (reduced expression)
g. Shuffling gait with irregular stride
2. Non-Motor Early Indicators <citation>id_3, id_5, id_7, id_18</citation>
a. Hyposmia (loss of smell)
b. Constipation
c. Sleep disturbances and REM sleep behavior disorder
d. Depression and anxiety
e. Subtle cognitive changes
f. Autonomic dysfunction (blood pressure fluctuations)
g. Unexplained weight loss
3. Diagnostic Challenges and Recognition <citation>id_4</citation>
a. Non-motor signs can appear decades before motor symptoms
b. Overlap with other disorders complicates diagnosis
c. Need for asymmetric symptoms or medication response for confirmation
C. Progressive Stage Warning Signs (Stages 2-3)
1. Advancing Motor Symptoms <citation>id_5, id_6, id_18, id_22</citation>
a. Bilateral tremor and rigidity
b. Postural deterioration and stooped posture
c. Balance problems and impaired reflexes
d. Speech changes (soft, monotone voice)
e. Swallowing difficulties
f. Freezing episodes
2. Emerging Complications <citation>id_16, id_18, id_22</citation>
a. Medication wearing-off periods
b. Levodopa-induced dyskinesia
c. Increased fall risk
d. Emotional and mood changes
3. Functional Impact Assessment <citation>id_6, id_20, id_22</citation>
a. Loss of independence in daily activities
b. Need for assistance with dressing, eating, bathing
c. Occupational and social limitations
D. Advanced Stage Warning Signs (Stages 4-5)
1. Severe Motor Impairment <citation>id_5, id_6, id_9, id_22</citation>
a. Major mobility loss requiring assistive devices
b. Severe postural deformities
c. Inability to live independently
d. Wheelchair or bed confinement
2. Complex Non-Motor Complications <citation>id_5, id_16, id_19, id_22</citation>
a. Cognitive decline and dementia
b. Hallucinations and delusions
c. Severe autonomic dysfunction
d. Chronic pain and sexual dysfunction
3. End-Stage Considerations <citation>id_6, id_9, id_20</citation>
a. Complete dependency for daily activities
b. Risk of aspiration and respiratory complications
c. Need for constant supervision and care
II. Critical Warning Signs Requiring Immediate Medical Intervention
A. Emergency Situations for Family Members
1. Respiratory and Swallowing Emergencies <citation>id_25, id_26, id_29, id_31, id_32, id_35,
id_36</citation>
a. Sudden choking or difficulty swallowing
b. Aspiration risk with coughing while eating
c. Severe voice changes or stridor
d. Respiratory difficulty or shortness of breath
2. Neurological Crisis Signs <citation>id_24, id_26, id_28, id_32, id_33, id_35</citation>
a. Sudden confusion, delirium, or marked cognitive decline
b. New or worsening hallucinations, delusions, or paranoid thoughts
c. Abrupt neurological deficits (weakness, speech changes)
d. Loss of consciousness or seizures
3. Medication-Related Emergencies <citation>id_24, id_28, id_33</citation>
a. Abrupt stopping or alteration of medications
b. Exposure to dopamine-blocking drugs
c. Missed or delayed essential medication doses
d. Signs of serotonin syndrome
4. Motor System Crisis <citation>id_25, id_28, id_31, id_34</citation>
a. Sudden severe rigidity or immobility
b. Frequent falls or freezing episodes
c. Rapid loss of walking ability
d. Severe dyskinesia with complications
B. Cardiovascular and Autonomic Emergencies
1. Circulatory Complications <citation>id_26, id_28, id_31, id_32, id_34, id_36</citation>
a. Severe orthostatic hypotension causing fainting
b. Chest pain or cardiac symptoms
c. Signs of blood clots (leg swelling, pain, redness)
d. Marked dehydration
2. Infection and Systemic Issues <citation>id_24, id_26, id_32, id_35</citation>
a. Sudden fever with respiratory or urinary symptoms
b. Signs of urinary tract infection
c. Post-surgical complications
d. Severe constipation or bowel impaction
C. Psychiatric and Behavioral Emergencies
1. Mental Health Crises <citation>id_28, id_32, id_33, id_36</citation>
a. Severe depression or suicidal thoughts
b. Violent behaviors during REM sleep disorder
c. Severe impulse-control behaviors
d. Acute psychosis with agitation or aggression
2. Device-Related Emergencies <citation>id_28, id_34</citation>
a. Sudden malfunction of implanted devices
b. DBS system problems (infection, battery failure)
c. Unexpected sleep attacks during critical activities
D. Hospitalization and Perioperative Risks <citation>id_32, id_35</citation>
1. Hospital-Related Complications
a. Medication timing disruptions
b. Delirium and confusion in unfamiliar environments
c. Increased fall risk and injury potential
d. Aspiration pneumonia during procedures
2. Post-Surgical Warning Signs
a. Rapid symptom deterioration
b. Signs of infection or wound complications
c. Unexpected neuropsychiatric changes
III. Deep Brain Stimulation (DBS) Surgery: Comprehensive Care Framework
A. Pre-Operative Considerations and Evaluation
1. Candidate Selection Criteria <citation>id_45, id_56</citation>
a. Minimum four years of Parkinson’s disease
b. Continued medication benefit with motor complications
c. Absence of dementia or severe cognitive impairment
d. Realistic expectations about outcomes
2. Pre-Surgical Assessment Process <citation>id_39, id_45, id_56</citation>
a. Multidisciplinary team evaluation
b. Neurological and neurosurgical consultation
c. Cognitive testing and brain imaging
d. Medication review and optimization
3. Target Selection Considerations <citation>id_53</citation>
a. Subthalamic nucleus (STN) for greater medication reduction
b. Globus pallidus interna (GPi) for language/cognition preservation
c. Individual patient factors influencing choice
B. Surgical Procedure and Immediate Post-Operative Care
1. Surgical Process <citation>id_45, id_49, id_51, id_56</citation>
a. Electrode implantation in target brain regions
b. Pulse generator placement under collarbone
c. Brief hospital stay (1-2 days)
d. Low mortality rate (<0.5%) and modest complication rates (4-7%)
2. Initial Recovery Phase <citation>id_41, id_47, id_49, id_51, id_52, id_65</citation>
a. Expected post-operative signs (bruising, swelling, tenderness)
b. Temporary "honeymoon" or microlesion effect
c. Activity restrictions (4-6 weeks)
d. Wound care and infection prevention
3. Early Post-Operative Monitoring <citation>id_65, id_88</citation>
a. Incision care and healing assessment
b. Medication resumption protocols
c. Activity limitation guidelines
d. Signs requiring immediate medical attention
C. Device Programming and Optimization
1. Programming Timeline <citation>id_41, id_45, id_46, id_47, id_51, id_56</citation>
a. Initial activation 2-4 weeks post-surgery
b. Optimization period of 4-6 months
c. Multiple programming visits required
d. Ongoing adjustments every 6 months
2. Expected Outcomes and Adjustments <citation>id_46, id_51, id_53, id_56</citation>
a. Significant motor symptom improvement ( = 32%)
b. Medication reduction (up to 48%)
c. Functional independence gains
d. Possible side effects requiring management
3. Programming Process and Patient Education <citation>id_80, id_86, id_90</citation>
a. Patient-controlled adjustments with handheld programmer
b. Battery monitoring and replacement scheduling
c. Understanding normal sensations vs. concerning symptoms
d. Communication protocols with programming team
IV. Daily Life Adjustments and Support Strategies for DBS Patients
A. Physical Care and Safety Measures
1. Activity Modifications <citation>id_47, id_51, id_52, id_65, id_67</citation>
a. Gradual resumption of normal activities
b. Continued use of mobility aids to prevent falls
c. Avoidance of high-risk activities (deep water diving, extreme heat)
d. Regular exercise program (minimum 2.5 hours weekly)
2. Device Safety and Maintenance <citation>id_51, id_52, id_69, id_77, id_88</citation>
a. Protection from electromagnetic interference
b. Carrying handheld controller at all times
c. Informing healthcare providers about implanted device
d. Regular battery monitoring and replacement
3. Environmental Safety Considerations <citation>id_88, id_90</citation>
a. Contraindicated therapies and equipment
b. Safe vs. unsafe daily activities
c. Travel and security considerations
d. Workplace and recreational restrictions
B. Medication Management Post-DBS
1. Immediate Post-Operative Medication Protocol <citation>id_84, id_88, id_93</citation>
a. Continuation of pre-surgical regimen initially
b. Gradual, systematic reduction approach
c. Monitoring for withdrawal symptoms
d. Coordination with stimulation programming
2. Long-Term Medication Optimization <citation>id_83, id_84, id_85, id_93</citation>
a. Levodopa dosing adjustments
b. Dopamine agonist tapering strategies
c. Management of persistent symptoms
d. Monitoring for mood and cognitive changes
3. Troubleshooting Medication Issues <citation>id_81, id_86</citation>
a. Home-health nurse medication reviews
b. Emergency protocols for medication disruption
c. Balancing stimulation with pharmaceutical needs
d. Communication with neurology team
C. Home-Based Care and Comfort Measures
1. Daily Routine Optimization <citation>id_63, id_65, id_67, id_72</citation>
a. Structured daily routines and symptom diaries
b. Sleep hygiene and rest protocols
c. Regular physical activity programs
d. Comfort interventions (massage, voice training)
2. Home Health Services <citation>id_72, id_81</citation>
a. Trained home-health nurse visits
b. Remote programming capabilities
c. Vital sign monitoring and medication adjustments
d. Reduced travel burden for families
3. Device Troubleshooting at Home <citation>id_77, id_78, id_86, id_90</citation>
a. Basic device operation and monitoring
b. Recognition of malfunction signs
c. Emergency contact protocols
d. Battery status management
D. Psychosocial Support and Adaptation
1. Identity and Role Adjustment <citation>id_54, id_66, id_68</citation>
a. Managing "biographical disruption" from rapid symptom relief
b. Negotiating new caregiver-patient role boundaries
c. Addressing "burden of normality" phenomenon
d. Reclaiming personal interests and autonomy
2. Family Relationship Dynamics <citation>id_54, id_58, id_66, id_68</citation>
a. Shifting from caregiver-patient to balanced spousal relationships
b. Reducing over-protective behaviors
c. Open communication about changing abilities
d. Professional counseling and cognitive-behavioral therapy
3. Caregiver Support and Education <citation>id_61, id_66, id_68</citation>
a. Pre-operative education and expectation setting
b. Structured psychosocial interventions
c. Self-help groups and peer support networks
d. Caregiver self-care and stress management
E. Comprehensive Support Systems
1. Healthcare Team Coordination <citation>id_39, id_46, id_61, id_72</citation>
a. Multidisciplinary follow-up care
b. Specialized Parkinson’s nurses
c. Home-health nursing services
d. Telemedicine and remote programming options
2. Long-Term Quality of Life Optimization <citation>id_51, id_54, id_61</citation>
a. Patient satisfaction monitoring (95% would recommend DBS)
b. Positive psychology interventions
c. Character-strength exercises
d. Long-term relationship and social support maintenance
3. Emergency Preparedness and End-of-Life Considerations <citation>id_78</citation>
a. Off-trial protocols for device assessment
b. Palliative care coordination
c. Hardware monitoring in advanced stages
d. Family decision-making support
F. Practical Implementation Strategies
1. Daily Living Skills <citation>id_60, id_63, id_67</citation>
a. Gradual re-engagement in routine tasks
b. Nutritional support (Mediterranean diet)
c. Intellectual and social activities
d. Family encouragement and participation
2. Long-term Monitoring and Adjustment <citation>id_47, id_52, id_69</citation>
a. Regular follow-up visits (monthly for first six months)
b. Symptom tracking and device optimization
c. Monitoring for complications or side effects
d. Battery replacement planning (every 3-5 years)
3. Resource Utilization and Support Networks <citation>id_55, id_58, id_76, id_79</citation>
a. Educational materials and patient guides
b. Webinars and professional consultations
c. Support groups and peer networks
d. Specialized healthcare provider access
</write_outline>
Step 8
<think>
I have now created a comprehensive, detailed outline that addresses all aspects of the user’s
query. The outline includes:
1. Detailed Parkinson’s disease staging and progression with comprehensive warning signs
2. Specific emergency situations requiring immediate medical intervention
3. Complete DBS surgery information including pre-operative, operative, and post-operative care
4. Extensive daily life adjustments and support strategies for DBS patients
5. Proper citations for all subsections
The outline is well-structured with four levels of hierarchy and covers all essential aspects
requested. It provides practical, actionable information for families caring for Parkinson’s
patients, particularly those who have undergone DBS surgery. The outline is comprehensive,
logically organized, and includes insights into the reasoning behind various care strategies.
I believe this outline now fully addresses the user’s comprehensive query about Parkinson’s
disease warning signs, family alert systems, and post-DBS care strategies.
</think>
<terminate> ✓
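Each outline heading in the trajectory above carries a `<citation>` tag listing the evidence ids that support it. A minimal sketch of how those links might be read back out of the outline text, assuming only the tag format shown above; the helper name `outline_citations` is hypothetical, and this is an illustration rather than the authors' implementation:

```python
import re

# A citation tag may wrap across lines, so "." must also match newlines.
CITATION_RE = re.compile(r"<citation>(.*?)</citation>", re.DOTALL)


def outline_citations(outline):
    """Return a list of (heading, [evidence ids]) pairs in document order."""
    pairs = []
    for match in CITATION_RE.finditer(outline):
        # The heading is the text on the same line, before the tag.
        line_start = outline.rfind("\n", 0, match.start()) + 1
        heading = outline[line_start:match.start()].strip()
        ids = [part.strip() for part in match.group(1).split(",") if part.strip()]
        pairs.append((heading, ids))
    return pairs


sample = (
    "1. Hoehn and Yahr Scale Classifications <citation>id_2, id_6,\n"
    "id_9</citation>\n"
    "a. Stage 0: No detectable disease\n"
    "2. Emerging Complications <citation>id_16, id_18, id_22</citation>\n"
)
pairs = outline_citations(sample)
```

A list of pairs is used instead of a dictionary because numbered headings such as "1. Motor Symptoms" are not guaranteed to be unique across sections.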
B.3 Case Study of Writing
We present only the first 7 steps of the long writing trajectory to save space. The subsequent
reasoning process is similar to that in the first 7 steps.
Case Trajectory of Writing in DeepResearch Bench
Question: Parkinson’s disease has a profound impact on patients. What are the potential health
warning signs associated with different stages of the disease? As family members, which specific
signs should alert us to intervene or seek medical advice regarding the patient’s condition?
Furthermore, for patients who have undergone Deep Brain Stimulation (DBS) surgery, what daily
life adjustments and support strategies can be implemented to improve their comfort and overall
well-being?
Step 1
<think>
I need to write a comprehensive article about Parkinson’s disease warning signs, emergency
situations for family members, and post-DBS care strategies. Looking at the outline, I should start
by retrieving information about disease staging systems and warning signs. Let me begin with
the first section on understanding Parkinson’s disease progression and warning signs.
</think>
<tool_call>
{"name": "retrieve", "arguments": {"url_id": ["id_2", "id_6", "id_9", "id_12", "id_13", "id_14", "id_15",
"id_17", "id_20", "id_21"], "goal": "Gather comprehensive information about Hoehn and Yahr scale
classifications and disease staging systems for Parkinson’s disease"}}
</tool_call>
<tool_response>
The page content for the previous section has been masked for saving the space.
</tool_response>
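The `retrieve` call above asks for evidence by id instead of issuing a new web search. A minimal sketch of that lookup, assuming the memory bank can be modelled as a mapping from evidence id to stored summary; the dictionary layout and the function signature are assumptions made for illustration, not the framework's actual interface:

```python
def retrieve(memory_bank, url_ids):
    """Look up evidence by id.

    Returns (found, missing): a dict of id -> evidence for ids present in
    the bank, and a list of requested ids that the bank does not contain.
    """
    found = {uid: memory_bank[uid] for uid in url_ids if uid in memory_bank}
    missing = [uid for uid in url_ids if uid not in memory_bank]
    return found, missing


# Hypothetical bank contents; real entries would hold page summaries.
memory_bank = {
    "id_2": "Summary of a staging overview page",
    "id_6": "Summary of a functional-impact page",
}
found, missing = retrieve(memory_bank, ["id_2", "id_6", "id_9"])
```

Reporting missing ids separately lets the caller notice when an outline cites evidence that was never stored.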
Step 2
<think>
I have comprehensive information about the Hoehn and Yahr scale classifications and disease staging
systems. This provides a solid foundation for understanding Parkinson’s disease progression.
I can see the clear progression from Stage 0 (no disease) through Stage 5 (wheelchair/bedridden),
with intermediate stages 1.5 and 2.5 in the modified version. The information shows both motor
and non-motor symptoms, timing of transitions between stages, and functional impacts. I should
now write the takeaway section and begin the first major section on disease staging systems.
</think>
<write>
<takeaway>
Parkinson’s disease progresses through distinct stages with recognizable warning signs that
families must understand to provide appropriate care and seek timely medical intervention. The
disease advances from subtle unilateral symptoms in early stages to severe bilateral impairment
requiring constant care in advanced stages. Critical emergency signs include sudden choking,
severe falls, medication disruptions, neurological changes, and psychiatric crises that demand
immediate medical attention. For patients who have undergone Deep Brain Stimulation
(DBS) surgery, comprehensive daily life adjustments encompass structured post-operative care,
medication management, device safety protocols, psychosocial support, and coordinated family
involvement to optimize comfort and long-term outcomes. Understanding these progressive
warning signs and implementing evidence-based support strategies enables families to navigate
the complex challenges of Parkinson’s disease while maintaining patient dignity and quality of
life.
</takeaway>
Parkinson’s Disease: Health Warning Signs, Family Alert Systems, and Post-DBS Care Strategies
I. Understanding Parkinson’s Disease Progression and Warning Signs
A. Disease Staging Systems
Understanding Parkinson’s disease progression requires familiarity with established staging
systems that provide frameworks for recognizing symptom evolution and functional decline.
These systems serve as critical tools for families to understand what to expect and when to seek
medical intervention.
1. Hoehn and Yahr Scale Classifications
The Hoehn and Yahr scale, originally published in 1967, remains the most widely used system for
staging Parkinson’s disease progression. <cite id="id_14,id_17">This scale distinguishes between
five stages of Parkinson’s Disease, from unilateral impairment (Stage 1) to bilateral impairment,
without postural control difficulties (Stage 2), to postural instability (Stage 3), to loss of physical
independence (Stage 4) and finally, to wheelchair-bound or bed-bound (Stage 5)</cite>. The
modified version includes intermediate stages that capture more nuanced progression patterns.
<table>
<caption>Hoehn and Yahr Scale Progression and Warning Signs</caption>
<thead>
<tr>
<th>Stage</th>
<th>Primary Characteristics</th>
<th>Functional Impact</th>
<th>Key Warning Signs for Families</th>
</tr>
</thead>
<tbody>
<tr>
<td>Stage 0</td>
<td><cite id="id_12,id_21">No signs of disease</cite></td>
<td>Normal function</td>
<td>Baseline monitoring period</td>
</tr>
<tr>
<td>Stage 1.0</td>
<td><cite id="id_15,id_20">Unilateral tremor, rigidity, reduced arm swing, and slowness present
only on one side</cite></td>
<td><cite id="id_2,id_6">Minimal interference with daily activities</cite></td>
<td>Subtle tremor in one finger, asymmetric arm swing</td>
</tr>
<tr>
<td>Stage 1.5</td>
<td><cite id="id_15,id_21">Unilateral involvement with axial symptoms like reduced facial
expression, stooped posture, or reduced arm swing on both sides</cite></td>
<td>Slight postural changes</td>
<td>Facial masking, posture changes visible to family</td>
</tr>
<tr>
<td>Stage 2.0</td>
<td><cite id="id_13,id_17">Bilateral involvement without impairment of balance</cite></td>
<td><cite id="id_2,id_6">Daily activities become more difficult but independent living realistic</cite></td>
<td>Tremor and stiffness on both sides, speech difficulties</td>
</tr>
<tr>
<td>Stage 2.5</td>
<td><cite id="id_15,id_21">Mild bilateral disease with recovery on pull test (taking ≥ 3 steps to
recover balance)</cite></td>
<td>Mild balance impairment</td>
<td>Slight unsteadiness, requires multiple steps to recover balance</td>
</tr>
<tr>
<td>Stage 3.0</td>
<td><cite id="id_13,id_20">Mild to moderate bilateral disease with impaired postural reflexes;
physically independent</cite></td>
<td><cite id="id_2,id_6">Falls become frequent, assistance needed for some activities</cite></td>
<td>Loss of balance when pulled backward, increased fall risk</td>
</tr>
<tr>
<td>Stage 4.0</td>
<td><cite id="id_13,id_20">Severe disability; still able to walk or stand unassisted but markedly
incapacitated</cite></td>
<td><cite id="id_2,id_6">Cannot live independently, requires walker or assistance</cite></td>
<td>Need for assistive devices, inability to live alone safely</td>
</tr>
<tr>
<td>Stage 5.0</td>
<td><cite id="id_13,id_21">Wheelchair bound or bedridden unless aided</cite></td>
<td><cite id="id_2,id_6">Around-the-clock nursing care required, up to 50% experience hallucinations and
delusions</cite></td>
<td>Inability to stand or walk, cognitive disturbances, complete dependency</td>
</tr>
</tbody>
</table>
This staging system reveals several critical insights for families. **The transition from Stage 2 to
Stage 3 marks a pivotal point** where balance problems emerge, fundamentally changing the
patient’s safety profile and care needs. <cite id="id_20">Loss of balance, with the inability to
make the rapid, automatic and involuntary movements necessary to protect against falling, is
one of the most troubling and dangerous aspects of Parkinsonism and one of the least easily
treated</cite>. Understanding this progression helps families anticipate when environmental
modifications and increased supervision become essential.
**The distinction between Stage 3 and Stage 4 centers on independence**. <cite id="id_20">The
most important factor identifying Stage III (as opposed to stage IV) is that the patient is still fully
independent in all activities of daily living (dressing, hygiene, eating, etc.), while Stage IV patients
are unable to lead an independent life because of the need for help with some activities of daily
living</cite>. This transition signals when families must consider more intensive caregiving
arrangements or assisted living options.
</write>
Step 3
<think>
Now I need to continue with information about the MDS-UPDRS system and disease progression
timeline. Let me retrieve more specific information about these aspects.
</think>
<tool_call>
{"name": "search", "arguments": {"url_id": ["id_3", "id_16", "id_17", "id_19"], "goal": "Gather
information about MDS-UPDRS assessment system, disease progression timeline, and transition
indicators between stages"}}
</tool_call>
<tool_response>
The page content for the previous section has been masked for saving the space.
</tool_response>
Step 4
<think>
The tool response provides valuable information about the MDS-UPDRS system, disease
progression timeline, and the critical transition point at Stage 3. I can see that:
1. MDS-UPDRS has replaced the older Hoehn-Yahr system as the main assessment tool
2. The progression timeline shows median transition times between stages (20-26 months
typically)
3. Stage 3 (HY > 3) represents a critical threshold marking advanced disease with increased
disability, medication-refractory features, and cognitive decline
4. There’s important information about non-motor symptoms appearing years before motor
symptoms
5. Specific progression warning signs are detailed
I should continue writing about the MDS-UPDRS system and progression timeline, then move on
to early stage warning signs.
</think>
<tool_call>
{"name": "search", "arguments": ["DBS patient daily living support family caregivers", "Parkinson
DBS lifestyle modifications comfort strategies", "DBS programming adjustment patient family
guide"], "goal": "Find specific daily life support strategies, comfort measures, and family guidance
for caring for Parkinson’s patients who have undergone DBS surgery"}}
</tool_call>
<write>
2. Movement Disorder Society-Unified Parkinson’s Disease Rating Scale (MDS-UPDRS)
While the Hoehn and Yahr scale remains valuable for understanding disease progression,
<cite id="id_3">the Movement Disorder Society-Unified Parkinson’s Disease Rating Scale
(MDS-UPDRS) is healthcare providers’ main tool to classify this disease. The MDS-UPDRS
examines four different areas of how Parkinson’s disease affects you</cite>. This comprehensive
assessment tool provides a more nuanced evaluation of both motor and non-motor symptoms,
addressing the limitations of purely stage-based approaches.
**The significance of this shift toward individualized assessment cannot be overstated**. Rather
than simply categorizing patients into stages, the MDS-UPDRS recognizes that Parkinson’s
disease affects each person differently, requiring personalized treatment approaches. This
evolution in assessment methodology reflects our growing understanding that the disease’s
impact extends far beyond motor symptoms.
3. Disease Progression Timeline and Transition Indicators
Understanding the temporal aspects of Parkinson’s progression helps families prepare for
changing care needs. <cite id="id_17">Median transition times between stages show Stage
2 progression occurring around 20 months, with subsequent stages progressing every 24-26
months</cite>. However, <cite id="id_16">for some people, it can take as long as 20 years
to move through all the stages</cite>, emphasizing the highly variable nature of disease
progression.
**The transition to Stage 3 represents a critical threshold** in Parkinson’s disease progression.
<cite id="id_19">Transitions from scores of ≤ 2.5 to scores of ≥ 3 on the modified HY scale
mark a worsening in quality of life and an increase in the risk of medication-refractory disease
features such as dementia and mortality</cite>. This transition point signals when families must
significantly intensify their care planning and support strategies.
The underlying pathophysiology also changes at this critical juncture. <cite id="id_19">Although
striatal dopaminergic neuropathology accounts for many of the early motor features of PD, in
vivo imaging studies suggest floor effects to striatal dopaminergic denervation that occur both in
early PD and at HY stage 3 disease, at which point progressive disease burden is driven by other
factors</cite>. This shift from primarily dopaminergic to multi-system pathology explains why
treatment becomes more complex and why families observe new types of symptoms emerging.
B. Early Stage Warning Signs (Stages 1-2)
Early recognition of Parkinson’s disease symptoms enables timely medical intervention and better
long-term outcomes. The challenge for families lies in distinguishing between normal aging
changes and early disease manifestations.
1. Motor Symptoms
The classic motor symptoms of Parkinson’s disease typically appear gradually and asymmetrically.
<cite id="id_3">Motor symptoms — which means movement-related symptoms — of Parkinson’s
disease include tremor at rest, bradykinesia (slowness of movement), rigidity (muscle stiffness),
and postural instability (balance problems)</cite>.
**Tremor characteristics provide important diagnostic clues**. The typical Parkinson’s tremor
occurs at rest, often beginning subtly in one finger or hand. Families should note that this tremor
typically diminishes during purposeful movement, distinguishing it from other types of tremors.
The asymmetric onset is particularly significant—symptoms beginning on one side of the body
are more suggestive of Parkinson’s disease than bilateral symptoms.
**Bradykinesia manifests in multiple ways** that families can observe in daily activities. This
slowness of movement affects not only gross motor activities but also fine motor skills, leading to
characteristic changes in handwriting (micrographia), reduced facial expression (facial masking),
and decreased arm swing during walking. These changes often develop so gradually that they
may be attributed to normal aging until they become more pronounced.
**Rigidity and postural changes** represent another category of early motor symptoms. Families
may notice increased muscle stiffness, particularly in the neck and shoulders, leading to
complaints of aching or discomfort. The development of a stooped posture and reduced arm
swing during walking are early indicators that warrant medical evaluation.
2. Non-Motor Early Indicators
Perhaps most significantly for early detection, <cite id="id_3">many of the possible non-motor
symptoms can appear years or even decades ahead of motor symptoms</cite>. This extended
prodromal period offers families crucial opportunities for early intervention, though the subtlety
of these symptoms often leads to delayed recognition.
**Olfactory dysfunction represents one of the earliest detectable changes**. <cite id="id_3">Loss
of sense of smell (anosmia)</cite> occurs in approximately 90% of Parkinson’s patients, often
years before motor symptoms appear. Families should take note when a loved one reports that
food no longer tastes the same or when they fail to detect strong odors that others notice readily.
**Gastrointestinal symptoms, particularly constipation**, frequently precede motor symptoms
by years. <cite id="id_3">Constipation</cite> in Parkinson’s disease results from the same
pathological processes affecting the brain, specifically involving the enteric nervous system. This
symptom is often dismissed as age-related or dietary, but persistent constipation without clear
cause warrants medical attention.
**Sleep disturbances provide another early warning system**. <cite id="id_3">Rapid eye
movement sleep behavior disorder</cite> is particularly significant, as it involves acting
out dreams during sleep, potentially including violent movements or vocalizations. <cite
id="id_16">REM sleep behavior disorder can also happen in people who don’t have Parkinson’s.
However, if this isn’t something you’ve dealt with before, it’s likely related to your disease</cite>.
Family members sharing a bed are often the first to notice these changes.
**Neuropsychiatric symptoms deserve particular attention** from families.
<cite
id="id_3">Depression or anxiety</cite> can appear years before motor symptoms, and
<cite id="id_16">anxiety and depression have been linked to Parkinson’s. In addition to
movement problems, the disease can also have an impact on your mental health</cite>. These
mood changes may be subtle initially but can significantly impact quality of life and relationships.
</write>
Step 5
<think>
Now I need to continue with information about progressive stage warning signs and advanced
stage warning signs. Let me retrieve more specific information about these later stages and their
complications.
</think>
<tool_call>
{"name": "retrieve", "arguments": {"url_id": ["id_5", "id_6", "id_18", "id_22"], "goal": "Gather
detailed information about progressive and advanced stage warning signs, including motor and
non-motor complications"}}
</tool_call>
<tool_response>
The page content for the previous section has been masked for saving the space.
</tool_response>
Step 6
<think>
The tool response provides excellent detailed information about progressive and advanced stage
warning signs. I can see clear patterns in the progression from mild to moderate to advanced
stages, with specific motor and non-motor symptoms at each stage. The information shows how
symptoms evolve from unilateral to bilateral, how functional independence is progressively lost,
and how non-motor symptoms become increasingly prominent and disabling. I should organize
this information to show the progression clearly and highlight the key warning signs families
need to recognize.
</think>
<write>
C. Progressive Stage Warning Signs (Stages 2-3)
As Parkinson’s disease advances beyond the early stage, families observe more pronounced
symptoms that begin to significantly impact daily functioning. This progressive phase represents
a critical period where proactive intervention can substantially influence quality of life outcomes.
1. Advancing Motor Symptoms
**The bilateral spread of symptoms marks a fundamental shift** in disease progression. <cite
id="id_18">As the disease progresses, it eventually affects both sides. However, the symptoms
may still be more severe on one side than on the other</cite>. This bilateral involvement signals
that families must adapt their support strategies to address more comprehensive functional
limitations.
<cite id="id_22">Often within 3 to 7 years, you’ll see more changes. Early on, you might have a
little trouble with something like buttoning a shirt. At this point, you may not be able to do it at
all</cite>. **This progression from mild difficulty to complete inability** represents a pattern
families will observe across multiple daily activities, requiring gradual assumption of caregiving
responsibilities.
**Speech and swallowing changes become particularly noticeable** during this stage. <cite
id="id_5">You may speak softly or quickly, slur, or hesitate before talking. Your speech may
be flat or monotone, without typical speech patterns</cite>. These changes affect not only
communication but also safety, as <cite id="id_22">trouble swallowing</cite> increases the risk
of aspiration and nutritional deficiencies.
**Gait disturbances and freezing episodes** represent some of the most challenging aspects of
progressive Parkinson’s disease. <cite id="id_18">People with Parkinson’s disease often develop
a parkinsonian gait that includes a tendency to lean forward; take small, quick steps; and reduce
swinging their arms. They also may have trouble initiating or continuing movement</cite>. The
emergence of <cite id="id_22">freezing when you first start to walk or change direction, as if your
feet are glued to the ground</cite> creates significant safety concerns and mobility limitations.
2. Emerging Complications
**Medication effectiveness begins to fluctuate** during the progressive stage, creating new
challenges for symptom management. <cite id="id_22">You might also find that the medicine you
take starts to wear off between doses</cite>. This "wearing-off" phenomenon requires families to
become more attentive to timing of medications and to recognize when symptoms return between
doses.
**Balance problems and fall risk increase substantially**. <cite id="id_6">At this stage, falls
become more frequent due to balance and reflex problems</cite>, and <cite id="id_22">falls
to be more likely</cite>. This increased fall risk necessitates environmental modifications and
potentially the introduction of assistive devices.
3. Functional Impact Assessment
**The distinction between Stages 2 and 3 centers on independence preservation**. <cite
id="id_6">Due to the various issues this stage presents, daily tasks are much more difficult to do,
but people can still live independently. A combination of medication and therapy can help to
manage the symptoms mentioned above</cite>. Understanding this threshold helps families
recognize when to increase support while still encouraging independence.
D. Advanced Stage Warning Signs (Stages 4-5)
The advanced stages of Parkinson’s disease present complex challenges that require comprehen-
sive care planning and family coordination. These stages are characterized by severe functional
limitations and the emergence of complications that significantly impact quality of life.
1. Severe Motor Impairment
**Stage 4 marks the loss of independent living capability**. <cite id="id_6">The critical factor in
separating people with stage 3 Parkinson’s and stage 4 is independence. Motor skills and deep
brain stimulation are heavily impacted at stage 4 and as such, movement symptoms affect a
person’s ability to retain their independence</cite>. This transition requires families to make
difficult decisions about living arrangements and care provision.
<cite id="id_22">At this point, you likely: Are limited to bed or a wheelchair, Can’t live on your
own, Have severe posture issues in your neck, back, and hips, Need help with daily tasks</cite>.
**The comprehensive nature of these limitations** means that families must coordinate multiple
aspects of care, from basic hygiene to mobility assistance.
**Stage 5 represents the most severe manifestation** of motor symptoms. <cite id="id_6">Severe
stiffness can make it difficult, if not impossible, for a person to stand or walk. This is due to
stiffness causing the legs to essentially freeze when the patient attempts to stand</cite>. This
"freezing" phenomenon at the advanced stage differs qualitatively from earlier freezing episodes,
representing a more permanent limitation rather than intermittent difficulty.
2. Complex Non-Motor Complications
**Cognitive decline becomes a prominent feature** in advanced stages. <cite id="id_18">Over
time, as the disease progresses, some people may develop dementia and be diagnosed with
Parkinson’s dementia, a type of Lewy body dementia. People with Parkinson’s dementia may
have severe memory and thinking problems that affect daily living</cite>. This cognitive
impairment adds another layer of complexity to caregiving responsibilities.
**Psychiatric symptoms emerge as significant challenges**. <cite id="id_6">Furthermore, up to
50% of Parkinson’s patients in stages 4 and 5 experience confusion, hallucinations, and delusions.
For clarity, hallucinations are seeing things that aren’t really there, and delusions are when a
person believes something despite evidence to the contrary</cite>. These symptoms can be
particularly distressing for families and may require specialized psychiatric intervention.
**Swallowing and nutritional complications** become life-threatening concerns.
<cite
id="id_5">Late-stage Parkinson’s disease affects the muscles in the mouth. This causes trouble
swallowing and chewing, which can lead to not getting enough nutrients in your diet. If food or
saliva collects in the mouth, it can cause choking or drooling</cite>. These complications require
careful monitoring and may necessitate alternative feeding methods.
<table>
<caption>Progressive Warning Signs by Disease Stage</caption>
<thead> <tr> <th>Stage</th>
<th>Motor Warning Signs</th>
<th>Non-Motor Warning Signs</th>
<th>Functional Impact</th>
<th>Family Action Required</th> </tr> </thead> <tbody> <tr>
<td>Early (1-2)</td>
<td>Unilateral tremor, mild rigidity, reduced arm swing, micrographia</td>
<td>Sleep disturbances, constipation, loss of smell, mood changes</td>
<td>Minimal daily life disruption</td>
<td>Medical evaluation, symptom monitoring</td> </tr> <tr>
<td>Progressive (2-3)</td>
<td>Bilateral symptoms, gait freezing, speech changes, balance problems</td>
<td>Medication wearing-off, cognitive changes, swallowing difficulty</td>
<td>Increased difficulty with daily tasks, still independent</td>
<td>Environmental modifications, fall prevention, medication timing</td>
</tr> <tr> <td>Advanced (4-5)</td>
<td>Severe rigidity, wheelchair/bed bound, postural deformities</td>
<td>Dementia, hallucinations, severe dysphagia, autonomic failure</td>
<td>Complete dependency for daily activities</td>
<td>24-hour care, safety monitoring, end-of-life planning</td>
</tr>
</tbody>
</table>
**The temporal progression of non-motor symptoms** follows a predictable pattern that families
should understand. <cite id="id_22">What may show up early. You may have these issues years
before any classic motor symptoms like tremors: Constipation, Depression, Loss of smell, Low
blood pressure when you stand up, Pain, Sleep issues</cite>. Conversely, <cite id="id_22">What
may show up later. Dementia and psychosis are two serious mental health issues that usually
take a while to show up</cite>.
This progression pattern reveals that **non-motor symptoms often become more disabling than
motor symptoms** in advanced stages. <cite id="id_22">Almost everyone with Parkinson’s gets
at least one of these. When severe, they’re more likely than motor issues to lead to a disability
or make you move into a nursing home</cite>. This insight helps families understand why
comprehensive care planning must address both motor and non-motor aspects of the disease.
3. End-Stage Considerations
**The advanced stage requires families to confront difficult realities** about prognosis and care
goals. <cite id="id_22">After about 10 years, most people will have at least one major issue, like
dementia or a physical disability</cite>. Understanding this timeline helps families engage in
advance care planning while the patient can still participate in decision-making.
**The variability in progression** means that some individuals may never reach the most severe
stages, while others progress more rapidly. <cite id="id_22">Some people never reach this
stage</cite> of advanced disability, emphasizing the importance of individualized care planning
rather than assuming inevitable decline. </write>
Step 7
<think>
Now I need to move on to the second major section about critical warning signs requiring
immediate medical intervention. This is crucial information for families to know when to seek
emergency care. Let me retrieve information about emergency situations and warning signs.
</think>
<tool_call>
{"name": "retrieve", "arguments": {"url_id": ["id_24", "id_25", "id_26", "id_28", "id_29", "id_31",
"id_32", "id_33", "id_34", "id_35", "id_36"], "goal": "Gather comprehensive information about
emergency warning signs and situations that require immediate medical intervention for
Parkinson’s patients"}}
</tool_call>
<tool_response>
The page content for the previous section has been masked for saving the space.
</tool_response>
Step 8
...
<terminate> ✓
References
Sandhini Agarwal, Lama Ahmad, Jason Ai, Sam Altman, Andy Applebaum, Edwin Arbus, Rahul K
Arora, Yu Bai, Bowen Baker, Haiming Bao, et al. gpt-oss-120b & gpt-oss-20b model card. arXiv preprint
arXiv:2508.10925, 2025.
Anthropic. Meet Claude, 2025. URL https://www.anthropic.com/claude.
Yushi Bai, Jiajie Zhang, Xin Lv, Linzhi Zheng, Siqi Zhu, Lei Hou, Yuxiao Dong, Jie Tang, and Juanzi
Li. Longwriter: Unleashing 10,000+ word generation from long context llms. arXiv preprint
arXiv:2408.07055, 2024.
Yushi Bai, Jiajie Zhang, Xin Lv, Linzhi Zheng, Siqi Zhu, Lei Hou, Yuxiao Dong, Jie Tang, and Juanzi Li.
Longwriter: Unleashing 10,000+ word generation from long context llms. In The Thirteenth International
Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenReview.net, 2025.
URL https://openreview.net/forum?id=kQ5s9Yh0WI.
João Coelho, Jingjie Ning, Jingyuan He, Kangrui Mao, Abhijay Paladugu, Pranav Setlur, Jiahe Jin, Jamie
Callan, João Magalhães, Bruno Martins, et al. Deepresearchgym: A free, transparent, and reproducible
evaluation sandbox for deep research. arXiv preprint arXiv:2505.19253, 2025.
Deep Consult. Deep consult. 2025. URL https://github.com/Su-Sea/ydc-deep-research-evals.
Google DeepMind. Gemini 2.5, 2025. URL https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/.
Mingxuan Du, Benfeng Xu, Chiwei Zhu, Xiaorui Wang, and Zhendong Mao. Deepresearch bench: A
comprehensive benchmark for deep research agents. arXiv preprint arXiv:2506.11763, 2025.
Runnan Fang, Shihao Cai, Baixuan Li, Jialong Wu, Guangyu Li, Wenbiao Yin, Xinyu Wang, Xiaobin Wang,
Liangcai Su, Zhen Zhang, Shibin Wu, Zhengwei Tao, Yong Jiang, Pengjun Xie, Fei Huang, and Jingren
Zhou. Towards general agentic intelligence via environment scaling, 2025.
Google. Try Deep Research and our new experimental model in Gemini, your AI assistant, 2025. URL
https://blog.google/products/gemini/google-gemini-deep-research/.
Rujun Han, Yanfei Chen, Zoey CuiZhu, Lesly Miculicich, Guan Sun, Yuanjun Bi, Weiming Wen, Hui Wan,
Chunfeng Wen, Solène Maître, George Lee, Vishy Tirumalashetty, Emily Xue, Zizhao Zhang, Salem
Haykal, Burak Gokturk, Tomas Pfister, and Chen-Yu Lee. Deep researcher with test-time diffusion.
CoRR, abs/2507.16075, 2025. doi: 10.48550/ARXIV.2507.16075. URL https://doi.org/10.48550/arXiv.2507.16075.
Juyong Jiang, Fan Wang, Jiasi Shen, Sungju Kim, and Sunghun Kim. A survey on large language models
for code generation. arXiv preprint arXiv:2406.00515, 2024.
LangChain, Inc. LangChain: Building applications with LLMs through composability, 2023. URL
https://python.langchain.com/.
Kuan Li, Zhongwang Zhang, Huifeng Yin, Rui Ye, Yida Zhao, Liwen Zhang, Litu Ou, Dingchu Zhang,
Xixi Wu, Jialong Wu, Xinyu Wang, Zile Qiao, et al. Websailor-v2: Bridging the chasm to proprietary
agents via synthetic data and scalable reinforcement learning, 2025a.
Kuan Li, Zhongwang Zhang, Huifeng Yin, Liwen Zhang, Litu Ou, Jialong Wu, Wenbiao Yin, Baixuan
Li, Zhengwei Tao, Xinyu Wang, Weizhou Shen, Junkai Zhang, Dingchu Zhang, Xixi Wu, Yong Jiang,
Ming Yan, Pengjun Xie, Fei Huang, and Jingren Zhou. Websailor: Navigating super-human reasoning
for web agent. CoRR, abs/2507.02592, 2025b. doi: 10.48550/ARXIV.2507.02592. URL
https://doi.org/10.48550/arXiv.2507.02592.
Tianle Li, Ge Zhang, Quy Duc Do, Xiang Yue, and Wenhu Chen. Long-context llms struggle with long
in-context learning. arXiv preprint arXiv:2404.02060, 2024.
Aixin Liu, Bei Feng, Bing Xue, Bingxuan Wang, Bochao Wu, Chengda Lu, Chenggang Zhao, Chengqi
Deng, Chenyu Zhang, Chong Ruan, et al. DeepSeek-V3 technical report. arXiv preprint arXiv:2412.19437,
2024.
Nelson F Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy
Liang. Lost in the middle: How language models use long contexts. arXiv preprint arXiv:2307.03172,
2023.
Siyi Liu, Kishaloy Halder, Zheng Qi, Wei Xiao, Nikolaos Pappas, Phu Mon Htut, Neha Anna John, Yassine
Benajiba, and Dan Roth. Towards long context hallucination detection. arXiv preprint arXiv:2504.19457,
2025.
Grégoire Mialon, Clémentine Fourrier, Thomas Wolf, Yann LeCun, and Thomas Scialom. Gaia: a
benchmark for general ai assistants. In The Twelfth International Conference on Learning Representations,
2023.
OpenAI. Deep research system card, 2025a. URL https://cdn.openai.com/deep-research-system-card.pdf.
OpenAI. Introducing openai o3 and o4-mini, 2025b. URL https://openai.com/index/introducing-o3-and-o4-mini/.
Zile Qiao, Guoxin Chen, Xuanzhong Chen, Donglei Yu, Wenbiao Yin, Xinyu Wang, Zhen Zhang, Baixuan
Li, Huifeng Yin, Kuan Li, Rui Min, Minpeng Liao, Yong Jiang, Pengjun Xie, Fei Huang, and Jingren
Zhou. WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents, 2025.
Qwen Team. QwQ-32B: Embracing the power of reinforcement learning, March 2025. URL
https://qwenlm.github.io/blog/qwq-32b/.
Doubao Deep Research. Doubao deep research. 2025a. URL https://www.doubao.com/chat/.
Gemini Research. Gemini research. 2025b. URL https://gemini.google/overview/deep-research/.
GPT Research. Gpt research. 2025c. URL https://github.com/assafelovic/gpt-researcher.
Kimi Deep Research. Kimi deep research. 2025d. URL https://www.kimi.com/.
Open Deep Research. Open deep research. 2025e. URL https://github.com/langchain-ai/open_deep_research.
Corby Rosset, Ho-Lam Chung, Guanghui Qin, Ethan C. Chau, Zhuo Feng, Ahmed Awadallah, Jennifer
Neville, and Nikhil Rao. Researchy questions: A dataset of multi-perspective, decompositional
questions for LLM web agents. CoRR, abs/2402.17896, 2024. doi: 10.48550/ARXIV.2402.17896. URL
https://doi.org/10.48550/arXiv.2402.17896.
Aymeric Roucher, Albert Villanova del Moral, merve, Thomas Wolf, and Clémentine Fourrier. Open-source
deepresearch – freeing our search agents. 2025. URL https://huggingface.co/blog/open-deep-research.
Liangcai Su, Zhen Zhang, Guangyu Li, Zhuo Chen, Chenxi Wang, Maojia Song, Xinyu Wang, Kuan Li,
Jialong Wu, Xuanzhong Chen, Zile Qiao, Zhongwang Zhang, Huifeng Yin, Shihao Cai, Runnan Fang,
Zhengwei Tao, Wenbiao Yin, et al. Scaling agents via continual pre-training, 2025.
Zhengwei Tao, Jialong Wu, Wenbiao Yin, Junkai Zhang, Baixuan Li, Haiyang Shen, Kuan Li, Liwen
Zhang, Xinyu Wang, Yong Jiang, Pengjun Xie, Fei Huang, and Jingren Zhou. Webshaper: Agentically
data synthesizing via information-seeking formalization. CoRR, abs/2507.15061, 2025. doi:
10.48550/ARXIV.2507.15061. URL https://doi.org/10.48550/arXiv.2507.15061.
Kaiyang Wan, Honglin Mu, Rui Hao, Haoran Luo, Tianle Gu, and Xiuying Chen. A cognitive writing
perspective for constrained long-form text generation. In Wanxiang Che, Joyce Nabende, Ekaterina
Shutova, and Mohammad Taher Pilehvar (eds.), Findings of the Association for Computational Linguistics,
ACL 2025, Vienna, Austria, July 27 - August 1, 2025, pp. 9832–9844. Association for Computational
Linguistics, 2025. URL https://aclanthology.org/2025.findings-acl.511/.
Qianyue Wang, Jinwu Hu, Zhengping Li, Yufeng Wang, Daiyuan Li, Yu Hu, and Mingkui Tan. Generating
long-form story using dynamic hierarchical outlining with memory-enhancement. In Luis Chiruzzo,
Alan Ritter, and Lu Wang (eds.), Proceedings of the 2025 Conference of the Nations of the Americas Chapter
of the Association for Computational Linguistics: Human Language Technologies, NAACL 2025 - Volume
1: Long Papers, Albuquerque, New Mexico, USA, April 29 - May 4, 2025, pp. 1352–1391. Association
for Computational Linguistics, 2025. doi: 10.18653/V1/2025.NAACL-LONG.63. URL
https://doi.org/10.18653/v1/2025.naacl-long.63.
Jason Wei, Zhiqing Sun, Spencer Papay, Scott McKinney, Jeffrey Han, Isa Fulford, Hyung Won Chung,
Alex Tachard Passos, William Fedus, and Amelia Glaese. Browsecomp: A simple yet challenging
benchmark for browsing agents. arXiv preprint arXiv:2504.12516, 2025.
Jialong Wu, Baixuan Li, Runnan Fang, Wenbiao Yin, Liwen Zhang, Zhengwei Tao, Dingchu Zhang, Zekun
Xi, Yong Jiang, Pengjun Xie, et al. Webdancer: Towards autonomous information seeking agency. arXiv
preprint arXiv:2505.22648, 2025a.
Weiqi Wu, Xin Guan, Shen Huang, Yong Jiang, Pengjun Xie, Fei Huang, Jiuxin Cao, Hai Zhao, and Jingren
Zhou. Masksearch: A universal pre-training framework to enhance agentic search capability. 2025b.
URL https://arxiv.org/abs/2505.20285.
Yuhao Wu, Ming Shan Hee, Zhiqiang Hu, and Roy Ka-Wei Lee. Longgenbench: Benchmarking long-form
generation in long context llms. In The Thirteenth International Conference on Learning Representations,
2025c.
An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao,
Chengen Huang, Chenxu Lv, et al. Qwen3 technical report. arXiv preprint arXiv:2505.09388, 2025.
Kevin Yang, Yuandong Tian, Nanyun Peng, and Dan Klein. Re3: Generating longer stories with recursive
reprompting and revision. In Yoav Goldberg, Zornitsa Kozareva, and Yue Zhang (eds.), Proceedings of
the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United
Arab Emirates, December 7-11, 2022, pp. 4393–4479. Association for Computational Linguistics, 2022. doi:
10.18653/V1/2022.EMNLP-MAIN.296. URL https://doi.org/10.18653/v1/2022.emnlp-main.296.
Kevin Yang, Dan Klein, Nanyun Peng, and Yuandong Tian. DOC: improving long story coherence with
detailed outline control. In Anna Rogers, Jordan L. Boyd-Graber, and Naoaki Okazaki (eds.), Proceedings
of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL
2023, Toronto, Canada, July 9-14, 2023, pp. 3378–3465. Association for Computational Linguistics, 2023.
doi: 10.18653/V1/2023.ACL-LONG.190. URL https://doi.org/10.18653/v1/2023.acl-long.190.
Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. ReAct:
Synergizing reasoning and acting in language models. In International Conference on Learning
Representations (ICLR), 2023.
Haopeng Zhang, Philip S Yu, and Jiawei Zhang. A systematic survey of text summarization: From
statistical methods to large language models. ACM Computing Surveys, 57(11):1–41, 2025.
Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin,
Zhuohan Li, Dacheng Li, Eric P. Xing, Hao Zhang, Joseph E. Gonzalez, and Ion Stoica. Judging
LLM-as-a-judge with MT-Bench and Chatbot Arena. In NeurIPS, 2023.