# MCP Download Issue - Fix Documentation

## Problem Summary

The MCP arXiv client was experiencing an issue where the `download_paper` tool would complete successfully on the remote MCP server, but the downloaded PDF files would not appear in the client's local `data/mcp_papers/` directory.

### Root Cause

The issue stems from the **client-server architecture** of MCP (Model Context Protocol):

1. **MCP Server** runs as a separate process (possibly remote)
2. **Server downloads PDFs** to its own storage location
3. **Server returns** `{"status": "success"}` without a file path
4. **Client expects files** in its local `data/mcp_papers/` directory
5. **No file transfer mechanism** exists between server and client storage

This is fundamentally a **storage path mismatch** between what the server uses and what the client expects.

## Solution Implemented

### 1. Tool Discovery (Diagnostic)

Added automatic tool discovery when connecting to the MCP server:

- Lists all available MCP tools at session initialization
- Logs tool names, descriptions, and schemas
- Helps diagnose what capabilities the server provides

**Location:** `utils/mcp_arxiv_client.py:88-112` (`_discover_tools` method)

### 2. Direct Download Fallback

Implemented a fallback mechanism that downloads PDFs directly from arXiv when the MCP download fails:

- Detects when the MCP download completes but the file is not accessible
- Downloads the PDF directly from `https://arxiv.org/pdf/{paper_id}.pdf`
- Writes the file to the client's local storage directory
- Maintains the same retry logic and error handling

**Location:** `utils/mcp_arxiv_client.py:114-152` (`_download_from_arxiv_direct` method)

### 3. Enhanced Error Handling

Updated `download_paper_async` to:

- Try the MCP download first (preserves existing functionality)
- Check multiple possible file locations
- Fall back to direct download if MCP fails
- Provide detailed logging at each step

**Location:** `utils/mcp_arxiv_client.py:462-479` (updated error handling)

## How It Works Now

### Download Flow

```
1. Check if file already exists locally → Return if found
2. Call MCP server's download_paper tool
3. Check if file appeared in expected locations:
   a. Expected path: data/mcp_papers/{paper_id}.pdf
   b. MCP-returned path (if provided in response)
   c. Any file in storage matching paper_id
4. If file not found → Fall back to direct arXiv download
5. Download PDF directly to client storage
6. Return path to downloaded file
```

### Benefits

- **Zero breaking changes**: Existing MCP functionality preserved
- **Automatic fallback**: Works even with remote MCP servers
- **Better diagnostics**: Tool discovery helps troubleshoot issues
- **Reliable downloads**: Direct fallback retrieves the file even when the MCP download is not accessible to the client
- **Client-side storage**: Files always accessible to the client process

## Using the Fix

### Running the Application

No changes needed! The fix is automatic:

```bash
# Set environment variables (optional - defaults work)
export USE_MCP_ARXIV=true
export MCP_ARXIV_STORAGE_PATH=data/mcp_papers

# Run the application
python app.py
```

The system will:

1. Try MCP download first
2. Automatically fall back to direct download if needed
3. Log which method succeeded

### Running Diagnostics

Use the diagnostic script to test your MCP setup:

```bash
python test_mcp_diagnostic.py
```

This will:

- Check environment configuration
- Verify storage directory setup
- List available MCP tools
- Test search functionality
- Test download with detailed logging
- Show file system state before/after

**Expected Output:**

```
================================================================================
MCP arXiv Client Diagnostic Test
================================================================================

[1] Environment Configuration:
  USE_MCP_ARXIV: true
  MCP_ARXIV_STORAGE_PATH: data/mcp_papers

[2] Storage Directory:
  Path: /path/to/data/mcp_papers
  Exists: True
  Contains 0 PDF files

[3] Initializing MCP Client:
  ✓ Client initialized successfully

[4] Testing Search Functionality:
  ✓ Search successful, found 2 papers
  First paper: Attention Is All You Need...
  Paper ID: 1706.03762

[5] Testing Download Functionality:
  Attempting to download: 1706.03762
  PDF URL: https://arxiv.org/pdf/1706.03762.pdf
  ✓ Download successful!
  File path: data/mcp_papers/1706.03762v7.pdf
  File exists: True
  File size: 2,215,520 bytes (2.11 MB)

[6] Storage Directory After Download:
  Contains 1 PDF files
  Files: ['1706.03762v7.pdf']

[7] Cleaning Up:
  ✓ MCP session closed

================================================================================
Diagnostic Test Complete
================================================================================
```

## Interpreting Logs

### Successful MCP Download

If the MCP server works correctly, you'll see:

```
2025-11-12 01:50:27 - utils.mcp_arxiv_client - INFO - Downloading paper 2203.08975v2 via MCP
2025-11-12 01:50:27 - utils.mcp_arxiv_client - INFO - MCP download_paper response type:
2025-11-12 01:50:27 - utils.mcp_arxiv_client - INFO - Successfully downloaded paper to data/mcp_papers/2203.08975v2.pdf
```

### Fallback to Direct Download

If MCP fails but direct download succeeds:

```
2025-11-12 01:50:27 - utils.mcp_arxiv_client - WARNING - File not found at expected path
2025-11-12 01:50:27 - utils.mcp_arxiv_client - ERROR - MCP download call completed but file not found
2025-11-12 01:50:27 - utils.mcp_arxiv_client - WARNING - Falling back to direct arXiv download...
2025-11-12 01:50:27 - utils.mcp_arxiv_client - INFO - Attempting direct download from arXiv for 2203.08975v2
2025-11-12 01:50:28 - utils.mcp_arxiv_client - INFO - Successfully downloaded 1234567 bytes to data/mcp_papers/2203.08975v2.pdf
```

### Tool Discovery

At session initialization:

```
2025-11-12 01:50:26 - utils.mcp_arxiv_client - INFO - MCP server provides 3 tools:
2025-11-12 01:50:26 - utils.mcp_arxiv_client - INFO -   - search_papers: Search arXiv for papers
2025-11-12 01:50:26 - utils.mcp_arxiv_client - INFO -   - download_paper: Download paper PDF
2025-11-12 01:50:26 - utils.mcp_arxiv_client - INFO -   - list_papers: List cached papers
```

## Troubleshooting

### Issue: MCP server not found

**Symptom:** Error during initialization: `command not found: arxiv-mcp-server`

**Solution:**

- Ensure the MCP server is installed and on your PATH
- Check the server configuration in your MCP settings
- Try using the direct ArxivClient instead: `export USE_MCP_ARXIV=false`

### Issue: Files still not downloading

**Symptom:** Both MCP and direct download fail

**Possible causes:**

1. Network connectivity issues
2. arXiv API rate limiting
3. Invalid paper IDs
4. Storage directory permissions

**Debugging steps:**

```bash
# Check network connectivity (-L follows any redirect arXiv issues)
curl -L https://arxiv.org/pdf/1706.03762.pdf -o test.pdf

# Check storage permissions
ls -la data/mcp_papers/
touch data/mcp_papers/test.txt

# Run diagnostic script
python test_mcp_diagnostic.py
```

### Issue: MCP server uses different storage path

**Symptom:** MCP downloads succeed but the client can't find the files

**Current solution:** Direct download fallback handles this automatically

**Future enhancement:** Could add a file transfer mechanism if MCP provides retrieval tools

## Technical Details

### Architecture Decision: Why Fallback Instead of File Transfer?

We chose direct download fallback over implementing a file transfer mechanism because:

1. **Server is third-party**: Cannot modify MCP server to add file retrieval tools
2. **Simpler implementation**: Direct download is straightforward and reliable
3. **Better performance**: Avoids two-step download (server → client transfer)
4. **Same result**: Client gets PDFs either way
5. **Fail-safe**: Works even if MCP server is completely unavailable

### Performance Impact

- **MCP successful**: No performance change (same as before)
- **MCP fails**: Extra ~2-5 seconds for direct download
- **Network overhead**: Same (one download either way)
- **Storage**: Client-side only (no redundant server storage)

### Comparison with Direct ArxivClient

| Feature | MCPArxivClient (with fallback) | Direct ArxivClient |
|---------|-------------------------------|-------------------|
| Search via MCP | ✓ | ✗ |
| Download via MCP | Tries first | ✗ |
| Direct download | Fallback | Primary |
| Remote MCP server | ✓ | N/A |
| File storage | Client-side | Client-side |
| Reliability | High (dual method) | High |

## Future Enhancements

If MCP server capabilities expand, possible improvements:

1. **File retrieval tool**: MCP server adds `get_file(paper_id)` tool
2. **Streaming transfer**: MCP response includes base64-encoded PDF
3. **Shared storage**: Configure MCP server to write to shared filesystem
4. **Batch downloads**: Optimize multi-paper downloads

For now, the fallback solution provides robust, reliable downloads without requiring MCP server changes.

## Files Modified

1. `utils/mcp_arxiv_client.py` - Core client with fallback logic
2. `test_mcp_diagnostic.py` - New diagnostic script
3. `MCP_FIX_DOCUMENTATION.md` - This document

## Testing

Run the test suite to verify the fix:

```bash
# Test MCP client
pytest tests/test_mcp_arxiv_client.py -v

# Run diagnostic
python test_mcp_diagnostic.py

# Full integration test
python app.py
# Then use the Gradio UI to analyze papers with MCP enabled
```

## Summary

The fix ensures **reliable PDF downloads** by combining MCP capabilities with direct arXiv fallback:

- ✅ **Preserves MCP functionality** for servers that work correctly
- ✅ **Automatic fallback** when MCP fails or files aren't accessible
- ✅ **No configuration changes** required
- ✅ **Better diagnostics** via tool discovery
- ✅ **Comprehensive logging** for troubleshooting
- ✅ **Zero breaking changes** to existing code

The system now works reliably with **remote MCP servers**, **local servers**, or **no MCP at all**.
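
As a rough illustration of the six-step download flow described under "How It Works Now", the sketch below shows one way the logic could be arranged. It is not the code in `utils/mcp_arxiv_client.py`: the names `find_local_pdf`, `fetch_direct`, and `download_with_fallback` are hypothetical, the MCP call is represented by a plain callable (`mcp_download`) that may return a path or `None`, and everything is synchronous for brevity, whereas the real client uses `download_paper_async`. Retry logic and logging are omitted.

```python
from pathlib import Path
from typing import Callable, Optional
import urllib.request


def find_local_pdf(
    storage_dir: Path, paper_id: str, returned_path: Optional[str] = None
) -> Optional[Path]:
    """Return the first existing PDF for paper_id, checking locations 3a-3c in order."""
    candidates = [storage_dir / f"{paper_id}.pdf"]           # 3a: expected path
    if returned_path:
        candidates.append(Path(returned_path))               # 3b: path reported by MCP
    # 3c: any file in storage whose name starts with the ID (e.g. a versioned copy)
    candidates.extend(sorted(storage_dir.glob(f"{paper_id}*.pdf")))
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


def fetch_direct(paper_id: str, dest: Path, timeout: float = 30.0) -> Path:
    """Download the PDF straight from arXiv into client storage (no retries here)."""
    url = f"https://arxiv.org/pdf/{paper_id}.pdf"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        dest.write_bytes(response.read())
    return dest


def download_with_fallback(
    paper_id: str,
    storage_dir: Path,
    mcp_download: Callable[[str], Optional[str]],  # hypothetical stand-in for the MCP tool call
    direct_download: Callable[[str, Path], Path] = fetch_direct,
) -> Path:
    storage_dir.mkdir(parents=True, exist_ok=True)

    # Step 1: reuse a file that is already present locally.
    existing = find_local_pdf(storage_dir, paper_id)
    if existing:
        return existing

    # Step 2: ask the MCP server to download; any exception counts as an MCP failure.
    returned_path: Optional[str] = None
    try:
        returned_path = mcp_download(paper_id)
    except Exception:
        pass

    # Step 3: see whether the file became visible to the client.
    found = find_local_pdf(storage_dir, paper_id, returned_path)
    if found:
        return found

    # Steps 4-6: fall back to a direct download into client storage.
    return direct_download(paper_id, storage_dir / f"{paper_id}.pdf")
```

Passing the MCP call and the direct download in as callables is an assumption made here so the flow can be exercised without a server or network access; in practice `mcp_download` would wrap the session's `download_paper` tool call.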