使用ChatGPT总结长PDF文档

30 Jan 2024

2024年1月30日

Open In Colab

▂▂▂▂▂▂▂▂▂▂▂▂

▂▂▂▂▂▂▂▂▂▂▂▂

GitHub

A friend of mine was taking a college course in political science with a ton of assigned reading material, and found that ChatGPT could produce helpful summaries (and in case you’re wondering, the summaries are intended as an additional learning aid, rather than a replacement for doing the reading 😜).

我有一个朋友在上政治科学的大学课程,阅读材料很多,发现 ChatGPT 可以生成有用的摘要(如果你在想,摘要是作为额外的学习辅助,而不是替代阅读的内容 🤒)。

There were a few challenges to trying to use ChatGPT for this, though:

不过,尝试使用 ChatGPT 来做这个有一些挑战:

  • The reading materials are in the form of PDFs, and there are just too many (39! 😳) to do this manually.
  • 阅读材料以 PDF 形式存在,数量实在太多(39!😳)无法手动处理。
  • Most of the readings are too long to fit into ChatGPT in a single pass.
  • 大多数阅读材料太长,无法在一次中适应ChatGPT。
  • Some of the PDFs are scans (or even just photos!) of pages from books, and none of the text is selectable.
  • 一些 PDF 是书页的扫描(甚至只是照片!),并且没有任何文本可以选择。
  • Even for the PDFs which do have selectable text, copying and pasting it into ChatGPT isn’t trivial.
  • 即使是那些有可选择文本的PDF,将其复制并粘贴到ChatGPT中也并不简单。

So, I created this Notebook to automate the process and summarize all 39 of the PDFs assigned for the class, and it sounds like they were really helpful!

所以,我创建了这个Notebook来自动化这个过程,并总结所有39个分配给课堂的PDF,听起来它们真的很有帮助!

This Notebook is intended both as a relatively polished tool for completing this task, and as a tutorial and example code for working on this “summarization” problem yourself. I’m sure you can improve on it by experimenting with various details of the process!

这个 Notebook 既是一个相对完善的工具,用于完成这个任务,也是一个教程和示例代码,供您自己处理这个“摘要”问题。我相信通过尝试过程中的各种细节,您可以对此进行改进!

Note: I think the biggest caveat to this Notebook as a practical tool is that it does rely on OpenAI’s interface, which means you’ll need to do some setup work on OpenAI’s website in order to fully run it. Sorry!

注意:我认为这个笔记本作为实用工具的最大警告是它 确实 依赖于 OpenAI 的接口,这意味着您需要在 OpenAI 的网站上进行一些设置工作才能完全运行它。抱歉!

i. Text Sources

i. 文本来源

Part 1 of this Notebook turns all of the PDFs into “plain text” .txt files. The PyMuPDF library has everything we need for ...

开通本站会员,查看完整译文。

inicio - Wiki
Copyright © 2011-2025 iteam. Current version is 2.147.1. UTC+08:00, 2025-11-08 09:58
浙ICP备14020137号-1 $mapa de visitantes$