AI preprocessing: Extract text from PDF files and convert them to Markdown files
- name
- Extract text from PDF
- tag
- AI preprocessing/Claude/Generative AI
- Connector used
- REST Connector
- API
- API version: 2023-06-01
This is a HULFT Square application that makes it easy to prepare RAG data for use in generative AI.
This application uses Claude, an LLM, to extract text from PDF files and output it as a Markdown file.
Script Details
Convert PDF files to Markdown files
ScannedPdf_To_Markdown_Claude_convert
Checking the limit value for the number of tokens required for PDF file conversion
ScannedPdf_To_Markdown_Claude_validate_limits
Get the number of pages in a PDF file
ScannedPdf_To_Markdown_Claude_get_max_page
Extract text from each page of a PDF file and output it to a Markdown file.
ScannedPdf_To_Markdown_Claude_convert_to_markdown
How to install and use it