AI preprocessing: Extract text from PDF files and convert them to Markdown files

name
Extract text from PDF
tag
AI preprocessing/Claude/Generative AI
Connector used
REST Connector
API
API version: 2023-06-01
AI preprocessing: Extract text from PDF files and convert them to Markdown files

This is a HULFT Square application that makes it easy to prepare RAG data for use in generative AI.
This application uses Claude, an LLM, to extract text from PDF files and output it as a Markdown file.

Script Details

Convert PDF files to Markdown files

ScannedPdf_To_Markdown_Claude_convert

Checking the limit value for the number of tokens required for PDF file conversion

ScannedPdf_To_Markdown_Claude_validate_limits

Get the number of pages in a PDF file

ScannedPdf_To_Markdown_Claude_get_max_page

Extract text from each page of a PDF file and output it to a Markdown file.

ScannedPdf_To_Markdown_Claude_convert_to_markdown

How to install and use it