LlamaParse is an API by LlamaIndex to parse and represent files for efficient retrieval and context augmentation.
When you select the llamaParse provider option, ingestion is delegated to LlamaParse to obtain a text representation of the associated documents. You need an API key to use the service (check pricing and usage).
The following parameters are available:
| Parameter | Description |
|---|---|
| apiKey | Required. Obtain it from the LlamaParse site following the steps in Get an API key. |
| resultType | Output format. Options: markdown (default), text, json. |
| splitByPage | Document splitting option. true (default): split document text by pages. false: keep document as a single text. |
| fastMode | Parsing speed option. true: bypass reconstruction, significantly accelerating parsing. false (default): standard parsing speed. |
| targetPages | Comma-separated list of pages to extract. Default: all pages (numbered from 0). |
| language | Document language. Default: en. For multiple languages, repeat this parameter for each language. See documentation for other options. |
| invalidateCache | Cache invalidation. true: invalidate cache. false (default): use existing cache. |
| doNotCache | Caching option. true: do not cache results. false (default): allow caching. |
- The pageNumber parameter is not assigned when ingesting PDF files.
- This option is specific to RAG Assistants.
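Because targetPages counts pages from 0 while most PDF viewers count from 1, a small conversion can help avoid off-by-one mistakes. The following is only an illustrative sketch: to_target_pages is a hypothetical helper name, not part of LlamaParse or the ingestion API.

```shell
# Hypothetical helper (not part of the API): convert 1-based page numbers,
# as shown by most PDF viewers, into the 0-based, comma-separated value
# that the targetPages parameter expects.
to_target_pages() (
  out=""
  for page in "$@"; do
    # Reject empty values, non-digits, and anything starting with 0.
    case "$page" in
      ''|*[!0-9]*|0*) echo "invalid page number: $page" >&2; exit 1 ;;
    esac
    # Append the 0-based page, adding a comma only between entries.
    out="${out:+$out,}$((page - 1))"
  done
  printf '%s\n' "$out"
)

# Pages 1, 3 and 5 as seen in a PDF viewer
to_target_pages 1 3 5   # prints 0,2,4
```

The printed value can then be passed as the targetPages form field.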
The following examples show how to use the provider options:
- Markdown format, first page only:
curl -X POST "$BASE_URL/v1/search/profile/{name}/document" \
-H "Authorization: Bearer $SAIA_PROJECT_APITOKEN" \
-H "Content-Type: multipart/form-data" \
--form 'file=@"/C:/temp/SampleFile.pdf"' \
--form 'provider="llamaParse"' \
--form 'format="markdown"' \
--form 'targetPages="0"' \
--form 'splitByPage="true"' \
--form 'apiKey=""'
- Text format without splitting, two languages:
curl -X POST "$BASE_URL/v1/search/profile/{name}/document" \
-H "Authorization: Bearer $SAIA_PROJECT_APITOKEN" \
-H "Content-Type: multipart/form-data" \
--form 'file=@"/C:/temp/SampleFile.pdf"' \
--form 'provider="llamaParse"' \
--form 'format="text"' \
--form 'splitByPage="false"' \
--form 'language="en"' \
--form 'language="es"'
Note: The "name" in the URL refers to the associated RAG Assistant identifier.
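When the list of languages varies per document, the repeated language field can be generated instead of typed by hand. The sketch below assumes the same endpoint and environment variables as the examples above. upload_with_languages, the DRY_RUN switch, and the my-assistant name are hypothetical, introduced only for illustration.

```shell
# Hypothetical helper (not part of the API): upload a file through the
# llamaParse provider, sending one `language` form field per language code.
# Usage: upload_with_languages ASSISTANT_NAME FILE LANG [LANG...]
# Set DRY_RUN=1 to print the curl form arguments instead of sending them.
upload_with_languages() (
  name="$1"; file="$2"; shift 2
  remaining=$#   # number of language codes still at the front of "$@"
  # Append the fixed form fields after the language codes...
  set -- "$@" --form "file=@\"$file\"" --form 'provider="llamaParse"'
  # ...then move each language code to the end as its own form field.
  while [ "$remaining" -gt 0 ]; do
    lang="$1"; shift
    set -- "$@" --form "language=\"$lang\""
    remaining=$((remaining - 1))
  done
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$@"
  else
    # curl sets the multipart Content-Type itself when --form is used.
    curl -X POST "$BASE_URL/v1/search/profile/$name/document" \
      -H "Authorization: Bearer $SAIA_PROJECT_APITOKEN" \
      "$@"
  fi
)

# Dry run: prints the form arguments, one per line, without calling the API.
DRY_RUN=1 upload_with_languages my-assistant /C:/temp/SampleFile.pdf en es
```

Running it without DRY_RUN sends the request. Other provider options from the table can be added as further --form fields in the same way.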