PyPDFium2TocLoader
This notebook provides a quick overview for getting started with the PyPDFium2Toc document loader. For detailed documentation of all PyPDFium2TocLoader features and configurations, head to the API reference.
Overview
Integration details
Class | Package | Local | Serializable | JS support |
---|---|---|---|---|
PyPDFium2TocLoader | langchain_community | โ | โ | โ |
Loader features
Source | Document Lazy Loading | Native Async Support |
---|---|---|
PyPDFium2TocLoader | โ | โ |
Setup
To access the PyPDFium2TocLoader document loader, you'll need to install the langchain-community integration package.
Credentials
No credentials are needed.
If you want automated, best-in-class tracing of your model calls, you can also set your LangSmith API key by uncommenting the lines below:
# import getpass
# import os
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"
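When the notebook runs non-interactively (for example in CI), a prompt for a key that is already set gets in the way. The helper below is a hypothetical sketch, not a LangChain API. It prompts only if `LANGSMITH_API_KEY` is missing and then sets `LANGSMITH_TRACING`, the same two variables used in the commented-out lines above.

```python
import getpass
import os


def enable_langsmith_tracing() -> None:
    """Turn on LangSmith tracing, prompting for an API key only if none is set.

    Hypothetical helper: it only touches the LANGSMITH_API_KEY and
    LANGSMITH_TRACING environment variables.
    """
    if not os.environ.get("LANGSMITH_API_KEY"):
        # Ask interactively only when the key is not already in the environment.
        os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
    os.environ["LANGSMITH_TRACING"] = "true"
```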
Installation
Install langchain_community.
%pip install -qU langchain_community
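To confirm that the install took effect in the current kernel, you can check whether the package is importable. This is a standard-library-only sketch, and `is_installed` is a hypothetical helper name. If it prints False right after the `%pip` cell, restarting the kernel usually resolves it.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the top-level module `module_name` can be found for import."""
    return importlib.util.find_spec(module_name) is not None


# After the %pip cell above, this is expected to print True.
print(is_installed("langchain_community"))
```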
Initialization
Now we can instantiate our loader object and load documents:
from langchain_community.document_loaders import PyPDFium2TocLoader
file_path = "./example_data/sample_book.pdf"
loader = PyPDFium2TocLoader(file_path)
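The loader reads a local file, so a wrong `file_path` is the most likely first failure. Depending on the loader's implementation, a missing file may raise an error at construction or only at load time. A pre-check with `pathlib` reports the problem up front. `pdf_path_problems` is a hypothetical helper, not part of langchain_community.

```python
from pathlib import Path


def pdf_path_problems(file_path: str) -> list[str]:
    """List reasons `file_path` is not usable as a local PDF (an empty list means OK)."""
    path = Path(file_path)
    problems = []
    if not path.is_file():
        problems.append(f"file not found: {path}")
    if path.suffix.lower() != ".pdf":
        problems.append(f"not a .pdf file: {path.name}")
    return problems


file_path = "./example_data/sample_book.pdf"
problems = pdf_path_problems(file_path)
if problems:
    print("Cannot load:", "; ".join(problems))
```

If the list is empty, construct the loader as above with `PyPDFium2TocLoader(file_path)`.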
Load
docs = loader.load()
docs[6]
print(docs[6].metadata)
print(docs[6].page_content)
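`docs[6]` assumes the PDF produced at least seven documents and raises `IndexError` otherwise. The sketch below previews any index safely, using only the `page_content` and `metadata` attributes that LangChain documents expose. `FakeDoc` is a stand-in so the example runs without a PDF, and its `page` metadata key is illustrative only. The keys PyPDFium2TocLoader actually sets may differ.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Stand-in with the same two attributes as a LangChain Document."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def preview(docs, index: int, max_chars: int = 200) -> str:
    """Describe docs[index] without raising, truncating long page content."""
    if not -len(docs) <= index < len(docs):
        return f"Only {len(docs)} documents loaded; index {index} is out of range."
    doc = docs[index]
    return f"metadata: {doc.metadata}\ncontent: {doc.page_content[:max_chars]}"


# Hypothetical sample data; the "page" key is illustrative only.
sample = [FakeDoc("Chapter 1 text", {"page": 0}), FakeDoc("Chapter 2 text", {"page": 1})]
print(preview(sample, 1))
print(preview(sample, 6))
```

With real output, call `print(preview(docs, 6))`.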
Lazy Load
page = []
for doc in loader.lazy_load():
    page.append(doc)
    if len(page) >= 10:
        # do some paged operation, e.g.
        # index.upsert(page)
        page = []
# Documents left over after the loop form a final, partial page:
# if page:
#     index.upsert(page)
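As written, the loop above never processes a final page with fewer than 10 documents. A generator that batches any iterable, including `loader.lazy_load()`, and also yields the trailing partial page avoids that. `pages` is a hypothetical helper name. On Python 3.12+, `itertools.batched` provides similar behavior but yields tuples instead of lists.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def pages(items: Iterable[T], size: int = 10) -> Iterator[List[T]]:
    """Yield consecutive lists of up to `size` items, including a final partial page."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    # islice pulls at most `size` items at a time, so the source stays lazy.
    while batch := list(islice(iterator, size)):
        yield batch


print([len(p) for p in pages(range(23), size=10)])  # → [10, 10, 3]
```

With the loader, this becomes `for page in pages(loader.lazy_load(), size=10): index.upsert(page)`, where `index.upsert` is the placeholder from the original comment.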
API reference
For detailed documentation of all PyPDFium2TocLoader features and configurations head to the API reference: https://python.langchain.com/v0.2/api_reference/community/document_loaders/langchain_community.document_loaders.langchain_pypdfium2_toc_loader.PyPDFium2TocLoader.html
Related
- Document loader conceptual guide
- Document loader how-to guides