
PyPDFium2TocLoader

This notebook provides a quick overview for getting started with the PyPDFium2Toc document loader. For detailed documentation of all PyPDFium2TocLoader features and configurations, head to the API reference.

Overview

Integration details

| Class | Package | Local | Serializable | JS support |
| :--- | :--- | :---: | :---: | :---: |
| PyPDFium2TocLoader | langchain_community | ✅ | ❌ | ❌ |

Loader features

| Source | Document Lazy Loading | Native Async Support |
| :--- | :---: | :---: |
| PyPDFium2TocLoader | ✅ | ❌ |
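The table marks native async support as ❌, which means the loader parses synchronously. If you need to consume it from async code, one option is to run each step of the synchronous iterator in a worker thread so the event loop is not blocked. The sketch below uses only the standard library; `iterate_in_thread` and `fake_lazy_load` are hypothetical names, and `fake_lazy_load` stands in for `loader.lazy_load()`. If the loader subclasses LangChain's `BaseLoader`, its default `alazy_load` method takes a similar executor-based approach, so check the API reference before rolling your own.

```python
import asyncio
from typing import AsyncIterator, Iterator, TypeVar

T = TypeVar("T")

# Unique marker returned by next() once the iterator is exhausted.
_SENTINEL = object()


async def iterate_in_thread(sync_iter: Iterator[T]) -> AsyncIterator[T]:
    """Yield items from a blocking iterator without blocking the event loop.

    Each next() call runs in a worker thread via asyncio.to_thread, so the
    loop stays free while the next item is being produced.
    """
    while True:
        item = await asyncio.to_thread(next, sync_iter, _SENTINEL)
        if item is _SENTINEL:
            return
        yield item


def fake_lazy_load() -> Iterator[str]:
    # Hypothetical stand-in for loader.lazy_load(); yields page strings.
    for i in range(3):
        yield f"page {i}"


async def main() -> list[str]:
    pages = []
    async for page in iterate_in_thread(fake_lazy_load()):
        pages.append(page)
    return pages


if __name__ == "__main__":
    print(asyncio.run(main()))  # ['page 0', 'page 1', 'page 2']
```

Because the worker threads call `next()` one at a time, the underlying generator is never resumed concurrently.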

Setup

To access the PyPDFium2Toc document loader, you'll need to install the langchain-community integration package.

Credentials

No credentials are needed.

If you want automated tracing of your runs with LangSmith, set your LangSmith API key by uncommenting the lines below:

```python
# import getpass
# import os

# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"
```
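The commented lines above prompt for the key every time they run. A common variation reuses a key that is already set in the environment and prompts only when it is missing. This is a standard-library sketch; `enable_langsmith_tracing` is our own helper name, not part of LangChain or LangSmith.

```python
import getpass
import os


def enable_langsmith_tracing(prompt: bool = True) -> bool:
    """Enable LangSmith tracing, prompting for the API key only if it is unset.

    Returns True if a non-empty key is available and tracing was switched on.
    """
    key = os.environ.get("LANGSMITH_API_KEY")
    if not key and prompt:
        key = getpass.getpass("Enter your LangSmith API key: ")
    if not key:
        return False  # No key available: leave tracing off.
    os.environ["LANGSMITH_API_KEY"] = key
    os.environ["LANGSMITH_TRACING"] = "true"
    return True
```

Call `enable_langsmith_tracing()` once near the top of the notebook; in non-interactive jobs, pass `prompt=False` so a missing key simply leaves tracing disabled.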

Installation

Install the langchain_community package:

```python
%pip install -qU langchain_community
```
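To confirm the install worked (for example, after restarting the kernel), you can query the installed version with the standard library's `importlib.metadata`. This is a generic sketch; `installed_version` is our own helper name, not part of LangChain.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the installed version string, or None if the package is not installed.
print(installed_version("langchain-community"))
```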

Initialization

Now we can instantiate the loader and load documents:

```python
from langchain_community.document_loaders import PyPDFium2TocLoader

file_path = "./example_data/sample_book.pdf"

loader = PyPDFium2TocLoader(file_path)
```
Load
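```python
# load() reads the whole file eagerly and returns a list of Documents.
docs = loader.load()

# Inspect the document at index 6 (assumes the sample PDF yields at least seven).
print(docs[6].metadata)
print(docs[6].page_content)
```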
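We have not confirmed exactly how PyPDFium2TocLoader splits a file. Assuming, as its name suggests, that it uses the PDF's table of contents (outline) to cut the document into sections, the idea can be sketched with the standard library alone. `Section`, `split_by_toc`, and the metadata keys below are hypothetical and need not match the loader's real output; print `docs[6].metadata` to see what it actually returns.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    # Hypothetical stand-in for a LangChain Document: text plus metadata.
    page_content: str
    metadata: dict = field(default_factory=dict)


def split_by_toc(pages: list[str], toc: list[tuple[str, int]]) -> list[Section]:
    """Group page texts into one Section per TOC entry.

    `toc` holds (title, start_page_index) pairs sorted by distinct start pages.
    Each section runs from its start page up to, but not including, the next
    entry's start page; the last section runs to the end of the document.
    Pages before the first entry (e.g. a cover) are not included anywhere.
    """
    sections = []
    for i, (title, start) in enumerate(toc):
        end = toc[i + 1][1] if i + 1 < len(toc) else len(pages)
        text = "\n".join(pages[start:end])
        sections.append(
            Section(text, {"title": title, "start_page": start, "end_page": end - 1})
        )
    return sections


pages = ["Cover", "Intro text", "More intro", "Chapter 1 text"]
toc = [("Introduction", 1), ("Chapter 1", 3)]
sections = split_by_toc(pages, toc)
print(sections[0].metadata)  # {'title': 'Introduction', 'start_page': 1, 'end_page': 2}
print(sections[1].page_content)  # Chapter 1 text
```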

Lazy Load

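```python
page = []
for doc in loader.lazy_load():
    page.append(doc)
    if len(page) >= 10:
        # Do some paged operation, e.g.
        # index.upsert(page)
        page = []

# Handle the final page, which may hold fewer than 10 documents.
if page:
    # index.upsert(page)
    pass
```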
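The paging pattern above can also be factored into a small reusable helper that never holds more than one batch in memory. This sketch uses only the standard library; `batched` is our own helper name (Python 3.12+ ships a similar `itertools.batched` that yields tuples), and the generator below stands in for `loader.lazy_load()`.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield lists of up to `size` items, pulling lazily from `items`.

    The final batch may be shorter than `size`.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


# Hypothetical stand-in for loader.lazy_load(): 23 fake "documents".
fake_docs = (f"doc {i}" for i in range(23))
print([len(page) for page in batched(fake_docs, 10)])  # [10, 10, 3]
```

With the real loader, `for page in batched(loader.lazy_load(), 10): ...` replaces the manual counter and handles the trailing partial page automatically.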


API reference

For detailed documentation of all PyPDFium2TocLoader features and configurations head to the API reference: https://python.langchain.com/v0.2/api_reference/community/document_loaders/langchain_community.document_loaders.langchain_pypdfium2_toc_loader.PyPDFium2TocLoader.html

