A document retriever that splits input documents into smaller chunks while storing the original documents separately. The small chunks are embedded; at query time, the retriever matches against the chunks but returns the original "parent" documents they came from.

This balances the more precisely targeted retrieval that small chunks allow against the richer context that larger documents provide.
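The trade-off described above can be sketched without any library. This is a minimal illustration, not the library's implementation: `ToyParentRetriever`, `splitIntoChunks`, and the `Doc` shape are hypothetical names, and a case-insensitive substring match stands in for embedding similarity.

```typescript
// Library-free sketch of parent-document retrieval (all names are hypothetical).
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Split text into fixed-size chunks with no overlap.
function splitIntoChunks(text: string, chunkSize: number): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += chunkSize) {
    chunks.push(text.slice(i, i + chunkSize));
  }
  return chunks;
}

class ToyParentRetriever {
  private parents = new Map<string, Doc>(); // stands in for the docstore
  private children: Doc[] = []; // stands in for the vector store
  private chunkSize: number;
  private idKey: string;

  constructor(chunkSize: number, idKey = "doc_id") {
    this.chunkSize = chunkSize;
    this.idKey = idKey;
  }

  // Store each parent whole, and index its chunks tagged with the parent id.
  addDocuments(docs: Doc[]): void {
    for (const doc of docs) {
      const id = `parent-${this.parents.size}`;
      this.parents.set(id, doc);
      for (const chunk of splitIntoChunks(doc.pageContent, this.chunkSize)) {
        this.children.push({ pageContent: chunk, metadata: { [this.idKey]: id } });
      }
    }
  }

  // Match against the small chunks, then return the distinct parents they point to.
  retrieve(query: string): Doc[] {
    const needle = query.toLowerCase();
    const ids: string[] = [];
    for (const child of this.children) {
      const id = child.metadata[this.idKey] as string;
      if (child.pageContent.toLowerCase().indexOf(needle) !== -1 && ids.indexOf(id) === -1) {
        ids.push(id);
      }
    }
    return ids.map((id) => this.parents.get(id) as Doc);
  }
}
```

Several chunks of the same parent can match one query, which is why the sketch de-duplicates parent ids before looking the parents up.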

Example

const retriever = new ParentDocumentRetriever({
  vectorstore: new MemoryVectorStore(new OpenAIEmbeddings()),
  docstore: new InMemoryStore(),
  parentSplitter: new RecursiveCharacterTextSplitter({
    chunkOverlap: 0,
    chunkSize: 500,
  }),
  childSplitter: new RecursiveCharacterTextSplitter({
    chunkOverlap: 0,
    chunkSize: 50,
  }),
  childK: 20,
  parentK: 5,
});

// getDocuments() is not defined in this snippet; it stands for any function that returns the documents to index.
const parentDocuments = await getDocuments();
await retriever.addDocuments(parentDocuments);
const retrievedDocs = await retriever.getRelevantDocuments("justice breyer");

Properties

childDocumentRetriever: undefined | VectorStoreRetriever<VectorStore>
docstore: BaseStoreInterface<string, Document<Record<string, any>>>
vectorstore: VectorStore
callbacks?: Callbacks
metadata?: Record<string, unknown>
tags?: string[]
verbose?: boolean
childSplitter: TextSplitter
idKey: string = "doc_id"
childK?: number
parentK?: number
parentSplitter?: TextSplitter
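
The property list gives no description for `childK` and `parentK`. One plausible reading, taken here purely as an assumption, is that `childK` caps how many child chunks are fetched and `parentK` caps how many distinct parents are returned. The helper below, `selectParentIds`, is hypothetical and only illustrates that reading.

```typescript
// Hypothetical helper illustrating an assumed meaning of childK and parentK.
interface ChildHit {
  metadata: Record<string, unknown>;
}

function selectParentIds(
  rankedChildren: ChildHit[], // child chunks, best match first
  idKey: string, // metadata key holding the parent id, e.g. "doc_id"
  childK?: number,
  parentK?: number
): string[] {
  // Assumption: childK limits the number of child chunks considered.
  const limited = childK === undefined ? rankedChildren : rankedChildren.slice(0, childK);
  const ids: string[] = [];
  for (const child of limited) {
    const id = child.metadata[idKey];
    // Keep each parent id once, preserving the ranking order.
    if (typeof id === "string" && ids.indexOf(id) === -1) {
      ids.push(id);
    }
  }
  // Assumption: parentK limits the number of parents returned.
  return parentK === undefined ? ids : ids.slice(0, parentK);
}
```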

Methods

  • Adds documents to the docstore and the vector store. If a retriever is provided, it is used to add the documents instead of the vector store.

    Parameters

    • docs: Document<Record<string, any>>[]

      The documents to add

    • Optional config: {
          addToDocstore?: boolean;
          ids?: string[];
      }
      • Optional addToDocstore?: boolean

        Whether to add the documents to the docstore. This can be false if and only if ids are provided. You may want to set this to false if the documents are already in the docstore and you don't want to re-add them.

      • Optional ids?: string[]

        Optional list of ids for the documents. If provided, it should be the same length as the list of documents. Can be provided if the parent documents are already in the document store and you don't want to re-add them to the docstore. If not provided, random UUIDs are used as ids.

    Returns Promise<void>
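
The rules stated above for `ids` and `addToDocstore` can be expressed as a small validation step. This is a sketch under assumptions, not the library's code: `resolveParentIds` and `pseudoUuid` are hypothetical names, the default of `true` for `addToDocstore` is assumed, and the id generator is a non-cryptographic stand-in for a real UUID source.

```typescript
// Illustrative, non-cryptographic stand-in for a UUID v4 generator.
function pseudoUuid(): string {
  return "xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx".replace(/[xy]/g, (c) => {
    const r = Math.floor(Math.random() * 16);
    return (c === "x" ? r : (r & 0x3) | 0x8).toString(16);
  });
}

// Hypothetical helper applying the documented rules for `ids` and `addToDocstore`.
function resolveParentIds(
  docCount: number,
  config: { ids?: string[]; addToDocstore?: boolean } = {}
): { ids: string[]; addToDocstore: boolean } {
  // Assumption: documents are added to the docstore unless told otherwise.
  const addToDocstore = config.addToDocstore ?? true;

  if (config.ids === undefined) {
    // addToDocstore may be false only when ids are provided.
    if (!addToDocstore) {
      throw new Error("addToDocstore can only be false when ids are provided");
    }
    const ids: string[] = [];
    for (let i = 0; i < docCount; i += 1) {
      ids.push(pseudoUuid());
    }
    return { ids, addToDocstore };
  }

  // Provided ids must line up one-to-one with the documents.
  if (config.ids.length !== docCount) {
    throw new Error(`Got ${config.ids.length} ids for ${docCount} documents`);
  }
  return { ids: config.ids, addToDocstore };
}
```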

  • Default implementation of batch, which calls invoke N times. Subclasses should override this method if they can batch more efficiently.

    Parameters

    • inputs: string[]

      Array of inputs to each batch call.

    • Optional options: Partial<BaseCallbackConfig> | Partial<BaseCallbackConfig>[]

      Either a single call options object to apply to each batch call or an array for each call.

    • Optional batchOptions: RunnableBatchOptions & {
          returnExceptions?: false;
      }

    Returns Promise<Document<Record<string, any>>[][]>

    An array of RunOutputs, or a mix of RunOutputs and errors if batchOptions.returnExceptions is set.

  • Parameters

    Returns Promise<(Error | Document<Record<string, any>>[])[]>

  • Parameters

    Returns Promise<(Error | Document<Record<string, any>>[])[]>
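
The default batch behaviour described above (one invoke call per input, with errors optionally returned instead of thrown) can be sketched as follows. `defaultBatch` is a hypothetical free function written for illustration; running the calls concurrently with `Promise.all` is an assumption, and the real method's concurrency handling is not shown in this reference.

```typescript
type Invoke<I, O> = (input: I) => Promise<O>;

// Hypothetical sketch: call `invoke` once per input and collect the results in order.
async function defaultBatch<I, O>(
  invoke: Invoke<I, O>,
  inputs: I[],
  returnExceptions = false
): Promise<(O | Error)[]> {
  return Promise.all(
    inputs.map(async (input): Promise<O | Error> => {
      try {
        return await invoke(input);
      } catch (e) {
        // Without returnExceptions, the first failure rejects the whole batch.
        if (!returnExceptions) throw e;
        // With returnExceptions, the error takes the failed call's slot.
        return e instanceof Error ? e : new Error(String(e));
      }
    })
  );
}
```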

  • Main method used to retrieve relevant documents. It takes a query string and an optional configuration object, and returns a promise that resolves to an array of Document objects. This method handles the retrieval process, including starting and ending callbacks, and error handling.

    Parameters

    • query: string

      The query string to retrieve relevant documents for.

    • Optional config: Callbacks | BaseCallbackConfig

      Optional configuration object for the retrieval process.

    Returns Promise<Document<Record<string, any>>[]>

    A promise that resolves to an array of Document objects.
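
The start, end, and error handling mentioned in this method's description follows a common wrapper pattern, sketched below. The names `RetrievalCallbacks`, `onStart`, `onEnd`, `onError`, and `retrieveWithCallbacks` are hypothetical and do not correspond to the library's callback API.

```typescript
// Hypothetical callback shape, for illustration only.
interface RetrievalCallbacks<T> {
  onStart?: (query: string) => void;
  onEnd?: (result: T) => void;
  onError?: (error: unknown) => void;
}

// Wrap a retrieval function so that callbacks fire around it.
async function retrieveWithCallbacks<T>(
  query: string,
  retrieve: (q: string) => Promise<T>,
  callbacks: RetrievalCallbacks<T> = {}
): Promise<T> {
  callbacks.onStart?.(query);
  let result: T;
  try {
    result = await retrieve(query);
  } catch (error) {
    // Report the failure, then let it propagate to the caller.
    callbacks.onError?.(error);
    throw error;
  }
  callbacks.onEnd?.(result);
  return result;
}
```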

  • Create a new runnable sequence that runs each individual runnable in series, piping the output of one runnable into another runnable or runnable-like.

    Type Parameters

    • NewRunOutput

    Parameters

    • coerceable: RunnableLike<Document<Record<string, any>>[], NewRunOutput>

      A runnable, function, or object whose values are functions or runnables.

    Returns RunnableSequence<string, Exclude<NewRunOutput, Error>>

    A new runnable sequence.
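
Piping the output of one step into the next, as described above, amounts to function composition over possibly asynchronous steps. The two-step `pipe` helper below is a hypothetical, library-free sketch of that idea and is not the method documented here.

```typescript
// A step maps an input to an output, synchronously or asynchronously.
type Step<I, O> = (input: I) => O | Promise<O>;

// Hypothetical helper: run `first`, then feed its output to `second`.
function pipe<A, B, C>(first: Step<A, B>, second: Step<B, C>): (input: A) => Promise<C> {
  return async (input: A): Promise<C> => second(await first(input));
}
```

For example, a step that returns documents for a query could be piped into a step that formats them into a single string.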

  • Stream output in chunks.

    Parameters

    Returns Promise<IterableReadableStream<Document<Record<string, any>>[]>>

    A readable stream that is also an iterable.

  • Stream all output from a runnable, as reported to the callback system. This includes all inner runs of LLMs, Retrievers, Tools, etc. Output is streamed as Log objects, which include a list of jsonpatch ops that describe how the state of the run has changed in each step, and the final state of the run. The jsonpatch ops can be applied in order to construct state.

    Parameters

    • input: string
    • Optional options: Partial<BaseCallbackConfig>
    • Optional streamOptions: Omit<LogStreamCallbackHandlerInput, "autoClose">

    Returns AsyncGenerator<RunLogPatch, any, unknown>
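
Applying JSON Patch operations in order to rebuild state, as described above, can be sketched with a deliberately small applier. `applyOps` is a hypothetical helper that supports only `add` and `replace` on nested object keys and arrays, ignores the `~0`/`~1` escapes, and does not handle the root path; the example paths in the comments are illustrative, not the actual log schema.

```typescript
interface PatchOp {
  op: "add" | "replace";
  path: string; // e.g. "/logs/retriever" or "/streamed_output/-" (illustrative)
  value: unknown;
}

type State = Record<string, unknown>;

// Hypothetical helper: apply a subset of JSON Patch to a copy of `state`.
function applyOps(state: State, ops: PatchOp[]): State {
  const next: State = JSON.parse(JSON.stringify(state)); // work on a deep copy
  for (const { op, path, value } of ops) {
    const keys = path.split("/").slice(1);
    let target: any = next;
    // Walk down to the container that holds the final key.
    for (const key of keys.slice(0, -1)) {
      target = target[key];
    }
    const last = keys[keys.length - 1];
    if (Array.isArray(target)) {
      if (last === "-") target.push(value); // "-" appends to the array
      else if (op === "add") target.splice(Number(last), 0, value); // insert at index
      else target[Number(last)] = value; // replace at index
    } else {
      target[last] = value;
    }
  }
  return next;
}
```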

  • Default implementation of transform, which buffers input and then calls stream. Subclasses should override this method if they can start producing output while input is still being generated.

    Parameters

    Returns AsyncGenerator<Document<Record<string, any>>[], any, unknown>
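
The buffer-then-stream behaviour described above can be sketched as an async generator. `defaultTransform`, its `combine` parameter, and the `stream` callback are hypothetical names; how the real method merges buffered input chunks is not documented here, so the caller supplies that step.

```typescript
// Hypothetical sketch: consume all input chunks first, then stream the output.
async function* defaultTransform<I, O>(
  inputChunks: AsyncIterable<I>,
  combine: (chunks: I[]) => I, // assumption: caller decides how chunks merge
  stream: (input: I) => AsyncIterable<O>
): AsyncGenerator<O> {
  const buffered: I[] = [];
  for await (const chunk of inputChunks) {
    buffered.push(chunk);
  }
  // No output is produced until the input is exhausted.
  yield* stream(combine(buffered));
}
```

An override that can begin emitting output before the input is complete would avoid the buffering loop.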
