module Memo::Chunking

Overview

Text chunking for semantic search

Splits large text into semantically meaningful chunks based on configurable limits:

Range-based approach:

Defined in:

memo/chunking.cr

Instance Method Detail

def chunk_text(text : String, config : Config::Chunking) : Array(Tuple(String, Int32, Int32))

Chunks text into segments according to the given configuration.

Returns an array of tuples: {chunk_text, offset, size}

  • chunk_text: Exact slice from original text (text[offset, size])
  • offset: Character position in original text (0-indexed)
  • size: Character length of chunk

SQLite usage: SUBSTR is 1-indexed, so SUBSTR(content, offset + 1, size) returns chunk_text exactly
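
The tuple contract above can be checked with a short sketch. The helper name chunks_consistent? and the sample tuples are hypothetical, written by hand to stand in for real chunk_text output. Only String#[](start, count) from the Crystal standard library is assumed.

```crystal
# Hypothetical helper, not part of Memo::Chunking: returns true when every
# {chunk_text, offset, size} tuple is an exact slice of the source text.
def chunks_consistent?(text : String, chunks : Array(Tuple(String, Int32, Int32))) : Bool
  chunks.all? do |entry|
    chunk, offset, size = entry
    # String#[](start, count) counts characters, matching the documented units.
    text[offset, size] == chunk
  end
end

text = "First paragraph.\n\nSecond paragraph."

# Hand-written tuples standing in for the return value of chunk_text.
chunks = [
  {"First paragraph.", 0, 16},
  {"Second paragraph.", 18, 17},
]

puts chunks_consistent?(text, chunks)

# The equivalent SQLite expression shifts the offset by one,
# because SUBSTR is 1-indexed: SUBSTR(content, 18 + 1, 17)
```

Note that SQLite's SUBSTR counts characters only for TEXT values. For BLOB values it counts bytes, so the equivalence presumably relies on content being stored as TEXT.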

def estimate_tokens(text : String, tokens_per_byte : Float64 = 0.25) : Int32

Estimates the token count using the tokens_per_byte ratio (the default of 0.25 corresponds to about four bytes per token)
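
A minimal sketch of a ratio-based estimate follows. It is not the library's implementation: the name estimate_tokens_sketch is hypothetical, and both the use of String#bytesize and the round-up step are assumptions, since the source does not state how the result is rounded.

```crystal
# Hypothetical stand-in for estimate_tokens, illustrating the ratio only.
# Assumptions: the byte count comes from String#bytesize, and the result
# is rounded up. The real method may round differently.
def estimate_tokens_sketch(text : String, tokens_per_byte : Float64 = 0.25) : Int32
  (text.bytesize * tokens_per_byte).ceil.to_i
end

ascii = "hello world" # 11 bytes
accented = "héllo"    # 5 characters, 6 bytes in UTF-8

puts estimate_tokens_sketch(ascii)
puts estimate_tokens_sketch(accented)
puts estimate_tokens_sketch(ascii, 0.5)
```

Because the ratio is per byte, multi-byte UTF-8 text gives a higher estimate than its character count alone would suggest.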
