Strategy for generating deterministic document IDs.

Values:
    URL: hash(page_url + chunk_index) - stable across re-crawls
    POSITION: hash(seed_url + page_index + chunk_index) - order-based
    CONTENT: hash(content) - deduplicates identical content
- URL (value: 'url')
- POSITION (value: 'position')
- CONTENT (value: 'content')
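
The three strategies can be illustrated with a minimal sketch. The enum name `IdStrategy`, the helper `make_doc_id`, the use of SHA-256, and the separator between key parts are all assumptions made for illustration; the actual library may use a different hash function or concatenation scheme.

```python
import hashlib
from enum import Enum


class IdStrategy(Enum):
    """Hypothetical stand-in for the enum documented above."""
    URL = "url"
    POSITION = "position"
    CONTENT = "content"


def make_doc_id(strategy, *, content, page_url="", seed_url="",
                page_index=0, chunk_index=0):
    """Return a deterministic hex ID for one chunk (illustrative helper)."""
    if strategy is IdStrategy.URL:
        # Depends only on where the chunk lives, so re-crawling the same
        # page yields the same ID even if the text has changed.
        parts = (page_url, str(chunk_index))
    elif strategy is IdStrategy.POSITION:
        # Depends on crawl order relative to the seed URL.
        parts = (seed_url, str(page_index), str(chunk_index))
    else:
        # Depends only on the text, so identical chunks share one ID.
        parts = (content,)
    # Assumption: a separator keeps ("a", "12") distinct from ("a1", "2").
    key = "\x1f".join(parts)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()
```

Under this sketch, `make_doc_id(IdStrategy.CONTENT, content="same text")` returns the same ID no matter which page the text came from, whereas the URL strategy returns the same ID for a given page and chunk index even after the page content is edited.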