chore: Update configuration values for chunk token threshold, overlap rate, and minimum word threshold. Create a new example for LLMExtraction Strategy, update Dockerfile, and README
This commit is contained in:
@@ -21,7 +21,9 @@ PROVIDER_MODELS = {
|
||||
|
||||
|
||||
# Chunk token threshold
|
||||
CHUNK_TOKEN_THRESHOLD = 1000
|
||||
CHUNK_TOKEN_THRESHOLD = 500
|
||||
OVERLAP_RATE = 0.1
|
||||
WORD_TOKEN_RATE = 1.3
|
||||
|
||||
# Threshold for the minimum number of word in a HTML tag to be considered
|
||||
MIN_WORD_THRESHOLD = 5
|
||||
MIN_WORD_THRESHOLD = 1
|
||||
|
||||
Reference in New Issue
Block a user