Not getting great results, any suggestions?

#5
by mattpryor - opened

Hi, here are the results of a user query and a couple of test sentences in my project:

query = "What did Donald A Wollheim do in 1937?"

0.3258288502693176 (1873-1962) US journalist and author, whose The Great War of 1938 (September 1918 Everybody's Magazine; 1918 chap) predicts
with unusual accuracy the onset of World War Two, though its propagandist thesis for readers in September 1918
– that Germany would take advantage of any weak peace terms laid down after its coming defeat in World War One –
was very wide of the mark.
0.2392204999923706 The commonly used acronym for the Fantasy Amateur Press Association, formed in 1937 in the USA by John B Michel and Donald A Wollheim
to facilitate distribution on an APA basis of Fanzines published by and for members; it was the first of many such groups in sf Fandom
(created in the pattern of older non-fan "amateur journalism" or "ajay" APAs which the founders had learned of from H P Lovecraft),
and still continues.

Hello, I understand your pain; I had the same problem a couple of months ago.
You should try different ways of chunking your text; it's important to split the information in a way that fits your use case. If you have many long paragraphs and are searching for a specific one, embed each sentence separately and index it to its parent paragraph. When you run the query, the similarity scores should improve because you are now searching at a more granular level. Afterwards, you only need to retrieve the paragraph indexed to the best-matching sentence. This is just one strategy; there are plenty more, such as agentic chunking.
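
A minimal sketch of the sentence-to-paragraph strategy described above, in case it helps. Everything in it is an assumption for illustration: `toy_embed` is a bag-of-words stand-in so the snippet runs without downloading a model (swap in your real embedding function), the sentence splitter is deliberately naive, and the two paragraphs are shortened examples rather than the original corpus.

```python
import math
import re
from collections import Counter


def split_sentences(paragraph):
    # Naive splitter: break on whitespace that follows ., ! or ?
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def toy_embed(text):
    # Hypothetical stand-in for a real embedding model: word-count vector.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(paragraphs, embed=toy_embed):
    # One entry per sentence, each pointing back to its parent paragraph id.
    index = []
    for pid, para in enumerate(paragraphs):
        for sentence in split_sentences(para):
            index.append((embed(sentence), pid))
    return index


def search(query, index, paragraphs, embed=toy_embed, top_k=1):
    # Score every sentence, keep the best sentence score per paragraph,
    # then return the parent paragraphs ranked by that score.
    q = embed(query)
    best = {}
    for vec, pid in index:
        score = cosine(q, vec)
        if score > best.get(pid, -1.0):
            best[pid] = score
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    return [(score, paragraphs[pid]) for pid, score in ranked]


paragraphs = [
    "FAPA was formed in 1937 by John B Michel and Donald A Wollheim. "
    "It still continues.",
    "A journalist wrote The Great War of 1938 in 1918. "
    "Its thesis was wide of the mark.",
]
index = build_index(paragraphs)
for score, para in search("What did Donald A Wollheim do in 1937?", index, paragraphs):
    print(round(score, 3), para)
```

The point of the design is that the query is compared against short, focused sentences, while the caller still gets back the full paragraph for context.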
Hope this helped,
Yuriy
