```
% python ask.py -i local -c -q "How does Ask.py work?"
2024-11-20 10:00:09,335 - INFO - Initializing converter ...
2024-11-20 10:00:09,335 - INFO - ✅ Successfully initialized Docling.
2024-11-20 10:00:09,335 - INFO - Initializing chunker ...
2024-11-20 10:00:09,550 - INFO - ✅ Successfully initialized Chonkie.
2024-11-20 10:00:09,850 - INFO - Initializing database ...
2024-11-20 10:00:09,933 - INFO - ✅ Successfully initialized DuckDB.
2024-11-20 10:00:09,933 - INFO - Processing the local data directory ...
2024-11-20 10:00:09,933 - INFO - Processing README.pdf ...
Fetching 9 files: 100%|████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 11781.75it/s]
2024-11-20 10:00:29,629 - INFO - ✅ Finished processing README.pdf.
2024-11-20 10:00:29,629 - INFO - Chunking the text ...
2024-11-20 10:00:29,639 - INFO - ✅ Generated 2 chunks ...
2024-11-20 10:00:29,639 - INFO - Saving 2 chunks to DB ...
2024-11-20 10:00:29,681 - INFO - Embedding 1 batches of chunks ...
2024-11-20 10:00:30,337 - INFO - ✅ Finished embedding.
2024-11-20 10:00:30,423 - INFO - ✅ Created the vector index ...
2024-11-20 10:00:30,483 - INFO - ✅ Created the full text search index ...
2024-11-20 10:00:30,483 - INFO - ✅ Successfully embedded and saved chunks to DB.
2024-11-20 10:00:30,483 - INFO - Querying the vector DB to get context ...
2024-11-20 10:00:30,773 - INFO - Running full-text search ...
2024-11-20 10:00:30,796 - INFO - ✅ Got 2 matched chunks.
2024-11-20 10:00:30,797 - INFO - Running inference with context ...
2024-11-20 10:00:34,939 - INFO - ✅ Finished inference API call.
2024-11-20 10:00:34,939 - INFO - Generating output ...
```
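
The log above traces the local ingestion path: Docling converts the PDF to text, Chonkie splits the text into chunks, and the chunks are embedded and stored in DuckDB together with a vector index and a full-text index. The snippet below is a minimal sketch of that flow, not ask.py's actual code; the embedding model, vector dimension, and table layout are assumptions made for illustration.

```python
# Minimal sketch of the ingestion steps reported in the log above.
# Assumptions (not confirmed by the log): OpenAI's text-embedding-3-small
# with 1536 dimensions, and a single-table schema named "chunks".
import duckdb
from chonkie import TokenChunker
from docling.document_converter import DocumentConverter
from openai import OpenAI

# 1. Convert the local PDF to text with Docling.
converter = DocumentConverter()
text = converter.convert("data/README.pdf").document.export_to_markdown()

# 2. Split the text into token-bounded chunks with Chonkie.
chunker = TokenChunker(chunk_size=512, chunk_overlap=64)
chunks = [c.text for c in chunker.chunk(text)]

# 3. Embed all chunks in one batch.
client = OpenAI()
response = client.embeddings.create(model="text-embedding-3-small", input=chunks)
vectors = [item.embedding for item in response.data]

# 4. Store the chunks and embeddings in DuckDB, then build both indexes.
con = duckdb.connect()  # in-memory here; ask.py manages its own DuckDB database
for stmt in ("INSTALL vss", "LOAD vss", "INSTALL fts", "LOAD fts"):
    con.execute(stmt)
con.execute("CREATE TABLE chunks (id INTEGER, doc VARCHAR, chunk VARCHAR, vec FLOAT[1536])")
con.executemany(
    "INSERT INTO chunks VALUES (?, ?, ?, CAST(? AS FLOAT[1536]))",
    [(i, "README.pdf", c, v) for i, (c, v) in enumerate(zip(chunks, vectors))],
)
con.execute("CREATE INDEX chunks_hnsw ON chunks USING HNSW (vec) WITH (metric = 'cosine')")
con.execute("PRAGMA create_fts_index('chunks', 'id', 'chunk')")
```
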
# Answer

Ask.py is a Python program designed to implement a search-extract-summarize flow, similar to AI search engines like Perplexity. It can be run through a command-line interface or a Gradio user interface, and it offers flexible control over output and search behavior[1].

When a query is executed, Ask.py performs the following steps:

1. Searches Google for the top 10 web pages related to the query.
2. Crawls and scrapes the content of these pages.
3. Breaks down the scraped text into chunks and saves them in a vector database.
4. Conducts a vector search with the initial query to identify the top 10 matched text chunks.
5. Optionally integrates full-text search results and uses a reranker to refine the results.
6. Utilizes the selected chunks as context to query a large language model (LLM) to generate a comprehensive answer (steps 4-6 are sketched just after this list).
7. Outputs the answer along with references to the sources[1].
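
Steps 4-6 above are the hybrid retrieval and answering stage that the log reports as "Querying the vector DB", "Running full-text search", and "Running inference with context". Continuing the hypothetical schema from the ingestion sketch earlier (the same `con`, `client`, and `chunks` table), that stage could look roughly like the following; the LLM model name and the simple merge of the two result lists are assumptions, and the optional reranker is omitted.

```python
# Rough sketch of steps 4-6, reusing the DuckDB table from the earlier sketch.
query = "How does Ask.py work?"

# Step 4: vector search - embed the query and rank chunks by cosine similarity.
q_vec = client.embeddings.create(model="text-embedding-3-small", input=[query]).data[0].embedding
vector_hits = con.execute(
    """
    SELECT id, chunk, array_cosine_similarity(vec, CAST(? AS FLOAT[1536])) AS score
    FROM chunks ORDER BY score DESC LIMIT 10
    """,
    [q_vec],
).fetchall()

# Step 5: full-text search using the BM25 index created during ingestion.
fts_hits = con.execute(
    """
    SELECT id, chunk, fts_main_chunks.match_bm25(id, ?) AS score
    FROM chunks WHERE score IS NOT NULL ORDER BY score DESC LIMIT 10
    """,
    [query],
).fetchall()

# Merge the two result lists, dropping duplicates (ask.py can rerank instead).
context = "\n\n".join(dict.fromkeys(row[1] for row in vector_hits + fts_hits))

# Step 6: answer with the retrieved chunks as context (model name assumed).
answer = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Answer the question using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
    ],
).choices[0].message.content
print(answer)
```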

Moreover, the program supports various configuration options such as date restrictions, site targeting, output language, and output length. It can also scrape a specified list of URLs instead of performing a web search, making it highly versatile for search and data-extraction tasks[2].
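
The URL-list mode mentioned above replaces the web search step with a fixed set of pages supplied by the user. As a generic illustration only (the URLs below, and the use of requests and BeautifulSoup, are assumptions rather than ask.py's actual scraper), fetching such a list might look like this; the extracted text would then go through the same chunk, embed, and index flow as the local PDF above.

```python
# Generic sketch of scraping a user-supplied URL list (not ask.py's scraper).
import requests
from bs4 import BeautifulSoup

urls = [  # hypothetical URL list supplied instead of a web search
    "https://example.com/docs/overview",
    "https://example.com/docs/faq",
]
pages = {}
for url in urls:
    html = requests.get(url, timeout=10).text
    # Keep only the visible text; it would then be chunked, embedded, and
    # indexed exactly like the local PDF in the first sketch.
    pages[url] = BeautifulSoup(html, "html.parser").get_text(" ", strip=True)
```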


# References

[1] file:///Users/feng/work/github/ask.py/data/README.pdf
[2] file:///Users/feng/work/github/ask.py/data/README.pdf