Master's IR (Information Retrieval) project: indexing the Stack Overflow dataset with Lucene
The Stack Overflow dataset is larger than 30 GB.
Step 1. Parse the Posts.xml file with FinalXMLParser (a sample of Posts is included). It produces a txt file as output.
Step 2. Index the txt file with Indexer.
Step 3. Run Tester, which uses Searcher to search the index.
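
Because the dump is far too large to load into memory, step 1 needs a streaming parser. The sketch below shows one way to do that with the JDK's StAX API. It is not the actual FinalXMLParser code: the class name PostsXmlSketch, the convert and clean helpers, and the tab-separated output layout (Id, Title, Body) are assumptions made for illustration. It relies only on Posts.xml storing each post as a <row> element with Id, Title and Body attributes.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical stand-in for FinalXMLParser: streams Posts.xml row by row.
class PostsXmlSketch {

    // Collapse whitespace so each post fits on one tab-separated line.
    static String clean(String s) {
        return s == null ? "" : s.replaceAll("\\s+", " ").trim();
    }

    // Reads <row .../> elements and writes "Id<TAB>Title<TAB>Body" per post.
    // Returns the number of posts written. The output layout is an assumption.
    static int convert(Reader xml, Writer out) throws XMLStreamException, IOException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // The dump needs no DTDs or external entities, so turn them off.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        XMLStreamReader reader = factory.createXMLStreamReader(xml);
        int count = 0;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "row".equals(reader.getLocalName())) {
                    String id = clean(reader.getAttributeValue(null, "Id"));
                    // Answers have no Title attribute; clean() maps null to "".
                    String title = clean(reader.getAttributeValue(null, "Title"));
                    String body = clean(reader.getAttributeValue(null, "Body"));
                    out.write(id + "\t" + title + "\t" + body + "\n");
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // Tiny inline sample in the same shape as Posts.xml.
        String sample = "<posts>"
                + "<row Id=\"1\" PostTypeId=\"1\" Title=\"How to index with Lucene?\" Body=\"Some question text\" />"
                + "<row Id=\"2\" PostTypeId=\"2\" Body=\"Some answer text\" />"
                + "</posts>";
        StringWriter out = new StringWriter();
        int n = convert(new StringReader(sample), out);
        System.out.print(out);
        System.out.println(n + " posts written");
    }
}
```

For the real dump, pass a buffered file reader and a buffered file writer to convert instead of the in-memory strings. Only one row is held in memory at a time, so the 30 GB input size does not matter. The resulting txt file can then be fed to Indexer in step 2.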