SummPy: development of an optimized summarization framework for thesis into structured IMRaD format utilizing the BART large language model and TextRank algorithms for enhanced academic writing and research efficiency/
Armand Angelo C. Barrios, Sophia Mer C. Enriquez, Almira Jill O. Garcia, Janna Rose V. Herrera, and Andrew R. Oloroso.--
- Manila: Technological University of the Philippines, 2025.
- ix, 142 pages: 29 cm.
Bachelor's thesis
College of Science.--
Includes bibliographical references and index.
This study presents SummPy, a web-based application that generates structured summaries of academic theses in the IMRaD format (Introduction, Methods, Results, and Discussion). With the increasing volume and complexity of academic research, students and professionals often struggle to digest and summarize lengthy documents efficiently. SummPy addresses this by combining Natural Language Processing (NLP) techniques with a Large Language Model (LLM): TextRank for extractive summarization and BART for abstractive summarization. The system processes PDF files uploaded by users, uses hierarchical multi-threading for efficiency, and evaluates the coherence and accuracy of the generated summaries using DistilBERT for semantic similarity analysis. The project followed a systematic methodology involving document segmentation, parallel processing, summarization, and evaluation. Results demonstrate that SummPy produces accurate, well-structured summaries that preserve the context and key insights of the original research. The tool significantly reduces the time and effort required for manual summarization, offering a practical solution for academic and professional use. Evaluation metrics indicate high usability, security, and portability, affirming its adaptability across various environments. In conclusion, SummPy is a valuable resource for enhancing academic productivity, understanding, and knowledge dissemination.
IMRaD format
Abstractive summarization
Semantic similarity analysis
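
The abstract describes segmenting a thesis into sections, ranking sentences with TextRank, and processing sections in parallel. A minimal sketch of that extractive step is given below. It is not the thesis's implementation: the function names, the naive sentence splitter, the word-overlap similarity, and the single-level thread pool are all assumptions made for illustration. The BART abstractive step and the DistilBERT evaluation are omitted.

```python
import math
import re
from concurrent.futures import ThreadPoolExecutor


def split_sentences(text):
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    """Word-overlap similarity normalized by log sentence lengths."""
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    if len(wa) < 2 or len(wb) < 2:
        return 0.0  # avoids log(1) = 0 in the denominator
    return len(wa & wb) / (math.log(len(wa)) + math.log(len(wb)))


def textrank(text, top_n=2, damping=0.85, iterations=50):
    """Return the top_n highest-ranked sentences in their original order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= top_n:
        return sentences
    # Weighted, undirected sentence graph.
    weights = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sums = [sum(row) for row in weights]
    scores = [1.0] * n
    # Power iteration of the weighted PageRank update.
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n) if out_sums[j] > 0)
            for i in range(n)
        ]
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(ranked)]


def summarize_sections(sections, top_n=2):
    """Summarize each section (heading -> text) in its own worker thread."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(textrank, body, top_n)
                   for name, body in sections.items()}
        return {name: " ".join(f.result()) for name, f in futures.items()}
```

Calling `summarize_sections({"Introduction": intro_text, "Methods": methods_text})` returns one extractive summary per heading. In a fuller pipeline, each extract could then be passed to an abstractive model such as BART.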