Product development companies collect data in the form of Engineering Change Requests for logged design issues and Design Guidelines to accumulate best practices. These documents are rich in unstructured data (e.g., free text), and previous research has pointed out that product developers find current IT systems lacking in capabilities to accurately retrieve relevant documents with unstructured data. In this research, we compare the performance of search engine and Natural Language Processing algorithms for quickly finding related documents in two databases containing Engineering Change Request and Design Guideline documents. The aim is to turn hours of manual document searching into seconds by using such algorithms to search effectively for related engineering documents and rank them in order of significance. Domain knowledge experts evaluated the results, which show that the models applied found relevant documents with up to 90% accuracy in the cases tested. However, accuracy varies with the selected algorithm and the length of the query.
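To make the idea of ranking related documents by significance concrete, the following is a minimal sketch that assumes a TF-IDF weighting with cosine similarity. The abstract does not name the algorithms that were compared, so this choice, the function names `tokenize` and `rank_documents`, and the sample texts are all illustrative assumptions rather than the authors' method.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_documents(query, documents):
    """Return (index, score) pairs sorted by descending cosine similarity.

    Assumption: TF-IDF weighting, used here only for illustration.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(documents)

    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}

    def vectorize(tokens):
        # Terms unseen in the collection are ignored.
        counts = Counter(t for t in tokens if t in idf)
        return {t: c * idf[t] for t, c in counts.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        if norm_a == 0.0 or norm_b == 0.0:
            return 0.0
        return dot / (norm_a * norm_b)

    query_vec = vectorize(tokenize(query))
    scores = [(i, cosine(query_vec, vectorize(toks))) for i, toks in enumerate(tokenized)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical free-text entries standing in for change requests and guidelines.
docs = [
    "Bracket cracks near weld seam under vibration load",
    "Guideline: minimum bend radius for sheet metal parts",
    "Weld seam porosity found on exhaust bracket",
]
ranking = rank_documents("cracked weld on bracket", docs)
for index, score in ranking:
    print(f"{score:.3f}  {docs[index]}")
```

In this sketch the query term "cracked" does not match "cracks" because no stemming is applied, which hints at one reason retrieval quality can depend on the algorithm chosen and on how many terms the query contains.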
Product development companies collect data in the form of Engineering Change Requests for logged design issues, tests, and product iterations. These documents are rich in unstructured data (e.g., free text). Previous research affirms that product developers find that current IT systems lack the capabilities to accurately retrieve relevant documents with unstructured data. In this research, we demonstrate a method that uses Natural Language Processing and document clustering algorithms to find structurally or contextually related documents in databases containing Engineering Change Request documents. The aim is to radically decrease the time needed to search effectively for related engineering documents, organize search results, and create labeled clusters from these documents. A domain knowledge expert at the case company evaluated the results and confirmed that the algorithms we applied found relevant document clusters for the queries tested.