Purpose - The purpose of this paper is to develop a novel feature selection (FS) approach for automatic text classification of large digital documents (e-books) in an online library system. The main idea is to identify discourse features automatically in order to improve the feature selection process, rather than focusing on the size of the corpus.
Design/methodology/approach - The proposed framework automatically identifies the discourse segments within e-books, captures the discourse subtopics that are cohesively expressed in those segments, and treats these subtopics as informative and prominent features. The selected set of features is then used to train a support vector machine that performs the e-book classification task.
Findings - The evaluation of the proposed framework shows that identifying discourse segments and capturing subtopic features leads to better performance than two conventional feature selection techniques: TFIDF and mutual information. It also demonstrates that discourse features play an important role among textual features, especially for large documents such as e-books.
Research limitations/implications - Automatically extracted subtopic features cannot be entered directly into the FS process; they must first be filtered by a controlled threshold.
Practical implications - The proposed technique demonstrates a promising application of discourse analysis to enhance the classification of large digital documents (e-books), compared with conventional techniques.
Originality/value - A new FS technique is proposed that can inspect the narrative structure of large documents, which is new to the text classification domain. The other contribution is that it encourages the consideration of discourse information in future text analysis by providing further evidence through the evaluation of the results.
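For readers unfamiliar with the mutual information baseline named in the findings, the following is a minimal sketch of how such a score can be computed and thresholded. It is not the framework proposed in this paper. The function names `mutual_information` and `select_features`, the toy documents, the whitespace tokenization, and the threshold of 0.5 bits are all assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(docs, labels):
    """Score each term by the mutual information (in bits) between
    its presence in a document and the document's class label."""
    n = len(docs)
    # Assumption: naive lowercase whitespace tokenization.
    doc_sets = [set(d.lower().split()) for d in docs]
    class_count = Counter(labels)
    vocab = set().union(*doc_sets)
    scores = {}
    for term in vocab:
        # Joint counts over (term present?, class); only non-zero cells appear.
        joint = Counter((term in s, c) for s, c in zip(doc_sets, labels))
        n_present = sum(1 for s in doc_sets if term in s)
        mi = 0.0
        for (present, c), n_tc in joint.items():
            n_t = n_present if present else n - n_present
            mi += (n_tc / n) * math.log2(n_tc * n / (n_t * class_count[c]))
        scores[term] = mi
    return scores


def select_features(scores, threshold):
    """Keep the terms whose score reaches the (hypothetical) threshold."""
    return sorted(t for t, s in scores.items() if s >= threshold)


# Toy corpus, invented for illustration only.
docs = [
    "the ship sailed the stormy sea",
    "the ship reached the harbour",
    "the cell contains the protein",
    "the cell divides",
]
labels = ["fiction", "fiction", "science", "science"]

scores = mutual_information(docs, labels)
selected = select_features(scores, threshold=0.5)
print(selected)  # ['cell', 'ship']
```

In this toy corpus, "the" occurs in every document and scores zero, while "ship" and "cell" separate the two classes perfectly and score one bit each. The threshold plays the same filtering role that the limitations paragraph describes for subtopic features, although the value used here is arbitrary.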
The proposed system can be integrated into other library management systems.
PROG 49,1
digitalized format, which provides several benefits to readers, including shortened publication cycles, faster distribution channels that permit the wider propagation of timely information, and user-friendly visualization for displaying the content of articles. In terms of content, an e-book differs from other types of textual material because it is usually lengthy, which easily leads to a higher feature dimension in term-level analysis. As such, a book may contain a variety of themes in the text stream, which are distributed and shift among sentences or paragraphs in terms of the hidden subtopics that extend the expression of the main topic (Hearst, 1997; Ridel and Bieman, 2012). Such phenomena, mostly addressed in discourse analysis, raise interesting issues for text processing problems, for which an innovative strategy is needed to explore the high-level linguistic attributes of e-books. The properties sustaining the lexical coherence of a topic imply the flow of discourse information, which is valuable to be further investigated an...