Hoaxes, deliberate falsehoods fabricated to appear as the truth (the word derives from hocus, "to trick"), have been increasing at an alarming rate. This situation can cause anxiety and panic in society. Although a hoax is not a direct threat in itself, the perceptions it spreads can affect both social and political conditions. The image created by hoaxes can have negative effects and interfere with state policies, which may in turn harm the economy. Early detection of hoaxes helps the government reduce and even eliminate their spread. Some existing systems filter hoaxes based on the title, or on a voting process applied to search engine results. This research develops an Indonesian hoax filter based on a text vector representation built from Term Frequency and Document Frequency, combined with classification techniques. Of the several classification techniques available, Support Vector Machine (SVM) and Stochastic Gradient Descent (SGD) are chosen for this research. SVM divides the word vectors using a linear function, and SGD divides the word vectors using a nonlinear function. SVM and SGD are chosen because text classification involves high-dimensional matrices. Each word in a news article can be modeled as a feature, and with Linear SVC and SGD the word-vector features can be reduced to two dimensions and separated using linear and nonlinear boundaries. The highest accuracy, 86%, is obtained from the SGD classifier using the modified Huber loss, evaluated on 100 hoax and 100 non-hoax websites randomly chosen from outside the dataset used in the training process.
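The pipeline described above (term-frequency and document-frequency weighting, followed by a classifier trained with stochastic gradient descent on the modified Huber loss) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The toy English documents, the smoothed IDF formula, the learning rate, the epoch count, and the omission of an intercept term are all assumptions made for illustration, and every function name is hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical toy corpus (NOT the paper's dataset): +1 = hoax, -1 = non-hoax.
TRAIN_DOCS = [
    "shocking miracle cure share now".split(),
    "secret miracle share before deleted".split(),
    "shocking secret they hide share".split(),
    "ministry publishes annual budget report".split(),
    "court schedules hearing on budget".split(),
    "ministry report confirms official data".split(),
]
TRAIN_LABELS = [1, 1, 1, -1, -1, -1]

def vectorize(doc, idf):
    """Map a tokenized document to an L2-normalized sparse TF-IDF dict."""
    tf = Counter(t for t in doc if t in idf)  # ignore out-of-vocabulary terms
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def tfidf_vectors(docs):
    """Compute smoothed IDF weights (an assumed formula) and document vectors."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))  # document frequency
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return idf, [vectorize(doc, idf) for doc in docs]

def modified_huber_grad(z, y):
    """Derivative of the modified Huber loss w.r.t. the score, with z = y * score.

    loss(z) = max(0, 1 - z)^2 for z >= -1, and -4z otherwise.
    """
    if z >= 1.0:
        return 0.0
    if z >= -1.0:
        return -2.0 * (1.0 - z) * y
    return -4.0 * y

def train_sgd(vectors, labels, epochs=50, lr=0.1, seed=0):
    """Fit a linear scoring function by SGD (no intercept, for brevity)."""
    rng = random.Random(seed)
    w = {}
    order = list(range(len(vectors)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit examples in random order each epoch
        for i in order:
            x, y = vectors[i], labels[i]
            score = sum(w.get(t, 0.0) * v for t, v in x.items())
            g = modified_huber_grad(y * score, y)
            if g != 0.0:
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) - lr * g * v
    return w

def predict(doc, idf, w):
    """Return +1 (hoax) if the score is positive, otherwise -1 (non-hoax)."""
    x = vectorize(doc, idf)
    score = sum(w.get(t, 0.0) * v for t, v in x.items())
    return 1 if score > 0 else -1

idf, vectors = tfidf_vectors(TRAIN_DOCS)
weights = train_sgd(vectors, TRAIN_LABELS)
print(predict("shocking miracle share".split(), idf, weights))
print(predict("ministry budget report".split(), idf, weights))
```

In practice, a library such as scikit-learn would replace the hand-written parts, and an Indonesian tokenizer and stop-word list would be needed before vectorization. Those choices are outside what the abstract specifies.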