Complexity Algorithm Analysis for Edit Distance

Maarif, Haris Al Qodri; Akmeliawati, Rini; Htike, Zaw Zaw; Gunawan, Teddy Surya

doi:10.1109/iccce.2014.48

Cited by 6 publications

(3 citation statements)

References 4 publications

“…The concept of Levenshtein distance is to find the minimum number of point mutations required to change one string into another. These point mutations are insertion, substitution, and deletion [13], [14].…”

Section: B Levenshtein Distance (unclassified)

Deteksi Plagiarisme Menggunakan Algoritma Levenshtein Distance

Yuslena, Khatimi, Fajrin

2021

JTekInfULM

Document similarity detection for plagiarism systems falls under Natural Language Processing research within artificial intelligence. Plagiarism is common in documents in academic settings, including at PSMTS ULM. Plagiarism detection is needed to safeguard the originality of students' theses. Previous researchers have used several algorithms to detect plagiarism. However, a fast algorithm is required, because student theses contain a relatively large number of strings and the body of thesis data keeps growing, which slows algorithm performance. The Levenshtein Distance algorithm outperforms the adaptive algorithm. Preprocessing, which consists of case folding, tokenizing, stopword removal, and stemming, makes the system's processing faster. The Levenshtein Distance algorithm detects plagiarism well, and the average system processing time is 6,283 ms without preprocessing and 4,920 ms with preprocessing.
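The Levenshtein distance described in the citation statement above can be sketched with the standard dynamic-programming recurrence. This is a minimal illustration, not the implementation used in the cited paper; the function name `levenshtein` and the two-row layout are choices made here for brevity.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    # prev[j] holds the distance between the current prefix of a and b[:j].
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca from a
                curr[j - 1] + 1,     # insert cb into a
                prev[j - 1] + cost,  # substitute ca with cb (free on a match)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

The nested loops visit every pair of positions once, so the running time grows with the product of the two string lengths, while only two rows of the table are kept in memory.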

“…The driving force towards incorporating Levenshtein distance as the formula to normalise words stems from Maarif et al., who aimed to determine the algorithmic complexity of each of the sub-algorithms that branched from the edit distance tree, such as the Levenshtein Distance (LD), the Jaro Winkler Distance (JWD), the Mahalanobis Distance, the Soundex Distance and the N-Gram Distance [3]. The importance of the study was to find out which edit distance was best suited for processing longer sentence comparison in correcting grammar in a Sign Language Synthesizer as proposed by the study.…”

Section: Levenshtein Edit Distance (mentioning)

confidence: 99%

Exploring Edit Distance for Normalising Out-of-Vocabulary Malay Words on Social Media

2019

Users interact using short-formed words and abbreviations, which results in messages full of noisy words that the system's knowledge does not recognize. The aim of this research is to overcome the limitations that still bar progress in normalizing noisy Malay words from social media platforms. The test data comprise 25,000 items: 15,000 tweets from Twitter and 10,000 comments from Facebook. Pre-processing steps were carried out to clean the entire dataset, which consists of 179,786 unique words. 36,587 out-of-vocabulary (OOV) Malay terms were then extracted and checked against an in-vocabulary (IV) Malay corpus using the Levenshtein edit distance formula and character manipulation rules. The resultant output is 3,964 unique IV Malay words. Based on the results, the usage of edit distance and rules can be further improved to elevate the normalisation of the ever-changing colloquial terms of the Malay language.
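Checking an OOV term against an IV vocabulary by edit distance can be sketched as a nearest-word lookup. Everything specific here is an assumption for illustration: the function `normalise`, the `max_distance` threshold, and the three-word vocabulary are hypothetical, and the paper's character manipulation rules are not modelled.

```python
def levenshtein(a: str, b: str) -> int:
    """Standard two-row dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalise(oov_word, vocabulary, max_distance=2):
    """Return the closest in-vocabulary word, or None if none is near enough.

    max_distance is a hypothetical cut-off, not a value taken from the paper.
    """
    best, best_distance = None, max_distance + 1
    for candidate in vocabulary:
        distance = levenshtein(oov_word, candidate)
        if distance < best_distance:
            best, best_distance = candidate, distance
    return best


# Toy in-vocabulary list, for illustration only.
iv_words = ["makan", "minum", "tidur"]
print(normalise("mkn", iv_words))     # makan
print(normalise("xyzxyz", iv_words))  # None
```

A linear scan like this compares the OOV term with every vocabulary entry, so a real system over tens of thousands of terms would likely need pruning or indexing.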

“…It is a string metric for measuring the difference between two sequences. Other popular measures of edit distance, which are calculated using a different set of allowable edit operations are: 1) the Damerau-Levenshtein (DL) distance allows insertion, deletion, substitution, and the transposition of two adjacent characters [7]; 2) the Longest Common Subsequence (LCS) distance allows only insertion and deletion, not substitution [8]; 3) the Hamming Distance (HD) allows only substitution, hence, it only applies to strings of the same length [9]; and 4) the Jaro distance allows only transposition [10]. These edit distance algorithms can also be computed between two longer strings, but the cost to compute it, which is roughly proportional to the product of the two string lengths, makes this impractical.…”

Section: Modifications and Enhancements Related To Damerau-Levenshtein Distance (mentioning)

confidence: 99%

Optimization of Edit Distance Algorithm for Sanctions Screening Risk Score Assessment

Nino

2019

IJATCSE

Risk score assessment for sanctions screening is needed to calculate and gauge the risk rate of the data elements involved in screening. It includes a string-matching process for reviewing sanctions lists, which checks whether any investor in a fund is involved in fraud by matching the investor information (as Stop Descriptor) against the Sanctions List, which contains the names of individuals known to be involved in financial crime or terrorism. This paper presents the inherent capability of the Edit Distance algorithm, or Damerau-Levenshtein (DL) Distance algorithm, to address many common misspellings and typos in string matching through insertion, deletion, transposition, and substitution, which are considered a significant component of the fuzzy possible success rating used in sanctions screening. The paper also aims to optimize the DL Distance algorithm by applying the theory of phonetic algorithms, which is expected to substantially improve the speed of computing the edit distance between two longer strings.
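The four operations named above (insertion, deletion, substitution, and transposition of adjacent characters) can be sketched as follows. Note the assumption: this is the restricted "optimal string alignment" variant of Damerau-Levenshtein, chosen here because it is short; it is not the optimized algorithm proposed in the cited paper, and the name `osa_distance` is illustrative.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance with adjacent transpositions (optimal string alignment)."""
    m, n = len(a), len(b)
    # d[i][j] is the distance between a[:i] and b[:j].
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all i characters
    for j in range(n + 1):
        d[0][j] = j  # insert all j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Swap of two adjacent characters counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]


print(osa_distance("ab", "ba"))  # 1
```

The table has (m + 1) × (n + 1) cells, each filled in constant time, which is the cost "roughly proportional to the product of the two string lengths" mentioned in the citation statement.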
