2022
DOI: 10.1109/access.2022.3208921

Deep Learning-Based Detection of Inappropriate Speech Content for Film Censorship

Abstract: Audible content has become an effective tool for shaping one's personality and character because of easy access to a huge volume of audible content, whether as independent audio files or as the audio of online videos, movies, and television programs. There is a pressing need to filter inappropriate audible content from easily accessible videos and films that are likely to contain inappropriate speech. With this in view, all the broadcasting and online video/audio platform companies hire a lot of…

Cited by 5 publications (1 citation statement)
References 67 publications
“…Specifically, this research employs Mel-Frequency Cepstral Coefficients (MFCCs) as features, which are derived from the original audio file via windowing, discrete Fourier transform (DFT), logarithm of magnitude, frequency warping on the Mel scale, and inverse discrete cosine transform (IDCT) of the log filterbank energies. Finding the most probable word sequences requires an audio model, a language model, and a lexicon that converts written words into phonetic ones [6][7][8][9]. For this decoding task, the Viterbi algorithm is frequently utilised in hybrid ASR models.…”
Section: Introduction (mentioning)
confidence: 99%
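
The MFCC steps named in the citation statement above can be sketched for a single audio frame as follows. This is only a minimal illustration, not the method of the cited or citing paper. It assumes a Hamming window, a naive DFT, triangular mel filters, and a DCT-II as the final transform, with the mel filterbank applied before the logarithm, which is a common ordering. The function name `mfcc_frame`, the parameters `n_filters` and `n_coeffs`, and the 440 Hz test tone are hypothetical choices made for the example.

```python
import cmath
import math


def hz_to_mel(f):
    # A widely used Hz-to-mel conversion formula.
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mfcc_frame(frame, sample_rate, n_filters=20, n_coeffs=12):
    """Hypothetical helper: cepstral coefficients for one frame of samples."""
    n = len(frame)
    # 1. Windowing (Hamming) to reduce spectral leakage.
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    # 2. Naive DFT; keep the power of the non-negative frequency bins.
    n_bins = n // 2 + 1
    power = []
    for k in range(n_bins):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        power.append(abs(acc) ** 2)
    # 3. Frequency warping: triangular filters spaced evenly on the mel scale.
    mel_max = hz_to_mel(sample_rate / 2)
    edges_hz = [mel_to_hz(mel_max * j / (n_filters + 1))
                for j in range(n_filters + 2)]
    bin_hz = sample_rate / n
    log_energies = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges_hz[j - 1], edges_hz[j], edges_hz[j + 1]
        energy = 0.0
        for k in range(n_bins):
            f = k * bin_hz
            if lo < f <= mid:
                energy += power[k] * (f - lo) / (mid - lo)
            elif mid < f < hi:
                energy += power[k] * (hi - f) / (hi - mid)
        # 4. Logarithm of each filterbank energy, floored to avoid log(0).
        log_energies.append(math.log(max(energy, 1e-10)))
    # 5. DCT-II of the log filterbank energies gives the cepstral coefficients.
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_energies))
            for c in range(n_coeffs)]


# Usage: one 256-sample frame of a synthetic 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 440 * i / sample_rate) for i in range(256)]
coeffs = mfcc_frame(frame, sample_rate)
```

In practice an FFT and a tested audio library would replace the naive DFT loop; the plain-Python version is used here only to keep each step visible.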