Search results
Oct 23, 2024 · Goals. The goals for this web application are quite simple: download YouTube videos and extract their audio; transcribe the audio using Deepgram's Speech-to-Text API; and analyze the title against the transcript using Mistral AI to evaluate the likelihood of clickbait.
- Overview
- Testing the model
- How does it work?
- Train set
- Accuracy of the model
- License
Automatically detect clickbait YouTube videos from their metadata, with a 96% F1 score.
Python script
The predict.py script shows an example of how the clickbait detection is done: Provide the script with the title of the video to analyze and, if known, the number of views, likes, dislikes, and comments. The script will print the model prediction: 1 if the video is probably clickbait, 0 otherwise.
Web app
Alternatively, you can test it with this webapp.
The prediction is made through the following pipeline:
1. Tokenize the title with a custom tokenizer.
2. Embed the title into a vector representation by computing the mean of the Word2Vec vectors of the title tokens. If none of the tokens is present in the Word2Vec model, use the mean vector representation previously computed on the train set (mean-title-embedding).
3. If the numbers of views, likes, dislikes, and comments are known, compute their logarithms. If any value is unknown, replace it with the mean value previously computed on the train set (mean-log-video-views, mean-log-video-likes, mean-log-video-dislikes, mean-log-video-comments).
4. Apply min-max scaling using a scaler previously fitted on the train set (min-max-scaler).
5. Use an SVM (svm), previously trained on the train set, to get the prediction: 1 if the video is probably clickbait, 0 otherwise.
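The feature-building steps above (1 to 4) can be sketched in plain Python. This is only an illustrative sketch under stated assumptions, not the project's code: the regex tokenizer stands in for the custom tokenizer, which is not shown; a plain dict stands in for the Word2Vec model; zero counts are treated like unknown values; and scaling is assumed to cover the whole feature vector. All function names and toy values are hypothetical.

```python
import math
import re


def tokenize(title):
    # Stand-in for the project's custom tokenizer, which is not shown here.
    return re.findall(r"[a-z0-9']+", title.lower())


def embed_title(tokens, word_vectors, mean_title_embedding):
    # Mean of the vectors of the known tokens; fall back to the
    # train-set mean embedding when no token is in the vocabulary.
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return list(mean_title_embedding)
    return [sum(column) / len(known) for column in zip(*known)]


def log_stats(stats, mean_log_stats):
    # Log of each known count; train-set mean log value otherwise.
    # Assumption: zero counts are handled like unknown (None) values.
    return [
        math.log(value) if value is not None and value > 0 else fallback
        for value, fallback in zip(stats, mean_log_stats)
    ]


def min_max_scale(features, mins, maxs):
    # Scale each feature with the minima and maxima seen on the train set.
    return [
        (x - lo) / (hi - lo) if hi > lo else 0.0
        for x, lo, hi in zip(features, mins, maxs)
    ]


def build_features(title, stats, word_vectors, mean_title_embedding,
                   mean_log_stats, mins, maxs):
    raw = embed_title(tokenize(title), word_vectors, mean_title_embedding)
    raw += log_stats(stats, mean_log_stats)
    return min_max_scale(raw, mins, maxs)


# Toy values, invented for illustration only.
toy_vectors = {"shocking": [1.0, 0.0], "video": [0.0, 1.0]}
features = build_features(
    "SHOCKING video!!!",
    stats=(1000, None, None, 10),  # views, likes, dislikes, comments
    word_vectors=toy_vectors,
    mean_title_embedding=[0.5, 0.5],
    mean_log_stats=[8.0, 5.0, 2.0, 4.0],
    mins=[0.0] * 6,
    maxs=[1.0, 1.0, 20.0, 20.0, 20.0, 20.0],
)
print(features)
```

The resulting vector is what would be handed to the trained SVM in step 5 to obtain the 0/1 prediction.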
Where is the data from?
I hand-picked a list of clickbait channels and a list of non-clickbait channels, then obtained the metadata of their videos (ordered by views) through the YouTube API. Roughly the same number of videos was used for each channel, and the non-clickbait channels were chosen from a variety of categories. You can use the already sanitized data by loading the pandas dataframes clickbait-df and non-clickbait-df with pickle. These dataframes also contain some statistical data about the channels, which is not used by the models but could be used for other analyses. The train set used to train the Word2Vec model, the min-max scaler, and the SVM is the pair x-train (features), x-test (label).
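Loading one of the pickled dataframes might look like the sketch below. The helper name `load_pickled` is hypothetical, and the exact file names and extensions in the repository are not given here, so the path in the comment is an assumption. pandas must be installed for a pickled dataframe to load, and pickle files should only be loaded from sources you trust.

```python
import pickle
from pathlib import Path


def load_pickled(path):
    # Read a single pickled object (for example a pandas DataFrame) from disk.
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


# Hypothetical usage; adjust the path to the actual file in the repository:
# clickbait_df = load_pickled("clickbait-df")
```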
The SVM model has been trained on more than 28 thousand samples and tested on more than 7 thousand samples.
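For reference, the F1 score is the harmonic mean of precision and recall. The sketch below shows the computation from confusion-matrix counts. The counts are invented for illustration and are not the project's results.

```python
def f1_score(true_positives, false_positives, false_negatives):
    # Precision: share of predicted clickbait that really is clickbait.
    predicted = true_positives + false_positives
    precision = true_positives / predicted if predicted else 0.0
    # Recall: share of actual clickbait that was detected.
    actual = true_positives + false_negatives
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only.
example_f1 = f1_score(true_positives=96, false_positives=4, false_negatives=4)
```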
Best parameters
This project is licensed under the MIT License - see the LICENSE.md file for details
You can read more on how the models have been trained in the Youtube Clickbait Detector - Training Jupyter Notebook.
we explored visual-centric baits and created a cross-platform clickbait detection model comprising a stacking framework architecture [17]. Encouraged by the outcome, we decided to foray into the detection of video-centric clickbait. Since clickbait detection and prevention in videos is a largely unexplored field, the primary aim of the
Mar 23, 2018 · Results indicate that a majority of the typological and linguistic features associated with clickbait in online news headlines are found to be indicative of clickbait in YouTube video titles.
Jan 2, 2021 · For the automatic detection of clickbait, a browser extension has been developed that gives users the option to block clickbait and provides a warning mechanism. The authors of [6] incorporate several sets of handcrafted features, such as bag-of-words and n-grams, to train the classifier and were the first to introduce an automatic clickbait detector.
Jan 1, 2023 · We also use a low value of the gamma hyperparameter (0.1) as a comparison, because the dataset is quite large; this is done to avoid overfitting the model. 3. Results and Discussion. The experiments were run using Google Colab as the Python IDE platform.
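To illustrate why a low gamma tends to reduce overfitting, the sketch below assumes an RBF kernel, which the snippet does not state explicitly. With k(x, y) = exp(-gamma * ||x - y||^2), a small gamma lets distant training points still influence each other, giving a smoother decision boundary. A large gamma makes each point's influence very local. The function name and sample points are hypothetical.

```python
import math


def rbf_kernel(x, y, gamma):
    # Similarity between two points under an RBF (Gaussian) kernel.
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


point_a = [0.0, 0.0]
point_b = [1.0, 1.0]  # squared distance to point_a is 2.0

low_gamma_similarity = rbf_kernel(point_a, point_b, gamma=0.1)
high_gamma_similarity = rbf_kernel(point_a, point_b, gamma=10.0)
# The low-gamma similarity is the larger of the two: influence reaches farther.
```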