A newer version of this model is available: CrabInHoney/urlbert-tiny-v3-phishing-classifier

This is a very small BERT model that classifies URLs as phishing or non-phishing.

It is an updated, lighter version of the earlier classification model for URL analysis.

Old version: https://huggingface.co/CrabInHoney/urlbert-tiny-v1-phishing-classifier

Val score: 0.9622

Model size: 3.7M params

Tensor type: F32
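As a rough illustration of what these two figures imply, the sketch below estimates the size of the weights alone. It assumes the listed 3.7M is a rounded parameter count and that every parameter is stored as a 4-byte F32 value; `weight_size_mb` is a hypothetical helper name, and the estimate ignores tokenizer files and runtime overhead.

```python
# Rough weight-size estimate from the figures listed on this card.
PARAMS = 3_700_000       # "3.7M params" (rounded, as listed)
BYTES_PER_PARAM = 4      # F32 = 32-bit float = 4 bytes


def weight_size_mb(params: int, bytes_per_param: int) -> float:
    """Return the approximate size of the weights in megabytes (10^6 bytes)."""
    return params * bytes_per_param / 1_000_000


size_mb = weight_size_mb(PARAMS, BYTES_PER_PARAM)
print(f"Approximate weight size: {size_mb:.1f} MB")
```

Under these assumptions the weights come to roughly 15 MB, which is small enough to load comfortably on a CPU-only machine.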

Dataset (urls.json only)

Example:

from transformers import BertTokenizerFast, BertForSequenceClassification, pipeline
import torch

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# Local directory that contains the downloaded model files
model_path = "./urlbert-tiny-v2-phishing-classifier"

tokenizer = BertTokenizerFast.from_pretrained(model_path)

model = BertForSequenceClassification.from_pretrained(model_path)
model.to(device)

# return_all_scores=True returns the score of every class;
# newer transformers releases deprecate it in favor of top_k=None
classifier = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
    device=0 if torch.cuda.is_available() else -1,
    return_all_scores=True
)

test_urls = [
    "huggingface.co/",
    "p64.hu991ngface.co.com.ru/"
]

# Print the probability of each class for every URL
for url in test_urls:
    results = classifier(url)
    print(f"\nURL: {url}")
    for result in results[0]:
        label = result['label']
        score = result['score']
        print(f"Class: {label}, probability: {score:.4f}")


Output:

Using device: cuda

URL: huggingface.co/
Class: good, probability: 0.8515
Class: phish, probability: 0.1485

URL: p64.hu991ngface.co.com.ru/
Class: good, probability: 0.0289
Class: phish, probability: 0.9711
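The per-class scores in the sample output can be turned into a single decision with a small post-processing step. The sketch below is only an illustration: `strip_scheme` and `verdict` are hypothetical helpers, the 0.5 default threshold is an assumption rather than a recommended value, and stripping `http://` or `https://` assumes the model expects scheme-less input like the sample URLs, which this card does not state.

```python
from typing import List


def strip_scheme(url: str) -> str:
    """Drop a leading http:// or https:// so the input resembles the sample URLs."""
    for prefix in ("http://", "https://"):
        if url.lower().startswith(prefix):
            return url[len(prefix):]
    return url


def verdict(scores: List[dict], threshold: float = 0.5) -> str:
    """Return 'phish' if the phish probability reaches the threshold, else 'good'."""
    by_label = {item["label"]: item["score"] for item in scores}
    return "phish" if by_label["phish"] >= threshold else "good"


# Scores copied from the sample output for p64.hu991ngface.co.com.ru/
sample = [
    {"label": "good", "score": 0.0289},
    {"label": "phish", "score": 0.9711},
]

print(strip_scheme("https://huggingface.co/"))  # huggingface.co/
print(verdict(sample))                          # phish
print(verdict(sample, threshold=0.99))          # good
```

Looking scores up by label rather than by position keeps the helper independent of the order in which the pipeline returns the classes. Raising the threshold trades missed phishing links for fewer false alarms.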

License

MIT
