HATCorpus BERT Authorship Classifier
Model Description
This model is a BERT-based binary text classifier trained to distinguish between human-written and AI-generated English text.
- Base model: bert-base-uncased
- Task: Binary text classification
- Labels:
  - 0 → Human-written
  - 1 → AI-generated
The model was trained on HATCorpus, a curated dataset of human and AI-authored text.
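The label mapping above is stored in the model configuration and can be inspected directly; this is a minimal sketch, and the exact label names in the hosted config.json may differ from the strings shown in the comment:

```python
from transformers import AutoConfig

# Inspect the id-to-label mapping shipped with the model config.
config = AutoConfig.from_pretrained("ky1916/hatcorpus-bert-authorship")
print(config.id2label)  # expected along the lines of {0: "Human-written", 1: "AI-generated"}
```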
Intended Use
This model is intended for:
- Research on AI-generated text detection
- Benchmarking authorship classifiers
- Educational and exploratory use
It is not intended for:
- Surveillance or enforcement
- Determining individual authorship
- High-stakes automated decisions
Training Data
- Dataset: HATCorpus
- Sources: Wikipedia, Project Gutenberg, AI-generated text
- Language: English
- Split: Train / Validation
For dataset details, see:
https://huggingface.co/datasets/ky1916/HATCorpus
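If the dataset is published in standard Hugging Face format, it can be loaded with the `datasets` library; the sketch below assumes `train` and `validation` splits, which may not match the actual repository layout:

```python
from datasets import load_dataset

# Load HATCorpus from the Hub (split and column names are assumptions).
dataset = load_dataset("ky1916/HATCorpus")
print(dataset)               # inspect available splits and columns
print(dataset["train"][0])   # peek at one example
```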
Training Details
- Architecture: BERT-base
- Optimizer: AdamW
- Loss: Cross-entropy
- Max sequence length: 512
- Batch size: 8
- Epochs: 3
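The hyperparameters above map directly onto the Hugging Face `Trainer` API. The following is a minimal sketch, not the original training script: it assumes the dataset exposes `text` and `label` columns, and the learning rate is an assumed value since the card does not state one.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Assumes HATCorpus provides "text" and "label" columns.
dataset = load_dataset("ky1916/HATCorpus")

def tokenize(batch):
    # Max sequence length 512, as listed above.
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="hatcorpus-bert-authorship",
    per_device_train_batch_size=8,   # batch size 8
    num_train_epochs=3,              # 3 epochs
    learning_rate=2e-5,              # assumed; not stated in this card
)

# Trainer uses AdamW and cross-entropy loss by default for sequence classification.
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized.get("validation"),
    tokenizer=tokenizer,
)
trainer.train()
```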
Evaluation
| Metric | Value |
|---|---|
| Accuracy | 93.58% |
| Precision | 88.04% |
| Recall | 96.43% |
| F1-score | 92.05% |
(Evaluated against dataset at https://www.kaggle.com/datasets/shanegerami/ai-vs-human-text)
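These are standard binary-classification metrics with label 1 (AI-generated) treated as the positive class. A small sketch of how they can be reproduced with scikit-learn (the label arrays below are illustrative placeholders, not the evaluation data):

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Illustrative values; replace with gold labels and model predictions on the eval set.
y_true = [0, 0, 1, 1, 1, 0]   # 0 = human-written, 1 = AI-generated
y_pred = [0, 1, 1, 1, 1, 0]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred, pos_label=1))
print("Recall   :", recall_score(y_true, y_pred, pos_label=1))
print("F1-score :", f1_score(y_true, y_pred, pos_label=1))
```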
Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the fine-tuned classifier and its tokenizer from the Hub.
tokenizer = AutoTokenizer.from_pretrained("ky1916/hatcorpus-bert-authorship")
model = AutoModelForSequenceClassification.from_pretrained("ky1916/hatcorpus-bert-authorship")
model.eval()

text = "This is an example sentence."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Run inference without tracking gradients and pick the higher-scoring class.
with torch.no_grad():
    outputs = model(**inputs)
prediction = torch.argmax(outputs.logits, dim=1).item()

print("AI-generated" if prediction == 1 else "Human-written")
```
Limitations
- Performance may degrade on very short text
- Model may rely on stylistic cues rather than semantic understanding
- Does not generalize to all LLMs or writing styles
Citation
If you use this model in your research, please cite both the model and dataset:
```bibtex
@misc{hatcorpus_bert_2025,
  title     = {HATCorpus BERT Authorship Classifier},
  year      = {2025},
  publisher = {Hugging Face},
  note      = {Fine-tuned from bert-base-uncased for human vs. AI text classification}
}
```