Hebrew Binary NLI Classifier for Factuality Checking

Model Description

A dicta-il/neodictabert model fine-tuned for binary Natural Language Inference (NLI) in Hebrew. It detects whether a summary claim is entailed by, or contradicts, a source article.

Task: Entailment vs Contradiction Detection
Language: Hebrew
Max Context: 4,096 tokens

Performance

  • Accuracy: 96.78%
  • F1 Score: 96.20%

Architecture

  • Base Model: dicta-il/neodictabert
  • Classification Head: Binary (softmax over 2 classes)
  • Input Format: [CLS] source_article [SEP] summary_claim [SEP] (illustrated in the sketch after this list)
  • Output: Probability distribution over [contradiction, entailment]
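
The pair encoding can be inspected directly. A minimal sketch, assuming the tokenizer follows standard BERT-style pair-encoding conventions; the exact special tokens come from the checkpoint, not from this card:

```python
# Minimal sketch: inspect how a (premise, hypothesis) pair is encoded.
# Assumes standard BERT-style pair encoding; the exact special tokens
# depend on the tokenizer shipped with the checkpoint.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "Amit5674/NLI-hebrew-binary-correctness-metric", trust_remote_code=True
)
encoded = tokenizer("source article text", "summary claim")
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
# Expected layout: ['[CLS]', <article tokens>, '[SEP]', <claim tokens>, '[SEP]']
```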

Training Configuration

  • Learning Rate: 2e-5
  • Epochs: 2
  • Batch Size: 2 per device (effective: 16 with gradient accumulation; see the configuration sketch after this list)
  • Max Sequence Length: 4,096 tokens
  • Learning Rate Scheduler: Linear
  • Warmup Steps: 500
  • Best Model Selection: Based on eval_f1
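
A hedged reconstruction of this setup as Hugging Face TrainingArguments. The gradient_accumulation_steps=8 value is inferred from the effective batch size assuming a single device, and output_dir is hypothetical; the author's exact arguments may differ:

```python
# Hedged sketch of the training configuration listed above.
# gradient_accumulation_steps=8 assumes a single device (2 x 8 = 16 effective).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./nli-hebrew-binary",   # hypothetical output path
    learning_rate=2e-5,
    num_train_epochs=2,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,      # effective batch size 16
    lr_scheduler_type="linear",
    warmup_steps=500,
    eval_strategy="epoch",              # "evaluation_strategy" on older transformers
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_f1",    # best checkpoint selected by F1
)
```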

Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "Amit5674/NLI-hebrew-binary-correctness-metric"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForSequenceClassification.from_pretrained(model_name, trust_remote_code=True)
model.eval()

# Example usage
# Article: "Israel began shelling moments after the ceasefire. The government announced new measures..."
# Summary (contradicts the article): "Israel began to get excited moments after the ceasefire"
article = "讬砖专讗诇 讛转讞讬诇讛 讘讛专注砖讛 专讙注 讗讞专讬 讛驻住拽转 讛讗砖. 讛诪诪砖诇讛 讛讜讚讬注讛 注诇 爪注讚讬诐 讞讚砖讬诐..."
summary = "讬砖专讗诇 讛转讞讬诇讛 诇讛转专讙砖 专讙注 讗讞专讬 讛驻住拽转 讛讗砖"

# Tokenize the (article, summary) pair; pads to the full 4,096-token window
inputs = tokenizer(
    article,
    summary,
    return_tensors="pt",
    padding="max_length",
    max_length=4096,
    truncation=True,
)

# Predict
with torch.no_grad():
    outputs = model(**inputs)

logits = outputs.logits[0]
probs = torch.softmax(logits, dim=-1)
predicted_class_idx = torch.argmax(probs).item()
predicted_class = model.config.id2label[predicted_class_idx]
confidence = probs[predicted_class_idx].item()

probabilities = {
    model.config.id2label[i]: float(probs[i].item())
    for i in range(model.config.num_labels)
}

print(f"Prediction: {predicted_class}")
print(f"Confidence: {confidence:.4f}")
print(f"Probabilities: {probabilities}")
```

For detailed inference examples, see the inference scripts and server API documentation.

Input Format

  • Premise: Source article text (full document)
  • Hypothesis: Summary claim (the full summary or an individual claim; a per-claim scoring sketch follows this list)
  • Processing: Binary classification (entailment vs contradiction)
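
Since the hypothesis may be an individual claim, one workable pattern is to split a summary into sentences and score each against the article. A minimal sketch, reusing the tokenizer and model from the Usage section; the naive split on "." is an assumption, and a proper Hebrew sentence splitter should be used in practice:

```python
# Hedged sketch: score each summary sentence ("claim") against the article.
# The split on "." is a naive assumption; swap in a real sentence splitter.
import torch

def score_claims(article: str, summary: str, tokenizer, model):
    claims = [s.strip() for s in summary.split(".") if s.strip()]
    results = []
    for claim in claims:
        inputs = tokenizer(article, claim, return_tensors="pt",
                           truncation=True, max_length=4096)
        with torch.no_grad():
            probs = torch.softmax(model(**inputs).logits[0], dim=-1)
        idx = int(torch.argmax(probs))
        results.append((claim, model.config.id2label[idx], float(probs[idx])))
    return results
```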

Output Format

  • Prediction: String label ("entailment" or "contradiction")
  • Confidence: Probability of predicted class (0.0 to 1.0)
  • Probabilities: Dictionary with probabilities for both classes:
    • {"entailment": 0.9678, "contradiction": 0.0322}

Use Cases

  • Production Fact-Checking: Fast yes/no contradiction detection for Hebrew summaries
  • Quality Control: Automated validation of summary factuality
  • Batch Processing: Efficient scoring of large batches of document-summary pairs (see the batched sketch after this list)
  • Real-Time Validation: Low-latency factuality checking in summary generation pipelines
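
For batch processing, many pairs can be tokenized and scored together. A minimal sketch, assuming the tokenizer and model from the Usage section; the batch size and dynamic padding are illustrative choices, not the author's pipeline:

```python
# Hedged sketch of batched inference over (article, summary) pairs.
# Batch size and padding strategy are illustrative assumptions.
import torch

def predict_batch(pairs, tokenizer, model, batch_size=8):
    labels = []
    for i in range(0, len(pairs), batch_size):
        batch = pairs[i:i + batch_size]
        inputs = tokenizer([a for a, _ in batch], [s for _, s in batch],
                           return_tensors="pt", padding=True,
                           truncation=True, max_length=4096)
        with torch.no_grad():
            preds = model(**inputs).logits.argmax(dim=-1)
        labels += [model.config.id2label[int(p)] for p in preds]
    return labels
```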

Limitations

  • Max sequence length: 4,096 tokens; longer article-summary pairs are truncated (a length check is sketched after this list)
  • Binary classification: Cannot identify specific error types (use multi-label models for detailed error analysis)
  • Context dependency: Performance may vary with article length and complexity
  • Hebrew-specific: Optimized for Hebrew text; may not generalize to other languages
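
Truncation can be detected before inference by counting the tokens of the encoded pair. A minimal sketch; the helper name and threshold handling are assumptions:

```python
# Hedged sketch: flag (article, summary) pairs that exceed the 4,096-token
# window and would be truncated. Helper name is hypothetical.
def will_truncate(article, summary, tokenizer, max_length=4096):
    n_tokens = len(tokenizer(article, summary)["input_ids"])
    return n_tokens > max_length
```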

Citation

```bibtex
@misc{hebrew_binary_nli_classifier,
  title={Hebrew Binary NLI Classifier for Factuality Checking},
  author={Your Name},
  year={2025},
  publisher={Hugging Face}
}
```
