Lightcurve Variable Star Classifier (LightGBM)

A lightweight, fast LightGBM classifier that assigns variable star types from light curve features.

Model Description

This model classifies variable stars into 10 categories using 15 features extracted from light curves. It was trained on approximately 1.1 million samples from multiple sky surveys and reaches a mean accuracy of ~92.5% in 5-fold cross-validation.

Key Features

  • Fast inference: ~500,000 samples/sec on CPU (batch inference)
  • Lightweight: ~250 MB model size (vs 20 GB+ for RandomForest alternatives)
  • Multi-survey training: ZTF, ASAS-SN, Gaia DR2/DR3, and more

Training Data Sources

The model was trained on light curves from multiple astronomical surveys:

  • ZTF (Zwicky Transient Facility)
  • ASAS-SN (All-Sky Automated Survey for Supernovae)
  • Gaia DR2/DR3
  • Other public variable star catalogs

Supported Classes

Class    Description
Non-var  Non-variable star
ROT      Rotational variable
EA       Algol-type eclipsing binary
EW       W Ursae Majoris-type eclipsing binary
CEP      Cepheid variable
DSCT     Delta Scuti variable
RRAB     RR Lyrae type ab
RRC      RR Lyrae type c
M        Mira variable
SR       Semi-regular variable

Input Features

The model expects 15 features extracted from light curves:

  1. PeriodLS - Lomb-Scargle period
  2. Mean - Mean magnitude
  3. Rcs - Range of cumulative sum
  4. Psi_eta - Psi-eta statistic
  5. StetsonK_AC - Stetson K with autocorrelation
  6. Gskew - Median-based measure of skewness
  7. Psi_CS - Psi cumulative sum
  8. Skew - Skewness
  9. Freq1_harmonics_amplitude_1 - Amplitude of harmonic component 1 of the first frequency
  10. Eta_e - Eta-e variability index
  11. LinearTrend - Linear trend coefficient
  12. Freq1_harmonics_amplitude_0 - Amplitude of harmonic component 0 of the first frequency
  13. AndersonDarling - Anderson-Darling statistic
  14. MaxSlope - Maximum slope
  15. StetsonK - Stetson K index
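
Because the model depends on column order, a small helper can guard against misaligned inputs. The sketch below is only illustrative: `FEATURE_ORDER` copies the list above, `to_feature_row` is a hypothetical helper name, and in practice the order stored in `metadata['features']` should take precedence.

```python
# Feature order copied from the list above (assumption: it matches
# metadata['features']; prefer the metadata file when it is available).
FEATURE_ORDER = [
    "PeriodLS", "Mean", "Rcs", "Psi_eta", "StetsonK_AC",
    "Gskew", "Psi_CS", "Skew", "Freq1_harmonics_amplitude_1", "Eta_e",
    "LinearTrend", "Freq1_harmonics_amplitude_0", "AndersonDarling",
    "MaxSlope", "StetsonK",
]


def to_feature_row(feature_dict, order=FEATURE_ORDER):
    """Return the values of feature_dict as a list in model column order.

    Raises KeyError if an expected feature is missing, so a partial
    extraction cannot silently produce a misaligned row.
    """
    missing = [name for name in order if name not in feature_dict]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [float(feature_dict[name]) for name in order]
```

A list of such rows can be wrapped with `np.asarray(rows)` to obtain the `(n_samples, 15)` array the model expects.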

Training Details

  • Training samples: 1,068,220
  • Validation: 5-fold cross-validation
  • Mean accuracy: 92.5%
  • Cross-validation scores: [0.9237, 0.9255, 0.9259, 0.9214, 0.9278]
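
As a quick check, the reported mean can be recomputed from the five fold scores listed above. The spread is not stated in the original; the sample standard deviation below is derived purely from those five numbers.

```python
from statistics import mean, stdev

# Fold accuracies as listed in the training details.
cv_scores = [0.9237, 0.9255, 0.9259, 0.9214, 0.9278]

cv_mean = mean(cv_scores)   # about 0.9249, i.e. ~92.5%
cv_std = stdev(cv_scores)   # sample standard deviation (n - 1)

print(f"{cv_mean:.4f} +/- {cv_std:.4f}")
```

The folds differ by well under one percentage point, which suggests the accuracy estimate is stable across splits.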

Hyperparameters

Parameter          Value
n_estimators       713
learning_rate      0.0257
num_leaves         1023
max_depth          12
min_child_samples  200
subsample          0.752
colsample_bytree   0.778
class_weight       balanced
max_bin            1023
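
The parameter names match LightGBM's scikit-learn wrapper, so the configuration can be sketched as a keyword dictionary. This is a reconstruction under that assumption, not the author's training script. Settings not listed above, such as `subsample_freq` (which LightGBM needs to be non-zero for `subsample` to take effect), are unknown and left out.

```python
# Hyperparameters copied from the table above.
params = {
    "n_estimators": 713,
    "learning_rate": 0.0257,
    "num_leaves": 1023,
    "max_depth": 12,
    "min_child_samples": 200,
    "subsample": 0.752,
    "colsample_bytree": 0.778,
    "class_weight": "balanced",
    "max_bin": 1023,
}

# Sanity check: a tree of depth d has at most 2**d leaves, so num_leaves
# above that bound would be silently capped by max_depth.
max_leaves_at_depth = 2 ** params["max_depth"]
leaves_fit_depth = params["num_leaves"] <= max_leaves_at_depth

# Assumed usage (requires lightgbm to be installed):
#   from lightgbm import LGBMClassifier
#   clf = LGBMClassifier(**params)
```

With `max_depth` 12 the bound is 4096 leaves, so the `num_leaves` value of 1023 is the binding limit on tree size.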

Usage

import pickle
import numpy as np

# Load model and metadata
with open('lgbm_111w_model.pkl', 'rb') as f:
    model = pickle.load(f)

with open('metadata.pkl', 'rb') as f:
    metadata = pickle.load(f)

# Feature names (must be in this order)
features = metadata['features']
classes = metadata['classes']

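# Example: predict a single sample
# X must be a 2D array of shape (n_samples, 15), columns ordered as in `features`
X = np.array([[...]])  # Replace ... with your 15 feature values

# Get predicted class (predict() is expected to return integer indices into `classes`)
pred_idx = model.predict(X)
pred_class = [classes[i] for i in pred_idx]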

# Get prediction probabilities
pred_proba = model.predict_proba(X)
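
The probability output is often more useful than the single predicted label. The sketch below ranks the classes for one row of `pred_proba`. It assumes the probability columns follow the order of `metadata['classes']`, which the original does not confirm. `CLASS_NAMES` simply copies the class table and `top_k_classes` is a hypothetical helper name.

```python
# Assumed class order, copied from the "Supported Classes" table;
# use metadata['classes'] in practice.
CLASS_NAMES = ["Non-var", "ROT", "EA", "EW", "CEP",
               "DSCT", "RRAB", "RRC", "M", "SR"]


def top_k_classes(proba_row, class_names=CLASS_NAMES, k=3):
    """Return the k most probable (class_name, probability) pairs."""
    if len(proba_row) != len(class_names):
        raise ValueError("probability row and class list differ in length")
    ranked = sorted(zip(class_names, proba_row),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

For a batch, this can be applied row by row, for example `[top_k_classes(row) for row in pred_proba.tolist()]`.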

Using with Hugging Face Hub

import pickle

from huggingface_hub import hf_hub_download

REPO_ID = "bestdo77/Lightcurve_lgbm_111w_15_model"

# Download model files (cached locally after the first call)
model_path = hf_hub_download(repo_id=REPO_ID, filename="lgbm_111w_model.pkl")
metadata_path = hf_hub_download(repo_id=REPO_ID, filename="metadata.pkl")

# Load model and metadata
with open(model_path, 'rb') as f:
    model = pickle.load(f)

with open(metadata_path, 'rb') as f:
    metadata = pickle.load(f)

Inference Speed

Benchmarked on a standard CPU:

Measurement                         Time     Speed
100,000-sample batch                ~200 ms  ~500,000 samples/sec
Per sample (amortized over batch)   ~2 µs    -

The model is optimized for batch inference and can process large catalogs efficiently.
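
The two figures above are the same measurement expressed two ways, and the throughput can be re-measured on other hardware with a small timing helper. This is a minimal sketch: `measure_throughput` is a hypothetical helper, and `predict_fn` stands in for `model.predict` or any callable that takes a batch.

```python
import time

# Arithmetic behind the reported figures.
batch_size = 100_000
elapsed_s = 0.200
throughput = batch_size / elapsed_s            # samples per second
per_sample_us = elapsed_s / batch_size * 1e6   # amortized microseconds per sample


def measure_throughput(predict_fn, batch, repeats=5):
    """Return the best-of-`repeats` throughput in samples/sec."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        predict_fn(batch)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading on very coarse timers.
    return len(batch) / max(best, 1e-12)
```

Taking the best of several runs reduces the influence of warm-up and background load. Calling the model on one sample at a time will be far slower per sample than the amortized figure, because each call carries fixed overhead.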

Limitations

  • The model is trained on specific survey data and may not generalize well to other surveys with different cadences or photometric systems.
  • Feature extraction requires sufficient data points in the light curve for reliable feature computation.
  • Performance may vary for objects near class boundaries.
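
One way to surface objects near class boundaries is to flag predictions whose top two probabilities are close. The sketch below is illustrative only: `flag_ambiguous` is a hypothetical helper, and the 0.2 margin is an arbitrary placeholder that would need tuning on validation data.

```python
def flag_ambiguous(proba_rows, min_margin=0.2):
    """Return one bool per row: True when the two most probable classes
    are separated by less than `min_margin`."""
    flags = []
    for row in proba_rows:
        top, runner_up = sorted(row, reverse=True)[:2]
        flags.append(top - runner_up < min_margin)
    return flags
```

Flagged objects can then be routed to visual inspection or a second classifier instead of being accepted at face value.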

Citation

If you use this model in your research, please cite:

@software{lightcurve_lgbm_classifier,
  title={Lightcurve Variable Star Classification Model},
  author={bestdo77},
  year={2026},
  url={https://huggingface.co/bestdo77/Lightcurve_lgbm_111w_15_model}
}

License

This model is released under the MIT License.
