Lightcurve Variable Star Classifier (LightGBM)
A lightweight, fast LightGBM classifier for variable stars, based on features extracted from light curves.
Model Description
This model classifies stars into 10 categories (nine variable-star types plus a non-variable class) using 15 features extracted from light curves. It was trained on 1,068,220 samples from multiple sky surveys and reaches a mean accuracy of ~92.5% in 5-fold cross-validation.
Key Features
- Fast inference: ~500,000 samples/sec on CPU in batch mode
- Lightweight: ~250 MB model size (vs 20 GB+ for RandomForest alternatives)
- Multi-survey training: ZTF, ASAS-SN, Gaia DR2/DR3, and more
Training Data Sources
The model was trained on light curves from multiple astronomical surveys:
- ZTF (Zwicky Transient Facility)
- ASAS-SN (All-Sky Automated Survey for Supernovae)
- Gaia DR2/DR3
- Other public variable star catalogs
Supported Classes
| Class | Description |
|---|---|
| Non-var | Non-variable star |
| ROT | Rotational variable |
| EA | Algol-type eclipsing binary |
| EW | W Ursae Majoris-type eclipsing binary |
| CEP | Cepheid variable |
| DSCT | Delta Scuti variable |
| RRAB | RR Lyrae type ab |
| RRC | RR Lyrae type c |
| M | Mira variable |
| SR | Semi-regular variable |
Input Features
The model expects 15 features extracted from light curves:
- PeriodLS: Lomb-Scargle period
- Mean: Mean magnitude
- Rcs: Range of cumulative sum
- Psi_eta: Psi-eta statistic
- StetsonK_AC: Stetson K with autocorrelation
- Gskew: Skewness of magnitude differences
- Psi_CS: Psi cumulative sum
- Skew: Skewness
- Freq1_harmonics_amplitude_1: First harmonic amplitude (1st)
- Eta_e: Eta-e variability index
- LinearTrend: Linear trend coefficient
- Freq1_harmonics_amplitude_0: First harmonic amplitude (0th)
- AndersonDarling: Anderson-Darling statistic
- MaxSlope: Maximum slope
- StetsonK: Stetson K index
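Because the model takes a plain numeric array, the column order matters. The sketch below shows one way to assemble an `(n_samples, 15)` matrix from per-star feature dictionaries. It assumes the feature list above is already in the order the model expects; the authoritative order is whatever `metadata['features']` contains. `build_feature_matrix` and the example values are hypothetical helpers for illustration, not part of the released files.

```python
import numpy as np

# Feature order copied from the list above. ASSUMPTION: this matches
# metadata['features']; prefer that list when the metadata file is available.
FEATURE_ORDER = [
    "PeriodLS", "Mean", "Rcs", "Psi_eta", "StetsonK_AC",
    "Gskew", "Psi_CS", "Skew", "Freq1_harmonics_amplitude_1", "Eta_e",
    "LinearTrend", "Freq1_harmonics_amplitude_0", "AndersonDarling",
    "MaxSlope", "StetsonK",
]


def build_feature_matrix(rows, feature_order=FEATURE_ORDER):
    """Turn a list of {feature_name: value} dicts into an (n, 15) float array."""
    matrix = np.empty((len(rows), len(feature_order)), dtype=np.float64)
    for i, row in enumerate(rows):
        missing = [name for name in feature_order if name not in row]
        if missing:
            raise KeyError(f"row {i} is missing features: {missing}")
        matrix[i] = [row[name] for name in feature_order]
    return matrix


# Made-up values, used only to check the shape (not a real star).
example_row = {name: float(k) for k, name in enumerate(FEATURE_ORDER)}
X = build_feature_matrix([example_row])
print(X.shape)  # (1, 15)
```

Raising on a missing feature, rather than filling in a default, keeps a silently misaligned column from reaching the classifier.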
Training Details
- Training samples: 1,068,220
- Validation: 5-fold cross-validation
- Mean accuracy: 92.5%
- Cross-validation scores: [0.9237, 0.9255, 0.9259, 0.9214, 0.9278]
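The reported mean accuracy is the rounded average of the five fold scores. A short check, using only the numbers listed above, also gives the fold-to-fold spread (the sample standard deviation is an addition here, not a figure from the original report):

```python
import statistics

# Fold accuracies copied from the list above.
cv_scores = [0.9237, 0.9255, 0.9259, 0.9214, 0.9278]

mean_acc = statistics.mean(cv_scores)
std_acc = statistics.stdev(cv_scores)  # sample standard deviation across folds

print(f"mean accuracy: {mean_acc:.4f}")  # mean accuracy: 0.9249
print(f"fold-to-fold std: {std_acc:.4f}")
```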
Hyperparameters
| Parameter | Value |
|---|---|
| n_estimators | 713 |
| learning_rate | 0.0257 |
| num_leaves | 1023 |
| max_depth | 12 |
| min_child_samples | 200 |
| subsample | 0.752 |
| colsample_bytree | 0.778 |
| class_weight | balanced |
| max_bin | 1023 |
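As a rough sketch, the table above maps onto LightGBM's scikit-learn wrapper as follows. This only shows how the listed values could be passed to `LGBMClassifier`; settings the table does not list (objective, random seed, `subsample_freq`, and so on) are unknown, so this is not necessarily the exact training configuration. `build_classifier` is a hypothetical helper name.

```python
# Values copied from the hyperparameter table above.
LGBM_PARAMS = {
    "n_estimators": 713,
    "learning_rate": 0.0257,
    "num_leaves": 1023,
    "max_depth": 12,
    "min_child_samples": 200,
    "subsample": 0.752,
    "colsample_bytree": 0.778,
    "class_weight": "balanced",
    "max_bin": 1023,
}

# Sanity check: a tree of depth d has at most 2**d leaves,
# so num_leaves=1023 is reachable with max_depth=12.
assert LGBM_PARAMS["num_leaves"] <= 2 ** LGBM_PARAMS["max_depth"]


def build_classifier(params=LGBM_PARAMS):
    """Create an untrained LGBMClassifier (requires the lightgbm package)."""
    from lightgbm import LGBMClassifier  # imported lazily so the dict is usable without it
    return LGBMClassifier(**params)
```

In LightGBM, `subsample` takes effect only when `subsample_freq` is greater than 0. The table does not say which value was used.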
Usage
import pickle

import numpy as np

# Load model and metadata
with open('lgbm_111w_model.pkl', 'rb') as f:
    model = pickle.load(f)
with open('metadata.pkl', 'rb') as f:
    metadata = pickle.load(f)

# Feature names (must be in this order)
features = metadata['features']
classes = metadata['classes']

# Example: predict a single sample
# X should be a 2D array with shape (n_samples, 15)
X = np.array([[...]])  # Your 15 features here

# Get predicted class
# (indexing into `classes` assumes predict() returns integer class indices)
pred_idx = model.predict(X)
pred_class = [classes[i] for i in pred_idx]

# Get prediction probabilities
pred_proba = model.predict_proba(X)
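The probability output is often more useful than a single label. A minimal sketch for listing the most likely classes per star is shown below. It assumes the columns of `predict_proba` follow the same order as `metadata['classes']`; the class order used here is copied from the Supported Classes table and may differ from the real metadata. `top_k_classes` and the probabilities are made up for illustration.

```python
import numpy as np

# ASSUMPTION: class order taken from the "Supported Classes" table;
# in practice use metadata['classes'] instead.
classes = ["Non-var", "ROT", "EA", "EW", "CEP", "DSCT", "RRAB", "RRC", "M", "SR"]


def top_k_classes(proba, class_names, k=3):
    """Return, for each sample, the k most probable (class, probability) pairs."""
    proba = np.asarray(proba, dtype=float)
    # argsort is ascending, so reverse the columns and keep the first k.
    order = np.argsort(proba, axis=1)[:, ::-1][:, :k]
    return [
        [(class_names[j], float(proba[i, j])) for j in row]
        for i, row in enumerate(order)
    ]


# Made-up probabilities standing in for model.predict_proba(X).
fake_proba = np.array([[0.02, 0.05, 0.60, 0.25, 0.01, 0.01, 0.02, 0.02, 0.01, 0.01]])
result = top_k_classes(fake_proba, classes)
print(result)  # [[('EA', 0.6), ('EW', 0.25), ('ROT', 0.05)]]
```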
Using with Hugging Face Hub
import pickle

from huggingface_hub import hf_hub_download

# Download model files
repo_id = "bestdo77/Lightcurve_lgbm_111w_15_model"
model_path = hf_hub_download(repo_id=repo_id, filename="lgbm_111w_model.pkl")
metadata_path = hf_hub_download(repo_id=repo_id, filename="metadata.pkl")

# Load model
with open(model_path, 'rb') as f:
    model = pickle.load(f)
with open(metadata_path, 'rb') as f:
    metadata = pickle.load(f)
Inference Speed
Benchmarked on a standard CPU:
| Batch Size | Time | Speed |
|---|---|---|
| 100,000 samples | ~200 ms | ~500,000 samples/sec |
| 1 sample (amortized over a large batch) | ~2 µs | - |
The model is optimized for batch inference and can process large catalogs efficiently.
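The two rows of the table are consistent with each other: 100,000 samples in 0.2 s is 500,000 samples/sec, or 2 µs per sample. For catalogs too large to hold comfortably in memory, prediction can be run in chunks. The sketch below shows the idea with a dummy predictor in place of the real model, so the printed throughput says nothing about LightGBM itself. `predict_in_chunks`, `dummy_predict`, and the chunk size are illustrative assumptions.

```python
import time

import numpy as np


def predict_in_chunks(predict_fn, X, chunk_size=100_000):
    """Apply predict_fn to X in row chunks and concatenate the results."""
    parts = [
        predict_fn(X[start:start + chunk_size])
        for start in range(0, len(X), chunk_size)
    ]
    return np.concatenate(parts)


def dummy_predict(batch):
    # Stand-in for model.predict: returns class index 0 for every row.
    return np.zeros(len(batch), dtype=int)


X = np.random.default_rng(0).normal(size=(250_000, 15))

start = time.perf_counter()
preds = predict_in_chunks(dummy_predict, X)
elapsed = time.perf_counter() - start

print(f"{len(preds) / elapsed:,.0f} samples/sec (dummy predictor, not the real model)")
```

To time the actual classifier, pass `model.predict` or `model.predict_proba` as `predict_fn`.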
Limitations
- The model is trained on specific survey data and may not generalize well to other surveys with different cadences or photometric systems.
- Reliable feature extraction requires a sufficient number of data points in the light curve.
- Performance may vary for objects near class boundaries.
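One possible way to handle objects near class boundaries is to flag predictions whose top probability is low, or whose two leading classes are close, and review them separately. The sketch below illustrates this with made-up probabilities. The function name and both thresholds are hypothetical and were not tuned on this model.

```python
import numpy as np


def flag_uncertain(proba, min_top=0.6, min_margin=0.2):
    """Mark samples whose top probability or top-2 margin is below a threshold."""
    proba = np.asarray(proba, dtype=float)
    top_two = np.sort(proba, axis=1)[:, -2:]  # two largest values, ascending
    runner_up, top = top_two[:, 0], top_two[:, 1]
    return (top < min_top) | ((top - runner_up) < min_margin)


# Made-up probabilities standing in for model.predict_proba(X).
fake_proba = np.array([
    [0.90, 0.05, 0.05],  # confident prediction
    [0.48, 0.45, 0.07],  # two classes nearly tied
])
flags = flag_uncertain(fake_proba)
print(flags.tolist())  # [False, True]
```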
Citation
If you use this model in your research, please cite:
@software{lightcurve_lgbm_classifier,
title={Lightcurve Variable Star Classification Model},
author={bestdo77},
year={2026},
url={https://huggingface.co/bestdo77/Lightcurve_lgbm_111w_15_model}
}
License
This model is released under the MIT License.