Update README.md to emphasize use of an LLM inference engine.
#8
by QipengGuo - opened

README.md CHANGED
```diff
@@ -60,6 +60,9 @@ temperature = 0.8
 
 ### Serving
 
+> [!IMPORTANT]
+> Running a trillion-parameter model with the native Hugging Face forward method is challenging. We strongly recommend using an LLM inference engine (such as LMDeploy, vLLM, or sglang) to host Intern-S1-Pro and to access the model via its API.
+
 Intern-S1-Pro can be deployed using any of the following LLM inference frameworks:
 
 - LMDeploy
```
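For reference, the API-based access the new note recommends typically looks like the sketch below. It assumes an OpenAI-compatible endpoint has already been started, e.g. with `lmdeploy serve api_server internlm/Intern-S1-Pro`; the host, port, and served model name are illustrative assumptions, not part of this PR.

```python
# Minimal sketch: chat with a served Intern-S1-Pro through the
# OpenAI-compatible API that LMDeploy, vLLM, and sglang all expose.
# The base_url and model name below are assumptions; adjust them to
# match your own deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:23333/v1",   # assumed LMDeploy default port
    api_key="not-needed-for-local-server",  # dummy key for a local server
)

response = client.chat.completions.create(
    model="internlm/Intern-S1-Pro",  # assumed served model id
    messages=[{"role": "user", "content": "Summarize what Intern-S1-Pro is."}],
    temperature=0.8,  # matches the sampling setting shown in the hunk header
)
print(response.choices[0].message.content)
```

Serving behind an OpenAI-compatible API keeps client code identical across LMDeploy, vLLM, and sglang, which is why the note recommends API access over calling the native Hugging Face forward method directly.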