Update README.md to emphasize the use of an LLM inference engine for serving.

#8
Files changed (1)
  1. README.md +3 -0
README.md CHANGED
@@ -60,6 +60,9 @@ temperature = 0.8
 
 ### Serving
 
+> [!IMPORTANT]
+> Running a trillion-parameter model using the native Hugging Face forward method is challenging. We strongly recommend using an LLM inference engine (such as LMDeploy, vLLM, or sglang) to host Intern-S1-Pro and accessing the model via API.
+
 Intern-S1-Pro can be deployed using any of the following LLM inference frameworks:
 
 - LMDeploy
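
To illustrate the API-based access this PR recommends, here is a minimal sketch of querying a served Intern-S1-Pro instance through an OpenAI-compatible endpoint, which the listed engines (LMDeploy, vLLM, sglang) all expose. The base URL, port, API key, and model id below are placeholder assumptions for illustration, not values taken from this PR or the README.

```python
# Minimal sketch: query Intern-S1-Pro via an OpenAI-compatible API instead of
# the native Hugging Face forward method, per the note added in this PR.
# Assumes an inference engine (e.g. LMDeploy or vLLM) is already serving the
# model; base_url, api_key, and model id are placeholder assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:23333/v1",  # assumed local server endpoint
    api_key="none",                        # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="Intern-S1-Pro",                 # hypothetical served model id
    messages=[{"role": "user", "content": "Hello"}],
    temperature=0.8,                       # matches the sampling config shown in the README
)
print(response.choices[0].message.content)
```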