amezasor committed on
Commit ef0f96e · verified · 1 Parent(s): 5e31aec

data update

Files changed (1)
  1. README.md +4 -2
README.md CHANGED
@@ -208,7 +208,7 @@ model-index:
  # Granite-3.0-2B-Base
 
  ## Model Summary
- **Granite-3.0-2B-Base** is an open-source decoder-only language model from IBM Research that supports a variety of text-to-text generation tasks (e.g., question-answering, text-completion). **Granite-3.0-2B-Base** is trained from scratch and follows a two-phase training strategy. In the first phase, it is trained on 10 trillion tokens sourced from diverse domains, including natural language, math, code, and safety. During the second phase, it is further trained on 2 trillion tokens using a carefully curated mix of high-quality data, aiming to enhance its performance on specific tasks.
+ **Granite-3.0-2B-Base** is an open-source decoder-only language model from IBM Research that supports a variety of text-to-text generation tasks (e.g., question-answering, text-completion). **Granite-3.0-2B-Base** is trained from scratch and follows a two-phase training strategy. In the first phase, it is trained on 10 trillion tokens sourced from diverse domains. During the second phase, it is further trained on 2 trillion tokens using a carefully curated mix of high-quality data, aiming to enhance its performance on specific tasks.
 
  - **Developers:** IBM Research
  - **GitHub Repository:** [ibm-granite/granite-language-models](https://github.com/ibm-granite/granite-language-models)
@@ -280,7 +280,9 @@ print(output)
 
  <!-- TO DO: To be completed once the paper is ready -->
  ## Training Data
- This model is trained on a mix of open-source and proprietary datasets.
+ This model is trained on a mix of open-source and proprietary data following a two-phase training strategy.
+ * Phase 1 data: The data for phase 1 is sourced from diverse domains, such as: web, code, academic sources, books, and math data.
+ * Phase 2 data: The data for phase 2 comprises a curated mix of high-quality data from the same domains, plus multilingual and instruction data. The goal of this second training phase is to enhance the model’s performance on specific tasks.
 
  ## Infrastructure
  We train the Granite Language models using IBM's super computing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs.