Hello Dolly 2.0: Open Instruction-Tuned LLM

Databricks has announced the launch of Dolly 2.0, the first open-source, instruction-following LLM fine-tuned on a human-generated instruction dataset licensed for commercial use.

Dolly 2.0 is a 12B-parameter language model based on the EleutherAI Pythia model family, fine-tuned exclusively on a new, high-quality, human-generated instruction-following dataset crowdsourced among Databricks employees.

“With Dolly 2.0, any organisation can create, own, and customise a powerful LLM that understands how to talk to people. Dolly 2.0 is available for commercial applications without the need to pay for API access or share data with third parties,” said Ali Ghodsi, co-founder and CEO at Databricks.

This announcement comes only two weeks after the launch of Dolly, a large language model (LLM) trained in under 30 minutes for less than $30 to exhibit ChatGPT-like human interactivity (i.e. instruction following). To create the dataset on which Dolly 2.0 was trained, Databricks says it incentivised more than 5,000 of its employees by gamifying the process and, as a result, gathered over 15,000 responses within a week.

“Dolly 2.0 is the next step in Databricks’ mission to help every organisation harness the power of large language models. It is also a response to customer feedback, which has stressed the importance of companies owning their models and being able to manage tradeoffs in model quality, cost, and behaviour,” Ghodsi added.
