Stability AI Launches StableVicuna, a Game-Changer in the AI World: The First Open Source RLHF LLM Chatbot
The field of artificial intelligence has seen a surge in chatbot releases, including Character.ai, ChatGPT, and Bard. Much of the user experience these chatbots offer comes from combining instruction finetuning with reinforcement learning from human feedback (RLHF). The open-source community has supported this work with training frameworks such as trlX, trl, DeepSpeed Chat, and ColossalAI.
Stability AI’s StableVicuna is presented as the first large-scale open-source chatbot trained with both instruction finetuning and RLHF. The release builds on the chat RLHF datasets that Open Assistant, Anthropic, and Stanford made publicly available, and on trlX, a framework for RLHF training.
StableVicuna is built on Vicuna v0 13B, an instruction fine-tuned LLaMA 13B model. It was further refined through the three-stage RLHF pipeline outlined by Stiennon et al. and Ouyang et al. First, the base Vicuna model underwent supervised finetuning (SFT) on a mixture of three datasets: OpenAssistant Conversations Dataset (OASST1), GPT4All Prompt Generations, and Alpaca. Next, trlX was used to train a reward model on RLHF preference datasets, including OASST1, Anthropic HH-RLHF, and Stanford Human Preferences (SHP). Finally, the SFT model was trained with Proximal Policy Optimization (PPO) reinforcement learning to complete the RLHF process.
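The second and third stages of that pipeline can be illustrated with two small functions. This is only a sketch under stated assumptions: the article does not give StableVicuna's exact losses or hyperparameters, so the code uses the standard pairwise preference loss and the standard PPO clipped surrogate on scalar inputs. The function names and the `clip_eps` default of 0.2 are illustrative choices, not values taken from the release or from trlX.

```python
import math


def pairwise_reward_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise preference loss for a reward model (assumed form).

    Computes -log(sigmoid(r_chosen - r_rejected)), which is small when the
    reward model scores the human-preferred response higher.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow in exp().
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def ppo_clipped_objective(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """PPO clipped surrogate for one sample (to be maximized).

    ratio is pi_new(action) / pi_old(action). Clipping it to
    [1 - clip_eps, 1 + clip_eps] removes the incentive to move the policy
    far from the one that generated the sample.
    """
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Reward model agrees with the human label: low loss.
    print(pairwise_reward_loss(2.0, 0.0))
    # Reward model disagrees with the human label: high loss.
    print(pairwise_reward_loss(0.0, 2.0))
    # A large policy update with positive advantage is capped by the clip.
    print(ppo_clipped_objective(1.5, 1.0))
```

In a real RLHF run these quantities are computed over batches of token sequences, and the reward passed to PPO typically also includes a penalty for drifting away from the SFT model; both details are omitted here for brevity.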
StableVicuna-13B is available on the HuggingFace Hub as a weight delta against the original LLaMA model. Because only the delta is distributed, users also need the original LLaMA weights, which must be applied for separately, either through the GitHub repo or via the link provided in the announcement. Once both are in hand, a script in the GitHub repo combines the weight delta with the LLaMA weights to produce StableVicuna-13B.
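The idea behind a weight delta can be shown with a toy example. This is a minimal sketch, assuming the delta stores the elementwise difference between the fine-tuned and base parameters, so that adding it back to the base recovers the fine-tuned model. The `apply_delta` function and the `Weights` type below are hypothetical and use plain Python lists in place of real tensors; they are not the script from the GitHub repo, which may handle additional details.

```python
from typing import Dict, List

# Toy stand-in for a model checkpoint: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def apply_delta(base: Weights, delta: Weights) -> Weights:
    """Return base + delta for every parameter (assumed delta convention)."""
    if base.keys() != delta.keys():
        raise ValueError("base and delta must contain the same parameter names")

    merged: Weights = {}
    for name, base_values in base.items():
        delta_values = delta[name]
        if len(base_values) != len(delta_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Elementwise sum reconstructs the fine-tuned parameter.
        merged[name] = [b + d for b, d in zip(base_values, delta_values)]
    return merged


if __name__ == "__main__":
    base = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
    delta = {"layer.weight": [0.5, -1.0], "layer.bias": [0.25]}
    print(apply_delta(base, delta))
```

Distributing only the delta lets the fine-tuned model be shared without redistributing the LLaMA weights themselves, which is why access to the original weights is still required.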
Stability AI has also announced a chat interface that is in the final stages of development. The company plans to keep iterating on StableVicuna and to deploy a Discord bot to the Stable Foundation server in the coming weeks. In the meantime, users can try the model on a HuggingFace space and submit feedback to help improve it.
The release credits several key contributors, including Duy Phung, Philwee, the OpenAssistant team, Jonathan from CarperAI, Poli, and AK from Hugging Face. As an openly available chatbot trained with both instruction finetuning and RLHF, StableVicuna gives the open-source community a concrete base for further work on chat models.
https://stability.ai/blog/stablevicuna-open-source-rlhf-chatbot