# train_grpo.py
#
# See https://github.com/willccbb/verifiers for ongoing developments
#
import re
import torch
from datasets import load_dataset, Dataset
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer
I am building a prompt for an LLM (gpt-4o) as part of a conversational assistant. The LLM is expected to predict one of the available commands based on the instructions given. Here is the current prompt:
```
Your task is to analyze the current conversation context and generate a list of actions to start new business processes that we call flows, to extract slots, or respond to small talk and knowledge requests.
These are the flows that can be started, with their description and slots:
transfer_money: send money to friends and family
slot: transfer_money_recipient (the name of a person)
slot: transfer_money_amount_of_money (the amount of money without any currency designation)