I followed this tutorial https://www.youtube.com/watch?v=Us5ZFp16PaU (Colab notebook https://colab.research.google.com/drive/14xo6sj4dARk8lXZbOifHEn1f_70qNAwy?usp=sharing#scrollTo=hsD1VKqeA62Z) to fine-tune my model.
Trying to load my locally saved model:
model = AutoModelForCausalLM.from_pretrained("finetuned_model")
yields Killed.
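A bare "Killed" with no Python traceback usually means the operating system's out-of-memory killer ended the process. As a rough sanity check, the memory needed just for the weights can be estimated from the parameter count and the dtype width. The sketch below assumes about 7 billion parameters (an approximation for a "7B" model) and ignores activations and loading overhead; the function name `weight_memory_gib` is hypothetical.

```python
# Rough estimate of the RAM needed just to hold model weights.
# Assumption: ~7e9 parameters for a "7B" model; the real count differs slightly.

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the approximate weight memory in GiB for the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gib(7e9, dtype):.1f} GiB")
```

Under these assumptions the weights alone need roughly 26 GiB in float32, so a load call that does not pass torch_dtype or load_in_8bit may simply not fit in RAM.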
Trying to load the model from the Hub:
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
peft_model_id = "lucas0/empath-llama-7b"
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, return_dict=True, load_in_8bit=True, device_map='auto')
tokenizer = AutoTokenizer.from_pretrained(cwd+"/tokenizer.model")
# Load the Lora model
model = PeftModel.from_pretrained(model, peft_model_id)
yields
AttributeError: /home/ubuntu/empath/lora/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cget_col_row_stats
Full stack trace: https://pastebin.com/g9x8G7A3
Model creation:
I fine-tuned the model using PEFT and LoRA:
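The path in the error names libbitsandbytes_cpu.so, which suggests that bitsandbytes fell back to its CPU-only binary, presumably because it did not detect a usable CUDA setup; load_in_8bit=True relies on the GPU kernels. A minimal sketch to confirm this is shown here. The helper name `loaded_cpu_build` is hypothetical, and the PyTorch check is optional and only runs if torch is installed.

```python
# Check which bitsandbytes binary an error message refers to.
# The helper name is hypothetical; only the standard library is required.
import os


def loaded_cpu_build(error_message: str) -> bool:
    """Return True if the message points at the CPU-only bitsandbytes library."""
    for token in error_message.split():
        # The shared-library path is followed by a colon in the message.
        name = os.path.basename(token.rstrip(":"))
        if name.startswith("libbitsandbytes"):
            return name == "libbitsandbytes_cpu.so"
    return False


if __name__ == "__main__":
    msg = (
        "/home/ubuntu/empath/lora/venv/lib/python3.10/site-packages/"
        "bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cget_col_row_stats"
    )
    print("CPU-only build loaded:", loaded_cpu_build(msg))
    try:
        import torch  # optional; only used to report CUDA visibility

        print("CUDA visible to PyTorch:", torch.cuda.is_available())
    except ImportError:
        print("PyTorch is not installed in this environment")
```

If the CPU-only build is reported and PyTorch sees no CUDA device, the 8-bit load cannot work in that environment as configured.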
model = AutoModelForCausalLM.from_pretrained(
"decapoda-research/llama-7b-hf",
torch_dtype=torch.float16,
device_map='auto',
)
I had to download the LLaMA tokenizer and specify it manually:
tokenizer = LlamaTokenizer(cwd+"/tokenizer.model")
tokenizer.pad_token = tokenizer.eos_token
Training:
import pandas as pd
import transformers
from datasets import Dataset
from peft import LoraConfig, get_peft_model
config = LoraConfig(
r=8,
lora_alpha=16,
target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
lora_dropout=0.05,
bias="none",
task_type="CAUSAL_LM"
)
model = get_peft_model(model, config)
data = pd.read_csv("my_csv.csv")
dataset = Dataset.from_pandas(data)
tokenized_dataset = dataset.map(lambda samples: tokenizer(samples["text"]))
trainer = transformers.Trainer(
model=model,
train_dataset=tokenized_dataset,
args=transformers.TrainingArguments(
per_device_train_batch_size=4,
gradient_accumulation_steps=4,
warmup_steps=100,
max_steps=100,
learning_rate=1e-3,
fp16=True,
logging_steps=1,
output_dir='outputs',
),
data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)
model.config.use_cache = False  # silence the warnings during training; re-enable for inference
trainer.train()
And saved it locally:
trainer.save_model(cwd+"/finetuned_model")
print("saved trainer locally")
And pushed it to the Hub:
model.push_to_hub("lucas0/empath-llama-7b", create_pr=1)
How can I load my fine-tuned model?