Introduction
Fine-tuning has consistently proven to be a practical way to enhance a model's capabilities in new domains, making it a valuable approach for adapting large language models with a reasonable amount of resources.
As mentioned earlier, the fine-tuning process builds upon the model's existing general knowledge, which means it doesn't need to learn everything from scratch. Consequently, it can grasp patterns from a relatively small number of samples and undergo a relatively short training process.
In this lesson, we'll see how to do SFT on an LLM using LoRA. We'll use the dataset from the "LIMA: Less Is More for Alignment" paper. The authors argue that a high-quality, hand-picked dataset of just a thousand samples can replace the RLHF process, effectively enabling the model to be instruction fine-tuned. Their approach yielded competitive results compared to other language models, showcasing a more efficient fine-tuning process. However, it might not reach the same level of accuracy on domain-specific tasks, and it requires hand-picked data points.
The TRL library provides classes for Supervised Fine-Tuning (SFT), making the process accessible and straightforward. These classes permit the integration of LoRA configurations, facilitating its seamless adoption. It is worth highlighting that this process also serves as the first step of Reinforcement Learning from Human Feedback (RLHF), a topic we will explore in detail later in the course.
Spinning Up a Virtual Machine for Fine-tuning on GCP Compute Engine
Cloud GPU availability is very scarce these days, as GPUs are in high demand across many deep learning applications. Less widely known is that CPUs can actually be used to fine-tune LLMs through various optimizations, and that's what we'll be doing in these lessons when doing SFT.
Let's log in to our Google Cloud Platform account and create a Compute Engine instance (see the "Course Introduction" lesson for instructions). You can choose between different machine types. In this lesson, we trained the model on the latest CPU generation, 4th Generation Intel® Xeon® Scalable Processors (formerly code-named Intel® Sapphire Rapids). This architecture features an integrated accelerator designed to enhance the performance of training deep learning models. Intel® Advanced Matrix Extensions (AMX) enable training with BF16 precision, allowing for half-precision training on the latest Xeon® Scalable processors. It also introduces an INT8 data type for inference, leading to a substantial acceleration in processing speed. Reports suggest up to a tenfold increase in performance when utilizing PyTorch for both training and inference.
Once you have your virtual machine up, you can SSH into it.
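Before installing anything, you can quickly confirm that the instance actually exposes the AMX instruction set. The snippet below is a minimal sketch that assumes a Linux guest and checks the CPU flags reported in /proc/cpuinfo (amx_bf16, amx_tile, and amx_int8 are the flag names typically reported on 4th Gen Xeon® processors).
# Minimal check (Linux only): look for the AMX flags in /proc/cpuinfo
with open("/proc/cpuinfo") as f:
    flags = f.read()

print("AMX BF16:", "amx_bf16" in flags)
print("AMX INT8:", "amx_int8" in flags)
print("AMX tile:", "amx_tile" in flags)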
Incorporating CPUs for fine-tuning or inference is an excellent choice, as renting alternative hardware is considerably less cost-effective. It is worth mentioning that a minimum of 32GB of RAM is necessary to load the model and run the training process for this experiment. If you hit an out-of-memory error, reduce arguments such as batch_size or seq_length.
Load the Dataset
The quality of a model is directly tied to the quality of the data it is trained on! The best approach is to begin the process by choosing the dataset. Whether it is an open-source dataset or a custom one curated manually, planning and considering the dataset in advance is essential. In this lesson, we will utilize the dataset released with the LIMA research. It is publicly available under a non-commercial use license.
The powerful datasets feature of Deep Lake enables seamless streaming of the dataset. There is no need to download and load the dataset into memory. The hub provides diverse datasets, including the LIMA dataset presented in the "LIMA: Less Is More for Alignment" paper. The code below will create a loader object for the training and test sets.
import deeplake
# Connect to the training and testing datasets
ds = deeplake.load('hub://genai360/GAIR-lima-train-set')
ds_test = deeplake.load('hub://genai360/GAIR-lima-test-set')
print(ds)
Dataset(path='hub://genai360/GAIR-lima-train-set', read_only=True, tensors=['answer', 'question', 'source'])
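You can also confirm how small the dataset is; as LIMA advertises, the training set contains on the order of a thousand curated samples. This check is just for illustration and assumes len() on a Deep Lake dataset returns the number of samples.
# Count the samples in the training and test sets
print(len(ds), len(ds_test))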
We can then utilize the ConstantLengthDataset class to bundle a number of smaller samples together, enhancing the efficiency of the training process. Furthermore, it also handles dataset formatting by accepting a template function and tokenizing the texts.
To begin, we load the pre-trained tokenizer for the Open Pre-trained Transformer (OPT) model using the Transformers library. We will load the model itself later. We are using OPT for convenience because it's an open model with a relatively "small" number of parameters. The same code in this lesson can be run with another model, for example, meta-llama/Llama-2-7b-chat-hf for Llama 2.
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")
Moreover, we need to define the formatting function called prepare_sample_text, which takes a row of data in Deep Lake format as input and formats it to begin with the question, followed by the answer, separated by two newlines. This formatting helps the model learn the template and understand that if a prompt starts with the Question keyword, the most likely completion is the corresponding answer.
def prepare_sample_text(example):
"""Prepare the text from a sample of the dataset."""
text = f"Question: {example['question'].text()}\n\nAnswer: {example['answer'].text()}"
return text
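As a quick sanity check, you can apply the formatting function to a single row. The sketch below assumes that indexing the Deep Lake dataset with an integer returns a row on which the same ['question'].text() access used inside the function works.
# Preview the formatted text of the first training sample
print(prepare_sample_text(ds[0]))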
Now, with all the components in place, we can initialize the dataset that will be fed to the model for fine-tuning. We create the ConstantLengthDataset using a combination of the tokenizer, the Deep Lake dataset object, and the formatting function. The additional argument infinite=True ensures that the iterator restarts when all data points have been used but training steps still remain. The seq_length argument determines the maximum sequence length and should be set according to the model's configuration. In this scenario, it is possible to raise it to 2048, although we opted for a smaller value to manage memory usage better. Select a higher number if the dataset primarily comprises shorter texts.
from trl.trainer import ConstantLengthDataset
train_dataset = ConstantLengthDataset(
tokenizer,
ds,
formatting_func=prepare_sample_text,
infinite=True,
seq_length=1024
)
eval_dataset = ConstantLengthDataset(
tokenizer,
ds_test,
formatting_func=prepare_sample_text,
seq_length=1024
)
# Show one sample from train set
iterator = iter(train_dataset)
sample = next(iterator)
print(sample)
{'input_ids': tensor([ 2, 45641, 35, ..., 48443, 2517, 742]), 'labels': tensor([ 2, 45641, 35, ..., 48443, 2517, 742])}
As evidenced by the output above, the ConstantLengthDataset
class takes care of all the necessary steps to prepare our dataset.
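If you want to inspect what the model will actually see, you can decode the packed token IDs back into text; note that a single packed sample may contain several question/answer pairs concatenated together.
# Decode the packed sample back into text to inspect the training input
print(tokenizer.decode(sample['input_ids']))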
train_dataset.start_iteration = 0  # ensure packing starts from the first sample of the training set
Initialize the Model and Trainer
As mentioned previously, we will be using the OPT model with 1.3 billion parameters in this lesson, which has the facebook/opt-1.3b
model id on the Hugging Face Hub.
The LoRA approach is employed for fine-tuning, which involves introducing new parameters to the network while keeping the base model unchanged during the tuning process. This approach has proven to be highly efficient, enabling fine-tuning of the model by training less than 1% of the total parameters. (For more details, refer to the following post.)
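For intuition, LoRA leaves a frozen weight matrix W untouched and learns a low-rank update on top of it: the effective weight becomes W + (alpha / r) * B @ A, where A and B are small matrices of rank r. The toy sketch below illustrates the shapes only; it is not the PEFT implementation.
import torch

# Toy illustration of a LoRA update (shapes only, not the PEFT internals)
d, r, alpha = 2048, 16, 32
W = torch.randn(d, d)          # frozen base weight
A = torch.randn(r, d) * 0.01   # trainable low-rank factor, r x d
B = torch.zeros(d, r)          # trainable low-rank factor, d x r (starts at zero)
W_effective = W + (alpha / r) * (B @ A)
print(W_effective.shape)       # torch.Size([2048, 2048])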
With the TRL library, we can seamlessly add the extra LoRA parameters to the model by defining a few configurations. The variable r represents the rank of the low-rank matrices, where lower values lead to fewer trainable parameters. lora_alpha serves as the scaling factor, while bias determines which bias parameters the model should train, with options none, all, and lora_only. The remaining parameters are self-explanatory.
from peft import LoraConfig
lora_config = LoraConfig(
r=16,
lora_alpha=32,
lora_dropout=0.05,
bias="none",
task_type="CAUSAL_LM",
)
Next, we need to configure the TrainingArguments, which are essential for the training process. We have already covered most of these parameters in the training lesson; note that here a relatively high learning rate is paired with a higher weight decay, which allows larger parameter updates during fine-tuning while keeping them regularized.
Furthermore, it is highly recommended to set the argument bf16=True in order to minimize memory usage during the model's fine-tuning process. The 4th Generation Intel® Xeon® Scalable processors empower us to apply this optimization: numbers are stored in 16-bit (bfloat16) precision, effectively reducing the RAM demand during fine-tuning. We will dive into other quantization methods as we progress through the course.
We are also using a service called Weights & Biases, an excellent tool for tracking the training and fine-tuning of any machine learning model. It offers monitoring tools to record every facet of the process, along with solutions for prompt engineering and hyperparameter sweeps, among other functionalities. Simply installing the package and passing "wandb" to the report_to argument is all that's required; this will handle the logging process seamlessly.
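If this is the first run on the machine, install the wandb package (pip install wandb) and authenticate once with your API key; a minimal sketch:
import wandb

# Authenticate with Weights & Biases (you will be prompted for an API key)
wandb.login()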
from transformers import TrainingArguments
training_args = TrainingArguments(
output_dir="./OPT-fine_tuned-LIMA-CPU",
dataloader_drop_last=True,
evaluation_strategy="epoch",
save_strategy="epoch",
num_train_epochs=10,
logging_steps=5,
per_device_train_batch_size=8,
per_device_eval_batch_size=8,
learning_rate=1e-4,
lr_scheduler_type="cosine",
warmup_steps=10,
gradient_accumulation_steps=1,
bf16=True,
weight_decay=0.05,
run_name="OPT-fine_tuned-LIMA-CPU",
report_to="wandb",
)
The final component we need is the pre-trained model. We will use the facebook/opt-1.3b model id to load the model using the Transformers library.
from transformers import AutoModelForCausalLM
import torch
model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b", torch_dtype=torch.bfloat16)
The subsequent code block loops through the model parameters and casts specific layers (like LayerNorm and the final language-modeling head) to 32-bit precision, which improves fine-tuning stability.
import torch.nn as nn
for param in model.parameters():
param.requires_grad = False # freeze the model - train adapters later
if param.ndim == 1:
# cast the small parameters (e.g. layernorm) to fp32 for stability
param.data = param.data.to(torch.float32)
model.gradient_checkpointing_enable() # reduce number of stored activations
model.enable_input_require_grads()
class CastOutputToFloat(nn.Sequential):
def forward(self, x): return super().forward(x).to(torch.float32)
model.lm_head = CastOutputToFloat(model.lm_head)
Finally, we can use the SFTTrainer
class to tie all the components together. It accepts the model, training arguments, training dataset, and LoRA method configurations to construct the trainer object. The packing
argument indicates that we used the ConstantLengthDataset
class earlier to pack samples together.
from trl import SFTTrainer
trainer = SFTTrainer(
model=model,
args=training_args,
train_dataset=train_dataset,
eval_dataset=eval_dataset,
peft_config=lora_config,
packing=True,
)
So, why did we use LoRA? Let's observe its impact in action by implementing a simple function that calculates the number of available parameters in the model and compares it with the trainable parameters. As a reminder, the trainable parameters refer to the ones that LoRA added to the base model.
def print_trainable_parameters(model):
"""
Prints the number of trainable parameters in the model.
"""
trainable_params = 0
all_param = 0
for _, param in model.named_parameters():
all_param += param.numel()
if param.requires_grad:
trainable_params += param.numel()
print(
f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
)
print_trainable_parameters(trainer.model)
trainable params: 3145728 || all params: 1318903808 || trainable%: 0.23851079820371554
As observed above, the number of trainable parameters is only about 3 million, roughly 0.24% of the total number of parameters we would otherwise have had to update if we hadn't used LoRA! This significantly reduces the memory requirement, and it should now be clear why this approach to fine-tuning is advantageous.
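The count also checks out arithmetically. Assuming PEFT's default behavior for OPT of adding LoRA adapters to the q_proj and v_proj attention projections (which matches the model printout shown later in this lesson), each adapted 2048x2048 projection gains two low-rank matrices of shapes 2048x16 and 16x2048:
# Back-of-the-envelope check of the trainable-parameter count,
# assuming adapters only on q_proj and v_proj (PEFT's default for OPT)
hidden, r, layers, adapted_projs = 2048, 16, 24, 2
per_proj = hidden * r + r * hidden        # lora_A + lora_B
print(per_proj * adapted_projs * layers)  # 3145728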
The trainer object is fully prepared to initiate the fine-tuning loop by calling the .train()
method, as shown below.
print("Training...")
trainer.train()
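With save_strategy="epoch", checkpoints are written under the output_dir (as directories named checkpoint-<step>), and one of them is what we will merge in the next section. If you also want to save the final adapter explicitly, here is a quick sketch (the directory name is just an example):
# Optionally save the final LoRA adapter weights to a directory of your choice
trainer.save_model("./OPT-fine_tuned-LIMA-CPU/final_checkpoint")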
Merging LoRA and OPT
The final step involves merging the base model with the trained LoRA layers to create a standalone model. This can be achieved by first loading the base model and then attaching the desired checkpoint saved by SFTTrainer using the PeftModel class. Begin by loading the OPT-1.3B base model if you are using a fresh environment.
from transformers import AutoModelForCausalLM
import torch
model = AutoModelForCausalLM.from_pretrained(
"facebook/opt-1.3b", return_dict=True, torch_dtype=torch.bfloat16
)
The PeftModel class can merge the base model with the LoRA layers from the checkpoint specified in the .from_pretrained() method. We should then put the model in evaluation mode. Upon execution, it will print out the model's architecture, where we can observe the presence of the LoRA layers.
from peft import PeftModel
# Load the Lora model
model = PeftModel.from_pretrained(model, "./OPT-fine_tuned-LIMA-CPU/<desired_checkpoint>/")
model.eval()
PeftModelForCausalLM(
(base_model): LoraModel(
(model): OPTForCausalLM(
(model): OPTModel(
(decoder): OPTDecoder(
(embed_tokens): Embedding(50272, 2048, padding_idx=1)
(embed_positions): OPTLearnedPositionalEmbedding(2050, 2048)
(final_layer_norm): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
(layers): ModuleList(
(0-23): 24 x OPTDecoderLayer(
(self_attn): OPTAttention(
(k_proj): Linear(in_features=2048, out_features=2048, bias=True)
(v_proj): Linear(
in_features=2048, out_features=2048, bias=True
(lora_dropout): ModuleDict(
(default): Dropout(p=0.05, inplace=False)
)
(lora_A): ModuleDict(
(default): Linear(in_features=2048, out_features=16, bias=False)
)
(lora_B): ModuleDict(
(default): Linear(in_features=16, out_features=2048, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
)
(q_proj): Linear(
in_features=2048, out_features=2048, bias=True
(lora_dropout): ModuleDict(
(default): Dropout(p=0.05, inplace=False)
)
(lora_A): ModuleDict(
(default): Linear(in_features=2048, out_features=16, bias=False)
)
(lora_B): ModuleDict(
(default): Linear(in_features=16, out_features=2048, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
)
(out_proj): Linear(in_features=2048, out_features=2048, bias=True)
)
(activation_fn): ReLU()
(self_attn_layer_norm): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=2048, out_features=8192, bias=True)
(fc2): Linear(in_features=8192, out_features=2048, bias=True)
(final_layer_norm): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
)
)
)
)
(lm_head): Linear(in_features=2048, out_features=50272, bias=False)
)
)
)
Lastly, we can use the PEFT model’s .merge_and_unload()
method to combine the base model and LoRA layers as a standalone object. It is possible to save the weights using the .save_pretrained()
method for later usage.
model = model.merge_and_unload()
model.save_pretrained("./OPT-fine_tuned-LIMA/merged")
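The merged directory behaves like any other local Hugging Face checkpoint, so it can be reloaded later without PEFT; a quick sketch:
from transformers import AutoModelForCausalLM
import torch

# Reload the merged, standalone model from the local directory
model = AutoModelForCausalLM.from_pretrained(
    "./OPT-fine_tuned-LIMA/merged", torch_dtype=torch.bfloat16
)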
Inference
We can evaluate the fine-tuned model's outputs by employing various prompts. The code below demonstrates how we can utilize Hugging Face's .generate() method to interact with models effortlessly. Numerous arguments and decoding strategies exist that can enhance text generation quality; however, these are beyond the scope of this course. You can explore these techniques further in an informative blog post by Hugging Face.
inputs = tokenizer("Question: Write a recipe with chicken.\n\n Answer: ", return_tensors="pt")
generation_output = model.generate(**inputs,
return_dict_in_generate=True,
output_scores=True,
max_length=256,
num_beams=1,
do_sample=True,
repetition_penalty=1.5,
length_penalty=2.)
print( tokenizer.decode(generation_output['sequences'][0]) )
Question: Write a recipe with chicken.\n\n Answer: \n* Chicken and rice is one of the most popular meals in China, especially during Chinese New Year celebrations when it's served as an appetizer or main course for dinner parties (or just to eat by yourself). It can be made from scratch using fresh ingredients like meatballs/chicken breasts if you have them on hand but otherwise use frozen ones that are already cooked so they don't need any additional cooking time before serving. You could also substitute some vegetables instead such as broccoli florets which would make this dish even more delicious! If your family doesn’t know how to cook well then I suggest making these recipes ahead of time because once done all you really do is reheat until hot again :)\n## Make homemade marinade\n1) Combine 1 tablespoon soy sauce, 2 tablespoons sesame oil, 3 teaspoons sugar, 4 cloves garlic minced into small pieces, 6-8 green onions chopped finely, 5 cups water, salt & pepper to taste, about 8 ounces boneless skinless chicken breast fillets cut up fine enough not to stick together while being mixed thoroughly - no bones needed here since there will only ever be two servings per person), ½ cup cornstarch dissolved in ¼...
To carry out further experimentation with the OPT-fine_tuned-LIMA model, we presented an identical prompt to both the vanilla base model and the fine-tuned version. This experiment aims to measure the degree to which each model can follow instructions.
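If you want to reproduce the comparison yourself, the sketch below loads the vanilla OPT-1.3B alongside the merged fine-tuned model and generates a completion for the same prompt (the prompt here is just an example):
from transformers import AutoModelForCausalLM
import torch

# Load the vanilla base model for comparison; "model" is the merged
# fine-tuned model from the previous section.
base_model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-1.3b", torch_dtype=torch.bfloat16
)

prompt = "Question: Write a recipe with chicken.\n\n Answer: "
inputs = tokenizer(prompt, return_tensors="pt")

for name, m in [("base", base_model), ("fine-tuned", model)]:
    output = m.generate(**inputs, max_length=256, do_sample=True,
                        repetition_penalty=1.5)
    print(f"--- {name} ---")
    print(tokenizer.decode(output[0]))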
The outcomes highlight the constraints and capabilities of both models. However, it is evident that the fine-tuned model learned to follow instructions better than the vanilla base model. This effect would undoubtedly become more pronounced with the resources to conduct the fine-tuning process on a larger model.
Conclusion
In this lesson, we experimented with fine-tuning large language models, utilizing the LoRA technique to achieve an efficient tuning process. Along the way, we saw why this step matters: it can serve as the starting point for RLHF or be used for instruction tuning. In the upcoming lessons, we will experiment with the fine-tuning process for creating domain-specific models.
>> Notebook.
>> W&B Report.
For more information on Intel® Accelerator Engines, visit this resource page. Learn more about Intel® Extension for Transformers, an Innovative Transformer-based Toolkit to Accelerate GenAI/LLM Everywhere here.
Intel, the Intel logo, and Xeon are trademarks of Intel Corporation or its subsidiaries.