Introduction to the “Building AI Search: Multi-Modal RAG, RAFT, & GraphRAG” Course
Activeloop and the Intel Disruptor Initiative are excited to collaborate to bring yet another Gen AI 360: Foundational Model Certification Course for aspiring Generative AI professionals, executives, and enthusiasts of tomorrow.
Following the success of our "LangChain & Vector Databases In Production" and "Training and Fine-tuning LLMs for Production" courses, as well as the "Advanced RAG" course, we're excited to welcome you to the fourth part of the series: "Building AI Search: Multi-Modal RAG, RAFT, & GraphRAG."
In this course, you'll learn how to implement advanced RAG techniques like GraphRAG, RAFT, and RAG with multi-modal models like ColPali. You'll also brush up on state-of-the-art strategies for boosting retrieval accuracy, including reranking, various search techniques, fine-tuning embedding models, and more! A special module is dedicated to state-of-the-art RAG evaluation, which will help you iterate on the combination of strategies you've learned throughout the course. This course will guide you through the optimal methods and practices for getting RAG production-ready, with plenty of applied industry project examples. Let's get started!
Why This Course?
The course provides the theoretical knowledge and practical skills necessary to build advanced RAG products. RAG is evolving fast, so we spent some time surveying which novel techniques are the most practical and interesting for you to use!
Many human tasks across various industries can be assisted with AI by combining LLMs, prompting, RAG, and fine-tuning workflows.
We are huge fans of RAG because it helps with
1) reducing hallucinations by limiting the LLM to answer based on existing documentation,
2) helping with explainability, error checking, and copyright issues by clearly referencing the sources behind each answer,
3) giving private/specific or more up-to-date data to the LLM,
4) and not relying on black-box LLM training/fine-tuning for what the model knows and has memorized.
We touched upon basic RAG in our first LangChain & Vector Databases course, but building more advanced and reliable products requires more complex techniques and iteration.
The course aims to provide you with the theoretical knowledge and practical skills necessary to develop products and applications centered on RAG.
A fundamental pillar of our course is the focus on hands-on learning. Real-world application and experimentation are crucial for a deep understanding and effective use of RAG techniques.
In this course, you will move beyond basic RAG apps, develop these applications with more advanced techniques, build RAG agents, and evaluate the performance of RAG systems.
Who Should Take This Course?
Whether you're planning to build a chat-with-your-data application for your organization or just learning how to leverage Generative AI in various industries, this course is for you. The course addresses critical issues such as reducing hallucinations in AI outputs, enhancing explainability, addressing copyright concerns, and offering more tailored, up-to-date data inputs. We go beyond basic RAG applications, equipping you with the skills to create more complex, reliable products with tools like LlamaIndex and Deep Memory by Activeloop. Emphasizing hands-on learning, this course is a gateway to mastering advanced RAG techniques and applications in real-world scenarios. Please note that prior knowledge of coding and Python is a prerequisite.
What You Will Learn
You will start by learning the basic RAG building blocks, such as loading, indexing, storing, and querying data with LlamaIndex. We'll also demystify the two libraries, LangChain and LlamaIndex, to help you select the right one when working with RAG or other LLM applications. You will then move toward more advanced RAG techniques aimed at surfacing and using more relevant information from the dataset. We cover techniques such as query expansion, query transformation, reranking, recursive retrieval, and optimization, along with production tips and techniques for LlamaIndex. We also introduce how better embedding management through Activeloop's Deep Memory can be used to improve accuracy. We then progress to the exciting stuff: learning how to build RAG agents in LangChain and LlamaIndex, an introduction to OpenAI Assistants, and some other tools and models that can be used in RAG products. We conclude with a summary of RAG evaluation techniques.
Is the Course Free?
Yes, the course is entirely free for everybody. However, running the project examples yourself will cost you some API and cloud credits.
Certification
By participating in this course and completing the quizzes at the end of each chapter, you will have the opportunity to earn a certification in using Deep Lake - a valuable addition to your professional credentials. This certification program, offered at no cost, forms part of the Deep Lake Foundational Model Certification Program in collaboration with the Intel Disruptor Initiative.
Course Logistics
Here's everything you need to know about the course.
Course Hosting and Pace
This course is hosted by Activeloop. It is designed as a self-paced learning journey, allowing you to proceed at your own comfort. The online format provides flexibility to engage with the lessons whenever it best suits you.
At the end of each module, you can test your new knowledge with multiple-choice quizzes, which are mandatory to continue the course. You will receive your course certification after completing all the quizzes.
Community Support
Have questions about this course or specific lessons? Want to exchange ideas with fellow learners? Please join the Deep Lake Slack community, where experts and fellow users will be ready to assist with course and Deep Lake-related queries.
Required Platforms, Tools, and Cloud Tokens
The course involves practical projects and exercises that require various tools and platforms. These will be thoroughly guided in the individual lessons. However, the main platforms that you will use throughout the course are:
- Activeloop Deep Lake
- OpenAI
- LlamaIndex
What is Activeloop?
Activeloop is a tech company dedicated to building data infrastructure optimized for deep-learning applications. It offers a platform that seamlessly connects unstructured data types, like audio, video, and images, to machine learning models. Its main product, Deep Lake, provides data streaming, scalable machine learning pipelines, and dataset version control. Such infrastructure is particularly beneficial when dealing with the demands of training and fine-tuning models for production.
What is Deep Lake?
Deep Lake is an open-source data lake designed for deep learning applications. It retains essential features of traditional data lakes, including SQL queries, ACID transactions, and dataset visualization. It specializes in storing complex data in tensor form, efficiently streaming data to deep learning frameworks. Built to be serverless on a columnar storage format, it also offers native version control and in-browser data visualization, complementing the needs of LLM training and deployment processes.
How to set up a Deep Lake account?
To set up a Deep Lake account, navigate to the app’s registration page and sign up. Follow the on-screen instructions and add the required details. Once you've verified your email and established a secure password, your account will be active and ready for use.
How to get the Deep Lake API token?
- After logging in, you will land on your homepage, where you should see a "Create API token" button at the top. Click on it, and you'll be redirected to the "API tokens" page. This is where you can generate, manage, and revoke your API keys for accessing Deep Lake.
- Click on the "Create API token" button. You should see a popup asking for a token name and an expiration date. By default, the token expiration date is one year. Once you’ve set the token name and its expiration date, click the “Create API token” button.
- You should now see a green banner saying that the token has been successfully generated, along with your new API token, on the “API tokens” page. To copy your token to your clipboard, click the square icon on its right.
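Once copied, you typically expose the token to your code as the ACTIVELOOP_TOKEN environment variable (the same variable name used in the .env example later in this lesson). The snippet below is a minimal sketch of doing this directly in a Python session; the exact way your Deep Lake client reads credentials may differ, so treat it as an illustration rather than the official setup.
import os
from getpass import getpass

# Paste the API token you generated above (input stays hidden).
# ACTIVELOOP_TOKEN is the variable name used in the .env example later in this lesson;
# adjust it if your setup expects a different name.
os.environ["ACTIVELOOP_TOKEN"] = getpass("Deep Lake API token: ")

# Sanity check without printing the secret itself.
print("Token loaded, length:", len(os.environ["ACTIVELOOP_TOKEN"]))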
Coding Environment and Packages
Before starting this course, you need to ensure that you have the appropriate coding environment ready. Please make sure to use a Python version equal to or later than 3.8.1. You can set up your environment by choosing one of the following options:
- Having a code editor installed on your computer. A popular coding environment is Visual Studio Code.
- Using Python virtual environments to manage Python libraries.
- Alternatively, you could use Google Colab notebooks. If you run into unusual installation issues, we recommend troubleshooting by running the course-related Colab notebooks first, to determine whether the issue lies in the code or in the environment on your machine.
You will need the following packages to successfully execute the sample code provided in each lesson. They can be installed using the pip package manager.
deeplake==4.0.3
sentence-transformers==3.3.1
rerankers==0.6.0
bm25s==0.2.5
scikit-learn==1.5.2
numpy==2.1.3
datasets==3.1.0
torch==2.5.1
matplotlib==3.9.3
wikipedia==1.4.0
colbert-ai==0.2.21
pydantic==2.10.2
asyncio==3.4.3
nest-asyncio==1.6.0
nltk==3.9.1
rouge-score==0.1.2
ragas==0.2.6
pandas==2.2.3
ftfy==6.3.1
regex==2024.11.6
tqdm==4.67.1
torch-vision==0.1.6.dev0
scikit-image==0.24.0
# LlamaIndex Packages
llama-index==0.12.2
llama-index-core==0.12.2
llama-index-vector-stores-deeplake==0.3.2
llama-index-embeddings-openai==0.3.1
llama-index-llms-openai==0.3.2
llama-index-readers-web==0.3.0
llama-index-finetuning==0.3.0
llama-index-multi-modal-llms-openai==0.3.0
Please note that the code examples in the course have been tested with the versions specified above. We routinely update the course, but since updates are manual, we recommend first running the course code as is and only then trying the latest package versions. Moreover, specific lessons may require the installation of additional packages, which will be explicitly mentioned. The following command demonstrates how to install a package using pip.
pip install deeplake
# Or, to install a specific version:
# pip install deeplake==4.0.3
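If you prefer to install everything at once, you can also copy the version list above into a file (commonly named requirements.txt) and install it in a single command:
pip install -r requirements.txt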
Google Colab
Google Colaboratory, popularly known as Google Colab, is a free cloud-based Jupyter notebook environment. Data scientists and engineers widely use it to train machine learning and deep learning models using CPUs, GPUs, and TPUs. Google Colab comes with an array of features, such as:
- Free access to GPUs and TPUs for accelerated model training.
- A web-based interface for a service running on a virtual machine, eliminating the need for local software installation.
- Seamless integration with Google Drive and GitHub.
To use Google Colab, all you need is a Google account. You can run terminal commands directly in notebook cells by appending an exclamation mark (!) before the command. Every notebook created in Google Colab gets stored in your Google Drive for easy access.
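For example, a single Colab cell like the following (a minimal illustration) checks the Python version and installs a package using the "!" prefix:
# In a Colab notebook cell, lines starting with "!" run as terminal commands.
!python --version
!pip install deeplake==4.0.3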
A convenient way of using API keys in Colab involves:
- Saving them in a file named .env on your Google Drive. Here's how the file should be formatted to save the Activeloop token and the OpenAI API key:
ACTIVELOOP_TOKEN=your_activeloop_token
OPENAI_API_KEY=your_openai_key
- Mounting your Google Drive on your Colab instance.
- Loading them as environment variables using the dotenv library, like in the following code:
from dotenv import load_dotenv
load_dotenv('/content/drive/MyDrive/path/to/.env')
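Putting these steps together, a typical setup cell might look like the sketch below (assuming your .env file sits at the placeholder path used above; adjust it to your own Drive location):
import os
from dotenv import load_dotenv
from google.colab import drive  # Colab-only helper for mounting Google Drive

# 1) Mount your Google Drive so the .env file becomes visible to the notebook.
drive.mount('/content/drive')

# 2) Load the variables from the .env file (replace the placeholder path with your own).
load_dotenv('/content/drive/MyDrive/path/to/.env')

# 3) The keys are now available as environment variables.
print("Activeloop token set:", os.getenv("ACTIVELOOP_TOKEN") is not None)
print("OpenAI key set:", os.getenv("OPENAI_API_KEY") is not None)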
Creating Python Virtual Environments
Python virtual environments offer an excellent solution for managing Python libraries and avoiding package conflicts. They create isolated environments for installing packages, ensuring that your packages and their dependencies are contained within that environment. This setup provides clean and isolated environments for your Python projects.
Begin by executing the python command in your terminal to confirm that the Python version is either equal to or greater than 3.8.1. Then follow these steps to create a virtual environment:
- Create a virtual environment using the command python -m venv my_venv_name.
- Activate the virtual environment by executing source my_venv_name/bin/activate.
- Install the required libraries and run the code snippets from the lessons within the virtual environment.
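If you'd rather confirm the required Python version (3.8.1 or newer) from within Python itself instead of the terminal, here is a minimal sketch, provided only as an illustration:
import sys

# The course assumes Python 3.8.1 or newer.
required = (3, 8, 1)
print("Running Python", sys.version.split()[0])
assert sys.version_info >= required, f"Please upgrade to Python {'.'.join(map(str, required))}+"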
Happy learning!
Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries.