Fine-tuning Embeddings for RAG: A Hands-on Walkthrough (2025 Edition)

Introduction

Welcome to this walkthrough on fine-tuning embedding models to improve Retrieval-Augmented Generation (RAG) systems. This guide covers the steps to fine-tune an embedding model on domain-specific data, evaluate its performance, and integrate it into a RAG pipeline.

The core idea behind fine-tuning embeddings is simple:

Move the embedding of each question that relates to a document closer to the embedding of that document.
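
One way to make "closer" concrete is a contrastive loss with in-batch negatives. The preview does not say which loss the walkthrough uses, so the loss choice, the function names, and the `scale` value below are assumptions for illustration only. The sketch uses plain Python lists in place of real model outputs and shows that the loss is smaller when each question embedding points toward its own document.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def in_batch_contrastive_loss(questions, documents, scale=20.0):
    """Mean cross-entropy where questions[i] should match documents[i].

    Every other document in the batch acts as a negative for question i.
    `scale` is a hypothetical temperature-like factor, not a value from the source.
    """
    total = 0.0
    for i, question in enumerate(questions):
        logits = [scale * cosine(question, doc) for doc in documents]
        # Numerically stable log-sum-exp over all documents in the batch.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(questions)


# Toy 2-D embeddings (made up for the example).
docs = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]      # each question near its own document
misaligned = [[0.1, 0.9], [0.9, 0.1]]   # each question near the wrong document

loss_aligned = in_batch_contrastive_loss(aligned, docs)
loss_misaligned = in_batch_contrastive_loss(misaligned, docs)
print(loss_aligned < loss_misaligned)
```

Fine-tuning would adjust the model weights to push this kind of loss down on real question and document pairs.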

zomux / aws gpu setting.md

Last active February 3, 2016 15:27

system setting commands

ssh -i ~/Downloads/zomux.pem ubuntu@52.8.85.159

ssh-keygen

mangecoeur / description.md

Last active March 30, 2021 21:34

Pandas PostgreSQL support for loading to DB using fast COPY FROM method

This small subclass of the Pandas sqlalchemy-based SQL support for reading/storing tables uses the Postgres-specific "COPY FROM" method to insert large amounts of data into the database. It is much faster than using INSERT. To achieve this, the table is created in the normal way using sqlalchemy but no data is inserted. Instead the data is saved to a temporary CSV file (using Pandas' mature CSV support) and then read back into Postgres using psycopg2's support for COPY FROM STDIN.
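
The gist's subclass is not shown in this preview, so the following is only a minimal sketch of the same idea, not the author's code. It assumes `df` is a pandas DataFrame, `cursor` is a psycopg2 cursor, and the target table already exists with matching columns. The function name `copy_frame_to_table` is hypothetical, and an in-memory buffer is swapped in for the temporary CSV file described above.

```python
import io


def copy_frame_to_table(df, table_name, cursor):
    """Bulk-load `df` into an existing table with COPY FROM STDIN.

    Assumptions: `df` is a pandas DataFrame, `cursor` is a psycopg2 cursor,
    and `table_name` and the column names come from trusted code, since they
    are interpolated into the SQL text.
    """
    buffer = io.StringIO()
    # Write rows only; the column list in the COPY statement names the columns.
    df.to_csv(buffer, index=False, header=False)
    buffer.seek(0)

    columns = ", ".join('"{}"'.format(col) for col in df.columns)
    sql = 'COPY "{}" ({}) FROM STDIN WITH (FORMAT csv)'.format(table_name, columns)
    # psycopg2 streams the buffer contents to the server as the COPY input.
    cursor.copy_expert(sql, buffer)
    return sql
```

With a real connection, the call would sit inside a transaction that is committed afterwards. For very large frames, a temporary file as in the original description avoids holding the whole CSV in memory.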

dapangmao / ssn.md

Last active August 29, 2015 14:10

Spark practice (3): clean and sort Social Security numbers

Sample.txt

Requirements:
1. separate valid SSN and invalid SSN
2. count the number of valid SSN

402-94-7709 
283-90-3049
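
The two requirements can be sketched in plain Python before moving the logic into Spark. The preview does not define what counts as a valid SSN, so this sketch assumes the `ddd-dd-dddd` shape after trimming whitespace. The helper name `split_ssns` and the third sample value are made up for illustration.

```python
import re

# Assumption: "valid" means three digits, two digits, four digits, dash-separated.
SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")


def split_ssns(lines):
    """Return (sorted valid SSNs, invalid entries) from an iterable of raw lines."""
    valid, invalid = [], []
    for line in lines:
        candidate = line.strip()  # clean stray whitespace such as trailing spaces
        if not candidate:
            continue  # skip blank lines
        if SSN_PATTERN.fullmatch(candidate):
            valid.append(candidate)
        else:
            invalid.append(candidate)
    return sorted(valid), invalid


# First two values are from the sample above; the third is a hypothetical bad entry.
sample = ["402-94-7709 ", "283-90-3049", "12-345-6789"]
valid, invalid = split_ssns(sample)
print(len(valid), valid, invalid)
```

In Spark, the same predicate could be passed to a `filter` on the RDD of lines, with `count` giving the number of valid entries.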

dapangmao / spark2.md

Last active August 29, 2015 14:09

Chapter 2. Downloading and Getting Started

In this chapter we will walk through the process of downloading and running Spark in local mode on a single computer. This chapter was written for anybody who is new to Spark, including both Data Scientists and Engineers.

Spark can be used from Python, Java or Scala. To benefit from this book, you don’t need to be an expert programmer, but we do assume that you are comfortable with the basic syntax of at least one of these languages. We will include examples in all languages wherever possible.

Spark itself is written in Scala, and runs on the Java Virtual Machine (JVM). To run Spark on either your laptop or a cluster, all you need is an installation of Java 6 (or newer). If you wish to use the Python API you will also need a Python interpreter (version 2.6 or newer). Spark does not yet work with Python 3.

fyears / note.md

Last active February 6, 2024 09:59

how to install scipy numpy matplotlib ipython in virtualenv

if you are using linux, unix, os x:

pip install -U setuptools
pip install -U pip

pip install numpy
pip install scipy
pip install matplotlib
#pip install PySide
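
After the installs finish, a quick check from inside the virtualenv's interpreter can confirm that the packages are importable. This is a small sketch rather than part of the original note. The helper name `missing_packages` is hypothetical, and `IPython` is the import name assumed for the ipython package mentioned in the title.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


wanted = ["numpy", "scipy", "matplotlib", "IPython"]
print("missing:", missing_packages(wanted) or "none")
```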

William C Grisaitis grisaitis
