Skip to content

Instantly share code, notes, and snippets.

@csiebler
Last active October 27, 2023 17:35
Show Gist options
  • Save csiebler/95b04caae5aaba52b52289fca12b1726 to your computer and use it in GitHub Desktop.
Save csiebler/95b04caae5aaba52b52289fca12b1726 to your computer and use it in GitHub Desktop.
RAG for current information access in real-time using AOAI and Azure Cognitive Search
{"title": "JPMorgan Chase’s Jamie Dimon and his family to sell $141 million of stock in 2024", "content": "JPMorgan Chase chief executive Jamie Dimon and his family plan to sell one million of their shares in the bank starting next year, according to a new securities filing.\n\nDimon — who will use his stock trading plans to offload his shares — and his family currently own roughly 8.6 million shares in the company. The move marks Dimon’s first stock sale during his 17 years at the company’s helm.\n\nJPMorgan Chase shares closed at $140.76 on Thursday, putting the transaction’s worth at roughly $141 million.\n\n“Mr. Dimon continues to believe the company’s prospects are very strong and his stake in the company will remain very significant,” JPMorgan Chase said in the filing.\n\nCNN has reached out to the company for comment.\n\nShares of JPMorgan Chase have climbed roughly 5% this year during what’s been a tough environment for banks. The Federal Reserve’s aggressive pace of interest rate hikes, which began in 2022, has crimped demand for loans and forced banks to pay up for clients’ high-yielding holdings. At the same time, banks have watched the value of their own bond investments erode in value.\n\nJPMorgan Chase, the largest US bank by assets, has managed to turn out earnings beats this year despite the stormy conditions. The bank in May acquired most assets of collapsed regional lender First Republic, a move that helped JPMorgan Chase’s profit jump 35% last quarter.\n\nBut Dimon has warned that the Federal Reserve’s fight against inflation isn’t over and could weaken the remarkably resilient economy. He has also sounded alarm bells that wars in Ukraine, Israel and Gaza could have damaging consequences for global financial markets and geopolitical relationships.\n\n“Now may be the most dangerous time the world has seen in decades,” Dimon said earlier this month, when announcing the bank’s third-quarter financial results."}
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# RAG using Function Calling for current data access from Azure Cognitive Search\n",
"\n",
"Install the preview version of the Azure Cognitive Search Python SDK if you don't have it already\n",
"\n",
"`pip install azure-search-documents --pre --upgrade`\n",
"\n",
"Then create a `.env` with:\n",
"\n",
"```\n",
"OPENAI_API_BASE=https://xxxxxxx.openai.azure.com/\n",
"OPENAI_API_KEY=xxxxxxxx\n",
"OPENAI_API_VERSION=2023-07-01-preview\n",
"AZURE_COGNITIVE_SEARCH_ENDPOINT=https://xxxxxx.search.windows.net\n",
"AZURE_COGNITIVE_SEARCH_API_KEY=xxxxxx\n",
"AZURE_COGNITIVE_SEARCH_INDEX_NAME=rag-current-data\n",
"```\n"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [],
"source": [
"import os \n",
"import json \n",
"import openai \n",
"import uuid\n",
"import tiktoken\n",
"from dotenv import load_dotenv\n",
"from tenacity import retry, wait_random_exponential, stop_after_attempt \n",
"from azure.core.credentials import AzureKeyCredential \n",
"from azure.search.documents import SearchClient\n",
"from azure.search.documents.indexes import SearchIndexClient \n",
"from azure.search.documents.models import Vector \n",
"from azure.search.documents.indexes.models import ( \n",
" SearchIndex, \n",
" SearchField, \n",
" SearchFieldDataType, \n",
" SimpleField, \n",
" SearchableField, \n",
" SearchIndex, \n",
" SemanticConfiguration, \n",
" PrioritizedFields, \n",
" SemanticField, \n",
" SearchField, \n",
" SemanticSettings, \n",
" VectorSearch, \n",
" HnswVectorSearchAlgorithmConfiguration,\n",
") "
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [],
"source": [
"load_dotenv()\n",
" \n",
"# Configure environment variables for Azure Cognitive Search\n",
"service_endpoint = os.getenv(\"AZURE_COGNITIVE_SEARCH_ENDPOINT\")\n",
"index_name = os.getenv(\"AZURE_COGNITIVE_SEARCH_INDEX_NAME\")\n",
"credential = AzureKeyCredential(os.getenv(\"AZURE_COGNITIVE_SEARCH_API_KEY\"))\n",
"\n",
"# Create the Azure Cognitive Search client to issue queries\n",
"search_client = SearchClient(endpoint=service_endpoint, index_name=index_name, credential=credential)\n",
"\n",
"# Create the index client\n",
"index_client = SearchIndexClient(endpoint=service_endpoint, credential=credential)\n",
"\n",
"# Configure OpenAI environment variables\n",
"openai.api_type = \"azure\"\n",
"openai.api_base = os.getenv(\"OPENAI_API_BASE\")\n",
"openai.api_key = os.getenv(\"OPENAI_API_KEY\")\n",
"openai.api_version = os.getenv(\"OPENAI_API_VERSION\")\n",
"\n",
"# Model deployment names\n",
"deployment_name = \"gpt-35-turbo\"\n",
"embedding_model = \"text-embedding-ada-002\"\n",
"\n",
"# Use tiktoken to measure article length for trunacting/splitting\n",
"encoding = tiktoken.get_encoding(\"cl100k_base\")\n",
"token_limit = 8000"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create index and load data"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" rag-current-data deleted\n",
" rag-current-data created\n"
]
}
],
"source": [
"# Create a search index\n",
"fields = [\n",
" SimpleField(name=\"id\", type=SearchFieldDataType.String, key=True, filterable=True),\n",
" SearchableField(name=\"title\", type=SearchFieldDataType.String, facetable=True, analyzer_name=\"en.microsoft\"),\n",
" SearchableField(name=\"content\", type=SearchFieldDataType.String, analyzer_name=\"en.microsoft\"),\n",
" SearchField(name=\"content_vector\", type=SearchFieldDataType.Collection(SearchFieldDataType.Single),\n",
" searchable=True, vector_search_dimensions=1536, vector_search_configuration=\"my-vector-config\")\n",
"]\n",
"\n",
"vector_search = VectorSearch(\n",
" algorithm_configurations=[\n",
" HnswVectorSearchAlgorithmConfiguration(\n",
" name=\"my-vector-config\",\n",
" kind=\"hnsw\"\n",
" )\n",
" ]\n",
")\n",
"\n",
"# Semantic Configuration to leverage Bing family of ML models for re-ranking (L2)\n",
"semantic_config = SemanticConfiguration(\n",
" name=\"my-semantic-config\",\n",
" prioritized_fields=PrioritizedFields(\n",
" title_field=None,\n",
" prioritized_keywords_fields=[],\n",
" prioritized_content_fields=[SemanticField(field_name=\"content\")]\n",
" ))\n",
"semantic_settings = SemanticSettings(configurations=[semantic_config])\n",
"\n",
"# Create the search index with the semantic settings\n",
"index = SearchIndex(name=index_name, fields=fields, \n",
" vector_search=vector_search, semantic_settings=semantic_settings)\n",
"result = index_client.delete_index(index)\n",
"print(f' {index_name} deleted')\n",
"result = index_client.create_index(index)\n",
"print(f' {result.name} created')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Define a helper function to create embeddings"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0.006555966567248106, 0.0036704621743410826, -0.01164249237626791, -0.026776477694511414, -0.012383492663502693, -0.001434117672033608, -0.013375679031014442, 0.009356696158647537]\n"
]
}
],
"source": [
"# Function to generate embeddings for title and content fields, also used for query embeddings\n",
"@retry(wait=wait_random_exponential(min=1, max=20), stop=stop_after_attempt(6))\n",
"def generate_embeddings(text):\n",
" \n",
" num_tokens = len(encoding.encode(text))\n",
" if num_tokens > token_limit:\n",
" text = encoding.decode(encoding.encode(text)[:token_limit])\n",
" \n",
" response = openai.Embedding.create(\n",
" input=text, engine=embedding_model)\n",
" embeddings = response['data'][0]['embedding']\n",
" return embeddings\n",
"\n",
"print(generate_embeddings(\"Hello world!\")[:8])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load the data into Azure Cognitive Search"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Uploaded 1 documents\n"
]
}
],
"source": [
"batch_size = 100\n",
"counter = 0\n",
"documents = []\n",
"search_client = SearchClient(endpoint=service_endpoint, index_name=index_name, credential=credential)\n",
"\n",
"with open(\"news.jsonl\", \"r\") as j_in:\n",
" for line in j_in:\n",
" counter += 1 \n",
" article = json.loads(line)\n",
" article['id'] = str(uuid.uuid4())\n",
" article['content_vector'] = generate_embeddings(article['content'])\n",
" article[\"@search.action\"] = \"upload\"\n",
" documents.append(article)\n",
" if counter % batch_size == 0:\n",
" # Load content into index\n",
" result = search_client.upload_documents(documents) \n",
" print(f\"Uploaded {len(documents)} documents\") \n",
" documents = []\n",
" \n",
" \n",
"if documents != []:\n",
" # Load content into index\n",
" result = search_client.upload_documents(documents) \n",
" print(f\"Uploaded {len(documents)} documents\") \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Test function calling"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\n",
" \"role\": \"assistant\",\n",
" \"function_call\": {\n",
" \"name\": \"query_articles\",\n",
" \"arguments\": \"{\\n \\\"query\\\": \\\"Jamie Diamond stock sales 2024\\\"\\n}\"\n",
" }\n",
"}\n"
]
}
],
"source": [
"messages = [{\"role\": \"user\", \"content\": \"How much stock will Jamie Diamond sell in 2024?\"}]\n",
"\n",
"functions = [\n",
" {\n",
" \"name\": \"query_articles\",\n",
" \"description\": \"Retrieves latest information past 2021 from a private data source to answer user questions with up to date information\",\n",
" \"parameters\": {\n",
" \"type\": \"object\",\n",
" \"properties\": {\n",
" \"query\": {\n",
" \"type\": \"string\",\n",
" \"description\": \"Search query to retrieve information for answering the user's question\",\n",
" }\n",
" },\n",
" \"required\": [\"query\"],\n",
" },\n",
" }\n",
"]\n",
"\n",
"response = openai.ChatCompletion.create(\n",
" deployment_id=deployment_name,\n",
" messages=messages,\n",
" functions=functions,\n",
" temperature=0.2,\n",
" function_call=\"auto\", \n",
")\n",
"\n",
"print(response['choices'][0]['message'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Define function to call Azure Cognitive Search"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Article 0: JPMorgan Chase’s Jamie Dimon and his family to sell $141 million of stock in 2024\n",
"JPMorgan Chase chief executive Jamie Dimon and his family plan to sell one million of their shares in the bank starting next year, according to a new securities filing.\n",
"\n",
"Dimon — who will use his stock trading plans to offload his shares — and his family currently own roughly 8.6 million shares in the company. The move marks Dimon’s first stock sale during his 17 years at the company’s helm.\n",
"\n",
"JPMorgan Chase shares closed at $140.76 on Thursday, putting the transaction’s worth at roughly $141 million.\n",
"\n",
"“Mr. Dimon continues to believe the company’s prospects are very strong and his stake in the company will remain very significant,” JPMorgan Chase said in the filing.\n",
"\n",
"CNN has reached out to the company for comment.\n",
"\n",
"Shares of JPMorgan Chase have climbed roughly 5% this year during what’s been a tough environment for banks. The Federal Reserve’s aggressive pace of interest rate hikes, which began in 2022, has crimped demand for loans and forced banks to pay up for clients’ high-yielding holdings. At the same time, banks have watched the value of their own bond investments erode in value.\n",
"\n",
"JPMorgan Chase, the largest US bank by assets, has managed to turn out earnings beats this year despite the stormy conditions. The bank in May acquired most assets of collapsed regional lender First Republic, a move that helped JPMorgan Chase’s profit jump 35% last quarter.\n",
"\n",
"But Dimon has warned that the Federal Reserve’s fight against inflation isn’t over and could weaken the remarkably resilient economy. He has also sounded alarm bells that wars in Ukraine, Israel and Gaza could have damaging consequences for global financial markets and geopolitical relationships.\n",
"\n",
"“Now may be the most dangerous time the world has seen in decades,” Dimon said earlier this month, when announcing the bank’s third-quarter financial results.\n",
"\n",
"\n"
]
}
],
"source": [
"def query_articles(query):\n",
"\n",
" results = search_client.search( \n",
" # uncomment these 3 for semantic reranking (might be useful)\n",
" # query_type=\"semantic\", \n",
" # query_language=\"en-us\",\n",
" # semantic_configuration_name=\"my-semantic-config\",\n",
" search_text=query,\n",
" vectors=[Vector(value=generate_embeddings(query), k=3, fields=\"content_vector\")],\n",
" select=[\"title\", \"content\"],\n",
" top=3\n",
" ) \n",
" \n",
" sources = \"\\n\".join([f\"Article {i}: {result['title']}\\n{result['content']}\\n\\n\" for i, result in enumerate(results)])\n",
" return sources\n",
"\n",
"print(query_articles(\"Jamie Diamond\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Get things running end to end"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [],
"source": [
"def run_conversation(messages, functions, available_functions, deployment_id):\n",
" \n",
" #send the conversation and available functions to model\n",
" response = openai.ChatCompletion.create(\n",
" deployment_id=deployment_id,\n",
" messages=messages,\n",
" functions=functions,\n",
" function_call=\"auto\", \n",
" temperature=0.2\n",
" )\n",
" response_message = response[\"choices\"][0][\"message\"]\n",
"\n",
" # check if the model wants to call a function\n",
" if response_message.get(\"function_call\"):\n",
" print(\"Recommended Function call:\", response_message.get(\"function_call\"))\n",
" \n",
" # call the function\n",
" # Note: the JSON response may not always be valid; be sure to handle errors\n",
" function_name = response_message[\"function_call\"][\"name\"]\n",
" \n",
" # verify function exists\n",
" if function_name not in available_functions:\n",
" return \"Function \" + function_name + \" does not exist\"\n",
" function_to_call = available_functions[function_name] \n",
" \n",
" function_args = json.loads(response_message[\"function_call\"][\"arguments\"])\n",
" function_response = function_to_call(**function_args)\n",
" print(\"Output of function call:\", function_response)\n",
" \n",
" # send the info on the function call and function response to the model\n",
" \n",
" # adding function response to messages\n",
" messages.append(\n",
" {\n",
" \"role\": response_message[\"role\"],\n",
" \"function_call\": {\n",
" \"name\": response_message[\"function_call\"][\"name\"],\n",
" \"arguments\": response_message[\"function_call\"][\"arguments\"],\n",
" },\n",
" \"content\": None\n",
" }\n",
" )\n",
"\n",
" # adding function response to messages to show model context data\n",
" messages.append(\n",
" {\n",
" \"role\": \"function\",\n",
" \"name\": function_name,\n",
" \"content\": function_response,\n",
" }\n",
" )\n",
"\n",
" # get a new response from model where we use the function's response to give context to the original question\n",
" second_response = openai.ChatCompletion.create(\n",
" messages=messages,\n",
" deployment_id=deployment_id\n",
" )\n",
"\n",
" return second_response\n",
" else:\n",
" return response"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Recommended Function call: {\n",
" \"name\": \"query_articles\",\n",
" \"arguments\": \"{\\n \\\"query\\\": \\\"Jamie Diamond stock sale 2024\\\"\\n}\"\n",
"}\n",
"Output of function call: Article 1: JPMorgan Chase’s Jamie Dimon and his family to sell $141 million of stock in 2024\n",
"JPMorgan Chase chief executive Jamie Dimon and his family plan to sell one million of their shares in the bank starting next year, according to a new securities filing.\n",
"\n",
"Dimon — who will use his stock trading plans to offload his shares — and his family currently own roughly 8.6 million shares in the company. The move marks Dimon’s first stock sale during his 17 years at the company’s helm.\n",
"\n",
"JPMorgan Chase shares closed at $140.76 on Thursday, putting the transaction’s worth at roughly $141 million.\n",
"\n",
"“Mr. Dimon continues to believe the company’s prospects are very strong and his stake in the company will remain very significant,” JPMorgan Chase said in the filing.\n",
"\n",
"CNN has reached out to the company for comment.\n",
"\n",
"Shares of JPMorgan Chase have climbed roughly 5% this year during what’s been a tough environment for banks. The Federal Reserve’s aggressive pace of interest rate hikes, which began in 2022, has crimped demand for loans and forced banks to pay up for clients’ high-yielding holdings. At the same time, banks have watched the value of their own bond investments erode in value.\n",
"\n",
"JPMorgan Chase, the largest US bank by assets, has managed to turn out earnings beats this year despite the stormy conditions. The bank in May acquired most assets of collapsed regional lender First Republic, a move that helped JPMorgan Chase’s profit jump 35% last quarter.\n",
"\n",
"But Dimon has warned that the Federal Reserve’s fight against inflation isn’t over and could weaken the remarkably resilient economy. He has also sounded alarm bells that wars in Ukraine, Israel and Gaza could have damaging consequences for global financial markets and geopolitical relationships.\n",
"\n",
"“Now may be the most dangerous time the world has seen in decades,” Dimon said earlier this month, when announcing the bank’s third-quarter financial results.\n",
"\n",
"\n",
"Final response:\n",
"According to a news article, JPMorgan Chase's CEO Jamie Dimon and his family plan to sell one million of their shares in the bank starting in 2024. They currently own roughly 8.6 million shares in the company, and the value of the transaction is estimated to be around $141 million. Dimon has stated that he continues to believe in the company's prospects and will retain a significant stake in the company.\n"
]
}
],
"source": [
"system_message = \"\"\"Assistant is a large language model designed to help users find information.\n",
"\n",
"You have access to an Azure Cognitive Search index with many news articles from 2021 to today. You use this source if the user asks a question that requires current knowledge. For questions regarding data from 2021 or earlier, you rely on your internal, trained knowledge.\n",
"\n",
"You are designed to be an interactive assistant, so you can ask users clarifying questions to help them find the right information. It's better to give more detailed queries to the search index rather than vague one.\n",
"\"\"\"\n",
"\n",
"messages = [{\"role\": \"system\", \"content\": system_message},\n",
" # {\"role\": \"user\", \"content\": \"when was the first moon landing?\"}]\n",
" {\"role\": \"user\", \"content\": \"How much stock will Jamie Diamond sell in 2024?\"}]\n",
"\n",
"available_functions = {'query_articles': query_articles}\n",
"\n",
"result = run_conversation(messages, functions, available_functions, deployment_name)\n",
"\n",
"print(\"Final response:\")\n",
"print(result['choices'][0]['message']['content'])"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment