This guide will help you set up and run the Nanonets OCR model locally for PDF processing.
GitHub Gist: https://gist.github.com/kordless/652234bf0b32b02e39cef32c71e03400
# 0. Create and activate conda environment first
import sys
import os
import logging
import requests
import re
import json
from typing import Dict, Any, Optional, List, Tuple
from mcp.server.fastmcp import FastMCP, Context
from urllib.parse import urlparse, urljoin
import asyncio
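The imports above belong to an MCP server built on FastMCP that fetches content over HTTP. The rest of the server is not shown in this excerpt; the sketch below is a minimal illustration of how these imports are typically wired together, with the server name and the fetch_url tool as assumptions rather than the gist's actual code.

```python
# Minimal sketch (assumed server name and tool; not the gist's implementation)
from mcp.server.fastmcp import FastMCP
import requests

mcp = FastMCP("web-fetch")  # hypothetical server name

@mcp.tool()
def fetch_url(url: str) -> str:
    """Fetch a URL and return its raw text content."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.text

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio by default
```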
""" | |
Adaptive Connector Framework (ACF) | |
A self-bootstrapping alternative to MCP that dynamically builds and tests | |
connectors based on current needs. The system evolves its own capabilities | |
through iterative learning and testing. | |
Key components: | |
1. Registry - Manages available connectors and their capabilities | |
2. Connector Builder - Dynamically creates new connectors |
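The docstring lists a Registry as the first component, but its implementation is not included in this excerpt. Here is a minimal sketch of what a connector registry along those lines might look like; the class name and methods are hypothetical.

```python
# Hypothetical sketch of the Registry component described above
from typing import Callable, Dict, List

class ConnectorRegistry:
    """Tracks available connectors and the capabilities they expose."""

    def __init__(self) -> None:
        self._connectors: Dict[str, Callable] = {}
        self._capabilities: Dict[str, List[str]] = {}

    def register(self, name: str, connector: Callable, capabilities: List[str]) -> None:
        """Add a connector under a name with the capabilities it advertises."""
        self._connectors[name] = connector
        self._capabilities[name] = capabilities

    def find(self, capability: str) -> List[str]:
        """Return the names of connectors that advertise a capability."""
        return [n for n, caps in self._capabilities.items() if capability in caps]
```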
import os
import sys
from dotenv import load_dotenv, set_key
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.runnables import RunnableSequence
from langchain.tools import Tool
from langchain.agents import create_react_agent, AgentExecutor
from langchain.schema import HumanMessage
import getpass
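These imports suggest a ReAct-style agent built with LangChain and ChatOpenAI. The script body is not shown here, so the sketch below only illustrates how the imported pieces usually fit together; the placeholder tool, prompt, and model name are assumptions.

```python
# Sketch only: placeholder tool and prompt; the gist's actual wiring may differ
from langchain_openai import ChatOpenAI
from langchain.tools import Tool
from langchain.agents import create_react_agent, AgentExecutor
from langchain.prompts import PromptTemplate

def echo(text: str) -> str:
    """Trivial placeholder tool so the agent has something to call."""
    return text

tools = [Tool(name="echo", func=echo, description="Echo the input back.")]

# A real ReAct prompt also needs Thought/Action/Observation format instructions
# (e.g. the community "hwchase17/react" prompt); this one is abbreviated.
prompt = PromptTemplate.from_template(
    "Answer the question using the tools available.\n"
    "Tools: {tools}\nTool names: {tool_names}\n"
    "Question: {input}\n{agent_scratchpad}"
)

llm = ChatOpenAI(model="gpt-4o-mini")  # assumed model name
agent = create_react_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
# executor.invoke({"input": "Say hello"})
```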
To run the Python script that splits a PDF into segments of just under 25 MB each, follow these steps (a sketch of the splitting logic appears after the list):
1. Python installation: Ensure Python is installed on your system. If not, download and install it from python.org.
2. PyPDF2 library: The script uses the PyPDF2 library, which you can install with pip, Python's package installer. pip comes bundled with Python 3.4 and later, so it should already be available.
3. Open a terminal or command prompt: On Windows, open Command Prompt by searching for cmd in the Start menu.
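The splitting script itself is not reproduced here. The following is a minimal sketch, assuming the approach is to accumulate pages and start a new output file whenever the serialized part would exceed the limit; the file names and helper functions are illustrative.

```python
# Hypothetical sketch: split input.pdf into parts, each kept just under 25 MB
import io
from PyPDF2 import PdfReader, PdfWriter

MAX_BYTES = 25 * 1024 * 1024  # size ceiling for each segment

def _size_of(pages) -> int:
    """Serialize the pages to an in-memory buffer and return the byte count."""
    writer = PdfWriter()
    for page in pages:
        writer.add_page(page)
    buf = io.BytesIO()
    writer.write(buf)
    return buf.getbuffer().nbytes

def _flush(pages, part: int) -> None:
    """Write the accumulated pages out as part_<n>.pdf."""
    writer = PdfWriter()
    for page in pages:
        writer.add_page(page)
    with open(f"part_{part}.pdf", "wb") as f:
        writer.write(f)

def split_pdf(path: str, max_bytes: int = MAX_BYTES) -> None:
    reader = PdfReader(path)
    current, part = [], 1
    for page in reader.pages:
        # If adding this page would push the part over the limit, flush first
        if current and _size_of(current + [page]) > max_bytes:
            _flush(current, part)
            current, part = [], part + 1
        current.append(page)
    if current:
        _flush(current, part)

if __name__ == "__main__":
    split_pdf("input.pdf")  # assumed input file name
```

Re-serializing the part to measure its size on every page is simple but slow for large files; it is only meant to show the idea.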
# tokens from https://cloud.featurebase.com/configuration/api-keys
featurebase_token = "<token>"

# featurebase ($300 free credit on signup)
# https://query.featurebase.com/v2/databases/bc355-t-t-t-362c1416/query/sql (but remove /query/sql)
featurebase_endpoint = "https://query.featurebase.com/v2/databases/<uuid-only-no-query-sql>"
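The snippet does not show how the query is actually issued. The sketch below assumes the code re-appends /query/sql to the endpoint and POSTs a SQL string with a Bearer token; treat the request shape as an assumption to verify against FeatureBase's documentation.

```python
# Assumed request shape -- verify against FeatureBase's SQL endpoint docs
import requests

def run_sql(sql: str) -> dict:
    url = f"{featurebase_endpoint}/query/sql"  # re-append the path stripped above
    headers = {
        "Authorization": f"Bearer {featurebase_token}",
        "Content-Type": "text/plain",
    }
    response = requests.post(url, headers=headers, data=sql, timeout=30)
    response.raise_for_status()
    return response.json()

# Example: run_sql("SHOW TABLES;")
```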
This example illustrates a way to call a function dynamically while querying an OpenAI GPT model. It uses the newly released function-calling support in OpenAI's completion endpoints.
The general concept is to use a decorator to extract information from a function so it can be presented to the language model, and then to pass the result of that function back to the completion endpoint for language augmentation.
A wide variety of functions can be swapped in for use by the model. By changing the get_top_stories function, plus the prompt in run_conversation, you should be able to get the model to run your own function without changing any of the other code.
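The decorator itself is not included in this excerpt. A minimal sketch of the idea, building an OpenAI function-calling schema from a function's signature and docstring, might look like this; the attribute name and the type mapping are illustrative assumptions.

```python
# Illustrative sketch: derive an OpenAI function-calling schema from a Python function
import inspect

TYPE_MAP = {int: "integer", float: "number", str: "string", bool: "boolean"}

def openai_function(func):
    """Attach a function-calling schema (built from the signature) to the function."""
    params = {}
    for name, p in inspect.signature(func).parameters.items():
        json_type = TYPE_MAP.get(p.annotation, "string")
        params[name] = {"type": json_type}
    func.openai_schema = {  # hypothetical attribute name
        "name": func.__name__,
        "description": (func.__doc__ or "").strip(),
        "parameters": {"type": "object", "properties": params, "required": list(params)},
    }
    return func

@openai_function
def get_top_stories(limit: int) -> list:
    """Return the top Hacker News stories."""
    ...
```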
To use this, create a config.py file and add a variable with your OpenAI token:
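The exact variable name the script imports is not shown here; a typical config.py might look like the following, with openai_token as an assumed name.

```python
# config.py -- "openai_token" is an assumed variable name; match whatever the script imports
openai_token = "sk-..."
```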
""" | |
Hacker News Top Stories | |
Author: | |
Date: June 12, 2023 | |
Description: | |
This script fetches the top 10 stories from Hacker News using Algolia's search API. It retrieves the stories posted within the last 24 hours and prints their titles and URLs. | |
Dependencies: | |
- requests: HTTP library for sending API requests |
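The script body is not included in this excerpt. A minimal sketch of the described behavior, using Algolia's public Hacker News search endpoint, could look like this; the exact query parameters the original uses are an assumption.

```python
# Sketch: top 10 HN stories from the last 24 hours via Algolia's search API
import time
import requests

def top_stories(limit: int = 10) -> None:
    cutoff = int(time.time()) - 24 * 60 * 60  # 24 hours ago, in unix seconds
    response = requests.get(
        "https://hn.algolia.com/api/v1/search",
        params={
            "tags": "story",
            "numericFilters": f"created_at_i>{cutoff}",
            "hitsPerPage": limit,
        },
        timeout=30,
    )
    response.raise_for_status()
    for hit in response.json().get("hits", []):
        print(hit.get("title"), "-", hit.get("url"))

if __name__ == "__main__":
    top_stories()
```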
import openai
import numpy as np
from openai.embeddings_utils import get_embedding

openai.api_key = "TOKEN"

def gpt3_embedding(content, engine='text-similarity-ada-001'):
    # drop non-ASCII characters before sending the text to the embeddings endpoint
    content = content.encode(encoding='ASCII', errors='ignore').decode()
    response = openai.Embedding.create(input=content, engine=engine)
    vector = response['data'][0]['embedding']  # this is a normal list
    return vector
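Since numpy is already imported, a natural usage example is comparing two embeddings with cosine similarity; the helper below is added for illustration and is not part of the original gist.

```python
# Illustration: cosine similarity between two embedding vectors
def similarity(v1, v2):
    v1, v2 = np.array(v1), np.array(v2)
    return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))

# a = gpt3_embedding("the cat sat on the mat")
# b = gpt3_embedding("a feline rested on the rug")
# print(similarity(a, b))
```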
State of AI Report | |
October 11, 2022 | |
#stateofai | |
stateof.ai | |
Ian Hogarth | |
Nathan Benaich | |
About the authors | |
Nathan is the General Partner of Air Street Capital, a venture capital firm investing in AI-first technology and life science companies. He founded RAAIS and London.AI (AI community for industry and research), the RAAIS Foundation (funding open-source AI projects), and Spinout.fyi (improving university spinout creation). He studied biology at Williams College and earned a PhD from Cambridge in cancer research.