Nan Wang nan-wang

@nan-wang
nan-wang / inspect_into_modernbert_tokenizer.ipynb
Created January 8, 2025 14:22
inspect_into_modernbert_tokenizer.ipynb
@nan-wang
nan-wang / inspect_into_modernbert_tokenizer.ipynb
Created January 7, 2025 04:32
inspect_into_modernbert_tokenizer.ipynb
@nan-wang
nan-wang / pixelshuffling.ipynb
Created December 25, 2024 01:02
pixelshuffling.ipynb
@nan-wang
nan-wang / jina_classifier_router_chain.py
Created October 22, 2024 09:01
A demo of using the classify API from Jina AI for semantic routing
import time
from typing import Any, Optional, Dict, List
import requests
from langchain.chains.router.llm_router import RouterChain
from langchain_core.callbacks import CallbackManagerForChainRun
from langchain_core.utils import convert_to_secret_str, get_from_dict_or_env
from pydantic import SecretStr, model_validator
JINA_API_URL: str = "https://api.jina.ai/v1/classify"
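The preview above stops before the routing logic, so the following is only a minimal sketch of what "semantic routing" with a classifier could look like, not the gist's actual implementation. The route table, the handler names, the `min_score` threshold, and the helper names `build_payload` and `pick_route` are all hypothetical. The request fields (`model`, `input`, `labels`) and the result fields (`prediction`, `score`) are assumptions about the classify API and should be checked against the official documentation.

```python
from typing import Any, Dict, List

# Hypothetical route table: classifier label -> handler name (not from the gist).
ROUTES: Dict[str, str] = {
    "billing": "billing_chain",
    "tech support": "support_chain",
}
DEFAULT_ROUTE = "fallback_chain"


def build_payload(
    texts: List[str],
    labels: List[str],
    model: str = "jina-embeddings-v3",  # assumed model name
) -> Dict[str, Any]:
    """Assemble the JSON request body; the field names are an assumption."""
    return {"model": model, "input": texts, "labels": labels}


def pick_route(result: Dict[str, Any], min_score: float = 0.5) -> str:
    """Map one classification result to a route, falling back on low scores."""
    label = result.get("prediction")
    score = result.get("score", 0.0)
    if label in ROUTES and score >= min_score:
        return ROUTES[label]
    return DEFAULT_ROUTE


if __name__ == "__main__":
    payload = build_payload(["Where is my invoice?"], list(ROUTES))
    print(payload)
    # A hand-written result standing in for one item of the API response.
    print(pick_route({"prediction": "billing", "score": 0.9}))
```

In a real call the payload would presumably be sent with something like `requests.post(JINA_API_URL, json=payload, headers={"Authorization": f"Bearer {api_key}"})`, where `api_key` is a placeholder and the bearer-token scheme is again an assumption. Keeping payload construction and route selection as pure functions makes the routing rule testable without network access.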
@nan-wang
nan-wang / jina_hub_transformertorchencoder
Created December 2, 2021 02:58
Try out TransformerTorchEncoder from Jina Hub
from jina import Executor, Document, DocumentArray
encoder = Executor.from_hub('jinahub://TransformerTorchEncoder')
da = DocumentArray([
    Document(text='Jina is a neural search framework.'),
    Document(text='Jina relies heavily on multiprocessing.'),
    Document(text='Jina is backed by Jina AI.')])
encoder.encode(docs=da)
for doc in da:
    print(f'{doc.embedding}')
@nan-wang
nan-wang / toy.py
Created August 25, 2021 04:09
check encoder outputs
from transformer_tf_text_encode import TransformerTFTextEncoder
from jina import Document, DocumentArray
encoder = TransformerTFTextEncoder(
    pretrained_model_name_or_path='hfl/chinese-legal-electra-small-generator',
    pooling_strategy='cls'
)
case_1 = {
    'query': [