Last active
March 8, 2024 17:40
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"id": "be8308a8-48ce-46b4-a6d2-9c7f445f8ae8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['arxiv/pdf/arXiv_pdf_0001_001.tar',\n",
" 'arxiv/pdf/arXiv_pdf_0001_002.tar',\n",
" 'arxiv/pdf/arXiv_pdf_0002_001.tar',\n",
" 'arxiv/pdf/arXiv_pdf_0002_002.tar',\n",
" 'arxiv/pdf/arXiv_pdf_0003_001.tar',\n",
" 'arxiv/pdf/arXiv_pdf_0003_002.tar',\n",
" 'arxiv/pdf/arXiv_pdf_0004_001.tar',\n",
" 'arxiv/pdf/arXiv_pdf_0004_002.tar',\n",
" 'arxiv/pdf/arXiv_pdf_0005_001.tar',\n",
" 'arxiv/pdf/arXiv_pdf_0005_002.tar']"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import s3fs\n",
"s3 = s3fs.S3FileSystem(requester_pays=True)\n",
"\n",
"directories = s3.ls(\"s3://arxiv/pdf\")\n",
"directories[:10]\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "75229f27-3e76-4954-a2ac-816482d3e29a",
"metadata": {},
"outputs": [],
"source": [
"import io\n",
"import tarfile\n",
"\n",
"import fitz  # PyMuPDF\n",
"\n",
"\n",
"def extract(filename: str):\n",
"    \"\"\"Extract the text of every PDF page in one arXiv tar archive.\n",
"\n",
"    Yields\n",
"    ------\n",
"    dict\n",
"        One dictionary per page, with the PDF's ``name`` and the page ``text``.\n",
"    \"\"\"\n",
"    # Read the whole archive into memory so that tarfile can seek within it.\n",
"    with s3.open(filename) as f:\n",
"        data = f.read()\n",
"    with io.BytesIO(data) as bio:\n",
"        with tarfile.open(fileobj=bio) as tf:\n",
"            for member in tf.getmembers():\n",
"                if member.isfile() and member.name.endswith(\".pdf\"):\n",
"                    with fitz.Document(\n",
"                        stream=tf.extractfile(member).read(), filetype=\"pdf\"\n",
"                    ) as pdf:\n",
"                        # TODO: think about smaller chunks / overlapping, etc.\n",
"                        for page in pdf.pages():\n",
"                            yield {\n",
"                                \"name\": member.name,\n",
"                                \"text\": page.get_text(),\n",
"                            }"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "2f9bc3d2-757d-468c-a7db-1c75a5ecfedd",
"metadata": {},
"outputs": [],
"source": [
"out = extract(directories[-10])"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "323562fb-8c1c-49ea-ab45-328d7cfe004c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'name': '9908/quant-ph9908027.pdf',\n",
" 'text': 'arXiv:quant-ph/9908027v1 6 Aug 1999\\nGalilean Lee Model of the Delta Function Potential\\nC. R. Hagen∗\\nDepartment of Physics and Astronomy\\nUniversity of Rochester\\nRochester, New York 14627-0171\\nThe scattering cross section associated with a two dimensional delta function has recently been\\nthe object of considerable study. It is shown here that this problem can be put into a field theoretical\\nframework by the construction of an appropriate Galilean covariant theory. The Lee model with\\na standard Yukawa interaction is shown to provide such a realization. The usual results for delta\\nfunction scattering are then obtained in the case that a stable particle exists in the scattering channel\\nprovided that a certain limit is taken in the relevant parameter space. In the more general case in\\nwhich no such limit is taken finite corrections to the cross section are obtained which (unlike the\\npure delta function case) depend on the coupling constant of the model.\\nI. Introduction\\nThe problem of scattering by a delta function in two dimensions is of considerable interest for a number of reasons,\\nnot the least of which is the fact that it lacks a dimensional parameter. This leads directly to the appearance of\\ndivergences in the calculation of bound state energies and scattering amplitudes, a fact which seriously complicates\\nthe task of physical interpretation. Although a delta function potential occurs in the relevant wave equation for the\\ncase of spin one-half Aharonov-Bohm scattering [1], it appears there in conjunction with 1/r2 terms with coefficients\\nsuch that a cancellation of all divergences occurs. Since this requires a somewhat delicate limiting process (namely,\\nthe limit of vanishing flux tube radius must be taken at the end of the calculation), it is important to note that no\\nsuch limiting process suffices to yield a finite result for the pure delta function potential. 
Such a goal can only be\\nachieved by a) limiting consideration to the attractive delta function and b) requiring that there be a bound state\\nassociated with the scattering channel. The latter step is frequently justified by pointing out that the delta function is\\nso singularly attractive that a bound state is a natural expectation. The crucial point is that a type of renormalization\\nis carried out by which divergences are combined into a physical parameter (i.e., the bound state energy) in such a\\nway that the relevant scattering amplitude can be written as a finite function of the scattering energy and the bound\\nstate energy. Just as the physical mass is not amenable to calculation in covariant field theory, so also in this case\\nthe bound state energy for the delta function potential cannot be calculated from first principles.\\nThis renormalization program has been carried out for the two dimensional delta function and finite results obtained\\n[2,3]. Bender and Mead [4] have gone one step further by considering the attractive delta function in D-dimensional\\nspace with the D = 2 result obtained as a limit. They assert that refs.[2] and [3] obtain the wrong cross section\\nand infer from this that it is essential that the two dimensional result be obtained as a limit of the arbitrary D\\ncase. However, Cavalcanti [5] has recently pointed out that the results of refs.[2-4] are identical provided only that\\na calculational error in ref.[3] is corrected. The two dimensional delta function thus seems to be reasonably well\\nunderstood within the framework of conventional Schr¨odinger analysis.\\nSince Aharonov-Bohm scattering is well known to be the two particle sector of a Galilean invariant pure Chern-\\nSimons gauge theory, it is natural to ask what the corresponding Galilean field theory [6] of the delta function should\\nbe. 
This paper examines that question and shows that a Yukawa coupling in such a theory provides a realization\\nof the delta function potential in the limit in which a direct (or contact) interaction is obtained. In the following\\nsection the properties of Galilean field theories are briefly reviewed and the Galilean invariant trilinear interaction\\nterm constructed. The theory obtained from this process is essentially the Galilean version of the Lee model and\\nhas been discussed previously by L´evy-Leblond [7]. In section III the two particle scattering sector is considered and\\nthe corresponding Hilbert space constructed. This allows one to calculate the two particle scattering matrix and\\nthereby obtain a formal expression for the cross section. In IV the various limits of the latter are considered and the\\nrenormalization carried out. The Conclusion summarizes some of the principal results obtained.\\nII A Galilean Model\\nOne begins the construction of an appropriate Galilean covariant model by the determination of the relevant free\\nparticle Lagrangian. Using the fact that the invariant quantity in Galilean relativity is E− P2\\n2M where M is the particle\\nmass, it is straightforward to infer the free particle Lagrangian\\nL0 = ψ†[i ∂\\n∂t + ∇2\\n2M − U0]ψ\\n1\\n'}" | |
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"next(out)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "00e40383-a572-4030-9dbf-142de071771e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1386"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(list(out))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:rag]",
"language": "python",
"name": "conda-env-rag-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
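
The TODO in `extract` mentions smaller, overlapping chunks. Below is a minimal sketch of one way that could look, assuming plain character-based windows are acceptable. The helper names `chunk_text` and `chunk_pages` and the `size` and `overlap` parameters are hypothetical and not part of the notebook. The sketch only relies on each record being a dictionary with `name` and `text` keys, as yielded by `extract`.

```python
from typing import Dict, Iterable, Iterator


def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> Iterator[str]:
    """Yield windows of at most `size` characters; consecutive windows share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    for start in range(0, len(text), step):
        yield text[start:start + size]
        if start + size >= len(text):
            break  # this window already reached the end of the text


def chunk_pages(pages: Iterable[Dict[str, str]], **kwargs) -> Iterator[Dict[str, object]]:
    """Split each page record into chunk records (hypothetical helper).

    The "chunk" index restarts at 0 for every page record.
    """
    for page in pages:
        for i, chunk in enumerate(chunk_text(page["text"], **kwargs)):
            yield {"name": page["name"], "chunk": i, "text": chunk}


if __name__ == "__main__":
    # Stand-in for the records produced by extract(); not real arXiv data.
    fake_pages = [{"name": "example.pdf", "text": "abcdefghij"}]
    for record in chunk_pages(fake_pages, size=4, overlap=2):
        print(record)
```

In the notebook this could be applied lazily as `chunk_pages(extract(directories[-10]))`, since both functions are generators. Character windows are only one possible choice; splitting on sentences or tokens may suit a retrieval pipeline better.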