Created
April 15, 2022 09:31
Convert a citation list into a MS Word compatible source file in XML format
'''
This is a quick script I made to recreate the bibliography from text
copied out of MS Word. Just copy the bibliography list and format it
this way:
1. <Text up to 255 chars>
2. <Text up to 255 chars>
...
Entries longer than 255 characters will be ignored (but reported on the console output).
The tag for each entry is the reference number itself, which is useful when you then
manually replace each reference number in the copied text with the actual reference,
choosing the one whose tag matches that number.
Since it was a quick script I simply chose to put the whole citation inside
the title. It could be improved by placing each part in the proper field, and
this would also increase the max length since each text field can contain up to 255 chars,
but since citation formats differ a lot this is not that straightforward.
Please make sure all characters are ASCII;
if the text contains Unicode characters, use this tool to remove/replace them:
https://pteo.paranoiaworks.mobi/diacriticsremover/
(I'm not affiliated with the website, it's just a useful tool)
'''
import sys
import uuid

# Read the numbered citation list given as the first command-line argument.
with open(sys.argv[1], 'r') as file:
    sources = []
    for line in file.readlines():
        elements = line.strip().split()
        if not elements:
            # Skip blank lines instead of failing on elements[0].
            continue
        # The first token is the reference number, optionally followed by a dot.
        number = elements[0]
        if number[-1] == '.':
            number = number[:-1]
        number = int(number)
        citation = " ".join(elements[1:])
        # Escape ampersands so the output stays valid XML.
        citation = citation.replace("&", "&amp;")
        if len(citation) > 255:
            print(f"IGNORING [{number}]")
            continue
        sources.append(dict(number=number, title=citation))

# Write the sources file next to the input file, as <input name>.xml.
with open(sys.argv[1] + ".xml", "w") as out:
    out.write('<?xml version="1.0"?>\n')
    out.write('<b:Sources SelectedStyle="" xmlns:b="http://schemas.openxmlformats.org/officeDocument/2006/bibliography" xmlns="http://schemas.openxmlformats.org/officeDocument/2006/bibliography">\n')
    for source in sources:
        out.write("\t<b:Source>\n")
        out.write(f"\t\t<b:Tag>{source['number']}</b:Tag>\n")
        out.write("\t\t<b:SourceType>Misc</b:SourceType>\n")
        out.write(f"\t\t<b:Guid>{{{str(uuid.uuid4()).upper()}}}</b:Guid>\n")
        # out.write(f"\t\t<b:Author><b:Author><b:Corporate>{source['author']}</b:Corporate></b:Author></b:Author>\n")
        out.write(f"\t\t<b:Title>{source['title']}</b:Title>\n")
        # out.write(f"\t\t<b:Publisher>{source['editor']}</b:Publisher>\n")
        out.write("\t</b:Source>\n")
    out.write('</b:Sources>\n')
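
The docstring asks for ASCII-only input and points to an external tool for removing diacritics. As a rough sketch, the same cleanup could be done in Python before the citation is written into the title field. The helper names `to_ascii` and `prepare_title` are hypothetical and not part of the original script. This approach only strips accents that Unicode can decompose (for example "é" becomes "e"). Characters with no ASCII decomposition (such as "ß" or "ø") are silently dropped, so check the result by hand.

```python
import unicodedata
from xml.sax.saxutils import escape


def to_ascii(text: str) -> str:
    """Fold accented characters to their ASCII base letters (hypothetical helper)."""
    # NFKD splits characters such as "é" into "e" plus a combining accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding with errors="ignore" drops the accents and anything else non-ASCII.
    return decomposed.encode("ascii", "ignore").decode("ascii")


def prepare_title(citation: str) -> str:
    """Return an ASCII-only, XML-safe version of a citation (hypothetical helper)."""
    # escape() replaces &, < and > with their XML entities.
    return escape(to_ascii(citation))


if __name__ == "__main__":
    print(prepare_title("Müller, J. & García, P. <Draft>"))
```

If adopted, `prepare_title(citation)` would stand in for the single `citation.replace(...)` call in the script. The 255-character check would then measure the escaped text, as it does now.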