@michezio
Created April 15, 2022 09:31
Convert a citation list into a MS Word compatible source file in XML format
'''
This is a quick script I made to recreate the bibliography from a
copied text in MS Word. Just copy the bibliography list and format
it this way:
1. <Text of at most 255 chars>
2. <Text of at most 255 chars>
...
Entries longer than 255 characters will be ignored (but reported on console output).
The tag for each entry will be the reference number itself, which is useful when you then
manually replace each reference number in the copied text with the actual reference,
choosing the one with the same number as its tag.
Since it was a quick script I simply chose to put the whole citation inside
the title. It could be improved by placing each part in the proper field, and
this would also increase the max length since each text field can contain up to 255 chars,
but since citation formats differ a lot this is not that straightforward.
Please make sure all characters are ASCII based;
in case of Unicode use this tool to remove/replace them:
https://pteo.paranoiaworks.mobi/diacriticsremover/
(I'm not affiliated with the website, it's just a useful tool)
'''
import sys
import uuid
from xml.sax.saxutils import escape

sources = []
with open(sys.argv[1], 'r') as file:
    for line in file:
        elements = line.split()
        if not elements:
            continue  # skip blank lines instead of crashing on elements[0]
        # The first token is the reference number, optionally followed by a dot ("12.")
        number = int(elements[0].rstrip('.'))
        citation = " ".join(elements[1:])
        # Escape &, < and > so the citation is valid XML text
        citation = escape(citation)
        if len(citation) > 255:
            print(f"IGNORING [{number}]")
            continue
        sources.append(dict(number=number, title=citation))

with open(sys.argv[1] + ".xml", "w") as out:
    out.write('<?xml version="1.0"?>\n')
    out.write('<b:Sources SelectedStyle="" xmlns:b="http://schemas.openxmlformats.org/officeDocument/2006/bibliography" xmlns="http://schemas.openxmlformats.org/officeDocument/2006/bibliography">\n')
    for source in sources:
        out.write("\t<b:Source>\n")
        out.write(f"\t\t<b:Tag>{source['number']}</b:Tag>\n")
        out.write("\t\t<b:SourceType>Misc</b:SourceType>\n")
        # Word expects the GUID wrapped in braces, e.g. {XXXXXXXX-XXXX-...}
        out.write(f"\t\t<b:Guid>{{{str(uuid.uuid4()).upper()}}}</b:Guid>\n")
        # out.write(f"\t\t<b:Author><b:Author><b:Corporate>{source['author']}</b:Corporate></b:Author></b:Author>\n")
        out.write(f"\t\t<b:Title>{source['title']}</b:Title>\n")
        # out.write(f"\t\t<b:Publisher>{source['editor']}</b:Publisher>\n")
        out.write("\t</b:Source>\n")
    out.write('</b:Sources>\n')
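
The docstring asks for ASCII-only input and points to an online tool for removing diacritics. As a possible offline alternative, the sketch below folds accented characters to their base letters with the standard `unicodedata` module. The helper name `to_ascii` is hypothetical and not part of the original script. Note the assumption that dropping characters is acceptable: anything without an ASCII decomposition (for example `ß` or an en dash) is silently removed rather than replaced.

```python
import unicodedata


def to_ascii(text):
    """Best-effort ASCII folding (hypothetical helper, not in the original script)."""
    # NFKD splits an accented letter into its base letter plus combining marks
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding with errors="ignore" drops the combining marks and any other
    # character that has no ASCII equivalent
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    print(to_ascii("Müller, J. & Šimek, P. Café studies"))
```

One way to use it would be to call `to_ascii(line)` on each input line before splitting it into the reference number and the citation text.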