GM-RKB XML Snapshot File
A GM-RKB XML Snapshot File is a specific type of MediaWiki XML Data Snapshot File that represents a snapshot of data from the GM-RKB (Gabor Melli's Research Knowledge Base).
- Context:
- It can be used for GM-RKB Maintenance and GM-RKB Analysis.
- It can be processed by a gmrkb_xml_snapshot_processor.py.
- …
- Example(s):
- rkb-mediawiki-20230604-1206.xml.
- …
- Counter-Example(s):
- Wikipedia XML Data Snapshot, such as enwiki-latest-pages-articles.xml.
- Non-XML Export File.
- XML files not adhering to the MediaWiki Wiki Export File Format.
- See: GM-RKB.
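The counter-examples above turn on whether a file follows the MediaWiki export format. A minimal sketch of one way to check this is to inspect only the root element: MediaWiki exports use a root `<mediawiki>` element whose namespace begins with `http://www.mediawiki.org/xml/export-`. The function name `export_namespace` and the constant `EXPORT_NS_PREFIX` are hypothetical names chosen for this illustration, and the check is a heuristic, not a schema validation.

```python
import io
from xml.etree import ElementTree

# Assumption: export namespaces share this prefix and differ only in version
# (for example ".../export-0.10/").
EXPORT_NS_PREFIX = "http://www.mediawiki.org/xml/export-"


def export_namespace(stream):
    """Return the export namespace URI of a binary XML stream, or None.

    Only the root start tag is examined, so the rest of the file is not parsed.
    """
    try:
        for _event, elem in ElementTree.iterparse(stream, events=("start",)):
            root_tag = elem.tag  # the first "start" event is the root element
            break
        else:
            return None
    except ElementTree.ParseError:
        return None  # not well-formed XML (or empty input)

    if not root_tag.startswith("{"):
        return None  # root element has no namespace
    uri, _, local = root_tag[1:].partition("}")
    if local == "mediawiki" and uri.startswith(EXPORT_NS_PREFIX):
        return uri
    return None


if __name__ == "__main__":
    sample = b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/"></mediawiki>'
    print(export_namespace(io.BytesIO(sample)))
```

For a file on disk, the same function could be called as `with open(path, "rb") as f: export_namespace(f)`. The returned URI can then be used to build tag names instead of hard-coding a schema version.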
References
2023
- chat
import json
from xml.etree import ElementTree

# Introduction: This program extracts the titles and contents of pages from a given XML file.
# It then formats the data into a JSON file that is ready to be uploaded to a specified destination.

# Namespace of the MediaWiki export schema. It must match the xmlns declared
# on the root element of the snapshot being read.
NS = '{http://www.mediawiki.org/xml/export-0.10/}'

def extract_pages(xml_file):
    # Parse the XML file
    tree = ElementTree.parse(xml_file)
    root = tree.getroot()

    # Initialize a list to hold the extracted pages
    pages = []

    # Iterate through each page element in the XML file
    for page in root.iter(NS + 'page'):
        # Extract the title and content of the page.
        # findtext returns None when the element is missing instead of raising AttributeError.
        title = page.findtext(NS + 'title')
        content = page.findtext('.//' + NS + 'text')

        # Append the title and content as a dictionary to the pages list
        pages.append({
            'title': title,
            'content': content
        })

    return pages

# Specify the XML file to extract from
xml_file = 'rkb-mediawiki-20230604-1206.xml'

# Extract the pages from the XML file
pages = extract_pages(xml_file)

# Create the JSON object to be uploaded
data_to_upload = {"value": pages}

# Write to the JSON file (UTF-8, since ensure_ascii=False emits non-ASCII characters as-is)
with open('data_to_upload.json', 'w', encoding='utf-8') as json_file:
    json.dump(data_to_upload, json_file, ensure_ascii=False, indent=4)

# Print a message to indicate success
print("File successfully created.")
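The program above loads the whole snapshot into memory with `ElementTree.parse`. For larger snapshots, a possible alternative is to stream the file with `ElementTree.iterparse` and release each page after it has been read. The sketch below is an illustration under stated assumptions, not part of the quoted program: `local_name`, `first_text`, and `iter_pages` are hypothetical helper names, and tags are matched by local name so that the export schema version does not need to be hard-coded.

```python
import io
from xml.etree import ElementTree


def local_name(tag):
    """Strip a '{namespace}' prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def first_text(page, name):
    """Return the text of the first descendant called `name`, or '' if absent or empty."""
    for elem in page.iter():
        if local_name(elem.tag) == name:
            return elem.text or ""
    return ""


def iter_pages(stream):
    """Yield (title, content) for each <page> in a binary XML stream."""
    for _event, elem in ElementTree.iterparse(stream, events=("end",)):
        if local_name(elem.tag) == "page":
            yield first_text(elem, "title"), first_text(elem, "text")
            # Drop the page's children and text to limit memory use; the
            # emptied <page> element itself stays attached to the root.
            elem.clear()


if __name__ == "__main__":
    sample = (
        b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">'
        b"<page><title>A</title><revision><text>alpha</text></revision></page>"
        b"<page><title>B</title><revision><text/></revision></page>"
        b"</mediawiki>"
    )
    for title, content in iter_pages(io.BytesIO(sample)):
        print(title, len(content))
```

Because `iter_pages` is a generator, pages can be written out incrementally (for example one JSON object per line) instead of being collected into a single list first.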