This document is a work in progress, but it puts in place the basics for harvesting the University's electronic dissertations, master's theses and MFA theses in Scholarworks via an OAI-PMH crosswalk using an XML script.
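For orientation, the harvest that MarcEdit performs corresponds roughly to an OAI-PMH ListRecords request. The following is a minimal sketch, not what MarcEdit actually runs: it builds such a request URL from the server record, one set name, and a date range given on this page. The function name `list_records_url` is hypothetical, and it is an assumption that the Start and End dates map onto the standard OAI-PMH `from` and `until` arguments and that Dublin Core corresponds to the `oai_dc` metadata prefix.

```python
from datetime import date
from urllib.parse import urlencode

# Server record given on this page
BASE_URL = "http://scholarworks.umass.edu/cgi/oai2.cgi"


def list_records_url(set_name, start, end, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords URL for one set and one date range.

    Hypothetical helper for illustration; dates must be yyyy-mm-dd strings.
    """
    # date.fromisoformat raises ValueError if a date is malformed
    start_day = date.fromisoformat(start)
    end_day = date.fromisoformat(end)
    if start_day > end_day:
        raise ValueError("start date must not be after end date")
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "set": set_name,
        "from": start_day.isoformat(),    # assumed to match the Start date
        "until": end_day.isoformat(),     # assumed to match the End date
    }
    return BASE_URL + "?" + urlencode(params)


url = list_records_url("publication:masters_theses_2", "2015-02-01", "2015-05-31")
print(url)
```

An OAI-PMH request takes a single set, so each of the set names would need its own request.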
Server Record: http://scholarworks.umass.edu/cgi/oai2.cgi
Set Name: publication:masters_theses_2 OR publication:dissertations_2 OR englmfa_theses OR larp_ms_projects
Metadata Type: Dublin Core
Crosswalk Path: C:\Temp\XML1\OAIDCtoMARCXMLmodified.xsl *instructions below*
- Click on Advanced Settings.
- Add the Start and End dates. They must follow the format yyyy-mm-dd, for example 2015-02-01 / 2015-05-31. This will harvest any new files uploaded to Scholarworks in that time period (i.e., February ETDs). You may have to adjust the dates to capture all the files desired.
- Click on OK. Harvesting will commence and filter through the .xsl file on the C: drive. The results will be displayed in a MarcEditor window.
- Compare the list of names against the 'packing list' spreadsheet provided by the Graduate School. There may be ETDs with earlier publication dates which already have in-house cataloged records in OCLC and Aleph. Delete any records which would generate duplicate bib records.
- In the menu bar of the MarcEditor window, click on Tools --> Assigned Tasks --> then click on one of the following as appropriate:
OAI_Dissertations
OAI_Masters
OAI_MFA
OAI_LARP
This will run the harvested records through the MarcEdit task list. Save the results to your hard drive as a .mrk file (ex: C:\Temp\OAI_Batch\MastersFeb2015.mrk)
- Checking for bad characters
This will scan the .mrk file for non-ASCII characters which would otherwise prevent a record from being validated and uploaded into OCLC.
NOTE: At this time (1/8/2016), the XML file crosswalk has not been modified to successfully run a Unicode UTF-8 fix template. The following instructions are for correcting each record by hand in Connexion.
- Open the MRK_BadCharRdr application on your desktop (available from Systems). This will open the directory in your C: drive to which you previously saved the above .mrk file.
- Select and open the file. (The file type is Mnemonic MarcEditor File.)
- The script will then run through the file and save the results in an Excel file under the same filename in the same C: directory.
- Each record with a bad character is listed by number and shows the MARC field involved as well as the codes for each bad character. Set this list aside.
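The kind of check described above can be sketched in a few lines of Python. This is not the MRK_BadCharRdr application, whose internals are not documented here. It is a minimal illustration that assumes the usual mnemonic layout, in which every field line starts with `=` followed by a three-character tag and every record starts with an `=LDR` line. The function name `find_non_ascii` is hypothetical.

```python
def find_non_ascii(mrk_text):
    """Return (record number, tag, character, code point) for each non-ASCII character.

    Hypothetical helper; assumes one field per line, each starting with '='.
    """
    findings = []
    record_number = 0
    for line in mrk_text.splitlines():
        if line.startswith("=LDR"):
            record_number += 1  # each leader line starts a new record
        if not line.startswith("="):
            continue  # skip the blank lines between records
        tag = line[1:4]
        for ch in line:
            if ord(ch) > 127:  # anything outside the ASCII range
                findings.append((record_number, tag, ch, f"U+{ord(ch):04X}"))
    return findings


# Made-up sample data: the first record has an accented character in its 245 field
sample = (
    "=LDR  00000nam a2200000 i 4500\n"
    "=245  10$aCaf\u00e9 culture\n"
    "\n"
    "=LDR  00000nam a2200000 i 4500\n"
    "=245  10$aPlain title\n"
)
for finding in find_non_ascii(sample):
    print(finding)
```

To run this against a saved batch, read the file first, for example with `open(path, encoding="utf-8").read()`, where `path` points at the .mrk file saved in the earlier step. The encoding is an assumption and may need adjusting.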
- Import harvested records into your C: drive
- Import file into Connexion
- Import file into OCLC
(Coming soon!)
NOTES:
- The XSL crosswalk does the following:
Modifies the 006 and 007 fields
Inserts 040, 042 fields
Changes the 245 indicators from 00 to 10. Later versions of the script will allow changes in the second indicator according to any articles present. This is currently taken care of by the MarcEdit Task List.
Changes the 700 'creator' field to a 100 'author' field with the appropriate |e subfield.
Inserts a 264 field (Amherst, Massachusetts :|b University of Massachusetts Amherst, |c <appropriate date as harvested>).
Inserts a 300 field (1 electronic document.)
Inserts the RDA fields 336, 337, 338 and 347.
Later versions will insert the appropriate degree name harvested from each record (Ph.D. |c University of Massachusetts Amherst |d <date>). Currently this is handled by the MarcEdit Task List.
Inserts a 538 field (Available online in PDF format via Scholarworks at UMass Amherst.)
Inserts 653 fields for keywords and such.
Inserts a 655_7 field (Academic theses. |2 lcgft)
Later versions will include a 690 field with the degree program harvested from each record (Theses |x Chemistry |x Masters)
Inserts a 710 field (University of Massachusetts Amherst, |e degree granting institution)
Inserts a 710 field (University of Massachusetts Amherst. Libraries, |e issuing body)
Inserts an 856 field (Scholarworks URL with |z Link to free resource)
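To make the fixed-text fields above more concrete, here is a rough Python sketch that writes a few of them as mnemonic (.mrk style) lines. It is not the XSL crosswalk itself. The function name `crosswalk_fields` is hypothetical, and the indicator values for the 264, 653 and 856 fields are assumptions, since only the 655 indicators are stated above. Mnemonic files use `$` as the subfield delimiter (shown as `|` in the notes) and a backslash for a blank indicator.

```python
def crosswalk_fields(year, keywords, url):
    """Return .mrk-style lines for some of the fields listed in the notes above.

    Hypothetical helper for illustration only.
    """
    blank = "\\"  # a blank indicator is written as a backslash in mnemonic format
    lines = [
        # Second indicator 1 (publication) is an assumption
        f"=264  {blank}1$aAmherst, Massachusetts :"
        f"$bUniversity of Massachusetts Amherst,$c{year}.",
        f"=300  {blank}{blank}$a1 electronic document.",
        f"=538  {blank}{blank}$aAvailable online in PDF format "
        "via Scholarworks at UMass Amherst.",
    ]
    # One 653 field per harvested keyword (blank indicators assumed)
    lines += [f"=653  {blank}{blank}$a{keyword}" for keyword in keywords]
    lines.append(f"=655  {blank}7$aAcademic theses.$2lcgft")
    # Indicators 40 are an assumption
    lines.append(f"=856  40$u{url}$zLink to free resource")
    return lines


# Made-up input values
for line in crosswalk_fields("2015", ["soil", "ecology"], "http://example.org/etd/1"):
    print(line)
```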
- The MarcEdit Task List does the following:
Adds an 008 field and corrects any necessary LDR fields
Adds an 049 AUMM field
Corrects the 100 field to include a period and comma after an initial in the author's name
Corrects the 245 field to show the appropriate indicators for a title beginning with an article
Inserts a colon and |b where needed
Adds a 502 field (Masters Degree |c University of Massachusetts Amherst |d <date>)
Coming: adding a 949 field for ALEPH holdings purposes
NOTE: Each MarcEdit task list has its own 502 field for Masters Degree, Doctoral Degree and Terminal Project Degree as well as its own 690 field for Doctoral and Masters.
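The 245 indicator correction can be illustrated with a short Python sketch. This is not the MarcEdit task itself. It assumes English titles only and a hypothetical function name, `title_indicators`. In MARC 21 the second 245 indicator gives the number of nonfiling characters, so a leading article counts its own letters plus the space that follows it.

```python
# English articles only; other languages would need their own list (assumption)
ARTICLES = ("the", "a", "an")


def title_indicators(title):
    """Return the two 245 indicators as a string, e.g. '14' for 'The ...'.

    Hypothetical helper; the first indicator is fixed at 1, as in the notes above.
    """
    first_word, separator, _rest = title.partition(" ")
    if separator and first_word.lower() in ARTICLES:
        nonfiling = len(first_word) + 1  # the article plus the following space
    else:
        nonfiling = 0
    return f"1{nonfiling}"


# Made-up titles
for title in ("The role of soil", "A study of soil", "An analysis", "Soil ecology"):
    print(title_indicators(title), title)
```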