Differences

This shows you the differences between two versions of the page.

oai_harvesting_via_marcedit [2016/01/08 14:43]
kdion
oai_harvesting_via_marcedit [2016/03/09 15:47]
kdion Major updates
Line 1: Line 1:
 ====== OAI Harvesting of Scholarworks Records Via MarcEdit ======
  
-This document is a work in progress but puts in place the basics for harvesting the University's ETD dissertations, masters theses, MFA theses and LARP terminal projects in Scholarworks via an OAI-PMH crosswalk using an XML script.
+This document is a work in progress but puts in place the basics for harvesting the University's ETD dissertations, masters theses, MFA theses and LARP terminal projects in Scholarworks via an OAI-PMH crosswalk using an XML and XSLT script.
  
 ==To Harvest:==
Line 8: Line 8:
  
 Click on 'Harvest OAI Records'. In the popup Metadata Harvester window, input:
-  * Server Record: <nowiki>http://scholarworks.umass.edu/cgi/oai2.cgi</nowiki>
+  * Server Record: <nowiki>http://scholarworks.umass.edu/do/oai</nowiki>
   * Set Name: publication:masters_theses_2 //OR// publication:dissertations_2 //OR//
 englmfa_theses //OR// larp_ms_projects
-  * Metadata Type: Dublin Core
+  * Metadata Type: qdc
   * Crosswalk Path: C:\Temp\XML1\OAIDCtoMARCXMLmodified.xsl
 +        ​
 + (This is for the Qualified Dublin Core records. Simple Dublin Core will not allow us to extract degree names or departments.)
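The harvester settings above correspond to a standard OAI-PMH ''ListRecords'' request. The following is a minimal Python sketch of that request URL, useful for checking a set in a browser before harvesting. The function name ''list_records_url'' is hypothetical, and whether MarcEdit sends exactly this request internally is an assumption; only the ''verb'', ''metadataPrefix'' and ''set'' parameters are taken from the OAI-PMH protocol.

```python
from urllib.parse import urlencode

# Server Record value from the settings list above.
BASE_URL = "http://scholarworks.umass.edu/do/oai"


def list_records_url(set_name, metadata_prefix="qdc", base_url=BASE_URL):
    """Build an OAI-PMH ListRecords request URL for one set (hypothetical helper)."""
    query = urlencode({
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,  # qdc = Qualified Dublin Core
        "set": set_name,
    })
    return f"{base_url}?{query}"


if __name__ == "__main__":
    # Print one request URL per set name listed above.
    for name in ("publication:masters_theses_2", "publication:dissertations_2",
                 "englmfa_theses", "larp_ms_projects"):
        print(list_records_url(name))
```

The colon in a set name is percent-encoded as ''%3A'' by ''urlencode''; OAI-PMH servers are expected to decode it.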
         ​         ​
 Click on Advanced Settings.
Line 34: Line 36:
 ==Checking for Bad Characters==
  
-This will scan the .mrk file for non-ASCII characters which would otherwise prevent a record from being validated and uploading into OCLC.
+The XSLT crosswalk script will automatically convert any non-conforming punctuation (single left and right quotation marks, left and right double quotation marks, En dash, Em dash) but at this time (3/9/2016) it cannot convert bad diacritics. The following instructions are for correcting each record by hand in Connexion.
- +
-NOTE: At this time (1/8/2016), the XML file crosswalk has not been modified to successfully run an Unicode-UT8 fix template. The following instructions are for correcting each record by hand in Connexion. ​+
       ​       ​
 Open the MRK_BadCharRdr application on your desktop (available from Systems). This will open the directory in your C: drive to which you previously saved the above .mrk file.
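As an illustration of the two steps described in this section, here is a minimal Python sketch that first replaces the punctuation marks named above and then reports any non-ASCII characters that remain (for example diacritics). It is not the XSLT crosswalk or the MRK_BadCharRdr application; the function names are hypothetical, and the ASCII replacements chosen (for instance two hyphens for an Em dash) are assumptions.

```python
# Punctuation named in the text, mapped to assumed ASCII replacements.
PUNCTUATION_MAP = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2013": "-",   # En dash
    "\u2014": "--",  # Em dash (assumed replacement)
})


def normalize_punctuation(text):
    """Replace the curly quotes and dashes listed above with ASCII equivalents."""
    return text.translate(PUNCTUATION_MAP)


def find_non_ascii(text):
    """Return (line number, column, character) for each remaining non-ASCII character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((line_no, col, ch))
    return hits


if __name__ == "__main__":
    sample = "=245  10$a\u201cCaf\u00e9\u201d \u2013 a study"
    cleaned = normalize_punctuation(sample)
    print(cleaned)
    print(find_non_ascii(cleaned))
```

Anything reported by ''find_non_ascii'' after normalization would still need to be corrected by hand.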
Line 76: Line 76:
 The original MarcEdit OAIDCtoMarcXML file can be found on your hard drive under C:\Program Files\MarcEdit 6\xslt\OAIDCtoMARCXML.xsl, or wherever your MarcEdit application is installed. This is the generic XML version; do not change it. Use the modified version instead, a copy of which can be found on the R drive under Theses\OAI MarcEdit XML harvest code (OAIDCtoMARCXMLmodified.xsl). Note that you must also have Marc21slimUtils in the same folder for the .xsl file to run properly.
  
-The XML script is based on that generously shared by Ken Robinson (kjr106@psu.edu), Cataloging and Metadata Services, the Pennsylvania State University. This file can be found online at [[https://scholarsphere.psu.edu/collections/x346dj68d]] along with a detailed description of their eTD Dublin Core-to-MARCXML Crosswalk. The script includes a template to check for bad non-ASCII characters but as of this writing, it will not run in our XML script. This is being worked on.
+The XML script is based on that generously shared by Ken Robinson (kjr106@psu.edu), Cataloging and Metadata Services, the Pennsylvania State University. This file can be found online at [[https://scholarsphere.psu.edu/collections/x346dj68d]] along with a detailed description of their eTD Dublin Core-to-MARCXML Crosswalk.
  
-Our __XML script version__ does the following: 
  
 +Our personalized __XML script version__ does the following:
   * Modifies the 006 and 007 fields
   * Inserts 040, 042 fields
-  * Changes the 245 00 indicator fields to 10. Later versions of the script will allow changes in the second indicator according to any articles present. This is currently taken care of by the MarcEdit Task List.
+  * Changes the 245 00 indicator fields to 10.
+  * Corrects the 245 field to show the appropriate indicators for a title beginning with an article
   * Changes the 700 'creator' field to a 100 'author' field with the appropriate |e subfield.
   * Inserts a 264 field (Amherst, Massachusetts :|b University of Massachusetts Amherst, |c <appropriate date as harvested>.)
 +  * Inserts a 300 field (1 online resource)
   * Inserts the RDA fields 336, 337, 338 and 347.
-  * 502 field: Later versions will insert the appropriate degree name harvested from each record (Ph.D. |c University of Massachusetts Amherst |d <date>). Currently this is handled with a generic form by the MarcEdit Task List.
+  * Inserts a 502 field (<degree abbrev.> |c University of Massachusetts Amherst |d <date>).
   * Inserts a 538 field (Available online in PDF format via Scholarworks at UMass Amherst.)
   * Inserts 653 fields for keywords and such.
   * Inserts a 655_7 field (Academic theses. |2 lcgft)
-  * 690 field: Later versions will include this field with the degree program harvested from each record (Theses |x Chemistry |x Masters)
+  * Inserts a 690 field (Theses |x Chemistry |x Masters). *NOTE:* The crosswalk script automatically adds |x Masters, but this will be changed to |x Doctoral as needed via MarcEdit Tools.
 +  * Inserts 700 fields for advisors ​
   * Inserts a 710 field (University of Massachusetts Amherst, |e degree granting institution)
   * Inserts a 710 field (University of Massachusetts Amherst. Libraries, |e issuing body)
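The 245 correction for titles beginning with an article can be sketched as follows. In MARC 21 the second indicator of the 245 field gives the number of nonfiling characters, that is, the length of the initial article plus the following space. This Python sketch is an illustration only, not the logic of the XSLT crosswalk; the function name is hypothetical and handling only the English articles "A", "An" and "The" is an assumption.

```python
# English initial articles only (an assumption for this sketch).
ARTICLES = ("a", "an", "the")


def nonfiling_indicator(title):
    """Return the 245 second indicator: the number of characters to skip when filing."""
    first, sep, rest = title.partition(" ")
    # Count the article plus its trailing space, but only when more title follows.
    if sep and rest and first.lower() in ARTICLES:
        return len(first) + 1
    return 0


if __name__ == "__main__":
    for title in ("The Art of Harvesting", "An Analysis of Sets",
                  "A Study of Theses", "Analysis of Sets"):
        print(f"245 1{nonfiling_indicator(title)} |a {title}")
```

A title such as "A" on its own, or one where "A" is not an article (for example "A-frame design"), is left at 0 by this sketch and would need a cataloger's judgment.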
Line 101: Line 104:
     * Adds an 049 AUMM field
     * Corrects the 100 field to include a period and comma after an initial in the author's name
-    * Corrects the 245 field to show the appropriate indicators for a title beginning with an article 
     * Inserts a colon and |b where needed
-    * Adds a 502 field (Masters Degree |c University of Massachusetts Amherst |d <date>
+    * Removes titles (Dr., Prof.) and 'Ph.D' from advisor names
 +    * Reverses the form of advisor names to Lastname, Firstname and replaces |e contributor with |e advisor.  
     * Coming: adding a 949 field for ALEPH holdings purposes
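The advisor-name clean-up in the task list above (removing titles and 'Ph.D', then reversing to Lastname, Firstname) can be sketched in Python as follows. This is not the MarcEdit task list itself; the function name and the regular expressions are hypothetical, and treating the last word as the surname is an assumption that will not hold for every name.

```python
import re

# Leading titles named in the task list: Dr., Prof. (with or without the period).
TITLE_PATTERN = re.compile(r"^(?:Dr|Prof)\.?\s+", re.IGNORECASE)
# Trailing degree such as ", Ph.D", " PhD" or ", Ph.D."
DEGREE_PATTERN = re.compile(r",?\s*\bPh\.?\s*D\.?$", re.IGNORECASE)


def clean_advisor(name):
    """Strip titles and 'Ph.D', then return the name as 'Lastname, Firstname'."""
    name = TITLE_PATTERN.sub("", name.strip())
    name = DEGREE_PATTERN.sub("", name).strip()
    if "," in name:
        return name  # already in inverted form
    parts = name.split()
    if len(parts) < 2:
        return name  # single word: nothing to reverse
    # Assumption: the last word is the surname.
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


if __name__ == "__main__":
    for raw in ("Dr. Jane Q. Smith, Ph.D", "Prof. John Doe", "Smith, Jane"):
        print(f"700 1  |a {clean_advisor(raw)}, |e advisor")
```

Compound surnames (for example "van der Berg") would be split incorrectly by this sketch and need review by hand.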
 +
       ​       ​
-NOTE: Each MarcEdit task list has its own 502 field for Masters Degree, Doctoral Degree and Terminal Project Degree as well as its own 690 field for Doctoral and Masters. ​ 
                    
    
oai_harvesting_via_marcedit.txt · Last modified: 2022/05/16 19:35 by jeustis