Batch Uploading OAIs from Scholarworks into OCLC and Aleph
CHANGE TITLE? ETDs (Current) Processing ScholarWorks OAIs
NOTE:  My helpful “hints” will appear in Italics.
Introduction
The Graduate School will email “Packing Lists” dated February, May and September (end of semesters) of new dissertations, theses, MFA theses and occasionally LARP theses.  There may be a lag between these dates and when the ETDs are available on ScholarWorks.  I try to process them after a couple of months have passed, to ensure that they will be picked up in the Crosswalk harvest.
Preparation
-  Have a handy copy (either online or a printout) of the Packing List-in-process. NOTE: It's a good idea to save copies of these in appropriate folders.  Example: PackingListReport_Feb2019diss.xlsx in [Drive]:\OAI\Dissertations\2019\ (i.e. 2019), or OAI\Theses, ThesesMFA or ThesesLARP. 
-  Open MarcEdit.  (NOTE: Make sure your MarcEdit XSLT engine is set to SAXON.NET. On the MarcEdit home page, click Tools (found at the top), Preferences, MARCEngine, and select SAXON.NET under XSLT Engine.) 
Harvesting from ScholarWorks
-  Click on Harvest OAI Records (found on either the MarcEdit home page or under Tools/OAI Harvester Tools at the top). Set the following:
- 
-  Set name (for dissertations):  publication:dissertations_harvesting  (IMPORTANT NOTE:  This applies to dissertations only! Because of software changes made in 2018, Erin Jerome needs to be informed before a Crosswalk is run. Before the dissertations can be pulled, they need to be transferred from “publication:dissertations_2” to a special harvesting subset.) 
-  Set name (for theses): publication:masters_theses_2 
-  Set name (for MFAs): publication:englmfa_theses  (NOTE: This series is only for English MFAs; MFAs for art etc. are included in masters_theses_2.) 
-  Set name (for LARPs): publication: 
-  Metadata type: dcq  (NOTE: This is not included in the MarcEdit drop-down, but needs to be typed in. It's a “modified” version of Dublin Core.) 
-  Crosswalk path:  C:\Crosswalk\XML1\OAIDCtoMARCXMLmodified.xsl (NOTE: This program needs to be loaded onto your personal C: drive.) 
-  Start date (for May, in this format): 2019-06-01 
-  End date (for May, in this format): 2019-08-31 (NOTE: Using August avoids Sept. lists. Occasionally these dates have to be tweaked to include everything on the appropriate Packing List.) 
-  Hit “OK” and let it run.  A green bar will appear if it is working.  (NOTE: This function is a little cranky. Recently it didn't work for me because I entered 2019-11-31 instead of 2019-11-30.  Everything has to be entered precisely! If no amount of tweaking resolves the issue, contact Erin Jerome, Aaron Rubinstein, or bepress (Digital Commons), which occasionally blocks ScholarWorks harvesting for security purposes.) 
-  Once the harvesting is finished, a MarcEdit list will open up, containing the harvested records in raw form.  I like to save this immediately into the  appropriate OAI folder, as (example) umdissertations_sept.mrk 
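
Since a single impossible date (like 2019-11-31) is enough to stall the harvest, it can help to check the settings before typing them into MarcEdit. The following is a minimal sketch, not part of the MarcEdit workflow itself: it validates the date range and prints the kind of OAI-PMH ListRecords request the harvester would send. The base URL is a hypothetical placeholder (the real ScholarWorks OAI address is not recorded on this page), and the function name `build_harvest_url` is made up for illustration.

```python
from datetime import date
from urllib.parse import urlencode

# Hypothetical placeholder: substitute the real ScholarWorks OAI base URL.
BASE_URL = "https://example.org/do/oai/"


def build_harvest_url(set_name, start, end, metadata_prefix="dcq", base_url=BASE_URL):
    """Validate the date range and return an OAI-PMH ListRecords URL."""
    # date.fromisoformat raises ValueError for impossible dates such as 2019-11-31.
    start_date = date.fromisoformat(start)
    end_date = date.fromisoformat(end)
    if start_date > end_date:
        raise ValueError(f"start date {start} is after end date {end}")
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "set": set_name,
        "from": start_date.isoformat(),
        "until": end_date.isoformat(),
    }
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_harvest_url("publication:masters_theses_2", "2019-06-01", "2019-08-31"))
    try:
        build_harvest_url("publication:masters_theses_2", "2019-11-01", "2019-11-31")
    except ValueError as err:
        print("Bad date:", err)
```

If the function raises an error, fix the date before running the harvester; if it returns a URL, the set name and dates are at least well formed.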
 
-  Check harvested records against Grad School's packing list - 
-  Hint: In MarcEdit, click Edit/Find/enter =100 in “Find what” window/click Find All. This will produce a list that can be saved to the clipboard, and copied into Excel or another program.  (NOTE: When working in MarcEdit, click File/Save after every change!!  Do NOT Save if no changes are made.) 
-  IMPORTANT NEW STEP, added 2020: Go to ScholarWorks/Dissertations and Theses and log into “My account”, scroll down to the appropriate series (e.g., DOCTORAL DISSERTATIONS (dissertations_2)), then choose Manage Dissertations/Batch revise Excel/Generate a spreadsheet of current data. See “Changing one year campus titles to open access in ScholarWorks” for instructions on generating ScholarWorks spreadsheets. If extra names appear in the MarcEdit file, check the generated spreadsheet to make sure they are NOT dated in the range requested.  (This step has been added since occasionally a dissertation or thesis will have been left off the Packing List.)
 
-  Any harvested record that is NOT on the Packing List can be removed from the MarcEdit file if it is also not on the generated spreadsheet, has a different date (check degree_year and award_month), or belongs to a different series (such as English MFAs). 
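
The check against the Packing List can also be roughed out in a script. This is only a sketch, assuming the harvested file has been saved in MarcEdit's .mrk format and that the Packing List names have been copied out of Excel into a plain list. The function names (`harvested_authors`, `normalize`, `compare`) are made up for illustration, and the matching is deliberately loose (case and punctuation are ignored), so any name it flags still needs to be checked by eye.

```python
import re

# Matches the $a subfield of a 100 field in .mrk text,
# e.g. "=100  1\$aSmith, Jane A.$eauthor"
AUTHOR_RE = re.compile(r"^=100\s+..\$a([^$]+)")


def harvested_authors(mrk_text):
    """Return the 100$a author names found in .mrk text, in file order."""
    names = []
    for line in mrk_text.splitlines():
        match = AUTHOR_RE.match(line)
        if match:
            names.append(match.group(1).strip())
    return names


def normalize(name):
    """Lowercase and drop punctuation so 'Smith, Jane A.' matches 'SMITH, JANE A'."""
    return " ".join(re.sub(r"[^\w\s]", " ", name.lower()).split())


def compare(mrk_text, packing_list_names):
    """Return (extra, missing): names only in the harvest, names only on the list."""
    harvested = {normalize(n): n for n in harvested_authors(mrk_text)}
    expected = {normalize(n): n for n in packing_list_names}
    extra = [harvested[k] for k in harvested if k not in expected]
    missing = [expected[k] for k in expected if k not in harvested]
    return extra, missing
```

Names in `extra` are the ones to look up on the generated ScholarWorks spreadsheet; names in `missing` suggest the date range needs tweaking.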
 
Edit the MarcEdit .mrk file of harvested records
-  Run MarcEdit task - 
-  Change the date in the 008 to the new year, under Tools → Manage Tasks → Select the desired task in the Task Lists window → Manage Existing Tasks → Edit Selected Task List → Save. 
-  Click on Tools → Assigned Tasks → Currently Available Tasks → OAI_Dissertations (or OAI_Masters, OAI_MFA, OAI_LARP, as appropriate). 
 
-  Miscellaneous Fixes (These fixes are easy to do in MarcEdit, through Find/Replace. Some are more important than others; some might be corrected through editing the task lists; after reviewing Regex rules, I can tackle this!) - 
-  IMPORTANT: Field 690 \\ needs to be changed to 657 \7 with $2local/mu appended at the end (edit task list?) 
-  IMPORTANT: Check to be sure Field 049 \\$aAUMM is present in the records. If not, add it (edit task list?) 
-  Replace $zLink to free resource with $zLink to resource. (NOTE: This will be only for uploading to OCLC, as some will have 1 or 5-year access restrictions. When the final records are downloaded into Aleph, we need to reinsert the “free” as only certain phrases are acceptable.)  
-  Check for double periods (e.g., Doctoral..) and for missing dates in the 264 and 502. Replace Scholarworks with ScholarWorks (edit task list?). Since these records lack 504s, change obm-space in the 008 to om (edit task list?). Add a period to “advisor” (edit task list?). 
-  Check to be sure the Summary (field 520), advisors (field 700) and keywords (field 653) are present in all records.  If not, download the dissertation or thesis in ScholarWorks and check the abstract and advisor lists entered by the author. Enter the missing information both into the metadata screen under Revise dissertation (or thesis) and into the MarcEdit record. (See below, How to fix errors in ScholarWorks.) 
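
For anyone who wants to see what several of the fixes above amount to before building them into a task list, here is a rough sketch that works on the lines of one record in .mrk format. It is an illustration only, not a replacement for the MarcEdit tasks: the function name `fix_record_lines` is made up, and it covers only the 690/657 change, the 049 check, the link text, the ScholarWorks spelling and doubled periods.

```python
import re


def fix_record_lines(lines):
    """Apply a few of the routine fixes to the lines of one .mrk record."""
    fixed = []
    for line in lines:
        # 690 \\ becomes 657 \7, with $2local/mu appended at the end.
        if line.startswith("=690  \\\\"):
            line = "=657  \\7" + line[len("=690  \\\\"):]
            if not line.endswith("$2local/mu"):
                line += "$2local/mu"
        line = line.replace("$zLink to free resource", "$zLink to resource")
        line = line.replace("Scholarworks", "ScholarWorks")
        # Collapse a doubled period (e.g. "Doctoral..") but leave "..." alone.
        line = re.sub(r"(?<!\.)\.\.(?!\.)", ".", line)
        fixed.append(line)
    # Add 049 \\$aAUMM in tag order if the record lacks it.
    if not any(l.startswith("=049") for l in fixed):
        position = next(
            (i for i, l in enumerate(fixed) if l[1:4].isdigit() and l[1:4] > "049"),
            len(fixed),
        )
        fixed.insert(position, "=049  \\\\$aAUMM")
    return fixed
```

The renamed 657 line is left where the 690 was, so the record may need re-sorting in MarcEdit afterwards.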
 
-  Author and title adjustments - 
-  With the MarcEdit file open, click Edit → Edit Shortcuts → Change Case → Title Case (for 100$a and 700$a) 
-  Click Edit → Edit Shortcuts → Change Case → Initial Case (for 245$a), then → lower case (for 245$b) 
 
-  Check names for problems, and correct as needed. - 
-  Find → 100/700 → Find All. Check SW for how questionable names should appear.  Examples of problems: “Dr.” (and other titles) should be removed, internal capitalization needs fixing (as in DeStefano, McCormick, LaPlante, O'Neill, etc.), period missing from initial, order of name. (NOTE: Sometimes authors enter shortened versions of their names in the SW metadata description, e.g. without middle initials included in the title page of their work.  If correct otherwise, I let it go. If the metadata information is misspelled or otherwise incorrect, download and check title page to be sure, and fix in ScholarWorks with Revise dissertation/thesis.) 
-  Advisor fixes:  NOTE: There will be many more advisor entries than author entries. Doing the following is helpful in revealing inconsistencies and other questionable problems, especially for longer Dissertation lists.  Save a MarcEdit “700 Find All” search to the clipboard, and download to Excel.   Sort the 700 names A-Z. (Hint: To make this data more manageable, insert a blank column in front of the “Jump to Record #:” column, and work a Data/Text to Columns on the name column, splitting off and deleting the equal sign.) I have compiled an Excel sheet of alphabetized (controlled) advisors' names found in the Connexion authority file, Drive:\OAI\Authorities.xlsx which can be useful for updating advisor entries. 
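
The Excel sort described above can be approximated in a script when the list is long. This is a sketch under the assumption that the file is in .mrk format. It groups 700$a names that differ only in capitalization, punctuation or a leading title, so inconsistent spellings of the same advisor show up together. The function name `advisor_variants` and the short list of titles it strips are illustrative choices, not an established rule.

```python
import re
from collections import defaultdict

# Matches the $a subfield of a 700 field, e.g. "=700  1\$aLee, Sam$eadvisor"
ADVISOR_RE = re.compile(r"^=700\s+..\$a([^$]+)")


def advisor_variants(mrk_text):
    """Group 700$a names that differ only in case, punctuation or titles."""
    groups = defaultdict(set)
    for line in mrk_text.splitlines():
        match = ADVISOR_RE.match(line)
        if not match:
            continue
        name = match.group(1).strip()
        # Build a comparison key: lowercase, titles and punctuation removed.
        key = re.sub(r"\b(dr|prof|professor)\b\.?", " ", name.lower())
        key = " ".join(re.sub(r"[^\w\s]", " ", key).split())
        groups[key].add(name)
    # Keep only keys with more than one spelling, sorted A-Z.
    return {k: sorted(v) for k, v in sorted(groups.items()) if len(v) > 1}
```

Each group still has to be resolved by hand, for example against the Authorities.xlsx sheet, since the script cannot tell which spelling is the controlled form.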
 
-  Check titles for proper names (e.g. for people, countries, cities, scientific names, etc.) and for acronyms, and capitalize as required. Skimming the summaries (in the 520 field) is helpful; otherwise verify in SW.
Upload to Connexion
After importing the bib records file from MarcEdit: 
(For example purposes, we will use the Connexion file for February 2016 Dissertations, which can be opened via Cataloging → Search → Local Save File → T:\\oclcapps\Connexion\Theses\2016_Feb_Dissertations.bib.db)
  * Highlight all records in the file and Validate (Edit → Validate or Shift+F5). This will generate a report of results. Note which records did not validate and make the necessary corrections. Re-validate as needed. 
  * Highlight all records in the file and Update Holdings (Action → Holdings → Update Holdings or F8). OCLC record numbers will begin appearing in the file as each record is uploaded. 
To export from Connexion to Aleph:
  * Go to Tools → Options and click on the Export tab. Highlight the Prompt for filename option then check off the box for Display report for immediate export results. Click on Apply then Close.
  * Open the Local Save file you want to export (2016_Feb_Dissertations - See path above)
  * Highlight records
  * Export (Action → Export or F5)
    This will ask where to put the output file in your C: drive and what name to use. Make sure the filename is in all lower case - for example, feb2016diss. The file will be downloaded into your C: drive as a .dat file. (Example: C:\Crosswalk\Dissertation&Theses\Connexion_Records\feb2016diss.dat)
  * Open MARCTools in MarcEdit.
  * Input the .dat file from your C: drive (feb2016diss.dat) and name the Output file with a .mb extension (feb2016diss.mb). Execute the MarcBreaker.
  * Click on Edit Records. Use Replace to change AUMM to AUMETD.
  * Under MarcEditor → File, click on Compile File into Marc. This will save as a .mrc (MARC) file. 
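
As a cross-check on the AUMM to AUMETD replacement, the change can be sketched as a small script over the broken-out (mnemonic) text. The step above is a plain Replace across the whole file; limiting the change to 049 fields, as this sketch does, is my own cautious assumption so that a stray "AUMM" elsewhere in a record is left alone. The function name `retag_holdings` is made up for illustration.

```python
def retag_holdings(mrk_text, old="AUMM", new="AUMETD"):
    """Replace the holding code in 049 fields and report how many lines changed."""
    changed = 0
    out = []
    for line in mrk_text.splitlines():
        if line.startswith("=049") and f"$a{old}" in line:
            line = line.replace(f"$a{old}", f"$a{new}")
            changed += 1
        out.append(line)
    return "\n".join(out), changed
```

The returned count should equal the number of records in the file; if it is lower, some records are missing their 049.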
    
  * Open Aleph, Cataloging function
  * Click on Task Manager then [F] Upload/Download files
  * Find where your saved .mrc file is on your C: drive (feb2016diss.mrc) and copy it to the FCL01/Scratch file (from the drop-down menu over the left Remote Files column) by clicking on the left arrow button between the columns
  * In the Aleph menu bar above, click on Services → Load Catalog Records
  * Click on Advanced Generic Vendor Records Loader (file_90)
    Make sure the following rules are set:
     * Input File name (for this example, feb2016diss.mrc)
     * Default Holding - AUMETD
     * Character Conversion - OCLC_UTF_TO_UTF
     * Fix Routine - UMFIX
     * Match Routine - OCLC
     * Merge Routine - OCLC
     * Update Database - Yes
     * Produce Loading Report - Yes
     * Report file name (for this example, feb2016_report)
     * Click on the Submit button at top right
Once the loading is done, click on Task Manager → [A] Batch Log to view the report.
     * Highlight your file (p_file_90) and click on View Printouts. 
     * Under Remote Name, highlight <filename>_report.new (i.e. feb2016diss_report_new)
     * Click on Print to obtain reports.  You want the loader-log-report which will show the FCL01 Bib Sys numbers for each record.  Copy one and check the bib record which displays for any potential corrections needed. 
To Globally Remove the 856 Field from Bib Records:
   Set the rules:
-  Input file name <filename>.mrc.bib (i.e., feb2016diss.mrc.bib) 
-  Output file name <filename>.mrc856 (i.e., feb2016diss.mrc856) 
-  Update Database - Yes 
-  Line in Record → Tag → 856; first indicator - #  second indicator - # 
-  Delete field - Yes 
-  Click on Submit button 
The process should now be complete. 
– Contact persons:  Kay Dion or  Lucy deGozzaldi