California As I Saw It: First-Person Narratives of California's Early Years, 1849-1900
American Memory Web site
08-30-01
Report by Laura Graham

All sgml files have been reconverted to the updated American Memory DTD. They have been proofed, parsed, and renamed *_hold.sgml and are in gc\calbk\newdtd_done on rs6. This includes the *.sgml.hold files as well, which are online on conservation as well as at the calbk Web site. They are:

024_hold.sgm    ent with KBs and pcx and gif images
106_hold.sgm    no images and no ent
129_hold.sgm    ent with KBs and pcx and gif images
134_hold.sgm    ent with KBs and pcx and gif images
135_hold.sgm    ent with KBs and pcx and gif images
162_hold.sgm    ent with KBs and pcx and gif images

(I don't quite get this. With a *.sgm.hold extension I didn't see how they could go online, but on the California As I Saw It Web site, when I looked at the locator bar, it looked to me exactly as if they were coming from calbk:
http://memory.loc.gov/cgi-bin/query/r?ammem/calbkbib:@field(TITLE+@band(Ramblings+through+the+High+Sierra.+))
No doubt I just don't understand.)

After conversion to the current DTD, the following text editing was done to each file:

Omnimark did not convert all text entities. The following were converted to SGML text entities manually, using Textpad: apostrophes; single left and right quotes; double left and right quotes; dollar signs; number signs; plus signs. The files went through several layers of group and global proofing after the initial editing, so very few, if any, should have been missed.

In addition, all diacritic text entities were edited to move the affected character to within the text entity. That is, in the "old" SGML, one found "a&acute;". The corrected entity is "&aacute;". This was done for all diacritics by searching globally for likely entities, minus the affected character, drawn from the ISO/8859/1, ISO Latin 1, and ISO Greek 1 entity lists.
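The diacritic edit described above could be sketched roughly as follows. This is only an illustration, not the procedure actually used (the edits were made by global search in Textpad). It assumes the old form was a base letter immediately followed by a standalone accent entity (for example a letter plus &acute;), and the function name merge_diacritics and the VALID table are hypothetical; the table covers only a handful of ISO Latin 1 combinations.

```python
import re

# Hypothetical, partial table: for each standalone accent entity, the base
# letters that have a combined entity in ISO Latin 1 (e.g. a + acute -> aacute).
VALID = {
    "acute": "aeiouyAEIOUY",
    "grave": "aeiouAEIOU",
    "circ": "aeiouAEIOU",
    "uml": "aeiouyAEIOU",
    "tilde": "anoANO",
    "ring": "aA",
    "cedil": "cC",
}

# Assumed old form: a letter directly followed by a standalone accent entity.
PATTERN = re.compile(r"([A-Za-z])&(" + "|".join(VALID) + r");")


def merge_diacritics(text):
    """Move the affected letter inside the entity where a combined form is known."""

    def repl(match):
        letter, accent = match.group(1), match.group(2)
        if letter in VALID[accent]:
            return f"&{letter}{accent};"
        # No known combined entity: leave the text alone for manual proofing.
        return match.group(0)

    return PATTERN.sub(repl, text)
```

Anything the table does not recognize is left unchanged, so a pass like this would still need the proofing layers described above.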
SGML file 023 has instances of &aring; (the little "o" above an "a", used in Scandinavian languages), which were already in the updated form, with the affected character within the entity. Even so, the letter did not display at all online.

ISSUES TO BE RESOLVED:

DISPLAY IMAGES

Do we want to continue with the "Image for Printing" thing online? These images for printing are PCX files. One clicks on the hyperlinked text "Image for Printing" and is given the option to download (not to view). Derivative GIFs are displayed online, but you cannot right-click them to save, etc. They're "dead."

NOTE: "Image for Printing" must be part of the programming or HTML; there is nothing in the SGML.

But more importantly:
1) Sometimes the display GIF is there but "Image for Printing" isn't hyperlinked, because the entity or the ent file has typos.
2) Sometimes there is a hyperlinked "Image for Printing" but no display GIF.
3) Sometimes there are directories with a few or all image files present, but without the tags or ent files to get them online.
4) Or, as in the following example, there is an ent file containing a list of entity references and tags, and multiple instances of "Image for Printing" in the text online, but nary an image file in the directory:

Los Angeles in the sunny seventies. gc/calbk
See text online for Chapter 2. "Image for Printing" throughout, captions too. Perhaps a plan to scan that never happened? There are lots of other examples.

What was done to the updated SGML files: all tags were proofed in all files, and the typos in the reference entities and ent files that prevented an image from being displayed were all fixed.

QUESTION: Is there some way we can update/rationalize all this to something uniform?
1) Maybe make the images actually present in the directories the determinant.
2) Remove all illus tags that don't reference them (since I've proofed those entities for typos etc.).
3) Remove the "Image for Printing" text, which must be part of the programming (it's not in the SGML).
4) Present our current appropriate image format (GIF?) across the board.

ENT FILES

When the files were converted using the Omnimark program, the [ %images; ]> in the teiheader was altered to reference a different ent file from the one belonging to the SGML file. It has to have been the conversion program, as these anomalies were proofed against the old SGML files, where the entity % images system reference was correct. I think that, in some but not all cases, if the corresponding ent file was 0KB, as some were (plans to scan again?), it "grabbed" the nearest one that had any bytes. See the ENT FILE SUMMARY that follows.

ENT FILE SUMMARY:

SGMs referencing different ent files

In each case:
1) In the older-DTD sgm, the entity % images system reference is correct, but the ent file it references is 0KB.
2) So probably, in the update conversion, because the ents were 0KB, the Omnimark program substituted the closest one that had KBs. All of the ents incorrectly referenced instead have KBs.
3) THESE WERE FIXED TO REFERENCE THE CORRECT FILE in the converted, proofed, parsed, and FTPed files in gc/calbk/newdtd_done/

SGM     ent file referenced instead
049     047
051     052
053     056
054     056
055     056
059     060
068     069

SGMs with CORRECT references, but ent files 0KB and no images in directories. Maybe there was a plan to scan that didn't happen, or a thought that it might in future. Delete the ents? (Not sure why the conversion program didn't try to do the same as above for these.)
076 081 090 091 094 097 099

Also, empty directories:
034 035 036 039 043 044 048

Gaps in directory sequences (directories missing in sequence):
037 038 040 041 042 045 156 190

Deleted the .bak files from 081, 093, and 149, as per Liz's explanation that they were Codewrite byproducts and "garbage."
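The directory checks summarized above (0KB ent files, empty directories, gaps in the sequence) could be repeated with a small script along these lines. This is a sketch under stated assumptions, not a tool that was used for this report: it assumes each book lives in a three-digit numbered directory under one root and that ent files carry a .ent extension; the function name audit_directories is hypothetical.

```python
from pathlib import Path


def audit_directories(root):
    """Report empty directories, zero-byte .ent files, and gaps in numbering.

    Assumes (hypothetically) that book directories are named with digits only,
    e.g. 034, and that ent files sit directly inside them.
    """
    root = Path(root)
    dirs = sorted(p for p in root.iterdir() if p.is_dir() and p.name.isdigit())
    report = {"empty_dirs": [], "zero_byte_ents": [], "missing_dirs": []}

    for d in dirs:
        # A directory with no entries at all.
        if not any(d.iterdir()):
            report["empty_dirs"].append(d.name)
        # ent files that exist but hold no entity declarations.
        for ent in sorted(d.glob("*.ent")):
            if ent.stat().st_size == 0:
                report["zero_byte_ents"].append(f"{d.name}/{ent.name}")

    # Numbers absent between the lowest and highest directory present.
    numbers = {int(d.name) for d in dirs}
    if numbers:
        report["missing_dirs"] = [
            f"{n:03d}"
            for n in range(min(numbers), max(numbers) + 1)
            if n not in numbers
        ]
    return report
```

Calling audit_directories on a local copy of the calbk tree would return three lists that can be compared against the ones above. The script only reads the directories and deletes nothing.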