Why?
I started to write psg after evaluating a number of libraries and modules in various languages to create PostScript or Portable Document Format files dynamically from an arbitrary dataset in a WWW environment:
- The obvious choice would be ReportLab. Yet the Open Source variant of ReportLab does not support reading PDF files. Only their PageCatcher product does that, and I couldn't even get a price for that. I guess they are too much of a solution provider to license that piece of software to me, which I can understand.
- Next I checked out pslib. This is written in C and is the basis for various programs that generate PostScript (including OpenOffice). Also there's a Python wrapper for it. I installed it and wrote my first test programs - which crashed on the spot without doing anything. Lacking the knowledge to debug it, I went on to look for alternatives.
- PyScript is a pure Python module for creating PostScript documents. It provides a rich set of drawing primitives and defines a number of smart ways to combine them. Yet there is no layout engine, not even a way to use other fonts than the standard ones for PDF. Also these fonts can't be used with Unicode input, which I (more or less) depend on.
- PyX is a library for 2D and 3D plotting of data.
So how is psg different?
- The whole thing is aimed at the 'big picture'. Many libraries contain code to generate DSC compliant PostScript documents. Psg's document.dsc module implements the DSC comprehensively (or rather: it wants to do that). The module is able to read and write DSC compliant PostScript files with Unix, DOS and Mac line endings. This way psg could be used to rewrite a somewhat smarter set of psutils if need be.
- I want psg to be practical. For me that means it knows how to access all the characters in a given font program. Being German and programming mostly for German users, üäöÜÄÖ and ß have to be displayed correctly, which is easy. But what about €? Or correct German quotes? Or French ones? Or nice looking English ones for that matter? Different length dashes (for Emily Dickinson). Though still subject to limitations, psg's current text functions accept Unicode input only!
- I want EPS import. It's got to work on document and page level (document meaning the same EPS is displayed more than once, like on each page or so).
- I want PIL based raster image import. I don't want to mess with libjpeg, giflib or whatever, temporary files and all that. I want to Image.open(something) and document.append(it).
- I want everything to work with regular Python file objects: memory based, filesystem based, an RDBMS' large object, whatever! (Psg's input files need to support seek(), however).
- I want to be able to work with arbitrary fonts. OpenType fonts would be nice (not implemented, your chance!), but Type 1 fonts will do for starters (this is implemented, both ASCII and binary representation. You'll need the AFMs, though). TTF may or may not be there (not implemented. Can FreeType do this?).
- I want an interface to Ghostscript so I can distill PDF files or create bitmap representations of my PostScript for WWW previews. (This is partly implemented).
- A full grown layout engine would be nice. It doesn't have to be named after a monotreme (though platypus are cute), but should accept XML and (Print-) CSS input and, of course, Unicode encoding. (This is in an early phase of conception, see the examples!)
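Two of the wishes above, file objects that support seek() and DSC files with any kind of line ending, can be illustrated in a few lines. The following is only a sketch in Python 3 and does not use psg: the function name dsc_comments() and the sample data are made up for this illustration, and continuation lines ("%%+") are ignored.

```python
import io
import re

# DOS (\r\n), Unix (\n) and old Mac (\r) line endings, longest match first.
LINE_END = re.compile(rb"\r\n|\n|\r")

def dsc_comments(fp):
    """Return (keyword, value) pairs for every '%%Keyword: value' line.

    `fp` is any binary file object that supports seek() and read():
    a file on disk, an io.BytesIO, or anything else that behaves alike.
    Hypothetical helper for illustration, not part of psg.
    """
    fp.seek(0)
    comments = []
    for line in LINE_END.split(fp.read()):
        if line.startswith(b"%%"):
            keyword, _, value = line[2:].partition(b":")
            comments.append((keyword.decode("ascii"),
                             value.strip().decode("ascii")))
    return comments

# A memory based "file" that mixes all three line ending styles.
sample = io.BytesIO(
    b"%!PS-Adobe-3.0\r\n"
    b"%%Title: Hello\r"
    b"%%Pages: 1\n"
    b"%%EndComments\n"
)
print(dsc_comments(sample))
```

Because the function only relies on seek() and read(), the same call works unchanged on a file opened with open(name, "rb").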
Most things on this list I can check 'done' (except for the layout engine, which is probably the toughest job). The one thing absent from psg is a library that provides drawing primitives. Right now, all the PostScript generated by psg is formulated by hand. (It's the mind that actually does it, but you get the idea ;-). I was thinking of creating an interface between psg and PyScript, but until now I had no need to do so: all complex graphical elements I need are created by GUI software (which is much better at that than any script) and the rest is text, which is handled by the layout functions, and a couple of lines which I program in by hand.
So I have to know PostScript to use this? What a rip-off!
Well... yes... you got to know your basic PostScript to use this. But I consider knowing basic PostScript a good thing, if you are going to generate PostScript documents. Let me elaborate:
PostScript "the language" is very easy to learn. It's basically Reverse Polish Notation on steroids. I learned all the PostScript I needed to know to write psg from a tutorial in an hour (Ok, plus some looking up in Adobe's official documentation later on). Check out this site, which contains a number of very useful links. I used the second tutorial from the top by Paul Bourke.
I can heartily recommend Practical PostScript by David Byram-Wigfield, the penultimate tutorial on the page, not so much because of its brevity or technical brilliance, but just to read this English eccentric recommend scrapping the latest DTP software and instead buying a high resolution PS printer and writing your PostScript by hand. And, yes, if the advantages you see in PostScript are:
"clean hands, and (...) the avoidance of 'dissing' inky letterpress typefaces back into cases according to their character and fount; all the time 'minding one's p's and q's'."
coding PostScript by hand becomes a charm!
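To make the "Reverse Polish Notation" remark concrete, here is a toy evaluator for a handful of PostScript's arithmetic and stack operators (add, sub, mul, div, dup, exch). It is a Python 3 sketch for illustration only: the function run() is made up, it knows nothing about graphics, and it is in no way a PostScript interpreter.

```python
def run(program):
    """Evaluate a tiny PostScript-like program and return the operand stack."""
    stack = []
    binary_operators = {
        "add": lambda a, b: a + b,
        "sub": lambda a, b: a - b,
        "mul": lambda a, b: a * b,
        "div": lambda a, b: a / b,
    }
    for token in program.split():
        if token in binary_operators:
            # Operands were pushed first; the operator comes last.
            b = stack.pop()
            a = stack.pop()
            stack.append(binary_operators[token](a, b))
        elif token == "dup":
            stack.append(stack[-1])                      # duplicate the top element
        elif token == "exch":
            stack[-1], stack[-2] = stack[-2], stack[-1]  # swap the top two
        else:
            stack.append(float(token))                   # anything else is a number
    return stack

print(run("3 4 add 2 mul"))   # (3 + 4) * 2
print(run("1 2 exch sub"))    # 2 - 1
```

Everything in PostScript follows this pattern: push the operands, then name the operator that consumes them.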
How does psg work?
Let's create a traditional "Hello, World" program. It goes like this:
 1  from psg.document.dsc import dsc_document
 2  from psg.util import *
 3
 4  def main():
 5
 6      document = dsc_document("Hello, world!")
 7      page = document.page()
 8      canvas = page.canvas(margin=mm(18))
 9
10      print >> canvas, "/Helvetica findfont"
11      print >> canvas, "20 scalefont"
12      print >> canvas, "setfont"
13      print >> canvas, "0 0 moveto"
14      print >> canvas, ps_escape("Hello, world!"), " show"
15
16      fp = open("ps_hello_world.ps", "w")
17      document.write_to(fp)
18      fp.close()
19
20  main()
The first two lines import psg's dsc module, which lets us create DSC compliant documents, and everything from the util package. Importing everything may be bad style, because it clutters my program's namespace, but it saves me a lot of typing.
The main() function is split into three sections:
- Initialization of document data structures
- Creation of the actual PostScript code
- Writing the result into an output file
The document instance is initialized with a title, which doesn't go anywhere but the document's comments, its metadata (6). The page instance represents the first and only page of the document (7) and the canvas instance represents a rectangular area on that page with a margin of 18mm width (8). This means that the drawing that happens on the canvas will be where you'd expect it on a page set up in your DTP application. With one difference: in PostScript, coordinates start at the lower left corner and their positive values extend upwards and to the right. This is the kind of coordinate system you're used to from math class. In the computer world, the origin is usually put in the upper left corner for technical reasons and when you're writing (in western languages), you'd usually start there, too. Here in our case this means that the 0,0 coordinate is in the lower left corner of the page, slightly displaced to form a nice margin.
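The arithmetic behind the margin and the coordinate system is simple enough to sketch. The following Python 3 snippet does not use psg; mm_to_pt() and top_left_to_postscript() are hypothetical helper names chosen for this illustration, and the A4 page height is just an example value.

```python
POINTS_PER_INCH = 72.0   # one PostScript unit is 1/72 inch
MM_PER_INCH = 25.4

def mm_to_pt(millimetres):
    """Convert millimetres to PostScript units."""
    return millimetres / MM_PER_INCH * POINTS_PER_INCH

def top_left_to_postscript(x, y_from_top, page_height):
    """Convert a point measured from the upper left corner (y grows
    downwards) to PostScript's lower left origin (y grows upwards)."""
    return x, page_height - y_from_top

a4_height = mm_to_pt(297)   # example page height: A4 is 297mm tall
margin = mm_to_pt(18)

print(round(margin, 2))
# Where does the upper left corner of the margin box end up?
print(top_left_to_postscript(margin, margin, a4_height))
```

Only the y coordinate needs flipping; x grows to the right in both systems.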
The PostScript code to output "Hello, world!" is printed into the canvas verbatim (10-14), except for the ps_escape() function from the util package, which returns a PostScript string literal for the show operator to use (14). The canvas class implements a method called write(), which lets you use it like a regular file object here. Many other aspects of psg depend on this feature: you can think of a DSC document as a structure of nested file objects, each of which is a stream of PostScript statements and comments.
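Both ideas, escaping a string literal and treating a document as nested file-like objects, fit into a short sketch. This is Python 3 and deliberately independent of psg: escape_ps_string() and the Section class are invented for this illustration and are not psg's implementation. The escaping only covers backslashes and parentheses; non-ASCII characters are not handled.

```python
import io

def escape_ps_string(text):
    """Return `text` as a PostScript string literal, e.g. (Hello).

    Backslashes are escaped first, so the backslashes added for the
    parentheses are not escaped a second time.
    """
    for char in "\\()":
        text = text.replace(char, "\\" + char)
    return "(" + text + ")"

class Section:
    """A part of a document that accepts write() like a file object
    and may contain further sections (hypothetical, for illustration)."""

    def __init__(self, name):
        self.name = name
        self.parts = []              # strings and nested Section objects

    def write(self, text):
        self.parts.append(text)

    def section(self, name):
        child = Section(name)
        self.parts.append(child)
        return child

    def write_to(self, fp):
        # Walk the structure in order and flatten it into one stream.
        for part in self.parts:
            if isinstance(part, Section):
                part.write_to(fp)
            else:
                fp.write(part)

document = Section("document")
document.write("%!PS-Adobe-3.0\n")
page = document.section("page")
print("/Helvetica findfont 20 scalefont setfont", file=page)
print("0 0 moveto", file=page)
print(escape_ps_string("Hello (world)!"), "show", file=page)
document.write("%%EOF\n")

out = io.StringIO()
document.write_to(out)
print(out.getvalue())
```

Since print() only needs an object with a write() method, every section can be filled with the same idiom the example program uses for its canvas.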
The last section of the main() function is easily explained: Open an output file, dump the resulting PostScript inside and close it (16-18). Done!
The resulting PostScript looks like this (part of my viewer's window):
Getting the details straight
The next example is only slightly more complicated than the last. It will load a Type 1 typeface from a file and print its name and a number of special characters. Most of the additional code is dedicated to loading the Type 1 font and calculating the output page's size, because this program is meant to work with arbitrary fonts and to create an Encapsulated PostScript file. Let's look!
 1  import sys
 2  from psg.document.dsc import eps_document
 3  from psg.drawing.box import *
 4  from psg.fonts.type1 import type1
 5  from psg.util import *
 6
 7  def main():
 8      if len(sys.argv) < 3:
 9          print "Usage: %s <outline file> <metrics file>" % sys.argv[0]
10          sys.exit(-1)
11
12
13      font_size = 20
14
15
16      # Load the font
17      outline_file = open(sys.argv[1])
18      metrics_file = open(sys.argv[2])
19
20      font = type1(outline_file, metrics_file)
21
22
23      # Create the EPS document
24      eps = eps_document(title=font.full_name)
25      page = eps.page
26
27      # Register the font with the document
28      font_wrapper = page.register_font(font)
29
30      # Create a textbox and typeset the text into it
31      text = unicode(font.full_name) + u" ~ üäöÜÄÖß"
32      width = font.metrics.stringwidth(text, font_size)
33      tb = textbox(page, 0, 0, width+5, font_size)
34      tb.set_font(font_wrapper, font_size)
35      tb.typeset(text)
36      page.append(tb)
37
38      # Calculate output page size
39      page_bb = bounding_box(0, 0, tb.w(), font_size)
40      eps.header.bounding_box = page_bb.as_tuple()
41
42      # Output resulting EPS file
43      eps.write_to(sys.stdout)
44
45  main()
We need some additional parts of psg for this to work. We already know the document.dsc module, which implements the Document Structuring Conventions. This time, however, we use the eps_document class rather than the more general dsc_document class (2). An Encapsulated PostScript file is one that has only one page, which has a defined size and may be imported into other PostScript documents by programs that know how to do that. (Psg does, by the way). Anyway, we have to determine the output size later on.
To position our font example we import classes from the drawing.box module (3). To handle Type 1 fonts we need the type1 module (4). As in the last program, I'm importing util.* just for the heck of it.
As with the last program, this program's main() function can be split into sections:
- Load the font
- Create an EPS document data structure
- Register the font with the document
- Create a textbox data structure
- Calculate the page size
- Output the resulting EPS
Loading the font is an easy job once the script's parameter list has been checked and the files have been opened. The type1 class' constructor simply takes opened file pointers as arguments (20). No magic here. (The files are an outline file, containing the actual font, and a metrics file (AFM = Adobe Font Metrics) that was shipped with it. If no metrics file came with your font, use FontForge to create one).
There is no trick to creating the EPS data structure, either. We simply create an EPS instance (24) and get its page (25). The page is an attribute of the EPS document, because EPS documents are not allowed to have more than one page. Adding more pages will raise an exception.
In line 28 the font is "registered" with the page. This means the page is told that text is typeset on this page using this font. It's important for the page to know this, because it has to create PostScript code to set up the font for use. The register_font() method returns a wrapper for the font which is only good for that one page. This mechanism may be made fully automatic in the future.
Now we start with the tricky parts:
- Line 31 puts together the output string from the font's name (as loaded from the AFM file) and a couple of special characters of my choosing, including my favorite: €.
- Line 32 uses the font metrics object's stringwidth() function to determine the string's width in terms of PostScript units (which are 1/72 of an inch. You already know them from how your DTP program tells the size of a font: in pt. That's the same thing!).
- Line 33 creates a textbox in the page's lower left corner, having the width calculated above and the font's height. (There's a padding added to that width to keep the textbox's simple layout algorithm from performing a line break. Man... you have no idea how long it took me to figure out why my üäöÜÄÖß kept disappearing!)
- Line 34 sets the font, line 35 does the output.
- Line 36 puts the text on the page.
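How a string's width falls out of the font metrics can be sketched without psg. An AFM file lists one line per character with its code (C), its advance width (WX, in 1/1000 of the font size) and its glyph name (N). The Python 3 snippet below is an illustration under simplifying assumptions: parse_widths() and string_width() are made-up names, the sample lines merely follow AFM syntax with numbers chosen for the example, characters are looked up by code only (a real implementation would go through glyph names to cover Unicode), and kerning is ignored.

```python
# Sample character metrics in AFM syntax; values are for illustration only.
SAMPLE_AFM = """\
StartCharMetrics 4
C 72 ; WX 722 ; N H ;
C 101 ; WX 556 ; N e ;
C 108 ; WX 222 ; N l ;
C 111 ; WX 556 ; N o ;
EndCharMetrics
"""

def parse_widths(afm_text):
    """Map character codes to advance widths (in 1/1000 of the font size)."""
    widths = {}
    for line in afm_text.splitlines():
        if not line.startswith("C "):
            continue                      # not a character metrics line
        fields = {}
        for field in line.split(";"):
            parts = field.split()
            if parts:
                fields[parts[0]] = parts[1:]
        widths[int(fields["C"][0])] = float(fields["WX"][0])
    return widths

def string_width(text, font_size, widths):
    """Width of `text` in PostScript units at `font_size`, without kerning."""
    return sum(widths[ord(char)] for char in text) * font_size / 1000.0

widths = parse_widths(SAMPLE_AFM)
print(string_width("Hello", 20, widths))
```

The width scales linearly with the font size, which is why the example program passes font_size along with the text.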
Now, to put it all together:
- Line 39 creates a bounding box object. A bounding box is a rectangular area on a PostScript page which is located somewhere. The bounding box of our page starts at the origin (0,0) and extends to the width and height of our textbox.
- Line 40 sets the output document's bounding box, which is required by the EPS specifications. It's converted to a regular Python tuple to do that, which is understood by the document's bounding_box property.
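What ends up in the EPS header is a DSC comment of the form "%%BoundingBox: llx lly urx ury" with integer coordinates. Here is a Python 3 sketch of that step, independent of psg: the BoundingBox class below is hypothetical and only mimics the idea of as_tuple(), and rounding outwards for the integer comment is my assumption of a sensible default, not a description of what psg does.

```python
import math

class BoundingBox:
    """A rectangle given by its lower left and upper right corners,
    in PostScript units (hypothetical class, for illustration)."""

    def __init__(self, llx, lly, urx, ury):
        self.llx, self.lly, self.urx, self.ury = llx, lly, urx, ury

    def as_tuple(self):
        return (self.llx, self.lly, self.urx, self.ury)

    def dsc_comments(self):
        """Format the box as DSC header comments.

        %%BoundingBox takes integers, so round outwards to make sure
        nothing gets clipped; %%HiResBoundingBox keeps the fractions.
        """
        llx, lly = math.floor(self.llx), math.floor(self.lly)
        urx, ury = math.ceil(self.urx), math.ceil(self.ury)
        exact = " ".join(format(value, "g") for value in self.as_tuple())
        return (f"%%BoundingBox: {llx} {lly} {urx} {ury}\n"
                f"%%HiResBoundingBox: {exact}\n")

# A text box 50.56 units wide and 20 units high, starting at the origin.
box = BoundingBox(0, 0, 50.56, 20)
print(box.dsc_comments(), end="")
```

Programs that import the EPS file read these comments instead of interpreting the PostScript, which is why getting them right matters.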
Then we just write it to stdout, font and all. If you try this program, it will dump loads of characters at you. That is the font. The interesting part is the last twenty lines or so where the page is defined, the font is 'instantiated' and the text is written. Here you can see why nobody (except the English guy, kudos to him!) codes PostScript by hand: it's not because the language is so complicated, but because it's so tedious to get the details straight. This is what psg does for you!
Calling our program like this:
python type1_embedding_example.py bold.pfb bold.afm > ~/Desktop/out.ps
and then:
gs ~/Desktop/out.ps
gives us this:
As you can see, InDesign correctly recognizes the bounding box. (In this example the bounding box contains a 3mm padding, because the EPS file was written with type1_embedding_example.py from the examples/ directory, not the program above, which is slightly simplified).