I recently needed to dig into Picasa's internal databases to get some information that it appeared to store only there. Not finding the answer on the interwebs, I've written up my notes about their format here. Please do let me know if you have more information about this file format.
The notes are for the Mac OS, Picasa version 3.9.0.522.
The database files are found under
$HOME/Library/Application Support/Google/Picasa3/db3
on the Macs, and there are equivalent locations on other platforms. Under here are a set of files with a .pmp suffix, which are the database files.
[BTW: The files with the .db suffix just hold thumbnails of various groups of images. They are in the standard windows thumbs.db format, and here's a link that has more useful information about this format.]
Each .pmp file represents a field in a table, and the table is identified by a common prefix as follows:
$ ls -1 catdata_*
catdata_0
catdata_catpri.pmp
catdata_name.pmp
catdata_state.pmp
The file with the _0 suffix is a marker file to identify the table, and each .pmp file sharing that prefix is a field for that table. For instance, catdata_state.pmp contains records for the field state in the table catdata, and so forth.
All files start with the four magic bytes: 0xcd 0xcc 0xcc 0x3f
The marker files (ie, files that end in _0) only contain the magic bytes.
The pmp file is in little-endian format rather than the usual network byte order (big-endian).
There are several areas where I just see constants -- I don't know the purpose of these and I'll list them out. Please note: the values below are written as ordinary numbers, but they are stored little-endian, so if you hex-dump a file, you should see the bytes of each value reversed (the magic 0x3fcccccd shows up as cd cc cc 3f).
Header
4 bytes: magic: 0x3fcccccd
2 bytes: field-type: unsigned short.
2 bytes: 0x1332 -- constant.
4 bytes: 0x00000002 -- constant.
2 bytes: field-type: unsigned short -- identical with field-type above.
2 bytes: 0x1332 -- constant.
4 bytes: number-of-entries: unsigned int.
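As a cross-check of the layout above, here is a minimal sketch that parses the 20-byte header from an in-memory byte array using a little-endian ByteBuffer. The class name PmpHeader and its fields are made up for illustration and are not part of the Read program further down; the constants are simply the ones I observed, so a file that fails these checks may still be valid.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper (name and shape are assumptions): parses the .pmp header.
final class PmpHeader {
    static final int SIZE = 20;            // 4 + 2 + 2 + 4 + 2 + 2 + 4 bytes
    static final int MAGIC = 0x3fcccccd;

    final int fieldType;
    final long entries;

    private PmpHeader(int fieldType, long entries) {
        this.fieldType = fieldType;
        this.entries = entries;
    }

    static PmpHeader parse(byte[] data) {
        if (data.length < SIZE) {
            throw new IllegalArgumentException("need at least " + SIZE + " bytes");
        }
        // All multi-byte values are stored little-endian.
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt() != MAGIC) {
            throw new IllegalArgumentException("bad magic");
        }
        int type = buf.getShort() & 0xffff;
        if ((buf.getShort() & 0xffff) != 0x1332) {
            throw new IllegalArgumentException("unexpected first constant");
        }
        if (buf.getInt() != 0x2) {
            throw new IllegalArgumentException("unexpected second constant");
        }
        if ((buf.getShort() & 0xffff) != type) {
            throw new IllegalArgumentException("repeated field-type differs");
        }
        if ((buf.getShort() & 0xffff) != 0x1332) {
            throw new IllegalArgumentException("unexpected third constant");
        }
        long entries = buf.getInt() & 0xffffffffL;   // unsigned 32-bit count
        return new PmpHeader(type, entries);
    }

    public static void main(String[] args) {
        // Header of a string field (type 0) with 10 entries, as the bytes sit on disk.
        byte[] sample = {
            (byte) 0xcd, (byte) 0xcc, (byte) 0xcc, 0x3f,  // magic
            0x00, 0x00,                                     // field-type
            0x32, 0x13,                                     // 0x1332
            0x02, 0x00, 0x00, 0x00,                         // 0x00000002
            0x00, 0x00,                                     // field-type again
            0x32, 0x13,                                     // 0x1332
            0x0a, 0x00, 0x00, 0x00                          // number-of-entries
        };
        PmpHeader h = parse(sample);
        System.out.println("type=" + h.fieldType + " nentries=" + h.entries);
        // prints "type=0 nentries=10"
    }
}
```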
Following the header are "number-of-entries" records, whose format depends on the field-type. The field-type values are:
0x0: null-terminated strings. I haven't tested how (if at all) it can store unicode.
0x1: unsigned integers, 4 bytes.
0x2: dates, 8 bytes as a double. The date is represented in Microsoft's Variant Time format: the value is the number of days from midnight Dec 30, 1899. Fractional values are fractions of a day, so for instance, 3.25 represents 6:00 A.M. on January 2, 1900. While negative values are legitimate in the Microsoft format and indicate days prior to Dec 30, 1899, the Picasa user interface currently prevents dates older than Dec 31, 1903 from being used.
0x3: byte field, 1 unsigned byte.
0x4: unsigned long, 8 bytes.
0x5: unsigned short, 2 bytes.
0x6: null-terminated string. (possibly csv strings?)
0x7: unsigned int, 4 bytes.
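To make the date type (0x2) concrete, here is a small sketch that converts a Variant Time value into a java.time.LocalDateTime. The class and method names are made up for illustration. It treats the value as a plain linear day count, which I believe is only right for non-negative values, since the Microsoft format handles the fractional part of negative values differently. It also deliberately returns a date without a time zone, because the format itself does not carry one.

```java
import java.time.Duration;
import java.time.LocalDateTime;

// Hypothetical helper (name is an assumption): Variant Time -> LocalDateTime.
final class VariantTime {
    // Day zero of the Variant Time format: midnight, Dec 30, 1899.
    static final LocalDateTime EPOCH = LocalDateTime.of(1899, 12, 30, 0, 0);
    static final long MILLIS_PER_DAY = 86_400_000L;

    // Assumes days >= 0; negative values are not handled specially here.
    static LocalDateTime toLocalDateTime(double days) {
        long millis = Math.round(days * MILLIS_PER_DAY);
        return EPOCH.plus(Duration.ofMillis(millis));
    }

    public static void main(String[] args) {
        System.out.println(toLocalDateTime(3.25));     // prints 1900-01-02T06:00
        System.out.println(toLocalDateTime(25569.0));  // prints 1970-01-01T00:00
    }
}
```

The second line of output shows where the 25569 in the Read program below comes from: it is the Variant Time day number of the unix epoch.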
The entities are indexed by their record number in each file. Ie, fetching the 7273rd record in all files named imagedata_*.pmp gives information about the fields for entity #7273 in the imagedata table.
You might expect every "field file" for a given table to contain the same number of records, but this is not always the case. I expect the underlying library returns the equivalent of undefined when fetching fields for a record beyond the "end" of any given field file.
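For the fixed-width field types, indexing by record number is just arithmetic on the file offset: the header is 20 bytes, so record i of a 4-byte field starts at byte 20 + 4*i. Here is a rough sketch of that lookup for the 4-byte types (0x1 and 0x7), working on a file already loaded into a byte array. The names are made up, and returning an empty OptionalLong for a record past the end is only my guess at what "undefined" might look like, not something I know about Picasa's own library. String fields (0x0 and 0x6) are variable-length, so they would need a scan from the start instead.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.OptionalLong;

// Hypothetical helper (names are assumptions): random access into a 4-byte field file.
final class PmpLookup {
    static final int HEADER_SIZE = 20;   // total size of the header
    static final int COUNT_OFFSET = 16;  // number-of-entries is the last 4 header bytes

    // Returns the unsigned 4-byte value of record `index`, or empty if the
    // record lies beyond the end of this field file.
    static OptionalLong uint32At(byte[] file, long index) {
        if (file.length < HEADER_SIZE) {
            throw new IllegalArgumentException("truncated header");
        }
        ByteBuffer buf = ByteBuffer.wrap(file).order(ByteOrder.LITTLE_ENDIAN);
        long entries = buf.getInt(COUNT_OFFSET) & 0xffffffffL;
        if (index < 0 || index >= entries) {
            return OptionalLong.empty();
        }
        long offset = HEADER_SIZE + index * 4;
        if (offset + 4 > file.length) {
            return OptionalLong.empty();  // header promises more than the file holds
        }
        return OptionalLong.of(buf.getInt((int) offset) & 0xffffffffL);
    }

    public static void main(String[] args) {
        // A type-1 field file with two records: 7 and 300.
        byte[] file = {
            (byte) 0xcd, (byte) 0xcc, (byte) 0xcc, 0x3f,
            0x01, 0x00, 0x32, 0x13,
            0x02, 0x00, 0x00, 0x00,
            0x01, 0x00, 0x32, 0x13,
            0x02, 0x00, 0x00, 0x00,
            0x07, 0x00, 0x00, 0x00,
            0x2c, 0x01, 0x00, 0x00
        };
        System.out.println(uint32At(file, 1));     // prints OptionalLong[300]
        System.out.println(uint32At(file, 7273));  // prints OptionalLong.empty
    }
}
```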
Finally, here is a small Java program to dump out whatever information I've gathered thus far. Compile, and run against a set of .pmp files.
Here is a sample run.
$ javac -g -d . Read.java
$ java Read "$HOME/Library/Application Support/Google/Picasa3/db3/catdata_name.pmp"
/Users/kbs/Library/Application Support/Google/Picasa3/db3/catdata_name.pmp:type=0
nentries: 10
[0] Labels
[1] Projects (internal)
[2] Folders on Disk
[3] iPhoto Library
[4] Web Albums
[5] Web Drive
[6] Exported Pictures
[7] Other Stuff
[8] Hidden Folders
[9] People
And here's the code.
import java.io.*;
import java.util.*;

public class Read {
    public static void main(String[] args) throws Exception {
        for (int i = 0; i < args.length; i++) {
            doit(args[i]);
        }
    }

    private static void doit(String p) throws Exception {
        DataInputStream din = new DataInputStream(
            new BufferedInputStream(new FileInputStream(p)));
        try {
            dump(din, p);
        } finally {
            din.close();
        }
    }

    private static void dump(DataInputStream din, String path) throws Exception {
        // header
        long magic = readUnsignedInt(din);
        if (magic != 0x3fcccccd) {
            throw new IOException("Failed magic1 " + Long.toString(magic, 16));
        }
        int type = readUnsignedShort(din);
        System.out.println(path + ":type=" + Integer.toString(type, 16));
        if ((magic = readUnsignedShort(din)) != 0x1332) {
            throw new IOException("Failed magic2 " + Long.toString(magic, 16));
        }
        if ((magic = readUnsignedInt(din)) != 0x2) {
            throw new IOException("Failed magic3 " + Long.toString(magic, 16));
        }
        if ((magic = readUnsignedShort(din)) != type) {
            throw new IOException("Failed repeat type " + Long.toString(magic, 16));
        }
        if ((magic = readUnsignedShort(din)) != 0x1332) {
            throw new IOException("Failed magic4 " + Long.toString(magic, 16));
        }
        long v = readUnsignedInt(din);
        System.out.println("nentries: " + v);

        // records
        switch (type) {
            case 0x0: dumpStringField(din, v); break;
            case 0x1: dump4byteField(din, v); break;
            case 0x2: dumpDateField(din, v); break;
            case 0x3: dumpByteField(din, v); break;
            case 0x4: dump8byteField(din, v); break;
            case 0x5: dump2byteField(din, v); break;
            case 0x6: dumpStringField(din, v); break;
            case 0x7: dump4byteField(din, v); break;
            default:
                throw new IOException("Unknown type: " + Integer.toString(type, 16));
        }
    }

    private static void dumpStringField(DataInputStream din, long ne) throws IOException {
        for (long idx = 0; idx < ne; idx++) {
            System.out.println("[" + idx + "] " + getString(din));
        }
    }

    private static void dumpByteField(DataInputStream din, long ne) throws IOException {
        for (long idx = 0; idx < ne; idx++) {
            System.out.println("[" + idx + "] " + din.readUnsignedByte());
        }
    }

    private static void dump2byteField(DataInputStream din, long ne) throws IOException {
        for (long idx = 0; idx < ne; idx++) {
            System.out.println("[" + idx + "] " + readUnsignedShort(din));
        }
    }

    private static void dump4byteField(DataInputStream din, long ne) throws IOException {
        for (long idx = 0; idx < ne; idx++) {
            System.out.println("[" + idx + "] " + readUnsignedInt(din));
        }
    }

    // Prints each 8-byte value as 16 hex digits, most significant byte first.
    private static void dump8byteField(DataInputStream din, long ne) throws IOException {
        for (long idx = 0; idx < ne; idx++) {
            System.out.println("[" + idx + "] " + String.format("%016x", readLong(din)));
        }
    }

    private static void dumpDateField(DataInputStream din, long ne) throws IOException {
        for (long idx = 0; idx < ne; idx++) {
            double d = Double.longBitsToDouble(readLong(din));
            // Convert days since Dec 30, 1899 into days past the unix epoch.
            d -= 25569d;
            long ut = Math.round(d * 86400L * 1000L);
            System.out.println("[" + idx + "] " + new Date(ut));
        }
    }

    // Reads a null-terminated string, one byte per character.
    private static String getString(DataInputStream din) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = din.read()) != 0) {
            if (c < 0) {
                // Without this check, a truncated file would loop forever.
                throw new EOFException();
            }
            sb.append((char) c);
        }
        return sb.toString();
    }

    private static int readUnsignedShort(DataInputStream din) throws IOException {
        int ch1 = din.read();
        int ch2 = din.read();
        if ((ch1 | ch2) < 0) {
            throw new EOFException();
        }
        return (ch2 << 8) + ch1;
    }

    private static long readUnsignedInt(DataInputStream din) throws IOException {
        int ch1 = din.read();
        int ch2 = din.read();
        int ch3 = din.read();
        int ch4 = din.read();
        if ((ch1 | ch2 | ch3 | ch4) < 0) {
            throw new EOFException();
        }
        return (((long) ch4) << 24) + (((long) ch3) << 16) + (((long) ch2) << 8) + ch1;
    }

    // Reads 8 bytes, least significant byte first.
    private static long readLong(DataInputStream din) throws IOException {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v |= ((long) din.readUnsignedByte()) << (8 * i);
        }
        return v;
    }
}
Recently, I read a few stories from Andy Hertzfeld's site, which are terrific stories about the development of the Macintosh. As I read them, I started to view a few videos, some podcasts etc that were connected to the stories. Wouldn't it be nice to bundle all of this into a single epub, so it's all in one place?
Well, you can always do that by editing an epub in Sigil or other tools. However, I've found it convenient to take out some of the tedium through a couple of small scripts that bundle everything together from txt files. The scripts do very little -- they're best suited to cases where you pretty much have straight text, and perhaps a set of media that you want to insert into the text at appropriate places.
They also assume that you're reasonably comfortable with the command-line, and can deal with a couple of thrown-together scripts. [Fortunately, they are small so I hope they will at least be a starting point.]
Here's how I assemble my epub. First, I create a directory that will hold all the content I want in the epub.
$ mkdir macintosh_stories
Next, I create text files -- one for each chapter, to contain the text. So (say) I create one file called macintosh_stories/01.txt that contains this:
#Alice
Even though Bruce Daniels was the manager of the Lisa software team, he was very supportive of the Mac project. He had written the mouse-based text editor that we were using on the Lisa to write all of our code, and he even transferred to the Mac team as a mere programmer ...
You can mark two levels of headlines with the # character. One # is the largest headline, and ## gives a subheading. It doesn't do much else, though you can notate italics by _italics_.
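To spell out those rules, here is a tiny Java sketch of the line-level conversion. To be clear, this is not the code from the scripts: the class name, the choice of h1/h2/p/i tags, and the escaping are all assumptions made for illustration, and it does not handle the media markers described further down.

```java
// Hypothetical sketch (not the actual scripts): turns one line of a chapter
// txt file into an XHTML fragment using the markup rules described above.
final class ChapterMarkup {
    // Escapes the characters that are special in XHTML text.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String convertLine(String line) {
        String tag = "p";
        String text = line;
        if (line.startsWith("##")) {          // check ## first, since it also starts with #
            tag = "h2";
            text = line.substring(2);
        } else if (line.startsWith("#")) {
            tag = "h1";
            text = line.substring(1);
        }
        // _word_ becomes italics; the reluctant match keeps pairs separate.
        String body = escape(text.trim()).replaceAll("_(.+?)_", "<i>$1</i>");
        return "<" + tag + ">" + body + "</" + tag + ">";
    }

    public static void main(String[] args) {
        System.out.println(convertLine("#Alice"));          // prints <h1>Alice</h1>
        System.out.println(convertLine("## Part _one_"));   // prints <h2>Part <i>one</i></h2>
    }
}
```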
This is sufficient to create the first epub -- and you run it as
$ gen.sh macintosh_stories "Macintosh Stories" "Andy Hertzfeld"
Let it do its thing, and your epub will be left under macintosh_stories/out.epub
To create more chapters -- keep adding more txt files. The sequence of chapters is strictly determined by the filename of the text file -- so I usually create text files like 01_xxx.txt, 02_xxx.txt and so on.
To add a cover and other media, first create a directory called media
$ mkdir macintosh_stories/media
If you then create a file called macintosh_stories/media/cover.jpg, the scripts will add the cover to the epub.
To add images into the file, first place them into the media directory. For example, in the above story -- there's an image of the packaging, which I've saved as macintosh_stories/media/alice_packaging.jpg
From within 01.txt, I refer to it as:
... disk was enclosed in a small cardboard box designed to look like a finely printed, old fashioned book, complete with an elaborate woodcut on the cover, that contained a hidden Dead Kennedy's logo, in tribute to one of Capp's favorite bands. [> media/alice_packaging.jpg <] Since Alice didn't take up the whole disk, Capps included a few other goodies with it, including a font and "Amazing", a fascinating maze ...
Now I was curious how the game itself looked. So I downloaded a YouTube video. Please note that it must be an mp4 file. This too, I put into the media directory, and embedded it the same way in my txt file.
... Since Alice didn't take up the whole disk, Capps included a few other goodies with it, including a font and "Amazing", a fascinating maze generating program that he wrote. [> media/alice.mp4 <] When I saw the completed packaging, I was surprised to discover that ...
You can embed audio in the same way; interviews, podcasts, or audio versions all work well. But please note that it must be in the .m4a format, since this is the only format that works on the Color Nook. I use ffmpeg to convert mp3s into m4a, and that seems to do the trick.
Here is a zip file of the scripts and just for kicks, the sample epub.
I did some poking around how Flipboard lays out content, and here are my observations.
- The portrait and landscape layouts are identical -- the internal content of an article reflows when the orientation flips, but the overall article layout remains exactly the same. So in the rest of this, I'll only refer to the portrait orientation.
- The layout process is almost certainly "recursive rectangle cutting" rather than packing a set of rectangles. In other words, start with a big rectangle, then make a complete cut horizontally or vertically, and recursively cut each smaller rectangle. The tell-tale sign of such a process is you never see a layout like on the right. Ie, there is always at least one cut that goes from one side of the rectangle being cut to its opposite side, and so on, recursively.
- You can therefore associate a "cut tree" with any layout, where each node represents a cut -- horizontal or vertical -- and is labelled with the location of the cut position(s). The choice of cutting positions is heuristic, and Flipboard's approach seems to be to pick small integer ratios of the parent.
If you ignore "dual" ratios (eg: 1/3 and 2/3 would be duals of each other) Flipboard picks cut positions that are located 1/2, 1/3, 1/4, 2/5, and 3/8 of the parent. Which one to pick at any step is possibly a combination of how large the content is, and some randomization.
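Here is a minimal sketch of recursive rectangle cutting, using the ratios listed above. It is only my reading of the observations, not Flipboard's actual algorithm: the class name, the rule of cutting across the longer side, and the way items are shared between the two halves are all assumptions, and the random choice stands in for whatever mix of content size and randomization is really used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of recursive rectangle cutting (not Flipboard's code).
final class CutLayout {
    // Ratios observed above; each may also be used as its dual (1 - ratio).
    static final double[] RATIOS = {1.0 / 2, 1.0 / 3, 1.0 / 4, 2.0 / 5, 3.0 / 8};

    static final class Rect {
        final double x, y, w, h;
        Rect(double x, double y, double w, double h) {
            this.x = x; this.y = y; this.w = w; this.h = h;
        }
    }

    // Cuts r into exactly n rectangles using only side-to-side cuts.
    static List<Rect> cut(Rect r, int n, Random rnd) {
        List<Rect> out = new ArrayList<>();
        cutInto(r, n, rnd, out);
        return out;
    }

    private static void cutInto(Rect r, int n, Random rnd, List<Rect> out) {
        if (n <= 1) {
            out.add(r);
            return;
        }
        double ratio = RATIOS[rnd.nextInt(RATIOS.length)];
        if (rnd.nextBoolean()) {
            ratio = 1 - ratio;                    // use the dual half the time
        }
        // Assumption: give each side a share of the items roughly matching its area.
        int first = (int) Math.round(n * ratio);
        first = Math.max(1, Math.min(n - 1, first));
        if (r.w >= r.h) {
            // Vertical cut, running from the top edge to the bottom edge.
            double cw = r.w * ratio;
            cutInto(new Rect(r.x, r.y, cw, r.h), first, rnd, out);
            cutInto(new Rect(r.x + cw, r.y, r.w - cw, r.h), n - first, rnd, out);
        } else {
            // Horizontal cut, running from the left edge to the right edge.
            double ch = r.h * ratio;
            cutInto(new Rect(r.x, r.y, r.w, ch), first, rnd, out);
            cutInto(new Rect(r.x, r.y + ch, r.w, r.h - ch), n - first, rnd, out);
        }
    }

    public static void main(String[] args) {
        for (Rect r : cut(new Rect(0, 0, 768, 1024), 5, new Random())) {
            System.out.printf("x=%.0f y=%.0f w=%.0f h=%.0f%n", r.x, r.y, r.w, r.h);
        }
    }
}
```

Because every cut spans the whole rectangle being cut, a layout produced this way can always be described by a cut tree, which is exactly the property noted in the observations.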
That said, there's more than one way to approach the problem, and it can be useful to understand the underlying design goals, rather than view it as purely a problem of "packing" content. (Example via Gridness)
Many modern designers and magazines lay out content within a grid, and you can see an example about the underlying ideas here.
In essence, the page is divided into a grid that resembles a checkerboard, and each element of interest is reflowed into a contiguous subset of blocks. The intent is to establish a visual structure to the content, which the grid helps to maintain. Whitespace is often just as important as content, and eventually each grid block ends up being either used as content or whitespace. (This is often overlooked in many "packing" approaches to layout, though you may be able to incorporate it as "blank content" to be packed along with everything else.)
Being aware of an underlying grid can potentially simplify algorithms as well as allow internal content to settle along grid lines. A fairly basic approach can still use rectangle cutting, but just select one of the grid lines at each cut (rather than ratios of the parent, as Flipboard does). It can also, on occasion, allow you to create non-rectangular areas, especially with reflowable content. For example, you may be able to subtract a set of blocks out of an enclosing rectangular set, to add related content, and so on. You can see it in the pullquotes in this example from the Behance Network.
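One simple way to pick a grid line at each cut is to start from a preferred ratio and snap it to the nearest interior grid line of the rectangle being cut. The sketch below does just that for one dimension; the class and method names are made up, and snapping to the nearest line is only one possible rule among many.

```java
// Hypothetical sketch: choose a cut position that lies on a grid line.
final class GridCut {
    // Returns the interior grid line (1 .. cells-1) closest to ratio * cells,
    // where `cells` is how many grid blocks the rectangle spans in the cut direction.
    static int snap(double ratio, int cells) {
        if (cells < 2) {
            throw new IllegalArgumentException("need at least 2 cells to cut");
        }
        int line = (int) Math.round(ratio * cells);
        // Clamp so that both halves keep at least one grid block.
        return Math.max(1, Math.min(cells - 1, line));
    }

    public static void main(String[] args) {
        System.out.println(snap(0.5, 6));        // prints 3
        System.out.println(snap(1.0 / 3, 12));   // prints 4
        System.out.println(snap(0.9, 4));        // prints 3
    }
}
```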