Note: All information in this document is based on my own tests, because PSION has published no documentation about the file structures of the S5. The information in this document may therefore be wrong. However, I thought that a buggy converter which works in most cases is better than no converter.
This document is an addition to Frodo Looijaard's documentation of the PSION file formats, so you will have to read his documentation first to understand the terms used here.
Files of type 0x10000050 are used for files which remain on disc, like databases. Data is read and modified directly on disc. There may be additional data structures in RAM, but the structure on disc is the leading one. Therefore do not use this structure in your own programs, except in converters from or to it: it may be seen as a part of the program, so you may provoke Psion/Symbian if you use it in your own databases or organizers.
File type 0x10000037 is used for files which are first loaded from disc into RAM, modified there, and afterwards written back to disc. The structure in RAM may be completely different from the structure on disc. Even though I see this structure not as a part of a program but as a public interface concept, do not use it in your own programs, because you may provoke Psion/Symbian.
There seems to be a new kind of list. Because I was not able to decode it completely, I call it a mystery coded list. Normally a word at the beginning of the list codes the number of bytes inside the list. Here, however, bit 14 is set and bit 15 is clear, and both must be masked out to get the true length. Because very long sections must be encoded with this structure (e.g. all the information inside a dataset of the DB), it is likely that longer encoded lengths exist as well. My name for this list would be MLISTE or MLISTB. It is not clear to me why there is an additional list type, because everything could be coded with the existing ones.
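The masking described above can be sketched in Python as follows. The function name `mlist_length` is hypothetical, and little-endian byte order for the length word is an assumption (it matches the other multi-byte values in these notes). Only the observed case (bit 14 set, bit 15 clear) is handled; anything else is rejected because its coding is not known.

```python
import struct


def mlist_length(buf, offset=0):
    """Return the byte count stored in a mystery coded list length word."""
    # Assumption: the word is stored little-endian.
    (word,) = struct.unpack_from("<H", buf, offset)
    # Only the observed pattern is accepted: bit 14 set, bit 15 clear.
    if word & 0xC000 != 0x4000:
        raise ValueError("unexpected flag bits in length word: 0x%04X" % word)
    # Mask out bits 14 and 15 to get the true length.
    return word & 0x3FFF
```

The payload of the list would then start two bytes after `offset`.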
There is a reduced form of strings which is coded as a BLISTB list. I will call this SString (Short String).
Following IDs are used inside the Data file:
0x10000069 - Internal Table Storage Section
0x10000086 - Data-ID
0x10000089 - APP-ID
0x1000012E - Table Definition Section
0x10000131 - View Definition Section
0x10000132 - unknown Section 3
0x10000133 - unknown Section 5
0x10000137 - unknown Section 4
0x00 - Boolean
0x05 - Integer
0x09 - Double
0x0A - Date
0x0B - Text
0x0E - Memo
0x10 - Format. Occurs always with Text/Memo. May contain embedded object information
Double is stored in 8 bytes according to the IEEE double precision format. Note that the more significant Long, containing the exponent, is stored at positions 4-7 on disc and the less significant Long, containing the low part of the mantissa, at positions 0-3.
Date information is stored in 8 bytes. These seem to be 2 Longs: the lower Long is stored at the beginning, the higher Long at the end. Now enter two dates one day apart:
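A minimal Python sketch of reading such a Double. It assumes that the bytes inside each Long are little-endian as well, so that the whole value is simply a little-endian IEEE double; both function names are hypothetical. The second function follows the wording of the description literally (two Longs, low one first) and gives the same result under that assumption.

```python
import struct


def read_double(buf, offset=0):
    """Read the 8 bytes as one little-endian IEEE double."""
    return struct.unpack_from("<d", buf, offset)[0]


def read_double_from_longs(buf, offset=0):
    """Same value, built from the two Longs as described in the text."""
    # Low Long at positions 0-3, high Long (with the exponent) at 4-7.
    low, high = struct.unpack_from("<LL", buf, offset)
    # Reassemble in big-endian order: high Long first, then low Long.
    return struct.unpack(">d", struct.pack(">LL", high, low))[0]
```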
00E088F28F42E000 = 10.04.2000
- 0080B1D47B42E000 = 09.04.2000
Internal Form:
0x00E0428FF288E000
- 0x00E0427BD4B18000
------------------
0x000000141DD76000
The result is 8.64E10 in decimal. An hour has 60 minutes and a minute has 60 seconds. Divide the result by 3600 and you get 24,000,000; our difference was one day, which is 24 hours. That means the time unit is 1/1,000,000 of a second! The base day is 1.1.0000. There is an important point to note: PSION implements the Gregorian calendar scheme. That means:
1. Every fourth year is a leap year (that is, a year with 366 days)
2. If the year can be divided by 100, it is not a leap year
3. If the year can be divided by 400, it is a leap year
The Gregorian calendar has been in use since 1582. To correct the difference to the Julian calendar, the days from 5.10.1582 to 14.10.1582 were officially skipped. This has not been implemented by PSION: you can enter these days. PSION also applies rules 2 and 3 only from 1600 (inclusive) on! That means, e.g., that the year 1500 is a leap year, which contradicts rules 2 and 3 of the Gregorian calendar but agrees with the Julian calendar.
The system is able to handle dates before Christ (BC) too. All leap years in this range of time are handled in accordance with the Julian calendar:
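The leap year behaviour described above can be written down as a small Python sketch. The function name `is_psion_leap_year` is hypothetical; the cut-over year 1600 is taken from the observation above.

```python
def is_psion_leap_year(year):
    """Leap year test following the behaviour observed on the PSION."""
    # Rule 1: only every fourth year can be a leap year.
    if year % 4 != 0:
        return False
    # Before 1600 only rule 1 applies (Julian behaviour).
    if year < 1600:
        return True
    # From 1600 (inclusive) on, rules 2 and 3 apply as well.
    return year % 100 != 0 or year % 400 == 0
```

With this rule 1500 and 1600 are leap years, while 1700 and 1900 are not.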
00207C65EA16EEFF = 01.04.-160
Internal Form:
FFEE16EA657C2000
This is the usual negative integer format (two's complement), the same as for bytes:
-1 -> 0xFF
-2 -> 0xFE
...
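The date examples above can be reproduced with a few lines of Python. This sketch treats the 8 bytes as one signed little-endian 64-bit integer counting microseconds, which is what the byte orders and the one-day difference shown above suggest; the helper name `read_date_ticks` is hypothetical. Converting the tick count into a calendar date is not attempted here.

```python
MICROSECONDS_PER_DAY = 24 * 60 * 60 * 1_000_000


def read_date_ticks(raw):
    """Decode the 8 date bytes as a signed little-endian tick count."""
    if len(raw) != 8:
        raise ValueError("a date field is exactly 8 bytes long")
    return int.from_bytes(raw, "little", signed=True)


day_10 = read_date_ticks(bytes.fromhex("00E088F28F42E000"))  # 10.04.2000
day_09 = read_date_ticks(bytes.fromhex("0080B1D47B42E000"))  # 09.04.2000
before_christ = read_date_ticks(bytes.fromhex("00207C65EA16EEFF"))

# The two sample dates differ by exactly one day of microseconds,
# and the BC sample decodes to a negative tick count.
print(day_10 - day_09 == MICROSECONDS_PER_DAY)
print(before_christ < 0)
```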
The Data file consists of three parts. First there is a header section, followed by a data section. The data section contains several subsections. The addresses of these subsections are managed by a Section Location Table, which follows the data section.
The header section is always 0x1C bytes long.
Offset Size Data Description
0x0000 ID 0x10000050 UID1: Header Section Layout
0x0004 ID 0x1000006D UID2: File kind
0x0008 ID 0x10000086 UID3: Application ID
0x000C L 0x5508A1FE UID4: Checksum of UID1, UID2 and UID3
0x0010 L unknown
0x0014 L 0x00000000, or the number of entries in the Section Location Table
0x0018 L Offset, counted from 0x0014, to the Section Location Table (used if the Long at 0x0014 is 0x00000000)
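The header table above maps directly onto seven Longs. A minimal Python sketch, assuming little-endian byte order; the field names in `Header` and the function name `parse_header` are hypothetical and only chosen for readability.

```python
import struct
from collections import namedtuple

HEADER_SIZE = 0x1C

# Field names are made up for this sketch; the layout follows the table.
Header = namedtuple(
    "Header", "uid1 uid2 uid3 uid4 unknown slt_count slt_offset"
)


def parse_header(data):
    """Split the first 0x1C bytes of a Data file into its seven Longs."""
    if len(data) < HEADER_SIZE:
        raise ValueError("file is shorter than the header section")
    header = Header(*struct.unpack_from("<7L", data, 0))
    if header.uid1 != 0x10000050:
        raise ValueError("UID1 is not 0x10000050")
    return header
```

The UID4 checksum is not verified here, because its algorithm is not part of these notes.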
The data section consists of subsections. The subsections are encapsulated each using a MLISTB list. Note that there may be unused areas between the subsections. If possible these areas will be removed if the program packs a table. If you are generating a new database you must build it without such unused areas.
There is an important, silly effect to note. The data section is divided into logical blocks of 0x4000 bytes! 2 bytes precede each such block (so a block is 0x4002 bytes in total). Maybe this is a checksum or a space usage indicator. The first block behaves specially: its bytes are located at 0x0000001C, then 0x4002 bytes of data follow! Note that the special bytes are always placed at fixed locations. They are inserted into the real information. That means that you have to strip them out of the data stream when reading bytes and, much more horribly, the lists do not count these bytes, so you have to account for them yourself! This may be done with special basic I/O operations which hide the bytes from the rest of the program. The following documentation is written as if these bytes were not present.
There is one more remarkable, silly effect at the MLISTB entry which encapsulates a section: if the section passes over such a block boundary, the MLISTB length is zero! (Not sure!!! Check after reorganization of the DB too; not sure at the end.)
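One way to hide the special bytes from the rest of a converter is to strip them once after reading the file. The following Python sketch is only one possible reading of the layout described above, and every default value is an assumption: the first 2 special bytes sit at 0x1C, 0x4002 bytes of data follow them, and every later block consists of 2 special bytes plus 0x4000 bytes of data. The function name `strip_block_bytes` is hypothetical. All sizes are parameters, so a different reading can be tried without changing the code.

```python
def strip_block_bytes(raw, first_marker=0x1C, first_data_len=0x4002,
                      data_len=0x4000, marker_len=2):
    """Return the bytes after the header with the special bytes removed."""
    out = bytearray()
    pos = first_marker
    chunk = first_data_len
    while pos < len(raw):
        pos += marker_len              # skip the 2 special bytes
        out += raw[pos:pos + chunk]    # keep the real data of this block
        pos += chunk
        chunk = data_len               # later blocks use the regular size
    return bytes(out)
```

Offsets taken from the Section Location Table would then refer to positions in the returned byte string (after subtracting the header length where it is included).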
Offset Size Data Description
0x0000 LLISTB
The database is defined by the declaration of tables. A table is defined by its fields. All datasets of a table have the same fields, even if not every field is actually filled with data. There are three definitions of the fields. First there is the basic declaration of the fields, that is, the field name and the type of data it contains. I will call this the Table Definition Section. Second there are the definitions of how to display tables on the screen. I will call this the View Definition Section. Last there is an internal definition which tells how the fields are stored. I will call this the Internal Table Storage Section. Of course the content of the datasets itself must be stored too. I will call this the DB-Dataset Content Section. Note that there will normally be a lot of such sections.
The Section Location Table contains offsets (without the block bytes!) to the subsections inside the data section. Some sections seem to be placed at fixed positions. I do not think that this is really the case, but so far I cannot prove it. I assume that every section is referenced from another section, but so far I have not found these references in every case. There are references inside the ID-Binding Table, the Table Definition Section, unknown Section 2 and the Table Content Sections which tell what kind of subsection an element of the Section Location Table is pointing to. So far the following kinds of entries are known (not every kind of entry must occur in each data file):
unknown Section 1 (always position 1?, or may be referenced by Table Definition Section)
Internal Table Storage Section (always position 2?, or may be referenced by Table Definition Section)
Table Content Sections (First is referenced by unknown Section 2)
unknown Section 2 (always position 4?)
Application Section (Referenced by ID-Binding Table)
ID-Binding Table (Referenced by an entry at the beginning of the Section Location Table)
Table Definition Section (Referenced by ID-Binding Table)
View Definition Section (Referenced by ID-Binding Table)
unknown Section 3 (Referenced by ID-Binding Table)
unknown Section 4 (Referenced by ID-Binding Table)
Memo Content Section (Referenced by a dataset memo field)
Format Content Section (Referenced by a dataset format field)
unknown Section 5 (Referenced by ID-Binding Table)
The known references to this table may in some cases carry additional information in the top byte of the Long containing the reference. The meaning of this information is unknown.
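Separating the top byte from the element number is a simple mask. A small Python sketch (the function name `split_reference` is hypothetical):

```python
def split_reference(value):
    """Split a 32-bit reference into (top_byte, element_number)."""
    top_byte = (value >> 24) & 0xFF          # meaning unknown
    element_number = value & 0x00FFFFFF      # index into the table
    return top_byte, element_number
```

For example, `split_reference(0x80000005)` yields the top byte 0x80 and the element number 5.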
Note that there are two ways to determine the beginning of the Section Location Table. If the Long at 0x00000014 contains 0x00000000, the Long at 0x00000018 contains the offset, counted from 0x00000014, to the start of the Section Location Table. If the Long at 0x00000014 is not zero, it contains the number of elements in the Section Location Table. Multiply this by 5 and add 12 (3*4 bytes for 3 Longs). Go this number of bytes backward from the end of the file to find the beginning of the Section Location Table.
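Both ways can be combined in one small Python function. This is a sketch under assumptions: the Longs are little-endian, and the positions are taken in the raw file bytes (whether the 2 special block bytes have to be accounted for here has not been verified). The function name is hypothetical.

```python
import struct


def section_location_table_start(data):
    """Return the file position where the Section Location Table begins."""
    count, offset = struct.unpack_from("<LL", data, 0x14)
    if count == 0:
        # First way: the Long at 0x18 is an offset counted from 0x14.
        return 0x14 + offset
    # Second way: walk back from the end of the file.
    # 3 Longs (12 bytes) plus 5 bytes for every element.
    return len(data) - (count * 5 + 12)
```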
Offset Size Data Description
0x0000 L Number of the Section Location Table element which points to the ID-Binding Table.
0x0004 L unknown
0x0008 LLISTE E=constant 5
Offset Size Data Description
0x0000 B unknown. normally 0 but other values possible. May be a usage tag or something like that
0x0001 L 0x1C added to value gives the offset to the beginning of a special subsection of the Data Section
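Reading the table and its 5-byte elements could look like the following Python sketch. It assumes little-endian values and that the LLISTE starts with a Long holding the number of elements, which fits the "3 Longs" counted when locating the table from the end of the file. The function name is hypothetical.

```python
import struct


def parse_section_location_table(data, start):
    """Return (id_binding_element, [(tag_byte, file_offset), ...])."""
    id_binding_element, _unknown, count = struct.unpack_from(
        "<LLL", data, start
    )
    entries = []
    pos = start + 12
    for _ in range(count):
        # Each element: one unknown byte followed by one Long.
        tag, value = struct.unpack_from("<BL", data, pos)
        # Adding 0x1C gives the offset to the subsection.
        entries.append((tag, value + 0x1C))
        pos += 5
    return id_binding_element, entries
```

The returned offsets still ignore the 2 special bytes of each block, as the rest of this documentation does.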
Offset Size Data Description
0x0000 MLISTB
Offset Size Data Description
0x0000-0x0008 unknown 00 00 00 00 00 00 00 00 00
It contains the information about how the data of the tables is stored. There is a list which contains one or more entries for each field of a table.
Offset Size Data Description
0x0000 MLISTB
Offset Size Data Description
0x0000 ID 0x10000069
0x0004 B unknown 00
0x0005 L unknown
0x0009 XLISTE Table Storage Definition Table
0xXXXX-0xXXX6 unknown 20 04 00 00 00 00
Offset Size Data Description
0x0000 SLISTB Name of the table
0xXXXX XLISTE Table of DB-Field Storage Binding Field Elements of various length
Coding of a DB-Field Storage Binding Field Element
Offset Size Data Description
0x0000 SLISTB field identification
0xXXXX B type of field
[0xXXX1- Depends on field type]
The coding of a DB-Field Storage Binding Field Element depends on the type of the field information which is coded. The first two entries are the same for all types.
The first entry is a string and gives the name of the element. The name always begins with "Col", followed by a letter which indicates that the element belongs to a particular field. A new field always starts when the letter is A. Further elements may follow, which are named "B", "C" and so on. After the letter, the number of the database field follows in decimal digits!
The second entry is a byte which identifies the type of the element. Everything after it depends on the type of the element.
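The naming scheme of the first entry can be taken apart with a regular expression. In this Python sketch the function name and the sample name "ColA1" are hypothetical; the sample is only constructed from the description above and not copied from a real file.

```python
import re

# "Col", one letter, then the field number in decimal digits.
_STORAGE_NAME = re.compile(r"Col([A-Z])(\d+)")


def parse_storage_name(name):
    """Return (letter, field_number, starts_new_field) for an element name."""
    match = _STORAGE_NAME.fullmatch(name)
    if match is None:
        raise ValueError("not a storage element name: %r" % name)
    letter = match.group(1)
    field_number = int(match.group(2))
    # An element with the letter A starts a new field.
    return letter, field_number, letter == "A"
```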
Offset Size Data Description
0x0000 String internal Name of the field
0xxxxx B 0x00 type of element: Boolean
0xxxx1 B unknown always 0
Offset Size Data Description
0x0000 String internal Name of the field
0xXXXX B 0x07 type of element: Integer
0xXXX1 B unknown always 0
Offset Size Data Description
0x0000 String internal Name of the field
0xXXXX B 0x09 type of element: Double
0xXXX1 B unknown always 0
Offset Size Data Description
0x0000 String internal Name of the field
0xXXXX B 0x0A type of element: Date
0xXXX1 B unknown always 0
Offset Size Data Description
0x0000 String internal Name of the field
0xXXXX B 0x0B type of element: Text
0xXXX1 B unknown always 0
0xXXX2 B maximal length of the string
Offset Size Data Description
0x0000 String internal Name of the field
0xXXXX B 0x0E type of element: Memo
0xXXX1 B unknown always 0
Offset Size Data Description
0x0000 String internal Name of the field
0xXXXX B 0x10 type of element: Format
0xXXX1 B unknown always 0
These sections contain the information stored in the database. For internal reasons explained later, the database is split up into several Table Content Sections. Every section contains a fraction of the datasets. The first Table Content Section contains a pointer to the next one and so on. Up to 16 datasets may be stored inside one section. Note that the address of a section may change when the content of its datasets changes. The old address of the section is lost, and the space occupied before the change becomes an unused area. If there are too many unused areas in the DB, it is packed automatically. This means that the program tries to move used sections from the end of the file forward into unused areas. If they do not fit exactly, there may still be unused areas after the packing process. Some sections seem never to change their location.
Offset Size Data Description
0x0000 MLISTB
Offset Size Data Description
0x0000 L Number of element of Section Location Table which gives the offset to the next Table Content Section or 0 if this is the last Table Content Section. (Top byte contains special Information of unknown meaning, so mask it out).
0x0004 W Dataset representation flags
0x0006-0xXXXX Dataset length table
0xXXX1 content of datasets
To handle the information stored inside a Table Content Section you have to know how many datasets are stored inside it. Note also that it may be necessary to address a dataset by its internal unique number. If a dataset is deleted, the remaining ones must not be renumbered. To meet these requirements there is a Word of 16 bits. Each of the bits represents a dataset. A "1" indicates that the dataset exists. In this case its length is stored in the Dataset length table and its content in the content of datasets. A "0" indicates that the dataset is not present. There are then no entries belonging to it in the other parts of the Table Content Section. This scheme keeps the internal numbering of the datasets and lets you detect where a new dataset can be placed.
The Dataset length table contains the length of the content of each dataset which is stored inside this Table Content Section (that is, each dataset whose bit in the Dataset representation flags is "1"). The length itself is encoded like the length of an EXTRA-encoded list. That means that the length may be coded using 1, 2 or 4 bytes. If you want to jump to the third dataset inside this Table Content Section, you can easily do so by adding the first two entries of this list, scanning to the end of the Dataset length table and from there jumping the sum of bytes forward, without scanning the content of the other datasets.
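The first six bytes of a Table Content Section can be read as in the following Python sketch. It assumes little-endian values, that `section` is the content of the MLISTB (length word already removed), and that bit 0 of the flag word belongs to the first dataset slot; the bit order has not been verified. The function name is hypothetical.

```python
import struct


def parse_table_content_header(section):
    """Return (next_element, present_slots) of a Table Content Section."""
    next_ref, flags = struct.unpack_from("<LH", section, 0)
    # The top byte of the reference has an unknown meaning: mask it out.
    next_element = next_ref & 0x00FFFFFF
    # One bit per dataset slot; a set bit means the dataset exists.
    present_slots = [slot for slot in range(16) if flags & (1 << slot)]
    return next_element, present_slots
```

A `next_element` of 0 marks the last Table Content Section, and `len(present_slots)` is the number of entries to expect in the Dataset length table.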
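Once the entries of the Dataset length table have been decoded, the jump described above is a running sum. A short Python sketch (the function name is hypothetical, and decoding the EXTRA-encoded lengths themselves is left out here):

```python
def dataset_offsets(lengths):
    """Return the start of each dataset, counted from the first content byte."""
    offsets = []
    total = 0
    for length in lengths:
        offsets.append(total)   # this dataset starts after all earlier ones
        total += length
    return offsets
```

With the lengths 10, 4 and 7 the third dataset starts 14 bytes after the end of the Dataset length table.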
The datasets are stored dataset by dataset and field by field. To save memory there is a piece of information which tells whether a field is present in a dataset or not. Each field is represented by a bit, so 8 fields are encoded in one byte (the first bit represents field 1, the second bit field 2 and so on). After this byte, the contents of the fields that are present follow, for up to 8 fields. Then the next presence byte follows, and so on.
One exception to this rule is made for Boolean fields. There is also one bit which tells whether the field exists or not. If it exists, the next bit of the presence byte tells you whether the field is true or false. Another exception is made for Memo fields. Here the first bit tells whether the field exists or not. If it exists, the next bit tells how the content of the Memo field is stored. The same scheme is used for Format fields. Note that some field presence bits may be missing when no field information follows them. This can be detected when the end of the current dataset is reached!
It is still open what happens if a new presence byte must be introduced because a field is added. There seems to be the following rule: no empty fields are added to existing datasets, even if the new field can be represented by a constant amount of memory. It also seems that when a dataset is added, space is allocated for every content which can be stored in a constant amount of memory. If such fields are changed, this is simply done by changing the content of the corresponding piece of storage.
Strings are stored using an amount of memory which corresponds to their actual length. Therefore the storage must be reorganised if the length changes. It seems that the following scheme is used: the whole Table Content Section which contains the changed dataset is copied to the end of the file with the changed content of the field. The Section Location Table is modified afterwards, and the address of the old, now invalid Table Content Section is replaced by the new one. This copy process is the reason for splitting the database into several Table Content Sections. Otherwise, in the worst case, the whole content of the database would have to be duplicated, which cannot be tolerated for large databases. Note that Memo fields may be stored in a separate section.
Offset Size Data Description
Nothing, but coded with an extra bit inside existence mask
Offset Size Data Description
0x0000 L
Offset Size Data Description
0x0000 D
Offset Size Data Description
0x0000 Date
Offset Size Data Description
0x0000 SString
May be coded in two ways. An extra bit inside existence mask tells what kind of coding is valid. If it is false then a Memo Content Section is used to keep the content of the Memo field. If it is true the content of the Memo field is stored as a normal Text.
Offset Size Data Description
0x0000 L Number of the Section Location Table element which points to the corresponding Memo Content Section. (Attention: the top byte has a special unknown meaning, so ignore it by masking it out!)
0x0004 L Length of Memo
Or
0x0000 SString
May be coded in two ways. An extra bit inside existence mask tells what kind of coding is valid. If it is false then a Format Content Section is used to keep the content of the Format field. If it is true the content of the Format field is stored directly in the data set. See Notes about Format Content Section for details.
Offset Size Data Description
0x0000 L Number of the Section Location Table element which points to the corresponding Format Content Section.
Or
0x0000 L Offset to Text Layout Section
[0x0004 embedded Objects]
0xXXXX Text Layout Section
Is an important section because it contains a pointer to the first Table Content Section. The meaning of the other information is unknown.
Offset Size Data Description
0x0000 MLISTB
Offset Size Data Description
0x0000 L Pointer to first Table Content Section (top byte may contain other information and must be masked out)
0x0004-0x000C unknown
The usual Application section.
Offset Size Data Description
0x0000 MLISTB
Offset Size Data Description
0x0000 ID 0x10000086
0x0004 String Name of the Application.
It contains IDs followed by an offset. This offset seems to be the position of the special subsection inside the Section Location Table which corresponds to the ID.
Offset Size Data Description
0x0000 MLISTB
Offset Size Data Description
0x0000 BLISTL two Long gives a pair of data
Offset Size Data Description
0x0000 ID
0x0000 L Position inside the Section Location Table
The Table Definition Section contains definitions of the fields of the tables (without internal aspects).
Offset Size Data Description
0x0000 MLISTB
Offset Size Data Description
0x0000 L 02 00 00 00 unknown. May be the number of the Section Location Table element which points to the Internal Table Storage Section.
0x0004 LLISTL Search Definition Table
0xXXXX B unknown. Normally 0x00, 0x01 if file is sorted
0xYYYY LLISTE E=constant 5. Sort Definition Table
0xZZZ1 XLISTE unknown. May be a Table Definition Table
The Sort Definition Table contains the fields which are used to sort the table, followed by the kind of sorting (ascending/descending).
Offset Size Data Description
0x0000 L tag of the field to sort by
0x0004 B Sortorder. 0x00 ascending, 0x01 descending
The Search Definition Table contains the fields which are searched during a searching process.
Offset Size Data Description
0x0000 L tag of the field to search
The Sort Definition Table contains the fields which are used to sort the table.
Offset Size Data Description
0x0000 tag of the field to search
Offset Size Data Description
0x0000 String Name of the table
0xXXXX L Internal counter used to produce a unique tag for each defined field. It contains the number assigned to the field which was defined last.
0xXXX4 XLISTE Table Field Definition Table
The coding differs according to the kind of the field. The first entries are the same.
Offset Size Data Description
0x0000 String Name of the field
0xXXXX B 0x00 type of the field: Boolean
0xXXX4 L unique number of field
0xXXX8-0xXX13 unknown FF FF FF FF 00 00 01 00 04 00 02 00
0xXX14 L unknown 9C FF FF FF
0xXX18 L unknown 64 00 00 00
Offset Size Data Description
0x0000 String Name of the field
0xXXXX B 0x05 type of the field: Integer
0xXXX4 L unique number of field
0xXXX8-0xXX13 unknown FF FF FF FF 00 00 01 00 04 00 02 00
0xXX14 L Minimal allowed value as Integer
0xXX18 L Maximal allowed value as Integer
Offset Size Data Description
0x0000 String Name of the field
0xXXXX B 0x09 type of the field: Double
0xXXX4 L unique number of field
0xXXX8-0xXX13 unknown FF FF FF FF 00 00 01 00 04 00 02 00
0xXX14 L Minimal allowed value as Integer
0xXX18 L Maximal allowed value as Integer
Offset Size Data Description
0x0000 String Name of the field
0xXXXX B 0x0A type of the field: Date
0xXXX4 L unique number of field
0xXXX8-0xXX13 unknown FF FF FF FF 00 00 01 00 04 00 02 00
0xXX14 L unknown 9C FF FF FF
0xXX18 L unknown 64 00 00 00
Offset Size Data Description
0x0000 String Name of the field
0xXXXX B 0x0B type of the field: text
0xXXX1 L unique number of field
0xXXX5 B maximal length of the text in the field
0xXXX6 L unknown 00 00 00 00
0xXXXA B unknown some Flags, Bit 2 shows that this field will be shown at dialing.
0xXXXB B number of characters to use at sort
0xXXXC B unknown 00
0xXXXD B unknown 01
0xXXXE B unknown 00
0xXXXF B unknown 02
0xXX10 B unknown 00
Offset Size Data Description
0x0000 String Name of the field
0xXXXX B 0x0E type of the field: Memo
0xXXX4 L unique number of field
0xXXX8-0xXX13 unknown FF FF FF FF 00 00 01 00 04 00 02 00
0xXX14 L unknown 9C FF FF FF
0xXX18 L unknown 64 00 00 00
The View Definition Section contains the definitions of how to display the information of the datasets on the screen. In detail this means in which order the fields are displayed, which fields are displayed at all, what kind of character set is used to display them and so on. Note that there are two kinds of views (table view, dataset view) and this section contains both definitions.
Offset Size Data Description
0x0000 MLISTB
Offset Size Data Description
0x0000 View Definition: Displayed as a table
0xXXXX View Definition: Displayed as a card
Offset Size Data Description
0x0000 ELISTE View Field Definition List
0xXXXX BLISTE View Field Layout List
0xYYYY Sequence of bytes. Each byte seems to belong to one field. Meaning is unknown
Each Element seems to be 9 Bytes long.
Offset Size Data Description
0x0000 W subsequent number of the view field
0x0002 L unique tag of the field
0x0006 W unknown: direct relation to unique tag
0x0008 B unknown: always 01
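The 9-byte element above corresponds to a packed Word, Long, Word, Byte sequence. A Python sketch, assuming little-endian values; the dictionary keys and the function name are hypothetical labels for the columns of the table.

```python
import struct

# Word, Long, Word, Byte without padding: 2 + 4 + 2 + 1 = 9 bytes.
VIEW_FIELD_ELEMENT = struct.Struct("<HLHB")


def parse_view_field(buf, offset=0):
    """Decode one 9-byte element of the View Field Definition List."""
    sequence, field_tag, related, constant = VIEW_FIELD_ELEMENT.unpack_from(
        buf, offset
    )
    return {
        "sequence": sequence,        # running number of the view field
        "field_tag": field_tag,      # unique tag of the field
        "unknown_word": related,     # related to the unique tag
        "unknown_byte": constant,    # always 01 so far
    }
```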
Each Element seems to be 18 Bytes long.
Offset Size Data Description
0x0000-0x0012 unknown: 02 4F 00 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00
Offset Size Data Description
0x0000 MLISTB
Offset Size Data Description
0x0000-0x0039 unknown. Value 0010 changed if one field added, increased by one but does not contain the right field number
Offset Size Data Description
0x0000 MLISTB
Offset Size Data Description
0x0000-0x0006 unknown 03 37 02 02 02 00
Offset Size Data Description
0x0000 MLISTB
Offset Size Data Description
Inside a Memo Content Section the content of exactly one Memo field of a dataset is stored if its content is too large to be stored inside an SString. An element of the Section Location Table points to the corresponding Memo Content Section. The number of this element is stored inside the memo field content of the dataset, followed by the length of the Memo Content Section. Note that the data is stored immediately after the MLISTB descriptor! Due to memory management techniques, it may not contain the true length of the section.
Offset Size Data Description
0x0000 MLISTB
Inside a Format Content Section the content of exactly one Format field of a dataset is stored if its content is too large to be stored inside an SString. Each Format corresponds to a text (stored as a Text or Memo field). The usual Text Layout Section is used to store the formats belonging to each paragraph of the text. Note that there is a Text Layout Section for each Format field, and not only one as in the Word format. Another problem is how to store embedded objects. This is also done in the Format Content Section. Objects are stored in front of the Text Layout Section. To allow skipping this information, each Format Content Section begins with a Long offset to the beginning of the Text Layout Section.
Offset Size Data Description
0x0000 L Offset to Text Layout Section from 0x0000 on.
[0x0004 embedded Objects]
0xXXXX Text Layout Section
Still to be clarified: everything marked with "there seems", "it is likely" or "unknown" in the text.
It seems that there can be more than one table. How does the system know where the first content section of each table starts?
I see that there are big additional structures if you sort the database. Because this does not influence the conversion process, I did not analyze these structures.