Brain Feud: June 2011

Thursday, June 30, 2011

Things to remember about FileMaker Pro

A list of things to remember about the way FileMaker Pro behaves (in no particular order).

The shutdown script (triggered by onLastWindowClose and configured in the File Options dialog) only runs is the file is closed. If the FileMaker Pro application is quit, the script is NOT run.

Cross Platform Issue: When running in Mac OS, the window cannot be closed when a script is paused (a feature useful for creating "modal" dialogs). However in Windows OS the window CAN be closed.

People should not be required to enter anything that the database can work out using information it already has. For example, dates are entered for a staff availability. This data can be used to display whether the staff is "active" or "inactive" automatically.

When trapping for errors, turn Error capture off as soon as practical after the trapped step. You WANT to know when errors happen, because they'll be unexpected.

In every scripted creation of records, trap for errors on the New Record step to prevent the over-writing of existing data in subsequent Set Field steps. Then trap the Commit Record/Request step for errors, which are likely to occur if the table has field-level validation for uniqueness (for instance). (Delete the record if the commit fails.)

FileMaker can display multiple windows in preview mode (each window's mode is completely independent) however there is only one print setup (page size and orientation) for all windows.

Wednesday, June 29, 2011

No substitute for uniqueness

A client has a web-based enquiries service. The back-end of this is MySQL and the data gets to the client through a web interface that generates a csv file. This csv file gets imported into the main FileMaker solution.

I recently created an importer that converts the flat-file web data to a format suitable for a modern relational database. The process is to import the web data into an empty table, clean up the blank and header rows, then compare it to an archive of all previously imported records to ensure it only gets processed once. After being imported into the main table the new load is imported into the archive table. The table did not have a real unique serial number primary key field, but it had what I thought was the next best thing, a timestamp field that resolved to seconds. I figured that this would serve the purpose nicely. This timestamp field was validated to be unique to ensure that the web data could only be imported into the archive table once.

It broke down in less than a month.

After a just couple of weeks of use the importer threw a 729 import error and stopped. Investigation revealed that the error was being thrown in the first import, into the empty table. How could that be?Error 729 is "Errors occurred during import; records could not be imported" and it usually occurs when field-level validation prevents records from being created, usually because of duplicates. So how could the unique timestamp field be preventing the errors from being created when the table is initially empty?

An examination of the raw csv file revealed that the timestamp field in that particular batch was NOT unique: somebody managed to hit the submit button twice in the same second!

Moral: if you need a unique field, use a field that is KNOWN to be unique and not one that that is only "probably" unique, even if that probability appears to be high.

The solution: well, there were really two problems and only one has a solution. The problem of the importer "not working" was solved by keeping the unique validation but accepting the error 729 and allowing the process to continue (after checking that the resulting found set was not empty). This will result in the occasional "duplicate" record in the web data not being imported into the database, but it's not a really problem because it is almost certain the duplicate record is the result of a double-submit and will be duplicate data anyway.

Or will it? Have I solved the problem or just replaced it with a sightly-less-likely one?

As it is, the timestamp is used as the unique identifier. Consider if (when) two different web users hit submit in the same second: the timestamp will be duplicated but the rest of the record's data will be different. The import validation will prevent the second timestamped record from being imported and consequently real data will be lost. To mitigate this risk, the unique identifier has been changed to a concatenation of the timestamp and the enquirer's name. This will reduce the likelihood of data loss to only those cases where where two people with exactly the same name submit data at exactly the same second.

Which leads to the second problem, that of uniquely identifying the web records to prevent multiple imports. Alas, without a real unique primary key in the web database there is no bullet-proof solution. The concatenated timestamp field is an improvement but it's not infallible.

Wednesday, June 22, 2011

Primary Keys and ID Codes

Clients often ask for "codes" so that they can easily identify accounts, invoices and the such. This post is about creating such codes.

The first property that codes need is that they need to be unique. They should also be unambiguous, so avoid any code that includes both zero and the letter O, the letter I and the number 1.

The story so far: unique, unambiguous.

I have one client who had "codes" for courses that were over 10 characters long which made them about as easy to work with as a software product activation key. Partly this was because they wanted the codes to contain embedded information about the course and venue. This is not necessary for the code to function: their purpose was to be typed into a web enrolment for by end users, and the usability was suffering badly because of their length. Also, they tended to be lots of 1s and 0s so reading them was difficult. So another property is that the codes should be as short as practical.

Update: unique, unambiguous, short.

So why not use a simple serial number for the code? Because if sequential serial numbers are used, a typing entry error might mean that the resulting code matches another code: the incorrectly entered code might match a different code, and create an undetected error. Thus there is an additional property for codes: the code needs to be error ~~resilient~~ detecting. (I'm not sure the word "resilient" is correct but I cannot think of another: what I mean is that the simple mis-typing of one code should not match another code. It might be "error detection".)

Unique, unambiguous, short, error detecting.

Fortunately, such technology already exists and is easy to implement: check digits. The last digit of your 16 character credit card number is actually a check digit for the previous 15. So changing the last number of a credit card won't create a new valid credit card number, and similarly a typo in any of the other characters has a low probability of creating another valid number.

The codes I'm creating don't need the security of a check digit, but adding an extra digit to an already unique serial number makes another uniquer code that is no longer as susceptible to typing errors because the resulting incorrectly entered code las a low probability of matching a real code.

In one client's system I've just created such codes for accounts, courses and students. These codes are the primary key (a simple serial number) plus a Luhn check digit added to the end. This creates a unique number (they were unique anyway) that is now non-sequential so simple typos don't easily match other keys. These have proven very effective as the codes that end users entered into a web enrolment form, and replaced the previously 10+ digit codes used last year. Simply typing the next number won't result in somebody else's code.

Examples:

primary key = 8098, Luhn check digit = 6, code = 80986
primary key = 8099, Luhn check digit = 4, code = 80994

As an added bonus I can look at the code and easily work out what the actual primary key value is by ignoring the last digit.

It looks as though this method will break after serial number 9999, but it doesn't. Remember that every serial number gets a check digit added, so there will never be any clashes and the method won't suddenly run out of numbers. The serial number 10000 will be come 10000X (where X is the Luhn check digit); the serial number 10000X becomes 10000XX. All will be unique.

Note that this method is used for generating a "code" field. These are fields that are displayed for the benefit of the end users and are only used on-screen or printed. This code is NEVER used for internal relationships: this is what the serial number is used for.

For variety it's possible to use the check digit to create an alpha character instead of a digit, and the new character need not be put at the end. The simplest method to convert a digit to an alpha character is to use a Choose function. The FileMaker code snippet below will do this:

Choose( LuhnCheckDigit( primarykey ) ; "K" ; "A" ; "B" ; "C" ; "D" ; "E" ; "F" ; "G" ; "H" ; "J" ; "L" )

Note that in this code I have chosen not to return the letter "i" since it will be confused with "1". Also note that the letters don't have to be sequential or in any particular order. Finally, the "L" should never be returned because the Luhn check digit function should only return values between 0 and 9 inclusive. If "L"s start appearing in codes there is some real trouble afoot.

For variety this alpha code can prefix or suffix the primary key. Prefixing the key value makes the code change significantly while scrolling through the records because the first character bounces around the alphabet.

The LuhnCheckDigit() custom function is available here:

http://www.briandunning.com/cf/707

Tuesday, June 21, 2011

Great quote

"Once you're crazy and know nothing about numbers, the chances of finding something psychotic and hateful in a scrabble factory explosion are hovering just around 100%." - Penne Jillette, from the seminal article about the Wingdings font.

Wednesday, June 15, 2011

Cascade delete

I has an issue a couple weeks ago where I needed to import older records into an existing database. I had already planned ahead and left "room" in the primary key serial numbers for the older records, my only challenge was to get the older set of records BEFORE the newer ones.

Importing the older records will put them after the new ones. I therefore needed to (1) export the newer records, (2) delete them, then (3) import the older records, finally (4) import the newer ones again. Easy.

Except for the cascading delete.

Relationships in FMP can be set up to automatically delete child records when the parents are deleted. I like this feature because it enhances data integrity. However this is one instance where it needs to be worked around.

(I know other developers who insist that cascade delete is a Really Bad Thing because it's deleting data, but the data that's being deleted will be a record with a key value that points to a deleted master record: IOW it's dead data anyway. If the deletion of the master record was unintended the it needs to be recovered from a backup, and the deleted related records and be restored at the same time too.)

The step 2 above is where the cascading delete will automatically delete all the child records. In the case of this database this was several table's worth of data. I could have gone through and unchecked the cascade delete option from all the relationships. done the import, then added it again... but that was too stupid.

The solution was to NOT delete the records, but instead over-write them. Doing this is as easy as selecting the option in the Import dialog to "update existing records" and to add remaining records as new records.

Naturally, I tried the process first on a backup copy of the database, and it worked as advertised. It was a simple matter to export, delete, import the old then import the new. Much faster than changing the cascade delete option and much less error prone.

Tuesday, June 7, 2011

40,000 since 16 December 2010

40,000... that's the number of e-mails since 16 December 2010 that one client has sent from a database I built for them last year.

The database automates the generation of personalised messages and optionally attaches a pdf containing information that is tailored to the recipient's enquiry. The pdf generation and e-mail sending is all performed by FMP. This automation has replaced a largely manual process that took days to complete.

Because the e-mail sending and pdf generation can take several seconds to complete, the database generates the message text and pdf then "queues" the message, rather than sending it immediately, thereby saving about 7 seconds each message. A server-side script on FM Server runs every 30 minutes and performs the task of actually sending the messages via SMTP to the mail server. This frees-up the user's computer much faster and thereby improves productivity. A pleasant side effect is that it's possible to review the queued messages and delete them before they get sent. It's also possible to select a queued message and send it immediately, and to re-send a sent message with a single click.

The limitation? FileMaker's Send Mail script step only supports plain text messages and a single attachment. This is fine for the business since all their fancy HTML mail campaigns are done through a third party.

Wednesday, June 1, 2011

Defaults and Exceptions

I'm currently working on a system for a performing arts company, and they wanted to set up next years 260+ courses in a spreadsheet and import them. The client has a huge issue with course setup because each year they spend several days playing around in a spreadsheet then another couple of days battling to import the data into the database. Any half-decent database is relational and importing data from spreadsheets is not a trivial process.

If the interface and data structure are well designed, entering the information directly into the database is faster and easier than entering it into a spreadsheet. This will then save the time wasted with data imports.

Working directly in the database also offers other opportunities for efficiencies that spreadsheets cannot. For instance, I discovered that next years courses will, to within experimental error, be the same as this years courses. The solution was to set up a process whereby the client finds the current courses, omits those they want to exclude, then clicks a button to duplicate them and assign them next year's start dates. The exceptions are handled by either adding or deleting a couple of courses manually. Using this new solution the 264 courses were created in less than 1 minute. The scripting took less than an hour.

The general principle is to create a set of defaults that meets the majority of needs, then manually handle the exceptions. In this case the defaults were the current courses.

Brain Feud