Tuesday, December 18, 2007

Computing and searching speed and I/O access: on mainframes, then on the Internet


I can recall that when I worked for NBC in the mid-1970s, it took a Sperry Univac 1110 about three hours during end-of-month closing to sort perhaps 300,000 detail records for the voucher register in the general ledger system. That cycle kept me up when I was on call.

By the mid-1980s, sorting a similarly sized file on an Amdahl (an IBM-compatible mainframe) during the nightly billing cycle took perhaps five minutes. As a result, we did all our sorts externally, in separate Syncsort steps. We never bothered to code sorts in COBOL with SD entries, INPUT and OUTPUT PROCEDUREs, or even USING and GIVING.

By around 1988, when I was at a small consulting company with access to an IBM 3090 at Healthnet in Richmond, a sort of the same size would take perhaps 30 seconds. I had to reduce the computer costs for a simulation model I worked on. One program did a lot of random VSAM access, so I sorted its input into key sequence and processed it sequentially with “balanced line” matching in COBOL. That saved about two-thirds of the cost, and the job ran in less than half the time overall.
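The “balanced line” idea can be sketched as follows. This is a minimal Python illustration, not the original COBOL program: the `(key, payload)` record layout, the `balanced_line` function name, and the `HIGH_VALUES` sentinel (modeled on COBOL’s HIGH-VALUES) are hypothetical. It assumes both inputs are already sorted by key, as an external sort step would leave them, that master keys are unique, and that keys are plain ASCII strings. Each file is read once, front to back, instead of probing an indexed file at random for every key.

```python
from typing import Iterator, List, Optional, Tuple

# Hypothetical record layout: (key, payload). Both inputs must be sorted by key.
Record = Tuple[str, str]

# Sentinel that sorts after every ASCII key, like COBOL HIGH-VALUES at end of file.
HIGH_VALUES = "\uffff"
EOF_RECORD: Record = (HIGH_VALUES, "")


def balanced_line(
    master: List[Record], trans: List[Record]
) -> Iterator[Tuple[str, Optional[str], List[str]]]:
    """Match two key-sorted files in a single sequential pass.

    Yields (key, master payload or None, list of transaction payloads)
    for every key found in either file, in ascending key order.
    Assumes master keys are unique; transactions may repeat a key.
    """
    m_iter, t_iter = iter(master), iter(trans)
    m = next(m_iter, EOF_RECORD)
    t = next(t_iter, EOF_RECORD)
    while m[0] != HIGH_VALUES or t[0] != HIGH_VALUES:
        key = min(m[0], t[0])  # the "active key" for this step
        master_payload: Optional[str] = None
        if m[0] == key:  # master record exists for this key
            master_payload = m[1]
            m = next(m_iter, EOF_RECORD)
        matched: List[str] = []
        while t[0] == key:  # collect every transaction for this key
            matched.append(t[1])
            t = next(t_iter, EOF_RECORD)
        yield key, master_payload, matched


if __name__ == "__main__":
    master = [("A100", "acct A"), ("B200", "acct B"), ("D400", "acct D")]
    trans = [("B200", "pay 10"), ("B200", "pay 5"), ("C300", "pay 7")]
    for row in balanced_line(master, trans):
        print(row)
```

Because both files advance in lockstep on the lower key, unmatched masters (`A100`, `D400`) and orphan transactions (`C300`) fall out naturally, and the I/O is purely sequential.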

The preferred mainframe sort product has always been SYNCSORT, but back in the late 1980s an engineering company in northern Virginia called ICF developed a competitor, PLSORT, with pretty much the same command syntax but supposedly lower resource use (back in the days of IBM 4381 environments).

In 1991, at a life insurance company with an IBM-compatible Hitachi mainframe running MVS, I had a mix of jobs running simultaneously that did a lot of VSAM accesses (simulated by IDMS) to print consolidated salary deduction bills. I remember that one of these jobs could take two hours to go through 26,000 print-image records in VSAM, and running a hundred bills took all day. But by 1998, in a much more modern environment in Minneapolis, the same mix of jobs could finish in less than an hour, well before anyone came to work. I don’t know exactly how VSAM performance was improved (in terms of CI splits and so on), but the jobs took only about 1% of the time they had taken in 1991.

Vantage legacy life insurance policy administration systems had a reputation for running forever on even a small volume of contracts, but by the late 1990s these problems seemed to have been overcome. None of these systems had any problems at all during the Y2K event.

Is it any wonder, then, that the Internet is so efficient, and that, even with a few hundred million personal profiles, blogs, and other sites, anything controversial that anyone puts out tends to be found quickly? It’s just the mathematics of binary searches.
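The arithmetic behind that last remark can be illustrated with a plain binary search that counts its probes. This is only a sketch of the logarithmic growth the post appeals to, not a description of how any particular search engine indexes the web; the `binary_search` function name is hypothetical. Over a sorted collection of n keys, this version needs at most floor(log2 n) + 1 probes, which is exactly `n.bit_length()` in Python.

```python
from typing import Sequence, Tuple


def binary_search(sorted_keys: Sequence[int], target: int) -> Tuple[int, int]:
    """Return (index of target or -1, number of probes made)."""
    lo, hi = 0, len(sorted_keys) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_keys[mid] == target:
            return mid, probes
        if sorted_keys[mid] < target:
            lo = mid + 1  # target can only be in the upper half
        else:
            hi = mid - 1  # target can only be in the lower half
    return -1, probes


if __name__ == "__main__":
    # Worst-case probes = floor(log2 n) + 1 = n.bit_length()
    for n in (300_000, 300_000_000):
        print(n, n.bit_length())  # 19 for 300,000; 29 for 300,000,000
    print(binary_search(range(300_000), 123_456))
```

Growing the collection a thousandfold, from 300,000 records to 300 million, adds only about ten probes to the worst case.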
