abstract: my experiences in looking after a Burroughs B7800 mainframe at the Monash University Computer Centre
copyright: this article is copyleft, no rights are reserved by the author Ralph Klimek 2008
A brief history of the Burroughs mainframe at Monash University and my part in its maintenance.
My involvement with the Burroughs B7800 began in 1985, when I applied to Monash (seeking refuge from hard work!) for an engineering maintenance position under one of the most capable bosses I have ever had, Mr Bruce Seaman, a man for whom the word "gentleman" had been invented. He had been the Burroughs engineering manager and had been thoughtfully poached by the late, great Dr Clifford Bellamy, then Director of the Monash Computer Centre and some years later the foundation Dean of the Faculty of Information Technology.
At that fateful job interview I was conducted around the machine room, taking note of the large collection of VAX-11/780s and then ogling in awe at the mighty behemoth in the other room. There was row upon row of panels of blinking lights, the dull mechanical roar of nearly one hundred disk pack spindles and the farting of a row of 9-track air-column tape drives. In an adjacent room four mighty train printers ate their way through a forest's worth of paper. A team of harassed operators scuttled about the place, feeding the train printers' insatiable appetite for fan-fold paper and responding to the relentless demands of the tape drives. Concerned system programmers, with furrowed brows, perused printouts of core dumps.
I had already known of the fabled mainframe at Monash and the surprising fact that it had been purchased from Burroughs for one dollar. I asked Mr Seaman a fateful question: why did Monash spend what was then a large amount of money and human resources on this behemoth instead of buying minicomputers (like VAX, Prime or Data General)? The answer was delivered without a moment's hesitation: "You cannot buy thirty VAXes for one dollar." We began to talk about the merits of various logic chip families and he mentioned the mysterious CTuL logic family, about which I knew nothing. I could tell by this stage that he was to be no ordinary boss; he was very much a hands-on engineer and immensely practical. He asked, "How does a transistor work?" I began my answer with a bumbling description of holes and minority carriers and sensed some irritation. I said, "I think what you really want to hear is this. If you put a little bit of current into the base, you get lots of current in the collector." He grinned. "You are the only would-be engineer today who actually told me what a transistor does." I felt hired!
The Director of the Computer Centre, Dr Cliff Bellamy, was no mere administrator. He had a PhD in nuclear physics and had done fundamental work in operating system design for Burroughs. The Computer Centre already had a long association with Burroughs, having had a B5000 and a B6700 long before I joined. The B7800 had been traded in by the Gas and Fuel Corporation, and Burroughs had been able to get some taxation advantage by giving it to Monash for one dollar. It was a good deal all round. Burroughs got a tax break and moved obsolete inventory; Monash got a third-generation mainframe; and Burroughs got licence fees for the MCP (Master Control Program) and a host of applications and databases, and must have hoped for a lucrative maintenance contract that would have been worth about one million 1985 dollars per annum. Dr Bellamy clearly thought otherwise: Monash would do the system maintenance in house. This was a very courageous thing to attempt. He had Bruce, an MCP source code licence, the schematics, the site, power, light and air, but not one million dollars for a maintenance contract!
So there I was, with an awful lot to learn. I had no mainframe experience, but I had at least come from the printing industry with direct experience in maintaining half-million-dollar equipment, so big-ticket items did not scare me that much. My first assignment was to attend to a train printer. This particular one always blew its paper motion servo amplifier; it would run for only a few weeks before blowing it up yet again. Accepted practice had been to replace the fuses and the dead transistors and move along. I also replaced the dead power transistors, but I noticed that the circuit board had endured many, many repairs; most of the solder pads had detached themselves after repeated unsoldering and resoldering of transistors. I studied the circuit looking for clues. The paper motion motor was a DC servo motor, not a stepper. Motion control relied on an optical shaft encoder, a hard-wired, microprogrammed TTL controller, a half-horsepower DC servo motor and this rotten amplifier. The other engineers laboured under the illusion that the designers knew what they were doing. My experience in the printing industry had taught me that there were many ways to do things: some actually work well, others just barely work and most just don't work. This amplifier just barely worked, long enough to pass quality control and print a demo page in the showroom! I proposed to alter the circuit to prevent blowouts. "It's your neck," said my immediate superior. Bruce did not tolerate ad hoc engineering changes, with good reason of course. So I altered the amplifier circuit and the printer never experienced this kind of malfunction for the rest of its service life. Score one to me. A line printer with a failed paper motion servo was a hilarious thing to behold. A fountain of fan-fold paper would be ejected at high speed into the air, with a half-horsepower motor to drive it.
In seconds one box of paper would be squirted out, covering the erring printer in a mountain of scrap paper and its attending engineer in shame! More about train printers later.
The next challenge was a rather large switchmode power supply. The B7800's main power came from a bank of rather large 20 kVA three-phase transformers which stepped the incoming mains down to 110 volts, the Burroughs being built to US power standards. A star-connected half-wave rectifier (yes, really, half wave!) using phase-triggered SCRs created a master raw DC supply at about 140 volts. This fed a choke that must have weighed nearly 100 kg and a capacitor bank of about 100000 microfarads. This stored an immense amount of energy and provided an iota of protection against mains transient overvoltages but, as it turned out, not enough. Every time there were thunderstorms or even brownouts, one or more of the switchmode power supply modules would die. The mainframe housed about thirty of these PSU modules, which generated the 4.5 V and minus 2 V rails for the CTuL-based logic boards. The PSU had a discrete pulse-width-modulated controller with balanced half-bridge power drive to a toroidal transformer. Its output was rectified with a pair of enormous stud-mounted power diodes and fed in parallel with all the other supplies. There was a feedback and control mechanism whereby one or two supplies were in "voltage mode" and all the others were effectively in "current mode". The PSUs were repaired on a special-purpose test jig which contained a model of the mainframe power sequencer, huge dummy loads, a primary DC supply and metering. We possessed exactly the right number of PSUs plus one or two hot-swap spares, which could be inserted into the live mainframe after repair. The test jig was there to ensure that your repair would work. If you were lucky, you inserted the repaired PSU and nothing happened. Quite often, however, the replacement would literally go bang. The PSU I was challenged with had defied all attempts at repair. It powered up and produced the rated voltage, but would not supply the rated current.
If it could not supply the rated current it would fail to "cooperate" with its parallel-connected sisters. So I began by following the written adjustment procedures. Slowly I cranked up the dummy load and, as predicted, it failed to produce its rated 135 A. "We did that a hundred times too," I was told. "Well, what do you think?" I was still the new kid on the block and didn't think anything. "Can I see the circuit diagram?" I like to try to understand a piece of electronics before even following a "procedure". I got a skeptical look. "We changed the circuit board and it still doesn't work properly." "Let's see the circuit." I felt my boot sinking in the quicksand. So I pored over the circuit diagram, and clearly there was nothing wrong with that! My new colleagues got bored and went about their business. I got busy with the scope and the circuit. The PSU controller was a standard PWM design: a free-running multivibrator, a pulse integrator to produce a ramp, a comparator and switch driver, and an elaborate feedback circuit. The free-running frequency of the multivibrator was clearly too low for the main toroidal power transformer. Maybe the winder had had a bad day and some turns hadn't been wound, or the spool had run empty before the transformer was completed. Yet it was still encapsulated and shipped, and the product never worked properly. So I suggested that the multivibrator frequency was about ten percent too low and (shock horror) a simple modification would make it work. Did I have permission to try it? "Well, I dunno," said the "unter" boss. He did not ask the "uber" boss because the answer would of course have been negative. "I'll be very careful." "It's your neck..."
So my massive and total re-engineering consisted merely of adding one resistor, which had the calculated effect of raising the free-running rate by about ten percent. On the test jig, the PSU now produced the rated current. I let it stew there for a while, not quite trusting my good sense and fortune. Better still, it even survived power cycling. The true test would be to demonstrate it working with all the skeptics watching! I called the "unter" boss in after lunch. "Er... um... I fixed it." "That silly PSU?" "Yep." "Er... how?" The "uber" boss was duly summoned and the fix explained. Even so, he directed, it was not to be used except in emergencies. That soon happened, with the next power hit taking out more than the usual number of PSUs, and my fixed, modified PSU went into the mainframe, where it stayed. The challenges continued, and the resolution of the next one again relied on understanding the circuit. I was shown a switchmode power supply from an early PC, from a time when they cost a lot of money. "Nobody" had been able to fix it. One of the primary mains rectifier filter capacitors kept blowing up, taking the bridge diodes and fuses with it. They had been replaced a number of times, always with the same effect. "Fix that!" (Let's see how good you are... haw haw, or something along those lines.) It looked like this faulty device was used to test would-be engineers. There was of course no schematic available. I resisted the urge merely to repeat what others had tried to no avail. I got out the most powerful instruments known in the engineer's toolkit, an HB pencil and paper, and proceeded to trace out the circuit. It looked like a typical switchmode supply: mains input via a common-mode suppression choke, a carbon surgister, the rectifier, two large filter caps, and so on. The circuit I traced didn't make sense. The others who had gathered around to gawk got bored and went about their business. My circuit still did not make sense, and then I saw the light.
The two filter electrolytics were wrong: one of them was connected in reverse polarity, which would have destroyed it. I made triply sure. I had based my hand-drawn circuit on the board as the others had left it after their prior repair attempts. Lacking a schematic, they had followed the silk screen on the PCB. The silk screen was wrong! The others had faithfully followed it verbatim, with the same negative outcome every time. I followed my own schematic and soldered in new components the way I considered the circuit should be. It worked. For my trouble I had now earned the privilege of fixing the faults that others considered intractable.
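The B7800 PSU fix described above rests on one piece of transformer arithmetic. For a core driven by a symmetric square wave, the peak flux density is B = V / (4 f N A), so a transformer that is short of turns runs at a higher flux density, closer to saturation, at the design frequency, and a proportionally higher switching frequency brings the flux back down. The sketch below is a hypothetical back-of-envelope check, not the Burroughs schematic: the primary voltage, frequency, turns, core area and timing resistor are illustrative assumptions, and it assumes the multivibrator's period scales with a single timing resistance, so that one extra resistor in parallel raises the frequency.

```python
def peak_flux_density(v_primary, freq_hz, turns, core_area_m2):
    """Peak flux density (tesla) in a core driven by a symmetric square wave.

    Volt-seconds per half-cycle: v * (1 / (2 f)) = N * A * (2 * B_peak),
    so B_peak = v / (4 * f * N * A).
    """
    return v_primary / (4.0 * freq_hz * turns * core_area_m2)


def parallel_resistor_for_speedup(r_timing_ohms, speedup):
    """Resistor to place across a timing resistor to raise f by `speedup`.

    Assumes (hypothetically) that the oscillator period is proportional to
    the timing resistance, as in a simple RC multivibrator, so f ~ 1 / R_eff.
    Solves 1 / R_eff = 1 / R + 1 / R_p for R_p with R_eff = R / speedup.
    """
    r_eff = r_timing_ohms / speedup
    return 1.0 / (1.0 / r_eff - 1.0 / r_timing_ohms)


if __name__ == "__main__":
    # Hypothetical numbers: 70 V across the primary (half of a 140 V bus in
    # a half bridge), 5 kHz, 20 turns, 8 cm^2 core cross-section.
    nominal = peak_flux_density(70.0, 5_000.0, 20, 8e-4)
    short_turns = peak_flux_density(70.0, 5_000.0, 18, 8e-4)  # 10% fewer turns
    fixed = peak_flux_density(70.0, 5_500.0, 18, 8e-4)        # and f raised 10%
    print(f"nominal {nominal:.3f} T, short of turns {short_turns:.3f} T, "
          f"after fix {fixed:.3f} T")
    r_p = parallel_resistor_for_speedup(10_000.0, 1.10)
    print(f"resistor across a 10 kOhm timing resistor: {r_p:.0f} Ohm")
```

With these assumed values, ten percent fewer turns raises the peak flux by about 11 percent, a ten percent higher frequency brings it back to within about 1 percent of nominal, and the parallel resistor needed for that ten percent increase works out to ten times the timing resistor.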
The really funny thing about the B7800 primary DC supply was that there was no transient protection on the raw AC supply. Transient overvoltages are a fact of life, but the designers had chosen to overlook them. This is why, every time the lights blinked (always at lunchtime, naturally), our hearts sank as we looked forward to fixing these PSUs. Nearly always the output transistors failed; usually a powerful internal arc would melt a hole through the case before the module's 20 A magnetic breaker tripped. These transistors must have been at the cutting edge of the state of the art in the mid seventies. They had to switch 10 A at 140 V at about 5 kHz. As the original transistors blew, we replaced them with the Motorola MJ13335, a very rare and hard-to-obtain device that could sustain this very onerous requirement. The PSU controller had no soft-start facility, so while starting up these poor transistors would operate outside their safe operating area (SOA) for a few milliseconds. We became quite adept at replacing these power transistors and the 1 ohm base drive resistor that also served as a fuse. They failed often, and the daily morning drill was to patrol all mainframe cabinets looking for PSUs that had tripped their breaker. It was a rare morning when all had survived the night. I knew about metal oxide varistors (MOVs), and MOVs of the required energy rating were available. I never succeeded in convincing the boss that we should apply them to our raw mains. That was a pity; it might have prevented a lot of unwanted work.
The most striking demonstration of the brute-force effectiveness of understanding circuit theory, or failing that, of understanding just how things work, was the successful explanation of one of the most perplexing faults the mainframe team had encountered. One of the CPU mainframes would randomly and spontaneously power itself down, maybe once every two or three days. This was a very expensive fault, as it could take down the entire system, though not always. The B7800 was a MULTIPROCESSOR system (one of the world's first such commercial systems), and a missing CPU did not always mean the end of operations. Still, this fault had to be found and fixed; there was no option of a workaround. My boss, given the very serious nature of this intermittent fault, took primary responsibility for finding it. So began a long process of elimination and the difficult business of setting up a storage oscilloscope, hoping that within three days it would trigger on the target event and that we could interpret the meaning of the captured trace. After a couple of weeks, it was determined that the primary raw 140 V DC supply bus would sag to about 90 V before the undervoltage alarms tripped and initiated an emergency power-off. We then suspected the primary SCR half-wave rectifier. The SCR gates were driven through a very simple circuit designed to trigger each SCR just after its zero crossing. The use of the big SCRs also permitted a very simple power switch instead of a huge magnetic circuit breaker that would have had to switch about 150 A. The SCRs were always live, built into a large cage with raw, permanently hard-wired mains power. No one ever had business in the cage, and you wouldn't want to anyway. After relentless probing, one two-day iteration at a time, the scope lead had to go into the SCR cage. We sourced a "power scope", as we needed a floating reference at mains potential.
With great reluctance the boss opened the cage and looked for places to attach the scope probes. Finally, the fault we had suspected triggered the power scope, which in turn triggered our storage scope. One SCR had, for just one cycle, missed its trigger pulse, causing the primary DC bus to sag for just 20 ms and an undervoltage trip to follow. Now what do we do? We had found the smoke from the gun, but where was the gun?
At this juncture things became very nasty. There was an auxiliary power transformer. It was a three-phase job, weighed only 60 kg, had just under one hundred terminals, came in a blue-green sealed tank enclosure and, just to be nasty, was filled with PCB transformer oil. It supplied the main power sequencer and the PSU regulators with their logic and operational DC power. Additional auxiliary windings provided the phase pulses that drove the main SCR gates. The terminals had already been resoldered with a big iron to make sure that there were no dry joints, an ever-present source of intermittent faults. That hadn't fixed it. Eventually, after one weekend, we returned to work to find Bruce the Boss looking triumphant. The scope had triggered, and there it was: the captured missing pulse, or the pulse that could have been. There was now no doubt that the SCR hadn't been triggered properly. Now what? Again I suggested that this awful transformer was to blame. Again I was told to think of something else. A power transformer is the simplest piece of electronics there is. It's passive, it's big and heavy, and nothing could possibly go wrong with something that simple. It wasn't a dry joint, because we had resoldered everything in sight! The prospect of sourcing an "unobtainable" replacement and then fitting it did not bear thinking about. I went away, drew and redrew the triggering circuit from the official schematics and went back to theory. A good trigger pulse must look like so-and-so, and we knew what a bad trigger pulse looked like. What "circuit" would generate our captured bad pulse? Eventually I saw it. The dry joint was inside the sealed transformer tank, on a through terminal, beyond inspection and beyond repair. The tank contained PCB, and the idea of opening it was not to be entertained. I redrew my schematic, wrote a little essay about my conclusion and gave it to Bruce. It was read with some reluctance. The conclusion was ugly.
A replacement component would cost many thousands of dollars, and that would need executive buy-in. Fortunately, that executive was the great Cliff Bellamy, and he could understand electronic schematics. Again I presented my case. You cannot argue with physics, and the physics said that under these random conditions that inner terminal would open and the SCR would not be triggered. When it opened it would produce precisely this misshapen pulse. Approval to replace the transformer was granted. It never failed again. The old transformer was not discarded but carefully labelled "emergency use only".
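The figures in the SCR story can be turned into a rough energy budget for the raw DC bus. This sketch assumes the capacitor bank was exactly 100000 µF (0.1 F), treats it as the only energy source and assumes a constant-power load, so the time for the bus to fall from V1 to V2 is t = C (V1^2 - V2^2) / (2P). The text does not give the mainframe's load power, so the load values below are hypothetical. The model ignores the choke and the two phases that keep conducting when one SCR misses its pulse, so it gives the shortest possible time to reach the trip level for a given load, not a prediction of the actual event.

```python
"""Rough energy budget for the B7800 raw DC bus (illustrative only)."""

C_BANK_F = 0.1    # about 100000 microfarads, as quoted in the text
V_BUS = 140.0     # nominal raw DC bus, volts
V_TRIP = 90.0     # bus level reached before the undervoltage trip, volts
MAINS_HZ = 50.0   # Australian mains frequency


def stored_energy(c_farads, v_volts):
    """Energy in a capacitor: E = C V^2 / 2 (joules)."""
    return 0.5 * c_farads * v_volts ** 2


def holdup_time(c_farads, v_start, v_end, load_watts):
    """Seconds for a constant-power load to pull the bank from v_start to v_end."""
    released = stored_energy(c_farads, v_start) - stored_energy(c_farads, v_end)
    return released / load_watts


def ripple_frequency(mains_hz, pulses_per_cycle):
    """A p-pulse rectifier produces ripple at p times the mains frequency."""
    return mains_hz * pulses_per_cycle


if __name__ == "__main__":
    print(f"stored at {V_BUS:.0f} V: {stored_energy(C_BANK_F, V_BUS):.0f} J")
    usable = stored_energy(C_BANK_F, V_BUS) - stored_energy(C_BANK_F, V_TRIP)
    print(f"released falling to {V_TRIP:.0f} V: {usable:.0f} J")
    for load_kw in (10, 20, 40):  # hypothetical load levels
        t = holdup_time(C_BANK_F, V_BUS, V_TRIP, load_kw * 1e3)
        print(f"{load_kw} kW load: {t * 1e3:.1f} ms to reach the trip level")
    # A three-phase star (half-wave) rectifier is a 3-pulse circuit.
    print(f"3-pulse ripple: {ripple_frequency(MAINS_HZ, 3):.0f} Hz")
    print(f"one mains cycle lasts {1e3 / MAINS_HZ:.0f} ms")
```

At 140 V the bank holds about 980 J, of which about 575 J is released before the bus reaches 90 V. The 20 ms sag in the story is exactly one cycle of 50 Hz mains.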
By this stage I was trusted with servicing the disk pack drives. It required attention to detail and scrupulous cleanliness. There was much routine, mundane and fiddly precision work required, and Bruce, who had kept this job to himself, was glad to be rid of it. Absolute filters required changing at six-month intervals, and head alignment was an ongoing requirement. Then came a signal lesson. The spindles that supported and whirled the 12 kg disk packs at high speed contained large precision bearings with low noise and low runout. This was tightly specified because too much bearing noise or runout would result in loss of the disk head servo, lost data or corrupted disk packs. It became my task to remove the spindle from a drive and replace the bearings. The physics department at Monash had a world-class metalworking workshop, and they had the skill and the presses to force out and force-fit the bearings. I ordered replacement bearings, or what I thought were replacement bearings. I had selected them on the basis of shaft size, housing size and clearance. My replacements had been fitted. "What bearings did you order?" asked Bruce. "Err... type 6502..." "I thought I told you to order..." (crossly pointing at a 20-digit string on the whiteboard). Whoops. "They cost over $500 a set," I offered. I had ordered automotive axle-grade devices, and they were considerably cheaper. Bruce was the engineers' engineer and he never let the opportunity for a valuable lesson slip past. "Install that useless bearing and then tell me what's wrong with it." So, somewhat sheepishly, I installed the spindle, wondering what to expect. What could possibly go wrong? I prepared the drive and put a test pack up. Funny: it could not read data and, even less funny, the track servo would not lock. Nothing I could do and no adjustment I could make would make that disk drive work with my new spindle. A look with the scope at the track lock servo signal gave the game away.
It was jumping all over the place, and by then I knew what a good servo signal was meant to look like. "Well?" "Sorry Boss, the cheap bearings are no good; they have too much irregularity." With bearings you only get what you pay for. Disk pack spindles had to be better than clockwork.
The disk pack drives, in the normal course of events, exhibited their own wonderful collection of pathologies. They were physically large, monstrously heavy but otherwise well-engineered machines. One of the more annoying pathologies was their propensity to spin down and shut down at random, removing all evidence of what had triggered the emergency power-down. Needless to say, this bad behaviour was expensive in terms of damaged jobs, wasted time and lost production. It would of course be fatal to the machine when it happened to the HaltLoad Pack, System Pack or Swap Pack. The fault was unrepeatable, and its low frequency of occurrence made it undebuggable with the conventional means available to us. So it became my task to remedy this, and it required unconventional means. So was born my simple logic state analyser.
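The text does not describe how the logic state analyser worked, so what follows is only a hypothetical sketch of the general idea such an instrument relies on. It keeps recording the drive's status lines into a circular buffer and freezes the buffer when the trigger condition (here, the power-down line) asserts. The states leading up to the event survive even though the drive itself has erased the evidence. The bit names, buffer depth and class name are invented for illustration.

```python
from collections import deque


class PreTriggerCapture:
    """Record a rolling window of logic states and freeze it on a trigger.

    Each clock() call stores one sample (an int whose bits stand for status
    lines). When trigger(sample) is true, capture stops, so the buffer holds
    the `depth` most recent samples up to and including the triggering one.
    """

    def __init__(self, depth, trigger):
        self.buffer = deque(maxlen=depth)  # oldest samples fall off the front
        self.trigger = trigger
        self.frozen = False

    def clock(self, sample):
        if self.frozen:
            return  # ignore everything after the trigger
        self.buffer.append(sample)
        if self.trigger(sample):
            self.frozen = True

    def snapshot(self):
        return list(self.buffer)


# Hypothetical bit assignments for illustration only.
SPINDLE_OK = 0b001
FAN_OK = 0b010
POWER_DOWN = 0b100

if __name__ == "__main__":
    cap = PreTriggerCapture(depth=4, trigger=lambda s: bool(s & POWER_DOWN))
    states = [SPINDLE_OK | FAN_OK] * 10 + [SPINDLE_OK, SPINDLE_OK, POWER_DOWN, 0, 0]
    for s in states:
        cap.clock(s)
    for s in cap.snapshot():
        print(f"{s:03b}")
```

In the example the frozen buffer shows the fan-OK bit dropping two samples before the power-down line asserts, the kind of pre-trigger history the drive's own shutdown would otherwise destroy.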
Monash had two models of disk pack drive installed, the Model 225 and the Model 235. The external appearance of these units was identical. Two large pull-out drawers each contained one drive, which spun a 14-inch disk pack. There were 20 heads in a drive. The head tower ran on a small raceway, actually a precision railway track, and was moved by a one-kilowatt linear motor. The spindle motor, linear motor, card cage and absolute filter assemblies were mounted on a solid casting which weighed about 60 kg. Mechanically, these drives were built like tanks, and they had to be. The head tower, with its bearings, carriage and complement of heads, weighed nearly 1 kg, and thrusting it through its full travel in less than 10 ms imparted bending moments on the casting large enough to affect head alignment. The casting had to permit no appreciable flexure along the axis of the linear motor. The 225 drive stored about 70 Mbytes. The heads required absolute alignment with the site alignment pack. Head alignment had to be religiously maintained, because a pack written on one spindle had to be readable on all the others. The track servo was derived from an optical grating built into the raceway, and here lay the nightmare scenario in the Model 225 drive. The source of illumination for the optical grating servo system was an incandescent light bulb built into a cavity in the track raceway. To change the light bulb one had to remove about half the heads on the head tower; a more thoughtless design was hard to imagine. It was an effective strategy to encourage customers to maintain their service contracts. Apart from the use of an incandescent bulb, the track servo mechanism was very simple and rock solid in practice. The grating pulses were counted so as to derive knowledge of where the head tower was actually located. The analog signal coming from the grating transducers provided the servo detent that maintained track lock.
There was a small analog computer which tracked temperature changes, since these would change the linear reach of the heads. Apart from the rotten light bulb, the 225 drive, although rather small in capacity even by the standards of the mid seventies, had rock-solid reliability provided that head alignment was maintained. We possessed an off-line disk pack drive exerciser unit, so maintenance had no system impact.
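The Model 225 description says only that the grating pulses were counted to track the head tower's position. Counting direction as well as pulses needs two signals. A common way to get them, assumed here rather than taken from the Burroughs manuals, is a pair of grating transducers offset by a quarter of a line so that their outputs A and B are in quadrature. This sketch decodes such (A, B) samples into a signed position count; the names are hypothetical.

```python
# Forward motion steps through the (A, B) states in this order; reverse
# motion steps through them backwards. Only one signal changes per step.
QUADRATURE_SEQUENCE = [(0, 0), (0, 1), (1, 1), (1, 0)]
_STATE_INDEX = {state: i for i, state in enumerate(QUADRATURE_SEQUENCE)}


def count_positions(samples):
    """Return the net signed count for a sequence of (A, B) samples.

    Each step to the next state in the sequence adds +1 (forward) and each
    step to the previous state adds -1 (reverse). A jump of two states means
    a state was missed, so the direction is ambiguous and we raise instead
    of guessing.
    """
    position = 0
    prev = _STATE_INDEX[samples[0]]
    for sample in samples[1:]:
        cur = _STATE_INDEX[sample]
        step = (cur - prev) % 4
        if step == 1:
            position += 1
        elif step == 3:
            position -= 1
        elif step == 2:
            raise ValueError("skipped a quadrature state; sampling too slow")
        # step == 0: no movement since the last sample
        prev = cur
    return position


if __name__ == "__main__":
    forward = [QUADRATURE_SEQUENCE[i % 4] for i in range(9)]
    print(count_positions(forward))                      # eight steps forward
    print(count_positions(forward + forward[::-1][1:]))  # there and back again
```

In this scheme each grating line yields four counts. The text does not give the mapping from counts to tracks, so the sketch stops at the raw count.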
The Model 235 drive was a state-of-the-art unit which pioneered embedded track servo data, interleaved in the inter-sector gaps as a special factory-recorded servo burst. The great advantage of the 235 drive and its embedded servo was that disk packs written at one site could always be read with confidence, not only on other spindles but also at other sites. Because alignment packs were all slightly different, a 225 pack could not be reliably read at a different site that had been aligned to another alignment pack. The 235 drive's servo could also compensate, up to a point, for spindle bearing eccentricity and runout. The track servo permitted tracks that were closer together; as I recall there were 813 tracks, about double the number on the Model 225. They were also known as quad-density drives, and the hierarchy went something like this. The Model 215 drive used FM encoding (a biphase code related to Manchester code), the 225 used NRZI coding, and the 235 used NRZI with embedded track servo. The 235 had a more powerful head servo with faster seek time and, to compensate, a monstrously heavy deck casting. A quad-density pack stored 170 Mbytes, and at its peak Monash had 80 spindles and "muggins" was their slave! Routine maintenance consisted of measuring the absolute filter back pressure and changing the filter if the air flow was down. A disk pack drive absolute filter had to remove particles as small as bacteria while maintaining enough airflow to give the disk enclosure net positive air pressure to keep dust and vermin out. Spinning 14-inch, 12-platter packs at 3600 rpm resulted in about 700 watts of thermal dissipation in the drive enclosure, and this hot air had to be removed and replaced with surgically clean air. The HEPA absolute filters they used were a biohazard because they could trap bacteria and fungal spores. The manuals warned about avoiding injury, with good reason. Once a used filter I was removing slipped and I grabbed it.
I cut my hand on the dirty side of the filter element. The cut became infected and did not heal for nearly half a year! A filter that sat idle for a couple of weeks would smell like a mushroom farm, as the trapped fungal spores would spawn and begin eating the trapped dust. So effective were the fungi at digesting dust that a "clogged" filter could be reused (for emergency purposes only!) after standing idle for about three weeks. After the Christmas break, when the machine was powered down for about ten days, the data centre would smell distinctly mouldy when we spun up the idle drives. The other essential duty was to clean the heads. Normally you would never touch the heads, but every few weeks I had to inspect the drive heads and clean off a build-up of black gunge that accumulated on their leading edges. The gunk was residual dust, and it would cement itself firmly to the head. Failure to remove it in a timely manner would result in the dreaded head crash.
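The encoding hierarchy of the Model 215, 225 and 235 drives comes down to how many flux transitions each bit costs. In FM every bit cell starts with a clock transition and a 1 adds a second, mid-cell transition. In NRZI a 1 is a single transition and a 0 is none. The sketch below is a simplified illustration, not the drives' actual write logic, and counts transitions for both. FM needs up to two per bit and NRZI at most one, which is why NRZI can pack about twice as many bits into the same maximum transition density. The price is that a run of 0s produces no transitions at all, so the read side has to regenerate the clock.

```python
def fm_encode(bits, level=0):
    """FM: two half-cell levels per bit.

    The write current flips at the start of every bit cell (the clock) and
    flips again in mid-cell when the bit is 1.
    """
    out = []
    for bit in bits:
        level ^= 1            # clock transition at the cell boundary
        out.append(level)     # first half of the cell
        if bit:
            level ^= 1        # data transition in mid-cell for a 1
        out.append(level)     # second half of the cell
    return out


def nrzi_encode(bits, level=0):
    """NRZI: one level per bit; flip the write current for a 1, hold it for a 0."""
    out = []
    for bit in bits:
        if bit:
            level ^= 1
        out.append(level)
    return out


def transitions(levels, start=0):
    """Count level changes (flux reversals), starting from level `start`."""
    count = 0
    prev = start
    for level in levels:
        if level != prev:
            count += 1
        prev = level
    return count


if __name__ == "__main__":
    data = [1, 0, 1, 1, 0, 0, 0, 1]
    print("FM transitions:  ", transitions(fm_encode(data)))
    print("NRZI transitions:", transitions(nrzi_encode(data)))
```

For n bits containing k ones, FM produces n + k transitions and NRZI produces k. An all-ones pattern therefore costs FM twice as many transitions as NRZI, while an all-zeros pattern gives NRZI none to recover a clock from.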
Reviewing the daily error logs would show particular drives accumulating more than their fair share of read parity errors. The cause had to be found before the dreaded write parity errors occurred, as these would lead to aborted jobs. The usual suspects were bad media, head alignment or spindle wear. Despite the embedded servo, head alignment was still an issue, and excessive head-to-head tolerance would manifest itself as servo seek faults when switching from head to head during an extended read or write. Consistent read parity errors were indicative of worn-out spindle bearings, and this could be confirmed by observing the servo waveforms for well-known patterns of jitter. Eventually one got to know the look and feel of "bad servo". Head alignment was fun and scary. The drive was made to reseek using a known "reference" head, and a computer program was run that switched between the reference head and the head under test. We would align the head to minimise the servo offset. The really fun bit came if you had been just a tad clumsy and pushed the loosened head too far: the drive would detect a "servo fault" and perform an emergency head retract. This was a one-kilowatt linear motor that could easily have removed fingers, crushed tools and performed random orthopaedic surgery on a thoughtlessly placed hand! The worst the Model 235 ever did to me was eject my alignment tools across the room and cause me to utter some unkind phrases. Despite the considerable risk of inducing a head crash during this procedure, it never seemed to happen. The disk drive was full of fans, 240 V AC fans, and these things can have the darndest failure modes. There was a fan cooling the servo amplifiers. As it failed, the servo amplifiers became hot and eventually caused a servo-failure emergency retraction. If the fan motor became intermittent, its inductive spikes would cause random pandemonium.
The only effective way of dealing with them was to visit each spindle and insert a tie wrap into the fan. The resulting noise quickly informed the listener of the fan's status. This did not harm the fan, and if the fan was healthy you got a nice grinding noise. The Boss was braver than me and used a type-one finger to sense air flow, but he did get bitten occasionally. He tried this trick on the huge "Tarzan" fans at the bottom of the mainframes and got painfully bitten there too. Call me a wuss, but I still think a tie wrap is the better fan rotation detector!
Head crashes were an ongoing concern. The Burroughs drives were very resistant to crashes, largely thanks to their rather large heads. The Model 225 heads were nearly 15 mm in diameter, and while we had them I don't recall a crash on these disks. We once had a major power failure, and one of the drives failed to perform its emergency head retract. The heads landed on the disk. In this instance it was impossible to spin the drive up again because of the interlocks. I had to reach into the drive enclosure and drag the head tower back to the unloaded position. Poor heads, poor disk. It was, of all the disks to choose, the HaltLoad pack. The HL pack was what the Burroughs booted from. We thought we were doomed: it was a major exercise to build an HL pack from scratch, booting off a card deck, loading the standalone tape boot and then loading the system pack build utility. We spun up our "crashed" HL pack; it spun up and loaded... and we waited for the grinding noise that announces a crash. We waited in vain. The Boss said boot it, and we did. The Burroughs ran just fine. The Model 235 heads were smaller and rode closer to the platter surface, and they did crash. Crashes were mainly due to contamination of the enclosure, sometimes by labels that accompanied the disk when it was loaded, sometimes by vermin. I once removed the remains of a moth that had been introduced into the drive enclosure, probably during loading. The absolute filters could rupture and let unfiltered air in, or when they were really clogged there would be insufficient air to pressurise the enclosure against dust ingress. The Burroughs drives also had a long purge time. The purge time was a fixed interval during which the platters were spun without the heads being loaded. The idea was to let centrifugal force expel any dust that might have settled on the platters during loading before the heads were loaded. The Burroughs purge time was about 30 seconds.
The Digital Equipment Corporation RM03 and RM05 drives (rebadged Control Data Storage Module Drives) had only a 5-second purge time, and crashes were excessive and frequent until we modified the purge time to 30 seconds. The Burroughs packs also had a fine nylon debris filter on the bottom air flow port. This filter was held in place by a rubber O-ring under tension. This was a very bad design, as rubber under tension will fragment as it ages and shed powdery, sticky gunge into the air flow. If we were not diligent, the O-rings would shatter, with the filter grinding and O-ring fragments flying into the disk enclosure. A full head crash could be very expensive: Burroughs charged nearly five hundred dollars for one replacement head, and a drive used twenty of them. Until we were able to purchase used drives on the second-hand market, a head crash meant that the particular drive was taken out of service and forgotten about. Typically about five or six heads would be too badly damaged; the others, after meticulous cleaning, could be reused provided that the flying surface was not scored.
The most difficult faults to fix were the ones that caused a servo emergency head retract. Such a fault was nearly impossible to probe, because the retract quickly removed the evidence of what had caused it. Conversely, a runaway servo would slam the head tower, and twenty thousand dollars' worth of heads, into the head carriage stop at the end of the raceway, at worst requiring you to realign the heads. Servo faults could be repaired by disconnecting the linear motor drive, loading the head carriage manually and probing the circuits.
The track servo was highly innovative. I am not sure what IBM was doing at this time, but they could not have been far behind. The inter sector gap, blank on older drives, had an analog pattern recorded in it that produced a polarity symmetric pulse train encoding, in Gray code, the track number as well as the coarse and fine servo. The servo pulse was timed, gated and stored: the positive pulse was sampled and held, then compared with the later negative pulse. This generated a DC offset that was amplified and applied to the linear motor to hold the heads on track. The track number code pulses provided the fine servo, using the same method. As the heads traversed the platter, the track servo code was read and decoded, so the logic learned where the heads actually were and could modulate the linear motor current to slow the carriage as it approached its destination. When the track code matched the desired one, the servo went into detent mode, whereby track alignment was maintained until it was told to seek to another head or track. If the head stack was correctly aligned, no reseek should be required when switching heads, only a resync. This mechanism, although complex, was generally extremely reliable, and it is reproduced in every modern "voice coil" hard disk. So ubiquitous is this technology now that the term "voice coil", and what it meant, has fallen out of common knowledge.
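The point of Gray coding the track number is that adjacent tracks differ in exactly one bit, so a code read while the head is crossing between two tracks can only be wrong by one track. The sketch below illustrates that property in Python, together with a toy position error signal formed from the held positive and negative pulse amplitudes. It assumes the standard reflected binary Gray code and treats the pulse amplitudes as plain numbers; the function names are hypothetical, and nothing here describes the drive's actual decoder logic.

```python
def to_gray(n: int) -> int:
    """Binary track number -> reflected binary Gray code."""
    return n ^ (n >> 1)


def from_gray(g: int) -> int:
    """Gray code -> binary track number, by XOR-ing in every right shift."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def position_error(positive_peak: float, negative_peak: float) -> float:
    """Toy position error: held positive sample minus magnitude of the negative one.

    Zero when the head is centred on the track; the sign says which way the
    linear motor should push. Illustrative model only, not the drive's circuit.
    """
    return positive_peak - abs(negative_peak)


if __name__ == "__main__":
    for track in range(6):
        print(track, format(to_gray(track), "04b"))
    # Neighbouring tracks differ in exactly one bit, so a code read mid-crossing
    # is one of the two adjacent track numbers.
    print(all(bin(to_gray(t) ^ to_gray(t + 1)).count("1") == 1 for t in range(1023)))
```

A plain binary track counter would not have this property: between tracks 7 and 8 all four low bits change at once, and a read taken mid-crossing could decode as almost any track.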
The read and write circuits were implemented in Motorola Emitter Coupled Logic for the required speed. The read discriminator used a phase locked loop to generate a reference clock and window functions to run the NRZI decoder. The raw data, along with the recovered read clock, was despatched to the drive controller together with copious unary status signals.
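In NRZI recording a flux reversal within a bit cell stands for a 1 and the absence of one for a 0, so once the PLL and window logic have located the bit cells, decoding is just a comparison of each cell with the one before it. A minimal sketch, assuming the read signal has already been reduced to one magnetisation polarity (+1 or -1) per bit cell; the function names are hypothetical.

```python
from typing import List


def nrzi_encode(bits: List[int], start_level: int = 1) -> List[int]:
    """A 1 toggles the write current polarity; a 0 leaves it unchanged."""
    level, out = start_level, []
    for b in bits:
        if b:
            level = -level
        out.append(level)
    return out


def nrzi_decode(levels: List[int], start_level: int = 1) -> List[int]:
    """A polarity change between adjacent cells decodes as 1, no change as 0."""
    prev, out = start_level, []
    for lvl in levels:
        out.append(1 if lvl != prev else 0)
        prev = lvl
    return out


if __name__ == "__main__":
    data = [1, 0, 1, 1, 0, 0, 1]
    cells = nrzi_encode(data)
    print(cells)  # [-1, -1, 1, -1, -1, -1, 1]
    print(nrzi_decode(cells) == data)
```

A long run of zeros produces no transitions at all, which is why the read clock has to come from a PLL that can coast between transitions rather than from the data pulses alone.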
The drive electronics were designed and housed to facilitate maintenance: the logic card cage was on roller guides and could be withdrawn from the drive enclosure whilst in operation to permit probing.
A supervisory circuit, consisting of numerous analog sensors and a giant AND gate driving the emergency shutdown, protected the drive, pack and data in the event of numerous physical, servo and logic faults. While data was being written the read circuits were active, and on a miscompare they would shut the drive down. This ensured that a write amplifier fault could not overwrite the track servo pulses in the inter sector gap. Other conditions monitored were the cover and drawer interlocks, spindle speed, a number of servo conditions and the state of the power supplies. The supervisory circuit had one major deficiency, however. Having triggered an alarm and shut the drive down, it did not remember or report the reason! This is what prompted me to build my simple and effective logic state analyser, which could be left running inside a drive to catch the irreproducible shutdown reason. Nearly all the drives were plagued with these shutdowns, some to the point of uselessness. The shutdowns were random and occurred on average about every 3 days.
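The trick that lets a logic state analyser catch an unreported shutdown is to record continuously into a circular buffer and freeze it shortly after the trigger (here, the shutdown line) fires, so the samples leading up to the event survive. A minimal Python sketch of that idea; the signal names, buffer depth and post-trigger count are hypothetical, and this is not a description of the author's actual analyser circuit.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


class PretriggerCapture:
    """Circular capture buffer that stops recording shortly after a trigger."""

    def __init__(self, depth: int = 16, post_trigger: int = 2) -> None:
        self.buffer: Deque[Dict[str, int]] = deque(maxlen=depth)  # oldest samples fall off
        self.post_trigger = post_trigger
        self.remaining: Optional[int] = None  # None until the trigger has fired
        self.frozen = False

    def sample(self, signals: Dict[str, int]) -> None:
        if self.frozen:
            return
        self.buffer.append(dict(signals))
        if self.remaining is None:
            if signals.get("SHUTDOWN"):  # hypothetical trigger signal name
                self.remaining = self.post_trigger
        else:
            self.remaining -= 1
        if self.remaining == 0:
            self.frozen = True  # keep the pre-trigger history for inspection

    def history(self) -> List[Dict[str, int]]:
        return list(self.buffer)


if __name__ == "__main__":
    cap = PretriggerCapture(depth=4, post_trigger=1)
    trace = [
        {"DOOR_OK": 1, "SUPPLY_OK": 1, "SHUTDOWN": 0},
        {"DOOR_OK": 0, "SUPPLY_OK": 1, "SHUTDOWN": 0},  # brief interlock glitch
        {"DOOR_OK": 1, "SUPPLY_OK": 1, "SHUTDOWN": 1},
        {"DOOR_OK": 1, "SUPPLY_OK": 1, "SHUTDOWN": 1},
        {"DOOR_OK": 1, "SUPPLY_OK": 1, "SHUTDOWN": 1},  # arrives after freezing
    ]
    for row in trace:
        cap.sample(row)
    for row in cap.history():
        print(row)
```

The glitch on the interlock line is still in the buffer after the drive has shut down, which is exactly the information the supervisory circuit threw away.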
My state analyser quickly earned its keep when it revealed that the door interlocks were being triggered by the rocking motion induced during long seeks! The other cause was harder to determine, and it actually had two factors. The servo supply was 50 volts and drew peak currents of over 20A. During some long seeks the voltage sag would trigger an undervoltage alarm. The root cause was that the large electrolytic filter capacitors had lost capacitance with age. This was hard to prove, and I had to prove it before the Boss would approve the high cost of replacing them. I measured the capacitance by building a little constant current source and timing the charge: the time required to charge a capacitor to a given voltage through a constant current is directly proportional to its capacitance. This quickly revealed that nearly half of them had less than half their rated capacitance. Replacing them removed this annoying intermittent behaviour. The other factor was more mundane. These capacitors were large and heavy and were held on the PSU PCB only by their screw terminal posts. Under the rocking of the drive the PCB copper had eroded, and on some drives there was a burnt ring where the screw terminal posts made contact with the PCB traces. These heavy capacitors should have been mechanically clamped down, with heavy flexible connections made to their terminals. I had to inspect all the drive PSUs. They were heavy, and my simple fix was harder to do than to write about!
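The arithmetic behind the constant current trick is I * t = C * dV: with a constant current the capacitor voltage rises linearly, so timing the rise through a known voltage step gives the capacitance directly. A minimal sketch; the 10 mA source, 10 V step, 20,000 uF rating and the timings are hypothetical numbers chosen only for illustration, and the "less than half of rated" flag simply mirrors the criterion in the text.

```python
def capacitance_from_charge_time(current_a: float, time_s: float, delta_v: float) -> float:
    """C = I * t / dV for a capacitor charged by a constant current I.

    With constant current, the voltage rises linearly: dV = I * t / C.
    """
    if delta_v <= 0:
        raise ValueError("voltage step must be positive")
    return current_a * time_s / delta_v


def charge_time_s(capacitance_f: float, current_a: float, delta_v: float) -> float:
    """Time to charge through dV at constant current: directly proportional to C."""
    return capacitance_f * delta_v / current_a


if __name__ == "__main__":
    # Hypothetical set-up: 10 mA source, timed from 0 V to 10 V,
    # on capacitors nominally rated at 20,000 uF.
    rated_f = 0.020
    for label, seconds in [("capacitor A", 19.0), ("capacitor B", 8.0)]:
        c = capacitance_from_charge_time(0.010, seconds, 10.0)
        flag = "REPLACE" if c < 0.5 * rated_f else "ok"
        print(f"{label}: {c * 1e6:.0f} uF ({c / rated_f:.0%} of rated) {flag}")
```

The sketch ignores leakage current. In an aged electrolytic, leakage diverts some of the charging current, which lengthens the charge time and so makes the computed capacitance read high; a capacitor that still measures low is genuinely low.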
A disk drive requires a controller and formatter. These disks could only read and write raw data pulses, in a completely arbitrary way. The mainframe had an advanced virtual memory system that knew nothing about heads, tracks and sectors; it thought about data only in terms of logical blocks. Even the IOM, the I/O data channel processor, did not care about the drive geometry. That was the function of the disk pack drive controller. The controller was connected to the PCC (peripheral control cabinet) through an object called a Host Adapter. The Host Adapter translated the mysterious IOM bus and emitted a gigantic hose, more like a hawser, of some forty odd 95 ohm coaxial cables that formed the high speed host connect bus. Either there were not enough connectors or the Host Adapter had no concept of interrupts, for there was another hose that connected the disk pack drive controller directly to the IOM. This was called the ready status line and presumably interrupted the IOM on completion of a transaction. I once had the dubious honor of installing one of these hoses on a controller in a far distant corner of our data centre. It had a raised floor, as was required. The only problem was that every floor tile in the vicinity of the new controller was trapped by the drive string. "Muggins" then volunteered to crawl under the raised floor with the cable tied to my foot. It must have taken half an hour to inch forward about twenty meters through the cable tangles. There were plenty of multi leggity beasties down there too; just as well it was pitch black, for if I could not see them, then they couldn't see me!
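The controller's job of hiding the drive geometry comes down to a mapping from the mainframe's logical block number to a physical cylinder, head and sector. A minimal sketch with a made-up geometry; the real Burroughs pack layout, and whether the controller numbered blocks in exactly this order, are assumptions here, and the type and function names are hypothetical.

```python
from typing import NamedTuple


class Geometry(NamedTuple):
    cylinders: int
    heads: int              # data surfaces, one head each
    sectors_per_track: int


class CHS(NamedTuple):
    cylinder: int
    head: int
    sector: int


def block_to_chs(block: int, g: Geometry) -> CHS:
    """Map a logical block number to cylinder/head/sector.

    Blocks fill every track of a cylinder (switching heads) before the
    carriage moves to the next cylinder.
    """
    per_cylinder = g.heads * g.sectors_per_track
    total = g.cylinders * per_cylinder
    if not 0 <= block < total:
        raise ValueError(f"block {block} outside 0..{total - 1}")
    cylinder, rest = divmod(block, per_cylinder)
    head, sector = divmod(rest, g.sectors_per_track)
    return CHS(cylinder, head, sector)


def chs_to_block(chs: CHS, g: Geometry) -> int:
    """Inverse mapping, back to the logical block number the host sees."""
    return (chs.cylinder * g.heads + chs.head) * g.sectors_per_track + chs.sector


if __name__ == "__main__":
    geom = Geometry(cylinders=800, heads=20, sectors_per_track=30)  # hypothetical
    for blk in (0, 29, 30, 599, 600):
        print(blk, block_to_chs(blk, geom))
```

Filling a whole cylinder before moving on means consecutive blocks mostly cost a head switch, which on a well aligned drive needs only a resync, rather than a seek.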
The controller cabinet could house two controllers, each connected to its own IOM via its own Host Adapter. Each disk pack drive could be dual homed on both controllers. This gave great resilience to our operations: it let us take a drive, controller or IOM offline for maintenance without impacting access to any disk drive string.
Each controller contained the heart of a B700 Burroughs Small Systems minicomputer, which proved to be extremely reliable, with the exception of the writeable control store, which used fairly primitive and hideously expensive 1K by 1 bit static bipolar RAM chips. The control program could be loaded from cassette tape or, if the mainframe was up and running, loaded online. This was lucky, because the cassette drives were strictly read only devices and the firmware cassettes were irreplaceable. The only time the cassettes had to work was just after a total power failure; otherwise, once the controller was up with some minimal firmware that permitted the mainframe to be booted, the operational firmware could be loaded online. The magic incantation to do this was always written up on the ops whiteboard. Writeable control store parity errors would halt the controller CPU. Often the fault was soft, and operations could continue if the controlware was simply rewritten. When that didn't work, or if a particular controller hiccuped too often, we would track down the WCS RAM chip that had gone soft. The controller had its very own operations front panel and diagnostic console, and there was a hard wired RAM tester that would automatically point out the dodgy RAM chip.
The drive controllers were naturally extremely important because of their intimate relationship to the ongoing health and integrity of the data they were transporting, and it was only after a long period that the Boss thought I was ready to be trusted with the controller. From the front panel it was possible to mimic the commands the IOM was sending it. It was also possible to load, from cassette or online, a maintenance and diagnostic program that replaced the operational firmware. The maintenance panel was programmed with toggle switches, and it was possible to overwrite the data on any of the spindles with a careless command. We were sternly directed never to use the write tests; the risk of smashing something was simply too high. I did have occasion to do so when I was repairing the data discriminators and needed a continuous write to probe the circuit. The diagnostic front panel was invaluable when performing head alignment. We could use system utilities, but their latency was often too high for them to be useful.
The quality of the hardware in the controllers had to be seen to be appreciated. The B700 part of the controller was built on PCBs, but the drive IO boards were all wire wrapped on marvellous low profile machined pin sockets. The boards looked like high quality wire wrap prototyping boards. Looking at a full card cage of these boards, the gold on the wire wrap pins would gleam like Aladdin's cave. We possessed schematics for the controllers and had spare controllers available for repairing controller cards. The logic family used was 74Hxxx, an early predecessor of 74 series Schottky logic, in ceramic packages, and it was extremely reliable. In about three years I only had to repair about 10 controller boards, and most of the faults were control store RAM failures. One of the boards was the cassette data decoder and DMA controller. Would you believe that the cassette drive had to do DMA? There was no loader program; a hard wired address generator would write the control store as decoded data came off the cassette tape. The cassette interface even had its own dedicated front panel and diagnostic controls.
The controllers had a remarkable repair flowchart and scripted procedure that, if religiously followed, would find any permanent logic fault. You did not even need to know how the controller cards worked. Swapping cards until things worked was never an option; they had to be debugged to component level. It was always deeply satisfying to be able to point to a chip and say "it was that one".
The peripherals that generated the most work were the tape drives. We had up to 10 Model 9545 9 track PETAPE units online. The tape drives were also the equipment the operations team knew most intimately. Tape was the archive medium, the source and sink of data and the medium of interchange.
Nine track tape came on large spools which, fully loaded, were quite heavy. So heavy that rotating them with stop start motion required the motors and servo amplifiers to supply nearly a kilowatt of power. The rapid start stop motion was performed by the tape capstan, a large diameter cylinder perforated with holes, behind which was a supply of vacuum to help the capstan grip the tape. Tape was buffered in deep vacuum columns, amply supplied with sensors so that the spool motor servos could keep the columns stocked with a loop of tape long enough to fulfil the capstan's motion requirements. The capstan had to rotate at a constant speed during reads and writes. It had a fine optical grating and moire pattern sensor that provided digital speed feedback, and the grating pulses were integrated to create an analog signal proportional to the instantaneous tape speed. The capstan was also coated with a rubber like substance that provided a small measure of tackiness. Tape slippage was a problem that could be hard to diagnose, and it could result in unreadable tapes being written. The tape heads were a marvel of precision engineering. They had three parts: a full track erase head, the write heads and the read heads. Data being written was read back a few milliseconds later by the tape controller, which could immediately check the parity of the freshly written data and interrupt if a write parity error had occurred. A complex algorithm would then backspace and retry. If the IO controller program thought that a persistent write parity error was due to a tape oxide dropout, the partially written block was backspaced over and erased, and the tape forward spaced to hopefully find good tape. Our morning perusal of the system logs for tape parity errors would result in some debate about whether the errors were media or transport related. In the event they were nearly always media related, unless the servo had failed, sometimes spectacularly.
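The write error recovery just described can be sketched as a retry loop. Every concrete detail is an assumption made for illustration: the retry limit, the erase gap length, the cap on erase gaps and the toy `FakeTape` model (positions in abstract units, with a set of oxide dropouts) are hypothetical, and this is not the Burroughs IO controller program's actual algorithm.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class FakeTape:
    """Hypothetical toy tape: positions are abstract units; some are oxide dropouts."""
    dropouts: Set[int]
    position: int = 0
    blocks: List[Tuple[int, bytes]] = field(default_factory=list)

    def write_block(self, data: bytes, length: int) -> bool:
        """Write a block, then 'read after write': fails if it spans a dropout."""
        span = range(self.position, self.position + length)
        ok = not any(p in self.dropouts for p in span)
        if ok:
            self.blocks.append((self.position, data))
        self.position += length
        return ok

    def backspace(self, length: int) -> None:
        self.position -= length

    def erase_gap(self, length: int) -> None:
        """Erase forward over suspect tape; nothing is recorded there."""
        self.position += length


def write_with_recovery(tape: FakeTape, data: bytes, length: int,
                        retries: int = 3, gap: int = 4, max_gaps: int = 10) -> bool:
    """Retry a failed write in place; if it keeps failing, assume bad oxide,
    erase a gap and try again further down the tape."""
    for _ in range(max_gaps):
        for _ in range(retries):
            if tape.write_block(data, length):
                return True
            tape.backspace(length)  # back over the failed block
        tape.erase_gap(gap)  # persistent error: move on to fresh tape
    return False


if __name__ == "__main__":
    tape = FakeTape(dropouts={5})
    print(write_with_recovery(tape, b"RECORD", 8), tape.blocks)
```

Retrying in place first matters because many write errors are transient (a speck of debris, a momentary slip); only a persistent error justifies sacrificing tape to an erase gap.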
Some failures could apply full servo power to the tape spools, resulting in huge, complicated knots of tangled tape; sometimes I found it easier to disassemble the hubs and spools than to try to unknot the mess. Sometimes the tape hubs would release while the spools were spinning at high speed during a rewind, and occasionally a spool could be ejected at high speed. The 9545 drive had a pneumatically operated door, designed to keep finger pokers away from the spinning hubs.
The tape drives were large, cumbersome machines of unexpected mechanical complexity, mainly in the air system. One of the difficulties we had was insufficient vacuum and air pressure. A 2 kilowatt single phase motor drove a pair of high delivery turbine pumps from one crown pulley, creating the pressure and vacuum air. I think the problems arose because the drives were designed for American AC power running at 60Hz. The induction motors running at 50Hz were a bit too slow, even though they had been fitted (or so we were led to believe) with a pulley set of the appropriate ratio. Changing the drive belt was a job we dreaded: it was hard to access, dirty, and full of springs to keep tension and sharp bits to help impale your fingers. This part of the system was thoughtfully built into its own cage for safety, but the cage really did get in the way of maintenance. Sometimes, no matter how hard we tried, the replacement drive belt would not ride the crown pulleys and would be thrown off. This could be due to worn motor drive plate bushes leaving the pulley shafts out of parallel. The driving surfaces of the crown pulleys had been coated with a very hard, shiny material; once this had worn through, the belts would wear rapidly and not ride the crown properly.
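The 60Hz suspicion is easy to put numbers on. An induction motor's synchronous speed is 120 * f / poles rpm, and for fans and blowers the affinity laws say pressure scales roughly with speed squared. A minimal sketch; the pole count is hypothetical (the text does not give it), slip is ignored, and treating the turbine pumps as obeying the fan affinity laws is an approximation.

```python
def synchronous_rpm(freq_hz: float, poles: int) -> float:
    """Synchronous speed of an AC induction motor in rpm: 120 * f / poles."""
    return 120.0 * freq_hz / poles


def pressure_ratio(speed_ratio: float) -> float:
    """Fan affinity law: pressure (or vacuum) scales with speed squared."""
    return speed_ratio ** 2


if __name__ == "__main__":
    poles = 2  # hypothetical: the pump motor's pole count is not given
    n60 = synchronous_rpm(60, poles)
    n50 = synchronous_rpm(50, poles)
    ratio = n50 / n60
    print(f"{n60:.0f} rpm at 60 Hz, {n50:.0f} rpm at 50 Hz ({ratio:.1%} of design speed)")
    print(f"without a pulley change: {pressure_ratio(ratio):.0%} of design pressure/vacuum")
    print(f"pulley ratio that restores pump speed: {n60 / n50:.2f}")
```

Uncorrected, a sixth less speed costs roughly 30% of the pressure and vacuum. One plausible reason the pulley change did not fully cure it (offered as an explanation, not something the text establishes): pump power rises roughly with the cube of speed, so restoring full pump speed demands the full 60Hz design power, while a motor of the same rated torque delivers only about 50/60 of its 60Hz output at 50Hz. An overloaded induction motor slips more and runs slower.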
The tape spools were coaxial, which was fairly unique. This allowed the drive transport to be somewhat narrower than comparable models from IBM and others. Because it was inaccessible, the rear take up spool could not be threaded manually. The self loading mechanism was a joy to watch. It had just enough intelligence to retry a failed load, which was remarkable because the 9545 had no microprocessor or even a microsequencer controller; it was all done with asynchronous hard wired logic. The main problem with this style was its intimate relationship with the mechanical operation of the transport, which could make debugging logic faults extremely trying. Just about everything could, and would, cause servo trouble. The very high transient currents in the spool motors would destroy the carbon brushes long before the carbon had worn away. First the little flexible copper wire would vaporise, followed by the spring, the remains of which would have to be fished out of the motor. This could only be dealt with by regular inspection. If the spool motors jammed as a result of a tape tangle, this could destroy the servo amplifiers. These were a remarkable circuit. A full H bridge drove the motor with a pulse width modulated speed control system. The peak motor current was limited by current feedback through a giant green resistor that looked like a barber's pole. The peak inrush current was controlled by a 47uH 50A inductive choke. The H bridge switched plus and minus 90 volts, enough to give you a zap if you got careless. The H bridge transistors were a specially ordered type from Motorola; other commercial types just blew up in this class of service. There were some transports that we scrapped because we never succeeded in fixing the servo amplifiers.
You would replace all the transistors, reinstall the amplifier, power up and load a tape, and just when you thought all was well, the servo circuit breaker would announce yet again that you had failed to diagnose the underlying fault. This was an expensive folly, and after a few expensive iterations, transports that started doing this were junked. By then many were appearing at computer brokerages, and we did buy transports by the truckload. I suspect that there were transient armature winding to shaft shorts, which only manifested under the magnetic force of full armature current.
Some aspects of the maintenance design were very poorly thought out. There were many trimpots in the servo gate, and you had to put the numerous little cards on an extender to reach them. This was very thoughtless; right angle trimpots were available. It was particularly annoying because multiple cards had to be extended and the machine powered off each time. It was not really possible to exercise the transport; you could not make it space forward slowly and continuously, for example. The transport possessed absolutely no maintenance controls at all: it either all worked or did not work at all. The hardest faults to diagnose were vacuum sensor faults. At numerous places along the vacuum tape buffer columns were small diaphragm pneumatic sensors which directed the spool servos. They normally responded rapidly to changes in the position of the tape loop, but they could become choked with dust and tape debris, resulting in unreliable action and unstable transport operation that was very hard to debug. These days the transport logic would be put into a simple microprocessor, but that wasn't an option when these transports were designed. They would have benefited from a simple microsequencer, but even that wasn't done. Instead the machine was controlled by asynchronous logic, with external signals from the transport and servo transitioning a Gray code counter from one state to the next. It was impossible to hold or set a particular state. The only debug tool we had was a binary LED display that showed the Gray code of the machine state. It transitioned far too quickly to be useful to mere humans. If the transport failed to load or run, you were supposed to look at the lights for the last state before the transport went into safe mode. I have no idea what the designer of this less than brilliant idea was smoking at the time, but it couldn't have done him any good. It was certainly useless to us.
There were spare slots in the transport logic gate that were wired but unused. I suspect that there was a transport exerciser card used by Burroughs field engineers, but we certainly didn't have one.
One of the truly annoying things about this transport, and other Burroughs made equipment, was the use of white wire for everything. You could have any color as long as it was white. This silly practice is common in the avionics field: every wire had a little, often unreadable, ID tag to facilitate the wiring of large looms. Sometimes the ID tags even agreed with the circuit diagram, where the wire was mentioned at all. Among the impressive things were the extraordinarily expensive IO connectors and cables. The pins had heavy gold plating, and IO cables were sometimes 5 or 6 cm in diameter with hundreds of conductors. We never had trouble with IO cables; they were made with exceptional quality. The mainframe inter cabinet cables were even more remarkable. They were a grey flat ribbon of little 90 ohm air dielectric coaxial cables, all some 25 meters long and all the same length to equalize propagation delays. A coil of this cable was just liftable by one man. Most of the underfloor space was completely choked by this stuff. Installing and moving it was a nightmare; faulty or obsolete cable simply had to be left in place, as untangling it was physically impossible.
There were a variety of tape capstan servo faults. The capstan motor was a special low inertia hollow armature design. There were very few capstan amplifier faults, owing to the smaller electrical load of the capstan motor. Most faults were a consequence of the stop start speed profile being in error. If the capstan speed was wrong, a data block could be written too long or too short, resulting in a block that could not be read. Capstan speed was set with a mechanical turns counter. This was a very unreliable method, as it relied on human reaction: watch a stopwatch, jam the rubber wheel of the turns counter against the capstan and disengage it on the beep. I decided the procedure was rubbish. The Boss had come to the same conclusion, so the method we used was to write a block with a known bit pattern and develop the tape in a remarkable substance from the recording industry called Magnasee, a suspension of iron dust in solvent. The iron dust would attach itself to the magnetized areas of the tape, and the block could then be measured with a ruler. With this method we simply knew that we could write fully interchangeable tapes. We could have counted capstan servo grating pulses instead; I suppose in time I would have made a little digital counter and display to do this.
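The ruler measurement turns directly into a speed error once the recording density is known. A minimal sketch, assuming the standard 1600 bpi density of phase encoded 9 track tape (one byte per frame across the nine tracks) and a write clock that does not follow capstan speed, so the recorded length scales with tape speed. The transport's nominal speed is not given in the text, so the code works only with ratios; the function names and the example block are hypothetical.

```python
PE_DENSITY_BPI = 1600  # standard phase-encoded 9-track density, frames per inch


def nominal_block_length_in(n_bytes: int, bpi: int = PE_DENSITY_BPI) -> float:
    """Each byte occupies one frame across the 9 tracks, so length = bytes / density."""
    return n_bytes / bpi


def write_speed_ratio(measured_in: float, n_bytes: int, bpi: int = PE_DENSITY_BPI) -> float:
    """Actual / nominal capstan speed during the write, assuming a fixed write clock.

    With a fixed frame rate the recorded length scales with tape speed, so a
    developed block longer than nominal means the capstan was running fast.
    """
    return measured_in / nominal_block_length_in(n_bytes, bpi)


def speed_error_percent(ratio: float) -> float:
    return (ratio - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical test block of 2000 bytes, developed and measured at 1.3125 inches.
    n_bytes, measured = 2000, 1.3125
    ratio = write_speed_ratio(measured, n_bytes)
    print(f"nominal {nominal_block_length_in(n_bytes):.4f} in, measured {measured} in")
    print(f"capstan speed error: {speed_error_percent(ratio):+.1f}%")
```

The same ratio, applied to a transport's nominal speed, is the speed at which that transport would have to run to read back a tape written on a mis-set drive.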
The transport had a number of safety shutdown features that were triggered not by measured problems like air pressure failure or servo failure, but by the sequencing logic getting into the wrong state. It would detect a bad state and then power down, without recording the reason. Damn them, this was a very frustrating design oversight, or appalling penny pinching.
The 9545 transport had a remarkable vacuum operated tape spool hub that never operated positively and reliably, I think because of the general lack of adequate vacuum. We made a decision to adapt hubs from some disused earlier model transports, which my predecessors had made the very enlightened decision to keep. So successful were the "Potter Hubs" that they were also used on our TE16 VAX transports. It would appear that spool hub reliability was an issue throughout the industry.
The tape controller, which stood near the transports, did have a diagnostic front panel that permitted simple read, write, rewind and space tests to be performed. The controller had one common data formatter, and data and clock recovery for each channel. We had four controllers: three serving the IOMs and one spare that was permanently switched to the diagnostic panel. A thingy called the exchange provided full path redundancy between all controllers and all IOMs. Path and unit redundancy was unique to Burroughs Large Systems and permitted large system uptimes even though the underlying hardware could have been more reliable.
A tape transport is only as good as the tape, and there was huge variation in tape quality. Some tapes were almost unusable: they left huge amounts of gunge on the heads and capstan, and data once written was soon forgotten by the tape. The ops staff were on the ball and defective tape brands were discarded. The Model 9545 also had a special tape pulley. It was a hollow cylinder pierced with slots whose sharp 90 degree edges acted like a planer blade, scraping gunk off the surface of the tape before it travelled over the read heads. One thing that could happen was that a sticky label, or gunge from a sticky label, would find its way onto the tape surface. If you were really lucky, not only would it obscure your irreproducible data, it would also stick the tape to the tape head. Such a tape jam could be detected because the levels in the tape column buffers would be all wrong, and the transport would shut down. If the tape had stuck itself to the capstan, then all kinds of horror would ensue. The capstan motor servo used the same amplifiers as the high power spool servo, and when it decided to tangle tape it could do so with great efficiency! I once unwound nearly 50 turns of tape from a now ruined capstan; the innermost layer of tape had fused, such was the frictional force. Eventually the transport shut down after the capstan servo amplifier failed and threw the main breaker. One of the more perplexing write faults was an intermittent capstan servo motor. Brush replacement was carried out on a regular schedule. My method for detecting an intermittent motor was to run it from a constant DC supply through a resistor and monitor the resistor current on a scope. The noise pattern was indicative of the health of the commutator. If the armature went temporarily open, this was immediately apparent on the scope.
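The scope check above amounts to watching for moments when the current through the series resistor collapses. A minimal sketch of the same test on a digitised trace; the text used an ordinary scope, so the digitised samples, the 1 ohm resistor and the threshold of 20% of the median current are all hypothetical choices for illustration, and the code only detects opens, not the subtler noise pattern an experienced eye reads on a scope.

```python
from statistics import median
from typing import List, Optional, Sequence, Tuple


def currents_from_resistor_voltage(volts: Sequence[float], ohms: float) -> List[float]:
    """Ohm's law: current through the series resistor is V_R / R."""
    return [v / ohms for v in volts]


def find_dropouts(current: Sequence[float], fraction: float = 0.2) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index runs where current fell below fraction * median.

    A momentary open circuit in the armature or at a commutator bar shows up
    as the current collapsing towards zero for a few samples.
    """
    if not current:
        return []
    threshold = fraction * median(current)
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, value in enumerate(current):
        if value < threshold and start is None:
            start = i
        elif value >= threshold and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(current) - 1))
    return runs


if __name__ == "__main__":
    # Hypothetical digitised trace: steady current with a brief open circuit.
    volts = [0.50, 0.52, 0.49, 0.51, 0.02, 0.01, 0.50, 0.48, 0.51]
    amps = currents_from_resistor_voltage(volts, ohms=1.0)
    print(find_dropouts(amps))
```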
The motor should have a mechanical load applied, so that the magnetic forces can act on any intermittent wires. The capstan servo motor was prone to this pathology because its armature was hollow: a cylinder carrying the armature winding rotated in a fixed air gap, and the inner cylinder that completed the magnetic flux circuit was stationary. This meant that the motor could not develop much torque, because the air gap was quite large and the armature winding did not rest in metal slots; it was just a cylindrical coil molded in epoxy. What this achieved was a motor with extremely low inertia, just what was needed for rapid capstan stop, start and reverse motion. A bad capstan motor could also be diagnosed by watching the read after write amplitude envelope. As we were using second hand equipment, some of the commutator bars had been worn down to nothing or were very badly pitted. We had access to a lathe, so I turned up the commutators to good effect. Machining copper is hard: it galls and won't cut cleanly unless the tool is razor sharp.
Interchanging tapes with other sites with different computers was a challenge. Although nine track phase encoded tape was an IBM and then an ISO standard, the blocking and file system conventions were not. Tapes written by one operating system might not have been automatically mountable on another, but they could still be mounted as "raw" and the raw data blocks read off. Monash employed one person whose main job was to write programs to read and understand the many mystery tapes that researchers would send in for exchange or processing. One famous tape we "had" to read had been written on a faulty machine. Its blocks were too short; the capstan motor that had written it had been set too fast. When all else failed, we developed an image with Magnasee, determined what speed the tape had been written at and set one of our transports to this bad speed. The bad tape was read successfully.
A "scratch" tape always had to be loaded. The Burroughs print queues would normally disgorge their contents to the printer room and its 7 assorted train printers. If all printers were offline, the print queue would print to the tape instead. The most pointless events would have a full report printed, and most of them could not be disabled. The CANDE terminals would generate a full and separate report every time someone logged in, ran a compiler or logged out. I don't think this useless activity could be stopped. It created so much scrap paper that the ops collected and sold it, and enough revenue was collected this way to help fund the annual Christmas party.
This brings me around to printers. This was an industrial scale operation. A great deal of operator drudgery went into servicing the printers' insatiable appetite for fan fold paper, on which they carefully printed mostly complete and total bumf. Each CANDE report was a blank page separator, a big bold friendly banner and a page of details (how many CPU seconds, IO seconds, resources used and so on) for each and every CANDE session started and terminated. It must have been useful billing information in the sixties. The operators also had the dreary task of bursting the printer output and removing the occasional important looking printout.
The Burroughs, as befitted its sixties heritage, was primarily a batch system. Even the engineers' "online" diagnostic programs would write their output not to the supervisor's terminals but to the line printer!
Our four main train printers were in a separate room to contain the dust they disgorged and the appalling racket they made. In normal operation they ran under their soundproof hoods, but even so it was not possible to hold a conversation nearby. Being primarily mechanical devices, and reasonably complicated ones, they required a lot of hands on engineering maintenance. In those days a computer engineer really did walk around waving not only a smoking soldering iron but also a spanner. It was nice to stand in the printer room listening to the racket. The printers made a particular kind of music when the mainframe was operating properly; I can still hear the repetitive little song they made when printing CANDE reports. If there was a problem, this singsong slowed or died out as people left the terminal labs, which were in another building. Sometimes the absence of the printers' racket was a good indication either that something was wrong or that something you had just done was not a very good idea.
It is very rare now to see ruled, punched, lined and numbered fan fold computer paper. Programmers used to print out their work and pore over it, largely because the line editing tools of this era were not really editing tools but rather a poor quality keystroke entry system operated by "Key Punch Operators", who could type not only text but also all the fiddly, random punctuation characters that are so critical. They could pick out all the different forms of back tick, apostrophe, comma and assorted diacritical marks that mean so little, yet so much, to a compiler. Legions of bored clerks would pore over reports on fan fold paper. Mainframes were very good at generating a daily tonnage of this rubbish. I think the Burroughs MCP was good at producing useless paper because, just by running, it would produce reports about itself. It must have looked good to the Suits in the showroom: the machine looked busy, very busy indeed, and no peripheral could look as busy as a 3000 LPM train printer! These printers also kept us engineers busy.
A train printer is an impact printer. For every column on standard fan fold paper, 132 columns, there is a solenoid activated hammer. The solenoid strikes the hammer, and it flies forward under its newly acquired momentum to strike the paper. A ribbon, somewhat wider than a 132 column form, runs between the paper and the type slugs of the train and is wound on whenever printing is taking place. The "train" is a continuous wagon train of hard steel type slugs confined to a hardened steel race track. The slugs have gear teeth behind the type face. At one end of the track is an idler pulley to help them round the corner; at the other end is a pinion driven by a one horsepower 3 phase motor. An electrically operated oil pump keeps the slugs racing around without disaster. The noise that the train makes is appalling, not so much a scream as a high pitched roar of white noise. It is because of this racket that the printers are designed with an integral sound hood, and interlocks stop the train when the hood is raised.
The slugs on the train are all different and their order matters. The characters are placed on the slugs according to a statistical analysis of computer reports, to optimise printing speed. The hammers can be fired in parallel whenever the characters in the line to be printed line up with their current position in the slug train. Because of the huge inertia and frictional losses in the train, its speed cannot be controlled and it has no real stop/start capability, so the gear teeth of the slugs are sensed by an inductive sensor, and this becomes the printer's internal source of synchronization. The sensor pulses drive a counter which addresses a ROM containing an image of the train.
The contents of the line buffer are compared with the ROM, and a match causes the addressed hammer to fire. The train is still moving while the hammer is in flight, so the hammer's time of flight decides whether that column is printed dead on or offset. An offset gives half a character or, if the hammer is bad, the neighbouring character on the slug. That is bad news for accounting reports, and to reduce the risk there were no adjacent numbers on the slugs. Many things influenced the time of flight of the hammer. The hammer guides could become clogged with a mixture of congealed ink and paper dust. The solenoid air gaps were filled with a mylar tape so that remanent magnetism would not make the hammer stick to the solenoid armature. As the machine warmed, the resistance of the solenoid coil would go up by a fractional amount. The adjustment we hated to perform was "phasing". Each hammer could be moved back and forth with an eccentric adjustment tool. Column by column, by trial and error, each hammer was aligned to produce the best column printing. It took forever, the results were never perfect, and our heads were inside the printer while it was running. Ear muffs were required for a frustrating, dusty and dirty job. The mylar air gap tape would only last for a couple of months, and to replace it the hammer bank had to be withdrawn and then "phased" all over again.
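The compare-and-fire idea is easier to see in a simplified model. This is a minimal sketch under stated assumptions. `TRAIN` plays the part of the ROM image of the slug order, and `train_position` plays the part of the counter clocked by the inductive sensor. The slug order itself is hypothetical. The model also assumes slug pitch equals column pitch, which real printers did not do, and it is not the Burroughs/CDC circuit.

```python
# Simplified, hypothetical model of train printer hammer-fire logic.
# TRAIN stands in for the ROM image of the slug order; train_position
# stands in for the counter clocked by the inductive slug sensor.
# Assumes slug pitch equals column pitch (a simplification).

# Hypothetical slug order: digits interleaved with letters so that no two
# numbers are adjacent (a mis-timed hammer never prints a wrong digit).
TRAIN = "0A1B2C3D4E5F6G7H8I9J"

def hammers_to_fire(line: str, train_position: int, printed: set) -> list:
    """Columns whose wanted character is passing their hammer right now."""
    fire = []
    for col, wanted in enumerate(line):
        if col in printed or wanted == " ":
            continue
        # ROM lookup: which character is in front of this column at this count?
        under = TRAIN[(train_position + col) % len(TRAIN)]
        if under == wanted:
            fire.append(col)
    return fire

def print_line(line: str) -> int:
    """Advance the train one slug at a time until every column has been
    struck; return the number of slug steps needed."""
    missing = set(line) - set(TRAIN) - {" "}
    if missing:
        raise ValueError(f"characters not on the train: {sorted(missing)}")
    needed = {c for c, ch in enumerate(line) if ch != " "}
    printed = set()
    steps = 0
    while printed != needed:
        printed.update(hammers_to_fire(line, steps % len(TRAIN), printed))
        steps += 1
    return steps
```

Several hammers can fire on the same count. In `print_line("A1")`, both columns match on the second step, which is why the slug order was chosen statistically.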
Fan fold paper was perforated between sheets and along the sides. The side holes were where the sprockets of the tractor feed positively grabbed the paper, and good paper would not be instantly torn by the paper motion servo motor. The tractors and ancillary paper motion mechanisms were driven by a toothed timing belt. The design of the printer was "thoughtfully" arranged to make the frequent replacement of this belt a near impossibility. I became quite good at it. It went from a one day job to about an hour once I had learned the sequence of what had to be unbolted and what could stay put to preserve mechanical alignments. The paper motion motor was a DC servo motor. The servo amplifiers were quite reliable in the rebadged Control Data machines, less so in the machines Burroughs produced in house. Eventually the optical shaft encoders would fill up with paper dust, causing an uncontrolled slew that filled the soundproof bonnet with a hard mass of compacted fan fold paper. Then the teeth would be stripped from the timing belt, and if you were lucky the servo amplifier would be destroyed as well.
These printers could also be programmed with a paper tape! A small paper tape reader was mechanically coupled to the paper motion drive shafts. Its purpose was to interpret generic form feed instructions in the print stream. For special stationery, the paper tape was programmed with the vertical form feed displacements. That way a generic vertical tab in the data stream could form feed the required amount, and the data source needed no knowledge of the layout of the paper form. The standard tape contained a code that made the printer form feed one standard fan fold page.
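Here is a minimal sketch of how such a tape turns a generic vertical tab into a line count. It assumes the carriage control tape scheme common on line printers of the era: one tape row per print line, with numbered channels punched in some rows, and "skip to channel n" feeding until a punched row reaches the reader. The Burroughs tape layout may have differed, and `PAGE_LINES`, `standard_tape` and `skip_to_channel` are hypothetical names.

```python
# Hypothetical sketch of a carriage-control (vertical format) tape loop.
# Assumptions: one tape row per print line, numbered channels punched in
# some rows, and "skip to channel n" feeding lines until a row punched in
# channel n reaches the read station.

PAGE_LINES = 66   # assumed: 11 inch form at 6 lines per inch

def standard_tape() -> list:
    """One page long, with channel 1 punched at top of form only."""
    tape = [set() for _ in range(PAGE_LINES)]
    tape[0].add(1)
    return tape

def skip_to_channel(tape: list, current_row: int, channel: int) -> tuple:
    """Return (lines_fed, new_row) for a vertical tab to `channel`."""
    if not any(channel in row for row in tape):
        raise ValueError(f"channel {channel} is not punched on this tape")
    row, fed = current_row, 0
    while True:
        row = (row + 1) % len(tape)   # the tape is a closed loop
        fed += 1
        if channel in tape[row]:
            return fed, row
```

With the standard tape, a skip to channel 1 from the top of a form feeds exactly one 66 line page. A special form only needs an extra channel punched where its fields begin. The print stream then says "skip to channel 2" and never needs to know how many lines that is.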
Paper could be so-called multipart, with a sheet of carbon paper between each part. The heaviest was the dreaded six part form, dreaded because of its stiffness and mass. Impact printers could not be relied upon to print through six carbon layers, and moving it was often beyond the power of the paper motion system. It also meant huge drudgery for the ops staff, who had to separate the forms. Some forms, such as official taxation certificates, were coded with serial numbers. Spoiled forms then had to be recovered for secure destruction and their serial numbers recorded manually.
There never was a fire in one of these printers, and considering the volatile mixture of high power electronics, oil, air, heat and paper dust, this is remarkable.
One of the printers had a rare intermittent problem that I never positively located. It was an STC band printer with the job-run-destroying habit of failing to space one line in a thousand page run. If the print run was on special stationery, of which there was often exactly the right amount supplied, this could cause some pain. The fault could not be reproduced, and after ruling everything else out I tried to get the paper motion motor replaced. These motors were low inertia DC servo motors whose armatures were copper coils rotating around a fixed iron core that completed the magnetic circuit. With no nice snug iron core to bind to, the armatures must have slowly suffered fatigue failures. My armature was probably going intermittently open circuit in one of the poles. Replacement motors cost many thousands of dollars, and unless I could prove the motor was faulty the Boss never approved the cost. Apart from that, this printer was a marvel. It had its own minicomputer built from discrete TTL logic, complete with a front panel console. Besides the intermittent fault, its paper stacker never really worked properly. It relied too much on gravity and the mechanical properties of fan fold paper to fold all by itself. The Burroughs/CDC printers, by contrast, had mechanical arms and fingers to force the paper to stack, and these worked like a charm.
Our B7800 was installed with the maximum amount of solid state RAM that could be addressed by the IOM and CPM: a massive 8 megabytes. This was a hard limit, because memory addresses were held in fixed length bit fields in the instructions, so the limit was architected into the machine design. The machine was designed for the era of core memory, when only major governments could afford even a megabyte of core. We had solid state memory. The memory cabinets and most of their contents were made by Control Data. They held a sea of little cards, each carrying some 32 Intel 4096-bit dynamic RAM chips (4k by 1 bit), plus monster linear power supplies with 3 phase mains input. These cabinets contained no smarts at all, not even refresh logic, just chips and line drivers. They were probably built for 64 bit systems, but ours were kitted out to just 60 bits:

- 8 parity/ECC bits
- a 48 bit data word
- 3 tag bits
- one parity bit covering the data and tag fields
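The 60 bit layout adds up as 48 + 3 + 1 + 8. Here is a minimal sketch of packing and unpacking such a word. The bit ordering, the use of even parity, and leaving the ECC byte as an opaque field are all assumptions for illustration. The real cabinet wiring and ECC code are not described here.

```python
# Hypothetical layout of one 60-bit memory word as described above:
# 48 data bits, 3 tag bits, 1 parity bit over data+tag, 8 ECC bits.
# Bit ordering and even parity are assumptions; the ECC field is carried
# through untouched rather than computed.

DATA_BITS, TAG_BITS, PARITY_BITS, ECC_BITS = 48, 3, 1, 8
assert DATA_BITS + TAG_BITS + PARITY_BITS + ECC_BITS == 60

def parity(value: int) -> int:
    """Even parity bit: 1 when `value` has an odd number of 1s."""
    return bin(value).count("1") & 1

def pack(data: int, tag: int, ecc: int = 0) -> int:
    if not (0 <= data < 1 << DATA_BITS and 0 <= tag < 1 << TAG_BITS
            and 0 <= ecc < 1 << ECC_BITS):
        raise ValueError("field out of range")
    covered = (tag << DATA_BITS) | data            # parity covers data + tag
    word = covered
    word |= parity(covered) << (DATA_BITS + TAG_BITS)
    word |= ecc << (DATA_BITS + TAG_BITS + PARITY_BITS)
    return word

def unpack(word: int) -> dict:
    data = word & ((1 << DATA_BITS) - 1)
    tag = (word >> DATA_BITS) & ((1 << TAG_BITS) - 1)
    p = (word >> (DATA_BITS + TAG_BITS)) & 1
    ecc = word >> (DATA_BITS + TAG_BITS + PARITY_BITS)
    return {"data": data, "tag": tag,
            "parity_ok": p == parity((tag << DATA_BITS) | data), "ecc": ecc}
```

Flipping any single data or tag bit of a packed word makes `parity_ok` false. The tag bits travel with every word, which is what the tagged architecture of the CPM relies on.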
The memory cabinets were controlled by the Memory Control Modules (MCMs), which were liveried in the same style and cabinet type as the rest of the mainframe. The MCMs handled:

- memory refresh
- reading and writing the memory cabinets
- ECC
- interpreting memory commands from the other modules: CPM, IOM and DCP (via the IOM interface)

An MCM could be instructed to fetch or write one word, or several consecutive words, in one memory reference instruction. It then returned both the referenced data and a "result descriptor" to the initiating box. The Burroughs was unique in that every operation generated a result descriptor specific to the hardware and the operation. It was more than a simple ready bit: it was one or more words summarising the operation just performed. The main Burroughs OS, called MCP (Master Control Program), could be very anal about IO data integrity. It never wrote data on faith that the operation would complete. It checked, and that checking went below the OS level to the hardware itself. Every piece of hardware in the chain contributed a couple of bits about itself to the result descriptor:

- the tape drive supplied a ready bit (and many others)
- the tape control added bits such as ready, retry count and parity
- the PCC cabinet added bits for operation successful and parity
- the IOM added block counters or congestion tags

That cumulative word was presented to the CPM and MCP, so they could perform an "I asked for this and got that" test.
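Here is a minimal sketch of how such a cumulative descriptor might be assembled and checked. The field names, positions and widths in `FIELDS` are invented for illustration and are not Burroughs encodings. The point is only the structure: each layer ORs in its own bits, and the OS tests the whole word against what it asked for.

```python
# Hypothetical illustration of a cumulative result descriptor. Each unit
# in the IO chain ORs a few status bits about itself into one word; the
# field names, positions and widths are invented, not Burroughs encodings.

FIELDS = {                       # name: (shift, width)
    "unit_ready":      (0, 1),   # tape drive
    "unit_parity_err": (1, 1),
    "ctl_retries":     (2, 3),   # tape control
    "ctl_parity_err":  (5, 1),
    "pcc_ok":          (6, 1),   # peripheral control cabinet
    "iom_blocks":      (7, 8),   # IOM block counter
}

def contribute(rd: int, **status: int) -> int:
    """One layer of hardware adds its own status bits to the descriptor."""
    for name, value in status.items():
        shift, width = FIELDS[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        rd |= value << shift
    return rd

def field(rd: int, name: str) -> int:
    shift, width = FIELDS[name]
    return (rd >> shift) & ((1 << width) - 1)

def asked_for_this_got_that(rd: int, blocks_requested: int) -> bool:
    """OS-level check: every layer reports success and the block count
    matches what was requested."""
    return (field(rd, "unit_ready") == 1
            and field(rd, "unit_parity_err") == 0
            and field(rd, "ctl_parity_err") == 0
            and field(rd, "pcc_ok") == 1
            and field(rd, "iom_blocks") == blocks_requested)
```

A retried but successful operation still passes, while the retry count remains visible to whoever wants it. That is the kind of information a plain ready bit would lose.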
MCMs were fairly self contained. They had their own console front panel, from which you could initialise memory, perform memory tests, or even enter arbitrary data into memory with toggle switches. Each MCM could see any of the memory cabinets, and every CPM and IOM (3 of each) could see every MCM. All of them were cross connected by an underfloor spaghetti of monster coaxial ribbon cables. The B7800 system was busless: there was no common data bus that could suffer congestion or contention. The Burroughs philosophy was clearly that if you wanted to transport a word, you installed copper between "A" and "B" and never messed around with buses and their arcane protocols. MCM faults were rare. They had to be exercised with the maintenance diagnostic processor subsystem, which could read and write any internal register within the MCM and feed it test vectors. The MCM function was clearly of critical importance, great care went into its design, and the lack of MCM issues was proof of that. A CPM or IOM could lock memory, so true multiprocessing was possible, and there were also a variety of memory lock control words. The MCMs regarded the CPMs and IOMs as equal peers, although IOM access had higher priority, which makes good sense.
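The access policy just described can be sketched as one arbitration cycle: peers are equal, IOMs go first, and a locked word is reserved for its holder. This is a minimal sketch under assumptions. The requestor naming, per-word lock granularity and name-order tie-break are all invented, and the real MCM priority logic is not documented here.

```python
# Hypothetical sketch of one MCM arbitration cycle among peer requestors.
# Assumptions: requestors are named "IOMn" or "CPMn"; locks are per word
# address; IOMs win over CPMs (as the text says), with name order as an
# arbitrary tie-break. Not the real MCM priority logic.
from typing import Optional

def arbitrate(requests: list, locks: dict) -> Optional[str]:
    """requests: (requestor, address) pairs; locks: address -> holder.
    Returns the requestor granted access this cycle, or None."""
    eligible = [(who, addr) for who, addr in requests
                if locks.get(addr) in (None, who)]   # locked words: holder only
    if not eligible:
        return None
    eligible.sort(key=lambda r: (not r[0].startswith("IOM"), r[0]))
    return eligible[0][0]
```

An IOM asking for a word that a CPM holds locked is simply not granted. The CPM without a conflict proceeds instead.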
Like all good mainframes, the B7800 system had general purpose IO processors. To them was offloaded the tedium of micromanaging nearly all aspects of IO, including retrying failed operations and managing queues for shared devices. The B7800 was a true multiprocessing system. An IO for any device was set up in three steps:

1. The data to be written was placed in buffers organised as lists, each buffer holding a pointer to the next.
2. A command block was built and written to memory. The IOM had already been told where in memory to find its command blocks and where the buffers were located.
3. A processor module issued an "interrupt" to the IOM. This so-called interrupt was really a start operation signal.

The IOM then worked through the list of command blocks and the list of buffers until complete, with no further intervention from the processor. When the lists were exhausted, it interrupted the processor to signal that a result descriptor for that IO was available in an agreed location in memory. The result descriptor might indicate the success or failure of the IO, together with other information depending on what type of IO was performed. The IOM did not present data directly to any processor. It all went directly to memory. It is best to think of the IOM as a giant DMA controller. Even the supervisory consoles had to go through the IOMs. The IOMs also had an impressive front panel from which you could key in command blocks to perform any IO operation on any supported peripheral. The IOM had separate internal channels for different types of peripheral (disk, tape, character IO, generic block), and true concurrency was part of the design. Block streaming devices like tape and disk could contend for memory access, so there was buffering in bipolar static RAMs. The other part of the IO subsystem was the Peripheral Control Cabinets.
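Here is a minimal sketch of the IOM as a DMA engine walking linked command blocks and buffers, as described above. Everything here is a stand-in chosen for illustration: the `Buffer`/`CommandBlock` classes, the `devices` dict playing peripherals, the `memory` dict playing the agreed result-descriptor location, and the descriptor's shape.

```python
# Hypothetical model of the IOM as a DMA engine. Assumptions: command
# blocks and buffers are linked lists, "devices" maps a unit name to a
# list that collects written blocks, "memory" is a dict standing in for
# the agreed result-descriptor location. Names are illustrative only.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Buffer:
    data: bytes
    next: Optional["Buffer"] = None          # pointer to the next buffer

@dataclass
class CommandBlock:
    device: str
    first_buffer: Optional[Buffer]
    next: Optional["CommandBlock"] = None    # pointer to the next command

def run_iom(first: Optional[CommandBlock], devices: dict, memory: dict,
            interrupt: Callable[[], None]) -> None:
    """Started by a processor's 'interrupt'; runs the whole chain with no
    further processor help, then posts a result descriptor and interrupts back."""
    ok, blocks = True, 0
    cmd = first
    while cmd is not None:
        if cmd.device not in devices:
            ok = False                       # e.g. unit absent or not ready
            break
        buf = cmd.first_buffer
        while buf is not None:
            devices[cmd.device].append(buf.data)   # data moves memory -> device
            blocks += 1
            buf = buf.next
        cmd = cmd.next
    memory["result_descriptor"] = {"ok": ok, "blocks": blocks}
    interrupt()                              # processor may now read the descriptor
```

The processor's only involvement is building the lists beforehand and reading the descriptor afterwards. That is the sense in which the IOM behaves like a giant DMA controller.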
These contained the random logic and microsequencers associated with each type of peripheral. Each type of peripheral had its own card cage:

- the Host Adapter for disks
- the Tape Interface
- the Printer Interface
- the SPO interface (supervisory printing output, i.e. the ops consoles)
- the Card Reader/Punch Interface

These interfaces were generic (at least for Burroughs equipment) and would connect any model in the Burroughs range. The PCCs were not designed with the B7800 in mind. They were of B6700 vintage and used the same old style of printed circuit card, about 6 by 6 inches with about 24 CTuL IC packages per card. The PCC peripheral controls were almost self contained, and all had individual consoles and displays. From here it was possible to exercise and test peripherals, as well as exercise the control itself. Repairing the controls was made difficult by the lack of good design documentation and engineering schematics that would have given us a mid to top level overview of how they were supposed to work. We had only the lowest level view. It was supplied on microfiche, where each and every signal was cross referenced by backplane pin and IC package pin. Chasing a signal around the fiche was extremely tedious: it meant changing fiche and holding on to your train of thought while you fiddled with the fiche reader. Luckily not much went wrong, because they had been well engineered. However, the power supplies were hard to come to terms with and were always going bad. Mains was rectified and fed a monstrous capacitor bank and the control and supervisory circuits. In the bottom of the PCC cabinets were inverter style step down converters that used SCRs as the switching elements. They ran at about 1 kHz and made a dreadful racket. An SCR can only be switched on. To switch it off you must either remove the forward current or short it out with another SCR, diverting its current elsewhere.
We modified the PCC cabinet power systems to run from huge, monstrous linear power supplies, of the kind used on some other Burroughs B6700 style cabinets. They had a ferroresonant CVT power transformer at mains frequency, monstrous stud diode rectifiers and a linear regulator that defied belief. The series pass elements were some forty odd 30 A germanium power transistors in parallel, with resistive emitter ballasting so that one poor transistor wouldn't try to regulate the entire multi-hundred amp load. We spent a lot of time identifying and replacing blown regulators. These germanium devices were quite weird, in that a simple multimeter test wouldn't reveal the dead one. Sometimes they failed as a kind of mushy, half hearted short circuit, or their current gain would be too low to actually work. A regulator failure was announced by printers, tapes and such going offline, and by Sonalerts in the cabinets emitting their ear piercing screams. We built a small test rig for the regulator heatsink units, each holding about ten of these transistors, that fed about 30 A through them. Probing the emitter ballast resistors (only 0.04 ohms!!! that's basically a short circuit in a package) would quickly reveal the faulty transistor. Germanium devices were used because of their extremely low collector saturation voltage. Only power MOSFETs, which have no fixed saturation voltage at all, do better. The 12 volt linear supplies, one of which I still have in my ham shack, put out some fifty amps through a linear regulator and a CVT ferroresonant power transformer. They also had the only 1 farad capacitors I saw before the invention of carbon supercapacitors. These tank sized capacitors are fun to charge up and then discharge through an automotive incandescent bulb. It is surprising to see how long the capacitor will keep the bulb lit (a couple of seconds). Not quite what I learned about capacitors in school!
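As a rough check on the "couple of seconds", here is a back-of-envelope RC estimate. The numbers are assumptions, not measurements: the capacitor is charged to 12 V, and a 55 W headlamp bulb is treated as a fixed resistance equal to its hot filament resistance.

```latex
R \approx \frac{V_0^2}{P} = \frac{(12\,\mathrm{V})^2}{55\,\mathrm{W}} \approx 2.6\,\Omega,
\qquad
\tau = RC \approx 2.6\,\Omega \times 1\,\mathrm{F} = 2.6\,\mathrm{s}

V(t) = V_0\, e^{-t/\tau},
\qquad
t_{1/2} = \tau \ln 2 \approx 1.8\,\mathrm{s}

E = \tfrac{1}{2} C V_0^2 = \tfrac{1}{2}\,(1\,\mathrm{F})\,(12\,\mathrm{V})^2 = 72\,\mathrm{J}
```

The bulb falls to half voltage, and a quarter of its power, in under two seconds, which fits the observation. As the filament cools its resistance drops, so the real fade is somewhat faster than this constant resistance estimate.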
Burroughs liked to use ferro-resonant transformers for all their linear power supplies. They are very inefficient and get extremely hot in normal operation. The benefit was that the power supply was immune to mains voltage transients and the power supply would continue to deliver its rated output during brownouts.
One thing mainframes could do was communicate, although the communication could be terse and arcane. There were a small number of human interface devices, to use the modern jargon: the card reader, the SPO (supervisory operators output, once a printing teletype; we had TD830 terminals), and terminals driven by the mighty DCPs (Data Communication Processors). We had four of these monsters, and they talked to about 150 TD830 terminals in the student computing labs. The DCPs could be wired up with:

- RS-232
- Burroughs "two wire" terminal drops, a primitive networking system that used SDLC to run multiple terminals on a single 2 core wire
- current loop, which had no effective distance limit

They could also drive 4 wire telecom circuits, over which we served the remote paying customers to whom we offered bureau services. They had their own LED/lightbulb front panel and were "advanced" computers in their own right. They had an unusual 26 bit word format, and their assembly language level code corresponded one to one with the machine's microcode. The design was not an original Burroughs design. I have heard it described as the Varian X Machine. I believe the design can be traced back to the NASA Saturn V rocket vehicle controller; a description of it on a NASA site, among the recently released Apollo technical documents, looks very familiar. The DCP talked directly to system main memory via DMA arbitrated in the IOMs. From there it could run programs ("orders" in the jargon of the documentation), and from there it fetched and gathered data for export. The DCPs were built from CTuL and documented with the mysterious and mostly unusable microfiche. I never saw a complete top level design drawing, but it must have been good: it was a very capable processor built from perhaps no more than 1000 CTuL packages. Our DCPs were extremely reliable, and in my four years I never had to fix one, although their power supplies and fans needed frequent attention.
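SDLC frames end with a 16 bit frame check sequence, which is part of what made multidrop terminal lines workable over long runs. This sketch shows the standard SDLC/HDLC FCS calculation: CRC-CCITT polynomial x^16 + x^12 + x^5 + 1, processed least significant bit first, preset to all ones, and complemented before transmission. It illustrates the standard algorithm only and is not the DCP's implementation. The helper names are hypothetical.

```python
# Sketch of the 16-bit frame check sequence used by SDLC/HDLC framing
# (CRC-CCITT, x^16 + x^12 + x^5 + 1, LSB-first, preset to all ones,
# complemented before transmission). Standard algorithm only; not the
# DCP's actual implementation. Helper names are hypothetical.

POLY_REFLECTED = 0x8408   # bit-reversed form of 0x1021
GOOD_RESIDUE = 0xF0B8     # register value after data+FCS of an intact frame

def _crc_register(data: bytes, crc: int = 0xFFFF) -> int:
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ POLY_REFLECTED if crc & 1 else crc >> 1
    return crc

def sdlc_fcs(data: bytes) -> int:
    """FCS to append to a frame (sent low-order byte first)."""
    return _crc_register(data) ^ 0xFFFF

def frame_ok(frame_with_fcs: bytes) -> bool:
    """Receiver check: run the CRC over data plus received FCS."""
    return _crc_register(frame_with_fcs) == GOOD_RESIDUE
```

For the conventional test string `b"123456789"` the FCS is `0x906E`. The receiver runs the CRC over the data plus the two FCS bytes and only needs to compare the register with one constant. A single flipped bit anywhere in the frame fails that check.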
The DCPs had genuine magnetic core memory, and we also had more modern "semiconductor memory" models in the mix. They ran at 5 MHz, a very high speed for early seventies technology, and were programmed in their very own custom version of ALGOL.
The B7800 mainframe system also had the main processor! The system had minor distributed processors all over the place: the humble "D Machine" chip in the cassette drive of the Disk Pack Drive controllers, the Drive Controllers, the IOM, DCPs, printers and terminals. The B7800 CPM (central processing module) was a work of art and deserves whole books of its own. There was no "central processor" as such. There was a network of peer processors, and there was no limit on the number of CPMs you could have, apart from your budget and the limited number of "system interrupts", of which there were only 8. So a system of 7 CPMs and one IOM was in theory possible. Who knows, maybe the people doing serious mathematical modelling or cryptography had one! The CPM executed "orders". It was a tagged architecture: each word in memory had a 3 bit tag field that told the CPM what kind of word it was. Some that spring to mind were machine control word, stack control word, instruction word and data word. The real advantage of tags is that the instruction decoder becomes trivial to design. It also becomes impossible to execute data words, or for a user process to invoke privileged control words. A little light on the majestic front panels lit when an attempt had been made to execute data. That was a fatal exception and caused a "dead stop". It indicated something seriously wrong with the machine, not just some dumb programming error. An IPW LED indicated that an invalid program word had got through the many hardware checks, and this also dead stopped the CPM. The "dead stop number" was then shown on the memory data word register, and we would read it off and consult a folder of dead stop definitions. There were numerous operating system faults and internal inconsistencies that MCP would flag and halt on.
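Here is a minimal sketch of why tags make executing data impossible. Every fetch checks the tag before acting on the word. The tag values and names in `Tag` are invented for illustration, since the real B7800 tag assignments are not given here. `DeadStop` and `PermissionError` are stand-ins for the CPM halting and for a protection fault.

```python
# Hypothetical sketch of tag checking in a tagged architecture. The tag
# values and names are invented for illustration (the real B7800 tag
# assignments are not given here); DeadStop stands in for the CPM halting.
from enum import IntEnum

class Tag(IntEnum):
    DATA = 0
    STACK_CONTROL = 3
    PROGRAM = 4
    MACHINE_CONTROL = 7

PRIVILEGED = {Tag.STACK_CONTROL, Tag.MACHINE_CONTROL}

class DeadStop(Exception):
    """Stand-in for the CPM halting with a dead stop number."""

def fetch_instruction_word(memory: list, addr: int) -> int:
    """Only words tagged as program code may be executed."""
    tag, value = memory[addr]
    if tag != Tag.PROGRAM:
        raise DeadStop(f"attempt to execute a {tag.name} word at {addr}")
    return value

def read_operand(memory: list, addr: int, privileged: bool) -> int:
    """User code may not touch control words, whatever their bit pattern."""
    tag, value = memory[addr]
    if tag in PRIVILEGED and not privileged:
        raise PermissionError(f"user access to a {tag.name} word at {addr}")
    return value
```

The check depends only on the tag, never on the 48 bit value. No bit pattern a user program can store will ever be fetched as an instruction.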
The machine designers chose to halt the machine on detected irregularities rather than risk corrupting customer data. Most of the dead stop information was only relevant to MCP operating system designers, but some was useful to us, like IPW detection, assorted internal parity faults and what have you. One dead stop number carried the hilarious reason "Dialled George". When MCP got itself totally honked up, we would occasionally get a "Dialled George" fault. I wonder who "George" was at Burroughs. Perhaps he was the person who fixed all the intractable faults that everyone else had given up on. Or perhaps he was the person you went to when you wanted something totally screwed up.
A CPM instruction word contained 8 "syllables": the native instruction was only 6 bits, and an instruction word held 8 of them. An internal queuing mechanism tried to execute as many as possible in one clock cycle, provided there were no result dependencies or internal resource conflicts. A switch on the front panel disabled this feature, and when you used it the CPM slowed down quite noticeably. This switch must have been used extensively by Burroughs salesmen, and never by engineers, for whom it was not very useful. A 6 bit instruction does not seem very useful by the standards of a conventional microprocessor, but the Burroughs was a stack machine. Data and operands had already been placed on the stack automatically (by predecoding their tags!). You then just invoked the little instruction, which operated on the top of stack and left its result there. The stack was as large as main memory, and you never ever accessed a memory location, only your local stack. That was a very different concept, and at the time a totally foreign one to me. The speed advantage was profound. The classical von Neumann computer wastes two thirds of its time setting up the locations of read operands and the location for writing the result to storage. In the Burroughs you just did "the work" and let the stack hardware worry about reading operands and writing results. MCP was responsible for assigning and managing the locations of user and process stacks, and so it understood concepts such as a "stack of stacks of stacks of ....".
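The stack machine idea shows up clearly in a toy interpreter. Operators name no memory locations: they pop their operands from the top of the stack and push the result. The opcode names and tuple program format are invented, and this is an illustration of zero-address evaluation in general, not B7800 code.

```python
# Toy zero-address stack machine illustrating the idea above. Opcode
# names and the tuple program format are invented; this is not B7800 code.
import operator

BINARY_OPS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}

def run(program: list) -> list:
    stack = []
    for op, *args in program:
        if op == "PUSH":                  # operand placed on top of stack
            stack.append(args[0])
        elif op in BINARY_OPS:            # no addresses: pop two, push one
            b, a = stack.pop(), stack.pop()
            stack.append(BINARY_OPS[op](a, b))
        else:
            raise ValueError(f"unknown op {op!r}")
    return stack

# (A + B) * (C - D) with A=2, B=3, C=10, D=4
program = [("PUSH", 2), ("PUSH", 3), ("ADD",),
           ("PUSH", 10), ("PUSH", 4), ("SUB",), ("MUL",)]
```

`run(program)` leaves `[30]` on the stack. None of the three arithmetic instructions carries an address, which is why such short opcodes can be enough. A register machine would need explicit loads and a store, each naming a location.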
The engineers never got much involved with programming. There was not the time for such games, and there was rarely much call for it. There was no assembler, and no concept of assembly language programming. There was only ALGOL for mere users and DCALGOL for system programmers. The machine was originally sprung into life with something called a "heroic compiler": the machine code for the ALGOL compiler was toggled into memory and the rest was "history". I cannot imagine how tedious that original process must have been. The B7800 CPM was basically a machine for executing ALGOL primitives. It was almost impossible to write and run a standalone program by hand, because of all the legerdemain needed to set up the stacks and memory structures correctly before the CPM could even fetch and execute your little program. I only attempted this once, and my "program" consisted of one instruction word that fetched top of stack forever. We had a fault between the CPM and an MCM cabinet that the normally very effective diagnostics could not find. One feature of Burroughs MCP totally astonished me when the Boss showed it to me. As mentioned, writing little assembly language routines for diagnosis was impossible, so I think some archangel had inspired the MCP creators to create SCR.
SCR was a simplified, interpreted ALGOL-like language that could only be run from an operator's SPO console. It had TOTAL control of the machine and could directly manipulate all the peripherals, including writing arbitrary data directly over any disk pack! Bruce, the Boss, was initially somewhat reluctant to let us write our own SCR programs, because they could totally wreck the machine. Eventually I was trusted to write programs that accessed the disks directly. In any case, as engineers, we could already totally stuff the system! SCR was an incredibly useful facility, and I take my hat off to the people who had the foresight to include it in MCP. The most useful program I wrote checked our tape transports and reported statistics on the tape read analog channels. It helped me distinguish true tape hardware faults from simple media problems, and the statistics helped me decide when a transport tune-up should be scheduled. Another program helped me do disk head alignment, and one even became an actual system utility. The disk packs had volume headers of two types. Type One could be recreated by an MCP utility, but Type Two had to be purchased pre-written from Burroughs! We wouldn't do that, of course. Instead, a little SCR program recreated mangled volume headers by doing direct disk writes on a live system. SCR only ran on a live, booted up system, which was a pity. It would have been nice to have it in a standalone environment as well.
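The SCR program itself is not reproduced here, but the reasoning behind it can be sketched. A read channel fault tends to show up as one track that is consistently weak along the whole tape. A media defect tends to show up as a dropout across most tracks at the same position. This Python sketch is a hypothetical reconstruction of that kind of heuristic, not the original program. The sample format and the `low_ratio` and `dropout` thresholds are assumptions.

```python
# Hypothetical reconstruction of the kind of statistics such a program
# might gather; the sample format and thresholds are assumptions.
from statistics import mean

def diagnose(samples: list, low_ratio: float = 0.6, dropout: float = 0.3) -> dict:
    """samples[track][i] is the normalised read amplitude of `track`
    at position i along the tape (9 tracks for 9-track tape)."""
    track_means = [mean(t) for t in samples]
    overall = mean(track_means)
    # A track well below the others everywhere suggests head/channel hardware.
    weak_tracks = [i for i, m in enumerate(track_means) if m < low_ratio * overall]
    # Most tracks dropping out together at one spot suggests a media defect.
    n = len(samples[0])
    media_spots = [i for i in range(n)
                   if sum(t[i] < dropout for t in samples) > len(samples) // 2]
    return {"weak_tracks": weak_tracks, "media_spots": media_spots}
```

Separating the two patterns is what made such statistics useful. A weak track calls for the engineer, while a bad spot on the tape only calls for a new reel.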
Maintaining the CPM.
There were about 300 densely packed wire wrapped cards in a CPM, containing some 40000 CTuL and TTL packages. The machine was totally wire wrapped, right down to the chip package level. The backplane wire wraps went four deep, and those on the cards two deep. The wiring was completely and thoroughly documented and rigorously cross referenced. We knew where every wire went and what it did! The wire wrap was both the machine's blessing and its curse. It is a little known fact that wire wraps, properly done by robot, are more reliable than soldered joints. However, the wire goes around corners, and those corners are other very sharp wire wrap pins. If the robot tensions the wire too tightly there, the insulation can cold flow and create an intermittent contact to the other pin. The vast majority of mainframe faults were intermittents caused by this errant mechanism. They were nearly all tap sensitive, temperature dependent and a total pain to locate. If you were lucky, an intermittent fault would stay faulty long enough for you to diagnose and fix it.
The boards were also completely documented at the logical level, including all PROM contents. There were no schematics as such. Instead, the so-called LOCAL books (Logical and Card Analysis) gave a tabular representation of the logic circuit on each and every card. There were also timing tables showing every possible (valid) machine state and its transition table. That book was very thick, but it described how the machine worked, state by state, completely. The level above this, showing how these logical subsystems were connected conceptually, was not available. System engineers didn't need it, and it would have given too much away to the competition! In fact, to be a competent field engineer you didn't have to understand how the machine worked, and indeed I never really understood it either. The tools Burroughs provided were powerful enough to find all wire and chip faults without that understanding. This was a remarkable achievement.
The principal diagnostic equipment was not, as many might suppose, the majestic front panel. It was the maintenance diagnostic processor (MDU/MDP in Burroughs speak). This was a Burroughs B800 small systems computer, itself a smaller ALGOL interpreter that ran at one megahertz! It had a hard sectored 8 inch floppy that booted the diagnostic monitor program, and its very own 9 track tape transport, which was larger than the B800! The tape transport read diagnostic tapes containing test vectors and results. It also read a PROM tape when PROMs had to be burned in the MDP's console. The front panel could read and set any and all flip flops and registers in the machine. Each little LED could have up to 16 different meanings depending on how the panel was switched, so there were potentially thousands of lights with as many meanings. No human engineer could hope to understand these lights and then say... ah yes, panel 2 row 5 bit 23 in the blah register is one, it should be zero! We used to annoy the operators by sagely pronouncing exactly that. We would be correct, naturally, because the MDP had told us so! The MDP had a huge bus cable to every mainframe cabinet. It could read and write the front panel, and thus feed the machines with test vectors.
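That "panel 2 row 5 bit 23" kind of answer falls naturally out of a bit-by-bit comparison. This minimal sketch assumes the MDP read back each panel row as an integer after applying a test vector, then listed every bit that differed from the expected result. The `(panel, row)` keying, the 48 bit row width and the report wording are all hypothetical.

```python
# Hypothetical sketch of the MDP's comparison step: after a test vector is
# applied and the machine clocked, every panel row is read back and each
# bit that differs from the expected result is listed. The (panel, row)
# keying and 48-bit row width are assumptions for illustration.

def discrepancies(expected: dict, observed: dict, width: int = 48) -> list:
    report = []
    for panel, row in sorted(expected):
        want_word = expected[(panel, row)]
        diff = want_word ^ observed[(panel, row)]
        for bit in range(width):
            if (diff >> bit) & 1:
                want = (want_word >> bit) & 1
                report.append(f"panel {panel} row {row} bit {bit}: "
                              f"is {1 - want}, should be {want}")
    return report
```

The list comes out in panel and row order. Working from the first entry, as the next paragraph describes, is then a natural starting point.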
The MDP would then present us with a long list of discrepancies. We always started with the first one and worked back: from the front panel light, through the display logic, back to the logic card. At that point the logic card could be extended, or placed in the MDU, which tested it with card level vectors. These card level vectors gave an intermediate level of confidence. The definitive test was to extend the card in the mainframe, where it was exposed to possible backplane and adjacent card problems. Sometimes the faulty logic had to be traced through a wire-OR term. Like ECL, CTuL had an open emitter output stage, so chip outputs could be wired together to produce a logical OR. You would discover, to your horror, that a wire-OR level was wrong. Most of the time that wasn't difficult, because there might be only a couple of terms making up the wire-OR. The nightmare was some of the bus terms. Maybe one hundred chip packages could contribute a term, and any one of them could be at fault. All contributing terms had to be checked, which could cause head splitting confusion. Sometimes the MDP produced more than one fault vector. If the wire-OR was driving you to call for psychiatric help, the other vectors might point at something upstream of the monster wire-OR.
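The wire-OR problem can be sketched in a few lines. The shared line is the OR of every contributing output, so a single stuck-high driver forces it high and hides which driver is guilty. The only way in is to compare each contributor's measured output with what its own inputs say it should drive. This is a hypothetical illustration, and the driver names (`U0` to `U99`) are invented.

```python
# Hypothetical sketch of chasing a wired-OR fault: the shared line is the
# OR of every contributing output, so one stuck-high driver forces it high.
# Comparing each contributor with what its own inputs say it should drive
# isolates the culprit. Driver names are illustrative.

def wired_or(outputs: dict) -> int:
    return int(any(outputs.values()))

def suspect_terms(actual: dict, expected: dict) -> list:
    """Contributors whose measured output differs from the expected one."""
    return sorted(name for name in expected if actual[name] != expected[name])

expected = {f"U{i}": 0 for i in range(100)}   # a 100-term bus line, all low
actual = dict(expected)
actual["U57"] = 1                             # one driver stuck high
```

Here `wired_or(actual)` is 1 where 1 should read 0, and only the term-by-term check in `suspect_terms` names `U57`. On the real machine that check meant probing every one of the hundred packages.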
Intermittent faults caused by shorting wires also confused the MDP. Unlike a simple stuck logic condition, the faulty wire would expose multiple logic faults. Probing with the oscilloscope would eventually reveal a bad logic level in and around the indicated area. We had a nylon faced hammer that delivered exactly the right impulse to expose an intermittent, and a tap would then make the scope trace jiggle. Then all the wires making up that logic signal would be replaced. This meant removing all the top wires on any wire wrap post that trapped bottom wires, then reversing the process with fresh wire in a different colour to remind ourselves where we had been. The LOCAL books provided the complete wire orders for the card. You HAD to get this right, or your boot would sink ever deeper into the quicksand.
Often the faulty wire was actually on the backplane. The wire wraps there could be up to four deep and were always impossibly tangled. The old wires usually could not be disentangled, only disconnected. On a bad day, a backplane wire would be a term in a big wire-OR as well.
The hopefully now repaired processor was tested with the "all-orders" test program, which the MDP wrote directly into some off line memory. This test ran continuously, cycling through the test routine in about 4 seconds. While it ran, someone watched the front panel as muggins tapped every card in the card cage with the nylon hammer. If the CPM module survived the tap test, it was deemed fit for duty and brought back online. Our B7800 system had 3 CPMs, so a CPU fault would not impact operations. We also had 3 IOMs, but they were more critical. Our site was IO bound, so a missing IOM slowed things down enough for the operators to complain about missing schedules. The IOMs had about half the package count of a CPM and so could be repaired faster. They were carried over from the B7700 series and used more primitive chip functions. Indeed, they used no flip flop chips at all, but built flip flops from wired-up NAND gates. Cross connecting discrete logic to create sequential elements also causes confusion when diagnosing.
Sometimes we had a truly dysfunctional register or flip flop chip whose output vector was wrong. Even then, the register itself might not be at fault; the logic making up its input term might be. To trace this kind of fault, the test vectors and processor clock had to be stepped to one clock BEFORE the fault. The CPMs even had a built in logic state analyser with a 1000 word memory, an elaborate trigger mechanism and user defined hardware inputs. We never had call to use it, but I imagine it was useful to the hardware developers. I feel it's a shame that their story has never been told. It must have been something to go through the design instruction by instruction to build it and verify that it gave the correct answers! Bruce, the Boss, told us about one of the most perplexing cases of his field engineering career: a mainframe that gave the wrong answer. The story went something like this. He was called to an insurance company that complained the sum at the bottom of their general ledger was wrong. That is an amazing call to make. However, it was still early days in the computing industry, and they still had elaborate manual methods to generate and rigorously check the general ledger. The humans said the computer was wrong (it couldn't add!). The diagnostics ran without fault. The input data was still available and was handed to Burroughs to check. It turned out the humans were right: there were certain sums that the adder unit got wrong. The fault was eventually repaired, but it took a design change to resolve.
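The point of a trigger on the built-in analyser is to keep the history leading up to an event. That is exactly the "one clock BEFORE the fault" view needed above. This is a minimal sketch of a circular trace buffer with a trigger and optional post-trigger capture. The `depth` default matches the 1000 word memory in the text. The function name, parameters and integer samples are assumptions, not the CPM's actual analyser design.

```python
# Hypothetical sketch of a trace buffer like the CPM's built-in logic
# state analyser: every clock the sampled state goes into a circular
# memory; once the trigger fires, capture continues for `post_trigger`
# more samples and then freezes, keeping the history before the event.
from collections import deque
from typing import Callable, Iterable

def capture(samples: Iterable[int], trigger: Callable[[int], bool],
            depth: int = 1000, post_trigger: int = 0) -> list:
    buf = deque(maxlen=depth)        # oldest samples fall off the far end
    remaining = None                 # None until the trigger has fired
    for s in samples:
        buf.append(s)
        if remaining is None and trigger(s):
            remaining = post_trigger
        elif remaining is not None:
            remaining -= 1
        if remaining == 0:
            break                    # freeze the trace
    return list(buf)
```

Triggering on clock 2500 with 10 post-trigger samples leaves clocks 1511 to 2510 in the buffer. The states just before the failing clock are all there to inspect.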
The other dreaded test we had to perform was the "margins" test. The "all-orders" test was loaded and run on a working CPM while the system supply voltages were altered by +/-10%. This covered the four combinations of high and low on the minus 2 and plus 4.75 volt rails. The aim was to expose chip logic whose switching thresholds were out of specification. Ten turn pots just behind the front panel were used for this test. Marginal faults were hard to diagnose: being marginal, they were by nature intermittent as well.
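Here is a small sketch enumerating the four margin corners. It assumes the +/-10% applies to the magnitude of both the -2 V and +4.75 V rails, which is how the text reads. `RAILS` and `margin_corners` are hypothetical names.

```python
# Sketch of the four margin corners: each supply rail is set 10% high or
# low in magnitude, in every combination, while the all-orders test runs.
# Assumes the +/-10% applies to both the -2 V and +4.75 V rails.
from itertools import product

RAILS = {"-2V": -2.0, "+4.75V": 4.75}

def margin_corners(tolerance: float = 0.10) -> list:
    corners = []
    for signs in product((+1, -1), repeat=len(RAILS)):
        corners.append({name: round(nominal * (1 + s * tolerance), 4)
                        for (name, nominal), s in zip(RAILS.items(), signs)})
    return corners
```

The corners come out as -2.2 V or -1.8 V combined with 5.225 V or 4.275 V. A chip whose threshold is only just inside specification at nominal voltage may fail at one of them.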
I don't have any pictures of the B7800 installation. It was before the era of cheap digital cameras, and at the time silver halide photography was just too much hassle. These pictures are from a couple of years before the B7800 was installed and show the B6700 and the peripherals that were kept for use with the B7800. I had to maintain some of these peripherals, mostly the tape drives that appear in these images.
A B6700 mainframe and a string of model 9545 PE tape drives. They consumed much engineering time. The operators console was retained for the B7800. It was made from "solid steel" and was damned heavy for just a glorified desk, but then, this was the Burroughs universe.
A B6700 front panel. An operator is taking a "panel dump" using a Polaroid camera. You can see a bundle of printouts and card decks submitted through the "IO Counter".