Systems Reference Library

Sort 1 for IBM 1401 - Specifications

Presents specifications for Sort 1, a generalized sorting program for use on an IBM 1401 Data Processing System equipped with a minimum of four IBM 729 II, 729 IV, or 7330 Magnetic Tape Units. The program can modify itself according to information punched in a control card and thus performs a variety of sorting applications. This bulletin also provides information for preparing control cards and estimating the timing of sorting applications.

File Number 1401-33  Form J24-1422-1

© 1960, 1961 by International Business Machines Corporation

MINOR REVISION (January, 1963)

This publication is a reprint of an earlier edition with the format changed to conform to that of the Systems Reference Library.

Address comments regarding the content of this publication to IBM Product Publications, Endicott, New York

Sort 1 for IBM 1401-Specifications

Sort 1 is a generalized program designed to perform basic tape sorting functions for an IBM 1401 Data Processing System equipped with magnetic tape.

It is classified as a generalized program because it is capable of modifying itself according to information punched in a control card by the user. This ability enables Sort 1 to perform a variety of sorting applications.

Sorting is taking data records that appear in some order on one or more reels of magnetic tape, rearranging these records in a particular sequence specified by the user, and rewriting them sequentially on tape.

General Information

A generalized tape sorting program such as Sort 1 has numerous commercial applications. For example, a wholesaler's daily transactions can be recorded as they occur. At the end of each day, Sort 1 can be utilized to write these transactions on tape in item number sequence, thus providing a compact and convenient daily business record.

Sort 1 performs applications such as this in two steps. The first, called Phase 1, is an internal sort. The records in random order are written in a semblance of their final sequence on two separate tape reels. Phase 2 is a two-way merge. This operation writes a single sequential tape file from the two reels that resulted from the internal sort in phase 1.

The Sort 1 program:

• sorts blocked or unblocked fixed-length records with a maximum block length of up to 800 characters

• sorts either numerical or alphamerical records

• sorts according to control data contained in up to five fields of each record

• labels output tapes, if desired, in accordance with control card instructions

• provides a checkpoint routine that periodically writes the entire contents of core storage on tape and enables the user to stop and restart the program automatically at various stages of the program

• accommodates as many records as will fit on one reel of magnetic tape as the final output (input records may be contained on up to 99 reels)

• prints out blocks containing unreadable records or writes these blocks on a fifth tape unit, if available, called a dump tape (if a dump tape is unavailable, these blocks can be punched into cards).

Minimum Machine Requirements

Machine configuration requirements for IBM 1401 Sort 1 are minimal. The following features must be available:

4,000-character core storage capacity

Minimum of four IBM 729 Model II, 729 Model IV, or 7330 Magnetic Tape Units (a fifth unit, if available, may be used as a dump tape)

IBM 1402 Card Read-Punch

IBM 1403 Printer

High-Low-Equal Compare Feature

Sorting Technique

The sorting technique used in the Sort 1 program consists of reading a number of records from the input file, arranging them in short sequences, and writing these short sequences on alternate tapes. Subsequent passes merge these short sequences into longer sequences. By repeating this merging process, Sort 1 produces one long sequence called a sorted file.

One or two tape units are used for input, and two units are used for output during the initial sorting process. Two input units are used if the records to be sorted are contained on more than one reel.

If the records are contained on more than two reels, the input units are alternated. The program automatically causes a tape unit to stop and rewind when an end-of-file is reached, thus allowing the operator to change reels during input. In the merging process, the input units become output units, and vice versa.

For a more comprehensive discussion of tape sorting methods, refer to the IBM General Information Manual, Sorting Methods for IBM Data Processing Systems, Form F28-8001.

Sort 1 accomplishes the sorting operation in two steps - phase 1 and phase 2.

PHASE 1

1. Phase 1 writes the entire contents of core storage, after initialization, as the first record of the first output tape. This is known as the checkpoint procedure.

2. It reads into storage a number of input records, unblocks them if they are blocked, and sorts them internally.

3. It writes the short sequences on alternate output tapes.

PHASE 2

1. Phase 2 writes the entire contents of core storage as the first record of what has become the first output tape.

2. It merges the sequences written during phase 1 using as many passes as are required.

3. It reblocks the records according to the user's specifications and writes them as a sequential file on a single tape reel.
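The two phases can be illustrated with a minimal Python sketch. It is not the 1401 program: in-memory lists stand in for tape reels, `group_size` stands in for the number of records that fit in the processing area, and all function names are hypothetical. It also ignores checkpoints, blocking, and any sequencing already present in the input.

```python
import heapq

def phase1(records, group_size):
    """Internal sort: sort groups of records and write the short
    sequences to two output 'tapes' alternately."""
    tapes = ([], [])
    for n, start in enumerate(range(0, len(records), group_size)):
        tapes[n % 2].append(sorted(records[start:start + group_size]))
    return tapes

def merge_pass(tape_a, tape_b):
    """One two-way merge pass: merge the n-th sequence of each input tape
    and write the longer sequences to two output tapes alternately."""
    out = ([], [])
    for n in range(max(len(tape_a), len(tape_b))):
        a = tape_a[n] if n < len(tape_a) else []
        b = tape_b[n] if n < len(tape_b) else []
        out[n % 2].append(list(heapq.merge(a, b)))
    return out

def sort1_sketch(records, group_size):
    """Run phase 1, then repeat merge passes until one sequence remains.
    Returns the sorted file and the number of merge passes used."""
    tape_a, tape_b = phase1(records, group_size)
    passes = 0
    while len(tape_a) + len(tape_b) > 1:
        # Output tapes of one pass become input tapes of the next.
        tape_a, tape_b = merge_pass(tape_a, tape_b)
        passes += 1
    merged = tape_a[0] if tape_a else []
    return merged, passes
```

For example, `sort1_sketch([5, 3, 9, 1, 7, 2, 8, 0, 6, 4], 2)` starts with five two-record sequences and needs three merge passes to produce one sorted sequence.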

Allowable Input Record Configurations

Sort 1 accommodates only fixed-length input records.

They may appear on tape either singly or in blocks.

If input records are blocked, the number of records per block (blocking factor) must be constant for each job.

The blocking factor must be established so that a block contains no more than 800 characters or 730 characters, depending upon whether one, or more than one, control data fields are used in a particular job.

If only one control data field is used, a block may contain up to 800 characters. For example, if each input record is 100 characters in length, the maximum blocking factor is 8 (8 x 100 = 800).

If more than one control data field is used, maximum block size is 730 characters.

Maximum length for unblocked records is either 800 or 730 characters.
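The block-size rule above can be expressed as a short calculation. The following Python sketch uses hypothetical function names; the upper limit of 40 on the blocking factor is an assumption read off the tables in Figures 1 and 2, where the shortest records all show a maximum of 40.

```python
def max_block_chars(control_fields):
    """Maximum block size in characters: 800 with one control data field,
    730 with two to five."""
    if not 1 <= control_fields <= 5:
        raise ValueError("Sort 1 allows one to five control data fields")
    return 800 if control_fields == 1 else 730

def max_blocking_factor(record_length, control_fields, cap=40):
    """Largest number of fixed-length records per block.
    `cap` is an assumption taken from Figures 1 and 2."""
    limit = max_block_chars(control_fields)
    if not 1 <= record_length <= limit:
        raise ValueError("record length exceeds the maximum block size")
    return min(limit // record_length, cap)
```

With one control data field, 100-character records give a maximum blocking factor of 8, as in the example above; 23-character records with several control data fields give 730 // 23 = 31.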

Input Blocking

Maximum input block size is determined by the number of positions of core storage set aside by the program for internal sorting during phase 1. If only one control data field is used, more storage area is freed for other use, and 800 core locations are available for internal sorting. If more than one control data field is used, more storage positions are required by the program, and the area available for internal sorting is reduced to 730 positions. Thus, no more than either 800 or 730 characters at a time can be processed.

Processing time can be substantially reduced if an input block contains as close to the maximum number of characters as possible. For example, if maximum block size is 800, and a block contains 800 characters, only one READ operation is performed by the program before each internal sort. If a block contains 400 characters, two READs are performed by the program before the internal sort. If the block contains 200 characters, four READs are required, and so forth.

As the preceding case suggests, if the maximum allowable blocking factor is not used, a submultiple of it should be used. For example, assuming a maximum allowable block size of 800 characters and an input record length of 50 characters, the maximum permissible blocking factor is 16 (16 x 50 = 800). The blocking factor should be either 16 or a submultiple of 16 (8, 4, 2, 1). Blocking factors other than a submultiple (15, 14, 13, 12, 11, 10, 9, 7, 6, 5, 3) may be used, but they will cause an increase in total processing time.

Figure 1 contains maximum allowable blocking factors and other recommended blocking factors for all size records up to the maximum 800 characters. Figure 2 contains maximum allowable blocking factors and other recommended blocking factors for all size records up to the maximum 730 characters.

RECORD     MAXIMUM            OTHER RECOMMENDED
LENGTH     BLOCKING FACTOR    BLOCKING FACTORS             G

010-020    40                 20, 10, 8, 5, 4, 2, 1        40
021        38                 19, 2, 1                     38
022        36                 18, 12, 9, 6, 4, 3, 2, 1     36
023        34                 17, 2, 1                     34
024        33                 11, 3, 1                     33
025        32                 16, 8, 4, 2, 1               32
026        30                 15, 10, 6, 5, 3, 2, 1        30
027        29                 1                            29
028        28                 14, 7, 4, 2, 1               28
029        27                 9, 3, 1                      27
030        26                 13, 2, 1                     26
031-032    25                 5, 1                         25
033        24                 12, 8, 6, 4, 3, 2, 1         24
034        23                 1                            23
035-036    22                 11, 2, 1                     22
037-038    21                 7, 3, 1                      21
039-040    20                 10, 5, 4, 2, 1               20
041-042    19                 1                            19
043-044    18                 9, 6, 3, 2, 1                18
045-047    17                 1                            17
048-050    16                 8, 4, 2, 1                   16
051-053    15                 5, 3, 1                      15
054-057    14                 7, 2, 1                      14
058-061    13                 1                            13
062-066    12                 6, 4, 3, 2, 1                12
067-072    11                 1                            11
073-080    10                 5, 2, 1                      10
081-088    9                  3, 1                         9
089-100    8                  4, 2, 1                      8
101-114    7                  1                            7
115-133    6                  3, 2, 1                      6
134-160    5                  1                            5
161-200    4                  2, 1                         4
201-266    3                  1                            3
267-400    2                  1                            2
401-800    1                  -                            1

Figure 1. Recommended Blocking with One Control Data Field
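The "other recommended blocking factors" are the submultiples of the maximum blocking factor. The following Python sketch (hypothetical function names) lists them and also shows how many READ operations fill the processing area for a given blocking factor. The treatment of factors that are not submultiples follows the definition of G given under Maximum File Length (the largest multiple of the input blocking factor that still fits) and should be read as an interpretation, not as a description of the program's internals.

```python
def recommended_factors(max_factor):
    """Submultiples of the maximum blocking factor, largest first
    (the maximum itself is excluded)."""
    return [f for f in range(max_factor - 1, 0, -1) if max_factor % f == 0]

def processing_group(max_factor, factor):
    """Return (G, reads): the number of records handled per internal sort
    and the number of READ operations needed to bring them in."""
    if not 1 <= factor <= max_factor:
        raise ValueError("blocking factor out of range")
    reads = max_factor // factor
    return reads * factor, reads
```

For 50-character records with one control data field (maximum factor 16), `recommended_factors(16)` gives `[8, 4, 2, 1]`. A blocking factor of 8 fills the area with two READs; a factor of 5 uses three READs but sorts only 15 records at a time, which is why such factors lengthen processing.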

Output Blocking

Records may be blocked on the output tape according to the user's specifications, as punched in the control card. The output blocking factor must be such that output block length does not exceed either 800 or 730 characters, depending upon the number of control data fields used. Maximum permissible input and output blocking factors are always the same for a particular job (see Figures 1 and 2). Note that processing time is reduced if the maximum permissible output blocking factor is used.

RECORD     MAXIMUM            OTHER RECOMMENDED
LENGTH     BLOCKING FACTOR    BLOCKING FACTORS             G

010-018    40                 20, 10, 8, 5, 4, 2, 1        40
019        38                 19, 2, 1                     38
020        36                 18, 12, 9, 6, 4, 3, 2, 1     36
021        34                 17, 2, 1                     34
022        33                 11, 3, 1                     33
023        31                 1                            31
024        30                 15, 10, 6, 5, 3, 2, 1        30
025        29                 1                            29
026        28                 14, 7, 4, 2, 1               28
027        27                 9, 3, 1                      27
028        26                 13, 2, 1                     26
029        25                 5, 1                         25
030        24                 12, 8, 6, 4, 3, 2, 1         24
031        23                 1                            23
032-033    22                 11, 2, 1                     22
034        21                 7, 3, 1                      21
035-036    20                 10, 5, 4, 2, 1               20
037-038    19                 1                            19
039-040    18                 9, 6, 3, 2, 1                18
041-042    17                 1                            17
043-045    16                 8, 4, 2, 1                   16
046-048    15                 5, 3, 1                      15
049-052    14                 7, 2, 1                      14
053-056    13                 1                            13
057-060    12                 6, 4, 3, 2, 1                12
061-066    11                 1                            11
067-073    10                 5, 2, 1                      10
074-081    9                  3, 1                         9
082-091    8                  4, 2, 1                      8
092-104    7                  1                            7
105-121    6                  3, 2, 1                      6
122-146    5                  1                            5
147-182    4                  2, 1                         4
183-243    3                  1                            3
244-365    2                  1                            2
366-730    1                  -                            1

Figure 2. Recommended Blocking with More Than One Control Data Field

Maximum File Length

The input file to be processed by Sort 1 must be no longer than the number of records that can be contained on a single tape reel. This number will depend on record length, input blocking factor, and on whether processing is performed in the high- or low-density magnetic tape mode. The following formula enables the user to compute the maximum number of records that can be sorted in one job:

Maximum Number of Input Records = (K x G) / ((G x L) + IRG)

Explanation of symbols:

K = Number of character locations per tape reel

    High-density tape - 15,350,000
    Low-density tape - 5,520,000

IRG = Number of character locations per inter-record gap

    High-density tape - 417
    Low-density tape - 150

L = Characters per record

G = Largest multiple of input blocking factor that is less than or equal to either 800/L when one control data field is used, or 730/L when more than one control data field is used. (See Figures 1 and 2 for values of G.)

EXAMPLE

Compute the maximum file size for records 50 characters long with an input blocking factor of eight. Processing will be in the high-density mode and one control data field will be used. Referring to the preceding formula, the symbols will have the following values:

K = 15,350,000
IRG = 417
L = 50
G = 16

The formula is then evaluated as follows:

(15,350,000 x 16) / ((16 x 50) + 417) = 201,807

The maximum file size for this job is 201,807 records.
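The same calculation can be written as a small Python sketch. The function and table names are hypothetical; the constants are the K and IRG values listed above, and the result is truncated to a whole number of records, which matches the figure in the example.

```python
# (K, IRG) in character locations, per tape density, from the symbol list.
TAPE_CONSTANTS = {
    "high": (15_350_000, 417),
    "low": (5_520_000, 150),
}

def max_input_records(record_length, g, density="high"):
    """Maximum number of records that fit on one reel:
    (K x G) / ((G x L) + IRG), truncated to a whole record."""
    k, irg = TAPE_CONSTANTS[density]
    return (k * g) // (g * record_length + irg)

print(max_input_records(50, 16, "high"))  # 201807
```

Running the same job in low density, `max_input_records(50, 16, "low")`, gives 92,968 records, which shows how strongly tape density limits file size.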

Tape Density

The Sort 1 program accommodates input reels written in either high- or low-density format; the final output reel may be written in either density, although high density is recommended. The user need only set the density switch of the output tape unit to the desired density. The tapes used for processing must be consistent in density, but they need not have the same density as the final output reel.

Note: If processing is performed in the high-density mode and final output is in low density, it is conceivable that the final output may require slightly more than one full reel of magnetic tape. In this situation, the program halts when an end-of-reel is encountered during final output. The user may then mount a new tape and press START to continue output.

Control Data Fields

From one to five fields of each input record can be specified to control sequencing. These fields can be located anywhere within the record, provided they are in the same place in each record. They can be of any length.

The location of each control field is specified by the user in the control card. If more than one control field is used, the user must specify which is to be compared first, which second, and so forth.

Although up to five fields are permitted, it is to the user's advantage to limit the number of control data fields to one. As noted previously, the use of only one control field raises the maximum permissible block size to 800. Secondly, processing time is reduced if the number of control fields is reduced.

If more than one control field must be used, it is beneficial if the fields appear in the record sequentially in order of importance from left to right. Several fields can thus be treated by the program as one field. Control fields can contain any alphamerical characters or special symbols. Standard collating sequence for the IBM 1401 is used.
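Sequencing on several control data fields amounts to comparing a key built from those fields in order of importance. The Python sketch below uses hypothetical names and one important simplification: Python compares strings by character code, which is not the IBM 1401 collating sequence, so the ordering of mixed letters, digits, and special symbols may differ from what Sort 1 produces.

```python
def control_key(record, fields):
    """Build a sort key from control data fields.
    `fields` is a list of (position, length) pairs in order of importance;
    positions are 1-based, as in the control card."""
    return "".join(record[pos - 1:pos - 1 + length] for pos, length in fields)

records = ["B2x", "A9y", "A1z"]

# Major field in position 1, minor field in position 2.
by_two_fields = sorted(records, key=lambda r: control_key(r, [(1, 1), (2, 1)]))

# Adjacent fields in order of importance behave like one longer field.
by_one_field = sorted(records, key=lambda r: control_key(r, [(1, 2)]))
```

Both orderings here are `["A1z", "A9y", "B2x"]`, which illustrates why contiguous fields arranged left to right in order of importance can be treated as a single control field.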

Unreadable Input Records

Input tape blocks containing unreadable records (records that cause redundancy indications on one or more characters after several attempts at re-reading) may be treated in a variety of ways according to punches in the control card prepared by the user.

When an unreadable record is reached, the block containing it is read into storage, and the machine internally corrects the parity of the invalid character by either adding or removing the check bit. Thus, although the character is now valid for machine purposes, it may not be the same character that appeared on tape.

A punch in column 14 of the control card determines the next action taken on the block containing the unreadable record. The record can be corrected from the console, or the block containing it can be punched into cards or written on a fifth tape unit, if available. If the unreadable record is corrected, the entire block will also be printed.

Unreadable records are corrected in the following manner. The program stops after the block containing the unreadable record is printed. This gives the user an opportunity to study the contents of the record. The user then has the option of continuing the sorting process with the record as it appears, or of correcting the invalid character manually before resuming processing. To continue sorting with the record as it appears, the user need only press the start key. To correct the invalid character, the user should:

1. turn on sense switch G and set the tape select switch to D

2. press START, causing the block containing the incorrect record to be re-read; the program again halts if the redundancy has not been corrected

3. manually load the correct character in its appropriate storage location

4. set the tape select switch back to N, and turn off sense switch G

5. press START to resume processing, beginning with the block that was just corrected.

Checkpoint and Restart

Because sorting is, by computer standards, a fairly lengthy procedure, a feature has been incorporated in the Sort 1 program that enables the user to stop processing at any stage of the sort if he must relinquish the machine. This same feature allows him to resume processing at a point in the program very close to where he stopped, thus saving considerable duplication of operating time.

The program accomplishes this by writing checkpoints periodically during the running of the sort. A checkpoint is a tape record containing the entire contents of storage. It is written as the initial record of the first output tape.

The first checkpoint is written during phase 1 after initialization and just before the reading of the first block of input records to be sorted. During phase 2 a checkpoint is written at the beginning of every merge pass.

If processing is stopped during phase 1, all sorting performed up to that point is lost, and the restart begins with the reading of the first block of records to be sorted.

If processing is stopped during phase 2, only the merge pass that is interrupted is lost. The output of all preceding merge passes remains intact. When the program is interrupted, the user must, of course, save the output reels from the last pass and the reel containing the checkpoint.

To restart after an interruption, it is necessary only to:

1. mount the input and output reels

2. set the indicator of the tape unit on which the first output reel is placed to 1

3. press the tape load key

This automatically causes the first record of tape unit 1 (the checkpoint record) to be read into storage, and causes a branch to location 001 for the first instruction.

This instruction causes the program to begin either at the beginning of phase 1 or at the beginning of the interrupted merge pass, depending on which checkpoint is used. The restart routine also causes a new checkpoint to be written.

To insure that the user sets the tape unit indicators to the correct numbers, the program automatically causes the numbers of the units being used to be printed out when the tape load key is pressed during the restart routine. The number of the pass during which the program was interrupted is also printed out.

If the interruption occurred during phase 1, 00 is printed out.

Tape Labels

Sort 1 accommodates header labels on input reels and writes a header label on the final output reel, if it is desired by the user. When input header labels are specified, the program assumes that the header label is the first record of the reel. No provision is made for trailer labels. If they appear on input reels they are ignored by the program. Also, if one input reel contains a header label, all input reels must contain header labels, although the labels do not have to be the same in size or content. Maximum allowable header label length, on either input or output reels, is 80 characters.

Padding

The term padding refers to records added to a file to be sorted when the number of records in the file is not a multiple of the maximum allowable input blocking factor. These additional records are generated internally by the Sort 1 program.

Sort 1 automatically adds padding records to an input file if, preparatory to reading into storage the last block of records during phase 1, it finds that there are insufficient records to fill the processing area. Padding records generated by Sort 1 are sorted and merged in the same manner as data input records. They must, therefore, be composed either entirely of nines or entirely of blanks. The user's choice must be punched in the control card. If they contain nines, they will be the last records in the sorted file.

If they contain blanks, they will be the first records of the sorted file.

EXAMPLE

Here is a case in which padding is required. An input file contains 90 records, and the maximum permissible input blocking factor is 16. The first five internal sorts process 16 records at a time. Prior to the sixth and final internal sort, however, only ten records remain to be read. Because the processing area of storage must contain the same number of input records during each internal sort, six padding records are read into the area at this point. Sixteen records are now ready for processing and the program continues. As this example indicates, padding can occur only in the final internal sort of phase 1.
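The number of padding records follows directly from the record count and the number of records handled per internal sort. A minimal Python sketch, with hypothetical function names and in-memory strings standing in for tape records:

```python
def padding_needed(record_count, group_size):
    """Records to add so the file fills a whole number of internal sorts."""
    return -record_count % group_size

def pad_file(records, group_size, record_length, pad_char="9"):
    """Append padding records made entirely of nines or entirely of blanks."""
    if pad_char not in ("9", " "):
        raise ValueError("padding must be nines or blanks")
    extra = padding_needed(len(records), group_size)
    return records + [pad_char * record_length] * extra

print(padding_needed(90, 16))  # 6
```

In this sketch, all-nines padding sorts after ordinary numeric records and all-blank padding sorts before them, which mirrors the placement described above.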

Control Card Preparation

The user provides control information that enables Sort 1 to modify itself so that it can perform a particular application. Control information is supplied to the program by means of a single control card prepared by the user and inserted in the program deck.

When the control card is prepared, leading zeros are punched in fields containing information. For example, the field specifying the number of input reels (columns 3-4) is punched 05 if there are five input reels. Unused fields are left blank.

Control card format is shown in Figure 3. An explanation of each control card field follows.

Tape Unit Specification (Columns 1-4, 12, 13)

Four IBM 729 Model II, 729 Model IV, or 7330 Magnetic Tape Units are required by the Sort 1 program. Two are used for input and two for output. The user specifies the number of each unit and the total number of tape reels on which his file is contained.

Column 1 is punched with the number of the first input tape unit.

Column 2 is punched with the number of the second input tape unit.

Columns 3-4 are punched with the total number of tape reels on which the file is contained.

Column 12 is punched with the number of the first output tape unit from phase 1. The checkpoint immediately prior to phase 1 is written as the first record of the reel mounted on this tape unit.

Column 13 is punched with the number of the second output tape unit from phase 1.

Blocking Information (Columns 5-11)

Columns 5-7 are punched with the number of characters per record. Note that only fixed-length records are permitted.

Columns 8-9 are punched with the input blocking factor.

Columns 10-11 are punched with the output blocking factor. Recommended blocking factors for records of various lengths are shown in Figures 1 and 2.

COLUMN NO.   DESCRIPTION

1            No. of first input tape unit
2            No. of second input tape unit
3-4          No. of input reels
5-7          Input record length
8-9          Input blocking factor
10-11        Output blocking factor
12           No. of first output tape unit
13           No. of second output tape unit
14           Unreadable record option
15           Tape density indicator
16           Input label indicator
17           Output label option
18           Padding character
19           No. of control data fields
20-22        No. of control data field characters
23-25        Location in record of control data field 1 (high-order position)
26-28        No. of characters in control data field 1
29-31        Location in record of control data field 2 (high-order position)
32-34        No. of characters in control data field 2
35-37        Location in record of control data field 3 (high-order position)
38-40        No. of characters in control data field 3
41-43        Location in record of control data field 4 (high-order position)
44-46        No. of characters in control data field 4
47-49        Location in record of control data field 5 (high-order position)
50-52        No. of characters in control data field 5
53-79        Unused
80           Tape mark option

Figure 3. Sort 1 Control Card

Unreadable Record Option (Column 14)

As noted previously, the action taken on blocks containing unreadable records is determined by control card specifications. The control card offers the option of punching the block into cards, writing it on a fifth tape unit, or allowing the user to correct the unreadable record manually. In the latter case, the entire block is also printed.

Column 14 is punched with the unreadable record option. If blocks containing unreadable records are to be punched into cards, column 14 is left blank. If blocks containing unreadable records are to be written on a dump tape, the number of this fifth unit is punched in column 14. If unreadable records are to be corrected from the console, a zero is punched in column 14.

Tape Density Indicator (Column 15)

The tapes used in processing may be written in either high- or low-density format, regardless of the density of the input tapes.

Column 15 is punched with the tape density indicator. If these tapes are to be low density, a zero is punched in column 15. If they are to be high density, a 1 is punched in column 15.

The density switches of the tape units must be set to the appropriate density. The density of the final output tape need not be the same as the density of the processing tapes. High density is recommended for processing and final output.

Tape Labels (Columns 16, 17, and 80)

The user specifies in these columns the presence or absence of header labels on input reels and whether or not a header label is desired on the output reel. The Sort 1 program ignores trailer labels.

Column 16 contains the input label indicator. Column 16 is left blank if the input reels do not contain header labels. If the input reels contain header labels, a 1 is punched in column 16. Note that if a 1 is punched in column 16, every input reel must have a label as its first record. Each label may or may not be followed by a tape mark.

Column 17 contains the output label option. If the output reel is to have the same label as the first input reel, a 1 is punched in column 17. If the output reel is to have a new label, a 2 is punched in column 17. In the latter case a card punched with the contents of the label must be provided by the user and included with the program deck and the control card. If there is to be no label on the output reel, column 17 is left blank.

Column 80 contains the tape mark option for output labels. If the output reel is to have a header label, the user has the option of writing a tape mark immediately after it. If the output label is to be followed by a tape mark, a zero is punched in column 80. If no tape mark is desired, column 80 is left blank.

Padding (Column 18)

Column 18 is punched with the character to be used throughout the program in padding records. If nines are desired as padding records, column 18 must contain a nine. If blanks are desired as padding records, column 18 is left blank.

Control Data Specifications (Columns 19-52)

The Sort 1 program bases record sequence on the contents of up to five control data fields contained in each input record. These fields, specified by the user, are compared from record to record.

The control data fields, if there are more than one, do not have to be contiguous, nor do they have to appear in the record in the same order in which they will be compared. The user specifies control fields in the control card in the order of their importance. Thus, the control field to be compared first is designated control data field 1, even though it may appear in the input record following control data field 2.

Column 19 is punched with the total number of control data fields used. Valid punches in this column are 1 through 5.

Columns 20-22 are punched with the total number of characters in the up-to-five control data fields used. This number is limited only by the size of the record.

Columns 23-28, 29-34, 35-40, 41-46, and 47-52 are punched with the specifications of control data fields 1, 2, 3, 4, and 5, respectively. The first three columns for each field (columns 23-25 for control data field 1) are punched with the location in the record of the high-order character of the control field. The first location of every input record is considered 001. The second three columns for each field (columns 26-28 for control data field 1) are punched with the total number of characters in the control field.

If fewer than five control data fields are used, unused control field columns in the control card must be left blank. For example, if the user specifies two control data fields for a particular job, columns 35-52 of the control card must be blank.
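The layout of columns 19-52 described above can be sketched as a formatting routine. This Python sketch uses a hypothetical function name and sample field values; it only builds the 34-character text for those columns and does not validate the fields against the record length.

```python
def control_data_columns(fields):
    """Return the punches for columns 19-52 as a 34-character string.
    `fields` lists (high_order_position, length) pairs in the order in
    which the control data fields are to be compared."""
    if not 1 <= len(fields) <= 5:
        raise ValueError("one to five control data fields are allowed")
    total_chars = sum(length for _, length in fields)
    text = f"{len(fields)}{total_chars:03d}"      # column 19, columns 20-22
    for position, length in fields:
        text += f"{position:03d}{length:03d}"     # six columns per field
    return text.ljust(34)                         # unused field columns blank

# Sample values: major field at 71 (10 chars), then 6 (5 chars), then 28 (7 chars).
card = control_data_columns([(71, 10), (6, 5), (28, 7)])
```

For these sample values the result is `3022071010006005028007` followed by twelve blanks, with leading zeros in every field that carries information.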

EXAMPLE

Here is an example of control data field specification.

In an input file containing records 80 characters long, three control fields are used. The first (major) control field to be compared occupies locations 71-80. The second (intermediate) control field to be compared occupies locations 6-10. The third (minor) control field to be compared occupies locations 28-34. Figure 4 shows the punches required for this example in columns 19-52 of the control card.

CARD
COLUMNS    PUNCH    EXPLANATION

19         3        Total number of control data fields
20-22      022      Total number of characters in control data fields
23-25      071      High-order position of control data field 1
26-28      010      Number of characters in control data field 1
29-31      006      High-order position of control data field 2
32-34      005      Number of characters in control data field 2
35-37      028      High-order position of control data field 3
38-40      007      Number of characters in control data field 3
41-52      blank    Only three control data fields used

Figure 4. Example of Control Data Field Specification

Unused Columns (Columns 53-79)

Columns 53-79 of the control card are not used by the Sort 1 program and may be either punched for identification purposes or left blank, at the discretion of the user.

Estimating Sorting Times

To estimate the time required by the Sort 1 program to process a given file of records, it is necessary to know the duration of phase 1, the duration of each merge pass of phase 2, the number of merge passes required in phase 2, the tape time required for each pass, and the time required for tape rewinding.

All of these figures can be obtained from the tables in Figures 5, 6, and 7. Figure 5 lists the number of merge passes required for various job sizes. Figure 6 provides the remainder of the data required for solving the timing formula if only one control data field is used. Figure 7 provides the remainder of the data required for solving the timing formula if more than one control field is used.

R/G            p

1-2            1
3-4            2
5-8            3
9-16           4
17-32          5
33-64          6
65-128         7
129-256        8
257-512        9
513-1024       10
1025-2048      11
2049-4096      12
4097-8192      13
8193-16384     14
16385-32768    15

Figure 5. Number of Merge Passes

Number of Merge Passes

The table in Figure 5 contains the number of merge passes (p) required for various jobs. In order to determine the number of passes, it is necessary to divide the number of records in the job (R) by the maximum permissible blocking factor (G). Various values for G are shown in Figures 1 and 2.

For example, in a particular job there are 100,000 input records (R) of 80 characters each. Only one control data field is utilized, so G can equal 10 (see Figure 1). Using the formula in Figure 5, R/G = 10,000. The number of merge passes required is therefore 14.

It should be noted at this point that during phase 2 the program takes advantage of any sequencing already existing in the user's file. If a degree of sequencing is present, the number of merge passes in phase 2 is reduced. Experience has shown that preexisting sequencing in an unsorted file may reduce the number of merge passes (p) by an average of 1/7. Thus the actual number of merge passes required in the example in the previous paragraph is more likely to be 12. The reduction in number of merge passes depends upon the degree of existing sequencing in a file. The user should take this factor into consideration when calculating the value of p.
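The ranges in Figure 5 double with each additional pass, so the table can be reproduced with a base-2 calculation. The Python sketch below uses hypothetical function names; rounding R/G up to a whole number of sequences and rounding the 1/7 reduction to the nearest pass are assumptions, not rules stated in this bulletin.

```python
def merge_passes(records, g):
    """Number of merge passes p for R records sorted G at a time,
    following the doubling ranges of Figure 5 (valid up to R/G = 32768)."""
    sequences = -(-records // g)          # R/G rounded up to a whole number
    if sequences > 32768:
        raise ValueError("R/G is beyond the range covered by Figure 5")
    return max(1, (sequences - 1).bit_length())

def likely_passes(p):
    """Estimate after the average 1/7 reduction for preexisting sequencing."""
    return round(p * 6 / 7)

p = merge_passes(100_000, 10)
print(p, likely_passes(p))  # 14 12
```

The reduction is an average drawn from experience, so `likely_passes` is only a planning figure; a file with little existing order may still need the full value from Figure 5.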

Timing Formula

The following timing fOm'lula requires the use of many factors, the values for which can be determined from the tables in Figures 5, 6, and 7. This formula provides an accurate timing esti..'1late if the file to be sorted has the following characteristics:

1. Maximum permissible blocking factor is used for input and output.

2. Maximum tape rewind time is required for each merge pass.

3. Only one control data field is used.

If any of these conditions are not met, the factors in the basic timing formula must be adjusted to make the formula accurate. These adjustments are described subsequently. Here is the basic timing formula:

Total Time (minutes) = (P1 × R)/60,000 + (P2 × R × p)/60,000 + (T × R × (p + 1))/60,000 + W(p + 1)

Where:

P1 = Process time for phase 1 in milliseconds per record (see Figure 6)
P2 = Process time for each pass of phase 2 in milliseconds per record (see Figure 6)
R = Total number of records to be sorted
p = Number of merge passes in phase 2 (see Figure 5)
T = Tape time for phase 1 and each pass of phase 2 in milliseconds per record
    T2L = IBM 729 Model II Low Density (see Figure 6)
    T2H = IBM 729 Model II High Density (see Figure 6)
    T4L = IBM 729 Model IV Low Density (see Figure 6)
    T4H = IBM 729 Model IV High Density (see Figure 6)
W = Rewind time
    IBM 729 Model II = 1.2 minutes
    IBM 729 Model IV = 0.9 minutes
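The basic timing formula can be sketched as a small function. The function and parameter names are hypothetical; the factors themselves must still be read from Figures 5, 6, and 7. The call at the end uses the factors quoted in the example of this section.

```python
def total_sort_minutes(p1, p2, t, records, passes, rewind):
    """Basic Sort 1 timing formula; p1, p2 and t in ms per record, rewind in minutes."""
    ms_per_minute = 60_000
    phase1_processing = p1 * records / ms_per_minute
    phase2_processing = p2 * records * passes / ms_per_minute
    tape_time = t * records * (passes + 1) / ms_per_minute  # phase 1 plus p merge passes
    rewind_time = rewind * (passes + 1)
    return phase1_processing + phase2_processing + tape_time + rewind_time


# Factors from the example in this section (L = 30, CF = 6, 729 Model II, low density).
minutes = total_sort_minutes(p1=15.3, p2=3.6, t=4.9, records=100_000, passes=12, rewind=1.2)
print(f"{minutes:.1f} minutes")  # 219.3 minutes
```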

As noted previously, this timing formula is accurate only if all of the foregoing conditions are met. If the input or output blocking factor is not the maximum permissible, processing time is increased. Similarly, if more than one control data field is utilized, processing time is increased.

Note: Sort 1 timing information relating to 1401 systems equipped with IBM 7330 Magnetic Tape Units will be presented in a subsequent publication.

EXAMPLE

This example illustrates the use of the Sort 1 timing formula without any adjustment necessary. Assume that a file of records to be sorted has the following specifications:

Record length (L) = 30
Number of records (R) = 100,000
Input and output blocking factor = 26
Maximum permissible blocking factor (G) = 26
Number of control fields = 1
Length of control field (CF) = 6

IBM 729 Model II Magnetic Tape Units in the low-density mode are to be used.

It is first necessary to consult the table in Figure 5 to determine how many merge passes (p) are required in phase 2 of this operation. Because R/G in this case is 100,000/26, or 3,846, p = 12.

The table in Figure 6 is then consulted to obtain the values of P1, P2, and T. (T in this example is T2L, because tape operations are performed on a Model II tape unit in the low-density mode.)

The timing chart in Figure 6 is read by first scanning the column labeled (L) to find the appropriate record length, in this case 30. Within this group, the column labeled (CF) is scanned to find the appropriate control field length, in this case 6. By reading across this line, the user finds that P1 = 15.3, P2 = 3.6, and T2L = 4.9. Because processing is on a Model II tape unit, rewind time (W) = 1.2.

The proper figures can now be inserted into the timing formula as follows:

Total time (minutes) = 15.3(100,000)/60,000 + (3.6)(100,000)(12)/60,000 + (4.9)(100,000)(12 + 1)/60,000 + 1.2(12 + 1) = 219.3 minutes

Rewind Time Considerations

The formula for total sort time includes the term W, the time required to rewind a full reel of magnetic tape. An IBM 729 Model II Magnetic Tape Unit requires 1.2 minutes to rewind a full reel. An IBM 729 Model IV Magnetic Tape Unit requires 0.9 minute to rewind a full reel. If more than 450 feet of tape must be rewound, total rewind time is not reduced substantially enough to affect total sorting time. If 450 feet or less are to be rewound, however, rewind time is considerably lessened. This smaller rewind time can be substituted in the timing formula for W, in order to give a more accurate total time estimate.

The file size corresponding to 450 feet of tape is determined by multiplying the value of the maximum file size for a particular job by 0.2. Maximum file size is indicated in Figures 6 and 7 in the columns labeled RH (for high density) and RL (for low density).

Figure 6. Timing Factors for Files with One Control Data Field

Each column of the table is listed separately, in the order in which its entries were recovered; entries do not line up row for row across columns. "×n" marks a value that appears on n consecutive rows.

L: 10, 20, 30, 40, 60, 80, 100, 120, 150, 200, 300, 400, 500, 600, 700, 800
G: 40, 40, 26, 20, 13, 10, 8, 6, 5, 4, 2, 2, 1, 1, 1, 1
CF: 2, 4, 2, 5, 10, 5, 10, 15, 5, 10, 15, 20, 2, 4, 2, 4, 6, 10, 15, 20, 25, 5, 10, 15, 20, 5, 10, 15, 20, 10, 20, 25, 30, 20, 30, 40, 50, 20, 30, 40, 50, 20, 30, 40, 50, 15, 20, 30, 40, 10, 20, 30, 35, 15, 25, 35, 45
P1: 19.4, 19.9, 19.6, 20.1, 14.6, 15.0, 15.3, 10.3, 10.7, 11.1, 11.5, 7.7, 10.4, 10.8, 11.2, 12.6, 13.1, 13.7, 10.9, 11.4, 11.9, 10.0, 10.3, 10.6, 10.9, 10.7, 11.0, 11.3, 11.6, 11.8, 12.3, 12.5, 12.8, 11.7, 12.0, 12.4, 12.5, 14.0, 14.3, 14.7, 15.0, 14.2, 14.5, 14.7, 14.9, 16.6, 16.9, 17.1, 17.3, 19.0, 19.2, 19.4, 19.6, 21.2, 21.5, 21.7, 21.9
P2: 8.0, 8.3, 8.4, 8.5, 10.9, 11.1, 11.4, 11.5, 13.3, 13.4, 13.6, 13.9, 16.7, 17.0, 17.2, 17.4, 19.1, 19.4, 19.6, 19.8, 21.4, 21.6, 21.9, 22.2, 23.7, 24.0, 24.2, 24.4, 5.3, 5.5, 5.6, 5.7, 5.9, 6.0, 6.1, 6.2, 6.8, 6.9, 7.0, 7.1, 4.3, 4.4, 4.6, 4.8, 4.9, 5.1, 5.2, 3.5, 3.5, 3.6, 3.7, 3.8, 3.9, 2.98, 3.02, 3.21, 3.26
T2L: 16.1 ×4, 20.0 ×4, 24.4 ×4, 32.2 ×4, 4.9 ×3, 6.4 ×3, 1.88 ×2, 3.22 ×2, 9.7 ×3, 12.9 ×4, 51.0 ×4, 64.4 ×4, 115 ×4, 129 ×4, 88.6 ×4, 102.0 ×4
T2H: 2.3 ×3, 6.0 ×4, 7.5 ×4, 3.0 ×3, 4.5 ×3, 9.4 ×4, 11.5 ×4, 15.0 ×4, 25.2 ×4, 1.02 ×2, 1.50 ×2, 30.0 ×4, 45.6 ×4, 50.4 ×4, 55.2 ×4, 60.0 ×4
T4L: 10.6 ×4, 13.0 ×4, 16.1 ×4, 21.3 ×4, 33.7 ×4, 76.2 ×4, 85.0 ×4, 42.5 ×4, 58.6 ×4, 67.4 ×4, 1.24 ×2, 4.3 ×3, 6.4 ×3, 2.12 ×2, 3.2 ×3, 8.5 ×4
T4H: 5.0 ×4, 6.3 ×4, 7.7 ×4, 10.1 ×4, 16.9 ×4, 37.0 ×4, 40.2 ×4, 20.1 ×4, 30.6 ×4, 33.8 ×4, .68 ×2, 1.00 ×2, 3.0 ×3, 4.0 ×4, 1.5 ×3, 2.0 ×3
RH: 166,000 ×3, 126,000 ×4, 100,300 ×4, 80,800 ×4, 65,600 ×4, 749,000 ×2, 504,000 ×2, 333,000 ×3, 252,000 ×3, 13,700 ×4, 12,600 ×4, 25,200 ×4, 16,700 ×4, 15,000 ×4, 50,400 ×4, 30,100 ×4
RL: 29,400 ×4, 22,300 ×4, 14,200 ×4, 11,150 ×4, 8,150 ×4, 74,000 ×3, 55,800 ×4, 44,600 ×4, 36,500 ×4, 7,070 ×4, 6,240 ×4, 5,580 ×4, 386,000 ×2, 223,000 ×2, 148,500 ×3, 111,700 ×3

If the total number of records to be sorted (R) is equal to or less than RH × 0.2 (high density) or RL × 0.2 (low density), then W is reduced in value when used in the timing formula.

Furthermore, the user should note that during phase 2, the reels being rewound contain, in general, only half the entire file. This is an approximation, and therefore it is wise to be conservative and assume that on each rewind 3/4 of the full file is being rewound. This further reduces the value of W. The following formulas are used to determine the value of W:

Model II High Density: Rewind Time (W) = (.75R / (RH × 0.2)) × 1.2

Model II Low Density: Rewind Time (W) = (.75R / (RL × 0.2)) × 1.2

Model IV High Density: Rewind Time (W) = (.75R / (RH × 0.2)) × 0.9

Model IV Low Density: Rewind Time (W) = (.75R / (RL × 0.2)) × 0.9
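The four rewind formulas differ only in the full-reel rewind time and in whether RH or RL is used, so they can be sketched as one function. The names below are hypothetical; `max_file_size` stands for RH or RL as read from Figure 6 or 7, and the full-reel value is returned when the file exceeds the 0.2 limit, as the text describes.

```python
FULL_REEL_REWIND_MINUTES = {"II": 1.2, "IV": 0.9}  # IBM 729 models, values from the text


def rewind_minutes(records, max_file_size, model):
    """Return W in minutes; max_file_size is RH or RL for the job."""
    full_reel = FULL_REEL_REWIND_MINUTES[model]
    short_file_limit = 0.2 * max_file_size  # records that occupy about 450 feet of tape
    if records > short_file_limit:
        return full_reel  # no worthwhile reduction
    # Conservative assumption from the text: 3/4 of the file is rewound each time.
    return 0.75 * records / short_file_limit * full_reel


print(round(rewind_minutes(10_000, 148_500, "II"), 2))  # 0.3
```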

An example of how this reduced value of W could occur can be seen by referring to the previous timing example. Assume that the number of records in the file (R) is not 100,000, but 10,000. The table in Figure 6 shows that the maximum file size for this job (RL) is 148,500. This figure multiplied by 0.2 is 29,700, so it is clear that a file containing only 10,000 records occupies less than 450 feet of magnetic tape. One of the formulas can thus be used to reduce the value of W.

Because processing is performed on a Model II tape unit in the low-density mode, the applicable formula is:

W = (.75R / (RL × 0.2)) × 1.2

Thus:

W = ((.75)(10,000) / ((148,500)(0.2))) × 1.2, or W = 0.3 (approx.)

This new value for W is inserted in the basic timing formula, replacing 1.2.

Control Data Field Considerations

If more than one field per record is used to control sequencing in a particular file, the factors used in the basic timing formula must be adjusted to reflect the increase in processing time. Such considerations as the number of control data fields used, and the length of the control data fields, affect total sorting time.

When more than one control data field is used, the values for P1, P2, T, RH, and RL should be determined from the table in Figure 7. These will be the values used in the basic timing formula.

Before the basic timing formula can be used, however, the values of P1 and P2 must be adjusted to reflect the number and length of additional control data fields. The following formulas are used to compute the values of ΔP1 and ΔP2. These values must be added to the values of P1 and P2, as indicated in the table in Figure 7, before solving the basic timing formula.

ΔP1 = ((G + 3)/4 + 1/G) [2LAF + 153(NAF − 1) + 114] .0115

(Note: The formula just given for ΔP1 is valid only if the value of G is 4 or greater. When G = 1, 2, or 3, one of the following formulas should be used for ΔP1.)

If G = 1, ΔP1 = .0115 [2LAF + 153(NAF − 1) + 114]

If G = 2, ΔP1 = 1.5 [2LAF + 153(NAF − 1) + 114] .0115

If G = 3, ΔP1 = 1.7 [2LAF + 153(NAF − 1) + 114] .0115

The following formula is used at all times to determine the value of ΔP2:

ΔP2 = ((G + 1)/G) [2LAF + 153(NAF − 1) + 114] .0115

In all of the foregoing formulas,

LAF = Length of additional fields
NAF = Number of additional fields
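The ΔP1 and ΔP2 adjustments can be sketched as follows. This is only an illustration: the function names are hypothetical, and the leading factors are read from formulas that are damaged in this copy. In particular, the factor (G + 3)/4 + 1/G for G of 4 or greater and the use of the same bracket for G = 1 are assumptions, checked only against the single worked example in this section (ΔP1 = 10.3, ΔP2 = 3.7).

```python
def field_cost(laf, naf):
    """Bracketed term shared by the formulas, times .0115 (ms per record)."""
    return (2 * laf + 153 * (naf - 1) + 114) * 0.0115


def delta_p1(g, laf, naf):
    """Increment to P1; the leading factors are assumptions read from a damaged formula."""
    special_factors = {1: 1.0, 2: 1.5, 3: 1.7}  # G = 1, 2, 3 use fixed factors
    factor = special_factors.get(g, (g + 3) / 4 + 1 / g)
    return factor * field_cost(laf, naf)


def delta_p2(g, laf, naf):
    """Increment to P2, used for every value of G."""
    return (g + 1) / g * field_cost(laf, naf)


# Worked example: G = 9, two additional fields totalling 10 characters.
print(round(delta_p1(9, 10, 2), 1), round(delta_p2(9, 10, 2), 1))  # 10.3 3.7
```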

EXAMPLE

This example illustrates the use of the Sort 1 timing formula when the file to be sorted has more than one control data field per record. Assume that the job has the following specifications:

Record length (L) = 80
Number of records (R) = 60,000
Input and output blocking factor = 9
Maximum permissible blocking factor (G) = 9
Number of control data fields = 3
Length of first control data field (CF) = 5
Length of second control data field = 6
Length of third control data field = 4

IBM 729 Model IV Magnetic Tape Units in the high-density mode are used.

The table in Figure 5 indicates that the number of merge passes (p) required in phase 2 is 13, because R/G = 60,000/9, or 6,667.

Because processing is performed on a Model IV tape unit in the high-density mode, T will assume the value of T4H, and the rewind time (W) = 0.9. Because this job uses more than one control data field, the table in Figure 7 must be referred to for the values to be used in the timing formula. In this table, the length of the first control data field to be compared is the number to be searched for in the column labeled CF. The table in Figure 7 indicates that for this particular job, P1 = 10.0, P2 = 4.6, and T4H = 4.2. It must be remembered, however, that the values of P1 and P2 must be incremented by the value of ΔP1 and ΔP2 respectively before the timing formula can be solved.

Figure 7. Timing Factors for Files with More Than One Control Data Field

Each column of the table is listed separately, in the order in which its entries were recovered; entries do not line up row for row across columns. "×n" marks a value that appears on n consecutive rows.

L: 20, 30, 40, 60, 80, 100, 120, 150, 200, 300, 400, 500, 600, 700, 730
G: 36, 24, 18, 12, 9, 7, 6, 4, 3, 2, 1, 1, 1, 1, 1
CF: 2, 4, 2, 5, 10, 2, 4, 6, 5, 10, 15, 5, 10, 15, 20, 5, 10, 15, 20, 5, 10, 15, 20, 15, 20, 30, 40, 10, 20, 25, 30, 10, 20, 30, 35, 10, 15, 20, 25, 20, 30, 40, 50, 20, 30, 40, 50, 20, 30, 40, 50, 15, 25, 35, 45
P1: 10.0, 10.3, 10.6, 10.9, 10.6, 10.9, 11.1, 11.4, 10.0, 10.3, 10.7, 11.1, 9.8, 10.1, 10.4, 10.8, 18.1, 18.5, 13.9, 14.2, 14.5, 11.9, 12.3, 12.9, 10.5, 11.0, 11.5, 10.4, 10.9, 11.1, 11.4, 11.7, 12.0, 12.4, 12.5, 11.9, 12.0, 12.3, 12.5, 19.0, 19.2, 19.4, 19.6, 19.6, 19.9, 20.1, 20.3, 14.2, 14.5, 14.7, 14.9, 16.6, 16.9, 17.1, 17.3
P2: 21.4, 21.6, 21.9, 22.2, 22.2, 22.4, 22.6, 22.8, 16.7, 17.0, 17.2, 17.4, 19.1, 19.4, 19.6, 19.8, 10.9, 11.1, 11.4, 11.5, 14.4, 14.5, 14.8, 15.0, 6.9, 7.0, 7.1, 7.2, 8.2, 8.5, 8.6, 8.7, 3.20, 3.26, 3.5, 3.5, 3.6, 3.6, 3.7, 3.8, 4.2, 4.3, 4.4, 4.6, 4.7, 4.9, 5.0, 5.4, 5.5, 5.6, 5.7, 5.9, 6.0, 6.1, 6.2
T2L: 75.2 ×4, 88.6 ×4, 25.6 ×4, 34.0 ×4, 51.0 ×4, 102 ×4, 115 ×4, 119 ×4, 9.8 ×3, 13.1 ×4, 16.5 ×4, 20.0 ×4, 3.28 ×2, 4.9 ×3, 6.6 ×3
T2H: 6.2 ×4, 7.9 ×4, 9.4 ×4, 1.56 ×2, 2.3 ×3, 3.1 ×3, 4.7 ×3, 40.8 ×4, 45.6 ×4, 12.6 ×4, 16.8 ×4, 25.2 ×4, 50.4 ×4, 55.2 ×4, 56.6 ×4
T4L: 6.5 ×3, 8.7 ×4, 10.9 ×4, 13.0 ×4, 2.16 ×2, 3.3 ×3, 4.3 ×3, 49.8 ×4, 58.6 ×4, 16.8 ×4, 22.5 ×4, 33.7 ×4, 67.4 ×4, 76.2 ×4, 78.8 ×4
T4H: 11.3 ×4, 16.9 ×4, 6.3 ×4, 8.4 ×4, 3.1 ×3, 4.2 ×4, 5.3 ×4, 1.05 ×2, 1.6 ×3, 2.1 ×3, 27.4 ×4, 30.6 ×4, 33.8 ×4, 37.0 ×4, 38.0 ×4
RL: 9,640 ×4, 8,150 ×4, 21,200 ×4, 14,200 ×4, 220,000 ×2, 146,000 ×3, 109,700 ×3, 73,200 ×3, 54,900 ×4, 43,700 ×4, 36,500 ×4, 28,300 ×4, 7,070 ×4, 6,240 ×4, 6,030 ×4
RH: 15,000 ×4, 13,700 ×4, 18,700 ×4, 16,700 ×4, 13,350 ×4, 80,800 ×4, 60,300 ×4, 45,100 ×4, 30,100 ×4, 484,000 ×2, 323,000 ×3, 242,000 ×3, 161,500 ×3, 121,000 ×4, 95,800 ×4

In solving the formulas for ΔP1 and ΔP2, the symbols have the following values:

G = 9
LAF = 10
NAF = 2

The formulas are then solved as follows:

ΔP1 = ((9 + 3)/4 + 1/9) [2(10) + 153(2 − 1) + 114] .0115 = 10.3

ΔP2 = ((9 + 1)/9) [2(10) + 153(2 − 1) + 114] .0115 = 3.7

These values are then added to P1 and P2 so that P1 = 20.3 and P2 = 8.3. The basic timing formula can then be solved as follows:

Total Time (minutes) = 20.3(60,000)/60,000 + (8.3)(60,000)(13)/60,000 + (4.2)(60,000)(14)/60,000 + 0.9(14) = 199.7 minutes

It is important to note that although three control data fields are present in each record, the program will not necessarily have to compare all three fields from each record to establish sequence. For example, it may be necessary to compare the second control fields in only 1/20th of the records, and it may be necessary to compare the third control fields in only 1/50th of the records. The values of ΔP1 and ΔP2 can thus be correspondingly decreased to about 1/35th of their respective sizes to more accurately reflect the time required. Whether an adjustment of this type need be made, and the amount of the adjustment, must be determined by the user.

Blocking Considerations

In all the sorting situations covered so far in this discussion of timing estimates, the input and output blocking factors have been equal to G. That is, both have been the maximum permissible.

If either the input blocking factor (Bi) or the output blocking factor (Bo) is less than G, however, several adjustments must be made to the basic timing formula.

If Bi is less than G, timing for phase 1 must be increased. If Bo is less than G, timing for the last pass of phase 2 must be increased. In order to facilitate these adjustments, the basic timing formula has been rearranged, or sectionalized, to show the timing for phase 1, the last merge pass of phase 2, all other merge passes, and rewind time. This sectionalized timing formula is shown in Figure 8.

IF INPUT BLOCKING FACTOR IS LESS THAN G ...

In this case the value of P1 and the value of T must be increased in the portion of the timing formula that indicates the timing for phase 1 (see Figure 8). The following formulae indicate the amounts by which P1 and T are incremented:

ΔP1 = 1.4(G − Bi)/(G(Bi)) + 0.9/G

ΔT2H, ΔT2L = 10.8 (G − Bi)/(G(Bi))

ΔT4H, ΔT4L = 7.3 (G − Bi)/(G(Bi))

IF OUTPUT BLOCKING FACTOR IS LESS THAN G ...

In this case the value of P2 and the value of T must be increased in the portion of the timing formula that indicates the timing for the last merge pass of phase 2 (see Figure 8). The following formulae indicate the amounts by which P2 and T are incremented:

ΔP2 = 1.2(G − Bo)/(G(Bo))

ΔT2H, ΔT2L = 10.8 (G − Bo)/(G(Bo))

ΔT4H, ΔT4L = 7.3 (G − Bo)/(G(Bo))
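The blocking increments share the term (G − B)/(G × B), so they can be sketched with one helper. The names are hypothetical, and the 0.9/G term in the input case is read from a damaged formula and should be treated as an assumption.

```python
TAPE_FACTOR = {"II": 10.8, "IV": 7.3}  # multipliers for the delta-T formulas, by 729 model


def blocking_term(g, b):
    """(G - B) / (G x B); zero when the blocking factor equals G."""
    return (g - b) / (g * b)


def input_increments(g, bi, model):
    """Return (delta P1, delta T) for phase 1 when Bi is less than G."""
    term = blocking_term(g, bi)
    return 1.4 * term + 0.9 / g, TAPE_FACTOR[model] * term  # 0.9/G term is an assumption


def output_increments(g, bo, model):
    """Return (delta P2, delta T) for the last pass of phase 2 when Bo is less than G."""
    term = blocking_term(g, bo)
    return 1.2 * term, TAPE_FACTOR[model] * term


# Hypothetical case: G = 26 with output blocked at 13 on a 729 Model II.
print(output_increments(26, 13, "II"))
```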

Although total processing time is increased when either input or output blocking is less than G, it is important to note that the degree of increase depends upon the size of the file being sorted. In lengthy files, the difference in sorting time is almost insignificant. As files become progressively shorter, however, the percentage of increase becomes more substantial.

Total time (minutes) =

Phase 1 (must be adjusted if Bi ≠ G):
    P1(R)/60,000 + T(R)/60,000

Last pass of phase 2 (must be adjusted if Bo ≠ G):
    + P2(R)/60,000 + T(R)/60,000

All other merge passes:
    + (P2)(R)(p − 1)/60,000 + (T)(R)(p − 1)/60,000

Rewind time:
    + W(p + 1)

Figure 8. Sectionalized Version of Basic Sort 1 Timing Formula
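The sectionalized formula can be sketched as a function that accepts the blocking increments as optional arguments. All names are hypothetical. With every increment left at zero, the result equals the basic timing formula, which gives a quick consistency check.

```python
def sectionalized_minutes(p1, p2, t, records, passes, rewind,
                          d_p1=0.0, d_t_in=0.0, d_p2=0.0, d_t_out=0.0):
    """Sectionalized Sort 1 timing; increments apply to phase 1 and to the last merge pass."""
    ms_per_minute = 60_000
    phase1 = (p1 + d_p1 + t + d_t_in) * records / ms_per_minute
    last_pass = (p2 + d_p2 + t + d_t_out) * records / ms_per_minute
    other_passes = (p2 + t) * records * (passes - 1) / ms_per_minute
    rewind_time = rewind * (passes + 1)
    return phase1 + last_pass + other_passes + rewind_time


# With no increments this reproduces the first timing example (about 219.3 minutes).
print(f"{sectionalized_minutes(15.3, 3.6, 4.9, 100_000, 12, 1.2):.1f}")
```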


Technical Newsletter

File No. 1401-33
Re: Form No. J24-1422-1
This Newsletter No. N24-0162
Date: December 23, 1963
Previous Newsletter Nos. None

A change to the publication: Sort 1 for IBM 1401: Specifications, J24-1422-1.

Page 3, Output Blocking. Add:

If the output blocking factor is not a submultiple of G, the following message is printed:

BO NOT FACTOR OF B

*

SET BO TO B

The sort program replaces BO (output blocking factor) with B (maximum blocking factor) and continues processing.

International Business Machines Corp., Product Publications Dept., Endicott, N. Y.

PRINTED IN U.S.A.

N24-0162 (J24-1422-1) Page 1 of 1


International Business Machines Corporation

Data Processing Division

112 East Post Road, White Plains, New York

