anafile (1) analyze a file and print statistics. (V7.2 (January 2007))
Without a description file, anafile analyses the beginning or the complete file(s), assigns a file class to each file, and lists on request the column (byte-per-byte) statistics. The possible file class values are detailed in the section File Classes below.
–a lists the position (line and column numbers) of each character which leads to a classification as ascii binary or binary file.
–d considers the input file as tab-separated-values (–d alone option), or column-separated-values where the character c represents the column separator. This option, combined with –cc, generates statistics by columns (width of each column, and characters present there).
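As an illustration of the per-column width statistics mentioned above, here is a minimal Python sketch. It is not the anafile implementation; the function name field_widths and its sep parameter are hypothetical, and the sketch only reports the minimal and maximal width of each field.

```python
def field_widths(lines, sep="\t"):
    """Return a (min_width, max_width) pair for each field position.

    Hypothetical helper: 'sep' plays the role of the column separator,
    with the tab as the default.
    """
    stats = []
    for line in lines:
        for i, field in enumerate(line.split(sep)):
            width = len(field)
            if i == len(stats):
                # first time this field position is seen
                stats.append([width, width])
            else:
                stats[i][0] = min(stats[i][0], width)
                stats[i][1] = max(stats[i][1], width)
    return [tuple(pair) for pair in stats]


for position, (lo, hi) in enumerate(field_widths(["a\tbcd", "ab\tc"]), start=1):
    print(f"field {position}: width {lo}..{hi}")
```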
–cc
generates a detailed list of the frequencies of every character in
each column: the characters of each column are listed in order
of their decreasing frequency.
Combined with the –d option, the contents of every field
(defined as the contiguous bytes between column separators)
are examined, and statistics on these contents are listed.
With the addition of g (i.e. with option –ccg),
the statistics include the min/max values of each field,
a proposed description to include in a ReadMe file,
and suggestions of possible alignments
(using the acut program).
See also the –head option if some top lines have to be skipped
(or used as titles / units)
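The kind of listing described for –cc can be sketched in Python as follows. This is only an illustration under the assumption that a column is a single byte position; the function name column_frequencies is hypothetical and the output layout is not that of anafile.

```python
from collections import Counter


def column_frequencies(lines):
    """Return one Counter per byte column (list index 0 is column 1)."""
    width = max((len(line) for line in lines), default=0)
    counters = [Counter() for _ in range(width)]
    for line in lines:
        for i, ch in enumerate(line):
            counters[i][ch] += 1
    return counters


sample = ["A1 x", "A2 y", "B3 x"]
for col, counter in enumerate(column_frequencies(sample), start=1):
    # most_common() lists the characters by decreasing frequency
    print(col, counter.most_common())
```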
–hash is a shortcut for –headlines=#
–headlines specifies that the input file may contain heading line(s), i.e. lines that explain the contents. These heading lines (the header) may be identified by a number of lines or by a leading character. For instance, 2 heading lines are specified with -head=2; heading lines starting with the hash sign are specified with -head='#'. The default is a single line (i.e. equivalent to -head=1).
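The two ways of designating the header can be sketched in Python as below. The function split_header is hypothetical, and the sketch assumes that lines designated by a leading character form a contiguous run at the top of the file, which this page does not state explicitly.

```python
def split_header(lines, head=1):
    """Split lines into (header, data).

    'head' is either an int (number of heading lines) or a str
    (leading character of the heading lines). Assumption: heading
    lines marked by a character are only looked for at the top.
    """
    if isinstance(head, int):
        return lines[:head], lines[head:]
    count = 0
    while count < len(lines) and lines[count].startswith(head):
        count += 1
    return lines[:count], lines[count:]


header, data = split_header(["# name", "# unit", "1 2", "3 4"], head="#")
print(len(header), "heading lines,", len(data), "data lines")
```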
–fx format_file uses the contents of format_file to check the compliance of the file with the specified format. The x may be used for further options concerning this format file, such as computation of ranges, verifications against the CDS Standards, etc. (see the section Format File below).
–l asks to examine the complete files to assign the file class, and lists the number of lines as well as the number of bytes of the longest line.
–q quiet mode: the messages are minimized.
–t table_structure
indicates that the next file argument designates a
data file which contains data structured like
table_structure (for instance an excerpt of a table).
table_structure is therefore a name which must exist
in one of the Byte-by-byte Description ... sections of the
ReadMe file.
A value of - for table_structure asks to stop this
behaviour.
Note that this option can only work when a –f specification
precedes –t.
–u lists the columns which are constant (i.e. have exactly the same contents) over all lines. This option may be used to check e.g. that the decimal points are correctly aligned, or to find the blank columns which could be removed with trcol(1).
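A minimal Python sketch of the constant-column detection is given below for illustration. The function constant_columns is hypothetical, and the sketch assumes that lines shorter than the longest one are padded with blanks.

```python
def constant_columns(lines):
    """Return the 1-based byte positions holding the same character on every line.

    Assumption: short lines are padded on the right with blanks.
    """
    if not lines:
        return []
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]
    return [i + 1 for i in range(width) if len({p[i] for p in padded}) == 1]


# The decimal point (column 3) and the separating blank (column 6) are constant.
print(constant_columns(["12.50 a", " 3.75 b", "10.00 c"]))
```

A constant column holding a decimal point indicates aligned numbers, whereas a constant blank column is a candidate for removal.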
–v is a verbose option.
–w width specifies the assumed column width for ascii bulk or ebcdic files; such files have no linefeed embedded, and the length of each line must be assumed.
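Since such files contain no linefeed, the records have to be cut at a fixed width. A minimal Python sketch of this cutting follows; the function split_records is hypothetical and no character-set conversion (e.g. from ebcdic) is attempted.

```python
def split_records(data: bytes, width: int):
    """Cut a byte string without linefeeds into records of 'width' bytes.

    The last record may be shorter when the length of the data is not
    a multiple of the width.
    """
    if width <= 0:
        raise ValueError("width must be a positive number of bytes")
    return [data[i:i + width] for i in range(0, len(data), width)]


for record in split_records(b"AAAABBBBCC", 4):
    print(record)
```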
Byte-per-byte Description of file: hbc
-------------------------------------------------------------------
   Bytes Format  Units   Label    Explanations
-------------------------------------------------------------------
   2-  4  I3     ---     HBC      [1,423]+ HBC number.
       5  A1     ---     NEBUL    [n] Nebulosity association flag.
       6  A1     ---     REMARK   [*] Remark flag.
   8- 18  A11    ---     NAME     [A-Z0-9@.+-]! Star name.
  20- 56  A37    ---     OTHER    Other designation.
  59- 60  I2     h       RAh      Hours of right ascension (1950.0).
  62- 63  I2     min     RAm      Minutes of right ascension.
  65- 69  F5.2   s       RAs      Seconds of right ascension.
      71  A1     -       DE-      Sign of declination (1950.0).
  72- 73  I2     deg     DEd      Degrees of declination.
  75- 76  I2     arcmin  DEm      Minutes of declination.
  78- 81  F4.1   arcsec  DEs      Seconds of declination.
  83- 90  A8     ---     REF      References to the position.
-------------------------------------------------------------------
The format file is made of five columns: the byte position, the format, the units, the label and an explanation text. Such a file is interpreted by anafile, is reedited in a standard form (on the screen if the –v option is present, or in the format file itself, which is rewritten with the –fw option), and is used to check the data. Note also that special labels are understood by anafile (see Special Labels below).
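The five-column layout can be illustrated with a short Python sketch that splits one description line, such as the HBC entry of the example, into its components. The regular expression and the function parse_description_line are hypothetical and only cover lines written with absolute byte positions; they do not reproduce the actual parsing rules of anafile.

```python
import re

# Hypothetical pattern: optional "start-", then end (or single) byte,
# then format, units, label, and the remaining explanation text.
LINE_RE = re.compile(
    r"^\s*(?:(\d+)\s*-\s*)?(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)"
    r"(?:\s+(.*))?$"
)


def parse_description_line(line):
    """Return the five components of a description line, or None."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    start, end, fmt, units, label, text = match.groups()
    end = int(end)
    start = int(start) if start else end  # single-byte field
    return {"start": start, "end": end, "format": fmt,
            "units": units, "label": label, "text": text or ""}


print(parse_description_line("   2-  4  I3     ---     HBC      [1,423]+ HBC number."))
```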
The explanation text may contain further restrictions concerning the range for numeric fields or the character set allowed for alphabetical fields; refer to the Validity Checks section below.
Note that the byte position may be specified as relative from the end of the previous field when followed by the X letter, as it is in Fortran. The number preceding the X therefore represents the number of blanks between the two columns.
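The arithmetic behind the relative notation can be sketched as follows in Python. The function resolve_start is hypothetical, and the exact spelling accepted in the byte position column (assumed here to be a number immediately followed by X, e.g. 1X) is an assumption.

```python
def resolve_start(prev_end, position):
    """Return the absolute starting byte of a field.

    'position' is either an absolute byte such as "20", or a relative
    specification such as "1X" (assumed spelling), meaning one blank
    after the end of the previous field.
    """
    if position.upper().endswith("X"):
        blanks = int(position[:-1])
        return prev_end + blanks + 1
    return int(position)


# In the example, NAME ends at byte 18; one blank later, OTHER starts at byte 20.
print(resolve_start(18, "1X"))
```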
With the –f. (a dot following the f) option, the input format file does not include the units column; the reedition fills this column with dashes (---).
With the –f1 (a one following the f) option, the first column of the input format file contains only the starting byte of the column; the ending byte is derived from the format column. The reedition (with option –f1w) completes this column with the ending byte.
With the –f1X (a one and X following the f) option, it is assumed that a blank always separates two adjacent columns; the starting byte, if present, is ignored. The reedition (with option –f1Xw) computes the starting–ending byte column.
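The derivations described for –f1 and –f1X can be sketched in Python as below. The helpers format_width, ending_byte and blank_separated_layout are hypothetical; the sketch assumes Fortran-like formats made of one letter, a width and an optional number of decimals (I3, A11, F5.2), and assumes that the first column starts at byte 1 in the blank-separated case.

```python
import re


def format_width(fmt):
    """Return the width in bytes implied by a format such as I3, A11 or F5.2."""
    match = re.fullmatch(r"[A-Za-z](\d+)(?:\.\d+)?", fmt)
    if match is None:
        raise ValueError(f"unsupported format: {fmt!r}")
    return int(match.group(1))


def ending_byte(start, fmt):
    """Derive the ending byte from the starting byte and the format (as with -f1)."""
    return start + format_width(fmt) - 1


def blank_separated_layout(formats, first_start=1):
    """Compute (start, end) pairs, one blank between adjacent columns (as with -f1X).

    Assumption: the first column starts at byte 'first_start'.
    """
    layout = []
    position = first_start
    for fmt in formats:
        end = ending_byte(position, fmt)
        layout.append((position, end))
        position = end + 2  # skip the separating blank
    return layout


print(ending_byte(65, "F5.2"))
print(blank_separated_layout(["I3", "A1", "F5.2"]))
```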
With the –fr option, the actual ranges (minimal / maximal values) of each column are computed.
With the –fs option, the format file is assumed to conform to the Standards for Astronomical Catalogues; further compliance checks (presence of titles, correctness of units, etc...) of this format file are then performed.
With the –fw option, the format file is rewritten according to standards.
The format options may be combined, e.g. –f1.w asks to rewrite format file where no units and no ending columns were supplied.
For a numerical field, a range can be specified in the format file if the explanation text starts with a square bracket, as in the HBC column in the above example. The opening bracket is [ if the lower value is included, and ] if the lower value is excluded; likewise, the closing bracket is ] if the upper value is included, and [ if it is excluded, i.e. the standard mathematical conventions apply. The lower and upper values are both optional; for instance, any value lower than 100 (100 excluded) is specified by [,100[. Writing [] is acceptable when no range checking applies, e.g. to override the default range implied by the label (see Special Labels section above).
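The bracket conventions can be made concrete with a small Python sketch. The functions parse_range and in_range are hypothetical and take the bare range specification only (without any sign or mark that may follow it in the explanation text); they are an illustration, not the checking code of anafile.

```python
def parse_range(spec):
    """Parse '[1,423]', '[,100[' or '[]' into (lo, lo_included, hi, hi_included).

    A missing bound is returned as None.
    """
    if len(spec) < 2 or spec[0] not in "[]" or spec[-1] not in "[]":
        raise ValueError(f"not a range specification: {spec!r}")
    lo_included = spec[0] == "["   # '[' includes the lower value, ']' excludes it
    hi_included = spec[-1] == "]"  # ']' includes the upper value, '[' excludes it
    lo_text, _, hi_text = spec[1:-1].partition(",")
    lo = float(lo_text) if lo_text.strip() else None
    hi = float(hi_text) if hi_text.strip() else None
    return lo, lo_included, hi, hi_included


def in_range(value, spec):
    """Tell whether 'value' satisfies the range specification 'spec'."""
    lo, lo_included, hi, hi_included = parse_range(spec)
    if lo is not None and (value < lo or (value == lo and not lo_included)):
        return False
    if hi is not None and (value > hi or (value == hi and not hi_included)):
        return False
    return True


print(in_range(423, "[1,423]"), in_range(100, "[,100["), in_range(5, "[]"))
```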
For an alphabetical (i.e. A-format) field, the set of allowed characters may be specified in the format file if the explanation text starts with a square bracket [: the permitted characters are enclosed in square brackets [...], and a dash indicates a range (in the ascii sequence). The closing bracket is accepted as a member of the set if it is specified first (i.e. []] means that only the closing bracket is acceptable); the dash is accepted when it is the first or last character. In the above example, only n (or blank) is accepted in the NEBUL column, and only uppercase letters, digits, and the symbols @ . + - in the NAME column.
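The character-set rules can be sketched in Python as follows. The functions parse_charset and field_ok are hypothetical; the sketch assumes that blanks are always tolerated inside a field, which generalizes the "n (or blank)" remark and is not confirmed by this page.

```python
def parse_charset(spec):
    """Expand a specification such as '[A-Z0-9@.+-]' into a set of characters."""
    if len(spec) < 3 or spec[0] != "[" or spec[-1] != "]":
        raise ValueError(f"not a character-set specification: {spec!r}")
    body = spec[1:-1]  # for '[]]' the body is ']' (closing bracket given first)
    allowed = set()
    i = 0
    while i < len(body):
        ch = body[i]
        # a dash between two characters denotes a range in the ascii sequence;
        # a dash in first or last position is taken literally
        if ch != "-" and i + 2 < len(body) and body[i + 1] == "-":
            allowed.update(chr(c) for c in range(ord(ch), ord(body[i + 2]) + 1))
            i += 3
        else:
            allowed.add(ch)
            i += 1
    return allowed


def field_ok(value, spec):
    """Check a field against the set; blanks are tolerated (assumption)."""
    allowed = parse_charset(spec)
    return all(ch == " " or ch in allowed for ch in value)


print(field_ok("n", "[n]"), field_ok("x", "[n]"), field_ok("DG-TAU", "[A-Z0-9@.+-]"))
```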
An exclamation mark ! immediately following the range or character-set specification indicates that the field cannot be blank, i.e. cannot be filled with only blanks. In the above example, the NAME field can never be blank.
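A minimal Python sketch of the non-blank check is given below. The helpers blank_flag and violates_non_blank are hypothetical, and the regular expression only handles the simple case where the specification ends at the first closing bracket (it does not cover forms such as []] or [,100[).

```python
import re

# Simplified assumption: the specification runs from '[' to the first ']',
# optionally followed immediately by '!' or '?'.
SPEC_RE = re.compile(r"^\[[^\]]*\]([!?])?")


def blank_flag(explanation):
    """Return '!', '?' or None (no explicit mark) for an explanation text."""
    match = SPEC_RE.match(explanation)
    return match.group(1) if match else None


def violates_non_blank(field, explanation):
    """True when the field is entirely blank although marked with '!'."""
    return blank_flag(explanation) == "!" and field.strip() == ""


print(blank_flag("[A-Z0-9@.+-]! Star name."))
print(violates_non_blank("           ", "[A-Z0-9@.+-]! Star name."))
```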
A question mark ? following immediately the range or character-set specification indicates that the field can be blank, i.e. can be filled with only blanks. The default concerning blank fields is:
The order within a column can be specified with the signs: