Blob Blame History Raw
XREF(AWK)                   Philip L. Bewig                   XREF(AWK)

NAME

        xref(awk) - produce a cross reference listing of an awk program

SYNOPSIS

        awk -f xref.awk [ file ... ]

DESCRIPTION

        XREF(AWK) takes as input a valid awk program and produces as out-
        put a cross-reference listing of all variables and function calls
        which appear in the program.

        For ordinary variables and array variables, a line of the form

                count var(func) lines ...

        is produced, where "count" is the number of times the variable is
        used, "var" is the name of the variable, "func" is the function
        name to which the variable is local (a null "func" indicates that
        the variable is global), and "lines" is the number of each line
        where the variable appears.  Appearances of the variable in a
        function's parameter list are ignored.  The number of lines shown
        may differ from "count" if the variable appears more than once on
        the same line.

        For functions, a line of the form

                count func(define) lines ...

        is produced, where "count" is the number of times the function is
        called, "func" is the name of the function, "define" is the lime
        number where the function is defined, and "lines" is the number of
        each line where the function is called.  As for variables, the
        number of lines shown may differ from "count."

        Output lines for variables and functions are intermixed and are
        sorted by name.  Though terse, the output is informative, easy to
        read, and amenable to further processing.

EXAMPLE

        The cross-reference listing produced by running xref.awk against
        itself is shown below:

                5 NR() 39 45 50 53 68
                8 RLENGTH() 119 120 123 124 127 128 132 133
                10 act() 31 34 36 40 51 54 56 63 65 67
                1 arr(asplit) 90
                2 arr(inarray) 93 94
                1 asplit(89) 6
                3 braces() 55 57 58
                2 flines() 53 79
                1 fs(asplit) 89
                3 funcname() 42 52 61
                16 i() 76 77 78 79 81 82 83 84 90
                3 inarray(92) 37 43 48
                6 j() 82 83 84
                3 j(inarray) 93 94
                3 keywords() 10 125 129
                1 lex(97) 29
                31 line() 103 104 107 108 109 110 112 113 114 115 116 117 118
                        119 120 122 123 124 126 127 128 131 132 133
                6 lines() 39 45 50 83 84
                4 local() 41 59 60 64
                3 machine() 17 30 31
                2 n(asplit) 89 90
                15 names() 37 38 43 44 48 49 77 78 79 81 82 83 84
                3 nextstate() 30 62 73
                4 nnames() 38 44 49 76
                4 state() 23 30 31 73
                1 str(asplit) 89
                3 symb() 29 30 31
                2 temp() 59 60
                2 temp_asplit() 89 90
                31 tok() 37 38 39 41 42 43 44 45 47 48 49 50 52 53 64 101 105
                        113 115 117 119 123 125 127 129 132
                1 val(inarray) 94
                5 xnames() 39 45 50 77 82

        For readability, some lines have been folded.

SOURCE CODE

        # xref.awk - cross reference an awk program

        BEGIN {

                # create array of keywords to be ignored by lexer
                asplit("BEGIN:END:atan2:break:close:continue:cos:delete:" \
                        "do:else:exit:exp:for:getline:gsub:if:in:index:int:"  \
                        "length:log:match:next:print:printf:rand:return:sin:" \
                        "split:sprintf:sqrt:srand:sub:substr:system:while",
                        keywords,":")

                # build the symbol-state table
                split("00:00:00:00:00:00:00:00:00:00:" \
                          "20:10:10:12:12:11:07:00:00:00:" \
                          "08:08:08:08:08:33:08:00:00:00:" \
                          "08:44:08:36:08:08:08:00:00:00:" \
                          "08:44:45:42:42:41:08",machine,":")

                # parse the input and store an intermediate representation
                # of the cross-reference information

                # set up the machine
                state = 1

                # run the machine
                for (;;) {

                        # get next symbol
                        symb = lex()
                        nextstate = substr(machine[state symb],1,1)
                        act = substr(machine[state symb],2,1)

                        # perform required action
                        if ( act == "0" )
                                ; # do nothing
                        else if ( act == "1" ) {
                                if ( ! inarray(tok,names) )
                                        names[++nnames] = tok
                                lines[tok,++xnames[tok]] = NR }
                        else if ( act == "2" ) {
                                if ( tok in local ) {
                                        tok = tok "(" funcname ")"
                                        if ( ! inarray(tok,names) )
                                                names[++nnames] = tok
                                        lines[tok,++xnames[tok]] = NR }
                                else {
                                        tok = tok "()"
                                        if ( ! inarray(tok,names) )
                                                names[++nnames] = tok
                                        lines[tok,++xnames[tok]] = NR } }
                        else if ( act == "3" ) {
                                funcname = tok
                                flines[tok] = NR }
                        else if ( act == "4" )
                                braces++
                        else if ( act == "5" ) {
                                braces--
                                if ( braces == 0 ) {
                                        for ( temp in local )
                                                delete local[temp]
                                        funcname = ""
                                        nextstate = 1 } }
                        else if ( act == "6" ) {
                                local[tok] = 1 }
                        else if ( act == "7" )
                                break
                        else if ( act == "8" ) {
                                print "error: xref.awk: line " NR ": aborting" \
                                        > "/dev/con"
                                exit 1 }

                        # finished with current token
                        state = nextstate }

                # finished parsing, now ready to print output
                for ( i = 1; i <= nnames; i++ ) {
                        printf "%d ", xnames[names[i]] |"sort +1"
                        if ( index(names[i],"(") == 0 )
                                printf "%s(%d)", names[i], flines[names[i]] |"sort +1"
                        else
                                printf "%s", names[i] |"sort +1"
                        for ( j = 1; j <= xnames[names[i]]; j++ )
                                if ( lines[names[i],j] != lines[names[i],j-1] )
                                        printf " %d", lines[names[i],j] |"sort +1"
                        printf "\n" |"sort +1" }

                } # END OF PROGRAM

        function asplit(str,arr,fs,  n) { n = split(str,temp_asplit,fs)
                for ( i = 1; i <= n; i++ ) arr[temp_asplit[i]]++ }

        function inarray(val,arr,  j) {
            for ( j in arr )
                if ( arr[j] == val ) return j
            return "" }

        function lex() {

                for (;;) {

                        if ( tok == "(eof)" ) return 7

                        while ( length(line) == 0 )
                                if ( getline line == 0 ) {
                                        tok = "(eof)"; return 7 }

                        sub(/^[ \t]+/,"",line)                                # remove white space,
                        sub(/^"([^"]|\\")*"/,"",line)             # quoted strings,
                        sub(/^\/([^\/]|\\\/)+\//,"",line)     # regular expressions,
                        sub(/^#.*/,"",line)                                   # and comments

                        if ( line ~ /^function/ ) {
                                tok = "function"; line = substr(line,9); return 1 }
                        else if ( line ~ /^{/ ) {
                                tok = "{"; line = substr(line,2); return 2 }
                        else if ( line ~ /^}/ ) {
                                tok = "}"; line = substr(line,2); return 3 }
                        else if ( match(line,/^[A-Za-z_][A-Za-z_0-9]*\[/) ) {
                                tok = substr(line,1,RLENGTH-1)
                                line = substr(line,RLENGTH+1)
                                return 5 }
                        else if ( match(line,/^[A-Za-z_][A-Za-z_0-9]*\(/) ) {
                                tok = substr(line,1,RLENGTH-1)
                                line = substr(line,RLENGTH+1)
                                if ( ! ( tok in keywords ) ) return 6 }
                        else if ( match(line,/^[A-Za-z_][A-Za-z_0-9]*/) ) {
                                tok = substr(line,1,RLENGTH)
                                line = substr(line,RLENGTH+1)
                                if ( ! ( tok in keywords ) ) return 4 }
                        else {
                                match(line,/^[^A-Za-z_{}]/)
                                tok = substr(line,1,RLENGTH)
                                line = substr(line,RLENGTH+1) } } }

TECHNICAL DISCUSSION

        Broadly, XREF(AWK) parses an awk program using a symbol-state
        table, in much the same way as a yacc-generated parser.  The
        lexical analyzer recognizes seven distinct symbols:  the word
        "function", the left brace, the right brace, identifiers used
        as variables, identifiers used as arrays, identifiers used as
        functions, and end of file.  The type of symbol is returned to
        the parser as the value of the "lex" function, and the global
        variable "tok" is set to the text of the current token.

        The symbol-state table is stored in the "machine" array.  The
        table can be represented as follows:

                       symbol |     1       2  3   4     5     6     7
                              |
        state                 | "function"  {  }  var  array  func  eof
        -- -- -- -- -- -- -- -+- -- -- -- -- -- -- -- -- -- -- -- -- --
        1 any                 |     20     10  10  12    12    11   07
        2 "function"          |     08     08  08  08    08    33   08
        3 "function" name     |     08     44  08  36    08    08   08
        4 "function" name "{" |     08     44  45  42    42    41   08

        where the first digit is the state to be entered after process-
        ing the current token and the second digit is an action to be
        performed.  The actions are listed below:

                1       found a function call
                2       found a variable or array
                3       found a function definition
                4       found a left brace
                5       found a right brace
                6       found a local variable declaration
                7       found end of file
                8       found an error

        Each of the first six actions causes some information about the
        target program to be stored for later processing; the structures
        used will be discussed below.  The seventh action causes the
        parser to exit.  The eighth action causes errors to be reported
        to standard error and the program to abort.

        Before describing the intermediate data structures, we will
        discuss some of the more interesting points in the action calls.
        The "braces" variable keeps track of whether we are currently
        within a functions; it is positive within a function and zero
        without.  When the right brace which causes the value of "braces"
        to go from one to zero is found, the value of "nextstate" is
        changed from four (scanning a function) to one (any) and the
        names of local variables are forgotten.  The "local" array is
        accumulated from the variables found after the function name but
        before the opening left brace of the function; action two care-
        fully checks whether a variable is global or local before writing
        to the intermediate data structure.  The variable "funcname" is
        the name of the current function when within a function and null
        without.

        The following arrays store an intermediate representation of the
        variable and function identifiers of the target program:

                names[1..nnames] = list of all identifiers, both variable and
                        function names; for variables, the name has the form
                        var(func), but for functions, there are no parentheses

                xnames[names[i]] = number of times names[i] is used

                lines[names[i],1..xnames[names[i]]] = list of line numbers
                        where names[i] is used

                flines[names[i]] = line number where function names[i] is
                        defined

        These arrays are created as the parser reads the input; when the
        parser is finished, the arrays are output in user-readable form.

PORTABILITY

        XREF(AWK) will work with any implementation of nawk.  The MKS
        ToolKit implementation requires the large-model version of awk.

HISTORY

        Written by Phil Bewig on February 10, 1990.  Inspired by
        Exercise 3-16 of the book "The Awk Programming Language" by
        Alfred V. Aho, Brian W. Kernighan and Peter J. Weinberger
        (Addison-Wesley:  1988).

COPYRIGHT

        This program is placed in the public domain.  However, the
        author requests credit when distributed.