CS 441G-001

Professor Craig C. Douglas

Tuesday-Thursday 12:30-1:45 RMB 323

http://www.mgnet.org/~douglas/Classes/cs441g-f05

  Please fill out the Survey if you have not done so already.

 

Language Description

Introduction

    The F05 language will be similar to standard algorithmic languages, but will not follow any one in particular.  You will not have an extensive library for input, output, system calls, or memory allocation.

    The lexical structure of the language is defined as follows:

  • Ignore white space (blanks, tabs, new lines, and form feeds).
  • An identifier is a string made up of letters, digits, dollar signs, and underscores that does not begin with a digit.  Only the first 32 characters are significant; ignore everything after that.  (Token name is IDENTIFIER.)
  • A string constant is enclosed in double quotes.  (Token name is SCONSTANT.)
  • Special characters in strings should be written with a backslash escape, as in C (e.g., \n, \t, \", \\).
  • An integer is a sequence of digits 0, ..., 9.  (Token name is ICONSTANT.)
  • A floating point number has 64 bits (i.e., a C or C++ double) and the following forms:
    • Mantissa only:  123., 123.3, 0.3, .3
    • Mantissa and exponent:  0.123d42, 1.23d-3, 0.001d+10, 123.456d000
    Note that 1.00d-0 is the same as 1.00d+0 or 1.00d0 or 1.00.  (Token name is DCONSTANT.)
  • Key words should be processed specially.  Unless noted otherwise, the token name is the key word capitalized (e.g., token name DO for do).
    • begin (Token name BBEGIN.)
    • do
    • else
    • end
    • exit
    • function
    • if
    • integer
    • printf
    • procedure
    • program
    • real
    • return
    • scanf
    • string
    • then
    • while
  • Operators (and punctuation) that need to be recognized are the following:

Operator    Token name
=           ASSIGN
:           COLON
,           COMMA
//          COMMENT
&&          DAND
/           DIVIDE
||          DOR
==          DEQ
>=          GEQ
>           GT
[           LBRACKET
<=          LEQ
(           LPAREN
<           LT
-           MINUS
%           MOD
*           MULTIPLY
~=          NE
~           NOT
.           PERIOD
+           PLUS
]           RBRACKET
)           RPAREN
;           SEMI
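One common way to handle the key words is to let a single lex pattern match every identifier-shaped word and then look the text up in a small table. The sketch below assumes that approach; the enum values are hypothetical stand-ins for the codes yacc would normally generate in y.tab.h, and key-word matching is assumed to be case-sensitive. It also applies the 32-character identifier limit and reports whether anything was dropped, so the caller can decide whether to print a message.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical token codes; with lex/yacc these would come from y.tab.h. */
enum token {
    IDENTIFIER = 258, BBEGIN, DO, ELSE, END, EXIT, FUNCTION, IF, INTEGER,
    PRINTF, PROCEDURE, PROGRAM, REAL, RETURN, SCANF, STRING, THEN, WHILE
};

#define MAX_IDENT 32  /* only the first 32 characters are significant */

struct keyword { const char *text; enum token tok; };

/* Note that "begin" maps to BBEGIN, not BEGIN. */
static const struct keyword keywords[] = {
    {"begin", BBEGIN}, {"do", DO}, {"else", ELSE}, {"end", END},
    {"exit", EXIT}, {"function", FUNCTION}, {"if", IF}, {"integer", INTEGER},
    {"printf", PRINTF}, {"procedure", PROCEDURE}, {"program", PROGRAM},
    {"real", REAL}, {"return", RETURN}, {"scanf", SCANF}, {"string", STRING},
    {"then", THEN}, {"while", WHILE}
};

/* Classify a matched word: return its key-word token, or IDENTIFIER.
   For identifiers, copy at most MAX_IDENT characters into buf and set
   *truncated to 1 if characters were dropped (0 otherwise). */
enum token classify_word(const char *text, char buf[MAX_IDENT + 1], int *truncated)
{
    assert(text != NULL && buf != NULL && truncated != NULL);
    *truncated = 0;
    buf[0] = '\0';

    /* A linear scan is fine for 17 key words. */
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(text, keywords[i].text) == 0)
            return keywords[i].tok;

    size_t len = strlen(text);
    if (len > MAX_IDENT) {
        len = MAX_IDENT;
        *truncated = 1;
    }
    memcpy(buf, text, len);
    buf[len] = '\0';
    return IDENTIFIER;
}
```

A lex action for the identifier pattern would call classify_word(yytext, ...) and return its result.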

Tokens do not cross new lines.  If lex cannot produce a string or number, neither do you.  An error message is appropriate, however, if you truncate something.  An underflow (a floating point number that rounds off to 0 instead of its exact value in infinite precision arithmetic) is not an error and should be treated as 0.
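Converting a DCONSTANT to a C double can reuse the standard library. A minimal sketch, assuming the lex pattern has already matched a well-formed constant: replace the `d` exponent marker with `e` and call strtod. Overflow (strtod returns ±HUGE_VAL with errno set to ERANGE) is reported as an error. Underflow is accepted: strtod then returns 0 or a tiny value and may or may not set ERANGE, which matches the rule above. The function name and the 128-character buffer limit are illustrative choices, not part of the assignment.

```c
#include <assert.h>
#include <errno.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>

/* Convert DCONSTANT text such as "1.23d-3" or ".3" to a double.
   Returns 0 on success, -1 if the text is malformed or overflows. */
int dconstant_value(const char *text, double *out)
{
    char buf[128];                 /* arbitrary length limit for this sketch */
    assert(text != NULL && out != NULL);
    size_t len = strlen(text);
    if (len >= sizeof buf)
        return -1;
    memcpy(buf, text, len + 1);

    /* strtod understands 'e' exponents, not 'd' ones. */
    char *d = strchr(buf, 'd');
    if (d != NULL)
        *d = 'e';

    char *end;
    errno = 0;
    double v = strtod(buf, &end);
    if (end == buf || *end != '\0')
        return -1;                 /* not a complete number */
    if (errno == ERANGE && (v == HUGE_VAL || v == -HUGE_VAL))
        return -1;                 /* overflow is an error */
    /* Underflow is not an error: keep strtod's 0 (or tiny) result. */
    *out = v;
    return 0;
}
```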

Defining It by Example

    The easiest way to learn the language is to study one or more examples.  Below is an example ...

(still under construction)

    Comments aside, find the bugs and report them to me.

 

Notes:

  1. Declarations can occur right after a begin statement.  The declarations are only valid within the block.  At the end of the block, all of the variables declared in that block become undefined and their memory should be freed automatically.
  2. Not all of the key words have been used (e.g., scanf and some of the logical operators).
  3. scanf and printf are special cases that will be discussed in class.
  4. A comment is the text following the // to the end of the line, as in C++.  The lexer can throw the text away unless you want to save it for some reason (e.g., complete program reconstruction from your parse tree).

Class Examples

    The class (will) put together a number of simple 5-10 line examples:

TBA

The Symbol Table

    The symbol table will be a pain in the neck all semester.  Just when you think that you have it just right, you will have an inspiration and need a modification or addition.  So be really flexible and defensive in designing your symbol table.

    Start by entering data such as an identifier name, constant value, the scope, and the memory location (if appropriate).  Keep all attributes of a name in a separate record.  Also, think of a fast way to access the symbol table.  My favorite is a combination of hashing the symbols and then using a linked list to get to the right symbol table entry.
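The hashing-plus-linked-list idea can be sketched in a few lines of C. Everything here is illustrative: the record fields, the bucket count, and the choice of the djb2 string hash are assumptions, not requirements. Because new entries go at the head of their bucket's chain, a lookup finds the most recently declared instance of a name first, which helps later with nested blocks.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 211   /* a prime; the size is an arbitrary choice */

/* One record per name, holding all of its attributes (fields are illustrative). */
struct symbol {
    char *name;
    int   scope;          /* block nesting level of the declaration */
    long  location;       /* memory location, if appropriate */
    struct symbol *next;  /* next entry in the same bucket */
};

static struct symbol *buckets[NBUCKETS];

/* djb2 string hash, reduced to a bucket index. */
static unsigned long hash(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Create a record and push it onto the head of its bucket's chain. */
struct symbol *sym_insert(const char *name, int scope, long location)
{
    assert(name != NULL);
    struct symbol *sp = malloc(sizeof *sp);
    if (sp == NULL)
        return NULL;
    sp->name = malloc(strlen(name) + 1);
    if (sp->name == NULL) {
        free(sp);
        return NULL;
    }
    strcpy(sp->name, name);
    sp->scope = scope;
    sp->location = location;

    unsigned long h = hash(name);
    sp->next = buckets[h];
    buckets[h] = sp;
    return sp;
}

/* Hash to a bucket, then walk its linked list to the matching entry. */
struct symbol *sym_lookup(const char *name)
{
    for (struct symbol *sp = buckets[hash(name)]; sp != NULL; sp = sp->next)
        if (strcmp(sp->name, name) == 0)
            return sp;
    return NULL;
}
```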

    Your lexer can help.  For many tokens there is both a token value (e.g., ICONSTANT), which the lexer returns, and a token subvalue (e.g., the binary value of the string in yytext), which can be stored in the global variable yylval.  You should take advantage of this as often as possible in the early stages of compiling.
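As a sketch of the token value and subvalue idea: yacc lets you declare yylval's type (YYSTYPE) as a union, and each lex action fills in the member that fits its token. The union layout, member names, token codes, and the helper return_iconstant below are all hypothetical. In a real scanner the body would sit directly in the ICONSTANT rule's action, and YYSTYPE and yylval would come from the parser.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical token codes; with yacc these would come from y.tab.h. */
enum { ICONSTANT = 300, DCONSTANT, SCONSTANT, IDENTIFIER };

/* One possible YYSTYPE: the subvalue that travels with each token. */
typedef union {
    long        ival;   /* ICONSTANT: binary value of the digit string */
    double      dval;   /* DCONSTANT */
    const char *sval;   /* SCONSTANT/IDENTIFIER text, or a symbol table entry */
} YYSTYPE;

YYSTYPE yylval;         /* global shared between lexer and parser */

/* What the ICONSTANT action might do: convert yytext, store it, return the token. */
int return_iconstant(const char *yytext)
{
    assert(yytext != NULL);
    errno = 0;
    /* Base 10, so a leading zero (e.g. "0123") is not read as octal. */
    yylval.ival = strtol(yytext, NULL, 10);
    if (errno == ERANGE)   /* did not fit in a long; strtol clamped it */
        fprintf(stderr, "warning: integer constant %s out of range\n", yytext);
    return ICONSTANT;
}
```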

    Finally, our language allows variables to be redefined inside of blocks.  Watch out for the nested loop variables and other equally unsavory variables.  Your symbol table must resolve entries correctly so that the right instance of a variable is always used in the right place.
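One simple way to get scoping right is to tag every declaration with its block nesting level and keep the entries on a stack. Lookups search from the innermost block outward, and the end of a block pops everything declared in it, which also frees that memory as note 1 requires. This is a sketch under assumptions: the names are hypothetical, the type attribute is illustrative, and treating a second declaration of the same name in the same block as an error is a design choice, not something the language description states. A full table would combine this with the hash buckets described above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct entry {
    const char *name;    /* assumed to outlive the entry (e.g. interned) */
    int level;           /* block nesting depth of the declaration */
    int type;            /* illustrative attribute */
    struct entry *below; /* next entry down the stack */
};

static struct entry *top = NULL;
static int level = 0;    /* 0 = outermost (program) level */

/* Called at "begin". */
void enter_block(void) { level++; }

/* Called at "end": discard every name declared inside the block. */
void leave_block(void)
{
    assert(level > 0);
    while (top != NULL && top->level == level) {
        struct entry *e = top;
        top = e->below;
        free(e);
    }
    level--;
}

/* Declare a name in the current block. Assumption: redeclaring in the same
   block is an error (-1); shadowing an outer declaration is allowed. */
int declare(const char *name, int type)
{
    for (struct entry *e = top; e != NULL && e->level == level; e = e->below)
        if (strcmp(e->name, name) == 0)
            return -1;
    struct entry *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->name = name;
    e->level = level;
    e->type = type;
    e->below = top;
    top = e;
    return 0;
}

/* Innermost declaration wins. */
const struct entry *resolve(const char *name)
{
    for (const struct entry *e = top; e != NULL; e = e->below)
        if (strcmp(e->name, name) == 0)
            return e;
    return NULL;
}
```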

The Translation Output

    Nah... you do not really want me to put the information here yet.  It can wait until October.  However, you will produce a file that will be included by a C program that I will provide you.

 

Cheers,
Craig C. Douglas

http://www.mgnet.org/~douglas
