, Factor }}
Forth is a programming language and programming environment, initially developed by Charles H. Moore at the US National Radio Astronomy Observatory in the early 1970s. It was formalized in 1977 and standardized by ANSI in 1994. Forth is sometimes spelled in all capital letters following the customary usage during its earlier years, although the name is not an acronym.
A procedural, stack-oriented and reflective programming language without type checking, Forth features both interactive execution of commands (making it suitable as a shell for systems that lack a more formal operating system) and the ability to compile sequences of commands for later execution. Some Forth versions (especially early ones) compile threaded code, but many implementations today generate optimized machine code like other language compilers.
Forth is so named because "*he file holding the interpreter was labeled FORTH, for 4th (next) generation software - but the operating system restricted file names to 5 characters." Moore's use of the phrase 4th (next) generation software appears to predate the definition of fourth-generation programming languages; he saw Forth as a successor to compile-link-go third-generation programming languages, or software for "4th generation" hardware, not a 4GL as we understand the term today.
Forth uses separate stacks for storage of subroutine parameters and subroutine activation records. The parameter or data stack (commonly referred to as the stack) is used to pass data to words and to store the results the words return. The linkage or return stack (commonly referred to as the rstack) is used store return addresses when words are nested (the equivalent of a subroutine call), and store local variables. There are standard words to move data between the stacks, and to load and store variables on the stack.
The logical structure of Forth resembles a virtual machine. Forth, especially early versions, implements an inner interpreter tracing indirectly threaded machine code, giving compact and fast high-level code that can be compiled rapidly. More modern implementations generate optimized machine code like other language compilers.
Forth became very popular in the 1980s because it was well suited to the small microcomputers of that time, as it is compact and portable. At least one home computer, the British Jupiter ACE, had Forth in its ROM-resident OS. Forth is still used today in many embedded systems (small computerized devices) because of its portability, efficient memory use, short development time, and fast execution speed. It has been implemented efficiently on modern RISC processors, and processors that use Forth as machine language have been produced. Other uses of Forth include the Open Firmware boot ROMs used by Apple, IBM and Sun and as the first stage boot controller of the FreeBSD operating system.
Forth is a simple yet extensible language; its modularity and extensibility permit the writing of high-level applications such as CAD systems. However extensibility also helps poor programmers to write incomprehensible code, which has given Forth a reputation as a "write-only" language. Forth has been used successfully in large, complex projects, while applications developed by competent, disciplined professionals have proven to be easily maintained on evolving hardware platforms over decades of use.
Using RPN, one could get the result of the mathematical expression (25 * 10 + 50) this way:
25 10 * 50 + . 300 ok
This command line first puts the numbers 25 and 10 on the implied stack.
The word * multiplies the two numbers on the top of the stack and replaces them with their product. Then the number 50 is placed on the stack.
The word + adds it to the previous product. Finally, the . command prints the result to the user's terminal.
Even Forth's structural features are stack-based. For example:
: FLOOR5 ( n -- n' ) DUP 6 < IF DROP 5 ELSE 1 - THEN ;
This code defines a new word (again, 'word' is the term used for a subroutine) called FLOOR5 using the following commands: DUP duplicates the number on the stack; < compares the two numbers on the stack and replaces them with a true-or-false value; IF takes a true-or-false value and chooses to execute commands immediately after it or to skip to the ELSE; DROP discards the value on the stack; and THEN ends the conditional. The text in parentheses is a comment, advising that this word expects a number on the stack and will return a possibly changed number. The net result performs similarly to this function written in the C programming language:
int floor5(int v) { return v < 6 ? 5 : v - 1; }
: (colon) and ends with the word ; (semi-colon). For example
: X DUP 1+ . . ;
will compile the word X. When executed by typing 10 X at the console this will print 11 10.
BLOCK is employed to translate the number of a 1K-sized block of disk space into the address of a buffer containing the data, which is managed automatically by the Forth system. Some implement contiguous disk files using the system's disk access, where the files are located at fixed disk block ranges. Usually these are implemented as fixed-length binary records, with an integer number of records per disk block. Quick searching is achieved by hashed access on key data.
Multitasking, most commonly cooperative round-robin scheduling is normally available (although multitasking words and support are not covered by the ANSI Forth Standard). The word PAUSE is used to save the current task's execution context, to locate the next task, and restore its execution context. Each task has its own stacks, private copies of some control variables and a scratch area. Swapping tasks is simple and efficient; as a result, Forth multitaskers are available even on very simple microcontrollers such as the Intel 8051, Atmel AVR, and TI MSP430.
By contrast, some Forth systems run under a host operating system such as Microsoft Windows, Linux or a version of Unix and use the host operating system's file system for source and data files; the ANSI Forth Standard describes the words used for I/O. Other non-standard facilities include a mechanism for issuing calls to the host OS or windowing systems, and many provide extensions that employ the scheduling provided by the operating system. Typically they have a larger and different set of words from the stand-alone Forth's PAUSE word for task creation, suspension, destruction and modification of priority.
After the fetch and store operations are redefined for the code space, the compiler, assembler, etc. are recompiled using the new definitions of fetch and store. This effectively reuses all the code of the compiler and interpreter. Then, the Forth system's code is compiled, but this version is stored in the buffer. The buffer in memory is written to disk, and ways are provided to load it temporarily into memory for testing. When the new version appears to work, it is written over the previous version.
There are numerous variations of such compilers for different environments. For embedded systems, the code may instead be written to another computer, a technique known as cross compilation, over a serial port or even a single TTL bit, while keeping the word names and other non-executing parts of the dictionary in the original compiling computer. The minimum definitions for such a forth compiler are the words that fetch and store a byte, and the word that commands a forth word to be executed. Often the most time-consuming part of a remote port is to construct the initial program to implement fetch, store and execute. Many modern microprocessors have integrated debugging features (such as the Motorola CPU32) that eliminate even this task.
A defined word generally consists of head and body with the head consisting of the name field (NF) and the link field (LF) and body consisting of the code field (CF) and the parameter field (PF). Head and body of a dictionary entry are treated separately because they may not be contiguous. For example, when a Forth program is recompiled for a new platform, the head may remain on the compiling computer, while the body goes to the new platform. In some environments (such as embedded systems) the heads occupy memory unnecessarily. However, some cross-compilers may put heads in the target if the target itself is expected to support an interactive Forth.
structure byte: flag \ 3bit flags + length of word's name char-array: name \ name's runtime length isn't known at compile time address: previous \ link field, backward ptr to previous word address: codeword \ ptr to the code to execute this word any-array: parameterfield \ unknown length of data, words, or opcodes end-structure forthword
The name field starts with a prefix giving the length of the word's name (typically up to 32 bytes), and several bits for flags. The character representation of the word's name then follows the prefix. Depending on the particular implementation of Forth, there may be one or more NUL ('\0') bytes for alignment.
The link field contains a pointer to the previously defined word. The pointer may be a relative displacement or an absolute address that points to the next oldest sibling.
The code field pointer will be either the address of the word which will execute the code or data in the parameter field or the beginning of machine code that the processor will execute directly. For colon defined words, the code field pointer points to the word that will save the current Forth instruction pointer (IP) on the return stack, and load the IP with the new address from which to continue execution of words. This is the same as what a processor's call/return instructions does.
The "compile time" flag in the name field is set for words with "compile time" behavior. Most simple words execute the same code whether they are typed on a command line, or embedded in code. When compiling these, the compiler simply places code or a threaded pointer to the word.
Compile-time words are actually executed by the compiler. The classic examples of compile-time words are the control structures such as IF and WHILE. All of Forth's control structures, and almost all of its compiler are implemented as compile-time words.
: (colon) takes a name as a parameter, creates a dictionary entry (a colon definition) and enters compilation state. The interpreter continues to read space-delimited words from the user input device. If a word is found, the interpreter executes the compilation semantics associated with the word, instead of the interpretation semantics. The default compilation semantics of a word are to append its interpretation semantics to the current definition.
The word ; (semi-colon) finishes the current definition and returns to interpretation state. It is an example of a word whose compilation semantics differ from the default. The interpretation semantics of ; (semi-colon) and several other words are undefined in ANS Forth.
The interpreter state can be changed manually with the words (right-bracket) which enter interpretation state or compilation state, respectively. These words can be used with the word LITERAL to calculate a value during a compilation and to insert the calculated value into the current colon definition. LITERAL has the compilation semantics to take an object from the data stack and to append semantics to the current colon definition to place that object on the data stack.
In ANS Forth, the current state of the interpreter can be read from the flag STATE which contains the value true when in compilation state and false otherwise. This allows the implementation of so-called state-smart words with behavior that changes according to the current state of the interpreter.
IMMEDIATE marks the most recent colon definition as an immediate word, effectively replacing its compilation semantics with its interpretation semantics. Immediate words are always executed, not compiled, in either state. ; is an example of an immediate word. In ANS Forth, the word POSTPONE takes a name as a parameter and appends the compilation semantics of the named word to the current definition even if the word was marked immediate. Forth-83 defined separate words COMPILE and * to force the compilation of non-immediate and immediate words, respectively.
:NONAME which compiles the following words up to the next ; (semi-colon) and leaves an execution token on the data stack. The execution token provides an opaque handle for the compiled semantics, similar to the function pointers of the C programming language.
Execution tokens can be stored in variables. The word EXECUTE takes an execution token from the data stack and performs the associated semantics. The word COMPILE, (compile-comma) takes an execution token from the data stack and appends the associated semantics to the current definition.
The word (tick) takes the name of a word as a parameter and returns the execution token associated with that word on the data stack. In interpretation state, ' RANDOM-WORD EXECUTE is equivalent to RANDOM-WORD.
: (colon), POSTPONE, ' (tick) and :NONAME are examples of parsing words that take their arguments from the user input device instead of the data stack. Another example is the word ( (paren) which reads and ignores the following words upto and including the next right parenthesis and is used to place comments in a colon definition. Similarly, the word \ (backslash) is used for comments that continue to the end of the current line. To be parsed correctly, ( (paren) and \ (backslash) must be separated by whitespace from the following comment text.
VARIABLE
VARIABLE returns its address on the stack.
CONSTANT
CONSTANT). Instance behavior returns the value.
CREATE
Forth also provides a facility by which a programmer can define new application-specific defining words, specifying both a custom defining behavior and instance behavior. Some examples include circular buffers, named bits on an I/O port, and automatically-indexed arrays.
Data objects defined by these and similar words are global in scope. The function provided by local variables in other languages is provided by the data stack in Forth. Forth programming style uses very few named data objects compared with other languages; typically such data objects are used to contain data which is used by a number of words or tasks (in a multitasked implementation).
Forth does not enforce consistency of data type usage; it is the programmer's responsibility to use appropriate operators to fetch and store values or perform other operations on data.
During development, the programmer uses the interpreter to execute and test each little piece as it is developed. Most Forth programmers therefore advocate a loose top-down design, and bottom-up development with continuous testing and integration.
The top-down design is usually separation of the program into "vocabularies" that are then used as high-level sets of tools to write the final program. A well-designed Forth program reads like natural language, and implements not just a single solution, but also sets of tools to attack related problems.
The tool-box approach is one of the reasons that Forth is so difficult to master. While learning the syntax is easy, mastering the tools delivered with a professional Forth system can take several months, working full-time. The task is actually more difficult than rewriting one's own Forth system from scratch. Unfortunately, a rewrite also loses the experience accumulated in a typical professional Forth toolbox.
One possible implementation:
: HELLO ( -- ) CR ." Hello, world!" ; HELLO
The word CR causes the following output to be displayed on a new line. The parsing word ." (dot-quote) reads a double-quote delimited string and appends code to the current definition so that the parsed string will be displayed on execution. The space character separating the word ." from the string Hello, world! is not included as part of the string. It is needed so that the parser recognizes ." as a Forth word.
A standard Forth system is also an interpreter, and the same output can be obtained by typing the following code fragment into the Forth console:
CR .( Hello, world!)
.( (dot-paren) is an immediate word that parses a parenthesis-delimited string and displays it. As with the word ." the space character separating .( from Hello, world! is not part of the string.
The word CR comes before the text to print. By convention, the Forth interpreter does not start output on a new line. Also by convention, the interpreter waits for input at the end of the previous line, after an ok prompt. There is no implied 'flush-buffer' action in Forth's CR, as sometimes is in other programming languages.
EMIT-Q which when executed emits the single character Q:
: EMIT-Q 81 ( the ASCII value for the character 'Q' ) EMIT ;
This definition was written to use the ASCII value of the Q character (81) directly. The text between the parentheses is a comment and is ignored by the compiler. The word EMIT takes a value from the data stack and displays the corresponding character.
The following redefinition of EMIT-Q uses the words (right-bracket), CHAR and LITERAL to temporarily switch to interpreter state, calculate the ASCII value of the Q character, return to compilation state and append the calculated value to the current colon definition:
: EMIT-Q CHAR Q LITERAL EMIT ;
The parsing word CHAR takes a space-delimited word as parameter and places the value of its first character on the data stack. The word , the example definition for CHAR. Using EMIT-Q could be rewritten like this:
: EMIT-Q * Q EMIT ; \ Emit the single character 'Q'
This definition used \ (backslash) for the describing comment.
Both CHAR and could have been defined like this:
IMMEDIATE and POSTPONE,
: * CHAR POSTPONE LITERAL ; IMMEDIATE
.NET programming languages | Concatenative programming languages | Programming languages | Stack-oriented programming languages
Forth | Forth | Forth (Informatik) | FORTH | Forth (langage) | Forth | Forth | FORTH | Forth | Forth | Forth (язык программирования) | Forth | Forth (programmeringsspråk) | Forth (мова програмування) | Forth