Advanced Vector Translator: documentation

This documentation is also available in another language: Русский.

contents

About this document
What is the Advanced Vector Translator?
Limitations
Lexis
Keywords
Data types
Structure of programme
Constants
Structures
Variables
Exceptions (64-bit programmes only)
Functions
Interrupts
Initialization and Finalization
Entry point
Calling conventions
Expressions
Operation priority
Control flow operators
Using the assembler

about this document

This document describes the features of the Advanced Vector Translator (abbreviated to AVT) programming language and of the corresponding compiler.

what is the advanced vector translator?

AVT is a procedural imperative structured programming language with strong data typing. Object-oriented features are not available, but can be simulated.

AVT generates human-readable 16-bit, 32-bit or 64-bit assembler source for fasm, at the programmer’s choice. The resulting code can be used to create programmes for DOS, GNU/Linux, KolibriOS, MacOS, Windows and other operating systems that run on x86 CPUs, as well as to create operating system kernels. The 64-bit code generated by AVT is position-independent.

AVT is not an optimizing compiler: it uses a register stack to keep track of the registers in use.

limitations

The maximum size of a 16-bit programme is approximately 47 kilobytes. However, this size is only a recommendation: if you don’t use a heap that has to be located in the same memory segment as the programme code, a 16-bit programme can occupy the entire 64-kilobyte segment. The stack (or stacks, in multi-threaded 16-bit applications) can be located in another 64-kilobyte memory segment.

There are limitations on the use of certain data types and operations in 16-bit and 32-bit programmes, as discussed in the sections «Data types» and «Expressions».

lexis

Identifiers consist of letters of the Latin alphabet (A–Z, a–z), the underscore «_» and digits (0–9); the first character must not be a digit. Identifiers are case-sensitive, so, for example, getWindowTop and getwindowtop denote different programme elements.
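
The identifier rule can be expressed as a regular expression. The following Python sketch is illustrative only and is not part of the AVT compiler; the names IDENTIFIER and is_identifier are hypothetical, and the check deliberately ignores the reserved keywords.

```python
import re

# Latin letters, underscore and digits; the first character is not a digit.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_identifier(text):
    """Return True if text has the lexical shape of an AVT identifier."""
    return IDENTIFIER.fullmatch(text) is not None


# Case matters: both spellings are valid, but they are two distinct names.
print(is_identifier("getWindowTop"), is_identifier("getwindowtop"))
# A leading digit is rejected.
print(is_identifier("9lives"))
```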

Integer numbers are written in decimal or hexadecimal notation; a hexadecimal number starts with the prefix 0x or 0X. Examples: 7609, 0x1db9.

By default, integer numbers are of short or int type, and you cannot write an integer whose value is outside the range of these types. To write an integer of long type, end the integer token with the suffix L or l. Examples: 0xbf207819000fL, 1000000000000L. The lowercase letter «l» is not recommended, because it is easily confused with the digit «1».
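
As a rough model of these rules, the Python sketch below reads an integer token and reports its width and value. It rests on assumptions the text does not state: an unsuffixed literal is treated as a 32-bit pattern (in 16-bit programmes the limit would be that of short), a literal with the L suffix as a 64-bit pattern, both are read as two's complement in either notation, and uppercase hexadecimal digits are accepted. The names INTEGER and parse_integer_literal are hypothetical.

```python
import re

# Decimal digits, or 0x/0X followed by hexadecimal digits, then an optional L/l.
INTEGER = re.compile(r"(0[xX][0-9A-Fa-f]+|[0-9]+)([Ll]?)")


def parse_integer_literal(token):
    """Return (bit_width, signed_value) for an integer token.

    Assumption: 32 bits without a suffix, 64 bits with the L suffix,
    and the bit pattern is interpreted as two's complement.
    """
    match = INTEGER.fullmatch(token)
    if match is None:
        raise ValueError("not an integer literal: " + token)
    digits, suffix = match.groups()
    bits = 64 if suffix else 32
    if digits[:2] in ("0x", "0X"):
        value = int(digits[2:], 16)
    else:
        value = int(digits, 10)
    if value >= 1 << bits:
        raise ValueError("literal does not fit in %d bits: %s" % (bits, token))
    if value >= 1 << (bits - 1):
        value -= 1 << bits  # reinterpret the top bit as the sign
    return bits, value
```

Under these assumptions, 7609 and 0x1db9 yield the same 32-bit value, and a pattern such as 0xffffff80 is read as a negative number.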

Real numbers are written only in decimal notation. A period separates the integer part from the fractional part. Examples:

76.09
100.
.5

Scientific notation is also possible. The Latin letter E or e separates the mantissa from the exponent. For example, to write the number 8.85419·10⁻¹², write 8.85419e-12.

By default, real numbers are of real type, and you cannot assign them to expressions of float or double type. To make a number a float, append the suffix F or f; the suffixes D and d denote double numbers. The suffixes R and r denote real numbers and are applied automatically when no other suffix is present. Examples:

76.09f
100d
0.5r
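
The suffix rule can be summarised by a small classifier. This Python sketch is an illustration, not the compiler's lexer: the names REAL, SUFFIX_TYPES and real_literal_type are hypothetical, and the exact grammar (for example, whether a sign is allowed in the exponent) is inferred from the examples in this document.

```python
import re

# Mantissa with an optional period, optional exponent, optional type suffix.
REAL = re.compile(
    r"(?P<mantissa>[0-9]+\.[0-9]*|\.[0-9]+|[0-9]+)"
    r"(?P<exponent>[eE][+-]?[0-9]+)?"
    r"(?P<suffix>[fFdDrR])?"
)
SUFFIX_TYPES = {"f": "float", "d": "double", "r": "real"}


def real_literal_type(token):
    """Return 'float', 'double' or 'real', or None if token is not a real literal."""
    match = REAL.fullmatch(token)
    if match is None:
        return None
    has_point = "." in match.group("mantissa")
    if not (has_point or match.group("exponent") or match.group("suffix")):
        return None  # plain digits form an integer literal, not a real one
    # Without a suffix the literal is of real type.
    suffix = (match.group("suffix") or "r").lower()
    return SUFFIX_TYPES[suffix]
```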

Character literals are always written in single quotes. For example: 'u'. It is also possible to write so-called escape sequences; some characters can be written only by means of escape sequences. An escape sequence starts with a backslash character «\», followed by one of these characters:

0 (means the character with code 0x0000)
b (means the character with code 0x0008)
t (means the character with code 0x0009)
n (means the character with code 0x000a)
f (means the character with code 0x000c)
r (means the character with code 0x000d)
\ (means the character with code 0x005c)
" (means the character with code 0x0022)
' (means the character with code 0x0027)
uXXXX (means the character with code 0xXXXX)

Replace XXXX with the character code written as exactly four hexadecimal digits. For example: '\u00c4' means the same as 'Ä'.
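
The escape sequences listed above can be decoded mechanically. The Python sketch below is a minimal illustration under the assumption that any other character after a backslash is an error; the names SIMPLE_ESCAPES, HEX_DIGITS and decode_escapes are hypothetical and do not come from the AVT compiler.

```python
# Code points taken from the list of escape sequences.
SIMPLE_ESCAPES = {
    "0": 0x0000, "b": 0x0008, "t": 0x0009, "n": 0x000A, "f": 0x000C,
    "r": 0x000D, "\\": 0x005C, '"': 0x0022, "'": 0x0027,
}
HEX_DIGITS = "0123456789abcdefABCDEF"


def decode_escapes(body):
    """Decode the text written between the quotes of a literal."""
    out = []
    i = 0
    while i < len(body):
        if body[i] != "\\":
            out.append(body[i])
            i += 1
            continue
        code = body[i + 1:i + 2]  # empty if the backslash is the last character
        if code == "u":
            digits = body[i + 2:i + 6]
            if len(digits) != 4 or any(d not in HEX_DIGITS for d in digits):
                raise ValueError("\\u must be followed by four hexadecimal digits")
            out.append(chr(int(digits, 16)))
            i += 6
        elif code in SIMPLE_ESCAPES:
            out.append(chr(SIMPLE_ESCAPES[code]))
            i += 2
        else:
            raise ValueError("unknown escape sequence at offset %d" % i)
    return "".join(out)
```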

Character literals are of char type, but they can be assigned to expressions of other types, except byte and (if the value is out of range) short.

String literals are written in double quotes. They follow the same rules for writing characters as character literals; in addition, a backslash at the end of a line lets you continue a long string on the next line. Here are examples of string literals:

"" (empty string)
"string"
"ABC"
"\u0041\u0042\u0043" (same as "ABC")
"long\u0020\
string" (same as "long string")

String literals are of char[] type. There is no concatenation operation in AVT, so long strings can be written only as in the last example.
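
The last example suggests that a backslash immediately followed by a line break is dropped together with the line break. The Python sketch below models only that step and leaves all other escape pairs untouched. It assumes that the line break is a single line feed and that whitespace at the start of the continued line is kept, neither of which is stated in this document; the name join_continued_lines is hypothetical.

```python
def join_continued_lines(body):
    """Drop each backslash that is immediately followed by a line break,
    together with that line break."""
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            if body[i + 1] == "\n":
                i += 2  # drop the backslash and the line break
                continue
            out.append(body[i:i + 2])  # keep any other escape pair intact
            i += 2
            continue
        out.append(body[i])
        i += 1
    return "".join(out)


# The body of the last string literal example, written with explicit characters.
print(join_continued_lines("long\\u0020\\\nstring"))
```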

The following symbol characters are used to write the various blocks and operators of the programming language:

.,:;?{}[]()+-*/%~!&|^<=>@#

Comments start with the sequence /* and end with */.

keywords

The following identifiers are reserved and can be used for their intended purpose only:

assembler
boolean
break
byte
case
catch
char
const
continue
default
dispose
do
double
else
exception
false
finalization
finally
float
for
fvector
if
import
initialization
int
interrupt
long
namespace
new
null
public
pureassembler
real
return
short
struct
switch
throw
true
try
ultra
ultra32
ultra64
void
while
with
xvector
yvector
zvector

Data types

All AVT data types are divided into three categories: scalar, vector and reference. Some vector data types are compound, that is, they consist of a fixed number of values of another data type. At the moment, the compound types are ultra and xvector. The following summary table shows the properties of all types.

| Name or writing method | Size, in bytes | Description | Default value |
|---|---|---|---|
| Scalar | | | |
| boolean | 1 | One of two values: false or true. Available: everywhere | false |
| char | 2 | An integer between 0 and 65535. Available: everywhere | 0 |
| byte | 1 | An integer between -128 and 127. Available: everywhere | 0 |
| short | 2 | An integer between -32768 and 32767. Available: everywhere | 0 |
| int | 4 | An integer between -2147483648 and 2147483647. Available: 32-bit, 64-bit | 0 |
| long | 8 | An integer between -2⁶³ and 2⁶³-1. Available: 64-bit only | 0L |
| float | 4 | A real number of single precision. Available: everywhere | 0F |
| double | 8 | A real number of double precision. Available: everywhere | 0D |
| real | 10 | A real number of extended precision. Available: everywhere | 0R |
| Vector | | | |
| long | 8 | 4 short values. Available: 64-bit only | 0L |
| ultra | 16 | 4 int values or 8 short values. Available: 64-bit only | new ultra { 0, 0, 0, 0 } |
| xvector | 16 | 4 float values. Available: 64-bit only | new xvector { 0F, 0F, 0F, 0F } |
| Reference (also known as Pointer) | | | |
| <structure name> | = | A reference to a structure in memory. Available: everywhere | null |
| <type>[] | = | A reference to an array descriptor. Available: everywhere | null |
| <type>(<type>, <type>, …<type>) | = | A reference to a function. Available: everywhere | null |

The «=» symbol means that the size equals the width of the generated code: 2 bytes in 16-bit programmes, 4 bytes in 32-bit programmes and 8 bytes in 64-bit programmes.

Arrays in AVT consist of a descriptor and content. The descriptor stores the array length and the relative offset of the content in memory. The length field is always called length and is of int type (short type in 16-bit programmes).
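
The dependence of reference sizes and of the length field on the code width can be written down as two small functions. This is a Python sketch for illustration only; the names reference_size and length_field_type are hypothetical, and the layout of the descriptor itself is not modelled because this document does not specify it.

```python
def reference_size(code_bits):
    """Size in bytes of any reference type for the given code width."""
    if code_bits not in (16, 32, 64):
        raise ValueError("code width must be 16, 32 or 64")
    return code_bits // 8


def length_field_type(code_bits):
    """Type of the length field of an array for the given code width."""
    reference_size(code_bits)  # reuse the width check
    return "short" if code_bits == 16 else "int"


for bits in (16, 32, 64):
    print(bits, reference_size(bits), length_field_type(bits))
```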

structure of programme

An AVT programme consists of a set of source code files, which you list in compilation order. Each source file consists of one or more namespaces, which are likewise arranged in compilation order. Every programme must contain a namespace with the identifier System; it is compiled first regardless of its position in the list.

Each source file can begin with a list of imported namespaces. The System namespace is imported automatically: an explicit attempt to import it causes a compilation error.

One or more namespaces follow. A namespace can contain the following elements:

  • constants;
  • structures;
  • global variables;
  • exceptions (64-bit programmes only);
  • functions.

Here is the structure of a source file in AVT:

import <namespace 1>;
import <namespace 2>;
…
import <namespace N>;

namespace <identifier>
{
    <constants>
    <structures>
    <global variables>
    <exceptions>
    <functions>
}

The declaration of any namespace element can begin with the public keyword, which makes the element available for use in other namespaces. An element declared without the public keyword is available only inside the namespace in which it is declared.

The order of elements in a namespace has no special meaning, with two exceptions: a constant whose value is used by other constants must be placed before them, and parent structures and exceptions must be placed before those that inherit from them.
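
The ordering rule amounts to checking that every dependency is declared earlier in the same namespace. The Python sketch below illustrates that check; it is not taken from the compiler, the name order_violations is hypothetical, and names coming from imported namespaces are not modelled.

```python
def order_violations(declarations):
    """Return (name, dependency) pairs where the dependency is not declared earlier.

    declarations is a list of (name, [names it depends on]) in source order.
    """
    seen = set()
    violations = []
    for name, depends_on in declarations:
        for dependency in depends_on:
            if dependency not in seen:
                violations.append((name, dependency))
        seen.add(name)
    return violations


# MAX_LONG uses MIN_LONG, so MIN_LONG has to come first.
good = [("MIN_LONG", []), ("MAX_LONG", ["MIN_LONG"])]
bad = [("MAX_LONG", ["MIN_LONG"]), ("MIN_LONG", [])]
print(order_violations(good))
print(order_violations(bad))
```

The same check applies to a structure or exception and its parent if the parent is listed as the dependency.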

constants

Constants are namespace elements associated with a constant value that is computed at compilation time. This value can be of any scalar or vector data type. It is recommended to write constant identifiers in all capital letters. Here are examples of constants (the computed values are given in comments):

public const int MIN_RADIX = 2;
public const int MAX_RADIX = 36;
public const int MIN_BYTE = 0xffffff80; /* = -128 */
public const int MAX_BYTE = 0x0000007f; /* = 127 */
public const int MIN_SHORT = 0xffff8000; /* = -32768 */
public const int MAX_SHORT = 0x00007fff; /* = 32767 */
public const int MIN_INT = 0x80000000; /* -2147483648 */
public const int MAX_INT = 0x7fffffff; /* 2147483647 */
public const long MIN_LONG = 1L << 63;
public const long MAX_LONG = MIN_LONG - 1L;
const ultra LONG_QMULHS_CONST_1 = new ultra { 1, 1, 1, 1 };
const ultra ULTRA_QMULL_CONST_1 = new ultra { -1, 0, -1, 0 };
const ultra ULTRA_QMULL_CONST_2 = ----ULTRA_QMULL_CONST_1; /* new ultra { 1, 0, 1, 0 } */
const ultra ULTRA_QMULH_CONST_1 = new ultra { 0, -1, 0, -1 };
const ultra ULTRA_QMULH_CONST_2 = LONG_QMULHS_CONST_1 ++++ ULTRA_QMULL_CONST_2 ---- new ultra { 0, 0, -2, 2 }; /* new ultra { 2, 1, 4, -1 } */
const ultra ULTRA_OMULHS_CONST_1 = new ultra { ULTRA_QMULH_CONST_1[1], LONG_QMULHS_CONST_1[0], MIN_BYTE >> 4, -ULTRA_QMULH_CONST_2[3] }; /* new ultra { -1, 1, -8, 1 } */
const float MAX_INT_AS_FLOAT = 2.147483648e+9f;
const double MAX_INT_AS_DOUBLE = -(double) MIN_INT; /* 2.147483648e+9d */
const xvector MAX_INT_AS_XVECTOR = new xvector { MAX_INT_AS_FLOAT, 
MAX_INT_AS_FLOAT, MAX_INT_AS_FLOAT, MAX_INT_AS_FLOAT }; /* new xvector { 2.147483648e+9f, 2.147483648e+9f, 2.147483648e+9f, 2.147483648e+9f } */
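
The values given in the comments for the ultra constants can be re-derived by hand. The Python sketch below does so under the assumption, inferred from those comments, that the ++++ and ---- operators work element by element on four int values; 32-bit overflow is ignored because the values are small. The helper names vadd, vsub and vneg are hypothetical.

```python
def vadd(a, b):
    """Element-wise sum, modelling the binary ++++ operator."""
    return tuple(x + y for x, y in zip(a, b))


def vsub(a, b):
    """Element-wise difference, modelling the binary ---- operator."""
    return tuple(x - y for x, y in zip(a, b))


def vneg(a):
    """Element-wise negation, modelling the unary ---- operator."""
    return tuple(-x for x in a)


MIN_BYTE = -128
LONG_QMULHS_CONST_1 = (1, 1, 1, 1)
ULTRA_QMULL_CONST_1 = (-1, 0, -1, 0)
ULTRA_QMULL_CONST_2 = vneg(ULTRA_QMULL_CONST_1)
ULTRA_QMULH_CONST_1 = (0, -1, 0, -1)
ULTRA_QMULH_CONST_2 = vsub(
    vadd(LONG_QMULHS_CONST_1, ULTRA_QMULL_CONST_2), (0, 0, -2, 2)
)
ULTRA_OMULHS_CONST_1 = (
    ULTRA_QMULH_CONST_1[1],
    LONG_QMULHS_CONST_1[0],
    MIN_BYTE >> 4,  # arithmetic shift keeps the sign
    -ULTRA_QMULH_CONST_2[3],
)

print(ULTRA_QMULL_CONST_2, ULTRA_QMULH_CONST_2, ULTRA_OMULHS_CONST_1)
```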
