A Guide to the S-Lang Language (v2.3.0): Data Types and Literal Constants

4. Data Types and Literal Constants

The current implementation of the S-Lang language permits up to 65535 distinct data types, including predefined data types such as integer and floating point, as well as specialized application-specific data types. It is also possible to create new data types in the language using the typedef mechanism.

Literal constants are objects such as the integer 3 or the string "hello". The actual data type given to a literal constant depends upon the syntax of the constant. The following sections describe the syntax of literals of specific data types.

4.1 Predefined Data Types

The current version of S-Lang defines integer, floating point, complex, and string types. It also defines special purpose data types such as Null_Type, DataType_Type, and Ref_Type. These types are discussed below.

Integers

The S-Lang language supports signed and unsigned variants of the character, short integer, plain integer, long integer, and long long integer types. On most 32 bit systems, there is no difference between an integer and a long integer; however, they may differ on 16 and 64 bit systems. Generally speaking, on a 16 bit system, plain integers are 16 bit quantities with a range of -32768 to 32767. On a 32 bit system, plain integers range from -2147483648 to 2147483647.

A plain integer literal can be specified in one of several ways:

As a decimal (base 10) integer consisting of the characters 0 through 9, e.g., 127. An integer specified this way cannot begin with a leading 0. That is, 0127 is not the same as 127.
Using hexadecimal (base 16) notation consisting of the characters 0 to 9 and A through F. The hexadecimal number must be preceded by the characters 0x. For example, 0x7F specifies an integer using hexadecimal notation and has the same value as decimal 127.
In Octal notation using characters 0 through 7. The Octal number must begin with a leading 0. For example, 0177 and 127 represent the same integer.
In Binary notation using characters 0 and 1 with the 0b prefix. For example, 21 may be expressed in binary using 0b10101.

Short, long, long long, and unsigned types may be specified by using the proper suffixes: L indicates that the integer is a long integer, LL indicates a long long integer, h indicates that the integer is a short integer, and U indicates that it is unsigned. For example, 1UL specifies an unsigned long integer.
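
The notations and suffixes above can be tried out with a short sketch such as the following. It assumes an interpreter recent enough to accept the 0b binary form, and it uses the %S format of vmessage to print each type name. The variable names are illustrative only.

```slang
% Four spellings of decimal 127.
variable dec = 127, hex = 0x7F, oct = 0177, bin = 0b1111111;
if ((dec == hex) && (hex == oct) && (oct == bin))
  message ("all four literals denote 127");

% Suffixes select the integer type; typeof reports which one was chosen.
vmessage ("%S %S %S", typeof (1h), typeof (1U), typeof (1UL));
```

The type names printed by the last line may vary between platforms, since some integer types can coincide on a given system.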

Finally, a character literal may be specified by enclosing a character in single quotes, as in 'a'. The value of a character specified this way will lie in the range 0 to 255 and will be determined by the ASCII value of the character in quotes. For example,


              i = '0';

assigns the value 48 to i, since the '0' character has an ASCII value of 48.
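
Because a character literal is simply a small integer, it can take part in arithmetic. The following is a minimal sketch of that idea, assuming an ASCII character set in which the digit characters are consecutive:

```slang
variable i = '0';
vmessage ("%d", i);            % 48, the ASCII code of '0'
vmessage ("%d", '7' - '0');    % 7, the numeric value of the digit '7'
```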

A "wide" character (Unicode) may be specified using the form '\x{y...y}' where y...y are hexadecimal digits. For example,


     '\x{12F}'         % Latin Small Letter I With Ogonek
     '\x{1D7BC}'       % Mathematical Sans-Serif Bold Italic Small Sigma

Any integer may be preceded by a minus sign to indicate that it is a negative integer.

Floating Point Numbers

Single and double precision floating point literals must contain either a decimal point or an exponent (or both). Here are examples of specifying the same double precision floating point number:


         12.    12.0    12e0   1.2e1   120e-1   .12e2   0.12e2

Note that 12 is not a floating point number since it contains neither a decimal point nor an exponent. In fact, 12 is an integer.

One may append the f character to the end of the number to indicate that the number is a single precision literal. The following are all single precision values:


         12.f    12.0f    12e0f   1.2e1f   120e-1f   .12e2f   0.12e2f
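
As a quick check of these rules, the sketch below asks typeof for the type of a double precision literal, a single precision literal, and a plain integer. The variable names are illustrative.

```slang
variable d = 1.2e1;     % decimal point and exponent, no suffix
variable f = 1.2e1f;    % the f suffix requests single precision
variable n = 12;        % neither decimal point nor exponent
vmessage ("%S %S %S", typeof (d), typeof (f), typeof (n));
% expected: Double_Type Float_Type Integer_Type
```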

Complex Numbers

The language implements complex numbers as a pair of double precision floating point numbers. The first number in the pair forms the real part, while the second number forms the imaginary part. That is, a complex number may be regarded as the sum of a real number and an imaginary number.

Strictly speaking, the current implementation of S-Lang does not support generic complex literals. However, it does support imaginary literals, which permit a more generic complex number with a non-zero real part to be constructed by adding a real number to the imaginary literal.

An imaginary literal is specified in the same way as a floating point literal except that i or j is appended. For example,


         12i    12.0i   12e0j

all represent the same imaginary number.

A more generic complex number may be constructed from an imaginary literal via addition, e.g.,


        3.0 + 4.0i

produces a complex number whose real part is 3.0 and whose imaginary part is 4.0.

The intrinsic functions Real and Imag may be used to retrieve the real and imaginary parts of a complex number, respectively.
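
Putting these pieces together, a minimal sketch of building a complex number and taking it apart again might look like this:

```slang
variable z = 3.0 + 4.0i;    % real number plus imaginary literal
vmessage ("%S: real=%g imag=%g", typeof (z), Real (z), Imag (z));
% expected: Complex_Type: real=3 imag=4
```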

Strings

A string literal must be enclosed in double quotes as in:


      "This is a string".

As described below, the string literal may contain a suffix that specifies how the string is to be interpreted, e.g., a string literal such as


      "$HOME/.jedrc"$

with the '$' suffix will be subject to variable name expansion.

Although there is no imposed limit on the length of a string, single-line string literals must be less than 256 characters in length. It is possible to construct strings longer than this by string concatenation, e.g.,


      "This is the first part of a long string"
       + " and this is the second part"

S-Lang version 2.2 introduced support for multi-line string literals. There are two basic variants supported. The first makes use of the backslash at the end of a line to indicate that the string is continued onto the next line:


      "This is a \
      multi-line string. \
      Note the presence of the \
      backslash character at the end \
      of each of the lines."

The second form of multiline string is delimited by the backquote character (`) and does not require backslashes:


       `This form does not
       require backslash characters.
       In fact, here the backslash
       character \ has no special
       meaning (unless given the ``Q' suffix)`

Note that if a backquote is to appear in such a string, then it must be doubled, as illustrated in the above example.

Any character except a newline (ASCII 10) or the null character (ASCII 0) may appear explicitly in a string literal. However, these characters may be embedded implicitly using the mechanism described below.

The backslash character is a special character and is used to include other special characters (such as a newline character) in the string. The special characters recognized are:


       \"        --  double quote
       \'        --  single quote
       \\        --  backslash
       \a        --  bell character (ASCII 7)
       \t        --  tab character (ASCII 9)
       \n        --  newline character (ASCII 10)
       \e        --  escape character (ASCII 27)
       \xhh      --  byte expressed in HEXADECIMAL notation
       \ooo      --  byte expressed in OCTAL notation
       \dnnn     --  byte expressed in DECIMAL
       \u{h..h}  --  the Unicode character U+h..h
       \x{h..h}  --  the Unicode character U+h..h  [modal]

In the above table, h represents one of the HEXADECIMAL characters from the set [0-9A-Fa-f]. It is important to understand the distinction between the \x{h..h} and \u{h..h} forms. When used in a string, the \u form always expands to the corresponding UTF-8 sequence regardless of the UTF-8 mode. In contrast, when in non-UTF-8 mode, the \x form expands to a byte when given two hex characters, or to the corresponding UTF-8 sequence when used with three or more hex characters.

For example, to include the double quote character as part of the string, it must be preceded by a backslash character, e.g.,


       "This is a \"quote\"."

Similarly, the next example illustrates how a newline character may be included:


       "This is the first line\nand this is the second."

Alternatively, slang-2.2 or newer permits


       `This is a "quote".`
       `This is the first line
       and this is the second.`

Suffixes

A string literal may contain a suffix that specifies how the string is to be interpreted. The suffix may consist of one or more of the following characters:

R: Backslash substitution will not be performed on the string. This is the default when using back-quoted strings.
Q: Backslash substitution will be performed on the string. This is the default when using strings using the double-quote character.
B: If this suffix is present, the string will be interpreted as a binary string (BString_Type).
$: Variable name substitution will be performed on the string.

Not all combinations of the above control characters are supported or make sense. For example, a string with the suffix QR will cause a parse-error because Q and R have opposing meanings.

The Q and R suffixes

These suffixes turn on and off backslash expansion. Unless the R suffix is present, all double-quoted string literals will have backslash substitution performed. By default, backslash expansion is turned off for backquoted strings.

Sometimes it is desirable to turn off backslash expansion for double-quoted strings. For example, pathnames on an MSDOS or Windows system use the backslash character as a path separator. The R suffix turns off backslash expansion, and as a result the following statements are equivalent:


      file = "C:\\windows\\apps\\slrn.rc";
      file = "C:\\windows\\apps\\slrn.rc"Q;
      file = "C:\windows\apps\slrn.rc"R;
      file = `C:\windows\apps\slrn.rc`;        % slang-2.2 and above

The only exception is that a backslash character is not permitted as the last character of a string with the R suffix. That is,


     string = "This is illegal\"R;

is not permitted. Without this exception, a string such as


     string = "Some characters: \"R, S, T\"";

would not be parsed properly.
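
The effect of the Q and R suffixes can be observed by measuring string lengths. The following sketch assumes slang-2.2 or newer for the backquoted forms. With backslash expansion, \t becomes a single tab character. Without it, the backslash and the t remain two separate characters.

```slang
variable dq   = "a\tb";     % default for double quotes: expanded, 3 characters
variable dq_r = "a\tb"R;    % R suffix: not expanded, 4 characters
variable bq   = `a\tb`;     % default for backquotes: not expanded, 4 characters
variable bq_q = `a\tb`Q;    % Q suffix: expanded, 3 characters
vmessage ("%d %d %d %d", strlen (dq), strlen (dq_r), strlen (bq), strlen (bq_q));
% expected: 3 4 4 3
```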

The $ suffix

If the string contains the $ suffix, then variable name expansion will be performed upon names prefixed by a $ character occurring within the string, e.g.,


     "The value of X is $X and the value of Y is $Y"$.

Here, variable name substitution will be performed on the names X and Y. Such strings may be used as a convenient alternative to the sprintf function.

Name expansion is carried out according to the following rules: If the string literal occurs in a function, and the name corresponds to a variable local to the function, then the string representation of the value of that variable will be substituted. Otherwise, if the name corresponds to a variable that is local to the compilation unit (i.e., is declared as static or private), then its value's string representation will be used. Otherwise, if the name corresponds to a variable that exists as a global (public) then its value's string representation will be substituted. If the above searches fail and the name exists in the environment, then the value of the corresponding environment variable will be used. Otherwise, the variable will expand to the empty string.

Consider the following example:


     private variable bar = "two";
     putenv ("MYHOME=/home/baz");
     define funct (foo)
     {
       variable bar = 1;
       message ("file: $MYHOME/foo: garage=$MYGARAGE,bar=$bar"$);
     }

When executed, this will produce the message:


     file: /home/baz/foo: garage=,bar=1

assuming that MYGARAGE is not defined anywhere.

A name may be enclosed in braces. For example,


      "${MYHOME}/foo: bar=${bar}"$

This is useful in cases when the name is followed immediately by other characters that may be interpreted as part of the name, e.g.,


      variable HELLO="Hello ";
      message ("${HELLO}World"$);

will produce the message "Hello World".
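
To illustrate how a $-suffixed string can stand in for sprintf, the sketch below builds the same text both ways. The variables X and Y are illustrative, and the %S format is used so that sprintf accepts a value of any type.

```slang
variable X = 3, Y = "four";
message ("X is $X and Y is $Y"$);
message (sprintf ("X is %S and Y is %S", X, Y));
```

Both calls should display the text "X is 3 and Y is four".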

Null_Type

Objects of type Null_Type can have only one value: NULL. About the only thing that you can do with this data type is to assign it to variables and test for equality with other objects. Nevertheless, Null_Type is an important and extremely useful data type. Its main use stems from the fact that since it can be compared for equality with any other data type, it is ideal to represent the value of an object which does not yet have a value, or has an illegal value.

As a trivial example of its use, consider


      define add_numbers (a, b)
      {
         if (a == NULL) a = 0;
         if (b == NULL) b = 0;
         return a + b;
      }
      variable c = add_numbers (1, 2);
      variable d = add_numbers (1, NULL);
      variable e = add_numbers (1,);
      variable f = add_numbers (,);

It should be clear that after these statements have been executed, c will have a value of 3. It should also be clear that d will have a value of 1 because NULL has been passed as the second parameter. One feature of the language is that if a parameter has been omitted from a function call, the variable associated with that parameter will be set to NULL. Hence, e and f will be set to 1 and 0, respectively.

The Null_Type data type also plays an important role in the context of structures.

Ref_Type

Objects of Ref_Type are created using the unary reference operator &. Such objects may be dereferenced using the dereference operator @. For example,


      sin_ref = &sin;
      y = (@sin_ref) (1.0);

creates a reference to the sin function and assigns it to sin_ref. The second statement uses the dereference operator to call the function that sin_ref references.

The Ref_Type is useful for passing functions as arguments to other functions, or for returning information from a function via its parameter list. The dereference operator may also be used to create an instance of a structure. For these reasons, further discussion of this important type can be found in the section on Referencing Variables.
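
Both uses can be sketched briefly. The helper functions set_answer and apply_func below are hypothetical names chosen for this illustration and are not part of S-Lang.

```slang
% Hypothetical helper: returns a result through a reference argument.
define set_answer (ref)
{
   @ref = 42;
}

% Hypothetical helper: calls whatever function f refers to.
define apply_func (f, x)
{
   return (@f) (x);
}

variable v;
set_answer (&v);                             % v is assigned inside the function
vmessage ("%d %g", v, apply_func (&cos, 0.0));
% expected: 42 1
```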

Array_Type, Assoc_Type, List_Type, and Struct_Type

Variables of type Array_Type, Assoc_Type, List_Type, and Struct_Type are known as container objects. They are more complicated than the simple data types discussed so far and each obeys a special syntax. For these reasons they are discussed in separate chapters.

DataType_Type Type

S-Lang defines a type called DataType_Type. Objects of this type have values that are type names. For example, an integer is an object of type Integer_Type. The literals of DataType_Type include:


     Char_Type            (signed character)
     UChar_Type           (unsigned character)
     Short_Type           (short integer)
     UShort_Type          (unsigned short integer)
     Integer_Type         (plain integer)
     UInteger_Type        (plain unsigned integer)
     Long_Type            (long integer)
     ULong_Type           (unsigned long integer)
     LLong_Type           (long long integer)
     ULLong_Type          (unsigned long long integer)
     Float_Type           (single precision real)
     Double_Type          (double precision real)
     Complex_Type         (complex numbers)
     String_Type          (strings, C strings)
     BString_Type         (binary strings)
     Struct_Type          (structures)
     Ref_Type             (references)
     Null_Type            (NULL)
     Array_Type           (arrays)
     Assoc_Type           (associative arrays/hashes)
     List_Type            (lists)
     DataType_Type        (data types)

as well as the names of any other types that an application defines.

The built-in function typeof returns the data type of its argument, i.e., a DataType_Type. For instance typeof(7) returns Integer_Type and typeof(Integer_Type) returns DataType_Type. One can use this function as in the following example:


     if (Integer_Type == typeof (x)) message ("x is an integer");

The literals of DataType_Type have other uses as well. One of the most common uses of these literals is to create arrays, e.g.,


     x = Complex_Type [100];

creates an array of 100 complex numbers and assigns it to x.
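
As a small sketch of how type names behave as ordinary values, the following inspects such an array with typeof, the _typeof intrinsic (which reports the element type of an array), and length:

```slang
variable x = Complex_Type [100];
vmessage ("%S %S %d", typeof (x), _typeof (x), length (x));
% expected: Array_Type Complex_Type 100

% A type name is itself an object, and its type is DataType_Type.
if (typeof (typeof (7)) == DataType_Type)
  message ("type names are values of DataType_Type");
```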

Boolean Type

Strictly speaking, S-Lang has no separate boolean type; rather it represents boolean values as Char_Type objects. In particular, boolean FALSE is equivalent to Char_Type 0, and TRUE to any non-zero Char_Type value. Since the exact value of TRUE is unspecified, it is unnecessary and even pointless to define TRUE and FALSE literals in S-Lang.
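
In practice this means that the result of a comparison is tested directly rather than compared against a TRUE constant. A minimal sketch, with an illustrative variable name:

```slang
variable ok = (1 < 2);       % result of a comparison
if (ok) message ("1 < 2 is true");
if (not (1 > 2)) message ("1 > 2 is false");
vmessage ("%S", typeof (ok));    % reports the type used for the boolean result
```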

4.2 Typecasting: Converting from one Type to Another

Occasionally, it is necessary to convert from one data type to another. For example, if you need to print an object as a string, it may be necessary to convert it to a String_Type. The typecast function may be used to perform such conversions. For example, consider


      variable x = 10, y;
      y = typecast (x, Double_Type);

After execution of these statements, x will have the integer value 10 and y will have the double precision floating point value 10.0. If the object to be converted is an array, the typecast function will act upon all elements of the array. For example,


      x = [1:10];       % Array of integers
      y = typecast (x, Double_Type);

will create an array of 10 double precision values and assign it to y. One should also realize that it is not always possible to perform a typecast. For example, any attempt to convert an Integer_Type to a Null_Type will result in a run-time error. Typecasting works only when datatypes are similar.

Often the interpreter will perform implicit type conversions as necessary to complete calculations. For example, when multiplying an Integer_Type with a Double_Type, it will convert the Integer_Type to a Double_Type for the purpose of the calculation. Thus, the example involving the conversion of an array of integers to an array of doubles could have been performed by multiplication by 1.0, i.e.,


      x = [1:10];       % Array of integers
      y = 1.0 * x;

The string intrinsic function should be used whenever a string representation is needed. Using the typecast function for this purpose will usually fail unless the object to be converted is similar to a string; most are not. Moreover, when typecasting an array to String_Type, the typecast function acts on each element of the array to produce another array, whereas the string function will produce a string.
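
The difference between the two functions can be seen in a short sketch. The variable names are illustrative, and _typeof is used to report the element type of an array.

```slang
variable a = [1:3];                        % array of integers
variable d = typecast (a, Double_Type);    % converts element by element
variable s = string (a);                   % one string describing the array
vmessage ("%S %S", _typeof (d), typeof (s));
% expected: Double_Type String_Type

if (string (10) == "10")
  message ("string (10) yields the text 10");
```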

One use of the string function is to print the value of an object. This use is illustrated in the following simple example:


      define print_object (x)
      {
         message (string (x));
      }

Here, the message function has been used because it writes a string to the display. If the string function were not used and the message function were passed an integer, a type-mismatch error would result.
