"Let's model me a mine"
M.E.G.M.S.
Mining & Exploration Geological Modelling Services
aniso_varying_string is a module that is [almost] an implementation of the specification in ISO/IEC 1539-2:2000 (Varying length character strings).
(Almost, because the name of the module is different. But that is trivial to remedy.)
The source is based on the Fortran 95 + TR module by Rich Townsend, but it has been modified to take advantage of additional Fortran 2003 facilities. It uses a deferred length allocatable character scalar component as the underlying store, rather than an allocatable rank one character array.
Further, the module provides several extensions to ISO_VARYING_STRING, which are described below.
The latest source release (revision 2416 from 2016-09-02) can be found at www.megms.com.au/download/aniso_varying_string.f90. The code is written in Fortran 2003. Intel Visual Fortran 16.0 was the compiler used for development; with that compiler, the /standard-semantics switch (or its equivalent on non-Windows platforms) is required.
A small test program that addresses the input/output aspects of the module has been written in Fortran 2008 and can be found at www.megms.com.au/download/aniso_varying_string-tests.f90.
Intel Fortran 16.0.3 has issues with its implementation of user defined derived type input/output that compromise much of the defined input capability of this module. Many of those issues will reportedly be fixed in 17.0.1.
gfortran 7.0.0 (experimental at the time of writing, r239953) can also compile the example, but the resulting program experiences issues at runtime related to the use of deferred length character arguments.
Please send feedback from use of other Fortran 2003 and 2008 compilers to ff08@megms.com.au!
Inside an object of type varying_string, the character data is stored in a deferred length allocatable scalar component named CHARS. This component is publicly accessible, and hence can be used directly for substring operations or for association as an actual argument with a deferred length allocatable scalar CHARACTER dummy.
    TYPE(VARYING_STRING) :: vs
    vs = 'abcdef'                 ! Using defined assignment
    PRINT "(A)", vs%CHARS(1:3)    ! Prints `abc`
From the perspective of the containing varying_string object, if the CHARS component is not allocated then the object is not defined and must not be used in a context where its value is required.
The KIND parameter of this component is specified by the ck named constant in the module. This named constant is not PUBLIC.
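The argument association described above can be sketched without the module itself. In this minimal, self-contained example, `demo_string` and `append_suffix` are hypothetical stand-ins (they are not part of aniso_varying_string), and default character kind is assumed in place of the non-public `ck` constant.

```fortran
program chars_component_demo
  implicit none

  ! Hypothetical stand-in for varying_string: a single public deferred
  ! length allocatable component. Default character kind is an assumption.
  type :: demo_string
    character(:), allocatable :: chars
  end type demo_string

  type(demo_string) :: s

  ! An unallocated component corresponds to an undefined object.
  print "(A,L1)", 'allocated before: ', allocated(s%chars)

  call append_suffix(s%chars, 'abc')   ! allocates the component
  call append_suffix(s%chars, 'def')   ! reallocates it to length 6

  print "(A,L1)", 'allocated after:  ', allocated(s%chars)
  print "(A)", s%chars(1:3)            ! substring of the component

contains

  ! The dummy is deferred length allocatable, so the procedure may
  ! allocate or resize the actual argument.
  subroutine append_suffix(text, suffix)
    character(:), allocatable, intent(inout) :: text
    character(*), intent(in) :: suffix
    if (allocated(text)) then
      text = text // suffix
    else
      text = suffix
    end if
  end subroutine append_suffix

end program chars_component_demo
```

With the real module, the same pattern would pass `vs%CHARS` as the actual argument, provided the dummy has the kind given by `ck`.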
Because the component is publicly accessible, the structure constructor for a varying_string object is available, which makes the VAR_STR generic rather pointless.
    TYPE(VARYING_STRING), ALLOCATABLE :: array_of_strings(:)
    array_of_strings = [  &
        VARYING_STRING('one'),  &
        VARYING_STRING('two'),  &
        VARYING_STRING('three'),  &
        VARYING_STRING('four has trailing spaces ') ]

    SUBROUTINE procedure_with_intent_in_varying_string(vs)
      TYPE(varying_string), INTENT(IN) :: vs
      ...
    END SUBROUTINE procedure_with_intent_in_varying_string
    ...
    CALL procedure_with_intent_in_varying_string(VARYING_STRING('Hello'))
The GET generic subroutine is extended by additional specific procedures that have the STRING dummy argument as a deferred length allocatable CHARACTER. The variants of these specifics that include the optional SEPARATOR dummy argument also have that argument as deferred length allocatable CHARACTER (i.e., if SEPARATOR is present, it always has the same characteristics as for STRING).
The SPLIT generic subroutine is extended by an additional specific procedure that takes all of the STRING, WORD and SEPARATOR dummy arguments as deferred length allocatable CHARACTER (i.e., if one output argument is deferred length allocatable CHARACTER, then all output arguments must be deferred length allocatable CHARACTER).
In all the cases above, the KIND of the deferred length CHARACTER dummy argument is the same as the KIND of the underlying store of character data used by the varying_string type - the value of which is in the ck named constant in the module.
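As an illustration of the SPLIT extension, the sketch below calls the generic with deferred length allocatable CHARACTER arguments only. It rests on two assumptions that the text does not confirm: that `ck` is the default character kind, and that the new specific keeps the ISO_VARYING_STRING argument order (STRING, WORD, SET, SEPARATOR) with SET accepting a default kind character literal.

```fortran
program split_deferred_demo
  use aniso_varying_string   ! the module described in this document
  implicit none

  ! Assumption: ck is the default character kind, so these declarations
  ! match the kind required by the deferred length specific.
  character(:), allocatable :: text, word, sep

  text = 'key=value'

  ! STRING, WORD and SEPARATOR are all deferred length allocatable.
  call split(text, word, '=', sep)

  print "(A)", 'word:      [' // word // ']'
  print "(A)", 'separator: [' // sep // ']'
  print "(A)", 'remainder: [' // text // ']'
end program split_deferred_demo
```

Under the ISO_VARYING_STRING definition of SPLIT, the characters before the separator are returned in WORD, the separator itself in SEPARATOR, and the remainder in STRING.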
Defined input/output is currently supported through bindings on the varying_string type, due to compiler issues. These may be changed to stand-alone interfaces at some future time.
Behaviour for defined input, in some instances, depends on the current setting of the changeable connection mode for the decimal separator. The setting of this mode cannot be determined by the procedure implementing defined input for the varying_string type when the input is from an internal unit (a READ statement reading from a character variable), in which case the mode is always taken to be the default of 'POINT'.
For list directed input (a format specification of *), the same rules as for list directed input for intrinsic CHARACTER data apply.
For namelist input, the same rules as for namelist input of intrinsic CHARACTER data apply. This includes the requirement that the string be delimited with either apostrophes or double quotes.
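For reference, the intrinsic rules that the list directed case mirrors can be seen with plain CHARACTER variables. This self-contained sketch does not use the module; the variable names and record contents are made up for illustration.

```fortran
program list_directed_character_demo
  implicit none

  character(20) :: a, b, c
  ! One record holding a delimited value and two undelimited values.
  character(*), parameter :: record = "'abc xyz' def, ghi"

  ! List directed read from an internal unit.
  read (record, *) a, b, c

  print "(A)", '[' // trim(a) // ']'   ! [abc xyz]
  print "(A)", '[' // trim(b) // ']'   ! [def]
  print "(A)", '[' // trim(c) // ']'   ! [ghi]
end program list_directed_character_demo
```

The delimited value keeps its embedded blank, while the undelimited values end at the next blank or comma.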
For input under an explicit format, with no character literal value accompanying the DT descriptor in the format specification, defined input looks for the next non-blank character within the current record only (contrast with list directed and namelist IO, where that search will advance records if necessary). If that first non-blank is an apostrophe or double quote, then further characters are processed as delimited input; otherwise that first non-blank and perhaps further characters are processed as undelimited input. If no non-blank character is found in the current record, then an end-of-record condition occurs, the resulting value is a zero length string and the file position is at the end of the record.
Examples for explicit formatting with no character literal in the format specification, considering the child input statement only, assuming the initial position is at the start of the record and that DECIMAL='POINT':
Record contents (delimited by backticks) | Value read | File position after read | Error or other conditions |
---|---|---|---|
` ` | '' | End of record | End-of-record condition |
` abcdef ` | 'abcdef' | After f | Nil |
` abc xyz ` | 'abc' | After c | Nil |
` "abc xyz" ` | 'abc xyz' | After closing double quote | Nil |
`abcdef` | 'abcdef' | End of record | Nil |
`"abc xyz"` | 'abc xyz' | End of record | Nil |
`abc,xyz` | 'abc' | After c | Nil |
` ,xyz` | '' | Before , | Nil |
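A short sketch of the plain DT descriptor in use follows. It assumes the module compiles and that defined input works with the compiler at hand. The scratch file, record contents and variable names are illustrative only, and NEWUNIT= is a Fortran 2008 feature.

```fortran
program dt_plain_demo
  use aniso_varying_string   ! the module described in this document
  implicit none

  type(varying_string) :: first, second
  integer :: unit

  ! Write one record to a scratch file, then read it back.
  open (newunit=unit, status='scratch', form='formatted', action='readwrite')
  write (unit, "(A)") ' abc "x y z" '
  rewind (unit)

  ! Two DT descriptors with no character literal: each looks for the
  ! next non-blank character within the current record.
  read (unit, "(DT,DT)") first, second
  close (unit)

  print "(A)", '[' // first%chars // ']'
  print "(A)", '[' // second%chars // ']'
end program dt_plain_demo
```

Going by the rules and table above, the first item would stop at the blank after `abc` and the second would be processed as delimited input; those expectations come from the table rather than from a verified run.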
The behaviour of the explicit format DT descriptor may be altered by modifiers in the optional string literal that may follow the DT descriptor in the format specification. Modifier keywords are not case sensitive. Modifiers must be separated by a single comma or semicolon. Blanks may be used freely outside of modifier keywords, integer literals and character literals. No modifier may appear more than once.
Modifier | Description |
---|---|
SKIPBLANK | Leading blank characters before the first non-blank character are skipped before determining whether the input is delimited or not. This modifier is assumed if any other modifier, apart from FIXED or NOSKIPBLANK, is present. NOSKIPBLANK must not also be provided. If the end of record is encountered before any non-blank character, then an end-of-record condition results. If the NODELIMITED modifier is not provided and the initial character is a quote or apostrophe, the input is treated as delimited, as discussed above; otherwise the input is treated as undelimited, with the conditions and characters that terminate input determined by the other modifiers. |
NOSKIPBLANK | Leading blank characters before the first non-blank character are not skipped before determining whether the input is delimited or not. If the first character read is a blank then the input is considered undelimited, in which case the leading blanks appear in the resulting value. SKIPBLANK must not also be provided. |
EOR | If the input is not delimited, input will be terminated by the end of record. This modifier is assumed in combination if any other modifier, apart from FIXED, is present. |
BLANK | If the input is not delimited, input will be terminated by the next blank encountered. |
SLASH | If the input is not delimited, input will be terminated by the next / encountered. |
NODELIMITED | The input is always considered undelimited - any leading quote or apostrophe characters in the input are considered part of the value. |
COMMA | If the input is not delimited, input will be terminated by the next , encountered. |
SEMICOLON | If the input is not delimited, input will be terminated by the next ; encountered. |
NONDECIMAL | If the input is not delimited, input will be terminated by whatever is the appropriate separator character given the current DECIMAL mode, i.e. if the DECIMAL mode is 'POINT', this is equivalent to the COMMA modifier, otherwise if the DECIMAL mode is 'COMMA', this is equivalent to the SEMICOLON modifier. |
DELIM(str) | str is a character literal, in the usual form of such a literal embedded in a format specification that is itself a character literal. If the input is not delimited, input will be terminated by the end of record or by the appearance of any character from the set nominated by str. |
FIXED(n) | n is an unsigned integer literal without a kind specifier. No other modifiers may be provided. An attempt is made to read n characters. If fewer than n characters remain in the record, an end-of-record condition occurs and the resulting value consists of the characters that remained in the record; otherwise the length of the resulting string will be n. The input is never considered delimited. |
The optional v_list sequence of integers that may follow the DT edit descriptor is not used, and must not be present.
An edit descriptor of DT without any following literal is equivalent to a literal of 'BLANK,SLASH,NONDECIMAL', with SKIPBLANK and EOR modifiers implicit.
Some examples, considering the formatted input from the perspective of a child input statement and assuming that the file position before input is the start of the record:
Format specification | Record contents (content delimited by backticks) | Value read | File position after read | Error or other conditions |
---|---|---|---|---|
DT'EOR' | ` ` | ' ' | End of record | Nil |
DT'EOR' | `` | '' | End of record | End-of-record condition |
DT'BLANK' | ` ` | '' | Before the first blank (unchanged) | Nil |
DT'FIXED(3)' | `abcdef` | 'abc' | After c | Nil |
DT'FIXED(6)' | `abcdef` | 'abcdef' | After f (end of record) | Nil |
DT'FIXED(9)' | `abcdef` | 'abcdef' | End of record | End-of-record condition |
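The modifiers can be combined within one format specification, as in the sketch below. It assumes that a read terminated by a comma leaves the comma unread (as the `abc,xyz` row of the first table suggests for the default case), so a 1X descriptor is used to step over it. The record contents and variable names are illustrative.

```fortran
program dt_modifier_demo
  use aniso_varying_string   ! the module described in this document
  implicit none

  type(varying_string) :: key, rest
  integer :: unit

  open (newunit=unit, status='scratch', form='formatted', action='readwrite')
  write (unit, "(A)") 'alpha,beta gamma'
  rewind (unit)

  ! DT'COMMA' : undelimited input ends at the next comma or end of record.
  ! 1X        : step over the comma (assumed to be left unread).
  ! DT'EOR'   : undelimited input runs to the end of the record.
  read (unit, "(DT'COMMA',1X,DT'EOR')") key, rest
  close (unit)

  print "(A)", '[' // key%chars // ']'
  print "(A)", '[' // rest%chars // ']'
end program dt_modifier_demo
```

Because the format specification is itself a character literal delimited by double quotes, the modifier literals can be written with apostrophes without any doubling.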
For output, the varying string object must have been previously defined.
For list directed output, if the current delimiter changeable connection mode (e.g. as given by a DELIM= specifier in a WRITE statement) is 'APOSTROPHE' or 'QUOTE', the appropriately delimited string is written. Otherwise an undelimited string is written, which may not be suitable for reading back in using list directed input.
For namelist output, a delimited string is always written, using the value of the delimiter changeable connection mode as appropriate, or using a double quote character as the delimiter if the current delimiter mode is 'NONE'.
For output under an explicit DT format, an undelimited string is always written. The optional character literal and v_list integer lists that may follow the DT edit descriptor must not be present.
Formatted defined output does not handle the situation where the size of the record remaining is insufficient to hold the formatted representation of the value of the varying_string.
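The formatted output cases above might be exercised as follows. This is a minimal sketch that assumes the module is available and that defined output works with the compiler in use; the value written is arbitrary.

```fortran
program output_demo
  use aniso_varying_string   ! the module described in this document
  implicit none

  type(varying_string) :: vs

  vs = 'two words'            ! the object must be defined before output

  ! Explicit DT format: an undelimited string is written.
  write (*, "(DT)") vs

  ! List directed output with a delimiter mode set for this statement.
  write (*, *, delim='quote') vs
  write (*, *, delim='apostrophe') vs

  ! List directed output under the unit's own delimiter mode. If that
  ! mode is 'NONE', the value is written undelimited.
  write (*, *) vs
end program output_demo
```

Only the delimited forms are suitable for reading back with list directed input when the value contains blanks.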
Unformatted defined input and output are supported. For output, the varying string object must have been previously defined.
For this implementation the representation consists of a default integer with the number of characters in the varying string, followed by the character data for the varying_string. However, the unformatted representation should be regarded as implementation detail that is not portable between implementations.
The IOLENGTH generic interface has been added for a function that has a single argument named STRING that is a scalar varying_string with an INTENT of IN. The function result is the number of file storage units required for unformatted output of the varying_string value.
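A round trip through an unformatted file, together with a call to IOLENGTH, could look like the sketch below. The scratch file and variable names are illustrative, NEWUNIT= is a Fortran 2008 feature, and the function result is printed with list directed output because only its meaning, not its exact value, is stated above.

```fortran
program unformatted_demo
  use aniso_varying_string   ! the module described in this document
  implicit none

  type(varying_string) :: original, copy
  integer :: unit

  original = 'abcdef'         ! must be defined before output

  ! Size of the unformatted representation, in file storage units.
  print *, 'file storage units: ', iolength(original)

  ! Round trip through a scratch file using defined unformatted I/O.
  open (newunit=unit, status='scratch', form='unformatted', action='readwrite')
  write (unit) original
  rewind (unit)
  read (unit) copy
  close (unit)

  print "(A)", '[' // copy%chars // ']'
end program unformatted_demo
```

Since the representation is an implementation detail, a file written this way should only be read back by a program built against the same implementation.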
The appropriate values of IOSTAT_END and IOSTAT_EOR from the ISO_FORTRAN_ENV intrinsic module are provided for the IOSTAT dummy argument for end-of-file and end-of-record conditions respectively.
A unit connected as formatted stream may be provided for the GET, PUT, and PUT_LINE subroutines. If it encounters an incomplete final record, the GET subroutine will define the variable associated with the STRING dummy argument to be a zero length string and define the variable associated with the IOSTAT argument to be the value of IOSTAT_END from the ISO_FORTRAN_ENV intrinsic module. If the file is not connected for stream access, the variable associated with the STRING dummy argument will always be defined as a zero length string on end of file.
The CHAR function is implemented with a single specific procedure that has an optional second argument, rather than the previous arrangement of two specific procedures, one with one argument and one with two. The optional argument implementation is consistent with the ISO_VARYING_STRING specification, though that specification may have been unintentional.
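From the caller's side, both forms of reference remain available. The sketch below assumes the behaviour of the optional LENGTH argument given in ISO_VARYING_STRING, where the result is truncated or blank padded to the requested length.

```fortran
program char_demo
  use aniso_varying_string   ! the module described in this document
  implicit none

  type(varying_string) :: vs

  vs = 'abcdef'

  ! Without the optional argument the result has the length of the string.
  print "(A)", '[' // char(vs) // ']'

  ! With the optional argument the result has the requested length.
  print "(A)", '[' // char(vs, 3) // ']'
  print "(A)", '[' // char(vs, 8) // ']'
end program char_demo
```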
The procedures for input/output of a varying_string, namely GET, PUT and PUT_LINE, all take a further optional argument named IOMSG after the pre-existing IOSTAT argument. This argument is a default CHARACTER scalar with assumed length and an INTENT of INOUT. If IOSTAT is non-zero, IOMSG is defined with an explanatory message for the error, end-of-file or end-of-record condition; otherwise its definition status is not changed.
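The IOSTAT and IOMSG arguments might be combined in a line reading loop along these lines. The sketch assumes the ISO_VARYING_STRING form of GET that takes a unit number first. It also assumes that reading a complete record reports IOSTAT_EOR, which is therefore treated as success. The scratch file, message length and variable names are illustrative.

```fortran
program read_lines_demo
  use, intrinsic :: iso_fortran_env, only: iostat_end, iostat_eor
  use aniso_varying_string   ! the module described in this document
  implicit none

  type(varying_string) :: line
  character(200) :: msg
  integer :: unit, stat

  ! Two records in a scratch file (format reversion starts a new record).
  open (newunit=unit, status='scratch', form='formatted', action='readwrite')
  write (unit, "(A)") 'first line', 'second line'
  rewind (unit)

  do
    call get(unit, line, iostat=stat, iomsg=msg)
    if (stat == iostat_end) exit                 ! no more records
    if (stat /= 0 .and. stat /= iostat_eor) then
      ! msg is only defined when stat is non-zero.
      print "(A)", 'read failed: ' // trim(msg)
      exit
    end if
    print "(A)", '[' // line%chars // ']'
  end do

  close (unit)
end program read_lines_demo
```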
Questions, queries and quibbles can be sent to ff08@megms.com.au.