Analyzing Stock Price Time Series with Fortran Arrays, Part 2
From Modern Fortran by Milan Curcic
In this part of the article, we’ll dive deeper into explicit allocation and slicing of Fortran arrays, and demonstrate how to use these techniques to find best and worst performing stock.
Part 2: Allocating, indexing, and slicing arrays for stock price analysis
This article is Part 2 of a sequence on analyzing stock price time series with Fortran arrays.
In Part 1, I covered the basics of declaring and initializing dynamic Fortran arrays. I also introduced the array constructors, which provide an easy way to allocate and initialize an array in a single line of code. If you haven’t yet read Part 1, I suggest that you go back and read it before moving on. In Part 2, we’ll learn how allocation and deallocation of Fortran arrays works in depth, allowing you to allocate arrays with arbitrary ranges (start and end indices). You’ll also learn how to slice the arrays in any direction and with custom stride to subset exactly the elements that you need. Finally, in this part we’ll apply this knowledge to implementing a solution to our first challenge — finding the best and worst performing stocks.
Allocating arrays of certain size or range
In Part 1, we’ve learned how to declare and initialize dynamic arrays. However, what if we need to assign values to individual array elements, one by one, in a loop? This will be the case as we load the data from CSV files into arrays — we’ll iterate over records in files, and assign values to array elements one at a time. However, we don’t really have a way to initialize the data arrays like we did before with
stock_symbols. Note that implicitly allocating by assigning an empty array
[integer ::] or
[real ::] won’t work here because we may need to index elements of an array in some other order than just appending values. This calls for a more explicit mechanism to allocate the array without assigning known values to it:
real, allocatable :: a(:) ! declare a dynamic array a
integer :: im = 5 ! allocate(a(im)) !
allocate memory for array a with im elements
The above statement tells the program to reserve memory for the array
a of size
im, in this case 5. When invoked like this,
a will by default have the lower bound of
1, and upper bound of
im. The lower bound of
However, Fortran doesn’t impose a constaint to the start index being 1, unlike Python, where the first index is always 0. You can specify the lower and upper bounds in the allocation statement:
integer :: is = -5, ie = 10
allocate(a(is:ie)) ! Allocate a with range from is to ie inclusive
Notice that I know used a colon (
ie to specify the range. This range is inclusive (unlike in Python!), so the size of
a is now
ie - is + 1, in this case 16. You can use intrinsic functions
ubound to get the lower and upper bound, respectively, of any array.
Allocating an array from another array
It’s also possible to dynamically allocate an array based on the size of another array. The
allocate statement accepts two optional arguments:
mold– a variable or an expression that has the same type as the object being allocated.
source– equivalent to
mold, except that the values of
sourceare used to initialize the object being allocated.
For example, allocating using
b using mold will reserve the space in memory for
a, but will not initialize its elements:
real, allocatable :: a(:), b(:)
allocate(a, mold=b) ! allocate a with same range and size as b
a = 0
However, if we allocate
source, it will be allocated and initialized with values of
real, allocatable :: a(:), b(:)
b = [1.0, 2.0, 3.0]
allocate(a, source=b) ! allocate and initialize a from b
Style Tip: No matter how you choose to allocate your arrays, always initialize them immediately after allocation. This will minimize the chance of accidentally using uninitialized arrays in expressions. While Fortran will allow you to do this, you’ll likely end up with gibberish results.
You may have noticed that when describing the array constructors earlier, I initialized arrays without explicitly allocating them with an
allocate statement. Why did I do that? You may rightfully ask, do I need to explicitly allocate arrays or not? Since Fortran 2003, there has been a convenient feature of the language called allocation on assignment.
Automatic allocation on assignment: If you assign an array to an allocatable array variable, the target array variable is automatically allocated with the correct size to match the array on the right-hand side. The array variable can be already allocated or not. If it is, it will be re-allocated if its current size differs from that of the source array. A great example to play with this is appending elements to an array on the fly:
integer, allocatable :: a(:) a = [integer ::] ! create an empty array 
a = [a, 1] ! append 1 to a, now 
a = [a, 2] ! append 2 to a, now [1, 2]
a = [a, 2 * a] ! [1, 2, 2, 4]
This feature is particularly useful when trying to assign an array that is a result of a function, and whose size is not known ahead of time.
There is another important difference between explicit allocation with the
allocate statement and allocation on assignment. The former will trigger a runtime error if issued twice, that is, if you issue an
allocate statement on an object that is already allocated. On the other side, the latter will gracefully re-allocate the array if already allocated. To be able to effectively reuse dynamic arrays, Fortran gives as a counterpart to the
allocate statement which allows us to explicitly clear the object from memory.
Cleaning up after use
When we are done working with the array, we can clean it from memory like this:
deallocate(a) ! clear a from memory
Automatic deallocation: An allocatable array is automatically deallocated when it goes out of scope. For example, if you declare and allocate an array inside of a function or subroutine, it will be deallocated on return.
a must be allocated again before using it in the right-hand side of expressions. We’ll apply this mechanism to reuse arrays between different stocks.
Much like it is an error to allocate an object that is already allocated, it is also an error to deallocate an object that is not allocated! In the next section I explain how you can check the allocation status so that you never erroneously allocate or deallocate and object twice. Otherwise, there is no restriction with regard to whether the array has been initialized or not. You are free to deallocate an uninitialized array, for example if you learn that the arrays is not of the expected size, or similar.
Style Tip: Deallocate all allocatable variables when you are done working with them.
Figure 2 illustrates a typical life cycle of a dynamic array.
We first declare the array as
allocatable. At this point, the array is not yet allocated in memory and its size is unspecified. When ready, we issue the
allocate statement to reserve a chunk of memory to hold this array. This is also where we decide the size of the array or the start and end indices (in this example, 3 and 8). If not allocating from another source array, the values will be uninitialized. We thus need to initialize the array before doing anything else with it. Finally, once we’re done working with the array, we issue the
deallocate statement to release the memory that was holding the array. The status of the array is now back to unallocated, and is available for allocation. A dynamic array can be reused like this any number of times, even with different sizes or start and end indices. This is exactly what we’ll do in our stock price analysis app. For each stock, we’ll allocate the arrays, use them to load the data from files, work on them, and then deallocate them before passing them on to the next stock.
Checking for allocation status
It will at times be useful to know the allocation status of a variable, that is, whether it’s currently allocated or not. To do this, we can use the intrinsic
real, allocatable :: a(:)
print *, allocated(a) ! will print “F”
print *, allocated(a) ! will print “T”
print *, allocated(a) ! will print “F”
Trying to allocate an already allocated variable, or to deallocate a variable that is not allocated, will trigger a run-time error.
Style Tip: Always check for allocation status before explicitly allocating or deallocating a variable.
Catching allocation and deallocation errors
Your allocations and deallocations will occasionally fail. This can happen if you try to allocate more memory than available, if you try to allocate an object that is already allocated, or free an object that has been freed. When this happens, the program will abort. However, the
allocate statement also comes with built-in exception handling if you want finer control over what happens when the allocation fails:
allocate(u(im), stat=stat, errmsg=err)
errmsg are optional arguments:
integerthat indicates the status of the
statwill be zero if allocation was successful, otherwise, it will be a non-zero positive number.
characterstring that contains the error message if an error occurred (such as
statbeing non-zero), and is undefined otherwise.
By using the built-in exception handling, you get the opportunity to decide how should the program proceed if the allocation fails. For example, if there is not enough memory to allocate a large array, perhaps we can split the work into smaller chunks. Even if you want the program to stop on allocation failure, this approach lets you handle things gracefully and print a meaningful error message.
Style Tip: If you want control over what happens if (de)allocation fails, use
errmsg in your
deallocate statements to catch any errors that may come up. Of course, you’ll still need to tell the program what to do if an error occurs, for example, stop the program with a custom message, print a warning message and continue running, or try to recover is some other way.
I think we should use the built-in exception handling in our stock analysis app. However, we’re going to need this for several arrays. This seems suitable to implement once in a subroutine, and then reuse it as needed. Explicitly allocating and deallocating arrays can be quite tedious. This is especially true if you decide to make use of the built-in exception handling. If you’re working with many different arrays at a time, this can quickly build up to a lot of boilerplate code.
Let’s implement simple subroutines
free that allocate and deallocate, respectively, an input array, and also handle exceptions. Both subroutines should use the
errmsg arguments to catch and report any errors if they occur. Once implemented you should be able to allocate and free your arrays like this:
call alloc(a, 5)
! do work with a
Let’s start with the allocator subroutine
alloc. For the key functionality to work our subroutine needs to:
- Check if input array is already allocated, and if yes, deallocate it before proceeding.
- Allocate the array with input size
- If an exception occurs during allocation, abort the program and print the error message to screen.
Here’s the implementation:
subroutine alloc(a, n)
real, allocatable, intent(in out) :: a(:)
integer, intent(in) :: n
integer :: stat
character(len=100) :: errmsg
if (allocated(a)) call free(a)
allocate(a(n), stat=stat, errmsg=errmsg)
if (stat > 0) error stop errmsg
end subroutine alloc
Now, let’s take a look at the implementation of the
real, allocatable, intent(in out) :: a(:)
integer :: stat
character(len=100) :: errmsg
if (.not. allocated(a)) return
deallocate(a, stat=stat, errmsg=errmsg)
if (stat > 0) error stop errmsg
end subroutine free
The code is very similar to
alloc except that here, at the start of the executable section of the code, we check if
a is already allocated. If not, our job here is done and we can return immediately.
These subroutines are also part of the
stock-prices repository. You can find them in
stock_prices/src/mod_alloc.f90, and are used by the CSV reader in
stock_prices/src/mod_io.f90. We’ll use these convenience subroutines to greatly reduce the boilerplate in the
Implementing the CSV reader subroutine
Having covered the detailed mechanics of allocating and deallocating arrays including the built-in exception handling, we finally arrive to implementing the CSV file reader subroutine:
subroutine read_stock(filename, time, open, high,&
low, close, adjclose, volume)
integer :: fileunit
integer :: n, nm
nm = num_records(filename) - 1
if (allocated(time)) deallocate(time)
allocate(character(len=10) :: time(nm))
call alloc(open, nm)
call alloc(high, nm)
call alloc(low, nm)
call alloc(close, nm)
call alloc(adjclose, nm)
call alloc(volume, nm)
open(newunit=fileunit, file=filename) ! open the file
read(fileunit, fmt=*, end=1) ! use read() to skip the CSV header
do n = 1, nm ! loop over records and store into array elements
read(fileunit, fmt=*, end=1) time(n), open(n),&
high(n), low(n), close(n), adjclose(n), volume(n)
1 close(fileunit) ! close file when done
end subroutine read_stock
To find the length of the arrays before I allocate them, I inquire the length of the CSV file using a custom function
num_records, defined in
stock-prices/src/mod_io.f90. If you’re wondering what the number
1 means in the
1 close(fileunit), it’s just a line label that Fortran uses if and when it encounters an exception in the
read(fileunit, fmt=*, end=1) statements. If you’re interested about how this function works, take a look inside
stock-prices/src/mod_io.f90. I won’t spend much time on the I/O-specific code here, as we just need it get going with the array analysis.
On every subroutine entry, the arrays
adjclose, and volume will be allocated with size
nm. The subroutine
alloc now seamlessly reallocates the arrays for us. Notice that we still use the explicit way of allocating and deallocating the array of time stamps. This is because we implemented the convenience subroutines
free that work on real arrays. Because of Fortran’s strong typing discipline, we can’t just pass an array of strings to a subroutine that expects an array of reals. We’ll learn later how to write generic procedures that can accept arguments of different type. For now, explicitly allocating the array of time stamps will do. Furthermore, we also need to specify the string length when allocating the
Having read the CSV files and loaded the stock price arrays with the data, we can move on the the actual analysis and fun with arrays.
Getting the number of lines in a text file: If you’re curious how the
num_records function is implemented, feel free to take a look in
stock-prices/src/mod_io.f90. This function opens a file and counts the number of lines by reading it line by line.
Indexing and slicing arrays
Did you notice that the stock data in the CSV files are ordered from most recent to oldest? This means that when we read it into arrays from top-to-bottom, the first element will correspond to the most recent stock price. Let’s reverse the arrays so that they are oriented in a more natural way, going forward in time with the index number. If we express the reverse operation as a function, we could apply it to any array like this:
adjclose = reverse(adjclose)
reverse function will prove useful for the other two objectives of the stock-prices app. Before implementing it, we need to know a few things about how array indexing and slicing works.
To select a single element, we enclose an integer index inside the parentheses, for example
adjclose(1) will refer to the first element of the array,
adjclose(5) to the fifth, and so on.
To select a range of elements, for example from fifth to tenth, use the start and end indices inside the parentheses, separated by a colon:
real, allocatable :: subset(:)
subset = adjclose(5:10)
In this case,
subset will be automatically allocated as an array with 6 elements, and values corresponding to those of
adjclose from index 5 to 10.
By default, the slice
adjclose(start:end) will include all elements between indices
end, inclusive. However, you can specify an arbitrary stride. For example,
adjclose(5:10:2) will result in a slice with elements 5, 7, and 9. The general syntax for slicing an array
a with a custom stride is:
stride are integer variables, constants, or expressions.
end can have any valid integer value, including zero and negative values.
stride must be a non-zero (positive or negative) integer.
Similar rules apply as for
strideis not given, its default value is 1.
start > endand
stride > 0, or if
start < endand
stride < 0, the slice is an empty array.
start == end, for example
a(5:5), the slice is an array with a single element. Be careful not to mistake this for
a(5)which is a scalar (non-array).
start equals the lower bound of an array, it can be omitted, and same is true if
end equals the upper bound of an array. For example, if we declare an array as
real :: a(10:20), then the following array references and slices all correspond to the same array:
a(::1). The last syntax from this list is particularly useful when you need to slice every
n-th element of the whole array – it’s as simple as
a(::n) If you have experience with slicing lists in Python, I bet this feels familiar.
Intermezzo: Reversing an array
Let’s write a function
reverse that accepts a real 1-dimensional array as input argument, and returns the same array in reverse order. We’ll use array slicing rules to perform the reversal.
We can solve this in only two steps and write it as a single-line function (not counting the declaration code). First, we know that since we are just reversing the order of elements, the resulting array will always be of same size as the input array. The size will also correspond to the end index of the array. Second, once we know the size, we can slice the input from last to first and use the negative stride to go backward:
pure function reverse(x)
real, intent(in) :: x(:) ! assumed-size input array
real :: reverse(size(x)) ! declare reverse with same size as x
reverse = x(size(x):1:-1) ! use negative stride to copy backwards end function reverse
Notice that our input array doesn’t need to be declared
allocatable. This is the so-called assumed-size array and it will take whatever size array is passed by the caller. We can use the information about the size directly when declaring the result array.
You can test your new function by reversing the input array twice and comparing it to itself:
print *, all(a == reverse(reverse(a))) ! should always print “T”
You may be wondering why make this a separate function at all when we can just do
x(size(x):1:-1) to reverse any array. There are two advantages to making this a dedicated
reverse() function. First, if you need to reverse an array more than a few times, the slicing syntax above soon becomes unwieldy. Every time you read it, there is an extra step in the thought process to understand the intention behind the syntax. Second, the slicing syntax is allowed only when referencing an array variable, and you can’t use it on expressions, array constructors, or function results. In contrast, you can pass any of those as an argument to
reverse(). This is why we can make such a test like
all(x == reverse(reverse(x))). Try it!
Take some time to play with various ways to slice arrays. Try different values of start, end, and stride. What happens if you try to create a slice that is bigger than the array itself? In other words, can you reference an array out-of-bounds?
Referencing array elements out of bounds: Be very careful to not reference array elements that are out of bounds! Fortran itself doesn’t forbid this, but you will either end up with an invalid value or with a segmentation fault, which can be particularly difficult to debug.
By default, compilers don’t check if an out-of-bounds reference occurs during run-time, but you can enable it with a compiler flag. Use
gfortran -fcheck=bounds and
ifort -check bounds for GNU and Intel Fortran compilers respectively. Keep in mind that this can result in significantly slower programs, so it’s best if used during development and debugging, but not in production.
Now that we understand how array indexing works, it’s straightforward to calculate the stock gain over the whole time series. It’s simply a matter of taking a difference between last and first element of the adjusted close price to calculate the absolute gain in USD:
adjclose = reverse(adjclose)
gain = (adjclose(size(adjclose)) - adjclose(1))
Here, I am using the intrinsic
size function, which returns the integer total number of elements, to reference the last element of the array. Like everything else we did before,
gain must be declared, in this case a
real scalar. The absolute gain, however, only tells us how much the stock grew over a period of time, but it doesn’t tell us anything about whether that growth is small or large relative to the stock price itself. For example, a gain from $1 to $10 per share is greater than the gain from $100 to $200 per share, assuming you invest $100 in either stock. In the former case, you will come out with $1000, whereas in the latter case you will come out with just $200! To calculate the relative gain in percent, we can divide the absolute gain by the initial stock price, multiply by 100 to get to percent, that is,
gain / adjclose(1) * 100. For brevity, I will also round the relative gain to the nearest integer using the intrinsic function
print *, symbols(n), gain, nint(gain / adjclose(1) * 100)
The output of the program is:
2000-01-03 through 2018-05-14
Symbol, Gain (USD), Relative gain (%)
AAPL 184.594589 5192
AMZN 1512.16003 1692
CRAY 9.60000038 56
CSCO 1.71649933 4
HPQ 1.55270004 7
IBM 60.9193039 73
INTC 25.8368015 89
MSFT 59.4120979 154
NVDA 251.745300 6964
ORCL 20.3501987 77
From this output, we can see that Amazon had the largest absolute gain of $1512.16 per share, and Hewlett-Packard had the smallest gain of only $1.55 per share. However, the relative gain is more meaningful then the absolute amount per share because it tells us how much has the stock gained relative to its starting price. Looking at relative gain, Nvidia had a formidable 6864% growth, with Apple being the runner up with 5192%. The worst performing stock was that of Cisco Systems (CSCO), with only 4% growth over this time period.
If you have cloned the stock-prices repo from Github, it’s straightforward to compile and run this program. From the
stock-prices directory, type:
You can read the full program in
We have now covered a lot of the nitty-gritty of how arrays work. We’ll use this knowledge in Part 3 of this article sequence to implement solutions to the remaining two stock price analysis challenges.
In this part, we continued from where we left of at the end of Part 1, and dug deeper into the mechanics of allocation and deallocation of dynamic Fortran arrays. We covered allocating arrays with arbitrary index ranges, and saw how we can slice and dice through the arrays with custom strides. In this article we also implemented the solution to the first of the three challenges. In the third and final part of this sequence, we’ll use built-in Fortran functions and whole-array arithmetic to calculate metrics such as moving average and standard deviation. We’ll use these to tackle the remaining two challenges — identifying which stocks are riskier than others at any given time, and finding good times to buy or sell stock.
About the author:
Milan Curcic is a meteorologist and oceanographer. A Fortran programmer since 2006, he has worked with teams from United States Navy and NASA on developing and improving Earth system prediction models. Milan has authored two general-purpose Fortran libraries and is currently working on a startup porting Fortran to the cloud for weather and ocean prediction.
Originally published at freecontent.manning.com.