The SMLofNJ structure is a miscellaneous collection of non-standard functions supplied with the SML/NJ compiler. In version 110.0.7 it provides:
the manipulation of continuations with a Scheme-like call/cc API.
an interval timer to deliver a periodic trigger to an application.
a little control over the garbage collector.
execution time profiling.
some operating system information.
utilities for lazy evaluation of a function.
weak pointers for the garbage collector.
exporting the heap. This has already been discussed in the section called Assembling the Hello World Program in Chapter 2.
access to an exception history list for debugging.
The call/cc API is in the SMLofNJ.Cont structure. It is described in more detail in the section called Continuations in Chapter 6.
You can set the interval timer to produce alarm signals (Signals.sigALRM) at periodic intervals. This can be used to trigger activity in your application. The Concurrent ML library uses it to trigger pre-emptive scheduling of threads, so you won't be able to use the interval timer if you are using Concurrent ML. If you aren't, you could write something like the following program.
fun alarm_handler(signal, n, cont) =
let
in
    print(concat["tick at ", Time.toString(Time.now()), "\n"]);
    cont
end

fun main(arg0, argv) =
let
    fun loop() = (Signals.pause(); loop())
in
    Signals.setHandler(
        Signals.sigALRM,
        Signals.HANDLER alarm_handler);

    SMLofNJ.IntervalTimer.setIntTimer (SOME(Time.fromSeconds 1));
    loop();
    OS.Process.success
end
By returning a different continuation you can have your program switch to different code on each clock tick.
The SMLofNJ.Internals.GC structure provides two functions for control of garbage collection. Calling SMLofNJ.Internals.GC.doGC n with n = 0 will trigger a minor collection. With a large value of n, say 10, it will trigger a major collection of all of the generations to reduce the memory usage to a minimum.
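As a minimal sketch of how doGC might be used, the following wraps a computation and requests a major collection once it has finished. The names GC, withFullGC and total are illustrative only and do not come from the library.

```sml
(* Sketch only: SMLofNJ.Internals.GC is specific to SML/NJ. *)
structure GC = SMLofNJ.Internals.GC

(* Run f, then ask for a collection of all generations.
   withFullGC is a hypothetical helper, not a library function. *)
fun withFullGC (f: unit -> 'a) : 'a =
let
    val result = f()
in
    GC.doGC 10;     (* large n: major collection, as described above *)
    result
end

(* Build a temporary list, sum it, then collect the garbage. *)
val total = withFullGC (fn () =>
                foldl (op +) 0 (List.tabulate(1000, fn n => n)))
```

Whether forcing a collection like this is worthwhile depends on the program. It mainly makes sense at a quiet point after a burst of allocation.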
You can also turn on or off the collection tracing messages. These are off by default in your programs. Calling SMLofNJ.Internals.GC.messages true will turn them on. You will see messages looking like
GC #0.1.2.5.6.43:   (60 ms)
GC #1.2.3.6.7.66:   (60 ms)
A message is produced for each major collection. The numbers show the number of collections that have been performed in each generation. The oldest generation is on the left. The right-most number is the number of minor collections. The time is the duration of the major collection. The messages can give you some idea of the amount of memory activity in your program and the typical pause times during the collections.
There is more discussion on the SML/NJ implementation of garbage collection in the section called Garbage Collection Basics in Chapter 7.
You access execution time profiling through the Compiler.Profile structure, which is separate from the SMLofNJ structure. However, the profiling uses the low-level control functions in SMLofNJ.Internals.ProfControl.
To get profiling you have to compile your program for profiling. Then when it runs it must explicitly turn on the profiling. To compile with profiling using the Compilation Manager you need an extra command. For the example profile program:
> CM_ROOT=profile.cm sml
Standard ML of New Jersey, Version 110.0.7, September 28, 2000
- Compiler.Profile.setProfMode true;
- CM.make();
It is a good idea to force the recompilation of all of your source when you do this. A simple way to do this is to delete the CM directories in each of the source directories of your program. This deletes the cached .bin files and they have to be recompiled.
Here is the profile program which just sorts a large list.
fun main(arg0, argv) =
let
    fun sort() =
    let
        val gen = Rand.mkRandom 0w123
        val data = List.tabulate(100000, (fn _ => gen()))
        val sorted = ListMergeSort.sort (op >) data
    in
        ()
    end
in
    (* SMLofNJ.Internals.GC.messages true; *)
    Compiler.Profile.setTimingMode true;
    sort();
    Compiler.Profile.setTimingMode false;
    Compiler.Profile.report TextIO.stdOut;
    OS.Process.success
end
Here are the performance results on my 1GHz Athlon machine.
%time  cumsec   #call  name
42.85     .18       1  Main.<tempStr>.main.main.sort.sort
26.19     .29       0  Major GC
23.80     .39       0  Minor GC
 9.52     .42  100000  Main.<tempStr>.main.main.sort.sort.data
  .00     .42       1  Main.<tempStr>.main.main
This shows the program spent 9% building the list of 100000 random numbers, 42% in the sort, and 50% doing garbage collection of all of that data. Memory usage peaked at 11MB. SML/NJ likes to use lots of heap space to save on garbage collection. I can get some control over the peak heap size by changing the allocation size used by the garbage collector. The default is 256KB. You can change this by adding a command line argument to the SML runtime in the script that runs the program. For example if I add @SMLalloc=1024 then the allocation size is 1MB and the peak heap usage goes up to 22MB but the collection time drops to 29%. If I reduce it to 100KB then the peak usage is around 9MB but the collection rises to 64%.
The SMLofNJ.SysInfo structure provides a collection of functions to return the configuration of the compiler and the platform. If you know that it's a Unix system then the Posix API is likely to be available. If you want to know the endian-ness then the target architecture will tell you. In the 110.0.7 version of SML/NJ the getOSVersion function does not work. It always returns "<unknown>".
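A short sketch of querying the platform might look like the following. The names SI and describe are illustrative. The wildcard case stands in for the other os_kind constructors, whose exact set I am assuming varies between SML/NJ versions.

```sml
(* Sketch only: SMLofNJ.SysInfo is specific to SML/NJ. *)
structure SI = SMLofNJ.SysInfo

(* Build a one-line description of the platform. *)
fun describe() =
let
    val kind =
        case SI.getOSKind() of
          SI.UNIX => "Unix"         (* Posix API likely to be available *)
        | _       => "non-Unix"     (* other constructors are not enumerated here *)
in
    concat["OS kind: ", kind,
           ", OS name: ", SI.getOSName(),
           ", target: ", SI.getTargetArch()]
end
```

Calling print(describe() ^ "\n") shows the result. The strings returned for the OS name and architecture depend on the installation.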
Lazy suspensions allow you to "memoise" a function. This means that the function is evaluated at most once. On subsequent calls the result from the first call is returned. This could be useful to initialise an imperative data structure only if actually needed at runtime.
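The "at most once" behaviour can be seen in a small sketch that counts how often the delayed function runs. The names count, susp, a and b are illustrative only.

```sml
(* Counts how many times the delayed function body is evaluated. *)
val count = ref 0

(* Nothing is evaluated here; the function is only wrapped up. *)
val susp = SMLofNJ.Susp.delay (fn () => (count := !count + 1; 6 * 7))

val a = SMLofNJ.Susp.force susp     (* first force evaluates the function *)
val b = SMLofNJ.Susp.force susp     (* second force reuses the saved result *)
```

After both calls to force, a and b hold the same value and the counter has been incremented only once.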
In the getopt example of the section called Using a Hash Table in Chapter 2 the option table was built when the Global structure was compiled. It appeared in the heap file. This would be inefficient if the table is very large. Also if the data structure you want to build depends on some parameter supplied at run-time then you need to build the data structure imperatively after the program starts running. You can do this with a reference variable but a suspension is more convenient.
The following example uses the string table structures from the section called Using a Hash Table in Chapter 2.
structure Table: sig
    val set:    string * string -> unit
    val get:    string -> string option
end =
struct
    open Common

    type Table = string STRT.hash_table

    val susp = SMLofNJ.Susp.delay(fn () =>
                    STRT.mkTable(101, NotFound): Table)

    fun table() = SMLofNJ.Susp.force susp

    fun set (k, v) = STRT.insert (table()) (k, v)

    fun get k = STRT.find (table()) k
end
I've defined a Table structure with get and set functions. I've used an unnamed signature constraint to export only these functions. The value susp is built during the compilation of the structure and leaves a suspension in the heap file. This suspension will be forced to a concrete value the first time that either of the get or set functions is called. This will cause the table to be built. The same table will be used by all calls to the get and set functions, which is important since it is updated in place.
The type constraint on the mkTable call is needed to fix the type of the table for the suspension. The value restriction rule of SML does not allow a value at the level of a structure declaration to have a polymorphic type (i.e. one with an unresolved type variable).
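The same rule can be seen in a smaller setting, sketched here with a reference to a list instead of a hash table. The names cell and push are illustrative only. Without the annotation, ref [] would have the type 'a list ref with nothing to resolve the type variable.

```sml
(* val cell = ref []   <- no annotation: 'a cannot be generalised here *)

(* The constraint fixes the element type, just as ": Table" does above. *)
val cell : int list ref = ref []

fun push n = cell := n :: !cell

val () = push 1
val () = push 2
```

The constraint could equally be written on the expression, as in ref ([]: int list), which is the style used for the mkTable call.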
Normally the garbage collector deems a heap object to be garbage once all pointers to the object have been deleted. Sometimes it is convenient to retain a pointer to an object while still allowing the object to be collected. For example you may have a cache of objects that you have fetched from some file. If memory becomes tight you may want the objects to be removed from the cache and collected, since you can fetch them again if you really need them. (Unfortunately you can't prioritise the collection. All weakly referenced objects in a generation will be collected.)
A weak pointer is a pointer that is ignored by the garbage collector when deciding whether a heap object is garbage. Normal pointers are called strong pointers. Once all of the strong pointers have disappeared the object can be collected. Then all weak pointers to that object are marked invalid to indicate that they now dangle. You can test if the weak pointer is still valid.
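A minimal sketch of creating and testing a weak pointer follows. The names W, cell, wp, peek and bump are illustrative only. Here the object stays alive because the binding cell is a strong pointer to it, so the sketch does not demonstrate a collection actually happening.

```sml
(* Sketch only: SMLofNJ.Weak is specific to SML/NJ. *)
structure W = SMLofNJ.Weak

val cell : int ref = ref 42             (* strong pointer to the object *)
val wp   : int ref W.weak = W.weak cell (* weak pointer to the same object *)

(* Test whether the weak pointer is still valid and fetch the contents. *)
fun peek() =
    case W.strong wp of
      SOME r => SOME (!r)   (* object still exists *)
    | NONE   => NONE        (* object has been collected *)

(* Using cell here keeps the strong pointer in play. *)
fun bump() = cell := !cell + 1
```

When the last strong pointer to an object goes away, a later call like peek() may return NONE, but exactly when depends on when the collector runs.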
Another use for weak pointers is to do some finalisation after the object has been collected. If you can arrange to scan all weak pointers after each collection then you can detect which objects have been collected because their weak pointers will be invalid. You can trigger a scan of the weak pointers with a signal handler for the sigGC pseudo-signal. (See the section called Signals).
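The scanning step could be sketched as follows, assuming a hypothetical registry that pairs a label with a weak pointer to a reference. The names registry, register, scan and kept are all illustrative. Installing scan in a sigGC handler is left out of the sketch.

```sml
(* Sketch only: SMLofNJ.Weak is specific to SML/NJ. *)
structure W = SMLofNJ.Weak

(* Hypothetical registry of objects that need finalisation. *)
val registry : (string * int ref W.weak) list ref = ref []

fun register (label, cell) =
    registry := (label, W.weak cell) :: !registry

(* Return the labels of collected objects and drop them from the registry.
   The caller would perform the finalisation for each label returned. *)
fun scan() =
let
    fun alive (_, wp) = isSome (W.strong wp)
    val (live, dead) = List.partition alive (!registry)
in
    registry := live;
    map (fn (label, _) => label) dead
end

(* An object that stays reachable through a strong pointer. *)
val kept = ref 1
val () = register ("kept", kept)
```

Each call to scan() reports only the objects that have been collected since the previous scan, because dead entries are removed as they are found.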
There is a problem with weak pointers and compiler optimisation. Since, with immutable data structures, copy by value and copy by reference are the same, there can be some ambiguity about whether the various pointers are all pointing to the same copy of an object. You should only use weak pointers to reference variables. This ensures that there is no hidden replication of the object pointed to by the reference variable.
Here is a mickey-mouse example that caches the Unix environment variables in a global hash table for faster access. This of course assumes that the environment isn't changed while the program runs (which it probably won't do since there is no putenv operation).
structure Environ: sig
    val get:    string -> string option
end =
struct
    open Common
    open SMLofNJ.Weak

    type Table = string STRT.hash_table

    val cache: (Table option) ref weak ref = ref (weak (ref NONE))

    fun table() : Table =
    let
        (* This takes a NAME=VALUE string *)
        fun fill tbl env =
        let
            val ss = Substring.all env
            val str = Substring.string
            val fields = Substring.fields (fn c => c = #"=") ss
        in
            case fields of
              [n, v] => STRT.insert tbl (str n, str v)
            | [n]    => STRT.insert tbl (str n, "")
            | _      => ()      (* unrecognisable *)
        end

        fun build() =
        let
            val tbl = STRT.mkTable(101, NotFound)
        in
            print "building\n";
            app (fill tbl) (Posix.ProcEnv.environ());
            cache := weak (ref (SOME tbl));
            tbl
        end
    in
        case strong (!cache) of
          NONE => build()       (* has been collected *)
        | SOME rtbl =>
        (
            case !rtbl of
              NONE     => build()   (* is not yet built *)
            | SOME tbl => tbl       (* table is available *)
        )
    end

    fun get k = STRT.find (table()) k
end
Instead of a suspension as I did in the section called Lazy Suspensions I've used a reference variable. With one of those I can have the variable initialised to the NONE state so that the table isn't built until called for. The table function fetches the table or builds/rebuilds it if it is not available. The weak function creates a weak pointer to the reference. The strong function returns the reference if it is still available. Since the type of strong is 'a weak -> 'a option, the case analysis works on an option holding a reference to a Table option, which gives us the three cases. After building the table a weak reference to it is assigned to the cache. Note the extra ref between the weak reference and the table. This is just to ensure that we only have weak references to ref types.
Here is the main function that I use to test it. I build a big list in between two calls to get an environment variable. This triggers a garbage collection and I can see that the build is done twice. If I comment out the call to data then the build only happens once.
fun main(arg0, argv) =
let
    fun data() = ignore(List.tabulate(100000, fn n => n))
in
    SMLofNJ.Internals.GC.messages true;
    print(concat["The PATH is ", valOf(Environ.get "PATH"), "\n"]);
    data();
    print(concat["The PATH is ", valOf(Environ.get "PATH"), "\n"]);
    OS.Process.success
end
Access to the exception history list is a new feature which has crept into the compiler in the 110.0 version. It shows the source location where the exception was raised. Here is an example of it in the top-level uncaught exception handler.
fun main(arg0, argv) =
let
    fun bad() = raise Fail "bye"
in
    bad();
    OS.Process.success
end
handle x =>
(
    toErr(concat["Uncaught exception: ", exnMessage x, " from\n"]);
    app (fn s => (print "\t"; print s; print "\n"))
        (SMLofNJ.exnHistory x);
    OS.Process.failure
)