Some updates: bugs fixing and CLAD.

Hello there,

it has been a very long time since I posted here, but most recently, thanks to a couple of pesky bug reports on HEΛP by Mirko Vukovic, and because I had a couple of days relatively free, I was able to go back to do some programming, fix (some of) the bugs and post here.

Here is the story.  There were two bugs which I had to deal with (there are more, of course).
  1. A bug triggered by CCL and its implementation of the Common Lisp Reader algorithm.
  2. A buglet due to missing supporting data (.css files in this case) in the deployment of the HEΛP generated documentation.
The first bug was quite difficult to track down and it boiled down to CCL bailing out on READ in an unexpected way (that is, with respect to other implementations).  As an aside, this is a problem of the standard non having a predefined condition for "error caused by the reader because it does not find a package"; LW has CONDITIONS:PACKAGE-NOT-FOUND-READER, but that is not standard, and some implementation just signal READER-ERROR or PACKAGE-ERROR.  The error was easy to "fix" once diagnosed: just don't process files that you know will be problematic, and HEΛP can already "exclude" such files.

The second bug was easier to diagnose, but the fix was more complicated (especially due to the NIH syndrome I suffer from).  The problem is that ASDF moves the compiled code around, but not auxiliary data, like in my case, .css files.  I could have followed what ASDF does, but I decided to go another way and came up with a small library I called Common Lisp Application Data (CLAD, because you need to "dress" your code).


By now, at least on Windows 10 and Mac OS X (and Linux), there is a a notion of an Application and Data Folder. The user version of this folder (as opposed to the system one) is ~/Library/ on Mac OS X and %USERPROFILE%\AppData\Roaming\ (this is the "roaming" profile in W10 parlance).  For Linux there are several semi-standards, one of them is the XDG base directory specification; in this case a common solution is to use the ~/.config/ folder.

CLAD assumes these "fixed" locations to create per-application or per-library subfolders of a "Common Lisp" one.  That is, CLAD ensures the presence of the following folders in your account.
  • Mac Os X: ~/Library/Common Lisp/
  • Windows 10: %USERPROFILE%\AppData\Roaming\Common Lisp\
  • Linux/Unix: ~/.config/common-lisp/ (in this case, I am following ASDF lead)
The library exports three simple functions,
  1. user-cl-data-folder, which returns the pathnames of the folders above.
  2. ensure-app-or-library-data-folder, which, as the name implies, ensures that a subfolder exists in the proper location.
  3. app-or-library-data-folder, which returns the pathname associated to a library or app.
A library or app can now set itself up by doing something like

(defparameter *helambdap-data-folder*
  (clad:ensure-app-or-library-data-folder "HELambdaP")
  "The user HELambdaP data folder.")

On Mac OS X, this results in the folder ~/Library/Common Lisp/HELambdaP; a library or an application can now rely on a clear space where to store "common" data files.  For HEΛP it solved the problem of where to find the .css files in a reliable place.

Trivial? Yes.
NIH syndrome? Of course.
Complete? No.
Useful? You be the judge of that.



CLAST: a Common Lisp AST and Code Walking library

I guess this is a good time to publicize the CLAST library I have been working on with Matteo Crespi. CLAST is a Common Lisp AST and Code Walking library that stands apart because it is geared at producing a source-level Abstract Syntax Tree (AST) of Common Lisp code.

Of course the usual issues with MACROEXPAND are all there, but I believe the choices made to handle it are quite sensible.

The library is still "in fiery", but most of the heavy lifting is done. The development branch is the most up-to-date one


ELS 2017: thank you!

Dear all
just got back for ELS 2017 in Brussels, which went very well; thanks especially to Didier Verna, Irene Durand and Alberto Riva.  It was a particularly good edition of the event.



Orangutans grok Common Lisp

This being the season to be jolly (and eat bananas), I produced a magical piece of software: a compiler and decompiler for the language Ook!  Which proves that orangutans grok Common Lisp.

You can try out this recreational and magical piece of software by downloading the library from common-lisp.net.  A brief introduction can be found, of course, in the library (if you can see it!)

Please send bananas and DO NOT say "monkey"!




(R)creational Common Lisp

In my real work, I (or better, my students and colleagues) use R to program our data analyses (if you are interested, you can check out the TRONCO package.)

Of course, being a Common Lisp quite-high priest at this point and a recreational programmer in it, I wondered how to render several R constructs in the language with parentheses.  It turns out that it was quite fun to re-engineer some of the key features of the R language.

This is not the first such project trying to mimic R.  I know at least two of them; the most public one being Common Lisp Stat, but, being known for suffering from NIH and RTW syndromes, I had to jump in.

Here is a short summary of what I did and what was achieved.

Generic Math, Arrays and Vectors

R provides generic math on vectors.  Pretty much everything in R is a vector (stored in column major order; more on this later) including arrays.  The generic math helps you writing things like
> 42 * 1:5 # The ':' operator produces a vector.
[1]  42  84 126 168 210


The generic math and the construction of some operators in CL is pretty much straightforward.  Where things get hairy is in the handling of the several ways that a R programmer may use to index, slice and dice vectors and arrays. As an example, you could write
> a = array(11:14, dim = c(2, 2), dimnames = list(list("a", "b"), list("x", "y")))

> a
   x  y
a 11 13
b 12 14

In brief, you have a matrix with "named" rows and columns. There are other idiosyncrasies but this is already something interesting to deal with, as you can now write things like:
> a[1, "y"]
[1] 3

> a["b", ]
 x  y 
12 14

The second "indexing" selects, or slices, the entire second row (counting from 1) and, as a result, returns a vector with the column names retained.

Indexing can also become interesting because of other idiosyncrasies in R.  Consider the following:
> a[2]
[1] 12

> a[42]
[1] NA # 'NA' is the 'not available' indicator.

> a[c(2, 3, 4, 10)]
[1]  12  13  14 NA

Indexing by only one index accesses the underlying vector in column major order. If the index is beyond the current limit, the value NA ('not available') is returned.  If the sole index is a vector, then the indexing returns a vector of the values in the array accessed column major order.

Indexing and Assignments

Of course, once you handle slicing and dicing of arrays, you have to deal with setting these slices and dice.

In R you can have a lot of what you can dream of (and something more, some of which, you wished was not there).

Basic assignment on vectors and arrays works as expected. Consider:
> a = array(11:19, dim = c(3, 3), dimnames = list(list("a", "b", "c"), list("x", "y", "z")))

> a
   x  y  z
a 11 14 17
b 12 15 18
c 13 16 19

> a[2, "y"] = 42

> a
   x  y  z
a 11 14 17
b 12 42 18
c 13 16 19

So far, nothing special is happening. Things actually make sense when trying to set a sub-slice of the array.
> b = array(42, dim = c(2, 2))

> b
     [,1] [,2]
[1,]   42   42
[2,]   42   42

> a[2:3, 2:3] = b

> a
   x  y  z
a 11 14 17
b 12 42 42
c 13 42 42

However, when setting a sub-slice of an array to a vector, or setting the elements of an array using a single vector index (i.e., when accessing the array in column major order), things may become a bit surprising.
> a[2:3, 2:3] = c(1, 1, 1, 1)

> a # This is not a surprise.
   x  y  z
a 11 14 17
b 12  1  1
c 13  1  1

> a[2:3, 2:3] = c(1, 2, 3, 4, 5, 6) # This is also expected (modulo the 'multiple'.)
Error in a[2:3, 2:3] = c(1, 2, 3, 4, 5, 6) : 
  number of items to replace is not a multiple of replacement length

> a[c(1, 2, 6, 7)] = c(111, 111, 111, 111) # Ok, understandable.
> a
    x   y   z
a 111  14 111
b 111   1   1
c  13 111   1

> a[c(1, 2, 12, 7)] = c(1, 1, 1, 1)

> a # What?!?
 [1]   1   1  13  14   1 111   1   1   1  NA  NA   1

Ok, the last example shows how R manages assigning past the limit of the underlying vector.

Not really a problem, once you get the hang of it, but, source of a bit of confusion nevertheless, at least to yours truly.

In any case, all of this must be dealt with once you want to get into building some facilities like these on top of Common Lisp.

Let's "Reinvent the Wheel" (RTW) in Common Lisp

Getting the basics down was not so difficult.  Common Lisp has a somewhat richer set of array and vector data types than R, however, some design decisions resulted in some fun (albeit probably sub-optimal) setup.

Here is a list of decisions taken, which may be subject to debate.

  • Objects are represented as structures.
  • Constructors follow the R conventions (no make-array!)
  • Vectors and arrays are mapped to CL vectors and arrays.
    This means that
    • Arrays are kept as multi-dimensional entities which are accessed in row major order
    • Vector and array "objects" are separate and different.
  • A generic reference function AT is introduced (and its counterpart, (SETF AT).)

Of course, these choices will make porting code from R to the CL library/tool quite cumbersome; let's say that this goal was dropped early on. The CL library (which, by the way, is named ρ, that is: RHO) will look familiar to a R programmer, but it will not be the same. After all, is it Common Lisp, isn't it?

Here is a run down of the basic features.  This was the hard part, in the opinion of the author.  Getting more of R implemented will be somewhat easier.

The AT Generic Function

The AT generic function works as expected on the standard CL objects (as it should).
RHO 10 > (defvar qwerty "qwerty")

RHO 11 > (setf (at qwerty 3) #\R)

RHO 12 > qwerty

RHO 13 > (at qwerty 4)

RHO 14 > (defvar a #2A((1 2 3) (4 5 6) (7 8 9)))

;;; ...

RHO 16 > (at a 1 1)

RHO 17 > (setf (at a 0 0) 42)

RHO 18 > a
#2A((42 2 3) (4 5 6) (7 8 9))

The generic function AT also works on structures and class instances.
RHO 22 > (defstruct foo a s d)

RHO 23 > (defvar foo (make-foo :s 42))

RHO 24 > (at foo 'foo-s) ; In this case the accessor function FOO-S is invoked.

RHO 25 > (setf (at foo 'foo-d) "Food")

RHO 26 > foo
#S(FOO :A NIL :S 42 :D "Food")

So far, so good.  Now let's have some fun.

Creating Arrays (and Vectors and Matrices)

Let's create some arrays, vectors and matrices.
RHO 28 > (setf rr (range 1 9)) ; This would be 'rr = 1:8' in R.
  1  2  3  4  5  6  7  8

RHO 29 > (setf rv (vector 1 22/23 -3.2 42 5.8))
      1  22/23   -3.2     42    5.8

RHO 30 > (setf rv2 (vector* (list 1 22/23 -3.2 42 5.8)
                            :names '(a b c d e)))
      A      B      C      D      E
      1  22/23   -3.2     42    5.8

The objects returned by VECTOR and VECTOR* (the full constructor) are wrappers around the underlying CL vectors.  Their class is RHO:VECTOR.  Note also that *PRINT-READABLY* is set to NIL, therefore, a "pretty printing" of the vectors is produced.
RHO 37 > (setf ra (array (range 1 9) :dim '(3 3)))
      [, 0] [, 1] [, 2]
[0, ]     1     2     3
[1, ]     4     5     6
[2, ]     7     8     1

RHO 38 > (at ra 1 1)

The function RANGE excludes the upper limit (following CL conventions), while the ARRAY constructors "recycles" the values passed to it (following R conventions).  The value produces is pretty-printed, while its class is RHO:ARRAY.  Indexing is 0-based, as in CL.

Creating matrices is straightforward and, again, a R-like function is used.
RHO 3 > (setf rm (matrix '(1 1 2 0) :nrow 2 :ncol 2))
      [, 0] [, 1]
[0, ]     1     1
[1, ]     2     0

RHO 4 > (at rm 0 1)

RHO 5 > (at rm 1 0)

Slicing and Dicing in RHO

Now that we have all the bases covered, we can proceed to some "slicing and dicing".  All the R indexing and assignment methods are provided in RHO, modulo the Common Lisp background choices, bugs and unimplemented features; not many of the last kind.

First of all, let's set some dimension names...
RHO 7 > (setf (dimension-names ra) '(("a" "b") ("x" "y")))
(("a" "b") ("x" "y"))

RHO 8 > ra
      [, x] [, y] [, 2]
[a, ]     1     2     3
[b, ]     4     5     6
[2, ]     7     8     1

RHO 9 > (at ra 1 "y")

Next we create a new array (a matrix, but in R, matrices are separate subclasses of arrays) rb and use it as a value to set a sub-matrix of ra.
RHO 13 > (setf rb (array 42 :dim '(2 2)))
      [, 0] [, 1]
[0, ]    42    42
[1, ]    42    42

RHO 29 > (setf (at ra (vector 1 2) (vector 1 2)) rb)
#2A((42 42) (42 42))

RHO 15 > ra
      [, x] [, y] [, 2]
[a, ]     1     2     3
[b, ]     4    42    42
[2, ]     7    42    42

As you can see, everything works as expected.

Remember that we have built a matrix rm; let's use it to set some elements of the array ra, according to the R "access-by-matrix" semantics.
RHO 16 > rm
      [, 0] [, 1]
[0, ]     1     1
[1, ]     2     0

RHO 24 > (type-of rm)

RHO 27 > (setf (at ra rm) 1024)

RHO 28 > ra
      [, x] [, y] [, 2]
[a, ]     1     2     3
[b, ]     4  1024    42
[2, ]  1024    42    42

RHO 29 > (setf (at ra #(1 2) #(1 2)) #(1 1 1 1))
#(1 1 1 1)

RHO 30 > ra
      [, x] [, y] [, 2]
[a, ]     1     2     3
[b, ]     4     1     1
[2, ]  1024     1     1

Note however, that, although "correct", no attempt to signal an error is made for incongruous dimensions.
RHO 42 > (setf (at ra #(1 2) #(1 2)) #(11 12 13 14 15))
#(11 12 13 14 15)

RHO 43 > ra
      [, x] [, y] [, 2]
[a, ]     1     2     3
[b, ]     4    11    12
[2, ]  1024    13    14

Getting the indexing of different combinations of scalars, vectors, and arrays was an interesting exercise. The implementation is probably bloated and not easily understandable; most probably because of the design choice of keeping the underlying CL vectors and arrays "as-they-are", especially multidimensional arrays.

Missing Things...

Most important one is "negative" indexing, which does not work completely yet.
Of course, I am limiting this comment to arrays and vectors: other R data structures, such as factors and data.frames are in the making, but all the heavy lifting is now done.

Want to Play?

Well, for the time being, drop me a line (you know where to find me).  We can talk about setting up a proper project and discuss some design choices.




Thanks to Masayuki Takagi, a not-so-obvious bug was fixed in the basic unification machinery of CL-UNIFICATION.  This led to the addition of a couple of new utility functions and some other cleanups (hopefully).

The result should be in Quicklisp in the next round.  Otherwise you can always clone/update/pull from the main repository.




I just revamped the CL-ENUMERATIONS library and API.

This was simply a nice cleanup of the previous library. The new documentation pages are, of course, generated with a little HEΛP, which also got some fine tuning to be able to generate doc pages from quite a variety of relatively "plain" doc-strings.