ARTICLE

Dive into clojure.java.io

From Clojure, The Essential Reference by Renzo Borgatti

clojure.java.io contains a collection of functions that simplify interaction with the Java Input/Output system (or simply IO). Over the years, Java evolved the original InputStream and OutputStream abstractions into Reader and Writer, eventually also adding asynchronous IO. During this transformation, Java put a lot of effort into maintaining backward compatibility, a principle also shared with Clojure. Unfortunately, the coexisting IO APIs hurt usability, forcing Java developers through bridges and adapters to move between different styles of IO.


clojure.java.io does not implement a new IO system, but does a great job of shielding Clojure developers from most of the inconsistencies generated by the fragmented Java interface. This is achieved through a small set of polymorphic functions, backed by protocols and a multimethod, that can be further extended if needed. The namespace also contains a few utility functions to work with the classpath, files and URLs.

Streams, Writers, and Readers

Let’s start by illustrating the following functions:

  • clojure.java.io/reader and clojure.java.io/writer produce java.io.BufferedReader and java.io.BufferedWriter objects respectively. They accept a variety of input types like readers, streams, files, URLs, sockets, arrays and strings. The fact that reader accepts a java.io.InputStream, for example, removes most of the boilerplate required in Java to move between the two abstractions.
  • clojure.java.io/input-stream and clojure.java.io/output-stream produce java.io.InputStream and java.io.OutputStream objects respectively. They accept streams, files, URIs, URLs, sockets and strings (input-stream also accepts byte arrays). Unlike reader and writer, they do not accept a reader or a writer as input: the built-in conversions only go from streams to readers and writers, not the other way around.
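As a small illustration of the stream side, the sketch below opens an input stream over an in-memory byte array and an output stream over a ByteArrayOutputStream. The payload and the names payload, first-byte and sink are made up for this example.

```clojure
(require '[clojure.java.io :as io])
(import 'java.io.ByteArrayOutputStream)

;; Hypothetical payload: three ASCII bytes held in memory.
(def payload (.getBytes "abc" "UTF-8"))

;; input-stream wraps the byte array in a buffered java.io.InputStream.
(def first-byte
  (with-open [in (io/input-stream payload)]
    (.read in)))
;; first-byte is 97, the byte value of the character a

;; output-stream accepts an existing OutputStream and buffers it.
(def sink (ByteArrayOutputStream.))

(with-open [out (io/output-stream sink)]
  (.write out payload)) ; closing the stream flushes the buffer into sink

(.toString sink "UTF-8")
;; "abc"
```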

In the following example we create a reader from a file. Keep in mind that, in general, IO objects allocate resources on the host operating system that need to be released or closed. We can use with-open to release resources after use:

(require '[clojure.java.io :as io]) ; ❶

(with-open [r (io/reader "/usr/share/dict/words")] ; ❷
  (count (line-seq r))) ; ❸
;; 235886

❶ clojure.java.io is usually aliased as io.

❷ reader interprets the first string argument as a path to a file or remote URL.

❸ line-seq creates a lazy sequence by reading line items from the reader object.
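The same pattern works when the source is a stream rather than a file. The following sketch uses a ByteArrayInputStream as a stand-in for any java.io.InputStream; the sample text and the names in and lines are assumptions made for illustration.

```clojure
(require '[clojure.java.io :as io])
(import 'java.io.ByteArrayInputStream)

;; Stand-in for any InputStream (socket, HTTP body, ...); the content is made up.
(def in (ByteArrayInputStream. (.getBytes "first\nsecond\nthird\n" "UTF-8")))

(def lines
  (with-open [r (io/reader in)]
    ;; line-seq is lazy: realize the lines before with-open closes the reader.
    (vec (line-seq r))))

lines
;; ["first" "second" "third"]
```

Note the call to vec: returning the lazy sequence unrealized from with-open would attempt to read from a reader that is already closed.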

Sometimes it’s useful to create a reader from a string (especially for testing), but reader interprets strings as locations. We can achieve the desired effect by transforming the string into a character array first:

(require '[clojure.java.io :as io])

(def s "string->array->reader->bytes->string") ; ❶

(with-open [r (io/reader (char-array s))] ; ❷
  (slurp r)) ; ❸
;; "string->array->reader->bytes->string"

❶ io/reader is commonly used to load external resources, but here we want to read from an in-memory string. We use this simple string for illustrative purposes.

❷ char-array transforms the string into a primitive array of chars, preventing reader from interpreting the string as a location.

❸ slurp has polymorphic behavior similar to reader and in this case transforms the reader back into a string by reading its content.
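An alternative that avoids the intermediate array is to hand reader a java.io.StringReader, since reader accepts any existing java.io.Reader. This is a minimal sketch; the names text and text-lines are invented for the example.

```clojure
(require '[clojure.java.io :as io])
(import 'java.io.StringReader)

(def text "one\ntwo")

(def text-lines
  (with-open [r (io/reader (StringReader. text))] ; wrapped in a BufferedReader
    (vec (line-seq r))))

text-lines
;; ["one" "two"]
```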

The book contains other interesting examples of io/reader: the line-seq section shows how to read from a java.io.InputStream, while the disj section shows how to read from a java.net.Socket object.

Not surprisingly, writer creates a new writer object accepting the same first argument types as reader:

(with-open [w (io/writer "/tmp/output.txt")] ; ❶
  (spit w "Hello\nClojure!!")) ; ❷

(println (slurp "/tmp/output.txt")) ; ❸
;; Hello
;; Clojure!!
;; nil

❶ Using a writer is very similar to using a reader. writer creates the object “w” that will automatically close at the end of the expression thanks to with-open.

❷ spit sends the content of a string into a file. If the file already exists, the content is overwritten.

❸ To test the content of the file, we can use slurp instead of passing through a reader.
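writer is not limited to files. As a sketch of the same idea without touching the disk, the example below wraps a java.io.StringWriter, which is handy in tests; the name buffer is an assumption of this example.

```clojure
(require '[clojure.java.io :as io])
(import 'java.io.StringWriter)

;; In-memory destination, useful when a test should not create files.
(def buffer (StringWriter.))

(with-open [w (io/writer buffer)] ; wraps the StringWriter in a BufferedWriter
  (.write w "Hello\n")
  (.write w "Clojure!!"))

(str buffer)
;; "Hello\nClojure!!"
```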

When data processing consists of reading from a large file, applying some transformations and writing the results back to disk, we can chain a reader and a writer together and process data using lazy functions like line-seq. By operating this way, we avoid loading the entire input into memory:

(require '[clojure.java.io :refer [reader writer]])
(require '[clojure.string :refer [upper-case]])

(with-open [r (reader "/usr/share/dict/words") ; ❶
            w (writer "/tmp/words")]
  (doseq [line (line-seq r)] ; ❷
    (.append w (str (upper-case line) "\n")))) ; ❸
;; nil

❶ Both reader and writer need to be closed after use. In this example we use the dictionary file present on most Unix-based systems. The file is large and we want to avoid loading its entire content into memory.

❷ doseq walks the lazy sequence produced by line-seq one line at a time, purely for side effects and without holding the head of the sequence. The net effect is that just a small portion of the file is present in memory at any given time, while the garbage collector can reclaim any processed item that was already written out.

❸ We wouldn’t be able to use spit repeatedly, because the first call would close the writer.
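For readers without a dictionary file at that location, here is a self-contained sketch of the same reader-to-writer pipeline using temporary files. The file prefixes, the sample content and the names in-file and out-file are hypothetical.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])
(import 'java.io.File)

;; Temporary files stand in for the large input and the output.
(def in-file (File/createTempFile "input" ".txt"))
(def out-file (File/createTempFile "output" ".txt"))

(spit in-file "alpha\nbeta\ngamma\n")

(with-open [r (io/reader in-file)
            w (io/writer out-file)]
  ;; One line at a time: read, transform, write.
  (doseq [line (line-seq r)]
    (.write w (str (str/upper-case line) "\n"))))

(slurp out-file)
;; "ALPHA\nBETA\nGAMMA\n"
```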

reader accepts an :encoding key, while writer accepts both :encoding and :append keys. The :encoding key forces a specific encoding for reading or writing data. The :append key makes writer add new data at the end of the destination instead of overwriting it. The following example forces “UTF-16” encoding (instead of the default “UTF-8”) and appends to the existing /tmp/words file instead of truncating it:

(with-open [r (reader "/usr/share/dict/words" :encoding "UTF-16")
            w (writer "/tmp/words" :append true :encoding "UTF-16")] ; ❶
  (doseq [line (line-seq r)]
    (.write w (str (upper-case line) "\n")))) ; ❷

❶ We can use :append to prevent writer from removing any previous content from the file while writing new content.

❷ Here we call .write instead of .append. On a java.io.Writer the two methods are interchangeable for strings, and neither is related to the :append option: whether previous content is preserved is controlled by the configuration option alone.

Please note that forcing “UTF-16” encoding in the example above only makes sense if the input file is written with that encoding. It was used in the example for illustration purposes.
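To see each option in isolation, the sketch below writes and reads one temporary file with an explicit encoding, and appends to another using the default encoding. The file prefixes and the names utf16-file and log-file are made up for the example.

```clojure
(require '[clojure.java.io :as io])
(import 'java.io.File)

;; :encoding must match on both the writing and the reading side.
(def utf16-file (File/createTempFile "encoding-demo" ".txt"))

(with-open [w (io/writer utf16-file :encoding "UTF-16")]
  (.write w "hello"))

(slurp utf16-file :encoding "UTF-16")
;; "hello"

;; Reading with the default UTF-8 does not give the original text back.
(= "hello" (slurp utf16-file))
;; false

;; :append true keeps what is already in the file.
(def log-file (File/createTempFile "append-demo" ".txt"))

(spit log-file "one\n")

(with-open [w (io/writer log-file :append true)]
  (.write w "two\n"))

(slurp log-file)
;; "one\ntwo\n"
```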

Resources and URLs

You can find examples showing reader or slurp loading resources using strings to indicate their location. reader interprets the given string similarly to a URL (Uniform Resource Locator). A URL is a convention to format strings that encode the location of resources across the network. Confusingly enough, a java.io.File object converts cleanly into a URI (Uniform Resource Identifier) through toURI, while its direct toURL method is deprecated. A URI is a slightly more general object than a URL.

In Java programming, URL and URI need some convoluted transformations to use with readers and files. Clojure hides this complexity away, allowing us to create a reader from a file or a string without thinking about such conversions. More specifically, the following functions from clojure.java.io are available to deal with resources and locations:

  • resource retrieves a URL object given a string representing the location of a resource on the Java classpath. Resources on the classpath are different from resources on disk as their location is independent from the location of the running Java executable.
  • as-url creates a URL object given a string representing its location (it could be classpath, local file system, or other protocols such as “http”).

resource is commonly used in Clojure programming to retrieve resources from the Java classpath. The classpath normally contains compiled Java classes, Clojure sources (unless they are explicitly removed) or other artifacts. We could for example retrieve the source of the clojure.java.io namespace with the following:

(require '[clojure.java.io :refer [resource reader]])

(def cjio (resource "clojure/java/io.clj")) ; ❶

(first (line-seq (reader cjio))) ; ❷
;; "; Copyright (c) Rich Hickey. All rights reserved."

❶ Clojure sources are packaged as part of the Clojure jar. We can find them using the relative path of the file inside the jar archive.

❷ We can see the first line of the file after using a reader and line-seq.
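One detail worth keeping in mind is that resource returns nil when nothing is found on the classpath. The sketch below wraps that check in a helper; slurp-resource is a hypothetical name, not part of clojure.java.io, and the example assumes the Clojure sources are on the classpath as in the listing above.

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical helper: read a classpath resource or fail with a clear error.
(defn slurp-resource [path]
  (if-let [url (io/resource path)]
    (slurp url)
    (throw (ex-info "Resource not found on classpath" {:path path}))))

(io/resource "no/such/resource.txt")
;; nil

(subs (slurp-resource "clojure/java/io.clj") 0 1)
;; ";"
```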

as-url is a small utility function to create URL objects (without the need of importing java.net.URL to use its constructor directly). as-url adds some level of polymorphism to handle input types other than strings:

(require '[clojure.java.io :refer [as-url file]])
(import 'java.nio.file.FileSystems)

(def path ; ❶
  (.. FileSystems
      getDefault
      (getPath "/tmp" (into-array String ["words"]))
      toUri))

(def u1 (as-url "file:///tmp/words")) ; ❷
(def u2 (as-url (file "/tmp/words"))) ; ❸
(def u3 (as-url path)) ; ❹

(= u1 u2 u3) ; ❺
;; true

❶ path shows how to convert a Java NIO (New IO API) path into a URI.

❷ as-url accepts strings (with protocols) to identify the location of a file on disk.

❸ as-url also accepts the same location as a java.io.File object.

❹ Finally, as-url also accepts a URI, here obtained from a java.nio.file.Path object.

❺ The three URLs are different objects, but they represent the same location on disk of the file “/tmp/words”.
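Once we have a java.net.URL, its standard accessors are available. The following sketch inspects a URL built by as-url and shows what happens with nil and with a string lacking a protocol; the address used is only an example and no network connection is made.

```clojure
(require '[clojure.java.io :as io])

(def u (io/as-url "https://clojure.org/guides/getting_started"))

(.getProtocol u) ;; "https"
(.getHost u)     ;; "clojure.org"
(.getPath u)     ;; "/guides/getting_started"

;; nil passes through untouched.
(io/as-url nil)
;; nil

;; A bare path has no protocol, so it is not a valid URL string.
(def no-protocol
  (try
    (io/as-url "/tmp/words")
    (catch java.net.MalformedURLException _ :malformed)))

no-protocol
;; :malformed
```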

Unfortunately, clojure.java.io doesn’t handle coercions or transformations of java.nio.file.Path objects directly, as demonstrated by the previous example where we had to explicitly call toUri() before calling as-url. But clojure.java.io can be extended to handle java.nio.file.Path (and similarly other protocols):

(require '[clojure.java.io :as io])
(import '[java.nio.file Path FileSystems])

(extend-protocol io/Coercions ; ❶
  Path
  (as-file [path] (io/file (.toUri path)))
  (as-url [path] (io/as-url (.toUri path))))

(def path ; ❷
  (.. FileSystems
      getDefault
      (getPath "/usr" (into-array String ["share" "dict" "words"]))))

(io/as-url path) ; ❸
;; #object[java.net.URL 0x1255fa42 "file:/usr/share/dict/words"]

(io/file path) ; ❹
;; #object[java.io.File 0x1c80a235 "/usr/share/dict/words"]

❶ clojure.java.io contains the Coercions protocol declaring two functions, as-file and as-url. While as-file has the file wrapper function available, as-url doesn’t have a corresponding url function. The implementation consists of transforming the path into a URI and calling the corresponding (and already existing) implementations.

❷ Java NIO Path objects are roughly equivalent to URLs. java.nio.file.Path only offers a translation into URI, which we can then use to create a URL. The getPath() method takes the initial part of the path as its first argument, followed by any other segments as a variable-length argument. Clojure needs to create an array of strings to be compatible with the type signature.

❸ After extending the protocol, we can use as-url to transform java.nio.file.Path directly.

❹ As a bonus, file can now also create a file object directly from a path.
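Extending Coercions affects io/file and io/as-url only. reader, writer, slurp and spit dispatch on a second protocol in the same namespace, IOFactory. The following is a sketch of one possible way to make them accept a Path as well, by delegating to the existing java.io.File implementations; the temporary file and the names tmp and tmp-path are assumptions of this example.

```clojure
(require '[clojure.java.io :as io])
(import '[java.nio.file Path]
        '[java.io File])

;; Reuse the default reader/writer behavior and only teach IOFactory how to
;; open streams for a Path, going through the File implementations.
(extend Path
  io/IOFactory
  (assoc io/default-streams-impl
         :make-input-stream
         (fn [^Path p opts] (io/make-input-stream (.toFile p) opts))
         :make-output-stream
         (fn [^Path p opts] (io/make-output-stream (.toFile p) opts))))

(def tmp (File/createTempFile "path-demo" ".txt"))
(def tmp-path (.toPath tmp))

(spit tmp-path "via a Path\n")

(slurp tmp-path)
;; "via a Path\n"
```

Delegating to File keeps options such as :append and :encoding working, because they are handled by the implementations being reused.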

Dealing with Files

Dealing with files is another important aspect in any programming language. Clojure relies on java.io.File for file operations and clojure.java.io contains a few utility functions to deal with files.

We’ve already seen io/file in action multiple times in this section. The function takes one or more arguments. When only one argument is present, it could be a string, another file, a URL or URI (or, less interestingly, nil):

(require '[clojure.java.io :as io])

(keys (:impls io/Coercions)) ; ❶
;; (nil java.lang.String java.io.File java.net.URL java.net.URI)

(io/file "/a/valid/file/path")
;; #object[java.io.File 0x7936d006 "/a/valid/file/path"]

(io/file (io/file "/a/valid/file/path"))
;; #object[java.io.File 0x3f46ce65 "/a/valid/file/path"]

(io/file (io/as-url "file://a/valid/url"))
;; #object[java.io.File 0x7af35ada "/valid/url"]

(io/file (.toURI (io/as-url "file://a/valid/uri")))
;; #object[java.io.File 0x2de6a5c8 "/valid/uri"]

(io/file nil)
;; nil

❶ We can see which single argument types io/file accepts by checking the :impls key of the Coercions protocol. What follows is a list of all the possible calls to io/file with the respective argument types.

We’ve already seen that, by extending this protocol, we can apply io/file to other argument types. io/file also accepts other arguments after the first, with the same type constraints. Additional arguments have to be relative paths (i.e. they cannot start with a forward slash ‘/’):

(io/file "/root" (io/file "not/root") "filename.txt") ; ❶
;; #object[java.io.File 0x6898f182 "/root/not/root/filename.txt"]

(io/file "/root" (io/file "/not/relative") "filename.txt") ; ❷
;; IllegalArgumentException /not/relative is not a relative path

❶ All arguments to io/file after the first need to be relative paths.

❷ Here the second argument starts with ‘/’ which denotes another root path after the first.
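Because the result is a plain java.io.File, the usual accessors apply to a path assembled from segments. A small sketch follows; the segment names are invented and nothing is created on disk.

```clojure
(require '[clojure.java.io :as io])
(import 'java.io.File)

;; Assemble a relative path from three made-up segments.
(def report (io/file "data" "2024" "report.csv"))

(.getName report)    ;; "report.csv"
(.getParent report)  ;; "data/2024" on Unix-like systems
(.isAbsolute report) ;; false
```

io/file joins the segments with the platform separator, so the same code yields backslashes on Windows.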

io/file does not actually create a physical resource, but just a “pointer” that other functions like writer can use to write content to. Another way to create content is to copy one file to another using the io/copy function:

(require '[clojure.java.io :as io])

(io/copy (io/file "/usr/share/dict/words") (io/file "/tmp/words2")) ; ❶
;; nil

(.exists (io/file "/tmp/words2")) ; ❷
;; true

❶ We can use io/copy to copy the existing /usr/share/dict/words file into a new file in the /tmp folder. The source is wrapped in io/file because io/copy treats a plain string input as the content to copy, not as a path.

❷ To check if the file was actually created, we can call exists() on the java.io.File object.
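The difference between a file input and a string input is easy to observe with temporary files. In this sketch the prefixes, the content and the names src and dst are hypothetical.

```clojure
(require '[clojure.java.io :as io])
(import 'java.io.File)

(def src (File/createTempFile "copy-src" ".txt"))
(def dst (File/createTempFile "copy-dst" ".txt"))

(spit src "some content\n")

;; File to File: the bytes of src end up in dst.
(io/copy src dst)
(def after-file-copy (slurp dst))
;; "some content\n"

;; String to File: the string itself is the content, not a path to read from.
(io/copy "literal text" dst)
(def after-string-copy (slurp dst))
;; "literal text"
```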

io/copy supports many type combinations: from reader to writer, from string to file, from InputStream to OutputStream and so on. One of them, from file to file, is specifically optimized using java.nio.channels.FileChannel, which guarantees optimal performance when the file is cached by the operating system. io/copy, however, does not support a string to string transfer where both strings are treated as file paths. We can add it by extending the related do-copy multimethod:

(require '[clojure.java.io :as io])

(defmethod @#'io/do-copy [String String] [in out opts] ; ❶
  ;; opts arrives as a map (or nil): flatten it back into key/value arguments
  (apply io/copy (io/file in) (io/file out) (apply concat opts)))

(io/copy "/tmp/words2" "/tmp/words3") ; ❷
;; nil

(.exists (io/file "/tmp/words3")) ; ❸
;; true

❶ The io/do-copy multimethod is private in clojure.java.io but we can still access it by looking up the related var object (with the reader macro #') and then dereferencing the var with @ (another reader macro). The implementation simply calls io/file on each argument.

❷ io/copy now accepts a pair of strings as arguments.

❸ We can verify the file was effectively created.

The io/do-copy signature shows that io/copy accepts options:

  • :buffer-size defaults to 1024 bytes and can be used when the origin argument is an InputStream.
  • :encoding is similar in use to the same option we’ve seen for reader and writer, forcing a specific encoding. It defaults to “UTF-8”.
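Both options are passed as trailing key/value pairs. The sketch below uses in-memory streams so that it does not depend on any file; the buffer size of 4096 and the names raw-out and utf16-out are arbitrary choices for illustration.

```clojure
(require '[clojure.java.io :as io])
(import '[java.io ByteArrayInputStream ByteArrayOutputStream])

;; InputStream to OutputStream with a larger buffer than the default.
(def raw-out (ByteArrayOutputStream.))

(io/copy (ByteArrayInputStream. (.getBytes "abc" "UTF-8"))
         raw-out
         :buffer-size 4096)

(.toString raw-out "UTF-8")
;; "abc"

;; Characters to bytes: :encoding decides how the string is encoded.
(def utf16-out (ByteArrayOutputStream.))

(io/copy "abc" utf16-out :encoding "UTF-16")

(.toString utf16-out "UTF-16")
;; "abc"
```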

When a file path requires sub-folders that don’t yet exist, we can use make-parents to create all folders recursively. make-parents does not create the last path segment, considering it the name of a file:

(require '[clojure.java.io :as io])

(def segments ["/tmp" "a" "b" "file.txt"]) ; ❶

(apply io/make-parents segments) ; ❷
;; true

(io/copy (io/file "/tmp/words") (apply io/file segments)) ; ❸
;; nil

(count (line-seq (io/reader (io/file "/tmp/words")))) ; ❹
;; 235886

(count (line-seq (io/reader (apply io/file segments))))
;; 235886

❶ Instead of a single string containing the path, we assembled the path out of fragments.

❷ make-parents creates any non-existent folder, but does not try to interpret “file.txt” as one, considering it a file name instead.

❸ The same fragments of file name can be used with io/file to copy content over to the new folder.

❹ We can check if the content was correctly copied by comparing the number of lines at the origin with the destination.
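The return value of make-parents tells us whether any folder was actually created. Here is a sketch using a uniquely named folder under the system temporary directory; the folder name and the names base and target are made up.

```clojure
(require '[clojure.java.io :as io])

;; A fresh base folder under the JVM's temporary directory.
(def base (io/file (System/getProperty "java.io.tmpdir")
                   (str "make-parents-demo-" (System/nanoTime))))

(def target (io/file base "a" "b" "file.txt"))

(def first-call (io/make-parents target))
;; true: base, base/a and base/a/b were created

(.isDirectory (io/file base "a" "b"))
;; true

(.exists target)
;; false: the last segment is never created

(def second-call (io/make-parents target))
;; false: nothing left to create
```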

We can use delete-file to remove files. The types supported are the same as io/file. We can additionally pass a second argument if we want to prevent delete-file from throwing an exception in case of error:

(require '[clojure.java.io :as io])

(io/delete-file "/does/not/exist") ; ❶
;; IOException Couldn't delete /does/not/exist

(io/delete-file "/does/not/exist" :ignore) ; ❷
;; :ignore

(io/delete-file "/tmp/a/b/file.txt" "This file should exist") ; ❸
;; true

❶ When we try to delete a file that does not exist, delete-file throws an exception.

❷ We can prevent the exception in case of non-existent files by passing a second argument, which is returned to signal that the operation was not successful.

❸ This file was created previously and should exist on the file system. delete-file correctly returns true.
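The three outcomes can be reproduced on a single temporary file. In this sketch the prefix, the keyword :already-gone and the name scratch are arbitrary.

```clojure
(require '[clojure.java.io :as io])
(import 'java.io.File)

(def scratch (File/createTempFile "delete-demo" ".txt"))

;; The file exists, so the first deletion succeeds.
(def first-delete (io/delete-file scratch))
;; true

;; The file is gone: the second argument is returned instead of throwing.
(def second-delete (io/delete-file scratch :already-gone))
;; :already-gone

;; Without the second argument the failure surfaces as an IOException.
(def third-delete
  (try
    (io/delete-file scratch)
    (catch java.io.IOException _ :threw)))
;; :threw
```

The second argument needs to be a truthy value: passing nil or false behaves as if it was not passed at all.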

as-relative-path retrieves the path string from resource objects (such as files, URIs, URLs), throwing an IllegalArgumentException if the path is absolute. This is especially useful to convert file objects into path strings for further processing:

(require '[clojure.java.io :as io])

(def folders ["root/a/1" "root/a/2" "root/b/1" "root/c/1" "root/c/1/2"]) ; ❶

(map io/make-parents folders) ; ❷
;; (true false true true true)

(map io/as-relative-path (file-seq (io/file "root"))) ; ❸
;; ("root" "root/a" "root/c" "root/c/1" "root/b")

❶ We have a group of nested folders as a vector of strings.

❷ We can use make-parents to create all the necessary folders. Note that folders don’t start with “/” (on a Unix system this means they are not absolute paths).

❸ After creating a sequence of all the files within “root” with file-seq, we can extract their path strings with as-relative-path.
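The behavior on relative and absolute inputs can be checked without creating anything on disk. The paths below are made up, and the last form assumes a Unix-like system where a leading “/” marks an absolute path.

```clojure
(require '[clojure.java.io :as io])
(import 'java.io.File)

;; Strings and File objects are both accepted.
(io/as-relative-path "docs/readme.md")
;; "docs/readme.md"

(io/as-relative-path (io/file "docs" "readme.md"))
;; "docs/readme.md" on Unix-like systems

;; Absolute paths are rejected.
(def absolute-attempt
  (try
    (io/as-relative-path "/etc/hosts")
    (catch IllegalArgumentException _ :not-relative)))
;; :not-relative
```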

That’s all for this article. If you want to learn more about the book, check it out on Manning’s liveBook platform here.
