Dive into clojure.java.io
From Clojure, The Essential Reference by Renzo Borgatti
clojure.java.io contains a collection of functions that simplify the interaction with the Java Input/Output system (or simply IO). Over the years, Java complemented the original InputStream and OutputStream abstractions with Reader and Writer, and eventually also added asynchronous IO. During this evolution, Java put a lot of effort into maintaining backward compatibility, a principle Clojure shares. Unfortunately, the result is several coexisting IO APIs that hurt usability, forcing Java developers through bridges and adapters to move between different styles of IO.
clojure.java.io does not implement a new IO system, but does a great job of shielding Clojure developers from most of the inconsistencies of the fragmented Java interface. This is achieved through a few polymorphic functions, built on two protocols and a multimethod, that can be further extended if needed. The namespace also contains a few utility functions to work with the classpath, files and URLs.
Streams, Writers, and Readers
Let’s start by illustrating the following functions:
clojure.java.io/reader and clojure.java.io/writer produce java.io.BufferedReader and java.io.BufferedWriter objects respectively. They accept a variety of input types like readers, streams, files, URLs, sockets, arrays and strings. The fact that reader accepts a java.io.InputStream, for example, removes most of the boilerplate required in Java to move between the two abstractions.
clojure.java.io/input-stream and clojure.java.io/output-stream produce java.io.InputStream and java.io.OutputStream objects respectively. They accept most of the same input types as reader and writer, with one notable exception: the conversion only goes from streams to readers and writers, so passing a reader to input-stream (or a writer to output-stream) throws an IllegalArgumentException.
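The difference between the two directions can be checked on in-memory data, without touching the file system. In the following sketch the names bytes-in, lines and reader->stream are only illustrative:

```clojure
(require '[clojure.java.io :as io])

;; io/reader wraps an InputStream in an InputStreamReader
;; (UTF-8 by default) and then in a BufferedReader.
(def bytes-in
  (java.io.ByteArrayInputStream. (.getBytes "line one\nline two" "UTF-8")))

(def lines
  (with-open [r (io/reader bytes-in)]
    (doall (line-seq r)))) ; realize the lines before the reader closes

;; The opposite conversion is not provided: asking for an
;; InputStream on top of a Reader throws IllegalArgumentException.
(def reader->stream
  (try
    (io/input-stream (java.io.StringReader. "abc"))
    (catch IllegalArgumentException e
      (.getMessage e))))
```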
In the following example we create a reader from a file. Keep in mind that IO objects generally allocate resources on the host operating system, which need to be released (closed) after use. with-open takes care of closing them for us:
(require '[clojure.java.io :as io]) ; ❶
(with-open [r (io/reader "/usr/share/dict/words")] ; ❷
  (count (line-seq r))) ; ❸
;; 235886
❶ clojure.java.io is usually aliased as io.
❷ reader interprets the first string argument as a path to a file or a remote URL.
❸ line-seq creates a lazy sequence of the lines read from the reader object.
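A related pitfall is letting a lazy sequence escape with-open: line-seq reads lines on demand, so anything not consumed before the reader is closed can no longer be read. This sketch uses a small temporary file in place of /usr/share/dict/words (which may not exist on every system); the function names are hypothetical:

```clojure
(require '[clojure.java.io :as io])

;; A small temporary file to read from.
(def f (doto (java.io.File/createTempFile "lines" ".txt") (.deleteOnExit)))
(spit f "alpha\nbeta\ngamma\n")

(defn read-lines-lazy
  "Hypothetical: returns the lazy line-seq, which outlives the reader."
  [file]
  (with-open [r (io/reader file)]
    (line-seq r)))

(defn read-lines
  "Hypothetical: realizes every line before with-open closes the reader."
  [file]
  (with-open [r (io/reader file)]
    (doall (line-seq r))))

(def eager (read-lines f))

(def lazy-result
  (try
    (doall (read-lines-lazy f))
    (catch Exception e
      :stream-closed))) ; lines after the first can't come from a closed reader
```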
Sometimes it’s useful to create a reader from a string (especially for testing), but reader interprets strings as locations. We can achieve the desired effect by transforming the string into a character array first:
(require '[clojure.java.io :as io])
(def s "string->array->reader->bytes->string") ; ❶
(with-open [r (io/reader (char-array s))] ; ❷
  (slurp r)) ; ❸
;; "string->array->reader->bytes->string"
❶ io/reader is commonly used to load external resources. Sometimes, especially for testing, it’s useful to create a reader directly from a string. We use this simple string for illustrative purposes.
❷ char-array transforms the string into a primitive array of chars, preventing reader from interpreting the string as a location.
❸ slurp has polymorphic behavior similar to reader and in this case transforms the reader back into a string by reading its content.
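Because reader also accepts any existing java.io.Reader, wrapping the string in a java.io.StringReader is an equivalent alternative to char-array. A minimal sketch, reusing the same example string:

```clojure
(require '[clojure.java.io :as io])

(def s "string->array->reader->bytes->string")

;; A StringReader is read as content, never as a location.
(def via-string-reader
  (with-open [r (io/reader (java.io.StringReader. s))]
    (slurp r)))

;; Same result as the char-array approach.
(def via-char-array
  (with-open [r (io/reader (char-array s))]
    (slurp r)))
```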
The book contains other interesting examples of io/reader: in the entry for line-seq we show how to read from a java.io.InputStream, while the entry for disj shows how to read from a java.net.Socket object.
Not surprisingly, writer creates a new writer object and accepts the same first argument types as reader:
(with-open [w (io/writer "/tmp/output.txt")] ; ❶
  (spit w "Hello\nClojure!!")) ; ❷
(println (slurp "/tmp/output.txt")) ; ❸
;; Hello
;; Clojure!!
;; nil
❶ Using a writer is very similar to using a reader. writer creates the object “w” that is automatically closed at the end of the expression thanks to with-open.
❷ spit writes the string to w, and from there to the file. Since the writer was opened without the :append option, any existing content of the file is overwritten.
❸ To check the content of the file, we can use slurp directly instead of going through a reader.
When data processing consists of reading from a large file, applying some transformations and writing the results back to disk, we can chain a reader and a writer together and process data using lazy functions like line-seq. This way, we avoid loading the entire input into memory:
(require '[clojure.java.io :refer [reader writer]])
(require '[clojure.string :refer [upper-case]])
(with-open [r (reader "/usr/share/dict/words") ; ❶
            w (writer "/tmp/words")]
  (doseq [line (line-seq r)] ; ❷
    (.append w (str (upper-case line) "\n")))) ; ❸
;; nil
❶ Both reader and writer need to be closed after use. In this example we use the dictionary file present on most Unix-based systems. The file is large and we want to avoid loading its entire content into memory.
❷ doseq walks the lazy sequence one line at a time, performing the side effect for each line without holding on to the head of the sequence. The net effect is that only a small portion of the file is in memory at any given time, while the garbage collector can reclaim any line that was already written to disk.
❸ We couldn’t use spit repeatedly here, because the first call would close the writer.
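The claim in ❸ is easy to verify: spit closes whatever writer it is given, so a second spit on the same writer fails. A sketch using a temporary file (names are illustrative):

```clojure
(require '[clojure.java.io :as io])

(def f (doto (java.io.File/createTempFile "spit" ".txt") (.deleteOnExit)))

(def second-spit
  (with-open [w (io/writer f)]
    (spit w "first\n")    ; writes, then closes w
    (try
      (spit w "second\n") ; w is already closed
      :no-error
      (catch java.io.IOException e
        :writer-closed))))

(def content (slurp f))   ; only the first spit reached the file
```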
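reader accepts an :encoding key, while writer accepts both :encoding and :append keys. The :encoding key forces a specific encoding for reading or writing data. The :append key opens the output in append mode, so that writer adds new data at the end of the existing content instead of replacing it. The following example forces “UTF-16” encoding (instead of the default “UTF-8”) and opens the output file in append mode. Note that the :append option is unrelated to the .append method used in the previous example: on a java.io.Writer, .append and .write both write characters to the stream, while :append decides whether the file is truncated when it is opened: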
(with-open [r (reader "/usr/share/dict/words" :encoding "UTF-16")
            w (writer "/tmp/words" :append true :encoding "UTF-16")] ; ❶
  (doseq [line (line-seq r)]
    (.write w (str (upper-case line) "\n")))) ; ❷
❶ We can use :append to prevent writer from removing any previous content from the file while writing new content.
❷ Here we call .write instead of .append. For strings the two methods behave the same: whether the new lines end up after the existing content of the file depends on the :append option, not on the method we call.
Please note that forcing “UTF-16” encoding in the example above only makes sense if the input file was actually written with that encoding; here it is used for illustration purposes only.
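The effect of :append is easier to observe on a small file whose content we control. The following sketch (temporary file and names are illustrative) writes through two separate writers in append mode, then once more without it. It uses “UTF-16LE” rather than “UTF-16”: Java’s “UTF-16” encoder writes a byte-order mark at the start of each new writer, so appending in that encoding would leave a stray mark in the middle of the file.

```clojure
(require '[clojure.java.io :as io])

(def f (doto (java.io.File/createTempFile "append" ".txt") (.deleteOnExit)))

;; Two separate writers in append mode: the second keeps what the
;; first wrote. "UTF-16LE" has a fixed byte order and writes no BOM.
(doseq [word ["alpha" "beta"]]
  (with-open [w (io/writer f :append true :encoding "UTF-16LE")]
    (.write w (str word "\n"))))

;; Reading back with the same encoding recovers both lines.
(def appended
  (with-open [r (io/reader f :encoding "UTF-16LE")]
    (doall (line-seq r))))

;; Without :append, opening the writer truncates the file first.
(with-open [w (io/writer f :encoding "UTF-16LE")]
  (.write w "gamma\n"))

(def truncated (slurp f :encoding "UTF-16LE"))
```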
Resources and URLs
You can find examples showing reader or slurp loading resources using strings to indicate their location. reader first tries to interpret the given string as a URL (Uniform Resource Locator) and, if that fails, as a path on the local file system. A URL is a convention to format strings that encode the location of resources across the network. Confusingly enough, a java.io.File object can be converted into a URI (Uniform Resource Identifier) with its toURI() method, while the corresponding toURL() method is deprecated. A URI is a slightly more general object than a URL.
In Java programming, URLs and URIs need some convoluted transformations to be used with readers and files. Clojure hides this complexity away, allowing us to create a reader from a file or a string without thinking about such conversions. More specifically, the following functions from clojure.java.io are available to deal with resources and locations:
resource: retrieves a URL object given a string representing the location of a resource on the Java classpath. Resources on the classpath are different from resources on disk, as their location is independent from the location of the running Java executable.
as-url: creates a URL object given a string representing its location, using any protocol supported by java.net.URL (for example “file”, “jar” or “http”).
resource is commonly used in Clojure to retrieve resources from the Java classpath. The classpath normally contains compiled Java classes, Clojure sources (unless they are explicitly removed) and other artifacts. We could, for example, retrieve the source of the clojure.java.io namespace with the following:
(require '[clojure.java.io :refer [resource reader]])
(def cjio (resource "clojure/java/io.clj")) ; ❶
(with-open [r (reader cjio)] ; ❷
  (first (line-seq r)))
;; "; Copyright (c) Rich Hickey. All rights reserved."
❶ Clojure sources are packaged inside the Clojure jar. We can find them using the relative path of the file inside the jar archive.
❷ We can see the first line of the file after wrapping the URL in a reader and calling line-seq.
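Note that resource returns nil, rather than throwing, when the name is not found on the classpath, so it is common to check the result before opening it. A minimal sketch, assuming the Clojure sources are packaged in the jar as described above (the missing resource name is made up):

```clojure
(require '[clojure.java.io :as io])

(def core-url (io/resource "clojure/core.clj"))    ; packaged with Clojure
(def missing (io/resource "no/such/resource.txt")) ; hypothetical name

;; Only open the resource when it was found.
(def first-line
  (when core-url
    (with-open [r (io/reader core-url)]
      (first (line-seq r)))))
```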
as-url is a small utility function to create URL objects (without the need to import java.net.URL to use its constructor directly). as-url adds some level of polymorphism to handle input types other than strings:
(require '[clojure.java.io :refer [as-url file]])
(import 'java.nio.file.FileSystems)
(def path ; ❶
  (.. FileSystems
      getDefault
      (getPath "/tmp" (into-array String ["words"]))
      toUri))
(def u1 (as-url "file:///tmp/words")) ; ❷
(def u2 (as-url (file "/tmp/words"))) ; ❸
(def u3 (as-url path)) ; ❹
(= u1 u2 u3) ; ❺
;; true
❶ Despite its name, path holds a URI: it shows how to convert a Java NIO (New IO API) path into a URI.
❷ as-url accepts strings that include a protocol, here identifying the location of a file on disk.
❸ as-url also accepts the same location as a java.io.File object.
❹ Finally, as-url also accepts a URI, here obtained from a java.nio.file.Path object.
❺ The three URLs are different objects, but they compare equal because they represent the same location on disk of the file “/tmp/words”.
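With strings, as-url expects a full URL including the protocol: a plain path is rejected with an exception instead of being treated as a local file. A sketch of the difference (the path is illustrative and nothing is read from disk):

```clojure
(require '[clojure.java.io :as io])

;; With a protocol, the string is a valid URL.
(def with-protocol (io/as-url "file:///tmp/no-such-dir/words"))

;; Without a protocol the string is not a valid URL: an exception is thrown.
(def without-protocol
  (try
    (io/as-url "/tmp/no-such-dir/words")
    (catch Exception e
      :rejected)))

;; For plain paths, going through io/file first avoids the problem.
(def via-file (io/as-url (io/file "/tmp/no-such-dir/words")))
```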
Unfortunately, clojure.java.io doesn’t handle coercions or transformations of java.nio.file.Path objects directly, as demonstrated earlier, where we had to explicitly call toUri() before calling as-url. But clojure.java.io can be extended to handle java.nio.file.Path (and, similarly, other types):
(require '[clojure.java.io :as io])
(import '[java.nio.file Path FileSystems])
(extend-protocol io/Coercions ; ❶
  Path
  (as-file [path] (io/file (.toUri path)))
  (as-url [path] (io/as-url (.toUri path))))
(def path ; ❷
  (.. FileSystems
      getDefault
      (getPath "/usr" (into-array String ["share" "dict" "words"]))))
(io/as-url path) ; ❸
;; #object[java.net.URL 0x1255fa42 "file:/usr/share/dict/words"]
(io/file path) ; ❹
;; #object[java.io.File 0x1c80a235 "/usr/share/dict/words"]
❶ clojure.java.io contains the Coercions protocol declaring two functions, as-file and as-url. While as-file has the file wrapper function available, as-url doesn’t have a corresponding url function. The implementation transforms the path into a URI and calls the corresponding (and already existing) implementations.
❷ Java NIO Path objects locate files much like java.io.File objects do, but java.nio.file.Path has no direct conversion into a URL, only into a URI, which we can then use to create a URL. The getPath() method takes the first segment of the path, followed by any remaining segments as a variable number of arguments. From Clojure, the variable arguments must be passed explicitly as an array of strings to match the method signature.
❸ After extending the protocol, we can use as-url to transform a java.nio.file.Path directly.
❹ As a bonus, file can now also create a File object directly from a Path.
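reader, writer, input-stream and output-stream dispatch on a separate protocol, IOFactory, so extending Coercions does not by itself make them accept a Path. The same technique applies, though. The sketch below delegates each IOFactory method to the existing java.io.File implementation through Path’s toFile() method, which only works for paths on the default file system (the temporary file name is illustrative):

```clojure
(require '[clojure.java.io :as io])
(import '[java.nio.file Files Path]
        '[java.nio.file.attribute FileAttribute])

;; Delegate every IOFactory method to the java.io.File implementation.
(extend-protocol io/IOFactory
  Path
  (make-reader [p opts] (io/make-reader (.toFile p) opts))
  (make-writer [p opts] (io/make-writer (.toFile p) opts))
  (make-input-stream [p opts] (io/make-input-stream (.toFile p) opts))
  (make-output-stream [p opts] (io/make-output-stream (.toFile p) opts)))

(def tmp (Files/createTempFile "cjio" ".txt" (make-array FileAttribute 0)))
(.deleteOnExit (.toFile tmp))

;; spit and slurp go through io/writer and io/reader,
;; so they now accept a Path directly.
(spit tmp "written through a Path")
(def content (slurp tmp))
```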
Dealing with Files
Dealing with files is another important aspect of any programming language. Clojure relies on java.io.File for file operations and clojure.java.io contains a few utility functions to deal with files.
We’ve already seen io/file in action a few times. The function takes one or more arguments. When only one argument is present, it can be a string, another file, a URL or a URI (or, less interestingly, nil):
(require '[clojure.java.io :as io])
(keys (:impls io/Coercions)) ; ❶
;; (nil java.lang.String java.io.File java.net.URL java.net.URI)
(io/file "/a/valid/file/path")
;; #object[java.io.File 0x7936d006 "/a/valid/file/path"]
(io/file (io/file "/a/valid/file/path"))
;; #object[java.io.File 0x3f46ce65 "/a/valid/file/path"]
(io/file (io/as-url "file://a/valid/url"))
;; #object[java.io.File 0x7af35ada "/valid/url"]
(io/file (.toURI (io/as-url "file://a/valid/uri")))
;; #object[java.io.File 0x2de6a5c8 "/valid/uri"]
(io/file nil)
;; nil
❶ We can see which single-argument types io/file accepts by checking the :impls key of the Coercions protocol. What follows is one call to io/file for each of those argument types. Note that in “file://a/valid/url” the “a” after the double slash is parsed as a host name, which is why it disappears from the resulting path: “file:///a/valid/url” would be needed to obtain the path /a/valid/url.
The default list of types that io/file can understand is visible inside the Coercions protocol map, as demonstrated in the example. We’ve already seen that, by extending this protocol, we can apply io/file to other argument types. io/file also accepts other arguments after the first, with the same type constraints. Additional arguments have to be relative paths (i.e. they cannot start with a forward slash ‘/’):
(io/file "/root" (io/file "not/root") "filename.txt") ; ❶
;; #object[java.io.File 0x6898f182 "/root/not/root/filename.txt"]
(io/file "/root" (io/file "/not/relative") "filename.txt") ; ❷
;; IllegalArgumentException /not/relative is not a relative path
❶ All arguments to io/file after the first need to be relative paths.
❷ Here the second argument starts with ‘/’, which denotes an absolute path that cannot be appended to the first.
io/file does not actually create a physical resource, but just a “pointer” that other functions like writer can use to write content to. Another way to create content is to copy one file to another using the io/copy function:
(require '[clojure.java.io :as io])
(io/copy (io/file "/usr/share/dict/words") (io/file "/tmp/words2")) ; ❶
;; nil
(.exists (io/file "/tmp/words2")) ; ❷
;; true
❶ We can use io/copy to copy the existing /usr/share/dict/words file into a new file in the /tmp folder.
❷ To check if the file was actually created, we can call exists() on the java.io.File object.
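A small experiment in a temporary directory makes two points concrete: io/file on its own creates nothing on disk, and io/copy handles its input according to its type, so a java.io.File is read from disk while a String is written out as literal content. File names are illustrative:

```clojure
(require '[clojure.java.io :as io])
(import '[java.nio.file Files]
        '[java.nio.file.attribute FileAttribute])

;; A scratch directory with a source file inside.
(def dir (.toFile (Files/createTempDirectory "cjio" (make-array FileAttribute 0))))
(def src (io/file dir "src.txt"))
(spit src "file content\n")

;; io/file only builds a path: nothing exists on disk yet.
(def dst (io/file dir "dst.txt"))
(def exists-before (.exists dst))

;; File -> File: the bytes of src are copied, creating dst.
(io/copy src dst)
(def exists-after (.exists dst))
(def copied (slurp dst))

;; String -> File: the string itself is written, not the file it names.
(def literal-dst (io/file dir "literal.txt"))
(io/copy (.getPath src) literal-dst)
(def literal (slurp literal-dst))

;; Clean up: files first, then the now-empty directory.
(doseq [f [src dst literal-dst dir]]
  (io/delete-file f))
```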
io/copy supports many type combinations: from reader to writer, from string to file, from InputStream to OutputStream and so on. One of them, from file to file, is specifically optimized using java.nio.channels.FileChannel, which can be very efficient because many operating systems can transfer the bytes directly from the file system cache to the destination without actually copying them. io/copy, however, does not support a string to string transfer (with a file to file copy implementation). We can extend io/copy using the related do-copy multimethod:
(require '[clojure.java.io :as io])
(defmethod @#'io/do-copy [String String] [in out opts] ; ❶
  ;; opts is a map (or nil): flatten it back into keyword arguments
  (apply io/copy (io/file in) (io/file out) (apply concat opts)))
(io/copy "/tmp/words2" "/tmp/words3") ; ❷
;; nil
(.exists (io/file "/tmp/words3")) ; ❸
;; true
❶ The do-copy multimethod is private in clojure.java.io, but we can still access it by looking up the related var object (with the reader macro #') and then dereferencing the var with @ (another reader macro). The implementation simply calls io/file on each argument before delegating to io/copy.
❷ io/copy now accepts a pair of strings as arguments.
❸ We can verify that the file was actually created.
The io/do-copy signature shows that io/copy accepts options:
:buffer-size: defaults to 1024 and sets the size of the intermediate buffer used when the copy goes through an InputStream or a Reader.
:encoding: similar in use to the same option we’ve seen for reader and writer, it forces a specific encoding when io/copy converts between bytes and characters. It defaults to “UTF-8”.
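The options only matter when io/copy has to convert between characters and bytes, for example when copying a String into an OutputStream. The sketch below (the helper copy->bytes is hypothetical) copies the same string into an in-memory stream with the default encoding, with “UTF-16LE” (chosen because it writes no byte-order mark) and with a tiny :buffer-size, which changes how the copy is chunked but not its result:

```clojure
(require '[clojure.java.io :as io])

(def text "h\u00e9llo") ; "héllo": the é needs two bytes in UTF-8

(defn copy->bytes
  "Hypothetical helper: copies text into an in-memory OutputStream,
  forwarding any options to io/copy, and returns the bytes."
  [text & opts]
  (let [out (java.io.ByteArrayOutputStream.)]
    (apply io/copy text out opts)
    (.toByteArray out)))

(def utf8 (copy->bytes text))                         ; default encoding
(def utf16le (copy->bytes text :encoding "UTF-16LE")) ; 2 bytes per char
(def chunked (copy->bytes text :buffer-size 2))       ; 2 chars per read
```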
When a file path requires sub-folders that don’t exist yet, we can use make-parents to create all of them recursively. make-parents does not create the last path segment, considering it the name of a file:
(require '[clojure.java.io :as io])
(def segments ["/tmp" "a" "b" "file.txt"]) ; ❶
(apply io/make-parents segments) ; ❷
;; true
(io/copy (io/file "/tmp/words") (apply io/file segments)) ; ❸
;; nil
(count (line-seq (io/reader (io/file "/tmp/words")))) ; ❹
;; 235886
(count (line-seq (io/reader (apply io/file segments))))
;; 235886
❶ Instead of a single string containing the path, we assemble the path out of fragments.
❷ make-parents creates any non-existent folder, but does not try to interpret “file.txt” as one, considering it a file name instead.
❸ The same fragments can be used with io/file to copy content over to the new folder.
❹ We can check that the content was correctly copied by comparing the number of lines at the origin with the destination.
We can use delete-file to remove files. The types supported are the same as io/file. We can additionally pass a second argument if we want to prevent delete-file from throwing an exception in case of error:
(require '[clojure.java.io :as io])
(io/delete-file "/does/not/exist") ; ❶
;; IOException Couldn't delete /does/not/exist
(io/delete-file "/does/not/exist" :ignore) ; ❷
;; :ignore
(io/delete-file "/tmp/a/b/file.txt" "This file should exist") ; ❸
;; true
❶ When we try to delete a file that does not exist, delete-file throws an exception.
❷ We can prevent the exception by passing a second argument, which is returned to signal that the operation was not successful.
❸ This file was created previously and should exist on the file system. delete-file correctly returns true.
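delete-file returns the second argument when the deletion fails, so only a truthy value suppresses the exception: nil or false still throw. A sketch with a temporary file (the :already-gone keyword is arbitrary):

```clojure
(require '[clojure.java.io :as io])

(def f (java.io.File/createTempFile "delete-me" ".txt"))

(def first-delete (io/delete-file f))                ; file exists: true
(def second-delete (io/delete-file f :already-gone)) ; fails, returns the flag

;; A falsy flag does not silence the error.
(def falsy-flag
  (try
    (io/delete-file f false)
    (catch java.io.IOException e
      :thrown)))
```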
as-relative-path retrieves the path string from resource objects (such as strings or files) that denote a relative location, and throws an IllegalArgumentException for absolute ones. This is especially useful to convert file objects into path strings for further processing:
(require '[clojure.java.io :as io])
(def folders ["root/a/1" "root/a/2" "root/b/1" "root/c/1" "root/c/1/2"]) ; ❶
(map io/make-parents folders) ; ❷
;; (true false true true true)
(map io/as-relative-path (file-seq (io/file "root"))) ; ❸
;; ("root" "root/a" "root/c" "root/c/1" "root/b")
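❶ We have a group of nested folders as a vector of strings.
❷ We can use make-parents to create all the necessary folders. The second call returns false because “root/a” already exists, and since make-parents treats the last segment as a file name, folders such as “root/a/1” are not created at all (“root/c/1” exists only as the parent of “root/c/1/2”). Note that the paths don’t start with “/” (on a Unix system this means they are not absolute paths).
❸ After creating a sequence of all the files within “root” with file-seq, we can extract their path strings with as-relative-path.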
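as-relative-path performs the same check that io/file applies to its extra arguments, which is where the earlier “is not a relative path” error comes from. A short sketch, assuming a Unix-like system (nothing is created on disk):

```clojure
(require '[clojure.java.io :as io])

;; A relative File is returned as its path string.
(def rel (io/as-relative-path (io/file "root" "c" "1")))

;; An absolute path is rejected.
(def abs
  (try
    (io/as-relative-path "/root/c/1")
    (catch IllegalArgumentException e
      (.getMessage e))))
```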
That’s all for this article. If you want to learn more about the book, check it out on Manning’s liveBook platform.