Signal AI is an AI company that extracts knowledge from unstructured textual content. It processes close to 5 million documents a day from various data sources, providing insights to its subscribers.
JUXT visited Signal AI for a Clojure In interview five years ago. I wanted to revisit to find out whether Clojure has stood the test of time, and how the team feels about it now.
I met with VP of Engineering Alan Wright, Senior Software Engineers Joachim Draeger and Charlie Briggs, Richard Lewis Jones of the Search Team, Site Reliability Engineer Rahul Dé, and CTO Luca Grulla.
Jon: What languages do you use now?
Joachim: Clojure is our go-to language for APIs and data manipulation. It also shines if you need to do anything fast in a multi-threaded way.
Richard: We’re also using Clojure to build an abstract interface on top of Elasticsearch.
Jon: Have you looked at Clojure for the Data Science part, rather than Python?
Charlie: We’ve realized that if you need to do any data science, just start with Python. We haven’t tried to shoehorn Clojure into that ecosystem. In the previous Clojure In interview, Miguel talked about the improvements of data science on Clojure and it doesn’t seem to have come to fruition in any way that’s usable for us in production.
Joachim: You can’t really compare the data science ecosystem between Clojure and Python, it’s all happening in the Python space.
Jon: So why not use Python throughout?
Charlie: The most powerful feature I find with Clojure is how easy it is to do multi-threading and parallelism and get good performance out of the box. It’s very hard to do that in Python because of the global interpreter lock and similar issues. Whenever we’ve tried to tackle it, it’s always been a bit of a pain.
Joachim: It takes a lot more effort with Python to get all the dependencies straight and to get it to work properly in Docker. The JVM is a lot more mature in that sense. It’s easier to run Clojure services in production, to scale and to monitor them than it is for Python.
Clojure is also great for any kind of data manipulation. Our machine learning pipelines and APIs are all using a mix of Python/Clojure.
Rahul: We’ve historically used Python for infrastructure automation. But we’re dealing with a lot of unstructured data, especially in query analysis for Elastic, and if you compare Golang and Clojure in terms of handling unstructured dynamic data, that’s a big plus point for Clojure.
Jon: Could you expand more on what you do with unstructured dynamic data?
Rahul: The primary use case that we have is to run this massive search cluster, and we monitor this cluster at all times. We need to protect it against large queries that could overwhelm or overload the cluster. Clojure allows us to walk the query tree with its standard library and to preemptively measure the impact of a query, so we can plan and adjust.
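To illustrate the idea Rahul describes, here is a minimal sketch of walking a nested, Elasticsearch-style query and estimating its cost before it reaches the cluster. It is written in Python for brevity and is not Signal AI's code: the `estimate_cost` function, the clause weights, and the `MAX_COST` threshold are all hypothetical, and a real cost model would depend on the cluster and the data.

```python
from typing import Any

# Hypothetical per-clause weights; real costs depend on the cluster and data.
CLAUSE_WEIGHTS = {"wildcard": 10, "regexp": 10, "fuzzy": 5, "match": 1, "term": 1}


def estimate_cost(node: Any) -> int:
    """Recursively walk a nested query and sum the weights of known clause keys."""
    if isinstance(node, dict):
        return sum(
            CLAUSE_WEIGHTS.get(key, 0) + estimate_cost(value)
            for key, value in node.items()
        )
    if isinstance(node, list):
        return sum(estimate_cost(item) for item in node)
    return 0  # scalars (strings, numbers) carry no cost of their own


query = {
    "bool": {
        "must": [{"match": {"title": "clojure"}}],
        "should": [{"wildcard": {"body": "*jvm*"}}, {"term": {"lang": "en"}}],
    }
}

MAX_COST = 10  # hypothetical threshold
cost = estimate_cost(query)
print(cost, "reject" if cost > MAX_COST else "accept")
```

The sketch matches on key names alone, so a document field that happens to be called `match` or `term` would be counted as a clause. A production version would need to know which positions in the tree hold clauses and which hold field names.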
Jon: It’s quite inspiring, you’ve got those three different languages for each relevant area and you choose which one works best. It’s great to hear that Clojure is holding its own.
Luca: This to me is the strength of Clojure. The design of the core library is of such high quality that there are no major changes, not because there’s no innovation but because it achieves such a level of abstraction. Unlike other languages where the framework is constantly being revisited, you can do a lot with the core library, and it stands the test of time.
Jon: How have you found it in terms of hiring and training for Clojure?
Charlie: I joined because I wanted to learn Clojure.
Richard: I don’t consider there’s a problem picking up Clojure. It’s an attractive thing for people who have discovered it.
For example, we’ve had two recent joiners who hadn’t used Clojure before. One had Lisp/C# experience and the other had Python. They were both aware of functional language concepts and they were quick to pick it up.
To help with this, for a time we ran a Clojure club, so people who weren’t familiar could get support from people with experience.
Rahul: The club was born from people who were learning it and they’ve built a framework and wiki which is available to the team, which shows them how to go about the learning.
Charlie: It’s great we have this now. When I joined it wasn’t so in-depth; there were reading suggestions, but now it’s more comprehensive.
Jon: It’s great, you’ve got a full range of people: from juniors learning the technology, people joining for the technology, and then people you’ve pulled across from adjacent technologies like Python and Common Lisp. Have you had any challenges from the team with your decision to adopt Clojure?
Richard: Not really, it’s nice to play around with new technologies but our heart is in Clojure.
Jon: What do you think of the general state of Clojure?
Richard: Stability is a great thing, which is sometimes mistaken for stagnation by people who don’t understand that stability is important. The language itself, as Luca said, is not growing, but I don’t think it should; it’s perfectly good as it is.
The community is growing around it with new ecosystem technologies like Babashka. The Clojure world is quite exciting and hopefully my colleagues share that excitement.
Working with dynamic data
Jon: It feels you’re aligned to dynamic languages. Do you feel the static versus dynamic debate has moved on at all?
Richard: I’m attracted to the elegance of some of the newer languages out there on the frontend such as Elm and Reason. I’m not a fan of TypeScript because it’s a halfway house: it’s not inferred, so you just end up with horrible types you’ve got to deal with. Languages that are properly typed first are nice, but they’re not necessarily suitable for a lot of what we’re doing, so Clojure wins.
Charlie: We’ve had some pain with things not being typed for maintenance, such as APIs. I feel the solution for that is not necessarily adding types to the language but definitely adding more schemas, and structure to the data you consume and produce.
We tried to use Clojure spec to make our boundaries more strict. We found it hard to work with, and now we use Malli, which just seems more intuitive. With spec, we never really got to grips with how to structure our specs and how to use them in the right context, like development vs. production. You seem to need to do a lot of work to make it fit for human consumption.
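As a rough illustration of what Charlie means by adding schemas at the boundaries rather than types to the language, the sketch below validates an incoming payload against a declared shape and returns readable errors. It is written in Python and does not use Malli's or spec's API: the `DOCUMENT_SCHEMA` format, its field names, and the `validate` helper are hypothetical.

```python
from typing import Any

# Hypothetical schema: field name -> expected Python type.
DOCUMENT_SCHEMA = {"id": str, "title": str, "word_count": int}


def validate(payload: dict[str, Any], schema: dict[str, type]) -> list[str]:
    """Return human-readable errors; an empty list means the payload is valid."""
    errors = []
    for field, expected in schema.items():
        if field not in payload:
            errors.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected):
            errors.append(
                f"{field}: expected {expected.__name__}, "
                f"got {type(payload[field]).__name__}"
            )
    return errors


good = {"id": "doc-1", "title": "Clojure In", "word_count": 1200}
bad = {"id": 42, "title": "Clojure In"}

print(validate(good, DOCUMENT_SCHEMA))
print(validate(bad, DOCUMENT_SCHEMA))
```

The data itself stays a plain dictionary throughout. The schema is only consulted where data enters or leaves a service, which is the trade-off described above: structure at the edges, dynamic data inside.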
Richard: On my team we also use Malli and other tools from Metosin for routing.
Jon: What IDE do you use?
Charlie: If you’re a new joiner and you don’t already use Vim or Emacs, then you will probably end up using VS Code and Calva. I haven’t seen that many people jump over to another IDE whilst learning Clojure.
Richard: One of our new starters, who was learning Clojure and is a Lisper, hadn’t used Emacs before but took on both Emacs and Clojure at the same time. His mindset helped, but he seems to have taken to it very well.
Joachim: My take on Emacs and Vim is that if you give it an honest try and are still struggling after 6 months, then it’s probably not for you. I was basically one of those people. I think some people use it really well but other people do struggle.
Rahul: I’m firmly in the Vim camp.
Jon: What tools and libraries do you use?
Richard: In the Search team we’re aiming for simplicity at all costs. For lifecycle we’re using Mount, and for configuration we’re using cprop, which is a fairly simple and easy way of configuring your projects. We like deps.edn for dependency management and Cheshire for JSON.
Rahul: For dealing with large sets of data that are fast flowing, I prefer to use Charred by Chris Nuernberger which handles JSON and CSV more efficiently and is how we’re able to process 70 megabytes in milliseconds.
Jon: You mentioned Babashka, can you tell us more?
Rahul: I’m one of the parents of this project, contributing and maintaining it for a couple of years now along with borkdude. We use it for monitoring clusters of machines, i.e. our scripts tell us: “this machine could be recycled better”, or “here’s why these machines need a bit of help”.
Other Babashka use cases are partial recovery for unstructured data and even an onboarding script for new people; the script writes an email, sets them up with VPN and their AWS account, and that’s all done in Babashka.
Richard: Babashka can be seen as a better Python for certain scenarios. I’ve done plenty of Python in the past and it has been my go-to language for getting stuff done quickly. But if you’re programming in Clojure to begin with, you get the benefits of Clojure, plus you get the same speed and immediacy that Python gives.
I like the fact that you can specify your dependencies in the same script. Recently I wrote a quick command line tool to tail SNS topics, which just attaches to an SNS topic, creates a temporary queue, and reads off that queue. That’s a few lines of Babashka and it’s a very nice looking online tool and it’s just one script with all the dependencies specified.
Jon: Thanks for your time!