praeclarum

I Built the World’s Largest Translated Cuneiform Corpus using AI

2023-06-09T00:00:00+00:00

TL;DR I used a custom-trained Large Language Model (T5) to create the world’s largest online corpus of translated cuneiform texts. It’s called the AICC (AI Cuneiform Corpus) and contains 130,000 AI-translated texts from the CDLI and ORACC projects.

Cuneiform

Cuneiform is the oldest known writing system. It was used in Mesopotamia (modern day Iraq) for over 3,000 years. It was used to write Sumerian, Akkadian, and other languages. Written on clay, it has survived the millennia and is now being translated by scholars around the world.

Sadly, we have more clay tablets than scholars. Fortunately, we have computers.

Introducing the AICC

I’m proud to introduce the AICC - a collection of 130,000 cuneiform texts translated from ancient Sumerian and Akkadian to English using a neural network. It is the largest collection of translated cuneiform texts in the world.

This is the 2nd edition of the translated corpus I released last summer. The 1st edition contained about 30,000 texts but this new edition boasts 130,000 texts. The corpus is growing fast!

How good are the translations? Well, they’re decent. :-) I hope you’ll go browse the site and see for yourself.

Judging the quality of cuneiform translations has a rich history. Indulge me in a story.

Can it Translate Tiglath-Pileser?

In 1857 a new cylinder inscribed with cuneiform text and the name Tiglath-Pileser was found (dated 1150 BC). At this time, cuneiform was just being relearned and there was a question as to how good various translation methods were.

The Royal Asiatic Society decided to perform an experiment (that was later published as the book Inscription of Tiglath Pileser I., King of Assyria). They would give the same inscription to three different translators and see how well they did. The idea was that if the translations were similar, then the current understanding of cuneiform was also good.

Here is a page from the book showing the beginning of two of the three translations:

We can see that although the translations are different, they convey the same meaning. The experiment was a success.

I am pleased to now add my own (well, my AI’s) translation to the mix. Just one problem: my corpus contains many objects with similar inscriptions and I’m not sure which one is this specific Tiglath-Pileser cylinder. There’s P393923, P463064 (newly translated), P463510 (newly translated), P467316 (newly translated), Q005926, Q006021 and more.

I decided in fairness and in the spirit of the competition to translate one that had no previous translation. I chose P467316 as its beginning seemed to match the other translations.

AI Translation of Tiglath-Pileser

AI Translation.

Ashur, great lord, who makes the totality of the gods submit, who gives scepter and crown, who establishes kingship; Enlil, lord, king of all the Anunnaku gods, father of the gods, lord of the lands; Sin, wise one, lord of the crown, saqu-worthy; Magur, Shamash, judge of heaven and underworld, who carries out the slander of the enemy, who breaks up the foe; Adad, hero, conqueror of the four quarters of the lands, the four quarters; Ninurta, hero, villainous sacrificial spirit and enemy, who destroys the heart of the people; Ishtar, foremost among the gods, lady of battle;

The great gods, who make the heavens and earth a lordly place, whose utterances are a scepter and a scepter, who make kingship supreme, Tiglath-Pileser, beloved prince, your beloved, your shepherd, who by your true heart you have entrusted to me, this exalted one, you have established for the sovereignty of the land of the great Enlil, you have granted him a scepter.

Continued in P467316.

While stilted in places, it is a decent translation, and I deem this experiment a success!

Why AI Translations?

Existing online repositories (CDLI, Oracc) contain many transliterations of ancient cuneiform texts (a transliteration is a rewriting of a text from one writing system to another without changing the language), but they are very lacking in the translations department.

While I am not a cuneiform expert, I am an expert at neural networks and have a deep passion for languages and writing systems. I want any person to have access to the archives of the ancients. A grandiose goal for sure, but also a very achievable one thanks to modern engineering advancements.

Sumerian

Consider Sumerian (spoken by the creators of cuneiform). There are currently 103,075 texts published with transliterations from cuneiform symbols to (mostly) Latin letters. But only 4,583 of these texts have publicly available translations online. That is a mere 4% of texts available to a lay person such as myself.

Publications	Count
Transliterated	103,075
Translated	4,583
Need Translations	98,492

Given the existing transliterations, there are 98,492 works that can be translated but have not yet been.

(There are more translations than these, but the others are not freely available and are held under copyright. In other words, you need to go buy a book to read them.)

Things aren’t much better for Akkadian (the language spoken by the famous Sargon and Ashurbanipal).

Akkadian

Publications	Count
Transliterated	31,747
Translated	10,069
Need Translations	21,678

We can see that 21,678 works are all set to be translated but have not been.

Training a Large Language Model

The modern advancement of large language models (LLMs) has affected and will continue to affect nearly every human endeavor.

The current architecture that is heralding this new age of knowledge is the Transformer architecture. It was designed specifically to be very good at translating text from one language to another using the innovative “attention mechanism”. It’s a little funny that this network designed for translation is now broaching the realm of artificial general intelligence (AGI), but I digress.

Ignoring the absurdly large LLMs that are dominating the field now (GPT-4 and friends), the humble smaller transformers are still quite powerful and have made the problem of translation somewhat trivial.

My favorite one of these is the T5 network from Google. While large itself, it is capable of being trained using off-the-shelf (though expensive) GPUs. If you can build a large training set, you can train this network at home to accomplish wonders.

Knowing this I set about building a training set that the network could use to learn these ancient languages.

Building the Dataset

Thankfully there has been a push to digitize acquired artifacts and to publish their cuneiform on the web.

The two great projects are the CDLI (Cuneiform Digital Library Initiative) and Oracc. I owe a large debt to these projects.

As any machine learning expert will tell you, 90% of the problem is collecting a good training dataset (the other 10% is justifying the compute bill). Building the cuneiform dataset presented its own unique set of challenges.

Inconsistent Transliterations

Sadly, Assyriologists took some time to settle on a consistent transliteration system. When works were first transliterated to a digital form, only ASCII characters were available and the researchers made do using funny characters like # to denote demonstratives, numbers to disambiguate symbols, and ALL CAPS whenever they were in the mood (just kidding, but the use is so random it might as well be).

When other character encodings became available, researchers adapted. They started to use diacritic marks to disambiguate symbols (loosely based on guessed sounds). And then HTML was invented and they went wild with special marks attempting to better capture the original writing.

While neural networks are powerful and can certainly handle these inconsistencies, it’s not ideal. If you want the network to properly learn the language it’s best not to distract it with also learning the histrionics of human computer interface systems.

I wrote a variety of cuneiform and English normalizers to help with this problem. They’re not perfect, but they do a decent job.
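To make the idea concrete, here is a minimal sketch of what one such normalizer could look like in TypeScript. Every rule in it is an illustrative assumption (mapping precomposed diacritic letters to ASCII digraphs, turning subscript digits into plain digits, dropping the editorial flags `#`, `?` and `!`); it is not the rule set the AICC normalizers actually use, and `normalizeTransliteration` is a hypothetical name.

```typescript
// Hypothetical normalization rules, chosen for illustration only.
// Assumes the input uses precomposed characters (e.g. "š", not "s" + combining caron).
const DIACRITIC_TO_ASCII: { [ch: string]: string } = {
    "š": "sz", "Š": "SZ",
    "ṣ": "s,", "Ṣ": "S,",
    "ṭ": "t,", "Ṭ": "T,",
    "ḫ": "h", "Ḫ": "H",
};
const SUBSCRIPT_DIGITS = "₀₁₂₃₄₅₆₇₈₉";

function normalizeTransliteration(text: string): string {
    let out = "";
    for (let i = 0; i < text.length; i++) {
        const ch = text.charAt(i);
        const digit = SUBSCRIPT_DIGITS.indexOf(ch);
        if (digit >= 0) {
            out += String(digit); // subscript digit -> plain digit
        } else if (Object.prototype.hasOwnProperty.call(DIACRITIC_TO_ASCII, ch)) {
            out += DIACRITIC_TO_ASCII[ch]; // diacritic letter -> ASCII spelling
        } else if (ch === "#" || ch === "?" || ch === "!") {
            // drop editorial flags so the model sees one spelling per sign
        } else {
            out += ch;
        }
    }
    // collapse runs of whitespace left behind by the removals
    return out.replace(/\s+/g, " ").trim();
}
```

The point of a pass like this is that two editions of the same line end up as the same string, so the network spends its capacity on the language rather than on encoding history.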

Paragraph Wrapping and Unwrapping

Cuneiform texts are usually written line by line in a column and are read from top to bottom.

These lines are often short and, when translated, contain even fewer words. If I train the network on just these lines (and, surprise, I did for the 1st edition), the translations it produces are also short and choppy. They’re not great.

To work around this problem, I automatically “unwrap” lines into paragraphs to be translated all together. This way the network can learn to translate longer sentences and paragraphs.

The network, however, has its own limitations and can only translate sentences up to 512 tokens long. To work around this problem, I “wrap” the paragraphs into chunks of up to 512 tokens and translate those. I then stitch the translations back together to form the final translation.

This “unwrap” then “wrap” process is not perfect and can lead to some strange translations, but it’s better than the alternative of just translating single lines.
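The unwrap, wrap, and stitch steps can be sketched as a greedy packing loop. This is only an illustration under stated assumptions: `countTokens` is a stand-in that counts whitespace-separated words (the real limit applies to the model's subword tokens), and `wrapLines` and `translateDocument` are hypothetical names.

```typescript
// Stand-in token counter: whitespace-separated words only approximate
// the subword token counts a real tokenizer would report.
function countTokens(text: string): number {
    const trimmed = text.trim();
    return trimmed.length === 0 ? 0 : trimmed.split(/\s+/).length;
}

// "Unwrap" short lines into running text, then "wrap" that text into
// chunks that each stay within maxTokens. A single line that is already
// longer than maxTokens is kept as its own oversized chunk.
function wrapLines(lines: string[], maxTokens: number): string[] {
    const chunks: string[] = [];
    let current = "";
    for (const line of lines) {
        if (line.trim().length === 0) continue; // skip blank lines
        const candidate = current.length === 0 ? line : current + " " + line;
        if (current.length > 0 && countTokens(candidate) > maxTokens) {
            chunks.push(current); // adding this line would overflow: close the chunk
            current = line;
        } else {
            current = candidate;
        }
    }
    if (current.length > 0) chunks.push(current);
    return chunks;
}

// Translate each chunk independently and stitch the results back together.
function translateDocument(
    lines: string[],
    maxTokens: number,
    translateChunk: (chunk: string) => string,
): string {
    return wrapLines(lines, maxTokens).map(translateChunk).join(" ");
}
```

Because chunk boundaries fall wherever the budget runs out, a sentence can be split across two chunks, which is one plausible source of the strange translations mentioned above.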

Training Process

I started with a pre-trained T5 base model from Hugging Face and fine-tuned it on my dataset. This model has 220 million parameters and is capable of translating sequences of up to 512 tokens.

I trained it on a dataset of 210,247 translation examples for 30 epochs. It took about 48 hours on my RTX 3090.

While starting with a pre-trained model saves me a lot of compute time, it has drawbacks. The pre-trained model was trained to translate from English to French or German. Ideally, I would have a model that was pre-trained to translate to English.

Also, I used its default tokenizer which does not support all the characters I need and performs poorly on the transliterated cuneiform.

Learning Sumerian and Akkadian Simultaneously

Since my datasets are small in size, I decided to combine learning Sumerian and Akkadian simultaneously. This has the benefit of increasing the training size and exposing the network to more cuneiform symbols. Interestingly, Akkadian often uses some Sumerian intermixed with its own language so it’s not a bad idea to train on both.

Bidirectional Translation

The network was having a hard time converging on a good solution. It would train well enough for many epochs, and then it would fall apart.

I found a regularization strategy that helped a lot. I would train it to also translate from English to Sumerian and Akkadian. Doing this helped the network to always converge. I assume this is an effect of using the pre-trained network.

While translating from English to Akkadian or Sumerian is not a useful task, it is a “fun party trick” as my friend put it.
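One way to picture the combined, bidirectional training set is as a function that turns each parallel text into two examples, one per direction. The sketch below assumes T5-style task prefixes (the pre-trained checkpoints use prefixes such as "translate English to German: "); the exact prefix wording for Sumerian and Akkadian, and the names `ParallelText`, `TrainingExample` and `buildBidirectionalExamples`, are assumptions for illustration.

```typescript
interface ParallelText {
    language: "Sumerian" | "Akkadian";
    transliteration: string;
    english: string;
}

interface TrainingExample {
    input: string;  // task prefix + source text
    target: string; // text the model should produce
}

// Emit both translation directions for every parallel text, mixing
// Sumerian and Akkadian into a single training set.
function buildBidirectionalExamples(texts: ParallelText[]): TrainingExample[] {
    const examples: TrainingExample[] = [];
    for (const t of texts) {
        examples.push({
            input: "translate " + t.language + " to English: " + t.transliteration,
            target: t.english,
        });
        examples.push({
            input: "translate English to " + t.language + ": " + t.english,
            target: t.transliteration,
        });
    }
    return examples;
}
```

The prefix tells the single shared model which language pair and direction each example belongs to, so both languages and both directions can share one set of weights.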

Future Work

I want to continue to improve the translations and hope to take these steps in the future:

Fine-tune the model for specific translation tasks like Akkadian to English.
Pre-train a new model from scratch using a better tokenizer.
Train a larger model like T5 large.
Add more training data.

Conclusion

I hope you enjoyed this deep dive into neural networks and ancient languages.

When I started this project, I had no idea whether it would work or not. I was delighted that it did, and I am extremely delighted to be able to introduce the AICC to the world. Now amateur Assyriologists like myself can read and read to their heart’s content.

Side note: If you are an academic and would like to collaborate on this project, please reach out to me by filing issues on GitHub. I have a million questions about cuneiform that I would love to ask you.

How I Re-implemented PyTorch for WebGPU

2023-05-19T00:00:00+00:00

TL;DR I’ve been working on a WebGPU-optimized inference and autograd library called webgpu-torch with an API that matches PyTorch. The goal is to run neural networks in the browser at speeds comparable to a Linux workstation. Many kernels have been implemented and its design is easily extensible. It’s available on NPM now and works in both the browser and Node.js!

Neural Networks in the Browser

Nine months ago, I got Hugging Face Transformers (Large Language Models like GPT but a wee bit smaller) working in the browser thanks to the ONNX web runtime and some painfully hand-coded tokenizers.

It’s quite liberating running these nets in the browser since the web is the best software distribution platform ever created. You can just send someone a link and they can run your code. No need to install anything. No need to worry about what OS they’re running. No need to worry about what hardware they have. It’s all just there.

The only problem is that ONNX is a wee bit, shall we say, slow.

Thankfully, WebGPU has arrived in browsers and we can now properly access the GPU to write optimized kernels for neural network operations. This is a huge deal. It means we can now run neural networks in the browser at speeds comparable to NVIDIA/CUDA.

Someone just needs to, you know, do the hard work of implementing all those operations for the GPU.

Well that’s what I’m very pleased to announce I’ve been working on for the past few months. I’ve been re-implementing PyTorch in TypeScript for WebGPU.

What is a PyTorch?

PyTorch is a wrapper over the torch runtime (which I first used with Lua) for performing neural network operations. It’s a very popular library for doing AI work and seems to have won the arms race for now.

The library is broken up into parts:

An optimized (for GPU) math library supporting element-wise operations, matrix multiplication, convolutions, reductions, etc. over tensors.
An automatic differentiation library (autograd) that is just a lot of bookkeeping to keep track of the operations performed on tensors so that gradients can be calculated.
A neural network library that is just a bunch of layers that can be composed together to form a neural network.
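To show what "a lot of bookkeeping" means for the autograd part, here is a minimal scalar sketch in TypeScript. It is not webgpu-torch's or PyTorch's implementation: the `Value` class and its methods are hypothetical, it works on single numbers rather than tensors, and it only supports add, multiply and ReLU.

```typescript
// Each Value remembers which Values produced it and how to push a
// gradient back to them. That record is the "bookkeeping".
class Value {
    grad = 0;

    constructor(
        public data: number,
        private parents: Value[] = [],
        private backwardFn: (outGrad: number) => void = () => {},
    ) {}

    add(other: Value): Value {
        return new Value(this.data + other.data, [this, other], (g) => {
            this.grad += g;
            other.grad += g;
        });
    }

    mul(other: Value): Value {
        return new Value(this.data * other.data, [this, other], (g) => {
            this.grad += g * other.data;
            other.grad += g * this.data;
        });
    }

    relu(): Value {
        return new Value(Math.max(this.data, 0), [this], (g) => {
            this.grad += this.data > 0 ? g : 0;
        });
    }

    // Visit the recorded graph in reverse topological order so that a
    // node's gradient is complete before it is passed to its parents.
    backward(): void {
        const order: Value[] = [];
        const visited: Value[] = [];
        const visit = (v: Value): void => {
            if (visited.indexOf(v) >= 0) return;
            visited.push(v);
            for (const p of v.parents) visit(p);
            order.push(v);
        };
        visit(this);
        this.grad = 1;
        for (let i = order.length - 1; i >= 0; i--) {
            order[i].backwardFn(order[i].grad);
        }
    }
}
```

For example, with `x = new Value(3)`, `w = new Value(-2)` and `b = new Value(7)`, calling `backward()` on `x.mul(w).add(b).relu()` leaves the derivative of the output with respect to each input in `x.grad`, `w.grad` and `b.grad`.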

Doesn’t sound so hard to re-implement right? And so I did.

What is a WebGPU?

WebGPU is the new standard for accessing GPUs from the browser. It supports generic compute shaders and is designed to be a low level API that can be used to build higher level libraries. The compute shaders are able to break work up into a 3D grid and, so long as you can reformulate your code to take advantage of that 3D grid, you can benefit from dedicated hardware doing the computations.
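The arithmetic behind "breaking work up into a 3D grid" can be sketched without touching the GPU at all. The helpers below are hypothetical names; they compute how many workgroups are needed along each axis for a given workgroup size, and how a 3D invocation id could map onto a flat, row-major buffer (that layout is an assumption).

```typescript
type Size3 = [number, number, number];

// Number of workgroups to dispatch along each axis so every element is
// covered. Invocations that land past the end of the data must be
// masked off inside the shader.
function workgroupCounts(elements: Size3, workgroupSize: Size3): Size3 {
    return [
        Math.ceil(elements[0] / workgroupSize[0]),
        Math.ceil(elements[1] / workgroupSize[1]),
        Math.ceil(elements[2] / workgroupSize[2]),
    ];
}

// Flatten a 3D invocation id (x, y, z) into an index into a row-major
// buffer whose extents are given by `elements`.
function flatIndex(id: Size3, elements: Size3): number {
    return (id[2] * elements[1] + id[1]) * elements[0] + id[0];
}
```

In a real program the three counts would be the arguments to the compute pass's `dispatchWorkgroups(x, y, z)` call, with the workgroup size declared in the shader via `@workgroup_size`.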

This is perfect for the web since JavaScript is single-threaded and not optimized for doing heavy computation. The GPU is a perfect fit for this since it’s designed to do heavy computation in parallel.

Writing Optimized WebGPU Kernels

PyTorch is very mature now and supports a huge variety of operations. It’s also very well optimized for CUDA and CUDNN (NVIDIA’s compute libraries). So how do you go about re-implementing all of those for WebGPU?

Well, you start with the basics. You implement the basic operations like element-wise operations, matrix multiplication, convolutions, reductions, etc. But there is a tremendous amount of similarity between these operations.

For example, element-wise multiplication and addition only vary by the operator used in the inner loop. The trick is to optimize the memory layout and kernels of those operations so they are fast. They need to adapt to big and small GPUs and they need to adapt to big and small workloads.

This is a perfect scenario to take advantage of code generation. I wrote a code generator that takes a template and generates the optimized kernels for each operation. The code generator is written in TypeScript and generates WebGPU compute shader code. This means that the generated code can be heavily optimized for the given scenario and those optimizations can be shared between operations.

For example, here is how I define the ReLU operation (from op_table.ts):

{
    name: "relu",
    nnName: "ReLU",
    nnOp: true,
    type: "unary",
    forward: "output = max(input, 0.0)",
    backward: "inputGrad = input > 0.0 ? outputGrad : 0.0",
}

In this template I define both the forward computation max(input, 0.0) and the backward computation input > 0.0 ? outputGrad : 0.0. The code generator then generates the optimized kernels for both the forward and backward passes based on the size of your GPU (the size of compute workgroups) and the shape of tensors (in addition to the memory layouts of the tensors).
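As a rough illustration of how a template like this can become shader source, here is a minimal TypeScript sketch that wraps a unary forward expression in a WGSL compute kernel. It is not the generator from webgpu-torch: `UnaryOpSpec`, `generateUnaryForwardWgsl`, and the buffer names `inputBuf` and `outputBuf` are hypothetical, and it only handles the simplest case of contiguous 1D buffers with a fixed workgroup size.

```typescript
interface UnaryOpSpec {
    name: string;
    forward: string; // e.g. "output = max(input, 0.0)"
}

// Turn the per-element expression into a full WGSL compute shader by
// rewriting `input` and `output` into indexed buffer accesses.
function generateUnaryForwardWgsl(spec: UnaryOpSpec, workgroupSize: number): string {
    const body = spec.forward
        .replace(/\boutput\b/g, "outputBuf[i]")
        .replace(/\binput\b/g, "inputBuf[i]");
    return [
        "@group(0) @binding(0) var<storage, read> inputBuf: array<f32>;",
        "@group(0) @binding(1) var<storage, read_write> outputBuf: array<f32>;",
        "",
        "@compute @workgroup_size(" + workgroupSize + ")",
        "fn " + spec.name + "_forward(@builtin(global_invocation_id) gid: vec3<u32>) {",
        "    let i = gid.x;",
        "    // Guard invocations that fall past the end of the buffer.",
        "    if (i >= arrayLength(&outputBuf)) { return; }",
        "    " + body + ";",
        "}",
    ].join("\n");
}
```

Calling `generateUnaryForwardWgsl({ name: "relu", forward: "output = max(input, 0.0)" }, 64)` yields a shader whose inner statement is `outputBuf[i] = max(inputBuf[i], 0.0);`. Swapping the expression changes the operation while the surrounding indexing and bounds check are shared, which is where the leverage of a generator comes from.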

Keeping the template short and simple gives me flexibility to optimize the kernels as needed while preserving the core logic. For example, different kernels can be emitted for contiguous memory tensors vs strided memory tensors. For operations like reductions, 1D, 2D, 3D, and xD kernels can be emitted to take advantage of the 3D workgroup grid.

At first I designed the template system to help me save some typing, but I quickly realized its power and now I use it for all operations.

Debugging WebGPU Kernels

Another huge benefit came from the fact that I was generating the kernels. I could generate the kernels to not only emit WebGPU code, but also JavaScript code. The core logic gets wrapped in another function that can be called from JavaScript. This means that I can run the same code in JavaScript and WebGPU and compare the results. Even better, I can debug kernels in JavaScript and then execute them on WebGPU.

The JavaScript CPU kernels are terribly slow, but they’re not supposed to be fast. They instead provide a convenient playground for debugging and testing kernels.
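The same trick can be sketched for the CPU side: take the per-element expression and wrap it in a plain JavaScript loop instead of a shader. This is an assumption-laden illustration, not the library's code. `generateUnaryForwardJs` is a hypothetical name, only `max` is mapped to its `Math` equivalent, and `new Function` is used for brevity, which is only reasonable when the template strings are trusted.

```typescript
// Build a CPU kernel from a template expression such as
// "output = max(input, 0.0)" by rewriting it into a JavaScript loop body.
function generateUnaryForwardJs(forward: string): (input: Float32Array) => Float32Array {
    const body = forward
        .replace(/\bmax\(/g, "Math.max(")
        .replace(/\boutput\b/g, "out[i]")
        .replace(/\binput\b/g, "inp[i]");
    // The generated function is ordinary JavaScript, so it can be stepped
    // through in a debugger, unlike the shader version.
    const kernel = new Function(
        "inp",
        "out",
        "for (let i = 0; i < inp.length; i++) { " + body + "; }",
    ) as (inp: Float32Array, out: Float32Array) => void;
    return (input: Float32Array): Float32Array => {
        const out = new Float32Array(input.length);
        kernel(input, out);
        return out;
    };
}
```

Running the CPU kernel and the GPU kernel on the same input and comparing the outputs gives a cheap cross-check that both were generated from the same logic.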

This also means that my WebGPU library can run just fine in Node.js, without WebGPU, whatever. Isn’t it great when architectural decisions keep paying off?

Testing

The worst part of using a new neural network library is when it doesn’t give the exact same results as previous libraries you’ve used. One of my biggest frustrations with the WebGL ONNX backend is the fact that it gives very inaccurate results compared to PyTorch. I didn’t want that. I want full fidelity. I want to make sure all my WebGPU kernels match the results of PyTorch operations.

To that end, I have built a test harness that first runs code snippets in PyTorch to record results, then runs the same code snippets in my library and compares the results. If they don’t match, it throws an error.
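The comparison step of such a harness can be sketched as a tolerance check. The version below follows the same formula and default tolerances as PyTorch's `torch.allclose` (an element passes when |actual - expected| <= atol + rtol * |expected|); the function names and the choice to treat NaN as a mismatch are assumptions for illustration, not the harness's actual code.

```typescript
// True when every element of `actual` is within tolerance of `expected`.
function allClose(
    actual: number[],
    expected: number[],
    rtol: number = 1e-5,
    atol: number = 1e-8,
): boolean {
    if (actual.length !== expected.length) return false;
    for (let i = 0; i < actual.length; i++) {
        const diff = Math.abs(actual[i] - expected[i]);
        const tolerance = atol + rtol * Math.abs(expected[i]);
        // Written as !(a <= b) so that NaN on either side counts as a mismatch.
        if (!(diff <= tolerance)) return false;
    }
    return true;
}

// Throw a descriptive error when a recorded reference result is not matched.
function assertAllClose(label: string, actual: number[], expected: number[]): void {
    if (!allClose(actual, expected)) {
        throw new Error(label + ": got [" + actual.join(", ") + "], expected [" + expected.join(", ") + "]");
    }
}
```

In this picture, `expected` holds the values recorded from PyTorch and `actual` holds what the library under test produced for the same snippet.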

This has produced a silly but fun web page to go visit. If you go to https://praeclarum.org/webgpu-torch/tests/ you will see a huge set of tests running to verify all the supported operations. It’s a great way to see what operations are supported and what the results are.

Goals

I like to train imaging networks and to that end my goal is to get Stable Diffusion and similar nets running under this library. Once that’s accomplished I will focus on the many Hugging Face transformer networks. I’m hoping to get all of them running in the browser at CUDA speeds.

I have a set of TODOs in the README of the project. If you’re interested in helping out, please take a look!

Generating SVG Images with ChatGPT

2023-04-03T00:00:00+00:00

TL;DR I used ChatGPT, with varying levels of success, to generate SVG images. With enough patience and prompting, you can get it to draw basic shapes and simple scenes about as well as a five-year-old can. (Which, hey, let’s face it, is good enough a lot of the time.)

Background

I like the idea of using AIs to generate images. Go back in this blog and you’ll see lots of posts and casual references to imaging networks I like to build and experiment with.

With recent advances like DALL-E, Stable Diffusion, and Midjourney, it seems like the static image generation problem has been solved. Sure, there’s plenty of room for growth and improvement, but the basic problem of generating images from text is solved.

But what about vector graphics? As an app developer, I’m often in need of graphical resources. Images are neat, but they’re also so 1970s. They don’t scale well and are a terribly wasteful and inconvenient (for editing) encoding of graphical information.

Like everyone else, I’ve been using ChatGPT a lot recently, and I was curious to see if it could generate SVG code for images. I know it’s pretty good at HTML, so I figured it might be able to do SVG too. It’s certainly not something it was designed to do, but I thought it would be a fun experiment to conduct.

To my delight, it actually works. You’ll see below that the results are not always great or, um, err, good, but it’s still pretty cool and useful that it can do this at all.

As of GPT-4, I would say it’s capable of generating decent shapes and simple combinations of shapes. But I’ll let you judge.

Experiments

The following transcripts are from my first experiments generating SVGs with ChatGPT. I was curious to see what kind of images the model could generate, and how it would respond to different inputs.

I performed the experiments through the ChatGPT API with the gpt-4 model.

I used my app AskGPT to run these prompts and dutifully pasted the responses into SVG files to make this post.

Please generate an SVG of a catus [sic] out in the desert on a sunny day, no explanation.

As an AI language model, I cannot create or directly display SVG images. However, I can help you create the SVG code, which you could then use in any SVG viewer or compatible software/website to visualize the cactus in the desert.

Here’s a simple SVG code for a cactus in a desert environment. Copy and paste this code into an SVG viewer or an HTML document to display the image.

<svg width="300" height="150" xmlns="http://www.w3.org/2000/svg">
    <!-- Background color (sky) -->
    <rect width="300" height="150" x="0" y="0" fill="skyblue" />
    
    <!-- Sun -->
    <circle cx="270" cy="30" r="20" fill="yellow" />
    
    <!-- Ground -->
    <rect width="300" height="50" x="0" y="100" fill="tan" />
    
    <!-- Cactus -->
    <rect width="30" height="80" x="130" y="70" fill="green" />
    <rect width="20" height="30" x="125" y="50" fill="green" />
    <rect width="20" height="30" x="145" y="50" fill="green" />
</svg>

SUCCESS Although the initial part of the response was discouraging (why the low confidence ChatGPT?), the model dutifully generated the SVG code for a cactus in the desert.

The cactus could obviously use some work, but this image is mostly what I was expecting. I know these models are sensitive to inputs, so I decided to give it some encouragement.

OK, great. But please now generate a much more detailed image using gradients and paths

Here is a more detailed SVG code for a cactus in the desert using gradients and paths for a richer visual representation. To view the image, copy and paste the code into an SVG viewer or an HTML document.

(I’m going to elide the SVG code from now on, but you can see it by viewing the source of the images.)

SUCCESS OK, so it’s not going to win any awards with that, but I applaud the effort. The cactus has a bit more shape and the horizon is more interesting.

It looks like the model is having a hard time getting the directions of arcs correct in the SVG code. Golly, I can relate. I’ve written several SVG libraries and still I can never get the ordering right.

That said, I feel like I’m failing the network by not being a good prompter. Let’s try again with stupid amounts of prompting.

Please generate an SVG of a cactus in the desert on a sunny day. Please make it as detailed as possible. Use vibrant colors. Use dramatic lighting. I want depth! I want to feel like I’m there! I want something shockingly beautiful.

I apologize, but as an AI text model, I cannot create images or SVG files. However, I can help you understand how you can create such an image using a vector graphics software like Adobe Illustrator or Inkscape.

Open your preferred vector graphics software and create a new document.

…

FAILURE Oops, I went too far and it just started explaining to me how to use a drawing program. I guess it doesn’t like that kind of prompting.

Let’s change subjects and see how it does at generating icons.

App UI Icons

As an AI, I cannot generate an SVG file directly in this text interface, but I can provide you with the code that you can copy and paste into an SVG file, like ‘toggle_preferences_icon.svg’.

Here’s a simple SVG code example for a mac toolbar icon with two gear-like symbols representing the toggling of the selected item preferences pane:

FAILURE Haha, OK, um so those aren’t gears. E for effort.

Can we just make a single gear if we concentrate really hard?

Please generate an SVG code of a gear icon. Please make it detailed showing the teeth and such. You can use paths if you want. Make your response as long as it needs to be.

Here’s an SVG code representation of a gear icon using paths:

FAILURE I see gears aren’t its strong suit.

Let’s see if it can generate something more basic.

Please generate an SVG code of a house icon. You can use paths if you want. Make your response as long as it needs to be.

SUCCESS? Do walls need to be straight? Does it matter if your roof has a little overhang? That’s a house. I’ll take it. Even better, I think an app with a “messy icon” aesthetic would be a welcome breath of fresh air.

Enough with boring toolbar icons. Let’s see if it can make a new icon for my app iCircuit.

App Icons

Please generate the SVG code of an app icon for iCircuit, an app that enables you to draw and simulate electronic circuits. The icon should have a nice graident background and fit in well on the iOS home screen. You can use paths if you want. Make your response as long as it needs to be.

FAILURE I dig that gradient (those colors are pretty close to the actual app icon), but the logo is a bit too… abstract. I can forgive abstractness, but I can’t forgive the fact that it’s not centered properly.

Let’s try prompting our way out of this hole.

Nice, I like that gradient. But the inner logo is too weird. Maybe put a nice looking sine wave in there. Centered nicely.

FAILURE The sine wave is so close! If I were generous, I would grade this one a “SUCCESS?”, but it failed to center again, and I just can’t have that.

Art

Please generate the SVG code of a nice looking abstract art piece. You can use paths if you want. Make your response as long as it needs to be.

SUCCESS I like it. It’s very “Saved by the Bell”, very 1990s.

That gives me an idea.

Please generate the SVG code of a spaceship using awesome neon colors and rad gradients.

FAILURE Cool colors, for sure, but it’s not a spaceship.

OK, it’s not an artist. Let’s see if it can do simple shapes.

Please generate the SVG code of a red circle. Inside of that circle put a white square. Inside that, put the red text ‘THIS IS A WARNING’. Wrap the text if needed. Use some gradients to make it look good an intimidating.

SUCCESS The text isn’t centered. I get it, it’s hard. I’m expecting a lot out of a language model.

But look, it’s text! Readable text. That’s a big deal because imaging networks aren’t really known for their text prowess. For example, here are DALL-E and Stable Diffusion’s attempts at the same prompt:

At least ChatGPT gave me exactly what I wanted (I’m not thinking about centering, I’m not thinking about centering, …). I got a red circle, a white box, and readable text. Neither of the other two networks could do/did that.

Conclusion

My biggest takeaways are:

Yes, ChatGPT can generate SVG code.
It’s still in a very early stage and cannot handle much complexity.

I was hoping for better, but for a network that was not designed to do this at all, it’s not bad. I certainly look forward to GPT-5 and 6 improving at this.

I keep thinking back to the last warning sign experiment. The images generated by DALL-E and Stable Diffusion were not good. They were not even really what I wanted. ChatGPT, on the other hand, was able to generate something much closer to what I wanted.

I think this is a benefit of how it was trained. The engineers worked hard on the “alignment problem” (pun unintended) and it shows. It does what I expect. Old GPT-3 and friends were pure statistical generators. During their training, the only positive feedback they received was when their sampled output matched language use. ChatGPT, however, was trained with a bias towards being helpful and providing the kinds of answers people want. This was accomplished by having people hand-rank its responses. I have a feeling that this hand-ranking, in order to aid in the human alignment problem, is going to fuel the next few years of machine learning improvements.

I went into this wanting to see if I could get SVGs, what I think of as a more useful format for my life as an app developer, out of ChatGPT. I got that, but I also got a lot more. I got a glimpse into the future of AI.

Anyway, enough armchair philosophizing. I hope you enjoyed seeing the results of my experiments, and I hope you’ll try some of your own.

Notes

I am aware there are neural nets specifically trained to generate SVG code. DeepSVG is one example. Specialized networks give more reliable results today and will certainly be useful in the future. I wanted to experiment with ChatGPT because I am intrigued by its generalized knowledge and I wanted to see if that general knowledge produced interesting results.
None of this was done scientifically. ChatGPT is still a stochastic (random) model and its outputs can vary. I should have had it generate more samples for each prompt. But I didn’t. I just wanted to see roughly what it could do.

Transformers-js - Neural Networks in the Browser

2022-08-25T00:00:00+00:00

TL;DR I wrote a JavaScript library that lets you run modern transformer neural networks from Hugging Face 🤗 in the browser. It works on mobile browsers, desktop browsers, pretty much everywhere. Check out transformers-js on GitHub to see how it works and check out the live translation demo running on my static website: https://transformers-js.praeclarum.org.

Introduction

Currently, the best way to deploy neural networks is to pay a cloud provider to host them and pay them to run inference. The more customers you have, the more you pay. It’s an old-fashioned big-iron middle-man’s utopia.

I’m a big fan of running neural nets on everyday hardware. It makes sense to let customers, who already invested a lot of money and carbon, use their own hardware. It’s also a huge privacy win: attackers can’t steal your information if it’s never on the network (insert Intel joke here). It’s good economically, environmentally, and it’s good for security. Sign me up.

Let’s fight the big-iron trend. Let’s run neural networks in the browser!

Announcing transformers-js: a library to make running translation and other language neural nets in the browser simple.

Update (Feb 7, 2023): I’ve re-released the library as web-transformers with full NPM and webpack support.

Hugging Face 🤗 Transformers with transformers-js

Transformers are neural networks that are good at manipulating serialized symbols. Ahem, sorry. By “serialized symbols” I mean language. They do language things: Sentiment analysis, summarization, translation, transmogrification. Basically, any -ation you can think of that works with a discrete set of symbols laid out one after the other.

And you know these networks from their friendly household names: GPT-3, Copilot, DALL-E, Stable Diffusion. There seems no end to what they can do (see also the CNN revolution of 2014).

Hugging Face 🤗 has established itself as the “GitHub of Transformers”. They have an excellent unifying framework, great documentation, and good-ish hosting. I only say good-ish hosting here because I had a demo fail because their servers were down. Clouds…

In fact, it was that demo fail that got me to thinking, “why can’t I just run this thing in the browser?” That thought led me to 3 days of programming. Those 3 days produced a JavaScript library. And that JavaScript library produces some kick-ass neural translations.

I wrote transformers-js to make running transformers from Hugging Face 🤗 in the browser just as easy as running them in Python land. To do this, I leverage the amazing ONNX runtime in order to run the network. ONNX offers a browser-compatible runtime using WASM compiled from the complete ONNX opset code. That’s very powerful because it means that, if you can get your net running in ONNX, you can get it running in the browser. (ONNX also offers a WebGL backend that is much faster than their WASM backend. But you lose so much precision in WebGL that I have yet to see a network work correctly using that engine.)

But running the neural network is only half the battle. Running transformers requires more software than just the neural net. You also need text tokenization software to convert your text to tokens (symbols) and you need sampling software to convert the neural net’s output probabilities back to symbols. Transformers-js takes care of all that for you.

Tokenization

Step 1 in running a transformer is getting a working tokenizer. Each neural net is optimized to solve a problem and that means each net uses a slightly different tokenizer from each other. I thought writing the tokenizers would be a piece of cake. I’ve written hundreds of tokenizers in my career in my pursuit of programming language nirvana, but I have never run into the kind of tokenizers that data scientists have come up with.

Side tangent: did you know that modern tokenizers use classical AI approaches? Neither did I! For example, the T5 symbol list is redundant; you can encode the same sentence many many different ways. In order to correctly tokenize the sentence for input to T5, you have to find the optimal path through the redundant symbol list based on the a-priori probabilities of the symbols. It’s a graph problem, and those are hard. Fortunately, classic AI people loved graph problems and found solutions. Two AI winters ago, people thought graphing techniques would be the foundation of all future AIs. They were wrong, but it’s nice to see these old powerful algorithms live on.
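To make the “optimal path” idea concrete, here is a minimal sketch of that search (the Viterbi algorithm) over a toy unigram vocabulary. The vocabulary, its log probabilities, and the name viterbiTokenize are all invented for illustration. This is not the code inside transformers-js, and real SentencePiece models also handle normalization and unknown pieces.

```javascript
// Toy unigram vocabulary: piece -> log probability (values are invented).
const toyVocab = new Map([
  ["h", -6.0], ["e", -6.0], ["l", -6.0], ["o", -6.0],
  ["he", -4.0], ["ll", -4.5], ["llo", -5.0],
  ["hell", -7.0], ["hello", -9.5],
]);

// Find the segmentation of `text` with the highest total log probability.
function viterbiTokenize(text, vocab) {
  const n = text.length;
  // best[i]: best score for the prefix text[0..i)
  // back[i]: where the last piece of that best segmentation starts
  const best = new Array(n + 1).fill(-Infinity);
  const back = new Array(n + 1).fill(-1);
  best[0] = 0;
  for (let end = 1; end <= n; end++) {
    for (let start = 0; start < end; start++) {
      const piece = text.slice(start, end);
      if (!vocab.has(piece) || best[start] === -Infinity) continue;
      const score = best[start] + vocab.get(piece);
      if (score > best[end]) {
        best[end] = score;
        back[end] = start;
      }
    }
  }
  if (best[n] === -Infinity) return null; // no segmentation covers the text
  // Walk the back pointers from the end to recover the pieces.
  const pieces = [];
  for (let end = n; end > 0; end = back[end]) {
    pieces.unshift(text.slice(back[end], end));
  }
  return pieces;
}
```

With this toy vocabulary, “hello” can be split at least five ways, and the search settles on the pieces “he” and “llo” because their combined score (-9.0) beats the single piece “hello” (-9.5).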

Back to tokenizers. I learned all that graph theory so you don’t have to! I encoded that knowledge into code that a computer can decode to make the magic happen. Behold:

// Load the tokenizer
const tokenizer = await AutoTokenizer.fromPretrained("t5-small", "/models");

That loads a tokenizer. Currently, I only support SentencePiece Unigram models (good enough for most nets). I hope to support Byte Pair Encoding in the future (GPT’s preferred tokenization).

With that tokenizer, you can convert strings into token lists:

// Tokenize "Hello, world!"
const english = "Hello, world!";
const inputTokenIds = tokenizer.encode("translate English to French: " + english);

inputTokenIds is a list of integers that represent the symbols in the sentence. Some words are just one symbol, while other less common or longer words can be more than one symbol.

I added a little prefix to the string (“translate English to French:”) because I’m building up to a translation demo here and the T5 network, with all its advanced capabilities, needs to be told what to do.

Generation

Now that we have tokens, we can hand them off to the neural network to be run:

// Translate
const outputTokenIds = await model.generate(inputTokenIds, {maxLength:50,topK:10});

That’s it! The code takes the input tokens, runs them through the network, and returns a new list of output tokens.

That little generate function is hiding a lot of work. Most networks generate one token at a time. That means you have to run them over and over until you get the whole sentence. This can be terribly inefficient if you run the entire network over and over. Instead, you split it into pieces and run each piece only as it is needed.
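Here is a rough sketch of what such a loop can look like for an encoder-decoder network. The model object with its encode and decodeStep methods is a hypothetical stand-in, not the transformers-js API, and the default start and end token ids (0 and 1) are assumptions that follow T5's convention. The point is only that the encoder piece runs once while the decoder piece runs once per generated token.

```javascript
// Greedy generation loop over a hypothetical encoder-decoder model.
// `model.encode` and `model.decodeStep` are made-up names for this sketch.
function generateGreedy(model, inputTokenIds, options = {}) {
  const { maxLength = 50, startTokenId = 0, eosTokenId = 1 } = options;

  // The encoder only depends on the input, so it runs exactly once.
  const encoderState = model.encode(inputTokenIds);

  const output = [startTokenId];
  while (output.length < maxLength) {
    // The decoder runs once per generated token.
    const logits = model.decodeStep(encoderState, output);

    // Pick the most probable next token.
    let next = 0;
    for (let i = 1; i < logits.length; i++) {
      if (logits[i] > logits[next]) next = i;
    }
    if (next === eosTokenId) break; // the network says the sentence is done
    output.push(next);
  }
  return output.slice(1); // drop the start token
}
```

A real implementation would also cache the decoder's intermediate results between steps so that each step does less work; that is left out here to keep the sketch short.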

The generate method also has to sample from the neural network’s output probabilities. Networks are not into commitment, and will always output a variety of options. A sampling technique is needed to pick the right one.

Greedy sampling is when you just pick the highest probability option. Top-k sampling is when you randomly pick from the top k probable options. Greedy is good for when you want the most probable option. Top-k is good for when you want to inject a bit of creativity (randomness) into the results. This library supports both. I hope to add more sampling options in the future.
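The two strategies fit in a few lines each. This is a minimal sketch rather than the library's implementation: the function names are mine, and the optional random parameter exists only so the example can be run deterministically.

```javascript
// Convert raw network scores (logits) into probabilities.
function softmax(logits) {
  const max = Math.max(...logits); // subtract the max for numerical stability
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Greedy: always take the single most probable token.
function sampleGreedy(logits) {
  let best = 0;
  for (let i = 1; i < logits.length; i++) {
    if (logits[i] > logits[best]) best = i;
  }
  return best;
}

// Top-k: keep the k most probable tokens, renormalize, and draw one at random.
function sampleTopK(logits, k, random = Math.random) {
  const top = logits
    .map((logit, id) => ({ logit, id }))
    .sort((a, b) => b.logit - a.logit)
    .slice(0, k);
  const probs = softmax(top.map((t) => t.logit));
  let r = random();
  for (let i = 0; i < top.length; i++) {
    r -= probs[i];
    if (r < 0) return top[i].id;
  }
  return top[top.length - 1].id; // guard against floating point rounding
}
```

Note that top-k with k = 1 is the same thing as greedy sampling, so k works as a dial between “most probable” and “more creative”.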

Now that we have a list of output tokens, we can convert them back to a string:

// Convert output tokens to a string
const french = tokenizer.decode(outputTokenIds, true);
console.log(french); // "Bonjour monde!"

The output is “Bonjour monde!” which makes sense given our input of “Hello, world!”.

That’s it! In about 5 lines of code we executed a neural translation algorithm that ran completely in the browser.

Demo

I put the code above together into a little web app to demonstrate the library.

https://transformers-js.praeclarum.org

It’s a little translation app that is able to go from English to French, German, and Romanian.

When you run it the first time, it will take some time to download the neural network. After that, the browser can cache it and subsequent reloads will be faster.

It’s hosted on Azure Static Web Apps to demonstrate that no logic is running on the server. The server merely provides the neural network data to be downloaded and run on the browser. Pretty cool, huh?

Optimizing Models for the Browser

Now, let’s talk about a few last details to make networks in the browser run well.

Because I’m using the WASM version of ONNX, the neural network is executed on the CPU. It’s therefore beneficial to optimize it to run there. The best optimization right now is to use quantized 8-bit weights. This converts what are otherwise 32-bit floats into 8-bit integers. This is done carefully to preserve as much accuracy as possible and is honestly a bit of a dark art.
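To give a feel for the idea, here is a sketch of the simplest possible scheme: symmetric quantization with one scale factor per tensor. The function names are made up, and this is an illustration only. Production quantizers are more elaborate and make their own choices about ranges and zero points, which is where the dark art comes in.

```javascript
// Symmetric per-tensor quantization: map floats onto the integers -127..127.
function quantizeInt8(weights) {
  const maxAbs = weights.reduce((m, w) => Math.max(m, Math.abs(w)), 0);
  const scale = maxAbs === 0 ? 1 : maxAbs / 127; // float value of one integer step
  const values = Int8Array.from(weights, (w) => Math.round(w / scale));
  return { values, scale };
}

// Recover approximate floats from the 8-bit values.
function dequantizeInt8({ values, scale }) {
  return Float32Array.from(values, (q) => q * scale);
}
```

Each weight now takes 1 byte instead of 4, and the round trip is off by at most half a step (scale / 2) per weight. Whether a network tolerates that error is something you have to measure.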

Fortunately, there are lots of dark art practitioners and we can stand on their shoulders. The fastT5 library converts T5 models from Hugging Face 🤗 to quantized 8-bit models ready to run in the browser. I wrapped that library in a script as I hope to support other model types in the future.

Lastly, I highly recommend running this code in a background web worker. This will prevent the browser from locking up while it’s executing. I didn’t implement that feature in the demo web site but I would say it’s worth the effort in a production app.

Conclusion

While there will always be some big networks out there that need big servers to run them, a surprising number of neural networks work just fine in the browser. Doing so is good for economic, environmental, and security reasons.

Transformers-js was written to make running language models in the browser simple. I hope you will find it enjoyable to use and I hope you will use it to make cool things.

Practical Guide to Solving Hard Problems

2022-02-19T00:00:00+00:00

I sometimes find myself in a position of needing to write some code that I’m just not sure how to write. Been there have you? Here are the steps I take when I’m stumped. No huge revelations here, just hard-earned advice.

Think hard about the problem for a few weeks before typing any code.
Type in a function or write a class that has the inputs and outputs you need.
Break the function down into multiple steps with clear objectives. You may not know how to achieve those objectives, but that’s a problem for your future self. Right now, you’re just trying to write out the high-level algorithm.
Create a function for each of those steps and throw new NotImplementedException() in each of them. Their names should be long and explanatory and there should be no question about what’s expected of them. It’s really OK if you don’t actually know how to write ‘em.
Now, go implement a few of those functions. You know they’re not all hard. Some may even be fun! Build up your confidence and implement the easy ones. It feels good to make progress and it lets the analytical part of your brain run in the background for a bit while you focus on nitty-gritty number types and file IO.
Time to tackle some of those harder functions. Go into each of those and break the problem down into steps just like you did before. You’re right, I’m gonna say it: Rinse and repeat. Keep breaking those hard problems down into steps. Turn each of those steps into a function with a clear name. Implement the easy ones. Then break the hard ones down into steps again. Do this over and over again. You’ll be surprised how much you can actually get done.
Pretty soon (haha) you will have an 80% complete solution with just a few pesky functions left that throw NotImplemented. Now go scour your favorite package repository, or code repository, or question and answer site, or artificial intelligence programming assistant for implementations. Chances are you’re not the first person to need this particular function or widget. Find some giants, climb on top of them, and scream “Holy shit, there are a lot of smart programmers in the world!”
OK, you’ve scoured the inter webs and yet you still have a few pesky NotImplemented exceptions. It’s time to check on those scientists. Enter every SEO permutation of your problem statement into arXiv. Surely others have worked on problems related to the one you are trying to solve. They will most likely offer insights or perspective shifts that can help you reframe your problem into something solvable. Do that. Reframe your problem and knock out those NotImplementeds.
Now you’re in trouble. If you still have a few NotImplemented exceptions, and there are no giants upon which to stand nor academics obsessing over this particular field, then it’s all up to you. Think big. Think outside the box. Your career depends on it. (Just kidding, I hope.) Perhaps a bath will help you think?
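The early steps can be sketched on a deliberately small, made-up problem (finding the most frequent words in a text). Everything here is hypothetical, and the sketch is in JavaScript with a hand-rolled NotImplementedError rather than .NET's NotImplementedException. It shows the state of the code right after implementing the easy functions: the top-level algorithm reads cleanly, two steps work, and one still throws.

```javascript
class NotImplementedError extends Error {
  constructor(functionName) {
    super(`${functionName} is not implemented yet`);
    this.name = "NotImplementedError";
  }
}

// The high-level algorithm, written before knowing how every step works.
function findMostFrequentWords(text, count) {
  const words = splitTextIntoLowercaseWords(text);
  const frequencies = countOccurrencesOfEachWord(words);
  return pickWordsWithHighestCounts(frequencies, count);
}

// An easy one: implemented early to build confidence.
function splitTextIntoLowercaseWords(text) {
  return text.toLowerCase().match(/[a-z']+/g) ?? [];
}

// Another easy one.
function countOccurrencesOfEachWord(words) {
  const counts = new Map();
  for (const word of words) {
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return counts;
}

// A problem for my future self. The long name says exactly what is expected.
function pickWordsWithHighestCounts(frequencies, count) {
  throw new NotImplementedError("pickWordsWithHighestCounts");
}
```

Running findMostFrequentWords at this point throws, and the error message names the function that still needs work.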

I think these are steps all programmers take, but sometimes it’s good to spell it out.

I especially value the functional decomposition. Functions are a powerful abstraction, not just for writing less code, but for thinking about problems.

And please don’t misinterpret my use of the word “functions” to mean only those things functional programmers like. I mean any data transformer: from lowly lambdas to state-bearing IO-processing monolith objects.

Thanks for reading! Now go solve those hard problems!

The Technology of iCircuit 3D

2021-02-08T00:00:00+00:00

Today I’m pleased to announce the macOS release of iCircuit 3D! Last week I released the iOS version to a wonderful reception and I’m happy to now be able to give all the Mac users out there the same experience. iCircuit 3D has been a work of passion over the last couple years and I thought I would take a moment to describe some of the more interesting technical aspects of it.

So here is a very extended colophon for the app. Maybe you can get some ideas for your next app from it!

Programming Language and Style

iCircuit 3D is written 100% in C# using Xamarin.iOS and Xamarin.Mac. I started with C# 7 but C# 9 had already been released by the time I shipped the app. :-) The null reference checking feature of C# 8 has been especially useful in tracking down bugs.

The app is very object oriented. In fact, this may be the largest OOP app I’ve ever built. This is funny considering that I am quite an advocate for functional programming and more reactive style UI design these days. But react style programming of a real-time engine like this is not trivial and I found myself trailblazing a bit more than I like. Like it or hate it, OOP is well understood at this point.

I decided to go OOP, in some ways, to rid myself of writer’s block too. I’ve been writing OOP programs since the 1990s and I am very comfortable with the abstraction and patterns. Sometimes when you’re stuck, the best thing to do is to lean into your strengths.

Properties FTW

A mental road block I often run into when starting a new app is how to handle serialization and undo buffers. I decided to solve both problems in a generic way that would require very little on-going work. The key was to base both undo and serialization on properties of objects.

Serialization is a simple process thanks to advanced libraries like Newtonsoft.Json. I’m able to serialize and restore entire object graphs (thanks to its object reference handling) and only have to be careful that I don’t serialize more than I need to. I also chose to use the BSON serialization format so that the app could handle binary data more efficiently. (Circuits can contain binary data thanks to image and board import facilities.)

The undo (and redo) system of the app is based on property changes. When a property changes, an undo change event is registered with the OS (Apple has NSUndoManager). That event captures the property’s value before and after the change. When the user undos and redos, the system just has to figure out which of those values to restore to the property.

All of this is achieved with property definitions such as this:

public class Resistor : CircuitElement {
    double resistance = 1000;
    [Inspector (Units = "Ω", Min = 0, Max = 1e12)]
    public double Resistance {
        get => resistance;
        set => SetUndoableProperty (ref resistance, value, () => {
            InvalidateMaterial ();
            InvalidateAnalysis ();
        });
    }
}

The trick is to call SetUndoableProperty, which is very similar to the standard SetProperty found in MVVM helper libraries. It checks whether the value actually changed and, if so, calls an inline change handler (this example invalidates some things) and triggers the PropertyChanged event. SetUndoableProperty takes just one extra step and registers that change with an undo manager. I found that you want to be explicit about what gets registered with the undo manager because not all properties are set by users, and those that aren’t don’t need to be registered. That’s all that’s needed for me to get serialization, undo support, and inspector UI support (thanks to reflection).
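The pattern can be sketched outside of C# as well. The following JavaScript is a loose illustration, not the app's code: the UndoManager class is a tiny stand-in for what NSUndoManager provides, setUndoableProperty is a guess at the helper's shape, and the invalidations counter stands in for the real invalidation calls.

```javascript
// A tiny stand-in for an OS-provided undo manager.
class UndoManager {
  constructor() {
    this.undoStack = [];
    this.redoStack = [];
  }
  register(change) {
    this.undoStack.push(change);
    this.redoStack = []; // a fresh edit invalidates the redo history
  }
  undo() {
    const change = this.undoStack.pop();
    if (change) { change.undo(); this.redoStack.push(change); }
  }
  redo() {
    const change = this.redoStack.pop();
    if (change) { change.redo(); this.undoStack.push(change); }
  }
}

// Set a backing field, run the change handler, and record before/after values.
function setUndoableProperty(target, field, value, onChanged) {
  const before = target[field];
  if (before === value) return false; // nothing changed, nothing to undo
  const apply = (v) => { target[field] = v; onChanged(); };
  apply(value);
  target.undoManager.register({
    undo: () => apply(before),
    redo: () => apply(value),
  });
  return true;
}

class Resistor {
  constructor(undoManager) {
    this.undoManager = undoManager;
    this._resistance = 1000;
    this.invalidations = 0; // stands in for InvalidateMaterial/InvalidateAnalysis
  }
  get resistance() { return this._resistance; }
  set resistance(value) {
    setUndoableProperty(this, "_resistance", value, () => { this.invalidations++; });
  }
}
```

Undo and redo go through the same change handler as a normal edit, so whatever depends on the property gets invalidated no matter how the value changed.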

If you adopt the pattern of default object constructors and properties that can change any time, you can get these cross-cutting features essentially for free. I love how much power can be put into a single property.

I’ve talked about this a bit more on Merge Conflict over the last couple years. (Gosh I’ve been working on this app for a while.) Give them a listen if you’re interested in details!

Lots of Multithreading

My biggest gripe with OOP is its inability to handle shared-memory multithreading. And, honestly, the style basically guarantees your code will have race conditions. In iCircuit 3D there are 4 kinds of threads that all have to interact with each other:

The main UI thread is responsible for drawing the 2D UI and handling user interactions
The 3D render thread is responsible for drawing the environment (one for each view)
The electronics simulation thread, well, simulates things
The background threads that perform the operations that are too slow to perform on the other 3 threads

In other words, there is a lot to synchronize in a mutation heavy environment.

I ended up using .NET’s monitor-based locking primitive lock (thing) { } to do fine grained locks over critical data structures. From my past experience writing multithreading-heavy apps, I knew that the lock primitive was very efficient. Not safe, but efficient. ;-)

Safety came from designing thread-safe interfaces to objects and being very strict with myself in how callbacks are executed. The nemesis of locking is deadlocks. Deadlocks are usually caused by holding a lock while executing unknown code (from a callback, an enumerable, or an event). I sacrificed a little performance for stability by ensuring I only held locks while executing code that I was sure was callback and recursion free. It’s hard sometimes but the benefits of having thread safe code are innumerable. Also, the good news is that deadlocks are actually easier to debug than race conditions. If the app deadlocks, pause it, find what every thread is blocking on, slap yourself on the wrist, and fix the code.

I’m sure there are race conditions galore in the app, but multithreading bugs just have not been an issue… usually… But it takes diligence.

Apple Only

While C# and .NET certainly lend themselves to cross-platform code, I decided to focus primarily on Apple APIs.

In order to make progress on the app, I abandoned cross-platform support and focused on leveraging the magnificent Apple APIs. This was a great decision. Before I made this decision, I would have to consider how to do fancy things on 3 different platforms. Easy stuff is easy, but hard stuff (like 3D renderers) is, well, hard. Focusing on one platform made the app better because I could optimize that experience without spreading my time out across platforms.

The 3D Renderer

The main user interface is a 3D environment powered by Apple’s SceneKit technology. SceneKit is basically a game engine built into every version of iOS and macOS. I love it because it is both feature rich and performant. I’m continually impressed by it being able to handle whatever I throw at it (within reason of course) and continually getting graphics improvements.

Some features I take heavy advantage of include:

Thread Safety

Apple was kind enough to make SceneKit thread safe. This means I can usually access it from whatever thread I want. I did notice, however, that there are some race conditions if you don’t execute your code in SceneKit’s transactions. It’s not hard, you just have to make sure to write:

SCNTransaction.Begin ();
try {
    // Code that does SceneKit things
}
finally {
    SCNTransaction.Commit ();
}

whenever you want to manipulate rendered objects. With that pattern I could confidently manipulate the scene graph from whatever thread I wanted.

Physically Based Rendering

PBR is the new hotness in real-time realistic-looking rendering. It’s actually a technology developed for movies to make visuals look less, well, plasticky and now I’m able to run it on my phone. Amazing.

PBR requires that you provide three textures for every object: its color, its roughness, and its metalness. You can imagine what color does. Roughness lets you control how rough or shiny an object is and metalness controls the color of light reflected off the object. The fact that you can vary these levels pixel by pixel enables a lot of nice effects. This is most notable on AA batteries, which have a diffuse body, a shiny metallic stripe, then dull metallic contacts. That’s all done using a real-time generated texture.

LED Lights

Once you have a good lighting model, you need some good lights. I decided to make every LED a light source. This actually overwhelms SceneKit because it has a maximum of 8 light sources, but things are pretty fun up until that point. This is a short section, I just thought the LED lights were fun. :-)

Skeletal Meshing for Wires

One fun feature of iCircuit 3D is that its wires arc around the environment like real wires. I am using skeletal meshing, a technology that is usually used for character movement in games, to accomplish this.

I use a 3D Bézier curve that starts by matching its direction to the port it’s attached to. It then travels in a direction towards the other object and then finally curves again to meet that object’s port. It’s a lot of math to figure out (shout out to Calca for helping me) but really adds some pizzazz to the app. I move skeletal control points to lie on that curve and then allow SceneKit’s code to move the geometry of the wire to match that skeleton.
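For a rough idea of the math, here is one common way to build such a curve: a cubic Bézier whose two inner control points sit a short distance out from each port along that port's direction. This is a sketch under my own assumptions (the port shape, the reach parameter, and the function names are invented), not the app's actual construction.

```javascript
// Vectors are [x, y, z] arrays.
const addVec = (a, b) => a.map((v, i) => v + b[i]);
const scaleVec = (a, s) => a.map((v) => v * s);

// Control points: start at one port, leave along its direction, arrive along
// the other port's direction, end at the other port.
function wireControlPoints(startPort, endPort, reach) {
  return [
    startPort.position,
    addVec(startPort.position, scaleVec(startPort.direction, reach)),
    addVec(endPort.position, scaleVec(endPort.direction, reach)),
    endPort.position,
  ];
}

// Evaluate a cubic Bézier curve at parameter t in [0, 1].
function cubicBezier([p0, p1, p2, p3], t) {
  const u = 1 - t;
  return p0.map((_, i) =>
    u * u * u * p0[i] +
    3 * u * u * t * p1[i] +
    3 * u * t * t * p2[i] +
    t * t * t * p3[i]);
}

// Evenly spaced (in t) points along the wire, e.g. for placing skeleton joints.
// `count` must be at least 2.
function sampleWire(startPort, endPort, reach, count) {
  const controlPoints = wireControlPoints(startPort, endPort, reach);
  const points = [];
  for (let i = 0; i < count; i++) {
    points.push(cubicBezier(controlPoints, i / (count - 1)));
  }
  return points;
}
```

For two upward-facing ports 4 units apart with a reach of 2, the midpoint of the curve lands at a height of 1.5, which gives the wire its arc.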

Environment Maps

PBR only looks good if you give it a nice environment. A friend and I spent an evening in a craftsman’s workshop taking photographs with a cheap 360 degree camera to light this app. Then Wilderness Labs’ Bryan Costanich let me take photos of his lab to add some variety to the app. To add a bit more user control I also added SceneKit’s sky generator that tracks the current time of day, and users can select their own colors.

Custom Shaders

SceneKit has some beautiful light rendering, but sometimes you want to take control. The API makes this easy by being able to inject little shader snippets into each model by assigning a string to a material property. As someone who has been writing and using 3D renderers their whole career, this is an absolute delight. I love the freedom to just code up any graphics I want without mucking around with compiling those programs.

Dynamic Geometry

This isn’t a feature specific to SceneKit, but one that I take great advantage of in order to make parts parametric. I have some high-level modeling abstractions in code that allow me to use virtual lathes, virtual mills, and constructive solid geometry. This allows me to define the geometry of parts in code and change them as I desire. Think OpenSCAD or OpenJSCAD but better. ;-)

This is notable in electronics like the capacitor, whose shape changes depending on the capacity of the device. It also allows one to control the radius of a wheel independently of its width, for instance.

Geometric Instancing

Like any good engine, SceneKit can efficiently render multiple instances of the same geometry in one frame, and I take advantage of this. My code does its best to share as much geometry between part instances as possible while still varying material properties. This is achieved by a per-part geometry cache that uses part parameters as keys. It was tricky to set up but is worth it to have very efficient geometry loads and a high-level overview of how much I have allocated.
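The caching idea can be sketched like this. The class and its key format are my own illustration and not the app's code; the only point is that two parts with the same parameters end up sharing a single geometry object.

```javascript
// Cache of generated geometry, keyed by part type and part parameters.
class GeometryCache {
  constructor(buildGeometry) {
    this.buildGeometry = buildGeometry; // the expensive mesh generator
    this.entries = new Map();
  }

  // Build a stable key so that {radius, width} and {width, radius} match.
  static keyFor(partType, params) {
    const sorted = Object.keys(params).sort().map((name) => [name, params[name]]);
    return partType + ":" + JSON.stringify(sorted);
  }

  get(partType, params) {
    const key = GeometryCache.keyFor(partType, params);
    let geometry = this.entries.get(key);
    if (geometry === undefined) {
      geometry = this.buildGeometry(partType, params);
      this.entries.set(key, geometry);
    }
    return geometry;
  }

  // How many distinct geometries are currently allocated.
  get size() {
    return this.entries.size;
  }
}
```

Because every allocation goes through one place, the size of the cache doubles as the overview of how much geometry is alive.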

Physics Simulation

The last major feature of SceneKit that I use is its built-in physics engine. This engine has all the standard support for collision detection, gravity, and joints. iCircuit 3D makes use of all of these.

When I began the app, I focused on the interaction of picking up a part and moving it around on the work surface. I wanted to nail that touch interaction on the iPad. My thought was that if I make it fun and easy to move parts around, people will be encouraged to try building larger and larger circuits. I found that gravity was my friend. It made picking up parts and setting them down feel more real, more natural than in traditional CAD apps.

Once I had gravity, I had to have collision detection - two parts should never occupy the same space. This gives a sense of “thisness” to the parts, again, making them feel more real.

The last step was to integrate physical joints into the app. Joints are used in physics engines to glue two things together. A joint can be either rigid (like glue) or flexible (like a hinge).

I added rigid joints to the app so you could build structures and panels. There are some physical primitives that can be made “sticky” and when another object is dragged over them, they become rigidly connected. I use these rigid joints for both breadboards and for building blocks. It’s pretty interesting what you can build with it.

I do not allow users to create flexible joints freely, but I have two elements, a DC motor and a servo motor, that create implicit physical joints. Thanks to SceneKit, I can control the velocity of the joints in order to simulate those motors. It’s tricky code, and it’s admittedly a bit buggy, but it really opens up what the app can do.

Electronics Simulation

iCircuit 3D uses the same robust electronics simulation engine as iCircuit. This is a pretty standard nodal electronics simulator with some notable features:

It uses connectivity to only simulate parts that are wired to each other. This means you do not pay the cost of simulating extra parts in the environment. This is what enables you to liberally use Duplicate and Copy and Paste without having to worry about affecting the simulation engine.
It can simulate Arduinos using my custom C compiler and interpreter called, surprisingly, CLanguage.
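The connectivity idea in the first point can be illustrated with a plain graph search. This sketch is only a guess at the general shape of such an analysis, not the engine's implementation: parts are ids, wires are pairs of ids, and treating an unwired part as not worth simulating is my own assumption.

```javascript
// Group parts that are reachable from each other through wires.
// `wires` is a list of [partA, partB] pairs.
function findConnectedGroups(partIds, wires) {
  const neighbors = new Map(partIds.map((id) => [id, []]));
  for (const [a, b] of wires) {
    neighbors.get(a).push(b);
    neighbors.get(b).push(a);
  }

  const visited = new Set();
  const groups = [];
  for (const start of partIds) {
    if (visited.has(start)) continue;
    // Breadth-first search from this part collects its whole group.
    const group = [];
    const queue = [start];
    visited.add(start);
    while (queue.length > 0) {
      const part = queue.shift();
      group.push(part);
      for (const next of neighbors.get(part)) {
        if (!visited.has(next)) {
          visited.add(next);
          queue.push(next);
        }
      }
    }
    groups.push(group);
  }
  return groups;
}

// Assumption for this sketch: a part wired to nothing needs no simulation.
function partsToSimulate(partIds, wires) {
  return findConnectedGroups(partIds, wires)
    .filter((group) => group.length > 1)
    .flat();
}
```

Duplicating a part adds a new one-element group, so it costs nothing until it is wired to something.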

I am able to share the simulation engine between the original iCircuit and iCircuit 3D. This is great for me because any improvements I make to the original will be automatically imported into this engine. As a solo indie developer, that’s a lot less maintenance. And it also means all users of iCircuit will benefit from improvements to both apps.

This code sharing also allowed me to import the entire iCircuit library of parts into iCircuit 3D. That has really increased its usefulness.

The End

I hope you enjoyed this tour of iCircuit 3D technology. If you found this interesting and think the app could interest you, I hope you’ll give iCircuit 3D a try!

F# Advent - Functional Motor Control

2020-12-08T00:00:00+00:00

TL;DR I implemented a proportional-integral feedback controller to control the speed of a DC motor using a gyroscope. I wrote it in F# using a functional programming style. The work was easily ported to run on Wilderness Labs’ Meadow IoT hardware.

In this blog I would like to implement a simple but robust control theory that was taught to me in college. This blog post is one in a series of “F# Advent” posts coordinated by Sergey Tihon so I will be implementing it all using my favorite programming language F#.

Embedded controls such as these are often implemented in imperative languages like C with a lot of variable mutation.

I want to explore implementing a DC motor control algorithm in functional style instead of the imperative style. I also want to do it while embracing F#’s strong type checking and units of measure support.

I will try to spare you the heavy math but this is definitely going to be a nerdy post. Enjoy!

Motor control

Motors are some of the simplest but most entertaining devices out there to play with. You can use them to make cars, clocks, drones, and automatic fidget spinners.

The technique I’m going to describe here is a general one that can be applied to a variety of control tasks, not just motors.

It’s a three step process:

We look at the inputs and outputs
We compare the output to what we want
We modify the input to get us closer to the output we want

Easy peasy! That’s the general outline. There are of course a million ways to accomplish all of the above. So let’s start with the easy stuff: inputs.

Inputs

I am using a DC motor with two pins. When one pin’s voltage is higher than the other, the motor spins one way. When it’s lower, the motor spins the other way.

But I won’t actually be controlling the motor using voltages. Instead, I’m going to use a square wave with a varying duty cycle, a technique called PWM (pulse width modulation), to mimic different voltages.

By changing how often a digital pin is on and off, I can mimic analog voltages. If I have a duty cycle of 100%, then the motor will be given all the voltage my circuit can muster. If I give it a duty cycle of 50%, then it will get half that voltage. A duty cycle of 0% would mean the voltage is 0 V.
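The arithmetic is easy to check with a small sketch (in JavaScript for brevity, not the F# used in the rest of this post). It slices one period of the square wave into steps, switches the pin on for the first part of the period, and averages the result. The function name and the 1000-step resolution are arbitrary choices for this example.

```javascript
// Average voltage of one PWM period, computed by brute force.
function averagePwmVoltage(supplyVolts, dutyCyclePercent, steps = 1000) {
  let onSteps = 0;
  for (let i = 0; i < steps; i++) {
    // The pin is high for the first dutyCyclePercent of the period.
    // Integer-style comparison avoids floating point surprises.
    const pinIsHigh = i * 100 < dutyCyclePercent * steps;
    if (pinIsHigh) onSteps++;
  }
  return (supplyVolts * onSteps) / steps;
}
```

On a 5 V supply, a 50% duty cycle averages out to 2.5 V and a 25% duty cycle to 1.25 V, which is just supply voltage times duty cycle.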

Let’s model the motor input. I’m going to take advantage of F#’s unit system to make sure all my math and my functions are physically correct.

[<Measure>]
type percent

type MotorInput =
    {
        ClockwiseDutyCycle : float<percent>
        CounterClockwiseDutyCycle : float<percent>
    }
    member this.DirectionalDutyCycle =
        this.ClockwiseDutyCycle - this.CounterClockwiseDutyCycle
    
let makeMotorInput (cw : float<percent>) (ccw : float<percent>) =
    {
        ClockwiseDutyCycle = max (min cw 100.0<percent>) 0.0<percent>
        CounterClockwiseDutyCycle = max (min ccw 100.0<percent>) 0.0<percent>
    }

I created a helper function makeMotorInput that ensures the duty cycles are in a valid range.

While motors are controlled with two pins, they are usually easier to reason about using positive numbers to go in one direction and negative numbers to go in the other. I’m going to create a function to implement that.

let makeDirectionalMotorInput (directionalDutyCycle : float<percent>) =
    if directionalDutyCycle >= 0.0<percent> then
        makeMotorInput directionalDutyCycle 0.0<percent>
    else 
        makeMotorInput 0.0<percent> (-directionalDutyCycle)
        
[makeDirectionalMotorInput (101.0<percent>)
 makeDirectionalMotorInput (50.0<percent>)
 makeDirectionalMotorInput (0.0<percent>)
 makeDirectionalMotorInput (-101.0<percent>)]

index	ClockwiseDutyCycle	CounterClockwiseDutyCycle	DirectionalDutyCycle
0	100	0	100
1	50	0	50
2	0	0	0
3	0	100	-100

We have a lot of good code now to work with inputs, let’s look at the outputs of the motor.

Outputs

When it comes to motors we can control any of these things:

Its position
Its speed
Its acceleration

But we can only control them if we can measure them. If we want to control the position of a motor, to control, say, a door, then we need a way to measure its position. In that case, we would measure either the motor’s angular position or the position of a part of the door.

You can also control its acceleration or the force it produces against an object. But as my controls professor always said, “don’t try to control forces”. So, I’ll leave that as an exercise for the reader.

Controlling the speed of a motor is probably the most common scenario. There are multiple sensors you can use to measure the speed: you could use the same position sensors and do some math, you can use rotary encoders, you can measure minor fluctuations in the voltages produced by the motor, you can use potentiometers, … There are a lot of options.

I’m going to use a less common sensor to measure the rotation of the motor: a gyroscope. Gyroscopes measure angular velocity, just what I want. They’re more complicated to use than the other options, but I think they’re fun so I’m going to use them in this project. I will be using an MPU6050 that is capable of measuring linear accelerations and angular velocities on all three axes. I’ll just be monitoring the one rotational axis.

/// Revolutions per minute
[<Measure>]
type rpm

type MotorOutput = float<rpm>

Control Algorithm

Now that we have defined our inputs and outputs we can discuss how to control the motor to achieve a desired speed.

The standard negative feedback control loop looks like this:

Read outputs: let outputs = getOutputs ()
Calculate error: let error = desiredOutputs - outputs
Calculate new inputs: setInputs (lastInput + control error)

This three-part loop is general purpose enough to solve any control problem where you’re trying to achieve a measurable goal.

The real magic happens in the function control. It is responsible for deciding what the inputs should be to the motor based upon how far off we are from our desired speed. This kind of controller is called a regulating controller because it’s designed to bring whatever error it is given down to 0. It does that by increasing the input when the error gets larger and decreasing it when it’s smaller.

type ControlError = float<rpm>

One simple way to implement the controller is to make the input proportional to the error:

let controlWithProportionalResponse (proportion : float<percent/rpm>) (error : ControlError) =
    proportion * error

Proportional control makes a lot of intuitive sense: if we want the motor to go faster then we should give it more voltage. If we want it to go slower, then we should give it less.

This kind of control certainly works, but it has some negative side effects. First, it’s tricky to decide what that proportion should be. Trial and error is often used, though I will discuss alternatives to trial and error later. But worse than that, it’s jittery! The control will never be happy as it always has some error and will always force the motor to jitter back and forth as it tries to meet the desired output.
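How much the choice of proportion matters can be seen in a toy simulation of the feedback loop. This sketch is in JavaScript rather than F#, and the motor model is invented: it responds instantly and linearly at 10 rpm per percent of duty cycle, which no real motor does. The loop itself follows the three steps above (read, compare, adjust the last input).

```javascript
// Toy motor: speed responds instantly and linearly to the duty cycle.
// The 10 rpm-per-percent figure is invented for this sketch.
const RPM_PER_PERCENT = 10;
const motorSpeed = (dutyCycle) => RPM_PER_PERCENT * dutyCycle;
const clamp = (x, lo, hi) => Math.max(lo, Math.min(hi, x));

// Run the feedback loop and return the error seen at every step.
// `proportion` is in percent per rpm.
function runProportionalLoop(proportion, desiredRpm, steps) {
  let input = 0; // duty cycle in percent
  const errors = [];
  for (let i = 0; i < steps; i++) {
    const output = motorSpeed(input);  // read outputs
    const error = desiredRpm - output; // calculate error
    errors.push(error);
    // calculate new inputs: last input plus the proportional response
    input = clamp(input + proportion * error, 0, 100);
  }
  return errors;
}

// Count how often the error flips sign, a crude measure of jitter.
function countSignFlips(errors) {
  let flips = 0;
  for (let i = 1; i < errors.length; i++) {
    if (errors[i] * errors[i - 1] < 0) flips++;
  }
  return flips;
}
```

With this toy motor and a target of 500 rpm, a proportion of 0.05 halves the error at every step and settles smoothly, while a proportion of 0.19 overshoots the target on every single step and the error keeps flipping sign. Real motors have inertia and noisy sensors, so the numbers will differ, but the tuning problem is the same.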

What we need is a calming influence. Something called an integral controller.

Integral control doesn’t act unless the error has accumulated for a bit of time. It’s slower acting than the proportional controller, but it is easier to tune and really helps the controller to relax when it’s near the desired output. The formal definition of the controller involves integrating the error. Integrating is just a fancy word for adding a bunch of things up. We’re going to add up the error. But we’re not going to do it forever, just the last few seconds worth of error. To do that, I’m going to create a datatype that can keep track of everything.

type ErrorIntegral =
    {
        MaxErrors : int
        Errors : ControlError list
    }
    member this.TotalError = List.sum this.Errors
    member this.NumErrors = this.Errors.Length
    member this.LastError = match this.Errors with x :: _ -> x | _ -> 0.0<rpm>
    member this.Add error =
        { this with
            Errors =
                let n = this.Errors.Length
                if n < this.MaxErrors then
                    error :: this.Errors
                else
                    error :: (this.Errors.[..(n-2)])
        }
        
let makeErrorIntegral (maxErrors : int) =
    {
        MaxErrors = maxErrors
        Errors = []
    }       
        

Let’s test it out by creating one that can only hold 2 errors.

let ei = makeErrorIntegral 2
ei

MaxErrors	Errors	TotalError	NumErrors	LastError
2	[ ]	0	0	0

let ei3 = ei.Add(100.0<rpm>).Add(200.0<rpm>).Add(300.0<rpm>)
ei3

MaxErrors	Errors	TotalError	NumErrors	LastError
2	[ 300, 200 ]	500	2	300

ei3.LastError

300

We can see that it discarded the first error when the third error arrived.

Now that we have an integrator, we can write the integral control function:

let controlWithIntegralResponse (proportion : float<percent/rpm>)
                                (errorIntegral : ErrorIntegral) =
    proportion * errorIntegral.TotalError

This integral controller still has a proportion parameter because we still want to be able to tune how aggressive or passive the controller is.

We can combine these two controllers to get the best of both worlds. This proportional-integral controller is called a PI controller in control theory.

let control (kp : float<_>) (ki : float<_>) (errors : ErrorIntegral) =
    let p = controlWithProportionalResponse kp errors.LastError
    let i = controlWithIntegralResponse ki errors
    p + i
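
Here is a rough worked example of how the two terms add up. It uses a plain list in place of ErrorIntegral, and the gains are hypothetical values picked only to keep the arithmetic easy to follow.

```fsharp
// Units restated so this sketch stands alone (assumed to match the earlier definitions).
[<Measure>] type rpm
[<Measure>] type percent

// A window of recent errors, newest first, standing in for ErrorIntegral.Errors.
let recentErrors = [ 30.0<rpm>; 20.0<rpm>; 10.0<rpm> ]

// Hypothetical gains, not tuned for any real motor.
let kp = 0.5<percent/rpm>
let ki = 0.25<percent/rpm>

let p = kp * List.head recentErrors   // reacts to the latest error only: 0.5 * 30 = 15%
let i = ki * List.sum recentErrors    // reacts to the accumulated error: 0.25 * 60 = 15%
let correction = p + i                // 30%

printfn "p = %.1f, i = %.1f, total = %.1f" (float p) (float i) (float correction)
```

Because the integral here is a plain sum of samples (no multiplication by a time step), ki ends up with the same units as kp.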

Hardware Abstraction

In order to test this controller and implement it in hardware, I will define a hardware abstraction layer in the form of an interface.

type IHardware =
    abstract GetOutput : unit -> MotorOutput
    abstract SetInput : MotorInput -> unit

This is the only mutable object in the code and it’s mutable because real hardware is mutable. Once you have performed an action on it, it’s impossible to undo that action.

Let’s also define some test hardware so we can validate our algorithms.

type TestHardware (initialOutput : MotorOutput) =
    let mutable output = initialOutput
    let mutable input = 0.0<percent>
    interface IHardware with
        member this.GetOutput () =
            output <- (output + input * 100.0<rpm/percent>) * 0.5
            printfn "Test Speed: %.3f RPM" (float output)
            output
        member this.SetInput newInput =
            input <- newInput.DirectionalDutyCycle            
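
To get a feel for how this simulated motor behaves, here is a sketch of the same update rule written as a pure function. The name step is mine, and the unit types are restated on the assumption that they match the earlier definitions.

```fsharp
// Units restated so this sketch stands alone (assumed to match the earlier definitions).
[<Measure>] type rpm
[<Measure>] type percent

// One update of the simulated motor: the same formula as TestHardware.GetOutput, minus the mutation.
let step (input : float<percent>) (output : float<rpm>) : float<rpm> =
    (output + input * 100.0<rpm/percent>) * 0.5

// With the input held at 0% the speed halves on every read.
let coasting = List.scan (fun speed _ -> step 0.0<percent> speed) 100.0<rpm> [1 .. 3]
// [100.0; 50.0; 25.0; 12.5]

// With the input held at 1% the speed settles near 100 rpm,
// the fixed point of x = (x + 100) / 2.
let settled = List.fold (fun speed _ -> step 1.0<percent> speed) 0.0<rpm> [1 .. 20]

printfn "%A" coasting
printfn "%.3f" (float settled)
```

In other words, each percent of duty cycle is worth about 100 RPM once this model settles, and it takes a handful of reads to get there.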

With that simple abstraction, we can implement the control loop.

Function Control Loop

I am going to implement the control loop as a classic fold operation over an infinite sequence of time.

This architecture keeps the control loop nice and functional and makes it very easy to test different states.

In order to use fold, you need three things:

Sequence - I will use time
State - I will track the last input and the error integral
Transformer - This is the control algorithm
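
If fold is new to you, here is a tiny sketch of those three pieces on their own. The time values and the state are made up for illustration and have nothing to do with the motor yet.

```fsharp
// Sequence: three made-up time samples in seconds.
let times = seq { yield 0.0; yield 0.1; yield 0.2 }

// State: (number of samples seen, latest time).
let initialState = (0, 0.0)

// Transformer: takes the old state and the next element, returns the new state.
let transformer (count, _) t = (count + 1, t)

let finalState = Seq.fold transformer initialState times
// finalState = (3, 0.2)

printfn "%A" finalState
```

The control loop has the same shape: the state is threaded through each step and nothing outside the fold needs to be mutated.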

Let’s start by defining the control loop:

/// Seconds
[<Measure>]
type s

type ControlState = MotorInput * ErrorIntegral

let controlLoop (hardware : IHardware) (kp : float<_>) (ki : float<_>)
                (desiredOutput : float<s> -> MotorOutput)
                ((lastInput, errors) : ControlState)
                ((t, output) : float<s> * MotorOutput) =
    let error = desiredOutput t - output
    let newErrors = errors.Add error
    let newInputDutyCycle = lastInput.DirectionalDutyCycle + control kp ki newErrors
    let newInput = makeDirectionalMotorInput newInputDutyCycle
    hardware.SetInput newInput
    newInput, newErrors
    

For the sequence I am going to create a general purpose sampler that calls a function to read some data over fixed time intervals. It even blocks the thread for those intervals to keep things in real time.

let sampler (dt : float<s>) (sample : float<s> -> 'a) =
    seq {
        let sw = System.Diagnostics.Stopwatch ()
        sw.Start ()
        while true do
            yield sample (sw.Elapsed.TotalSeconds * 1.0<s>)
            System.Threading.Thread.Sleep (System.TimeSpan.FromSeconds (float dt))
    }
    
    
let time = sampler 0.1<s> (fun t -> t)
time |> Seq.take 3

index	value
0	3E-07
1	0.1036857
2	0.2037907

We can use that sampler to sample the outputs of the motor.

type MotorSampler = (float<s> * MotorOutput) seq

let motorSampler dt (hardware : IHardware) : MotorSampler =
    sampler dt (fun t -> t, hardware.GetOutput ())
    

Let’s test it out.

let testOutputs = motorSampler 0.1<s> (new TestHardware (100.0<rpm>))

testOutputs |> Seq.take 3

Test Speed: 50.000 RPM
Test Speed: 25.000 RPM
Test Speed: 12.500 RPM

index	Item1	Item2
0	3E-07	50
1	0.1064288	25
2	0.2124844	12.5

That’s working great! Now let’s put it all together.

let controlMotor (maxIterations : int) (integralIterations : int)
                 (hardware : IHardware) (kp : float<_>) (ki : float<_>)
                 (desiredOutput : float<s> -> MotorOutput) =
                 
    let loop = controlLoop hardware kp ki desiredOutput
    let initialState = makeDirectionalMotorInput 0.0<percent>,
                       makeErrorIntegral integralIterations
    let motorOutputs = motorSampler 0.1<s> hardware

    motorOutputs
    |> Seq.take maxIterations
    |> Seq.fold loop initialState
    |> ignore
    

We can now run a test to see if we can control the test hardware. Let’s try to get it up to 100.0<rpm> and then, after 1.5 seconds, bring it back down to 0.0<rpm>.

let testGoals (t : float<s>) = if t < 1.5<s> then 100.0<rpm> else 0.0<rpm>

controlMotor 30 3 (new TestHardware (0.0<rpm>)) 0.002<_> 0.0002<_> testGoals

Test Speed: 0.000 RPM
Test Speed: 11.000 RPM
Test Speed: 27.290 RPM
Test Speed: 45.323 RPM
Test Speed: 61.971 RPM
Test Speed: 75.752 RPM
Test Speed: 86.237 RPM
Test Speed: 93.616 RPM
Test Speed: 98.388 RPM
Test Speed: 101.153 RPM
Test Speed: 102.488 RPM
Test Speed: 102.887 RPM
Test Speed: 102.732 RPM
Test Speed: 102.301 RPM
Test Speed: 101.776 RPM
Test Speed: 101.267 RPM
Test Speed: 89.833 RPM
Test Speed: 73.204 RPM
Test Speed: 54.926 RPM
Test Speed: 38.115 RPM
Test Speed: 24.235 RPM
Test Speed: 13.699 RPM
Test Speed: 6.301 RPM
Test Speed: 1.529 RPM
Test Speed: -1.225 RPM
Test Speed: -2.546 RPM
Test Speed: -2.929 RPM
Test Speed: -2.761 RPM
Test Speed: -2.318 RPM
Test Speed: -1.785 RPM

That worked! You can see that the controller even overshot the mark and then had to backtrack.

You can play with the kp and ki values to see how different values affect how quickly the controller speeds up the motor.
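
If you want to experiment without waiting on the real-time sampler, here is a sketch of a pure simulation of the same loop against the test motor model. The simulate function and the doubled gains are my own illustration, and it assumes the duty cycle is never clamped along the way.

```fsharp
// Units restated so this sketch stands alone (assumed to match the earlier definitions).
[<Measure>] type rpm
[<Measure>] type percent

// Simulate the test motor and PI loop with no sleeping.
// Returns the speed read at each iteration, oldest first.
let simulate (kp : float<percent/rpm>) (ki : float<percent/rpm>)
             (window : int) (target : float<rpm>) (steps : int) =
    let folder (speed, input, errors, history) _ =
        // Same motor model as TestHardware.GetOutput.
        let newSpeed = (speed + input * 100.0<rpm/percent>) * 0.5
        let error = target - newSpeed
        // Keep only the most recent `window` errors, newest first.
        let newErrors = List.truncate window (error :: errors)
        let correction = kp * error + ki * List.sum newErrors
        (newSpeed, input + correction, newErrors, newSpeed :: history)
    let (_, _, _, history) =
        List.fold folder (0.0<rpm>, 0.0<percent>, [], []) [1 .. steps]
    List.rev history

// The gains used in the test run, held at a constant 100 rpm target.
let gentle = simulate 0.002<percent/rpm> 0.0002<percent/rpm> 3 100.0<rpm> 16

// Hypothetical doubled gains, just to compare.
let aggressive = simulate 0.004<percent/rpm> 0.0004<percent/rpm> 3 100.0<rpm> 16

printfn "gentle peak:     %.3f RPM" (float (List.max gentle))
printfn "aggressive peak: %.3f RPM" (float (List.max aggressive))
```

With the original gains the first few readings come out as 0, 11, 27.29 and 45.323, the same as the test run. From there you can change kp, ki, or the window size and compare how quickly each run reaches the target and how far it overshoots.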

Real Hardware

With the abstract interface IHardware, it’s very easy to run this code on real hardware.

To demonstrate this, I’m going to code against the Meadow F7 Board from Wilderness Labs. This board runs .NET code including F# so is a perfect digital controller for this application.

The board uses the Meadow API that has abstractions for all sorts of hardware.

open Meadow
open Meadow.Devices

type MotorMeadowApp() =
    inherit App<F7Micro, MotorMeadowApp>()

    let device = MotorMeadowApp.Device
    let i2c = device.CreateI2cBus()

    let mpu = new Meadow.Foundation.Sensors.Motion.Mpu6050(i2c)
    do
        mpu.Wake()
        mpu.StartUpdating (1000 / 50)

    let motorCwPwm = device.CreatePwmPort(device.Pins.D09, 200.0f, 1.0f)
    let motorCcwPwm = device.CreatePwmPort(device.Pins.D10, 200.0f, 0.0f)
    do
        motorCwPwm.Start()
        motorCcwPwm.Start()

    interface IHardware with
        member this.GetOutput () =
            (float mpu.YGyroscopicAcceleration) / 360.0 * 60.0<rpm>
        member this.SetInput newInput =
            motorCwPwm.DutyCycle <-
                float32 (newInput.ClockwiseDutyCycle / 100.0<percent>)
            motorCcwPwm.DutyCycle <-
                float32 (newInput.CounterClockwiseDutyCycle / 100.0<percent>)

In just 20 lines of code I was able to implement the IHardware interface. For details, I hope you’ll check out the Meadow Documentation.

Now it’s a matter of executing the control loop:

[<EntryPoint>]
let main argv =
    printfn "Motor Control from F#!"
    
    let app = new MotorMeadowApp()
    
    controlMotor System.Int32.MaxValue 3 app 0.002<_> 0.0002<_> testGoals
    
    0

Conclusion

I hope you enjoyed this functional look into control theory using my favorite language F#.

We were able to build a strongly typed code base to control the speed of a motor with minimal mutation. I hope this code demonstrates some of the benefits functional programming can bring to a space that is generally dominated by procedural programming.

This was also my first F# Advent post. I hope you enjoyed it and are staying safe and healthy!

Mac Catalyst Support for Xamarin.iOS

2020-11-10T00:00:00+00:00

TL;DR I have written a tool that will convert your Xamarin.iOS app (including Xamarin.Forms and MAUI) to a Mac app using Mac Catalyst. This makes creating a macOS app and selling it on the App Store easier than ever. If you sponsor me on GitHub, subscribe to me on Twitch, or support me through Patreon, I will send you a link to try a beta version of it.

A lot of what I’m about to discuss here was discussed on Episode 225 of Merge Conflict. Have a listen!

Twitch

I also have been live streaming this work on Twitch. You can see me getting it to work with Xamarin.Forms.

Fragmented Brain

Oh how time changes everything. When I first started programming, I learned to write apps in Visual Basic on Windows and thought it to be a wonderful canvas upon which to build some cool stuff. I thought I would spend the rest of my life learning its ins and outs, mastering its UI, building my own UIs, and being happy.

But time changes everything. I abandoned my beloved VB to learn MFC and fell in love with that. Then I learned Delphi and thought, finally, the one tool to rule them all. But I was wrong. The web came. All of a sudden I was breaking my UIs into servers and clients like it was the 1970s. And then iOS came into my life and I abandoned all of that for UIKit. But then Google wanted a piece of the action and thrust Android into the world and yet another UI framework into my brain.

I have been forced to learn too many frameworks in my pursuit to build cool apps. It’s been exhausting. :-)

Unification

So, when Apple announced a unification of their platform APIs with macOS Catalina, I was all ears.

Apple had decided to allow macOS apps to be written using UIKit as opposed to the mighty AppKit. The technology is called Mac Catalyst and is a port of Apple’s mobile libraries to work on Intel 64-bit Macs.

As I support multiple apps that release for both iOS and Mac, this decision has real economic repercussions for my business.

It will allow me to add another store (another way to sell) for my existing apps.
It will guarantee that all my future apps will also be for sale on at least two stores.
It will drastically decrease the amount of time I spend fixing UI bugs on Mac.
It will give me more time to focus on and improve the iOS versions of my apps.

All wins in my opinion. While I have grown to love AppKit, with Mac Catalyst, I’m not sure I’ll ever write another AppKit app again.

Lastly, Apple has introduced Apple Silicon Macs that will also be able to natively run UIKit. In the same way UWP is Microsoft’s universal framework, UIKit is shaping up to be Apple’s.

Just one snag: I write my apps using Xamarin.iOS and Microsoft has not yet added support for Catalyst apps. Schedules and all that.

This delay was frustrating me and the opportunity cost was killing me, so I decided to take matters into my own hands and port Xamarin.iOS to Mac Catalyst. Fortunately, Xamarin and mono and .NET are all open source - the code is right there. You just have to be crazy enough to start editing it. :-)

It wasn’t easy, but with enough effort I was able to get it all working! (I would put a GIF here, but seriously no meme could capture how I felt.)

And now it’s time to share that work with everyone.

BETA

If you would like to convert your own apps to run on Macs, I have prepared a downloadable SDK/tool to do that conversion. I am making it available to people who support my OSS work. Namely, if you:

Sponsor me on GitHub
Subscribe to me on Twitch
Support me through Patreon

Then I will give you a link to the SDK. The SDK is just a little .NET Core tool that converts the iOS version of your Xamarin app to a Mac Catalyst version. “It just works.”

This is an early version so you can also expect errors. That is to say, sometimes, it doesn’t just work. But so far it has been able to run my largest apps.

I hope you’ll give it a shot. It’s really fun seeing your apps work on Macs!

I haven’t tried releasing to the Mac App Store yet. That’s the next hurdle to overcome.

Contributing Back

I am working with Microsoft and the mono team to integrate this work into the product so everyone can use it. Schedules are still tight, and the world is still crazy so it will probably take a little time, but I’ll keep working on it!

On that front, the first big step has already been taken. I submitted a PR to the mono project, and it has been accepted!

Build makefile to support Mac Catalyst #20537

@akoeplinger Thank you, looks great!

I am so delighted! I have always wanted to be a mono hacker and now I am! :-)

Next steps are getting changes to Xamarin.iOS put in-place. These are small but require coordination with the build tooling so have to be done carefully.

In Conclusion

The future of app development is looking bright thanks to Apple’s unification efforts and Microsoft’s commitment to open source. I can’t wait to get all my apps onto Mac.

If you want to help me out, you can do so by buying my apps and by supporting my open source work. I’ll put those links here one more time. ;-)

Azure Resource Organization Cheat Sheet

2020-04-23T00:00:00+00:00

It’s taken me some time, but I think I finally have a handle on how all the various Azure resources are organized and how exactly I’m paying for things. I summarized what I know into a handy little cheat sheet that I hope will help you if you’re as confused as I was.

Azure Resource Organization Cheat Sheet PDF

Account - Who are you?

Or, more specifically, what’s your email address? Everything kicks off with a Microsoft account registered through one of their many services. You probably have one already.

Active Directory - Who are you working with?

Active Directory is used for authentication - especially authentication for Microsoft services. One was probably created for you when you set up Azure with your Account. If you ever see all your Subscriptions or Services disappear, it’s probably because you have selected the wrong AD.

Subscription - How are you going to pay?

This is how you tell Microsoft how you’re going to pay for all the goodies in the next sections. If you want to split Services over multiple payment methods, you’re going to need multiple Subscriptions.

Resource Group - How do you group things?

This is your moment to impose order on the chaos and create groupings of your own desire. Resource Groups are convenient for moving many Services from one Subscription (payment method) to another or for deleting several Services at once.

Service Plan - How much do you want to pay?

Or, how big of a machine do you want? This is where you decide how much compute power, storage space, memory, and bandwidth you’re willing to pay for. Prices range from free for dev and test to much much more. You also need to choose where in the world the server will be located.

Services - What do you need?

Now you can add Services to the machine you selected in the Service Plan or to a Resource Group (which one depends on the service). There are literally billions of services to choose from. Each one will add its own costs, based on usage and daily rates, to the credit card on the Subscription.

Appstat - App Sales and Analytics Menu

2020-02-20T00:00:00+00:00

TL;DR I have written an app, Appstat, that I think every Apple app developer will want. It is a macOS status menu app that displays all your apps that are for sale on the App Store plus a whole bunch of data about them including proceeds, sales, ratings, and uploaded builds.

It is an app that I have always dreamed of, so I built it, and I know you will love it! You can buy Appstat on gumroad now!

If you are an Apple app developer I’m sure you’ve come up with your own system to track sales. When I started out, I used a Mac app to show my sales reports - it had graphs and was wonderful. But then that app got expensive and less useful and I stopped using it.

So for years I’ve been logging into App Store Connect and clicking around furiously to find the information I needed. It’s a lot of clicking, and I found myself checking reports less and less often - certainly not casually.

But this isn’t good. In fact, it is terrible! I’m an independent app developer and need to stay on top of things. I want to make sure my apps are performing well and that customers are happy with them. My livelihood depends on it.

Enter James Montemagno. After recording an episode of Merge Conflict, we were chatting about what our ideal sales and builds status app would be. We both fell in love with the idea of a little menu that, when clicked, would show you all the important information about all your apps, at a glance. No more web pages with passwords, no more progress bars, just glorious data.

I decided that this app needed to exist - I, at least, needed it to exist. So I wrote it. And now I’m making it available for sale to you. I hope you love it. :-)

Amazing Features

What does the app show you? EVERYTHING. It shows you everything:

Name and platform
Average proceeds per day (including in-app purchases)
Average unit sales per day (including free apps)
Bundle identifier and SKU
Your app’s website
Original release date and time on the store
Latest version number and time on the store
Rate at which people rate your app (ratings/month)
Latest version rating
Overall app rating
Device distribution based on purchases
Country distribution based on purchases
Chart of proceeds over time
Average proceeds per day
Projected proceeds per month and year
Chart of unit sales over time
Latest builds uploaded to TestFlight

All stats are linked to the appropriate section of Apple’s developer site, making Appstat a great launchpad.

Total store proceeds are also displayed:

Average total proceeds per day (and standard deviation)
Projected total proceeds per month (estimated using a configurable amount of history)
Projected total proceeds per year

Amazing Security

I can’t expect you to use an app like this unless I take security very seriously, and I did.

#1 No usernames or passwords

This isn’t the 90s. Instead, all requests are signed using a personally generated API Key. This gives you absolute control without ever exposing your usernames or passwords. This also relieves me of having to ask for that sensitive information.

#2 No data is ever transmitted from your local machine

This is obvious, but in 2020 I guess I need to state it. I don’t even collect analytics.

#3 No data is ever cached to disk

Disk caches are security holes so I don’t use them. Instead, the app downloads all data afresh when started. Don’t worry, this process is very fast.

#4 All account settings are encrypted in Keychain

You need to provide an API Key to access data and that key is encrypted locally on your machine in your login Keychain.

Selling on Gumroad

I’m selling Appstat on the wonderful gumroad. This is my first time selling there and so far I am very impressed with their service. I guess you should follow me on there, or something.

I also unlocked a Mac developer achievement with this app by integrating the Sparkle framework to do app updates. I feel like this is a real milestone in my career.

Colophon

When you write a security-conscious app with multiple data sources and sums over currencies, you want to use a language with strongly typed data and immutability - a functional programming language - you want to use F#. And so I did. The data architecture is the reactive one I’ve been presenting at conferences over the last 5 years.

Appstat was written using Xamarin.Mac to keep it good looking and fast and was written using Visual Studio for Mac. Big thanks to everyone on the Xamarin team for making it possible to use my favorite language on macOS!
