praeclarum

Coding with Local Agents on an RTX 3090

2026-05-05T00:00:00+00:00

TL;DR Running coding agents on local machines has never been easier. This article gives easy setup instructions for running Qwen 3.6 27B on an RTX 3090 in Linux. I then show how to use the model in VS Code using the LLM Gateway extension. By the end of this guide, you’ll be free of service providers and able to run a variety of OSS models.

Update (May 16, 2026): Over a week has gone by since posting this, so of course everything has changed. Well not everything but MTP is the new performance hotness and I’ve added a section about how I use it at the end.

Overview

There are roughly two steps to running a local coding agent:

Get the model up and running serving the standard chat API.
Connect the model to your coding environment (e.g., VS Code).

There are hundreds of different OSS models, and hundreds of different model servers to choose from. You have, frankly, an overwhelming number of options to fulfill step #1. That said, if you’re looking to run these models on consumer-grade hardware, you will be looking at models in the 7B-31B parameter range. Here is one site, of many, that tries to rank these beasts: Artificial Analysis

For this guide, I will focus on Qwen 3.6 27B from Alibaba since it works well-enough. But Gemma 4 31B from Google is a champ and is worth also looking at.

There is a wonderful arms race happening with model servers right now too. A model server is a giant math library, optimized into oblivion, that deigns to run an HTTP server so it can service requests. But it also has one more crucial component: a caching layer that keeps as much of chat conversations in GPU memory as possible in order to minimize latency and compute time - the KV cache.

For this guide, I will focus on llama.cpp since it is pretty popular, easy to use, and has good GPU support. But there are a number of other servers that are worth looking at, including vLLM, Ollama, MLX-LM, MTPLX, and on and on.

Download the Model

This is both the easy part, and the hard part. Easy, because all you have to do is go to Hugging Face and download any of the thousands of models available. It’s hard because there are so many models! There’s model families, model sizes, model fine tunes, model quantizations, model formats. Oh my!

Most inference engines (like llama.cpp) support a specific set of model formats, so that will narrow down your options. For llama.cpp, the supported format is GGUF, so you’ll want to look for models in that format. For MLX models (to run on Apple Silicon), you’ll look in the mlx-community.

You’ll now need to pick a quantization size. Quantization is a compression method for model weights. If we took a 27 billion parameter model with 32-bit floating point weights, it would be 27B * 32 bits = 108 GB in size. Unless you have a datacenter handy, you won’t be running that. Instead, you’ll choose, say a 4-bit quantized model. This will compress the weights down to 27B * 4 bits = 13.5 GB, which is much more manageable for consumer hardware. The tradeoff is that quantization can reduce the model’s performance and accuracy, but it’s often a necessary compromise.

Now the RTX 3090 has 24 GB of VRAM so you might be tempted to pick a higher-bit quantization, but you have to keep in mind that the context and the output also have to fit in GPU memory. If you want long contexts and long outputs, you might have to go with a lower-bit quantization to ensure everything fits.

The Q4_K_M quantization format is a good compromise for a 27B model and a 24 GB GPU. So I’m going to download the Qwen 3.6 27B Q4_K_M model from Hugging Face:

wget "https://huggingface.co/unsloth/Qwen3.6-27B-GGUF/resolve/main/Qwen3.6-27B-Q4_K_M.gguf?download=true"

(wget is a little dumb, so you’ll need to rename the file after downloading it since it doesn’t handle the ?download=true part of the URL very well.)

Build llama.cpp

You can download prebuilt libraries of llama.cpp but if you want to ensure its optimized for your machine and hardware, you’ll want to build it yourself. Thankfully, it’s pretty easy to do:

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j 8

Aside from the nastiness of having to use CMake, building software doesn’t get much easier than this.

I passed the -DGGML_CUDA=ON flag to ensure that I get NVIDIA CUDA support, which is crucial for running these large models on consumer-grade hardware. If you’re on an M-series Mac, you would want to pass -DGGML_METAL=ON instead to get support for Apple’s Metal API.

If all goes well, you will have a nice, shiny build/bin/llama-server executable that you can use to serve your model.

Run the Server

You will want to run the server with a delicious soup of command line arguments. Something like this:

./build/bin/llama-server -m ~/Downloads/Qwen3.6-27B-Q4_K_M.gguf --host 0.0.0.0 -ngl 99 -c 262144 -fa on --cache-type-k q4_0 --cache-type-v q4_0

Let’s deconstruct that soup:

Argument	Description
`-m`	The path to the model file you downloaded.
`--host 0.0.0.0`	This tells the server to listen on all network interfaces, which is necessary if you want to connect to it from another machine (e.g., your dev machine).
`-ngl 99`	This sets the number of GPU layers to use. Setting this to 99 tells the server to use as many GPU layers as possible, which will maximize performance.
`-c 262144`	This sets the context size to 262,144 tokens, which is the maximum context size for this model. You can adjust this based on your needs and GPU memory constraints.
`-fa on`	This enables the “faster auto-regressive decoding” feature, which can improve performance.
`--cache-type-k q4_0 --cache-type-v q4_0`	This sets the quantization type for the KV cache to `q4_0`, which is a good choice for performance and memory efficiency.

Notice how we are quantizing the KV cache (context and outputs) as well. This is a crucial step for ensuring that the model runs efficiently on consumer-grade hardware, as the KV cache can consume a significant amount of GPU memory.

You’ll be greeted with typical programmer excretions:

ggml_cuda_init: found 1 CUDA devices (Total VRAM: 24159 MiB):
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, VRAM: 24159 MiB
main: n_parallel is set to auto, using n_parallel = 4 and kv_unified = true
build_info: b9026-a817a22bc
system_info: n_threads = 6 (n_threads_batch = 6) / 12 | CUDA : ARCHS = 860 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 | 
Running without SSL
init: using 11 threads for HTTP server
start: binding port with default address family
main: loading model
srv    load_model: loading model '/home/fak/Downloads/Qwen3.6-27B-Q4_K_M.gguf'
common_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on
common_params_fit_impl: getting device memory data for initial parameters:
common_memory_breakdown_print: | memory breakdown [MiB] | total    free     self   model   context   compute    unaccounted |
common_memory_breakdown_print: |   - CUDA0 (RTX 3090)   | 24159 = 23257 + (21388 = 15345 +    5206 +     836) +      -20486 |
common_memory_breakdown_print: |   - Host               |                   1214 =   682 +       0 +     532                |
common_params_fit_impl: projected to use 21388 MiB of device memory vs. 23257 MiB of free device memory
common_params_fit_impl: will leave 1868 >= 1024 MiB of free device memory, no changes needed
common_fit_params: successfully fit params to free device memory
common_fit_params: fitting params to free memory took 0.66 seconds
llama_model_loader: loaded meta data with 51 key-value pairs and 851 tensors from /home/fak/Downloads/Qwen3.6-27B-Q4_K_M.gguf (version GGUF V3 (latest))

Congratulations. You’re now an AI service provider. I recommend getting some seed capital and start selling access to your model to the highest bidder.

But before you do that…

Install LLM Gateway in VS Code

I rock VS Code for all my coding needs, and I want to be able to use my local model in its AI agent chat window thingy. To do that, I need to install an extension that connects VS Code to the standard chat API. (Why VS Code doesn’t support the API standard that literally every LLM server provides is beyond me.)

ANYWAY, I like the LLM Gateway extension by Andrew Butson.

Install that extension.
Open the “GitHub Copilot LLM Gateway: Configure Server” UI from the command palette and enter the URL for your server (e.g., http://my-awesome-server.local:8080).
Test the connection with the “GitHub Copilot LLM Gateway: Test Server Connection” command. It should say “Found 1 model(s)” if everything is working. (If it’s not working, email James Montemagno and ask him for help.)
Open the “Chat: Manage Language Models” UI from the command palette. You should see your model listed but it will probably be grayed out for some reason. Click it, click the eye ball (gross!), and it should now be active and ready to use in the chat window.
Open the chat window, and click the model selector. Choose “Other Models”, scroll, and scroll, looking for your model. It’s there somewhere. I promise. You might doubt it, but have faith. When in doubt, keep scrolling. You can do it. You found it! Click it, and now you can use your local model in the chat window!

MTP for Speed

(Added May 16, 2026)

Since posting this article, Multi-Token Prediction (MTP) has been released, and it is a game changer for performance. MTP is a new parallelism method that somehow makes things faster by doing more work. Weird, I know. Instead of the purely serial operation of (1) generate a token, (2) add it to the context, (3) GOTO 1, MTP uses a much smaller model to quickly do the 1-2-3 dance for a few tokens and then uses the real model to verify the results. Given the nature of these beasts, the smaller model takes up very little wall time but still has OK-ish accuracy. The big model, instead of being just a generator, is used to verify the probabilities of these new tokens. It can do that quickly because it doesn’t need to test them one at a time, but can test them all in parallel (vs serial). Since modern LLMs are memory bandwidth bound, not compute bound, this parallel execution is “free”. The result is a huge speed boost, about 1.4x-1.8x. It’s a crazy hack, and I’m here for it.

As of this writing, MTP is still a work-in-progress for llama.cpp, but it is available in PRs and forks. I’m compiling using am17an’s fork.

./build/bin/llama-server -hf unsloth/Qwen3.6-27B-MTP-GGUF:UD-Q4_K_XL --host 0.0.0.0 -c 150000 -ngl 99 -fa on --cache-type-k q4_0 --cache-type-v q4_0 --temp 0.6 --top-p 0.95 --top-k 20 --presence-penalty 0.0 --min-p 0.00 --spec-type draft-mtp --spec-draft-n-max 2

The important new args are:

Argument	Description
`--spec-type draft-mtp`	This tells the server to use the MTP parallelism method.
`--spec-draft-n-max 2`	This sets the maximum number of tokens to predict in parallel. 2 is a very conservative choice.

I’ve also modified a few other sampling parameters based on recommendations for coding environments:

Argument	Description
`--temp 0.6`	This sets the temperature to 0.6, which is a good choice for coding tasks as it encourages more deterministic outputs while still allowing for some creativity.
`--top-p 0.95`	This sets the nucleus sampling parameter to 0.95, which helps to ensure that the model generates more relevant and coherent code by focusing on the most probable tokens.
`--top-k 20`	This sets the top-k sampling parameter to 20, which limits the number of tokens considered at each step to the 20 most likely, further improving the relevance of the generated code.
`--presence-penalty 0.0`	This sets the presence penalty to 0.0, which means that the model will not be penalized for generating tokens that have already appeared in the context, which can be beneficial for coding tasks where repetition of certain tokens (e.g., variable names, function names) is common.
`--min-p 0.00`	This sets the minimum probability threshold to 0.00, which means that the model will consider all tokens regardless of their probability, allowing for a wider range of potential outputs.

I also switched to the Qwen3.6-27B-MTP-GGUF model, which includes the smaller MTP model needed for the parallel token prediction. And I switched to the UD-Q4_K_XL quantization format, because guessing which quantization format to use is half the fun of self-hosting models.

Is it Worth It?

What does an RTX machine cost these days?

Component	Price (USD)
RTX 3090	$1500
CPU	$300
64 GB RAM	$700 (what has the world come to?)
HDD	$200
PSU	$150
Case	$100
Total	$2,950

So for about $3,000 you can have your very own local coding agent. That’s a pretty hefty price tag, but it’s also a one-time cost.

In a typical day, I burn through about 50,000,000 tokens. 500,000 output tokens, 1,750,000 input tokens, and the rest are cache hits. At 40 tok/second (typical for my RTX), my compute day is about (500,000 + 1,750,000) / 40 = 56,250 seconds, which is about 15.6 hours of compute time per day. Ugh.

Right now, you can use DeepSeek for $3.48 per 1,000,000 output tokens, $1.74 for inputs, and $0.0145 for cache hits. So my daily cost would be (500,000 / 1,000,000) * 3.48 + (1,750,000 / 1,000,000) * 1.74 + (47,750,000 / 1,000,000) * 0.0145 = $5.48 per day. That’s about $1,400 per year (five day work weeks). So in about 2 years, I would recoup the cost of running my own local agent. Hmmm…

So you might not want to run out and buy your own server. But, if you do have an over-provisioned gaming rig, well you might as well put it to use doing something useful. ;-) X

Conclusion

Since 2017 I have been advocating running local models. I’m amazed that it’s now possible to run 27B parameter variants on consumer hardware. (In my mind, 7B is still tremendous.) These are real models, able to write good code, in a fully agentic harness. Amazing.

While the up front hardware cost, the noise of fans, and the slower response rates are not ideal and don’t make this an easy win, I have a different perspective. AI coding has changed how I work. Permanently. I do not want to go back to writing every line of code by hand, it seems absurd now. But I also don’t like being at the mercy of large cloud providers. Having the ability to run my own local agent, even with its limitations, is a huge win for me. I know, even with no internet connection, I can still do what I love: code.

Colophon: Written by hand. Proofread and edited by Qwen 3.6 27B running on an RTX 3090.

AppReviewFetch - Because App Store Connect Shouldn’t Require a Web Browser

2026-01-21T00:00:00+00:00

TL;DR I wrote a .NET library, CLI tool, and MCP server for fetching app reviews from App Store Connect (with Google Play support in the works). You can use them yourself with the NuGet packages AppReviewFetch, AppReviewFetch.Cli, and AppReviewFetch.Mcp. The MCP server is especially exciting because it lets AI assistants analyze your reviews for you.

If you’re an Apple app developer, you’ve done The Dance. You know the one. Open Safari. Navigate to App Store Connect. Click through seventeen dropdown menus. Wait for the world’s slowest web app to load. Click “My Apps.” Wait again. Click your app. Wait. Click “Ratings and Reviews.” Wait. Squint at a paginated table that shows you 10 reviews at a time. Wonder what your users are actually thinking. Give up and go make coffee.

This is 2026. I have apps that have been on the App Store for well over a decade. I have accumulated a lot of reviews. I want to read those reviews. I want to search those reviews. I want to understand what my users love and what makes them cry. And I don’t want to do any of that through a web browser that feels like it’s running on a Pentium II.

So I built AppReviewFetch.

The Boring Part (The Library)

At its core, AppReviewFetch is a .NET library for talking to the App Store Connect API. It handles the JWT authentication dance (which, let me tell you, is a dance), fetches your apps, fetches your reviews, handles pagination, and gives you nice strongly-typed objects to work with.

dotnet add package AppReviewFetch

using AppReviewFetch;

var service = new AppStoreConnectService();
var reviews = await service.GetReviewsAsync("YOUR_APP_ID", new ReviewRequest
{
    SortOrder = ReviewSortOrder.NewestFirst,
    Limit = 100,
    Country = "US"
});

foreach (var review in reviews.Reviews)
{
    Console.WriteLine($"{review.Rating}/5 - {review.Title}");
    Console.WriteLine(review.Body);
}

Is this exciting? Not particularly. It’s plumbing, good plumbing, I like to think. Clean interfaces, proper exception handling, support for dependency injection are all good plumbing. It’s the kind of code that has to exist so that more interesting things can be built on top of it.

If you want an easy API to build your own tools, this is it.

But I know you didn’t come here for plumbing.

The Actually Useful Part (The CLI)

Here’s my pitch: what if you never had to open App Store Connect in a web browser again to check your reviews?

dotnet tool install -g AppReviewFetch.Cli
arfetch

That’s it. Now you have a beautiful, interactive REPL that does everything the web interface does, except faster and without making you want to throw your computer out the window.

arfetch> list
┌────────────────────────────────────────────────────────────────────────────┐
│                              Available Apps                                 │
├──────────────┬─────────────────────────┬─────────────────────────┬─────────┤
│     App ID   │          Name           │        Bundle ID        │  Store  │
├──────────────┼─────────────────────────┼─────────────────────────┼─────────┤
│  123456789   │        iCircuit         │ com.krueger.icircuit    │  Apple  │
│  987654321   │          Calca          │ com.krueger.calca       │  Apple  │
└──────────────┴─────────────────────────┴─────────────────────────┴─────────┘

arfetch> fetch iCircuit
⭐⭐⭐⭐⭐ "Best circuit simulator ever!"
"I've been using this app for years and it just keeps getting better..."

⭐⭐⭐⭐☆ "Great app, one suggestion"
"Love the Arduino support! Would be nice if it could also simulate..."

arfetch> fetch iCircuit US
[Shows only reviews from the United States]

arfetch> export reviews.csv
✓ Exported 342 reviews to reviews.csv

You can search by app name, bundle ID, or app ID. You can filter by country. You can export to CSV for further analysis. The reviews are beautifully formatted with colors and star ratings. Developer responses are included. Pagination is handled with a simple “more” prompt.

It’s the kind of tool I always wished existed. So I built it.

Setup takes about two minutes: you need an App Store Connect API key (which Apple now makes reasonably easy to generate), and you run arfetch setup which walks you through entering your credentials. Done.

The Wild Part (The MCP Server)

Okay. Here’s where things get fun.

Model Context Protocol is a standardized way for AI assistants to access external data sources. GitHub Copilot supports it. Claude supports it. And now AppReviewFetch supports it.

dotnet tool install -g AppReviewFetch.Mcp

Add a few lines to your VS Code settings or Claude Desktop config, and suddenly your AI assistant can read your app reviews.

This is where things get really interesting.

You can ask GitHub Copilot, “What are users complaining about in my app?” and it will actually go fetch your reviews, analyze them, and tell you. Not hallucinated reviews from its training data. Your actual reviews. Right now. In real time.

Things You Can Do That Sound Made Up But Aren’t

“Analyze the sentiment of my reviews over the past month and tell me if things are getting better or worse.”

The MCP server will fetch hundreds of reviews, compute rating distributions, analyze trends, and give you a comprehensive report. No clicking through web pages. No exporting CSVs and opening them in Excel. Just ask.

“Show me all the 1-star reviews and categorize them by complaint type.”

Your AI assistant will identify patterns you might have missed. “42% of negative reviews mention performance issues on older devices. 28% mention difficulty with the tutorial. 17% are people who clearly don’t understand what the app does.”

“Draft responses to this week’s negative reviews.”

AI assistants are pretty good at writing polite, helpful customer service responses. Now they can do it with full context of what the user actually said.

“Compare my review sentiment to what users are saying on Reddit about my app.”

Combine the MCP server with web search and you’ve got a comprehensive view of user sentiment across platforms.

“Every morning, give me a summary of new reviews across all my apps.”

Set up a simple automation and never be surprised by a sudden influx of negative reviews again.

“Find reviews that mention crashes and correlate them with my crash reports.”

Cross-reference user-reported issues with your crash analytics to prioritize bug fixes.

The Really Wild Ideas

Here’s where my imagination starts running ahead of my implementation:

Automated Triage: Have an AI assistant monitor reviews and file GitHub issues for reported bugs, tagged by severity based on the language used.
A/B Testing Insights: After a release, ask “Are users responding better to the new design?” and get an answer based on actual review text analysis.
Competitive Analysis: If you’re analyzing reviews for your own apps, what about… other apps? (The API only gives you access to your own apps, but the architecture is there for when that changes.)
Multi-Language Sentiment: Reviews come in many languages. AI is pretty good at understanding all of them. “What are my Japanese users saying that’s different from my American users?”
Proactive Outreach: Identify users who left thoughtful negative reviews and might be good beta testers for new features.

I’ve built the foundation. The MCP server exposes three tools—ListApps, FetchReviews, and AnalyzeReviews—and the AI takes it from there. The creativity is limited only by what you ask it to do.

The Setup

Getting started is straightforward:

Get an App Store Connect API Key
- Go to App Store Connect → Users and Access → Keys
- Generate a key with App Manager, Customer Support, or Admin access
- Download the .p8 file and note your Key ID and Issuer ID

Install and Configure

dotnet tool install -g AppReviewFetch.Cli
arfetch setup

For MCP: Add the server to your AI assistant’s config:

{
  "servers": {
    "appreviewfetch": {
      "command": "arfetch-mcp"
    }
  }
}

That’s it. The CLI and MCP server share the same credentials file, so configure once, use everywhere.

The Three Packages

AppReviewFetch - The core library. Use this if you’re building your own tooling.
AppReviewFetch.Cli - The interactive CLI. Use this if you want a better alternative to App Store Connect’s web interface.
AppReviewFetch.Mcp - The MCP server. Use this if you want AI to do the analysis for you.

All three are MIT licensed and available on GitHub.

What’s Next

Google Play support is in early beta. The API is different (shocker), and Google’s authentication is its own special adventure, but the architecture is already there—the library is built around an IAppReviewService interface that can support multiple stores.

I also want to add Windows Store support, but I haven’t looked into that API yet.

There are of course other sources of feedback for apps, email and social media being the big ones. If I can find reliable APIs for those, I may add them as well.

The Real Point

I’ve been an iOS developer since 2008. Eighteen years. That’s a lot of reviews. That’s a lot of time spent in App Store Connect’s web interface, clicking and waiting and clicking and waiting.

Developer tools should respect developers’ time. They should be fast. They should be automatable. They should integrate with the workflows we’re already using.

App Store Connect is none of those things. So I built something that is.

And now that AI assistants can actually understand and act on data from external sources, the potential for what we can do with our apps’ feedback has exploded. The reviews are just sitting there. They’re full of insights. We just need better ways to access them.

AppReviewFetch on GitHub

NuGet Packages

The Many Ways to Deploy iOS Apps in 2025

2025-10-21T00:00:00+00:00

TL;DR There are no less than 4 different ways you can deploy your iOS app to a physical device for testing. I enumerate all of them below, along with their pros and cons. In the end, I recommend using dotnet build -t:Run if you can, as it is the simplest and most reliable method. But knowing the alternatives can be useful in certain situations.

From None to Many

Just a couple years ago, I didn’t know any of the ways to deploy my .NET/MAUI iOS apps to a physical device. I only knew how to use Visual Studio for Mac’s built-in run button, which worked fine for me. Until they killed it.

At present, there are only two IDEs for macOS that support .NET MAUI iOS development: JetBrains Rider and Visual Studio Code. Rider should be nice, but has terrible bugs where it will sometimes try to deploy iossimulator builds to the device, or that persistence bug where it fails to rebuild apps when dependencies change, or it just doesn’t detect devices at all. It’s not at all reliable. Visual Studio Code should be nice too, but it basically requires that you write MAUI apps for anything to work. If you’re like me and prefer to build native UIs, then you’re out of luck.

But there is good news, and I am here to deliver it. There’s not one but four different ways you can deploy your iOS app to a physical device for testing, none of which require an IDE. Here they are, in order of my preference with my own heart-felt pros and cons.

1. `dotnet build -t:Run`

It turns out, good old MSBuild has a built-in task to do exactly what we want. You can use the dotnet build command with the -t:Run target to build and deploy your app to a connected device in one step. How is this different from dotnet run? I don’t know. Don’t ask me, ask Microsoft.

You can read all about dotnet build -t:Run in the official “Launch the app on a device” documentation.

Just run:

dotnet build -t:Run -f net9.0-ios -p:RuntimeIdentifier=ios-arm64 -p:_DeviceName=MY_SPECIFIC_UDID MyApp.csproj

and replace MY_SPECIFIC_UDID with your device’s UDID and MyApp.csproj with the name of your project file (or elide it if you’re in the project’s directory). Adjust the -f target framework to match your project (for example, net8.0-ios).

A UDID is a Unique Device Identifier, a unique string that identifies your iOS device. It’s like a UUID, but there’s a D instead of a U.

How to Find Your Device’s UDID

Don’t you worry, here are 4 easy ways to find your device’s UDID:

Xcode: Open Xcode, go to “Window” > “Devices and Simulators”, and select your device. The UDID will be listed there.
xcrun: Buried in every Xcode installation is the xcrun command line tool. You can use it to list connected devices and their UDIDs with this command:
```
 xcrun devicectl list devices --hide-default-columns --columns Name --columns UDID
```
Output will look something like this:
```
 Name                    UDID                     
 ---------------------   -------------------------
 Precious XXIII          00008030-000E409C0E10802E
 Precious XXIV           00008103-001829AE14BB001E
 Precious XXVII          00008140-000A0DC83013C01C
 Precious XXVIII         00008150-001C2C3E0EFB801C
```
The only problem is that it will only report modern devices with their modern UDIDs. If you have devices running iOS 15 or earlier, Apple doesn’t support them with devicectl, so you’ll have to use one of the other methods.
mlaunch: Buried so deeply in the .NET iOS workloads that you’ll need an oxygen mask to find it, is another wonderful tool called mlaunch. You can use it to list connected devices and their UDIDs with this command:
```
 mlaunch --listdev
```
Output will look something like this:
```
 Precious XXIII: 00008030-000E409C0E10802E
 Precious XXIV: 00008103-001829AE14BB001E
 Precious XXVII: 00008140-000A0DC83013C01C
 Precious XXVIII: 00008150-001C2C3E0EFB801C
 Precious XVIV: c52a6fd19cc179aad6696abe67cce53705bf22d0
```
(along with a bunch of errors). This works for all devices, old and new. Don’t make fun of the XVIV, Roman numerals are hard.
ios-deploy: Thanks to a few saints masquerading as software developers, there is the ios-deploy tool available one agonizing brew install ios-deploy away. You can use it to list connected devices and their UDIDs with this command:
```
 ios-deploy --detect
```
Output will look something like this:
```
 [....] Found 00008103-001829AE14BB001E (J517AP, iPad Pro 3G (11"), iphoneos, arm64e, 26.0, 23A340) a.k.a. 'Precious XXIV'
 [....] Found c52a6fd19cc179aad6696abe67cce53705bf22d0 (D11AP, iPhone 7 Plus, iphoneos, arm64, 15.7, 19H12) a.k.a. 'Precious XVIV'
```
This works for all devices, old and new so long as they are available for deployment. You see, the other tools report devices that are paired to the host machine, even if they are not available for deployment (e.g. locked, no trust relationship, etc.) but ios-deploy only reports devices that are actually available for deployment. This can be a blessing or a curse depending on your mood.

2. `mlaunch`

Buried so deeply in the .NET iOS workloads that you’ll need a spatula and a crowbar to find it, is the wonder aptly named mlaunch. You can use it to deploy your app to a connected device in two steps: first install the app, then launch it.

There is no documentation for mlaunch, but if you run it with the --help flag, you can see the myriad of options available to you. We shall discuss the most mundane ones here.

To install your app, you’re first going to need to build it. You can do this with the following command:

dotnet build -f net9.0-ios -r ios-arm64 MyApp.csproj

Now you can install the app with this command:

mlaunch --installdev=bin/Debug/net9.0-ios/ios-arm64/MyApp.app --devname=MY_SPECIFIC_UDID

The location of mlaunch is curious, spurious, and mysterious. You can find it where all the great workloads are hidden, I mean stored. Somewhere beneath /usr/local/share/dotnet/packs/Microsoft.iOS.Sdk.*

For example, on my machine it is located at: /usr/local/share/dotnet/packs/Microsoft.iOS.Sdk.net10.0_18.5/18.5.10727-net10-rc.1/tools/lib/mlaunch/mlaunch.app/Contents/MacOS/mlaunch

Now that your app is installed, you can launch it with this command:

mlaunch --launchdevbundleid=MY_BUNDLE_ID --devname=MY_SPECIFIC_UDID

Replace MY_BUNDLE_ID with your app’s bundle identifier (e.g. com.mycompany.myapp) and MY_SPECIFIC_UDID with your device’s UDID.

That’s it, you’re now a pro mlaunch user!

3. `xcrun`

If you are not in the mood to dig through the .NET SDK installation to find mlaunch, you can use the xcrun command-line tool that comes with Xcode. It has a subcommand called devicectl that you can use to install and launch your app on a connected device.

Build your app, and then run this command to install it:

xcrun devicectl device install app --device MY_SPECIFIC_UDID bin/Debug/net9.0-ios/ios-arm64/MyApp.app

Then run this command to launch it:

xcrun devicectl device process launch --terminate-existing --console --device MY_SPECIFIC_UDID MY_BUNDLE_ID

The --terminate-existing flag will kill any existing instance of your app before launching it, and the --console flag will stream the app’s console output to your terminal. Hot.

Sadly, xcrun devicectl does not support older devices with long-in-the-tooth iOS versions, so if you have an iOS 15 or older device, you should go spelunking for mlaunch instead.

You’ve made it this far, you are truly a command line iOS hacker dev. But there’s one more way to deploy your app, and it’s the most hacky way of them all.

4. `ios-deploy`

If you aren’t satisfied with first- or third‑party tools, and want to dabble with “fourth‑party” greatness, then you can use the ios-deploy command-line tool. This tool is not officially supported by Apple or Microsoft, but it is widely used by the iOS development community.

First, make sure you have ios-deploy installed. If you haven’t done this yet, you can install it using Homebrew:

brew install ios-deploy

Make sure you have a full battery, a gigabit internet connection, and a giant cup of coffee, because this may take a while. ios-deploy is small and quick to install, but Homebrew has to download 1/2 of the internet and at least 3 different versions of Python before it will let you have your machine back.

Once you have ios-deploy installed, you can use it to deploy and run your app. Build your app, then run:

ios-deploy --justlaunch --debug -i MY_SPECIFIC_UDID -b bin/Debug/net9.0-ios/ios-arm64/MyApp.app

And that’s it! You’ve now deployed your app using ios-deploy. Check out the ios-deploy README on GitHub for more information and advanced usage.

Conclusion

There you have it, four different ways to deploy your iOS app to a physical device for testing. No IDE needed. My personal favorite is dotnet build -t:Run because it’s the simplest and most reliable method. But knowing the alternatives can be useful in certain situations, when showing off to your mom, looking cool at parties, or when you just want to feel like a true command‑line iOS hacker dev.

Updating App Icons for iOS and macOS 26

2025-09-12T00:00:00+00:00

TL;DR Apple has introduced a new layered approach to app icons in iOS and macOS 26. This post explains how to update your .NET and MAUI app icons to comply with the new requirements and ensure your app looks great on all devices. I outline two different approaches from easiest (do nothing) to hardest (composing and compiling your icons). I also outline the steps needed to manually compile your icons using the actool command line tool.

The New Icon Requirements

With the release of iOS and macOS 26, Apple has introduced a new way to handle app icons. Instead of using a single static image, developers can now create layered icons that adapt to different contexts and styles.

All icons are now rounded rectangles with a required background layer and then up to four additional layers on top. These layers can each have their own material properties such as specularity (shininess), transparency (along with frosting), and drop shadows. Each layer is treated as being physically above the previous layer, creating a variety of 3D effects.

The layers are composited by the OS to create the final icon, which dynamically adapts to the environment, and, more importantly, user settings.

So how do you take advantage of this new feature in your .NET and MAUI apps? I’ll give you 3 options from easiest to hardest.

Option 1: Do Nothing

The easiest option is to do nothing. If you don’t update your app icons, Apple will analyze your icon and automatically create a layered version for you. This is the easiest option, but it may not give you the best results.

Simple Icons

This approach works well if your icon is a typical two-layer icon with a background and some distinct foreground imagery. In this case, Apple will attempt to separate the background and foreground layers and apply some default material properties to each layer.

For example, here is a simple icon with a blue background and white foreground and a little drop shadow. To its right is the automatically generated layered icon.

Note that the corners are rounded and the system applied specular highlights to the border. The drop shadow and the gradient background were preserved.

If your icon is simple like this, you can probably get away with doing nothing. You’ll miss out on the 3D layering effects, but your icon will still look good and will fit in with the OS.

What if your icon is more complex?

Complex Icons

Complex icons, those that use transparency to create a non-rectangular shape, do not fare as well with the automatic approach. The system will add a background layer of its own choosing (some shade of gray) and then place your icon on top of it. This immediately distinguishes your app as older and not taking advantage of the new icon system.

Here is an example of a more complex icon with transparency and multiple colors.

In this case, the system added a gray background layer and then squeezed the original icon on top of it. The result is a bit jarring and does not fit in with the new icon style.

If your icon is complex like this, then you should consider fully redesigning your icon using the new layered approach and Icon Composer app.

Option 2: Use Icon Composer

The Icon Composer app is a new tool included with Xcode 26 that allows you to create layered app icons. You can use this tool to design your icon with a background and up to four foreground layers, each with its own material properties.

To use it, you will want to start with your original icon and decompose it into layers. Ignore the background layer as that will be configured in the Icon Composer app. Then, create up to four additional layers for the foreground elements of your icon.

These layers should be simple SVG or PNG files without drop shadows, transparency, or too much highlighting since those attributes can be configured in the Icon Composer app itself.

Let’s see it in action. First, I extracted the foreground layer and exported it as a PNG file. I then created a new icon in Icon Composer with a background gradient and this simple foreground layer.

On the left is my single layer (with transparency) and on the right is the rendered icon (on macOS). There is the background gradient shaded a bit differently than before (I used the same colors from the other examples, the difference is due to the system’s rendering lighting model). The foreground layer is now sitting on top of the background with a very light drop shadow.

Most striking is how the foreground now gets rendered with its own specular highlights and shading. It looks quite rounded now. These are just the defaults, within Icon Composer you can adjust the material properties of each layer to get the look you want.

Here is a screenshot of the Icon Composer app itself so you can see the layers and properties.

There are a lot of options and even more environments (dark, light, mac, iOS, tinted, etc.) to preview your icon in.

Manually Compiling Your Icons

As of this writing (September 12, 2025), the .NET build system does not handle the new layered icon format.

However, you can manually compile your icons using the actool command line tool that comes with Xcode. This tool will take your .icon file created with Icon Composer and produce (1) a compiled Assets.car file that contains the layered icon for macOS 26 and iOS 26, and (2) a backwards compatible .icns file for macOS that will be used on older versions of macOS. This tool needs to be run separately for iOS and macOS.

This process is quite hacky, so I’d recommend waiting until the .NET build system supports this natively. But if you want to try it out, here are the steps.

iOS

# Compile for iOS
/Applications/Xcode.app/Contents/Developer/usr/bin/actool \
    MyApp/MyIcon.icon --app-icon MyIcon \
    --compile . \
    --output-partial-info-plist assetcatalog_generated_info.plist \
    --target-device iphone --target-device ipad --minimum-deployment-target 13.0 --platform iphoneos \

This will produce 4 files:

MyIcon60x60@2x.png - backwards compatible image for iPhones running iOS 13.0 and later
MyIcon76x76@2x~ipad.png - backwards compatible image for iPads running iOS 13.0 and later
Assets.car - the archived assets that contain the new fully layered icon for iOS 26 and beyond. This is the file you need to include in your app bundle as a resource for the new icon to be used.
assetcatalog_generated_info.plist - a plist file whose entries you should add to your app’s Info.plist file.

If your app is using other Assets, then you should augment the command line to include the .xcassets directory. For example:

# Compile for iOS with existing assets
/Applications/Xcode-26-RC.app/Contents/Developer/usr/bin/actool \
    MyApp/Assets.xcassets \
    MyApp/MyIcon.icon --app-icon MyIcon \
    --compile . \
    --output-partial-info-plist assetcatalog_generated_info.plist \
    --target-device iphone --target-device ipad --minimum-deployment-target 13.0 --platform iphoneos \

macOS

Supporting macOS is similar, but you only need to support one target device (mac) and you will get a backwards compatible .icns file instead of multiple PNG files.

# Compile for macOS
/Applications/Xcode.app/Contents/Developer/usr/bin/actool \
    MyApp/MyIcon.icon --app-icon MyIcon \
    --compile . \
    --output-partial-info-plist assetcatalog_generated_info.plist \
    --minimum-deployment-target 11.0 --platform macosx --target-device mac

This will produce 3 files:

MyIcon.icns - backwards compatible icon for macOS 11.0 and later
Assets.car - the archived assets that contain the new fully layered icon for macOS 26. This is the file you need to include in your app bundle for the new icon to be used.
assetcatalog_generated_info.plist - a plist file that you can ignore (just contains the icon name).

Including in Your App

Once you have the compiled Assets.car file for iOS and/or macOS, you need to include it in your app bundle as a resource (along with the .png and .icns files).

In your .csproj file, you can include the files like this:

   Include="Assets.car" />
   Include="MyIcon.icns" Condition="'$(TargetFramework)' == 'net9.0-macos'" />
   Include="MyIcon.png" Condition="'$(TargetFramework)' == 'net9.0-ios'" />

You also need to update your Info.plist file adding whatever entries were generated in the assetcatalog_generated_info.plist file.

Conclusion

Apple’s new layered app icon system in iOS and macOS 26 offers a great opportunity to make your app stand out with dynamic and visually appealing icons. You can choose to do nothing and let the system generate a layered icon for you, or you can use the Icon Composer app to create a custom layered icon. For the adventurous, you can manually compile your icons using the actool command line tool. Whichever approach you choose, updating your app icons will help ensure your app looks great on all devices and fits in with the latest design trends.

Building and Deploying iOS MAUI Apps with GitHub Actions

2025-08-19T00:00:00+00:00

TL;DR Here are the basic steps to set up CI/CD for your iOS MAUI apps using GitHub Actions! I outline every step needed to setup the build environment, import your certificates and provisioning profiles, build, and, finally, upload your iOS apps to Test Flight and the App Store. I leave the macOS, Android, and Windows versions as an exercise for the reader.

The Challenge of MAUI CI/CD

MAUI (Multi-platform App UI) is a powerful framework for building cross-platform applications, but setting up Continuous Integration and Continuous Deployment (CI/CD) can be a bit tricky, especially for iOS apps. The process involves setting up the build server to have all the right Xcode and .NET versions, actually building the app, signing it, and then distributing it to Test Flight and the App Store.

This blog post is a lot longer than I would like it to be, but the good news is that once you have a good CI/CD script running, it’s pretty stable and you can reuse it for all your MAUI apps. So, let’s dive in!

Setting Up the Job

First, you need to setup the job to have the correct versions of Xcode and .NET. This is crucial for building your MAUI app correctly. Let’s start with locking down the macOS version and the Xcode version. You can use the maxim-lobanov/setup-xcode action to specify the Xcode version you want to use.

jobs:
  build:
    name: Build iOS
    runs-on: macos-15
    timeout-minutes: 45

    env:
      DOTNET_CLI_TELEMETRY_OPTOUT: 1
      DOTNET_VERSION: "net9.0"

    steps:

    - name: Checkout Code
      uses: actions/checkout@v4
      with:
        submodules: true

    - name: Set Xcode
      uses: maxim-lobanov/setup-xcode@v1
      with:
        xcode-version: "16.3"

Here, I have specified macOS 15 and Xcode 16.3. You can adjust these versions based on your requirements.

I have also set a timeout of 45 minutes for the job, because macOS jobs are very slow and are also very expensive to run. You want to prevent runaway jobs from costing you a fortune.

I set the DOTNET_CLI_TELEMETRY_OPTOUT environment variable to 1 to disable telemetry, which is a good practice for CI/CD environments.

Lastly, I set the DOTNET_VERSION environment variable to net9.0 because it is repeated throughout these steps (and in build paths) and I like to minimize the things I need to change when updating .NET versions.

Now it’s time to install the .NET SDK. This is a two-step process:

Install the .NET SDK using the actions/setup-dotnet action.
Install the workloads needed for your MAUI app.

    - name: Setup .NET
      uses: actions/setup-dotnet@v4
      with:
        global-json-file: global.json

    - name: Install Workloads
      run: dotnet workload restore MyApp.sln

Here, I reference the global.json file to ensure the correct .NET SDK version is used. If you do not use a global.json file (why aren’t you?), specify the .NET SDK version directly in the dotnet-version input of the actions/setup-dotnet action.

Install Certificates and Provisioning Profiles

To build and sign your iOS app, you need to install the necessary certificates and provisioning profiles. This is a crucial step for iOS apps, as they require proper signing to run on devices and be distributed via Test Flight or the App Store.

The trick is to store your certificate as a GitHub secret. You’ll then restore that certificate to the keychain and then you’ll be able to automatically download the provisioning profile from Apple.

I use the apple-actions/import-codesign-certs action to import the certificate. It requires 2 things:

The base64-encoded P12 file of your certificate. I store this in a GitHub secret named APPSTORE_CERTIFICATE_P12.
The password for the P12 file. I store this in a GitHub secret named APPSTORE_CERTIFICATE_P12_PASSWORD.

    - name: Import Apple Certificate
      uses: apple-actions/import-codesign-certs@v4
      with:
        create-keychain: true
        keychain-password: ${{ secrets.APPSTORE_CERTIFICATE_P12_PASSWORD }}
        p12-file-base64: ${{ secrets.APPSTORE_CERTIFICATE_P12 }}
        p12-password: ${{ secrets.APPSTORE_CERTIFICATE_P12_PASSWORD }}

I use the same password for the keychain and the P12 file because I’m often using the same job to build macOS, iOS, and Catalyst apps and sharing the same keychain is convenient.

To generate the P12 file, you can use the Keychain Access app on your Mac. Export your certificate as a P12 file and save it somewhere with a password. Then, encode it to base64, which you can do with the following command:

base64 -i 'MyAppleCertificate.p12' | pbcopy

That will copy the base64-encoded string to your clipboard, which you can then paste into your GitHub secret. Make sure to also set the password for the P12 file as a GitHub secret.

Now we need to download the provisioning profile from Apple. This is done using the apple-actions/download-provisioning-profiles action. You need to provide the App ID and the Team ID for your Apple Developer account.

    - name: Download Provisioning Profile
      uses: apple-actions/download-provisioning-profiles@v4
      with: 
        bundle-id: 'com.example.myapp'
        profile-type: 'IOS_APP_STORE'
        issuer-id: ${{ secrets.APPSTORE_ISSUER_ID }}
        api-key-id: ${{ secrets.APPSTORE_KEY_ID }}
        api-private-key: ${{ secrets.APPSTORE_PRIVATE_KEY }}

(Set the bundle-id to match your app’s bundle identifier.)

The APPSTORE_ISSUER_ID, APPSTORE_KEY_ID, and APPSTORE_PRIVATE_KEY are also GitHub secrets that you need to set up. You can generate these from your Apple Developer account.

Go to https://appstoreconnect.apple.com/access/integrations/api to create an API key that you will use for GitHub Actions. This will give you the APPSTORE_ISSUER_ID, APPSTORE_KEY_ID, and the private key that you need to store as a GitHub secret.

Once all those secrets are set up, you can run the job and it will import the certificate and download the provisioning profile automatically.

Build the iOS App

This is the easy part. Take a stretch. Have some coffee. You’ve earned it.

All you need to do is run the dotnet publish command and pass it the project file of your MAUI app. Do not pass the solution as the build process is designed to work with the project file directly and will otherwise try to publish every project in the solution.

There are a few important flags to pass to the dotnet publish command:

-c Release: This specifies that you want to build the app in Release mode.
-f $-ios: This specifies the target framework for iOS. The env.DOTNET_VERSION variable is set to net9.0 in the env section of the job, so it will resolve to net9.0-ios.
-p:ArchiveOnBuild=true: This tells the build process to create an archive of the app, which is necessary for distribution.
-p:RuntimeIdentifier=ios-arm64: This specifies the runtime identifier for iOS.
"/p:CodesignKey=\"Apple Distribution: My Awesome Company, Inc. (XXX12AB34C)\"": This specifies the code signing key to use for signing the app. Replace this with your own code signing key name that you can see during the key import step. Escaping the quotes is necessary to ensure the command is parsed correctly and is a little insanity making, but it works.

    - name: Build
      run: |
        dotnet publish -c Release -f ${{env.DOTNET_VERSION}}-ios -p:ArchiveOnBuild=true -p:RuntimeIdentifier=ios-arm64 "/p:CodesignKey=\"Apple Distribution: My Awesome Company, Inc. (XXX12AB34C)\"" MyApp/MyApp.csproj

For details on all the wonderful options you can pass to the dotnet publish command, see the official documentation: Publish an iOS app using the command line.

Assuming you were a good developer, paid your taxes, and pass the karma test, this will build your app and create an .ipa file in the bin/Release/$-ios/ios-arm64/publish directory of your MAUI project.

Upload that Puppy to Test Flight

Finally, we need to upload the built .ipa file to Test Flight. This is done using the apple-actions/upload-testflight-build action.

    - name: Upload to TestFlight
      uses: apple-actions/upload-testflight-build@v1
      with:
        app-type: ios
        app-path: 'MyApp/bin/Release/${{env.DOTNET_VERSION}}-ios/ios-arm64/publish/MyApp.ipa'
        issuer-id: ${{ secrets.APPSTORE_ISSUER_ID }}
        api-key-id: ${{ secrets.APPSTORE_KEY_ID }}
        api-private-key: ${{ secrets.APPSTORE_PRIVATE_KEY }}

You were worried we were going to have to create more secrets weren’t you? Don’t lie. I know you were. Good news! You can use the same APPSTORE_ISSUER_ID, APPSTORE_KEY_ID, and APPSTORE_PRIVATE_KEY secrets that you used to download the provisioning profile.

This action will upload the .ipa file to Test Flight, where you can then test your app before releasing it to the App Store.

Conclusion

You are now ready to build and deploy your MAUI apps using GitHub Actions! Pat yourself on the back, that wasn’t easy. But the good news is that once you have this setup, you can reuse it for all your MAUI apps. Just make sure to adjust the bundle identifier and the project file path in the dotnet publish command.

Now go! Continuously integrate. Deploy continuously. And may your MAUI apps be bug-free and loved by users everywhere!

I Built the World’s Largest Translated Cuneiform Corpus using AI

2023-06-09T00:00:00+00:00

TL;DR I used a custom-trained Large Language Model (T5) to create the world’s largest online corpus of translated cuneiform texts. It’s called the AICC (AI Cuneiform Corpus) and contains 130,000 AI translated texts from the CDLI and ORACC projects.

Cuneiform

Cuneiform is the oldest known writing system. It was used in Mesopotamia (modern day Iraq) for over 3,000 years. It was used to write Sumerian, Akkadian, and other languages. Written on clay, it has survived the millennia and is now being translated by scholars around the world.

Sadly, we have more clay tablets than scholars. Fortunately, we have computers.

Introducing the AICC

I’m proud to introduce the AICC - a collection of 130,000 cuneiform texts translated from ancient Sumerian and Akkadian to English using a neural network. It is the largest collection of translated cuneiform texts in the world.

This is the 2nd edition of the translated corpus I released last summer. The 1st edition contained about 30,000 texts but this new edition boasts 130,000 texts. The corpus is growing fast!

How good are the translations? Well, they’re decent. :-) I hope you’ll go browse the site and see for yourself.

Judging the quality of cuneiform translations has a rich history. Indulge me in a story.

Can it Translate Tiglath-Pileser?

In 1857 a new cylinder inscribed with cuneiform text and the name Tiglath-Pileser was found (dated 1150 BC). At this time, cuneiform was just being relearned and there was a question as to how good various translation methods were.

The Royal Asiatic Society decided to perform an experiment (that was later published as the book Inscription of Tiglath Pilser I., King of Assyria). They would give the same inscription to three different translators and see how well they did. The idea was that if the translations were similar, then the current understanding of cuneiform was also good.

Here is a page from the book showing the beginning of two of the three translations:

We can see that although the translations are different, they convey the same meaning. The experiment was a success.

I am pleased to now add my own (well, my AI’s) translation to the mix. Just one problem, my corpus contains many objects with similar inscriptions and I’m not sure which one is this specific Tiglath-Pileser cylinder. There’s P393923, P463064 (newly translated), P463510 (newly translated), P467316 (newly translated), Q005926, Q006021 and more.

I decided in fairness and in the spirit of the competition to translate one that had no previous translation. I chose P467316 as its beginning seemed to match the other translations.

AI Translation of Tiglath-Pileser

AI Translation.

Ashur, great lord, who makes the totality of the gods submit, who gives scepter and crown, who establishes kingship; Enlil, lord, king of all the Anunnaku gods, father of the gods, lord of the lands; Sin, wise one, lord of the crown, saqu-worthy; Magur, Shamash, judge of heaven and underworld, who carries out the slander of the enemy, who breaks up the foe; Adad, hero, conqueror of the four quarters of the lands, the four quarters; Ninurta, hero, villainous sacrificial spirit and enemy, who destroys the heart of the people; Ishtar, foremost among the gods, lady of battle;

The great gods, who make the heavens and earth a lordly place, whose utterances are a scepter and a scepter, who make kingship supreme, Tiglath-Pileser, beloved prince, your beloved, your shepherd, who by your true heart you have entrusted to me, this exalted one, you have established for the sovereignty of the land of the great Enlil, you have granted him a scepter.

Continued in P467316.

While stilted in places, it is a decent translation, and I deem this experiment a success!

Why AI Translations?

Existing online repositories (CDLI, Oracc) contain many transliterations of ancient cuneiform texts (a transliteration is a rewriting of a text from one writing system to another without changing the language), but they are very lacking in the translations department.

While I am not a cuneiform expert, I am an expert at neural networks and have a deep passion for languages and writing systems. I want any person to have access to the archives of the ancients. A grandiose goal for sure, but also a very achievable one thanks to modern engineering advancements.

Sumerian

Consider Sumerian (spoken by the creators of cuneiform). There are currently 103,075 texts published with transliterations from cuneiform symbols to (mostly) latin letters. But only 4,583 of these texts have publicly available translations online. That is a mere 4% of texts available to a lay person such as myself.

Publications	Count
Transliterated	103,075
Translated	4,583
Need Translations	98,492

Given the existing transliterations, there are 98,492 works that can be translated but have not yet been.

(There are more translations than these, but the others are not freely available and are held under copyright. In other words, you need to go by a book to read them.)

Things aren’t much better for Akkadian (the language spoken by the famous Sargon and Ashurbanipal).

Akkadian

Publications	Count
Transliterated	31,747
Translated	10,069
Need Translations	21,678

We can see that 21,678 works are all set to be translated but have not been.

Training a Large Language Model

The modern advancement of large language models (LLMs) has affected and will continue to affect nearly every human endeavor.

The current architecture that is heralding this new age of knowledge is the Transformer architecture. It was designed specifically to be very good at translating text from one language to another using the innovative “attention mechanism”. It’s a little funny that this network designed for translation is now broaching the realm of artificial general intelligence (AGI), but I digress.

Ignoring the absurdly large LLMs that are dominating the field now (GPT-4 and friends), the humble smaller transformers are still quite powerful and have made the problem of translation a somewhat trivial.

My favorite one of these is the T5 network from Google. While large itself, it is capable of being trained using off-the-shelf (though expensive) GPUs. If you can build a large training set, you can train this network at home to accomplish wonders.

Knowing this I set about building a training set that the network could use to learn these ancient languages.

Building the Dataset

Thankfully there has been a push to digitize acquired artifacts and to publish their cuneiform on the web.

The two great projects are the CDLI (Cuneiform Digital Library Initiative) and Oracc. I owe a large debt to these projects.

As any machine learning expert will tell you, 90% of the problem is collecting a good training dataset (the other 10% is justifying the compute bill). Building the cuneiform dataset presented its own unique set of challenges.

Inconsistent Transliterations

Sadly, Assyriologists took some time to settle on a consistent transliteration system. When works were first transliterated to a digital form, only ASCII characters were available and the researchers made due using funny characters like # to denote demonstratives, numbers to disambiguate symbols, and ALL CAPS whenever they were in the mood (just kidding, but the use is so random it might as well be).

When other character encodings became available, researchers adapted. They started to use diacritic marks to disambiguate symbols (loosely based on guessed sounds). And then HTML was invented and they went wild with special marks attempting to better capture the original writing.

While neural networks are powerful and can certainly handle these inconsistencies, it’s not ideal. If you want the network to properly learn the language it’s best not to distract it with also learning the histrionics of human computer interface systems.

A wrote a variety of cuneiform and english normalizers to help with this problem. They’re not perfect, but they do a decent job.

Paragraph Wrapping and Unwrapping

Cuneiform texts are usually written line by line in a column and are read from top to bottom.

These lines are often short and, when translated, contain even fewer words. If I train the network on just these lines (and, surprise, I did for the 1st edition), the translations it produces are also short and choppy. They’re not great.

To work around this problem, I automatically “unwrap” lines into paragraphs to be translated all together. This way the network can learn to translate longer sentences and paragraphs.

The network, however, has its own limitations and can only translate sentences up to 512 tokens long. To work around this problem, I “wrap” the paragraphs into chunks of up to 512 tokens and translate those. I then stitch the translations back together to form the final translation.

This “unwrap” then “wrap” process is not perfect and can lead to some strange translations, but it’s better than the alternative of just translating single lines.

Training Process

I started with a pre-trained a T5 base model from Hugging Face and fine-tuned it on my dataset. This model has 220 million parameters and is capable of translating sequences of up to 512 tokens.

I trained it on a dataset of 210,247 translation examples for 30 epochs. It took about 48 hours on my RTX3090.

While starting with a pre-trained model saves me a lot of compute time, it has drawbacks. The pre-trained model was trained to translate from English to French or German. Ideally, I would have a model that was pre-trained to translate to English.

Also, I used its default tokenizer which does not support all the characters I need and performs poorly on the transliterated cuneiform.

Learning Sumerian and Akkadian Simultaneously

Since my datasets are small in size, I decided to combine learning Sumerian and Akkadian simultaneously. This has the benefit of increasing the training size and exposing the network to more cuneiform symbols. Interestingly, Akkadian often uses some Sumerian intermixed with its own language so it’s not a bad idea to train on both.

Bidirectional Translation

The network was having a hard time converging on a good solution. It would train well enough for many epochs, and then it would fall apart.

I found a regularization strategy that helped a lot. I would train it to also translate from English to Sumerian and Akkadian. Doing this helped the network to always converge. I assume this is an affect of using the pre-trained network.

While translating from English to Akkadian or Sumerian is not a useful task, it is a “fun party trick” as my friend put it.

Future Work

I want to continue to improve the translations and hope to take these steps in the future:

Fine-tune the model for specific translation tasks like Akkadian to English.
Pre-train a new model from scratch using a better tokenizer.
Train a larger model like T5 large.
Add more training data.

Conclusion

I hope you enjoyed this deep dive into neural networks and ancient languages.

When I started this project, I had no idea whether it would work or not. I was delighted that it did, and I am extremely delighted to be able to introduce the AICC to the world. Now amateur Assyriologists like myself can read and read to their heart’s content.

Side note: If you are an academic and would like to collaborate on this project, please reach out to me by filing issues on GitHub. I have a million questions about cuneiform that I would love to ask you.

How I Re-implemented PyTorch for WebGPU

2023-05-19T00:00:00+00:00

TL;DR I’ve been working on a WebGPU optimized inference and autograd library called webgpu-torch with an API that matches PyTorch. The goal is to run neural networks in the browser at speeds comparable to a Linux workstation. Many kernels have been implemented and its design is easily extensible. It’s available on NPM now and works in both the browser and Node.js!

Neural Networks in the Browser

Nine months ago, I got Hugging Face Transformers (Large Language Models like GPT but a wee bit smaller) working in the browsers thanks to the ONNX web runtime and some painfully hand-coded tokenizers.

It’s quite liberating running these nets in the browser since the web is the best software distribution platform ever created. You can just send someone a link and they can run your code. No need to install anything. No need to worry about what OS they’re running. No need to worry about what hardware they have. It’s all just there.

The only problem is that ONNX is a wee bit, shall we say, slow.

Thankfully, WebGPU has arrived in browsers and we can now properly access the GPU to write optimized kernels for neural network operations. This is a huge deal. It means we can now run neural networks in the browser at speeds comparable to NVIDIA/CUDA.

Someone just needs to, you know, do the hard work of implementing all those operations for the GPU.

Well that’s what I’m very pleased to announce I’ve been working on for the past few months. I’ve been re-implementing PyTorch in TypeScript for WebGPU.

What is a PyTorch?

PyTorch is a wrapper over the torch runtime (which I first used with Lua) for performing neural network operations. It’s a very popular library for doing AI work and seems to have won the arms race for now.

The library is broken up into parts:

An optimized (for GPU) math library supporting element-wise operations, matrix multiplication, convolutions, reductions, etc. over tensors.
An automatic differentiation library (autograd) that is just a lot of bookkeeping to keep track of the operations performed on tensors so that gradients can be calculated.
A neural network library that is just a bunch of layers that can be composed together to form a neural network.

Doesn’t sound so hard to re-implement right? And so I did.

What is a WebGPU?

WebGPU is the new standard for accessing GPUs from the browser. It supports generic compute shaders and is designed to be a low level API that can be used to build higher level libraries. The compute shaders are able to break work up into a 3D grid and, so long as you can reformulate your code to take advantage of that 3D grid, you can benefit from dedicated hardware doing the computations.

This is perfect for the web since JavaScript is single-threaded and not optimized for doing heavy computation. The GPU is a perfect fit for this since it’s designed to do heavy computation in parallel.

Writing Optimized WebGPU Kernels

PyTorch is very mature now and supports a huge variety of operations. It’s also very well optimized for CUDA and CUDNN (NVIDIA’s compute libraries). So how do you go about re-implementing all of those for WebGPU?

Well, you start with the basics. You implement the basic operations like element-wise operations, matrix multiplication, convolutions, reductions, etc. But there is a tremendous amount of similarity between these operations.

For example, element-wise multiplication and addition only vary by the operator used in the inner loop. The trick is to optimize the memory layout and kernels of those operations so they are fast. They need to adapt to big and small GPUs and they need to adapt to big and small workloads.

This is a perfect scenario to take advantage of code generation. I wrote a code generator that takes a template and generates the optimized kernels for each operation. The code generator is written in TypeScript and generates WebGPU compute shader code. This means that the generated code can be heavily optimized for the given scenario and those optimizations can be shared between operations.

For example, here is how I define the ReLU operation (from op_table.ts):

{
    name: "relu",
    nnName: "ReLU",
    nnOp: true,
    type: "unary",
    forward: "output = max(input, 0.0)",
    backward: "inputGrad = input > 0.0 ? outputGrad : 0.0",
}

In this template I define both the forward computation max(input, 0.0) and the backward computation input > 0.0 ? outputGrad : 0.0. The code generator then generates the optimized kernels for both the forward and backward passes based on the size of your GPU (the size of compute workgroups) and the shape of tensors (in addition to the memory layouts of the tensors).

Keeping the template short and simple gives me flexibility to optimize the kernels as needed while preserving the core logic. For example, different kernels can be emitted for contiguous memory tensors vs strided memory tensors. For operations like reductions, 1D, 2D, 3D, and xD kernels can be emitted to take advantage of the 3D workgroup grid.

At first I designed the template system to help me save some typing, but I quickly realized its power and now I use it for all operations.

Debugging WebGPU Kernels

Another huge benefit came from the fact that I was generating the kernels. I could generate the kernels to not only emit WebGPU code, but also JavaScript code. The core logic gets wrapped in another function that can be called from JavaScript. This means that I can run the same code in JavaScript and WebGPU and compare the results. Even better, I can debug kernels in JavaScript and then execute them on WebGPU.

The JavaScript CPU kernels are terribly slow, but they’re not supposed to be fast. They instead provide a convenient playground for debugging and testing kernels.

This also means that my WebGPU library can also run just fine in Node.js, without WebGPU, whatever. Isn’t it great when architectural decisions keep paying off?

Testing

The worst part of using a new neural network library is when it doesn’t give the exact same results as previous libraries you’ve used. One of my biggest frustrations with the WebGL ONNX backend is the fact that it gives very inaccurate results compared to PyTorch. I didn’t want that. I want full fidelity. I want to make sure all my WebGPU kernels match the results of PyTorch operations.

To that end, I have built a test harness that first runs code snippets in PyTorch to record results, then runs the same code snippets in my library and compares the results. If they don’t match, it throws an error.

This has produced a silly but fun web page to go visit. If you go to https://praeclarum.org/webgpu-torch/tests/ you will see a huge set of tests running to verify all the supported operations. It’s a great way to see what operations are supported and what the results are.

Goals

I like to train imaging networks and to that end my goal is to get Stable Diffusion and similar nets running under this library. Once that’s accomplished I will focus on the many Hugging Face transformer networks. I’m hoping to get all of them running in the browser at CUDA speeds.

I have a set of TODOs in the README of the project. If you’re interested in helping out, please take a look!

Generating SVG Images with ChatGPT

2023-04-03T00:00:00+00:00

TL;DR I used ChatGPT, with varying levels of success, to generate SVG images. With enough patience and prompting, you can get it to draw basic shapes and simple scenes about as well as a five year old can. (Which, hey, let’s face is, is good enough a lot of the time.)

Background

I like the idea of using AI’s to generate images. Go back in this blog and you’ll see lots of posts and casual references to imaging networks I like build and experiment with.

With recent advances like DALL-E, Stable Diffusion, and Midjourney, it seems like the static image generation problem has been solved. Sure, there’s plenty of room for growth and improvement, but the basic problem of generating images from text is solved.

But what about vector graphics? As an app developer, I’m often in need of graphical resources. Images are neat, but they’re also so 1970s. They don’t scale well and are a terribly wasteful and inconvenient (for editing) encoding of graphical information.

Like everyone else, I’ve been using ChatGPT a lot recently, and I was curious to see if it could generate SVG code for images. I know it’s pretty good at HTML, so I figured it might be able to do SVG too. It’s certainly not something it was designed to do, but I thought it would be fun experiment to conduct.

To my delight, it actually works. You’ll see below that the results are not always great or, um, err, good, but it’s still pretty cool and useful that it can do this at all.

As of GPT-4, I would say it’s capable of generating decent shapes and simple combinations of shapes. But I’ll let you you judge.

Experiments

The following transcripts are from my first experiments generating SVGs with ChatGPT. I was curious to see what kind of images the model could generate, and how it would respond to different inputs.

I performed the experiments using the ChatGPT API using the gpt-4 model.

I used my app AskGPT to run these prompts and dutifully pasted the responses into SVG files to make this post.

Please generate an SVG of a catus [sic] out in the desert on a sunny day, no explanation.

As an AI language model, I cannot create or directly display SVG images. However, I can help you create the SVG code, which you could then use in any SVG viewer or compatible software/website to visualize the cactus in the desert.

Here’s a simple SVG code for a cactus in a desert environment. Copy and paste this code into an SVG viewer or an HTML document to display the image.

 width="300" height="150" xmlns="http://www.w3.org/2000/svg">
    
     width="300" height="150" x="0" y="0" fill="skyblue" />
    
     cx="270" cy="30" r="20" fill="yellow" />
    
     width="300" height="50" x="0" y="100" fill="tan" />
    
     width="30" height="80" x="130" y="70" fill="green" />
     width="20" height="30" x="125" y="50" fill="green" />
     width="20" height="30" x="145" y="50" fill="green" />

SUCCESS Although the initial part of the response was discouraging (why the low confidence ChatGPT?), the model dutifully generated the SVG code for a cactus in the desert.

The cactus could obviously use some work, but this image is mostly what I was expecting. I know these models are sensitive to inputs, so I decied to give it some encouragement.

OK, great. But please now generate a much more detailed image using gradients and paths

Here is a more detailed SVG code for a cactus in the desert using gradients and paths for a richer visual representation. To view the image, copy and paste the code into an SVG viewer or an HTML document.

(I’m going to elide the SVG code from now on, but you can see it by viewing the source of the images.)

SUCCESS OK, so it’s not going to win any rewards with that, but I applaud the effort. The cactus has a bit more shape and the horizon is more interesting.

It looks like the model is having a hard time getting the directions of arcs correct in the SVG code. Golly, I can relate. I’ve written several SVG libraries and still I can never get the ordering right.

That said, I feel like I’m failing the network by not being a good prompter. Let’s try again with stupid amounts of prompting.

Please generate an SVG of a cactus in the desert on a sunny day. Please make it as detailed as possible. Use vibrant colors. Use dramatic lighting. I want depth! I want to feel like I’m there! I want something shockingly beautiful.

I apologize, but as an AI text model, I cannot create images or SVG files. However, I can help you understand how you can create such an image using a vector graphics software like Adobe Illustrator or Inkscape.

Open your preferred vector graphics software and create a new document.

…

FAILURE Oops, I went too far and it just started explaining to me how to use a drawing program. I guess it doesn’t like that kind of prompting.

Let’s change subjects and see how it does at generating icons.

App UI Icons

As an AI, I cannot generate an SVG file directly in this text interface, but I can provide you with the code that you can copy and paste into an SVG file, like ‘toggle_preferences_icon.svg’.

Here’s a simple SVG code example for a mac toolbar icon with two gear-like symbols representing the toggling of the selected item preferences pane:

FAILURE Haha, OK, um so those aren’t gears. E for effort.

Can we just make a single gear if we concentrate really hard?

Please generate an SVG code of a gear icon. Please make it detailed showing the teeth and such. You can use paths if you want. Make your response as long as it needs to be.

Here’s an SVG code representation of a gear icon using paths:

FAILURE I see gears aren’t its strong suit.

Let’s see if it can generate something more basic.

Please generate an SVG code of a house icon. You can use paths if you want. Make your response as long as it needs to be.

SUCCESS? Do walls need to be straight? Does it matter if your roof has a little overhang? That’s a house. I’ll take it. Even better, I think an app with a “messy icon” aesthetic would be a welcome breath of fresh air.

Enough with boring toolbar icons. Let’s see if it can make a new icon for my app iCircuit.

App Icons

Please generate the SVG code of an app icon for iCircuit, an app that enables you to draw and simulate electronic circuits. The icon should have a nice graident background and fit in well on the iOS home screen. You can use paths if you want. Make your response as long as it needs to be.

FAILURE I dig that gradient (those colors are pretty close to the actual app icon), but the logo is a bit too… abstract. I can forgive abstractness, but I can’t forgive the fact that it’s not centered properly.

Let’s try prompting our way out of this hole.

Nice, I like that gradient. But the inner logo is too weird. Maybe put a nice looking sine wave in there. Centered nicely.

FAILURE The sin wave is so close! If I were generous, I would grade this one a “SUCCESS?”, but it failed to center again, and I just can’t have that.

Art

Please generate the SVG code of a nice looking abstract art piece. You can use paths if you want. Make your response as long as it needs to be.

SUCCESS I like it. It’s very “Saved by the Bell”, very 1990s.

That gives me an idea.

Please generate the SVG code of a spaceship using awesome neon colors and rad gradients.

FAILURE Cool colors, for sure, but it’s not a spaceship.

OK, it’s not an artist. Let’s see if it can do simple shapes.

Please generate the SVG code of a red circle. Inside of that circle put a white square. Inside that, put the red text ‘THIS IS A WARNING’. Wrap the text if needed. Use some gradients to make it look good an intimidating.

SUCCESS The text isn’t centered. I get it, it’s hard. I’m expecting a lot out of a language model.

But look, it’s text! Readable text. That’s a big deal because imaging networks aren’t really known for their text prowess. For example, here are DALL-E and Stable Diffusion’s attempts at the same prompt:

At least ChatGPT gave me exactly what I wanted (I’m not thinking about centering, I’m not thinking about centering, …). I got a red circle, a white box, and readable text. Neither of the other two networks could do/did that.

Conclusion

My biggest takeways are:

Yes, ChatGPT can generate SVG code.
It’s still in a very early stage and cannot handle much complexity.

I was hoping for better, but for a network that was not designed to do this at all, it’s not bad. I certainly look forward to GPT-5 and 6 improving at this.

I keep thinking back to the last warning sign experiment. The images generated by DALL-E and Stable Diffusion were not good. They were not even really what I wanted. ChatGPT, on the other hand, was able to generate something much closer to what I wanted.

I think this is a benefit of how it was trained. The engineers worked hard on the “alignment problem” (pun unintended) and it shows. It does what I expect. Old GPT-3 and friends were pure statistical generators. During their training, the only positive feedback they received was when their output sampling match language use. ChatGPT, however, was trained with a bias towards being helpful and providing the kinds of answers people want. This was accomplished by having people hand-rank its responses. I have a feeling that this hand-ranking, in order to aid in the human alignment problem, is going to fuel the next few years of machine learning improvements.

I went into this wanting to see if I could get SVGs, what I think of as a more useful format for my life as an app developer, out of ChatGPT. I got that, but I also got a lot more. I got a glimpse into the future of AI.

Anyway, enough armchair philosophizing. I hope you enjoyed seeing the results of my experiements, and I hope you’ll try some of your own.

Notes

I am aware there are neural nets specifically trained to generate SVG code. DeepSVG is one example. Specialized networks give more reliable results today and will certainly be useful in the future. I wanted to experiment with ChatGPT because I am intrigued by its generalized knowledge and I wanted to see if that general knowledge produced interesting results.
None of this was done scientifically. ChatGPT is still a stochastic (random) model and its outputs can vary. I should have had it generate more samples for each prompt. But I didn’t. I just wanted to see roughly what it could do.

Transformers-js - Neural Networks in the Browser

2022-08-25T00:00:00+00:00

TL;DR I wrote a javascript library that lets you run modern transformer neural networks from Hugging Face 🤗 in the browser. It works on mobile browsers, desktop browsers, pretty much everywhere. Check out transformers-js on GitHub to see how it works and checkout out the live translation demo running on my static website: https://transformers-js.praeclarum.org.

Introduction

Currently, the best way to deploy neural networks is to pay a cloud provider to host it and pay them to run inference. The more customers you have, the more you pay. It’s an old-fashioned big-iron middle-man’s utopia.

I’m a big fan of running neural nets on everyday hardware. It makes sense to let customers, who already invested a lot of money and carbon, use their own hardware. It’s also a huge privacy win: attackers can’t steal your information if it’s never on the network (insert Intel joke here). It’s good economically, environmentally, and it’s good for security. Sign me up.

Let’s fight the big-iron trend. Let’s run neural networks in the browser!

Announcing transformers-js: a library to make running translation and other language neural nets in the browser simple.

Update (Feb 7, 2023): I’ve re-released the library as web-transformers with full NPM and webpack support.

Hugging Face 🤗 Transformers with transformers-js

Transformers are neural networks that are good at manipulating serialized symbols. Ahem, sorry. By “serialized symbols” I mean language. They do language things: Sentiment analysis, summarization, translation, transmogrification. Basically, any -ation you can think of that works with a discrete set of symbols laid out one after the other.

And you know these networks from their friendly household names: GPT-3, Copilot, DALL-E, Stable Diffusion. There seems no end to what they can do (see also the CNN revolution of 2014).

Hugging Face 🤗 has established itself as the “GitHub of Transformers”. They have an excellent unifying framework, great documentation, and good-ish hosting. I only say good-ish hosting here because I had a demo fail because their servers were down. Clouds…

In fact, it was that demo fail that got me to thinking, “why can’t I just run this thing in the browser?” That thought led me to 3 days of programming. Those 3 days produced a javascript library. And that javascript library produces some kick-ass neural translations.

I wrote transformers-js to make running transformers from Hugging Face 🤗 in the browser just as easy as running them in Python land. To do this, I leverage the amazing ONNX runtime in order to run the network. ONNX offers a browser-compatible runtime using WASM compiled from the complete ONNX opset code. That’s very powerful because it means that, if you can get your net running in ONNX, you can get it running in the browser. (ONNX also offers a webgl backend that is much faster than their WASM backend. But you lose so much precision in webgl that I have yet to see a network work correctly using that engine.)

But running the neural network is only half the battle. Running transformers requires more software than just the neural net. You also need text tokenization software to convert your text to tokens (symbols) and you need sampling software to convert the neural net’s output probabilities back to symbols. Transformers-js takes care of all that for you.

Tokenization

Step 1 in running a transformer is getting a working tokenizer. Each neural net is optimized to solve a problem and that means each net uses a slightly different tokenizer from each other. I thought writing the tokenizers would be a piece of cake. I’ve written hundreds of tokenizers in my career in my pursuit of programming language nirvana, but I have never run into the kind of tokenizers that data scientists have come up with.

Side tangent: did you know that modern tokenizers use classical AI approaches? Neither did I! For example, the T5 symbol list is redundant; you can encode the same sentence many many different ways. In order to correctly tokenize the sentence for input to T5, you have to find the optimal path through the redundant symbol list based on the a-priori probabilities of the symbols. It’s a graph problem, and those are hard. Fortunately, classic AI people loved graph problems and found solutions. Two AI winters ago, people thought graphing techniques would be the foundation of all future AIs. They were wrong, but it’s nice to see these old powerful algorithms live on.

Back to tokenizers. I learned all that graph theory so you don’t have to! I encoded that knowledge into code that a computer can decode to make the magic happen. Behold:

// Load the tokenizer
const tokenizer = await AutoTokenizer.fromPretrained("t5-small", "/models");

That loads a tokenizer. Currently, I only support Sentence Piece Unigram models (good enough for most nets). I hope to support Byte Pair Encoding in the future (GPT’s preferred tokenization).

With that tokenizer, you can convert strings into token lists:

// Tokenize "Hello, world!"
const english = "Hello, world!";
const inputTokenIds = tokenizer.encode("translate English to French: " + english);

inputTokenIds is a list of integers that represent the symbols in the sentence. Some words are just one symbol. While other, less common or longer words, can be more than one symbol.

I added a little prefix to the string (“translate English to French:”) because I’m building up to a translation demo here and the T5 network, with all its advanced capabilities, needs to be told what to do.

Generation

Now that we have tokens, we can hand them off to the neural network to be run:

// Translate
const outputTokenIds = await model.generate(inputTokenIds, {maxLength:50,topK:10});

That’s it! The code takes the input tokens, runs them through the network, and returns a new list of output tokens.

That little generate function is hiding a lot of work. Most networks generate one token at a time. That means you have to run them over and over until you get the whole sentence. This can be terribly inefficient if you run the entire network over and over. Instead, you split it into pieces and run each piece only as it is needed.

The generate method also has to sample from the neural network’s output probabilities. Networks are not into commitment, and will always output a variety of options. A sampling technique is needed to pick the right one.

Greedy sampling is when you just pick the highest probability option. Top-k sampling is when you randomly pick from the top k probable options. Greedy is good for when you want the most probable option. Top-k is good for when you want to inject a bit of creativity (randomness) into the results. This library supports both. I hope to add more sampling options in the future.

Now that we have a list of output tokens, we can convert them back to a string:

// Convert output tokens to a string
const french = tokenizer.decode(outputTokenIds, true);
console.log(french); // "Bonjour monde!"

The output is “Bonjour monde!” which makes sense given our input of “Hello World”.

That’s it! In about 5 lines of code we executed a neural translation algorithm that ran completely in the browser.

Demo

I put the code above together into a little web app to demonstrate the library.

https://transformers-js.praeclarum.org

It’s a little translation app that is able to go from English to French, German, and Romanian.

When you run it the first time, it will take some time to download the neural network. After that, the browser can cache it and subsequent reloads will be faster.

It’s hosted on Azure Static Web Apps to demonstrate that no logic is running on the server. The server merely provides the neural network data to be downloaded and run on the browser. Pretty cool, huh?

Optimizing Models for the Browser

Now, let’s talk about a few last details to make networks in the browser run well.

Because I’m using the WASM version of ONNX, the neural network is executed on the CPU. It’s therefore beneficial to optimize it to run there. The best optimization right now is to use quantized 8-bit weights. This converts what are otherwise 32-bit floats into 8-bit integers. This is done carefully to preserve as much accuracy as possible and is honestly a bit of a dark art.

Fortunately, there are lots of dark art practitioners and we can stand on their shoulders. The fastT5 library converts T5 models from Hugging Face 🤗 to quantized 8-bit models ready to run in the browser. I wrapped that library in a script as I hope to support other model types in the future.

Lastly, I highly recommend running this code in a background web worker. This will prevent the browser from locking up while it’s executing. I didn’t implement that feature in the demo web site but I would say it’s worth the effort in a production app.

Conclusion

While there will always be some big networks out there that need big servers to run them, a surprsing number of neural networks work just fine in the browser. Doing so is good for economic, environmental, and security reasons.

Transformers-js was written to make running language models in the browser simple. I hope you will find it enjoyable to use and I hope you will use it to make cool things.

Practical Guide to Solving Hard Problems

2022-02-19T00:00:00+00:00

I sometimes find myself in a position of needing to write some code that I’m just not sure how to write. Been there have you? Here are the steps I take when I’m stumped. No huge revelations here, just hard-earned advice.

Think hard about the problem for a few weeks before typing any code.
Type in a function or write a class that has the inputs and outputs you need.
Break the function down into multiple steps with clear objectives. You may not know how to achieve those objectives, but that’s a problem for your future self. Right now, you’re just trying to write out the high-level algorithm.
Create a function for each of those steps and throw new NotImplementedException() in each of them. Their names should be long and explanatory and there should be no question about what’s expected of them. It’s really OK if you don’t actually know how to write ‘em.
Now, go implement a few of those functions. You know they’re not all hard. Some may even be fun! Build up your confidence and implement the easy ones. It feels good to make progress and it lets the analytical part of your brain run in the background for a bit while you focus on nitty-gritty number types and file IO.
Time to tackle some of those harder functions. Go into each of those and break the problem down into steps just like you did before. You’re right, I’m gonna say it: Rinse and repeat. Keep breaking those hard problems down into steps. Turn each of those steps into a function with a clear name. Implement the easy ones. Then break the hard ones down into steps again. Do this over and over again. You’ll be surprised how much you can actually get done.
Pretty soon (haha) you will have an 80% complete solution with just a few pesky functions left that throw NotImplemented. Now go scour your favorite package repository, or code repository, or question and answer site, or artificial intelligence programming assistant for implementations. Chances are you’re not the first person to need this particular function or widget. Find some giants, climb on top of them, and scream “Holy shit, there are a lot of smart programmers in the world!”
OK, you’ve scoured the inter webs and yet you still have a few pesky NotImplemented exceptions. It’s time to check on those scientists. Enter every SEO permutation of your problem statement into arXiv. Surely others have worked on problems related to one you are trying to solve. They will most likely offer insights or perspective shifts that can help you reframe your problem into something solvable. Do that. Reframe your problem and knock out those NotImplementeds.
Now you’re in trouble. If you still have a few NotImplemented exceptions, and there are no giants upon which to stand nor academics obsessing over this particular field, then it’s all up to you. Think big. Think outside the box. Your career depends on it. (Just kidding, I hope.) Perhaps a bath will help you think?

I think these are steps all programmers take, but sometimes it’s good to spell it out.

I especially value the functional decomposition. Functions are a powerful abstraction, not just for writing less code, but for thinking about problems.

And please don’t misinterpret my use of the word “functions” to mean only those things functional programmers like. I mean any data transformer: from lowly lambdas to state-bearing IO-processing monolith objects.

Thanks for reading! Now go solve those hard problems!

praeclarum

Coding with Local Agents on an RTX 3090

Overview

Download the Model

Build llama.cpp

Run the Server

Install LLM Gateway in VS Code

MTP for Speed

Is it Worth It?

Conclusion

AppReviewFetch - Because App Store Connect Shouldn’t Require a Web Browser

The Boring Part (The Library)

The Actually Useful Part (The CLI)

The Wild Part (The MCP Server)

Things You Can Do That Sound Made Up But Aren’t

The Really Wild Ideas

The Setup

The Three Packages

What’s Next

The Real Point

The Many Ways to Deploy iOS Apps in 2025

From None to Many

1. dotnet build -t:Run

How to Find Your Device’s UDID

2. mlaunch

3. xcrun

4. ios-deploy

Conclusion

Updating App Icons for iOS and macOS 26

The New Icon Requirements

Option 1: Do Nothing

Simple Icons

Complex Icons

Option 2: Use Icon Composer

Manually Compiling Your Icons

iOS

macOS

Including in Your App

Conclusion

Building and Deploying iOS MAUI Apps with GitHub Actions

The Challenge of MAUI CI/CD

Setting Up the Job

Install Certificates and Provisioning Profiles

Build the iOS App

Upload that Puppy to Test Flight

Conclusion

I Built the World’s Largest Translated Cuneiform Corpus using AI

Cuneiform

Introducing the AICC

Can it Translate Tiglath-Pileser?

AI Translation of Tiglath-Pileser

Why AI Translations?

Sumerian

Akkadian

Training a Large Language Model

Building the Dataset

Inconsistent Transliterations

Paragraph Wrapping and Unwrapping

Training Process

Learning Sumerian and Akkadian Simultaneously

Bidirectional Translation

Future Work

Conclusion

How I Re-implemented PyTorch for WebGPU

Neural Networks in the Browser

What is a PyTorch?

What is a WebGPU?

Writing Optimized WebGPU Kernels

Debugging WebGPU Kernels

Testing

Goals

Generating SVG Images with ChatGPT

Background

Experiments

Please generate an SVG of a catus [sic] out in the desert on a sunny day, no explanation.

OK, great. But please now generate a much more detailed image using gradients and paths

Please generate an SVG of a cactus in the desert on a sunny day. Please make it as detailed as possible. Use vibrant colors. Use dramatic lighting. I want depth! I want to feel like I’m there! I want something shockingly beautiful.

App UI Icons

Would you generate the SVG code for a mac toolbar icon that toggles the selected item preferences pane on and off?

Please generate an SVG code of a gear icon. Please make it detailed showing the teeth and such. You can use paths if you want. Make your response as long as it needs to be.

1. `dotnet build -t:Run`

2. `mlaunch`

3. `xcrun`

4. `ios-deploy`