Some Programming Language Ideas

Programming languages seem to have somewhat stagnated to me. A lot of shuffling around ideas that already exist, not a lot of new ones.

This is not necessarily bad. Shuffling around ideas that already exist is a natural part of refining them. Shuffling around ideas that already exist is safer than a radical rewrite of existing convention. Even taking a language that already exists and giving it a new standard library can be worthwhile.

However, I occasionally have some ideas that may then trigger other ideas in people, and in the interests of getting them out of my head, I’m posting them here.

Disclaimers

None of these ideas are fleshed out to a specification level. Some of these are little more than goals with no idea how to manifest them in a real language.

While I mostly don’t know of languages that do these things, that doesn’t mean I’m claiming they don’t exist, just that I don’t know of them, because of course I am not intimately familiar with every language that has ever been written.

Some of these ideas are probably bad, even outright crazy. At least one of them is something that I would put in the known bad category, and the idea is mostly just “but what if someone could figure out a way to fix it?”

Some of these are mutually contradictory and can’t live in one language.

In general, while I can’t control how people react to this list, should this end up on, say, Hacker News, I’m looking more for replies of the form “that’s interesting and it makes me think of this other interesting idea” and less “that’s stupid and could never work because X, Y, and Z so everyone stop talking about new ideas” or “why hasn’t jerf heard of this other obscure language that tried that 30 years ago”. (Because, again, of course I don’t know everything that has been tried.)

Loosen Up The Functions

This idea comes from Erlang, though it doesn’t quite follow through on it to the extent I’m talking about here.

A function call is a very strong primitive. There is no possibility a function call can fail. This is so deeply ingrained into our understanding of function calls that we can’t even see it. Note this is not a matter of the “function returning an error” or “throwing an exception”, this is “the code reached for strlen and it wasn’t there”. Or in the case of dynamic languages, not that it couldn’t “find” a particular function but the code to locate it suddenly is gone.

The closest we get is “I ran out of memory trying to make this call” and most of us, most of the time, just ignore that possibility.

I wrote a comment on Hacker News a while back about how network RPC failed in the 90s when it tried to pretend to be a native local function. It isn’t, and it can’t pretend to be a native local function. RPC functions have errors that can happen that a normal function can’t have. And it’s not just a matter of returning an error value; sometimes the error is “this call should have taken 250 nanoseconds but it froze for a minute before timing out”. While it’s no problem for a computer to wait for a minute, trying to build a program on operations that may take “somewhere in this range of 9 orders of magnitude to execute” is not a very useful primitive.

Well, what if you loosen up the concept of a function until all function calls are “looser” and can express all the failure cases that also arise with RPC? This would include things like:

  • Pervasively include a concept of timing out for all functions.
  • Pervasively make functions easy to wait on or skip past; in “async” terms, solve the problem by making all function calls async.

And so on.

Then, once you’ve done that, you will find that it is annoying to call a single function and do all the error handling you need to do, so you make that easier, with maybe scoped error handling declarations or some more defaults or something.

Then, RPC really can be as easy as calling a function in your program, because you lowered what a function promises.
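
As a rough illustration of the caller's side of this, here is a minimal sketch using Python's asyncio as a stand-in. The `loose_call` helper, its `timeout` and `default` parameters, and both example functions are hypothetical names invented for this sketch; no existing language treats every call this way.

```python
import asyncio

_MISSING = object()  # sentinel meaning "the caller gave no fallback value"


async def loose_call(func, *args, timeout=1.0, default=_MISSING):
    """Call func(*args) under the weaker, RPC-like contract.

    Hypothetical helper: every call carries a deadline, and the caller
    may supply a fallback value instead of handling the timeout itself.
    """
    try:
        return await asyncio.wait_for(func(*args), timeout)
    except asyncio.TimeoutError:
        if default is _MISSING:
            raise
        return default


async def local_strlen(s):
    # A "local" function: returns immediately.
    return len(s)


async def remote_strlen(s):
    # Stand-in for an RPC that hangs far longer than expected.
    await asyncio.sleep(10)
    return len(s)


async def main():
    fast = await loose_call(local_strlen, "hello")
    slow = await loose_call(remote_strlen, "hello", timeout=0.05, default=-1)
    return fast, slow


result = asyncio.run(main())
```

Both calls go through the same path; the only difference is how much of the looser contract the caller chooses to spell out. A real language would presumably want this to be far less noisy than a wrapper function.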

Erlang implements this to some degree in its various gen_* services, but still has a conventional concept of function call overall.

The downside of this is that it is questionable whether a function call’s simplicity can be recovered to the point that programmers are willing to use this. Ultimately, strlen isn’t going to fail to be available and I probably want to just wait for it regardless, and somehow the syntax is going to need to make that reasonably easy, even if it’s as simple as prefixing such calls with a symbol that means “just pretend this is local” or perhaps embedding “yeah, this is actually a local call” into the type system.

Or possibly it can all be left alone and the compiler can simply optimize local function calls when possible, which is a lot of the time.

Capabilities

This is not a new idea, so I won’t go deeply into what it is, and I have been told some languages are playing with it, but it is something I’d like to see more of.

There’s a language called E that tried to bring this about, but my impression is that it was built on top of Java, and that’s probably too great a change to try to build on Java. It really needs its own language from top to bottom.

The time was probably not ripe. Consider this not a claim to novelty, but the observation that maybe the time is right for this now.

A possible hazard here is trying to make capabilities do too much, e.g., also trying to write Rust-style mutation controls into them. Or perhaps that would work like gangbusters, I dunno.

We keep trying to half-ass bodge capabilities on the side of existing programs, and it honestly just keeps not working that well. Maybe instead of writing conventional code and then trying to work out exactly what capabilities that capability-oblivious code and programmers ended up calling for, it’s time to put this into languages themselves.
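
For readers who haven't met the idea, a minimal sketch of the style in Python: code receives an object that represents the one thing it is allowed to do, rather than reaching for ambient authority. The `ReadCap` class and `word_count` function are hypothetical, and Python enforces none of this; a capability language would make it impossible for `word_count` to call `open()` on its own.

```python
import io


class ReadCap:
    """Hypothetical capability: permission to read one specific resource."""

    def __init__(self, opener):
        self._opener = opener  # the only route to the underlying resource

    def read(self):
        with self._opener() as f:
            return f.read()


def word_count(source: ReadCap) -> int:
    # By convention this function uses only what it was handed:
    # it never calls open() or touches the network itself.
    return len(source.read().split())


# The program's entry point holds the real authority and hands out
# narrow slices of it. An in-memory buffer stands in for a file here.
cap = ReadCap(lambda: io.StringIO("capabilities are unforgeable references"))
count = word_count(cap)
```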

Production-Level Releases

We’ve learned a lot about production-quality releases in the past several years. Little of this has made it back into the languages. This has probably been a good thing, because we’ve needed the freedom to experiment, but I think it’s time for languages to start embedding solutions to these problems into themselves so we can harvest the benefits of those solutions.

We really ought to be able to:

  • Have a fully standardized logging interface now, such that external libraries can just provide configurable logging output.
  • Built-in metrics from day one, so third-party libraries can just provide metrics gathering and processing.
  • Build in some sort of “request context” usable for request tracing and stuff.

This is an example of something that isn’t even a “language feature”. It’s not like we need custom syntax constructs for metrics. (Although if you’re going for a very “pure” language, maybe some way of having some ability to ping metrics reliably without that “counting” as impure would be helpful.) Just getting good-enough versions of this stuff into the standard library would be enough.

There’s a number of these capabilities that are getting fairly mature and could be lifted up into a new language directly, e.g. I think “structured logging” has probably matured to this point.
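
Python's standard-library `logging` module is one existing example of the shape this takes: libraries log against a shared interface, and the application alone decides on output. The sketch below adds structured (JSON) output and a request ID carried in a `contextvars.ContextVar`. The `JsonFormatter` class, the `request_id` variable, and the `somelib` logger name are assumptions made up for illustration.

```python
import contextvars
import io
import json
import logging

# Hypothetical "request context": set once per request, visible to any
# code running on behalf of that request.
request_id = contextvars.ContextVar("request_id", default="-")


class JsonFormatter(logging.Formatter):
    """Application-chosen output format; libraries never see this class."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
            "request_id": request_id.get(),
        })


def library_function():
    # A third-party library only needs the shared interface.
    logging.getLogger("somelib").info("cache miss")


# The application decides where the output goes and what it looks like.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("somelib")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

request_id.set("req-42")
library_function()
entry = json.loads(stream.getvalue())
```

The library never imports anything application-specific, which is the integration benefit that a standardized interface buys.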

The disadvantage of lifting into the language is that anything so lifted becomes difficult to change once 1.0 hits. The advantage is that the rest of the standard library and third party libraries can integrate with it. It’s great that language X has 7 viable logging libraries but it becomes more difficult for other libraries to build on an assumption of what “logging” looks like.

This is an example of where a new language that is otherwise “just” shuffling around older ideas could still get a leg up on its competition. It takes a lot of maturity to write these interfaces, though. For each of the things I mentioned, you really need to get a lot of experienced devs together and make sure you’re pulling in the best tested versions of the capability. This is, for better or worse, not a place for some 19-year-old who has never worked with any of these things to just splat out some off-the-cuff interface specification that gets cast in stone. We already have that.

Semi-Dynamic Language

Many programmers like the convenience of dynamic languages. I spent the first ~15 years of my career about 100% dynamic, and I’m not sure I include myself in that set. Still, they’re pretty popular.

The problem is that they seem to be fundamentally slow. Some people still discuss their performance as if it’s the year 2000 and maybe someday we’ll get “sufficiently smart compilers”, but the reality is that immense effort has been poured into speeding these things up and it is no longer appropriate to “hope” for what may happen someday. And the result has been… some success. Mixed success. You can get dynamic languages to go faster, but it costs you a lot of RAM and you still tend to cap out at 10x slower than C. It’s a lot of work for a fairly marginal reward, worth it only because they’re so popular that a 4x speedup multiplied across “all the dynamic code in the world” is still well worth fighting for.

Alternatively, you can go the LuaJIT route and just hack out bits of the language that don’t JIT well. But this seems to be only minimally popular. Other than that it’s a good idea, though.

But the thing about “dynamicness” is that if you look, the vast, vast majority of it takes place at startup, or at very defined times such as “I’m loading in a new user plugin”. Almost no code is constantly sitting there and dynamically modifying this and that as it runs. Yet you pay for this dynamicness all the time. Every attribute lookup needs to run a whole bunch of code to correctly look up an attribute, in case someone has modified the lookup procedure since the last time the value was looked up, or, in the case of a JIT, the JIT’s procedures still need to be correct as if this can happen all the time, which is inevitably slower than code that can’t do that.

What about a language where for any given bit of code, the dynamicness is only a phase of compilation? The code can do whatever during initialization, load database tables to dynamically construct classes or whatever, but once it’s done, there’s a point where it locks down, becomes nearly statically-typed (not necessarily fully, you could look at this as an incremental typing situation), and being dynamic is no longer possible?

I’m not sure what all this would look like. It’s a sketch of an idea. Partially because I’m pretty satisfied in my own programming world with static languages.

But if there was a phase where everything locked down, then a JIT would have vastly more power to optimize the code safely. JITs have to do so much work to deal with “well what if someone passes in something really pathological to this function later?” and it seems like they could go a lot faster if they could be rigidly guaranteed by the types in the final compiled code that couldn’t happen.
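
From the programmer's side, the lock-down point might look something like the following Python sketch. The `Lockable` metaclass and its `lock()` method are hypothetical, and CPython gains no speed from this; it only illustrates the "dynamic until told otherwise" contract that a JIT could then rely on.

```python
class Lockable(type):
    """Hypothetical metaclass: classes are mutable until lock() is called."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        # Bypass our own __setattr__ so the flag can always be written.
        type.__setattr__(cls, "_locked", False)

    def __setattr__(cls, name, value):
        if cls._locked:
            raise TypeError(f"{cls.__name__} is locked; cannot set {name!r}")
        super().__setattr__(name, value)

    def lock(cls):
        type.__setattr__(cls, "_locked", True)


class Record(metaclass=Lockable):
    pass


# Dynamic phase: build the class from data, e.g. a database schema.
for column in ["id", "name"]:
    setattr(Record, column, None)
Record.describe = lambda self: f"{self.id}:{self.name}"

Record.lock()  # "OK, I'm done being dynamic."

try:
    Record.email = None  # a late change to the class shape
    late_change_allowed = True
except TypeError:
    late_change_allowed = False
```

Instances remain ordinary mutable objects here; only the shape of the class is frozen. A real design would have to decide how far the lock extends.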

You may also be able to create a sort of hybrid compile phase, where the code is not “compiled”, but you can still run something like a “check” that verifies the locked-down program is coherent according to whatever rules the runtime or the user want to implement.

While I’m not aware of anything that works exactly like what I have in mind, it is clearly a position on a well-explored continuum of “exactly when does compilation happen?” and not some brand-new idea. I reiterate that I’m not making a claim that any of this is brand new. The Common Language Runtime’s IL and subsequent compilation on a target system is reasonably close to this, but focused on something different. Some Lisps may be able to do all this, although I don’t know if they quite do what I’m talking about here; I’m talking about there being a very distinct point where the programmer says “OK, I’m done being dynamic” for any given piece of code. Shader compilation for video games may have a component of this, especially including the ability to cache compilation outputs.

Another view on this idea is, “Isn’t it about time someone wrote a dynamic scripting language that was designed from day one to be easy to JIT?” What we have out in the world right now is either dynamic scripting languages where the JITs came along literally a decade or two after the language was created, and the JIT basically had to be instantly 100% compatible with a language that was never designed for JIT’ing right out of the gate to be even remotely useful, or we have LuaJIT where an existing language got bits and pieces sliced out of it, but we don’t have anything that I know of where the language was designed from the start to be dynamic, but still easy to JIT.

While you’re at it, you’ll naturally also create a dynamic scripting language that handles threading properly from the beginning, rather than trying to retrofit it on to a decades-old code base. A dynamic scripting language with perhaps a 2-3x slowdown over C (or, to put it another way, basically the same speed as Go) that is also natively capable of near-static-language threading speeds could raise a lot of eyebrows.

Value Database

Smalltalk and another esoteric programming environment I used for a while called Frontier had an idea of a persistent data store environment. Basically, you could set global.x = 1, shut your program down, and start it up again, and it would still be there. And by that I mean, storing a value persistently was literally that easy; no opening a file and dumping JSON and loading it later, no fussing with SQLite and having to interact with a foreign SQL interface (which, no matter how nice that may be, is not your language’s native paradigm), none of that. Just “set this value and keep it forever”.

This… is a superficially appealing but bad idea, unfortunately. It’s on the list of Things I See People Angrily Claim Programming Needs To Do To Level Up, right there along with Everything Should Be Visual Programming and the recent “Low Code” burst that seems to have died down again. It’s an entrant that doesn’t show up often, but I’ve seen it enough that it’s on my list.

But it carries some significant disadvantages, most notably that entropy tends to attack this shared store pretty badly. The developer sets a value in their store, then sends the code out to production, but whoops, it turns out the code absolutely depends on that value being set and it fails everywhere else. It takes a lot of work to set up a scenario in which twiddling a run-time variable for debugging in your staging environment can propagate straight into a bug on production because of accidental dependencies on that value, but persistent stores are up for the challenge!

So, my own experience certainly attests to the fact that this is far from a magic solution to all our problems.

However, I still wonder if this can’t be fetched from the dustbin of history somehow, with some sort of better controls on what goes into these stores. Base it on event streaming? Access controls? Some sort of structural typing system that ensures that a “table” is of some shape directly, before trying to use values of it and failing? Just plain typing these things?

Because, my gosh, what a mess you could make on the one hand… but on the other, I can’t tell you how nice it is to just say myval.x = 5 and it’s just there as myval.x tomorrow, with no queries, no mappings, no ORMs, no files, no failures… just boom, there.
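
Python's standard-library `shelve` module gets partway to this experience, and makes a reasonable minimal sketch of the idea. It is less seamless than what is described above, since the store still has to be opened explicitly; the file location below is an arbitrary choice for the example.

```python
import os
import shelve
import tempfile

# Arbitrary store location for this example; an environment built around
# the idea would pick and manage this for you.
path = os.path.join(tempfile.mkdtemp(), "globals")

# First "run" of the program: set a value and exit.
with shelve.open(path) as store:
    store["x"] = 1

# Second "run": the value is simply there.
with shelve.open(path) as store:
    x = store["x"]
```

Nothing here checks the shape of what is stored, which is exactly the gap that typing or access controls on the store would need to fill.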

A Truly Relational Language

Although, on the note of “no fussing with SQLite”, how about a language whose fundamental data type is a relational DB table?

You wouldn’t actually want SQL; you’d want to go back to relational principles and build something that works as a programming language rather than banging SQL together. The many, many technologies in the world like LINQ in the .Net world or SQLAlchemy show at least a possibility of what that would look like, although being able to sit down at the language grammar level to integrate this even more deeply opens up even more interesting possibilities than LINQ.

(Being able to emit SQL from the language for when you really do want to talk to an SQL database is probably a good idea. This is harder than it looks at first glance. You definitely want to study LINQ, and consider how you allow the user to use things like SQL_NO_CACHE or SQL_CALC_FOUND_ROWS, because you need to be able to do those sorts of things to SQL even if you don’t need them in every query.)

Relational databases are clearly a tech that is here to stay, yet most modern languages still treat them as an exotic thing to be dipped into every once in a while at great cost. Maybe there’s a special “table” data type, as the data programmers have with their “data frames”, or you get something nice like LINQ, but the language still ultimately has product or sum type data structures as its native representation, and there’s always this foreign conversion step to go from the relational data to the “real” data.

You know you’d have something like this when you could query across three different data types in your code, and get the results in the form of some ad-hoc data type specifically for that query, which could then be natively passed around and perhaps even have methods added to it directly.

In the type theory world this is heavily related to row types. I am not aware of a language that uses them natively. (Although as is often the case, if you squint hard enough at a dynamically-typed language it can “look like” row types, but that’s again because by punting on types entirely it can “look like” a lot of things, but in the end if you violate the types you just get an exception thrown.) Although there is more work to be done to make this idea work, row types are just where I’d start. You’d still want to examine things like “can I put methods on some sort of row type in such a way that the method doesn’t care how the row type is constructed?”

That is, suppose you had a user ID and a username in one table, that linked to an identity that contained their human name. By querying across these two things you could end up with a User ID/Username/Human Name tuple, even though that doesn’t literally exist as a data type in your system… could you work out a way to put a method on this anyhow that might do something like a debug dump of those three things? A method on a datatype that never concretely exists as a declared data type? There’s some interesting possibilities here.
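
A minimal sketch of that example in Python, which is precisely the "squint at a dynamic language" version: the ad-hoc result type is created per query, and `debug_dump` works on any row with the right three fields, but nothing is checked before run time. The table names, field names, and the `join` helper are all made up for illustration, and the join assumes the key is unique in the right-hand table.

```python
from collections import namedtuple

# Two "tables", each a list of rows. All names here are hypothetical.
Account = namedtuple("Account", "user_id username identity_id")
Identity = namedtuple("Identity", "identity_id human_name")

accounts = [Account(1, "jdoe", 10), Account(2, "asmith", 11)]
identities = [Identity(10, "Jane Doe"), Identity(11, "Al Smith")]


def join(left, right, key, fields):
    """Inner-join two row lists on `key`, projecting onto `fields`.

    The result rows get an ad-hoc type created just for this query.
    """
    Row = namedtuple("Row", fields)
    index = {getattr(r, key): r for r in right}  # assumes unique keys
    out = []
    for l in left:
        r = index.get(getattr(l, key))
        if r is not None:
            merged = {**r._asdict(), **l._asdict()}
            out.append(Row(*(merged[f] for f in fields)))
    return out


def debug_dump(row):
    # Works on any row with these three fields, however it was built.
    return f"{row.user_id}/{row.username}/{row.human_name}"


rows = join(accounts, identities, "identity_id",
            ["user_id", "username", "human_name"])
dumps = [debug_dump(r) for r in rows]
```

With row types, a compiler could verify that `debug_dump` is only ever applied to rows carrying those three fields, instead of finding out via an `AttributeError`.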

(This also may harmonize with the JIT idea above. This would tend to create a proliferation of possible types that some code somewhere could use; conceivably even an infinite number of them depending on how you implement it. Conventional generics-based precompilation may not really be possible. But in practice there would be a finite and generally relatively small set of those types actually used and a JIT that could determine what those are and JIT them to native-ish speeds could potentially recover a lot of performance out of this by not compiling all of the myriad possible types that could have a UserID/Username/HumanName in them with their static struct offsets until the type is actually used.)

A Language To Encourage Modular Monoliths

The modular monolith has been a structure that has been flying under the radar lately, but I feel like it’s coming up more and more often. Personally, everything I write large enough to need architecture is now a modular monolith, and I find it a fantastic way to program, at least at medium scales. I have to admit I have not yet tried it on a truly large project. My guess is that it should continue to scale, albeit possibly requiring more discipline to maintain, but it doesn’t seem to be gassing out in my own uses yet.

To have a modular monolith, you need to use dependency injection and interfaces. You write as much code as possible in terms of “Hey, this is what I need; I need a way to turn DNS names into IP addresses, and I need a way to turn email addresses into user accounts, and I need this and that and the other thing”, and then you construct each component of the modular monolith by providing it with all the services it needs.
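
A minimal sketch of that construction style in Python, using `typing.Protocol` for the interfaces. The `Resolver`, `AccountLookup`, and `Mailer` names, and the fake implementations, are hypothetical and exist only for this example.

```python
from typing import Protocol


class Resolver(Protocol):
    def resolve(self, hostname: str) -> str: ...


class AccountLookup(Protocol):
    def by_email(self, email: str) -> str: ...


class Mailer:
    """One component; it states what it needs and nothing more."""

    def __init__(self, resolver: Resolver, accounts: AccountLookup):
        self.resolver = resolver
        self.accounts = accounts

    def route(self, email: str) -> str:
        user = self.accounts.by_email(email)
        host = email.split("@", 1)[1]
        return f"{user} via {self.resolver.resolve(host)}"


# Test doubles; production would pass real DNS- and database-backed objects.
class FakeResolver:
    def resolve(self, hostname: str) -> str:
        return "192.0.2.1"


class FakeAccounts:
    def by_email(self, email: str) -> str:
        return "user-7"


mailer = Mailer(FakeResolver(), FakeAccounts())
route = mailer.route("a@example.com")
```

`Mailer` never names a concrete DNS or database implementation, so either can be swapped at construction time.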

I think “modular monolith” is arguably what should be the “default” architecture for any non-trivial project.

However, modern languages tend to fight you on that.

Static languages require extensive declaration of interfaces of some sort to do this, and as such, it requires much more discipline than hard-wiring together everything with concrete types. As such, in real code, lots of things end up hardwired together just due to the sheer hassle of interfaces, even in the languages where they are the easiest.

Dynamic languages are nominally easier, once again because they pretty much punt on everything, but the trade off is no compile-time guarantee that all the services you are getting passed actually do the things you want them to do. In practice this becomes scarier and scarier to do as you scale up, because now every time you call a new method on some passed-in parameter you are changing the interface for all things passed in to that method, and there is effectively no way to notify the callers of that method. After all this may even be a library and you may have no human connection whatsoever to the caller.

I’d be interested in something that strikes a middle ground; a static language with compile time guarantees, but one where all function parameters are automatically interfaces, even if they are given an “exemplar” type in their type signature. If I declare something as a “string”, and what I do with that string is concatenate it with another string and iterate on Unicode codepoints, what if the compiler just automatically was able to take anything that could “concatenate itself to a string” and “iterate on Unicode codepoints” and accept it by treating it as if there was an interface declaration right there already?
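
To make that concrete, here is a sketch in Python of what the compiler would be doing on the programmer's behalf. The `describe` function uses only two operations on its parameter; `DescribeParam` is the interface one might extract from that usage, written out by hand here because Python will not infer it. `Rope` is a hypothetical non-string type that happens to satisfy it. All three names are assumptions for this example.

```python
from typing import Iterator, Protocol, runtime_checkable


def describe(s):
    joined = s + "!"            # needs: concatenation with a str
    count = sum(1 for _ in s)   # needs: iteration over characters
    return joined, count


@runtime_checkable
class DescribeParam(Protocol):
    """The interface implied by what describe() does with its argument."""

    def __add__(self, other: str) -> str: ...
    def __iter__(self) -> Iterator[str]: ...


class Rope:
    """Not a str, but supports the two operations describe() uses."""

    def __init__(self, *pieces):
        self.pieces = pieces

    def __add__(self, other):
        return "".join(self.pieces) + other

    def __iter__(self):
        return iter("".join(self.pieces))


rope_result = describe(Rope("ab", "c"))
str_ok = isinstance("abc", DescribeParam)
rope_ok = isinstance(Rope("ab", "c"), DescribeParam)
int_ok = isinstance(42, DescribeParam)  # ints cannot be iterated
```

Note that `runtime_checkable` only checks that the methods exist, not their signatures; the idea above calls for a compile-time check that is considerably stronger than this.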

(It would be interesting to see if you can get type inference to the point that “exemplar types” are no longer necessary but working out if that is the case is well beyond the level of work I’m doing here.)

Every parameter coming in to a function could have an interface automatically extracted out of it just by what the user does to that value. Integration through a language server could do something like extract that interface out automatically. I don’t know if it should be implicit and checked at compile time, or if you might want to do something like “on save, automatically reify all interfaces into actual declarations the human can see”. If it is left implicit we definitely want the language server to have a command that returns “this is the actual interface for this parameter”.

(On that note, not worth its own section, but I think there’s a lot of interesting work that could be done along the lines of “write a static language that assumes you’re writing it with the language server and provides very rich querying capabilities”. You see a lot of good ideas in the best IDEs for that sort of thing, but the ideas always end up detached from the languages and eventually stranded when the IDE line comes to an end. Collecting those capabilities up into the language project itself and integrating the language server right into the design of the language at all phases should probably have some interesting effects.)

I think I’d want to see something like the Python module system, where technically, libraries themselves are objects, which means that entire libraries could be swapped out by providing a different one.

You could in principle merge this with another interesting idea, which is more extensive use of dynamic scopes to do something like provide a built-in service registry of things that look like a fancy dependency-injection library, so some code can do something like “fork my current registry, change the UserProvider to this other object, and run this test code”. Or change the definition of a “transaction”. Or whatever.
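
A minimal sketch of the "fork my current registry" operation, using Python's `contextvars` as the dynamic scope. The registry layout, the `provide` and `run_with` helpers, and the `UserProvider` key are hypothetical names for this example, not an existing library's API.

```python
import contextvars

# Hypothetical registry: a mapping of service names to providers, stored
# in a context variable so it follows the current execution context.
_registry = contextvars.ContextVar("registry", default={})


def provide(name):
    return _registry.get()[name]


def run_with(overrides, func, *args):
    """Fork the current registry, apply overrides, and run func inside it."""
    forked = {**_registry.get(), **overrides}

    def runner():
        _registry.set(forked)
        return func(*args)

    # copy_context() isolates the change; the caller's registry is untouched.
    return contextvars.copy_context().run(runner)


def greet(user_id):
    # Ordinary code: looks up whatever UserProvider is in scope.
    return "hello " + provide("UserProvider")(user_id)


_registry.set({"UserProvider": lambda uid: f"real-user-{uid}"})

normal = greet(1)
under_test = run_with({"UserProvider": lambda uid: "test-user"}, greet, 1)
after = greet(1)
```

`greet` is unaware that it can be redirected, and the override disappears as soon as the forked scope ends.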

In theory, if you successfully made sure there aren’t any back doors to this system (like “primitive types like ints are just ints and they can’t be shimmed”), this would almost automatically make any system written in it a modular monolith. Of course, it might be a super messy modular monolith, but in principle any function in the system, even though the whole thing is statically typed, could be executed in such a way that everything it depends on, regardless of whether it was written for being swappable, is swappable, with enough work. You could do things like have literally any code that reads & writes to a filesystem be executed in a context that provides a fake file system for testing, without the code itself having to do any explicit declaration of that fact.

You’d want to block truly global variables entirely, although if you combine them with the dynamic scope idea, you can put things in the dynamic scope for similar uses.

There’s probably also some interesting synergies with structured concurrency, and having these dynamic scopes attached to execution contexts. Make sure your dynamic scopes also have the capabilities that the Go contexts do and you’d end up with some interesting possibilities.

Modular Linting

This is another place where I absolutely make no claims about what may be happening in the many dozens of language communities in the world. I’m just making an observation from one of the ones I’m deeply into and suggesting it’s a good pattern, not that it’s the only place it happens. Plus I’m going to say it isn’t happening enough anyhow even where it is happening.

With that throat clearing, the Go world happened to end up with a lot of various linters over the years. Eventually they were combined into a project called golangci-lint, which has become the de facto linter for the community. I linked to the list of linters built in so you can see what’s in there.

What’s interesting about golangci-lint, though, is that the various linters are largely independent from each other, each an independent project written by a different developer to scratch a particular itch. They were later merged together technically, but are in principle still just a big pile of community linters with a nice modular interface on top.

This isn’t about the language at all, but it would be interesting for a language project to reify that. golangci-lint eventually shared an AST view among a lot of its linters; a project could copy that idea and write it in early. Let linters be fully modular, perhaps even by mentioning them by github project through a fully standardized interface that doesn’t require them to even be “integrated” into a single executable from a third-party project.
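
The shared-AST arrangement is simple to sketch. The following uses Python's standard `ast` module rather than Go, and is not how golangci-lint is actually implemented; the two linters and the `run_linters` driver are hypothetical. Each linter is an independent function that takes the already-parsed tree and returns findings.

```python
import ast


def lint_bare_except(tree):
    """Flag `except:` clauses that catch everything."""
    return [(node.lineno, "bare except")
            for node in ast.walk(tree)
            if isinstance(node, ast.ExceptHandler) and node.type is None]


def lint_print_call(tree):
    """Flag calls to print(), e.g. for library code."""
    return [(node.lineno, "print call")
            for node in ast.walk(tree)
            if isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "print"]


def run_linters(source, linters):
    tree = ast.parse(source)  # parsed once, shared by every linter
    findings = []
    for linter in linters:
        findings.extend(linter(tree))
    return sorted(findings)


SOURCE = """\
try:
    print("hi")
except:
    pass
"""

findings = run_linters(SOURCE, [lint_bare_except, lint_print_call])
```

The whole "standardized interface" here is one function signature, which is what would let linters live in separate projects and still be composed.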

What’s neat about this approach is that if it was made a part of the language design process, a lot of things that aren’t necessarily important could be kicked to optional linting. For instance, Go was especially a bit notorious when it first came out for mandating that all imported packages were used, and all declared variables were used. Complaints about that over the years have quieted down, but that’s a good example of something that could have been taken out of the compiler and shuffled off into a linter provided by the main project.

There’s definitely some downsides too… you end up with “dialects” of the language, but, the truth is, you end up with that anyhow. Generally developers learn pretty quickly not to fire their own bespoke linting configuration at other people’s libraries.

But it would be interesting to see how much could be kicked out to linters, like, do you want to insist that all values of an enumeration (whether a classic int or the later trend towards using that term for what I think of as “sum types”) are checked in switch statements or not? Do you want to validate your printf parameters are correct? Do you want a linter flagging every time you use an external program to pass your arguments through some check routine? Perhaps things like HTML template libraries could even ship with their own linters to flag suspicious constructs for injection, formally as part of the library.

This pairs in an interesting way with the parenthetical about leaning on a Language Server more; taking the Language Server as a core part of the project makes the language task bigger, but allowing for community linters and kicking non-essential aspects of the language out to the linters shrinks what the core of the language project has to worry about.

This is also one of the only ideas in this list that doesn’t need to be in the language from the very, very beginning, and could be added either by a young language design team or even just a motivated external developer to an existing project. Things like Python or C# have too much inertia for a dedicated dev, but you might be able to get the momentum in something still young like Nim or Zig.