At JetBrains, we develop the most effective, convenient, and enjoyable tools for engineers and researchers so that their natural drive to develop can flourish and bear fruit. We dare to pursue ambitious, visionary ideas. We wanted to use a programming language that we ourselves would enjoy, so we created one, and now Kotlin is the language of choice for millions of software developers worldwide.
Likewise, we have endeavoured to develop a convenient and enjoyable language for formal reasoning: a general-purpose language for expressing results, definitions, conjectures, constructions, and proofs in all major areas of mathematics and computer science — including, of course, theorems about Kotlin applications and Kotlin itself. After 25 years of active research, supported by a series of breakthroughs resulting from the Univalent Foundations programme, we finally have a blueprint for such a language. These developments have also led to a variety of ideas for Kotlin that greatly improve its expressiveness and conciseness, and most importantly enabling enforcing correctness-by-construction and verifiable contract programming — the only mainstreamable formal methods.
Higher Categorical Construction Calculus (HCCC) is a tentative name for a univalent computational type theory we are developing. As a proof calculus, it will be capable of classical reasoning with choice, structural induction over its own language, and higher categorical reasoning. It will enjoy decidable proof checking and will be shown to be a conservative extension of the classical set theory extended with appropriate higher infinity axioms.
It is a construction calculus,
because, besides proofs, it can express constructions such as numerical algorithms,
straightedge and compass constructions, and abstract constructions such as profinite completions.
In line with the tradition of the original Calculus of Constructions,
it features a universal type of propositions to express structures defined by non-first-order axiomatic theories.
It is higher categorical,
because instances of structures, i.a. Models of axiomatic theories,
come conveniently prepackaged in displayed ω-categories that keep track of structure-respecting correspondences,
homomorphisms, and equivalences on every level,
so that all proofs and constructions can be generalized, specialized, and transferred along.
Computation<T>
.∀<X : T> Y
and
the S4 necessity modality mapping each proper type T
to the set □T
of its closed-form inhabitants.
This way, the theory acquires truly polymorphic type (List<T>
), typeclass (Monoid<T>
),
and instance (id : ∀<T> T → T
) definitions and LEM-compatible (□-internal) parametric reasoning,
so { x ↦ x }
can be shown to be the unique closed-form inhabitant of ∀<T> T → T
.T
to the spectrum ◇T
of its formal inhabitants
to enable univalence-compatible (◇-internal) classical reasoning with choice
without compromising favorable computational properties and decidability of proof and type checking.
The computational content of the ◇-modality turns out to be given by the
Verse calculus, a recently developed deterministic approach to
functional logic programming.
Incidentally, we also vastly expand of the computational power by allowing all classically provable algorithms.Human readers instantly recognize implicit conversions, forgive minor omissions, and think along with the author, bridging nontrivial gaps and transforming arguments “mutatis mutandis”. Proof formalization is plagued by the pain to elaborate all of this explicitly. An enjoyable proof language must at least address the issues of that kind with known solution approaches:
SProp
of definitionally propositional types (“Definitional proof-irrelevance without K”)SData
of definitionally set-like types (“Observational Equality meets CiC”), and FData
of discrete finitary typesIn a series of short proposals we develop a versatile syntax for the resulting language designed for excellent readability, conciseness, and typographic perfection. Based on Kotlin, Python, Lean, Agda, Scala, and Fortress, it is a culmination of over two decades of meticulous collection and evaluation of ideas, carefully assembled into a coherent system.
Observational type theories can be seen as functional programming languages that also contain an embedded language to reason about the programs’ behaviour. Unambiguous specifications of language primitives are part of the language and can be composed into formal proofs regarding any behaviorally relevant properties of the programs.
The ability to make use of classical termination proofs makes HCCC an exceptionally powerful total programming language, which is also capable of expressing not necessarily terminating Turing-complete computations and exact computations over the reals ℝ. The ◇-modality adds the unparalleled expressiveness of deterministic functional logic programming based on the recently introduced Verse Calculus.
HCCC seems to have everything one could ever want from a language for non-interactive programming. Yet, as great as it sounds in theory, programming in bare-bones intensional type theories demands for frustrating amounts of explicit termination, productivity, and convertibility proofs. A decent practical programming experience can only be achieved by automating them away via liquid types and providing convenient infrastructure to deal with cases that cannot be automated: indexed modalities for size-guarded recursion and clock-guarded corecursion.
Another aspect is the practicability of exact real computations, which are orders of magnitude slower than ordinary approxinate real computations using floating-point numbers. To achieve practical relevance, exact real computations require tools for extraction of fast approximate algorithms of (at least probablistically) guaranteed precision.
Non-interactive programs deal only with data and are inherently deterministic, atemporal, and inconsequential: it does not matter if one repeats a some part of the computation or performs it only once, it does not matter if one performs an unnecessary part of the computation speculatively, it does not matter if the computation is performed sequentially or concurrently.
Interactive programs interact with the environment, these interactions are partially ordered in time and are in general non-deterministic both due to order unpredictability
for (i in 1..3) {
launch { user.say(i) }
}
and due to the environment unpredictability
val n = user.ask<Int>("Enter any integer number: ")
We think that interactivity can be dealt with by introducing
Interaction only happens through references, and in general changes their typestates, which makes interactions partially ordered and limited according to the typestate transition and subsumption diagram of the respective reference, which plays the role of the type for references. We obtain references can be usable at most once (affine), at least once (relevant), and exactly once (linear) as the elementary special cases.
By specifying an implementation for a typestate diagram, we construct an object and get a fresh reference to it. In general, objects may construct and use other objects inside and/or capture external references.
Pure objects x : ⊡X
are self-contained;
they do not make use of any external references, even those available in the context,
see Controlling purity in Kotlin.
Sometimes the straightforward, correct-by-construction implementation of a non-interactive
function is best described as an imperative algorithm f : ⊡(X => Y)
.
Internally, it can create mutable objects and act on them, even in a concurrently non-deterministic way, and
make use of effects (such as throwing exceptions) as long as they are encapsulated
(caught inside, in case of exceptions).
Liquid types facilitate entirely automated verification that the algorithm always terminates with a definite value
in most cases of interest, so the self-contained imperative implementation can be used as a genuine function
f : X -> Y
.
This way, interactive extension of HCCC helps even for non-interactive programming.
Live objects x : ⟐X
may have captured references, i.e.
they can interact through references not present in the context.
Being objects that may interact or have interacted
with the outside world, they have a spectrum of possible states rather than one fixed state,
just like the formal variables in Verse calculus.
Interaction with live objects can be formalized in analogously to functional logic programming,
albeit without one : ◇◇T -> ◇T
and all : ◇T -> Stream<T>
operators.
Adjunction between ⊡ and ⟐ modalities is given by the box and unbox operations of Capture Calculus CC<:□ by M. Odersky et al. This calculus also allows introducing temporarily borrowed and temporarily locked references, which can be captured inside objects of limited lifetime we introduce using reference-level existential quantification. This mechanism enables structured sharing and structured concurrency.
We have not yet worked out the formal definition for the interactive extension of HCCC, but we have developed type system extension for Kotlin that embraces effects, mutability, concurrency, and synchronization on the practical level. It should also make Kotlin amenable to automated verification.
Similar to the case of guarded (co)recursion modalities, the whole system will be reducible to plain HCCC by encoding objects as parametrized relative (co)monads and interpreting expressions involving reference types via do-notation. See Paella: algebraic effects with parameters and their handlers.
Assume that every time one interacts with a self-contained object x : ⊡X
by calling x.doSomething(params)
,
the action name, parameters, and outcome are saved into a log.
Such logs form a monoid under concatenation, the free monoid of X
-histories generated by atomic entries
action(params) ↦ outcome
.
Now let us consider histories equivalent if they lead to observationally indistinguishable states of x
.
The quotient monoid generated by this equivalence is known as the trace monoid of X
.
From extensional point of view, state of an object x : X
is given by an element of its trace monoid.
(In general, some actions may add or remove actions available afterwards, making trace monoid into a category with objects given by typestates of the object.)
When concurrency and interaction come into play,
we have to deal with incomplete information about the object states.
Instead of a fixed state, we have a set of possible states.
Thus, the state of a live object x : ⟐X
is a boolean combination of fixed states,
given by an element of the respective trace algebra.
user.say("Hi!")
for (i in 1..2) {
launch { user.say(i) }
}
user.say("Bye!")
// user = say("Hi"!) ; (user.say(1) | user.say(2)) ; user.say("Bye!")
// = say("Hi"!) ; user.say(1) ; user.say("Bye!") | say("Hi"!) ; user.say(2) ; user.say("Bye!")
val n = user.ask<Int>("Enter any integer number: ")
// user = (ask<Int>("Enter any integer number: ") ↣ 0)
// | (ask<Int>("Enter any integer number: ") ↣ 1)
// | (ask<Int>("Enter any integer number: ") ↣ -1)
// | (ask<Int>("Enter any integer number: ") ↣ 2)
// | (ask<Int>("Enter any integer number: ") ↣ -2)
// | (ask<Int>("Enter any integer number: ") ↣ 3)
// | ···
Objects such as heaps, where fresh objects can be created, have very particular trace monoids generated by creation operators:
val r1 = heap.new<Int>(1)
val r2 = heap.new<Int>(2)
r1.set(3)
// heap = (new(1) ↣ r1) ; (new(2) ↣ r2) ; r1.set(3)
// = (new(2) ↣ r1) ; (new(2) ↣ r2)
If we consider value substitution operators such as r1.set(3)
,
we have to consider a trace category instead of a trace monoid
since this substitution operator is only available after r1
is created.
But setting the r1
to 3
has the same effect as if it were originally initialized by 3
,
so it is effectively sufficient to consider a trace monoid.
It is customary to write creation operators (new(1) ↣ r1)
as r1 ↦ 1
and write
(*) instead of (;) when the actions commute, so the final state of the heap can be written
as (r1 ↦ 3) * (r2 ↦ 2)
.
Above posed that object types are described by typestate transition and subsumption diagrams, which in particular contain actions with their signatures as transitions. Typestate transition and subsumption diagrams generate the respective free trace algebroids.
In the case of heap-like objects (also including coroutine scopes), we have indexed families of typestates. In particular, for heaps, the typestate is indexed by the number and types of allocated variables. Object may (and should) additionally contain equational laws for their trace algebroids, such as the commutativity of creation operators and absorption of substitutions:
contracts {
{ val x = new(a) ; x.set(b) } ≡ { val x = new(b) }
{ val x = new(a) ; val y = new(b) } ≡ { val y = new(b) ; val x = new(n) }
}
This way, types of heap-like objects give rise to the associated concurrent separation logic within the type theory.
We suspect that reference-level counterparts of indexed guarded computation modalities would give rise to causal temporal logic allowing internal conceptualization of such notions as consensus fairness and eventual consistency.
At some point, Kotlin might be extended to incorporate models of physical systems that software is interacting with or using as components to enable verification of AI-coupled robotic systems (including self-driving cars and delivery drones), medical devices, reactor control systems, transportation systems, and electric circuits. There is a readily available and widely applied tool (KeYmaera X) that allows modeling hybrid systems with nonlinear discrete jumps, nonlinear differential equations, differential-algebraic equations, differential inequalities, and systems with nondeterministic discrete or continuous input, lacking only the probablistic toolset. However, being a separate tool rather than a part of a programming language (and its formal backing construction calculus), it introduces a gap between a piece of software as it stands and its model, compromising the verification reliability and complicating it.
There are two major areas that are not yet covered:
The live modality ⟐ introduces the notion of objects with mixed states. Interactions make such objects entangled, restricting their joint state spectra to particular combinations of their respective states, which reminds of quantum mechanics. Trace algebras for heaps remind of algebras of observables in physics. In the case of heaps, they are commutative since observations are non-destructive and independent of each other. Its non-commutative generalization describes the case where observations are no longer necessarily independent and cannot be in general made non-destructive. It allows description of quantum objects where the coexistence of multiple pure states is inherent rather than due to lack of knowledge. Non-commutative state creation operators algebras allow describing quantum heaps where separate objects can be entangled. There, separately created objects do not remain separable, and the picture of heaps as collections of distinct objects becomes fuzzy: instead of “distinct particles”, we deal with “field quanta”, with quantum heaps playing the role of quantum fields. We assume this analogy can be made precise allowing to describe quantum algorithms and model physical quantum systems, hopefully shedding new some light onto formalization of the quantum field theory.
The affine logic for biconstructive mathematics seems an entirely distinct subject at first. However, biconstructive propositions are semantically described as particularly simple Chu spaces, while the general form of which is known to also describe the Hilbert spaces of quantum states. It resonates with the situation in the homotopy type theory, where the propositions are particularly simple (truncated) types.