Annex B: Sociology and linguistics#
Etymology of Crochet#
Crochet (/kʁɔ.ʃɛ/ — Audio pronunciation on Wiktionary) is the English term for needlework with a yarn thread and hooked needle. It comes from the French “crochet” which also means “hook”, and ultimately derives from Old Norse’s “krókr” (hook) — still alive in Swedish as “krok”.
It might sound weird to choose such a name for a programming language, more so for one designed for security. But it’s a name chosen entirely for its aural aesthetics, not for its meaning. However, meaning can be derived post-fact. Crochet does try to help weave software by joining many small components together—though not necessarily made by the same person.
Lexical restrictions of names#
An identifier is what Crochet calls the names used for
types and fields.
Identifiers can start with a lowercase latin letter
z), and can be followed by any lowercase latin letter
z), a plain hyphen (
-), or digits (
A name is what Crochet calls the names used for variables.
They’re similar to identifiers, but must start with an uppercase latin letter
Both names and identifiers are case-sensitive. That is, the identifier
aBc and the identifier
abc refer to distinct entities, despite
being comprised of the same logical sequence of letters.
This is very biased towards indo-european languages (particularly English), but these restrictions exist for two primary reasons:
Crochet has a textual representation, and is often written as a plain piece of text in a text editor. Restricting what can be understood as a name makes the task of making a computer understand the program unambiguous.
For example, consider a field name such as
angle in radians. There’s
nothing wrong with this as a field name. Indeed, it can match perfectly what
the programmer has in mind for it. And even declaring this field wouldn’t
be as much of an issue:
type data(angle in radians is integer);
is perfectly unambiguous.
Sadly, this is not always the case. Consider projecting this field:
simulate for Data.angle in radians until quiescence. Should Crochet
try to read this as
simulate for (Data.angle in radians) until quiescence,
and thus have the simulation run for the value in the
angle in radians
field, or should Crochet try to read it as
simulate for (Data.angle) in radians until quiescence, thus running the
simulation for the field
angle in the simulation context
Although syntactical ambiguity can be addressed by requiring people to be more explicit when ambiguity arises (just like bracket use in common mathematical notations), the experience of making mistakes becomes rather poor. Because Crochet cannot really infer what is and what isn’t a mistake without broader context (in the above example, both interpretations make complete sense locally), it’s hard for the programming tools supporting Crochet to help programmers when they do make mistakes. What kind of error message should be displayed? How do we figure out that this is an error? What can we even suggest to the programmer? Not having to ask those questions makes the programmer life easier in the long run, at the cost of ruling out some perfectly reasonable spellings.
Spelling display ambiguity#
The second problem is a bit more nuanced. Although the restrictions could be relaxed by allowing any unicode letter in the names, thus making the choice of identifiers less English-centric (and indo-european-centric), doing so creates a different kind of ambiguity: that of symbolic representation.
For example, consider a program that defines the following types:
type かめ; // kame (turtle) type カメ; // kame (turtle) type 亀; // kame (turtle)
All of these types are different spellings of the same concept, with the same pronunciation: a turtle (or “kame”). So when a coworker ask you to instantiate the turtle, how do you figure out which spelling to use? How should tools like auto-completion handle these identifiers? They certainly would need to be very language aware to know that they may even be pronounced in different ways. I don’t know enough Japanese to design such system (and all of the other languages I speak are either in the Latin or Germanic families).
The other problem with allowing a richer set of symbols is in how these
different symbols can be misused and reinterpreted across different
cultures. For example, the
ː (Modifier Letter Triangular Colon, unicode
02D0) is often used in the International Phonetic Alphabet
to describe that the preceding vowel is a long one. But it looks a lot like
: (Colon, unicode 003A). A name like
would be a perfectly valid command name, if all letters were allowed,
but it takes 2 arguments, not 3.
foldːright: is one keyword,
even if it may not be readily recognisable as one depending on the
display font users choose or their knowledge of the International
Now, while these edge cases may not be a problem in actual code—and people should have the poetic freedom to treat their code writing as an art form in any case—, spelling ambiguity does raise some security concerns due to how Crochet works.
In Crochet, security is guaranteed based on controlling which pieces of code have access to what. This depends directly on programmers being able to recognise names and decide how much trust they want to place on each component. Crochet then helps with this by making each name unique, readily identifiable, and associated with a specific source of trust (i.e.: you should be able to know who you’re trusting). Allowing spelling ambiguity makes this complicated. For example, if a programmer is trying to invoke a function and auto-completion suggests possible candidates that match the name, attackers might be able to exploit this to escalate their privileges by crafting commands with similar names, but different spellings, and then deceiving users into choosing the attacker version for auto-completion. Crochet cannot rule out these implementations from the candidate list because it can only compare exact matches of a name.
Still, this is another example of things that need to be investigated in the future—particularly due to Crochet’s over-reliance on tooling—, as they will happen to a lesser extent even without spelling ambiguity anyway.