Morning Edition


Coronavirus: What powers do the police have?

Words: - BBC News - 23:54 23-07-2020

It will become compulsory to wear face coverings in shops in England from Friday 24 July.

But who is responsible for making sure people follow these and other rules intended to tackle coronavirus?

From 24 July, the law requires you to wear a face covering in shops, supermarkets, transport hubs and shopping centres in England. Masks have been required in shops in Scotland since 10 July.

The law will also require a covering - which could be either a mask, scarf or bandana - to be donned when buying takeaway food.

Shop staff and security guards have no new powers to enforce the wearing of masks - which means that disputes may need to be resolved by the police.

If police are forced to intervene, shoppers who refuse to cover their face could face a £100 penalty ticket. The government hopes that personal responsibility and encouragement from business owners will lead to people following the law.

Some people are exempt from the face covering rules - including children under 11 and people with certain physical or mental health problems. The law also states that there can be good reasons not to wear a mask - such as to assist someone who relies on lip-reading, or for personal identification at a bank.

What about public transport?

In England, Scotland and Northern Ireland it's already a legal requirement to cover the mouth and nose on public transport (although some people are exempt). Wales will introduce this rule on 27 July.

If you refuse, police officers can issue £100 penalty tickets. The police can also order people off trains and buses - or stop them boarding - as can officers from Transport for London in the capital.

In England, the law allows you to meet in a group of up to 30 people outside, or at home ("outside" means any public place - including beaches, parks, streets and the countryside).

So if you want to organise a picnic or garden party, you can now invite 29 guests.

If you go above that number, the police can turn up and force people to leave. They could issue you with a £100 penalty ticket (£50 if paid within 14 days), rising to £3,200 for six or more offences.

In exceptional cases, the Crown Prosecution Service could take someone to court.

The law in England now allows even bigger formally organised gatherings, provided the organisers can show they have a plan to minimise the risk of spreading coronavirus.

Officers can turn up and inspect the organiser's written plan. They can order people to leave if they decide there are genuine dangers.

In England, Health Secretary Matt Hancock has an exceptional new power to completely close a specific public place.

And he has also given local councils a suite of new powers to close down premises, stop events and shut down places like parks.

This could be used to close beaches or beauty spots if there are concerns about crowds potentially spreading the virus.

If the land belongs to the Queen or Prince Charles, a council will first need their permission before it can restrict access.

If your favourite beach becomes what the law calls a "restricted area", it would be a crime to go there.

Indoor leisure facilities such as bowling alleys, skating rinks and casinos will re-open in England on 1 August (although nightclubs will remain shut).

These places have stayed closed until now because there was thought to be a risk of coronavirus spreading from close contact.

Police will still have the power to close these businesses again.

However, in practice, decisions are more likely to be left to local authorities, whose trading standards officers can also enforce the law.

Pubs, restaurants, hotels and hair salons can now open in England - but they could still be forced to close, if they cannot keep their staff and customers safe.

The Health and Safety Executive can enforce closure if it believes there is a danger - for instance in an overcrowded factory.

Environmental health officers working for local councils will also be inspecting premises for potential health risks.

Businesses that are open must be able to show they have plans to reduce the risk of transmission - for instance by creating one-way systems around their premises.

Officers patrol a busy bar street in Soho

If a premises was the source of an outbreak, local public health directors could close it while the virus was tackled. This is a longstanding power that has been used to contain other diseases.

(read more)

The Case Against OOP is Wildly Overstated

Words: mpweiher - lobste.rs - 13:51 31-07-2020

Matthew MacDonald · Jul 24 · 6 min read

You can’t rule the development world for decades without attracting some enemies. And object-oriented programming, which provides the conceptual underpinning for dozens of languages old and new, certainly has some enemies.

Maybe that’s why we’ve suffered through a never-ending series of hot takes about OOP. They’ve described it as a productivity-destroying disaster, a set of deceitful programming patterns, and a mediocre tool designed to help poor programmers hide their incompetence. OOP was even proclaimed dead (14 years ago, so take that one with a grain of salt).

What all these rants have in common is that they point out (rightfully) some of the pitfalls in modern software design and then conclude (wrongfully) that this indicates a terrible rot at the core of the programming world. Yes, object-oriented programming doesn’t look so great if you conflate it with sloppy design practices and fuzzy architectural thinking. But are these crimes really an unavoidable part of OOP? Or are they just one of the wrong paths we sometimes wander as programming neophytes, armed with too much confidence and too much curiosity?

The problems start with one brittle assumption made by some OOP supporters and almost all of its critics — that OOP is meant to model the real world. This is the original sin of OOP, a corrosive idea that’s responsible for countless bloated codebases.

Although there’s nothing in OOP theory that requires programming objects to parallel the real world, plenty of well-meaning teachers use this idea to lower the curve of complexity for new students. Here’s an illustration of the problem from Oracle’s official Java documentation:

“Objects are key to understanding object-oriented technology. Look around right now and you’ll find many examples of real-world objects: your dog, your desk, your television set, your bicycle … Software objects are conceptually similar to real-world objects.”

This is not an isolated example. Many introductory texts blur the line between code constructs and real-world objects, by presenting examples with Car and Wheel objects, or hopelessly tied-together groups of Person and Family objects. It’s madness.

This idea also leads to antipatterns, like exploding a database into a fog of linked classes with object-relational mapping.

This sort of design isn’t wrong for everyone. But there are plenty of unhappy people handcuffed to ORM systems, endlessly generating data class boilerplate that’s far less efficient than they want and far more complicated than they need. Did OOP encourage this? Maybe, but the real culprit is the idea-gone-wild that every identifiable thing deserves its own object representation.

Nothing good happens when we forget that our designs should be led by the needs of our code, not the completeness of our object models.

A better description of objects is — like many honest answers — a little vague. It goes something like this:

An object is a programming construct that lets you pack together data and functionality in a somewhat reusable package. Some objects may be structs by another name. Other objects may simply be libraries of related functionality. Deciding how to break down a programming problem into objects is part of the art of OOP.

And, from the always-insightful Eloquent JavaScript:

“The fact that something sounds like an object does not automatically mean that it should be an object in your program. Reflexively writing classes for every concept in your application tends to leave you with a collection of interconnected objects that each have their own internal, changing state. Such programs are often hard to understand and thus easy to break.”

An experienced programmer knows that when choosing between a solution that’s less object-oriented and one that’s more object-oriented, you should pick the simplest approach that meets the needs of your project.
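To make that description concrete, here is a small illustrative sketch (all names are invented, not from the article): one object that is just a struct by another name, one that is essentially a library of related functionality, and one that earns its keep by pairing a little state with the operations that keep it valid.

#include <cmath>
#include <cstddef>
#include <vector>

// A struct by another name: it only packs data together.
struct Point {
    double x;
    double y;
};

// Essentially a library of related functionality.
struct Geometry {
    static double distance(const Point& a, const Point& b) {
        return std::hypot(a.x - b.x, a.y - b.y);
    }
};

// An object that couples state with the operations that keep it valid.
class Polyline {
public:
    void append(Point p) { points_.push_back(p); }
    double length() const {
        double total = 0.0;
        for (std::size_t i = 1; i < points_.size(); ++i)
            total += Geometry::distance(points_[i - 1], points_[i]);
        return total;
    }
private:
    std::vector<Point> points_;  // only modified through append()
};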

If OOP is difficult to do right, that’s at least partly because software design is hard to do right, no matter what tools you use.

In fact, OOP is much less prescriptive in design than many people believe. Object-oriented languages give you a set of tools for using objects (formalizing their interactions with interfaces, extending them with inheritance, and so on). But they don’t say much about how you should apply these objects to a problem. This is a great and deliberate ambiguity.

The gap between theory and practice has fueled the interest in design patterns. As OOP became more popular, programmers looked to them for help with their architecture. Unfortunately, design patterns can easily become a way to smuggle in overly complex OOP design under a veneer of respectability.

How do you avoid this trap? Focus on the rock-solid principles of good programming that are cited so often some have turned into acronyms. That includes principles like DRY (Don’t Repeat Yourself), YAGNI (don’t build it if You Ain’t Gonna Need It), the Law of Demeter (restrict what classes must know about each other), continuous refactoring, and valuing simplicity and readability above all else. Start with these solid principles — a philosophy of coding — and let your design take shape in that environment.

Some of the sharpest attacks launched on OOP target inheritance. Critics point out the very real fragile base class problem, where a codebase becomes frozen in time thanks to subtle dependencies between child classes and their parents.

The solution to the fragile base class problem and other inheritance hangovers is surprisingly simple — don’t use it. All the cautionary tales you’ve heard are true.

When inheritance makes sense is in framework design — in other words, as a tool for the people who build the tools that you use. The .NET or Java class libraries would be a far poorer and less organized place without a rigorous inheritance hierarchy tying things together. But creating and maintaining this type of framework is a massive architectural task. It’s not the kind of thing you want to undertake if you’re a fast-moving customer-focused team of agile developers. And here’s a dirty secret — you probably won’t get it right unless you do it wrong a few times first.

In other words, inheritance is a great feature when you use it indirectly, but rat poison squared when you use it to extend your own classes. And you don’t even need it. If you want a way to reuse functionality, containment and delegation work perfectly well. And if you need to standardize different classes, that’s what interfaces are for.
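As a rough sketch of that alternative (all names are invented): functionality is reused by containment and delegation, and otherwise unrelated classes are standardized through an interface, with no implementation inheritance involved.

#include <string>
#include <utility>

// The "interface": a pure abstract class with no state to inherit.
class Renderer {
public:
    virtual ~Renderer() = default;
    virtual std::string render() const = 0;
};

// A reusable helper that other classes contain rather than inherit from.
class HtmlEscaper {
public:
    std::string escape(const std::string& s) const {
        std::string out;
        for (char c : s) {
            if (c == '<')      out += "&lt;";
            else if (c == '>') out += "&gt;";
            else               out += c;
        }
        return out;
    }
};

// Reuses HtmlEscaper by delegation and satisfies Renderer as an interface.
class CommentView : public Renderer {
public:
    explicit CommentView(std::string body) : body_(std::move(body)) {}
    std::string render() const override {
        return "<p>" + escaper_.escape(body_) + "</p>";  // delegate the work
    }
private:
    std::string body_;
    HtmlEscaper escaper_;  // containment, not inheritance
};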

Which brings us to the real limitations of OOP. It doesn’t prevent you from applying the wrong solution to a problem. It doesn’t prevent you from designing yourself into a cramped corner before a tight deadline and next to a hungry alligator. It gives you a set of tools that can be enjoyed or abused. The rest is up to you.

There’s one criticism leveled against OOP that’s probably true. OOP may not be dead, but its moment of total world domination is fading. Functional programming continues to grow alongside OOP (although it’s a fire that’s been very slow in starting). And pure OOP is shifting to make room for so-called multi-paradigm languages like Go and Rust — languages that have a slimmer set of object-oriented features, and avoid some of the traditional OOP baggage. Sometime in the next decade we’ll know these languages have truly arrived, when we see them featured in their own developer take-downs. Until then, enjoy your OOP, and keep your code clean.

(read more)

The Twitter Bitcoin hack can happen anywhere humans are involved

Words: raymiiOrg - lobste.rs - 13:06 31-07-2020

The recent Twitter hack involved social engineering and access to the Twitter backend. This opinion piece will show you that this sort of incident can happen anywhere, as long as humans are involved. Everywhere there are manual actions or admin / backend panels, this can happen. Pay a support-slave enough and they'll delete an account 'by accident'. Or a rogue sysadmin disables logging, does something horrible and enables logging again. Or a sales person gives 100% discounts and disables reporting.

I hope that this piece makes you think about the things that can go wrong at your company, and hopefully you take time to fix them or get them fixed. Or at least mitigate risks, add some more logging, train your staff, treat them well, etcetera.

The Ars Technica article has screenshots of the alleged backend.

I'll show you that there is no one-size-fits-all solution. Or at least, not a single fix for all. Treating your employees well, educating them on risks and automating as much as possible will get you a long way.

If you like this article, consider sponsoring me by trying out a Digital Ocean VPS. With this link you'll get $100 credit for 60 days (referral link).

If you have missed it, just recently very prominent Twitter accounts started to tweet a bitcoin scam: "Transfer bitcoins to here and I'll pay you back double the amount", in some form or other. As Twitter themselves state, their staff were socially engineered and thus access to their internal systems was gained.

My guess is that it's much easier / cheaper to scam a low-level support person, or to offer them a sum of cash to cause an "accident", than to search for vulnerabilities, exploit one, island-hop further into a network and then execute such an action.

There might not be anything malicious going on; it could just be an actual successful phishing action. Sucks to be the one that got phished, but the company probably lacked user security training, or two-factor authentication, two ways to mitigate phishing.

However, imagine yourself being a low-level, low-paid IT support person with a slight grudge against the company. It would be a nice revenge to cause public havoc, or get ransomware inside, or by accident delete important data, while being able to claim you made an honest mistake, either because of social engineering or because the manual procedure you were executing should have been automated away years ago.

Now each and every company I've worked for has had these backend panels. Most of the time they are built as an afterthought when you grow big enough to hire support staff or clerks; they lack the same validation / security measures as the customer-facing self-service side, or there are still manual procedures to be done by helpdesk staff which require high-level permissions, because automating them takes more time than just instructing Joe from the helpdesk not to make mistakes.

Each and every other company has these backends, and every company also has staff that can abuse them, willingly or by accident.

As they say, security is like an onion: it stinks and makes you cry. Or, maybe they mean the layers part. If you train your users, patch your software, have a web proxy / outgoing filter, a strict firewall/VPN, restrict executables, run antivirus and mandate two-factor auth, you make it harder for hackers (and for your employees), but not impossible. Just as with regular burglars, in most cases you only have to be more secure than your neighbor.

If you have none of the above, chances are you've already been hacked but don't know it. If you do have all of the above, you can still get hacked. Then you hopefully have proper (offsite) logging in place to reconstruct what happened.

Even with logging it will still cause damage to the company: reconstruction takes time, backups take time to restore, and the damage is already done, causing much inconvenience.

Technical measures only go so far in solving people problems. Train all your users on phishing. The hospital I worked at sent a massive phishing email once a year to over fifteen thousand staff; each year they saw fewer and fewer people filling in their passwords, thanks to awareness training.

But, if there is malicious intent, it might be hard to tell whether something was an accident or malicious on purpose. Here's another example I've had happen at one of my companies.

Imagine a sales person being able to assign a discount, which is a regular part of his job. He sees that every month a report of all the discounts goes to the sales manager, for checks and balances. But, by accident, he gives a 100% discount, and the next month doesn't see it on the balance sheet. Maybe because the reporting software was never written to handle a 100% discount. Now, six months later, he has himself a little side business, where he gets paid directly by some shady customers, who all have a 100% discount but stay under the radar. This can go on forever, until the report is changed, some other staff member sees the accounts, or someone talks. But by then this person is long gone with the wind, and if caught, could just claim it was a typo / bug.

In my humble opinion, there is no one-size-fits-all solution. However, by splitting the problem up into separate categories, solutions come to mind, both technical and people-wise.

One other big thing is to not have a "blame" culture. If employees are likely to be mistreated after making a mistake, they will cover it up. If you have an open culture where mistakes can be made and you try to learn from them, people are more open to admitting failure, which in the long run is better for everyone.

Everywhere humans are doing work, mistakes are made. Twitter was breached due to social engineering, but the question to ask is: did staff actually need all those extra rights which the hackers abused? Was there a way that, even with hackers inside, not all keys to the kingdom were lost right away?

Manual procedures, disgruntled employees and security risks all allow for mistakes to happen. Mistakes with large consequences, mistakes that maybe are on purpose, and things that shouldn't have happened at all if there had been proper tooling in place.

Can you fix it? No, not entirely. You can however get a long way in making sure it doesn't come to that, and if it does come to that, have proper measures for reconstructing what happened.

Not all measures are technical; training, checking and treating people well go a long way. Technical measures can help, but as far as I'm concerned, they cannot entirely replace people measures.

What follows are a few scenarios and procedures I've come across in my career.

In the following paragraphs I'll show you one of the backends I worked with, and give you examples of manual procedures and what can go wrong with them. By showing these scenarios and the ones above, I hope that you can think of the things in your organization that can be improved.

In my career as a sysadmin I've worked with many different sorts of companies, as you can see on the about page. Most of the stories are not applicable anymore due to procedure changes, policy changes or software updates. Some information is masked, but you still get the general idea.

Every company with a fleet of Windows computers probably uses Active Directory to manage users, computers and policies. 90% of those companies have an in-house or external helpdesk which users can call to reset their password.

The above dialog window is all you need to reset a user's password. Now there are policies and rights you can manage to let only certain people reset passwords, but I suspect that your average company just allows the IT helpdesk to reset passwords for everyone. If you're lucky you have a self-service portal, but who remembers those recovery questions anyway...

After the rogue IT intern has reset the password of the CEO on Friday evening, he emails a few of the biggest customers, informing them they're like the back end of a horse; he resets the Twitter password and tweets nasty things on the CEO's account, and sends Sally from finance a few PDFs and requests to pay big amounts to his bank account. On Sunday he's out of state and is never heard from again. Or, he claims to have been on vacation, and it turns out the culprit was another IT admin you fired recently, but who still had access because passwords were never changed.

In the hospital I worked at, the electronic medical records were first stored in a system named ZIS (ziekenhuis informatie systeem). It was a self-written operating system for the PDP-11 from the seventies, developed as a research project for a few large hospitals in the Netherlands. It was ported to VAX, and by the time I worked there it was running on Linux machines. There is a Dutch PDF describing the migration from VAX to UNIX to Linux. When active, it served every hospital digital task, from the finances, to the kitchen, to the lab and everything related to patient care. It had its own WordPerfect-like text editor for patient letters (TEKST) and everything else you could think of related to hospital patient care.

I can talk about this system because it was phased out a few years ago. Every IT staff member had an administrative account which allowed them to kill user sessions (telnet wasn't the most reliable), reset printers and see everything for every patient. Here are a few screenshots of the manual procedure to reset a printer:

First you had to edit the hosts file (either with a line editor (edl/eds) like ed, or with search, an interactive version) to find the printer IP address (TCP/IP was added later in the system's lifetime; before that everything was connected via serial lines). Then, in another program (aptly named PRINTER), you had to reset it via a sequence of commands in an interactive menu.

One time, I even had ZIS crash with a stacktrace.

Here's a picture of my user account from back when I was still at university for my nursing degree, and another screenshot of the program we used to print medication stickers.

I studied both nursing and computer science, and worked in the hospital in both roles. But back to the printer issue. As you can see in the picture, you edited the system-wide hosts file. If you wanted, you could remove lines relating to things other than printers, like the computers on patient departments used by nurses to log in, or other servers (databases etc.). Because printer resetting was a common occurrence (after every paper jam you had to reset it), even interns could execute this action. As far as I know it never went wrong, but if you screwed up the hosts file, you could cause a huge impact to the system. Not to mention other privileged actions like killing user sessions, editing patient records or removing imaging data.

In the semi-graphical UI based on ZIS, IT staff could even remove patients. That was because at some point in time regular staff were able to duplicate patients, but not remove them. The duplication button was later removed, but the permission for IT staff wasn't.

One point to note here is that a lot of permission was granted based on verified trust. Proper logging and backups were in place, and to even be able to work there you had to have a VOG (Verklaring omtrent gedrag), a statement issued by the Dutch government (justice department) that you have no prior offences. It is specific to the job in question, so if you are a convicted pedophile you won't get a VOG for a childcare job, but you might get one for a cab driver job. If you were convicted of drunk driving, you won't get one for a cab driver job but could get one for a childcare job.

For IT-related jobs, at least in the hospital, the categories checked were handling of sensitive data / documents and general convictions, as far as I remember.

A few years ago, ZIS was finally phased out, after 40 years, in favor of some self-written Oracle-based software, which was later replaced by Chipsoft HIX. The history of ZIS is so interesting and innovative that I might do an article on it in the future.


(read more)

Trump told reporters he will use executive power to ban TikTok

Words: Rita Liao - TechCrunch - 03:05 01-08-2020

President Donald Trump said he could act to ban the world’s most popular short video app TikTok from the US as early as Saturday, according to The Hill.

The president said he could use “emergency economic powers or an executive order” to bar TikTok from the US, he told reporters aboard Air Force One on Friday.

The news came hours after reports broke that Microsoft was in talks to buy TikTok. Investors are reportedly valuing three-year-old TikTok at $50 billion. In his remark on Friday, Trump signaled he was not supportive of allowing an American company to acquire TikTok.

On the same day, Bloomberg reported that Trump could order ByteDance to divest its ownership of TikTok.

In response to Trump’s decision, TikTok, as usual, tried to make a case that it’s in the interest of the US to keep the app and it poses no national security threat:

“100 million Americans come to TikTok for entertainment and connection, especially during the pandemic. We’ve hired nearly 1,000 people to our US team this year alone, and are proud to be hiring another 10,000 employees into great paying jobs across the US. Our $1 billion creator fund supports US creators who are building livelihoods from our platform. TikTok US user data is stored in the US, with strict controls on employee access. TikTok’s biggest investors come from the US. We are committed to protecting our users’ privacy and safety as we continue working to bring joy to families and meaningful careers to those who create on our platform,” said a TikTok spokesperson.

Trump’s announcement confirmed weeks of speculation that US regulators planned to block TikTok, which is immensely popular among American teens, over concerns that it could be a spying tool for Beijing.

The question is how a divestment or ban of TikTok will take shape. TikTok is owned by Beijing-based ByteDance, which has emerged as the most promising tech startup in China in recent times, reportedly valued at a staggering $100 billion. It operates Douyin, the popular Chinese version of TikTok, separately for China-based users.

ByteDance has sought various ways to distance TikTok from any Chinese association. Efforts in the past few months range from appointing former Disney executive Kevin Mayer as TikTok’s CEO, claiming the app’s data is stored on American land, through to promising to create 10,000 jobs in the US.

TikTok’s comms team also tried to assuage concerns by reiterating that four of its parent company’s five board seats are “controlled by some of the world’s best-respected global investors,” including Arthur Dantchik, managing director of Susquehanna International Group; William Ford, CEO of General Atlantic; Philippe Laffont, founder of Coatue Management; and Neil Shen, the boss of Sequoia China. ByteDance founder and CEO Zhang Yiming is the chairman of the board.

It’s worth noting that the Committee on Foreign Investment in the US (CFIUS) still hasn’t released its decision on whether the Musical.ly-TikTok merger constitutes a national security threat to the U.S. Even if it orders TikTok to shed Musical.ly, it’s unclear how the sale would happen in practice. When ByteDance merged the two apps back in 2018, it asked Musical.ly’s existing users to download the TikTok app, which already had users of its own, so Musical.ly’s former users are now, technically, TikTok users.

If the divestment is aimed at TikTok, will ByteDance be forced to sell all of its international assets? TikTok also has a substantial user base outside the US. Before India banned TikTok over national security fears, a favorite criticism among many US politicians, the country was the app’s largest overseas market.

It’s looking increasingly likely that Zhang Yiming’s worst nightmare is going to happen. The entrepreneur had aspirations to conquer the international market from the outset, and now his startup has become the latest pawn in US-China relations.

(read more)

What every developer should know about consistency

Words: redblackbit - lobste.rs - 10:11 31-07-2020

If you make a request to a database to update some data, the change won’t necessarily be visible to other clients right away, even if your request has completed successfully. Whether it becomes visible sooner rather than later depends on the consistency guarantees offered by the database.

“But wait, aren’t databases supposed to take care of consistency issues for me?” I hear you ask. Some databases come with counter-intuitive consistency guarantees to provide high availability and performance. Others have knobs that allow you to choose whether you want better performance or stronger consistency guarantees, like Azure’s Cosmos DB. Because of that, you need to know what the trade-offs are, and whether there are knobs you can use to tune your database to your specific use-case.

Let’s take a look at what happens when you send a request to a database. In an ideal world, your request executes instantaneously:

But we don’t live in an ideal world - your request needs to reach the data store, which then needs to process the request and finally send back a response to you. All these actions take time and are not instantaneous:

The best guarantee a database can provide is that the request executes somewhere between its invocation and completion time. You might think that this doesn’t look like a big deal - after all, it’s what you are used to when writing single-threaded applications - if you assign 1 to x and read its value right after, you expect to find 1 in there, assuming there is no other thread writing to the same variable. But, once you start dealing with data stores that replicate their state on multiple machines for high availability and scalability, all bets are off. To understand why that’s the case, we will explore the trade-offs a system designer has to make to implement reads in a simplified model of a distributed database.

Suppose we have a distributed key-value store, which is composed of a set of replicas. The replicas elect a leader among themselves, which is the only node that can accept writes. When the leader receives a write request, it broadcasts it asynchronously to the other replicas. Although all replicas receive the same updates in the same order, they do so at different times.

You are tasked to come up with a strategy to handle read requests - how would you go about it? Well, a read can potentially be served by the leader or a replica. If all reads were to go through the leader, the throughput would be limited by what a single node can handle. Alternatively, any replica could serve any read request - that would definitely scale, but then two clients, or observers, could have a different view of the system’s state, as replicas can lag behind the leader and between them.

Intuitively, there is a trade-off between how consistent the observers’ views of the system are, and the system’s performance and availability. To understand this relationship, we need to define precisely what we mean by consistency. We will do so with the help of consistency models, which formally define the possible views of the system’s state observers can experience.

If clients send writes and reads exclusively to the leader, then every request appears to take place atomically at a very specific point in time as if there was a single copy of the data. No matter how many replicas there are or how far behind they are lagging, as long as the clients always query the leader directly, from their point of view there is a single copy of data.

Because a request is not served instantaneously, and there is a single node serving it, the request executes somewhere between its invocation and completion time. Another way to think about it is that once a request completes, its side-effects are visible to all observers:

Since a request becomes visible to all other participants between its invocation and completion time, there is a real-time guarantee that must be enforced - this guarantee is formalized by a consistency model called linearizability, or strong consistency. Linearizability is the strongest consistency guarantee a system can provide for single-object requests.

What if the client sends a read request to the leader, but by the time the request gets there, the server that received the request thinks it’s still the leader, but it actually was deposed? If the ex-leader was to process the request, the system would no longer be strongly consistent. To guard against this case, the presumed leader first needs to contact a majority of the replicas to confirm whether it still is the leader. Only then is it allowed to execute the request and send back the response to the client. This considerably increases the time required to serve a read.

So far, we have discussed serializing all reads through the leader. But doing so creates a single chokepoint, which limits the system’s throughput. On top of that, the leader needs to contact a majority of replicas to handle a read. To increase the read performance, we could allow the replicas to handle requests as well.

Even though a replica can lag behind the leader, it will always receive new updates in the same order as the leader. If a client A only ever queries replica 1, and client B only ever queries replica 2, the two clients will see the state evolving at different times, as replicas are not entirely in sync:

The consistency model in which operations occur in the same order for all observers, but which doesn’t provide any real-time guarantee about when an operation’s side-effect becomes visible to the observers, is called sequential consistency. The lack of real-time guarantees is what differentiates sequential consistency from linearizability.

A simple application of this model is a producer/consumer system synchronized with a queue - a producer node writes items to the queue, which a consumer reads. The producer and the consumer see the items in the same order, but the consumer lags behind the producer.
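A minimal sketch of that producer/consumer arrangement (illustrative only, using standard C++ threads): the consumer always observes items in the order the producer wrote them, but with no guarantee about when, which is sequential consistency without linearizability's real-time guarantee.

#include <condition_variable>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>

int main() {
    std::queue<int> q;
    std::mutex m;
    std::condition_variable cv;
    bool done = false;

    // Producer: writes items 0..4 to the queue, in order.
    std::thread producer([&] {
        for (int i = 0; i < 5; ++i) {
            { std::lock_guard<std::mutex> lk(m); q.push(i); }
            cv.notify_one();
        }
        { std::lock_guard<std::mutex> lk(m); done = true; }
        cv.notify_one();
    });

    // Consumer: sees the same order, but lags behind the producer.
    std::thread consumer([&] {
        while (true) {
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [&] { return !q.empty() || done; });
            if (q.empty() && done) break;
            int item = q.front(); q.pop();
            lk.unlock();
            std::cout << "consumed " << item << "\n";  // same order as produced
        }
    });

    producer.join();
    consumer.join();
}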

Although we managed to increase the read throughput, we had to pin clients to replicas - what if a replica goes down? We could increase the availability of the store by allowing a client to query any replica. But, this comes at a steep price in terms of consistency. Say there are two replicas 1 and 2, where replica 2 lags behind replica 1. If a client queries replica 1 and right after replica 2, it will see a state from the past, which can be very confusing. The only guarantee a client has is that eventually, all replicas will converge to the final state if the writes to the system stop. This consistency model is called eventual consistency.

It’s challenging to build applications on top of an eventually consistent data store because the behavior is different from the one you are used to when writing single-threaded applications. Subtle bugs can creep up that are hard to debug and to reproduce. Yet, in eventual consistency’s defense, not all applications require linearizability. You need to make the conscious choice whether the guarantees offered by your data store, or lack thereof, satisfy your application’s requirements. An eventually consistent store is perfectly fine if you want to keep track of the number of users visiting your website, as it doesn’t really matter if a read returns a number that is slightly out of date. But for a payment processor, you definitely want strong consistency.

There are more consistency models than the ones presented in this post. Still, the main intuition behind them is the same: the stronger the consistency guarantees are, the higher the latency of individual operations is, and the less available the store becomes when failures happen. This relationship is formalized by the PACELC theorem. It states that in case of network partitioning (P) in a distributed computer system, one has to choose between availability (A) and consistency (C), but else (E), even when the system is running normally in the absence of partitions, one has to choose between latency (L) and consistency (C).

In my system design book, I explore other tunable trade-offs data stores make to guarantee high availability and performance, like the isolation guarantees that prevent a group of operations within a transaction from interfering with other concurrently running transactions.

(read more)

The object-oriented Amiga Exec (1991)

Words: idrougge - lobste.rs - 09:58 31-07-2020

THE OBJECT-ORIENTED AMIGA EXEC

Object-oriented is the computer buzzword for the early 1990s. It's the latest Holy Grail, which will let programmers leap tall buildings in a single bound, cure world hunger, and produce 100,000 lines of fully debugged code a day.

Well, maybe not. After all, AT&T invented one of the premier object-oriented programming languages, C++, and was unable to get release 2.0 out within a year of its predicted release date. Nevertheless, there is no question that OOP is a Good Thing, and producers of operating systems have been furiously recoding their products as object-oriented systems, generally using C++.

Yet there is one object-oriented operating system that has been in widespread use since 1985. It runs Commodore's Amiga, although, ironically, it was not written in an OOP language.

The Amiga's operating system is sometimes incorrectly referred to as AmigaDOS. Actually, the Amiga operating system has three major components: Exec, the multitasking kernel; AmigaDOS proper, which provides the high-level file systems and Command Line Interface; and Intuition, the basis for the graphical user interface (GUI). I'll be discussing only the Amiga Exec. Of the three, it is the most object-oriented and the most comparable to the C++ programming language.

That may seem surprising. Aren't GUI's what made OOP so famous? Well, yes, but there's more to OOP than simply dealing with data objects that correspond to graphical images. More on this later.

Although many Amiga features have a potentially unlimited number of elements, the absolute minimum RAM that the Amiga system software requires is a fraction of what many comparable systems require; even counting the ROM part of the operating system, the memory consumed is only about 512 Kbytes (although newer Amigas can support larger ROMs).

So what's missing from the Amiga operating system that makes it so small? How does the Amiga manage to provide multitasking and windowing services in less than one-half to one-fourth the memory that Apple and IBM computers need? The answer lies in minimal redundancy.

If you examine the internal structure of many popular operating systems, you'll discover that it's "OS versus them." That is, you have this somewhat monolithic block of stuff that is the core operating system, some additional voodoo acting as device drivers and the like, and applications code--and only an uneasy truce ever lets them meet. Even though the operating system may itself be composed of many components, its appearance to the applications programmer is still essentially as a rather mysterious edifice, beyond which only selected portions of applications software may go.

The Amiga operating system is different. Relatively few parts of it are totally opaque; in fact, with later revisions of the operating system, the trend has been to further open its internals for applications use. This is even more surprising considering that, unlike the Macintosh, the Amiga runs its applications in the nonprivileged state.

Most present-day operating systems operate as if the operating system code is "magic"; that is, it spends most of its time running in privileged states, inhibiting interrupts, executing arcane instructions that are incomprehensible to mere mortal applications programmers, and otherwise doing things that are completely outside the scope of applications programming. That isn't really true.

Very little operating-system code is truly magic; most of it deals with managing tables, lists, queues, and other such mundane tasks. However, since these are operating-system tables, lists, queues, and so forth, they are generally managed with special routines that run in privileged, noninterruptible, memory-managed, or otherwise arcane environments.

What tends to be overlooked is that it is often possible to take all the "magical" parts of code and separate them from the "nonmagical" parts, resulting in a set of controlling routines to switch modes, and a lot of data-handling routines that look suspiciously similar.

This situation means that, first, there exists the possibility that general-purpose versions of some of these routines can be created to replace all these similar-but-not-identical functions; and second, since these routines are no longer magic, they can be accessible to application programs as well, reducing their overall size and complexity, to say nothing of the time saved by using pre-debugged code.

Using general-purpose code for critical operating-system functions might seem like heresy to some--after all, aren't "general-purpose" and "efficient" mutually exclusive? The answer appears to be no, or more accurately, "If it's general-purpose and it's not efficient, perhaps it's not general-purpose enough." What usually wastes time in general-purpose code is all the testing and branching that it has to do to handle the variations of data structures it processes.

Where does OOP come into all this? The answer lies in the principle of inheritance. By arranging the system data structures much as we did for system code in traditional systems and by placing related data items in a common sequence, we gain two advantages: a reduction in the amount of special-case processing that was so offensive, and the creation of a hierarchy of data object classes. As a side effect, the system becomes easier to understand--there are fewer unique functions.

Exec, as mentioned earlier, is the nucleus (or kernel) of the Amiga operating system, and it is realized in just such a manner. Exec consists of a collection of increasingly complex object classes, as can be seen in figure 1. When a new Exec object class is defined based on a simpler class, it contains all the data objects of the simpler class and also is (usually) valid for not only operations defined for that class, but for all operations pertaining to the simpler class, as well. This is known as function inheritance.

Exec's heavy reliance on inheritance is what makes it so compact. It does not contain separate sets of routines to manipulate tasks, I/O devices, intertask messages, and so on; instead, it contains basic routines to handle collections of objects--be they task objects, device objects, or whatever--and adds functions only where additional support is required. In contrast, many operating systems contain a collection of task routines (including those required to manage the task table) and a collection of device routines (including those required to manage the device table), and so forth.

There are many ways to represent collections of data internally, each with its own advantages and disadvantages. Exec is based on the doubly linked list. The list elements are allocated dynamically from anywhere in RAM that's convenient (see the section on memory management); therefore, there are no tables to fill up. On the other hand, access speed is highly dependent on the number of nodes in the list, but there are ways to reduce that problem, as I'll explain later.

The Amiga operating system distinguishes itself in that the operating system itself provides support for doubly linked lists. Exec supports two levels of lists: lists and MinLists. (The Lattice C++ implementation adds two more that are similar to the standard Exec lists, but without automatic initialization.) These are used to define items that the Amiga system software initializes.

The MinList is the anchor to a doubly linked list of MinNodes. MinNodes contain next and previous MinNode pointers. The MinList structure contains a pair of dummy MinNodes to simplify processing by reducing special-case logic required to process items at the ends of the list. An empty MinList always consists of two MinNodes--the dummy nodes at the front and the end of the MinList--both of which are contained within the MinList data structure itself. The dummy front node's next-node pointer points to the actual first node of the list. Its previous-node pointer is always NULL.

A similar situation exists in regard to the dummy end node. Since the dummy front node's previous-node pointer is always NULL and the dummy end node's next-node pointer is also always NULL, it was possible to save a small amount of memory by making them overlap by sharing the same NULL pointer. Dummy MinNodes do add one complication: The last actual item of the list is not the one with the NULL next-node pointer: That honor belongs to the dummy node.

A complete set of functions exists to support insertion and deletion of MinNodes at either end--or points in between--of a MinList. By using the proper functions, therefore, a MinList can be used as a first-in/first-out (FIFO) (also known as a queue) or as a last-in/first-out (LIFO) (also known as a stack), as well as a general-purpose list.
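Here is a compact sketch of that layout (type and field names are illustrative, not necessarily Exec's exact ones): the list header doubles as both dummy nodes, with the always-NULL tail pointer shared between them, and the same list serves as a LIFO or a FIFO depending on which end you add to.

// Doubly linked node: next and previous pointers only.
struct MinNode {
    MinNode* succ;  // next node
    MinNode* pred;  // previous node
};

// The header overlaps the two dummy nodes: (head, tail) is the dummy front
// node, (tail, tailPred) is the dummy end node, and 'tail' is the shared NULL.
struct MinList {
    MinNode* head;      // dummy front node's next pointer
    MinNode* tail;      // always NULL; shared by both dummy nodes
    MinNode* tailPred;  // dummy end node's previous pointer
};

inline void NewMinList(MinList* l) {
    l->head = reinterpret_cast<MinNode*>(&l->tail);      // empty: front dummy points at end dummy
    l->tail = nullptr;
    l->tailPred = reinterpret_cast<MinNode*>(&l->head);  // end dummy points back at front dummy
}

inline void AddHead(MinList* l, MinNode* n) {            // LIFO use: push
    n->succ = l->head;
    n->pred = reinterpret_cast<MinNode*>(&l->head);
    l->head->pred = n;
    l->head = n;
}

inline void AddTail(MinList* l, MinNode* n) {            // FIFO use: enqueue
    n->succ = reinterpret_cast<MinNode*>(&l->tail);
    n->pred = l->tailPred;
    l->tailPred->succ = n;
    l->tailPred = n;
}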

Once again, note that there is nothing magical about MinLists, MinNodes, or any of the functions that act on them. Although they are extensively used by the Amiga operating system, they can be used just as freely in any application program.

A note about C++ friend and member functions. When a class is defined in C++, its internal components are (unless otherwise specified) protected from casual access. This is a strong selling point; it makes it harder for an object's innards to be corrupted and easier to locate the responsible function. To make practical use of (and to alter) the information within a class object, therefore, some sort of access mechanism is required. C++ provides two: a friend function, which is like a traditional C function, except that by having been declared a friend of one or more classes, it is allowed to directly access the data stored within that class or classes, and a member function which is actually owned by a specific class and therefore has an "invisible" extra parameter passed to it: "this," which is a pointer to the class object being acted on.

Exec was designed to be used by non-OOP languages; thus the Exec functions are, in effect, friend functions. The #include files made up by MTS Associates to support C++ on the Amiga generally define them as such. However, to better support Exec in its capacity as an object-oriented system, a number of member functions were also defined. For example, virtually every object in the Amiga is in some sort of list, so most objects have a member function named next(). No matter what it is, no matter how it's linked, and no matter what the name or relative location of the object's next-item pointer, you are thus always guaranteed that you can get a pointer to the next one in the list by using the next ( ) function.

A list is an extended MinList, made up of nodes. The nodes are MinNodes plus a 1-byte type field, a 1-byte priority, and a pointer to the node's name, which is a C-format string. Figure 2 shows the structure of a node. A node incorporates the structure of a MinNode, and thus automatically inherits the properties of a MinList to form a list.

Consequently, a list can use all the MinList functions. A list can be maintained as a FIFO or LIFO, just like a MinList. However, a list can also be maintained in priority sequence courtesy of the list friend function Enqueue(). If the nodes in the list are given names, it is also possible to search the list for the first/next node of that name. This can be very useful, as it's how Exec locates a number of public objects.

Signals are represented by a 32-bit word containing a pattern of signal flags. There are 16 that are allocatable to the user and 16 reserved by the operating system. When a task signals another task and the other task is in a signal wait state, the receiving task's incoming signal information has the incoming signal bits logically ORed in. This is then ANDed with the recipient task's pattern of signals that it is waiting on. A nonzero result causes the task to become dispatchable.

This is an extremely efficient way to activate a sleeping task and it can be done at any system level, including in interrupt routines, which are denied many of the more sophisticated system services.
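The signalling arithmetic itself is just a couple of bitwise operations; here is a minimal sketch of it (field names are illustrative, and real Exec does this atomically inside the scheduler).

#include <cstdint>

struct TaskSignals {
    std::uint32_t received = 0;  // signals that have arrived so far
    std::uint32_t waiting  = 0;  // signals the task is sleeping on (0 = not waiting)
};

// Deliver 'signals' to a task; returns true if the task should become dispatchable.
bool Signal(TaskSignals& t, std::uint32_t signals) {
    t.received |= signals;                  // OR the incoming bits in
    return (t.received & t.waiting) != 0;   // nonzero AND with the wait mask wakes it
}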

Figure 3 shows how a message is constructed from a node. Messages are extremely important in the operation of the Amiga. They are used to pass information from task to task, as the basis for I/O requests, and as the medium of transfer for Intuition's mouse and window events. Unlike a signal, which can merely give an "I'm here!" indication, a message can have complex information piggybacked on it.

A message is an extended node and is usually transmitted to a message port (MsgPort), another type of node that contains a list of incoming messages to be serviced, in priority order (see figure 4). MsgPorts can be private and anonymous, or they can be added to the system message port list. Frequently, they occur in pairs (one of each), since after a message is serviced, it is common to forward it to a reply MsgPort, where it is generally recycled or discarded--although it is possible to bounce a message through a whole series of ports. There are several different ways to implement a MsgPort, but the most common way is supported by a special C++ class named the StdPort--or standard message port--which can be created and initialized by coding

StdPort *listener = new StdPort("I hear you");

The StdPort constructor takes care of all the details of standard MsgPort initialization. Memory is allocated and initialized, and a signal is acquired on which the listening task can wait. Using the AddPort function, the MsgPort can be put on the system's public MsgPort list, where it can be found by any task that wishes to send it a message. Because Exec was designed in an object-oriented manner, the new operating-system functions are quite simple. A C++ reconstruction shows the following:
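The listing below is a hedged sketch in that spirit rather than the original code: the declarations (Node, List, MsgPort, ExecBase, SysBase) are illustrative stand-ins for the real Exec and Lattice C++ headers, written to match the structures described in this article.

extern "C" void Forbid();   // Exec: suspend task switching
extern "C" void Permit();   // Exec: resume task switching

struct Node {
    Node* succ;
    Node* pred;
    unsigned char type;
    signed char   pri;       // priority used by Enqueue()
    const char*   name;
};

struct List {
    Node* head;
    Node* tail;              // always NULL; the shared dummy pointer
    Node* tailPred;

    // Priority-ordered insert: place n before the first node of lower priority.
    void Enqueue(Node* n) {
        Node* cur = head;
        while (cur->succ != nullptr && cur->pri >= n->pri)
            cur = cur->succ;
        n->succ = cur;
        n->pred = cur->pred;
        cur->pred->succ = n;
        cur->pred = n;
    }
};

struct MsgPort : Node { /* signal bit, owning task, message list ... */ };

struct ExecBase { List PortList; /* ... */ };
extern ExecBase* SysBase;

// An AddPort-style function: add a named port to the public system list.
void AddPort(MsgPort* port) {
    Forbid();                          // shared system list: serialize access
    SysBase->PortList.Enqueue(port);   // keep the list in priority order
    Permit();
}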

Here's a reconstruction of another system function:
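Again as a hedged sketch rather than the original listing (reusing the illustrative declarations above): a FindPort-style search that walks the public port list and returns the address of the first port whose name matches.

#include <cstring>

// Sketch only: search the public list by name; returns nullptr if not found.
// The walk stops at the node whose succ pointer is NULL - the dummy end node.
MsgPort* FindPort(const char* name) {
    Forbid();                                  // keep the list stable while searching
    MsgPort* found = nullptr;
    for (Node* n = SysBase->PortList.head; n->succ != nullptr; n = n->succ) {
        if (n->name != nullptr && std::strcmp(n->name, name) == 0) {
            found = static_cast<MsgPort*>(n);
            break;
        }
    }
    Permit();
    return found;                              // caller must ensure the port stays valid
}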

FindPort illustrates another important design feature. If you searched a system list every time you wanted to access an element in that list, system performance would suffer. Instead, the convention is to search and return the object's address. Thereafter, the object's address can be used directly (the Amiga does not use Macintosh-style handles, which cause objects to shift about in memory). The downside of that is that you must never move or remove an object that other tasks may be using. Libraries and devices ensure this by maintaining a user count. For simple message ports, the application should either enforce a log-in/log-out facility or else require that all messages be sent on a one-shot basis (i.e., FindPort / PutMsg).

Each task is limited to a maximum of 32 distinct signals but can have an unlimited number of MsgPorts. The same signal can be used by more than one MsgPort, which is what keeps Intuition tasks from being limited to a finite number of open windows.


IORequests are extended messages that include I/O control and transfer information sent to devices. A basic set of commands (read, write, control, and so on) is common to all IORequests; for a given device, additional extensions can be added as needed. A number of special-purpose device IORequest classes have been derived; any device implementer is at liberty to derive his or her own extensions as needed. It's fairly common to end up with something like the following:

MyDeviceRequest is based on StdIORequest, which is based on IORequest, which is based on Message, which is based on Node, which is based on MinNode.

With each level of inheritance, you gain additional properties and functions. The only new code required is that which supports your own unique class of object.
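In C++ terms, that derivation chain might be sketched like this (member names are approximate guesses, not Exec's exact field names).

struct MsgPort;   // reply destination for a Message
struct Device;    // handler for an IORequest
struct Unit;      // sub-unit of a Device

struct MinNode         { MinNode* succ; MinNode* pred; };
struct Node            : MinNode { unsigned char type; signed char pri; const char* name; };
struct Message         : Node    { MsgPort* replyPort; unsigned short length; };
struct IORequest       : Message { Device* device; Unit* unit;
                                   unsigned short command; unsigned char flags; signed char error; };
struct StdIORequest    : IORequest { unsigned long actual, length, offset; void* data; };
struct MyDeviceRequest : StdIORequest { /* fields unique to your own device */ };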

Another important type of node is the library. It consists of a base structure, preceded by function vectors and followed by optional private storage. There is a set of basic functions (e.g., open, close, and expunge) common to all libraries. Beyond that, the designer is free to add functionality at will.

Unlike most operating systems, the Amiga operating system does not use software interrupts or illegal instruction traps to provide operating-system services. Instead, there is a master library, named exec.library, located in the ROM kernel. All the fundamental system functions--the list primitives, memory management, functions to load and open libraries (the libraries' own internal initialization and open routines are called from this)--are defined here. The only immutable part of the operating system is absolute memory location 4, which points to the Exec library structure (ExecBase). The data portion of ExecBase contains the fundamental Exec structures, including the list definitions for the system message ports, libraries, devices, and tasks.

It's interesting to compare Exec libraries with the dynamic link libraries used by OS/2 and Microsoft Windows. DLLs support sets of functions, but they provide additional services as well. The Intel 286 and subsequent chips support the concept of different levels (rings) of security. If you don't hold the requisite minimum security level, a request will fail. DLL function dispatching can cause security-level switching. The Motorola 68000-series equivalent of this is the Module Call facility. It, however, requires at least a 68020 microprocessor unit and preferably a paged memory management unit. AmigaDOS runs on all 68000s, so the only security levels inherently available are due to the fact that AmigaDOS programs run by default in user state, whereas the operating system runs in supervisor state, as required.

There are pros and cons to both approaches. Since Exec libraries are essentially simple vector tables, the overhead of calling library functions is barely higher than when the function is resident in the calling program (much less than a software interrupt), instead of being shared system code. On the other hand, a carefully designed DLL, while incurring a small speed penalty, is more immune to damage from programs that have run amok.

Note that DLLs are extensions to the Microsoft operating systems; the basic system functions are still software-interrupt driven. Hence, there has to be logic for both kinds of library interfaces. Amiga libraries, however, not only provide a single interface, but they are immune to the problem inherent to all software interrupts--there's only a finite number of them, which never seems to be enough for practical purposes. Libraries, on the other hand, are not only "infinitely" expandable, but it is a straightforward task to create new ones that are indistinguishable from the built-in ones--or even to completely override a built-in one by inserting a new library of the same name at a higher priority on the system library list.


The library concept is itself extended; it forms the basis for an I/O device by adding a few standard functions. Most devices work via extended messages, called IORequests. There are also extensions to these extensions (such as the StdIORequest), as well as customized extensions for specific devices. Devices typically also possess one or more tasks so that I/O can be done asynchronously, although this is not mandatory.

There is another, less-understood extension to the library, called the resource. A resource essentially acts as a coordinator for shared resources (generally hardware), such as the different drives on a disk controller, or the serial and parallel I/O ports (which are implemented on the same chips).

The task structure is yet another node. This one contains all the definitions required to make Exec a fully functional, preemptive, priority-driven, multitasking operating system. A task is roughly equivalent to an OS/2 thread. An extended task, known as a process, provides additional information to permit use of the AmigaDOS functions defined in the library named dos.library--chiefly such things as Unix-like I/O services, program loading capabilities, and the like.

Exec's task scheduler is not as elaborate as OS/2's, which is rumored to have been lifted bodily from IBM's VM/370 mainframe operating system. The OS/2 dispatcher dynamically adjusts task priorities based on certain algorithms that are in the "magic" part of the operating system. While this is impressive, it's doubtful that a single-user operating system needs it. No matter; you're stuck with it. About the best you can do is turn it off, but it still eats up real memory.

Exec gives good performance with a simple time-slice dispatching algorithm. More complex custom algorithms can be attached in a straightforward manner, if required. This can be done safely (and in a release-independent manner) on the Amiga, because both the dispatcher functions and task lists are accessible via well-defined interfaces.
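
Since Exec does not adjust priorities behind the application's back, a task runs at whatever priority it was given. As a hedged sketch (not from the article), a task can look itself up and change its own priority through two Exec calls; the priority value is an arbitrary example:

#include <exec/tasks.h>
#include <proto/exec.h>

void bump_own_priority(void)
{
    /* FindTask(NULL) returns the caller's own Task structure. */
    struct Task *me = FindTask(NULL);

    /* Priorities are signed; higher-priority tasks run first. */
    SetTaskPri(me, 5);
}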

Nothing in the basic design of Exec actually requires only a single CPU to be present in the system. Exec could be implemented in a multiprocessor system if access to system lists were properly serialized. Amiga 2500 systems contain a 68000 and a 68020 (or 68030). At present, one or the other is put to sleep at boot time, but there are possibilities here.

Originally, serialization in Exec was done either by the Forbid() function, which prevents other tasks from being dispatched, or Disable(), which switches off interrupts. However, this serializes the entire system. For serializing access to a specific resource, semaphores are better.

There are two kinds of semaphores in Exec. A SignalSemaphore is based on a MinNode. It provides high-performance serialization but has restrictions on use. The other, message-based kind of semaphore can be used in more general situations. Either type of semaphore is preferable to the cruder Forbid()/Permit() or Disable()/Enable() functions, both of which reduce the amount of multitasking that can be done while serialized on the resource.
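
A hedged sketch of the difference (not from the article): Forbid() holds off every other task, while a SignalSemaphore only holds off tasks contending for the same resource.

#include <exec/semaphores.h>
#include <proto/exec.h>

static struct SignalSemaphore lock;   /* protects some shared resource */

void init_lock(void)
{
    InitSemaphore(&lock);             /* must be done once before use */
}

void update_shared_resource(void)
{
    /* Heavy-handed: no other task runs at all until Permit(). */
    Forbid();
    /* ... touch the resource ... */
    Permit();

    /* Better: only tasks that also obtain this semaphore are held up. */
    ObtainSemaphore(&lock);
    /* ... touch the resource ... */
    ReleaseSemaphore(&lock);
}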

An interrupt is a data structure that points to interrupt-handling code, plus any working storage it might require. To allow for more than one task to handle an interrupt event, interrupts are nodes. The exact handling of interrupts varies, depending on the type of interrupt.

The Amiga operating system is unusual in that it doesn't partition memory for applications, or even the operating system itself. Instead, it maintains a free memory list where each chunk of free memory has certain attributes, and requests are matched against them. Thus, no application runs out of memory until the system itself runs out of memory, and there is no requirement to juggle segments or, as with the Macintosh, compact memory.

A set of low-level functions can be used to acquire and free memory, but Exec also provides a set of functions to manage memory within pools acquired by the application. This has several advantages: less overall memory fragmentation, lower overhead (since the entire pool can be released as a unit, instead of piecemeal), and the ability to preallocate enough memory for applications that have a lot of dynamic memory usage.
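
A minimal sketch of the low-level calls (not from the article): memory is requested with attribute flags and must be returned with its exact size, since, as noted below, Exec keeps no record of memory in use. The buffer size is an arbitrary example.

#include <exec/types.h>
#include <exec/memory.h>
#include <proto/exec.h>

#define BUF_SIZE 4096

void memory_example(void)
{
    /* Ask for memory matching the requested attributes; MEMF_CLEAR
       zeroes it, MEMF_PUBLIC keeps it visible to all tasks. */
    UBYTE *buf = AllocMem(BUF_SIZE, MEMF_PUBLIC | MEMF_CLEAR);

    if (buf != NULL) {
        /* ... use the buffer ... */

        /* The caller must remember the size; there is no in-use list. */
        FreeMem(buf, BUF_SIZE);
    }
}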

There is no system-memory-in-use list. If an application fails and doesn't have a cleanup routine, or if the programmer neglects to free all acquired memory, it's lost until the system is rebooted.

Exec memory management supports both bank-switched memory and virtual memory. Memory allocated with the MEMF_PUBLIC attribute is guaranteed to always be visible to all tasks, interrupt routines, and the system supervisor. Memory on the application's stack or allocated without this flag has no such guarantee. With a maximum of 8.5 megabytes of RAM, neither virtual memory nor bank-switched memory cards are in widespread use, so it's likely that many applications will fail should this situation change.

All the other components of the Amiga's operating system--AmigaDOS, Intuition, the WorkBench, the system's unique built-in animation routines, and so on--ultimately depend on the services of Exec. Exec is compact, efficient, flexible, reliable, and expandable. And no other system I've ever worked with has been so easy to work with. I like that.

BIBLIOGRAPHY

Commodore staff. "Amiga ROM Kernel Reference Manual: Exec." Reading, MA: Addison-Wesley, 1986.

Sassenrath, Carl. "Guru's Guide to the Commodore Amiga: Meditation #1--Interrupts." Ukiah, CA: Sassenrath Research, 1988.

Tim Holloway is president of MTS Associates, a system software development firm in Jacksonville, Florida. He can be reached on BIX as "tholloway."

From BYTE Magazine, January 1991

Copyright 1991 by McGraw-Hill, Inc.


How we migrated Dropbox from Nginx to Envoy

Words: mseri - lobste.rs - 05:45 31-07-2020

In this blogpost we’ll talk about the old Nginx-based traffic infrastructure, its pain points, and the benefits we gained by migrating to Envoy. We’ll compare Nginx to Envoy across many software engineering and operational dimensions. We’ll also briefly touch on the migration process, its current state, and some of the problems encountered on the way.

When we moved most of Dropbox traffic to Envoy, we had to seamlessly migrate a system that already handles tens of millions of open connections, millions of requests per second, and terabits of bandwidth. This effectively made us into one of the biggest Envoy users in the world.

Disclaimer: although we’ve tried to remain objective, quite a few of these comparisons are specific to Dropbox and the way our software development works: making bets on Bazel, gRPC, and C++/Golang.

Also note that we’ll cover the open source version of Nginx, not its commercial version with additional features.

Our legacy Nginx-based traffic infrastructure

Our Nginx configuration was mostly static and rendered with a combination of Python2, Jinja2, and YAML. Any change to it required a full re-deployment. All dynamic parts, such as upstream management and a stats exporter, were written in Lua. Any sufficiently complex logic was moved to the next proxy layer, written in Go.

Our post, “Dropbox traffic infrastructure: Edge network,” has a section about our legacy Nginx-based infrastructure.

Nginx served us well for almost a decade. But it didn’t adapt to our current development best-practices:

Our internal and (private) external APIs are gradually migrating from REST to gRPC which requires all sorts of transcoding features from proxies.

Protocol buffers became the de facto standard for service definitions and configurations.

All software, regardless of the language, is built and tested with Bazel.

Heavy involvement of our engineers on essential infrastructure projects in the open source community.

Also, operationally Nginx was quite expensive to maintain:

Config generation logic was too flexible and split between YAML, Jinja2, and Python.

Monitoring was a mix of Lua, log parsing, and system-based monitoring.

An increased reliance on third party modules affected stability, performance, and the cost of subsequent upgrades.

Nginx deployment and process management was quite different from the rest of the services. It relied a lot on other systems’ configurations: syslog, logrotate, etc, as opposed to being fully separate from the base system.

With all of that, for the first time in 10 years, we started looking for a potential replacement for Nginx.

Why not Bandaid?

As we frequently mention, internally we rely heavily on the Golang-based proxy called Bandaid. It has a great integration with Dropbox infrastructure, because it has access to the vast ecosystem of internal Golang libraries: monitoring, service discoveries, rate limiting, etc. We considered migrating from Nginx to Bandaid but there are a couple of issues that prevent us from doing that:

Golang is more resource intensive than C/C++. Low resource usage is especially important for us on the Edge since we can’t easily “auto-scale” our deployments there.

CPU overhead mostly comes from GC, HTTP parser and TLS, with the latter being less optimized than BoringSSL used by Nginx/Envoy.

The “goroutine-per-request” model and GC overhead greatly increase memory requirements in high-connection services like ours.

No FIPS support for Go’s TLS stack.

Bandaid does not have a community outside of Dropbox, which means that we can only rely on ourselves for feature development.

With all that we’ve decided to start migrating our traffic infrastructure to Envoy instead.

Our new Envoy-based traffic infrastructure

Let’s look into the main development and operational dimensions one by one, to see why we think Envoy is a better choice for us and what we gained by moving from Nginx to Envoy.

Performance

Nginx’s architecture is event-driven and multi-process. It has support for SO_REUSEPORT, EPOLLEXCLUSIVE, and worker-to-CPU pinning. Although it is event-loop based, it is not fully non-blocking. This means some operations, like opening a file or access/error logging, can potentially cause an event-loop stall (even with aio, aio_write, and thread pools enabled). This leads to increased tail latencies, which can cause multi-second delays on spinning disk drives.

Envoy has a similar event-driven architecture, except it uses threads instead of processes. It also has SO_REUSEPORT support (with BPF filter support) and relies on libevent for its event loop implementation (in other words, no fancy epoll(2) features like EPOLLEXCLUSIVE.) Envoy does not have any blocking IO operations in the event loop. Even logging is implemented in a non-blocking way, so that it does not cause stalls.
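
For readers unfamiliar with SO_REUSEPORT (a minimal sketch, not code from either proxy): each worker opens its own listening socket on the same address and port, and the kernel spreads incoming connections across those sockets.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Each worker can run this independently; the kernel distributes
   incoming connections among the listening sockets.
   SO_REUSEPORT is a Linux option (kernel 3.9+). */
int open_reuseport_listener(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int one = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 128) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}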

It looks like in theory Nginx and Envoy should have similar performance characteristics. But hope is not our strategy, so our first step was to run a diverse set of workload tests against similarly tuned Nginx and Envoy setups.

If you are interested in performance tuning, we describe our standard tuning guidelines in “Optimizing web servers for high throughput and low latency.” It involves everything from picking the hardware, to OS tunables, to library choices and web server configuration.

Our test results showed similar performance between Nginx and Envoy under most of our test workloads: high requests per second (RPS), high bandwidth, and a mixed low-latency/high-bandwidth gRPC proxying.

It is arguably very hard to make a good performance test. Nginx has guidelines for performance testing, but these are not codified. Envoy also has a guideline for benchmarking, and even some tooling under the envoy-perf project, but sadly the latter looks unmaintained.

We resorted to using our internal testing tool. It’s called “hulk” because of its reputation for smashing our services.

That said, there were a couple of notable differences in results:

Nginx showed higher long tail latencies. This was mostly due to event loop stalls under heavy I/O, especially if used together with SO_REUSEPORT, since in that case connections can be accepted on behalf of a currently blocked worker.

Nginx performance without stats collection is on par with Envoy, but our Lua stats collection slowed Nginx on the high-RPS test by a factor of 3. This was expected given our reliance on lua_shared_dict, which is synchronized across workers with a mutex.

We do understand how inefficient our stats collection was. We considered implementing something akin to FreeBSD’s per-CPU counters in userspace: CPU pinning, per-worker lockless counters with a fetching routine that loops through all workers, aggregating their individual stats. But we gave up on this idea, because if we wanted to instrument Nginx internals (e.g. all error conditions), it would mean supporting an enormous patch that would make subsequent upgrades a true hell.

Since Envoy does not suffer from either of these issues, after migrating to it we were able to release up to 60% of servers previously exclusively occupied by Nginx.

Observability

Observability is the most fundamental operational need for any product, but especially for such a foundational piece of infrastructure as a proxy. It is even more important during the migration period, so that any issue can be detected by the monitoring system rather than reported by frustrated users.

Non-commercial Nginx comes with a “stub status” module that has 7 stats:

Active connections: 291

server accepts handled requests

16630948 16630948 31070465

Reading: 6 Writing: 179 Waiting: 106

This was definitely not enough, so we’ve added a simple log_by_lua handler that adds per-request stats based on headers and variables that are available in Lua: status codes, sizes, cache hits, etc. Here is an example of a simple stats-emitting function:

function _M.cache_hit_stats(stat)

if _var.upstream_cache_status then

if _var.upstream_cache_status == "HIT" then

stat:add("upstream_cache_hit")

else

stat:add("upstream_cache_miss")

end

end

end

In addition to the per-request Lua stats, we also had a very brittle error.log parser that was responsible for upstream, http, Lua, and TLS error classification.

On top of all that, we had a separate exporter for gathering Nginx internal state: time since the last reload, number of workers, RSS/VMS sizes, TLS certificate ages, etc.

A typical Envoy setup provides us with thousands of distinct metrics (in Prometheus format) describing both the proxied traffic and the server’s internal state:

$ curl -s http://localhost:3990/stats/prometheus | wc -l

14819

This includes a myriad of stats with different aggregations:

Per-cluster/per-upstream/per-vhost HTTP stats, including connection pool info and various timing histograms.

Per-listener TCP/HTTP/TLS downstream connection stats.

Various internal/runtime stats from basic version info and uptime to memory allocator stats and deprecated feature usage counters.

A special shoutout is needed for Envoy’s admin interface. Not only does it provide additional structured stats through /certs, /clusters, and /config_dump endpoints, but there are also very important operational features:

The ability to change error logging on the fly through the /logging admin endpoint. This allowed us to troubleshoot fairly obscure problems in a matter of minutes.

/cpuprofiler, /heapprofiler, and /contention, which would surely be quite useful during the inevitable performance troubleshooting.

The /runtime_modify endpoint allows us to change a set of configuration parameters without pushing new configuration, which can be used for feature gating, etc.

In addition to stats, Envoy also supports pluggable tracing providers. This is useful not only to our Traffic team, who own multiple load-balancing tiers, but also for application developers who want to track request latencies end-to-end from the edge to app servers.

Technically, Nginx also supports tracing through a third-party OpenTracing integration, but it is not under heavy development.

And last but not least, Envoy has the ability to stream access logs over gRPC. This removes the burden of supporting syslog-to-hive bridges from our Traffic team. Besides, it’s way easier (and secure!) to spin up a generic gRPC service in Dropbox production than to add a custom TCP/UDP listener.

Configuration of access logging in Envoy, like everything else, happens through a gRPC management service, the Access Log Service (ALS). Management services are the standard way of integrating the Envoy data plane with various services in production. This brings us to our next topic.

Integration

Nginx’s approach to integration is best described as “Unix-ish.” Configuration is very static. It heavily relies on files (e.g. the config file itself, TLS certificates and tickets, allowlists/blocklists, etc.) and well-known industry protocols (logging to syslog and auth sub-requests through HTTP). Such simplicity and backwards compatibility is a good thing for small setups, since Nginx can be easily automated with a couple of shell scripts. But as the system’s scale increases, testability and standardization become more important.

Envoy is far more opinionated in how the traffic dataplane should be integrated with its control plane, and hence with the rest of infrastructure. It encourages the use of protobufs and gRPC by providing a stable API commonly referred to as xDS. Envoy discovers its dynamic resources by querying one or more of these xDS services.

Nowadays, the xDS APIs are evolving beyond Envoy: the Universal Data Plane API (UDPA) has the ambitious goal of “becoming de facto standard of L4/L7 loadbalancers.”

From our experience, this ambition works out well. We already use Open Request Cost Aggregation (ORCA) for our internal load testing, and are considering using UDPA for our non-Envoy loadbalancers e.g. our Katran-based eBPF/XDP Layer-4 Load Balancer.

This is especially good for Dropbox, where all services internally already interact through gRPC-based APIs. We’ve implemented our own version of xDS control plane that integrates Envoy with our configuration management, service discovery, secret management, and route information.

For more information about Dropbox RPC, please read “Courier: Dropbox migration to gRPC.” There we describe in detail how we integrated service discovery, secret management, stats, tracing, circuit breaking, etc, with gRPC.

Here are some of the available xDS services, their Nginx alternatives, and our examples of how we use them:

Access Log Service (ALS), as mentioned above, lets us dynamically configure access log destinations, encodings, and formats. Imagine a dynamic version of Nginx’s log_format and access_log.

Endpoint discovery service (EDS) provides information about cluster members. This is analogous to a dynamically updated list of an upstream block’s server entries in the Nginx config (which, in our setup, was managed from Lua). In our case we proxied this to our internal service discovery.

Secret discovery service (SDS) provides various TLS-related information that would cover the various ssl_* directives. We adapted this interface to our secret distribution service.

Runtime Discovery Service (RTDS) provides runtime flags. Our implementation of this functionality in Nginx was quite hacky, based on checking the existence of various files from Lua. This approach can quickly become inconsistent between the individual servers. Envoy’s default implementation is also filesystem-based, but we instead pointed our RTDS xDS API to our distributed configuration storage. That way we can control whole clusters at once (through a tool with a sysctl-like interface) and there are no accidental inconsistencies between different servers.

Route discovery service (RDS) maps routes to virtual hosts, and allows additional configuration for headers and filters. In Nginx terms, these would be analogous to a dynamic location block with set_header/proxy_set_header and a proxy_pass. On lower proxy tiers we autogenerate these directly from our service definition configs.

For an example of Envoy’s integration with an existing production system, here is a canonical example of how to integrate Envoy with a custom service discovery. There are also a couple of open source Envoy control-plane implementations, such as Istio and the less complex go-control-plane.

Our homegrown Envoy control plane implements an increasing number of xDS APIs. It is deployed as a normal gRPC service in production, and acts as an adapter for our infrastructure building blocks. It does this through a set of common Golang libraries that talk to internal services and expose them to Envoy through stable xDS APIs. The whole process does not involve any filesystem calls, signals, cron, logrotate, syslog, log parsers, etc.

Configuration

Nginx has the undeniable advantage of a simple, human-readable configuration. But this advantage gets lost as the config grows more complex and begins to be code-generated.

As mentioned above, our Nginx config is generated through a mix of Python2, Jinja2, and YAML. Some of you may have seen or even written a variation of this in erb, pug, Text::Template, or maybe even m4:

{% for server in servers %}

server {

{% for error_page in server.error_pages %}

error_page {{ error_page.statuses|join(' ') }} {{ error_page.file }};

{% endfor %}

...

{% for route in service.routes %}

{% if route.regex or route.prefix or route.exact_path %}

location {% if route.regex %}~ {{route.regex}}{%

elif route.exact_path %}= {{ route.exact_path }}{%

else %}{{ route.prefix }}{% endif %} {

{% if route.brotli_level %}

brotli on;

brotli_comp_level {{ route.brotli_level }};

{% endif %}

...

Our approach to Nginx config generation had a huge issue: all of the languages involved in config generation allowed substitution and/or logic. YAML has anchors, Jinja2 has loops/ifs/macros, and of course Python is Turing-complete. Without a clean data model, complexity quickly spread across all three of them.

This problem is arguably fixable, but there were a couple of foundational ones:

There is no declarative description for the config format. If we wanted to programmatically generate and validate configuration, we would need to invent it ourselves.

Config that is syntactically valid could still be invalid from a C code standpoint. For example, some of the buffer-related variables have limitations on values, restrictions on alignment, and interdependencies with other variables. To semantically validate a config we needed to run it through nginx -t.

Envoy, on the other hand, has a unified data-model for configs: all of its configuration is defined in Protocol Buffers. This not only solves the data modeling problem, but also adds typing information to the config values. Given that protobufs are first class citizens in Dropbox production, and a common way of describing/configuring services, this makes integration so much easier.

Our new config generator for Envoy is based on protobufs and Python3. All data modeling is done in proto files, while all the logic is in Python. Here’s an example:

from dropbox.proto.envoy.extensions.filters.http.gzip.v3.gzip_pb2 import Gzip

from dropbox.proto.envoy.extensions.filters.http.compressor.v3.compressor_pb2 import Compressor

from google.protobuf.wrappers_pb2 import UInt32Value

def default_gzip_config(

compression_level: Gzip.CompressionLevel.Enum = Gzip.CompressionLevel.DEFAULT,

) -> Gzip:

return Gzip(

# Envoy's default is 6 (Z_DEFAULT_COMPRESSION).

compression_level=compression_level,

# Envoy's default is 4k (12 bits). Nginx uses 32k (MAX_WBITS, 15 bits).

window_bits=UInt32Value(value=12),

# Envoy's default is 5. Nginx uses 8 (MAX_MEM_LEVEL - 1).

memory_level=UInt32Value(value=5),

compressor=Compressor(

content_length=UInt32Value(value=1024),

remove_accept_encoding_header=True,

content_type=default_compressible_mime_types(),

),

)

Note the Python3 type annotations in that code! Coupled with the mypy-protobuf protoc plugin, these provide end-to-end typing inside the config generator. IDEs capable of checking them will immediately highlight typing mismatches.

There are still cases where a type-checked protobuf can be logically invalid. In the example above, gzip window_bits can only take values between 9 and 15. This kind of restriction can be easily defined with the help of the protoc-gen-validate protoc plugin:

google.protobuf.UInt32Value window_bits = 9 [(validate.rules).uint32 = {lte: 15 gte: 9}];

Finally, an implicit benefit of using a formally defined configuration model is that it organically leads to the documentation being collocated with the configuration definitions. Here’s an example from Envoy’s configuration protos:

// Value from 1 to 9 that controls the amount of internal memory used by zlib. Higher values

// use more memory, but are faster and produce better compression results. The default value is 5.

google.protobuf.UInt32Value memory_level = 1 [(validate.rules).uint32 = {lte: 9 gte: 1}];

For those of you thinking about using protobufs in your production systems, but worried you may lack a schema-less representation, here’s a good article from Envoy core developer Harvey Tuch about how to work around this using google.protobuf.Struct and google.protobuf.Any: “Dynamic extensibility and Protocol Buffers.”

Extensibility

Extending Nginx beyond what’s possible with standard configuration usually requires writing a C module. Nginx’s development guide provides a solid introduction to the available building blocks. That said, this approach is relatively heavyweight. In practice, it takes a fairly senior software engineer to safely write an Nginx module.

In terms of infrastructure available for module developers, they can expect basic containers like hash tables/queues/rb-trees, (non-RAII) memory management, and hooks for all phases of request processing. There are also a couple of external libraries like pcre, zlib, openssl, and, of course, libc.

For more lightweight feature extension, Nginx provides Perl and Javascript interfaces. Sadly, both are fairly limited in their abilities, mostly restricted to the content phase of request processing.

The most commonly used extension method adopted by the community is based on the third-party lua-nginx-module and various OpenResty libraries. This approach can be hooked in at pretty much any phase of request processing. We used log_by_lua for stats collection, and balancer_by_lua for dynamic backend reconfiguration.

In theory, Nginx provides the ability to develop modules in C++. In practice, it lacks proper C++ interfaces/wrappers for all the primitives to make this worthwhile. There are nonetheless some community attempts at it. These are far from ready for production, though.

Envoy’s main extension mechanism is through C++ plugins. The process is not as well documented as in Nginx’s case, but it is simpler. This is partially due to:

Clean and well-commented interfaces. C++ classes act as natural extension and documentation points. For example, check out the HTTP filter interface.

C++14 language and standard library. From basic language features like templates and lambda functions, to type-safe containers and algorithms. In general, writing modern C++14 is not much different from using Golang or, with a stretch, one may even say Python.

Features beyond C++14 and its stdlib. Provided by the abseil library, these include drop-in replacements from newer C++ standards, mutexes with built-in static deadlock detection and debug support, additional/more efficient containers, and much more.

For specifics, here’s a canonical example of an HTTP Filter module.

We were able to integrate Envoy with Vortex2 (our monitoring framework) with only 200 lines of code by simply implementing the Envoy stats interface.

Envoy also has Lua support through moonjit, a LuaJIT fork with improved Lua 5.2 support. Compared to Nginx’s 3rd-party Lua integration it has far fewer capabilities and hooks. This makes Lua in Envoy far less attractive due to the cost of additional complexity in developing, testing, and troubleshooting interpreted code. Companies that specialize in Lua development may disagree, but in our case we decided to avoid it and use C++ exclusively for Envoy extensibility.

What distinguishes Envoy from the rest of web servers is its emerging support for WebAssembly (WASM) — a fast, portable, and secure extension mechanism. WASM is not meant to be used directly, but as a compilation target for any general-purpose programming language. Envoy implements a WebAssembly for Proxies specification (and also includes reference Rust and C++ SDKs) that describes the boundary between WASM code and a generic L4/L7 proxy. That separation between the proxy and extension code allows for secure sandboxing, while WASM’s low-level compact binary format allows for near-native efficiency. On top of that, in Envoy proxy-wasm extensions are integrated with xDS. This allows dynamic updates and even potential A/B testing.

The “Extending Envoy with WebAssembly” presentation from Kubecon’19 (remember that time when we had non-virtual conferences?) has a nice overview of  WASM in Envoy and its potential uses. It also hints at performance levels of 60-70% of native C++ code.

With WASM, service providers get a safe and efficient way of running customers’ code on their edge. Customers get the benefit of portability: Their extensions can run on any cloud that implements the proxy-wasm ABI. Additionally, it allows your users to use any language as long as it can be compiled to WebAssembly. This enables them to use a broader set of non-C++ libraries, securely and efficiently.

Istio is putting a lot of resources into WebAssembly development: they already have an experimental version of the WASM-based telemetry extension and the WebAssemblyHub community for sharing extensions. You can read about it in detail in “Redefining extensibility in proxies - introducing WebAssembly to Envoy and Istio.”

Currently, we don’t use WebAssembly at Dropbox. But this might change when the Go SDK for proxy-wasm is available.

Building and Testing

By default, Nginx is built using a custom shell-based configuration system and make-based build system. This is simple and elegant, but it took quite a bit of effort to integrate it into our Bazel-built monorepo to get all the benefits of incremental, distributed, hermetic, and reproducible builds.

Google open-sourced their Bazel-built Nginx version which consists of Nginx, BoringSSL, PCRE, ZLIB, and Brotli library/module.

Testing-wise, Nginx has a set of Perl-driven integration tests in a separate repository and no unit tests.

Given our heavy usage of Lua and absence of a built-in unit testing framework, we resorted to testing using mock configs and a simple Python-based test driver:

class ProtocolCountersTest(NginxTestCase):

@classmethod

def setUpClass(cls):

super(ProtocolCountersTest, cls).setUpClass()

cls.nginx_a = cls.add_nginx(

nginx_CONFIG_PATH, endpoint=["in"], upstream=["out"],

)

cls.start_nginxes()

@assert_delta(lambda d: d == 0, get_stat("request_protocol_http2"))

@assert_delta(lambda d: d == 1, get_stat("request_protocol_http1"))

def test_http(self):

r = requests.get(self.nginx_a.endpoint["in"].url("/"))

assert r.status_code == requests.codes.ok

On top of that, we verify the syntax-correctness of all generated configs by preprocessing them (e.g. replacing all IP addresses with 127/8 ones, switching to self-signed TLS certs, etc.) and running nginx -c on the result.

On the Envoy side, the main build system is already Bazel. So integrating it with our monorepo was trivial: Bazel easily allows adding external dependencies.

We also use copybara scripts to sync protobufs for both Envoy and udpa. Copybara is handy when you need to do simple transformations without the need to forever maintain a large patchset.

With Envoy we have the flexibility of using either unit tests (based on gtest/gmock) with a set of pre-written mocks, or Envoy’s integration test framework, or both. There’s no need anymore to rely on slow end-to-end integration tests for every trivial change.

gtest is a fairly well-known unit-test framework used by Chromium and LLVM, among others. If you want to know more about googletest there are good intros for both googletest and googlemock.

Open source Envoy development requires changes to have 100% unit test coverage. Tests are automatically triggered for each pull request via the Azure CI Pipeline.

It’s also a common practice to micro-benchmark performance-sensitive code with google/benchmark:

$ bazel run --compilation_mode=opt test/common/upstream:load_balancer_benchmark -- --benchmark_filter=".*LeastRequestLoadBalancerChooseHost.*"

BM_LeastRequestLoadBalancerChooseHost/100/1/1000000 848 ms 449 ms 2 mean_hits=10k relative_stddev_hits=0.0102051 stddev_hits=102.051

...

After switching to Envoy, we began to rely exclusively on unit tests for our internal module development:

TEST_F(CourierClientIdFilterTest, IdentityParsing) {

struct TestCase {

std::vector<std::string> uris;

Identity expected;

};

std::vector<TestCase> tests = {

{{"spiffe://prod.dropbox.com/service/foo"}, {"spiffe://prod.dropbox.com/service/foo", "foo"}},

{{"spiffe://prod.dropbox.com/user/boo"}, {"spiffe://prod.dropbox.com/user/boo", "user.boo"}},

{{"spiffe://prod.dropbox.com/host/strange"}, {"spiffe://prod.dropbox.com/host/strange", "host.strange"}},

{{"spiffe://corp.dropbox.com/user/bad-prefix"}, {"", ""}},

};

for (auto& test : tests) {

EXPECT_CALL(*ssl_, uriSanPeerCertificate()).WillOnce(testing::Return(test.uris));

EXPECT_EQ(GetIdentity(ssl_), test.expected);

}

}

Having sub-second test roundtrips has a compounding effect on productivity. It empowers us to put more effort into increasing test coverage. And being able to choose between unit and integration tests allows us to balance coverage, speed, and cost of Envoy tests.

Bazel is one of the best things that ever happened to our developer experience. It has a very steep learning curve and is a large upfront investment, but it has a very high return on that investment: incremental builds, remote caching, distributed builds/tests, etc.

One of the less discussed benefits of Bazel is that it gives us an ability to query and even augment the dependency graph. A programmatic interface to the dependency graph, coupled with a common build system across all languages, is a very powerful feature. It can be used as a foundational building block for things like linters, code generation, vulnerability tracking, deployment system, etc.

Security

Nginx’s code surface is quite small, with minimal external dependencies. It’s typical to see only 3 external dependencies on the resulting binary: zlib (or one of its faster variants), a TLS library, and PCRE. Nginx has a custom implementation of all protocol parsers, the event library, and they even went as far as to re-implement some libc functions.

At some point Nginx was considered so secure that it was used as a default web server in OpenBSD. Later the two development communities had a falling out, which led to the creation of httpd. You can read about the motivation behind that move in BSDCon’s “Introducing OpenBSD’s new httpd.”

This minimalism paid off in practice. Nginx has only had 30 vulnerabilities and exposures reported in more than 11 years.

Envoy, on the other hand, has way more code, especially when you consider that C++ code is far more dense than the basic C used for Nginx. It also incorporates millions of lines of code from external dependencies. Everything from event notification to protocol parsers is offloaded to 3rd party libraries. This increases the attack surface and bloats the resulting binary.

To counteract this, Envoy relies heavily on modern security practices. It uses AddressSanitizer, ThreadSanitizer, and MemorySanitizer. Its developers even went beyond that and adopted fuzzing.

Any open source project that is critical to global IT infrastructure can be accepted into OSS-Fuzz—a free platform for automated fuzzing. To learn more about it, see “OSS-Fuzz / Architecture.”

In practice, though, all these precautions do not fully counteract the increased code footprint. As a result, Envoy has had 22 security advisories in the past 2 years.

Envoy's security release policy is described in great detail, and in postmortems for selected vulnerabilities. Envoy is also a participant in Google’s Vulnerability Reward Program (VRP). Open to all security researchers, VRP provides rewards for vulnerabilities discovered and reported according to their rules.

For a practical example of how some of these vulnerabilities can be potentially exploited, see this writeup about CVE-2019–18801: “Exploiting an Envoy heap vulnerability.”

To counteract the increased vulnerability risk, we use best binary hardening security practices from our upstream OS vendors Ubuntu and Debian. We defined a special hardened build profile for all edge-exposed binaries. It includes ASLR, stack protectors, and symbol table hardening:

build:hardened --force_pic

build:hardened --copt=-fstack-clash-protection

build:hardened --copt=-fstack-protector-strong

build:hardened --linkopt=-Wl,-z,relro,-z,now

Forking web servers like Nginx have, in most environments, issues with the stack protector. Since the master and worker processes share the same stack canary, and a worker process is simply killed when canary verification fails, the canary can be brute-forced byte-by-byte in about 1000 tries. Envoy, which uses threads as its concurrency primitive, is not affected by this attack.

We also want to harden third-party dependencies where we can. We use BoringSSL in FIPS mode, which includes startup self-tests and integrity checking of the binary. We’re also considering running ASAN-enabled binaries on some of our edge canary servers.

Features

Here comes the most opinionated part of the post, brace yourself.

Nginx began as a web server specialized in serving static files with minimal resource consumption. Its functionality is top of the line there: static serving, caching (including thundering herd protection), and range caching.

On the proxying side, though, Nginx lacks features needed for modern infrastructures. There’s no HTTP/2 to backends. gRPC proxying is available but without connection multiplexing. There’s no support for gRPC transcoding. On top of that, Nginx’s “open-core” model restricts features that can go into an open source version of the proxy. As a result, some of the critical features like statistics are not available in the “community” version.

Envoy, by contrast, has evolved as an ingress/egress proxy, used frequently for gRPC-heavy environments. Its web-serving functionality is rudimentary: no file serving, still work-in-progress caching, neither brotli nor pre-compression. For these use cases we still have a small fallback Nginx setup that Envoy uses as an upstream cluster.

When HTTP cache in Envoy becomes production-ready, we could move most of static-serving use cases to it, using S3 instead of filesystem for long-term storage. To read more about eCache design, see “eCache: a multi-backend HTTP cache for Envoy.”

Envoy also has native support for many gRPC-related capabilities:

gRPC proxying. This is a basic capability that allowed us to use gRPC end-to-end for our applications (e.g. Dropbox desktop client.)

HTTP/2 to backends. This feature allows us to greatly reduce the number of TCP connections between our traffic tiers, reducing memory consumption and keepalive traffic.

gRPC → HTTP bridge (+ reverse.)  These allowed us to expose legacy HTTP/1 applications using a modern gRPC stack.

gRPC-WEB. This feature allowed us to use gRPC end-to-end even in the environments where middleboxes (firewalls, IDS, etc) don’t yet support HTTP/2.

gRPC JSON transcoder. This enables us to transcode all inbound traffic, including Dropbox public APIs, from REST into gRPC.

In addition, Envoy can also be used as an outbound proxy. We used it to unify a couple of other use cases:

Egress proxy: since Envoy added support for the HTTP CONNECT method, it can be used as a drop-in replacement for Squid proxies. We’ve begun to replace our outbound Squid installations with Envoy. This not only greatly improves visibility, but also reduces operational toil by unifying the stack with a common dataplane and observability (no more parsing logs for stats.)

Third-party software service discovery: we are relying on the Courier gRPC libraries in our software instead of using Envoy as a service mesh. But we do use Envoy in one-off cases where we need to connect an open source service with our service discovery with minimal effort. For example, Envoy is used as a service discovery sidecar in our analytics stack. Hadoop can dynamically discover its name and journal nodes. Superset can discover airflow, presto, and hive backends. Grafana can discover its MySQL database.

Community

Nginx development is quite centralized. Most of its development happens behind closed doors. There’s some external activity on the nginx-devel mailing list, and there are occasional development-related discussions on the official bug tracker.

There is an #nginx channel on FreeNode. Feel free to join it for more interactive community conversations.

Envoy development is open and decentralized: coordinated through GitHub issues/pull requests, mailing list, and community meetings.

There is also quite a bit of community activity on Slack. You can get your invite here.

It’s hard to quantify the development styles and engineering community, so let’s look at a specific example of HTTP/3 development.

Nginx QUIC and HTTP/3 implementation was recently presented by F5. The code is clean, with zero external dependencies. But the development process itself was rather opaque. Half a year before that, Cloudflare came up with their own HTTP/3 implementation for Nginx. As a result, the community now has two separate experimental versions of HTTP/3 for Nginx.

In Envoy’s case, HTTP/3 implementation is also a work in progress, based on chromium’s "quiche" (QUIC, HTTP, Etc.) library. The project’s status is tracked in the GitHub issue. The design doc was publicly available way before patches were completed. Remaining work that would benefit from community involvement is tagged with “help wanted.”

As you can see, the latter structure is much more transparent and greatly encourages collaboration. For us, this means that we managed to upstream lots of small to medium changes to Envoy–everything from operational improvements and performance optimizations to new gRPC transcoding features and load balancing changes.

Current state of our migration

We’ve been running Nginx and Envoy side-by-side for over half a year and gradually switching traffic from one to another with DNS. By now we have migrated a wide variety of workloads to Envoy:

Ingress high-throughput services. All file data from Dropbox desktop client is served via end-to-end gRPC through Envoy. By switching to Envoy we’ve also slightly improved users’ performance, due to better connection reuse from the edge.

Ingress high-RPS services. This is all file metadata for Dropbox desktop client. We get the same benefits of end-to-end gRPC, plus the removal of the connection pool, which means we are no longer bound to one request per connection at a time.

Notification and telemetry services. Here we handle all real-time notifications, so these servers have millions of HTTP connections (one for each active client.) Notification services can now be implemented via streaming gRPC instead of an expensive long-poll method.

Mixed high-throughput/high-RPS services. API traffic (both metadata and data itself.) This allows us to start thinking about public gRPC APIs. We may even switch to transcoding our existing REST-based APIs right on the Edge.

Egress high-throughput proxies. In our case, the Dropbox to AWS communication, mostly S3. This would allow us to eventually remove all Squid proxies from our production network, leaving us with a single L4/L7 data plane.

One of the last things to migrate would be www.dropbox.com itself. After this migration, we can start decommissioning our edge Nginx deployments. An epoch would pass.

Issues we encountered

Migration was not flawless, of course. But it didn’t lead to any notable outages. The hardest part of the migration was our API services. A lot of different devices communicate with Dropbox over our public API—everything from curl-/wget-powered shell scripts and embedded devices with custom HTTP/1.0 stacks, to every possible HTTP library out there. Nginx is a battle-tested de-facto industry standard. Understandably, most of the libraries implicitly depend on some of its behaviors. Along with a number of inconsistencies between Nginx and Envoy behaviors on which our API users depend, there were a number of bugs in Envoy and its libraries. All of them were quickly resolved and upstreamed by us with the community’s help.

Here is just a gist of some of the “unusual”/non-RFC behaviors:

Merge slashes in URLs. URL normalization and slash merging is a very common feature for web proxies. Nginx enables slash normalization and slash merging by default, but Envoy did not support the latter. We submitted a patch upstream that adds that functionality and allows users to opt in through a configuration option.

Ports in virtual host names. Nginx allows receiving the Host header in both forms: either example.com or example.com:port. We had a couple of API users that used to rely on this behavior. First we worked around this by duplicating our vhosts in our configuration (with and without the port), but later added an option on the Envoy side to ignore the matching port.

Transfer encoding case sensitivity. A tiny subset of API clients, for some unknown reason, used a Transfer-Encoding: Chunked (note the capital “C”) header. This is technically valid, since RFC7230 states that Transfer-Encoding/TE headers are case-insensitive. The fix was trivial and submitted to upstream Envoy.

Requests that have both Content-Length and Transfer-Encoding: chunked headers. Requests like that used to work with Nginx, but were broken by the Envoy migration. RFC7230 is a bit tricky there, but the general idea is that web servers should reject such requests because they are likely “smuggled.” On the other hand, the next sentence indicates that proxies should just remove the Content-Length header and forward the request. We’ve extended http-parser to allow library users to opt in to supporting such requests, and are currently working on adding that support to Envoy itself.

It’s also worth mentioning some common configuration issues we’ve encountered:

Circuit-breaking misconfiguration. In our experience, if you are running Envoy as an inbound proxy, especially in a mixed HTTP/1&HTTP/2 environment, improperly set up circuit breakers can cause unexpected downtimes during traffic spikes or backend outages. Consider relaxing them if you are not using Envoy as a mesh proxy. It’s worth mentioning that by default, circuit-breaking limits in Envoy are pretty tight — be careful there!

Buffering. Nginx allows request buffering on disk. This is especially useful in environments where you have legacy HTTP/1.0 backends that don’t understand chunked transfer encoding. Nginx could convert these into requests with Content-Length by buffering them on disk. Envoy has a Buffer filter, but without the ability to store data on disk we are restricted on how much we can buffer in memory.

If you’re considering using Envoy as your Edge proxy, you would benefit from reading “Configuring Envoy as an edge proxy.”  It does have security and resource limits that you would want to have on the most exposed part of your infrastructure.

What’s next?

HTTP/3 is getting closer to prime time. Support for it was added to the most popular browsers (for now, gated by flags or command-line options). Envoy support for it is also experimentally available. After we upgrade the Linux kernel to support UDP acceleration, we will experiment with it on our Edge.

Internal xDS-based load balancer and outlier detection. Currently, we are looking at using the combination of Load Reporting service (LRS) and Endpoint discovery service (EDS) as building blocks for creating a common look-aside, load-aware loadbalancer for both Envoy and gRPC.

WASM-based Envoy extensions. When the Golang proxy-wasm SDK is available, we can start writing Envoy extensions in Go, which will give us access to a wide variety of internal Golang libs.

Replacement for Bandaid. Unifying all Dropbox proxy layers under a single data plane sounds very compelling. For that to happen we’ll need to migrate a lot of Bandaid features (especially around loadbalancing) to Envoy. This is a long way off, but it’s our current plan.

Envoy mobile. Eventually, we want to look into using Envoy in our mobile apps. It is very compelling from a Traffic perspective to support a single stack with unified monitoring and modern capabilities (HTTP/3, gRPC, TLS1.3, etc) across all mobile platforms.

Acknowledgements

This migration was truly a team effort. Traffic and Runtime teams were spearheading it but other teams heavily contributed: Agata Cieplik, Jeffrey Gensler, Konstantin Belyalov, Louis Opter, Naphat Sanguansin, Nikita V. Shirokov, Utsav Shah, Yi-Shu Tai, and of course the awesome Envoy community that helped us throughout that journey.

We also want to explicitly acknowledge the tech lead of the Runtime team Ruslan Nigmatullin whose actions as the Envoy evangelist, the author of the Envoy MVP, and the main driver from the software engineering side enabled this project to happen.

We’re hiring!

If you’ve read this far, there’s a good chance that you actually enjoy digging deep into webservers/proxies and may enjoy working on the Dropbox Traffic team! Dropbox has a globally distributed Edge network, terabits of traffic, and millions of requests per second. All of it is managed by a small team in Mountain View, CA.


Rust verification tools

Words: sanxiyn - lobste.rs - 00:02 31-07-2020

[I have been updating this post continually in response to suggestions and questions.

If you want to see recent changes, you can find the diffs here.]

The Rust language and the Rust community are really interesting if you

want to build better quality systems software.

Over the last few months, I have been trying to understand one more part

of the story:

what is the state of formal verification tools for Rust?

The clean, principled language design and, in particular, the Rust type system

fit really well with recent work on formal verification.

Academic researchers are showing a lot of interest in Rust,

and it seems that the community should be receptive to the idea of formal verification.

So what tools are out there?

What can you do with them?

Are they complete?

Are they being maintained?

What common standards and benchmarks exist?

Here is a list of the tools that I know about (more details below):

There are also some interpreters, tools and libraries that are not formal

verification tools but that are relevant – I mention these at the end.

Before I go any further, I should probably add a disclaimer:

Although I have spent some time looking at what is available and

reading Rust verification papers,

I am not an expert in this area so I have probably got things wrong, missed

out important tools, etc.

You should also bear in mind that things are changing fast: I am writing this

in early May 2020 but I hope that, in a few months time, everything I

say will be out of date.

Do please contact me with additions and

corrections.

Of course, the fact that I am updating this post as people point things out

means that if you look at comments in twitter or reddit about this post,

they may not make sense because I have tried to fix this post in response.

There are four major categories of software verification tool

in roughly increasing order of how hard it is to use them:

symbolic execution tools,

automatic (aka extended static checkers),

auto-active verifiers

and deductive verifiers.

These tools are designed to find bugs by exploring

paths through your program and/or to generate testsuites

with high control coverage.

Unlike the other three kinds of tool, these tools typically don’t provide

any guarantee that there are no bugs left but they scale really well

and they are probably the best tools to use on a new codebase.

The tools that I know to be in this category are

Cargo-KLEE,

Haybale

and Seer.

These tools are good for checking for what some call “Absence of Run-Time

Exception” (AoRTE).

Runtime errors include things like the following

(not all of these apply to safe Rust code).

While not all tools aim to check all of the above,

the automatic verification tools I know of are

CBMC,

Crux-mir,

MIRAI,

RustHorn,

SMACK.

It is worth saying that the Crust tool is different from the other

tools in that it is designed to check that a library that contains

unsafe Rust code is externally safe.

One of the appealing features of the automatic verification tools

is that you don’t have to write specifications.

Typically, all you have to do is build a verification harness

(that looks a wee bit like a fuzzing harness)

and maybe add some extra assertions into your code.

For me, this makes these tools the most interesting because,

once the kinks have been worked out, these tools have the

most potential to be added into a normal development flow.

You don’t need a lot of training to make use of these tools.

(But note the comments below about the kinks that still have

to be worked out.)

While automatic tools focus on things not going wrong,

auto-active verification tools help you verify some key

properties of your code: data structure invariants,

the results of functions, etc.

The price that you pay for this extra power is that

you may have to assist the tool by adding function

contracts (pre/post-conditions for functions),

loop invariants, type invariants, etc. to your code.

The only auto-active verification tool that I am

aware of is Prusti.

Prusti is a really interesting tool because

it exploits Rust’s unusual type system to help it verify code.

Also Prusti has the slickest user interface:

a VSCode extension

that checks your code as you type it!

These tools can be used to show things like “full functional correctness”:

that the outputs are exactly what they should be.

Deductive verification tools typically generate a set of “verification

conditions” that are then proved using an interactive theorem prover.

The deductive verification tools for Rust that I know of are

Electrolysis and RustBelt.

Electrolysis transpiles Rust code into a functional program in the Lean

interactive theorem prover

and you then prove correctness of that program using Lean.

The goal of RustBelt is to verify unsafe Rust code but,

strictly speaking, RustBelt does not actually verify Rust code:

you manually transcribe Rust code into λ-Rust and

then use RustBelt to verify that code using IRIS and

the Coq theorem prover.

As far as I can tell, no verification tool currently

supports the full Rust language.

(In contrast, C verification tools are complete enough to verify

things like OS device drivers.)

Some of the big challenges are:

The Electrolysis repository has the clearest statement of

language coverage

of all the tools.

It uses the language reference manual as a guide to what

has to be covered and it uses test code from the manual

(as well as some hand-written tests) to confirm that

that feature is supported.

THE KNOWLEDGE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR

IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF UNLEASHING

INDESCRIBABLE HORRORS THAT SHATTER YOUR PSYCHE AND SET YOUR MIND ADRIFT IN THE

UNKNOWABLY INFINITE COSMOS.

— The Rustonomicon.

The problem with unsafe code is that it eliminates the big advantage

of Rust code: that the type system gives you a bunch of guarantees that

you can rely on while reasoning about your code.

This means that every tool that takes advantage of the Rust typesystem

is going to have a problem with unsafe code.

In particular, I think that this is a problem for Electrolysis,

Prusti and RustHorn.

On the other hand, tools like SMACK that are based on the LLVM IR

have no problem with unsafe code.

While “unsafe” code raises some fundamental barriers for some tools,

as far as I can tell, closures just seem to take more effort.

They bring in a degree of indirection / higher-order behaviour that

is harder for tools to handle.

The only tools that I am aware of that can handle closures at the moment

are Electrolysis and, I suspect, SMACK.

(But I could easily have missed some.)

The standard library is complicated in two ways:

So, many verification tools replace the standard library with

something simpler such as a simpler implementation or

a function contract.

It is quite a lot of work to create and maintain this verification

version of the library so standard library support can be quite

incomplete.

Two tools that I know are affected by this are

These tools all vary in how actively they are being developed.

Here is what I know about them.

(Please tell me if I got this wrong.)

Actively developed:

Cargo-KLEE,

Crux-mir,

haybale,

MIRAI,

Miri,

Prusti,

RustBelt,

RustHorn,

Seems to have stalled:

Pull request for CBMC,

(Rust support in) SMACK.

Appears to be abandoned:

Crust,

Electrolysis,

Seer.

Definitely abandoned:

KLEE Rust.

If verification of Rust programs is going to take off, we

need standards and benchmarks.

Standard interfaces

let us try different tools to see which one is best for our codebase;

they let us switch between different kinds of tools (maybe one is

great for finding bugs while another is great for showing the

absence of bugs);

and they let us use a portfolio of tools running in parallel.

Three emerging interfaces are

- Viper rust-contracts, which provides the macros requires!, ensures! and invariant! for use in function attributes and loop bodies.

- The contracts crate, which provides the macros pre! and post! for function contracts and invariant! for loop invariants.

There is a virtuous cycle between standard interfaces, benchmarks and verification competitions. Standard interfaces enable the development of meaningful benchmarks because they make it possible to share verification harnesses and code annotations. Benchmarks allow you to compare tools, which makes it possible to create verification competitions.

The best benchmark suites contain a mixture of different types and sizes of code, and a mixture of code with known (tricky) bugs and of code with no bugs, to identify tools that are good in one mode or another. This mixture reflects real requirements for the tools, but it also allows for multiple winners – depending on which category matters most to each user group or project phase. (Benchmarks are also useful when developing tools, and competitive benchmark results are useful when publishing papers about tools.)

Verification competitions such as SV-COMP let tool developers demonstrate how good their tools are and they encourage friendly competition between tools. But you can only take part if your tool implements the interface used in the benchmarks.

While I was looking at these tools, I noticed that many of the tools act on a single file. But if I want to verify a Rust package, I really want something that is integrated with the Cargo tool. Tools that seem to have Cargo integration are Cargo-KLEE and Crux-mir.

This article is about formal verification tools. But testing and fuzzing tools are an important, complementary part of any verification story.

LibHoare is a Rust library for adding runtime pre- and post-conditions to Rust programs.

Miri (paper) is not a formal verification tool, but it can be used to detect undefined behaviour, and it is important in defining what “unsafe” Rust is and is not allowed to do.

RustFuzz is a collection of resources for fuzz-testing Rust code.

Sealed Rust is “Ferrous System’s plan to qualify the Rust Language and Compiler for use in the Safety Critical domain.”

It looks as though Rust is a very active area for verification tools. I am not sure yet whether any of the tools are complete enough for me to use them in anger, but it seems that some of them are getting close.

🕶 The Rust verification future looks very bright! 🕶

Where did I get this list of tools from? As you might imagine, I searched Google Scholar and the web for things like “Rust verification tool”. This finds things like

(read more)

Different approaches to HTTP routing in Go

Words: benhoyt - lobste.rs - 19:17 30-07-2020

There are many ways to do HTTP path routing in Go – for better or worse. There’s the standard library’s http.ServeMux, but it only supports basic prefix matching. There are many ways to do more advanced routing yourself, including Axel Wagner’s interesting ShiftPath technique. And then of course there are lots of third-party router libraries. In this article I’m going to do a comparison of several custom techniques and some off-the-shelf packages.
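To give a feel for what “basic prefix matching” means, here is a minimal sketch of my own (not from the article’s repo) showing how http.ServeMux treats patterns with and without a trailing slash:

package main

import (
	"fmt"
	"net/http"
)

func main() {
	mux := http.NewServeMux()
	// Without a trailing slash, the pattern matches the exact path only.
	mux.HandleFunc("/contact", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "contact")
	})
	// With a trailing slash, the pattern matches an entire subtree:
	// /api/widgets, /api/widgets/foo, etc. all land here, and you have to
	// pick apart the rest of r.URL.Path yourself.
	mux.HandleFunc("/api/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "api subtree:", r.URL.Path)
	})
	http.ListenAndServe(":8080", mux)
}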

I’ll be upfront about my biases: I like simple and clear code, and I’m a bit allergic to large dependencies (and sometimes those are in tension). Most libraries with “framework” in the title don’t do it for me, though I’m not opposed to using well-maintained libraries that do one or two things well.

My goal here is to route the same 11 URLs with eight different approaches. These URLs are based on a subset of URLs in a web application I maintain. They use GET and POST, but they’re not particularly RESTful or well designed – the kind of messiness you find in real-world systems. Here are the methods and URLs:

GET / # home

GET /contact # contact

GET /api/widgets # apiGetWidgets

POST /api/widgets # apiCreateWidget

POST /api/widgets/:slug # apiUpdateWidget

POST /api/widgets/:slug/parts # apiCreateWidgetPart

POST /api/widgets/:slug/parts/:id/update # apiUpdateWidgetPart

POST /api/widgets/:slug/parts/:id/delete # apiDeleteWidgetPart

GET /:slug # widget

GET /:slug/admin # widgetAdmin

POST /:slug/image # widgetImage

The :slug is a URL-friendly widget identifier like foo-bar, and the :id is a positive integer like 1234. Each routing approach should match on the exact URL – trailing slashes will return 404 Not Found (redirecting them is also a fine decision, but I’m not doing that here). Each router should handle the specified method (GET or POST) and reject the others with a 405 Method Not Allowed response. I wrote some table-driven tests to ensure that all the routers do the right thing.
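The article doesn’t show the tests themselves, but a table-driven routing test along these lines is one plausible shape. This is my own sketch, not the code from the benhoyt/go-routing repo: the routeTest struct, the wantCode field and the expected statuses are illustrative, and it assumes it lives in the same package as the function-style Serve router:

package main

import (
	"net/http"
	"net/http/httptest"
	"testing"
)

// routeTest describes one request and the status code we expect back.
type routeTest struct {
	method   string
	path     string
	wantCode int
}

func TestRoutes(t *testing.T) {
	tests := []routeTest{
		{"GET", "/", http.StatusOK},
		{"GET", "/contact", http.StatusOK},
		{"POST", "/contact", http.StatusMethodNotAllowed},
		{"GET", "/api/widgets/", http.StatusNotFound}, // trailing slash is a 404
	}
	for _, tt := range tests {
		req := httptest.NewRequest(tt.method, tt.path, nil)
		rec := httptest.NewRecorder()
		Serve(rec, req) // the router under test
		if rec.Code != tt.wantCode {
			t.Errorf("%s %s: got status %d, want %d", tt.method, tt.path, rec.Code, tt.wantCode)
		}
	}
}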

In the rest of this article I’ll present code for the various approaches and discuss some pros and cons of each (all the code is in the benhoyt/go-routing repo). There’s a lot of code, but all of it is fairly straight-forward and should be easy to skim. You can use the following links to skip down to a particular technique. First, the five custom techniques:

And three versions using third-party router packages:

I also tried httprouter, which is supposed to be really fast, but it can’t handle URLs with overlapping prefixes like /contact and /:slug. Arguably this is bad URL design anyway, but a lot of real-world web apps do it, so I think this is quite limiting.
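If I remember httprouter’s behaviour correctly, the conflict looks something like the sketch below: registering a static route and a wildcard at the same position makes the router panic rather than guess which one you meant. The handler bodies are placeholders of mine:

package main

import (
	"fmt"
	"net/http"

	"github.com/julienschmidt/httprouter"
)

func main() {
	router := httprouter.New()
	router.GET("/contact", func(w http.ResponseWriter, r *http.Request, _ httprouter.Params) {
		fmt.Fprintln(w, "contact")
	})
	// httprouter cannot register both a static /contact route and a /:slug
	// wildcard at the same position in the tree; as far as I recall, this
	// second registration panics rather than being resolved at request time.
	router.GET("/:slug", func(w http.ResponseWriter, r *http.Request, ps httprouter.Params) {
		fmt.Fprintln(w, "widget", ps.ByName("slug"))
	})
	http.ListenAndServe(":8080", router)
}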

There are many other third-party router packages or “web frameworks”, but these three bubbled to the top in my searches (and I believe they’re fairly representative).

In this comparison I’m not concerned about speed. Most of the approaches loop or switch through a list of routes (in contrast to fancy trie-lookup structures). All of these approaches only add a few microseconds to the request time (see benchmarks), and that isn’t an issue in any of the web applications I’ve worked on.

The first approach I want to look at is the method I use in the current version of my web application – it’s the first thing that came to mind when I was learning Go a few years back, and I still think it’s a pretty good approach.

It’s basically a table of pre-compiled regexp objects with a little 21-line routing function that loops through them, and calls the first one that matches both the path and the HTTP method. Here are the routes and the Serve() routing function:

var routes = []route{
    newRoute("GET", "/", home),
    newRoute("GET", "/contact", contact),
    newRoute("GET", "/api/widgets", apiGetWidgets),
    newRoute("POST", "/api/widgets", apiCreateWidget),
    newRoute("POST", "/api/widgets/([^/]+)", apiUpdateWidget),
    newRoute("POST", "/api/widgets/([^/]+)/parts", apiCreateWidgetPart),
    newRoute("POST", "/api/widgets/([^/]+)/parts/([0-9]+)/update", apiUpdateWidgetPart),
    newRoute("POST", "/api/widgets/([^/]+)/parts/([0-9]+)/delete", apiDeleteWidgetPart),
    newRoute("GET", "/([^/]+)", widget),
    newRoute("GET", "/([^/]+)/admin", widgetAdmin),
    newRoute("POST", "/([^/]+)/image", widgetImage),
}

func newRoute(method, pattern string, handler http.HandlerFunc) route {
    return route{method, regexp.MustCompile("^" + pattern + "$"), handler}
}

type route struct {
    method  string
    regex   *regexp.Regexp
    handler http.HandlerFunc
}

func Serve(w http.ResponseWriter, r *http.Request) {
    var allow []string
    for _, route := range routes {
        matches := route.regex.FindStringSubmatch(r.URL.Path)
        if len(matches) > 0 {
            if r.Method != route.method {
                allow = append(allow, route.method)
                continue
            }
            ctx := context.WithValue(r.Context(), ctxKey{}, matches[1:])
            route.handler(w, r.WithContext(ctx))
            return
        }
    }
    if len(allow) > 0 {
        w.Header().Set("Allow", strings.Join(allow, ", "))
        http.Error(w, "405 method not allowed", http.StatusMethodNotAllowed)
        return
    }
    http.NotFound(w, r)
}

Path parameters are handled by adding the matches slice to the request context, so the handlers can pick them up from there. I’ve defined a custom context key type, as well as a getField helper function that’s used inside the handlers:

type ctxKey struct{}

func getField(r *http.Request, index int) string {
    fields := r.Context().Value(ctxKey{}).([]string)
    return fields[index]
}

A typical handler with path parameters looks like this:

// Handles POST /api/widgets/([^/]+)/parts/([0-9]+)/update
func apiUpdateWidgetPart(w http.ResponseWriter, r *http.Request) {
    slug := getField(r, 0)
    id, _ := strconv.Atoi(getField(r, 1))
    fmt.Fprintf(w, "apiUpdateWidgetPart %s %d\n", slug, id)
}

I haven’t checked the error returned by Atoi(), because the regex for the ID parameter only matches digits: [0-9]+. Of course, there’s still no guarantee the object exists in the database – that still needs to be done in the handler. (If the number is too large, Atoi will return an error, but in that case the id will be zero and the database lookup will fail, so there’s no need for an extra check.)

An alternative to passing the fields using context is to make each route.handler a function that takes the fields as a []string and returns an http.HandlerFunc closure that closes over the fields parameter. The Serve function would then instantiate and call the closure as follows:

handler := route.handler(matches[1:])
handler(w, r)

Then each handler would look like this:

func apiUpdateWidgetPart(fields []string) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        slug := fields[0]
        id, _ := strconv.Atoi(fields[1])
        fmt.Fprintf(w, "apiUpdateWidgetPart %s %d\n", slug, id)
    }
}

I slightly prefer the context approach, as it keeps the handler signatures as simple http.HandlerFuncs, and also avoids a nested function for each handler definition.

There’s nothing particularly clever about the regex table approach, and it’s similar to how a number of the third-party packages work. But it’s so simple it only takes a few lines of code and a few minutes to write. It’s also easy to modify if you need to: for example, to add logging, change the error responses to JSON, and so on.
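As a rough illustration of the “easy to modify” point, here is my own sketch (not code from the repo) of adding request logging by wrapping the Serve function; it assumes it lives in the same package as the regex table router:

package main

import (
	"log"
	"net/http"
	"time"
)

// loggedServe wraps the regex-table Serve function with simple request
// logging. The wrapper is illustrative, not the repo's code.
func loggedServe(w http.ResponseWriter, r *http.Request) {
	start := time.Now()
	Serve(w, r) // the 21-line router from above
	log.Printf("%s %s (%v)", r.Method, r.URL.Path, time.Since(start))
}

func main() {
	http.ListenAndServe(":8080", http.HandlerFunc(loggedServe))
}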

Full regex table code on GitHub.

The second approach still uses regexes, but with a simple imperative switch statement and a match() helper to go through the matches. The advantage of this approach is that you can call other functions or test other things in each case. Also, the signature of the match function allows you to “scan” path parameters into variables in order to pass them to the handlers more directly. Here are the routes and the match() function:

func Serve(w http.ResponseWriter, r *http.Request) {
    var h http.Handler
    var slug string
    var id int

    p := r.URL.Path
    switch {
    case match(p, "/"):
        h = get(home)
    case match(p, "/contact"):
        h = get(contact)
    case match(p, "/api/widgets") && r.Method == "GET":
        h = get(apiGetWidgets)
    case match(p, "/api/widgets"):
        h = post(apiCreateWidget)
    case match(p, "/api/widgets/([^/]+)", &slug):
        h = post(apiWidget{slug}.update)
    case match(p, "/api/widgets/([^/]+)/parts", &slug):
        h = post(apiWidget{slug}.createPart)
    case match(p, "/api/widgets/([^/]+)/parts/([0-9]+)/update", &slug, &id):
        h = post(apiWidgetPart{slug, id}.update)
    case match(p, "/api/widgets/([^/]+)/parts/([0-9]+)/delete", &slug, &id):
        h = post(apiWidgetPart{slug, id}.delete)
    case match(p, "/([^/]+)", &slug):
        h = get(widget{slug}.widget)
    case match(p, "/([^/]+)/admin", &slug):
        h = get(widget{slug}.admin)
    case match(p, "/([^/]+)/image", &slug):
        h = post(widget{slug}.image)
    default:
        http.NotFound(w, r)
        return
    }
    h.ServeHTTP(w, r)
}

// match reports whether path matches regex ^pattern$, and if it matches,
// assigns any capture groups to the *string or *int vars.
func match(path, pattern string, vars ...interface{}) bool {
    regex := mustCompileCached(pattern)
    matches := regex.FindStringSubmatch(path)
    if len(matches) <= 0 {
        return false
    }
    for i, match := range matches[1:] {
        switch p := vars[i].(type) {
        case *string:
            *p = match
        case *int:
            n, err := strconv.Atoi(match)
            if err != nil {
                return false
            }
            *p = n
        default:
            panic("vars must be *string or *int")
        }
    }
    return true
}

I must admit to being quite fond of this approach. I like how simple and direct it is, and I think the scan-like behaviour for path parameters is clean. The scanning inside match() detects the type, and converts from string to integer if needed. It only supports string and int right now, which is probably all you need for most routes, but it’d be easy to add more types if you need to.

Here’s what a handler with path parameters looks like (to avoid repetition, I’ve used the apiWidgetPart struct for all the handlers that take those two parameters):

type apiWidgetPart struct {
    slug string
    id   int
}

func (h apiWidgetPart) update(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintf(w, "apiUpdateWidgetPart %s %d\n", h.slug, h.id)
}

func (h apiWidgetPart) delete(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintf(w, "apiDeleteWidgetPart %s %d\n", h.slug, h.id)
}

Note the get() and post() helper functions, which are essentially simple middleware that check the request method as follows:

// get takes a HandlerFunc and wraps it to only allow the GET method
func get(h http.HandlerFunc) http.HandlerFunc {
    return allowMethod(h, "GET")
}

// post takes a HandlerFunc and wraps it to only allow the POST method
func post(h http.HandlerFunc) http.HandlerFunc {
    return allowMethod(h, "POST")
}

// allowMethod takes a HandlerFunc and wraps it in a handler that only
// responds if the request method is the given method, otherwise it
// responds with HTTP 405 Method Not Allowed.
func allowMethod(h http.HandlerFunc, method string) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        if method != r.Method {
            w.Header().Set("Allow", method)
            http.Error(w, "405 method not allowed", http.StatusMethodNotAllowed)
            return
        }
        h(w, r)
    }
}

One of the slightly awkward things is how it works for paths that handle more than one method. There are probably different ways to do it, but I currently test the method explicitly in the first route – the get() wrapper is not strictly necessary here, but I’ve included it for consistency:

case match ( p , "/api/widgets" ) && r . Method == "GET" :

h = get ( apiGetWidgets )

case match ( p , "/api/widgets" ) :

h = post ( apiCreateWidget )

At first I included the HTTP method matching in the match() helper, but that makes it more difficult to return 405 Method Not Allowed responses properly.

One other aspect of this approach is the lazy regex compiling. We could just call regexp.MustCompile, but that would re-compile each regex on every request. Instead, I’ve added a concurrency-safe mustCompileCached function that means the regexes are only compiled the first time they’re used:

var (
    regexen = make(map[string]*regexp.Regexp)
    relock  sync.Mutex
)

func mustCompileCached(pattern string) *regexp.Regexp {
    relock.Lock()
    defer relock.Unlock()

    regex := regexen[pattern]
    if regex == nil {
        regex = regexp.MustCompile("^" + pattern + "$")
        regexen[pattern] = regex
    }
    return regex
}

Overall, despite liking the clarity of this approach and the scan-like match() helper, a point against it is the messiness required to cache the regex compilation.

Full regex switch code on GitHub.

This approach is similar to the regex switch method, but instead of regexes it uses a simple, custom pattern matcher.

The patterns supplied to the custom match() function handle one wildcard character, +, which matches (and captures) any characters till the next / in the request path. This is of course much less powerful than regex matching, but generally I’ve not needed anything more than “match till next slash” in my routes. Here is what the routes and match code look like:

func Serve(w http.ResponseWriter, r *http.Request) {
    var h http.Handler
    var slug string
    var id int

    p := r.URL.Path
    switch {
    case match(p, "/"):
        h = get(home)
    case match(p, "/contact"):
        h = get(contact)
    case match(p, "/api/widgets") && r.Method == "GET":
        h = get(apiGetWidgets)
    case match(p, "/api/widgets"):
        h = post(apiCreateWidget)
    case match(p, "/api/widgets/+", &slug):
        h = post(apiWidget{slug}.update)
    case match(p, "/api/widgets/+/parts", &slug):
        h = post(apiWidget{slug}.createPart)
    case match(p, "/api/widgets/+/parts/+/update", &slug, &id):
        h = post(apiWidgetPart{slug, id}.update)
    case match(p, "/api/widgets/+/parts/+/delete", &slug, &id):
        h = post(apiWidgetPart{slug, id}.delete)
    case match(p, "/+", &slug):
        h = get(widget{slug}.widget)
    case match(p, "/+/admin", &slug):
        h = get(widget{slug}.admin)
    case match(p, "/+/image", &slug):
        h = post(widget{slug}.image)
    default:
        http.NotFound(w, r)
        return
    }
    h.ServeHTTP(w, r)
}

// match reports whether path matches the given pattern, which is a
// path with '+' wildcards wherever you want to use a parameter. Path
// parameters are assigned to the pointers in vars (len(vars) must be
// the number of wildcards), which must be of type *string or *int.
func match(path, pattern string, vars ...interface{}) bool {
    for ; pattern != "" && path != ""; pattern = pattern[1:] {
        switch pattern[0] {
        case '+':
            // '+' matches till next slash in path
            slash := strings.IndexByte(path, '/')
            if slash < 0 {
                slash = len(path)
            }
            segment := path[:slash]
            path = path[slash:]
            switch p := vars[0].(type) {
            case *string:
                *p = segment
            case *int:
                n, err := strconv.Atoi(segment)
                if err != nil || n < 0 {
                    return false
                }
                *p = n
            default:
                panic("vars must be *string or *int")
            }
            vars = vars[1:]
        case path[0]:
            // non-'+' pattern byte must match path byte
            path = path[1:]
        default:
            return false
        }
    }
    return path == "" && pattern == ""
}

Other than that, the get() and post() helpers, as well as the handlers themselves, are identical to the regex switch method. I quite like this approach (and it’s efficient), but the byte-by-byte matching code was a little fiddly to write – definitely not as simple as calling regex.FindStringSubmatch().

Full pattern matcher code on GitHub.

This approach simply splits the request path on / and then uses a switch with case statements that compare the number of path segments and the content of each segment. It’s direct and simple, but also a bit error-prone, with lots of hard-coded lengths and indexes. Here is the code:

func Serve(w http.ResponseWriter, r *http.Request) {
    // Split path into slash-separated parts, for example, path "/foo/bar"
    // gives p==["foo", "bar"] and path "/" gives p==[""].
    p := strings.Split(r.URL.Path, "/")[1:]
    n := len(p)

    var h http.Handler
    var id int
    switch {
    case n == 1 && p[0] == "":
        h = get(home)
    case n == 1 && p[0] == "contact":
        h = get(contact)
    case n == 2 && p[0] == "api" && p[1] == "widgets" && r.Method == "GET":
        h = get(apiGetWidgets)
    case n == 2 && p[0] == "api" && p[1] == "widgets":
        h = post(apiCreateWidget)
    case n == 3 && p[0] == "api" && p[1] == "widgets" && p[2] != "":
        h = post(apiWidget{p[2]}.update)
    case n == 4 && p[0] == "api" && p[1] == "widgets" && p[2] != "" && p[3] == "parts":
        h = post(apiWidget{p[2]}.createPart)
    case n == 6 && p[0] == "api" && p[1] == "widgets" && p[2] != "" && p[3] == "parts" && isId(p[4], &id) && p[5] == "update":
        h = post(apiWidgetPart{p[2], id}.update)
    case n == 6 && p[0] == "api" && p[1] == "widgets" && p[2] != "" && p[3] == "parts" && isId(p[4], &id) && p[5] == "delete":
        h = post(apiWidgetPart{p[2], id}.delete)
    case n == 1:
        h = get(widget{p[0]}.widget)
    case n == 2 && p[1] == "admin":
        h = get(widget{p[0]}.admin)
    case n == 2 && p[1] == "image":
        h = post(widget{p[0]}.image)
    default:
        http.NotFound(w, r)
        return
    }
    h.ServeHTTP(w, r)
}

The handlers are identical to the other switch-based methods, as are the get and post helpers. The only helper here is the isId function, which checks that the ID segments are in fact positive integers:

func isId(s string, p *int) bool {
    id, err := strconv.Atoi(s)
    if err != nil || id <= 0 {
        return false
    }
    *p = id
    return true
}

So while I like the bare-bones simplicity of this approach – just basic string equality comparisons – the verbosity of the matching and the error-prone integer constants would make me think twice about actually using it for anything but very simple routing.

Full split switch code on GitHub.

Axel Wagner wrote a blog article, How to not use an http-router in go, in which he maintains that routers (third party or otherwise) should not be used. He presents a technique involving a small ShiftPath() helper that returns the first path segment, and shifts the rest of the URL down. The current handler switches on the first path segment, then delegates to sub-handlers which do the same thing on the rest of the URL.

Let’s see what Axel’s technique looks like for a subset of our set of URLs:

func serve(w http.ResponseWriter, r *http.Request) {
    var head string
    head, r.URL.Path = shiftPath(r.URL.Path)
    switch head {
    case "":
        serveHome(w, r)
    case "api":
        serveApi(w, r)
    case "contact":
        serveContact(w, r)
    default:
        widget{head}.ServeHTTP(w, r)
    }
}

// shiftPath splits the given path into the first segment (head) and
// the rest (tail). For example, "/foo/bar/baz" gives "foo", "/bar/baz".
func shiftPath(p string) (head, tail string) {
    p = path.Clean("/" + p)
    i := strings.Index(p[1:], "/") + 1
    if i <= 0 {
        return p[1:], "/"
    }
    return p[1:i], p[i:]
}

// ensureMethod is a helper that reports whether the request's method is
// the given method, writing an Allow header and a 405 Method Not Allowed
// if not. The caller should return from the handler if this returns false.
func ensureMethod(w http.ResponseWriter, r *http.Request, method string) bool {
    if method != r.Method {
        w.Header().Set("Allow", method)
        http.Error(w, "405 method not allowed", http.StatusMethodNotAllowed)
        return false
    }
    return true
}

// ...

// Handles /api and below
func serveApi(w http.ResponseWriter, r *http.Request) {
    var head string
    head, r.URL.Path = shiftPath(r.URL.Path)
    switch head {
    case "widgets":
        serveApiWidgets(w, r)
    default:
        http.NotFound(w, r)
    }
}

// Handles /api/widgets and below
func serveApiWidgets(w http.ResponseWriter, r *http.Request) {
    var head string
    head, r.URL.Path = shiftPath(r.URL.Path)
    switch head {
    case "":
        if r.Method == "GET" {
            serveApiGetWidgets(w, r)
        } else {
            serveApiCreateWidget(w, r)
        }
    default:
        apiWidget{head}.ServeHTTP(w, r)
    }
}

// Handles GET /api/widgets
func serveApiGetWidgets(w http.ResponseWriter, r *http.Request) {
    if !ensureMethod(w, r, "GET") {
        return
    }
    fmt.Fprint(w, "apiGetWidgets\n")
}

// Handles POST /api/widgets
func serveApiCreateWidget(w http.ResponseWriter, r *http.Request) {
    if !ensureMethod(w, r, "POST") {
        return
    }
    fmt.Fprint(w, "apiCreateWidget\n")
}

type apiWidget struct {
    slug string
}

// Handles /api/widgets/:slug and below
func (h apiWidget) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    var head string
    head, r.URL.Path = shiftPath(r.URL.Path)
    switch head {
    case "":
        h.serveUpdate(w, r)
    case "parts":
        h.serveParts(w, r)
    default:
        http.NotFound(w, r)
    }
}

func (h apiWidget) serveUpdate(w http.ResponseWriter, r *http.Request) {
    if !ensureMethod(w, r, "POST") {
        return
    }
    fmt.Fprintf(w, "apiUpdateWidget %s\n", h.slug)
}

func (h apiWidget) serveParts(w http.ResponseWriter, r *http.Request) {
    var head string
    head, r.URL.Path = shiftPath(r.URL.Path)
    switch head {
    case "":
        h.serveCreatePart(w, r)
    default:
        id, err := strconv.Atoi(head)
        if err != nil || id <= 0 {
            http.NotFound(w, r)
            return
        }
        apiWidgetPart{h.slug, id}.ServeHTTP(w, r)
    }
}

// ...

With this router, I wrote a noTrailingSlash decorator to ensure Not Found is returned by URLs with a trailing slash, as our URL spec defines those as invalid. The ShiftPath approach doesn’t distinguish between no trailing slash and trailing slash, and I can’t find a simple way to make it do that. I think a decorator is a reasonable approach for this, rather than doing it explicitly in every route – in a given web app, you’d probably want to either allow trailing slashes and redirect them, or return Not Found as I’ve done here.
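I haven’t reproduced the actual decorator here, but the idea is simple enough. A minimal sketch might look like the following – the name noTrailingSlash matches the article, while the body is my guess at an implementation rather than the repo’s code, and it assumes it sits in the same package as the serve function above:

package main

import (
	"net/http"
	"strings"
)

// noTrailingSlash returns a handler that responds 404 Not Found to any path
// (other than "/" itself) that ends in a slash, and delegates everything
// else to h.
func noTrailingSlash(h http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/" && strings.HasSuffix(r.URL.Path, "/") {
			http.NotFound(w, r)
			return
		}
		h.ServeHTTP(w, r)
	})
}

func main() {
	http.ListenAndServe(":8080", noTrailingSlash(http.HandlerFunc(serve)))
}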

While I like the idea of just using the standard library, and the path-shifting technique is quite clever, I strongly prefer seeing my URLs all in one place – Axel’s approach spreads the logic across many handlers, so it’s difficult to see what handles what. It’s also quite a lot of code, some of which is error prone.

I do like the fact that (as Axel said) “the dependencies of [for example] ProfileHandler are clear at compile time”, though this is true for several of the other techniques above as well. On balance, I find it too verbose and think it’d be difficult for people reading the code to quickly answer the question, “given this HTTP method and URL, what happens?”

Full ShiftPath code on GitHub.

Chi is billed as a “lightweight, idiomatic and composable router”, and I think it lives up to this description. It’s simple to use and the code looks nice on the page. Here are the route definitions:

func init() {
    r := chi.NewRouter()

    r.Get("/", home)
    r.Get("/contact", contact)
    r.Get("/api/widgets", apiGetWidgets)
    r.Post("/api/widgets", apiCreateWidget)
    r.Post("/api/widgets/{slug}", apiUpdateWidget)
    r.Post("/api/widgets/{slug}/parts", apiCreateWidgetPart)
    r.Post("/api/widgets/{slug}/parts/{id:[0-9]+}/update", apiUpdateWidgetPart)
    r.Post("/api/widgets/{slug}/parts/{id:[0-9]+}/delete", apiDeleteWidgetPart)
    r.Get("/{slug}", widgetGet)
    r.Get("/{slug}/admin", widgetAdmin)
    r.Post("/{slug}/image", widgetImage)

    Serve = r
}

And the handlers are straight-forward too. They look much the same as the handlers in the regex table approach, but the custom getField() function is replaced by chi.URLParam(). One small advantage is that parameters are accessible by name instead of number:

func apiUpdateWidgetPart(w http.ResponseWriter, r *http.Request) {
    slug := chi.URLParam(r, "slug")
    id, _ := strconv.Atoi(chi.URLParam(r, "id"))
    fmt.Fprintf(w, "apiUpdateWidgetPart %s %d\n", slug, id)
}

As with my regex table router, I’m ignoring the error value from strconv.Atoi() as the path parameter’s regex has already checked that it’s made of digits.

If you’re going to build a substantial web app, Chi actually looks quite nice. The main chi package just does routing, but the module also comes with a whole bunch of composable middleware to do things like HTTP authentication, logging, trailing slash handling, and so on.
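As a rough illustration of that composability – assuming chi’s middleware sub-package, which (if I remember the names correctly) provides Logger and RedirectSlashes among others – wiring middleware in looks something like this sketch of mine:

package main

import (
	"net/http"

	"github.com/go-chi/chi"
	"github.com/go-chi/chi/middleware"
)

func main() {
	r := chi.NewRouter()

	// Middleware names assumed from memory; check the chi docs for the
	// current API.
	r.Use(middleware.Logger)          // request logging
	r.Use(middleware.RedirectSlashes) // trailing slash handling

	r.Get("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("home\n"))
	})

	http.ListenAndServe(":8080", r)
}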

Full Chi code on GitHub.

The Gorilla toolkit is a bunch of packages that implement routing, session handling, and so on. The gorilla/mux router package is what we’ll be using here. It’s similar to Chi, though the method matching is a little more verbose:

func init() {
    r := mux.NewRouter()

    r.HandleFunc("/", home).Methods("GET")
    r.HandleFunc("/contact", contact).Methods("GET")
    r.HandleFunc("/api/widgets", apiGetWidgets).Methods("GET")
    r.HandleFunc("/api/widgets", apiCreateWidget).Methods("POST")
    r.HandleFunc("/api/widgets/{slug}", apiUpdateWidget).Methods("POST")
    r.HandleFunc("/api/widgets/{slug}/parts", apiCreateWidgetPart).Methods("POST")
    r.HandleFunc("/api/widgets/{slug}/parts/{id:[0-9]+}/update", apiUpdateWidgetPart).Methods("POST")
    r.HandleFunc("/api/widgets/{slug}/parts/{id:[0-9]+}/delete", apiDeleteWidgetPart).Methods("POST")
    r.HandleFunc("/{slug}", widgetGet).Methods("GET")
    r.HandleFunc("/{slug}/admin", widgetAdmin).Methods("GET")
    r.HandleFunc("/{slug}/image", widgetImage).Methods("POST")

    Serve = r
}

Again, the handlers are similar to Chi, but to get path parameters, you call mux.Vars(), which returns a map of all the parameters that you index by name (this strikes me as a bit “inefficient by design”, but oh well). Here is the code for one of the handlers:

func apiUpdateWidgetPart(w http.ResponseWriter, r *http.Request) {
    vars := mux.Vars(r)
    slug := vars["slug"]
    id, _ := strconv.Atoi(vars["id"])
    fmt.Fprintf(w, "apiUpdateWidgetPart %s %d\n", slug, id)
}

Full Gorilla code on GitHub.

Pat is interesting – it’s a minimalist, single-file router that supports methods and path parameters, but no regex matching. The route setup code looks similar to Chi and Gorilla:

func init() {
    r := pat.New()

    r.Get("/", http.HandlerFunc(home))
    r.Get("/contact", http.HandlerFunc(contact))
    r.Get("/api/widgets", http.HandlerFunc(apiGetWidgets))
    r.Post("/api/widgets", http.HandlerFunc(apiCreateWidget))
    r.Post("/api/widgets/:slug", http.HandlerFunc(apiUpdateWidget))
    r.Post("/api/widgets/:slug/parts", http.HandlerFunc(apiCreateWidgetPart))
    r.Post("/api/widgets/:slug/parts/:id/update", http.HandlerFunc(apiUpdateWidgetPart))
    r.Post("/api/widgets/:slug/parts/:id/delete", http.HandlerFunc(apiDeleteWidgetPart))
    r.Get("/:slug", http.HandlerFunc(widgetGet))
    r.Get("/:slug/admin", http.HandlerFunc(widgetAdmin))
    r.Post("/:slug/image", http.HandlerFunc(widgetImage))

    Serve = r
}

One difference is that the Get() and Post() functions take an http.Handler instead of an http.HandlerFunc, which is generally a little more awkward, as you’re usually dealing with functions, not types with a ServeHTTP method. You can easily convert them using http.HandlerFunc(h), but it’s just a bit more noisy. Here’s what a handler looks like:

func apiUpdateWidgetPart(w http.ResponseWriter, r *http.Request) {
    slug := r.URL.Query().Get(":slug")
    id, err := strconv.Atoi(r.URL.Query().Get(":id"))
    if err != nil {
        http.NotFound(w, r)
        return
    }
    fmt.Fprintf(w, "apiUpdateWidgetPart %s %d\n", slug, id)
}

One of the interesting things is that instead of using context to store path parameters (and a helper function to retrieve them), Pat stuffs them into the query parameters, prefixed with : (colon). It’s a clever trick – if slightly dirty.

Note that with Pat I am checking the error return value from Atoi(), as there’s no regex in the route definitions to ensure an ID is all digits. Alternatively you could ignore the error, and just have the code return Not Found when it tries to look up a part with ID 0 in the database and finds that it doesn’t exist (database IDs usually start from 1).

Full Pat code on GitHub.

As I mentioned, I’m not concerned about speed in this comparison – and you probably shouldn’t be either. If you’re really dealing at a scale where a few microseconds to route a URL is an issue for you, sure, use a fancy trie-based router like httprouter, or write your own heavily-profiled code. All of the hand-rolled routers shown here work in linear time with respect to the number of routes involved.

But, just to show that none of these approaches kill performance, below is a simple benchmark that compares routing the URL /api/widgets/foo/parts/1/update with each of the eight routers (code here). The numbers are “nanoseconds per operation”, so lower is better. The “operation” includes doing the routing and calling the handler. The “noop” router is a router that actually doesn’t route anything, so represents the overhead of the base case.
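The benchmark code is in the repo; a stripped-down version of the idea, using the standard testing package, looks roughly like the sketch below. The benchRequest helper and the single BenchmarkRegexTable function are my own placeholders – the repo benchmarks all eight routers – and it assumes it sits alongside the function-style Serve router:

package main

import (
	"net/http"
	"net/http/httptest"
	"testing"
)

// benchRequest routes the same URL through the given router b.N times,
// including the cost of calling the matched handler.
func benchRequest(b *testing.B, router http.Handler) {
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		req := httptest.NewRequest("POST", "/api/widgets/foo/parts/1/update", nil)
		rec := httptest.NewRecorder()
		router.ServeHTTP(rec, req)
	}
}

func BenchmarkRegexTable(b *testing.B) {
	benchRequest(b, http.HandlerFunc(Serve)) // Serve from the regex table version
}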

As you can see, Pat and Gorilla are slower than the others, showing that just because something is a well-known library doesn’t mean it’s heavily optimized. Chi is one of the fastest, and my custom pattern matcher and the plain strings.Split() method are the fastest.

But to hammer home the point: all of these are plenty good enough – you should almost never choose a router based on performance. The figures here are in microseconds, so even Pat’s 3646 nanoseconds is only adding 3.6 millionths of a second to the response time. Database lookup time in a typical web app is going to be around 1000 times that.

Overall this has been an interesting experiment: I came up with a couple of new (for me, but surely not original) custom approaches to routing, as well as trying out Axel’s “ShiftPath” approach, which I’d been intrigued about for a while.

If I were choosing one of the home-grown approaches, I think I would actually end up right back where I started (when I implemented my first server in Go a few years back) and choose the regex table approach. Regular expressions are quite heavy for this job, but they are well-understood and in the standard library, and the Serve() function is only 21 lines of code. Plus, I like the fact that the route definitions are all neatly in a table, one per line – it makes them easy to scan and determine what URLs go where.

A close second (still considering the home-grown approaches) would be the regex switch. I like the scan-style behaviour of the match() helper, and it also is very small (22 lines). However, the route definitions are a little messier (two lines per route) and the handlers that take path parameters require type or closure boilerplate – I think that storing the path parameters using context is a bit hacky, but it sure keeps signatures simple!

For myself, I would probably rule out the other custom approaches:

I disagree with Axel’s assessment that third-party routing libraries make the routes hard to understand: all you typically have to know is whether they match in source order, or in order of most-specific first. I also disagree that having all your routes in one place (at least for a sub-component of your app) is a bad thing.

In terms of third-party libraries, I quite like the Chi version. I’d seriously consider using it, especially if building a web app as part of a large team. Chi seems well thought out and well-tested, and I like the composability of the middleware it provides.

On the other hand, I’m all too aware of node-modules syndrome and the left-pad fiasco, and agree with Russ Cox’s take that dependencies should be used with caution. Developers shouldn’t be scared of a little bit of code: writing a tiny customized regex router is fun to do, easy to understand, and easy to maintain.

(read more)