Ask HN: What are some good resources to learn safety-critical C/C++ coding from?

Hacker News - Fri Aug 5 11:13

23 points by bobowzki 3 hours ago | 35 comments
I will need to learn about writing safety-critical C/C++ code at my current job. Many resources[1-2] tell you what not to do, but few tell you what to do[3].

What are some excellent examples of open source code bases from which to learn?

1: 2: 3:

CERT C is a really good standard and book. But there's really no reason to read a book. It's very simple if you follow these steps.

Step 1. NO CONSTANT NUMBERS! All constants should be a define macro or a constant. This will allow you to change code without overflows and having to update the number in 20 places and not knowing what number to use when looping through.

Step 2. SESE (RAII in C++, but most use SESE even in C++). SINGLE ENTRY SINGLE EXIT. Your code should look like

  int *ptr = foo();
  if (ptr == NULL) {
      DEBUG_PRINT("FAILED ALLOCATING PTR IN %s @ %d", __FILE__, __LINE__);
      goto exit;
  }

  exit:
      if (ptr) free(ptr);
      ....

So any allocations you clean up in exit. This way you won't miss one with weird control flows. This is recommended by all CERT C standards.

Step 3: If you can, there are analyzers you can use that will point out bugs once you annotate your code. SAL is arguably the best in the industry and you can catch pretty much all bugs.

Step 4: Even without an analyzer, you should be looking at all warnings and either adding a compiler macro to ignore each one, or fixing what's causing it.
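For what it's worth, Steps 1 and 2 together might look roughly like the sketch below. It is only an illustration: `make_greeting` and `SCRATCH_LEN` are names made up for this example, and it uses `malloc` only because the snippet above allocates. Error reporting goes to `stderr` with plain `fprintf` in place of the `DEBUG_PRINT` macro.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Step 1: hypothetical size, named once instead of repeated as a literal. */
#define SCRATCH_LEN 32u

/* Step 2: single entry, single exit.
 * Returns 0 on success, -1 on any failure. Every path leaves through
 * the exit label, so the cleanup there cannot be skipped. */
int make_greeting(const char *name, char *out, size_t out_len)
{
    int status = -1;
    int written = 0;
    char *scratch = NULL;

    if (name == NULL || out == NULL || out_len == 0u) {
        goto exit;
    }

    scratch = malloc(SCRATCH_LEN);
    if (scratch == NULL) {
        fprintf(stderr, "allocation failed in %s @ %d\n", __FILE__, __LINE__);
        goto exit;
    }

    if (strlen(name) >= SCRATCH_LEN) {
        goto exit;                      /* would not fit in scratch */
    }
    strcpy(scratch, name);              /* safe: length checked above */

    written = snprintf(out, out_len, "hello, %s", scratch);
    if (written < 0 || (size_t)written >= out_len) {
        goto exit;                      /* output error or truncation */
    }
    status = 0;

exit:
    free(scratch);                      /* free(NULL) is a no-op */
    return status;
}
```

Calling `make_greeting("bob", buf, sizeof buf)` with a 32-byte `buf` returns 0 and leaves "hello, bob" in it; a NULL name or a too-small output buffer returns -1 without leaking the scratch allocation.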

> Step 1. NO CONSTANT NUMBERS! All constants should be a define macro or a constant. This will allow you to change code without overflows and having to update the number in 20 places and not knowing what number to use when looping through.

My radix sort now needs BUCKETS_PER_LEVEL_1BYTE, BUCKETS_PER_LEVEL_2BYTE, FIRST_BYTE_SHIFT_AMOUNT, (0x100, 0x10000, ((sizeof(x)-1)*8) respectively) etc. along with many of my mmaps needing NO_FD to replace -1

Useful resources: colleagues, professional training, case studies of errors.

If your job is safety critical software I guess they'd pay for relevant training. If not, looking at the course outlines at least lets you know what trainers think are important topics, for example

One training course I had talked about how to design a system with integrity while integrating open source code of unknown integrity. Since the quality of safety-critical software depends so much on process, open source by default isn't built to any integrity level. If a system needs two independent implementations of a calculation, an open source code base would never show that.
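To make the "two independent implementations" idea concrete, here is a rough sketch of the comparison step only. The real independence comes from process (separate teams, separate reviews), which no code sample can show. The function names are invented for this example, and integer square root is just a convenient calculation with two well-known, structurally different algorithms.

```c
#include <stdbool.h>
#include <stdint.h>

/* Implementation A: binary search for floor(sqrt(x)). */
uint32_t isqrt_bsearch(uint32_t x)
{
    uint32_t lo = 0u;
    uint32_t hi = 65535u;               /* floor(sqrt(UINT32_MAX)) */
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo + 1u) / 2u;
        if ((uint64_t)mid * mid <= x) {
            lo = mid;
        } else {
            hi = mid - 1u;
        }
    }
    return lo;
}

/* Implementation B: integer Newton iteration for floor(sqrt(x)). */
uint32_t isqrt_newton(uint32_t x)
{
    uint64_t r = x;
    uint64_t next = 0u;
    if (x < 2u) {
        return x;
    }
    next = (r + x / r) / 2u;
    while (next < r) {
        r = next;
        next = (r + x / r) / 2u;
    }
    return (uint32_t)r;
}

/* Runs both implementations and only reports a result if they agree.
 * Returns false on disagreement so the caller can go to a safe state. */
bool checked_isqrt(uint32_t x, uint32_t *out)
{
    uint32_t a = isqrt_bsearch(x);
    uint32_t b = isqrt_newton(x);
    if (a != b) {
        return false;
    }
    *out = a;
    return true;
}
```

The caller never sees a value unless both paths produced the same one, which is the property a safety case would lean on.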

If you have an experienced safety engineer, ask them how the system and software are typically designed to make the safety case easier; they'll have some ideas of what commonly needs to be done. Which strategy and process to follow depends on the integrity level.

It's not just the code style, but there's a broader mindset that you need to develop.

There are also good presentations and lectures that come up from time to time here or on YouTube where the failure of safety-critical software is studied. These can be excellent case studies, such as:

First, stop saying C/C++. If you are talking about C, you are not talking about C++. If you are talking about C++, you are not talking about C.

Second, give up on C. It simply has not got the resources to help you with safety. It is a wholly lost cause.

In C++, you can package semantics in libraries in ways hard to misuse accidentally. In effect, your library provides the safety that Rust reserves to its compiler. C++ offers more power to the library writer than Rust offers. Use it!

Not open source, but Medtronic published a complete ventilator design and documentation, including firmware, in response to the COVID crisis.

Architect your system for handling failures. No software will be bug free, because the hardware you run it on is not perfect and can introduce things like bit flips. It's okay to fail, but you need to be able to recover.

MISRA doesn't say anything about soft errors. You can write perfect MISRA code and the first ray garbles your logic. It also doesn't say anything about common design principles like "black channels". It also doesn't say anything about what a safe state is, when to go into one, and how to make sure you're reaching it (even when your interrupt controller is misbehaving). It also doesn't tell you anything about how to safely recover from such a safe state and go back into normal operations. It also doesn't say anything about when you're needing two or more controllers, and it doesn't say anything about making sure that both controllers are executing the same code on the same data.
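As a small illustration of the soft-error and safe-state points, one common pattern is to store a critical value alongside its bitwise complement and drop to a safe output when the two no longer match. The sketch below is an assumption-laden example, not something taken from MISRA or any particular standard: `guarded_u32`, `control_step`, and `SAFE_ACTUATOR_OUTPUT` are hypothetical names, and what counts as "safe" depends entirely on your system.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical output that is assumed safe for this imaginary system. */
#define SAFE_ACTUATOR_OUTPUT 0u

/* A critical value stored together with its bitwise complement, so a
 * single flipped bit in either copy makes the pair inconsistent. */
typedef struct {
    uint32_t value;
    uint32_t inverse;
} guarded_u32;

typedef enum { MODE_NORMAL, MODE_SAFE } system_mode;

void guarded_write(guarded_u32 *g, uint32_t v)
{
    g->value = v;
    g->inverse = ~v;
}

/* Copies the value to *out and returns true only if both copies agree. */
bool guarded_read(const guarded_u32 *g, uint32_t *out)
{
    bool ok = (g->value == (uint32_t)~g->inverse);
    if (ok) {
        *out = g->value;
    }
    return ok;
}

/* One control step: use the setpoint if it is intact, otherwise drive
 * the actuator to the safe output and report safe mode. */
system_mode control_step(const guarded_u32 *setpoint, uint32_t *actuator)
{
    system_mode mode = MODE_SAFE;
    uint32_t sp = 0u;

    if (guarded_read(setpoint, &sp)) {
        *actuator = sp;
        mode = MODE_NORMAL;
    } else {
        *actuator = SAFE_ACTUATOR_OUTPUT;
    }
    return mode;
}
```

This only detects corruption of the stored value. It says nothing about recovering from the safe state or about a misbehaving interrupt controller, which are the harder parts mentioned above.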

Unfortunately, safety code practices are highly dependent on your field and its practices, and I'm not aware of a good book or course. You mostly learn it by osmosis when joining a team that develops safety-related systems.

>Unfortunately, safety code practices are highly dependent on your field and its practices, and I'm not aware of a good book or course. You mostly learn it by osmosis when joining a team that develops safety-related systems.

You should have led with this. Every "use a safe language" or "follow these guidelines" post that comes up when the subject of safety-critical software comes up needs basically this response.

Also, you will almost never be designing a safety-critical system in a green-field domain where no one has ever done anything like that before. So, there will be standards. You can learn a lot by reading and following the standards.

I'll be that guy, then, and note that your response demonstrates that you've never worked on a nontrivial safety-critical project.

It's clear from the OP's question that the org is already C-centric. Language support in a safety environment is a large and complex issue. The compiler, tools, libraries are also required to be validated in context (depending on the standards environment), coding and other supporting standards have to be developed or vetted for adoption, you need a population of able reviewers, interoperation between implementations needs to be validated, etc. etc. etc.

And at the end of the day, the "safety" that another language buys you doesn't actually get you very far. A lot of folks get hung up about memory safety, or this or that language feature, when in reality the majority of safety issues in large codebases are algorithmic in nature, and no low-level language feature is going to save you from implementing the wrong design.

Well, I have worked on safety-critical projects in the industry, though there wasn't a lot of C/C++ (health-care software connected to the internet). I currently work with researchers, some in the software security domain.

I don't think it's clear that the OP organisation is stuck with C. Perhaps it's the case, but I think it's also time, in 2022, to push a bit to move toward safer programming languages.

You describe an excellent set of rules and processes, perhaps too much for most projects. It sounds like it could be even better with a memory-safe programming language. Also, I doubt that most security-critical C/C++ projects prove their source code correct in Coq or similar.

I also think many security issues are tightly related to the programming language. For example, SQL injections are because of SQL, XSS because of HTTP/HTML, and Buffer overflows because of C.

Security and safety are different domains and only tangentially related - i.e. it helps in both domains for code to be correct, but it's not sufficient in either.

Yep, this is why in Vulnerability Research one of the main ways of finding bugs is concolic execution (looking at logic for error cases). Memory safety is pretty easy to catch if you aren't a clown and use good coding patterns. Logic isn't, which is where the big money and big boy vulns come in. And honestly, most modern OSes have memory safety protections built in. This isn't 1992 anymore.

Yes, memory safety is a solved problem, and research has mostly moved on. The best solution is to use a memory-safe programming language. Yet we find security vulnerabilities all the time in software written with memory unsafe programming languages.

"Memory safety is pretty easy to catch" - my experience over the last 18 months or so on a large C++ open source project says very much otherwise, almost every week a new NPE or core dump/access violation seems to leak into the codebase. These simply can't happen in many other languages and many now even have good support for non-optional references, ruling out entire classes of bugs that can suck up significant developer time to track down. And for long running systems memory leaks can also be a substantial cause of instability. If I were working on a safety critical system I'd actively push to avoid using C/C++ unless there really was no alternative. And yes modern OS kernels are pretty good evidence that it is possible to do but I'd suggest it requires exceptional programmers and some very rigorous processes to enforce.

Assume you have a functional safety requirement for which a standard exists. What language implementation would you use that is certified for that standard?

Not sure how that question can be meaningfully answered unless it's for a specific safety requirement/standard. Are you suggesting that for most standards that require certification, there are no compiler implementations certified except for C/C++ ones?

I had some involvement with development of an air traffic management system, which was being written in Java, as that was certified for use in that field. As a language, I'd rather use C++ over Java, but if safety criticality were the priority, I'd choose Java every time. I haven't coded in Ada but unless there's something about it that's truly awful to work with, it seems a better choice than C/C++ from what I know about it.

My experience in the aerospace industry may or may not be more widely applicable, but here are my two cents.

Using "better more safe programming languages" generally gets you [some greater level of] memory safety and thread safety. Developing for safety-critical embedded systems with a single thread and no dynamic memory allocation renders those benefits irrelevant. There are still concerns around type safety, but full compiler warnings and strict code review, both of which we need regardless, handle that.
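For readers who haven't seen the "no dynamic memory allocation" style, a fixed pool sized at compile time is one typical way to do it. This is a minimal single-threaded sketch with made-up names (`message`, `msg_acquire`, `msg_release`, and the two size constants), not a description of any particular aerospace code base.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes, fixed at compile time. */
#define MSG_POOL_SIZE   4u
#define MSG_PAYLOAD_LEN 16u

typedef struct {
    uint8_t payload[MSG_PAYLOAD_LEN];
    uint8_t in_use;
} message;

/* All storage is static: worst-case memory use is known at link time. */
static message msg_pool[MSG_POOL_SIZE];

/* Returns the first free slot, or NULL when the pool is exhausted.
 * Single-threaded by assumption, so no locking is needed. */
message *msg_acquire(void)
{
    message *result = NULL;
    for (size_t i = 0; (i < MSG_POOL_SIZE) && (result == NULL); ++i) {
        if (msg_pool[i].in_use == 0u) {
            msg_pool[i].in_use = 1u;
            result = &msg_pool[i];
        }
    }
    return result;
}

/* Marks a slot as free again; releasing NULL is ignored. */
void msg_release(message *m)
{
    if (m != NULL) {
        m->in_use = 0u;
    }
}
```

Exhaustion shows up as a NULL return that the caller has to handle explicitly, and there is no heap to fragment over a long run.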

We also need to be able to certify not only that our source code matches our requirements, but that our binaries match our source code. Compiling with --c99 --debug -O0 gives us a highly visible link between each line of source code that goes into the compiler and the assembly instructions that come out of the compiler. We know exactly what the computer is actually doing, not just what we think we've told it to do. The various "better" languages all get "better" via more powerful and clever (read: complex and opaque) compilers, which is a no-go in our field.

With little benefit and impermissible cost to the alternatives, and the breadth and depth and longevity of the resources and support available for it, there's no sane choice for us but C.

There's sense in this if you don't trust compilers/interpreters in other languages to be reliably doing the right thing, which is certainly a reason to be wary of languages that are new or aren't super widely used. But the amount of effort that goes into ensuring Go or Rust or Ada compilers always generate the correct underlying machine code is surely far more than your own team can achieve, and if there were bugs in such compilers you'd be extraordinarily unlucky to be the first devs affected by them. Also, are you saying the final shipped product is built with all the debug flags on (and optimisation flags off)? If not, how do you trust those binaries? Trying to match machine code output with C source when optimizations are on (and key debug info stripped) is virtually impossible much of the time.

If the debug code with no optimisations is the one trusted most, that's what ships.

If the optimiser drops off some code as it thinks it has no effect, I'd guess that's possible to spot, and you'd want to know to either fix the code or remove dead code. I've not personally had to inspect compiler output but I do spend a surprising amount of time in linker map files understanding what's going on.

This is a great reply. I'd never considered that part of this field was reverse engineering your own program to confirm the compiler actually emits what you asked it to.

I didn't mention Rust or any language because depending on the problem, many other programming languages may be better too, such as Golang, C#, Python, Java…

I know that some people get upset when they have a question, and they don't get the answer but some suggestions. But sometimes, people are not aware of the alternatives, and it's important to remind them that perhaps a better solution exists.

If someone asks how to cut crusty bread with a butter knife safely, you can talk about the proper technique, how most people do it wrong, write a book about how your industry does it better, but perhaps someone should also mention that it's easier and better with a bread knife.

There are many other issues to mention, but in the "memory safe" languages you have poor control over memory allocation and fragmentation, which over time can cause issues with performance, size, swapping, etc.