I've written before about my nostalgia for the Windows XP- or Windows 7-era "clean install," when you could substantially improve any given pre-made PC merely by taking an official direct-from-Microsoft Windows install disk and blowing away the factory install, ridding yourself of 60-day antivirus trials, WildTangent games, outdated drivers, and whatever other software your PC maker threw on it to help subsidize its cost.

You can still do that with Windows 11—in fact, it's considerably easier than it was in those '00s versions of Windows, with multiple official Microsoft-sanctioned ways to download and create an install disk, something you used to need to acquire on your own. But the resulting Windows installation is a lot less "clean" than it used to be, given the continual creep of new Microsoft apps and services into more and more parts of the core Windows experience.

I frequently write about Windows, Edge, and other Microsoft-adjacent technologies as part of my day job, and I sign into my daily-use PCs with a Microsoft account, so my usage patterns may be atypical for many Ars Technica readers. But for anyone who uses Windows, Edge, or both, I thought it might be useful to detail what I'm doing to clean up a clean install of Windows, minimizing (if not totally eliminating) the number of annoying notifications, Microsoft services, and unasked-for apps that we have to deal with.

That said, this is not a guide about creating a minimally stripped-down, telemetry-free version of Windows that removes anything other than what Microsoft allows you to remove. There are plenty of experimental hacks dedicated to that sort of thing—NTDev's Tiny11 project is one—but removing built-in Windows components can cause unexpected compatibility and security problems, and Tiny11 has historically had issues with basic table-stakes stuff like "installing security updates."

Avoiding Microsoft account sign-in

The most contentious part of Windows 11's setup process relative to earlier Windows versions is that it mandates Microsoft account sign-in, with none of the readily apparent "limited
(read more)
Pierre Cusa, November 2022. "Torusphere Accelerator", the animation that motivated this article. What happens if you take motion blur past its logical extreme? Here are some fun observations and ideas I encountered while trying to answer this question, with an attempt to apply the results in a procedural animation. What is motion blur supposed to look like? Motion blur started out purely as a film artifact, the result of a subject moving while the camera's shutter is open. This artifact turned out to be desirable, especially for videos, because it improves the perceptual similarity between a video and a natural scene, something I'll dive into in this section. In a 3D and animation context, it's interesting to note that those two goals (looking natural, and simulating a camera) might not be in agreement, and might result in different motion blurs. I'll keep the simulation aspect as a side note, and ask what the most na
(read more)
December 14, 2022 | Volume 20, Issue 5

Designing an algorithm with reduced connection churn that could replace deterministic subsetting
Peter Ward and Paul Wankadia, with Kavita Guliani

In recent years, the Autopilot system at Google has become increasingly popular internally for improving resource utilization. Autopilot can do multiple things: It can be configured to perform horizontal scaling, which adjusts the number of tasks a service has running to meet the demand; and it can be configured to perform vertical scaling, which adjusts the CPU/memory resources provisioned per task. Autopilot is also effective in preventing outages: It can respond to increased demand by scaling up a service faster than the human operators.

As the usage of Autopilot became widespread, service owners discovered an interesting problem: Whenever a horizontally scaled service resized, many client connections (usually long-lived) would briefly drop and reconnect. This connection churn caused second-order effects:

• Increased errors or latency for in-flight requests
• Increased CPU/memory usage from connection handshakes
• Reduced throughput from TCP slow start on newly established connections
• Increased pressure on connection caches

The severity of these effects varied by service, but in some cases, the increased errors or latency put the services' service-level objectives at risk and blocked the adoption of Autopilot. Investigation determined that this connection churn was caused by backend subsetting.

Backend subsetting—a technique for reducing the number of connections when connecting services together—is useful for reducing costs and may even be necessary for operating within the system limits. For more than a decade, Google used deterministic subsetting as its default backend subsetting algorithm, but although this algorithm balances the number of connections per backend task, deterministic subsetting has a high level of connection churn. Our goal at Google was to design an algorithm with reduced connection churn that could replace deterministic subsetting as the default backend subsetting algorithm. It was ambitious because, as Hyrum's Law states, "All observable behaviors of your system will be depended on by somebody." We needed to understand all the behaviors of deterministic subsetting to avoid regressions.

Backend Subsetting in Borg

Google services run on Borg, the company's cluster management software. Service owners configure jobs running in multiple Borg cells for geographical diversity. Within a cell, a job consists of one or more tasks, with each task running on some machine in the datacenter. The tasks are numbered consecutively from zero.

Backend subsetting is used when connecting jobs together—if a frontend job consisting of M tasks connects to a backend job consisting of N tasks, there would normally be M×N connections, which can be quite large when jobs have thousands of tasks. Instead, each of the M frontend tasks connects to k of the backend tasks, reducing the number of connections to M×k. Choosing an appropriate value for k is left to the reader, but it will usually be much less than M or N.

To use backend subsetting, the service must be replicated: If the same request is sent to different tasks, they should perform equivalent work and return equivalent responses. A load-balancing policy at the frontend task is used to direct each request to a specific backend task, with the goal of uniform usage across backend tasks.
Each backend task is allocated the same resources, so to avoid overload, we need to provision for the most loaded backend task.

The Previous Approach

The subsets chosen by the backend subsetting algorithm have various effects on production: connection balance, subset diversity, connection churn, and subset spread. To describe these behaviors and explain how the new algorithm was developed, let's start with a simple algorithm and improve it iteratively.

Random subsetting

One of the simplest possible algorithms is to choose random subsets: Each frontend task shuffles the list of backend tasks (identified by task numbers 0 to N-1) and selects the first k tasks. Unfortunately, this interacts poorly with many load-balancing policies. Suppose you have a CPU-bound service where all requests have the same cost and each frontend task uses round-robin load balancing to balance requests evenly across backend tasks. Thus, the load on each backend task would be directly correlated with the number of connections to it. The connection distribution from random subsetting is far from uniform, however, as figure 1 shows.

Round robin is a simple load-balancing policy but not the only one influenced by the connection distribution. Given the diversity of Google services and their differing load-balancing requirements, requiring connection-agnostic load-balancing policies is impractical. Therefore, the subsetting algorithm should strive to balance the connection distribution.

Property: Connection Balance

The goal is to measure the amount of load imbalance contributed by the subsetting algorithm, assuming that the load-balancing policy is influenced by the connection distribution. To do this, every frontend task is assumed to generate an equal amount of load on each backend task in its subset; this is rarely exactly true in practice, but suffices for these purposes.

Utilization is a useful measurement of load balancing: Dividing the total usage by the total capacity gives the fraction of resources being used. This can be applied to the connection distribution: Total usage will be the total number of connections (M×k), and (since we provision for the most loaded backend task) the total capacity will be based on the backend task with the most connections (max(Cn)×N, where Cn is the number of connections to the nth backend task). This provides the following metric:

Utilization = (M×k) / (max(Cn)×N)

This metric, however, does not take into account the discrete nature of connections. If M×k is not divisible by N, an ideal subsetting algorithm has to assign either ⌊M×k/N⌋ or ⌈M×k/N⌉ connections to each backend task, so max(Cn) = ⌈M×k/N⌉ and Utilization < 1. To achieve Utilization = 1 in this case, the metric must be adjusted to give the achievable utilization:

Utilization = (M×k) / (⌈M×k/N⌉×N)

Using this metric compares connection balance for subsetting algorithms across a variety of different scenarios. Note that achieving a high utilization is straightforward in two ways. First, increasing k naturally improves utilization because it decreases the effect of subsetting on load balancing; increasing the subset size to N would disable subsetting entirely. Second, as the ratio of frontend tasks to backend tasks increases, the subsetting algorithm has "more choices" per backend task, so the connection balance improves naturally even if choosing randomly.
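To make the connection-balance metric concrete, here is a small Go sketch (not from the article; the task counts and the per-task RNG seeding are illustrative assumptions) that builds random subsets and computes both the raw and the achievable utilization defined above.

```go
package main

import (
	"fmt"
	"math/rand"
)

// randomSubset shuffles the backend task numbers 0..n-1 with a per-frontend-task
// RNG and returns the first k, mirroring the random subsetting scheme above.
func randomSubset(frontendTask, k, n int) []int {
	rng := rand.New(rand.NewSource(int64(frontendTask))) // illustrative seeding
	return rng.Perm(n)[:k]
}

func main() {
	const M, N, k = 100, 40, 10 // example sizes, chosen arbitrarily

	// Count connections per backend task across all frontend tasks.
	conns := make([]int, N)
	for m := 0; m < M; m++ {
		for _, b := range randomSubset(m, k, N) {
			conns[b]++
		}
	}

	maxConns := 0
	for _, c := range conns {
		if c > maxConns {
			maxConns = c
		}
	}

	// Utilization = (M×k) / (max(Cn)×N); the achievable variant replaces
	// max(Cn) with ⌈M×k/N⌉ to account for the discrete nature of connections.
	utilization := float64(M*k) / float64(maxConns*N)
	ceil := (M*k + N - 1) / N
	achievable := float64(M*k) / float64(ceil*N)

	fmt.Printf("max connections per backend: %d\n", maxConns)
	fmt.Printf("utilization: %.3f (achievable: %.3f)\n", utilization, achievable)
}
```

Running this with larger M relative to N should reproduce the qualitative point above: the more frontend tasks per backend task, the closer even random subsetting gets to full utilization.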
Figure 2 shows this, plotting utilization against the ratio of frontend tasks to backend tasks for jobs with, at most, 256 tasks (k = 20, 1 ≤ M ≤ 256, k ≤ N ≤ 256, M×k > N); while not a realistic bound, this is sufficient to demonstrate the algorithm's behavior.

Round-robin subsetting

Random subsetting can be improved by introducing coordination between the frontend tasks via their task numbers (0 to M-1). Round-robin subsetting assigns backend tasks consecutively to the first frontend task's subset, and then the second task's, and so on, as demonstrated in table 1. Each frontend task m can efficiently generate its subset by starting at backend task number (m×k) mod N (a minimal sketch of this scheme follows this excerpt). It should be fairly straightforward to see that this will balance connections as uniformly as possible: Once a backend task n is assigned a connection, it will not be assigned another connection until all other backend tasks have been assigned connections. Although this algorithm has good connection balance, its other behaviors are undesirable.

Property: Subset Diversity

Imagine what would happen if there were more frontend tasks in table 1. Frontend task 5 would get assigned the next four backend tasks, which are {0, 1, 2, 3}, but this is the same as the subset for frontend task 0. With 10 backend tasks and four tasks per subset, there are 10 choose 4 = 210 possible subsets that could
(read more)
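As referenced in the excerpt above, here is a minimal Go sketch of round-robin subsetting (my illustration, not the article's code): frontend task m starts its subset at backend task (m×k) mod N and wraps around.

```go
package main

import "fmt"

// roundRobinSubset returns the k backend task numbers assigned to frontend
// task m under round-robin subsetting: start at (m*k) mod n and take the
// next k backends, wrapping around past n-1.
func roundRobinSubset(m, k, n int) []int {
	start := (m * k) % n
	subset := make([]int, k)
	for i := 0; i < k; i++ {
		subset[i] = (start + i) % n
	}
	return subset
}

func main() {
	// 10 backend tasks, subsets of 4, as in the table 1 example above.
	for m := 0; m <= 5; m++ {
		fmt.Printf("frontend %d -> %v\n", m, roundRobinSubset(m, 4, 10))
	}
}
```

For m = 5 this prints [0 1 2 3] again, the same subset as frontend task 0, which is exactly the subset-diversity problem the excerpt describes.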
Recently, I decided to try the now built-in LSP client called Eglot. I've been using the lsp-mode package for some years, and while I don't have any problems with it, I decided to try the in-house solution. I had already tried Eglot in the past, and it didn't work for me due to some complications with the language I tried to use it with. At the time it didn't support Clojure, and while adding it wasn't hard, some features did not work, which wasn't the case with lsp-mode, so I used it instead. In this post, I'll outline some problems I encountered, which are mostly Clojure-related. At this point, I can't say if I will actually move to Eglot, but I plan to use it for some time to see if I miss any features from lsp-mode. Until then, it will stay in my private configuration.

Configuration and installation

I often see claims that lsp-mode packs too many features. While true, they're all mostly optional and individually configurable. Over the years, I've configured lsp-mode in a fairly minimal way. You see, with Clojure, and other lisps for that matter, language servers are somewhat inferior, because there's a much more accurate way to obtain information about the code. I am, of course, talking about the REPL. Being a dynamic environment, we can ask it about its state, known functions, macros, available modules, etc. Of course, it depends on the implementation of the REPL, and not all are created equal, but the one I'm using (nREPL) is quite capable. A language server has to do all of that statically, using its own parsers, tracking of variables, modules, and so on. Because of that, I don't really use most of the language server's features - only linting, and occasionally go to definition. If it wasn't for one specific feature missing from CIDER, I wouldn't use a language server at all, I think. Here's my lsp-mode configuration:

(use-package lsp-mode :ensure t :hook ((lsp-mode . lsp-diagnostics-mode) (lsp-mode . lsp-completion-mode-maybe)) :preface (defun lsp-completion-mode-maybe () (unless (bound-and-true-p cider-mode) (lsp-completion-mode 1))) :custom (lsp-keymap-prefix "C-c l") (lsp-diag
(read more)
By Shaun Mirani Near the end of 2022, Trail of Bits was hired by the Open Source Technology Improvement Fund (OSTIF) to perform a security assessment of the cURL file transfer command-line utility and its library, libcurl. The scope of our engagement included a code review, a threat model, and the subject of this blog post: an engineering effort to analyze and improve cURL’s fuzzing code. We’ll discuss several elements of this process, including how we identified important areas of the codebase lacking coverage, and then modified the fuzzing code to hit these missed areas. For exam
(read more)
I think filing bugs on browsers is one of the most useful things a web developer can do. When faced with a cross-browser compatibility problem, a lot of us are conditioned to just search for some quick workaround, or to keep cycling through alternatives until something works. And this is definitely what I did earlier in my career. But I think it's too short-sighted. Browser dev teams are just like web dev teams – they have priorities and backlogs, and they sometimes let bugs slip through. Also, a well-written bug report with clear steps-to-repro can often lead to a quick resolution
(read more)
One of the most persistent, frustrating, and entirely valid criticisms of Sourcehut that I've heard has been that git operations over SSH are too slow. The reason this is a frustrating complaint to hear is that the git.sr.ht SSH pipeline is a complicated set of many moving parts, and fixing the problem involved changes at every level. However, as many of you will (hopefully) have noticed by now, pushing to and pulling from git.sr.ht is quite snappy now! So after a huge amount of work overhauling everything to get us here, I thought it would be nice to reflect on what caused these issues, how this system is structured, and how the problem was eventually solved.

There are several major tasks that need to happen when you push or pull to git.sr.ht. In order, they are:

Dispatch: Which system are you SSHing into? git.sr.ht? builds.sr.ht?
Identification: Who are you?
Authorization: Are you allowed to do what you're trying to do?
Execution: Hand things off to git to complete your operation.
Follow-up: Do we need to submit any CI jobs? Webhooks?

During each of these steps, your terminal is blocked. You have to wait for all of them to complete. Well, most of them, at least. Let's discuss each step in detail.

Dispatching

There are several Sourcehut services which you can log onto using SSH: git.sr.ht, hg.sr.ht, and builds.sr.ht, and perhaps more in the future. Before we overhauled it, man.sr.ht used to have a dedicated SSH service as well. In our case, we run each of these services on their own servers at their own IP addresses. However, this was not always the case, and we still support third-party installations of Sourcehut services which are all sharing a single server. Therefore, we have to have a way of identifying which service you're trying to SSH into, and for this purpose we use the user: you use git@git.sr.ht, hg@hg.sr.ht, and builds@builds.sr.ht to log onto each respective service.

This phase is handled by our gitsrht-dispatch binary, whose source code you can view here. This is run by OpenSSH in order to generate an authorized_keys file, with a list of SSH keys which are allowed to log into Sourcehut. That's beyond the scope of dispatch, however, which delegates it to the next step. Instead, it just figures out which service it needs to hand you off to for authentication (a rough sketch of this kind of dispatcher follows this excerpt).

[git.sr.ht::dispatch]
#
# The authorized keys hook uses this to disp
(read more)
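To illustrate the dispatch step described in the excerpt above, here is a rough Go sketch of a dispatcher that picks a per-service helper based on the SSH user. This is not Sourcehut's gitsrht-dispatch code; the service map, helper paths, and argument handling are all assumptions for illustration.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// Map the SSH user (git@git.sr.ht, hg@hg.sr.ht, ...) to the per-service
// helper that actually produces the authorized_keys entries.
// The paths here are hypothetical.
var services = map[string]string{
	"git":    "/usr/bin/gitsrht-keys",
	"hg":     "/usr/bin/hgsrht-keys",
	"builds": "/usr/bin/buildssrht-keys",
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: dispatch <ssh-user> [args...]")
		os.Exit(1)
	}
	user := os.Args[1]

	helper, ok := services[user]
	if !ok {
		// Unknown user: emit no keys, so the login is refused.
		os.Exit(0)
	}

	// Hand off to the service-specific helper, preserving the remaining
	// arguments OpenSSH passed us (key type, fingerprint, and so on).
	argv := append([]string{helper}, os.Args[2:]...)
	if err := syscall.Exec(helper, argv, os.Environ()); err != nil {
		fmt.Fprintln(os.Stderr, "exec failed:", err)
		os.Exit(1)
	}
}
```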
Running GTA: Vice City on a TP-Link TL-WDR4900 wireless router

What is it?

A TP-Link wireless router, with an external AMD Radeon GPU connected via PCIe, running Debian Linux and playing games.

What makes this router so special?

TP-Link's TL-WDR4900 v1 is a very interesting WiFi router: Instead of the typical MIPS or ARM CPUs found in normal WiFi routers, the WDR4900 features a PowerPC-based CPU by NXP. The NXP/Freescale QorIQ P1014 CPU used in the WDR4900 is a PowerPC e500v2 32-bit processor. These CPUs offer a full 36-bit address space, a lot of performance (for a router released in 2013), and excellent PCIe controllers. They quickly gained popularity in the OpenWrt and Freifunk communities for being cheap routers with a lot of CPU performance. Both the 2.4 GHz and 5 GHz WiFi chipsets (made by Qualcomm/Atheros) are connected to the CPU via PCIe.

PCIe problems on embedded systems

PCIe cards are mapped into the host CPU's memory space transparently. The PCIe controller in the host CPU will then send all accesses to this region to the PCIe device responsible for that memory region. Each PCIe card can have several such mappings, also called "BARs" (Base Address Registers). The maximum size for such mappings varies between different CPUs. In the past, even the common Raspberry Pi CM4 could only allocate 64 MiB of its address space to graphics cards: https://github.com/raspberrypi/linux/commit/54db4b2fa4d17251c2f6e639f849b27c3b553939. Many other devices (like MIPS-based router CPUs) are limited to only 32 MiB (or less).

Basically, all modern graphics cards require the host system to have at least 128 MiB of BAR space available for communication with their driver. Newer cards like Intel ARC even require "Resizable BAR", a marketing term for very large, 64-bit memory regions. These cards will map their entire VRAM (on the order of 12+ GiB) into the host's memory space. (A small sketch for inspecting a card's BAR sizes follows this excerpt.)

Even with sufficient BAR space, PCIe device memory might not behave in the same way as regular memory (like on an x86 CPU): This caused numerous issues when people tried to attach GPUs to a Raspberry Pi. Similar issues (regarding memory-ordering/caching/nGnRE maps/alignment) even affect large Arm64 server CPUs, resulting in crude kernel hacks and workarounds like:

Work around Ampere Altra erratum #82288
Fixup handler for alignment faults in aarch64 code

Retrofitting a miniPCIe slot

From the factory, the router didn't provide any external PCIe connectivity. To connect a graphics card, a custom miniPCIe breakout PCB was designed and connected with enameled copper wire into the router. The PCIe traces leading from the CPU to one of the Atheros chipsets were cut and redirected to the miniPCIe slot. U-Boot reports PCIe2 being connected to an AMD Radeon HD 7470 graphics card:

U-Boot 2010.12-svn19826 (Apr 24 2013 - 20:01:21)
CPU:   P1014, Version: 1.0, (0x80f10110)
Core:  E500, Version: 5.1, (0x80212151)
Clock Configuration: CPU0:800 MHz, CCB:400 MHz, DDR:333.333 MHz (666.667 MT/s data rate) (Asynchronous), IFC:100 MHz
L1:    D-cache 32 kB enabled
       I-cache 32 kB enabled
Board: P1014RDB
SPI:   ready
DRAM:  128 MiB
L2:    256 KB enabled
Using default environment
PC
(read more)
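As referenced in the excerpt above, whether a GPU fits behind a small host window comes down to its BAR sizes. Here is a small, Linux-specific Go sketch (mine, not from the write-up; the PCI device address is a placeholder) that reads a card's region sizes from sysfs, where each line of the `resource` file holds the start address, end address, and flags of one region.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

func main() {
	// Placeholder PCI address; substitute the GPU's address from `lspci -D`.
	path := "/sys/bus/pci/devices/0000:01:00.0/resource"

	f, err := os.Open(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	region := 0
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) < 2 {
			continue
		}
		// Fields are hex values like 0x00000000e0000000; base 0 handles the prefix.
		start, _ := strconv.ParseUint(fields[0], 0, 64)
		end, _ := strconv.ParseUint(fields[1], 0, 64)
		if start == 0 && end == 0 {
			region++ // unused region
			continue
		}
		size := end - start + 1
		fmt.Printf("region %d: %d MiB\n", region, size/(1024*1024))
		region++
	}
}
```

If the largest region reported here is bigger than the 32 or 64 MiB windows mentioned above, the host CPU cannot map the card without the kinds of workarounds the excerpt describes.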
Nix flake for RPython interpreters

Recent commits:
67a44875  2024-03-02 17:02:10  Corbin  Use Cachix.
443b1603  2024-03-02 15:02:08  Corbin  Add pyrolog.
eb4df1f4  2024-03-01 17:48:47  Corbin  Use setup hooks to prepare build directories. This was f...
33974c3e  2024-03-01 15:08:45  Corbin  Support all upstream systems. Note that I have not *test...
d0f70b51  2024-03-01 14:35:46  Corbin  Document licenses, when possible. Some code, particularl...
ce1d6da5  2024-03-01 10:36:21  Corbin  pysom: Add check phase. These tests are hermetic, so it'...
2423c2d5  2024-02-10 14:57:54  Corbin  Use helper function for rest of open-coded interpreters.
f2
(read more)
We've been talking about systems performance a lot lately. A few days back, Groq made news for breaking LLM inference benchmarks with its language processing unit (LPU) hardware. The success of the LPU i
(read more)
Last month I wrote about Neal Agarwal's web game Infinite Craft. Tom Fang wrote to tell me he's created a dictionary of Infinite Craft elements, along with their uses and recipes. This got me thinking about the game's mathematical structure. By "mathematical structure," I mean something like how we make recipes and the metrics by which we might compare one recipe to another (a small sketch of one such comparison follows this excerpt). For example, if our goal is to make Sandwich, we could do it like this:

Wave = Water + Wind
Steam = Fire + Water
Plant = Earth + Water
Sand = Earth + Wave
Tea = Plant + Steam
Sandwich = Sand + Tea

Or like this:

Wave = Water + Wind
Sand = Earth + Wave
Glass = Fire + Sand
Wine = Glass + Water
Sandwich =
(read more)
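As referenced in the excerpt above, here is a small Go sketch (mine, not from the post) that treats a recipe as an ordered list of combination steps and scores it with two simple metrics: the number of steps and the number of distinct elements it touches.

```go
package main

import "fmt"

// A step combines two existing elements into a new one.
type step struct {
	result, a, b string
}

// metrics returns the number of steps in a recipe and the number of
// distinct elements (inputs and outputs) it touches.
func metrics(recipe []step) (steps, elements int) {
	seen := map[string]bool{}
	for _, s := range recipe {
		seen[s.result] = true
		seen[s.a] = true
		seen[s.b] = true
	}
	return len(recipe), len(seen)
}

func main() {
	// The first Sandwich recipe from the excerpt above.
	sandwich := []step{
		{"Wave", "Water", "Wind"},
		{"Steam", "Fire", "Water"},
		{"Plant", "Earth", "Water"},
		{"Sand", "Earth", "Wave"},
		{"Tea", "Plant", "Steam"},
		{"Sandwich", "Sand", "Tea"},
	}
	steps, elems := metrics(sandwich)
	fmt.Printf("Sandwich recipe: %d steps, %d distinct elements\n", steps, elems)
}
```

Comparing two recipes for the same target then reduces to comparing these counts (or whatever other metric you prefer, such as depth of the combination tree).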
Magical shell history: Atuin replaces your existing shell history with a SQLite database, and records additional context for your commands
(read more)
A datalog system that makes it easy to query the web / integrate with existing APIs. This has been running through my head for a few years now and I do occasionally try to work on it, but it's a big project and I feel like I need my time outside of work to unwind. It pains me how much data / information just floats about but we don't have anything generic to interact with it. The whole semantic web / RDF / SPARQL story comes close to providing a solution to this problem, but it's way too unergonomic.

Reminds me of Pengines, have you heard of it? If not, there's a nice talk about it by Anne Ogborn from StrangeLoop 2014.

Working on something like this, but as a general programming language.

Do you know about trustfall? You still would need to write the adapters, but from a query engine standpoint it is really nice.

I've posted this in the past, but one of my dream tools is a micro package manager that quickly vendors 1-file packages. Python micro library management could look like:

$ micro-pip install "is-even>0.1" -o utils
$ cat utils.py
# micro-pip: is-even==0.1.1
def is_even(x):
    return x % 2 == 0

$ micro-pip upgrade utils
$ cat utils.py
# micro-pip: is-even==0.2.0
def is_even(x: int) -> bool:
    return x % 2 == 0

Or even simpler:

$ micro-pip install https://github.com/acme/is-even/blob/main/src/is_even.py -o utils

Keep the file tracked in the source repository; the metadata is as simple as a comment. The micro-package could be a good old setup.py. With some smarter parsing, it could be possible to allow edits to vendored functions and use merging techniques to apply upgrades. It could also be possible to keep multiple micro-packages in the same file. This would fix many of the arguments against micro libraries: keep them in the VCS and reviewable while keeping them easy to upgrade and test.
(read more)
March 2024

I saw the One Billion Row Challenge a couple of weeks ago, and it thoroughly nerd-sniped me, so I went to Go solve it. I'm late to the party, as the original competition was in January. It was also in Java. I'm not particularly interested in Java, but I've been interested in optimising Go code for a while.

This challenge was particularly simple: process a text file of weather station names and temperatures, and for each weather station, print out the minimum, mean, and maximum. There are a few other constraints to make it simpler, though I ignored the Java-specific ones. Here are a few lines of example input:

Hamburg;12.0
Bulawayo;8.9
Palembang;38.8
St. John's;15.2
Cracow;12.6
...

The only catch: the input file has one billion rows (lines). That's about 13GB of data. I've already figured out that disk I/O is no longer the bottleneck – it's usually memory allocations and parsing that slow things down in a program like this.

This article describes the nine solutions I wrote in Go, each faster than the previous. The first, a simple and idiomatic solution (a sketch of that kind of baseline follows this excerpt), runs in 1 minute 45 seconds on my machine, while the last one runs in about 4 seconds. As I go, I'll show how I used Go
(read more)
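For context on the baseline referenced in the excerpt above, here is a short Go sketch of a simple, idiomatic approach to the task: scan the file line by line, split on ';', and track min/sum/count/max per station. This is my illustration, not the article's actual first solution, and the input path is a placeholder.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"sort"
	"strconv"
	"strings"
)

type stats struct {
	min, max, sum float64
	count         int
}

func main() {
	f, err := os.Open("measurements.txt") // placeholder path
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	stations := make(map[string]*stats)
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		// Each line looks like "Hamburg;12.0".
		name, tempStr, ok := strings.Cut(scanner.Text(), ";")
		if !ok {
			continue
		}
		temp, err := strconv.ParseFloat(tempStr, 64)
		if err != nil {
			continue
		}
		s := stations[name]
		if s == nil {
			s = &stats{min: temp, max: temp}
			stations[name] = s
		}
		if temp < s.min {
			s.min = temp
		}
		if temp > s.max {
			s.max = temp
		}
		s.sum += temp
		s.count++
	}

	// Print min/mean/max per station, sorted by name.
	names := make([]string, 0, len(stations))
	for name := range stations {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		s := stations[name]
		fmt.Printf("%s=%.1f/%.1f/%.1f\n", name, s.min, s.sum/float64(s.count), s.max)
	}
}
```

The faster solutions in the article presumably attack exactly the costs visible here: per-line allocations, float parsing, and map lookups.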
What is pyproject.nix

Pyproject.nix is a collection of Nix utilities for working with Python project metadata in Nix. It mainly targets PEP-621 compliant pyproject.toml files and data formats, but also implements support for other & legacy formats such as Poetry & requirements.txt. Pyproject.nix aims to be a swiss army knife of simple, customizable utilities that work together with the nixpkgs Python infrastructure.

Foreword

This documentation only helps you to get started with pyproject.nix. As it's a toolkit with many use cases, not every use case can be documented fully. This documentation is centered around packaging Python applications & managing development environments. For other use cases, see the reference documentation.

Concepts

pyproject.nix introduces a few high-level abstract concepts. The best way to get started is to understand these concepts and how they fit together.

Project

A project attribute set is a high-level representation of a project that includes:

The parsed pyproject.toml file
Parsed dependencies
Project root directory

It can be loaded from many different sources:

PEP-621 pyproject.toml
PEP-621 pyproject.toml with PDM extensions
Poetry pyproject.toml
requirements.txt

Validators

Validators work on dependency constraints as defined in a project and offer validation for them. This can be useful to check that a package set is compliant with the specification.

Renderers

A renderer takes a project together with a Python interpreter derivation and renders it into a form understood by various pieces of nixpkgs Python infrastructure. For example, the buildPythonPackage renderer returns an attribute set that can be passed to either the nixpkgs function buildPythonPackage or buildPythonApplication. There might be information missing from what a renderer returned depending on what can be computed from the project. If any attributes are missing, you can manually merge your own attribute set with what the renderer returned.

Tying it together

For a concrete example, see Use cases -> pyproject.toml.
(read more)
Published on 26 February 2024

"Tell me what you like about Go"

A few weeks ago I was asked what I like about the Go language. It was during a job interview, and at that moment I realized I hadn't really given much thought to that, even though I've been using Go for almost every project for a while. After some thought, I decided to write it down. I share my experience from two perspectives:

Ops perspective: on deploying and managing compiled Go programs. (Maybe my most important aspect.)
Developer perspective: on the Go language itself, the tooling, and the ecosystem.

The Ops Perspective

Performance. For my purposes and the kind of services I write, Go offers performance that is more than decent, and it allows me to run Go programs on small, cheaper instances and limited serverless runtimes, such as AWS Lambda Functions and Scaleway Serverless Jobs. For serverless execution, good performance is important because I am charged based on the execution time. Okay, maybe Go is not as efficient as Rust or C++, but coming from a heavy Python background, Go's speed seems just stratospheric to me.

Cross-compilation is a breeze. Ease of deployment on heterogeneous environments with multiple architectures is paramount. I write services running in the Cloud (serverless or IaaS), on bare-metal servers, on my laptop, and even on my Raspberry Pi 4. With the dawn of Arm64 servers (and maybe RISC-V 64 in the future?), I need cross-compilation that just works on my machine and the CI without turmoil.

Static binaries and containerization. As previously stated, some of my projects run on heterogeneous environments. Naturally, containerization is thus the privileged method to package and deploy my services. Containerizing Go programs is quite easy, and since Go compiles fast, multi-stage image builds are pretty fast as well. Having static Go binaries allows me to choose very lightweight base images for containers, such as Alpine Linux, without worrying about libc dependencies. My images are typically a few megabytes.

Startup time. When I work with orchestrators such as Kubernetes or AWS ECS I need containers to start fast, so rolling a new version or scaling out takes l
(read more)
A few months ago I introduced you to one of the more notable Apple pre-production units in my collection, a late prototype Macintosh Portable. But it turns out it's notable not merely for what it is but for what it has on it: a beta version of System 6.0.6 (the doomed release that Apple pulled due to bugs), Apple sales databases, two online services — the maligned Mac Prodigy client, along with classic AppleLink as used by Apple staff — and two presentations, one on Apple's current Macintosh line and one on the upcoming System 7. Now that I've got the infamous Conner hard drive it came with safely copied over, it's time to explore its contents some more. We'll start with this Macintosh Portable itself and Apple's sales channel applications, moving from there to a brief presentation
(read more)
I’ve been thinking about writing this post for a long time. Normally I haunt the comment section on Hacker News and whenever an article about GA comes up I post something to the effect of: “GA is okay but it’s not as good as those people say, there’s something wrong with it, what you really want is the wedge product on its own!” Which is not especially productive and probably slightly unhinged. So today I want to actually make that point in one central place that I can link to instead. To be clear I’m not opposed to GA per se. What I have a problem with is some of the details of GA, and the fact that the proponents of GA haven’t fixed those details yet. In particular: Hestenes’ “geometric product” is a bad mathematical operation that needs to be discarded, and do
(read more)
Before the widespread existence of software repositories like CPAN, NPM, and PyPI, developers seeking to reuse an existing algorithm or library of routines would either check books or journals for code, or they just might post a classified ad:

Request posted in Decuscope 1965, Vol 4/Iss 2

User groups provided catalogues of software, from mathematical algorithms to system utilities to games and demos. Leveraging the user group's periodicals, developers could post requests for specific examples of code. Or, more frequently, developers would review catalogs for existing solutions. They would contribute by sending their own creations to the group for others to use. In this article, we will examine how these user groups coordinated development and shared code, how they promoted discoverability of software, and how they attempted to maintain a high bar of quality. While the importance of a set of reusable subroutines to reduce the cost of development was noted in (Goldstine 1947) and the first set of published subroutines came out in (Wheeler 1951), the lack of standardization between computers and sites meant that it was concepts that were being s
(read more)
February 29, 2024

Getting a fiber Internet connection to your home is a big deal! It's probably the last physical connection you'll ever need, due to the virtually unlimited bandwidth, stability, performance, and attainable speeds. Having your ISP drop it off at the building entrance, however, is not enough. Wiring within a building is often needed, especially if you need to reach apartments, mechanical rooms, and all sorts of places where networking may be required. In the spirit of DIY, as I'm the ISP, I decided to upgrade my existing wiring in my home and document the process here, for everyone to read, learn from, and enjoy. So to get started, let's understand what we're dealing with.

The building

The building I am installing fiber to has 4 floors, it has access to two streets, on opposite sides, and has three vertical paths that cut almost across its entire height. There are also two manholes, one on each street, where it's possible to accept fiber optic cables from the outside world. Here's a visual representation of that, using my advanced architectural design skills:

These vertical paths are either pre-existing, or they were created in previous work, and they allow for the easy traversal of cables within conduit across floors, following all building and safety codes. As buildings in Europe have concrete slabs, single or dual brick walls, insulation layers, pipes, and cables within each floor, having these paths there helps immensely with implementation time, cost, and overall effort involved.

The design

Doing work on Layer 1 of the TCP/IP model is tedious and slow, requires extra materials that may not be in stock at home, and creates a mess. You're drillin
(read more)
Abstract: Safety is critical to the usage of large language models (LLMs). Multiple techniques such as data filtering and supervised fine-tuning have been developed to strengthen LLM safety. However, currently known techniques presume that corpora used for safety alignment of LLMs are solely interpreted by semantics. This assumption, however, does not hold in real-world applications, which leads to severe vulnerabilities in LLMs. For example, users of forums often use ASCII art, a form of text-based art, to convey image information. In this paper, we propose a novel ASCII art-based jailbreak attack and introduce a comprehensive benchmark Vision-in-Text Challenge (ViTC) to evaluate the capabilities of LLMs in recognizing prompts that cannot be solely interpreted by semantics. We show that five SOTA LLMs (GPT-3.5, GPT-4, Gemini, Claude, and Llama2) struggle to recognize prompts provided in the form of ASCII art. Based on this observation, we develop the jailbreak attack ArtPrompt, which leverages the poor performance of LLMs in recognizing ASCII art to bypass safety measures and elicit undesired behaviors from LLMs. ArtPrompt only requires black-box access to the victim LLMs, making it a practical attack. We evaluate ArtPrompt on five SOTA LLMs, and show that ArtPrompt can effectively and efficiently induce undesired behaviors from all five LLMs.
(read more)
Recently, tsnsrv has been getting a lot of high-quality contributions that add better support for Headscale and custom certificates, among other things. As they always do when things change, bugs crept in, and frustratingly, not in a way that existing tests could have caught: Instead of the Go code (which has mildly decent test coverage), the bugs were in the NixOS module! This was a great opportunity to investigate whether we can test the tsnsrv NixOS module, and maybe improve the baseline quality of the codebase as a whole. A recent blog post series on the NixOS test driver (part 1, part 2) made the ro
(read more)
Publish your site with one command

When your site is ready to be published, copy the files to our server with a familiar command:

rsync -rv public/ pgs.sh:/myproj

That's it! There's no need to formally create a project; we create them on-the-fly. Further, we provide TLS for every project automatically.

Manage your projects with a remote CLI

Use our CLI to manage your projects:

ssh pgs.sh help

Instant promotion and rollback

Additi
(read more)