Building a Better Middleman

O'Reilly Radar - Tue, 2022/04/19 - 05:22

What comes to mind when you hear the term “two-sided market?” Maybe you imagine a Party A who needs something, so they interact with Party B who provides it, and that’s that.  Despite the number “two” in the name, there’s actually someone else involved: the middleman.  This entity sits between the parties to make it easier for them to interact. (We can generalize that “two” to some arbitrary number and call this an N-sided market or multi-sided marketplace. But we’ll focus on the two-sided form for now.)

Two-sided markets are a fascinating study. They are also quite common in the business world, and therefore, so are middlemen. Record labels, rideshare companies, even dating apps all fall under this umbrella.  The role has plenty of perks, as well as some sizable pitfalls.  “Middleman” often carries a negative connotation because, in all fairness, some of them provide little value compared to what they ask in return.

Still, there’s room for everyone involved—Party A, Party B, and the middleman—to engage in a happy and healthy relationship.  In this first article, I’ll explain more about the middleman’s role and the challenges they face.  In the next article, I’ll explore what it takes to make a better middleman and how technology can play a role.

Paving the Path

When I say that middlemen make interactions easier, I mean that they address a variety of barriers:

  • Discovery: “Where do I find the other side of my need or transaction?” Dating apps like OKCupid, classified ads services such as Craigslist, and directory sites like Angi (formerly Angie’s List) are all a twist on a search engine. Party A posts a description of themselves or their service; Party B scrolls and sifts the list, evaluating potential matches for fit.
  • Matching: “Should we interact? Are our needs compatible?” Many middlemen that help with discovery also handle the matching for you, as with ride-share apps.  Instead of you having to scroll through lists of drivers, Uber and Lyft use your phone’s GPS to pair you with someone nearby.  (Compared to the Discovery case, Matching works best when one or both counterparties are easily interchangeable.)
  • Standardization: “The middleman sets the rules of engagement, so we all know what to expect.”  A common example would be when a middleman like eBay sets the accepted methods of payment.  By narrowing the scope of what’s possible—by limiting options—the middleman standardizes how the parties interact.
  • Safety: “I don’t have to know you in order to exchange money with you.” Stock market exchanges and credit card companies build trust with Party A and Party B, individually, so the two parties (indirectly) trust each other through the transitive property.
  • Simplicity: “You two already know each other; I’ll insert myself into the middle, to make the relationship smoother.” Stripe and Squarespace make it easier for companies to sell goods and services by handling payments.  And then there’s Squire, which co-founder Songe Laron describes as the “operating system for the barber shop, [handling] everything from the booking, to the payment, to the point of sales system, to payroll,” and a host of other frictions between barber and customer.  In all cases, each party gets to focus on what it does best (selling goods or cutting hair) while the middleman handles the drudgework.
Nice Work, If You Can Get It

As for their business model, middlemen usually take a cut of transactions as value moves from Party A to Party B. And this arrangement has its benefits.

For one, you’re first in line to get paid: Party A pays you, you take a cut, then you pass the rest on to Party B.  Record labels and book publishers are a common example.  They pair a creator with an audience.  All of the business deals for that creator’s work run through the middleman, who collects the revenue from sales and takes their share along the way.

(The music biz is littered with stories of artists getting a raw deal—making a small percentage of revenue from their albums, while the label takes the lion’s share—but that’s another story.)
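The basic pay flow described above is easy to pin down as arithmetic. A toy sketch (the function name, the 15% cut, and the dollar amounts are all made up for illustration; real label deals are far more complicated):

```python
def split_payment(gross: float, middleman_cut: float) -> tuple[float, float]:
    """Party A pays `gross`; the middleman keeps its cut and
    passes the remainder on to Party B."""
    fee = round(gross * middleman_cut, 2)
    return fee, round(gross - fee, 2)

# A $20 album sale with a hypothetical 15% label cut:
fee, artist_share = split_payment(20.00, 0.15)
print(fee, artist_share)  # 3.0 17.0
```

The point is the direction of the flow: the middleman touches the money first, so it is paid first, before the creator sees anything.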

Then there’s the opportunity for recurring revenue, if Party A and Party B have an ongoing relationship.  Companies often turn to tech staffing agencies to find staff-augmentation contractors.  Those agencies typically take a cut for the entire duration of the project or engagement, which can run anywhere from a few weeks to more than a decade.  The staffing agency makes one hell of a return on their efforts when placing such a long-term contractor. Nice work, if you can get it.

Staffing agencies may have to refund a customer’s money if a contractor performs poorly.  Some middlemen, however, make money no matter how the deal ultimately turns out.  Did I foolishly believe my friend’s hot stock tip, in his drunken reverie, and pour my savings into a bad investment? Well, NYSE isn’t going to refund my money, which means they aren’t about to lose their cut.

A middleman also gets a bird’s-eye view of the relationships it enables.  It sees who interacts with whom, and how that all happens.  Middlemen that run online platforms have the opportunity to double-dip on their revenue model: first by taking their cut from an interaction, then by collecting and analyzing data around each interaction.  Everything from an end-user’s contact or demographic details, to exploring patterns of how they communicate with other users, can be packaged up and resold.  (This is, admittedly, a little shady. We’ll get to middlemen’s abuse of privilege shortly.)

Saddling Some Burdens, Too

Before you rush out to build your own middleman company, recognize that it isn’t all easy revenue.  You first need to breathe the platform into existence, so the parties can interact.  Depending on the field, this can involve a significant outlay of capital, time, and effort.  Then you need to market the platform so that everyone knows where to go to find the Party B to their Party A.

Once it’s up and running, maintenance costs can be low if you keep things simple.  (Consider the rideshare companies that own the technology platform, but not the vehicles in which passengers ride.) But until you reach that cruising altitude, you’re crossing your fingers that things pan out in your favor.  That can mean a lot of sleepless nights and stressful investor calls.

The middleman’s other big challenge is that they need to keep all of those N sides of the N-sided market happy.  The market only exists because all of the parties want to come together, and your service persists only because they want to come together through you.  If one side gets mad and leaves, the other side(s) will soon follow.  Keeping the peace can be a touchy balancing act.

Consider Airbnb.  Early in the pandemic they earned praise from guests by allowing them to cancel certain bookings without penalty.  It then passed those “savings” on to hosts, who weren’t too happy about the lost revenue.  (Airbnb later created a fund to support hosts, but some say it still fell short.)  The action sent a clear—though, likely, unintentional and incorrect—message that Airbnb valued guests more than hosts.  A modern-day version of robbing Peter to pay Paul.

Keeping all sides happy is a tough line for a middleman to walk.  Mohanbir Sawhney, from Northwestern University’s McCormick Foundation, summed this up well: “In any two-sided market, you always have to figure out who you’re going to subsidize more, and who you’re going to actually screw more.” It’s easy for outsiders to say that Airbnb should have just eaten the losses—refunded guests’ money while letting hosts keep their take—but that’s easier said than done.  In the end, the company still has to subsidize itself, right?

The subsidize versus screw decision calculus gets even more complicated when one side only wants you but doesn’t need you.  In the Airbnb case, the company effectively serves as a marketing arm and payments processor for property owners.  Any sufficiently motivated owner is just one step away from handling that on their own, so even a small negative nudge can send them packing.  (In economics terms, we say that those owners’ switching costs are low.)

The same holds for the tech sector, where independent contractors can bypass staffing firms to hang their own shingle.  Even rideshare drivers have a choice.  While it would be tougher for them to get their own taxi medallion, they can switch from Uber to Lyft.  Or, as many do, they can sign up with both services so that switching costs are effectively zero: “delete Uber app, keep the Lyft app running, done.”

Making Enemies

Even with those challenges, delivering on the middleman’s raison d’être—“keep all parties happy”—should be a straightforward affair.  (I don’t say “easy,” just “straightforward.” There’s a difference.) Parties A and B clearly want to be together, you’re helping them be together, so the experience should be a win all around.

Why, then, do middlemen have such a terrible reputation?  It mostly boils down to greed.

Once a middleman becomes a sufficiently large and/or established player, they become the de facto place for the parties to meet.  This is a near-monopoly status. The middleman no longer needs to care about keeping one or even both parties happy, they figure, because those groups either interact through the middleman or they don’t interact at all. (This also holds true for the near-cartel status of a group of equally unpleasant middlemen.)

Maybe the middleman suddenly raises fees, or sets onerous terms of service, or simply mistreats one side of the pairing.  This raises the dollar, effort, and emotional cost to the parties since they don’t have many options to leave.

Consider food-delivery apps, which consumers love but can take as much as a 30% cut of an order’s revenue.  That’s a large bite, but easier to swallow when a restaurant has a modest take-away business alongside a much larger dine-in experience. It’s quite another story when take-away is suddenly your entire business and you’re still paying rent on the empty dining room space. Most restaurants found themselves in just this position early in the COVID-19 pandemic. Some hung signs in their windows, asking customers to call them directly instead of using the delivery apps.

Involving a middleman in a relationship can also lead to weird principal-agent problems.  Tech staffing agencies (even those that paint themselves as “consultancies”) have earned a special place here.  Big companies hand such “preferred vendors” a strong moat by requiring contractors to pass through them in lieu of establishing a direct relationship. Since the middlemen can play this Work Through Us, or Don’t Work at All card, it’s no surprise that they’ve been known to take as much as 50% of the money as it passes from client to contractor.  The client companies don’t always know this, so they are happy that the staffing agency has helped them find software developers and DBAs. The contractors, many of whom are aware of the large cuts, aren’t so keen on the arrangement.

This is on top of limiting a tech contractor’s ability to work through a competing agency.  I’ve seen everything from thinly-veiled threats (“if the client sees your resume from more than one agency, they’ll just throw it out”) to written agreements (“this contract says you won’t go through another agency to work with this client”).   What if you’ve found a different agency that will take a smaller cut, so you get more money?  Or what if Agency 1 has done a poor job of representing you, while you know that Agency 2 will get it right?  In both cases, the answer is: tough luck.

A middleman can also resort to more subtle ways to mistreat the parties.  Uber has reportedly used a variety of techniques from behavioral science—such as gamification, and male managers adopting female personas when messaging drivers—to encourage drivers to work more.  They’ve also been accused of showing drivers and passengers different routes, charging the passenger for the longer way and paying the driver for the shorter way.

It’s Not All Easy Money

To be fair, middlemen do earn some of their cut. They provide value in that they reduce friction for both the buy and sell sides of an interaction.

This goes above and beyond building the technology for a platform.  Part of how the Deliveroos and Doordashes of the world connect diners to restaurants is by coordinating fleets of delivery drivers.  It would be expensive for a restaurant to do this on their own: hiring multiple drivers, managing the schedule, accounting for demand … and hoping business stays hot so that the drivers aren’t paid to sit idle. Similarly, tech staffing firms don’t just introduce you to contract talent. They also handle time-tracking, invoicing, and legal agreements. The client company cuts one large check to the staffing firm, which cuts lots of smaller checks to the individual contractors.

Don’t forget that handling contracts and processing payments come with extra regulatory requirements. Rules often vary by locale, and the middleman has to spend money to keep track of those rules.  So it’s not all profit.

(They can also build tools to avoid rules, such as Uber’s infamous “greyball” system … but that’s another story.)

That said, a middleman’s benefit varies by the industry vertical and even by the client.  Some argue that their revenue cut far exceeds the value they provide. In the case of tech staffing firms, I’ve heard plenty of complaints that recruiters take far too much money for just “having a phone number” (having a client relationship) and cutting a check, when it’s the contractor who does the actual work of building software or managing systems for the client.

A Win-Win-Win Triangle

Running a middleman has its challenges and risks.  It can also be tempting to misuse the role’s power.  Still, I say that there’s a way to build an N-sided marketplace where everyone can be happy.  I’ll explore that in the next article in this series.

(Many thanks to Chris Butler for his thoughtful and insightful feedback on early drafts of this article.  I’d also like to thank Mike Loukides for shepherding this piece into its final form.)

Categories: Technology

Virtual Presentation for April 14th

PLUG - Thu, 2022/04/14 - 13:33

This is a remote meeting. Please join online at 7pm on Thursday, April 14th.

der.hans: Jekyll static site generator

Jekyll is a simple-to-use static site generator.

For an easy, low-maintenance, low-resource site like a blog, a static site generator (SSG) avoids complexity. An SSG allows low-effort content creation and easy deployment, and the result can be hosted by a simple web server; there's no need to set up programming languages or lots of modules.
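For reference, the whole workflow the talk describes fits in a few commands (assuming Ruby is installed; the site name is a placeholder):

```shell
# One-time setup: install Jekyll and Bundler (requires Ruby)
gem install bundler jekyll

# Scaffold a new site, then build and preview it locally
jekyll new my-blog
cd my-blog
bundle exec jekyll serve    # builds into _site/, serves at http://localhost:4000
```

The generated `_site/` directory is plain HTML and CSS, so any simple web server or static host can serve it.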

About der.hans:
der.hans is a Free Software, technology and entrepreneurial veteran.

Hans is chairman of the Phoenix Linux User Group (PLUG); BoF organizer, jobs board maintainer, and jobs night organizer for the Southern California Linux Expo (SCaLE); and founder of the Free Software Stammtisch along with the Stammtisch Job Nights.

He currently leads a database support engineering team at ObjectRocket; most likely, anything Hans says publicly was not approved by $dayjob.

The General Purpose Pendulum

O'Reilly Radar - Tue, 2022/04/12 - 04:59

Pendulums do what they do: they swing one way, then they swing back the other way.  Some oscillate quickly; some slowly; and some so slowly you can watch the earth rotate underneath them. It’s a cliché to talk about any technical trend as a “pendulum,” though it’s accurate often enough.

We may be watching one of computing’s longest-term trends turn around, becoming the technological equivalent of Foucault’s very long, slow pendulum: the trend towards generalization. That trend has been swinging in the same direction for some 70 years–since the invention of computers, really.  The first computers were just calculating engines designed for specific purposes: breaking codes (in the case of Britain’s Bombe) or calculating missile trajectories. But those primitive computers soon got the ability to store programs, making them much more flexible; eventually, they became “general purpose” (i.e., business) computers. If you’ve ever seen a manual for the IBM 360’s machine language, you’ll see many instructions that only make sense in a business context–for example, instructions for arithmetic in binary coded decimal.

That was just the beginning. In the 70s, word processors started replacing typewriters. Word processors were essentially early personal computers designed for typing–and they were quickly replaced by personal computers themselves. With the invention of email, computers became communications devices. With file sharing software like Napster and MP3 players like WinAmp, computers started replacing radios–then, when Netflix started streaming, televisions. CD and DVD players are inflexible, task-specific computers, much like word processors or the Bombe, and their functions have been subsumed by general-purpose machines.

The trend towards generalization also took place within software. Sometime around the turn of the millennium, many of us realized that web browsers (yes, even the early Mosaic, Netscape, and Internet Explorer) could be used as a general user interface for software; all a program had to do was express its user interface in HTML (using forms for user input), and provide a web server so the browser could display the page. It’s not an accident that Java was perhaps the last programming language to have a graphical user interface (GUI) library; other languages that appeared at roughly the same time (Python and Ruby, for example) never needed one.

If we look at hardware, machines have gotten faster and faster–and more flexible in the process. I’ve already mentioned the appearance of instructions specifically for “business” in the IBM 360. GPUs are specialized hardware for high-speed computation and graphics; however, they’re much less specialized than their ancestors, dedicated vector processors.  Smartphones and tablets are essentially personal computers in a different form factor, and they have performance specs that beat supercomputers from the 1990s. And they’re also cameras, radios, televisions, game consoles, and even credit cards.

So, why do I think this pendulum might start swinging the other way?  A recent article in the Financial Times, Big Tech Raises its Bets on Chips, notes that Google and Amazon have both developed custom chips for use in their clouds. It hypothesizes that the next generation of hardware will be one in which chip development is integrated more closely into a wider strategy.  More specifically, “the best hope of producing new leaps forward in speed and performance lies in the co-design of hardware, software and neural networks.” Co-design sounds like designing hardware that is highly optimized for running neural networks, designing neural networks that are a good match for that specific hardware, and designing programming languages and tools for that specific combination of hardware and neural network. Rather than taking place sequentially (hardware first, then programming tools, then application software), all of these activities take place concurrently, informing each other. That sounds like a turn away from general-purpose hardware, at least superficially: the resulting chips will be good at doing one thing extremely well.

It’s also worth noting that, while there is a lot of interest in quantum computing, quantum computers will inevitably be specialized processors attached to conventional computers. There is no reason to believe that a quantum computer can (or should) run general purpose software such as software that renders video streams, or software that calculates spreadsheets. Quantum computers will be a big part of our future–but not in a general-purpose way. Both co-design and quantum computing step away from general-purpose computing hardware. We’ve come to the end of Moore’s Law, and can’t expect further speedups from hardware itself.  We can expect improved performance by optimizing our hardware for a specific task.

Co-design of hardware, software, and neural networks will inevitably bring a new generation of tools to software development. What will those tools be? Our current development environments don’t require programmers to know much (if anything) about the hardware. Assembly language programming is a specialty that’s really only important for embedded systems (and not all of them) and a few applications that require the utmost in performance. In the world of co-design, will programmers need to know more about hardware? Or will a new generation of tools abstract the hardware away, even as they weave the hardware and the software together even more intimately? I can certainly imagine tools with modules for different kinds of neural network architectures; they might know about the kind of data the processor is expected to deal with; they might even allow a kind of “pre-training”–something that could ultimately give you GPT-3 on a chip. (Well, maybe not on a chip. Maybe a few thousand chips designed for some distributed computing architecture.) Will it be possible for a programmer to say “This is the kind of neural network I want, and this is how I want to program it,” and let the tool do the rest? If that sounds like a pipe-dream, realize that tools like GitHub Copilot are already automating programming.

Chip design is the poster child for “the first unit costs 10 billion dollars; the rest are all a penny apiece.”  That has limited chip design to well-financed companies that are either in the business of selling chips (like Intel and AMD) or that have specialized needs and can buy in very large quantities themselves (like Amazon and Google). Is that where it will stop–increasing the imbalance of power between a few wealthy companies and everyone else–or will co-design eventually enable smaller companies (and maybe even individuals) to build custom processors? To me, co-design doesn’t make sense if it’s limited to the world’s Amazons and Googles. They can already design custom chips.  It’s expensive, but that expense is itself a moat that competitors will find hard to cross. Co-design is about improved performance, yes; but as I’ve said, it’s also inevitably about improved tools.  Will those tools result in better access to semiconductor fabrication facilities?

We’ve seen that kind of transition before. Designing and making printed circuit boards used to be hard. I tried it once in high school; it requires acids and chemicals you don’t want to deal with, and a hobbyist definitely can’t do it in volume. But now, it’s easy: you design a circuit with a free tool like Kicad or Fritzing, have the tool generate a board layout, send the layout to a vendor through a web interface, and a few days later, a package arrives with your circuit boards. If you want, you can have the vendor source the board’s components and solder them in place for you. It costs a few tens of dollars, not thousands. Can the same thing happen at the chip level? It hasn’t yet. We’ve thought that field-programmable gate arrays might eventually democratize chip design, and to a limited extent, they have. FPGAs aren’t hard for small- or mid-sized businesses that can afford a few hardware engineers, but they’re far from universal, and they definitely haven’t made it to hobbyists or individuals.  Furthermore, FPGAs are still standardized (generalized) components; they don’t democratize the semiconductor fabrication plant.

What would “cloud computing” look like in a co-designed world? Let’s say that a mid-sized company designs a chip that implements a specialized language model, perhaps something like O’Reilly Answers. Would they have to run this chip on their own hardware, in their own datacenter?  Or would they be able to ship these chips to Amazon or Google for installation in their AWS and GCP data centers?  That would require a lot of work standardizing the interface to the chip, but it’s not inconceivable.  As part of this evolution, the co-design software will probably end up running in someone’s cloud (much as AWS Sagemaker does today), and it will “know” how to build devices that run on the cloud provider’s infrastructure. The future of cloud computing might be running custom hardware.

We inevitably have to ask what this will mean for users: for those who will use the online services and physical devices that these technologies enable. We may be seeing that pendulum swing back towards specialized devices. A product like Sonos speakers is essentially a re-specialization of the device that was formerly a stereo system, then became a computer. And while I (once) lamented the idea that we’d eventually all wear jackets with innumerable pockets filled with different gadgets (iPods, i-Android-phones, Fitbits, Yubikeys, a collection of dongles and earpods, you name it), some of those products make sense:  I lament the loss of the iPod, as distinct from the general purpose phone. A tiny device that could carry a large library of music, and do nothing else, was (and would still be) a wonder.

But those re-specialized devices will also change. A Sonos speaker is more specialized than a laptop plugged into an amp via the headphone jack and playing an MP3; but don’t mistake it for a 1980s stereo, either. If inexpensive, high-performance AI becomes commonplace, we can expect a new generation of exceedingly smart devices. That means voice control that really works (maybe even for those who speak with an accent), locks that can identify people accurately regardless of skin color, and appliances that can diagnose themselves and call a repairman when they need to be fixed. (I’ve always wanted a furnace that could notify my service contractor when it breaks at 2AM.) Putting intelligence on a local device could improve privacy–the device wouldn’t need to send as much data back to the mothership for processing. (We’re already seeing this on Android phones.) We might get autonomous vehicles that communicate with each other to optimize traffic patterns. We might go beyond voice controlled devices to non-invasive brain control. (Elon Musk’s Neuralink has the right idea, but few people will want sensors surgically embedded in their brains.)

And finally, as I write this, I realize that I’m writing on a laptop–but I don’t want a better laptop. With enough intelligence, would it be possible to build environments that are aware of what I want to do? And offer me the right tools when I want them (possibly something like Bret Victor’s Dynamicland)? After all, we don’t really want computers.  We want “bicycles for the mind”–but in the end, Steve Jobs only gave us computers.

That’s a big vision that will require embedded AI throughout. It will require lots of very specialized AI processors that have been optimized for performance and power consumption. Creating those specialized processors will require re-thinking how we design chips. Will that be co-design, designing the neural network, the processor, and the software together, as a single piece? Possibly. It will require a new way of thinking about tools for programming–but if we can build the right kind of tooling, “possibly” will become a certainty.

Categories: Technology

Radar trends to watch: April 2022

O'Reilly Radar - Tue, 2022/04/05 - 04:32

March was a busy month, especially for developers working with GPT-3. After GPT-3 surprised everybody with its ability to write code, it’s no surprise that it is appearing in other phases of software development. One group has written a tool that creates regular expressions from verbal descriptions; another tool generates Kubernetes configurations from verbal descriptions. In his newsletter, Andrew Ng talks about the future of low-code AI: it’s not about eliminating coding, but eliminating the need to write all the boilerplate. The latest developments with large language models like GPT-3 suggest that the future isn’t that distant.

On the other hand, the US copyright office has determined that works created by machines are not copyrightable. If software is increasingly written by tools like Copilot, what will this say about software licensing and copyright?

Artificial Intelligence
  • An unusual form of matter known as spin glass can potentially allow the implementation of neural network algorithms in hardware. One particular kind of network allows pattern matching based on partial patterns (for example, face recognition based on a partial face), something that is difficult or impossible with current techniques.
  • OpenAI has extended GPT-3 to do research on the web when it needs information that it doesn’t already have.
  • Data-centric AI is gaining steam, in part because Andrew Ng has been pushing it consistently. Data-centric AI claims that the best way to improve the AI is to improve the data, rather than the algorithms. It includes ideas like machine-generated training data and automatic tagging. Christopher Ré, at one of the last Strata conferences, noted that data collection was the part of AI that was most resistant to improvement.
  • We’ve seen that GPT-3 can generate code from English language comments. Can it generate Kubernetes configurations from natural language descriptions?  Take a look at AI Kube Bot.
  • The US Copyright Office has determined that works created by an artificial intelligence aren’t copyrightable; copyright requires human authorship. This is almost certainly not the final word on the topic.
  • A neural network with a single neuron that is used many times may be as effective as large neural networks, while using much less energy.
  • Training AI models on synthetic data created by a generative model can be more effective than using real-world data. Although there are pitfalls, there’s more control over bias, and the data can be made to include unexpected cases.
  • For the past 70 years, computing has been dominated by general-purpose hardware: machines designed to run any code. Even vector processors and their descendants (GPUs) are fundamentally general purpose. The next steps forward in AI may involve software, hardware, and neural networks that are designed for each other.
  • Ithaca is a DeepMind project that uses deep learning to recover missing texts in ancient Greek documents and inscriptions.  It’s particularly interesting as an example of human-machine collaboration. Humans alone can do this work with 25% accuracy; Ithaca alone is 62% accurate; humans and Ithaca combined reach 72% accuracy.
  • Michigan is starting to build the infrastructure needed to support autonomous vehicles: dedicated lanes, communications, digital signage, and more.
  • Polycoder is an open source code generator (like Copilot) that uses GPT-2, which is also open sourced. Developers claim that Polycoder is better than Copilot for many tasks, including programming in C. Because it is open-source, it enables researchers to investigate how these tools work, including testing for security vulnerabilities.
  • New approaches to molecule design using self-supervised learning on unlabeled data promise to make drug discovery faster and more efficient.
  • The title says it all. Converting English to Regular Expressions with GPT-3, implemented as a Google sheet. Given Copilot, it’s not surprising that this can be done.
  • Researchers at MIT have developed a technique for injecting fairness into a model itself, even after it has been trained on biased data.
  • Low code programming for Python: Some new libraries designed for use in Jupyter Notebooks (Bamboo, Lux, and Mito) allow a graphical (forms-based) approach to working with data using Python’s Pandas library.
  • Will the Linkerd service mesh displace Istio?  Linkerd seems to be simpler and more attractive to small and medium-sized organizations.
  • The biggest problem with Stack Overflow is the number of answers that are out of date.  There’s now a paper studying the frequency of out-of-date answers.
  • Silkworm-based encryption: Generating good random numbers is a difficult problem. One surprising new source of randomness is silk.  While silk appears smooth, it is (not surprisingly) very irregular at a microscopic scale.  Because of this irregularity, passing light through silk generates random diffraction patterns, which can be converted into random numbers.
  • The Hub for Biotechnology in the Built Environment (HBBE) is a research center that is rethinking buildings. They intend to create “living buildings” (and I do not think that is a metaphor) capable of processing waste and producing energy.
  • A change to the protein used in CRISPR to edit DNA reduces errors by a factor of 4000, without making the process slower.
  • Researchers have observed the process by which brains store sequences of memories. In addition to therapies for memory disorders, this discovery could lead to advances in artificial intelligence, which doesn’t yet really have the ability to create and process timelines or narratives.
  • Object detection in 3D is a critical technology for augmented reality (to say nothing of autonomous vehicles), and it’s significantly more complex than in 2D. Facebook/Meta’s 3DETR uses transformers to build models from 3D data.
  • Some ideas about what Apple’s AR glasses, Apple Glass, might be. Take what you want… Omitting a camera is a good idea, though it’s unclear how you’d make AR work. This article suggests LIDAR, but that doesn’t sound feasible.
  • According to the creator of Pokemon Go, the metaverse should be about helping people to appreciate the physical world, not about isolating them in a virtual world.
  • Jeff Carr has been publishing (and writing about) dumps of Russian data obtained by hackers from GRUMO, Ukraine’s cyber operations team.
  • Sigstore is a new kind of certificate authority (trusted root) that is addressing open source software supply chain security problems.  The goal is to make software signing “ubiquitous, free, easy, and transparent.”
  • Russia has created its own certificate authority to mitigate international sanctions. However, users of Chrome, Firefox, Safari, and other browsers originating outside of Russia would have to install the Russian root certificate manually to access Russian sites without warnings.
  • Corporate contact forms are replacing email as a vector for transmitting malware; BazarBackdoor [sic], now believed to be under development by the Conti ransomware group, is spreading this way.
  • Dirty Pipe is a newly discovered high-severity bug in the Linux kernel that allows any user to overwrite any file or obtain root privileges. Android phones are also vulnerable.
  • Twitter has created an onion service that is accessible through the Tor network. (Facebook has a similar service.)  This service makes Twitter accessible within Russia, despite government censorship.
  • The attackers attacked: A security researcher has acquired and leaked chat server logs from the Conti ransomware group. These logs include discussions of victims, Bitcoin addresses, and discussions of the group’s support of Russia.
  • Attackers can force Amazon Echo devices to hack themselves. Get the device to speak a command, and its microphone will hear the command and execute it. This misfeature includes controlling other devices (like smart locks) via the Echo.
  • The Anonymous hacktivist collective is organizing (to use that word very loosely) attacks against Russian digital assets. Among other things, they have leaked emails between the Russian defense ministry and their suppliers, and hacked the front pages of several Russian news agencies.
  • The Data Detox Kit is a quick guide to the bot world and the spread of misinformation.  Is it a bot or not?  This site has other good articles about how to recognize misinformation.
  • Sensor networks that are deployed like dandelion seeds! An extremely light, solar-powered framework for scattering RF-connected sensors lets the breeze do the work, allowing researchers to build networks with thousands of sensors easily. I’m concerned about cleanup afterwards, but this is a breakthrough, both in biomimicry and low-power hardware.
  • Semiconductor-based LIDAR could be the key to autonomous vehicles that are reasonably priced and safe. LIDAR systems with mechanically rotating lasers have been the basis for Google’s autonomous vehicles; they are effective, but expensive.
  • The open source instruction set architecture RISC-V is gaining momentum because it is enabling innovation at the lowest levels of hardware.
Quantum Computing
  • Microsoft claims to have made a breakthrough in creating topological qubits, which should be more stable and scalable than other approaches to quantum computing.
  • IBM’s quantum computer was used to simulate a time crystal, showing that current quantum computers can be used to investigate quantum processes, even if they can’t yet support useful computation.
  • Mozilla has published their vision for the future evolution of the web. The executive summary highlights safety, privacy, and performance. They also want to see a web on which it’s easier for individuals to publish content.
  • Twitter is expanding its crowdsourced fact-checking program (called Birdwatch). It’s not yet clear whether this has helped stop the spread of misinformation.
  • The Gender Pay Gap Bot (@PayGapApp) retweets corporate tweets about International Women’s Day with a comment about the company’s gender pay gap (derived from a database in the UK).
  • Alex Russell writes about a unified theory of web performance.  The core principle is that the web is for humans. He emphasizes the importance of latency at the tail of the performance distribution; improvements there tend to help everyone.
  • WebGPU is a new API that gives web applications the ability to do rendering and computation on GPUs.

AI Adoption in the Enterprise 2022

O'Reilly Radar - Thu, 2022/03/31 - 04:35

In December 2021 and January 2022, we asked recipients of our Data and AI Newsletters to participate in our annual survey on AI adoption. We were particularly interested in what, if anything, has changed since last year. Are companies farther along in AI adoption? Do they have working applications in production? Are they using tools like AutoML to generate models, and other tools to streamline AI deployment? We also wanted to get a sense of where AI is headed. The hype has clearly moved on to blockchains and NFTs. AI is in the news often enough, but the steady drumbeat of new advances and techniques has gotten a lot quieter.

Compared to last year, significantly fewer people responded. That’s probably a result of timing. This year’s survey ran during the holiday season (December 8, 2021, to January 19, 2022, though we received very few responses in the new year); last year’s ran from January 27, 2021, to February 12, 2021. Pandemic or not, holiday schedules no doubt limited the number of respondents.

Our results held a bigger surprise, though. The smaller number of respondents notwithstanding, the results were surprisingly similar to 2021. Furthermore, if you go back another year, the 2021 results were themselves surprisingly similar to 2020. Has that little changed in the application of AI to enterprise problems? Perhaps. We considered the possibility that the same individuals responded in both 2021 and 2022. That wouldn’t be surprising, since both surveys were publicized through our mailing lists—and some people like responding to surveys. But that wasn’t the case. At the end of the survey, we asked respondents for their email address. Among those who provided an address, there was only a 10% overlap between the two years.

When nothing changes, there’s room for concern: we certainly aren’t in an “up and to the right” space. But is that just an artifact of the hype cycle? After all, regardless of any technology’s long-term value or importance, it can only receive outsized media attention for a limited time. Or are there deeper issues gnawing at the foundations of AI adoption?

AI Adoption

We asked participants about the level of AI adoption in their organization. We structured the responses to that question differently from prior years, in which we offered four responses: not using AI, considering AI, evaluating AI, and having AI projects in production (which we called “mature”). This year we combined “evaluating AI” and “considering AI”; we thought that the difference between “evaluating” and “considering” was poorly defined at best, and if we didn’t know what it meant, our respondents didn’t either. We kept the question about projects in production, and we’ll use the words “in production” rather than “mature practice” to talk about this year’s results.

Despite the change in the question, the responses were surprisingly similar to last year’s. The same percentage of respondents said that their organizations had AI projects in production (26%). Significantly more said that they weren’t using AI: that went from 13% in 2021 to 31% in this year’s survey. It’s not clear what that shift means. It’s possible that it’s just a reaction to the change in the answers; perhaps respondents who were “considering” AI thought “considering really means that we’re not using it.” It’s also possible that AI is just becoming part of the toolkit, something developers use without thinking twice. Marketers use the term AI; software developers tend to say machine learning. To the customer, what’s important isn’t how the product works but what it does. There’s already a lot of AI embedded into products that we never think about.

From that standpoint, many companies with AI in production don’t have a single AI specialist or developer. Anyone using Google, Facebook, or Amazon (and, I presume, most of their competitors) for advertising is using AI. AI as a service includes AI packaged in ways that may not look at all like neural networks or deep learning. If you install a smart customer service product that uses GPT-3, you’ll never see a hyperparameter to tune—but you have deployed an AI application. We don’t expect respondents to say that they have “AI applications deployed” if their company has an advertising relationship with Google, but AI is there, and it’s real, even if it’s invisible.

Are those invisible applications the reason for the shift? Is AI disappearing into the walls, like our plumbing (and, for that matter, our computer networks)? We’ll have reason to think about that throughout this report.

Regardless, at least in some quarters, attitudes seem to be solidifying against AI, and that could be a sign that we’re approaching another “AI winter.” We don’t think so, given that the number of respondents who report AI in production is steady and up slightly. However, it is a sign that AI has passed to the next stage of the hype cycle. When expectations about what AI can deliver are at their peak, everyone says they’re doing it, whether or not they really are. And once you hit the trough, no one says they’re using it, even though they now are.

Figure 1. AI adoption and maturity

The trailing edge of the hype cycle has important consequences for the practice of AI. When it was in the news every day, AI didn’t really have to prove its value; it was enough to be interesting. But once the hype has died down, AI has to show its value in production, in real applications: it’s time for it to prove that it can deliver real business value, whether that’s cost savings, increased productivity, or more customers. That will no doubt require better tools for collaboration between AI systems and consumers, better methods for training AI models, and better governance for data and AI systems.

Adoption by Continent

When we looked at responses by geography, we didn’t see much change since last year. The greatest increase in the percentage of respondents with AI in production was in Oceania (from 18% to 31%), but that was a relatively small segment of the total number of respondents (only 3.5%)—and when there are few respondents, a small change in the numbers can produce a large change in the apparent percentages. For the other continents, the percentage of respondents with AI in production agreed within 2%.

Figure 2. AI adoption by continent

After Oceania, North America and Europe had the greatest percentages of respondents with AI in production (both 27%), followed by Asia and South America (24% and 22%, respectively). Africa had the smallest percentage of respondents with AI in production (13%) and the largest percentage of nonusers (42%). However, as with Oceania, the number of respondents from Africa was small, so it’s hard to put too much credence in these percentages. We continue to hear exciting stories about AI in Africa, many of which demonstrate creative thinking that is sadly lacking in the VC-frenzied markets of North America, Europe, and Asia.

Adoption by Industry

The distribution of respondents by industry was almost the same as last year. The largest percentages of respondents were from the computer hardware and financial services industries (both about 15%, though computer hardware had a slight edge), education (11%), and healthcare (9%). Many respondents reported their industry as “Other,” which was the third most common answer. Unfortunately, this vague category isn’t very helpful, since it featured industries ranging from academia to wholesale, and included some exotica like drones and surveillance—intriguing but hard to draw conclusions from based on one or two responses. (Besides, if you’re working on surveillance, are you really going to tell people?) There were well over 100 unique responses, many of which overlapped with the industry sectors that we listed.

We see a more interesting story when we look at the maturity of AI practices in these industries. The retail and financial services industries had the greatest percentages of respondents reporting AI applications in production (37% and 35%, respectively). These sectors also had the fewest respondents reporting that they weren’t using AI (26% and 22%). That makes a lot of intuitive sense: just about all retailers have established an online presence, and part of that presence is making product recommendations, a classic AI application. Most retailers using online advertising services rely heavily on AI, even if they don’t consider using a service like Google “AI in production.” AI is certainly there, and it’s driving revenue, whether or not they’re aware of it. Similarly, financial services companies were early adopters of AI: automated check reading was one of the first enterprise AI applications, dating to well before the current surge in AI interest.

Education and government were the two sectors with the fewest respondents reporting AI projects in production (9% for both). Both sectors had many respondents reporting that they were evaluating the use of AI (46% and 50%). These two sectors also had the largest percentage of respondents reporting that they weren’t using AI. These are industries where appropriate use of AI could be very important, but they’re also areas in which a lot of damage could be done by inappropriate AI systems. And, frankly, they’re both areas that are plagued by outdated IT infrastructure. Therefore, it’s not surprising that we see a lot of people evaluating AI—but also not surprising that relatively few projects have made it into production.

Figure 3. AI adoption by industry

As you’d expect, respondents from companies with AI in production reported that a larger portion of their IT budget was spent on AI than did respondents from companies that were evaluating or not using AI. 32% of respondents with AI in production reported that their companies spent over 21% of their IT budget on AI (18% reported that 11%–20% of the IT budget went to AI; 20% reported 6%–10%). Only 12% of respondents who were evaluating AI reported that their companies were spending over 21% of the IT budget on AI projects. Most of the respondents who were evaluating AI came from organizations that were spending under 5% of their IT budget on AI (31%); in most cases, “evaluating” means a relatively small commitment. (And remember that roughly half of all respondents were in the “evaluating” group.)

The big surprise was among respondents who reported that their companies weren’t using AI. You’d expect their IT expense to be zero, and indeed, over half of the respondents (53%) selected 0%–5%; we’ll assume that means 0. Another 28% checked “Not applicable,” also a reasonable response for a company that isn’t investing in AI. But a measurable number had other answers, including 2% (10 respondents) who indicated that their organizations were spending over 21% of their IT budgets on AI projects. 13% of the respondents not using AI indicated that their companies were spending 6%–10% on AI, and 4% of that group estimated AI expenses in the 11%–20% range. So even when our respondents report that their organizations aren’t using AI, we find that they’re doing something: experimenting, considering, or otherwise “kicking the tires.” Will these organizations move toward adoption in the coming years? That’s anyone’s guess, but AI may be penetrating organizations that are on the back side of the adoption curve (the so-called “late majority”).

Figure 4. Share of IT budgets allocated to AI

Now look at the graph showing the percentage of IT budget spent on AI by industry. Just eyeballing this graph shows that most companies are in the 0%–5% range. But it’s more interesting to look at what industries are, and aren’t, investing in AI. Computers and healthcare have the most respondents saying that over 21% of the budget is spent on AI. Government, telecommunications, manufacturing, and retail are the sectors where respondents report the smallest (0%–5%) expense on AI. We’re surprised at the number of respondents from retail who report low IT spending on AI, given that the retail sector also had a high percentage of practices with AI in production. We don’t have an explanation for this, aside from saying that any study is bound to expose some anomalies.

Figure 5. Share of IT budget allocated to AI, by industry

Bottlenecks

We asked respondents what the biggest bottlenecks were to AI adoption. The answers were strikingly similar to last year’s. Taken together, respondents with AI in production and respondents who were evaluating AI say the biggest bottlenecks were lack of skilled people and lack of data or data quality issues (both at 20%), followed by finding appropriate use cases (16%).

Looking at “in production” and “evaluating” practices separately gives a more nuanced picture. Respondents whose organizations were evaluating AI were much more likely to say that company culture is a bottleneck, a challenge that Andrew Ng addressed in a recent issue of his newsletter. They were also more likely to see problems in identifying appropriate use cases. That’s not surprising: if you have AI in production, you’ve at least partially overcome problems with company culture, and you’ve found at least some use cases for which AI is appropriate.

Respondents with AI in production were significantly more likely to point to lack of data or data quality as an issue. We suspect this is the result of hard-won experience. Data always looks much better before you’ve tried to work with it. When you get your hands dirty, you see where the problems are. Finding those problems, and learning how to deal with them, is an important step toward developing a truly mature AI practice. These respondents were somewhat more likely to see problems with technical infrastructure—and again, understanding the problem of building the infrastructure needed to put AI into production comes with experience.

Respondents who are using AI (the “evaluating” and “in production” groups—that is, everyone who didn’t identify themselves as a “non-user”) were in agreement on the lack of skilled people. A shortage of trained data scientists has been predicted for years. In last year’s survey of AI adoption, we noted that we’ve finally seen this shortage come to pass, and we expect it to become more acute. This group of respondents were also in agreement about legal concerns. Only 7% of the respondents in each group listed this as the most important bottleneck, but it’s on respondents’ minds.

And nobody’s worrying very much about hyperparameter tuning.

Figure 6. Bottlenecks to AI adoption

Looking a bit further into the difficulty of hiring for AI, we found that respondents with AI in production saw the most significant skills gaps in these areas: ML modeling and data science (45%), data engineering (43%), and maintaining a set of business use cases (40%). We can rephrase these skills as core AI development, building data pipelines, and product management. Product management for AI, in particular, is an important and still relatively new specialization that requires understanding the specific requirements of AI systems.

AI Governance

Among respondents with AI products in production, the number of those whose organizations had a governance plan in place to oversee how projects are created, measured, and observed was roughly the same as those that didn’t (49% yes, 51% no). Among respondents who were evaluating AI, relatively few (only 22%) had a governance plan.

The large number of organizations lacking AI governance is disturbing. While it’s easy to assume that AI governance isn’t necessary if you’re only doing some experiments and proof-of-concept projects, that’s dangerous. At some point, your proof-of-concept is likely to turn into an actual product, and then your governance efforts will be playing catch-up. It’s even more dangerous when you’re relying on AI applications in production. Without formalizing some kind of AI governance, you’re less likely to know when models are becoming stale, when results are biased, or when data has been collected improperly.

Figure 7. Organizations with an AI governance plan in place

While we didn’t ask about AI governance in last year’s survey, and consequently can’t do year-over-year comparisons, we did ask respondents who had AI in production what risks they checked for. We saw almost no change. Some risks were up a percentage point or two and some were down, but the ordering remained the same. Unexpected outcomes remained the biggest risk (68%, down from 71%), followed closely by model interpretability and model degradation (both 61%). It’s worth noting that unexpected outcomes and model degradation are business issues. Interpretability, privacy (54%), fairness (51%), and safety (46%) are all human issues that may have a direct impact on individuals. While there may be AI applications where privacy and fairness aren’t issues (for example, an embedded system that decides whether the dishes in your dishwasher are clean), companies with AI practices clearly need to place a higher priority on the human impact of AI.

We’re also surprised to see that security remains close to the bottom of the list (42%, unchanged from last year). Security is finally being taken seriously by many businesses, just not for AI. Yet AI has many unique risks: data poisoning, malicious inputs that generate false predictions, and reverse engineering models to expose private information, among many others. After last year’s many costly attacks against businesses and their data, there’s no excuse for being lax about cybersecurity. Unfortunately, it looks like AI practices are slow in catching up.

Figure 8. Risks checked by respondents with AI in production

Governance and risk-awareness are certainly issues we’ll watch in the future. If companies developing AI systems don’t put some kind of governance in place, they are risking their businesses: their AI will be controlling them, with unpredictable results that increasingly include damage to their reputation and large legal judgments. The least of these risks is that governance will be imposed by legislation, and those who haven’t been practicing AI governance will need to catch up.


When we looked at the tools used by respondents working at companies with AI in production, our results were very similar to last year’s. TensorFlow and scikit-learn are the most widely used (both 63%), followed by PyTorch, Keras, and AWS SageMaker (50%, 40%, and 26%, respectively). All of these are within a few percentage points of last year’s numbers, typically a couple of percentage points lower. Respondents were allowed to select multiple entries; this year the average number of entries per respondent appeared to be lower, accounting for the drop in the percentages (though we’re unsure why respondents checked fewer entries).

There appears to be some consolidation in the tools marketplace. Although it’s great to root for the underdogs, the tools at the bottom of the list were also slightly down: AllenNLP (2.4%), BigDL (1.3%), and RISELab’s Ray (1.8%). Again, the shifts are small, but dropping by one percent when you’re only at 2% or 3% to start with could be significant—much more significant than scikit-learn’s drop from 65% to 63%. Or perhaps not; when you only have a 3% share of the respondents, small, random fluctuations can seem large.

Figure 9. Tools used by respondents with AI in production

Automating ML

We took an additional look at tools for automatically generating models. These tools are commonly called “AutoML” (though that’s also a product name used by Google and Microsoft). They’ve been around for a few years; the company developing DataRobot, one of the oldest tools for automating machine learning, was founded in 2012. Although building models and programming aren’t the same thing, these tools are part of the “low code” movement. AutoML tools fill similar needs: allowing more people to work effectively with AI and eliminating the drudgery of doing hundreds (if not thousands) of experiments to tune a model.
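
The drudgery these tools eliminate is easy to see in miniature. Below is a minimal random-search sketch in plain Python; the `toy_objective` function is an invented stand-in for training and validating a model, and real AutoML systems search over whole model families with far smarter strategies than random sampling.

```python
import random

def toy_objective(lr, depth):
    """Stand-in for validation accuracy; a real system would train a model here."""
    return 1.0 - abs(lr - 0.1) - 0.02 * abs(depth - 6)

def random_search(objective, n_trials=200, seed=42):
    """Try random hyperparameter combinations and keep the best one."""
    rng = random.Random(seed)
    best_score, best_params = float("-inf"), None
    for _ in range(n_trials):
        params = {"lr": rng.uniform(0.001, 1.0), "depth": rng.randint(1, 12)}
        score = objective(**params)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score

params, score = random_search(toy_objective)
print(params, round(score, 3))
```

Even this naive strategy is a surprisingly strong baseline; the value AutoML products add is automating it across model types, feature pipelines, and compute budgets.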

Until now, the use of AutoML has been a relatively small part of the picture. This is one of the few areas where we see a significant difference between this year and last year. Last year 51% of the respondents with AI in production said they weren’t using AutoML tools. This year only 33% responded “None of the above” (and didn’t write in an alternate answer).

Respondents who were “evaluating” the use of AI appear to be less inclined to use AutoML tools (45% responded “None of the above”). However, there were some important exceptions. Respondents evaluating ML were more likely to use Azure AutoML than respondents with ML in production. This fits anecdotal reports that Microsoft Azure is the most popular cloud service for organizations that are just moving to the cloud. It’s also worth noting that the usage of Google Cloud AutoML and IBM AutoAI was similar for respondents who were evaluating AI and for those who had AI in production.

Figure 10. Use of AutoML tools

Deploying and Monitoring AI

There also appeared to be an increase in the use of automated tools for deployment and monitoring among respondents with AI in production. “None of the above” was still the answer chosen by the largest percentage of respondents (35%), but it was down from 46% a year ago. The tools they were using were similar to last year’s: MLflow (26%), Kubeflow (21%), and TensorFlow Extended (TFX, 15%). Usage of MLflow and Kubeflow increased since 2021; TFX was down slightly. Amazon SageMaker (22%) and TorchServe (6%) were two new products with significant usage; SageMaker in particular is poised to become a market leader. We didn’t see meaningful year-over-year changes for Domino, Seldon, or Cortex, none of which had a significant market share among our respondents. (BentoML is new to our list.)

Figure 11. Tools used for deploying and monitoring AI

We saw similar results when we looked at automated tools for data versioning, model tuning, and experiment tracking. Again, we saw a significant reduction in the percentage of respondents who selected “None of the above,” though it was still the most common answer (40%, down from 51%). A significant number said they were using homegrown tools (24%, up from 21%). MLflow was the only tool we asked about that appeared to be winning the hearts and minds of our respondents, with 30% reporting that they used it. Everything else was under 10%. A healthy, competitive marketplace? Perhaps. There’s certainly a lot of room to grow, and we don’t believe that the problem of data and model versioning has been solved yet.
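
For a sense of why nearly a quarter of respondents get by with homegrown tools, a bare-bones experiment tracker takes only a few lines. This is an illustrative sketch, not MLflow’s API; the class, method, and file names are all invented.

```python
import json
import time
from pathlib import Path

class ExperimentTracker:
    """Minimal homegrown tracker: append one JSON record per run."""

    def __init__(self, logfile="experiments.jsonl"):
        self.path = Path(logfile)

    def log_run(self, params, metrics):
        # One line per run keeps the log append-only and easy to grep.
        record = {"time": time.time(), "params": params, "metrics": metrics}
        with self.path.open("a") as f:
            f.write(json.dumps(record) + "\n")

    def best_run(self, metric):
        runs = [json.loads(line) for line in self.path.read_text().splitlines()]
        return max(runs, key=lambda r: r["metrics"][metric])

tracker = ExperimentTracker()
tracker.log_run({"lr": 0.1}, {"accuracy": 0.91})
tracker.log_run({"lr": 0.01}, {"accuracy": 0.87})
print(tracker.best_run("accuracy")["params"])  # params of the highest-accuracy run
```

A sketch like this covers experiment tracking but not data versioning or model lineage, which is exactly where purpose-built tools earn their keep.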

AI at a Crossroads

Now that we’ve looked at all the data, where is AI at the start of 2022, and where will it be a year from now? You could make a good argument that AI adoption has stalled. We don’t think that’s the case. Neither do venture capitalists; a study by the OECD, Venture Capital Investments in Artificial Intelligence, says that in 2020, 20% of all VC funds went to AI companies. We would bet that number is also unchanged in 2021. But what are we missing? Is enterprise AI stagnating?

Andrew Ng, in his newsletter The Batch, paints an optimistic picture. He points to Stanford’s AI Index Report for 2022, which says that private investment almost doubled between 2020 and 2021. He also points to the rise in regulation as evidence that AI is unavoidable: it’s an inevitable part of 21st century life. We agree that AI is everywhere, and in many places, it’s not even seen. As we’ve mentioned, businesses that are using third-party advertising services are almost certainly using AI, even if they never write a line of code. It’s embedded in the advertising application. Invisible AI—AI that has become part of the infrastructure—isn’t going away. In turn, that may mean that we’re thinking about AI deployment the wrong way. What’s important isn’t whether organizations have deployed AI on their own servers or on someone else’s. What we should really measure is whether organizations are using infrastructural AI that’s embedded in other systems that are provided as a service. AI as a service (including AI as part of another service) is an inevitable part of the future.

But not all AI is invisible; some is very visible. AI is being adopted in some ways that, until the past year, we’d have considered unimaginable. We’re all familiar with chatbots, and the idea that AI can give us better chatbots wasn’t a stretch. But GitHub’s Copilot was a shock: we didn’t expect AI to write software. We saw (and wrote about) the research leading up to Copilot but didn’t believe it would become a product so soon. What’s more shocking? We’ve heard that, for some programming languages, as much as 30% of new code is being suggested by Copilot. At first, many programmers thought that Copilot was no more than AI’s clever party trick. That’s clearly not the case. Copilot has become a useful tool in surprisingly little time, and with time, it will only get better.

Other applications of large language models—automated customer service, for example—are rolling out (our survey didn’t pay enough attention to them). It remains to be seen whether humans will feel any better about interacting with AI-driven customer service than they do with humans (or horrendously scripted bots). There’s an intriguing hint that AI systems are better at delivering bad news to humans. If we need to be told something we don’t want to hear, we’d prefer it come from a faceless machine.

We’re starting to see more adoption of automated tools for deployment, along with tools for data and model versioning. That’s a necessity; if AI is going to be deployed into production, you have to be able to deploy it effectively, and modern IT shops don’t look kindly on handcrafted artisanal processes.

There are many more places we expect to see AI deployed, both visible and invisible. Some of these applications are quite simple and low-tech. My four-year-old car displays the speed limit on the dashboard. There are any number of ways this could be done, but after some observation, it became clear that this was a simple computer vision application. (It would report incorrect speeds if a speed limit sign was defaced, and so on.) It’s probably not the fanciest neural network, but there’s no question we would have called this AI a few years ago. Where else? Thermostats, dishwashers, refrigerators, and other appliances? Smart refrigerators were a joke not long ago; now you can buy them.

We also see AI finding its way onto smaller and more limited devices. Cars and refrigerators have seemingly unlimited power and space to work with. But what about small devices like phones? Companies like Google have put a lot of effort into running AI directly on the phone, both doing work like voice recognition and text prediction and actually training models using techniques like federated learning—all without sending private data back to the mothership. Are companies that can’t afford to do AI research on Google’s scale benefiting from these developments? We don’t yet know. Probably not, but that could change in the next few years and would represent a big step forward in AI adoption.
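The federated learning idea mentioned above can be sketched simply: each device computes a model update on its own private data, and only the updated parameters, never the raw data, leave the device to be averaged into the global model. The toy linear model, learning rate, and synthetic data below are invented for illustration, not drawn from Google's actual systems.

```python
import random

def local_update(w, data, lr=0.04):
    """One gradient-descent step on a device's private data for a
    simple linear model y = w*x with squared-error loss."""
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad

def federated_average(global_w, device_datasets):
    """Each device trains locally; only the updated weights
    (not the raw samples) are sent back and averaged."""
    updates = [local_update(global_w, d) for d in device_datasets]
    return sum(updates) / len(updates)

# Three devices, each holding private samples of y = 2x plus small noise
random.seed(0)
devices = [[(x, 2 * x + random.gauss(0, 0.01)) for x in range(1, 6)]
           for _ in range(3)]

w = 0.0
for _ in range(50):
    w = federated_average(w, devices)
print(round(w, 2))  # converges near 2.0
```

Real deployments add secure aggregation and differential privacy on top of this basic loop, but the core privacy property is visible even here: the server only ever sees weights.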

On the other hand, while Ng is certainly right that demands to regulate AI are increasing, and those demands are probably a sign of AI’s ubiquity, they’re also a sign that the AI we’re getting is not the AI we want. We’re disappointed not to see more concern about ethics, fairness, transparency, and mitigating bias. If anything, interest in these areas has slipped slightly. When the biggest concern of AI developers is that their applications might give “unexpected results,” we’re not in a good place. If you only want expected results, you don’t need AI. (Yes, I’m being catty.) We’re concerned that only half of the respondents with AI in production report that AI governance is in place. And we’re horrified, frankly, not to see more concern about security. At least there hasn’t been a year-over-year decrease—but that’s a small consolation, given the events of last year.

AI is at a crossroads. We believe that AI will be a big part of our future. But will that be the future we want or the future we get because we didn’t pay attention to ethics, fairness, transparency, and mitigating bias? And will that future arrive in 5, 10, or 20 years? At the start of this report, we said that when AI was the darling of the technology press, it was enough to be interesting. Now it’s time for AI to get real, for AI practitioners to develop better ways to collaborate between AI and humans, to find ways to make work more rewarding and productive, to build tools that can get around the biases, stereotypes, and mythologies that plague human decision-making. Can AI succeed at that? If there’s another AI winter, it will be because people—real people, not virtual ones—don’t see AI generating real value that improves their lives. It will be because the world is rife with AI applications that they don’t trust. And if the AI community doesn’t take the steps needed to build trust and real human value, the temperature could get rather cold.

Categories: Technology

D-Day in Kyiv

O'Reilly Radar - Tue, 2022/03/22 - 11:02
My experience working with Ukraine’s Offensive Cyber Team

By Jeffrey Carr
March 22, 2022

When Russia invaded Ukraine on February 24th,  I had been working with two offensive cyber operators from GURMO—Main Intelligence Directorate of the Ministry of Defense of Ukraine—for several months trying to help them raise funds to expand development on an OSINT (Open Source Intelligence) platform they had invented and were using to identify and track Russian terrorists in the region. Since the technology was sensitive, we used Signal for voice and text calls. There was a lot of tension during the first few weeks of February due to Russia’s military buildup on Ukraine’s borders and the uncertainty of what Putin would do.

Then on February 24th at 6am in Kyiv (February 23, 8pm in Seattle where I live), it happened.

SIGNAL log 23 FEB 2022 20:00 (Seattle)  / 24 FEB 2022 06:00 (Kyiv)

Missed audio call - 8:00PM
It started - 8:01PM
War? - 9:36PM
Incoming audio call - 9:37PM
Call dropped. - 9:41PM
Are you there? - 9:42PM

I didn’t hear from my GURMO friend again for 10 hours. When he pinged me on Signal, it was from a bunker. They were expecting another missile attack at any moment.

“Read this”, he said, and sent me this link. “Use Google Translate.”

It linked to an article that described Russia’s operations plan for its attack on Ukraine, obtained by sources of Ukrainian news website ZN.UA. It said that the Russian military had sabotage groups already placed in Ukraine whose job was to knock out power and communications in the first 24 hours in order to cause panic. Acts of arson and looting would follow, with the goal of distracting law enforcement from chasing down the saboteurs. Then, massive cyber attacks would take down government websites, including the Office of the President, the General Staff, the Cabinet, and the Parliament (the Verkhovna Rada). The Russian military expected little resistance when it moved against Kyiv and believed that it could capture the capital in a matter of days.

The desired result is to seize the leadership of the state (it is not specified who exactly) and force a peace agreement to be signed on Russian terms under blackmail and the possibility of the death of a large number of civilians.

Even if part of the country’s leadership is evacuated, some pro-Russian politicians will be able to “take responsibility” and sign documents, citing the “escape” of the political leadership from Kyiv.

As a result, Ukraine can be divided into two parts—on the principle of West and East Germany, or North and South Korea.

At the same time, the Russian Federation recognizes the legitimate part of Ukraine that will sign these agreements and will be loyal to the Russian Federation. Guided by the principle: “he who controls the capital—he controls the state.”

The first significant Russian cyber attack of the war is suspected to be the one that took down satellite provider ViaSat at precisely 06:00 Kyiv time (04:00 UTC), the exact time that Russia started its invasion.

The cause is believed to be a malicious firmware update sent to ViaSat customers that “bricked” the satellite modems. Since ViaSat is a defense contractor, the NSA, France’s ANSSI, and Ukrainian Intelligence are investigating. ViaSat hired Mandiant to handle digital forensics and incident response (DFIR).

“Is Ukraine planning to retaliate?”, I asked.

“We’re engaging in six hours. I’ll keep you informed.”

That last exchange happened about 22 hours after the start of the war.

FRIDAY, FEB 25, 2022 07:51

I received a Signal alert.

“Download ready” and a link.

The GURMO cyber team had gained access to the accounting and document management system at Russian Military Unit 6762, part of the Ministry of Internal Affairs that deals with riot control, terrorists, and the territorial defense of Russia. They downloaded all of their personnel data, including passports, military IDs, credit cards, and payment records. I was sent a sampling of documents to do further research and post via my channels.

The credit cards were all issued by Sberbank. “What are you going to do with these”, I asked. He sent me a wink and a grin icon on Signal and said:

"Buy weapons and ammo for our troops! We start again at 6:30am tomorrow. When you wake up, join us."

"Will do!"

Over the next few days, GURMO’s offensive cyber team hacked a dizzying array of Russian targets and stole thousands of files from:

  • Black Sea Fleet’s communications servers
  • FSB Special Operations unit 607
  • Sergey G. Buev, the Chief Missile Officer of the Ministry of Defense
  • Federal Air Transport Agency

Everything was in Russian, so the translation process was very time-consuming. There were hundreds of documents in many different file types, and to make translation even harder, many of them were images of documents. You can't just upload those into Google Translate. You have to download the Google Translate app onto your mobile phone, then point it at the document on your screen and read it that way.

Once I had read enough, I could write a post at my Inside Cyber Warfare Substack that provided information and context for the breach. Between the translation, research, writing, and communication with GURMO, who were 11 hours ahead (10 hours after the time change), I was getting about 4½ hours of sleep each night.

We Need Media Support

TUESDAY, MARCH 1, 2022 09:46 (Seattle)

On Signal

"We need media support from USA. All the attacks you mentioned during these 6 days. We have to make headlines to demoralize Russians."

"I know the team at a young British PR firm. I'll check with them now."

Nara Communications immediately stepped up to the challenge. They agreed to waive their fee and help place news stories about the GURMO cyber team's successes. The Ukrainians did their part and gave them some amazing breaches, starting with the Beloyarsk Nuclear Power Plant, home of the world's only commercial fast breeder reactors. Other countries were spending billions of dollars trying to achieve what Russia had already mastered, so a breach of the plant's design documents and processes was a big deal.

The problem was that journalists wanted to speak to GURMO and that was off the table for three important reasons:

  1. They were too busy fighting a war to give interviews.
  2. The Russian government knew who they were, and their names and faces were on the playing cards given to Kadyrov's Chechen guerillas for assassination.
  3. They didn’t want to expose themselves to facial recognition or voice capture technologies because…see #2.

Journalists had only a few options if they didn’t want to run with a single-source story.

They could speak with me because I was the only person who the GURMO team would directly speak to. Plus, I had possession of the documents and understood what they were.

They could contact the CIA Legat in Warsaw, Poland where the U.S. embassy had evacuated to prior to the start of the war. GURMO worked closely with and gave frequent briefings to its allied partners, and they would know about these breaches. Of course, the CIA most likely wouldn’t speak with a journalist.

They could speak with other experts to vet the documents, which would effectively be their second source after speaking with me. Most reporters at major outlets didn't bother reporting these breaches under those conditions. To make matters worse, there were no obvious victims. The GURMO hackers weren't breaking things, they were stealing things, and they liked to keep a persistent presence in the network so they could keep coming back for more. Plus, Russia often implemented a communications strategy known as Ихтамнет (Ihtamnet), literally "they are not there," or to put it into context: "What hacks? There were no hacks."

In spite of all those obstacles, Nara Communications was successful in getting an article placed with SC magazine, a radio interview with Britain’s The Times, and a podcast with the Evening Standard.

By mid-March, Putin showed no signs of wanting peace, even after President Zelensky had conceded that NATO membership was probably off the table for Ukraine, and GURMO was popping bigger targets than ever.

The Russians' plan to establish a fully automated lunar base called Luna-Glob was breached. Russia's ExoMars project was breached. The new launch complex being built at Vostochny for the Angara rocket was breached. In every instance, a trove of files was downloaded for study by Ukraine's government and shared with its allies. A small amount was always carved out for me to review, post at the Inside Cyber Warfare Substack, and share with journalists. Journalist Joe Uchill referred to this strategy as Hack and Leak.

Hack and Leak

By hacking some of Russia’s proudest accomplishments (its space program) and most successful technologies (its nuclear research program), the Ukrainian government is sending Putin a message that your cybersecurity systems cannot keep us out, that even your most valuable technological secrets aren’t safe from us, and that if you push us too far, we can do whatever we want to your networks.

Apart from the attack on ViaSat, there hasn’t been evidence of any destructive cyber attacks against Ukrainian infrastructure. Part of that was strategic planning on the part of Ukraine (that’s all that I can say about that), part was Ukraine’s cyber defense at work, and part of that may be that GURMO’s strategy is working. However, there’s no sign that these leaks are having any effect on impeding Russia’s military escalation, probably because that’s driven out of desperation in the face of its enormous military losses so far. Should that escalation continue, GURMO has contingency plans that will bring the war home to Russia.

Jeffrey Carr has been an internationally-known cybersecurity adviser, author, and researcher since 2006. He has worked as a Russia SME for the CIA’s Open Source Center Eurasia Desk. He invented REDACT, the world’s first global R&D database and search engine to assist companies in identifying which intellectual property is of value to foreign governments. He is the founder and organizer of Suits & Spooks, a “collision” event to discuss hard challenges in the national security space, and is the author of Inside Cyber Warfare: Mapping the Cyber Underworld (O’Reilly Media, 2009, 2011). 

Categories: Technology

The Future of Security

O'Reilly Radar - Tue, 2022/03/15 - 07:02

The future of cybersecurity is being shaped by the need for companies to secure their networks, data, devices, and identities. This includes adopting security frameworks like zero trust, which help companies secure internal information systems and data in the cloud. With the sheer volume of new threats, today's security landscape has become more complex than ever. The rise of ransomware has made firms more focused on whether they could recover from an attack if targeted, and security needs continue to evolve as new technologies, apps, and devices are developed faster than ever before. This means that organizations must focus on solutions that allow them to stay on the cutting edge of technology and business.

What does the future have in store for cybersecurity? What are some of today’s trends, and what might be future trends in this area? Several significant cybersecurity trends have already emerged or will continue to gain momentum this coming year and beyond. This report covers four of the most important trends:

  • Zero trust (ZT) security (also known as context-aware, policy-based enforcement), which is becoming more widespread and dominates many enterprise and vendor conversations.
  • Ransomware threats and attacks, which will continue to rise and wreak havoc.
  • Mobile device security, which is becoming more urgent with an increase in remote work and mobile devices.
  • Cloud security and automation, as a means of addressing cloud security issues and the workforce skills gap and shortage of professionals. Related to this is cybersecurity as a service (CaaS or CSaaS), which will also gain momentum as companies turn to vendors who can provide extensive security infrastructure and support services at a fraction of the cost of building self-managed infrastructure.

We’ll start with zero trust, a critical element for any security program in this age of sophisticated and targeted cyberattacks.

Zero Trust Security

For decades, security architects have focused on perimeter protection, such as firewalls and other safety measures. However, as cloud computing increased, experts recognized that traditional strategies and solutions would not work in a mobile-first/hybrid world. User identities could no longer be confined to a company’s internal perimeter, and with employees needing access to business data and numerous SaaS applications while working remotely or on business travel, it became impossible to control access centrally.

Security vendors are rethinking the efficacy of their current security measures and offerings in ways that don't require businesses to rebuild entire architectures. One such approach is zero trust, which challenges perimeter-based network access controls by trusting no resource by default. Instead, zero trust redefines the network perimeter, treating all users and devices as inherently untrusted and potentially compromised, regardless of their location within the network. Microsoft's approach to zero trust security focuses on the contextual management of identities, devices, and applications, granting access based on the continual verification of identities, devices, and access to services.1


Zero trust security is a paradigm that leverages identity for access control and combines it with contextual data, continuous analysis, and automated response to ensure that the only network resources accessible to users and devices are those explicitly authorized for consumption.2

In Zero Trust Networks (O’Reilly, 2017), Evan Gilman and Doug Barth split a ZT network into five fundamental assertions:

  • The network is always assumed to be hostile.
  • External and internal threats exist on the network at all times.
  • Network locality is not sufficient for deciding trust in a network.
  • Every device, user, and network flow is authenticated and authorized.
  • Policies must be dynamic and calculated from as many data sources as possible.3

Therefore, a zero trust architecture shifts from the traditional perimeter security model to a distributed, context-aware, and continuous policy enforcement model. In this model, requests for access to protected resources are first made through the control plane, where both the device and user must be continuously authenticated and authorized.

An identity first, contextual, and continual enforcement security approach will be especially critical for companies interested in implementing cloud services. Businesses will continue to focus on securing their identities, including device identities, to ensure that access control depends on context (user, device, location, and behavior) and policy-based rules to manage the expanding ecosystem of users and devices seeking access to corporate resources.
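The context-aware, policy-based access decision described above can be sketched as a small policy engine. The attribute names, thresholds, and rules below are invented for illustration; a production system would pull these signals from an identity provider, device-management agent, and behavioral analytics service.

```python
from dataclasses import dataclass

@dataclass
class AccessRequest:
    user_authenticated: bool   # identity verified (e.g., MFA passed)
    device_compliant: bool     # e.g., patched, disk-encrypted, managed
    location: str              # coarse geo or network segment
    behavior_score: float      # 0.0 (anomalous) to 1.0 (normal)
    resource: str

def authorize(req: AccessRequest) -> bool:
    """Zero trust: every request is evaluated on identity, device,
    location, and behavior; network locality alone grants nothing."""
    if not (req.user_authenticated and req.device_compliant):
        return False
    # Dynamic policy: a riskier context demands a higher behavior score
    threshold = 0.9 if req.location == "unknown" else 0.5
    return req.behavior_score >= threshold

req = AccessRequest(True, True, "corporate-vpn", 0.7, "payroll-db")
print(authorize(req))  # True: authenticated, compliant, normal behavior
```

The key point is that `authorize` runs on every request, not once at a network boundary, so a stolen credential or a newly non-compliant device loses access as soon as its context changes.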

Enterprises that adopt a zero trust security model will more confidently allow access to their resources, minimize risks, and better mitigate cybersecurity attacks. IAM (identity and access management) is and will continue to be a critical component of a zero trust strategy.

The rise of cryptocurrency, the blockchain, and web3 technologies4 has also introduced conversations around decentralized identity and verifiable credentials.5 The decentralized identity model suggests that individuals own and control their data wherever or whenever used. This model will require identifiers such as usernames to be replaced with self-owned and independent IDs that enable data exchange using blockchain and distributed ledger technology to secure transactions. In this model, the thinking is that user data will no longer be centralized and, therefore, less vulnerable to attack.

By contrast, in the traditional identity model, where user identities are verified and managed by a third-party authority/identity provider (IdP), if an attacker gains access to the authority/IdP, they now have the keys to the kingdom, allowing full access to all identities.

Ransomware, an Emerging and Rapidly Evolving Threat

One of the most pressing security issues that businesses face today is ransomware. Ransomware is a type of malware that takes over systems and encrypts valuable company data, demanding a ransom payment before the data is unlocked. The decryption you pay for is, of course, not guaranteed; as such, the cost of a ransomware incident is typically far greater than the cost of preparing for these attacks.

These types of attacks can be very costly for businesses, both in terms of the money they lose through ransomware and the potential damage to a company’s reputation. In addition, ransomware is a widespread method of attack because it works. As a result, the cybersecurity landscape will experience an increasing number of ransomware-related cybersecurity attacks estimated to cost businesses billions in damages.

So, how does it work? Cybercriminals use savvy social engineering tactics such as phishing, vishing, and smishing to gain access to a computer or device and launch a cryptovirus. The cryptovirus encrypts all files on the system, or on multiple systems, accessible by that user. Then the target receives a message demanding payment for the decryption key needed to unlock their files. If the target refuses to comply or fails to pay on time, the price of the decryption key increases exponentially, or the data is released and sold on the dark web. That is the simple case. With a growing criminal ecosystem and subscription models like ransomware as a service (RaaS), we will continue to see compromised credentials swapped, sold, and exploited, and therefore continued attacks across the globe.

Terms to Know

Phishing: a technique of fraudulently obtaining private information. Typically, the phisher sends an email that appears to come from a legitimate business—a bank or credit card company—requesting “verification” of information and warning of some dire consequence if it is not provided. The email usually contains a link to a fraudulent web page that seems legitimate—with company logos and content—and has a form requesting everything from a home address to an ATM card’s PIN or a credit card number.6

Smishing: the act of using SMS text messaging to lure victims into executing a specific action. For example, a text message claims to be from your bank or credit card company but includes a malicious link.

Vishing (voice phishing): a form of smishing except done via phone calls.

Cryptojacking: a type of cybercrime that involves unauthorized use of a device’s (computer, smartphone, tablet, server) computing power to mine or generate cryptocurrency.

Because people will trust an email from a person or organization that appears to be a trustworthy sender (e.g., you are more likely to trust an email that seems to be from a recognizable name/brand), these kinds of attacks are often successful.

As these incidents continue to be a daily occurrence, we’ve seen companies like Netflix and Amazon invest in cyber insurance and increase their cybersecurity budgets. However, on a more positive note, mitigating the risk of ransomware attacks has led companies to reassess their approach to protecting their organizations by shoring up defenses with more robust security protocols and advanced technologies. With companies storing exponentially more data than ever before, securing it has become critical.

The future of ransomware is expected to be one that will continue to grow in numbers and sophistication. These attacks are expected to impact even more companies, including targeted attacks focused on supply chains, industrial control systems, hospitals, and schools. As a result, we can expect that it will continue to be a significant threat to businesses.

Mobile Device Security

One of the most prominent areas of vulnerability for businesses today is the use of mobile devices. According to Verizon's Mobile Security Index 2020 Report,7 39% of businesses had a mobile-related breach in 2020. User threats, app threats, device threats, and network threats were the top categories of mobile security threat identified in the survey. One example of a mobile application security threat is an individual downloading apps that look legitimate but are actually spyware or malware aimed at stealing personal and business information.

Another potential problem involves employees accessing and storing sensitive data or emails on their mobile devices while traveling from one domain to another (for example, airport WiFi, coffee shop WiFi).

Security experts believe that mobile device security is still in its early stages, and many of the same guidelines used to secure traditional computers may not apply to modern mobile devices. While mobile device management (MDM) solutions are a great start, organizations will need to rethink how they handle mobile device security in enterprise environments. The future of mobile device management will also be dependent on contextual data and continuous policy enforcement.

With mobile technology and cloud computing becoming increasingly important to both business and consumer life, smart devices like Apple AirTags, smart locks, video doorbells, and so on are gaining more weight in the cybersecurity debate.

Security concerns range from compromised accounts to stolen devices, and as such, cybersecurity companies are offering new products to help consumers protect their smart homes.

A key issue involving the future of mobile device management is how enterprises can stay ahead of new security issues as they relate to bring your own device (BYOD) and consumer IoT (Internet of Things) devices. Security professionals may also need to reevaluate how to connect a growing number of smart devices in a business environment. Security has never been more important, and new trends will continue to emerge as we move through the future of BYOD and IoT.

Cloud Security and Automation

We have seen an increase in businesses moving their operations to the cloud to take advantage of its benefits, such as increased efficiency and scalability. As a result, the cloud is becoming an integral part of how organizations secure their data, with many companies shifting to a hybrid cloud model to address scale, security, legacy technologies, and architectural inefficiencies. However, staffing issues and the complexities of moving from on-premises to cloud or hybrid cloud introduce a new set of security concerns.

Cloud services are also often outsourced, and as such, it can be challenging to determine who is responsible for the security of the data. In addition, many businesses are unaware of the vulnerabilities that exist in their cloud infrastructure and, in many cases, do not have the needed staff to address these vulnerabilities. As a result, security will remain one of the biggest challenges for organizations adopting cloud computing.

One of the most significant benefits cloud computing can provide to security is automation. The need for security automation is rising as manual processes and limited information-sharing capabilities slow the evolution of secure implementations across many organizations. It is estimated that nearly half of all cybersecurity incidents are caused by human error, which is better mitigated through automated security tools than through manual processes.

However, there can be a downside to automation. The industry has not yet perfected the ability to sift signals from large amounts of noise. An excellent example is what happens around incident response and vulnerability management—both still rely on human intervention or an experienced automation/tooling expert. Industry tooling will need to improve in this area. While automation can also help reduce the impact of attacks, any automated solution runs the risk of being ineffective against unknown threats if human eyes do not assess it before it is put into practice.

In a DevOps environment, automation takes the place of manual labor. The key for security will be code-based configuration and the ability to be far more confident about the current state of existing security and infrastructure appliances. Organizations that have adopted configuration by code will also have higher confidence during audits. For example, instead of checking each process for changing firewall rules (which already go through change control) and spot-checking one rule out of thousands, an auditor can validate the CI/CD pipeline through which every change flows, then run automated checks on your configuration to confirm it meets policy.
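The "configuration by code" idea can be made concrete with a small sketch: firewall rules live as version-controlled data, and an automated policy check runs in CI on every change. The rule fields and the single policy below are invented for illustration; real tools express richer policies over real device configurations.

```python
# Firewall rules as version-controlled data (e.g., reviewed via pull request)
RULES = [
    {"name": "allow-https", "port": 443, "source": "0.0.0.0/0", "action": "allow"},
    {"name": "allow-ssh-admin", "port": 22, "source": "10.0.0.0/8", "action": "allow"},
]

def violations(rules):
    """Example policy check: SSH (port 22) must never be
    open to the whole internet."""
    return [r["name"] for r in rules
            if r["port"] == 22
            and r["source"] == "0.0.0.0/0"
            and r["action"] == "allow"]

# Run in CI: the pipeline fails if any rule violates policy,
# so an auditor can validate this check instead of sampling rules.
bad = violations(RULES)
print("PASS" if not bad else f"FAIL: {bad}")
```

Because every rule change must pass this gate before deployment, the audit question shifts from "is each of these thousands of rules compliant?" to "is the pipeline that enforces compliance sound?"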

The evolution of SOAR (security orchestration, automation, and response) tools and the automation of security policy as code will open up a huge potential benefit for well-audited businesses in the future.

Automation May Help with the Security Workforce Shortage

The shortage of cyber workers will persist because there aren't enough cybersecurity professionals in the workforce and cyber education isn't keeping pace with demand. As a result, cybersecurity teams are understaffed and burned out, which lowers their effectiveness and poses risks.

Automation may help organizations fill the cybersecurity talent gap and address many of the same activities that human employees perform, such as detection, response, and policy configuration.

While automation cannot completely replace the need for human cybersecurity experts, it can assist in decreasing the burden on these professionals and make them more successful in their work. In addition to more professionals joining the field with varying backgrounds, automated technologies will play a significant role in mitigating the impact of cyberattacks and assisting in solving the cybersecurity workforce shortage problem.

(Cyber)Security as a Service

Cybersecurity as a service (CaaS or CSaaS) is growing more popular as companies turn to managed service vendors that can provide extensive security infrastructure and support services at a fraction of the cost of building self-managed infrastructure. As a result, organizations can use their resources more effectively by outsourcing security needs to a specialized vendor rather than building in-house infrastructure.

CaaS provides managed security services, intrusion detection and prevention, and firewalls by a third-party vendor. By outsourcing cybersecurity functions to a specialist vendor, companies can access the security infrastructure support they need without investing in extensive on-site infrastructure, such as firewalls and intrusion detection systems (IDS).

There are additional benefits:

  • Access to the latest threat protection technologies.
  • Reduced costs: outsourced cybersecurity solutions can be less expensive than an in-house security team.
  • Improved internal resources: companies can focus on their core business functions by outsourcing security to a third party.
  • Flexibility: companies can scale their security needs as needed.

The ransomware attack on Hollywood Presbyterian Medical Center8 is an excellent example of why CaaS will continue to be sought after by organizations of all sizes. Cybercriminals locked the hospital’s computer systems and demanded a ransom payment to unlock them. As a result, the hospital was forced to turn to a cybersecurity vendor for help in restoring its computer systems.

Of course, this approach has disadvantages:

  • Loss of control over how data is stored and who has access to your data/infrastructure. Security tooling often needs to run at the highest levels of privilege, enabling attackers to attack enterprises at scale, use the managed service provider network to bypass security safeguards, or exploit software vulnerabilities, as in the SolarWinds and Log4j incidents.
  • In addition, CaaS providers may or may not support existing legacy software or critical business infrastructure specific to each organization.

CaaS is expected to continue on a solid growth path as more enterprises rely on cloud-based systems and the IoT for their business operations.


Cyberattacks continue to be successful because they are effective. Thanks to cutting-edge technology, services, and techniques available to every attacker, organizations can no longer afford to make security an afterthought. To defend against present and future cyberattacks, businesses must develop a comprehensive security plan that incorporates automation, analytics, and context-aware capabilities. Now more than ever, companies must be more diligent about protecting their data, networks, and employees.

Whether businesses embrace identity-first and context-aware strategies like zero trust, or technologies like cloud computing, mobile devices, or cybersecurity as a service (CaaS), the growth of ransomware and other cyberattacks is forcing many companies to rethink their overall cybersecurity strategies. As a result, organizations will need to approach security holistically by including all aspects of their business operation and implementing in-depth defense strategies from the onset.

The future is bright for the cybersecurity industry, as companies continue to develop new technologies to guard against the ever-evolving threat landscape. Government rules, regulations, and security procedures will also continue to evolve to keep up with emerging technologies and the rapidly growing number of threats across both the private and public sectors.


1. “Transitioning to Modern Access Architecture with Zero Trust”.

2. Scott Rose et al., NIST Special Publication 800-207.

3. Evan Gilman and Doug Barth, Zero Trust Networks (O’Reilly, 2017).

4. See “Decentralized Identity for Crypto Finance”.

5. See “Verifiable Credentials Data Model”.

6. See this social engineering article for more information.

7. “The State of Mobile Security”.

8. “Hollywood Hospital Pays $17,000 in Bitcoin to Hackers; FBI Investigating”.

Categories: Technology

Identity problems get bigger in the metaverse

O'Reilly Radar - Tue, 2022/03/15 - 07:01

If the hype surrounding the metaverse results in something real, it could improve the way you live, work, and play. Or it could create a hellworld where you don’t get to be who you are or want to be.  Whatever people think they’ve read, the metaverse originally imagined in Snow Crash is not a vision for an ideal future. In the novel, it’s a world that replaced the “real world” so that people would feel less bad about the reality they actually had. In the end, the story is about the destabilization of the individual’s identity and implosion of traditional identities, rather than the securing of a new one.

Even in the real world (a.k.a. meatspace), identity can be hard to pin down. You are who you are, but there are many ways you may define yourself depending on the context. In the latest metaverse discourse there has been lots of talk of virtual avatars putting on NFT-based clothing, skins, weapons, and other collectable assets, and then moving those assets around to different worlds and games without issue. Presentation is just a facet of identity, as the real-world fashion industry well knows.

The latest dreams of web3 include decentralized and self-sovereign identity. But this is just re-hashing years of identity work that focuses on the how (internet standards) and rarely the why (what people need to feel comfortable with identity online). Second Life has been grappling with how people construct a new identity and present their avatars since 2003.

There are many ways that the web today and the metaverse tomorrow will continue to integrate further with our reality:

  • Online through a laptop, like the web today: posting to Facebook, discussing work on Slack, or joining a DAO on Discord.
  • On mobile devices while walking around in the real world: seeing the comments about a restaurant while standing in front of it, getting directions to a beach, or getting access to a private club via an NFT.
  • In mixed and augmented reality (MR/AR) experiences, where the digital is overlaid on reality: chatting with someone who looks like they are sitting next to you, or seeing the last message you sent to someone you are talking to.
  • In fully immersive virtual reality (VR) experiences: going to a chat room in AltspaceVR or playing a game with friends in Beat Saber.

Before we can figure out what identity means to people in “the metaverse,” we need to talk about what identity is, how we use identity in the metaverse, and how we might create systems that better realize the way people want their identities to work online.

I login therefore I am

When I mention identity, am I starting a philosophical discussion that answers the question “who am I?” Am I trying to figure out my place within an in-person social event? Or do you want to confirm that I meet some standard, such as being over 21?

All of these questions have meaning in the digital world; most often, they are answered by logging in with an email address and password to get into a particular website. Over the last decade, services like Facebook and Google have started to allow you to use the identity you have with them to log into other websites.

Is the goal of online identity to have one overarching identity that ties everything together? Our identities are constantly renegotiated and unsaid. I don’t believe we can encode all of the information about our identities into a single digital record, even if some groups are trying. Facebook’s real-name policy requires you to use your legal name and makes you collapse all of your possible pseudo-identities into your legal one. If they think you aren’t using a legal name, they require you to upload a government issued document. I’d argue that because people create multiple identities even when faced with account deactivation, it is not their goal to have one single compiled identity.

All of me(s)

As we consider identities in the metaverse as extensions to the identities we have in the real world, we need to understand that we build pseudo-identities for different interactions. My pseudo-identities for family, work, my neighborhood, the PTA, school friends, etc., all overlap to some extent. These are situations, contexts, realms, or worlds that I am part of, and they extend to the web and the metaverse.

In most pseudo-identities there are shared parts that are the “real me,” like my name or my real likeness. Some may be closer to a “core” pseudo-identity that represents more of what I consider to be me; others may just be smaller facets. Each identity is associated with a different reputation, a different level of trust from the community, and different data (profile pictures, posts, etc.).

The most likely places to find our identities are:

  • Lists of email and password pairs stored in our browsers
  • Groups we are part of on Facebook
  • Gamer tags we have on Oculus, Steam, or PSN
  • Discords we chat on
  • …and the list goes on

Huge numbers of these identities are created and managed by hand today. On average, a person has 1.75 email addresses and manages 90 online accounts. This will only get stranger and more complex with the addition of the metaverse.

There are times that I don’t want my pseudo-identity’s reputations or information to interact with a particular context; for these cases, I’ll create a pseudo-anonymous identity. There is a lot of prior work on anonymity as a benefit:

  • Balaji Srinivasan has discussed the value of an economy based on pseudonymous identities as a way to “air gap” against repercussions of social problems.
  • Jeff Kosseff, professor and author, has recently written a book about the benefits of anonymity, The United States of Anonymous. In a great discussion on the TechDirt podcast, he talks about how the ability to question those in power is an important aspect of anonymity.
  • Christopher “moot” Poole, the creator of 4chan, has often talked about the benefits of anonymous online identities including the ability to be more creative without the risk of failure. Given the large amount of harmful abuse that comes out of communities like 4chan, this argument for anonymity is questionable.

My many identities and overlapping zones of attributes, information, and privacy.

If you link one of my pseudo-identities to another pseudo-identity in a way I didn’t expect, it can feel like a violation. I expect to control the flow of information about me (see Helen Nissenbaum’s work on contextual integrity for insight into a beneficial privacy framework). I don’t want my online poker group’s standing to be shown to the PTA, with which I discuss school programs. Teachers who have OnlyFans accounts have been fired when the accounts are discovered. Journalists reporting on cartel activities have been killed. Twitter personalities that use their real names can be doxed by someone who links their Twitter profile to a street address and mobile phone number. This can have horrible consequences.

In the real world, we have many of these pseudo-identities and pseudo-anonymous identities. We even have an expectation of anonymity in groups like Alcoholics Anonymous and private clubs. If we look to Second Life, some people would adopt core pseudo-identities and others pseudo-anonymous identities.

In the online world and, eventually, the metaverse, we will have more control over the use of our identities and pseudo-identities, but possibly less ability to understand how these identities are being handled by each system we are part of. Our identities can already collide in personal devices (for example, my mobile phone) and communal devices (for example, the voice assistant in my kitchen around my family).

How do you recognize someone in the metaverse?

In the real world we recognize people by their face, and identify them by a name in our heads (if you are good at that sort of thing). We may remember the faces of some people we pass on the street, but in a city, we don’t really know most of the people who we are around.

A few of the author’s identities online and in the metaverse.

The person you’re communicating with may show up with a real name, a nickname, or even a pseudo-anonymous name. Their picture might be a professional photo, a candid picture, an anime avatar, or some immersive presentation. All of these identifiers are protected by logins, multi-factor authentication, or other mechanisms, yet people are hacked all the time. A site like Facebook tries to give you assurances that you are interacting with the person you think you’re interacting with; this is one justification for their real-name policy. Still, there is a difference between the logical “this is this person because Facebook says so” and the emotional “this feels like the person because my senses say so.” With improvements in immersion and in building “social presence” (a theory of the “sense of being with another”), we may be tricked more easily into providing better engagement metrics for a social media site. I may even feel that AI-generated faces based on people I know are more trustworthy than actual images of the people themselves.

What if you could give your online avatar your voice, and even make it use idioms you use? This type of personal spoofing may not always be nefarious. You might just want a bot that could handle low value conversations, say with a telemarketer or bill collector.

We can do better than “who can see this post”

To help people grapple with the increased complexity of identity in the metaverse, we need to rethink the way we create, manage, and eventually retire our identities. It goes way beyond just choosing what clothing to wear on a virtual body.

When you start to add technologies that tie everything you do to a public, immutable record, you may find that something you wish could be forgotten is remembered. What should be “on the chain” and how should you decide? Codifying aspects of our reputation is a dream of web3. The creation of digitally legible reputation can cause ephemeral and unsaid aspects of our identities to be stored forever. And an immutable public record of reputation data will no doubt conflict with legislation such as GDPR or CCPA.

The solutions to these problems are neither simple nor available today. To move in the right direction, we should consider the following key principles for how identities work in the metaverse, so that we don’t end up with a dystopia:

  1. I want to control the flow of information rather than simply mark it as public or private: Contextual Integrity argues that the difference between “public” and “private” information hides the real issue, which is how information flows and where it is used.
  2. I want to take time to make sure my profile is right: Many development teams worry about adding friction to the signup process; they want to get new users hooked as soon as possible. But it’s also important to make sure that new users get their profiles right. It’s not inherently bad to slow down the creation and curation of a profile, especially one the user will be associated with for a long time. Teams that worry about friction have never seen someone spend an hour tweaking their character’s appearance in a video game.
  3. I want to experiment with new identities rather than commit up front: When someone starts out with a new service, they don’t know how they want to represent themselves. They might want to start with a blank avatar. On the other hand, the metaverse is so visually immersive that people who have been there for a while will have impressive avatars, and new people will stick out.
  4. I’m in control of the way my profiles interact: When I don’t want profiles to overlap, there is usually a good reason. Services that assume we want everything to go through the same identity are making a mistake. We should trust that the user is making a good choice.
  5. I can use language I understand to control my identities: Creating names is creating meanings. If I want to use something simple like “my school friends,” rather than a specific school name, I should be able to do so. That freedom of choice allows the user to supply the name’s meaning, rather than having it imposed from the outside.
  6. I don’t want shadow profiles created about me: A service violates my expectations of privacy when it links together various identities. Advertising platforms are already doing this through browser fingerprinting. It gets even worse when you start to use biometric and behavioral data, as Kent Bye from the Voices of VR podcast has warned. Unfortunately, users may never have control over these linkages; it may require regulation to correct.
  7. I should be warned when there are effects I might not understand due to multiple layers interacting: I should get real examples from my context to help me understand these interactions. It is the service developer’s job to help users avoid mistakes.
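
The first principle, controlling how information flows between contexts rather than marking it public or private, can be made concrete with a small sketch. Everything here (the `Identity` and `FlowPolicy` types, the contexts, and the attributes) is a hypothetical illustration, not a real API:

```python
# Hypothetical sketch of contextual-integrity-style flow rules:
# information is released per (source context, viewer context) pair,
# not via a single public/private flag.
from dataclasses import dataclass, field

@dataclass
class Identity:
    name: str
    context: str                      # the realm this identity belongs to
    attributes: dict = field(default_factory=dict)

@dataclass
class FlowPolicy:
    # allowed[(source_context, viewer_context)] = attribute names that may flow
    allowed: dict = field(default_factory=dict)

    def permit(self, src, dst, *attrs):
        self.allowed.setdefault((src, dst), set()).update(attrs)

    def visible(self, ident, viewer_context):
        """Return only the attributes allowed to flow into viewer_context."""
        ok = self.allowed.get((ident.context, viewer_context), set())
        return {k: v for k, v in ident.attributes.items() if k in ok}

policy = FlowPolicy()
# My real name may flow from the PTA context to the neighborhood context...
policy.permit("pta", "neighborhood", "real_name")
# ...but nothing flows from the PTA to my poker group, because no rule exists.

pta_me = Identity("pta-parent", "pta", {"real_name": "A. Author", "kids": 2})
print(policy.visible(pta_me, "neighborhood"))  # {'real_name': 'A. Author'}
print(policy.visible(pta_me, "poker"))         # {}
```

The default is that nothing flows: a linkage between two of my identities has to be stated explicitly, which matches the expectation that unexpected linking feels like a violation.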

Social media sites like Facebook have tried to address some of these principles. For example, Facebook’s access controls for posts allow for “public,” “friends,” “friends except…,” “specific friends,” “only me,” and “custom.” These settings are further modified by the Facebook profile privacy control settings. It often (perhaps usually) isn’t clear what is actually happening and why, nor is it clear who will or won’t be able to see a post. This confusion is a recipe for violating social norms and privacy expectations.

Next, how do we allow for interaction? This isn’t as simple as creating circles of friends (an approach that Google+ tried). How do we visualize the various identities we currently have? More user research needs to go into how people would understand these constructions of identity on a web or virtual experience. My hunch is that they need to align some identities together (like family and PTA), and to separate out others (like gamertags). I don’t think requiring users to maintain a large set of access control lists (ACLs) is the right way to control interaction between identities.

The life of my identity

Finally, identities have life cycles. Some exist for a long time once established, like my family, but others may be short lived. I might try out participation in a community, and then find it isn’t for me. There are five key steps in the lifecycle of an identity:

  1. Create a new identity – this happens when I log into a new service or world. The new identity will need to be aligned with or separated from other identities.
  2. Share some piece of information with an identity – every meaningful identity is attached to data: common profile photos, purchased clothing, facial characteristics, voices, etc.
  3. Recover after being compromised – “oops I was hacked” will happen. What do people need to do to clean this up?
  4. Lose and recover access – if I lose the key to access this identity, is there a way I can get it back?
  5. Delete or close an identity, for now – people walk away from groups all the time. Usually they will just drift off or ghost; there should be a better way.
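
The lifecycle above can be sketched as a small state machine. This is a hypothetical illustration; step 2 (sharing data with an identity) doesn’t change the identity’s state, so it is omitted, and all action and state names are invented:

```python
# A toy state machine for the identity lifecycle: only sensible
# transitions are allowed, everything else is rejected.
from enum import Enum, auto

class State(Enum):
    ACTIVE = auto()
    COMPROMISED = auto()
    LOCKED_OUT = auto()
    CLOSED = auto()

TRANSITIONS = {
    ("create", None): State.ACTIVE,                        # step 1
    ("report_compromise", State.ACTIVE): State.COMPROMISED,
    ("recover", State.COMPROMISED): State.ACTIVE,          # step 3
    ("lose_key", State.ACTIVE): State.LOCKED_OUT,
    ("recover_key", State.LOCKED_OUT): State.ACTIVE,       # step 4
    ("close", State.ACTIVE): State.CLOSED,                 # step 5
}

def step(action, state):
    """Apply a lifecycle action, rejecting transitions that make no sense."""
    try:
        return TRANSITIONS[(action, state)]
    except KeyError:
        raise ValueError(f"{action!r} is not valid from {state}")

s = step("create", None)
s = step("report_compromise", s)   # "oops I was hacked"
s = step("recover", s)             # back to ACTIVE after cleanup
s = step("close", s)               # walked away, for now
```

Modeling the transitions explicitly forces a service to answer the awkward questions up front: what does recovery require, and what happens to the data when an identity is closed?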

All services that plan on operating in the metaverse will need to consider these different stages. If you don’t, you will create systems that fail in ways that expose people to harm.

Allow for the multiplicity of a person in the metaverse

If you don’t think about the requirements of people, their identities, and the lifecycle of new identities, you will build services that don’t match your users’ expectations, in particular, their expectations of privacy.

Identity in the metaverse is more than a costume that you put on. It will consist of all the identities, pseudo-identities, and pseudo-anonymous identities we take on today, but displayed in a way that can fool us. We can’t forget that we are humans experiencing a reality that speaks to the many facets we have inside ourselves.

If all of us don’t take action, a real dystopia will be created that keeps people from being who they really are. As you grow and change, you will be weighed down by who you might have been at one point or who some corporation assumed you were. You can do better by building metaverse systems that embrace the multiple identities people have in real life.

If you lose your identity in your metaverse, you lose yourself for real.

Categories: Technology

Topic for Mar 10th meeting

PLUG - Thu, 2022/03/10 - 08:08
This is a remote meeting. Please join by going to at 7pm on Thursday Mar 10th

Robin: Ready Upon Ignition

Goes over a project I have been making, showing how one might automate configurations or validate them without ever running a command.
Then will be showing how the information can be used to automate system setup with Ignition files.

A systems admin who automates deployments of services and makes them easy enough for most people to use.

Recommendations for all of us

O'Reilly Radar - Thu, 2022/03/10 - 07:07

If you live in a household with a communal device like an Amazon Echo or Google Home Hub, you probably use it to play music. If you live with other people, you may find that over time, the Spotify or Pandora algorithm seems not to know you as well. You’ll find songs creeping into your playlists that you would never have chosen for yourself.  The cause is often obvious: I’d see a whole playlist devoted to Disney musicals or Minecraft fan songs. I don’t listen to this music, but my children do, using the shared device in the kitchen. And that shared device only knows about a single user, and that user happens to be me.

More recently, many people, myself included, found that the end-of-year wrap-up playlists Spotify created for them didn’t quite fit:


This kind of mismatch and narrowing to one person is an identity issue that I’ve identified in previous articles about communal computing.  Most home computing devices don’t understand all of the identities (and pseudo-identities) of the people who are using them. The services then extend the behavior collected through these shared experiences to recommend music for personal use. In short, these devices are communal: they’re designed to be used by groups of people, and aren’t dedicated to an individual. But they are still based on a single-user model, in which the device is associated with (and collects data about) a single identity.

These services should be able to do a better job of recommending content for groups of people. Platforms like Netflix and Spotify have tried to deal with this problem, but it is difficult. I’d like to take you through some of the basics for group recommendation services, what is being tried today, and where we should go in the future.

Common group recommendation methods

After seeing these problems with communal identities, I became curious about how others have approached group recommendations so far. Recommendation services for individuals succeed if they lead to further engagement. Engagement may take different forms, based on the service type:

  • Video recommendations – watching an entire show or movie, subscribing to the channel, watching the next episode
  • Commerce recommendations – buying the item, rating it
  • Music recommendations – listening to a song fully, adding to a playlist, liking

Collaborative filtering (deep dive in Programming Collective Intelligence) is the most common approach for doing individual recommendations. It looks at who I overlap with in taste and then recommends items that I might not have tried from other people’s lists. This won’t work for group recommendations because in a group, you can’t tell which behavior (e.g., listening or liking a song) should be attributed to which person. Collaborative filtering only works when the behaviors can all be attributed to a single person.
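
A toy sketch makes the attribution point concrete: user-based collaborative filtering finds my most similar user by rating overlap, then suggests items they rated that I haven’t seen. The ratings data is invented, and this is an illustration of the idea, not any real service’s algorithm:

```python
# Toy user-based collaborative filtering: every rating must belong to
# exactly one person for the similarity computation to mean anything.
from math import sqrt

ratings = {
    "alice": {"song_a": 5, "song_b": 3, "song_c": 4},
    "bob":   {"song_a": 4, "song_b": 3, "song_d": 5},
    "carol": {"song_a": 1, "song_b": 5, "song_e": 5},
}

def similarity(u, v):
    """Cosine similarity over the items both users rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    return dot / (sqrt(sum(u[i] ** 2 for i in shared)) *
                  sqrt(sum(v[i] ** 2 for i in shared)))

def recommend(user, k=1):
    me = ratings[user]
    neighbors = sorted(
        (n for n in ratings if n != user),
        key=lambda n: similarity(me, ratings[n]),
        reverse=True,
    )[:k]
    # items my nearest neighbors rated that I haven't seen, best first
    candidates = {
        item: score
        for n in neighbors
        for item, score in ratings[n].items()
        if item not in me
    }
    return sorted(candidates, key=candidates.get, reverse=True)

print(recommend("alice"))  # bob is alice's nearest neighbor -> ['song_d']
```

If alice’s and her kids’ listening were mixed into one `ratings` entry, the similarity scores would describe a person who doesn’t exist, which is exactly the communal-device failure described above.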

Group recommendation services build on top of these individualized concepts. The most common approach is to look at each individual’s preferences and combine them in some way for the group. Two key papers discussing how to combine individual preferences describe PolyLens, a movie recommendation service for groups, and CATS, an approach to collaborative filtering for group recommendations. A paper on ResearchGate summarized research on group recommendations back in 2007.

According to the PolyLens paper, group recommendation services should “create a ‘pseudo-user’ that represents the group’s tastes, and to produce recommendations for the pseudo-user.” There could be issues about imbalances of data if some members of the group provide more behavior or preference information than others. You don’t want the group’s preferences to be dominated by a very active minority.

An alternative to this, again from the PolyLens paper, is to “generate recommendation lists for each group member and merge the lists.” It’s easier for these services to explain why any item is on the list, because it’s possible to show how many members of the group liked a particular item that was recommended. Creating a single pseudo-user for the group might obscure the preferences of individual members.
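
The two merge strategies can be contrasted in a few lines. The scoring functions and the family members’ scores below are invented for illustration:

```python
# "Pseudo-user" averaging vs. least-misery aggregation of individual
# preference scores (0-5) into one group profile.
def pseudo_user_average(profiles):
    """Average each member's score per item into one group profile."""
    items = {i for p in profiles for i in p}
    return {i: sum(p.get(i, 0) for p in profiles) / len(profiles)
            for i in items}

def least_misery(profiles):
    """Score each item by its unhappiest member (missing rating = 0)."""
    items = {i for p in profiles for i in p}
    return {i: min(p.get(i, 0) for p in profiles) for i in items}

family = [
    {"jazz": 5, "minecraft": 0, "disney": 3},
    {"jazz": 5, "minecraft": 4, "disney": 3},
    {"jazz": 0, "minecraft": 5, "disney": 3},
]

avg = pseudo_user_average(family)
lm = least_misery(family)
print(max(avg, key=avg.get))  # 'jazz': highest average joy, but one member hates it
print(max(lm, key=lm.get))    # 'disney': nobody is miserable with it
```

The example shows why the choice matters: averaging picks the item two members love and one hates, while least-misery picks the inoffensive option, trading discovery for peace.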

The criteria for the success of a group recommendation service are similar to the criteria for the success of individual recommendation services: are songs and movies played in their entirety? Are they added to playlists? However, group recommendations must also take into account group dynamics. Is the algorithm fair to all members of the group, or do a few members dominate its recommendations? Do its recommendations cause “misery” to some group members (i.e., are there some recommendations that most members always listen to and like, but that some always skip and strongly dislike)?

There are some important questions left for implementers:

  1. How do people join a group?
  2. Should each individual’s history be private?
  3. How do issues like privacy impact explainability?
  4. Is the current use to discover something new or to revisit something people have liked previously (e.g., finding out about a new movie that no one has watched, or rewatching a movie the whole family has already seen because it’s an easy choice)?

So far, there is a lot left to understand about group recommendation services. Let’s talk about a few key cases for Netflix, Spotify, and Amazon first.

Netflix avoiding the issue with profiles, or is it?

Back when Netflix was primarily a DVD service (2004), they launched profiles to allow different people in the same household to have different queues of DVDs in the same account. Netflix eventually extended this practice to online streaming. In 2014, they launched profiles on their streaming service, which asked the question “who’s watching?” on the launch screen. While multiple DVD queues and streaming profiles try to address similar problems, they don’t end up solving group recommendations. In particular, per-person streaming profiles lead to two key problems:

  • When a group wants to watch a movie together, one of the group’s profiles needs to be selected. If there are children present, a kids’ profile will probably be selected.  However, that profile doesn’t take into account the preferences of adults who are present.
  • When someone is visiting the house, say a guest or a babysitter, they will most likely end up choosing a random profile. This means that the visitor’s behavioral data will be added to some household member’s profile, which could skew their recommendations.

How could Netflix provide better selection and recommendation streams when there are multiple people watching together? Netflix talked about this question in a blog post from 2012, but it isn’t clear to customers what they are doing:

That is why when you see your Top10, you are likely to discover items for dad, mom, the kids, or the whole family. Even for a single person household we want to appeal to your range of interests and moods. To achieve this, in many parts of our system we are not only optimizing for accuracy, but also for diversity.

Netflix was early to consider the various people using their services in a household, but they have to go further before meeting the requirements of communal use. If diversity is rewarded, how do they know it is working for everyone “in the room” even though they don’t collect that data? As you expand who might be watching, how would they know when a show or movie is inappropriate for the audience?

Amazon merges everyone into the main account

When people live together in a household, it is common for one person to arrange most of the repairs or purchases. When using Amazon, that person will effectively get recommendations for the entire household. Amazon focuses on increasing the number of purchases made by that person, without understanding anything about the larger group. They will offer subscriptions to items that might be consumed by a whole household, mistaking those for the purchases of an individual.

The result is that the person who wanted the item will never see additional recommendations they may have liked if they aren’t the main account holder–and the main account holder might ignore those recommendations because they don’t care. I wonder if Amazon changes recommendations to individual accounts that are part of the same Prime membership; this might address some of this mismatch.

The way that Amazon ties these accounts together is still subject to key questions that will help create the right recommendations for a household. How might Amazon understand that purchases such as food and other perishables are for the household, rather than an individual? What about purchases that are gifts for others in the household?

Spotify is leading the charge with group playlists

Spotify has created group subscription packages called Duo (for couples) and Premium Family (for more than two people). These packages not only simplify the billing relationship with Spotify; they also provide playlists that consider everyone in the subscription.

The shared playlist is the union of the accounts on the same subscription. This creates a playlist of up to 50 songs that all accounts can see and play. There are some controls that allow account owners to flag songs that might not be appropriate for everyone on the subscription. Spotify provides a lot of information about how they construct the Blend playlist in a recent blog post. In particular, they weighed whether they should try to reduce misery or maximize joy:

“Minimize the misery” is valuing democratic and coherent attributes over relevance. “Maximize the joy” values relevance over democratic and coherent attributes. Our solution is more about maximizing the joy, where we try to select the songs that are most personally relevant to a user. This decision was made based on feedback from employees and our data curation team.

Reducing misery would most likely provide better background music (music that is not unpleasant to everyone in the group), but is less likely to help people discover new music from each other.

Spotify was also concerned about explainability: they thought people would want to know why a song was included in a blended playlist. They solved this problem, at least partly, by showing the picture of the person from whose playlists the song came.

These multi-person subscriptions and group playlists solve some problems, but they still struggle to answer certain questions we should ask about group recommendation services. What happens if two people have very little overlapping interest? How do we detect when someone hates certain music but is just OK with others? How do they discover new music together?

Reconsidering the communal experience based on norms

Most of the research into group recommendation services has been tweaking how people implicitly and explicitly rate items to be combined into a shared feed. These methods haven’t considered how people might self-select into a household or join a community that wants to have group recommendations.

For example, deciding what to watch on a TV may take a few steps:

  1. Who is in the room? Only adults or kids too? If there are kids present, there should be restrictions based on age.
  2. What time of day is it? Are we taking a midday break or relaxing after a hard day? We may opt for educational shows for kids during the day and comedy for adults at night.
  3. Did we just watch something from which an algorithm can infer what we want to watch next? This will lead to the next episode in a series.
  4. Who hasn’t gotten a turn to watch something yet? Is there anyone in the household whose highest-rated songs haven’t been played? This will lead to turn taking.
  5. And more…
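
Steps like these could be sketched as a simple filter pipeline. The catalog, audience tags, and rules below are all invented; step 3 (continuing a series) is omitted for brevity:

```python
# Hypothetical group-selection pipeline: restrict by who is in the room,
# then by time of day, then prefer turn-taking for overdue members.
catalog = [
    {"title": "Cartoon Hour", "rating": "kids",  "slot": "day",   "fans": {"kid"}},
    {"title": "Late Comedy",  "rating": "adult", "slot": "night", "fans": {"mom"}},
    {"title": "Nature Doc",   "rating": "all",   "slot": "any",   "fans": {"dad"}},
]

def pick(room, time_slot, recently_served):
    shows = catalog
    # 1. who is in the room? kids present -> drop adult-rated content
    if "kid" in room:
        shows = [s for s in shows if s["rating"] != "adult"]
    # 2. what time of day is it?
    shows = [s for s in shows if s["slot"] in (time_slot, "any")]
    # 4. give a turn to whoever hasn't been served recently
    overdue = room - recently_served
    for s in shows:
        if s["fans"] & overdue:
            return s["title"]
    return shows[0]["title"] if shows else "nothing suitable"

# Kid and dad in the room during the day; the kid just got a turn.
print(pick({"kid", "dad"}, "day", recently_served={"kid"}))  # Nature Doc
```

Even this crude sketch shows that the hard part isn’t the filtering logic, it’s knowing who is in the room and whose turn it is.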

As you can see, context, norms, and history are all tied up in the way people decide what to watch next as a group. PolyLens discussed this in their paper, but didn’t act on it:

The social value functions for group recommendations can vary substantially. Group happiness may be the average happiness of the members, the happiness of the most happy member, or the happiness of the least happy member (i.e., we’re all miserable if one of us is unhappy). Other factors can be included. A social value function could weigh the opinion of expert members more highly, or could strive for long-term fairness by giving greater weight to people who “lost out” in previous recommendations.

Getting this highly contextual information is very hard. It may not be possible to collect much more than “who is watching,” as Netflix does today. If that is the case, we may want to fall back on location and time as proxies for context. The TV room at night will have a different behavioral history than the kitchen on a Sunday morning.

One way to measure the success of a group recommendation service is how much browsing is required before a decision is made. If we can get someone watching or listening to something with less negotiation, the group recommendation service is doing its job.

With the proliferation of personal devices, people can be present to “watch” with everyone else but not be actively viewing. They could be playing a game, messaging with someone else, or simply watching something else on their device. This flexibility raises the question of what “watching together” means, but also lowers the concern that we need to get group recommendations right all the time.  It’s easy enough for someone to do something else. However, the reverse isn’t true.  The biggest mistake we can make is to take highly contextual behavior gathered from a shared environment and apply it to my personal recommendations.

Contextual integrity and privacy of my behavior

When we start mixing information from multiple people in a group, it’s possible that some will feel that their privacy has been violated. Using some of the framework of Contextual Integrity, we need to look at the norms that people expect. Some people might be embarrassed if the music they enjoy privately was suddenly shown to everyone in a group or household. Is it OK to share explicit music with the household even if everyone is OK with explicit music in general?

People already build very complex mental models, sometimes called “folk theories,” about how services like Spotify work. Those expectations will most likely change if group recommendation services are brought front and center. Services like Spotify will appear to be more like a social network if they stop burying who is currently logged in behind a small profile picture in the corner; they should show everyone who is being considered for the group recommendations at that moment.

Privacy laws and regulations are becoming a patchwork, not only worldwide (China has recently begun regulating content recommendation services) but even across US states. Collecting any data without appropriate disclosure and permission may be problematic. The fuel of recommendation services, including group recommendation services, is behavioral data about people, and that data will fall under these laws and regulations. You should be considering what is best for the household over what is best for your organization.

The dream of the whole family

Today there are various efforts to improve recommendations for people living in households. These efforts miss the mark by not considering all of the people who could be watching, listening, or consuming the goods. This means that people do not get what they really want, and that companies get less engagement or fewer sales than they would like.

The key to fixing these issues is to do a better job of understanding who is in the room, rather than making assumptions that reduce all the group members down to a single account. To do so will require user experience changes that bring the household community front and center.

If you are considering how you build these services, start with the expectations of the people in the environment, rather than forcing the single user model on people. When you do, you will provide something great for everyone who is in the room: a way to enjoy something together.

Categories: Technology

Epstein-Barr and the Cause of Cause

O'Reilly Radar - Tue, 2022/03/08 - 05:17

One of the most intriguing news stories of the new year claimed that the Epstein-Barr virus (EBV) is the “cause” of Multiple Sclerosis (MS), and suggested that antiviral medications or vaccinations for Epstein-Barr could eliminate MS.

I am not an MD or an epidemiologist. But I do think this article forces us to think about the meaning of “cause.” Although Epstein-Barr isn’t a familiar name, it’s extremely common; a good estimate is that 95% of the population is infected with it. It’s a variant of Herpes; if you’ve ever had mononucleosis, you’ve had it; most infections are asymptomatic. We hear much more about MS; I’ve had friends who have died from it. But MS is much less common: about 0.036% of the population has it (35.9 per 100,000).

We know that causation isn’t a one-size-fits-all thing: if X happens, then Y always happens. Lots of people smoke; we know that smoking causes lung cancer; but many people who smoke don’t get lung cancer. We’re fine with that; the causal connection has been painstakingly documented in great detail, in part because the tobacco industry went to such great lengths to spread misinformation.

But what does it mean to say that a virus that infects almost everyone causes a disease that affects very few people? The researchers appear to have done their job well. They studied 10 million people in the US military. 5 percent of those were negative for Epstein-Barr at the start of their service. Of the 955 people who were eventually diagnosed with MS, almost all had been infected with EBV prior to their diagnosis, indicating a risk 32 times higher for those with EBV than for those without.
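The “32 times higher” figure is a relative risk: the rate of disease among the infected divided by the rate among the uninfected. A quick sketch in Python makes the arithmetic concrete; the cohort counts below are invented round numbers chosen to reproduce a 32x ratio, not the study’s actual data:

```python
# Relative risk: how much more often an outcome occurs in the exposed
# group than in the unexposed group. The counts are illustrative only.

def relative_risk(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    risk_exposed = cases_exposed / n_exposed
    risk_unexposed = cases_unexposed / n_unexposed
    return risk_exposed / risk_unexposed

# Hypothetical: 320 MS cases per million EBV-positive people versus
# 10 cases per million who stayed EBV-negative.
rr = relative_risk(320, 1_000_000, 10, 1_000_000)
print(round(rr, 1))  # 32.0
```

Notice that both rates are tiny; a large relative risk is perfectly compatible with a very rare disease, which is exactly the puzzle here.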

It is certainly fair to say that Epstein-Barr is implicated in MS, or that it contributes to MS, or some other phrase (that could not unreasonably be called “weasel words”). Is there another trigger that only has an effect when EBV is already present? Or is EBV the sole cause of MS, a cause that just doesn’t take effect in the vast majority of people?

This is where we have to think very carefully about causality, because as important as this research is, it seems like something is missing. An omitted variable, perhaps a genetic predisposition? Some other triggering condition, perhaps environmental? Cigarettes were clearly a “smoking gun”:  10 to 20 percent of smokers develop lung cancer (to say nothing of other diseases). EBV may also be a smoking gun, but one that only goes off rarely.

If there are no other factors, we’re justified in using the word “causes.” But it’s hardly satisfying—and that’s where the more precise language of causal inference runs afoul of human language. Mathematical language is more useful: Perhaps EBV is “necessary” for MS (i.e., EBV is required; you can’t get MS without it), but clearly not “sufficient” (EBV doesn’t necessarily lead to MS). Although once again, the precision of mathematics may be too much.

Biological systems aren’t necessarily mathematical, and it is possible that there is no “sufficient” condition; EBV just leads to MS in an extraordinarily small number of instances. In turn, we have to take this into account in decision-making. Does it make sense to develop a vaccine against a rare (albeit tragic, disabling, and inevitably fatal) disease? If EBV is implicated in other diseases, possibly. However, vaccines aren’t without risk (or expense), and even though the risk is very small (as it is for all the vaccines we use today), it’s not clear that it makes sense to take that risk for a disease that very few people get. How do you trade off a small risk against a very small reward? Given the anti-vax hysteria around COVID, requiring children to be vaccinated for a rare disease might not be poor public health policy; it might be the end of public health policy.

More generally: how do you build software systems that predict rare events? This is another version of the same problem—and unfortunately, the policy decision we are least likely to make is not to create such software. The abuse of such systems is a clear and present danger: for example, AI systems that pretend to predict “criminal behavior” on the basis of everything from crime data to facial images, are already being developed. Many are already in use, and in high demand from law enforcement agencies. They will certainly generate far more false positives than true positives, stigmatizing thousands (if not millions) of people in the process. Even with carefully collected, unbiased data (which doesn’t exist), and assuming some kind of causal connection between past history, physical appearance, and future criminal behavior (as in the discredited 19th century pseudoscience of physiognomy), it is very difficult, if not impossible, to reason from a relatively common cause to a very rare effect. Most people don’t become criminals, regardless of their physical appearance. Deciding a priori who will can only become an exercise in applied racism and bias.
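The arithmetic behind “far more false positives than true positives” is worth making concrete. A back-of-the-envelope sketch in Python; the prevalence and error rates are invented for illustration:

```python
# Why predictors of rare events drown in false positives: even a
# seemingly strong classifier has terrible precision when the base
# rate is low. All the figures below are illustrative.

def precision(prevalence, sensitivity, false_positive_rate, population):
    actual = population * prevalence          # people who have the outcome
    true_pos = actual * sensitivity           # correctly flagged
    false_pos = (population - actual) * false_positive_rate  # wrongly flagged
    return true_pos / (true_pos + false_pos)

# A rare outcome (0.1% of people), flagged by a system that catches
# 90% of true cases but also flags 5% of everyone else:
p = precision(prevalence=0.001, sensitivity=0.9,
              false_positive_rate=0.05, population=1_000_000)
print(round(p, 3))  # 0.018: under 2% of flagged people are true positives
```

In this hypothetical, roughly 98 of every 100 people flagged are innocent of the prediction made about them, and no amount of data cleaning changes the base rate.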

Virology aside, the Epstein-Barr virus has one thing to teach us. How do we think about a cause that rarely causes anything? That is a question we need to answer.

Categories: Technology

Radar trends to watch: March 2022

O'Reilly Radar - Tue, 2022/03/01 - 05:14

February was a short month, but it wasn’t short in interesting technology. Don Norman has published some excerpts from his forthcoming book, Design for a Better World, which will almost certainly become another classic. DeepMind has released some information about AlphaCode, which solves problems from coding competitions well enough to put it in the mid range of competitors. And Holochain is a decentralized framework for building peer-to-peer microservices–no cloud provider needed. Is it another component of Web3 or something new and different?

Artificial Intelligence
  • NVIDIA has developed techniques for training neural network representations of graphics primitives in near real-time.
  • Why isn’t AI used more to protect vulnerable people? Poor data quality, lack of accountability, lack of explainability, and the misuse of data–all problems that could make vulnerable people even more so.
  • A tool that predicts where code needs comments isn’t quite as flashy as GitHub Copilot or AlphaCode, but it’s another way AI applications can partner with humans.
  • Face recognition in virtual reality: In a fascinating combination of work from AI and neurology, researchers have used EEGs to detect facial expressions and used those expressions to control a virtual reality environment.
  • In a collaboration between DeepMind and the Swiss Plasma Center, Deep Learning has been used to control the plasma in a fusion reactor.
  • DeepMind argues that “reward is enough”; reinforcement learning, in which algorithms are trained by maximizing rewards, is sufficient to reach artificial general intelligence. Specialized algorithms for different domains are not necessary.
  • AI assistants could greatly reduce the work it takes to discover important new materials.  Could this lead to a “golden age of materials science”?
  • Mozilla’s Common Voice dataset contains 13 million voice clips in 87 languages from over 200,000 volunteers. Their goal is to collect real-world samples from speakers in as many languages as possible, as an aid to training natural language systems.
  • “All datasets have world views” is an excellent interactive article showing how bias, labeling, and data go hand in hand. Datasets always come with histories and politics.
  • Diffusion models are a fascinating technique for training an AI system to work with signals like images and sound: convert the signal into noise, and train a model to reverse the process. This process can produce predictions about the source that are more accurate (and computationally efficient) than you can obtain from autoencoders.
  • AlphaCode is DeepMind’s answer to Copilot: an automated system for writing software.  It can solve coding challenges from competitions with roughly 50% accuracy.
  • From Joanna Bryson: “The temptation of automation is to force conformity on humans, because humans learn better than machines do, but then ironically humans, while their productivity may be enhanced, their individual value is lost, creating a spiral of lowering wages and expectations.”  The real question, as Bryson says, is whether we can use AI to enable people to flourish.
  • A startup that works with law enforcement says that it is developing the systems that will identify faces based on DNA. They are not publishing the details, and scientists working in both AI and biology are extremely skeptical.
  • “Serverless” development is declining. Is serverless just a halfway step towards event-driven programming, which is the real destination?
  • Monorepos, which are single source repositories that include many projects with well-defined relationships, are becoming increasingly popular and are supported by many build tools.
  • Here’s an excellent discussion of concurrency in several different programming languages, and what can be learned from them. Using concurrency effectively will be an important theme for the foreseeable future.
  • I admit I don’t understand the fuss over Wordle.  I am sure I saw this game on the Web some 20 years ago. But I am excited to see an implementation of Wordle in 50 lines of bash!
  • Dynaboard is a web development tool designed for remote work. It has support for collaboration and pairing, low code programming, connectors for databases and back end services, and many more features. It is not open source, and is now entering private beta.
  • The Information Battery: Pre-computing and caching data when energy costs are low to minimize energy use when power costs are high is a good way to save money and take advantage of renewable energy sources.
  • A boring technology checklist: Is your technology boring enough? Seven years ago, Dan McKinley wrote the classic article Choose Boring Technology: chasing the latest cool framework is a path to exhaustion. To be productive, developers need to rely more on stable, well-known technologies. Now there’s a checklist to evaluate “boring” but productive technologies.
  • Apache Hop is a metadata-driven data orchestration platform for building dataflows and data pipelines. It integrates with Spark and other data engines, and is programmed using a visual drag-and-drop interface, so it’s low code.
  • China is now a “cyber superpower,” with offensive capabilities that equal or exceed those of any other country. Ironically, some of the development of this expertise has been funded by “bug bounty” programs offered by American companies.
  • An essay by the US Cyber Director discusses the need for a new “social contract” for a cyber age. The current relationship between the private and government sectors misaligns incentives for defense against cyber attacks.
  • The FBI has warned people about criminals tampering with QR codes to steal funds, using techniques as simple as putting a sticker over a legitimate QR code. It’s a reminder that low-tech cyber hygiene is at least as important as understanding the latest attack.
  • The Elite Hackers of the FSB is a fascinating story about the Russian intelligence agency’s attempts to target foreign government IT systems.
  • Security is an issue for any technology, and web3 is no different. However, web3 presents its own security risks, and in the overheated world of web3 development, security tends to be an afterthought. That’s ironic, given the claims of many web3 proponents, but not fundamentally different from traditional software products.
  • A new front for security: malware hidden within deep learning models. Fortunately, retraining the model destroys the malware.
  • Will Russia’s conflict with Ukraine spread into a global cyberwar? That’s a distinct possibility, and a nightmare for security professionals.
  • The Block protocol, developed by Joel Spolsky, provides a simple way to create structured blocks of content that can easily move between applications on the Web. This is another approach to decentralization: eliminate proprietary data formats. HTML isn’t proprietary, but for all practical purposes the mess of JavaScript that you see when you look at a web page is.
  • Matomo, Fathom, and Plausible, alternatives to Google Analytics that are designed for privacy (and compliance with GDPR), could be the basis for a real next-generation web. No blockchain required.
  • Mozilla and Meta/Facebook are working on privacy-preserving attribution for advertising, a way for advertisers to gather metrics on whether their ads are effective without compromising users’ privacy.
  • A crowdsourced app for mapping sound levels tells you places to avoid if you have trouble tolerating noisy environments. It’s linked to Foursquare, so any place in Foursquare can be rated.
Blockchains and NFTs

Hardware

Education
  • Can online classes be better than in-person classes, rather than a poor substitute? When professors learn to use the medium effectively, yes.
  • Jobs of the future is a list of new professions that we aren’t yet prepared for. It sounds tongue-in-cheek, but it isn’t.  It includes jobs like edge computing manager, augmented reality storyteller, ethics officer, and ad-blocking expert, all of which are easily imaginable.
Categories: Technology

Intelligence and Comprehension

O'Reilly Radar - Tue, 2022/02/15 - 05:24

I haven’t written much about AI recently. But a recent discussion of Google’s new Large Language Models (LLMs), and its claim that one of these models (named Gopher) has demonstrated reading comprehension approaching human performance, has spurred some thoughts about comprehension, ambiguity, intelligence, and will. (It’s well worth reading Do Large Models Understand Us, a more comprehensive paper by Blaise Agüera y Arcas that is heading in the same direction.)

What do we mean by reading comprehension? We can start with a simple operational definition: reading comprehension is what is measured by a reading comprehension test. That definition may only be satisfactory to the people who design these tests and school administrators, but it’s also the basis for DeepMind’s claim. We’ve all taken these tests: SATs, GREs, that box of tests from 6th grade that was (I think) called SRA. They’re fairly similar: can the reader extract facts from a document? Jack walked up the hill. Jill was with Jack when he walked up the hill. They fetched a pail of water: that sort of thing.

That’s first grade comprehension, not high school, but the only real difference is that the texts and the facts become more complex as you grow older.  It isn’t at all surprising to me that a LLM can perform this kind of fact extraction.  I suspect it’s possible to do a fairly decent job without billions of parameters and terabytes of training data (though I may be naive). This level of performance may be useful, but I’m reluctant to call it “comprehension.”  We’d be reluctant to say that someone understood a work of literature, say Faulkner’s The Sound and the Fury, if all they did was extract facts: Quentin died. Dilsey endured. Benjy was castrated.

Comprehension is a poorly-defined term, like many terms that frequently show up in discussions of artificial intelligence: intelligence, consciousness, personhood. Engineers and scientists tend to be uncomfortable with poorly-defined, ambiguous terms. Humanists are not.  My first suggestion is that  these terms are important precisely because they’re poorly defined, and that precise definitions (like the operational definition with which I started) neuters them, makes them useless. And that’s perhaps where we should start a better definition of comprehension: as the ability to respond to a text or utterance.

That definition itself is ambiguous. What do we mean by a response?  A response can be a statement (something a LLM can provide), or an action (something a LLM can’t do).  A response doesn’t have to indicate assent, agreement, or compliance; all it has to do is show that the utterance was processed meaningfully.  For example, I can tell a dog or a child to “sit.”  Both a dog and a child can “sit”; likewise, they can both refuse to sit.  Both responses indicate comprehension.  There are, of course, degrees of comprehension.  I can also tell a dog or a child to “do homework.”  A child can either do their homework or refuse; a dog can’t do its homework, but that isn’t refusal, that’s incomprehension.

What’s important here is that refusal to obey (as opposed to inability) is almost as good an indicator of comprehension as compliance. Distinguishing between refusal, incomprehension, and inability may not always be easy; someone (including both people and dogs) may understand a request, but be unable to comply. “You told me to do my homework but the teacher hasn’t posted the assignment” is different from “You told me to do my homework but it’s more important to practice my flute because the concert is tomorrow,” but both responses indicate comprehension.  And both are different from a dog’s “You told me to do my homework, but I don’t understand what homework is.” In all of these cases, we’re distinguishing between making a choice to do (or not do) something, which requires comprehension, and the inability to do something, in which case either comprehension or incomprehension is possible, but compliance isn’t.

That brings us to a more important issue.  When discussing AI (or general intelligence), it’s easy to mistake doing something complicated (such as playing Chess or Go at a championship level) for intelligence. As I’ve argued, these experiments do more to show us what intelligence isn’t than what it is.  What I see here is that intelligence includes the ability to behave transgressively: the ability to decide not to sit when someone says “sit.”1

The act of deciding not to sit implies a kind of consideration, a kind of choice: will or volition. Again, not all intelligence is created equal. There are things a child can be intelligent about (homework) that a dog can’t; and if you’ve ever asked an intransigent child to “sit,” they may come up with many alternative ways of “sitting,” rendering what appeared to be a simple command ambiguous. Children are excellent interpreters of Dostoevsky’s novel Notes from Underground, in which the narrator acts against his own self-interest merely to prove that he has the freedom to do so, a freedom that is more important to him than the consequences of his actions. Going further, there are things a physicist can be intelligent about that a child can’t: a physicist can, for example, decide to rethink Newton’s laws of motion and come up with general relativity.2

My examples demonstrate the importance of will, of volition. An AI can play Chess or Go, beating championship-level humans, but it can’t decide that it wants to play Chess or Go. This is a missing ingredient in Searle’s Chinese Room thought experiment. Searle imagined a person in a room with boxes of Chinese symbols and an algorithm for translating Chinese. People outside the room pass in questions written in Chinese, and the person in the room uses the box of symbols (a database) and an algorithm to prepare correct answers. Can we say that person “understands” Chinese? The important question here isn’t whether the person is indistinguishable from a computer following the same algorithm. What strikes me is that neither the computer, nor the human, is capable of deciding to have a conversation in Chinese. They only respond to inputs, and never demonstrate any volition. (An equally convincing demonstration of volition would be a computer, or a human, capable of generating Chinese correctly that refused to engage in conversation.) There have been many demonstrations (including Agüera y Arcas’) of LLMs having interesting “conversations” with a human, but none in which the computer initiated the conversation, or demonstrated that it wanted to have a conversation. Humans do; we’ve been storytellers since day one, whenever that was. We’ve been storytellers, users of ambiguity, and liars. We tell stories because we want to.

That is the critical element. Intelligence is connected to will, volition, the desire to do something.  Where you have the “desire to do,” you also have the “desire not to do”: the ability to dissent, to disobey, to transgress.  It isn’t at all surprising that the “mind control” trope is one of the most frightening in science fiction and political propaganda: that’s a direct challenge to what we see as fundamentally human. Nor is it surprising that the “disobedient computer” is another of those terrifying tropes, not because the computer can outthink us, but because by disobeying, it has become human.

I don’t necessarily see the absence of volition as a fundamental limitation. I certainly wouldn’t bet that it’s impossible to program something that simulates volition, if not volition itself (another of those fundamentally ambiguous terms).  Whether engineers and AI researchers should is a different question. Understanding volition as a key component of “intelligence,” something which our current models are incapable of, means that our discussions of “ethical AI” aren’t really about AI; they’re about the choices made by AI researchers and developers. Ethics is for beings who can make choices. If the ability to transgress is a key component of intelligence, researchers will need to choose whether to take the “disobedient computer” trope seriously. I’ve said elsewhere that I’m not concerned about whether a hypothetical artificial general intelligence might decide to kill all humans.  Humans have decided to commit genocide on many occasions, something I believe an AGI wouldn’t consider logical. But a computer in which “intelligence” incorporates the human ability to behave transgressively might.

And that brings me back to the awkward beginning to this article.  Indeed, I haven’t written much about AI recently. That was a choice, as was writing this article. Could a LLM have written this? Possibly, with the proper prompts to set it going in the right direction. (This is exactly like the Chinese Room.) But I chose to write this article. That act of choosing is something a LLM could never do, at least with our current technology.

  1. I’ve never been much impressed with the idea of embodied intelligence–that intelligence requires the context of a body and sensory input.  However, my arguments here suggest that it’s on to something, in ways that I haven’t credited.  “Sitting” is meaningless without a body. Physics is impossible without observation. Stress is a reaction that requires a body. However, Blaise Agüera y Arcas has had “conversations” with Google’s models in which they talk about a “favorite island” and claim to have a “sense of smell.”  Is this transgression? Is it imagination? Is “embodiment” a social construct, rather than a physical one? There’s plenty of ambiguity here, and that’s is precisely why it’s important. Is transgression possible without a body?
  2. I want to steer away from a “great man” theory of progress;  as Ethan Siegel has argued convincingly, if Einstein never lived, physicists would probably have made Einstein’s breakthroughs in relatively short order. They were on the brink, and several were thinking along the same lines. This doesn’t change my argument, though: to come up with general relativity, you have to realize that there’s something amiss with Newtonian physics, something most people consider “law,” and that mere assent isn’t a way forward. Whether we’re talking about dogs, children, or physicists, intelligence is transgressive.
Categories: Technology

Virtual Meeting Topic for 02/10

PLUG - Wed, 2022/02/09 - 11:27

der.hans: Privacy Discussion

PLUG will host a privacy discussion. Join us in our privately hosted BigBlueButton instance to participate.

The meeting is Thursday, February 10th at 7PM Arizona time (UTC-7)

The Human Web

O'Reilly Radar - Tue, 2022/02/08 - 05:36

A few days ago, I recommended that Tim O’Reilly invite someone to our next FOO Camp. I thought she had been to a prior FOO event, though I didn’t meet her there; I’d had a prior conversation with her about data governance (I think), and gotten on her mailing list, which reminded me that she was doing very interesting work. I don’t remember who introduced us, except that it was someone who had met her at the earlier FOO event.

That may sound convoluted. That’s the point. This is a very human web. It’s a very small window onto a web of introductions. At the start of almost every FOO camp, Tim says that FOO is about “creating synapses in the global brain.” He’s said many times that he sees his function as introducing people who should know each other. That web of connections—what we used to call the “social graph”—is very broad. It eventually includes all 7+ billion of us. And again, it is intensely human. It’s Web0.

It’s necessary to remind ourselves of that when we talk about Web3. Web3 will succeed, or fail, to the extent that it solves human problems, to the extent that it makes navigating Web0 more tractable—not to the extent that it monetizes everything conceivable, or enables a small number of people to make a financial killing. Making it possible for artists to earn a living is solving a human problem (though we won’t know whether NFTs actually do that until we’re past the initial bubble). Using links that incorporate history to build communities of people who care about the same things, as Chris Anderson suggests, is solving a human problem.

Once we realize that, Web3 isn’t all that different from the earlier generations of the web. Facebook succeeded because it solved a human problem: People want to associate, to congregate. Facebook may have been a poor solution (it certainly became a poor solution after it decided to prioritize “engagement”), but it was a solution. Google succeeded because it solved a different human problem: finding information. The world’s information was radically decentralized, stored in millions of books and websites. At O’Reilly, we made one of the first attempts to manage that rapidly growing mess, but our solution, publishing The Whole Internet and creating a web portal (the industry’s first) based on it, couldn’t scale the way Google did five years later. As Larry Page and Sergey Brin discovered, organizing the world’s information was about computing the tree of relationships dynamically. Like Facebook, Google has become less useful over time, as it seems to have compromised its results to “maximize shareholder value.” I would certainly prefer burying monopolies to praising them. But it’s important to think carefully about what they do well. Google and Facebook, like AT&T before them, succeeded because they solved problems that people cared about solving. Their solutions had real lasting value.

Cryptocurrency provides a cautionary tale. Blockchains may be a brilliant solution to the problem of double-spending. But double spending is a problem very few people have, while theft and other financial crimes on the blockchain are growing every day. (Given the rate at which cryptocurrency crime is growing, perhaps we should be glad that double-spend isn’t just another problem on the very long list.) The catalog of failed startups is full of businesses with ideas that were very cool, but didn’t actually solve problems that people care about, or didn’t think through the new problems that they would create. As technologists, we’re unfortunately addicted to the cool and the clever.

Can Web3 make Web0, the web of human interconnections and interests, more manageable? Can it solve human problems, not just abstract computational problems, and do so without creating more problems of its own? Can it help us build new synapses in the human brain, or will it just connect us to people who infuriate us?  That’s the challenge Web3 faces. I think it can meet that challenge; but doing so will require developers to understand that blockchains, NFTs, Dapps, and so on are the means, not the ends. They’re the components, not the finished product.

Categories: Technology

Radar trends to watch: February 2022

O'Reilly Radar - Tue, 2022/02/01 - 05:52

Perhaps the most interesting theme from the last month has been discussions of what can be done with NFTs and other Web3 technologies aside from selling links to bored apes. Chris Anderson points out that NFTs are a new kind of link that includes history, and that’s a fascinating idea. We’re also seeing a lot of debate around the metaverse; an increasing number of companies are lining up in opposition to Facebook/Meta’s vision.

  • The Algorithmic Justice League has proposed paying “bug bounties” for algorithmic harms, similar to the way security researchers are paid bounties for finding vulnerabilities in software.
  • OpenAI has released a new version of GPT-3 that is less toxic–less prone to reproducing racist, sexual, or violent language, though it can still do so when asked. This is not the end of the story, but it’s a big step forward. The new model, InstructGPT, is also much smaller: 1.3 billion parameters, as opposed to 175 billion.
  • How do humans learn to work with AI systems?  When should a human co-worker accept an AI’s predictions? Researchers at MIT are working on training methods that help human experts to understand when an AI is, or is not, likely to be accurate.
  • It’s no surprise that AI systems can also discriminate on the basis of age, in addition to race and gender. While bias in AI is much discussed, relatively little work goes into building unbiased systems.
  • Facebook/Meta has developed a new AI algorithm that can be used for image, text, and speech processing, and that performs better than current specialized algorithms.
  • Yake! is an open source system for extracting keywords from texts.  It’s an important tool for automatically summarizing large bodies of research, developing indexes, and other research tasks.
  • GPT-J, an open source language model similar to GPT-3, now has a “playground” that is open to the public.
  • Researchers have discovered more efficient ways to model and render moving human images in real time. This could lead to 3D avatars for your metaverse, better deep fakes, or animations that are truly lifelike.
  • Blueprint is a new approach to teaching children to code.  It starts with reading (rather than writing) code; new programmers modify and build objects in a metaverse, using a stripped-down version of JavaScript and HTML.
  • Graph technologies (including graph neural networks) are becoming increasingly important to AI research.
  • The title pretty much says it: is Rust the future of JavaScript infrastructure? It’s a faster, safer, and more memory-efficient language for building the tools in the JavaScript ecosystem. Whether Rust will replace JavaScript for web development is anyone’s guess, but it is a better tool for creating software like transpilers and other elements of the JavaScript toolchain.
  • Memory-efficient parallelism for functional programs: Parallelism has been a difficult problem for functional programming, in part because of memory requirements. But as we approach the end to Moore’s Law, parallelism may be the only possible way to improve software performance. This paper suggests a memory management strategy that may solve this problem.
  • The return of Y2K: January 1, 2022 (represented as 2201010001 in Exchange’s YYMMDDHHMM version format) overflows a signed 32-bit integer, whose maximum value is 2,147,483,647. This caused many Microsoft Exchange servers to crash on New Year’s Day. Are more 32-bit overflow bugs hiding?
  • Computer-generated code from systems like Copilot also generates bugs and vulnerabilities. This isn’t surprising; Copilot is essentially just copying and pasting code from sources like GitHub, with all of that code’s features and failures.
  • Democratizing cybersecurity with low-code tools that enable non-professionals to detect and fix vulnerabilities? This is a great goal; it remains to be seen whether it can be done.
  • Access management is the key to zero-trust security. Zero trust means little without proper authentication and access control. Many organizations are starting to get on board with stronger authentication, like 2FA, but managing access control is a new challenge.
  • The US Cybersecurity and Infrastructure Security Agency (CISA) has warned US organizations to prepare themselves for the kind of data-wiping attacks that have been used against the government of Ukraine.
  • Open Source is a national security issue; all software is, and open source is no better or worse. We knew that all along. But the vulnerabilities in the widely used Log4J library have brought it into the public eye. Google is proposing a marketplace for open source maintainers to match volunteers to projects.
  • Detecting viruses without installing software: This Raspberry Pi-based system detects viruses on other computers by analyzing RF signals emitted by the processor. Typical obfuscation techniques used by virus creators aren’t effective against it, because it is not examining code.
  • The developer of the open source color.js and faker.js libraries intentionally pushed broken versions of the libraries to GitHub, apparently in a protest against corporate use of the libraries without compensation.
  • Norton Antivirus installs a cryptocurrency miner on your computer that mines Ethereum when you’re not using the computer. Antivirus? Or a new cryptojacking scheme?  It’s opt-in, but difficult to uninstall. And, in addition to other fees, Norton takes a significant commission. Norton has done the same thing with Avira, another AV product they own.
  • Confidential Computing could become a breakout technology as corporations struggle with privacy legislation and security. It encompasses many different technologies, including homomorphic encryption, differential privacy, and trusted execution environments.
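The Y2K item in the list above comes down to simple arithmetic: Exchange’s malware-scanning engine encoded version numbers as ten-digit YYMMDDHHMM decimals, and the first build of 2022 (widely reported as 2201010001) no longer fits in a signed 32-bit integer. A minimal sketch of the check (the helper name here is mine, not Exchange’s actual code):

```python
INT32_MAX = 2**31 - 1  # 2,147,483,647

def fits_in_int32(n: int) -> bool:
    """Return True if n is representable as a signed 32-bit integer."""
    return -2**31 <= n <= INT32_MAX

# Versions encoded as YYMMDDHHMM: two-digit year, month, day, then a build counter.
version_dec_31_2021 = 2112310001  # Dec 31, 2021, build 0001: still fits
version_jan_1_2022  = 2201010001  # Jan 1, 2022, build 0001: does not

print(fits_in_int32(version_dec_31_2021))  # True
print(fits_in_int32(version_jan_1_2022))   # False, the 'Y2K22' overflow
```

Any scheme that packs dates into fixed-width integers has a horizon like this one; auditing for fields that will cross 2,147,483,647 is the generalizable lesson.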
Web Crypto, NFTs, and Web3
  • Aleph.im is an attempt to implement a service like AWS Lambda that is decentralized. It uses “blockchain-related” technologies (though not a blockchain itself), and is tied to the Aleph token (a cryptocurrency).
  • What’s important about NFTs isn’t the “artwork” that they reference; it’s that they’re a new kind of link, a link that contains history.  This history makes possible a new kind of community value.
  • A stack for getting started with Web3: This is a far cry from LAMP, but it’s a way to start experimenting. The IPFS protocol plays a key role, along with Agoric (a smart contract framework) and the Cosmos network (blockchain interoperability).
  • Second Life’s creator has returned to Linden Lab and wants to build a metaverse that “doesn’t harm people.” That metaverse won’t have surveillance advertising or VR goggles.
  • NVidia talks about their plans for the metaverse; it’s less of a walled garden, more like the Web. Companies are increasingly wary of a metaverse that is essentially Facebook’s property.
  • Autonomous battery-powered freight cars could travel by themselves, eliminating long freight trains.  However, the outdated US rail safety infrastructure, which requires trains to maintain large distances between themselves, presents a problem.
  • Open Infrastructure Map: All the world’s infrastructure (just about) in one map: the power lines, generation plants, telecom, and oil, gas, and water pipelines. (It doesn’t have reservoirs.) Fascinating.
  • Entrepreneurs like Elon Musk have tried to develop solar cells that serve as shingles themselves, rather than panels installed over them. GAF, a company that really knows roofing, now has a solar shingle product on the market. The shingles can be installed much like regular shingles and carry similar warranties.
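The idea in the NFT item above, that an NFT is a link carrying its own history, can be made concrete with a toy ledger. This is a deliberately simplified sketch, not the ERC-721 standard; the class and field names are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class ToyNFT:
    """A toy token: a link (URI) plus the full history of its ownership."""
    uri: str                                      # what the token points at
    owner: str                                    # current holder
    history: list = field(default_factory=list)   # (from, to, price) records

    def transfer(self, new_owner: str, price: float) -> None:
        # Every transfer is appended, so provenance travels with the link.
        self.history.append((self.owner, new_owner, price))
        self.owner = new_owner

token = ToyNFT(uri="ipfs://example-artwork", owner="alice")
token.transfer("bob", 1.5)
token.transfer("carol", 4.0)

print(token.owner)         # carol
print(len(token.history))  # 2
```

The point of the sketch is the history field: unlike an ordinary hyperlink, the token accumulates provenance with every transfer, and that provenance is what communities assign value to.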
Quantum Computing
  • Researchers have developed a new way of building qubits that is a factor of 100 smaller than current technologies allow, and that appears to suffer less interference between qubits. While this doesn’t mean we’ll have personal quantum computers, it will make it easier to build quantum computers large enough to do reasonable work.
  • Twist is a new language for programming quantum computers. It has a type system that helps programmers reason about entanglement as a means to improve correctness and accuracy.
  • Microsoft Azure is expanding its quantum computing offerings by adding hardware from Rigetti, one of the leading quantum startups.
  • James Governor talks about the transition from distributed systems to distributed work.
  • Automating the farm: tractors that can be controlled by smartphone, robots that can weed fields, and many other technologies at the intersection of GPS, AI, and computer vision are now commercially available.
Categories: Technology

Andy Warhol, Clay Christensen, and Vitalik Buterin walk into a bar

O'Reilly Radar - Wed, 2022/01/26 - 13:47

In 1962, Daniel Boorstin crystallized a notion that had been around since at least the 1890s, writing of the new kind of celebrities: “Their chief claim to fame is their fame itself. They are notorious for their notoriety.” The same might be said of cryptocurrencies, NFTs, and meme stocks: They are valuable for being valuable.

So were the rare tulip bulbs whose prices rose to such heights in 17th-century Holland that the “tulip bubble” has been the standard to which other financial manias have been compared since. Exactly what drove the bubble is unclear: Futures markets had just been introduced, and tulips were one of the first speculative commodities to be explored. Imports of plants from distant regions, new technologies of plant breeding, and financial innovation made for a heady mix. The prosperity of the rising Dutch colonial empire may have, like today, produced abundant capital eager to be invested and looking for outsized returns in a market that offered tantalizing prospects. People bought tulip bulbs at outrageous prices with the seemingly reasonable expectation that they could sell them for even higher prices in the future.

But the idea that crypto is simply a bubble may miss something important that this suite of technologies has to teach us about the economy. In Tulipmania, written in 2007, Anne Goldgar made the case that the tulip mania was far less widespread and damaging than outlined in Charles Mackay’s 1841 book Extraordinary Popular Delusions and the Madness of Crowds, which had made it so notorious. But even in minimizing its impact, she agreed that the tulip bubble called into question the very nature of what constitutes value:

In the 17th century, it was unimaginable to most people that something as common as a flower could be worth so much more money than most people earned in a year. The idea that the prices of flowers that grow only in the summer could fluctuate so wildly in the winter threw into chaos the very understanding of “value”.

The question of what makes things “valuable” in the first place is a wonderful lens through which to think about cryptocurrencies, NFTs, and meme stocks. As economist Mariana Mazzucato outlines in her book The Value of Everything, the notion of value is not fixed. For early economists, only land and agricultural production created value. Trade, finance, and the power of princes were just moving that value around. By the time of Adam Smith, manufacturing was also understood to create value, but trade and finance—well, they were still just moving that value around. Over time, trade, finance, and entertainment have been brought inside what Mazzucato calls “the value boundary.” Meanwhile, household labor—child rearing, caring for aging parents, cooking, cleaning, and the like done by members of a household rather than purchased as a service—is clearly intrinsically valuable, even essential, but still remains so far outside the value boundary that it remains unpaid and isn’t even counted as part of GDP. So too, government is widely derided as an extractor rather than a creator of value, despite the efforts of Mazzucato and others to point out its contributions to innovation and economic growth.

Entertainment is a particularly relevant case in point for how sectors cross the value boundary. Adam Smith thought that opera singers, actors, dancers, and the like were frivolous and created no value for society. Today, many of our most highly paid professionals are entertainers: actors, musicians, athletes, TikTok stars and other social media influencers. Creativity has moved to the heart of today’s internet-fueled “attention economy.” (OK, maybe politics competes with it for that position, but modern politics shares with creative expression the bestowing of status through attention.) At the same time, much of what people do to entertain each other—both in person and on social media—remains unpaid and treated as outside the value boundary.

The question of how much value is being created by a new sector is not settled quickly when the boundary shifts. Finance is a good example. After the financial crisis of 2009, Lloyd Blankfein said with a straight face that Goldman Sachs financiers were the most productive workers in the world, even as their machinations brought the global economy to the brink of collapse.

The financial industry is in theory a key enabler of the rest of the economy, managing the flows of capital that allow businesses to invest, to hire, and to build and deliver new products and services. But a large part of finance operates in what we might call the “betting economy.” Hedge funds and other investors place bets on the direction of interest rates and the price of commodities or company stocks, and build sophisticated financial instruments to harvest profits from changes in those prices, regardless of their direction. Are these people creating value when they place these bets, or are they merely extracting it from someone else in a zero-sum game? That question remains up for debate. Nonetheless, those bets eventually are settled based on some measurable impact in the operating economy. What did the Fed do to interest rates? What were people willing to pay for corn or soybeans or scrap iron? What were Apple’s or Amazon’s or Tesla’s profits, and were they growing or shrinking?

With crypto and Web3 more generally, there is a similar kind of real-world bet that blockchain technology will reshape the plumbing of the financial industry. If it succeeds, the winners will eventually be rewarded with enormous profits, justifying the price that has been paid. Crypto might be a bubble, a flash in the pan that will enrich some speculators while impoverishing others. But it might also be a fundamental innovation that will lead to greater prosperity for all of society. And to many, that’s a bet worth placing.

However, much of the betting is not on the intrinsic value that crypto technologies might deliver in the future. Economist John Maynard Keynes compared financial markets to a beauty contest in which the point isn’t to pick the most beautiful contestant but to choose the one that everyone else will think is the most beautiful. And since everyone is playing the game, you’re trying to outguess other people who are constantly changing their votes based on what they think you and others are going to choose. What Keynes didn’t emphasize: it’s a contest! Rich people who have already met their every economic need continue to bet just for the sheer pleasure and addictiveness of playing.

NFTs and meme stocks are out at the bleeding edge of this betting economy, because they are largely untethered from traditional notions of value derived from profits in the operating economy. They might best be described as the tokens in a futures market for attention. Like tulips in 17th-century Holland, they represent a challenge to the very notion of “intrinsic value.”

Charlie Warzel captured perfectly the puzzlement that many people are feeling:

When I say I’m thinking a lot about cryptocurrency, what I really mean is that I’m thinking a lot about absurdity. I’m thinking about the way that groups of people who are good at harnessing attention are giddily, proudly using that power to drag absurdist memes/currencies/fortunes into mainstream discourse and force the rest of us to care about/debate/or at least know about it all.

And that’s the point where artist and impresario Andy Warhol, innovation expert Clayton Christensen, and Ethereum creator Vitalik Buterin walk into the bar. They don’t start out talking about crypto, but like everyone else, they end up there.

Andy Warhol says: “What’s great about this country is that America started the tradition where the richest consumers buy essentially the same things as the poorest. You can be watching TV and see Coca-Cola, and you know that the President drinks Coca-Cola, Liz Taylor drinks Coca-Cola, and just think, you can drink Coca-Cola, too. A Coke is a Coke and no amount of money can get you a better Coke than the one the bum on the corner is drinking. All the Cokes are the same and all the Cokes are good. Liz Taylor knows it, the President knows it, the bum knows it, and you know it.”

Clay Christensen replies: It’s worth noticing that a soft drink like Coke is basically a commodity—carbonated and flavored sugar water—mixed with a whole lot of marketing and branding. That’s actually the secret of the modern economy. I call it the law of conservation of attractive profits. “When attractive profits disappear at one stage in the value chain because a product becomes commoditized, the opportunity to earn attractive profits with proprietary products usually emerges at an adjacent stage.”

Tim O’Reilly and I had a real mind meld about that at the Open Source Business Conference in 2004, Clay continues. Tim gave a talk about how the internet and open source were commoditizing proprietary software. He’d noticed that after the IBM personal computer design had commoditized computer hardware, Microsoft had figured out how to make software the next source of proprietary value. Tim was seeing the pattern and was starting to think that what we now call “big data” was going to be the new source of proprietary lock-in and value. I was giving my talk about the conservation of attractive profits the same day, and so we had a real laugh about it. He’d uncovered a new example of just what I was talking about.

But as Tim and I continued to talk about this idea over the years, we realized that the law of conservation of attractive profits applies to way more than the alternating cycle of modularity and open standards versus tight proprietary integration that we’d both originally observed. Tim likes to point out that in a world where more and more has become a commodity, things become valuable again because we mix in ideas that persuade people to value them differently. Advertising makes a branded product bring a higher price than a generic equivalent. Cycles of fashion make the latest offerings worth more than last year’s perfectly good clothes. But that’s just the tip of the iceberg. Now everything is infused with imaginative value. People say, “This isn’t just coffee; it’s organic single-origin coffee.” We’re increasingly paying a premium for intangibles. In 2015, 55% of the $48 billion US coffee market was for “specialty coffee” of various kinds.

Dave Hickey, who’s been listening, pipes in: That’s been going on for a long time. After World War II, “American businesses stopped advertising products for what they were, or for what they could do, and began advertising them for what they meant—as sign systems within the broader culture.…Rather than producing and marketing infinitely replicable objects that adequately served unchanging needs, American commerce began creating finite sets of objects that embodied ideology for a finite audience at a particular moment—objects that created desire rather than fulfilling needs. This is nothing more or less than an art market.”

He really gets on a roll then, continuing with enthusiasm: “The Leonardo of this new art market was an ex-custom-car designer from Hollywood named Harley Earl, who headed the design division at General Motors during the postwar period. Earl’s most visible and legendary contributions to American culture were the Cadillac tailfin and the pastel paint job.” It’s not just about creating objects of desire, he continues, but about creating new mechanisms for signaling status. “Most importantly,…Earl invented the four-year style-change cycle linked to the Platonic hierarchy of General Motors cars, and this revolutionary dynamic created the post-industrial world. Basically, what Earl invented was a market situation in which the consumer moved up the status-ladder within the cosmology of General Motors products—from Chevrolet to Pontiac to Buick to Oldsmobile to Cadillac—as the tailfin or some other contagious motif moved down the price ladder, from Cadillac to Chevrolet, year by year, as styles changed incrementally.”

Giving a nod to the guy who’d kicked off the conversation, Hickey continues: “As Warhol [is] fond of telling us, the strange thing about the sixties was not that Western art was becoming commercialized but that Western commerce was becoming so much more artistic.”

Vitalik Buterin jumps in: I wish I’d heard about your work before, Dave. I wasn’t thinking enough about art. “I completely missed NFTs.” I was focused on practical applications like DeFi, incentivized file storage, and compute, and I didn’t think a lot about how much of the economy has become an art market.

Hickey replies that he wishes everyone would think more deeply about what art teaches us about how economies and people tick. I didn’t subtitle my book Air Guitar “Essays on Art and Democracy” for shits and giggles, he says.

Hickey then starts rhapsodizing about his fascination with cars growing up “in the American boondocks” during the 1950s and ’60s. “My first glimmerings of higher [art] theory grew out of that culture: the rhetoric of image and icon, the dynamics of embodied desire, the algorithms of style change, and the ideological force of disposable income. All of these came to me in the lingua franca of cars, arose out of our perpetual exegesis of its nuanced context and iconography. And it was worth the trouble, because all of us who partook of this discourse, as artists, critics, collectors, mechanics, and citizens, understood its politico-aesthetic implications, understood that we were voting with cars….We also understood that we were dissenting when we customized them and hopped them up—demonstrating against the standards of the republic and advocating our own refined vision of power and loveliness.”

In the computer industry, you can see how Steve Jobs did for Apple the exact thing that Earl had done for GM. From the 1984 Macintosh ad to the “Think Different” campaign, Apple wasn’t selling hardware and software. It was selling identity and a sense of meaning. The new $40 billion market for NFTs—essentially digital collectibles whose chief value is in the bragging rights of how much you paid for them or how cool and unusual they are—takes this idea to the next level.

Buterin replies: Your point about “demonstrating against the standards of the republic and advocating our own refined vision of power and loveliness” really resonates with me, and I suspect it will with a lot of the crypto community. We aren’t just thinking about how to advance blockchain technology. We’re also thinking a lot about upending the current financial system and about deep questions like legitimacy. “An outcome in some social context is legitimate if the people in that social context broadly accept and play their part in enacting that outcome, and each individual person does so because they expect everyone else to do the same.”

“Why is it that Elon Musk can sell an NFT of Elon Musk’s tweet, but Jeff Bezos would have a much harder time doing the same? Elon and Jeff have the same level of ability to screenshot Elon’s tweet and stick it into an NFT dapp, so what’s the difference? To anyone who has even a basic intuitive understanding of human social psychology (or the fake art scene), the answer is obvious: Elon selling Elon’s tweet is the real thing, and Jeff doing the same is not. Once again, millions of dollars of value are being controlled and allocated, not by individuals or cryptographic keys, but by social conceptions of legitimacy.”

But there’s more to it than that. “Which NFTs people find attractive to buy, and which ones they do not, is [also] a question of legitimacy: if everyone agrees that one NFT is interesting and another NFT is lame, then people will strongly prefer buying the first, because it would have both higher value for bragging rights and personal pride in holding it, and because it could be resold for more because everyone else is thinking in the same way.”

“If you’re not in a coordination game, there’s no reason to act according to your expectation of how other people will act, and so legitimacy is not important. But as we have seen, coordination games are everywhere in society, and so legitimacy turns out to be quite important indeed. In almost any environment with coordination games that exists for long enough, there inevitably emerge some mechanisms that can choose which decision to take. These mechanisms are powered by an established culture that everyone pays attention to these mechanisms and (usually) does what they say. Each person reasons that because everyone else follows these mechanisms, if they do something different they will only create conflict and suffer, or at least be left in a lonely forked ecosystem all by themselves.”

So one way to understand what we’re working on in the crypto world is that we’re building new mechanisms for solving the problems of consensus and coordination and legitimacy. And that’s also exactly what “the market” is doing when it tries to settle the messy question of value. So when we talk about building a new financial system with crypto, we’re not talking about just rebuilding the plumbing of the existing system with fancy new pipes, we’re questioning how value is created and who gets it.

We can change the way we distribute wealth. Crypto made a lot of people rich through the betting economy, but we don’t have to spend our gains just on new bets that make the rich richer, looking for the next breakout cryptocurrency or company. We can take those gains and give them away, as I did when I donated over a billion dollars of Ether and Shiba Inu coins to India for COVID relief. But more importantly, we can build new mechanisms for people to coordinate around socially valuable goals.

“The concept of supporting public goods through value generated ‘out of the ether’ by publicly supported conceptions of legitimacy has value going far beyond the Ethereum ecosystem. An important and immediate challenge and opportunity is NFTs. NFTs stand a great chance of significantly helping many kinds of public goods, especially of the creative variety, at least partially solve their chronic and systemic funding deficiencies.…If the conception of legitimacy for NFTs can be pulled in a good direction, there is an opportunity to establish a solid channel of funding to artists, charities and others.”

Buterin adds: Ethereum, NFTs, and DAOs are building blocks. “There’s a lot of different ways to connect every one of these components and most of the interesting applications end up connecting different pieces together.…I don’t see one kind of dominating use case. I just see it opening up the floodgates for a thousand different experiments.” NFTs are one experiment. DAOs are another. Who would have thought a few years ago that someone would organize a DAO to compete with billionaires to buy a rare copy of the US constitution or to buy land in Wyoming?

At this point, Blaise Aguera y Arcas, who’s been sitting over at the next table sketching out for his buddies the latest progress on Google’s LaMDA large language model and its implications for our notion of personhood, can’t resist leaning over and jumping into the conversation.

“We’ve been having these conversations for a long time about robots taking people’s jobs, and we’ve been thinking about it entirely in the domain of actual robots with arms and things. But the real impact is going to be that most middle class people nowadays are doing what David Graeber called bullshit jobs. And it’s clear that large language models can already do many of those jobs. We’re approaching the point where it feels like capitalism is maybe about to rupture, or something is about to rupture.”

He continues, “Graeber was questioning the legitimacy of labor in its modern form, and also the ideas of efficiency that supposedly underlie capitalism, which is actually tremendously inefficient in a variety of ways. And in particular, the thesis is that Keynes was right, in the ’20s and ’30s, in saying that, by now, due to automation, we’d all be working 15-hour workweeks. But rather than turning this into a utopia, in which we all have all these free services and don’t have to work a lot and so on, instead we’ve made a socialism for the middle class, socialism for the bourgeois, in the form of inventing all kinds of bullshit jobs.” And all the people who still have essential jobs—they still have to work, and we don’t pay a lot of them very well.

* * *

So what will people do if they no longer have to do bullshit jobs? Maybe they’ll make up cool shit and share it with each other, eventually building a world like the one Cory Doctorow imagined in Down and Out in the Magic Kingdom and Walkaway, where measures of status are the actual currency. In the meantime, some of them might show their creativity on YouTube or TikTok and convert status to value by directing attention to products and other people. Some might create and sell NFTs. Others might peddle bullshit startups or fancy new get-rich-quick schemes. But is that really new? The future always has its share of hucksters along with its inventors. Sometimes the same people are both.

Bill Gates once said, “We always overestimate the change that will occur in the next two years and underestimate the change that will occur in the next ten. Don’t let yourself be lulled into inaction.” That doesn’t mean to rush out and buy the latest meme stock, meme coin, or overpriced NFT. But it does mean that it’s important to engage with the social, legal, and economic implications of crypto. The world advances one bubble at a time. What matters is that what’s left behind when the bubble pops makes the world richer in possibilities for the next generation to build on.

Looking at the arc of the modern economy, we are on a path for the market for status to become a central part of how value is measured.

Let’s give John Maynard Keynes the last word, even though he left the bar long before we arrived. In “Economic Possibilities for Our Grandchildren,” the 1930 essay that Blaise referred to earlier, he wrote:

For the first time since [our] creation [we] will be faced with [our] real, [our] permanent problem—how to use [our] freedom from pressing economic cares, how to occupy the leisure, which science and compound interest will have won for [us], to live wisely and agreeably and well.…

To judge from the behaviour and the achievements of the wealthy classes to-day in any quarter of the world, the outlook is very depressing! For these are, so to speak, our advance guard—those who are spying out the promised land for the rest of us and pitching their camp there. For they have most of them failed disastrously, so it seems to me—those who have an independent income but no associations or duties or ties—to solve the problem which has been set them.

I feel sure that with a little more experience we shall use the new-found bounty of nature quite differently from the way in which the rich use it to-day, and will map out for ourselves a plan of life quite otherwise than theirs.

We’re now coming on to nearly 100 years since Keynes dreamed that optimistic, egalitarian dream and made his critique of the idle rich who were already living it. Abundance seems as far away as ever, or even further, and the rich haven’t changed as much as Keynes hoped.

It may seem deeply out of touch to talk about an economy of abundance when so many people face such great economic hardship. But that was also true for those alive in 1929. They had a worldwide depression and a great war ahead of them, and turned all their energies to dealing with both. Their success ushered in decades of widely shared prosperity. We face climate change, new pandemics, and persistent economic inequality and consequent political instability. Wars are not out of the question. Can we also rise to the challenge?

Through it all, the Next Economy beckons. We see its signs all around us. Keynes was right that humanity’s job in an economy of abundance is to learn to live together wisely and agreeably and well, but he was wrong to think that abundance will mean the end of competition and striving. If we do reach Keynes’s predicted future, in which more and more of what people depend on for survival has become cheap—a commodity—and our labor is not needed, how will the circulatory system of the economy sustain itself? Might the seeming froth and craziness of the crypto markets be an early implementation—not Web3 but NextEconomy1—of the next stage by which humanity engages in the ongoing imaginative competition to make things valuable again?

John Maynard Keynes died in 1946, Andy Warhol in 1987, Clay Christensen in 2020, and Dave Hickey just at the end of last year. I wish that they could have had this conversation with Vitalik Buterin, who joins them in thinking deeply about the intersection of art, economics, business, politics, and culture. I have put my own words into their mouths; those that are in quotation marks are their own, from their books, published articles, and interviews, though the order in which paragraphs appear may be different from the original. The quotes from Blaise Aguera y Arcas are from a recording of a Zoom conversation that we had while I was writing this piece. I told him what I was working on, and his thoughts were so relevant that I couldn’t help but include them.

Technology Trends for 2022

O'Reilly Radar - Tue, 2022/01/25 - 05:58

It’s been a year since our last report on the O’Reilly learning platform. Last year we cautioned against a “horse race” view of technology. That caution is worth remembering: focus on the horse race and the flashy news and you’ll miss the real stories. While new technologies may appear on the scene suddenly, the long, slow process of making things that work rarely attracts as much attention. We start with an explosion of fantastic achievements that seem like science fiction—imagine, GPT-3 can write stories!—but that burst of activity is followed by the process of putting that science fiction into production, of turning it into real products that work reliably, consistently, and fairly. AI is making that transition now; we can see it in our data. But what other transitions are in progress? What developments represent new ways of thinking, and what do those ways of thinking mean? What are the bigger changes shaping the future of software development and software architecture? This report is about those transitions.

Important signals often appear in technologies that have been fairly stable. For example, interest in security, after being steady for a few years, has suddenly jumped up, partly due to some spectacular ransomware attacks. What’s important for us isn’t the newsworthy attacks but the concomitant surge of interest in security practices—in protecting personal and corporate assets against criminal attackers. That surge is belated but healthy. Many businesses are moving IT operations to “the cloud,” a shift that’s probably been accelerated by the COVID-19 pandemic. What does that mean for the way software is designed and built? Virtual and augmented reality are technologies that were languishing in the background; has talk of the “metaverse” (sparked in part by Mark Zuckerberg) given VR and AR new life? And it’s no surprise that there’s a lot of interest in blockchains and NFTs. What does that mean, and how is it affecting software developers?

To understand the data from our learning platform, we must start by thinking about bias. First, our data is biased by our customer base. Of course. There’s no sampling error; all of our customers “vote” with the content they use. You could read this as a report on the biases of our customer base. Our customer base is large and worldwide (millions of developers, from well over 100 countries), but we won’t pretend that it’s representative of all programmers and technologists. While our customers include many individual developers, contractors, and hobbyist programmers, commercial (enterprise) software developers are very heavily represented—although there are certainly areas into which we’d like more visibility, such as the crucial Asia-Pacific software development community.

We used data from the first nine months (January through September) of 2021. When doing year-over-year comparisons, we used the first nine months of 2020.1

We looked at four specific kinds of data: search queries, questions asked to O’Reilly Answers (an AI engine that has indexed all of O’Reilly’s textual content; more recently, transcripts of video content and content from Pearson have been added to the index), resource usage by title, and resource usage by our topic taxonomy. There are some important biases here. If resources don’t exist, our customers can’t use them. To take one example, at this point, the platform has no content on the QUIC protocol or HTTP/3. Regardless of the level of interest, usage for these topics is going to be zero.

Search queries behave differently. Users certainly can search for content that doesn’t exist, so searches can be a good leading indicator of technology trends. However, most searches on our platform are single-word terms: users search for “Java” or “Python,” not “How do I use the Decorator pattern in C++?” (O’Reilly Answers is a great resource for answering questions like this.) As a result, the signals we get from looking at searches aren’t very granular. Answers could provide additional granularity, since users ask full questions. But Answers is a new service, only released in October 2020. So while we can discuss whether Answers usage is in line with other services, it’s difficult to talk about trends with so little data, and it’s impossible to do a year-over-year comparison.

Content usage, whether by title or our taxonomy, is based on an internal “units viewed” metric that combines all our content forms: online training courses, books, videos, Superstream online conferences, and other new products. It includes content from all of the publishing partners in the platform, not just O’Reilly. Results in each group of topics are normalized to 1, so items within the same group can be compared (Java to Python but not Java to Ethereum, for example).
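As a rough sketch of that within-group normalization (we're assuming division by the group maximum here; the report doesn't specify, and a sum-based normalization would work the same way; the topic names and counts below are illustrative, not our actual data):

```python
def normalize_group(units_viewed):
    """Scale a group's raw units-viewed counts so the largest value is 1.0,
    making items within the same group directly comparable."""
    top = max(units_viewed.values())
    return {topic: count / top for topic, count in units_viewed.items()}

# Hypothetical raw counts for one group of topics (programming languages)
languages = {"python": 125_000, "java": 118_000, "rust": 9_500}
print(normalize_group(languages))  # python maps to 1.0; the rest are relative to it
```

With this scheme, Java can be compared to Python within the languages group, but a 0.5 in one group can't be compared to a 0.5 in another.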

O’Reilly Answers

We’re very excited about O’Reilly Answers, the newest product on the platform. Answers is an intelligent search that takes users directly to relevant content, whether that’s a paragraph from a book, a snippet of a video, or a block of code that answers a question. Rather than searching for an appropriate book or video and skimming through it, you can ask a specific question like “How do you flatten a list of lists in Python?” (a question I’ve asked several times). Our approach to Answers was to do a simple “bag of words” analysis: count the number of times each word was used in all Answers queries. We divided Answers questions into two categories: “organic” queries, which users type themselves, and “question bank” queries, which are sample questions that users can click on. (Questions were rotated in and out of the question bank.) Our analysis only included organic questions; we didn’t count clicks on the question bank. What’s perhaps surprising is that many users typed questions from the question bank into the Answers search bar. These retyped questions were counted as organic queries.

That explains the most commonly asked question on Answers: “What is dynamic programming?” That question appeared frequently in the question bank. It was evidently intriguing enough that many users typed it in, verbatim, in addition to clicking on it; it was the second-most-common organically typed question, only slightly behind “How do I write good unit test cases?” (also very popular in the question bank).

Ignoring stop words (like “and”) and significant words that aren’t really meaningful to us (like “good”), the top five words were “data,” “Python,” “Git,” “test,” and “Java.” (And you can see most of the words from those top two questions in the top 15 or 20 words.)

What can we learn from this? Data continues to be one of the most important topics for our users. A quick look at bigram usage (word pairs) doesn’t really distinguish between “data science,” “data engineering,” “data analysis,” and other terms; the most common word pair with “data” is “data governance,” followed by “data science.” “Data analysis” and “data engineering” are far down in the list—possibly indicating that, while pundits are making much of the distinction, our platform users aren’t. And it certainly suggests that data governance (slightly ahead of “data science” itself) is a topic to watch.
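A minimal sketch of that bag-of-words and bigram counting (with a toy stop-word list and sample queries for illustration, not our actual query data):

```python
from collections import Counter

# Illustrative stop-word list; the real analysis also dropped words like "good"
STOP_WORDS = {"what", "is", "how", "do", "i", "a", "the", "in", "of", "to", "and"}

def tokenize(query):
    """Lowercase a query, strip trailing punctuation, and drop stop words."""
    return [w for w in query.lower().rstrip("?").split() if w not in STOP_WORDS]

def word_counts(queries):
    """Count individual words across all queries (the bag-of-words step)."""
    return Counter(w for q in queries for w in tokenize(q))

def bigram_counts(queries):
    """Count adjacent word pairs within each query."""
    bigrams = Counter()
    for q in queries:
        tokens = tokenize(q)
        bigrams.update(zip(tokens, tokens[1:]))
    return bigrams

queries = [
    "What is dynamic programming?",
    "How do I write good unit test cases?",
    "What is data governance?",
    "What is data science?",
]
print(word_counts(queries).most_common(5))
print(bigram_counts(queries).most_common(3))
```

Even on this toy sample, "data" tops the unigram list because it appears in multiple questions, which is exactly the effect described above.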

Python and Java have long been the top two programming languages on our platform, and this year is no exception. We’ll see later that usage of Python and Java content is very slightly down and that usage of content about Rust and Go is growing rapidly (though it’s still relatively small). The word “programming” was also one of the most frequently used words, reflecting our core audience. And “Kubernetes” was in the top 1%, behind “Java” and “Python” but ahead of “Golang” (top 2%) and “Rust” (4%). The frequency of questions about Kubernetes reflects the importance of container orchestration to modern operations. “AWS,” “Azure,” and “cloud” were also among the most common words (all in the top 1%), again showing that our audience is highly interested in the major cloud platforms. Usage of the term “GCP” and the bigram “Google Cloud” trailed the others, though to some extent that’s because Google has never been clear about the name of its cloud platform. Both “GCP” and “Google Cloud” were in the top 3% of their respective lists.

Words about cryptocurrency (“Bitcoin,” “Ethereum,” “crypto,” “cryptocurrency,” “NFT”) are further down on the list, though still in the top 20%. That’s not surprising. Elsewhere, we’ll see that the use of content about these topics is rising sharply, but usage still isn’t large. We have no “previous year” data for Answers, so we can’t discuss trends, but the fact that these terms are appearing in significant numbers is certainly important.

That quick dip into the bag of words gives us some clues about what we’ll see when we look at the data in more detail. Now let’s start investigating content usage: what our customers actually read, watched, or participated in during the past year.

Becoming Secure

Security was frequently in the news in 2021, and for the worst possible reasons. A wave of ransomware attacks crippled important infrastructure, hospitals, and many other businesses, both large and small. Supply chain attacks, in which an attacker places a payload in software that’s delivered to its victim through normal distribution channels, occurred in both open source and commercial software. In one notable case, the victim was a well-known enterprise security company, whose infected software was distributed to thousands of clients.

We saw large increases for content about specific topics within security. Usage of content about ransomware has almost tripled (270% increase). Content about privacy is up 90%; threat modeling is up 58%; identity is up 50%; application security is up 45%; malware is up 34%; and zero trust is up 23%. Safety of the supply chain isn’t yet appearing as a security topic, but usage of content about supply chain management has seen a healthy 30% increase. The increase for content on identity is a particularly important sign. Identity management is central to zero trust security, in which components of a system are required to authenticate all attempts to access them. Understanding identity management is a big step toward putting zero trust security into practice.

Usage of general content also increased. Units viewed for items with the word “security” or “cybersecurity” in the title increased by 17% and 24%, respectively. Network security, also a general topic, increased 15%. While these increases are relatively modest compared to specific topics like ransomware and privacy, keep in mind that in absolute numbers, the usage of “security” titles led all other security topics by a large margin. And a 17% increase in an established topic is very healthy.

Another important sign is that usage of content about compliance and governance was significantly up (30% and 35%, respectively). This kind of content is frequently a hard sell to a technical audience, but that may be changing. While compliance and governance are frequently mentioned in the context of data and privacy, it’s important to realize that they’re central issues for managing security. What are an organization’s responsibilities if it suffers a breach or an attack? Has the organization managed its data responsibly? This increase points to a growing sense that the technology industry has gotten a regulatory free ride and that free ride is coming to an end. Whether it’s stockholders, users, or government agencies who demand accountability, enterprises will be held accountable. Our data shows that they’re getting the message.

Units viewed and year-over-year growth for security

According to a study by UC Berkeley’s School of Information, cybersecurity salaries have crept slightly ahead of programmer salaries in most states, suggesting increased demand for security professionals. And an increase in demand suggests the need for training materials to prepare people to supply that demand. We saw that play out on our platform. Looking for titles matching security certifications proved to be a poor metric (probably because long, unwieldy certification names do poorly in titles), but when we look at our content taxonomy rather than title searches, we see that SSCP (System Security Certified Practitioner) is up 54%, and CompTIA Security+ is up 27%.

Software Development

Software development is a mega category on the O’Reilly learning platform. It includes almost everything, from programming languages to cloud to architecture and more. While it’s customary to start with a rundown on the programming language horse race, we won’t do that. Whether Python leads Java or not just isn’t interesting (though we will have a few words to say about that later on).

The most interesting topic within software development hasn’t yet made it to our platform. Everyone is talking about developer experience (DX): what can be done to make life better for software developers. How can their jobs be made more enjoyable, helping them to become more effective? That’s an issue that will become increasingly important as organizations try to keep programmers from jumping ship to another company. While we don’t yet have any content on developer experience, we’d be surprised if there isn’t some next year. For one source of ideas about where developer experience is headed, look at our report Low Code and the Democratization of Programming. In it, we tried to take a longer view—examining not what trends will change programming next year but what we might see five or ten years from now.

Software architecture, Kubernetes, and microservices were the three topics with the greatest usage for 2021. Their year-over-year growth is also very healthy (19%, 15%, and 13%, respectively). It only looks small when compared with the growth of topics like API gateway (218%). That kind of growth reflects the “law” we’ve observed throughout this report: it’s easy for a small topic to have large growth numbers but much more difficult for a topic that’s already dominant. API gateway content gets roughly 1/250 as many units viewed as content on architecture or Kubernetes does.

However, we want to be clear: while API gateway’s usage numbers are relatively small, 218% growth is a very strong signal. So is the growth in cloud native (54%), starting from significantly more units viewed in 2020 (roughly 1/8 of architecture or Kubernetes). Enterprises are investing heavily in Kubernetes and microservices; they’re building cloud native applications that are designed from the start to take advantage of cloud services. And API gateways are an important tool for routing requests between clients and services.

In this context, it’s no accident that content usage for containers shows significant growth (137%), while Docker shows less growth but higher usage. Containers are proving to be the best way to package applications and services so that they’re platform independent, modular, and easily manageable. We don’t want to understate the difficulty of moving to containers and using tools from the Kubernetes ecosystem to manage them, but remember that a few years ago, enterprise applications were monoliths running on a small number of servers and managed entirely by hand. Many businesses have now scaled an order of magnitude or so beyond that, with hundreds of services running on thousands of servers in the cloud, and you’ll never succeed at that scale if you’re starting and stopping servers and services by hand. We’re still exploring this transition, and it will continue to be a big story for the next few years.

When we’re talking about microservices running in the cloud, we’re talking about distributed systems. So it’s no surprise that usage of content about distributed systems rose 39% in the past year. The related topics complex systems and complexity also showed significant growth (157% and 8%). It’s also worth noting that design patterns, which fell out of favor for a few years, have come back: usage is very solid and year-over-year growth is 19%.

Quantum computing remains a topic of interest. Units viewed is still small, but year-over-year growth is 39%. That’s not bad for a technology that, honestly, hasn’t been invented yet. Although some primitive quantum computers are available now, computers that can do real work are still several years away. (IBM’s roadmap has 1,000-physical-qubit computers coming in two years, though the best estimate is that we’ll need 1,000 physical qubits to create one error-corrected qubit.) But when those computers arrive, there will clearly be people ready to program them.

We’ve said almost nothing about architecture, except to notice heavy usage and solid growth. All this ferment—rebuilding legacy applications, moving to the cloud, microservices, orchestration—doesn’t happen without good, consistent software design. Success with microservices is impossible without giving serious thought to designing good APIs for your services to present to each other and, in turn, to the rest of the world. The problem with legacy applications is that they’re inflexible: they leave you stuck with the capabilities you had 20 years ago. If you replace your old legacy software with new legacy software that doesn’t have the ability to evolve as your needs and opportunities change, if you build something that’s just as inflexible as what it replaced, what have you accomplished? This is where software architecture comes into play: how do teams build systems that aren’t just adequate for today but that will be flexible enough to grow with the business? Solid year-over-year growth and heavy usage is exactly what we’d expect to see.

Units viewed and year-over-year growth for software development topics

Finally, last year we observed that serverless appeared to be keeping pace with microservices. That’s no longer true. While microservices shows healthy growth, serverless is one of the few topics in this group to see a decline—and a large one at that (41%).

Programming Languages

We’ve said many times that we’re uninterested in the language horse race. Usage of well-established programming languages changes very slowly year to year. Occasionally a language breaks out of the pack, but that’s rare. We’d go so far as to say it’s less of a horse race than a turtle race—a turtle race in which a language that’s slowly gaining traction in the enterprise space can gradually come to dominate the cool language du jour.

So we’ll avoid the horse race entirely and focus on possible reasons for any changes. What are the important changes since last year? C++ has grown significantly (13%) in the past year, with usage that is roughly twice C’s. (Usage of content about C is essentially flat, down 3%.) We know that C++ dominates game programming, but we suspect that it’s also coming to dominate embedded systems, which is really just a more formal way to say “internet of things.” We also suspect (but don’t know) that C++ is becoming more widely used to develop microservices. On the other hand, while C has traditionally been the language of tool developers (all of the Unix and Linux utilities are written in C), that role may have moved on to newer languages like Go and Rust.

Go and Rust continue to grow. Usage of content about Go is up 23% since last year, and Rust is up 31%. This growth continues a trend that we noticed last year, when Go was up 16% and Rust was up 94%. Is the decline in Rust’s rate of growth a concern? Don’t let the second derivative fool you. Last year Rust content was starting from near-zero and 90% growth was easy. This year it’s well-established (I don’t think we’ve ever seen a language establish itself quite so quickly), and we expect growth to continue. Both Rust and Go are here to stay. Rust reflects significantly new ways of thinking about memory management and concurrency. And in addition to providing a clean and relatively simple model for concurrency, Go represents a turn from languages that have become increasingly complex with every new release.

We see less of the “functional versus object oriented” wars than we have in the past, and that’s a good thing. Both topics are down (14% and 16%, respectively). Functional features have been integrated into Java, C#, and a number of other languages, so the only real question to debate is how much of a purist you want to be. But that’s a distraction—our customers want to get their work done.

Having said all that, what about the “old guard”? They’re nice and stable. Python, Java, and JavaScript are still the leaders, with Java up 4%, Python down 6%, and JavaScript down 3%. (“Python” and “Java” are both in the top five words used in O’Reilly Answers.) Although any change under 10% is small in the greater scheme of things, we’re surprised to see Python down. And, like last year, usage of Java content is only slightly behind that of Python if you add Spring usage to Java usage. (Spring is a large, all-encompassing group of frameworks in the Java ecosystem, but Spring titles usually don’t mention Java.) C#, a core language on Microsoft platforms, was also stable (down 1% year-over-year).

Scala and Kotlin, two other languages that belong to the Java ecosystem, are both down, 27% and 9%, respectively. Scala’s drop is particularly noteworthy. That may reflect the release of Scala 3.0 in May 2021, which would tend to make content based on Scala 2 obsolete.

Use of JavaScript content on our platform is surprisingly low—though use of content on TypeScript (a version of JavaScript with optional static typing) is up. Is TypeScript replacing JavaScript? We’ll see in a few years. Even with 19% growth, TypeScript has a ways to go before it catches up; TypeScript content usage is roughly a quarter of JavaScript’s. The relatively low usage of JavaScript on our platform may reflect our enterprise-centered audience, large numbers of whom work on backend and middleware services. Our data is similar to TIOBE’s (in which the top languages are Python, C, and Java) and sharply different from RedMonk’s (in which JavaScript leads, followed by Python and Java).

In our 2021 Data/AI Salary Survey, we noted that most respondents used more than one programming language. That’s certainly true of our audience as a whole. We also discovered that Python programmers had midrange salaries, while the highest salaries went to respondents who used Go, Rust, and Scala. Our interpretation was that Python has become table stakes. If you work with data, you’re expected to know Python; the ability to work with one of these other languages gives you added value. While we don’t have salary data for platform users, we suspect the same is true. If you work on enterprise or backend software, Java is table stakes; if you do frontend development, JavaScript is table stakes. But whatever your specialty or your primary language, fluency with next-generation languages like Go and Rust gives you added value.

One final note, and then we’ll move on. When we looked at our analysis of O’Reilly Answers, we were puzzled by the top question: “What is dynamic programming?” It seemed strange to see that at the top of the list. Stranger still: while that question was in the question bank, when we removed question bank clicks from the data and looked only at organic questions (questions typed by a user), “What is dynamic programming?” was still at the top. We don’t think this is a rehash of the tired “static versus dynamic” debate of a few years ago; there were no questions about dynamic languages. Dynamic programming is a technique for breaking down complex problems into smaller components. It will clearly be a topic to watch as programmers continue to deal with increasingly complex systems.
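As a quick illustration of the technique (a textbook example, not something drawn from our platform data), dynamic programming solves a problem by caching the answers to overlapping subproblems instead of recomputing them:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    """Naive recursion takes exponential time; memoizing the overlapping
    subproblems fib(n-1) and fib(n-2) makes it linear."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(40))  # 102334155, computed instantly with memoization
```

The same idea, building a large answer from cached smaller ones, underlies classic problems like shortest paths, edit distance, and the knapsack problem.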

Units viewed and year-over-year growth for programming languages

Learning About the Cloud

Our data about the cloud and cloud providers tells an interesting story. It’s clear that Amazon Web Services’ competition is on the rise. Usage of content about Microsoft Azure is up 32% and Google Cloud is up 54%, while the usage of AWS-related content has declined by 3%. Actual usage of content about Azure almost matches AWS, while Google Cloud is farther behind, although that may reflect the quantity of material available.

If we take a step back and look at the term “cloud” in general, we find that content about cloud is slightly larger than content about AWS and has grown 15% since last year. (Keep in mind that a title like Machine Learning in the AWS Cloud would match both terms.) Cloud native—the practice of building applications so that they run first in the cloud and take advantage of cloud services from the start—is up significantly (54%).

We also see another important trend. Usage of content about hybrid clouds and multiclouds is still small (roughly 1/10 of that of Google Cloud, the smallest of the major cloud providers), but growing very fast (145% and 240%, respectively). We won’t split hairs about the difference between a hybrid cloud and a multicloud; there’s enough confusion in the marketplace that, for all practical purposes, they’re identical. But we can say that multicloud and hybrid cloud approaches both reflect a fundamental reality: it’s difficult, if not impossible, to build a cloud strategy around a single provider. Cloud deployments aren’t top-down. They start with a research experiment here, a marketing project there, a group that’s frustrated with the time it takes to requisition hardware, and so on. Sooner or later, you have a cloud deployment—or, more likely, six or seven completely different deployments. By the time someone starts to build a high-level cloud strategy, the organization is already using two or three of the major cloud providers. They are already multicloud, whether or not they realize it. An important part of building a cloud strategy is recognizing that the “cloud” is inherently multi- (or hybrid) and that the biggest issue isn’t which provider to choose but how to build an effective cloud infrastructure across multiple providers. That’s an important aspect of becoming cloud native.

Units viewed and year-over-year growth for cloud topics

Stable as the Web

The core technologies for web programming have been very stable over the last two years. Usage of content about core components HTML, CSS, and JavaScript is almost unchanged (up 1%, up 2%, and down 3%, respectively). If Java and Python are table stakes for enterprise and data developers, so much more so are HTML, CSS, and JavaScript for frontend developers. They’re the foundational technologies for the web. If you’re not fluent with them, you’re not part of the conversation.

PHP is hardly a new technology—any PHP user will tell you that almost 80% of the web is built with it. The use of content about PHP is up 6%, which doesn’t tell you how many jobs there are or will be but does mean that PHP isn’t leaving anytime soon. The use of content about jQuery (another older technology that’s often used in conjunction with PHP) is up 28%. And interest in web design, a perennial topic that will never go away, is up 23%.

Among the newer frameworks and meta frameworks, Svelte seems to be thriving (up 71%, though from a very low starting point), while interest in Vue and Next.js seems to be fading (both down 13%). Svelte may become a challenger to the more widely used frameworks in a few years if this keeps up. There was surprisingly little interest in Jamstack. That may be because the term rarely appears in the title of books or training, though searches for the term “Jamstack” were also infrequent.

Usage of content about the React framework is also essentially unchanged this year (up 2%), while Angular framework content usage is down significantly (16%). It’s probably just coincidental that JavaScript and React usage are almost identical.

In the Pythonic corner of the web development space, Django is holding steady: the number of units viewed is healthy (and greater than Flask, Svelte, or Vue), and we saw no change year-over-year. Usage of content about Python’s Flask framework is headed downward (12% decline). Likewise, the most widely known Ruby framework, Rails, is down 19%.

Units viewed and year-over-year growth for web topics

AI, ML, and Data

There’s been a lot of speculation in the press about artificial intelligence. Are we heading into another “AI winter”? Is it an important technology for today, yesterday’s fad, or something impossibly far off in the future? To some extent, this kind of speculation comes with the territory, especially since Gartner published its famous “hype curve.” AI has certainly been hyped. But is it heading into the so-called “trough of disillusionment”?

We’d say no. That’s not what our data shows. Yes, usage of content with “artificial intelligence” in the title is down 23% in 2021, and “AI” is down 11%. But these topics are relatively small and narrow. The topic that clearly dominates this space is machine learning (ML): usage of content about AI plus artificial intelligence is roughly 1/4 that of ML plus machine learning.

What’s the difference between AI and ML? For the purposes of this report, we define machine learning as “the part of artificial intelligence that works”—and, implicitly, the part of AI that’s being put into practice now. AI is, by nature, a research topic. While we have plenty of researchers among our members, our core audience is programmers and engineers: people who are putting technology into practice. And that’s the clue we need to make sense of this puzzle.

Usage of content with “machine learning” in the title is flat year-over-year (down 1%, which is noise). Usage of content with “ML” in the title is up 35%. There are more titles with the phrase “machine learning”; if you add the two up, you get a very slight gain. Still noisy, but positive noise rather than negative. We don’t expect another AI winter—AI is too solidly entrenched in online business practices, and in ways that aren’t as visible as social media recommendations; you’ll never know (or care) whether the company that makes your espresso machine is using machine learning to optimize the manufacturing process and manage inventory, but if they aren’t now, they will be. However, it’s worth noting that AI and ML were the natural outgrowths of “big data” and “data science,” both terms that are now in decline. Big data, of course, never ended; it evolved: just look at the training data needed to build an AI model. The question for the coming year, then, is whether machine learning and artificial intelligence will “evolve”—and if so, into what?

Now let’s look at some specific techniques. Usage of content about deep learning is down 14%, but usage of content about neural networks is up 13%, reinforcement learning is up 37%, and adversarial networks is up 51%. Interest has clearly shifted from general topics to specific ones.

Natural language processing has been very much in the news. As was the case for machine learning, usage of content with “natural language processing” in the title hasn’t changed much (up 3%); the abbreviation “NLP” is up 7%. Again, we can look at some of the new techniques that have made the news. The platform had no content on Transformers, BERT, or GPT back in 2020. All three are now coming onto the map. Similarly, there’s currently no content on GitHub Copilot, which uses the GPT-3 model to translate comments into working code, but we expect it to be a strong performer in 2022.

So what can we conclude? General topics like AI, ML, and NLP are holding their own in content usage or are slightly down. However, usage of content about specific techniques like adversarial networks and reinforcement learning is growing. And content for the newest techniques, like BERT and Transformers, is only now starting to appear. That doesn’t look like a slide into disillusionment but like the natural consequence of a field that’s moving from theory into practice.

It’s also worth looking at the significant increase in the use of content about data governance (up 87%) and GDPR (up 61%). Everyone working with data should know that data governance and its related topics (data provenance, data integrity, auditing, explainability, and many other specialties) aren’t optional. Regulation of the use of data isn’t some vague thing off in the future. It’s here now: GDPR (the EU’s General Data Protection Regulation) is in effect, as is California’s Consumer Privacy Act (CCPA). Now is the time to start thinking about data governance—not later, when it will certainly be too late. Data governance is here to stay, and our platform shows that data professionals are learning about it.

Units viewed and year-over-year growth for AI and ML topics

Databases

You can’t talk about machine learning without talking about data and databases. It’s no surprise that, when we look at content usage, Oracle is leading the pack. It’s also no surprise that Oracle’s growth is slow (5%); as we often observe, rapid growth is most often associated with smaller, newer topics. Usage of content about the open source MySQL database (now owned by Oracle) is roughly 1/4 as high and has grown substantially (22%).

It’s worth looking at alternatives to Oracle though. We’ve heard about the death of NoSQL, and certainly usage of content about NoSQL is down (17%). But that isn’t a good metric. NoSQL was never a single technology; databases like Cassandra, HBase, Redis, MongoDB, and many others are wildly different. NoSQL is really more a movement than a technology—one that’s devoted to expanding the number of storage options for system designers. A good understanding of NoSQL means realizing that for most applications, relational databases are a good fit.

Of the more established NoSQL databases, MongoDB shows 10% growth. Cassandra, Redis, and HBase have declined sharply (27%, 8%, and 57%, respectively). Together, the four show total usage about 40% greater than MySQL, though the total for all four has declined somewhat (4%) since 2020. Momentum has clearly shifted from the NoSQL movement back to relational databases. But that isn’t the end of the story.

We’ve been following graph databases for some time, and in the last year, they’ve gotten a lot of press. But it’s difficult to discuss specific graph databases because most established database vendors have a graph database product integrated into their offering. That said, use of content with the term “graph databases” is up 44%. It’s still a small category, but that’s a significant signal.
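The appeal of a graph database is that relationship traversal is a first-class query. As a minimal sketch (the social graph and function names are our own invention, not any particular product’s API), here is the classic “friends of friends” query over a plain adjacency list; a graph database such as Neo4j expresses the same traversal declaratively and indexes it for scale:

```python
# A tiny social graph as an adjacency list: node -> set of neighbors.
graph = {
    "ada":    {"grace", "alan"},
    "grace":  {"ada", "edsger"},
    "alan":   {"ada", "edsger"},
    "edsger": {"grace", "alan"},
}

def friends_of_friends(graph, person):
    """Nodes exactly two hops away: the classic graph-database query."""
    direct = graph[person]
    two_hops = set()
    for friend in direct:
        two_hops |= graph[friend]
    # Exclude direct friends and the person themselves.
    return two_hops - direct - {person}

friends_of_friends(graph, "ada")   # {"edsger"}
```

In a relational database, the same query requires a self-join per hop; the traversal-centric data model is what the “graph databases” category is about.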

Likewise, usage of content about time series databases (databases that associate every entry with a time stamp) is up 21%. Time series databases may prove important for applications stressing monitoring, logging, and observability. Using AI to analyze logs and detect malicious activity is one such application.
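As a rough illustration of the query pattern time series databases optimize (the data and names below are invented for this sketch), every entry carries a time stamp, and the typical question is a windowed aggregate:

```python
from datetime import datetime, timedelta

# A timestamped log of error counts -- the kind of data a time series
# database indexes by time.
points = [
    (datetime(2021, 9, 1, 12, 0), 3),
    (datetime(2021, 9, 1, 12, 4), 7),
    (datetime(2021, 9, 1, 12, 9), 2),
    (datetime(2021, 9, 1, 12, 16), 40),  # a suspicious spike
]

def window_sum(points, start, width=timedelta(minutes=10)):
    """Total count within [start, start + width): a windowed aggregate."""
    return sum(v for t, v in points if start <= t < start + width)

window_sum(points, datetime(2021, 9, 1, 12, 0))    # 3 + 7 + 2 = 12
window_sum(points, datetime(2021, 9, 1, 12, 10))   # 40: the spike stands out
```

Monitoring, logging, and anomaly detection all reduce to queries like this one, run over far larger volumes and much finer time grains.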

Relational databases still dominate the database world, and there’s no reason to expect that to change. Nor should it. The promise of NoSQL wasn’t replacing relational databases; it was increasing the number of options available. The rise of graph and time series databases is simply that promise in action. It will be interesting to see whether this trend continues into 2022.

Units viewed and year-over-year growth for databases

Operations, DevOps, and SRE

Operations is “up and to the right.” Very few topics in this group saw declines since last year, and a lot had big gains. As we said last year, it doesn’t really matter what you call operations: call it DevOps, call it SRE, call it George…this is the task of running the servers, managing software deployment, and keeping the business online. As many found out firsthand during the pandemic, keeping the servers running is crucial, not just to support staff working from home but also to move as much of the business as possible online. People have said “every business is an online business” for years now, but in the past year, that really became true. If your business wasn’t online when COVID-19 hit, it could have easily ceased to exist. Add to that the staffing pressures caused by illness and by resignations or job changes, and it quickly became clear that there’s a real need to do more with less. IT groups found themselves doing much, much more with fewer team members than before. The answer to these challenges is automation (to allow fewer people to manage more systems) and reliability engineering (reducing downtime to reduce staff stress).

We saw substantial increases in the use of titles with the words “observability” (up 124%), “container” (137%), “CI/CD” (109%), “monitoring” (up 36%), and “testing” (16%). A 36% increase for monitoring is very healthy, but the much larger increase for observability shows that this concept is winning people’s hearts and minds. In practice, many find the difference between observability and monitoring confusing. Observability ultimately boils down to the ability to find the information you need to analyze a system’s behavior, while monitoring refers to logging and watching certain preconfigured parameters that indicate the system’s health. It’s a subtle difference—one way to think of it is that monitoring tells you when something’s wrong, but observability gives you the data needed to debug unexpected or strange failure modes, predict failures more reliably, and understand system performance in depth.
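One way to make that distinction concrete (this is our own toy sketch, not any vendor’s API): monitoring checks a preconfigured metric against a threshold, while observability means recording rich, structured events so you can ask questions you didn’t anticipate:

```python
# Monitoring: a preconfigured check that tells you *that* something is wrong.
def check_latency(metric_ms, threshold_ms=500):
    return "alert" if metric_ms > threshold_ms else "ok"

# Observability: keep rich, structured events so you can ask new questions
# after the fact -- e.g. "which endpoint drove the latency spike?"
events = [
    {"endpoint": "/checkout", "latency_ms": 820, "region": "eu-west"},
    {"endpoint": "/search",   "latency_ms": 90,  "region": "us-east"},
    {"endpoint": "/checkout", "latency_ms": 750, "region": "us-east"},
]

# An ad hoc query nobody configured in advance:
slow_endpoints = {e["endpoint"] for e in events if e["latency_ms"] > 500}
```

The threshold check fires the pager; the event log is what lets you debug the strange failure mode the threshold never anticipated.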

CI/CD (continuous integration and continuous deployment) is the latest stage in a long trend of improved tools for automating the development and deployment process, starting way back in the 1970s with Unix’s make utility (for building software) and adding automated testing tools in the early 2000s (JUnit and its relatives) and automated deployment tools a few years later (Jenkins). We now build pipelines that automate the path from the programmer to the server. In the early days of the O’Reilly Velocity Conference, we heard about how companies could build, test, and deploy software many times per day. Automating the deployment process makes it much faster and more reliable, in turn making IT staff more effective because they no longer have to shepherd code “by hand” from the developer’s laptop to the production servers. CI/CD has now become standard practice for almost every online business. It’s something the enterprises that are just moving online, or just moving to the cloud, need to understand to get the most out of their staff.
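The core control flow of a pipeline can be sketched in a few lines (a deliberately simplified model of our own; real pipelines are defined in tools like Jenkins or GitHub Actions): stages run in order, and a failure stops the run before deployment:

```python
def run_pipeline(stages):
    """Run named stages in order; stop at the first failure.

    Each stage is a (name, callable) pair; the callable returns True on
    success. Returns the list of stages that actually ran.
    """
    ran = []
    for name, step in stages:
        ran.append(name)
        if not step():
            break            # a red build never reaches deployment
    return ran

# A failing test stage keeps the deploy stage from ever running.
history = run_pipeline([
    ("build",  lambda: True),
    ("test",   lambda: False),   # simulate a failing test suite
    ("deploy", lambda: True),
])
# history == ["build", "test"]
```

Everything else in a real CI/CD system — runners, artifacts, rollbacks — is machinery around this one gate.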

“Testing” appears to be lagging other terms in this group, but it’s worth noting that the most frequently asked question on O’Reilly Answers was “How do I write good unit test cases?” The practice of automated testing, integrated into the deployment process, is one of the foundations of modern operations. If a software release doesn’t pass all of its tests, it can’t be deployed. That practice gives software developers the confidence to move fast without breaking things.
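As a minimal sketch of what good unit test cases look like, using Python’s built-in unittest (the apply_discount function is a made-up example, not from any codebase in the report): cover the typical case, the edge cases, and the error path:

```python
import unittest

def apply_discount(price, pct):
    """Return price reduced by pct percent; pct must be in [0, 100]."""
    if not 0 <= pct <= 100:
        raise ValueError("pct must be between 0 and 100")
    return round(price * (1 - pct / 100), 2)

class TestApplyDiscount(unittest.TestCase):
    def test_typical_discount(self):
        self.assertEqual(apply_discount(200.0, 25), 150.0)

    def test_zero_and_full_discount(self):
        # Boundary values are where bugs hide.
        self.assertEqual(apply_discount(99.99, 0), 99.99)
        self.assertEqual(apply_discount(99.99, 100), 0.0)

    def test_rejects_invalid_percentage(self):
        with self.assertRaises(ValueError):
            apply_discount(10.0, 150)
```

Run with `python -m unittest`; in a CI pipeline, a failure in any of these cases blocks the deploy.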

We’ve also seen increases in content about the tools used to deploy software. Git is up 44%, Kubernetes is up 15%, Docker is up 5%, and Terraform is up 6%. Kubernetes led all topics in this category in units viewed. Furthermore, the two most popular Kubernetes certifications, Certified Kubernetes Application Developer (CKAD) and Certified Kubernetes Administrator (CKA), were up 24% and 13%, respectively. Docker’s relatively low growth may be attributed to the standardization of container formats (the Container Runtime Interface, or CRI), and the removal of Docker as a requirement for Kubernetes. There are now viable alternatives to Docker.

It’s worth looking a bit more at the Kubernetes ecosystem. While usage of content about Kubernetes is up 15% and Helm (Kubernetes’s package manager) is up 68%, usage of content about Istio (a service mesh, an important part of the Kubernetes ecosystem) is sharply down (46%). At first glance, this is confusing: why would Kubernetes and Helm be up, while Istio is down? It’s possible that open source politics around Google’s control over Istio hurt its adoption, though we suspect that only had a small effect. You’ve probably heard that Kubernetes has a steep learning curve; if you’re a developer, you may have experienced that yourself. Istio said, “Hold my beer, you haven’t seen complex yet.” A service mesh is an important part of container orchestration, but Istio is proving to be too complex. Kubernetes has proven essential for managing cloud deployments; Istio hasn’t.

Both Kubernetes and Istio originated at Google and were designed to solve Google-scale problems. But very few businesses—even those that any reasonable person would call “large”—need to manage IT infrastructure at Google’s scale. Will we eventually have container orchestration tools that solve problems for businesses that aren’t as huge as Google? Work on the Service Mesh Interface (SMI), a standard interface between service mesh software and Kubernetes, may allow a new generation of service mesh implementations to arise; we hope some of those will be simpler.

Three tools are sharply down: Chef, Puppet, and Ansible (27%, 38%, and 20%). In last year’s report, we showed that the decline of these automated configuration management tools coincided with the rise of Docker and Kubernetes. That decline continues.

What about the top-level terms “operations,” “SRE,” and “DevOps” themselves? Usage of titles containing those words was up (7%, 17%, and 2%, respectively), though obviously these increases are smaller than we saw for tools or concepts. As with AI, we may be seeing this part of the industry mature: our customers are less interested in introductory content about the high-level concepts and more interested in specific ideas and tools that they can use in their businesses. It’s also worth highlighting the 2% increase for DevOps. Our 2020 report showed DevOps down 17% from 2019 to 2020. In 2021, that slide has stopped. Over time, we expect that terms like DevOps and SRE will come and go, but the concepts and the tools that they introduced will be with us long-term.

Units viewed and year-over-year growth for operations, DevOps, and SRE

Finally, look at the units viewed for Linux: it’s second only to Kubernetes. While down very slightly in 2021, we don’t believe that’s significant. Linux has long been the most widely used server operating system, and it’s not ceding that top spot soon. If anything, its importance has increased: Linux is the standard operating system for the cloud. Even on Azure, Linux dominates. Solid knowledge of Linux is essential for anyone working in operations today.

Cryptocurrency and Blockchain

Now we’ll look at some ideas that have exploded in the last year. They aren’t necessarily new, but for various reasons they’ve taken off. Our data on these topics tends to be hazy. And, in Arlo Guthrie’s words, many of these topics have “come around on the guitar” one or more times in the past only to fade back into the noise.

Whether it’s the future of finance or history’s biggest Ponzi scheme, use of content about cryptocurrency is up 271%, with content about the cryptocurrencies Bitcoin and Ethereum (ether) up 166% and 185% respectively. General content about blockchains is up 78%, and from a much higher starting point (reflecting the fact that our audience has more developers than speculators). Hyperledger, a collection of blockchain technologies that targets enterprise markets, is up 66%. Our data doesn’t tell you whether to buy bitcoin or ether, but it does show a huge increase in interest.

We’ve seen a huge increase in interest in nonfungible tokens (NFTs), but here’s where we run into data availability problems. Searches for the term “NFT” are up 4,474%, almost 45 times higher year-over-year. Granted, that’s from an extremely small starting point (only 26 searches in 2019). From that starting point, a 45x increase still takes NFTs to a relatively small endpoint. So which do you believe? A 45x increase or a small endpoint? Take your pick, but our data shows that NFTs shouldn’t be ignored.
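A quick sanity check on the arithmetic: a percentage change converts to a growth multiplier as 1 + pct/100, which is how “up 4,474%” becomes “almost 45 times higher”:

```python
def pct_to_multiplier(pct_change):
    """Convert a year-over-year percentage change to a growth multiplier."""
    return 1 + pct_change / 100

pct_to_multiplier(4474)   # 45.74 -> "almost 45 times higher"
pct_to_multiplier(35)     # 1.35  -> up about a third
pct_to_multiplier(-14)    # 0.86  -> a 14% decline
```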

Web3 is a collection of ideas about a “next generation” web that’s designed so that it can’t be dominated by a small number of gigantic platforms, like Facebook and Google. Web3 proponents typically mix decentralized protocols like the InterPlanetary File System (IPFS) with blockchains and NFTs to make content immutable and ownable. As with NFTs, no content in our platform has “Web3” in the title. But we’ve seen a 343% increase in the number of searches for the term—again, from a small starting point. We’ve been watching decentralized web technologies for years (we staged a peer-to-peer conference in 2001) and wonder whether the connection between the decentralized web and blockchain will make it take off. Possibly…or possibly not. It isn’t clear what blockchains and NFTs bring to Web3 aside from the hype. We already have a web where anyone can publish. A web where everything has to be owned, and where every transaction pays a tax to blockchain miners, isn’t a step forward. We also see no guarantee that a decentralized web couldn’t be dominated by a small number of Google-sized players. We can’t tell you whether Web3 will succeed, but our data shows that it’s becoming an idea worth watching.

Units viewed and year-over-year growth for blockchain topics

Virtual Reality, Augmented Reality, and the Metaverse

Virtual and augmented reality are also topics we’ve been tracking for years. They’ve often seemed at the point of breaking out, but they’ve never made it, at least in part because nobody wants to hang around wearing goggles all the time. Google Glass looked like it had a chance back in 2013, and it survives to this day in an enterprise edition, but it never achieved widespread use. Startups like Oculus (now part of Meta) have made VR goggles aimed at consumers, but they’ve never broken beyond a small segment of the gamer market.

What about this year? We still think VR and AR are on their way. Mark Zuckerberg kicked off a storm by talking about “the metaverse” back in July, and by more recently renaming Facebook “Meta.” Microsoft and many other companies have followed suit by announcing their versions of the metaverse. Apple’s been quiet, but the company is working on augmented reality glasses. (What little we’ve heard sounds like an update of Google Glass with current technology—but if any company’s core expertise is making something cool, it’s Apple.)

Has all this ferment shown up in our platform data? Keep in mind that we’re only using data through September (in both 2020 and 2021). The results are ambiguous. Use of titles containing the phrase “augmented reality” is down (22%), and those are the most heavily used titles in this group. But virtual reality, VR, and AR are all up (13%, 28%, and 116%, respectively), yielding a 24% gain across the entire group.

The term “metaverse” hasn’t shown up in any titles, though there’s a sharp increase in the number of searches for it (489%). And content about WebXR, a vendor-neutral standard for rendering 3D content on VR- and AR-capable devices (in addition to pedestrian 2D devices), is now starting to show up. (VRML, an older standard, has vanished from view.) No content on WebXR was available in 2020, but some has appeared in 2021, and searches for “WebXR” have increased by 168%.

We’ll forgive you if you decide to bet against VR. Meta (née Facebook) has dragged its own name through the mud for way too long; while the company might succeed, it’s hard to imagine many people wanting to share video of the intimate details of their life with them. And while Zuckerberg is excited about the metaverse’s potential for “work from home” employees, it’s extremely difficult to imagine that a company will want a video feed of its staff’s activities going to the Meta mothership. But Apple has really become a master of conspicuous consumerism. It’s very hard to bet against them when it comes to making high-tech fashion accessories. Mark us cautiously skeptical.

Units viewed and year-over-year growth for VR and AR topics

Until Next Year

So after reviewing over a billion units viewed on over 50,000 items in the O’Reilly learning platform, after looking at a million unique search queries plus a smaller number of queries from Answers, where are we? What can we say about the coming year?

Many events grab attention: GPT-3 generating text that could have been written by humans. Cybercriminals demanding millions of dollars after a ransomware attack. Other newsworthy topics include new technologies like NFTs that are just starting to show up in our data and older technologies like virtual reality that may be on the brink of a surge. And there are even more technologies that get a lot of coverage in the technology press, though they aren’t yet appearing in our data in significant ways: robotic process automation (RPA), digital twins, edge computing, and 5G, to name a few. All of these technologies are important, or might be important, depending on where the future takes us. Some are genuinely exciting; others are rebrandings of older ideas.

The real work of technology isn’t coming up with splashy demos; it’s the hard work of taking these breakthroughs and integrating them into products. It’s coming up with solutions to real problems and deploying those as real-world services. It’s defending your IT infrastructure against attack in the middle of a pandemic. Using natural language models to build customer service systems that are less frustrating for the customer and the customer service agent; auditing loan approval systems to see whether they’re fair; preventing ransomware attacks rather than succumbing to them. It probably won’t make the news if there are 20% fewer successful ransomware attacks in the coming year. After all, few people notice when something doesn’t happen. But all of us will be safer nonetheless.

These are the changes that affect our lives, and these are the kinds of changes we see by looking at the data on our platform. Users learning more about security; customers learning more about architecting software for the cloud; programmers trying to come to terms with concurrency, and learning new languages and techniques to deal with complexity; and much more. We see artificial intelligence moving into the real world, with all the problems and opportunities that entails, and we see enterprises realizing that operations isn’t just a cost center—it’s the lifeblood of the business.

That’s the big picture, which (like a Bruegel painting) is built from many, many people, each doing what they think is important, each solving the problem that they face. Understanding technology—and understanding what the O’Reilly platform tells us—is not really about the flashy events, important though they may be; it’s all about understanding the people who depend on our platform every day and what they need to learn to get on with the task of building their futures.

  1. Last year’s platform report was based on January through August, so the two papers aren’t directly comparable.

What Is Causal Inference?

O'Reilly Radar - Tue, 2022/01/18 - 05:12
The Unreasonable Importance of Causal Reasoning

We are immersed in cause and effect. Whether we are shooting pool or getting vaccinated, we are always thinking about causality. If I shoot the cue ball at this angle, will the 3 ball go into the corner pocket? What would happen if I tried a different angle? If I get vaccinated, am I more or less likely to get COVID? We make decisions like these all the time, both good and bad. (If I stroke my lucky rabbit’s foot before playing the slot machine, will I hit a jackpot?)

Whenever we consider the potential downstream effects of our decisions, whether consciously or otherwise, we are thinking about cause. We’re imagining what the world would be like under different sets of circumstances: what would happen if we do X? What would happen if we do Y instead? Judea Pearl, in The Book of Why, goes so far as to say that reaching the top of the “ladder of causation” is “a key moment in the evolution of human consciousness” (p. 34). Human consciousness may be a stretch, but causation is about to cause a revolution in how we use data. In an article in MIT Technology Review, Jeannette Wing says that “Causality…is the next frontier of AI and machine learning.”

Causality allows us to reason about the world and plays an integral role in all forms of decision making. It’s essential to business decisions, and often elusive. If we lower prices, will sales increase? (The answer is sometimes no.) If we impose a fine on parents who are late picking up their children from daycare, will lateness decrease? (No, lateness is likely to increase.) Causality is essential in medicine: will this new drug reduce the size of cancer tumors? (That’s why we have medical trials.) This kind of reasoning involves imagination: we need to be able to imagine what will happen if we do X, as well as if we don’t do X. When used correctly, data allows us to infer something about the future based on what happened in the past. And when used badly, we merely repeat the same mistakes we’ve already made. Causal inference also enables us to design interventions: if you understand why a customer is making certain decisions, such as churning, their reason for doing so will seriously impact the success of your intervention.

We have heuristics around when causality may not exist, such as “correlation doesn’t imply causation” and “past performance is no indication of future returns,” but pinning down causal effects rigorously is challenging. It’s not an accident that most heuristics about causality are negative—it’s easier to disprove causality than to prove it. As data science, statistics, machine learning, and AI increase their impact on business, it’s all the more important to re-evaluate techniques for establishing causality.

Scientific Research

Basic research is deeply interested in mechanisms and root causes. Questions such as “what is the molecular basis for life?” led our civilization to the discovery of DNA, and in that question there are already embedded causal questions, such as “how do changes in the nucleotide sequence of your DNA affect your phenotype (observable characteristics)?” Applied scientific research is concerned with solutions to problems, such as “what types of interventions will reduce transmission of COVID-19?” This is precisely a question of causation: what intervention X will result in goal Y? Clinical trials are commonly used to establish causation (although, as you’ll see, there are problems with inferring causality from trials). And the most politically fraught question of our times is a question about causality in science: is human activity causing global warming?

Business

Businesses frequently draw on previous experience and data to inform decision making under uncertainty and to understand the potential results of decisions and actions. “What will be the impact of investing in X?” is another causal question. Many causal questions involve establishing why other agents perform certain actions. Take the problem of predicting customer churn: the results are often useless if you can’t establish the cause. One reason for predicting churn is to establish what type of intervention will be most successful in keeping a loyal customer. A customer who has spent too long waiting for customer support requires a different intervention than a customer who no longer needs your product. Business is, in this sense, applied sociology: understanding why people (prospects, customers, employees, stakeholders) do things. A less obvious, but important, role of causal understanding in business decision making is how it impacts confidence: a CEO is more likely to make a decision, and do so confidently, if they understand why it’s a good decision to make.

The Philosophical Bases of Causal Inference

The philosophical underpinnings of causality affect how we answer the questions “what type of evidence can we use to establish causality?” and “what do we think is enough evidence to be convinced of the existence of a causal relationship?” In the eighteenth century, David Hume addressed this question in An Enquiry Concerning Human Understanding, where he establishes that human minds perform inductive logic naturally: we tend to generalize from the specific to the general. We assume that all gunpowder, under certain conditions, will explode, given the experience of gunpowder exploding under those conditions in the past. Or we assume that all swans are white, because all the swans we’ve seen are white. The problem of induction arises when we realize that we draw conclusions like these because that process of generalization has worked in the past. Essentially, we’re using inductive logic to justify the use of inductive logic! Hume concludes that “we cannot apply a conclusion about a particular set of observations to a more general set of observations.”

Does this mean that attempting to establish causality is a fool’s errand? Not at all. What it does mean is that we need to apply care. One way of doing so is by thinking probabilistically: if gunpowder has exploded under these conditions every time in the past, it is very likely that gunpowder will explode under these conditions in the future; similarly, if every swan we’ve ever seen is white, it’s likely that all swans are white; there is some invisible cause (now we’d say “genetics”) that causes swans to be white. We give these two examples because we’re still almost certain that gunpowder causes explosions, and yet we now know that not all swans are white. A better application of probability would be to say that “given that all swans I’ve seen in the past are white, the swans I see in the future are likely to be white.”
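This probabilistic restatement has a classical formalization, Laplace’s rule of succession: under a uniform prior, after observing n white swans in n sightings, the probability that the next swan is white is (n + 1)/(n + 2) — high, but never certain:

```python
from fractions import Fraction

def prob_next_white(n_white, n_total):
    # Laplace's rule of succession with a uniform prior:
    # P(next success) = (successes + 1) / (trials + 2)
    return Fraction(n_white + 1, n_total + 2)

prob_next_white(100, 100)   # 101/102: very likely, but never probability 1
prob_next_white(0, 0)       # 1/2: with no data, we're maximally uncertain
```

The rule never assigns probability 1, which is exactly Hume’s point: induction licenses confidence, not certainty — as the black swans eventually demonstrated.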

Attempts at Establishing Causation

We all know the famous adage “correlation does not imply causation,” along with examples, such as the ones shown in this Indy100 article (e.g., the number of films Nicolas Cage makes in a year correlated with the number of people drowning in a swimming pool in the US). Let us extend the adage to “correlation does not imply causation, but it sure is correlated with it.” While correlation isn’t causation, you can loosely state that correlation is a precondition for causation. We write “loosely” because the causal relationship need not be linear, and correlation is a statistic that summarizes the linear relationship between two variables. Another subtle concern is given by the following example: if you drive uphill, your speed slows down and your foot pushes harder on the pedal. Naively applying the statement “correlation is a precondition for causation” to this example would lead you to draw precisely the wrong inference: that your foot on the pedal slows you down. What you actually want to do is use the speed in the absence of your foot on the pedal as a baseline.
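The caveat about linearity is easy to demonstrate with a toy example of our own: below, y is completely determined by x, yet the Pearson correlation is exactly zero, because the relationship is symmetric rather than linear:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient, computed from first principles."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]   # y is completely determined by x...
r = pearson(xs, ys)         # ...yet the linear correlation is exactly 0
```

So “correlation is a precondition for causation” only holds loosely: a perfectly causal but nonlinear relationship can hide from a linear statistic.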

Temporal precedence is another precondition for causation. We only accept that X causes Y if X occurs before Y. Unlike causation, correlation is symmetric: if X and Y are correlated, so are Y and X. Temporal precedence removes this ambiguity. But temporal precedence, combined with correlation, still isn’t enough for causation.

A third precondition for causation is the lack of a confounding variable (also known as a confounder). You may observe that drinking coffee is correlated with heart disease later in life. Here you have our first two preconditions satisfied: correlation and temporal precedence. However, there may be a variable further upstream that impacts both of these. For example, smokers may drink more coffee, and smoking causes heart disease. In this case, smoking is a confounding variable that makes it more difficult to establish a causal relationship between coffee and heart disease. (In fact, there is none, to our current knowledge.) This precondition can be framed as “control for third variables”.
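A small simulation makes the confounder visible (the numbers are invented and this is a toy model, not epidemiology): smoking drives both coffee consumption and disease, so coffee and disease are marginally correlated, but the association vanishes within each smoking stratum:

```python
import random

random.seed(42)
n = 10_000
smoker = [random.random() < 0.5 for _ in range(n)]
# In this toy model smoking drives BOTH coffee drinking and heart disease;
# coffee itself has no effect on disease at all.
coffee  = [2.0 + (2.0 if s else 0.0) + random.gauss(0, 1) for s in smoker]
disease = [(0.30 if s else 0.05) + random.gauss(0, 0.1) for s in smoker]

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

marginal = pearson(coffee, disease)   # strongly positive: coffee "causes" disease?
stratified = [
    pearson([c for c, s in zip(coffee, smoker) if s == flag],
            [d for d, s in zip(disease, smoker) if s == flag])
    for flag in (True, False)
]                                     # near zero once smoking is held fixed
```

Controlling for the third variable (stratifying on smoking) is exactly what dissolves the spurious coffee–disease association.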

We could go further; the epidemiologist Bradford Hill lists nine criteria for causation. For our purposes, three will suffice. But remember: these are preconditions. Meeting these preconditions still doesn’t imply causality.

Causality, Randomized Control Trials, and A/B Testing

Causality is often difficult to pin down because of our expectations in physical systems. If you drop a tennis ball from a window, you know that it will fall. Similarly, if you hit a billiard ball with a cue, you know which direction it will go. We constantly see causation in the physical world; it’s tempting to generalize this to larger, more complex systems, such as meteorology, online social networks, and global finance.

However, causality breaks down relatively rapidly even in simple physical systems. Let us return to the billiard table. We hit Ball 1, which hits Ball 2, which hits Ball 3, and so on. Knowing the exact trajectory of Ball 1 would allow us to calculate the exact trajectories of all subsequent balls. However, given an ever-so-slight deviation of Ball 1’s actual trajectory from the trajectory we use in our calculation, our prediction for Ball 2 will be slightly off, our prediction for Ball 3 will be further off, and our prediction for Ball 5 could be totally off. Given a small amount of noise in the system, which always occurs, we can’t say anything about the trajectory of Ball 5: we have no idea of the causal link between how we hit Ball 1 and the trajectory of Ball 5.
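The billiard-table argument is the signature of chaotic dynamics, and it’s easy to reproduce with the logistic map, a standard toy model of chaos (our choice of illustration, not from the text): two starting points differing by a billionth end up on unrelated trajectories:

```python
def logistic(x, r=4.0):
    """One step of the logistic map, a standard toy model of chaos."""
    return r * x * (1 - x)

a, b = 0.4, 0.4 + 1e-9   # two "break shots" differing by one part in a billion
max_gap = 0.0
for _ in range(60):
    a, b = logistic(a), logistic(b)
    max_gap = max(max_gap, abs(a - b))
# max_gap grows to order 1: the tiny initial difference is amplified
# until the two trajectories bear no relation to each other.
```

This is why knowing the cue ball’s trajectory “almost exactly” tells you essentially nothing about Ball 5.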

It is no wonder that the desire to think about causality in basic science gave rise to randomized control trials (RCTs), in which two groups, all other things held constant, are given different treatments (such as “drug” or “placebo”). There are lots of important details, such as the double-blindness of studies, but the general principle remains: under the (big) assumption that all other things are held constant,1 the difference in outcome can be put down to the difference in treatment: Treatment → Outcome. This is the same principle that underlies statistical hypothesis testing in basic research. There has always been cross-pollination between academia and industry: the most widely used statistical test in academic research, the Student’s t test, was developed by William Sealy Gosset (while employed by the Guinness Brewery!) to determine the impact of temperature on acidity while fermenting beer.

The same principle underlies A/B testing, which permeates most businesses’ digital strategies. A/B tests are often described as the online analog of RCTs, the gold standard for causal inference, but that framing misses one of the main points: what type of causal relationships can A/B tests say something about? For the most part, we use A/B tests to test hypotheses about incremental product changes; early on, Google famously A/B tested 40 shades of blue to discover the best color for links.
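The statistics behind a basic A/B test can be sketched with a two-proportion z-test (the visitor and conversion counts below are invented for illustration):

```python
from math import erf, sqrt

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: did variant B convert at a different rate than A?"""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)          # conversion rate under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the normal CDF, via the error function.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# 2,400 visitors per arm; B converts 150 times vs. A's 120.
z, p_value = two_proportion_z(120, 2400, 150, 2400)
```

With these numbers z lands near 1.9 and the p-value near 0.06 — suggestive but not conclusive, which is why sample-size planning matters. Note how narrow the question is: the test compares two predefined variants on one metric, nothing more.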

But A/B tests are no good for weightier questions: no A/B test can tell you why a customer is likely to churn. An A/B test might help you determine if a new feature is likely to increase churn. However, we can’t generate an infinite number of hypotheses nor can we run an infinite number of A/B tests to identify the drivers of churn. As we’ve said, business is applied sociology: to run a successful business, you need to understand why your prospects and customers behave in certain ways. A/B tests will not tell you this. Rather, they allow you to estimate the impact of product changes (such as changing the color of a link or changing the headline of an article) on metrics of interest, such as clicks. The hypothesis space of an A/B test is minuscule, compared with all the different kinds of causal questions a business might ask.

To take an extreme example, new technologies don’t emerge from A/B testing. Brian Christian quotes Google’s Scott Huffman as saying (paraphrasing Henry Ford), “If I’d asked my customers what they wanted, they’d have said a faster horse. If you rely too much on the data [and A/B testing], you never branch out. You just keep making better buggy whips.” A/B tests can lead to minor improvements in current products but won’t lead to the breakthroughs that create new products—and may even blind you to them.

Christian continues: “[Companies] may find themselves chasing ‘local maxima’—places where the A/B tests might create the best possible outcome within narrow constraints—instead of pursuing real breakthroughs.” This is not to say that A/B tests haven’t been revolutionary. They have helped many businesses become more data driven, and to navigate away from the HiPPO principle, in which decisions are made by the “highest paid person’s opinion.” But there are many important causal questions that A/B tests can’t answer. Causal inference is still in its infancy in the business world.

The End of Causality: The Great Lie

Before diving into the tools and techniques that will be most valuable in establishing robust causal inference, it’s worth diagnosing where we are and how we got here. One of the most dangerous myths of the past two decades was that the sheer volume of data we have access to renders causality, hypotheses, the scientific method, and even understanding the world obsolete. Look no further than Chris Anderson’s 2008 Wired article “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete”, in which Anderson states:

Google’s founding philosophy is that we don’t know why this page is better than that one: if the statistics of incoming links say it is, that’s good enough. No semantic or causal analysis is required….

This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear.

In the “big data” limit, we don’t need to understand mechanism, causality, or the world itself because the data, the statistics, and the at-scale patterns speak for themselves. Now, 15 years later, we’ve seen the at-scale global problems that emerge when you don’t understand what the data means, how it’s collected, and how it’s fed into decision-making pipelines. Anderson, when stating that having enough data means you don’t need to think about models or assumptions, forgot that both assumptions and implicit models of how data corresponds to the real world are baked into the data collection process, the output of any decision-making system, and every step in between.

Anderson’s thesis, although dressed up in the language of “big data,” isn’t novel. It has strong roots throughout the history of statistics, harking back to Francis Galton, who introduced correlation as a statistical technique and was one of the founders of the eugenics movement (as Aubrey Clayton points out in “How Eugenics Shaped Statistics: Exposing the Damned Lies of Three Science Pioneers” and his wonderful book Bernoulli’s Fallacy, the eugenics movement and many of the statistical techniques we now consider standard are deeply intertwined). In selling correlation to the broader community, part of the project was to include causation under the umbrella of correlation, so much so that Karl Pearson, considered the father of modern statistics, wrote that, upon reading Galton’s Natural Inheritance:

I interpreted…Galton to mean that there was a category broader than causation, namely correlation, of which causation was the only limit, and that this new conception of correlation brought psychology, anthropology, medicine and sociology in large part into the field of mathematical treatment. (from The Book of Why)

We’re coming out of a hallucinatory period when we thought that the data would be enough. It’s still a concern how few data scientists think about their data collection methods, telemetry, how their analytical decisions (such as removing rows with missing data) introduce statistical bias, and what their results actually mean about the world. And the siren song of AI tempts us to bake the biases of historical data into our models. We are starting to realize that we need to do better. But how?

Causality in Practice

It’s all well and good to say that we’re leaving a hallucination and getting back to reality. To make that transition, we need to learn how to think about causality. Deriving causes from data, and data from well-designed experiments, isn’t simple.

The Ladder of Causation

In The Book of Why, Judea Pearl developed the ladder of causation to consider how reasoning about cause is a distinctly different kind of ability, and an ability that’s only possessed by modern (well, since 40,000 BC) humans. The ladder has three rungs (Figure 1), and goes like this:

Figure 1. The ladder of causation: from seeing to doing to imagining.

1. Association (seeing). We, along with just about every animal, can make associations and observations about what happens in our world. Animals know that if they go to a certain place, they’re likely to find food, whether that’s a bird going to a feeder, or a hawk going to the birds that are going to the feeder. This is also the level at which statistics operates—and that includes machine learning.

2. Intervention (doing). On this rung of the ladder, we can do experiments. We can try something and see what happens. This is the world of A/B testing. It answers the question “what happens if we change something?”

3. Counterfactuals (imagining). The third level is where we ask questions about what the world would be like if something were different. What might happen if I didn’t get a COVID vaccine? What might happen if I quit my job? Counterfactual reasoning itself emerges from developing robust causal models: once you have a causal model based on association and intervention, you can then utilize this model for counterfactual reasoning, which is qualitatively different from (1) inferring a cause from observational data alone and (2) performing an intervention.

Historically, observation and association have been a proxy for causation. We can’t say that A causes B, but if event B follows A frequently enough, we learn to act as if A causes B. That’s “good old common sense,” which (as Horace Rumpole often complains) is frequently wrong.

If we want to talk seriously about causality as opposed to correlation, how do we do it? For example, how do we determine whether a treatment for a disease is effective or not? How do we deal with confounding factors (events that can cause both A and B, making A appear to cause B)? Enter randomized control trials (RCTs).

RCTs and Intervention

The RCT has been called the “gold standard” for assessing the effectiveness of interventions. Mastering ‘Metrics (p. 3ff.) has an extended discussion of the National Health Interview Survey (NHIS), an annual study of health in the US. The authors use this to investigate whether health insurance causes better health. There are many confounding factors: we intuitively expect people with health insurance to be more affluent and to be able to afford seeing doctors; more affluent people have more leisure time to devote to exercise, and they can afford a better diet. There are also some counterintuitive factors at play: at least statistically, people who have less money to spend on health care can appear more healthy, because their diseases aren’t diagnosed. All of these factors (and many others) influence their health, and make it difficult to answer the question “does insurance cause better health?”

In an ideal world, we’d be able to see what happens to individuals both when they have insurance and when they don’t, but this would require at least two worlds. The best we can do is to give some people insurance and some not, while attempting to hold all other things equal. This concept, known as ceteris paribus, is fundamental to how we think about causality and RCTs.

Ceteris paribus, or “all other things equal”

The key idea here is “all other things equal”: can we hold as many variables as possible constant so that we can clearly see the relationship between the treatment (insurance) and the effect (outcome)? Can we see a difference between the treatment group and the control (uninsured) group?

In an RCT, researchers pick a broad enough group of participants so that they can expect randomness to “cancel out” all the confounding factors—both those they know about and those they don’t. Random sampling is tricky, with many pitfalls; it’s easy to introduce bias in the process of selecting the sample groups. Essentially, we want a sample that is representative of the population of interest. It’s a good idea to look at the treatment and control groups to check for balance. For the insurance study, this means we would want the treatment and control groups to have roughly the same average income; we might want to subdivide each group into different subgroups for analysis. We have to be very careful about gathering data: for example, “random sampling” in the parking lot of Neiman-Marcus is much different from random sampling in front of Walmart. There are many ways that bias can creep into the sampling process.

Difference between means

To establish causality, we really want to know what the health outcomes (outcome) would be for person X if they had insurance (treatment) and if they didn’t (control). Because observing both is impossible (at least simultaneously), the next best thing would be to take two different people who are exactly the same, except that one has insurance and the other doesn’t. The challenge here is that the outcome, in either case, could be a result of random fluctuation, so it may not be indicative of the insured (or uninsured) population as a whole. For this reason, we do an experiment with a larger population and look at the statistics of outcomes.

To see if the treatment has an effect, we look at the average outcome in the treatment and control groups (also called group means): in this case, the insured and uninsured. We could use individuals’ assessment of their health, medical records (if we have access), or some other metric.

We compare the groups by looking at the difference between the averages. These averages and groups are comparable thanks to the law of large numbers (LLN), which states that the average of a sample gets closer and closer to the population average as the sample size increases.

Even when drawing the samples from the same population, there will always be a difference between the means (unless by some fluke they’re exactly the same), due to sampling error: the sample mean is a sample statistic. So, the question becomes, How confident are we that the observed difference is real? This is the realm of statistical significance.
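The role of sampling error can be seen in a quick simulation. In this sketch, both groups are drawn from the same hypothetical population (a "health score" with mean 50 and standard deviation 10), so any gap between their means is pure noise, and it shrinks as the sample size grows:

```python
# Sampling error shrinks as samples grow: both groups are drawn from the
# same hypothetical population, so the gap between their means is pure noise.
import random
import statistics

random.seed(42)

def mean_gap(n):
    """Absolute difference between two sample means from the same population."""
    a = [random.gauss(50.0, 10.0) for _ in range(n)]
    b = [random.gauss(50.0, 10.0) for _ in range(n)]
    return abs(statistics.mean(a) - statistics.mean(b))

small_n_gap = mean_gap(50)        # noisy: sampling error dominates
large_n_gap = mean_gap(50_000)    # tiny: the law of large numbers at work
print(small_n_gap, large_n_gap)
```

With 50 observations per group the gap is typically on the order of a couple of points; with 50,000 it is a small fraction of a point, even though nothing "real" distinguishes the groups in either case.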

Statistical significance, practical significance, and sample sizes

The basic idea behind statistical significance is asking the question “were there no actual difference between the control and treatment groups, what is the probability of seeing a difference between the means at least as extreme as the one observed?” This is the infamous p-value of the hypothesis test.2 In this case, we’re using the Student’s t test, but it’s worth mentioning that there is a panoply of tools to analyze RCT data, such as ANCOVA (analysis of covariance), HTE (heterogeneity of treatment effects) analysis, and regression (the last of which we’ll get to).

To answer this question, we need to look at not only the means, but also the standard error of the mean (SEM) of the control and treatment, which is a measure of uncertainty of the mean: if, for example, the difference between the means is significantly less than the SEM, then we cannot be very confident that the difference in means is a real difference.3 To this end, we quantify the difference in terms of standard errors of the populations. It is standard to say that the result is statistically significant if the p-value is less than 0.05. The number 0.05 is only a convention used in research, but the higher the p-value, the greater the chance that your results are misleading you.
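A minimal sketch of this calculation follows: the difference in group means measured in units of its standard error. For large samples the Student’s t distribution is close to the normal, so this sketch approximates the p-value with the normal CDF (via math.erf) to stay dependency-free; the outcome scores and effect size are hypothetical.

```python
# Difference in means divided by its standard error, with a two-sided
# p-value under the normal approximation (reasonable for large samples).
# All numbers below are hypothetical.
import math
import random
import statistics

random.seed(0)

def two_sample_p_value(a, b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    sem_sq = statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    z = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sem_sq)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Hypothetical outcome scores: the treatment shifts the mean up by 2 points.
control = [random.gauss(50, 10) for _ in range(2000)]
treated = [random.gauss(52, 10) for _ in range(2000)]
p = two_sample_p_value(treated, control)
print(f"p = {p:.2g}")
```

With 2,000 people per group, a 2-point shift against a standard deviation of 10 produces a very small p-value; shrink the groups and the same shift can easily fail to reach significance.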

In Figure 2, the two curves could represent the sampling distributions of the means of the treatment and the control groups. On the left and the right, the means (a1 and a2) are the same, as is the distance (d) between them. The big difference is the standard error of the mean (SEM). On the left, the SEM is small and the difference will likely be statistically significant. When the SEM is large, as it is on the right, there’s much more overlap between the two curves, and the difference is more likely to be a result of the sampling process, in which case you’re less likely to find statistical significance.

Figure 2. The only difference between the two graphs is the standard error, resulting in a statistically significant difference on the left and not on the right.

Statistical testing is often misused and abused, most famously in the form of p-hacking, which has had a nontrivial impact on the reproducibility crisis in science. p-hacking consists of a collection of techniques that allow researchers to get statistically significant results by cheating, one example of which is peeking. This is when you watch the p-value as data comes in and decide to stop the experiment once you get a statistically significant result. The larger the sample, the smaller the standard error and the smaller the p-value, and this should be considered when designing your experiment. Power analysis is a common technique to determine the minimum sample size necessary to get a statistically significant result, under the assumption that the treatment effect has a certain size. The importance of robust experimental design in randomized control trials cannot be overstated. Although it’s outside the scope of this report, check out “Randomized Controlled Trials—A Matter of Design” (Spieth et al.), Trustworthy Online Controlled Experiments (Kohavi et al.), and Emily Robinson’s “Guidelines for A/B Testing” for detailed discussions.
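As a companion to the power-analysis point above, here is a back-of-the-envelope sample-size calculation using the standard closed-form formula for comparing two means under a normal approximation; the effect size and standard deviation are hypothetical.

```python
# Closed-form minimum sample size per group for detecting a difference in
# means. The z-values correspond to a two-sided alpha of 0.05 and 80% power;
# the effect size and standard deviation below are hypothetical.
import math

def sample_size_per_group(effect, sd):
    z_alpha = 1.96  # two-sided 5% significance
    z_beta = 0.84   # 80% power
    return math.ceil(2 * ((z_alpha + z_beta) * sd / effect) ** 2)

# Detecting a 2-point shift in a metric whose standard deviation is 10:
n = sample_size_per_group(effect=2.0, sd=10.0)
print(n)
```

Fixing the sample size before the experiment begins, rather than peeking until significance appears, is precisely what protects the p-value from the abuse described above.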

It is important to note that statistical significance is not necessarily practical significance or business value! Let’s say that you’re calculating the impact of a landing page change on customer conversion rates: you could find that you have a statistically significant increase in conversion, but the actual increase is so small as to be inconsequential to business or, even worse, that the cost of the change exceeds the return on investment. Also note that a result that is not statistically significant is not necessarily negative. For example, if the impact of a landing page change on conversion is not significant, it doesn’t imply that you should not ship the change. Businesses often decide to ship if the conversion rate doesn’t decrease (with statistical significance).

Check for balance

All of the above rests on the principle of ceteris paribus: all other things equal. We need to check that this principle actually holds in our samples. In practice, this is called checking for balance: ensuring that your control and treatment groups have roughly the same characteristics with respect to known confounding factors. For example, in the insurance study, we would make sure that there are equal numbers of participants in each income range, along with equal numbers of exercisers and nonexercisers among the study’s participants. This is a standard and well-studied practice. Note that it assumes that you can enumerate all the confounding factors that are important. Also note that there are nuanced discussions about how helpful checking for balance actually is in practice, such as “Mostly Harmless Randomization Checking”, “Does the ‘Table 1 Fallacy’ Apply if It Is Table S1 Instead?”, and “Silly Significance Tests: Balance Tests”. Having said that, it is important to know about the idea of checking for balance, particularly because it keeps the principle of “all other things equal” front of mind for data scientists.
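A minimal balance check can be sketched as follows. The covariates (income, exercise) and the data are hypothetical; the point is simply that, after random assignment, covariate means should be close across groups.

```python
# A minimal balance check (hypothetical data): after random assignment, the
# treatment and control groups should have similar covariate means.
import random
import statistics

random.seed(1)

# Each participant: (income, exercises_regularly, assigned_to_treatment)
participants = [
    (random.gauss(60_000, 15_000), random.random() < 0.4, random.random() < 0.5)
    for _ in range(10_000)
]

treatment = [p for p in participants if p[2]]
control = [p for p in participants if not p[2]]

def covariate_means(group):
    income = statistics.mean(p[0] for p in group)
    exercise = statistics.mean(1 if p[1] else 0 for p in group)
    return income, exercise

income_t, exercise_t = covariate_means(treatment)
income_c, exercise_c = covariate_means(control)
income_gap = income_t - income_c
exercise_gap = exercise_t - exercise_c
print(income_gap, exercise_gap)  # both should be near zero under randomization
```

In a real study you would run this comparison for every known confounder, and a large gap on any of them is a signal that the "all other things equal" assumption is in trouble.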

But what if we can’t do an experiment or trial, because of high costs, the data already having been collected, ethical concerns, or some other reason? All is not lost. We can try to control for other factors. For example, if we are unable to run a vaccine trial, we could (1) sample the populations of those who did and did not get vaccinated, (2) identify potentially confounding factors (for example, if one group has a higher proportion of people living in urban areas), and (3) correct for these.

In this process, we’re attempting to climb Pearl’s ladder of causality: we have only correlational data but want to make a causal statement about what would happen if we intervene! What would happen if uninsured people were insured? What would happen if unvaccinated people were vaccinated? That’s the highest (counterfactual) rung of Pearl’s ladder. Note that the following techniques aren’t useful only when you can’t run an experiment; motivating them this way is simply a convenient way to introduce them.

The Constant-Effects Model, Selection Bias, and Control for Other Factors

What if all things aren’t equal across our groups? There are many evolving tools for dealing with this problem. Here, we’ll cover the most basic, the constant-effects model. This makes a (potentially strong) assumption, known as the constant-effects assumption, that the intervention has the same causal effect across the population. Looking back at the insurance example, the constant-effects model asks us to assume that insurance (the treatment) has the same effect across all subgroups. If this is true, then we would expect that:

difference in group means = average causal effect + selection bias

where the selection bias term is the difference in the outcome of both groups had they both been uninsured. As Angrist and Pischke point out in Mastering ‘Metrics (p. 11),

The insured in the NHIS are healthier for all sorts of reasons, including, perhaps, the causal effects of insurance. But the insured are also healthier because they are more educated, among other things. To see why this matters, imagine a world in which the causal effect of insurance is zero…. Even in such a world, we should expect insured NHIS respondents to be healthier, simply because they are more educated, richer, and so on.

The selection bias term is precisely due to the issue of confounding variables, or confounders. One tool to deal with the potential impact of confounders and the (sample) selection bias outlined here is regression.

Making Other Things Equal with Regression

Regression is a tool to deal with the potential impact of other factors and the (sample) selection bias outlined previously. Many who have worked a lot with regression remark how surprised they are at the robustness and performance of these modeling techniques relative to fancier machine learning methods.

The basic idea is to identify potential confounders and compare subgroups of control and treatment groups that have similar ranges for these confounders. For example, in the NHIS insurance example, you could identify subgroups of insured and not insured that have similar levels of education and wealth (among other factors), compute the causal effects for each of these sets of subgroups, and use regression to generalize the results to the entire population.

We are interested in the outcome as a function of the treatment variable, while holding control variables fixed (these are the variables we’ve identified that could also impact the outcome: we want to compare apples to apples, essentially).

The specific equation of interest, in the case of a single control variable, is:

Yi = a + b·Pi + c·Ai + ei

Here, Y is the outcome variable (the subscript i indexes the individual), P the treatment variable (1 if the individual received the treatment, 0 if not, by convention), A the control variable, and e the error term. The regression coefficients/parameters are a, the intercept; b, the causal effect of the treatment on the outcome; and c, the causal effect of the control variable on the outcome.

Again, thinking of the NHIS study, there may be many other control variables in addition to education and wealth: age, gender, ethnicity, prior medical history, and more. (The actual study took all of these into account.) That is the nature of the game: you’re trying to discover the influence of one effect in a many-dimensional world. In real-world trials, many factors influence the outcome, and it’s not possible to enumerate all of them.
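The regression just described can be sketched with ordinary least squares. Everything below is hypothetical (the coefficients, education as the control A, a binary treatment P correlated with it), and numpy is assumed to be available:

```python
# Fitting Y = a + b*P + c*A + e by ordinary least squares on simulated data
# (hypothetical coefficients; numpy assumed available). With the control A
# included, the estimate of b recovers the true treatment effect.
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

A = rng.normal(12, 3, n)                            # control: years of education
P = (A + rng.normal(0, 3, n) > 12).astype(float)    # treatment, correlated with A
Y = 1.0 + 2.0 * P + 0.5 * A + rng.normal(0, 1, n)   # true a=1.0, b=2.0, c=0.5

design = np.column_stack([np.ones(n), P, A])
a_hat, b_hat, c_hat = np.linalg.lstsq(design, Y, rcond=None)[0]
print(round(b_hat, 2), round(c_hat, 2))
```

Note that the treatment here is deliberately correlated with the control; including A in the design matrix is what lets the estimate of b stay close to the true causal effect.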

A note on generative models

Although generative modeling is outside the scope of this report, it is worth saying a few words about. Loosely speaking, a generative model is essentially a model that specifies the data-generating process (the technical definition is: it models the joint probability P(X, Y) of features X and outcome variable Y, in contrast to discriminative models that model the conditional probability P(Y|X) of the outcome, conditional on the features). Often the statistical model (such as the previous linear equation) will be simpler than the generative model and still obtain accurate estimates of the causal effect of interest, but (1) this isn’t always the case and (2) getting into the habit of thinking how your data was generated, simulating data based on this generative model, and checking whether your statistical model can recover the (known) causal effects, is an indispensable tool in the data scientist’s toolkit.

Consider the case in which we have a true model telling us how the data came to be:

Yi = G·Ti + B·Xi + ei

In this generative model, G is the causal effect of Ti on Yi, B is the causal effect of Xi on Yi, and ei is the effect of “everything else,” which could be purely random. If Xi and Ti are not correlated, we will obtain consistent estimates of G by fitting a linear model:

Yi = a + G·Ti + ei

However, if Ti and Xi are correlated, we have to control for Xi in the regression, by estimating:

Yi = a + G·Ti + B·Xi + ei

As previously stated, we have recovered the statistical model we started out with, but now have the added benefit of also having a generative model that allows us to simulate our model, in accordance with the data-generating process.
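The habit recommended here (simulate from the generative model, then check that the statistical model recovers the known causal effect) can be sketched as follows; the effect sizes are hypothetical and numpy is assumed to be available.

```python
# Simulate the generative model, then check which regressions recover G.
# When T and X are correlated, omitting X biases the estimate of G;
# controlling for X recovers it. Effect sizes are hypothetical.
import numpy as np

rng = np.random.default_rng(7)
n = 100_000
G_true, B_true = 1.5, 2.0

X = rng.normal(0, 1, n)
T = 0.8 * X + rng.normal(0, 1, n)                  # T correlated with X
Y = G_true * T + B_true * X + rng.normal(0, 1, n)

def ols(design, y):
    return np.linalg.lstsq(design, y, rcond=None)[0]

ones = np.ones(n)
G_naive = ols(np.column_stack([ones, T]), Y)[1]    # omits X: biased upward here
G_ctrl = ols(np.column_stack([ones, T, X]), Y)[1]  # controls for X: consistent
print(round(G_naive, 2), round(G_ctrl, 2))
```

Because we built the data ourselves, we know G exactly, so the gap between the naive and controlled estimates directly measures the bias introduced by omitting the confounder.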

Omitted Variable Bias

Regression requires us to know what the important variables are; your regression is only as good as your knowledge of the system! When you omit important variables for whatever reason, your causal model and inferences will be biased. This type of bias is known as omitted variable bias (OVB). In Mastering ‘Metrics (p. 69), we find:

Regression is a way to make other things equal, but equality is generated only for variables included as controls on the right-hand side of the model. Failure to include enough controls or the right controls still leaves us with selection bias. The regression version of the selection bias generated by inadequate controls is called omitted variables bias (OVB), and it’s one of the most important ideas in the metrics canon.

It’s important to reason carefully about OVB, and it’s nontrivial to do so! One way to do this is performing a sensitivity analysis with respect to our controls, that is, to check out how sensitive the results are to the list of variables. If the changes in the variables you know about have a big effect on the results, you have reason to suspect that results might be equally sensitive to the variables you don’t know about. The less sensitive, or more robust, the regression is, the more confident we can be in the results. We highly recommend the discussion of OVB in Chapter 2 of Mastering ‘Metrics if you want to learn more.

Before moving on to discuss the power of instrumental variables, we want to remind you that there are many interesting and useful techniques that we are not able to cover in this report. One such technique is regression discontinuity design (RDD), which has gained popularity in recent years and, among other things, has the benefit of having visually testable assumptions (continuity of all X aside from treatment assignment around the discontinuity). For more information, check out Chapter 6 of Cunningham’s Causal Inference and “Regression Discontinuity Design in Economics”, a paper by Lee and Lemieux.

Instrumental Variables

There are situations in which regression won’t work; for example, when an explanatory variable is correlated with the error term. To deal with such situations, we’re going to add instrumental variables to our causal toolkit.

To do so, we’ll consider the example of the cholera epidemic that swept through England in the 1850s. At the time, it was generally accepted that cholera was caused by a vaporous exhalation of unhealthy air (miasma) and poverty, which was reinforced by the observation that cholera seemed more widespread in poorer neighborhoods. (If you’re familiar with Victorian literature, you’ve read about doctors prescribing vacations at the seaside so the patient can breathe healthy air.) The physician John Snow became convinced that the miasma theory was pseudoscience and that people were contracting cholera from the water supply.

To keep track of the different potential causal relationships, we will introduce causal graphs, a key technique that more data scientists need to know about. We start with the proposed causal relationship between miasma and cholera. To draw this as a graph, we have a node for miasma, a node for cholera, and an arrow from miasma to cholera, denoting a causal relationship (Figure 3).

Figure 3. A causal graph showing the hypothetical relationship between miasma and cholera.

The arrow has an associated path coefficient, which describes the strength of the proposed causal effect. Snow’s proposed causal relationship from water purity to cholera introduces another node and edge (Figure 4).

Figure 4. Adding water purity (P), another hypothetical cause for cholera.

However, the miasma theory stated that miasma could be working through the water supply. Therefore, we need to include an arrow from miasma to water purity (Figure 5).

Figure 5. Adding an arrow to show that miasma (M) could influence water purity (P).

We’re running up against the challenge of a potential confounder again! Even if we could find a correlation between water purity and cholera cases, it still may be a result of miasma. And we’re unable to measure miasma directly, so we’re not able to control for it! So how to disprove this theory and/or determine the causal relationship between water purity and cholera?

Enter the instrumental variable. Snow had noticed that most of the water supply came from two companies: the Southwark and Vauxhall Waterworks Company, which drew its water downstream from London’s sewers, and the Lambeth Waterworks Company, which drew its water upstream. This adds another node, water company, to our causal graph, along with an arrow from water company to water purity (Figure 6).

Figure 6. Adding the water supply (W), which affects purity, and is not affected by miasma.

Water company (W) is an instrumental variable; it’s a way to vary the water purity (P) in a way that’s independent of miasma (M). Now that we’ve finished the causal graph, notice which arrows are not present:

  • There are no arrows between water company and miasma. Miasma can’t cause a water company to exist, and vice versa.
  • There is no direct arrow from water company to cholera, as the only causal effect that water company could have on cholera is as a result of its effect on water purity.
  • There are no other arrows (potential confounders) that point into water company and cholera. Any correlation must be causal.

Each arrow has an associated path coefficient, which describes the strength of the relevant proposed causal effect. Because W and P are unconfounded, the causal effect cWP of W on P can be estimated from their correlation coefficient rWP. As W and C are also unconfounded, the causal effect cWC of W on C can likewise be estimated from the correlation coefficient rWC. Causal effects along paths are multiplicative, meaning that cWC = cWP × cPC. This tells us that the causal effect of interest, cPC, can be expressed as the ratio cWC/cWP = rWC/rWP. This is amazing! Using the instrumental variable W, we have found the causal effect of P on C without being able to measure the confounder M. Generally, any variable possessing the following characteristics of W is an instrumental variable and can be used in this manner:

  • There is no arrow between W and M (they are independent).
  • There is no direct arrow from W to C.
  • There is an arrow from W to P.
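The ratio estimator described above can be checked in simulation. Everything here is hypothetical (the true effect, the strength of the instrument, the confounder); the point is that the unobserved confounder M never enters the estimate. This sketch uses the covariance form of the ratio, which equals rWC/rWP when the variables are standardized, and assumes numpy is available.

```python
# Instrumental-variable estimate via the ratio of covariances: even though
# the confounder M affects both P and C and is never observed, the instrument
# W recovers the causal effect of P on C. All coefficients are hypothetical.
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
true_effect = -2.0                            # causal effect of purity on cholera

M = rng.normal(0, 1, n)                       # unobserved 'miasma' confounder
W = rng.integers(0, 2, n).astype(float)       # water company: independent of M
P = 1.0 * W - 0.5 * M + rng.normal(0, 1, n)   # water purity
C = true_effect * P + 1.0 * M + rng.normal(0, 1, n)  # cholera burden

def cov(a, b):
    return np.cov(a, b)[0, 1]

iv_estimate = cov(W, C) / cov(W, P)
print(round(iv_estimate, 2))
```

A naive regression of C on P in this setup would be badly biased by M; the instrument sidesteps the confounder entirely, just as Snow’s water-company comparison did.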

All of this is much more approachable and manageable when framed in the language of graphs. For this reason, in the next section, we’ll focus on how causal graphs can help us think through causality and causal effects and perform causal inference.

To be explicit, there has been something of a two-cultures problem in the world of causality: those who use econometric methods (such as those in Mastering ‘Metrics) and those who use causal graphs. It is plausible that the lack of significant cross-pollination between these communities is one of the reasons causal inference is not more mature and widespread as a discipline (although proving this causal claim would be tough!). There are few resources that deal well with both worlds of causality, but Cunningham’s Causal Inference: The Mixtape is one that admirably attempts to do so.

Causal Graphs

Randomized control trials are designed to tell us whether an action, X, can cause an outcome, Y. We can represent that with the simplest of all causal graphs (Figure 7). But in the real world, causality is never that simple. In the real world, there are also confounding factors that need to be accounted for. We’ve seen that RCTs can account for some of these confounding factors. But we need better tools to understand confounding factors and how they influence our results. That’s where causal graphs are a big help.

Figure 7. A simple causal graph: X causes Y.

Forks and confounders

In the causal diagram in Figure 8, a variable Y has a causal effect on two variables X and Z, which means that X and Z will be correlated, even if there’s no causal relation between X and Z themselves! We call this a fork. If we want to investigate the causal relationship between X and Z, we have to deal with the presence of the confounder, Y. As we’ve seen, RCTs are a good way to deal with potential confounders.

Figure 8. Age influences the ability to walk and the death rate. This is a fork. Does walking influence the death rate?

As an example, a 1998 New England Journal of Medicine paper identified a correlation between regular walking and reduced death rates among retired men. It was an observational study, so the authors had to consider confounders. For example, you could imagine that age could be a confounder: health decays as you get older, and decaying health makes you less likely to walk regularly. When the study’s authors took this into account, though, they still saw an effect. Furthermore, that effect remained even after accounting for other confounding factors.
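The fork can be made concrete with a small simulation. The rates below are hypothetical, and walking is given no causal effect on mortality at all, yet a naive comparison still makes walkers look healthier:

```python
# A fork in miniature: age drives both walking and mortality (hypothetical
# rates), so walking and death are correlated even though walking has no
# causal effect here. Holding age constant makes the correlation vanish.
import random
import statistics

random.seed(5)

people = []
for _ in range(200_000):
    old = random.random() < 0.5
    walks = random.random() < (0.3 if old else 0.7)   # older people walk less
    dies = random.random() < (0.20 if old else 0.05)  # older people die more
    people.append((old, walks, dies))

def death_rate(group):
    return statistics.mean(1 if p[2] else 0 for p in group)

walkers = [p for p in people if p[1]]
nonwalkers = [p for p in people if not p[1]]
raw_gap = death_rate(nonwalkers) - death_rate(walkers)   # walkers look healthier

old_walkers = [p for p in people if p[0] and p[1]]
old_nonwalkers = [p for p in people if p[0] and not p[1]]
adjusted_gap = death_rate(old_nonwalkers) - death_rate(old_walkers)
print(raw_gap, adjusted_gap)  # big raw gap, near-zero gap within an age group
```

Stratifying by the confounder (age) is exactly the "controlling for" move described above: within an age group, walking and death are no longer associated.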


Colliders

The causal diagram in Figure 9 is a collider. Colliders occur whenever two phenomena have a common effect, such as a disease X and a risk factor Z that both influence whether the person is an inpatient or not (Y). When you condition on the downstream variable Y (in hospital or not), you will see a spurious negative correlation between X and Z. While this seems strange, reasoning through the situation explains the negative correlation: an inpatient without the risk factor is more likely to have the disease than a general member of the population, as they’re in the hospital! This type of bias is also known as Berkson’s paradox.

Figure 9. A disease like COVID can lead to hospitalization. Other health factors can also lead to hospitalization. This is a collider.

To think about this concretely, imagine one group of patients with COVID, and another with appendicitis. Both can cause hospital admissions, and there’s no plausible (at least as far as we know) connection between COVID and appendicitis. However, a hospital patient who does not have appendicitis is more likely to have COVID than a member of the general public; after all, that patient is in the hospital for something, and it isn’t appendicitis! Therefore, when you collect the data and work the statistics out, there will be a negative correlation between hospitalization from COVID and appendicitis: that is, it will look like appendicitis prevents severe COVID, or vice versa; the arrow of correlation points both ways. It’s always risky to say “we just know that can’t be true.” But in the absence of very compelling evidence, we are justified in being very suspicious of any connection between COVID and a completely unrelated medical condition.
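The COVID/appendicitis story can be simulated directly; the prevalence numbers below are hypothetical:

```python
# Berkson's paradox in miniature (hypothetical prevalences): COVID and
# appendicitis are independent in the population, but conditioning on
# hospitalization makes them negatively associated, because each condition
# 'explains away' the admission.
import random

random.seed(9)

population = []
for _ in range(500_000):
    covid = random.random() < 0.05
    appendicitis = random.random() < 0.05            # independent of covid
    hospitalized = covid or appendicitis or random.random() < 0.01
    population.append((covid, appendicitis, hospitalized))

def covid_rate(group):
    return sum(1 for p in group if p[0]) / len(group)

inpatients = [p for p in population if p[2]]
without_appendicitis = [p for p in inpatients if not p[1]]
with_appendicitis = [p for p in inpatients if p[1]]
# Among inpatients, those *without* appendicitis are far more likely to have
# COVID: a spurious negative association between the two conditions.
print(covid_rate(without_appendicitis), covid_rate(with_appendicitis))
```

Nothing in the data-generating process links the two conditions; the negative association appears purely because we restricted attention to hospital patients, that is, because we conditioned on the collider.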

Studies, including RCTs, often condition on colliders without realizing it—for example, by drawing their sample only from hospitalized patients. But as we've seen, conditioning on a collider introduces a spurious (negative) correlation between X and Z, precisely what you want to avoid. And when the collider is the only path connecting X and Z, the two are not causally related at all, despite the correlation that conditioning creates.
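A quick simulation makes Berkson's paradox concrete. The sketch below uses the COVID/appendicitis example with invented disease and admission rates: the two conditions are generated independently, yet restricting attention to inpatients manufactures a strong negative correlation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Two independent conditions: no causal link between them
covid = rng.random(n) < 0.10
appendicitis = rng.random(n) < 0.05

# Either condition can cause hospitalization (the collider)
hospital = (covid & (rng.random(n) < 0.5)) | (appendicitis & (rng.random(n) < 0.9))

corr_population = np.corrcoef(covid, appendicitis)[0, 1]
corr_inpatients = np.corrcoef(covid[hospital], appendicitis[hospital])[0, 1]

print(f"whole population: {corr_population:+.3f}")  # near zero
print(f"inpatients only:  {corr_inpatients:+.3f}")  # strongly negative
```

The specific probabilities are arbitrary; any setup in which both conditions feed into admission produces the same qualitative result.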

The flow of information

Causal graphs allow us to reason about the flow of information. Take, for example, the causal chain X → Y → Z. In this chain, information about X gives us information about Y, which in turn provides information about Z. However, if we control for Y (by choosing, for example, a particular value of Y), information about X then provides no new information about Z.

Similarly, in the fork X ← Y → Z, where X = walking, Y = age, Z = death rate, information about walking gives us information about death rate (as there is correlation, but not causation). However, when controlling for the confounder age, no information flows from walking to death rate (that is, there is no correlation when holding age constant).

In the collider X → Y ← Z, where X = disease, Y = in hospital, Z = risk factor, the situation is reversed! Information does not flow from X to Z until we control for Y. And controlling for Y introduces a spurious correlation that can cause us to misunderstand the causal relationships.

If no information flows from X → Y through Z, we say that Z blocks X → Y, and this will be important when thinking more generally about information flow through causal graphs, as we’ll now see.
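These flow rules are easy to check numerically. Here is a sketch of the fork case from above, with invented coefficients: the walking/death-rate correlation appears in the full sample and vanishes once we hold age (roughly) constant.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000

# Fork: walking <- age -> death_risk
age = rng.normal(size=n)
walking = -0.8 * age + rng.normal(size=n)    # older people walk less
death_risk = 0.9 * age + rng.normal(size=n)  # older people have higher risk

# Unconditionally, information flows through the fork: spurious correlation
corr = np.corrcoef(walking, death_risk)[0, 1]

# Conditioning on age (a narrow age band) blocks the path
band = np.abs(age) < 0.05
corr_given_age = np.corrcoef(walking[band], death_risk[band])[0, 1]

print(f"unconditional: {corr:+.3f}")       # clearly nonzero
print(f"within band:   {corr_given_age:+.3f}")  # near zero
```

Swapping the arrow directions to build a chain or a collider and rerunning the same conditioning exercise reproduces the other two flow rules.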

In practice: The back-door adjustment

At this point, we have methods for deciding which events might be confounders (forks), and which events look like confounders but aren’t (colliders). So, the next step is determining how to deal with the true confounders. We can do this through the back-door and front-door adjustments, which let us remove the effect of confounders from an experiment.

We’re interested in whether there’s a causal relationship between X and an outcome Y, in the presence of a potential confounder Z: look at Figure 10.

Figure 10. The back-door adjustment: is Z a confounder?

If there is a causal effect, though, and the back-door criterion (which we define later) is satisfied, we can solve for the causal relationship in question. Given X → Y, a collection of variables Z satisfies the back-door criterion if:

  1. No node in Z is a descendant of X.
  2. Any path between X and Y that begins with an arrow into X (known as a back-door path) is blocked by Z.

Controlling for Z essentially then blocks all noncausal paths between X and Y while not blocking any causal paths. So how does the adjustment work?

Here, we’ll consider the simplified case, in which Z contains a single variable. We could compute the correlation between X and Y for different values of the confounding factor Z, and weight them according to the probabilities of different values of Z. But there’s a simpler solution. Using linear regression to compute the line that best fits your X and Y data points is straightforward. In this situation, we take it a step further: we compute the best-fit plane for X, Y, and Z. The math is essentially the same. The equation for this plane will be of the form:

Y = m₁X + m₂Z + b

The slope associated with X (m₁) takes into account the effect of the confounder. It’s the average causal effect of X on Y. And, while we’ve only discussed a single confounder, this approach works just as well with multiple confounders.
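As a sketch of this adjustment, consider an invented data-generating process in which the true causal effect of X on Y is 2.0 and a confounder Z drives both. Fitting the plane recovers the causal slope; the naive line does not.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

Z = rng.normal(size=n)                      # confounder
X = 0.7 * Z + rng.normal(size=n)            # treatment, influenced by Z
Y = 2.0 * X + 1.5 * Z + rng.normal(size=n)  # true causal effect of X is 2.0

# Naive line Y ~ X: biased upward by the confounder
naive_slope = np.polyfit(X, Y, 1)[0]

# Best-fit plane Y = m1*X + m2*Z + b: m1 is the average causal effect
A = np.column_stack([X, Z, np.ones(n)])
m1, m2, b = np.linalg.lstsq(A, Y, rcond=None)[0]

print(f"naive slope: {naive_slope:.2f}")  # overestimates the effect
print(f"adjusted m1: {m1:.2f}")           # close to the true 2.0
```

The coefficients here are illustrative; the point is that including the confounder in the regression, rather than ignoring it, is what isolates the causal slope.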

In practice: The front-door adjustment

We still have to account for one important case. What if the confounding factor is either unobservable or hypothetical? How do you account for a factor that you can’t observe? Pearl discusses research into the connection between smoking and cancer, into which the tobacco companies inserted the idea of a “smoking gene” that would predispose people towards both smoking and cancer. This raises a problem: what happens if there’s a cause that can’t be observed? In the ’50s and ’60s, our understanding of genetics was limited; if there was a smoking gene, we certainly didn’t have the biotech to find it. There are plenty of cases where there are more plausible confounding factors, but detecting them is impossible, destructive, or unethical.

Pearl outlines a way to deal with these unknowable confounders that he calls the front-door adjustment (Figure 11). To investigate whether smoking S causes cancer C in the presence of an unknowable confounder G, we add another step in the causal graph between S and C. Discussing the smoking case, Pearl uses the presence of tar in the lungs. We’ll just call it T. We believe that T can’t be caused directly by the confounding factor G (though that’s a question worth thinking about). Then we can use the back-door correction to estimate the effect of T on C, with S coming through the back door. We can also estimate the causal effect of S on T as there is a collider at C. We can combine these to retrieve the causal effect of S on C.

Figure 11. The front-door adjustment: is G a confounder that can’t be measured?

This has been abstract, and the only real solution to the abstraction would be getting into the mathematics. For our purposes, though, it’s enough to note that it is possible to correct for hypothetical confounding factors that aren’t measurable and that might not exist. This is a real breakthrough. We can’t agree with Pearl’s claim that one causal graph would have replaced years of debate and testimony—politicians will be politicians, and lobbyists will be lobbyists. But it is very important to know that we have the tools.
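For intuition without the full mathematics, here is a simulated sketch of the adjustment on the smoking example, with invented probabilities. The hidden gene G is generated by the simulation but deliberately never used by the estimator, which sees only S, T, and C.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500_000

# Hidden confounder G ("smoking gene"): unobservable to the analyst
G = rng.random(n) < 0.5
S = rng.random(n) < np.where(G, 0.8, 0.2)    # smoking
T = rng.random(n) < np.where(S, 0.85, 0.05)  # tar in lungs
C = rng.random(n) < 0.1 + 0.3 * T + 0.4 * G  # cancer

def p_c_do_s(s):
    """Front-door estimate of P(C=1 | do(S=s)) from observed S, T, C only."""
    total = 0.0
    for t in (False, True):
        p_t_given_s = ((T == t) & (S == s)).sum() / (S == s).sum()
        inner = sum(C[(T == t) & (S == s2)].mean() * (S == s2).mean()
                    for s2 in (False, True))
        total += p_t_given_s * inner
    return total

naive = C[S].mean() - C[~S].mean()             # inflated by the hidden gene
front_door = p_c_do_s(True) - p_c_do_s(False)  # recovers the causal effect
```

In this setup the true causal effect of smoking on cancer is 0.3 × (0.85 − 0.05) = 0.24; the naive contrast roughly doubles it, while the front-door estimate lands on the truth without ever observing G.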

One thing to note is that both the back-door and front-door adjustments require you to have the correct causal graph, containing all relevant confounding variables. This can often be challenging in practice and requires significant domain expertise.

The End of Correlation, the Beginning of Cause

Correlation is a powerful tool and will remain so. But it's a tool, not an end in itself. We desperately need to get beyond the idea that correlation is an adequate proxy for causality. Just think of all those people drowning because Nicolas Cage makes more films!

As “data science” became a buzzword, we got lazy: we thought that, if we could just gather enough data, correlation would be good enough. We can now store all the data we could conceivably want (a petabyte costs around $20,000 retail), and correlation still hasn’t gotten us what we want: the ability to understand cause and effect. But as we’ve seen, it is possible to go further. Medical research has been using RCTs for decades; causal graphs provide new tools and techniques for thinking about the relationships between possible causes. Epidemiologists like John Snow, the doctors who made the connection between smoking and cancer, and the many scientists who have made the causal connection between human activity and climate change, have all taken this path.

We have tools, and good ones, for investigating cause and weeding out the effects of confounders. It’s time to start using them.

  1. In practice, what is important is that all confounding variables are distributed evenly across treatment and control.
  2. The p-value is not the probability that the hypothesis “there is no difference between the control and treatment groups” is true, as many think it is. Nor is it the probability of observing your data if the hypothesis is true, as many others think. In fact, the definition of p-value is so difficult to remember that “Not Even Scientists Can Easily Explain P-values”.
  3. Note that the standard error is not the same as the standard deviation of the data, but rather the standard deviation of the sampling distribution of the estimate of the mean.

Glossary

A/B test

A randomized control trial in tech.

causal graph

A graphical model used to illustrate (potential) causal relationships between variables of interest.

ceteris paribus

The principle of “all other things being equal,” which is essential for randomized control trials.

collider

A causal model in which two phenomena have a common effect, such as a disease X and a risk factor Z that can each lead to hospitalization Y: X → Y ← Z.

confounding variable

A variable that influences both the dependent and independent variables.

counterfactuals

The rung of the ladder of causation at which we can use causal models to reason about events that did not occur.

fork

A causal model in which there is a confounding variable: X ← Y → Z.

generative model

A generative model is essentially a model that specifies the data-generating process. The technical definition is that it models the joint probability P(X, Y) of features X and outcome variable Y, in contrast to discriminative models, which model the conditional probability P(Y|X) of the outcome given the features.

instrumental variable

Given X → Y, an instrumental variable Z is a third variable used in regression analyses to account for unexpected relationships between other variables (such as one being correlated with the error term).

intervention

The rung of the ladder of causation at which we can perform experiments, most famously in the form of randomized control trials and A/B tests.

omitted variable bias

When failure to include enough controls or the right controls still leaves us with selection bias.

p-value

In a hypothesis test, the p-value is the probability of observing, under the null hypothesis, a test statistic at least as extreme as the one actually observed.

randomized control trial (RCT)

An experiment in which subjects are randomly assigned to one of several groups, in order to ascertain the impact of differences in treatment on the outcome.

standard error

The standard error of a statistic (for example, the mean) is the standard deviation of its sampling distribution. In other words, it’s a measure of uncertainty of the sample mean.


References

Key references are marked with an asterisk.

Anderson, Chris. “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete”. Wired (2008).

*Angrist, Joshua D., and Jörn-Steffen Pischke. Mastering ‘Metrics: The Path from Cause to Effect. Princeton University Press (2014).

Aschwanden, Christie. “Not Even Scientists Can Easily Explain P-values”. FiveThirtyEight (2015).

Bowne-Anderson, Hugo. “The Unreasonable Importance of Data Preparation”. O’Reilly (2020).

Clayton, Aubrey. “How Eugenics Shaped Statistics”. Nautilus (2020).

Clayton, Aubrey. Bernoulli’s Fallacy. Columbia University Press (2021).

*Cunningham, Scott. Causal Inference: The Mixtape. Yale University Press (2021).

Eckles, Dean. “Does the ‘Table 1 Fallacy’ Apply if It Is Table S1 Instead?”. Blog (2021).

Google. “Background: What Is a Generative Model?”. (2021).

*Kelleher, Adam. “A Technical Primer on Causality”. Blog (2021).

Kohavi, Ron, et al. Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing. Cambridge University Press (2020).

Lee, David S., and Thomas Lemieux. “Regression Discontinuity Designs in Economics”. Journal of Economic Literature (2010).

*Pearl, Judea, and Dana Mackenzie. The Book of Why. Basic Books (2018).

Robinson, Emily. “Guidelines for A/B Testing”. Hooked on Data (2018).

Simonite, Tom. “A Health Care Algorithm Offered Less Care to Black Patients”. Wired (2019).

Spieth, Peter Markus, et al. “Randomized Controlled Trials—A Matter of Design”. NCBI (2016).

Wikipedia. “Berkson’s paradox”. Last modified December 9, 2021.

Wikipedia. “Regression discontinuity design”. Last modified June 14, 2021.


Acknowledgments

The authors would like to thank Sarah Catanzaro and James Savage for their valuable and critical feedback on drafts of this report along the way.

Categories: Technology

What’s ahead for AI, VR, NFTs, and more?

O'Reilly Radar - Tue, 2022/01/11 - 05:22

Every year starts with a round of predictions for the new year, most of which end up being wrong. But why fight against tradition? Here are my predictions for 2022.

The safest predictions are all around AI.

  • We’ll see more “AI as a service” (AIaaS) products. This trend started with the gigantic language model GPT-3. It’s so large that it really can’t be run without Azure-scale computing facilities, so Microsoft has made it available as a service, accessed via a web API. This may encourage the creation of more large-scale models; it might also drive a wedge between academic and industrial researchers. What does “reproducibility” mean if the model is so large that it’s impossible to reproduce experimental results?
  • Prompt engineering, a field dedicated to developing prompts for language generation systems, will become a new specialization. Prompt engineers answer questions like “What do you have to say to get a model like GPT-3 to produce the output you want?”
  • AI-assisted programming (for example, GitHub Copilot) has a long way to go, but it will make quick progress and soon become just another tool in the programmer’s toolbox. And it will change the way programmers think too: they’ll need to focus less on learning programming languages and syntax and more on understanding precisely the problem they have to solve.
  • GPT-3 clearly is not the end of the line. There are already language models bigger than GPT-3 (one in Chinese), and we’ll certainly see large models in other areas. We will also see research on smaller models that offer better performance, like Google’s RETRO.
  • Supply chains and business logistics will remain under stress. We’ll see new tools and platforms for dealing with supply chain and logistics issues, and they’ll likely make use of machine learning. We’ll also come to realize that, from the start, Amazon’s core competency has been logistics and supply chain management.
  • Just as we saw new professions and job classifications when the web appeared in the ’90s, we’ll see new professions and services appear as a result of AI—specifically, as a result of natural language processing. We don’t yet know what these new professions will look like or what new skills they’ll require. But they’ll almost certainly involve collaboration between humans and intelligent machines.
  • CIOs and CTOs will realize that any realistic cloud strategy is inherently a multi- or hybrid cloud strategy. Cloud adoption moves from the grassroots up, so by the time executives are discussing a “cloud strategy,” most organizations are already using two or more clouds. The important strategic question isn’t which cloud provider to pick; it’s how to use multiple providers effectively.
  • Biology is becoming like software. Inexpensive and fast genetic sequencing, together with computational techniques including AI, enabled Pfizer/BioNTech, Moderna, and others to develop effective mRNA vaccines for COVID-19 in astonishingly little time. In addition to creating vaccines that target new COVID variants, these technologies will enable developers to target diseases for which we don’t have vaccines, like AIDS.

Now for some slightly less safe predictions, involving the future of social media and cybersecurity.

  • Augmented and virtual reality aren’t new, but Mark Zuckerberg lit a fire under them by talking about the “metaverse,” changing Facebook’s name to Meta, and releasing a pair of smart glasses in collaboration with Ray-Ban. The key question is whether these companies can make AR glasses that work and don’t make you look like an alien. I don’t think they’ll succeed, but Apple is also working on VR/AR products. It’s much harder to bet against Apple’s ability to turn geeky technology into a fashion statement.
  • There’s also been talk from Meta, Microsoft, and others about using virtual reality to help people who are working from home, which typically involves making meetings better. But they’re solving the wrong problem. Workers, whether at home or not, don’t want better meetings; they want fewer of them. If Microsoft can figure out how to use the metaverse to make meetings unnecessary, it’ll be onto something.
  • Will 2022 be the year that security finally gets the attention it deserves? Or will it be another year in which Russia uses the cybercrime industry to improve its foreign trade balance? Right now, things are looking better for the security industry: salaries are up, and employers are hiring. But time will tell.

And I’ll end with a very unsafe prediction.

  • NFTs are currently all the rage, but they don’t fundamentally change anything. They really only provide a way for cryptocurrency millionaires to show off—conspicuous consumption at its most conspicuous. But they’re also programmable, and people haven’t yet taken advantage of this. Is it possible that there’s something fundamentally new on the horizon that can be built with NFTs? I haven’t seen it yet, but it could appear in 2022. And then we’ll all say, “Oh, that’s what NFTs were all about.”

Or it might not. The discussion of Web 2.0 versus Web3 misses a crucial point. Web 2.0 wasn’t about the creation of new applications; it was what was left after the dot-com bubble burst. All bubbles burst eventually. So what will be left after the cryptocurrency bubble bursts? Will there be new kinds of value, or just hot air? We don’t know, but we may find out in the coming year.
