You are here

Feed aggregator

Formal Informal Languages

O'Reilly Radar - Tue, 2022/11/08 - 04:58

We’ve all been impressed by the generative art models: DALL-E, Imagen, Stable Diffusion, Midjourney, and now Facebook’s generative video model, Make-A-Video. They’re easy to use, and the results are impressive. They also raise some fascinating questions about programming languages. Prompt engineering, designing the prompts that drive these models, is likely to be a new specialty. There’s already a self-published book about prompt engineering for DALL-E, and an excellent tutorial about prompt engineering for Midjourney. Ultimately, what we’re doing when crafting a prompt is programming–but not the kind of programming we’re used to. The input is free form text, not a programming language as we know it. It’s natural language, or at least it’s supposed to be: there’s no formal grammar or syntax behind it.

Books, articles, and courses about prompt engineering are inevitably teaching a language, the language you need to know to talk to DALL-E. Right now, it’s an informal language, not a formal language with a specification in BNF or some other metalanguage. But as this segment of the AI industry develops, what will people expect? Will people expect prompts that worked with version 1.X of DALL-E to work with version 1.Y or 2.Z? If we compile a C program first with GCC and then with Clang, we don’t expect the same machine code, but we do expect the program to do the same thing. We have these expectations because C, Java, and other programming languages are precisely defined in documents ratified by a standards committee or some other body, and we expect departures from compatibility to be well documented. For that matter, if we write “Hello, World” in C, and again in Java, we expect those programs to do exactly the same thing. Likewise, prompt engineers might also expect a prompt that works for DALL-E to behave similarly with Stable Diffusion. Granted, they may be trained on different data and so have different elements in their visual vocabulary, but if we can get DALL-E to draw a Tarsier eating a Cobra in the style of Picasso, shouldn’t we expect the same prompt to do something similar with Stable Diffusion or Midjourney?

In effect, programs like DALL-E are defining something that looks somewhat like a formal programming language. The “formality” of that language doesn’t come from the problem itself, or from the software implementing that language–it’s a natural language model, not a formal language model. Formality derives from the expectations of users. The Midjourney article even talks about “keywords”–sounding like an early manual for programming in BASIC. I’m not arguing that there’s anything good or bad about this–values don’t come into it at all. Users inevitably develop ideas about how things “ought to” behave. And the developers of these tools, if they are to become more than academic playthings, will have to think about users’ expectations on issues like backward compatibility and cross-platform behavior.

That begs the question: what will the developers of programs like DALL-E and Stable Diffusion do? After all, they are already more than academic playthings: they are already used for business purposes (like designing logos), and we already see business models built around them. In addition to charges for using the models themselves, there are already startups selling prompt strings, a market that assumes that the behavior of prompts is consistent over time. Will the front end of image generators continue to be large language models, capable of parsing just about everything but delivering inconsistent results? (Is inconsistency even a problem for this domain? Once you’ve created a logo, will you ever need to use that prompt again?) Or will the developers of image generators look at the DALL-E Prompt Reference (currently hypothetical, but someone eventually will write it), and realize that they need to implement that specification? If the latter, how will they do it?  Will they develop a giant BNF grammar and use compiler-generation tools, leaving out the language model? Will they develop a natural language model that’s more constrained, that’s less formal than a formal computing language but more formal than *Semi-Huinty?1 Might they use a language model to understand words like Tarsier, Picasso, and eating, but treat phrases like “in the style of” more like keywords? The answer to this question will be important: it will be something we really haven’t seen in computing before.

Will the next stage in the development of generative software be the development of informal formal languages?

  1. *Semi-Huinty is a hypothetical hypothetical language somewhere in the Germanic language family. It exists only in a parody of historical linguistics that was posted on a bulletin board in a linguistics department.
Categories: Technology

Radar Trends to Watch: November 2022

O'Reilly Radar - Tue, 2022/11/01 - 04:15

Maintaining a separate category for AI is getting difficult. We’re seeing important articles about AI infiltrating security, programming, and almost everything else; even biology. That sounds like a minor point, but it’s important: AI is eating the world. What does it mean when an AI system can reconstruct what somebody wants to say from their brainwave? What does it mean when cultured brain cells can be configured to play Pong? They don’t play well, but it’s not long since that was a major achievement for AI.

Artificial Intelligence
  • Hugo Bowne-Anderson interviews Shreya Shankar about her ethnographic study of MLOps Practices. This is a must-listen! Shreya talks about pain points, good practices, and how the real world differs from what you’re taught in school.
  • Useful Sensors is a startup that produces sensors with AI built in.  Their first product is a PersonSensor, a $10 camera that detects faces and computes their location relative to the camera.
  • The Bias Buccaneers is a group of volunteers who are creating a competition for detecting bias in AI systems. Microsoft and Amazon, among others, are backing it. The practice of auditing AI systems, while it has had a slow start, is poised to grow as regulations covering AI gain traction.
  • Microsoft has released an open source toolkit for AI-based precision farming.
  • Facebook’s No Language Left Behind project has released an open source model (along with code and training data) that can translate between any of 200 languages.
  • The creators of StableDiffusion have announced HarmonyAI, a community for building AI tools for generating music. They have released an application called Dance Diffusion.
  • Researchers have developed a turtle-like robot that can both swim and walk on land by changing the shape of its legs. Applications may include monitoring aquatic ecosystems and underwater farming.
  • If you’re interested in writing AI software to generate code (and not just using Copilot), Evan Pu has begun a series of blog posts on program synthesis.
  • An AI application called Transkribus is capable of reading old handwriting. Anyone who has done archival research will know immediately what a big problem this is.
  • Transformers revolutionized natural language processing. CapitalOne is exploring the use of Transformers for tabular data, which could lead to a similar revolution in financial applications.
  • Google’s AudioLM uses large language model techniques to produce spoken audio and music. The prompts are audio clips, rather than texts, and the output sounds more natural than other audio synthesis software.
  • Can AI be used to develop new algorithms for problems humans think are well-understood, like matrix multiplication? Deep Mind’s AlphaTensor says yes. This result won’t get as much attention as generative art, but it is likely to be more important.
  • The White House has revealed its AI Bill of Rights. It’s an impressive document but, unlike similar efforts in Europe and elsewhere, says little about enforcement.
  • Tesla has demonstrated a prototype of Optimus, its humanoid home robot. Reactions are mixed; the demonstration was unimpressive, though they appear to have done years worth of R&D in a very short time. It’s also not clear that their robot is solving the right problems.
  • Not to be outdone by Make-A-Video and Phenaki (another text-to-video AI generator), Google has announced Imagen Video and DreamFusion.  DreamFusion generates 2D images that can be viewed from any angle. (Others have done something similar based on Stable Diffusion.)
  • An AI system can now reconstruct continuous language sequences from non-invasive recordings of brain activity.
  • A proposed law in the EU would allow people to sue AI companies for damages after being harmed by a result from an AI system.
  • A fascinating method for detecting audio deepfakes has achieved 99% accuracy. It models what a human vocal tract would have to do to produce the sounds. Most deep fakes require impossible configurations of the vocal tracts.
  • Clive Thompson’s thoughts on the future of programming are worth reading, particularly on the influence of tools like Copilot on “code-adjacent” programmers; that is, non-professionals who do limited programming as part of their jobs. For them, Copilot will be a superpower.
  • Another kind of automatic code generation: the OpenAPI Generator is a tool that automatically generates client libraries, stubs, and other code for APIs that are documented according to the OpenAPI Specification.
  • Contributions of code to open source software projects appear to be declining, possibly because of security concerns. If this hypothesis is correct, it is counterproductive. Open source is critical infrastructure, and critical infrastructure needs to be maintained.
  • wasmCloud is a platform for developing components with WebAssembly that can run anywhere, including in the cloud or on small networked devices. It includes the ability to form a mesh network that’s independent of where components run.
  • Matt Welsh predicts that the future of computing will not be about programming; it will be about training large models for specialized applications.
  • Another tool for deploying containers: nerdctl. How many is too many? We don’t know whether nerdctl is a winner, but it’s important to watch for lightweight alternatives to Kubernetes.
  • Terraform will be offering a no code option for cloud configuration.  This will simplify cloud deployment, making it possible for developers to deploy directly without assistance from an operations group.
  • The Cassandra database will support ACID transactions, taking advantage of a new consensus protocol named Accord.
  • More alternatives to the Electron framework: Last month, we mentioned Rust’s Tauri. Now there’s an Electron-like framework for Go, called Wails.
  • Steve Yegge has emerged from retirement to take a job as Head of Engineering at Sourcegraph, a company that’s making tools for searching, navigating, and understanding code across multiple repositories. We normally wouldn’t consider a “new hire” noteworthy, but everything Steve does is worth watching. Be on the lookout for some excellent tools.
  • Constellation is the first implementation of Confidential Kubernetes: a Kubernetes distribution designed for confidential computing. Confidential computing isn’t limited to individual nodes; Constellation authenticates all of the nodes in a Kubernetes cluster, and ensures that data is always encrypted, especially in transit.
  • A cryptocurrency mining operation named PurpleUrchin is using free resources offered by services like GitHub and Heroku. The security community suspects that their goal isn’t mining coins but executing a 51% attack against one of the smaller currencies.
  • A standard for passkeys that is supported by Google, Apple, and Microsoft, and that is easy to use, means that (at last, maybe) we can do away with passwords.
  • Model spinning is a new attack against language models that causes them to take a specific viewpoint on a subject in response to trigger words in the prompt–for example, taking a positive or negative viewpoint on a political figure. It could be used to generate interactive propaganda.
  • Random number generation is fundamental to many algorithms and games–particularly algorithms related to privacy and security. However, generating good random sequences is difficult. Flaws in devices like automatic card shufflers show how tricky the problem can be.
  • The platform engineering movement is gaining steam. Platform engineering requires treating the developer’s environment as a product, and developing that environment into a platform that’s easy to work in, and that makes it simple and safe for developers to push working code into production.
  • Aurora is a collaboration between the Chrome browser’s development team and developers of frameworks like React and Next.JS that intends to make the browser a better target for these frameworks.
  • Mitre’s D3FEND is a public knowledge graph of cybersecurity countermeasures. It is a counterpart to ATT&CK, a knowledge graph of tactics and techniques used by attackers.
  • Sternum is an observability and security platform designed for IoT devices based on Linux or RTOS. It is difficult to get information from devices once they’re in the field. Sternum performs anomaly detection, in addition to providing information about user-defined traces.
  • What can you trust in the software supply chain? Nothing, not even compilers; a new paper shows that compilers can be used to insert undetectable backdoors into models for machine learning. Trust nothing.
  • Downcoding is a new attack against common methods for anonymizing data. It was designed specifically as a challenge to privacy regulations: organizations that collect and share data have to do better to preserve privacy. Anonymization isn’t enough.
  • Apple’s AppStore now allows apps that purchase NFTs, or deliver services through NFTs. However, all payments must be made through in-app purchases.
  • The British artist Damien Hirst has started to burn the originals of artworks that he has sold as NFTs to “complete the transformation” of the work into the digital world. The artworks are burned in a public exhibition, making the burning itself a work of performance art.
  • Metatheft: A threat actor has injected dApps into cryptocurrency scam sites. These dApps divert funds sent to the scammer to the threat actor’s accounts, thus stealing directly from the scam’s victims. The scammers presumably have no interest in reporting these thefts to authorities.
  • A Korean research group has developed a platform for collaboration in the Metaverse. Fundamental ideas behind this platform are enabling people to work together in a virtual space; location recognition; minimizing latency between users; and gesture recognition.
  • It’s rumored that Apple’s VR headset will perform an iris scan when someone puts it on, to authenticate the user to apps.
  • Facebook/Meta figures out how to add legs to its Metaverse avatars. This was their “most requested feature.” Nobody seems impressed.
Biology Quantum Computing
  • Researchers have used a quantum computer to find solutions to the Fermi-Hubbard model, a problem that can’t be solved with classical computers. Unlike previous demonstrations of quantum advantage, which had no practical value, the Fermi-Hubbard model has important implications for battery and solar cell research.
Categories: Technology

What We Learned Auditing Sophisticated AI for Bias

O'Reilly Radar - Tue, 2022/10/18 - 04:14

A recently passed law in New York City requires audits for bias in AI-based hiring systems. And for good reason. AI systems fail frequently, and bias is often to blame. A recent sampling of headlines features sociological bias in generated images, a chatbot, and a virtual rapper. These examples of denigration and stereotyping are troubling and harmful, but what happens when the same types of systems are used in more sensitive applications? Leading scientific publications assert that algorithms used in healthcare in the U.S. diverted care away from millions of black people. The government of the Netherlands resigned in 2021 after an algorithmic system wrongly accused 20,000 families–disproportionately minorities–of tax fraud. Data can be wrong. Predictions can be wrong. System designs can be wrong. These errors can hurt people in very unfair ways.

When we use AI in security applications, the risks become even more direct. In security, bias isn’t just offensive and harmful. It’s a weakness that adversaries will exploit. What could happen if a deepfake detector works better on people who look like President Biden than on people who look like former President Obama? What if a named entity recognition (NER) system, based on a cutting-edge large language model (LLM), fails for Chinese, Cyrillic, or Arabic text? The answer is simple—bad things and legal liabilities.

As AI technologies are adopted more broadly in security and other high-risk applications, we’ll all need to know more about AI audit and risk management. This article introduces the basics of AI audit, through the lens of our practical experience at BNH.AI, a boutique law firm focused on AI risks, and shares some general lessons we’ve learned from auditing sophisticated deepfake detection and LLM systems.

What Are AI Audits and Assessments?

Audit of decision-making and algorithmic systems is a niche vertical, but not necessarily a new one. Audit has been an integral aspect of model risk management (MRM) in consumer finance for years, and colleagues at BLDS and QuantUniversity have been conducting model audits for some time. Then there’s the new cadre of AI audit firms like ORCAA, Parity, and babl, with BNH.AI being the only law firm of the bunch. AI audit firms tend to perform a mix of audits and assessments. Audits are usually more official, tracking adherence to some policy, regulation, or law, and tend to be conducted by independent third parties with varying degrees of limited interaction between auditor and auditee organizations. Assessments tend to be more informal and cooperative. AI audits and assessments may focus on bias issues or other serious risks including safety, data privacy harms, and security vulnerabilities.

While standards for AI audits are still immature, they do exist. For our audits, BNH.AI applies external authoritative standards from laws, regulations, and AI risk management frameworks. For example, we may audit anything from an organization’s adherence to the nascent New York City employment law, to obligations under Equal Employment Opportunity Commission regulations, to MRM guidelines, to fair lending regulations, or to NIST’s draft AI risk management framework (AI RMF).

From our perspective, regulatory frameworks like MRM present some of the clearest and most mature guidance for audit, which are critical for organizations looking to minimize their legal liabilities. The internal control questionnaire in the Office of the Comptroller of the Currency’s MRM Handbook (starting pg. 84) is an extraordinarily polished and complete audit checklist, and the Interagency Guidance on Model Risk Management (also known as SR 11-7) puts forward clear cut advice on audit and the governance structures that are necessary for effective AI risk management writ large. Given that MRM is likely too stuffy and resource-intensive for nonregulated entities to adopt fully today, we can also look to NIST’s draft AI Risk Management Framework and the risk management playbook for a more general AI audit standard. In particular, NIST’s SP1270 Towards a Standard for Identifying and Managing Bias in Artificial Intelligence, a resource associated with the draft AI RMF, is extremely useful in bias audits of newer and complex AI systems.1

For audit results to be recognized, audits have to be transparent and fair. Using a public, agreed-upon standard for audits is one way to enhance fairness and transparency in the audit process. But what about the auditors? They too must be held to some standard that ensures ethical practices. For instance, BNH.AI is held to the Washington, DC, Bar’s Rules of Professional Conduct. Of course, there are other emerging auditor standards, certifications, and principles. Understanding the ethical obligations of your auditors, as well as the existence (or not) of nondisclosure agreements or attorney-client privilege, is a key part of engaging with external auditors. You should also be considering the objective standards for the audit.

In terms of what your organization could expect from an AI audit, and for more information on audits and assessments, the recent paper Algorithmic Bias and Risk Assessments: Lessons from Practice is a great resource. If you’re thinking of a less formal internal assessment, the influential Closing the AI Accountability Gap puts forward a solid framework with worked documentation examples.

What Did We Learn From Auditing a Deepfake Detector and an LLM for Bias?

Being a law firm, BNH.AI is almost never allowed to discuss our work due to the fact that most of it is privileged and confidential. However, we’ve had the good fortune to work with IQT Labs over the past months, and they generously shared summaries of BNH.AI’s audits. One audit addressed potential bias in a deepfake detection system and the other considered bias in LLMs used for NER tasks. BNH.AI audited these systems for adherence to the AI Ethics Framework for the Intelligence Community. We also tend to use standards from US nondiscrimination law and the NIST SP1270 guidance to fill in any gaps around bias measurement or specific LLM concerns. Here’s a brief summary of what we learned to help you think through the basics of audit and risk management when your organization adopts complex AI.

Bias is about more than data and models

Most people involved with AI understand that unconscious biases and overt prejudices are recorded in digital data. When that data is used to train an AI system, that system can replicate our bad behavior with speed and scale. Unfortunately, that’s just one of many mechanisms by which bias sneaks into AI systems. By definition, new AI technology is less mature. Its operators have less experience and associated governance processes are less fleshed out. In these scenarios, bias has to be approached from a broad social and technical perspective. In addition to data and model problems, decisions in initial meetings, homogenous engineering perspectives, improper design choices, insufficient stakeholder engagement, misinterpretation of results, and other issues can all lead to biased system outcomes. If an audit or other AI risk management control focuses only on tech, it’s not effective.

If you’re struggling with the notion that social bias in AI arises from mechanisms besides data and models, consider the concrete example of screenout discrimination. This occurs when those with disabilities are unable to access an employment system, and they lose out on employment opportunities. For screenout, it may not matter if the system’s outcomes are perfectly balanced across demographic groups, when for example, someone can’t see the screen, be understood by voice recognition software, or struggles with typing. In this context, bias is often about system design and not about data or models. Moreover, screenout is a potentially serious legal liability. If you’re thinking that deepfakes, LLMs and other advanced AI wouldn’t be used in employment scenarios, sorry, that’s wrong too. Many organizations now perform fuzzy keyword matching and resume scanning based on LLMs. And several new startups are proposing deepfakes as a way to make foreign accents more understandable for customer service and other work interactions that could easily spillover to interviews.

Data labeling is a problem

When BNH.AI audited FakeFinder (the deepfake detector), we needed to know demographic information about people in deepfake videos to gauge performance and outcome differences across demographic groups. If plans are not made to collect that kind of information from the people in the videos beforehand, then a tremendous manual data labeling effort is required to generate this information. Race, gender, and other demographics are not straightforward to guess from videos. Worse, in deepfakes, bodies and faces can be from different demographic groups. Each face and body needs a label. For the LLM and NER task, BNH.AI’s audit plan required demographics associated with entities in raw text, and possibly text in multiple languages. While there are many interesting and useful benchmark datasets for testing bias in natural language processing, none provided these types of exhaustive demographic labels.

Quantitative measures of bias are often important for audits and risk management. If your organization wants to measure bias quantitatively, you’ll probably need to test data with demographic labels. The difficulties of attaining these labels should not be underestimated. As newer AI systems consume and generate ever-more complicated types of data, labeling data for training and testing is going to get more complicated too. Despite the possibilities for feedback loops and error propagation, we may end up needing AI to label data for other AI systems.

We’ve also observed organizations claiming that data privacy concerns prevent data collection that would enable bias testing. Generally, this is not a defensible position. If you’re using AI at scale for commercial purposes, consumers have a reasonable expectation that AI systems will protect their privacy and engage in fair business practices. While this balancing act may be extremely difficult, it’s usually possible. For example, large consumer finance organizations have been testing models for bias for years without direct access to demographic data. They often use a process called Bayesian-improved surname geocoding (BISG) that infers race from name and ZIP code to comply with nondiscrimination and data minimization obligations.

Despite flaws, start with simple metrics and clear thresholds

There are many mathematical definitions of bias. More are published all the time. More formulas and measurements are published because the existing definitions are always found to be flawed and simplistic. While new metrics tend to be more sophisticated, they’re often harder to explain and lack agreed-upon thresholds at which values become problematic. Starting an audit with complex risk measures that can’t be explained to stakeholders and without known thresholds can result in confusion, delay, and loss of stakeholder engagement.

As a first step in a bias audit, we recommend converting the AI outcome of interest to a binary or a single numeric outcome. Final decision outcomes are often binary, even if the learning mechanism driving the outcome is unsupervised, generative, or otherwise complex. With deepfake detection, a deepfake is detected or not. For NER, known entities are recognized or not. A binary or numeric outcome allows for the application of traditional measures of practical and statistical significance with clear thresholds.

These metrics focus on outcome differences across demographic groups. For example, comparing the rates at which different race groups are identified in deepfakes or the difference in mean raw output scores for men and women. As for formulas, they have names like standardized mean difference (SMD, Cohen’s d), the adverse impact ratio (AIR) and four-fifth’s rule threshold, and basic statistical hypothesis testing (e.g., t-, x2-, binomial z-, or Fisher’s exact tests). When traditional metrics are aligned to existing laws and regulations, this first pass helps address important legal questions and informs subsequent more sophisticated analyses.

What to Expect Next in AI Audit and Risk Management?

Many emerging municipal, state, federal, and international data privacy and AI laws are incorporating audits or related requirements. Authoritative standards and frameworks are also becoming more concrete. Regulators are taking notice of AI incidents, with the FTC “disgorging” three algorithms in three years. If today’s AI is as powerful as many claim, none of this should come as a surprise. Regulation and oversight is commonplace for other powerful technologies like aviation or nuclear power. If AI is truly the next big transformative technology, get used to audits and other risk management controls for AI systems.

  1. Disclaimer: I am a co-author of that document.
Categories: Technology

The Collaborative Metaverse

O'Reilly Radar - Wed, 2022/10/12 - 13:01

We want to congratulate Dylan Field on his startup Figma, which Adobe recently purchased for $20B. Dylan started his career with O’Reilly Media when he was in high school—not that long ago. With Figma, he’s made the big time.

It’s worth thinking about why Figma has been so successful, and why Adobe was willing to pay so much for it. Since the beginning, Figma has been about collaboration. Yes, it was a great design tool. Yes, it ran completely in the browser, no downloads and installation required. But more than anything else, Figma was a tool for collaboration. That was a goal from the beginning. Collaboration wasn’t an afterthought; it was baked in.

My thesis about the Metaverse is that it is, above all, about enabling collaboration. VR goggles and AR glasses? Fine, but the Metaverse will fail if it only works for those who want to wear a headset. Crypto? I strongly object to the idea that everything needs to be owned—and that every transaction needs to pay a tax to anonymous middlemen (whether they’re called miners or stakers). Finally, I think that Facebook/Meta, Microsoft, and others who say that the Metaverse is about “better meetings” are just plain headed in the wrong direction. I can tell you—anyone in this industry can tell you—that we don’t need better meetings, we need fewer meetings.

But we still need people working together, particularly as more and more of us are working remotely. So the real question facing us is: how do we minimize meetings, while enabling people to work together? Meetings are, after all, a tool for coordinating people, for transferring information in groups, for circulating ideas outside of one-to-one conversations. They’re a tool for collaboration. That’s precisely what tools like Figma are for: enabling designers to work together on a project conveniently, without conflicting with each other. They’re about demonstrating designs to managers and other stakeholders. They’re about brainstorming new ideas (with Figjam) with your team members. And they’re about doing all this without requiring people to get together in a conference room, in Zoom, or in any of the other conferencing services. The problem with those tools isn’t really the flat screen, the “Brady Bunch” design, or the absence of avatars; the problem is that you still have to interrupt some number of people and get them in the same (virtual) place at the same time, breaking whatever flow that they were in.

We don’t need better meetings; we need better tools for collaboration so that we don’t need as many meetings. That’s what the Metaverse means for businesses. Tools like GitHub and Google’s Colab are really about collaboration, as are Google Docs and Microsoft Office 365. The Metaverse is strongly associated with gaming, and if you look at games like Overwatch and Fortnite, and you’ll see that those games are really about collaboration between online players. That’s what makes these games fun. I’ve got nothing against VR goggles, but what makes the experience special is the interaction with other players in real time. You don’t need goggles for that.

Collaboration made Figma worth $20B. It’s one of the first “enterprise Metaverse” applications. It certainly won’t be the last. Congratulations again to the team at Figma, and to our alumnus Dylan. And congratulations to Adobe, for realizing Figma’s importance.

Categories: Technology

What Is Hyperautomation?

O'Reilly Radar - Tue, 2022/10/11 - 03:59

Gartner has anointed “Hyperautomation” one of the top 10 trends for 2022. Should it be? Is it a real trend, or just a collection of buzzwords? As a trend, it’s not performing well on Google; it shows little long-term growth, if any, and gets nowhere near as many searches as terms like “Observability” and “Generative Adversarial Networks.” And it’s never bubbled up far enough into our consciousness to make it into our monthly Trends piece. As a trend, we’re openly skeptical about Hyperautomation.

However, that skeptical conclusion is too simplistic. Hyperautomation may just be another ploy in the game of buzzword bingo, but we need to look behind the game to discover what’s important. There seems to be broad agreement that hyperautomation is the combination of Robotic Process Automation with AI. Natural language generation and natural language understanding are frequently mentioned, too, but they’re subsumed under AI. So is optical character recognition (OCR)–something that’s old hat now, but is one of the first successful applications of AI. Using AI to discover tasks that can be automated also comes up frequently. While we don’t find the multiplication of buzzwords endearing, it’s hard to argue that adding AI to anything is uninteresting–and specifically adding AI to automation.

It’s also hard to argue against the idea that we’ll see more automation in the future than we see now.  We’ll see it in the processing of the thousands of documents businesses handle every day. We’ll see it in customer service. We’ll see it in compliance. We’ll see it in healthcare. We’ll see it in banking. Several years ago, the “Automate all the things!” meme originated in IT’s transformation from manual system administration to automated configuration management and software deployment. That may be the first instance of what’s now been christened Hyperautomation. We can certainly apply the slogan to many, if not all, clerical tasks–and even to the automation process itself. “Automate all the things” is itself a thing. And yes, the meme was always partially ironic–so we should be on the lookout for promises that are easily made but hard to keep. Some tasks should not be automated; some tasks could be automated, but the company has insufficient data to do a good job; some tasks can be automated easily, but would benefit from being redesigned first.

So we’re skeptical about the term Hyperautomation, but we’re not skeptical about the desire to automate. A new buzzword may put automation on executives’ radar–or it may be little more than a technique for rebranding older products. The difference is focusing on your business needs, rather than the sales pitch. Automating routine office tasks is an important and worthwhile project–and redesigning routine tasks so that they can be integrated into a larger workflow that can be automated more effectively is even more important. Setting aside the buzzword, we can start by asking what a successful automation project requires. In the long run, the buzzword is unimportant; getting the job done is what matters.

Automating Office Processes

It’s easy to observe that in most companies, there are many processes that can be automated but aren’t. Processing invoices, managing inventory, customer service, handling loan applications, taking orders, billing customers: these are all processes that are largely routine and open to automation. At some companies, these tasks are already automated, at least in part. But I don’t want to trivialize the thinking that goes into automating a process. What’s required?

Office staff usually perform tasks like invoice processing by filling in a web form. Automating this process is simple. Selenium, the first tool for automated browser testing (2004), could be programmed to find fields on a web page, click on them or insert text, click “submit,” scrape the resulting web page, and collect results. Robotic process automation (RPA) has a fancier name, but that’s really all it is. This kind of automation predates modern AI. It’s purely rules-based: click here, add a name there, use some fairly simple logic to fill in the other fields, and click submit. It’s possible to augment this basic process with OCR so the application can find data on paper forms, or to use natural language processing to gather information through a chat server. But the core of the process is simple, and hasn’t changed much since the early days of web testing. We could see it as an example of 1980s-style “expert systems,” based on deterministic business rules.

That simple scenario doesn’t hold up for more complex tasks. Consider an application for filling a prescription at a pharmacy. That application has to:

  • look up when the prescription was last filled
  • look up patient data to see whether there are any refills left
  • look up the prescriber and generate a message, if there are no refills left
  • look up the patient’s other medications to determine whether there are any drug interactions
  • look up regulations about restricted substances, in which case other rules apply (for example, requiring ID when the patient picks up the medication)
  • look up the pharmacy’s stock to see whether the medication is in stock (and order it if it isn’t)
  • look up the patient’s insurance to generate charges for the insurance company 
  • look up the patient’s credit card information to generate a charge for the co-pay

There are probably even more steps (I am not a pharmacist) and variations: new prescriptions, expired prescriptions, uninsured patients, expired credit cards, and no doubt many more corner cases. None of these steps is particularly difficult by itself, and each could be viewed as a separate task for automation, giving you a web of interconnected tasks–more complex, but not necessarily a bad result. However, one thing should be obvious: to fill a prescription, you need to access many different kinds of data, in many different databases. Some of these data sources will be owned by the pharmacy; others aren’t. Most are subject to privacy regulations. They are all likely to exist in some kind of silo that’s difficult to access from the outside the group that created the silo–and the reason for that difficulty may be political as well as technological. So from the start, we have a data integration problem compounded with a compliance problem. Data integration and regulatory compliance are particularly tough in healthcare and medicine, but don’t kid yourself: if you’re working with data, you will face integration problems, and if you’re working with personal data, you need to think about compliance. An AI project that doesn’t address data integration and governance (including compliance) is bound to fail, regardless of how good your AI technology might be. Buzzword or not, Hyperautomation is doing a service if it focuses attention on these issues.

Data integration problems aren’t pretty; they’re boring, uninteresting, the “killing field of any modeling project,” as Lorien Pratt has said. So we really can’t talk about automating any significant task without seeing it as a non-trivial data integration project: matching IDs, reconciling slightly different definitions of database columns, de-duping, named entity recognition, all of that fun stuff. Some of these tasks have been automated, but many aren’t. Andrew Ng, Christopher Ré, and others have pointed out that in the past decade, we’ve made a lot of progress with algorithms and hardware for running AI. Our current set of AI algorithms are good enough, as is our hardware; the hard problems are all about data. That’s the cutting edge for AI research: automating ways to find quality data, clean it, label it, and merge it with data from other sources. While that research is only starting to filter into practice, and much remains to be done, “automating all the things” will require confronting data problems from the beginning.

Another sad reality is that a company’s data is less rich than they’d like to think. We don’t need to look any further than O’Reilly for an example. Like any online company, we have good visibility into what happens on the O’Reilly Learning Platform. We can see what books and courses our customers are using, and for how long. We know if customers only read the first chapter of some book, and can think about what how to improve it. The data available to our retail business is much more limited. We know we’ve sold X books to Amazon, and Y books to wholesalers, but we never know anything about the customers who buy those books, when they buy them, or even if they buy them. Books can sit on shelves or in warehouses for a long time before they come back as returns. The online business is information-rich; the retail business is information-poor. Most real-world business lie somewhere between those extremes.

That’s the bad news. The good news is that we’re talking about building something exciting. We’re talking about applications that use APIs to pull data from many different sources, and deliver better results than humans can. We’re talking about applications that integrate all of those sources into a single course of action, and can do so seamlessly. There are resonances between this and what, in other application domains, is being called a “metaverse.” While we’re skeptical about how the term “Hyperautomation” has been used, we also wonder: is Hyperautomation, considered properly, the business version of the metaverse? One component of a business metaverse would certainly be seamless access to data wherever it resides; the metaverse would be populated by bots that automate routine tasks. Hold that thought; we’ll return to it.

Making Good Business Decisions

Finding processes to automate is called process discovery. We have to be careful about process discovery because automating the wrong processes, or automating them in inappropriate ways, wastes resources at best; at worst, it can make a business uncompetitive. There are products that use AI to discover which processes can be automated, but in real life, process discovery will rely heavily on people: your knowledge of the business, the knowledge of subject matter experts, and the knowledge of staff members who are actually doing the work, and whose input is often ignored.  I’m reminded of a friend who was hired to build a new application to check in patients at a doctor’s office. The receptionists hated the old app. No one knew why, until my friend insisted on sitting down at the receptionist’s desk. Then it was painfully obvious why the staff hated the old application–and the problem was easy to correct.

Over the past decade, one problem with data science and its successors has been the assumption that all you need is data, and lots of it; analyzing that data will lead you to new products, new processes, new strategies: just follow the data and let it transform your business. But we also know that most AI projects fail, just as most IT projects fail. If you don’t want your projects to be among the failures, you can’t make naive assumptions about what data can do. All businesses like “up and to the right,” and data is good at revealing trends that look “up and to the right.” However, growth always ends: nothing grows exponentially forever, not even Facebook and Google. You’ll eventually run out of potential new customers, raw material, credit at the bank–something will get in the way. The historical trends revealed by data will eventually end. Data isn’t very good at telling you where the growth curve will flatten out, and for an executive, that’s probably the most important information. What will cause those trends to end, and what strategies will the business need to adopt? It is difficult to answer that kind of question with nothing but data.

Lorien Pratt outlines a four-step process for using data effectively to make business decisions:

  • Understand the business outcomes that you want to achieve.
  • Understand the actions that you can take in your current business situation.
  • Map out the paths between actions and outcomes. If you take some action, what changes? Most actions have multiple effects. 
  • Decide where data fits in. What data do you have? How can you use it to analyze your current situation, and measure the results of any actions you take?

These four steps are the heart of decision intelligence. It is a good process for any business decision, but it’s particularly important when you’re implementing automation. If you start from the data, rather than the business outcomes and the levers you can use to change the situation, you are likely to miss important possibilities. No dataset tells you the structure of the world; that requires human expertise and experience. You’ll find small, local optimizations, but you’re likely to miss important use cases if you don’t look at the larger picture. This leads to a “knowledge decision gap.” Pratt mentions the use of satellite imagery to analyze data relevant to climate change: predicting fires, floods, and other events. The models exist, and are potentially very useful; but on the ground, firefighters and others who respond to emergencies still use paper maps. They don’t have access to up to date maps and forecasts, which can show what roads can be used safely, and where severe damage has occurred. Data needs to become the means, a tool for making good decisions. It is not an end in itself.

Donald Farmer says something similar. It’s easy to look at some process (for example, invoice processing, or checking in patients) and decide to automate it. You analyze what your staff does to process an invoice, and then design a system to perform that process. You may use some process discovery tools to help. If the process you are automating requires making some simple decisions, AI can probably be used to automate those decisions. You will probably succeed, but this approach overlooks two big problems. First, many business processes are failing processes. They’re inefficient, poorly designed, and perhaps even wholly inappropriate for the task. Never assume that most businesses are well run, and that they represent some sort of “best practice.” If you automate a poor process, then all you have is a faster poor process. That may be an improvement, but even if it’s an improvement, it’s sure to be far from optimal.

Farmer’s second point is related, but goes much deeper. Business processes never exist in isolation. They connect to other processes in a complex web. That web of connected processes is really what makes the business work. Invoice processing has tendrils into accounting. Manufacturing affects quality control, customer support, finance, shipping and receiving, accounts receivable, and more. HR processes have effects throughout the organization. Redesigning one process might give you a local improvement, but rethinking how the business works is a much bigger opportunity.  Farmer points to Blackline, a company that does process automation for financial services. They don’t automate a single process: they automate all of a client’s financial processes, with the result that all actions are processed immediately; the books are always closed. This kind of automation has huge consequences. You don’t have to wait for a few weeks after the end of a month (or quarter or year) to close the books and find out your results; you know the results continuously. As a result, your relationship to many important financial metrics changes. You always know your cash flow; you always know your credit line. Audits take on a completely different meaning because the business is always auditing itself. New strategies are possible because you have information that you’ve never had before.

Other areas of a company could be treated similarly. What would supply chain management be like if a company had constant, up-to-date information about inventory, manufacturing, new orders, and shipping? What would happen to product design, sales, and engineering if a constant digest of issues from customer service were available to them?

These changes sound like something that we’ve often talked about in software development: continuous integration and continuous delivery. Just as CI/CD requires IT departments to automate software deployment pipelines, continuous business processes come from automating–together–all of the processes that make businesses work. Rethinking the entirety of a business’s processes in order to gain new insights about the nature of the business, to change your relationship to critical measures like cash flow, and to automate the business’s core to make it more effective is indeed Hyperautomation. It’s all about integrating processes that couldn’t be integrated back when the processes were done by hand; that pattern recurs repeatedly as businesses transform themselves into digital businesses. Again, does this sound like a business Metaverse? After all, the consumer Metaverse is all about sharing immersive experience. While automating business processes doesn’t require VR goggles, for an executive I can’t imagine anything more immersive than immediate, accurate knowledge of every aspect of a company’s business. That’s surely more important than taking a meeting with your bank’s 3D avatars.

This kind of automation doesn’t come from a superficial application of AI to some isolated business tasks. It’s all about deep integration of technology, people, and processes. Integration starts with a thorough understanding of a business’s goals, continues with an understanding of the actions you can take to change your situations, and ends with the development of data-driven tools to effect the changes you want to see. While AI tools can help discover processes that can be automated, AI tools can’t do this job alone. It can’t happen without subject matter experts. It requires collaboration between people who know your business well, the people who are actually performing those tasks, and the stakeholders–none of which have the entire picture. Nor can it be undertaken without addressing data integration problems head-on. For some problems, like pharmacy prescription application we’ve already touched on, data integration isn’t just another problem; it is the problem that dwarfs all other problems.

We also need to be aware of the dangers. On one hand, automating all of a company’s processes to make a single coherent whole sounds like a great idea. On the other hand, it sounds like the kind of massive boil-the-ocean IT project that’s almost certainly bound to fail, or remain forever unfinished. Is there a happy medium between automating a single process and embarking on an endless task? There has to be. Understand your business’s goals, understand what levers can affect your performance, understand where you can use data–and then start with a single process, but a process that you have understood in the broader context. Then don’t just build applications. Build services, and applications that work by using those services. Build an API that can integrate with other processes that you automate. When you build services, you make it easier to automate your other tasks, including tasks that involve customers and suppliers. This is how Jeff Bezos built Amazon’s business empire.

The Humans in the Loop

Developers who are automating business systems have to determine where humans belong in the loop. This is a sensitive issue: many employees will be afraid of losing their jobs, being “replaced by a machine.” Despite talk about making jobs more interesting and challenging, it would be unrealistic to deny that many executives look at process automation and think about reducing headcount. Employees’ fears are real. Still, as of mid-2022, we remain in a job market where hiring is difficult, at any level, and if a business is going to grow, it needs the human resources to grow. Automating processes to make decisions in routine situations can be a way to do more without adding staff: if pharmacy employees can rely on an automated process to look up drug interactions, regulations, and medical records, in addition to managing the insurance process, they are free to take on more important or more difficult tasks.

Making jobs more challenging (or difficult) can be a double-edged sword. While many people in the automation industry talk about “relieving staff of boring, routine tasks,” they often aren’t familiar with the realities of clerical work. Boring, routine tasks are indeed boring and routine, but few people want to spend all their time wrestling with difficult, complex tasks. Everybody likes an “easy win,” and few people want an environment where they’re constantly challenged and facing difficulties–if nothing else, they’ll end up approaching every new task when they’re tired and mentally exhausted. Tired and overstressed employees are less likely to make good decisions, and more likely to think “what’s the easiest way to get this decision off of my desk.” The question of how to balance employees’ work experiences, giving them both the “easy wins,” but enabling them to handle the more challenging cases hasn’t been resolved. We haven’t seen an answer to this question–for the time, it’s important to recognize that it’s a real issue that can’t be ignored.

It’s also very easy to talk about “human in the loop” without talking about where, exactly, the human fits in the loop. Designing the loop needs to be part of the automation plan. Do we want humans evaluating and approving all the AI system’s decisions?  That begs the question of exactly what, or why, we’re automating. That kind of loop might be somewhat more efficient, because software would look up information and fill in forms automatically. But the gain in efficiency would be relatively small. Even if they didn’t need to spend time looking up information, an office worker would still need to understand each case. We want systems that implement end-to-end automation, as much as possible. We need employees to remain in the loop, but their role may not be making individual decisions. Human employees need to monitor the system’s behavior to ensure that it is working effectively. For some decisions, AI may only play an advisory role: a human may use AI to run a number of simulations, look at possible outcomes, and then make set a policy or execute some action. Humans aren’t managed by the machine; it’s the other way around. Humans need to understand the context of decisions, and improve the system’s ability to make good decisions.

If we want to leave as many decisions as possible to the system, what roles do we want humans to have? Why do we want humans in the loop? What should they be doing?

  • Humans need to manage and improve the system
  • Humans need to investigate and rectify bad decisions

Neither role is trivial or simple. “Managing and improving the system” encompasses a lot, ranging from automating new tasks to improving the system’s performance on current tasks. All AI models have a finite lifetime; at some point, their behavior won’t reflect the “real world,” possibly because the system itself has changed the way the real world behaves. Models are also subject to bias; they are built from historical data, and historical data almost never reflects our ideals of fairness and justice.  Therefore, managing and improving the system includes careful monitoring, understanding and evaluating data sources, and handling the data integration problems that result. We’re talking about a job that’s much more technical than a typical clerical position.

This understanding of the “human in the loop” suggests a user interface that’s more like a dashboard than a web form. People in this role will  need to know how the system is operating on many levels, ranging from basic performance (which could be measured in actions per second, time taken to generate and communicate an action), to aggregate statistics about decisions (how many users are clicking on recommended products), to real-time auditing of the quality of the decisions (are they fair or biased, and if biased, in what way).

Likewise, all decision-making processes are going to produce bad decisions from time to time. For better or for worse, that’s baked into the foundations of AI. (And as humans, we can’t claim that we don’t also make bad decisions.) Those bad decisions will range from simple misdiagnoses, poor recommendations, and errors to subtle examples of bias. We can’t make the mistake of assuming that an automated decision will always be correct. It’s possible that automated decision-making will be  an improvement over human decision-making; but bad decisions will still be made. The good news is that, at least in principle, AI systems are auditable. We know exactly what decisions were made, we know the data that the system used.

We can also ask an AI system to explain itself, although explainability is still an area of active research. We need explanations for two reasons. Staff will need to explain decisions to customers: people have never liked the feeling that they are interacting with a machine, and while that preference might change, “that’s what the computer said” will never be a satisfactory explanation. The system’s explanation of its decisions needs to be concise and intelligible. Saying that a loan applicant was on the wrong side of some abstract boundary in a high-dimensional space won’t do it; a list of three or four factors that affected the decision will satisfy many users. A loan applicant needs to know that they don’t have sufficient income, that they have a poor credit history, or that the item they want to purchase is overpriced. Once that reasoning is on the table, it’s possible to move forward and ask whether the automated system was incorrect, and from there, to change the decision. We can’t let automation become another way for management to “blame the computer” and avoid accountability.

Improving the system so that it gives better results requires a more technical explanation. Is the system too sensitive to certain factors? Was it trained using biased, unfair data? Is it inferring qualities like gender or ethnicity from other data? Relatively simple tests, like higher error rates for minority groups, are often a sign of bias. Data is always historical, and history doesn’t score very well on fairness. Fairness is almost always aspirational: something we want to characterize the decisions we’re making now and in the future. Generating fair results from biased data is still a subject for research, but again, we have an important advantage: decisions made by machines are auditable.

To override an automated decision, we need to consider interfaces for performing two different tasks: correcting the action, and preventing the incorrect action from being taken again. The first might be a simple web form that overrides the original decision–no matter how hard we try to automate “simple web forms” out of existence, they have a way of returning. The second needs to feed back into the metrics and dashboards for monitoring the system’s behavior. Is retraining needed? Is special-purpose training to fine-tune a model’s behavior an option?

Although re-training an AI system can be expensive, and auditing training data is a big project, they’re necessary, and have to be part of the plan. Even when there are no egregious errors, models need to be retrained to remain relevant. For example, fashion recommendations from a model that hasn’t been retrained in a year are not likely to be relevant.

Another problem with interfaces between humans and AI systems arises when we position the system as an “oracle”: a voice of truth that provides “the right answer.” We haven’t yet developed user interfaces that allow users to discuss or argue with a computer; users can’t question authority.  (Such interfaces might grow out of the work on large language models that’s being done by Google, Facebook, OpenAI, HuggingFace, and others.) Think about a diagnostic system in a doctor’s office. The system might look at a photo of a patient’s rash and say “That’s poison ivy.” So can a doctor or a nurse, and they’re likely to say “I didn’t need an expensive machine to tell me that,” even if the machine allows them to treat more patients in an hour. But there’s a deeper problem: what happens if that diagnosis (whether human or automated) is wrong? What if, after treatment, the patient returns with the same rash? You can’t give the same diagnosis again.

Shortly after IBM’s Watson won Jeopardy, I was invited to a demonstration at their lab. It included a short game (played against IBM employees), but what interested me the most was when they showed what happened when Watson gave an incorrect answer. They showed the last five alternatives, from which Watson chose its answer. This level wasn’t just a list: it included pros and cons for each answer under consideration, along with the estimated probability that each answer was correct. Choose the highest probability and you have an “oracle.” But if the oracle is wrong, the most useful information will be on the layer with the rejected answers: the other answers that might have been correct. That information could help the doctor whose patient returns because their poison ivy was actually a strange food allergy: a list of other possibilities, along with questions to ask that might lead to a resolution. Our insistence on AI systems as oracles, rather than knowledgeable assistants, has prevented us from developing user interfaces that support collaboration and exploration between a computer and a human.

Automation isn’t about replacing humans; it’s about collaboration between humans and machines. One important area of research for the “office metaverse” will be rethinking user interface designs for AI systems. We will need better dashboards for monitoring the performance of our automation systems; we’ll need interfaces that help workers research and explore ambiguous areas; and we probably won’t get away from filling in web forms, though if automation can handle all the simple cases, that may be all right.

Putting It All Together

Hyperautomation may or may not be the biggest technology trend of 2022. That game of buzzword bingo is unimportant. But “automating all the things”–that’s sure to be on every senior manager’s mind. As you head in this direction, here are some things to keep in mind:

  • Businesses are complex systems. While you should start with some simple automation tasks, remember that these simple tasks are components of these larger systems. Don’t just automate poor processes; take the opportunity to understand what you are doing and why you are doing it, and redesign your business accordingly.
  • Humans must always be in the loop. Their (our) primary role shouldn’t be to accept or reject automated decisions, but to understand where the system is succeeding and failing, and to help it to improve. 
  • The most important function of the “human in the loop” is accountability. If a machine makes a bad decision, who is accountable and who has the authority to rectify it?
  • Answers and decisions don’t arise magically out of the data. Start by understanding the business problems you are trying to solve, the actions that will have an influence on those problems, and then look at the data you can bring to bear.
  • Companies marketing AI solutions focus on the technology.  But the technology is useless without good data–and most businesses aren’t as data-rich as they think they are.

If you keep these ideas in mind, you’ll be in good shape. AI isn’t magic. Automation isn’t magic. They’re tools, means to an end–but that end can be reinventing your business. The industry has talked about digital transformation for a long time, but few companies have really done it. This is your opportunity to start.

Special thanks to Jennifer Stirrup, Lorien Pratt, and Donald Farmer, for conversations about Hyperautomation, Decision Intelligence, and automating business decisions. Without them, this article wouldn’t have been possible. All three have upcoming books from O’Reilly. Donald Farmer’s Embedded Analytics is currently available in Early Release, and Lorien Pratt has a preview of The Decision Intelligence Handbook on her website.

Categories: Technology

Radar Trends to Watch: October 2022

O'Reilly Radar - Tue, 2022/10/04 - 04:15

September was a busy month. In addition to continued fascination over art generation with DALL-E and friends, and the questions they pose for intellectual property, we see interesting things happening with machine learning for low-powered processors: using attention, mechanisms, along with a new microcontroller that can run for a week on a single AA battery. In other parts of the technical universe, “platform engineering” has been proposed as an alternative to both DevOps and SRE. We’ve seen demonstrations of SQL injection-like attacks against GPT-3; and companies including Starbucks, Chipotle, and Universal Studios are offering NFT-based loyalty programs. (In addition to a Chipotle’s steak grilling demo in the Metaverse.)

Artificial Intelligence
  • Facebook/Meta ups the ante on AI-generated images: they have a system that creates short videos from a natural language description.  Videos are currently limited to five seconds. It isn’t open to the public.
  • Transformers, which have a key to the progress in natural language processing, are now being adapted for work in computer vision, displaying convolutional neural networks.
  • A group of researchers are talking about bringing attention mechanisms to resource-constrained TinyML applications. Attention mechanisms are the central innovation that led to language tools like GPT-3. Low power attention could revolutionize embedded AI applications.
  • AGENT is a new benchmark for “common sense” in AI. It consists of a series of 3D animations. An AI model has to rate the videos as “surprising” or “expected.” To score highly, the model needs to demonstrate a human-like ability to plan, in addition to understanding concepts like basic physics.
  • Whisper is a new speech-to-text AI model from OpenAI. Its accuracy is impressive and, unlike other OpenAI products, it is open source.
  • Google’s Sparrow is an experimental AI chatbot that has been trained not to generate “dangerous” replies (ranging from hate speech to financial advice and claims of sentience). It is far from perfect, but it appears to be a significant improvement over current chat technology.
  • Have I been trained is a web application that searches for specific images in the LAION-5B data set, which was used to train several image generation models. You can search using images or text. It’s useful for discovering whether your artwork or photos were used in training.
  • Art generated by AI tools like Midjourney and Stable Diffusion is starting to appear on stock photography web sites. Getty Images has banned AI-generated content because they are concerned about copyright violations.
  • A new model for analyzing chest x-ray images learns from natural language medical reports written when the image was taken, rather than images labeled after the fact. Its accuracy is roughly equivalent to human radiologists.
  • Amodal panoptic segmentation is a new vision algorithm that allows systems to identify objects that are partially obscured by objects in front. This could be an important technology for improving autonomous vehicles’ ability to identify pedestrians successfully.
  • Huggingface has released a toolkit for building diffusion models. Diffusion models are the technology used by DALL-E, Stable Diffusion, and other AI tools that build images through random processes.
  • English is the dominant language for AI research, and that inevitably introduces bias into models. IGLUE (Image-Grounded Language Understanding Evaluation) is a benchmark that tests an AI system’s performance in 20 different languages, and includes culture-specific images.
  • PromptBase is a secondary market where you can buy and sell prompts for machine learning systems. They’re currently soliciting prompts for DALL-E, Midjourney, Stable Diffusion, and GPT-3. This world is developing very quickly.
  • AutoCounterspeech is a language model that generates appropriate replies that confront and contest hate speech. It’s another example of a large language that has been adapted for a specific purpose with specialized training.
  • Simon Willison and Andy Baio have created a tool to explore 12 billion of the images used to train the Stable Diffusion image generator. Their results are fascinating.
  • Neuromorphic computing, which is based on specialized chips that emulate human neurons, is better at identifying objects than traditional neural networks, and uses much less power.
  • What does GPT-3 know about you? Possibly quite a lot; much of it may be incorrect; and some of it could be damaging (for example, being linked to “terror”).
  • A teenager has built a tool that uses machine learning to detect elephants and humans in real time from infrared images taken by drones. This could be invaluable in preventing poaching.
  • Stephen O’Grady’s article on bait-and-switch open source licenses is a must-read.
  • Is platform engineering an alternative to both DevOps and SRE? Platform engineering is the discipline of “building toolchains and workflows that enable self-service capabilities for software engineering organizations in the cloud-native era.”
  • Nbdev2 lets git and Jupyter notebooks play well together, solving a major problem for collaboration with notebooks. Collaboration and version control no longer work at cross-purposes.
  • Tauri is a Rust-based framework for building desktop apps. It is conceptually similar to Electron, but uses Rust for the backend, and generates much smaller executable files.
  • For those who don’t get along with IDEs, here’s a quick HowTo about running Github Copilot in the terminal with Vim. Has anyone done this with Emacs?
  • Bryan Cantrill on Rust and the future of low latency embedded systems: Rust is the first language since C to live at the border between hardware and software.
  • Explainshell looks up the documentation for every command and its arguments on a bash shell command line. Clever.
  • HTTP QUERY is a new method that has been added to HTTP to support building APIs. QUERY requests are safe; they never alter the resource being queried. The query is placed in the payload of the request, rather than the URI. And responses from a QUERY are cacheable.
  • Fuzzing is a powerful testing technique; it means watching how the software under test handles random data. Dr. Chaos is a new fuzzing framework for C, C++, and Objective-C.
  • Trace-based testing is the next step forward in observability. It means using data from tests run during software development in operations, to determine exactly what kinds of events can occur and how.
  • Software supply chain security is more important than ever; Microsoft claims that the Lazurus cybercrime group, which is sponsored by North Korea, is adding backdoors to many widely used Open Source programs and libraries.
  • Chaos is new malware that can infect both Windows and Linux devices, including routers, firewalls, and other networking hardware. It is spreading in the wild; it propagates by taking advantage of known vulnerabilities.
  • Prompt injection attacks against GPT-3: Simon Willison demonstrates a new security threat that is similar to SQL injection. This will be an issue for GPT-3 applications that combine prompts from untrusted users with prompts that are generated by the application.
  • The Atlantic Council has published a report describing an international strategy for securing the Internet of Things. The report is based on case studies in the US, UK, and APAC, and focuses on smart homes, networking, and telecommunications.
  • Domain shadowing, in which a criminal group hijacks a DNS server to insert its own domains under the legitimate domains, without modifying the legitimate domains, is becoming an increasingly important threat.
  • An experiment demonstrating the danger of automated surveillance showed that it was possible to find individuals and locations in Instagram photos using data feeds from cameras (both open and private) installed in public places.
  • The popularity of browser-in-browser attacks, in which a compromised site steals information by creating a fake browser within the active browser window, is rising.
  • Street View gives Google a head start on building immersive experiences of different places. Is this a down payment on the Metaverse?
  • The LockBit ransomware group may be preparing to use distributed denial of service (DDOS) attacks as another form of extortion. They are also learning to defend themselves against ransomware victims who attack them with DDOS rather than paying.
  • Starbucks, Chipotle, and even Universal Studios have developed NFT-based loyalty programs. Chipotle even has a simulated grilling experience, conducted in their Metaverse property.
  • Cryptocurrency can be used to pay taxes in Colorado. Utah is set to follow.
  • Can Web3 be used as a tool to combat climate change? Fred Wilson points to efforts like New Atlantis, for marine biodiversity, and the Toucan Protocol, a voluntary carbon market. Wilson’s thesis is that work against climate change will be crowd-funded.
  • Andreessen Horowitz has introduced a “Don’t Be Evil” license for NFTs, similar (in concept) to the Creative Commons licenses. There are six distinct kinds of license, including an “exclusive commercial rights” license and a “universal license”; some licenses provide automatic revocation for hate speech.
  • Some studies show that surgery patients who are given a virtual reality program to view during a procedure require less anaesthetic. VR may also help in post-operative recovery.
  • A modeling agency is using real models to create Metaverse avatars for use in advertising. Faces are based on 3D photos; bodies are synthesized. The models are given unique voices and personalities. The avatars are sold as NFTs that expire after a given time.
  • Ethereum has made the transition to Proof of Stake. PoS provides its own set of challenges, but requires much less energy and should support significantly higher transaction rates. Nothing broke, the price of the major cryptocurrencies remained stable, and the used equipment market is now flooded with GPUs.
  • Neal Stephenson says that the Metaverse will “start on the wrong foot” if it leaves behind people using 2D screens. In the 1990s, he didn’t forsee the sophistication of modern gaming, specifically the ability to navigate 3D spaces with 2D hardware. Stephenson is co-founding Lamina1, a company building a “base layer” for an Open Metaverse.
  • Roblox is developing avatars that can reflect their owners’ facial expressions during game play in real time.
Quantum Computing Biology
  • A memory prosthesis might be able to restore memory to people with diseases like Alzheimer’s. The prosthesis generates signals that are similar to the signals that neurons create when creating or activating memories.
  • Manufacturers of high performance biomaterials, such as spider silk protein and mycelium, are starting to scale up production. Synthetic biology is becoming real.
  • A new genetic therapy attempts to design human B cells, the cells that make antibodies, to target rare diseases by manufacturing missing enzymes.
  • The MAX78002 is a low power microcontroller designed for running neural networks in edge computing applications. There are claims that it can run for a week on a single AA battery. It has 64 parallel processors and can run a network with up to 3.5 million parameters.
  • The Chinese are planning to build a dam with a distributed 3D printer, using no direct human labor. There’s arguably no printer at all; the work is done by AI-controlled robots that pour the concrete and roll it out in layers.
  • NVidia has a new GPU chip with specialized hardware for training transformer models. It is 4.5x faster than their previous high-performance data center GPU.
  • China has developed its own GPUs, the Biren 100 and Biren 104. This will greatly reduce its dependence on NVidia for high performance computing hardware.
  • Battery power played an important role in helping California’s electrical grid survive September’s heat wave without outages.
Categories: Technology

The Problem with Intelligence

O'Reilly Radar - Tue, 2022/09/13 - 04:21

Projects like OpenAI’s DALL-E and DeepMind’s Gato and LaMDA have stirred up many discussions of artificial general intelligence (AGI). These discussions tend not to go anywhere, largely because we don’t really know what intelligence is. We have some ideas–I’ve suggested that intelligence and consciousness are deeply connected to the ability to disobey, and others have suggested that intelligence can’t exist outside of embodiment (some sort of connection between the intelligence and the physical world). But we really don’t have a definition. We have a lot of partial definitions, all of which are bound to specific contexts.

For example, we often say that dogs are intelligent. But what do we mean by that? Some dogs, like sheep dogs, are very good at performing certain tasks. Most dogs can be trained to sit, fetch, and do other things. And they can disobey. The same is true of children, though we’d never compare a child’s intelligence to a dog’s. And cats won’t do any of those things, though we never refer to cats as unintelligent.

I’m very impressed with Irene Pepperberg’s work on parrot intelligence. She’s shown that her parrots can have an understanding of numbers, can use language intelligently, and can even invent new vocabulary. (“Banerry” for apple, probably because birds don’t have lips and can’t say Ps very well. And apples look like giant cherries and taste like bananas, at least to parrots.) But I wonder if even this is getting the question wrong. (I think Dr. Pepperberg would agree.) We ask birds to be intelligent about things humans are intelligent about. We never ask humans to be intelligent about things birds are intelligent about: navigating in three-dimensional space, storing food for use during winter (a boreal chickadee will store as many as 80,000 seeds in different places, and remember where they’re all located), making use of the many colors birds see that we can’t (their vision extends well into the ultraviolet). It’s easy to imagine a bird thinking, “Those poor humans. They can’t find their home without taking out that strange little black box (which is actually colored octarine).”

In a similar vein, we often say that dolphins and elephants are intelligent, but it’s never clear what exactly we mean by that. We’ve demonstrated that dolphins can recognize patterns and that they recognize themselves in mirrors, and they’ve demonstrated a (limited) ability to communicate with humans, but their intelligence certainly goes much further. I wouldn’t be the least bit surprised if animals like dolphins had an oral literature. We penalize them on the intelligence scale because they don’t have hands and can’t pick up a pen. Likewise, some research shows that elephants communicate with each other using low frequency rumbles that can be heard for miles (if you’re an elephant). Information theory suggests that this communication can’t be fast, but that doesn’t mean that it can’t be rich.

Humans are intelligent. After all, we get to define what “intelligence” means. Controlling the definition of intelligence has always been a source of cultural and political power; just read anything written in America in the 19th century about the intelligence of women, Asians, Africans, or even the Irish and Italians. We have “intelligence tests” to measure intelligence–or do they just measure test-taking ability? We also talk about “emotional” and other kinds of intelligence. And we recognize that mathematical, linguistic, and artistic ability rarely go hand-in-hand. Our own view of our own intelligence is highly fractured, and often has more to do with pseudo-science than anything we could use as a metric in machine learning experiments. (Though GPT-3 and LaMDA are no doubt very good at taking tests.)

Finally, there’s also been a lot of talk recently about the possibility of discovering life on other planets. Life is one thing, and my decidedly amateur opinion is that we will find life fairly common. However, to discover intelligent life, we would need a working definition of intelligence. The only useful definition I can imagine is “able to generate signals that can be received off planet and that are indisputably non-natural.” But by that definition, humans have only been intelligent for roughly 100 years, since the early days of radio. (I’m not convinced that the early electrical experiments from the 19th century and spark-based radio from the first two decades of the 20th century could be detected off planet.) There may be fantastically intelligent creatures living under the ice covering Saturn’s moon Titan, but we’ll never be able to detect them without going there. For Titan, a visit may be possible. For planets elsewhere in our galaxy, probably not.

Even more important: these definitions aren’t just different. They’re different in kind. We’re not saying that a parrot or a crow is intelligent if it scores 0.3 (on a scale of 0 to 1) on some test, but an autonomous vehicle has to score .99. The definitions aren’t remotely comparable. I don’t know what it would mean to ask GPT-3 about soaring on air currents. If we asked, we would get an answer, and quite likely a good one with a lot of information about aerodynamics, but would that have anything to do with an eagle’s understanding of flight? I could tell Gato to “sit,” but how would I know if it complied?

So what does this tell us about intelligence that’s artificial? Context is important; an appropriate definition of “intelligence” has to start with what we want the system to do. In some cases, that’s generating publishable papers and good PR. With natural language systems like GPT-3, we tend to ignore the fact that you often have to try several prompts to produce reasonable output. (Would we consider a human intelligent if they had to try 5 times to answer a question?) As has often been noted, systems like GPT-3 often get basic facts wrong. But humans often respond to prompts incoherently, and we frequently get our facts wrong.  We get things wrong in different ways, and for different reasons; investigating those differences might reveal something about how our intelligence works, and might lead us to a better understanding of what an “artificial intelligence” might mean.

But without that investigation, our standard for intelligence is fairly loose. An AI system for making product recommendations can be successful even if most of the recommendations are wrong–just look at Amazon. (I’m not being ironic. If there are 10 recommendations and you’re interested in one of them, Amazon has won.) An AI system for an autonomous vehicle has to work to a much higher standard. So do many systems where safety isn’t an issue. We could happily talk about the “intelligence” of an AI chess engine that can beat the average human player, but a chess playing product that can only beat the average human and couldn’t play on a world championship level would be an embarrassment.

Which is just to say that intelligence, especially of the artificial sort, is many things. If you read Turing’s paper on the Imitation Game, you’ll see quickly that Turing is more interested in the quality of the interaction than the correctness of the result. In his examples, the machine says that it’s not good at writing poetry; hesitates before giving answers; and even gets some results wrong. Turing’s thought experiment is more about whether a machine can behave like a human than about whether it can master many different disciplines. The word “intelligence” only appears once in the body of the paper, and then it refers to a human experimenter.

That leads me to a conclusion: Intelligence doesn’t have any single definition, and shouldn’t. Intelligence is always specific to the application.  Intelligence for a search engine isn’t the same as intelligence for an autonomous vehicle, isn’t the same as intelligence for a robotic bird, isn’t the same as intelligence for a language model. And it certainly isn’t the same as the intelligence for humans or for our unknown colleagues on other planets.

If that’s true, then why are we talking about “general intelligence” at all?  General intelligence assumes a single definition. Discarding the idea of a single unifying definition of “intelligence” doesn’t cost us much, and gains a lot: we are free to create definitions of “intelligence” that are appropriate to specific projects. When embarking on a new project, it’s always helpful to know exactly what you’re trying to achieve. This is great for practical, real-world engineering. And even big, expensive research projects like DALL-E, Gato, LaMDA, and GPT-3 are ultimately engineering projects. If you look beyond the link-bait claims about general intelligence, sentience, and the like, the computer scientists working on these projects are working against well-defined benchmarks. Whether these benchmarks have anything to do with “intelligence” isn’t relevant. They aren’t trying to create an artificial human, or even an artificial dog. (We’ll leave artificial dogs to Boston Dynamics.) They are trying–with considerable success–to extend the range of what computers can do. A model that can work successfully in over 600 different contexts is an important achievement. Whether or not that’s “general intelligence” (or intelligence at all) is a side show we don’t need.

Categories: Technology

Radar Trends to Watch: September 2022

O'Reilly Radar - Tue, 2022/09/06 - 04:21

It’s hardly news to talk about the AI developments of the last month. DALL-E is increasingly popular, and being used in production. Google has built a robot that incorporates a large language model so that it can respond to verbal requests. And we’ve seen a plausible argument that natural language models can be made to reflect human values, without raising the question of consciousness or sentience.

For the first time in a long time we’re talking about the Internet of Things. We’ve got a lot of robots, and Chicago is attempting to make a “smart city” that doesn’t facilitate surveillance. We’re also seeing a lot in biology. Can we make a real neural network from cultured neurons? The big question for biologists is how long it will take for any of their research to make it out of the lab.

Artificial Intelligence
  • Stable Diffusion is a new text-to-image model that has been designed to run on consumer GPUs. It has been released under a license that is similar to permissive open source licenses, but has restrictions requiring the model to be used ethically.
  • Researches claim that they can use a neural network to reconstruct images (specifically, faces) that humans are seeing. They use fMRI to collect brain activity, and a neural decoding algorithm to turn that activity into images that are scarily similar to the photos the subjects were shown.
  • Research from Google and other institutions investigates the emergent properties of large language models: their ability to do things that can’t be predicted by scale alone.
  • DALL-E’s popularity is soaring and, like Copilot, it’s being adopted as a tool. It’s fun to play with, relatively inexpensive, and it’s increasingly being used for projects like designing logos and generating thumbnail images for a blog.
  • Elon Musk has announced that Tesla will have a robot capable of performing household chores by the end of 2022. That is almost certainly overly ambitious (and we hope it works better than his self-driving vehicles), but it’s no doubt coming.
  • Google has demonstrated a robot that can respond to verbal statements (for example, bringing food when you say “I’m hungry”) without being trained on those specific statements; it uses a large language model to interpret the statement and determine a response.
  • Molecular modeling with deep learning has been used to predict the way ice forms. This can be very important for understanding weather patterns; the technique may be applicable to developing new kinds of materials.
  • Brain.js is a deep learning library for JavaScript, designed to run in the browser and using the computer’s GPU (if available).
  • Graph neural networks may be able to predict sudden flareups in burning homes, the largest cause of death among firefighters.
  • While avoiding the question of whether language models are “intelligent,” Blaise Aguera y Arcas argues that language models can be trained to reflect particular moral values and standards of behavior.
  • A webcam mounted on a 3-D gimbal uses AI to automatically track moving objects. This could be a step towards making virtual reality less virtual.
  • A new political party in Denmark has policies determined entirely by AI. The Synthetic Party plans to run candidates for parliament in 2023.
  • One irony of AI work is that neural networks are designed by human intuition. Researchers are working on new AutoML systems that can quickly and efficiently design neural networks for specific tasks.
  • To succeed at deploying AI, AI developers need to understand and use DevOps methods and tools.
  • Cerebras, the company that released a gigantic (850,000 core) processor, claims their chip will democratize the hardware needed to train and run very large language models by eliminating the need to distribute computation across thousands of smaller GPUs.
  • Large language models are poor at planning and reasoning. Although they have done well on “common sense” benchmarks, they fail at planning tasks requiring more than one or two steps and that can’t be solved by doing simple statistics on their training data. Better benchmarks for planning and reasoning are needed to make progress.
  • A GPT-3 based application can answer philosophical questions well enough to fool philosophers. The authors make it clear that the machine is not “thinking”; it was intended as an experiment to demonstrate the danger of automated plagiarism.
  • DoNotPay has built a tool that finds racist language in real estate documents, and automates the process of having it removed. Not very surprisingly, it quickly discovered that clauses preventing the sale of property to non-Whites are extremely common.
  • Researchers have developed analog “neurons” that can build analog neural networks programmed similarly to digital neural networks. They are potentially much faster and require much less power.
  • A startup called Language I/O does machine translation by leveraging translations from Google, Facebook, and Amazon, then uses AI to choose the best and fine-tune the result, using customer-supplied vocabularies with minimal training.
  • KSplit is an automated framework for isolating operating system device drivers from each other and the OS kernel. Isolating device drivers is very difficult for human programmers, but greatly reduces vulnerabilities and bugs.
  • A report on the API economy says that one of the biggest obstacles is the lack of API design skills.
  • A bit of history comes back to life: An archive of everything written by why the lucky stiff (aka _why) is now online. _why was a mainstay of the Ruby community in the early 2000s; he disappeared from the community and took all of his content offline when a reporter revealed his name. Well worth reading; maybe even well worth re-acquainting yourself with Ruby.
  • Bun is a new JavaScript framework that aims to replace Node.js. It’s still early, and doesn’t yet support some important NPM packages. But it’s very fast, and (like Deno) implements Typescript.
  • Observability needs to “shift left”: that is, become a primary concern of developers, in addition to operations. Most observability tools are oriented towards operations, rather than software development.
  • mCaptcha is a proof-of-work-based Captcha system that avoids any human interaction like identifying images. It imposes a small penalty on genuine users that actors who want to pound web sites at scale won’t be willing to pay.
  • RStudio is renaming itself Posit. We don’t normally deal in corporate names, but this change is significant. Although R will remain a focus, RStudio has been looking beyond R; specifically, they’re interested in Python and their Jupyter-based Quarto publishing system.
  • Google is releasing open source tools for designing chips, and funding a program that allows developers to have their custom designs built at a fabrication facility. The goal is to jump-start an open source ecosystem for silicon.
  • Zero trust adoption has soared in the past year. According to Okta, 97% of the respondents to their recent “state of zero trust” survey say they have zero trust initiatives in place, or will have them within the next year.
  • An online tool named InAppBrowser can detect whether the browsers that are built in to mobile apps can be used to inject JavaScript into sites you visit. This kind of JavaScript injection isn’t always dangerous, but is often used to inject tracking code.
  • Google blocked a distributed denial of service attack (DDOS) against one of its cloud customers that peaked at 26 million requests per second, a record. The customer was using Google’s Cloud Armor service.
  • Chatbots backed by AI and NLP are becoming a significant problem for security. Well-designed chatbots can perform social engineering, execute denial of service attacks on customer service by generating complaints, and generate fake account credentials in bulk.
  • A security researcher has created a $25 tool that allows users to run custom code on terminals for the Starlink network. It requires attaching a board to your dish, but we suspect that enough Starlink users would be interested in “exploring” the satellite network to become a serious problem.
  • Message Franking is a cryptographic technology that includes end-to-end encryption, but also allows abusers to be held to account for misinformation–without revealing the content of the message.
  • One trick for detecting live deepfakes in video calls: ask the caller to turn sideways. Deepfake software is good at generating head-on views, but tends to fail badly at profiles.
  • Bruce Schneier on cryptographic agility: We need the ability to swap in cryptographic algorithms quickly, in light of the possibility that quantum computers will soon be able to break current codes. Industry adoption of new algorithms takes a long time, and we may not have time.
  • SHARPEXT is malware that installs a browser extension on Chrome or Edge that allows an attacker to read gmail. It can’t be detected by email services. Users are tricked into installing it through a phishing attack.
  • Passage offers biometric authentication services that work across devices using WebAuthn. Biometric data is encrypted, of course, and never leaves the user’s device.
  • Watch the progress of the American Data Privacy Protection Act, which has bipartisan support in Congress. This is the first serious attempt to provide nationwide digital privacy standards in the US.
  • A lawsuit filed in California claims that Oracle is selling a detailed social graph that incorporates information about 5 billion distinct users, roughly ⅔ the population of the planet. This information was gathered almost entirely without consent.
  • Finland is planning to test digital passports later this year. Volunteers with digital passports will be issued a smartphone app, rather than papers. Digital passports will require travelers to send plans to border control agencies, and a photo of them will be taken at the border.
  • A startup is attempting to grow a new liver inside a human body, as an alternative to a transplant. They will inject the patient’s lymph nodes with cells that will hopefully be able to reproduce and function as an alternate liver.
  • Tiny caps for tiny brains: Researchers have developed “caps” that can measure activity in brain organoids (cultured clusters of human neurons). It’s possible that groups of organoids can then be connected and networked. Is this the next neural network?
  • A bioengineered cornea made from collagen collected from pig skin, could be an important step in treating keratoconus and other causes of blindness. Artificial corneas would eliminate the problem of donor shortage, and can be stored for much longer than donated corneas.
  • A startup in Israel is creating artificial human embryos from human cells. These embryos, which survive for several days but are not viable, could be used to harvest very early-stage organs for transplants.
  • Materials that can think: Researchers have developed a mechanical integrated circuit that can respond to physical stresses, like touch, and perform computation on those stresses, and generate digital output.
  • Eutelsat, a European satellite operator, has launched a commercial “software defined satellite”: a satellite that can be reconfigured for different missions once it’s in space.
  • Developing robots just got easier. Quad-SDK is an open source stack for four-legged locomotion that’s compatible with ROS, the Robot Operating System.
  • Artificial intelligence isn’t just about humans. A startup is reverse-engineering insect brains to develop efficient vision and motion systems for robots.
  • A Japanese firm has developed robots that are being used to stock shelves in a convenience store chain.
  • Chicago’s Array of Things is an edge network for a smart city: an array of inexpensive temporary sensors to report on issues like traffic, safety, and air quality. Although the sensors include cameras, they only send processed data (not video) and can’t be used for surveillance.
  • The US Department of Energy is funding research on using sensors, drones, and machine learning to predict and detect wildfires. This includes identifying power line infrastructure that’s showing signs of arcing and in need of maintenance.
  • The UK is developing “flyways” for drones: Project Skyway will reserve flight paths for drone aircraft between six major cities.
Work Web3
  • Ethereum will be moving to proof-of-stake in September. Fred Wilson has an analysis of what this will mean for the network. The current proof-of-work blockchain will continue to exist.
  • Beginning in November, international payments will begin moving to blockchains, based on the ISO 20022 standard. A small number of cryptocurrencies comply with this standard. (Bitcoin and Ethereum are not on the list.)
  • Application-specific blockchains, or appchains, may be the way to go, rather than using a Layer 1 blockchain like Ethereum. Appchains can be built to know about each other, making it easier to develop sophisticated applications; and why let fees go to the root blockchain’s miners?
  • Cryptocurrency scans and thefts are old news these days, but now we’ve seen the first decentralized robbery. The attackers posted a “how to” on public servers, allowing others to join in the theft, and giving the original thieves cover.
Quantum Computing
  • Practical quantum computers may still be years away, but Quantum Serverless is coming. For almost all users, quantum computers will be in some provider’s cloud, and they’ll be programmed using APIs that have been designed for serverless access.
Categories: Technology

Ad Networks and Content Marketing

O'Reilly Radar - Tue, 2022/08/16 - 04:21

In a recent Radar piece, I explored N-sided marketplaces and the middlemen who bring disparate parties together. One such marketplace is the world of advertising, in which middlemen pair hopeful advertisers with consumer eyeballs. And this market for attention is absolutely huge, with global ad spend weighing in at $763 billion in 2021 revenues.

Most of that money is spent on digital ads, like the ones that follow you across websites to offer you deals on items you’ve just bought. Those are typically based on your online activity. Ad networks trail behind you as you browse the web, trying to get an idea of who you are and what you’re likely to buy, so they can pair you with hopeful merchants.

While merchants are clearly happy with targeted ads—at least, I’d hope so, given how much they’re spending—consumers have, understandably, expressed concerns over personal privacy. Apple took note, and limited iOS apps’ ability to track users across sites. Google has announced changes that would further limit advertisers’ reach. Who knows? Maybe the next step will be that the ad industry gets stronger regulations.

There’s also the question of whether targeted advertising even works.  While the ad networks aren’t required to disclose their stats, there are even people inside those companies who think that their product is “almost all crap.”

Maybe it’s time for a different approach? Recently, Disney’s video streaming service, Disney+, threw its hat into the advertising ring by announcing a new ad-supported plan. (Credit where it’s due: I originally found this in Les Echos, which may be paywalled. Here’s the official, English-language press release from Disney.)

It may be easy to disregard this Disney+ move, since so much of the online world is ad-supported these days. But I think this merits more attention than it may seem on the surface.

To be clear: I have no inside information here. But it at least looks like Disney+ can run its ad platform in a fairly low-tech fashion while also preserving privacy. That’s a pretty big deal for Disney, for consumers, and for the wider space of online advertising.

Everything old is new again

To understand why, let’s first consider the idea of “content marketing.” This is a new term for the age-old practice of selling ad space next to curated content that aligns with a particular theme. For example, let’s say you’ve created a magazine about cars. Motoring enthusiasts will read your magazine, which means advertisers (merchants) who want to reach them will place ads in your pages. The content is what draws readers and advertisers to the same spot.

What’s nice about content marketing is that the ad’s placement is based on the content, not the specific person reading it.

This addresses the privacy concern at the core of targeted advertising, because content marketing doesn’t require that you build a detailed profile of a person based on their every browsing habit. You’re not pairing an ad to a person; you’re pairing an ad to a piece of content. So you shift your analytical focus from the reader to what they’re reading.

The mouse has a large library

Now, consider Disney: its catalog spans decades’ worth of cartoons, tween sitcoms, and movies. Its recent acquisition of the Star Wars franchise gives it access to an even wider fanbase. And don’t forget that Disney owns ESPN, which adds sports content to the portfolio. It now makes that content available through its video-on-demand (VOD) platform of Disney+.

Disney already has to keep track of that catalog of content as part of its day-to-day business, which means we can reasonably assume that every show, movie, and sporting event on Disney+ has been assigned some number of descriptive tags or labels.

From the perspective of content marketing, all of this adds up to Disney+ being able to place ads on that content without having to do much extra work. The parent company, Disney, already owns the content and it’s already been tagged. The depth and breadth of the video catalog will certainly attract a large number and wide variety of viewers. That shifts the heavy lifting to the ad-matching system, which connects advertisers with the content.

Tracking your ad budget

You’ve likely heard the John Wanamaker adage: “Half the money I spend on advertising is wasted; the trouble is, I don’t know which half.” It’s a well-founded complaint about billboard or magazine advertising, since an advertiser can’t really tell how many people saw a given ad.

(Some early advertising pioneers, David Ogilvy among them, learned to supply coupons with print ads so stores could track which one had resonated the most. While this added a new level of analytical rigor to the field, it still wasn’t a perfect solution to Wanamaker’s plight.)

Delivering content-based ads through a well-curated streaming platform addresses that somewhat. Disney+ can provide an advertiser a detailed analysis of their ad spend without revealing any individual’s identity: “N number of people watched Variant V, your ad for Product P, during Show S, with the following breakdowns for time of day…”

And that leads me to my next point:

Minimal ML/AI

When you review the setup—a curated and labeled catalog, with broad-brush marketing characteristics—Disney+ has the ability to run this ad service using minimal ML/AI.

(Once again: I’m speculating from the outside here. I don’t know for sure how much ML/AI Disney+ is using or plans to use. I’m working through one hypothetical-yet-seemingly-plausible scenario.)

Disney+ can use those content labels—”pro football,” “tween comedy,” “gen-X cartoon”—to pair a piece of content with an advertisement. They may not get a perfect hit rate on these ads; but given that they’re building on top of work they’ve already done (the catalog and the streaming platform) then the ad system can run at a relatively low cost. And providing stats to advertisers is a matter of counting. Since those calculations are so trivial, I expect the toughest part of that BI will be scaling it to Disney’s audience size.

Can Disney+ still use ML/AI in places? They most certainly can, but they don’t have to. Disney+ has the option to run this using a smaller team of data scientists and a far smaller data analysis infrastructure. Whether you call this “smaller budget” or “higher margins,” the net effect is the same: the company ends the day with money in its pocket.

Disney+ can task that ML team with building models that better tag content, or that improve matches between content and advertisers. They don’t have to spend money analyzing the specific actions of a specific individual in the hopes of placing ads.

Future-proofing the ad system

Assuming that the Disney+ ad system will indeed run on a content marketing concept, that means the company has one more card to play: They have just sidestepped potential future privacy laws that limit the use of personal information.

Yes, Disney+ can get a person’s contact information when they subscribe to the service. Yes, the company can track customer behavior on- and off-platform, through a mix of first- and third-party data. But, contrary to targeted advertising, they don’t need all of that to run ads. All the company needs is to pair content with an advertisement. Given that this is the modern-day equivalent of a billboard or newspaper article, I imagine it would be difficult for Disney+ to run afoul of any present-day or upcoming privacy regulation with such an ad setup.

There’s still some room for trouble…

Going back to our car magazine example, Disney’s library is the equivalent of hundreds or even thousands of magazines. And if a single magazine is a hint as to a single interest, what can a larger number of magazines tell us?

By tracking what content a person watches, how they watch it (phone, tablet, TV), and what time of day, Disney+ could infer quite a bit about that person and household: the number and age of adults; marital or relationship status; age and number of children; whether this is a multi-generational household; and even some clues as to viewers’ gender. (I emphasize the term “infer” here, since it would hardly be perfect.)

In turn, Disney could use this for ad targeting, or to provide even more-detailed breakdowns to advertisers, or even find ways to share the data with other companies. This could get creepy quickly, so let’s hope they don’t take this route. And based on what we’ve covered thus far, Disney+ has every opportunity to run an ad network that preserves a reasonable amount of privacy.

Could the tail someday wag the dog?

Another possible wrinkle would be in how advertising weighs on future content.

Disney already has a good eye for what people will want to watch. And right now, those viewers are Disney’s customers. But when Disney+ becomes an ad marketplace, they’ll officially be a middleman, which means they’ll have to keep both sides of the ad equation happy. At what point does Disney use the Disney+ advertising as a compass, feeding back into decisions around what content to create?

And would Disney ever stretch beyond its own character lines, to build TV and movies around someone else’s toys?  It’s not too far-fetched of an idea. In The Great Beanie Baby Bubble, author Zac Bisonette points out that:

[A TV show deal] was the kind of product-based programming that was responsible for billions per year in sales and could turn toys that no one wanted into hits through sheer exposure. Lines such as He-Man, My Little Pony, and the ThunderCats had all become hundred-million-dollar brands with the help of the product-based TV shows that accompanied their launches.

Creating content in one side of the businesses while running ads in the other, it’s not unlike running an investment bank and retail bank under one roof: sure, it can lead to all kinds of interesting business opportunities.  It can also lead to trouble.

When it comes to content marketing, you need to strike a balance: you want to create evergreen content, so you can continue to run ads. And when that content is going into the Disney catalog—some of which currently spans multiple generations—it has to be absolutely timeless. Giving in to the whims of a single advertiser, or a single fad, can lead to short-term gains but also short-lived content.

Beyond the Magic Kingdom

Despite those challenges, content marketing has huge potential for generating revenue, preserving privacy, and avoiding future regulation that could hinder targeted advertising. By building this system on BI and content tagging, Disney could do so at a smaller price tag than an AI-based, targeted-ad marketplace.

And this isn’t just a Disney opportunity. I’ve focused on them in this piece but other VOD providers have already seen the benefit in monetizing their catalog. According to Jason Kilar, former CEO of WarnerMedia, “Close to 50% of every new [HBO Max] subscriber is choosing the ad tier. Hulu, the last stat they shared publicly, is they are north of 60%.” Amazon will rename its ad-supported IMDb TV service to Freevee. (I first saw this in Der Spiegel; I’ve since found a US  press release.)  And Netflix, long a holdout in the ad-supported space, hinted at plans for a similar offering.

To be clear, content marketing at this scale is not exactly a get-rich-quick scheme. It works best for groups that already have a large amount of content—video, image, text, audio—that they can monetize. This certainly holds true for the platforms I’ve just mentioned. Maybe it’s also true for your company?

It may require getting creative as you comb through your attic. And maybe there’s an option for a new kind of ad marketplace, one that groups people with a small amount of content into a larger content ecosystem. Sort of like what EthicalAds does for developer documentation. If low-cost, non-invasive content marketing is an option, it can’t hurt to try.

Many thanks to Chris Butler for reviewing an early draft of this article. I always appreciate his insights. The section on the tail wagging the dog was based on his idea and I give him full credit for pointing this out to me.

Categories: Technology

On Technique

O'Reilly Radar - Tue, 2022/08/09 - 04:12

In a previous article, I wrote about how models like DALL-E and Imagen disassociate ideas from technique. In the past, if you had a good idea in any field, you could only realize that idea if you had the craftsmanship and technique to back it up. With DALL-E, that’s no longer true. You can say, “Make me a picture of a lion attacking a horse,” and it will happily generate one. Maybe not as good as the one that hangs in an art museum, but you don’t need to know anything about canvas, paints, and brushes, nor do you need to get your clothes covered with paint.

This raises some important questions, though. What is the connection between expertise and ideation? Does technique help you form ideas? (The Victorian artist William Morris is often quoted as saying “You can’t have art without resistance in the materials,” though he may only have been talking about his hatred of typewriters.) And what kinds of user interfaces will be effective for collaborations between humans and computers, where the computers supply the technique and we supply the ideas? Designing the prompts to get DALL-E to do something extraordinary requires a new kind of technique that’s very different from understanding pigments and brushes. What kinds of creativity does that new technique enable? How are these works different from what came before?

As interesting as it is to talk about art, there’s an area where these questions are more immediate. GitHub Copilot (based on a model named Codex, which is derived from GPT-3) generates code in a number of programming languages, based on comments that the user writes. Going in the other direction, GPT-3 has proven to be surprisingly good at explaining code. Copilot users still need to be programmers; they need to know whether the code that Copilot supplies is correct, and they need to know how to test it. The prompts themselves are really a sort of pseudo-code; even if the programmers don’t need to remember details of the language’s syntax or the names of library functions, they still need to think like programmers. But it’s obvious where this is trending. We need to ask ourselves how much “technique” we will ask of future programmers: in the 2030s or 2040s, will people just be able to tell some future Copilot what they want a program to be? More to the point, what sort of higher-order knowledge will future programmers need? Will they be able to focus more on the nature of what they want to accomplish, and less on the syntactic details of writing code?

It’s easy to imagine a lot of software professionals saying, “Of course you’ll have to know C. Or Java. Or Python. Or Scala.” But I don’t know if that’s true. We’ve been here before. In the 1950s, computers were programmed in machine language. (And before that, with cables and plugs.) It’s hard to imagine now, but the introduction of the first programming languages–Fortran, COBOL, and the like–was met with resistance from programmers who thought you needed to understand the machine. Now almost no one works in machine language or assembler. Machine language is reserved for a few people who need to work on some specialized areas of operating system internals, or who need to write some kinds of embedded systems code.

What would be necessary for another transformation? Tools like Copilot, useful as they may be, are nowhere near ready to take over. What capabilities will they need? At this point, programmers still have to decide whether or not code generated by Copilot is correct. We don’t (generally) have to decide whether the output of a C or Java compiler is correct, nor do we have to worry about whether, given the same source code, the compiler will generate identical output. Copilot doesn’t make that guarantee–and, even if it did, any change to the model (for example, to incorporate new StackOverflow questions or GitHub repositories) would be very likely to change its output. While we can certainly imagine compiling a program from a series of Copilot prompts, I can’t imagine a program that would be likely to stop working if it was recompiled without changes to the source code. Perhaps the only exception would be a library that could be developed once, then tested, verified, and used without modification–but the development process would have to re-start from ground zero whenever a bug or a security vulnerability was found. That wouldn’t be acceptable; we’ve never written programs that don’t have bugs, or that never need new features. A key principle behind much modern software development is minimizing the amount of code that has to change to fix bugs or add features.

It’s easy to think that programming is all about creating new code. It isn’t; one thing that every professional learns quickly is that most of the work goes into maintaining old code. A new generation of programming tools must take that into account, or we’ll be left in a weird situation where a tool like Copilot can be used to write new code, but programmers will still have to understand that code in detail because it can only be maintained by hand. (It is possible–even likely–that we will have AI-based tools that help programmers research software supply chains, discover vulnerabilities, and possibly even suggest fixes.) Writing about AI-generated art, Raphaël Millière says, “No prompt will produce the exact same result twice”; that may be desirable for artwork, but is destructive for programming. Stability and consistency is a requirement for next-generation programming tools; we can’t take a step backwards.

The need for greater stability might drive tools like Copilot from free-form English language prompts to some kind of more formal language. A book about prompt engineering for DALL-E already exists; in a way, that’s trying to reverse-engineer a formal language for generating images. A formal language for prompts is a move back in the direction of traditional programming, though possibly with a difference. Current programming languages are all about describing, step by step, what you want the computer to do in great detail. Over the years, we’ve gradually progressed to higher levels of abstraction. Could building a language model into a compiler facilitate the creation of a simpler language, one in which programmers just described what they wanted to do, and let the machine worry about the implementation, while providing guarantees of stability? Remember that it was possible to build applications with graphical interfaces, and for those applications to communicate about the Internet, before the Web. The Web (and, specifically, HTML) added a new formal language that encapsulated tasks that used to require programming.

Now let’s move up a level or two: from lines of code to functions, modules, libraries, and systems. Everyone I know who has worked with Copilot has said that, while you don’t need to remember the details of the programming libraries you’re using, you have to be even more aware of what you’re trying to accomplish. You have to know what you want to do; you have to have a design in mind. Copilot is good at low-level coding; does a programmer need to be in touch with the craft of low-level coding to think about the high-level design? Up until now that’s certainly been true, but largely out of necessity: you wouldn’t let someone design a large system who hasn’t built smaller systems. It is true (as Dave Thomas and Andy Hunt argued in The Pragmatic Programmer) that knowing different programming languages gives you different tools and approaches for solving problems.  Is the craft of software architecture different from the craft of programming?

We don’t really have a good language for describing software design. Attempts like UML have been partially successful at best. UML was both over- and under-specified, too precise and not precise enough; tools that generated source code scaffolding from UML diagrams exist, but aren’t commonly used these days. The scaffolding defined interfaces, classes, and methods that could then be implemented by programmers. While automatically generating the structure of a system sounds like a good idea, in practice it may have made things more difficult: if the high-level specification changed, so did the scaffolding, obsoleting any work that had been put into implementing with the scaffold. This is similar to the compiler’s stability problem, modulated into a different key. Is this an area where AI could help?

I suspect we still don’t want source code scaffolding, at least as UML envisioned it; that’s bound to change with any significant change in the system’s description. Stability will continue to be a problem. But it might be valuable to have a AI-based design tool that can take a verbal description of a system’s requirements, then generate some kind of design based on a large library of software systems–like Copilot, but at a higher level. Then the problem would be integrating that design with implementations of the design, some of which could be created (or at least suggested) by a system like Copilot. The problem we’re facing is that software development takes place on two levels: high level design and mid-level programming. Integrating the two is a hard problem that hasn’t been solved convincingly.  Can we imagine taking a high-level design, adding our descriptions to it, and going directly from the high-level design with mid-level details to an executable program? That programming environment would need the ability to partition a large project into smaller pieces, so teams of programmers could collaborate. It would need to allow changes to the high-level descriptions, without disrupting work on the objects and methods that implement those descriptions. It would need to be integrated with a version control system that is effective for the English-language descriptions as it is for lines of code. This wouldn’t be thinkable without guarantees of stability.

It was fashionable for a while to talk about programming as “craft.”  I think that fashion has waned, probably for the better; “code as craft” has always seemed a bit precious to me. But the idea of “craft” is still useful: it is important for us to think about how the craft may change, and how fundamental those changes can’t be. It’s clear that we are a long way from a world where only a few specialists need to know languages like C or Java or Python. But it’s also possible that developments like Copilot give us a glimpse of what the next step might be. Lamenting the state of programing tools, which haven’t changed much since the 1960s, Alan Kay wrote on Quora that “the next significant threshold that programming must achieve is for programs and programming systems to have a much deeper understanding of both what they are trying to do, and what they are actually doing.” A new craft of programming that is focused less on syntactic details, and more on understanding what the systems we are building are trying to accomplish, is the goal we should be aiming for.

Categories: Technology

Scaling False Peaks

O'Reilly Radar - Thu, 2022/08/04 - 04:12

Humans are notoriously poor at judging distances. There’s a tendency to underestimate, whether it’s the distance along a straight road with a clear run to the horizon or the distance across a valley. When ascending toward a summit, estimation is further confounded by false summits. What you thought was your goal and end point turns out to be a lower peak or simply a contour that, from lower down, looked like a peak. You thought you made it–or were at least close–but there’s still a long way to go.

The story of AI is a story of punctuated progress, but it is also the story of (many) false summits.

In the 1950s, machine translation of Russian into English was considered to be no more complex than dictionary lookups and templated phrases. Natural language processing has come a very long way since then, having burnt through a good few paradigms to get to something we can use on a daily basis. In the 1960s, Marvin Minsky and Seymour Papert proposed the Summer Vision Project for undergraduates: connect a TV camera to a computer and identify objects in the field of view. Computer vision is now something that is commodified for specific tasks, but it continues to be a work in progress and, worldwide, has taken more than a few summers (and AI winters) and many more than a few undergrads.

We can find many more examples across many more decades that reflect naiveté and optimism and–if we are honest–no small amount of ignorance and hubris. The two general lessons to be learned here are not that machine translation involves more than lookups and that computer vision involves more than edge detection, but that when we are confronted by complex problems in unfamiliar domains, we should be cautious of anything that looks simple at first sight, and that when we have successful solutions to a specific sliver of a complex domain, we should not assume those solutions are generalizable. This kind of humility is likely to deliver more meaningful progress and a more measured understanding of such progress. It is also likely to reduce the number of pundits in the future who mock past predictions and ambitions, along with the recurring irony of machine-learning experts who seem unable to learn from the past trends in their own field.

All of which brings us to DeepMind’s Gato and the claim that the summit of artificial general intelligence (AGI) is within reach. The hard work has been done and reaching AGI is now a simple matter of scaling. At best, this is a false summit on the right path; at worst, it’s a local maximum far from AGI, which lies along a very different route in a different range of architectures and thinking.

DeepMind’s Gato is an AI model that can be taught to carry out many different kinds of tasks based on a single transformer neural network. The 604 tasks Gato was trained on vary from playing Atari video games to chat, from navigating simulated 3D environments to following instructions, from captioning images to real-time, real-world robotics. The achievement of note is that it’s underpinned by a single model trained across all tasks rather than different models for different tasks and modalities. Learning how to ace Space Invaders does not interfere with or displace the ability to carry out a chat conversation.

Gato was intended to “test the hypothesis that training an agent which is generally capable on a large number of tasks is possible; and that this general agent can be adapted with little extra data to succeed at an even larger number of tasks.” In this, it succeeded. But how far can this success be generalized in terms of loftier ambitions? The tweet that provoked a wave of responses (this one included) came from DeepMind’s research director, Nando de Freitas: “It’s all about scale now! The game is over!”

The game in question is the quest for AGI, which is closer to what science fiction and the general public think of as AI than the narrower but applied, task-oriented, statistical approaches that constitute commercial machine learning (ML) in practice.

The claim is that AGI is now simply a matter of improving performance, both in hardware and software, and making models bigger, using more data and more kinds of data across more modes. Sure, there’s research work to be done, but now it’s all about turning the dials up to 11 and beyond and, voilà, we’ll have scaled the north face of the AGI to plant a flag on the summit.

It’s easy to get breathless at altitude.

When we look at other systems and scales, it’s easy to be drawn to superficial similarities in the small and project them into the large. For example, if we look at water swirling down a plughole and then out into the cosmos at spiral galaxies, we see a similar structure. But these spirals are more closely bound in our desire to see connection than they are in physics. In looking at scaling specific AI to AGI, it’s easy to focus on tasks as the basic unit of intelligence and ability. What we know of intelligence and learning systems in nature, however, suggests the relationships between tasks, intelligence, systems, and adaptation is more complex and more subtle. Simply scaling up one dimension of ability may simply scale up one dimension of ability without triggering emergent generalization.

If we look closely at software, society, physics or life, we see that scaling is usually accompanied by fundamental shifts in organizing principle and process. Each scaling of an existing approach is successful up to a point, beyond which a different approach is needed. You can run a small business using office tools, such as spreadsheets, and a social media page. Reaching Amazon-scale is not a matter of bigger spreadsheets and more pages. Large systems have radically different architectures and properties to either the smaller systems they are built from or the simpler systems that came before them.

It may be that artificial general intelligence is a far more significant challenge than taking task-based models and increasing data, speed, and number of tasks. We typically underappreciate how complex such systems are. We divide and simplify, make progress as a result, only to discover, as we push on, that the simplification was just that; a new model, paradigm, architecture, or schedule is needed to make further progress. Rinse and repeat. Put another way, just because you got to basecamp, what makes you think you can make the summit using the same approach? And what if you can’t see the summit? If you don’t know what you’re aiming for, it’s difficult to plot a course to it.

Instead of assuming the answer, we need to ask: How do we define AGI? Is AGI simply task-based AI for N tasks and a sufficiently large value of N? And, even if the answer to that question is yes, is the path to AGI necessarily task-centric? How much of AGI is performance? How much of AGI is big/bigger/biggest data?

When we look at life and existing learning systems, we learn that scale matters, but not in the sense suggested by a simple multiplier. It may well be that the trick to cracking AGI is to be found in scaling–but down rather than up.

Doing more with less looks to be more important than doing more with more. For example, the GPT-3 language model is based on a network of 175 billion parameters. The first version of DALL-E, the prompt-based image generator, used a 12-billion parameter version of GPT-3; the second, improved version used only 3.5 billion parameters. And then there’s Gato, which achieves its multitask, multimodal abilities with only 1.2 billion.

These reductions hint at the direction, but it’s not clear that Gato’s, GPT-3’s or any other contemporary architecture is necessarily the right vehicle to reach the destination. For example, how many training examples does it take to learn something? For biological systems, the answer is, in general, not many; for machine learning, the answer is, in general, very many. GPT-3, for example, developed its language model based on 45TB of text. Over a lifetime, a human reads and hears of the order of a billion words; a child is exposed to ten million or so before starting to talk. Mosquitoes can learn to avoid a particular pesticide after a single non-lethal exposure. When you learn a new game–whether video, sport, board or card–you generally only need to be told the rules and then play, perhaps with a game or two for practice and rule clarification, to make a reasonable go of it. Mastery, of course, takes far more practice and dedication, but general intelligence is not about mastery.

And when we look at the hardware and its needs, consider that while the brain is one of the most power-hungry organs of the human body, it still has a modest power consumption of around 12 watts. Over a life the brain will consume up to 10 MWh; training the GPT-3 language model took an estimated 1 GWh.

When we talk about scaling, the game is only just beginning.

While hardware and data matter, the architectures and processes that support general intelligence may be necessarily quite different to the architectures and processes that underpin current ML systems. Throwing faster hardware and all the world’s data at the problem is likely to see diminishing returns, although that may well let us scale a false summit from which we can see the real one.

Categories: Technology

The Metaverse Is Not a Place

O'Reilly Radar - Tue, 2022/08/02 - 11:38

The metaphors we use to describe new technology constrain how we think about it, and, like an out-of-date map, often lead us astray. So it is with the metaverse. Some people seem to think of it as a kind of real estate, complete with land grabs and the attempt to bring traffic to whatever bit of virtual property they’ve created.

Seen through the lens of the real estate metaphor, the metaverse becomes a natural successor not just to Second Life but to the World Wide Web and to social media feeds, which can be thought of as a set of places (sites) to visit. Virtual Reality headsets will make these places more immersive, we imagine.

But what if, instead of thinking of the metaverse as a set of interconnected virtual places, we think of it as a communications medium? Using this metaphor, we see the metaverse as a continuation of a line that passes through messaging and email to “rendezvous”-type social apps like Zoom, Google Meet, Microsoft Teams, and, for wide broadcast, Twitch + Discord. This is a progression from text to images to video, and from store-and-forward networks to real time (and, for broadcast, “stored time,” which is a useful way of thinking about recorded video), but in each case, the interactions are not place based but happening in the ether between two or more connected people. The occasion is more the point than the place.

In an interview with Lex Fridman, Mark Zuckerberg disclaimed the notion of the metaverse as a place, but in the same sentence described its future in a very place-based way:

A lot of people think that the Metaverse is about a place, but one definition of this is it’s about a time when basically immersive digital worlds become the primary way that we live our lives and spend our time.

Think how much more plausible this statement might be if it read:

A lot of people think that the Metaverse is about a place, but one definition of this is it’s about a time when immersive digital worlds become the primary way that we communicate and share digital experiences.

My personal metaverse prototype moment does not involve VR at all, but Zoom. My wife Jen and I join our friend Sabrina over Zoom each weekday morning to exercise together. Sabrina leads the sessions by sharing her Peloton app, which includes live and recorded exercise videos. Our favorites are the strength training videos with Rad Lopez and the 15-minute abs videos with Robin Arzón. We usually start with Rad and end with Robin, for a vigorous 45-minute workout.

Think about this for a moment: Jen and I are in our home. Sabrina is in hers. Rad and Robin recorded their video tracks from their studios on the other side of the county. Jen and Sabrina and I are there in real time. Rad and Robin are there in stored time. We have joined five people in four different places and three different times into one connected moment and one connected place, “the place between” the participants.

Sabrina also works out on her own on her Peloton bike, and that too has this shared quality, with multiple participants at various “thicknesses” of connection. While Jen and Sabrina and I are “enhancing” the sharing using real-time Zoom video, Sabrina’s “solo” bike workouts use the intrinsic sharing in the Peloton app, which lets participants see real-time stats from others doing the same ride.

This is the true internet—the network of networks, with dynamic interconnections. If the metaverse is to inherit that mantle, it has to have that same quality. Connection.

Hacker News user kibwen put it beautifully when they wrote:

A metaverse involves some kind of shared space and shared experience across a networked medium. Not only is it more than just doing things in VR, a metaverse doesn’t even require VR.

The metaverse as a vector

It’s useful to look at technology trends (lines of technology progression toward the future, and inheritance from the past) as vectors—quantities that can only be fully described by both a magnitude and a direction and that can be summed or multiplied to get a sense of how they might cancel, amplify, or redirect possible pathways to the future.

I wrote about this idea back in 2020, in a piece called “Welcome to the 21st Century,” in the context of using scenario planning to imagine the post-COVID future. It’s worth recapping here:

Once you’ve let loose your imagination, observe the world around you and watch for what scenario planners sometimes call “news from the future”—data points that tell you that the world is trending in the direction of one or another of your imagined scenarios. As with any scatter plot, data points are all over the map, but when you gather enough of them, you can start to see the trend line emerge.…

If you think of trends as vectors, new data points can be seen as extending and thickening the trend lines and showing whether they are accelerating or decelerating. And as you see how trend lines affect each other, or that new ones need to be added, you can continually update your scenarios (or as those familiar with Bayesian statistics might put it, you can revise your priors). This can be a relatively unconscious process. Once you’ve built mental models of the world as it might be, the news that you read will slot into place and either reinforce or dismantle your imagined future.

Here’s how my thinking about the metaverse was formed by “news from the future” accreting around a technology-development vector:

  1. I had a prior belief, going back decades, that the internet is a tool for connection and communication, and that advances along that vector will be important. I’m always looking with soft focus for evidence that the tools for connection and communication are getting richer, trying to understand how they are getting richer and how they are changing society. 
  2. I’ve been looking at VR for years, trying various headsets and experiences, but they are mostly solo and feel more like stand-alone games or if shared, awkward and cartoonish. Then I read a thoughtful piece by my friend Craig Mod in which he noted that while he lives his physical life in a small town in Japan or walking its ancient footpaths, he also has a work life in which he spends time daily with people all over the world. I believe he made the explicit connection to the metaverse, but neither he nor I can find the piece that planted this thought to confirm that. In any case, I think of Craig’s newsletter as where the notion that the metaverse is a continuation of the communications technologies of the internet took hold for me.
  3. I began to see the connection to Zoom when friends started using interesting backgrounds, some of which make them appear other than where they are and others that make clear just where they are. (For example, my friend Hermann uses as a background the beach behind his home in New Zealand, which is more vividly place based than his home office, which could be anywhere.) That then brought my exercise sessions with Sabrina and Jen into focus as part of this evolving story.
  4. I talked to Phil Libin about his brilliant service mmhmm, which makes it easy to create and deliver richer, more interactive presentations over Zoom and similar apps. The speaker literally gets to occupy the space of the presentation. Phil’s presentation on “The Out of Office World” was where it all clicked. He talks about the hierarchy of communication and the tools for modulating it. (IMO this is a must-watch piece for anyone thinking about the future of internet apps. I’m surprised how few people seem to have watched it.)
  1. Trying Supernatural using the Meta Quest 2 headset completed the connection between my experience using Zoom and Peloton for fitness with friends and the VR-dominant framing of the metaverse. Here I was, standing on the edge of one of the lava lakes at Erta Ale in Ethiopia, an astonishing volcano right out of central casting for Mount Doom in The Lord of the Rings, working through warm-up exercises with a video of a fitness instructor green-screened into the scene, before launching into a boxing training game. Coach Susie was present in stored time, just like Robin and Rad. All that was missing was Jen and Sabrina. I’m sure that such shared experiences in remarkable places are very much part of the VR future.

That kind of shared experience is central to Mark Zuckerberg’s vision of socializing in the metaverse.

In that video, Zuck shows off lavishly decorated personal spaces, photorealistic and cartoon avatars, and an online meeting interrupted by a live video call. He says:

It’s a ways off but you can start to see some of the fundamental building blocks take shape. First the feeling of presence. This is the defining quality of the metaverse. You’re going to really feel like you’re there with other people. You’ll see their facial expressions, you’ll see their body language, maybe figure out if they’re actually holding a winning hand—all the subtle ways that we communicate that today’s technology can’t quite deliver.

I totally buy the idea that presence is central. But Meta’s vision seems to miss the mark in its focus on avatars. Embedded video delivers more of that feeling of presence with far less effort on the part of the user than learning to create avatars that mimic our gestures and expressions.

Chris Milk, the CEO of Within, the company that created Supernatural, both agreed and disagreed about avatars when explaining the company’s origin story to me in a phone conversation a few months ago:

What we learned early on was that photorealism matters a lot in terms of establishing presence and human connection. Humans, captured using photorealistic methods like immersive video, allow for a deeper connection between the audience and the people recorded in the immersive VR experience. The audience feels present in the story with them. But it’s super hard to do from a technical standpoint and you give up a bunch of other things. The trade-off is that you can have photorealism but sacrifice interactivity, as the photorealistic humans need to be prerecorded. Alternatively, you can have lots of interactivity and human-to-human communication, but you give up on anyone looking real. In the latter, the humans need to be real-time-rendered avatars, and those, for the moment, don’t look remotely like real humans.

At the same time, Milk pointed out that humans are able to read a lot into even crude avatars, especially when they’re accompanied by real-time communication using voice.

Especially if it’s someone you already know, then the human connection can overcome a lot of missing visual realism. We did an experiment back in 2014 or 2015, probably. Aaron [Koblin, the cofounder of Within] was living in San Francisco, and I was in Los Angeles. We had built a VR prototype where we each had a block for the head and two blocks for our hands. I got into my headset in LA, and Aaron’s blocks were sitting over on the floor across from me as his headset and hand controllers were sitting on his floor in San Francisco. All of a sudden the three blocks jumped up off the ground into the air as he picked up his headset and put it on. The levitating cubes “walked” up to me, waved, and said, “Hey.” Immediately, before I even heard the voice, I recognized the person in those blocks as Aaron. I recognized through the posture and gait the spirit of Aaron in these three cubes moving through space. The resolution, or any shred of photorealism, was completely absent, but the humanity still showed through. And when his voice came out of them, my brain just totally accepted that the soul of Aaron now resides in these three floating cubes. Nothing was awkward about communicating back and forth. My brain just accepted it instantly.

And that’s where we get back to vectors. Understanding the future of photorealism in the metaverse depends on the speed and direction of progress in AI. In many ways, a photorealistic avatar is a kind of deepfake, and we know how computationally expensive their creation is today. How long will it be before the creation of deepfakes is cheap enough and fast enough that hundreds of millions of people can be creating and using them in real time? I suspect it will be a while.

Mmhmm’s blending of video and virtual works really well, using today’s technology. It’s ironic that in Meta’s video about the future, video is only shown on a screen in the virtual space rather than as an integral part of it. Meta could learn a lot from mmhmm.

On the other hand, creating a vast library of immersive 3D still images of amazing places into which either avatars or green-screened video images can be inserted seems much closer to realization. It’s still hard, but the problem is orders of magnitude smaller. The virtual spaces offered by Supernatural and other VR developers give an amazing taste of what’s possible here.

In this regard, an interesting sidenote came from a virtual session that we held earlier this year at the Social Science Foo Camp (an event put together annually by O’Reilly, Meta, and Sage) using the ENGAGE virtual media conferencing app. The group began their discussion in one of the default meeting spaces, but one of the attendees, Adam Flaherty, proposed that they have it in a more appropriate place. They moved to a beautifully rendered version of Oxford’s Bodleian Library, and attendees reported that the entire tenor of the conversation changed.

Two other areas worth thinking about:

  1. Social media evolved from a platform for real-time interaction (real-time status updates, forums, conversations, and groups) to one that’s often dominated by stored-time interaction (posts, stories, reels, et al). Innovation in formats for stored-time communications is at the heart of future social media competition, as TikTok has so forcefully reminded Facebook. There’s a real opportunity for developers and influencers to pioneer new formats as the metaverse unfolds.
  2. Bots are likely to play a big role in the metaverse, just as they do in today’s gaming environments. Will we be able to distinguish bots from humans? Chris Hecker’s indie game SpyParty, prototyped in 2009, made this a central feature of its game play, requiring two human players (one spy and one sniper) to find or evade each other among a party crowded with bots (what game developers call non-player characters or NPCs). Bots and deepfakes are already transforming our social experiences on the internet; expect this to happen on steroids in the metaverse. Some bots will be helpful, but others will be malevolent and disruptive. We will need to tell the difference.
The need for interoperability

There’s one thing that a focus on communications as the heart of the metaverse story reminds us: communication, above all, depends on interoperability. A balkanized metaverse in which a few big providers engage in a winner-takes-all competition to create the Meta- or Apple- or whatever-owned metaverse will take far longer to develop than one that allows developers to create great environments and experiences and connect them bit by bit with the innovations of others. It would be far better if the metaverse were an extension of the internet (“the network of networks”) rather than an attempt to replace it with a walled garden.

Some things that it would be great to have be interoperable:

  • Identity. We should be able to use the digital assets that represent who we are across platforms, apps, and places offered by different companies.
  • Sensors. Smartwatches, rings, and so forth are increasingly being used to collect physiological signals. This technology can be built into VR-specific headsets, but we would do better if it were easily shared between devices from different providers.
  • Places. (Yes, places are part of this after all.) Rather than having a single provider (say Meta) become the ur-repository of photorealistic 360-degree immersive spaces, it would be great to have an interoperability layer that allows their reuse.
  • Bot identification. Might NFTs end up becoming the basis for a nonrepudiable form of identity that must be produced by both humans and bots? (I suspect we can only force bots to identify themselves as such if we also require humans to do so.)
Foundations of the metaverse

You can continue this exercise by thinking about the metaverse as the combination of multiple technology trend vectors progressing at different speeds and coming from different directions, and pushing the overall vector forward (or backward) accordingly. No new technology is the product of a single vector.

So rather than settling on just “the metaverse is a communications medium,” think about the various technology vectors besides real-time communications that are coming together in the current moment. What news from the future might we be looking for?

  • Virtual Reality/Augmented Reality. Lighter and less obtrusive headsets. Advances in 3D video recording. Advances in sensors, including eye-tracking, expression recognition, physiological monitoring, even brain-control interfaces. Entrepreneurial innovations in the balance between AR and VR. (Why do we think of them as mutually exclusive rather than on a continuum?)
  • Social media. Innovations in connections between influencers and fans. How does stored time become more real time?
  • Gaming. Richer integration between games and communications. What’s the next Twitch + Discord?
  • AI. Not just deepfakes but the proliferation of AIs and bots as participants in social media and other communications. NPCs becoming a routine part of our online experience outside of gaming. Standards for identification of bots versus humans in online communities.
  • Cryptocurrencies and “Web3.” Does crypto/Web3 provide new business models for the metaverse? (BTW, I enjoyed the way that Neal Stephenson, in Reamde, had his character design the business model and money flows for his online game before he designed anything else. Many startups just try to get users and assume the business model will follow, but that has led us down the dead end of advertising and surveillance capitalism.)
  • Identity. Most of today’s identity systems are centralized in one way or another, with identity supplied by a trusted provider or verifier. Web3 proponents, however, are exploring a variety of systems for decentralized “self-sovereign identity,” including Vitalik Buterin’s “soulbound tokens.” The vulnerability of crypto systems to Sybil attacks in the absence of verifiable identity is driving a lot of innovation in the identity space. Molly White’s skeptical survey of these various initiatives is a great overview of the problem and the difficulties in overcoming it. Gordon Brander’s “Soulbinding Like A State,” a riff on Molly White’s post and James C. Scott’s Seeing Like A State, provides a further warning: “Scott’s framework reveals…that the dangers of legibility are not related to the sovereignty of an ID. There are many reasons self-sovereignty is valuable, but the function of a self-sovereign identity is still to make the bearer legible. What’s measured gets managed. What’s legible gets controlled.” As is often the case, no perfect solution will be found, but society will adopt an imperfect solution by making trade-offs that are odious to some, very profitable to others, and that the great mass of users will passively accept.

There’s a lot more we ought to be watching. I’d love your thoughts in the comments.

Categories: Technology

Radar Trends to Watch: August 2022

O'Reilly Radar - Tue, 2022/08/02 - 04:18

The large model train keeps rolling on. This month, we’ve seen the release of Bloom, an open, large language model developed by the BigScience collaboration, the first public access to DALL-E (along with a guide to prompt engineering), a Copilot-like model for generating regular expressions from English-language prompts, and Simon Willison’s experiments using GPT-3 to explain JavaScript code.

On other fronts, NIST has released the first proposed standard for post-quantum cryptography (i.e., cryptography that can’t be broken by quantum computers). CRISPR has been used in human trials to re-engineer a patient’s DNA to reduce cholesterol. And a surprising number of cities are paying high tech remote workers to move there.

Artificial Intelligence
  • Regardless of where a company is based, to avoid legal problems later, it’s a good idea to build AI and other data-based systems that observe the EU’s data laws.
  • Public (beta) access to DALL-E is beginning! It might take a while to get in because there are over a million on the waitlist. Accepted users get 50 free credits the first month, 15/month thereafter; a credit allows you to give one prompt, which returns 4 images. Users can buy additional credits.
  • Researchers have used reinforcement learning to build a robotic dog that learns to walk on its own in the real world (i.e., without prior training and use of a simulator).
  • Princeton held a workshop on the reproducibility crisis that the use of machine learning is causing in science. Evaluating the accuracy of results from machine learning is a problem that most scientific disciplines aren’t yet equipped to deal with.
  • Microsoft has revised its Responsible AI standard, making recommendations more concrete, particularly in the areas of accountability, transparency, fairness, safety, privacy, and inclusiveness. Microsoft also provides tools and resources to help developers build responsible AI systems.
  • The Dallery Gallery has published a Prompt Engineering Guide to DALL-E. (DALL-E is maintaining a waitlist for free trial accounts.)
  • Simon Willison has successfully used GPT-3 to explain how code works. It is amazingly good and, as Simon pointed out, works both on code that he understands, and code that he doesn’t.
  • Bloom, the open and transparent large language model developed by the BigScience group, is finished!  You can try it out, download it, and read its specifications. Unlike all other large language models, Bloom was developed in public, and is open to the public.
  • Radiologists outperform AI systems operating by themselves at detecting breast cancer from mammograms. However, a system designed to collaborate with radiologists in making decisions is better than either radiologists or AI alone. (The big question is whether these results hold up when taken to other hospitals.)
  • You liked Copilot? Try Autoregex: GPT-3 to generate regular expressions from natural language descriptions.
  • No Language Left Behind (NLLB) is a Meta AI project that translates text directly between any pair of over 200 languages. Benchmarks, training code, and models are all open source.
  • Democratic AI is an experiment in human-in-the-loop design that enables an AI system to design a social mechanism with human collaboration.
  • The Allen Institute, Microsoft, and others have developed a tool to measure the energy use and emissions generated by training AI models on Azure. They have found that emissions can be reduced substantially by training during periods when renewable power is at its peak.
  • Minerva is a large language model that Google has trained to solve quantitative reasoning (i.e., mathematics) problems, generating simple proofs in addition to answers. The problem domain extends through pre-calculus, including algebra and geometry, roughly at a high school level. Minerva has also been trained and tested in chemistry and physics.
  • Perhaps the scariest exploit in security would be a rootkit that cannot be detected or removed, even by wiping the disk and reinstalling the operating system. Such rootkits were recently discovered (one is named CosmicStrand); they have apparently been in the wild since 2016.
  • AWS is offering some customers a free multi factor authentication (MFA) security key.
  • Lost passwords are an important attack vector for industrial systems. A system is installed; the default password is changed; the person who changed the password leaves; the password is lost; the company installs password recovery software, which is often malware-infested, to recover the password.
  • A new technique for browser de-anonymization is based on correlating users’ activities on different websites.
  • Ransomware companies are now using search engines to allow their users to search the data they have stolen.
  • Ransomware doesn’t get as much attention in the news as it did last year, but in the past week one ransomware operation has shut down and released its decryptors, and two new ones (RedAlert and omega) have started.
  • Apple has added “lockdown mode” to iOS.  Lockdown mode provides an extreme degree of privacy; it is intended for people who believe they are being targeted by state-sponsored mercenary spyware.
  • The Open Source Security Mobilization Plan is an initiative that aims to address major areas of open source security, including education, risk assessment, digital signatures, memory safety, incident response, and software supply chain management.
  • Mitre has released their annual list of the 25 most dangerous software weaknesses (bugs, flaws, vulnerabilities).
  • Patches for the Log4J vulnerability were released back in February, 2022, but many organizations have not applied them, and remain vulnerable to attack.
  • Microsoft and Oracle have announced Oracle Data Service, which allows applications running on Azure to manage and use data in Oracle’s cloud. It’s a multicloud strategy that’s enabled by the cloud providers.
  • Google has announced a new programming language, Carbon, that is intended to be the successor to C++. One goal is complete interoperability between Carbon and existing C++ code and libraries.
  • How to save money on AWS Lambda: watch your memory!  Don’t over-allocate memory. This probably only applies to a few of your functions, but those functions are what drive the cost up.
  • SocialCyber is a DARPA program to understand the internals of open source software, along with the communities that create the software. They plan to use machine learning heavily, both to understand the code and to map and analyze communications within the communities. They are concerned about potential vulnerabilities in the software that the US military depends on.
  • WebAssembly in the cloud? Maybe it isn’t just a client-side technology. As language support grows, so do the kinds of applications Wasm can support.
  • A surveyreports that 62% of its respondents were only “somewhat confident” that open source software was “secure, up-to-date, and well-maintained.”  Disappointing as this may be, it’s actually an improvement over prior results.
  • Is low-code infrastructure as code the future of cloud operations?
  • Tiny Core Linux is amazingly small: a 22MB download, and runs in 48MB of RAM. As a consequence, it’s also amazingly fast. With a few exceptions, making things small has not been a trend over the past few years. We hope to see more of this.
  • Yet another JavaScript web framework? Fresh does server-side rendering, and is based on Deno rather than NodeJS.
  • Facebook is considering whether to rescind its bans on health misinformation. The pandemic is over, after all. Except that it isn’t. However, being a conduit for health misinformation is clearly profitable.
  • Priority Hints are a way for web developers to tell the browser which parts of the page are most important, so that they can be rendered quickly. They are currently supported by the Chrome and Edge browsers.
  • Hotwire, HTMX, and Unpoly are frameworks for building complex web applications while minimizing the need for complex Javascript. Are they an alternative to heavyweight JavaScript frameworks like React? Could a return to server-side web applications lead to a resurgence of platforms like Ruby on Rails?
  • Facebook has started encrypting the portions of URLs that are used to track users, preventing the Firefox and Brave browsers from stripping the tracking portion of the URL.
  • A priori censorship?  A popular cloud-based word processor in China has been observed censoring content upon the creation of a link for sharing the content. The document is locked; it cannot be edited or even opened by the author.
  • The Pirate Library Mirror is exactly what it says: a mirror of libraries of pirated books. It is focused on the preservation of human knowledge. There is no search engine, and it is only accessible by using BitTorrent over TOR.
  • Minecraft has decided that they will not “support or allow” the integration of NFTs into their virtual worlds. They object to “digital ownership based on scarcity and exclusion.”
  • Mixers are cryptocurrency services that randomize the currency you use; rather than pay with your own coin, you deposit money in a mixer and pay with randomly selected coins from other users. It’s similar to a traditional bank in that you never withdraw the same money you deposited.
  • So much for privacy. Coinbase, one of the largest cryptocurrency exchanges, sells geolocation data to ICE (the US Immigration and Customs Enforcement agency).
Quantum Computing
  • Quantum computers aren’t limited to binary: That limit is imposed by analogy to traditional computers, but some quantum computers have access to more state, and taking advantage of those states may make applications like simulating physical or biological systems easier.
  • Is quantum-aided computing for some industrial applications just around the corner? IonQ and GE have announced a results from a hybrid system for risk management. The quantum computer does random sampling from probability distributions, which are computationally expensive for classical computers; the rest of the computation is classical.
  • Quantum networking is becoming real: researchers have created entangled qubits via a 33-mile fiber optic connection. In addition to their importance for secure communications, quantum networks may be a crucial step in building quantum computers at scale.
  • NIST has announced four candidate algorithms for post-quantum cryptography. While it may be years before quantum computing can break current algorithms, many organizations are anxious to start the transition from current algorithms.
  • Not long ago (2020), DeepMind released AlphaFold, which used AI to solve protein folding problems. In 2021, they announced a public database containing the structure of a million proteins. With their latest additions, that database now contains the structure of over 200 million proteins, almost every protein known to science.
  • A motor made of DNA!  This nanoscale motor uses ideas from origami to fold DNA in a way that causes it to rotate when an electrical field is applied.
  • An electrode has been implanted into the brain of an ALS patient that will allow them to communicate thoughts via computer. The patient has otherwise lost the ability to move or speak.
  • Genetic editing with CRISPR was tested in a human to permanently lower LDL (“bad cholesterol”) levels. If this works, it could make heart attacks much rarer, and could be the first widespread use of CRISPR in humans.
Energy Work
  • Some cities (largely in the US South and Midwest) are giving cash bonuses to tech workers who are willing to move there and work remotely.
  • The FBI is warning employers that they are seeing an increasing number of fraudulent applications for remote work in which the application uses stolen personal information and deepfake imagery.
Categories: Technology
Subscribe to LuftHans aggregator