Checking Jeff Bezos’s Math

O'Reilly Radar - Fri, 2021/04/23 - 13:43

“If you want to be successful in business (in life, actually), you have to create more than you consume. Your goal should be to create value for everyone you interact with. Any business that doesn’t create value for those it touches, even if it appears successful on the surface, isn’t long for this world. It’s on the way out.” So wrote Jeff Bezos in his final letter to shareholders, released last week. It’s a great sentiment, one I heartily agree with and wish that more companies embraced. But how well does he practice what he preaches? And why is practicing this so hard by the rules of today’s economy?

Jeff started out by acknowledging the wealth that Amazon has created for shareholders—$1.6 trillion is the number he cites in the second paragraph. That’s Amazon’s current market capitalization. Jeff himself now owns only about 11% of Amazon stock, and that’s enough to make him the richest person in the world. But while his Amazon stock is worth over $160 billion, that means that over $1.4 trillion is owned by others.

“I’m proud of the wealth we’ve created for shareowners,” Jeff continued. “It’s significant, and it improves their lives. But I also know something else: it’s not the largest part of the value we’ve created.” That’s when he went on to make the statement with which I opened this essay. He went on from there to calculate the value created for employees, third-party merchants, and Amazon customers, as well as to explain the company’s Climate Pledge.

Jeff’s embrace of stakeholder capitalism is meaningful and important. Ever since Milton Friedman penned the 1970 op-ed in which he argued that “the social responsibility of business is to increase its profits,” other constituencies—workers, suppliers, society at large, and even customers—have too often been sacrificed on the altar of shareholder value. Today’s economy, rife with inequality, is the result.

While I applaud the goal of understanding “who gets what and why” (which in many ways is the central question of economics), I struggle a bit with Jeff’s math. Let’s walk through those of his assertions that deserve deeper scrutiny.

How much went to shareholders?

“Our net income in 2020 was $21.3 billion. If, instead of being a publicly traded company with thousands of owners, Amazon were a sole proprietorship with a single owner, that’s how much the owner would have earned in 2020.”

Writing in The Information, Martin Peers made what seems to be an obvious catch: “Instead of calculating value by looking at the increase in Amazon’s market cap last year—$679 billion—Bezos uses the company’s net income of $21 billion. That hides the fact that shareholders got the most value out of Amazon last year, far more than any other group.”

But while Peers has put his finger on an important point, he is wrong. The amount earned by shareholders from Amazon is indeed only the company’s $21.3 billion net income. The difference between that number and the $679 billion increase in market cap didn’t come from Amazon. It came from “the market,” that is from other people trading Amazon’s stock and placing bets on its future value. Understanding this difference is crucial because it undercuts so many facile criticisms of Jeff Bezos’s wealth, in which he is pictured as a robber baron hoarding the wealth accumulated from his company at the expense of his employees.

The fact that Jeff is the world’s richest person makes him an easy target. What we really need to come to grips with is the way that our financial system has been hijacked to make the rich richer. Low interest rates, meant to prop up business investment and hiring, have instead been diverted to driving up the price of stocks beyond reasonable bounds. Surging corporate profits have been used not to fuel hiring or building new factories or bringing new products to market, but on stock buybacks designed to artificially boost the price of stocks. The state of “the market” has become a very bad proxy for prosperity. Those lucky enough to own stocks are enjoying boom times; those who do not are left out in the cold.

Financial markets, in effect, give owners of stocks the value of future earnings and cash flow today—in Amazon’s case, about 79 years’ worth. But that’s nothing. Elon Musk is the world’s second-richest person because the market values Tesla at over 1,000 years of its present earnings!

The genius of this system is that it allows investors and entrepreneurs to bet on the future, bootstrapping companies like Amazon and Tesla long before they are able to demonstrate their worth. But once a company has become established, it often no longer needs money from investors. Someone who buys a share of a hugely profitable company like Apple, Amazon, Google, Facebook, or Microsoft isn’t investing in that company. They are simply betting on the future of its stock price, with the profits and losses coming from others around the gaming table.

In my 2017 book, WTF?: What’s the Future and Why It’s Up to Us, I wrote a chapter on this betting economy, which I called “supermoney” after the brilliant 1972 book with that title by finance writer George Goodman (alias Adam Smith). Stock prices are not the only form of supermoney. Real estate is another. Both are rife with what economists call “rents”—that is, income that comes not from what you do but from what you own. And government policy seems designed to prop up the rentier class at the expense of job creation and real investment. Until we come to grips with this two-track economy, we will never tame inequality.

The fact that in the second paragraph of his letter Jeff cites Amazon’s market cap as the value created for shareholders but uses the company’s net income when comparing gains by shareholders to those received by other stakeholders is a kind of sleight of hand. Because of course corporate profits—especially the prospect of growth of corporate profits—and market capitalization are related. If Amazon gets $79 of market cap for every dollar of profit (which is what that price-earnings ratio of 79 means), then if Amazon were to raise wages for employees or give a better deal to its third-party merchants (many of them small businesses), that would lower its profits, and presumably its market cap, by an enormous ratio.
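The arithmetic behind that sensitivity is easy to make concrete. A back-of-the-envelope sketch, under the simplifying assumption that the market applies the same multiple to every marginal dollar of profit:

```python
# Naive sketch of the market-cap sensitivity described above.
# Simplifying assumption: the market applies a constant price-earnings
# multiple to every marginal dollar of profit.

pe_ratio = 79        # Amazon's approximate price-earnings ratio
profit_shift = 1e9   # $1B of profit redirected to employees or sellers

cap_impact = profit_shift * pe_ratio
print(f"${profit_shift / 1e9:.0f}B less profit -> roughly ${cap_impact / 1e9:.0f}B less market cap")
```

Under that (admittedly crude) assumption, every billion dollars shifted to other stakeholders costs shareholders on the order of $79 billion of market value.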

Every dollar given up to these other groups isn’t just a dollar out of the pocket of shareholders. It is many times that. This of course does provide a very powerful incentive for public companies to squeeze these other parties for every last dollar of profit, encouraging lower wages, outsourcing to eliminate benefits, and many other ills that contribute to our two-tier economy. It may not be Amazon’s motivation—Jeff has always been a long-term thinker and was able to persuade financial markets to go along for the ride even when the company’s profits were small—but it is most certainly the motivation for much of the extractive behavior by many companies today. The pressure to increase earnings and keep stock prices high is enormous.

These issues are complex and difficult. Stock prices are reflexive, as financier George Soros likes to observe. That is, they are based on what people believe about the future. Amazon’s current stock price is based on the collective belief that its profits will be even higher in future. Were people to believe instead that they would be meaningfully lower, the valuation might fall precipitously. To understand the role of expectations of future increases in earnings and cash flow, you have only to compare Amazon with Apple. Apple’s profits are three times Amazon’s and free cash flow four times, yet it is valued at only 36 times earnings and has a market capitalization less than 50% higher than Amazon. As expectations and reality converge, multiples tend to come down.
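That comparison can be checked with simple arithmetic, working in units of Amazon’s annual earnings (the figures are the approximations given above):

```python
# Sanity check of the Apple/Amazon comparison, in units of Amazon's earnings.
amazon_earnings = 1.0
amazon_pe = 79                        # multiple cited earlier for Amazon
apple_earnings = 3 * amazon_earnings  # "Apple's profits are three times Amazon's"
apple_pe = 36

amazon_cap = amazon_earnings * amazon_pe  # 79 units
apple_cap = apple_earnings * apple_pe     # 108 units

premium = apple_cap / amazon_cap - 1
print(f"Apple's market cap is about {premium:.0%} higher")  # under 50%, as stated
```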

How did Amazon’s third-party sellers fare?

“[We] estimate that, in 2020, third-party seller profits from selling on Amazon were between $25 billion and $39 billion, and to be conservative here I’ll go with $25 billion.”

That sounds pretty impressive, but how much of a profit margin is it really?

Amazon doesn’t explicitly disclose the gross merchandise volume of those third-party sellers, but there is enough information in the letter and in the company’s 2020 annual report to make a back-of-the-napkin estimate. The letter says that Amazon’s third-party sales represent “close to 60%” of its online sales. If the 40% delivered by Amazon’s first-party sales come out to $197 billion, that would imply that sales in the third-party marketplace were almost $300 billion. $25 to $39 billion in profit on $300 billion works out to a profit margin between 8% and 13%.
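Here is that back-of-the-napkin calculation spelled out; the 40%/60% split and the $197 billion first-party figure are the estimates from the letter and annual report:

```python
# Estimating third-party gross merchandise volume (GMV) and the implied
# margin range from the figures cited above.

first_party_sales = 197e9  # Amazon's first-party online sales
first_party_share = 0.40   # the letter says third party is "close to 60%"

total_online_sales = first_party_sales / first_party_share
third_party_gmv = total_online_sales * (1 - first_party_share)
print(f"Estimated third-party GMV: ${third_party_gmv / 1e9:.0f}B")

# Implied margins on the $25B-$39B seller-profit estimate:
for profit in (25e9, 39e9):
    print(f"${profit / 1e9:.0f}B profit -> {profit / third_party_gmv:.0%} margin")
```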

But is Amazon calculating operating income, EBITDA, or net income? “Profit” could refer to any of the three, yet they have very different values.

Let’s generously assume that Amazon is calculating net income. In that case, small retailers and manufacturers selling on Amazon are doing quite well, since net income from US retailers’ and manufacturers’ overall operations is typically between 5% and 8%. Without knowing which profit number Amazon’s team is estimating, though, and the methodology they use to arrive at it, it is difficult to be sure whether these numbers are better or worse than what these sellers achieve through other channels.

One question that’s also worth asking is whether selling on Amazon in 2020 was more or less profitable than it was in 2019. While Amazon didn’t report a profit number for its third-party sellers in 2019, it did report how much its sellers paid for the services Amazon provided to them. In 2019, that number was about $53.8 billion; in 2020, it was $80.5 billion, a growth rate of roughly 50%. Net of these fees (income to Amazon but a cost to sellers), we estimate that seller revenue grew 44%. Since fees appear to be growing faster than revenues, that would suggest that in 2020, Amazon took a larger share of the pie and sellers got less. Of course, without clearer information from Amazon, it is difficult to tell for sure.
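The growth comparison is simple to verify from the figures above:

```python
# Seller fees grew faster than estimated seller revenue from 2019 to 2020.
fees_2019 = 53.8e9   # seller payments for Amazon services, 2019
fees_2020 = 80.5e9   # same figure for 2020

fee_growth = fees_2020 / fees_2019 - 1
revenue_growth = 0.44  # estimated growth in seller revenue, net of fees

print(f"Fee growth: {fee_growth:.0%} vs. seller revenue growth: {revenue_growth:.0%}")
assert fee_growth > revenue_growth  # Amazon's share of the pie grew
```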

Meanwhile, Amazon took in another $21.5 billion in “other income,” which is primarily from advertising by sellers on Amazon’s platform. That grew by 52% from 2019’s $14 billion, again suggesting that Amazon’s share of the net is growing. And unlike some forms of advertising that bring in new customers, much of Amazon’s ad business represents a zero-sum competition between merchants bidding for top position, a position that in Amazon’s earlier years was granted on the basis of factors such as price, popularity, and user ratings.

How about employees?

“In 2020, employees earned $80 billion, plus another $11 billion to include benefits and various payroll taxes, for a total of $91 billion.”

There’s no question that the $91 billion that Amazon paid out in wages and benefits in 2020 is meaningful. Some of those employees were very well compensated, others not so well, but all of them have jobs. Amazon is now one of the largest employers in the country. It is an exception to the tech industry in that it creates a large number of jobs, and not just high-end professional jobs, and that some of the jobs it creates are in locations where work is scarce.

That being said, Jeff’s description of the amount earned by employees is misleading. In every other case, he makes an effort to estimate the profit earned by a particular group. For employees, he treats the gross earnings of employees as if it were profit, writing, “If each group had an income statement representing their interactions with Amazon, the numbers above would be the ‘bottom lines’ from those income statements.”

No, Jeff, employee earnings are their top line. Just as a company has gross income before expenses, so do employees. The bottom line is what’s left over after all those expenses have been met. And for many of Amazon’s lower-paid employees—as is the case for lower-paid workers all over the modern economy—that true bottom line is negative, that is, less than they need to survive. Like workers at other giant profitable companies like Walmart and McDonald’s, a significant fraction of Amazon warehouse employees require government assistance. So, in effect, taxpayers are subsidizing Amazon, because the share of the enterprise’s profits allocated to its lowest-paid employees was not enough for them to pay their bills.

That points to a major omission from the list of Amazon’s stakeholders: society at large. How does Amazon do when it comes to paying its fair share? According to a 2019 study, Amazon was the “worst offender” among a rogues’ gallery of high-tech companies that use aggressive tax avoidance strategies. “Fair Tax Mark said this means Amazon’s effective tax rate was 12.7% over the decade when the headline tax rate in the US has been 35% for most of that period.” In 2020, Amazon made provision for taxes of $2.863 billion on pretax income of $24.178 billion, or about 11.8%. This may be legal, but it isn’t right.
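The effective-rate arithmetic from those annual report figures:

```python
# Amazon's 2020 effective tax rate, computed from the figures cited above.
tax_provision = 2.863e9    # provision for income taxes
pretax_income = 24.178e9   # pretax income

effective_rate = tax_provision / pretax_income
print(f"Effective tax rate: {effective_rate:.1%}")  # about 11.8%
```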

Amazon is clearly moving in the right direction with employees. It introduced a $15 minimum wage in 2018, ahead of many of its peers. And given the genius of the company, the commitment to workplace safety and other initiatives to make Amazon a better employer that Jeff highlighted in his letter are likely to have a big payoff. When Amazon sets out to do something, it usually invents and learns a great deal along the way.

“We have always wanted to be Earth’s Most Customer-Centric Company,” Jeff wrote. “We won’t change that. It’s what got us here. But I am committing us to an addition. We are going to be Earth’s Best Employer and Earth’s Safest Place to Work. In my upcoming role as Executive Chair, I’m going to focus on new initiatives. I’m an inventor. It’s what I enjoy the most and what I do best. It’s where I create the most value….We have never failed when we set our minds to something, and we’re not going to fail at this either.”

I find that an extremely heartening statement. At Amazon’s current stage of development, it has the opportunity, and is beginning to make a commitment, to put its remarkable capabilities to work on new challenges.

Stakeholder value means solving multiple equations simultaneously

I was very taken with Jeff’s statement that “if any shareowners are concerned that Earth’s Best Employer and Earth’s Safest Place to Work might dilute our focus on Earth’s Most Customer-Centric Company, let me set your mind at ease. Think of it this way. If we can operate two businesses as different as consumer ecommerce and AWS, and do both at the highest level, we can certainly do the same with these two vision statements. In fact, I’m confident they will reinforce each other.”

One of my criticisms of today’s financial-market-driven economy is that by focusing on a single objective, it misses the great opportunity of today’s technology, summed up by Paul Cohen, the former DARPA program manager for AI and now a professor at the University of Pittsburgh, when he said, “The opportunity of AI is to help humans model and manage complex interacting systems.” If any company has the skills to do that, I suspect it will be Amazon. And as Jeff wrote elsewhere in his letter, “When we lead, others follow.”

Amazon is also considering environmental impact. “Not long ago, most people believed that it would be good to address climate change, but they also thought it would cost a lot and would threaten jobs, competitiveness, and economic growth. We now know better,” Jeff wrote. “Smart action on climate change will not only stop bad things from happening, it will also make our economy more efficient, help drive technological change, and reduce risks. Combined, these can lead to more and better jobs, healthier and happier children, more productive workers, and a more prosperous future.” Amen to that!

In short, despite my questions and criticisms, there is a great deal to like about the directions Jeff set forth for Amazon in his final shareholder letter. In addition to the commitment to work more deeply on behalf of other stakeholders beyond customers and shareholders, I was taken with his concluding advice to the company: “The world will always try to make Amazon more typical—to bring us into equilibrium with our environment. It will take continuous effort, but we can and must be better than that.”

It is in the spirit of that aspiration that I offer the critiques found in this essay.


AI Adoption in the Enterprise 2021

O'Reilly Radar - Mon, 2021/04/19 - 05:20

During the first weeks of February, we asked recipients of our Data and AI Newsletters to participate in a survey on AI adoption in the enterprise. We were interested in answering two questions. First, we wanted to understand how the use of AI grew in the past year. We were also interested in the practice of AI: how developers work, what techniques and tools they use, what their concerns are, and what development practices are in place.

The most striking result is the sheer number of respondents. In our 2020 survey, which reached the same audience, we had 1,239 responses. This year, we had a total of 5,154. After eliminating 1,580 respondents who didn’t complete the survey, we’re left with 3,574 responses—almost three times as many as last year. It’s possible that pandemic-induced boredom led more people to respond, but we doubt it. Whether they’re putting products into production or just kicking the tires, more people are using AI than ever before.

Executive Summary

  • We had almost three times as many responses as last year, with similar efforts at promotion. More people are working with AI.
  • In the past, company culture has been the most significant barrier to AI adoption. While it’s still an issue, culture has dropped to fourth place.
  • This year, the most significant barrier to AI adoption is the lack of skilled people and the difficulty of hiring. That shortage has been predicted for several years; we’re finally seeing it.
  • The second-most significant barrier was the availability of quality data. That realization is a sign that the field is growing up.
  • The percentage of respondents reporting “mature” practices has been roughly the same for the last few years. That isn’t surprising, given the increase in the number of respondents: we suspect many organizations are just beginning their AI projects.
  • The retail industry sector has the highest percentage of mature practices; education has the lowest. But education also had the highest percentage of respondents who were “considering” AI.
  • Relatively few respondents are using version control for data and models. Tools for versioning data and models are still immature, but they’re critical for making AI results reproducible and reliable.

Of the 3,574 respondents who completed this year’s survey, 3,099 were working with AI in some way: considering it, evaluating it, or putting products into production. Of these respondents, it’s not a surprise that the largest number are based in the United States (39%) and that roughly half were from North America (47%). India had the second-most respondents (7%), while Asia (including India) had 16% of the total. Australia and New Zealand accounted for 3% of the total, giving the Asia-Pacific (APAC) region 19%. A little over a quarter (26%) of respondents were from Europe, led by Germany (4%). 7% of the respondents were from South America, and 2% were from Africa. Every continent except Antarctica had respondents, and a total of 111 countries were represented. These results show that interest in and use of AI are worldwide and growing.

This year’s results match last year’s data well. But it’s equally important to notice what the data doesn’t say. Only 0.2% of the respondents said they were from China. That clearly doesn’t reflect reality; China is a leader in AI and probably has more AI developers than any other nation, including the US. Likewise, 1% of the respondents were from Russia. Purely as a guess, we suspect that the number of AI developers in Russia is slightly smaller than the number in the US. These anomalies say much more about who the survey reached (subscribers to O’Reilly’s newsletters) than they say about the actual number of AI developers in Russia and China.

Figure 1. Respondents working with AI by country (top 12)

The respondents represented a diverse range of industries. Not surprisingly, computers, electronics, and technology topped the charts, with 17% of the respondents. Financial services (15%), healthcare (9%), and education (8%) are the industries making the next-most significant use of AI. We see relatively little use of AI in the pharmaceutical and chemical industries (2%), though we expect that to change sharply given the role of AI in developing the COVID-19 vaccine. Likewise, we see few respondents from the automotive industry (2%), though we know that AI is key to new products such as autonomous vehicles.

3% of the respondents were from the energy industry, and another 1% from public utilities (which includes part of the energy sector). That’s a respectable number by itself, but we have to ask: Will AI play a role in rebuilding our energy infrastructure, whose frailty and obsolescence the events of the last few years—not just the Texas freeze or the California fires—have demonstrated? We expect that it will, though it’s fair to ask whether AI systems trained on normative data will be robust in the face of “black swan” events. What will an AI system do when faced with a rare situation, one that isn’t well-represented in its training data? That, after all, is the problem facing the developers of autonomous vehicles. Driving a car safely is easy when the other traffic and pedestrians all play by the rules. It’s only difficult when something unexpected happens. The same is true of the electrical grid.

We also expect AI to reshape agriculture (1% of respondents). As with energy, AI-driven changes won’t come quickly. However, we’ve seen a steady stream of AI projects in agriculture, with goals ranging from detecting crop disease to killing moths with small drones.

Finally, 8% of respondents said that their industry was “Other,” and 14% were grouped into “All Others.” “All Others” combines 12 industries that the survey listed as possible responses (including automotive, pharmaceutical and chemical, and agriculture) but that didn’t have enough responses to show in the chart. “Other” is the wild card, comprising industries we didn’t list as options. “Other” appears in the fourth position, just behind healthcare. Unfortunately, we don’t know which industries are represented by that category—but it shows that the spread of AI has indeed become broad!

Figure 2. Industries using AI

Maturity

Roughly one quarter of the respondents described their use of AI as “mature” (26%), meaning that they had revenue-bearing AI products in production. This is almost exactly in line with the results from 2020, when 25% of the respondents reported that they had products in production (“mature” wasn’t a possible response in the 2020 survey).

This year, 35% of our respondents were “evaluating” AI (trials and proof-of-concept projects), also roughly the same as last year (33%). 13% of the respondents weren’t making use of AI or considering using it; this is down from last year’s number (15%), but again, it’s not significantly different.

What do we make of the respondents who are “considering” AI but haven’t yet started any projects (26%)? That wasn’t an option last year’s respondents had. We suspect that last year respondents who were considering AI said they were either “evaluating” or “not using” it.

Figure 3. AI practice maturity

Looking at the problems respondents faced in AI adoption provides another way to gauge the overall maturity of AI as a field. Last year, the major bottleneck holding back adoption was company culture (22%), followed by the difficulty of identifying appropriate use cases (20%). This year, cultural problems are in fourth place (14%) and finding appropriate use cases is in third (17%). That’s a very significant change, particularly for corporate culture. Companies have accepted AI to a much greater degree, although finding appropriate problems to solve still remains a challenge.

The biggest problems in this year’s survey are lack of skilled people and difficulty in hiring (19%) and data quality (18%). It’s no surprise that the demand for AI expertise has exceeded the supply, but it’s important to realize that it’s now become the biggest bar to wider adoption. The biggest skills gaps were ML modelers and data scientists (52%), understanding business use cases (49%), and data engineering (42%). The need for people managing and maintaining computing infrastructure was comparatively low (24%), hinting that companies are solving their infrastructure requirements in the cloud.

It’s gratifying to note that organizations are starting to realize the importance of data quality (18%). We’ve known about “garbage in, garbage out” for a long time; that goes double for AI. Bad data yields bad results at scale.

Hyperparameter tuning (2%) wasn’t considered a problem. It’s at the bottom of the list—where, we hope, it belongs. That may reflect the success of automated tools for building models (AutoML, although as we’ll see later, most respondents aren’t using them). It’s more concerning that workflow reproducibility (3%) is in second-to-last place. That low ranking makes sense given that we don’t see heavy usage of tools for model and data versioning, but being able to reproduce experimental results is critical to any science, and reproducibility is a well-known problem in AI.
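As a concrete illustration, data and model versioning can be as simple as recording content hashes so that any result can be traced back to the exact dataset and model that produced it. A minimal sketch of that idea; purpose-built tools like DVC or MLflow do this far more completely, and the file layout here is illustrative:

```python
# Minimal sketch of content-addressed data/model versioning: record a hash
# of the training data next to the model so results can be traced back to
# the exact dataset. File paths here are illustrative.
import hashlib
import json
from pathlib import Path

def file_sha256(path: Path) -> str:
    """Return the SHA-256 digest of a file's contents."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def record_version(data_path: Path, model_path: Path, registry: Path) -> dict:
    """Append a (data hash, model hash) entry to a JSON-lines registry."""
    entry = {
        "data": str(data_path), "data_sha256": file_sha256(data_path),
        "model": str(model_path), "model_sha256": file_sha256(model_path),
    }
    with registry.open("a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry
```

With a registry like this, a stale or surprising result can at least be matched to the exact bytes it was trained on, which is the first prerequisite for reproducibility.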

Figure 4. Bottlenecks to AI adoption

Maturity by Continent

When looking at the geographic distribution of respondents with mature practices, we found almost no difference between North America (27%), Asia (27%), and Europe (28%). In contrast, in our 2018 report, Asia was behind in mature practices, though it had a markedly higher number of respondents in the “early adopter” or “exploring” stages. Asia has clearly caught up. There’s no significant difference between these three continents in our 2021 data.

We found a smaller percentage of respondents with mature practices and a higher percentage of respondents who were “considering” AI in South America (20%), Oceania (Australia and New Zealand, 18%), and Africa (17%). Don’t underestimate AI’s future impact on any of these continents.

Finally, the percentage of respondents “evaluating” AI was almost the same on each continent, varying only from 31% (South America) to 36% (Oceania).

Figure 5. Maturity by continent

Maturity by Industry

While AI maturity doesn’t depend strongly on geography, we see a different picture if we look at maturity by industry.

Looking at the top eight industries, financial services (38%), telecommunications (37%), and retail (40%) had the greatest percentage of respondents reporting mature practices. And while it had by far the greatest number of respondents, computers, electronics, and technology was in fourth place, with 35% of respondents reporting mature practices. Education (10%) and government (16%) were the laggards. Healthcare and life sciences, at 28%, were in the middle, as were manufacturing (25%), defense (26%), and media (29%).

On the other hand, if we look at industries that are considering AI, we find that education is the leader (48%). Respondents working in government and manufacturing seem to be somewhat further along, with 49% and 47% evaluating AI, meaning that they have pilot or proof-of-concept projects in progress.

This may just be a trick of the numbers: every group adds up to 100%, so if there are fewer “mature” practices in one group, the percentage of “evaluating” and “considering” practices has to be higher. But there’s also a real signal: respondents in these industries may not consider their practices “mature,” but each of these industry sectors had over 100 respondents, and education had almost 250. Manufacturing needs to automate many processes (from assembly to inspection and more); government has been as challenged as any industry by the global pandemic, and has always needed ways to “do more with less”; and education has been experimenting with technology for a number of years now. There is a real desire to do more with AI in these fields. It’s worth pointing out that educational and governmental applications of AI frequently raise ethical questions—and one of the most important issues for the next few years will be seeing how these organizations respond to ethical problems.

Figure 6. Maturity by industry (percent)

The Practice of AI

Now that we’ve discussed where mature practices are found, both geographically and by industry, let’s see what a mature practice looks like. What do these organizations have in common? How are they different from organizations that are evaluating or considering AI?


First, 82% of the respondents are using supervised learning, and 67% are using deep learning. Deep learning is a set of algorithms that are common to almost all AI approaches, so this overlap isn’t surprising. (Participants could provide multiple answers.) 58% claimed to be using unsupervised learning.

After unsupervised learning, there was a significant drop-off. Human-in-the-loop, knowledge graphs, reinforcement learning, simulation, and planning and reasoning all saw usage below 40%. Surprisingly, natural language processing wasn’t in the picture at all. (A very small number of respondents wrote in “natural language processing” as a response, but they were only a small percentage of the total.) This is significant and definitely worth watching over the next few months. In the last few years, there have been many breakthroughs in NLP and NLU (natural language understanding): everyone in the industry has read about GPT-3, and many vendors are betting heavily on using AI to automate customer service call centers and similar applications. This survey suggests that those applications still haven’t moved into practice.
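For context, percentages in a multi-select question like this are computed per respondent, so they can sum to well over 100%. A sketch of the tabulation with made-up responses (not survey data):

```python
# Tabulating a multi-select survey question: each respondent may pick
# several techniques, so percentages can sum to well over 100%.
# The responses below are made-up examples, not survey data.
from collections import Counter

responses = [
    {"supervised learning", "deep learning"},
    {"supervised learning", "unsupervised learning"},
    {"supervised learning", "deep learning", "reinforcement learning"},
    {"unsupervised learning"},
]

counts = Counter(tech for resp in responses for tech in resp)
for tech, n in counts.most_common():
    print(f"{tech}: {n / len(responses):.0%}")
```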

We asked a similar question to respondents who were considering or evaluating the use of AI (60% of the total). While the percentages were lower, the technologies appeared in the same order, with very few differences. This indicates that respondents who are still evaluating AI are experimenting with fewer technologies than respondents with mature practices. That suggests (reasonably enough) that respondents are choosing to “start simple” and limit the techniques that they experiment with.

Figure 7. AI technologies used in mature practices

Data

We also asked what kinds of data our “mature” respondents are using. Most (83%) are using structured data (logfiles, time series data, geospatial data). 71% are using text data—that isn’t consistent with the number of respondents who reported using NLP, unless “text” is being used generically to include any data that can be represented as text (e.g., form data). 52% of the respondents reported using images and video. That seems low relative to the amount of research we read about AI and computer vision. Perhaps it’s not surprising though: there’s no reason for business use cases to be in sync with academic research. We’d expect most business applications to involve structured data, form data, or text data of some kind. Relatively few respondents (23%) are working with audio, which remains very challenging.

Again, we asked a similar question to respondents who were evaluating or considering AI, and again, we received similar results, though the percentage of respondents for any given answer was somewhat smaller (4–5%).

Figure 8. Data types used in mature practices

Risk

When we asked respondents with mature practices what risks they checked for, 71% said “unexpected outcomes or predictions.” Interpretability, model degradation over time, privacy, and fairness also ranked high, though it’s disappointing that each of these was selected by only just over half (around 52%) of the respondents. Security is also a concern, at 42%. AI raises important new security issues, including the possibility of poisoned data sources and reverse engineering models to extract private information.

It’s hard to interpret these results without knowing exactly what applications are being developed. Privacy, security, fairness, and safety are important concerns for every application of AI, but it’s also important to realize that not all applications are the same. A farming application that detects crop disease doesn’t have the same kind of risks as an application that’s approving or denying loans. Safety is a much bigger concern for autonomous vehicles than for personalized shopping bots. However, do we really believe that these risks don’t need to be addressed for nearly half of all projects?

Figure 9. Risks checked for during development

Tools

Respondents with mature practices clearly had their favorite tools: scikit-learn, TensorFlow, PyTorch, and Keras each scored over 45%, with scikit-learn and TensorFlow the leaders (both with 65%). A second group of tools, including Amazon’s SageMaker (25%), Microsoft’s Azure ML Studio (21%), and Google’s Cloud ML Engine (18%), clustered around 20%, along with Spark NLP and spaCy.

When asked which tools they planned to incorporate over the coming 12 months, roughly half of the respondents answered model monitoring (57%) and model visualization (49%). Models become stale for many reasons, not the least of which is changes in human behavior, changes for which the model itself may be responsible. The ability to monitor a model’s performance and detect when it has become “stale” will be increasingly important as businesses grow more reliant on AI and in turn demand that AI projects demonstrate their value.
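Monitoring for staleness needn’t wait for dedicated tooling. One common, lightweight check is the Population Stability Index (PSI), which compares the distribution of a model’s recent scores against a reference window; a plain-Python sketch (the data, bin count, and the 0.1/0.25 thresholds are illustrative rules of thumb, not survey findings):

```python
import math
from collections import Counter

def psi(reference, live, bins=10, lo=0.0, hi=1.0):
    """Population Stability Index between two samples of model scores.

    A common rule of thumb: PSI < 0.1 means stable; PSI > 0.25 means the
    score distribution has drifted and the model may be stale.
    """
    def bucket(sample):
        width = (hi - lo) / bins
        counts = Counter(min(int((x - lo) / width), bins - 1) for x in sample)
        # Smooth empty buckets to avoid log(0).
        return [(counts.get(b, 0) + 1e-6) / len(sample) for b in range(bins)]

    ref, cur = bucket(reference), bucket(live)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))

# Synthetic scores: a uniform reference window, and a live window that
# has shifted upward (e.g., because user behavior changed).
reference_scores = [i / 1000 for i in range(1000)]
drifted_scores = [min(0.5 + i / 2000, 0.999) for i in range(1000)]

assert psi(reference_scores, reference_scores) < 0.1   # no drift vs. itself
assert psi(reference_scores, drifted_scores) > 0.25    # drift detected
```

In practice the reference sample would be the scores seen at validation time, and an alert would fire when the index crosses the drift threshold.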

Figure 10. Tools used by mature practices

Responses from those who were evaluating or considering AI were similar, but with some interesting differences: scikit-learn moved from first place to third (48%). The second group was led by products from cloud vendors that incorporate AutoML: Microsoft Azure ML Studio (29%), Google Cloud ML Engine (25%), and Amazon SageMaker (23%). These products were significantly more popular than they were among “mature” users. The difference isn’t huge, but it is striking. At the risk of overinterpreting, users who are newer to AI are more inclined to use vendor-specific packages, more inclined to use AutoML in one of its incarnations, and somewhat more inclined to go with Microsoft or Google rather than Amazon. It’s also possible that scikit-learn has less brand recognition among those who are relatively new to AI compared to packages from organizations like Google or Facebook.

When asked specifically about AutoML products, 51% of “mature” respondents said they weren’t using AutoML at all. 22% use Amazon SageMaker; 16% use Microsoft Azure AutoML; 14% use Google Cloud AutoML; and other tools were all under 10%. Among users who are evaluating or considering AI, only 40% said they weren’t using AutoML at all—and the Google, Microsoft, and Amazon packages were all but tied (27–28%). AutoML isn’t yet a big part of the picture, but it appears to be gaining traction among users who are still considering or experimenting with AI. And it’s possible that we’ll see increased use of AutoML tools among mature users, of whom 45% indicated that they would be incorporating tools for automated model search and hyperparameter tuning (in a word, AutoML) in the coming year.
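Our survey didn’t ask what these AutoML products do internally, but the “automated model search and hyperparameter tuning” respondents plan to adopt can be illustrated with a toy exhaustive search; the objective function below is a hypothetical stand-in for a real training-and-validation loop:

```python
import itertools

def grid_search(evaluate, space):
    """Toy version of the exhaustive hyperparameter search that AutoML
    tools (and utilities like scikit-learn's GridSearchCV) automate:
    try every configuration, keep the best-scoring one."""
    names = list(space)
    best_score, best_cfg = float("-inf"), None
    for values in itertools.product(*(space[n] for n in names)):
        cfg = dict(zip(names, values))
        score = evaluate(cfg)
        if score > best_score:
            best_score, best_cfg = score, cfg
    return best_cfg, best_score

# A stand-in objective: pretend validation accuracy peaks at a known config.
def fake_validation_accuracy(cfg):
    return 0.9 - 0.1 * abs(cfg["lr"] - 0.01) - 0.05 * (cfg["depth"] != 4)

space = {"lr": [0.001, 0.01, 0.1], "depth": [2, 4, 8]}
best_cfg, best_score = grid_search(fake_validation_accuracy, space)
assert best_cfg == {"lr": 0.01, "depth": 4}
```

Real AutoML systems add smarter search strategies (random search, Bayesian optimization, early stopping) and also automate feature engineering and model selection, but the loop above is the skeleton.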

Deployment and Monitoring

An AI project means nothing if it can’t be deployed; even projects that are only intended for internal use need some kind of deployment. Our survey showed that AI deployment is still largely unknown territory, dominated by homegrown ad hoc processes. The three most significant tools for deploying AI all had roughly 20% adoption: MLflow (22%), TensorFlow Extended, a.k.a. TFX (20%), and Kubeflow (18%). Three products from smaller startups—Domino, Seldon, and Cortex—had roughly 4% adoption. But the most frequent answer to this question was “none of the above” (46%). Since this question was only asked of respondents with “mature” AI practices (i.e., respondents who have AI products in production), we can only assume that they’ve built their own tools and pipelines for deployment and monitoring. Given the many forms that an AI project can take, and that AI deployment is still something of a dark art, it isn’t surprising that AI developers and operations teams are only starting to adopt third-party tools for deployment.
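We can only guess what those homegrown pipelines look like, but for an internal project, “deployment” may be little more than an HTTP wrapper around a scoring function. A stdlib-only sketch, with a hand-written average standing in for a real model:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

# A stand-in "model": in a real pipeline this would be loaded from a
# model registry or artifact store; here it's just a hand-written scorer.
def predict(features):
    return {"score": sum(features) / max(len(features), 1)}

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        features = json.loads(body)["features"]
        payload = json.dumps(predict(features)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):  # keep the demo quiet
        pass

# Port 0 asks the OS for any free port; run the server on a daemon thread.
server = HTTPServer(("127.0.0.1", 0), PredictHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}"
req = Request(url, data=json.dumps({"features": [1.0, 2.0, 3.0]}).encode())
result = json.loads(urlopen(req, timeout=5).read())
server.shutdown()
server.server_close()
assert result == {"score": 2.0}
```

Everything a tool like MLflow or TFX adds on top of this — versioned artifacts, rollout strategies, monitoring hooks — is exactly what teams end up rebuilding by hand when they answer “none of the above.”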

Figure 11. Automated tools used in mature practices for deployment and monitoring

Versioning

Source control has long been a standard practice in software development. There are many well-known tools used to build source code repositories.

We’re confident that AI projects use source code repositories such as Git or GitHub; that’s a standard practice for all software developers. However, AI brings with it a different set of problems. In AI systems, the training data is as important as, if not more important than, the source code. So is the model built from the training data: the model reflects the training data and hyperparameters, in addition to the source code itself, and may be the result of hundreds of experiments.

Our survey shows that AI developers are only starting to use tools for data and model versioning. For data versioning, 35% of the respondents are using homegrown tools, while 46% responded “none of the above,” which we take to mean they’re using nothing more than a database. 9% are using DVC, 8% are using tools from Weights & Biases, and 5% are using Pachyderm.
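Tools like DVC are, at heart, content-addressed storage: a dataset’s version identifier is a hash of its bytes, so any change is detectable and every version is distinguishable. A minimal illustration of the idea (not DVC’s actual format or API):

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Content-address a blob: the version ID is a hash of the bytes,
    so two datasets match if and only if their fingerprints match."""
    return hashlib.sha256(data).hexdigest()

def snapshot(name: str, data: bytes, registry: dict) -> str:
    """Record a new version of a named dataset and return its ID."""
    version = fingerprint(data)
    registry.setdefault(name, []).append(version)
    return version

registry = {}
v1 = snapshot("train.csv", b"id,label\n1,cat\n2,dog\n", registry)
v2 = snapshot("train.csv", b"id,label\n1,cat\n2,dog\n3,cat\n", registry)

assert v1 != v2                            # any edit yields a new version
assert registry["train.csv"] == [v1, v2]   # the full lineage is retained
```

With fingerprints like these recorded alongside source code, a training run can state exactly which data produced a given model — which is the heart of reproducibility.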

Figure 12. Automated tools used for data versioning

Tools for model and experiment tracking were used more frequently, although the results are fundamentally the same. 29% are using homegrown tools, while 34% said “none of the above.” The leading tools were MLflow (27%) and Kubeflow (18%), with Weights & Biases at 8%.
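The core of model and experiment tracking is simpler than the tooling might suggest: record the hyperparameters and metrics of every run so the best one can be identified and reproduced. A toy sketch (not MLflow’s actual API; all values invented):

```python
class ExperimentTracker:
    """Toy version of what MLflow-style tracking records: one entry of
    hyperparameters and metrics per training run."""

    def __init__(self):
        self.runs = []

    def log_run(self, params, metrics):
        self.runs.append({"params": params, "metrics": metrics})

    def best_run(self, metric):
        """Return the run with the highest value for the given metric."""
        return max(self.runs, key=lambda r: r["metrics"][metric])

tracker = ExperimentTracker()
tracker.log_run({"lr": 0.1, "depth": 2}, {"val_acc": 0.81})
tracker.log_run({"lr": 0.01, "depth": 4}, {"val_acc": 0.89})
tracker.log_run({"lr": 0.001, "depth": 8}, {"val_acc": 0.85})

assert tracker.best_run("val_acc")["params"] == {"lr": 0.01, "depth": 4}
```

Real tracking tools persist this log, attach it to the model artifact and the data version, and make it queryable — but a team that records even this much can reproduce its results.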

Figure 13. Automated tools used for model and experiment tracking

Respondents who are considering or evaluating AI are even less likely to use data versioning tools: 59% said “none of the above,” while only 26% are using homegrown tools. Weights & Biases was the most popular third-party solution (12%). When asked about model and experiment tracking, 44% said “none of the above,” while 21% are using homegrown tools. It’s interesting, though, that in this group, MLflow (25%) and Kubeflow (21%) ranked above homegrown tools.

Although the tools available for versioning models and data are still rudimentary, it’s disturbing that so many practices, including those that have AI products in production, aren’t using them. You can’t reproduce results if you can’t reproduce the data and the models that generated the results. We’ve said that a quarter of respondents considered their AI practice mature—but it’s unclear what maturity means if it doesn’t include reproducibility.

The Bottom Line

In the past two years, the audience for AI has grown, but it hasn’t changed much: Roughly the same percentage of respondents consider themselves to be part of a “mature” practice; the same industries are represented, and at roughly the same levels; and the geographical distribution of our respondents has changed little.

We don’t know whether to be gratified or discouraged that only 50% of the respondents listed privacy or ethics as a risk they were concerned about. Without data from prior years, it’s hard to tell whether this is an improvement or a step backward. But it’s difficult to believe that there are so many AI applications for which privacy, ethics, and security aren’t significant risks.

Tool usage didn’t present any big surprises: the field is dominated by scikit-learn, TensorFlow, PyTorch, and Keras, though there’s a healthy ecosystem of open source, commercially licensed, and cloud native tools. AutoML has yet to make big inroads, but respondents representing less mature practices seem to be leaning toward automated tools and are less likely to use scikit-learn.

The number of respondents who aren’t addressing data or model versioning was an unwelcome surprise. These practices should be foundational: central to developing AI products that have verifiable, repeatable results. While we acknowledge that versioning tools appropriate to AI applications are still in their early stages, the number of participants who checked “none of the above” was revealing—particularly since “the above” included homegrown tools. You can’t have reproducible results if you don’t have reproducible data and models. Period.

In the past year, AI in the enterprise has grown; the sheer number of respondents will tell you that. But has it matured? Many new teams are entering the field, while the percentage of respondents who have deployed applications has remained roughly constant. In many respects, this indicates success: 25% of a bigger number is more than 25% of a smaller number. But is application deployment the right metric for maturity? Enterprise AI won’t really have matured until development and operations groups can engage in practices like continuous deployment, until results are repeatable (at least in a statistical sense), and until ethics, safety, privacy, and security are primary rather than secondary concerns. Mature AI? Yes, enterprise AI has been maturing. But it’s time to set the bar for maturity higher.

Categories: Technology

Virtual Meeting Topic for meeting on 8/4

PLUG - Tue, 2021/04/06 - 17:29

Ben Cotton: Fedora and Future of Operating Systems

Join the meeting on Aug. 4th at 7pm MST.

Operating systems are not boring, but they’re no longer the end of the Linux development pipeline. As interest shifts up the stack to containers and other abstraction technologies, what are operating system makers to do? This talk describes how the Fedora Project views the future of the Linux distribution, why it’s still relevant, and how we’re going to get there. The operating system may not get the attention it used to, but it still plays an important role in providing the foundation that modern applications are built on.

About Ben:
Ben is a meteorologist by training, but weather makes a great hobby. Ben works as the Fedora Program Manager at Red Hat. He is a member of the Open Source Initiative and a supporter of Software Freedom Conservancy. Ben is a Correspondent Emeritus and an Open Organization Ambassador. Find him on Twitter (@FunnelFiasco) or at

NFTs: Owning Digital Art

O'Reilly Radar - Tue, 2021/04/06 - 11:43

It would be hard to miss the commotion around non-fungible tokens (NFTs). A non-fungible token is, to a first approximation, a purchasable digital good whose ownership is recorded on a blockchain. At this point, NFTs exist on the Ethereum blockchain, but there’s no reason that they couldn’t be implemented on others; it seems reasonably likely that specialized blockchains will be built for NFTs.

What kinds of value do NFTs create?  It’s certainly been claimed that they create a market for digital art, that digital artists can now get “paid” for their work.  Wikipedia points to a number of other possible uses: they could also be used to represent other collectible objects (a digital equivalent to baseball trading cards), or to represent assets in online games, or even to represent shares in a real-world athlete’s contract–or a share in an athlete’s body. Of course, there’s a secondary market in trading NFTs, just as a collector might sell a work of art from a collection.

All of these transactions rely on the idea that an NFT establishes “provenance” for a digital object. Who owns it? Who previously owned it? Who created it? Which of the many, many copies is the “original”? These are important questions for many valuable and unique physical objects: works of art, historical documents, antiques, and even real estate. NFTs present the possibility of bringing “ownership” to the virtual world: Who owns a tweet?  Who owns a jpeg, gif, or png file?

Regardless of whether you think ownership for virtual objects is important, keep in mind that digital objects are close to meaningless if they aren’t copied. If you can’t see a png or jpg in your browser, it might as well be hanging on the wall in a museum.  And that’s worth talking about, because the language of “provenance” comes directly from the museum world. If I have a painting—say, a Rembrandt—its provenance is the history of its ownership, ideally tracing it back to its original source.

An artwork’s provenance serves two purposes: academic and commercial. Provenance is important academically because it allows you to believe you’re studying the right thing: a real Rembrandt, not a copy (copying famous paintings is a time-honored part of a painter’s training, in addition to an opportunity for forgery), or something that happens to look like Rembrandt, but isn’t (“hey, dark, depressing paintings of Dutch people are sort of cool; maybe I can do one”).

Commercially, provenance allows artworks to become extremely expensive. It allows them to become fetishized objects of immense value, at least to collectors. Particularly to collectors: “Hey, my Rembrandt is worth more than your Vermeer.” It’s a lot harder to bid a painting’s price up into the millions if you are unsure about its provenance.

NFTs enable the commercial function of provenance; they allow @jack’s first tweet to become a fetishized object that’s worth millions, at least until people decide that there’s something else they’d rather pay for. They establish a playground for the ultra-wealthy; if you have so much money that you don’t care how you spend it, why not buy Jack’s first tweet? You don’t even have to stick it on the wall and look at those old Dutch guys, or worry about burglar alarms. (You do have a good password, don’t you?)

But I don’t think that’s worth very much. What about the academic function? There’s some value in studying the early history of Twitter, possibly including @jack’s first tweet. But what exactly is the NFT showing me? That these are, indeed, Jack’s bits? Certainly not; who knows (and who cares) what became of the 0s and 1s that originally lived on Jack’s laptop and Twitter’s servers? Even if the original bits still existed, they wouldn’t be meaningful—lots of people have, or have had, the same set of bits on their computers.  As any programmer knows, equality and identity aren’t the same.  In this case, equality is important (is this what @jack wrote?); identity isn’t.
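The equality-versus-identity distinction is easy to make concrete; a two-assert Python illustration (using the text of @jack’s first tweet purely as sample bytes):

```python
# Two separately constructed copies of the same bytes.
original = list(b"just setting up my twttr")
copy = list(b"just setting up my twttr")

assert original == copy        # equality: the contents match exactly
assert original is not copy    # identity: they are still two distinct objects
```

For a tweet, equality is all that matters: any copy of the right bits is as good as any other, which is precisely why “the original bits” is not a meaningful notion.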

However, an NFT doesn’t certify that the tweet is what @jack actually said. An NFT is only about a bunch of bits, not about what the creator (or anyone else) asserts about the bits. @jack could easily be mistaken, or dishonest (in literature, we deal all the time with authors who want to change what they have “said,” or what they meant by what they said). Our beliefs about the contents of @jack’s first tweet have everything to do with our beliefs about @jack and Twitter (where you can still find it), and nothing to do with the NFT.

A tweet is one thing; what about a digital artwork? Does an NFT establish the provenance of a digital artwork? That depends on what is meant by “the provenance of a digital artwork.” A copy of a Rembrandt is still a copy, meaning it’s not the artifact that Rembrandt created. There are all sorts of techniques, ranging from very low to very high tech, to establish the link between artist and artwork. Those techniques are meaningless in the digital world, which eliminates the noise and error inherent in making copies. So, why would I care if my copy of the bits isn’t the artist’s original? The artist’s bits aren’t the “original,” either. That sort of originality is meaningless in the digital world: did the artist ever restore from backup? Was the artwork never swapped to disk, and swapped back in?

What “originality” really means is “this is the unique product of my mind.” We can ask any number of questions about what that might mean, but let’s keep it simple. Whatever that statement means, it’s not a statement on which an NFT or a blockchain has any bearing. We’ve already seen instances of people creating NFTs for other people’s work, and thus “owning” it.  Is this theft of intellectual property, or a meta-art form of its own? (One of my favorite avant-garde piano compositions contains the instructions “The performer should prepare any composition and then perform it as well as he can.”)

So then, what kind of statement about the originality, uniqueness, or authorship of an artwork could be established by an NFT? Beeple, who sold an NFT titled “Everydays: The First 5000 Days” for over $69 million, says that the NFT is not about ownership of the copyright: “You can display the token and show you own the token, but, you don’t own the copyright.” I presume Beeple still owns the copyright to his work–does that mean he can sell it again? The NFT doesn’t typically include the bits that make up the artwork (I think this is possible, but only for very small objects); as @jonty points out, what the NFT actually contains isn’t the work, but a URL, a link.  That URL points to a resource (a JSON metadata file or an IPFS hash) that’s most likely on a server operated by a startup. And that resource points to the work. If that link becomes invalid (for example, if the startup goes bust), then all you “own” is an invalid link. A 404.
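A toy sketch of the indirection @jonty describes, with every name and URL hypothetical: the token carries an owner and a link, not the artwork’s bits.

```python
import hashlib

# A toy NFT-style record (all names and URLs are hypothetical). Following
# the structure described above, the token holds a pointer to metadata
# hosted somewhere -- not the artwork itself.
token = {
    "token_id": 1,
    "owner": "0xCollector",
    "metadata_url": "https://startup.example/meta/1.json",
}

# The artwork's bits live elsewhere and remain freely copyable.
artwork_bytes = b"<the actual image data>"
content_hash = hashlib.sha256(artwork_bytes).hexdigest()

# A hash can pin down WHICH bits were meant, but only while someone still
# serves them; if the metadata URL 404s, the token recovers nothing.
assert "artwork" not in token      # the work is not in the token
assert len(content_hash) == 64     # a fingerprint of the work, not the work
```

The cryptography can prove that a given blob of bytes matches a recorded hash; it cannot keep the blob available, and it cannot connect the blob to its creator.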

Some of these problems may be addressable; some aren’t.  The bottom line, though, is that the link between a creator and a work of art can’t be established by cryptographic checksums.

So do NFTs create a market for artwork that didn’t exist before?  Perhaps–though if what’s bought and sold isn’t the actual work (which remains infinitely and perfectly reproducible), or even the right to reproduce the work (copyright), it’s not clear to me how this really benefits artists, or even how it changes the picture much.  I suppose this is a sort of 21st century patronage, in which someone rich gives an artist a pile of money for being an artist (or gives Jack Dorsey money for being @jack). As patronage, it’s more like Prince Esterhazy than Patreon. A few artists will make money, perhaps even more money than they would otherwise, because I see no reason you can’t sell the work itself in addition to the NFT. Or sell multiple NFTs referencing the same work. But most won’t. The irreducible problem of being an artist–whether that’s a musician, a painter, or a sculptor, whether the medium is digital or physical–is that there are more people who want the job than there are people willing to pay for it.

In the end, what do NFTs create? A kind of digital fetishism around possessing bits, but perhaps not much else. An NFT shows that you are able to spend money on something–without involving the “something” itself. As Beeple says, “you can display the token.” This is conspicuous consumption in perhaps its purest form. It’s like buying jewelry and framing the receipt. That an explosion in conspicuous consumption should arise at this point in history isn’t surprising. The tech community is awash in wealth: wealth from unicorn startups that will never make a cent of profit, wealth from cryptocurrencies that are very difficult to use to buy or sell anything. What’s the value of being rich if you can’t show it off? How do you show something off during a socially distanced pandemic? And if all you care about is showing off your wealth, the NFT is where the real value lies, not in the artwork. You can buy, sell, or trade them, just like baseball cards. Just don’t mistake an NFT for “ownership” in anything but the NFT itself.

Banksy’s self-destroying artwork was much more to the point. Unlike Banksy’s many public murals, which anyone can enjoy for free, this painting shredded itself as soon as it was bought at auction. Buying it destroyed it.

Categories: Technology

Radar trends to watch: April 2021

O'Reilly Radar - Thu, 2021/04/01 - 04:30

March was a busy month. There’s been a lot of talk about augmented and virtual reality, with hints and speculation about products from Apple and Facebook. In the next two years, we’ll see whether this is more than just talk. We’ve also seen more people discussing operations for machine learning and AI, including a substantive talk by Andrew Ng. We’ve long believed that operations was the unacknowledged elephant in the room; it’s finally making it into the open. And we’ve had our share of bad news: proposals for military use of AI, increased surveillance (for example, automated license plate readers at luxury condominiums connected to police departments). More than ever, we have to ask ourselves what kind of world we want to build.

  • Contentyze is a free, publicly available language model that claims to be GPT-3-like. It works fairly well. Wired also points to a free GPT-3-like model called Eleuther.
  • The AI Infrastructure Alliance wants to describe a canonical stack for AI, analogous to LAMP or MEAN; they see it as a way to free AI from domination by the technology giants.
  • Global treaties on the use of AI in warfare?  The time may have come.  But verifying compliance is extremely difficult.  Nuclear weapons are easy in comparison.
  • Operations for Machine Learning (i.e., integrating it into CI/CD processes) is the big challenge facing businesses in the coming years. This isn’t the first time operations for ML and AI have appeared in Trends…  but people are getting the message.
  • The next step in AI is Multimodal: AI that combines multiple abilities and multiple senses, starting with computer vision and natural language.
  • Smart drones kill moths by crashing into them, to prevent damage to crops. Pesticide-free agricultural pest control.
  • Tesla’s fully self-driving car isn’t fully self-driving, and that’s the good part. Musk still seems to think he can have a fully self-driving car by the end of 2021, apparently by skipping the hard work.
  • Turn any dial into an API with a camera and some simple computer vision: Pete Warden’s notion of TinyAI could be used to make everything machine-readable, including electric meters and common appliances.
  • The National Security Commission on Artificial Intelligence has published a huge and wide-ranging report on the future development of AI in the US, covering both business and military applications. Recommendations include the military development of AI-based weapons, and the creation of a quasi-military academy for developing AI expertise.
  • A robotic lifeguard: an autonomous underwater robot for rescuing swimmers.
  • We have been building centralized data systems for the past decade. The pendulum is about to swing the other way: data decentralization will be driven in part by regulation, in part by changes in advertising platforms, and in part by competition between cloud platforms.
  • Thoughtworks’ thoughts on building a digital healthcare ecosystem: put the patients first (not the providers), make data accountable, build and share knowledge, leverage new technology.
  • Empowering the public to resist the surveillance state: data strikes, data poisoning, reimagined as collective action, in a paper presented at the FaccT conference.
Social Media
  • Zuckerberg proposes that social media platforms “should be required to demonstrate that they have systems in place for identifying unlawful content and removing it.” Such a policy would give a significant advantage to established players–but in that, it’s not unlike laws requiring safe disposal of toxic waste.
  • Either ignoring or unaware of the potential for abuse, Slack added a feature allowing unblockable direct messages from paid users to any users of the system (not just users from the same organization). While message delivery in Slack can be stopped, email containing the message body can’t. Slack is promising to fix this feature.
  • Nokia has released the Plan 9 Operating System (started at Bell Labs by Rob Pike, Ken Thompson, Dennis Ritchie, and others) under the open source MIT license.  No one knows whether it will prosper, but it is the first significantly new operating system we’ve seen in years.
  • An important take on performance: it’s not about speeds, it’s about statistics and what happens at the edges of the distribution. Understanding queuing theory is the key, not MHz and Mbps.
  • Is Microsoft’s low-code, Excel-based open source programming language Power Fx what brings programming to the masses?
  • Non-Intrusive Production Debugging: Is this a trend? Or just a flash in the pan? The ability to run a debugger on code running in production and observe what is happening line-by-line seems like magic.
Augmented Reality
  • As part of its augmented reality strategy, Facebook is developing a non-invasive wristband-based neural interface that lets you control digital objects with thought.
  • The killer app for AR might be audio: smart headphones and hearing aids that can extract important sounds (conversations, for example) from a sea of noise.
  • Mojo Vision has developed very low power chips for use in AR contact lenses.
  • Facebook is talking more about its AR/VR glasses, along with new kinds of user interfaces, in which AI mediates literally every part of the wearer’s experience.
  • Google’s ProjectZero security team, which has been responsible for disclosing many vulnerabilities (and getting vendors to fix them), has just exposed a number of vulnerabilities that were actively being used by government organizations in counter-terrorist activities.
  • Botnets have been observed storing key configuration information in cryptocurrency blockchains, including the IP addresses of infected systems. Taking down the botnet’s control server is no longer an effective defense, because the server can easily be rebuilt.
  • Tens of thousands of Microsoft Exchange Server installations have been compromised. Some of the servers may have been attacked by a group connected to the Chinese government, though there are several variants of the attack, suggesting multiple actors.
  • The problem with a walled garden: once the attackers are in, the walls are protecting them, too.  iOS’s security features make successful attacks very difficult; but when they succeed, they are almost impossible to detect.
  • The CRISPR equivalent of a laptop: Onyx is a small, portable, and (relatively) inexpensive tool for automating CRISPR gene editing.  It could make CRISPR much more widely accessible, much as the laptop did for computing.
  • AI and NVidia have made a breakthrough in using deep learning for genetic research. In addition to reducing the time to do some analyses from days to hours, they have significantly reduced the number of cells needed, making it easier to do research on rare genetic diseases.
  • California has banned user interface “dark patterns”: intentionally confusing user interface designs used to prevent people from opting out of data collection.
  • “Headless” WordPress: WordPress as an API for content, using the JAMstack (JavaScript, APIs, and Markup) for rendering rather than PHP.
  • Project Gemini claims to be recreating the web.  It’s more than Gopher, but not much more.  My biggest question is whether anyone cares about old-style “internet browsing” any more?
  • Is the next step for web developers HTML over WebSockets?  Developers are starting to realize that browser-side JavaScript development has resulted in spiraling complexity, poor performance, and buggy applications.
Quantum Computing

Blockchain
  • Non-fungible tokens (NFTs) have taken the blockchain world by storm. But it’s not clear that NFTs have any real application. What is the value of proving that you own a tweet or an emoji?
Categories: Technology

InfoTribes, Reality Brokers

O'Reilly Radar - Tue, 2021/03/23 - 07:40

It seems harder than ever to agree with others on basic facts, let alone to develop shared values and goals: we even claim to live in a post-truth era1. With anti-vaxxers, QAnon, Bernie Bros, flat earthers, the intellectual dark web, and disagreement worldwide as to the seriousness of COVID-19 and the effectiveness of masks, have we lost our shared reality? For every piece of information X somewhere, you can likely find “not X” elsewhere. There is a growing disbelief and distrust in basic science and government. All too often, conversations on social media descend rapidly to questions such as “What planet are you from?”

Reality Decentralized

What has happened? Reality has once again become decentralized. Before the advent of broadcast media and mass culture, individuals’ mental models of the world were generated locally, along with their sense of reality and what they considered ground truth. With broadcast media and the culture industries came the ability to forge top-down, national identities that could be pushed into the living rooms of families at prime time, completing the project of the press and newspapers in nation-forming2. The creation of the TV dinner was perhaps one of the most effective tools in carving out a sense of shared reality at a national level (did the TV dinner mean fewer people said Grace?).

The rise of the Internet, Search, social media, apps, and platforms has resulted in an information landscape that bypasses the centralized knowledge/reality-generation machine of broadcast media. It is, however, driven by the incentives (both visible and hidden) of significant power structures, such as Big Tech companies. With the degradation of top-down knowledge, we’ve seen the return of locally-generated shared realities, where local now refers to proximity in cyberspace. Content creators and content consumers are connected, share information, and develop mental models of the world, along with shared or distinct realities, based on the information they consume. They form communities and shared realities accordingly and all these interactions are mediated by the incentive systems of the platforms they connect on.

As a result, the number of possible realities has proliferated and the ability to find people to share any given reality with has increased. This InfoLandscape we all increasingly occupy is both novel and shifting rapidly. In it, we are currently finding people we can share some semblance of ground truth with: we’re forming our own InfoTribes, and shared reality is splintering around the globe.

To understand this paradigm shift, we need to comprehend:

  • the initial vision behind the internet and the InfoLandscapes that have emerged,
  • how we are forming InfoTribes and how reality is splintering,
  • that large-scale shared reality has merely occupied a blip in human history, ushered in by the advent of broadcast media, and
  • who we look to for information and knowledge in an InfoLandscape that we haven’t evolved to comprehend.
The InfoLandscapes

“Cyberspace. A consensual hallucination experienced daily by billions of legitimate operators, in every nation, by children being taught mathematical concepts… A graphic representation of data abstracted from the banks of every computer in the human system. Unthinkable complexity. Lines of light ranged in the nonspace of the mind, clusters, and constellations of data. Like city lights, receding.”

Neuromancer, William Gibson (1984)

There are several ways to frame the origin story of the internet. One is how it gave rise to new forms of information flow: the vision of a novel space in which anybody could publish anything and everyone could find it. Much of the philosophy of early internet pioneers was couched in terms of the potential to “flatten organizations, globalize society, decentralize control, and help harmonize people” (Nicholas Negroponte, MIT)3.

As John Perry Barlow (of Grateful Dead fame) wrote in A Declaration of the Independence of Cyberspace (1996):

We are creating a world that all may enter without privilege or prejudice accorded by race, economic power, military force, or station of birth. We are creating a world where anyone, anywhere may express his or her beliefs, no matter how singular, without fear of being coerced into silence or conformity. Your legal concepts of property, expression, identity, movement, and context do not apply to us. They are all based on matter, and there is no matter here.

This may have been the world we wanted, but it is not the one we got. We are veering closer to an online, app-mediated environment similar to Deleuze’s Societies of Control, in which we are increasingly treated as our data, as what Deleuze calls “dividuals”: collections of behaviors and characteristics (online interactions, passwords, spending, clicks, cursor movements, personal algorithms) that can be fed into statistical and predictive models, then guided and incentivized to behave and spend in particular ways. Put simply, we are reduced to the inputs of an algorithm. On top of this, pre-existing societal biases are being reinforced and promulgated at previously unheard-of scales as we integrate machine learning models ever more deeply into our daily lives.

Prescient visions of society along these lines were provided by Gibson’s Neuromancer and Neal Stephenson’s 1992 Snow Crash: societies increasingly interacting in virtual reality environments and computational spaces whose landscapes were defined by information flows4. Not only that, but both authors envisioned such spaces being turned into marketplaces, segmented and demarcated by large corporations, only a stone’s throw from where we find ourselves today. How did we get here?

Information Creation

In the early days of the internet, you needed to be a coder to create a website: the ability to publish was reserved for the technically skilled. It was only inside walled gardens such as CompuServe and AOL, or after the introduction of tools like Blogger, that regular punters could create their own websites with relative ease. The participatory culture and user-generated content of Web 2.0 opened up the creative space, allowing anyone and everyone to create content, as well as respond to, rate, and review it. Over the last decade, two new dynamics have drastically increased the amount of information creation and, therefore, the “raw material” with which the landscape can be molded:

  1. Smartphones with high-resolution video cameras and
  2. The transformation of the attention economy by “social media” platforms, which incentivize individuals to digitize more of their experiences and broadcast as much as possible.

And it isn’t only the generation of novel content or the speed at which information travels. It is also the vast archives of human information and knowledge that are being unearthed, digitized, and made available online. This is the space of content creation.

Information Retrieval

The other necessary side of information flow is discoverability: how information is organized and where it’s surfaced. When so much of the world’s information is available, what is the method for retrieval? This question, previously answered by chat rooms and bulletin boards, eventually gave rise to search engines, social media platforms, streaming sites, and apps.

Platforms that automate the organizing and surfacing of online content are necessary, given how much content is already out there and how much is generated daily. But they also require interrogating, because the information we receive shapes our mental models of how the world works, our senses of reality, the way we make decisions, and the communities we form. Platforms such as Facebook have erected walled gardens in our new InfoLandscape and locked many of us into them, as predicted by both Gibson and Stephenson. Do we want such corporatized and closed structures in our networked commons?

InfoTribes, Shared Reality

Online spaces are novel forms of community: people who haven’t met and may never meet in real life interacting in cyberspace. As scholars such as danah boyd have made clear, “social network sites like MySpace and Facebook are networked publics, just like parks and other outdoor spaces can be understood as publics.”

One key characteristic of any community is a sense of shared reality, something agreed upon. Communities are based around a sense of shared reality, shared values, and/or shared goals. Historically, communities have required geographical proximity to coalesce, whereas online communities have been able to form outside the constraints of meatspace. Let’s not make the mistake of assuming online community formation doesn’t have constraints. The constraints are perhaps more hidden, but they exist: they’re both technological and the result of how the InfoLandscapes have been carved out by the platforms, along with their technological and economic incentives5. Landscapes and communities have co-evolved, although, for most of history, on different timescales: mountain ranges can separate parts of a community and, conversely, we build tunnels through mountains; rivers connect communities, cities, and commerce, and humans alter the nature of rivers (an extreme example being the reversal of the Chicago River!).

The past two decades have seen the formation of several new, rapidly and constantly shifting landscapes that we all increasingly interact with, along with new information communities, consolidated by emergent phenomena such as filter bubbles and echo chambers, themselves products of the platforms’ push for engagement. What the constituents of each of these communities share are mental models of how the world works, senses of reality, that are, for the most part, reinforced by the algorithms that surface content, either by 1) showing content you agree with to promote engagement or 2) showing content you totally disagree with to the same end. Just as the newspaper page has historically been a mish-mash of movie ads, obituaries, and opinions stitched together in whatever way made the most business sense for a given publisher, your Facebook feed is assembled by a collection of algorithms that, in the end, optimize for growth and revenue6. These incentives define the InfoLandscape and determine the constraints under which communities form. It just so happens that dividing people increases engagement and makes economic sense. As Karen Hao wrote recently in the MIT Technology Review, framing it as a result of “Zuckerberg’s relentless desire for growth,” which is directly correlated with economic incentives:

The algorithms that underpin Facebook’s business weren’t created to filter out what was false or inflammatory; they were designed to make people share and engage with as much content as possible by showing them things they were most likely to be outraged or titillated by.
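
The dynamic Hao describes can be caricatured in a few lines of code. Everything here is invented for illustration (the field names, the weights, the two example posts); real ranking systems are proprietary and vastly more complex, but the incentive structure is the same: score by predicted engagement, never by accuracy.

```python
# A minimal, hypothetical sketch of engagement-optimized feed ranking.
# Nothing in this scoring function asks whether a post is true or inflammatory.

def engagement_score(post):
    """Score a post purely by predicted engagement (illustrative weights)."""
    return (
        2.0 * post["predicted_shares"]
        + 1.0 * post["predicted_comments"]
        + 0.5 * post["predicted_likes"]
    )

def rank_feed(posts):
    # Surface the most engaging content first, regardless of accuracy.
    return sorted(posts, key=engagement_score, reverse=True)

feed = rank_feed([
    {"id": "calm-news", "predicted_shares": 1,
     "predicted_comments": 2, "predicted_likes": 50},
    {"id": "outrage-bait", "predicted_shares": 40,
     "predicted_comments": 60, "predicted_likes": 30},
])
print([p["id"] for p in feed])  # the outrage-bait post ranks first
```

Under this objective, content that divides or titillates rises to the top not by malice but by arithmetic.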

The consequence? As groups of people turn inward, agreeing more amongst their in-group, and disagreeing more fervently with those outside of it, the common ground in between, the shared reality, which is where perhaps the truth lies, is slowly lost. Put another way, a by-product of algorithmic polarization and fragmentation has been the formation of more groups that agree within their own groups and disagree far more with other groups, not only on what they value but on ground truth, about reality.

We’ve witnessed the genesis of information tribes or InfoTribes and, as these new ideological territories are carved up, those who occupy InfoLandscapes hold that ground as a part of an InfoTribe7. Viewed in this way, the online flame wars we’ve become all too accustomed to form part of the initial staking out of territory in these new InfoLandscapes. Anthropologists have long talked about tribes as being formed around symbols of group membership, symbols that unite a people, like totem animals, flags, or… online content.

Reality Brokers, Reality Splintering

Arguably, many people aren’t particularly interested in the ground truth per se, they’re interested in narratives that support their pre-existing mental models of the world, narratives that help them sleep at night. This is something that 45 brilliantly, and perhaps unwittingly, played into and made starkly apparent, by continually sowing seeds of confusion, gaslighting the global community, and questioning the reality of anything that didn’t serve his own purposes.

This trend isn’t confined to the US. The rise of populism across the West can be seen as the result of diverging senses of reality, the first slice splitting people along ideological and party lines. Why are these divergences in shared reality becoming so exacerbated and apparent now? The unparalleled velocity at which we receive information is one reason, particularly as we likely haven’t evolved to even begin to process the vast amounts we consume. But it isn’t only the speed and volume; it’s the structure. The current media landscape is highly non-linear, unlike print and television. Our sense-making and reality-forming faculties are overwhelmed daily by the fractal-like nature of (social) media platforms, environments full of overlapping phenomena and patterns occurring at many different frequencies8. Moreover, the information we’re served is generally driven by the opaque economic incentives of platforms, which are protected by even more obscure legislation in the form of Section 230 in the US (other incentives are at play too, rarely surfaced, in the name of “trade secrets”).

But let’s be careful here: it isn’t tech all the way down. We’re also deep in a decades-long erosion of trust in institutional knowledge, with mistrust of science and of government the two most obvious examples. Neoliberalism has hollowed out the middle class, while the fruits of top-down knowledge have left many people unserved and behind. On top of this, ignorance has been actively cultivated and produced. Look no further than the recent manufacturing of ignorance from the top down, with the goals of creating chaos, sowing seeds of doubt, and delegitimizing the scientific method and data reporting (the study of culturally induced ignorance is known as agnotology, and Proctor and Schiebinger’s book Agnotology: The Making and Unmaking of Ignorance is canonical). We’ve also seen the impact of bad actors and foreign influence (not mutually exclusive) on the dismantling of shared reality, such as Russian interference around the 2016 US election.

This has left reality up for grabs and, in an InfoLandscape exacerbated by a global pandemic, those who control and guide the flow of information also control the building of InfoTribes, along with their shared realities. Viewed from another perspective, the internet is a space in which information is created and consumed, a many-sided marketplace of supply-and-demand in which the dominant currency is information, albeit driven by a shadow market of data, marketing collateral, clicks, cash, and crypto. The platforms that “decide” what we see and when we see it are reality brokers in a serious sense: they guide how individuals construct their sense of the world, their own identities, what they consider ground truth, and the communities they become a part of. In some cases, these reality brokers may be doing it completely by accident. They don’t necessarily care about the ground truth, just about engagement, attention, and profit: the breakdown of shared reality as collateral damage of a globalized, industrial-scale incentive system. In this framework, the rise of conspiracy theories is an artefact of this process: the reality brokered and formed, whether it be a flat earth or a cabal of Satan-worshipping pedophiles plotting against 45, is a direct result of the bottom-up sense-making of top-down reality splintering, the dissolution of ground truth and the implosion of a more general shared reality. Web 2.0 has had a serious part to play in this reality splintering but the current retreat away into higher signal and private platforms such as newsletters, Slack, Discord, WhatsApp, and Signal groups could be more harmful, in many ways.

Shared reality is breaking down. But was it even real in the first place?

Shared Reality as Historical Quirk

Being born after World War Two could lead one to believe that shared reality is foundational for the functioning of the world and that it’s something that always existed. But there’s an argument that shared reality, on national levels, was really ushered in by the advent of broadcast media, first the radio, which was in over 50% of US households by the mid-1930s, and then the television, nuclear suburban families, and TV dinners. The hegemonic consolidation of the American dream was directly related to the projection of ABC, CBS, and NBC into each and every household. When cable opened up TV to more than three major networks, we began to witness the fragmentation and polarization of broadcast media into more camps, including those split along party lines, modern exemplars being Fox News and CNN. It is key to recognize that there were distinct and differing realities in this period, split along national lines (USA and Soviet Russia), ideological lines (pro- and anti-Vietnam), and scientific lines (the impact of smoking and asbestos). Even then, it was a large number of people with a small number of shared realities.

The spread of national identity via broadcast media didn’t come out of the blue. It was a natural continuation of similar impacts of “The Printed Word,” which Marshall McLuhan refers to as an “Architect of Nationalism” in Understanding Media:

Socially, the typographic extension of man brought in nationalism, industrialism, mass markets, and universal literacy and education. For print presented an image of repeatable precision that inspired totally new forms of extending social energies.

Note that the shared realities generated in the US in the 20th century were shaped not only by national and governmental interests but also by commercial and corporate ones: mass culture, the culture industries, culture at scale as a function of the rise of the corporation. There were strong incentives for commercial interests to create shared realities at scale across the nation because it’s easier to market and sell consumer goods to a homogeneous mass: one size fits all, one shape fits all. This was achieved through the convergence of mass media, modern marketing, and PR tactics.

Look no further than Edward Bernays, a double nephew of Freud who was referred to in his obituary as “the Father of Public Relations.” Bernays famously “used his Uncle Sigmund Freud’s ideas to help convince the public, among other things, that bacon and eggs was the true all-American breakfast.” In the abstract of his 1928 paper “Manipulating Public Opinion: The Why and the How,” Bernays wrote:

If the general principles of swaying public opinion are understood, a technique can be developed which, with the correct appraisal of the specific problem and the specific audience, can and has been used effectively in such widely different situations as changing the attitudes of whites toward Negroes in America, changing the buying habits of American women from felt hats to velvet, silk, and straw hats, changing the impression which the American electorate has of its President, introducing new musical instruments, and a variety of others.

The Century of Marketing began, in some ways, with psychoanalytical tools: marketing as a mode of reality generation, societal homogenization, and behavioral modification. A paradigmatic example is how De Beers convinced the West to adopt diamonds as the necessary gem for engagement rings. A horrifying and still relevant example is Purdue Pharma and the Sackler dynasty’s marketing of OxyContin.

The channels used by marketers were all of the culture industries, including broadcast media, a theme most evident in the work of the Frankfurt School, notably that of Theodor Adorno and Max Horkheimer. Consider Adorno’s 1954 essay “How to Look at Television”:

The old cultured elite does not exist any more; the modern intelligentsia only partially corresponds to it. At the same time, huge strata of the population formerly unacquainted with art have become cultural “consumers.”

Although it was all the culture industries of the 20th century that worked to homogenize society at the behest of corporate interests, television was the one that we brought into our living rooms and that we eventually watched with family over dinner. Top-down reality-generation was centralized and projected into nuclear suburban homes.

Fast forward to today’s post-broadcast era: information travels close to the speed of light, as laser pulses along fiber-optic cables; it is both multi-platformed and personalized; and everyone is a potential creator. Reality, once again, is decentralized. In this frame, the age of shared reality was the anomaly, the exception rather than the rule. It’s perhaps ironic that one of the final throes of the age of shared reality was the advent of reality TV, a hyper-simulation of reality filtered through broadcast media. So now, in a fractured and fractal InfoLandscape, who do we look to in our efforts to establish some semblance of ground truth?

Verified Checkmarks and Village Elders

When COVID-19 hit, we were all scrambling for information about reality in order to make decisions. Not only were the stakes a matter of life and death but, for every piece of information somewhere, you could find the opposite somewhere else. For many, the majority of information came through social media feeds; even when the source was broadcast media, it would often be surfaced in a social media feed. Who did I pay attention to? Who did I believe? How about you? For better or for worse, I looked to my local (in an online sense) community, those I considered closest to me in terms of shared values and shared reality. On top of this, I looked to those respected in my communities. On Twitter, for example, I paid attention to Dr Eleanor Murray and Professor Nicholas Christakis, among many others. And why? For one, they’re both leaders in their fields with track records of deep expertise. But they also have a lot of Twitter followers and the coveted blue verified checkmarks. In an InfoLandscape of such increasing velocity, we use rules of thumb and heuristics about what to believe and what not to, including the validity and verifiability of the content creator, signaled by the number of followers, who those followers are (do I follow any of them? And what do I think of them?), and whether or not the platform has verified them.
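
These rules of thumb can be sketched, tongue slightly in cheek, as a scoring function. The weights and field names below are entirely hypothetical; the point is only that the signals we lean on (checkmark, follower count, mutual follows) say nothing about whether an account is actually right.

```python
# A caricature of the credibility heuristics described above, with invented
# weights. Real readers apply these cues informally, not numerically.

def credibility_heuristic(account, my_follows):
    """Score an account by the rough signals we actually use."""
    score = 0.0
    if account["verified"]:
        score += 2.0  # the coveted blue checkmark
    # Follower count, with diminishing returns
    score += min(account["followers"], 1_000_000) ** 0.5 / 1000
    # "Do I follow any of them?"
    mutuals = len(set(account["followed_by"]) & my_follows)
    score += 0.5 * mutuals
    return score

my_follows = {"alice", "bob"}
epidemiologist = {"verified": True, "followers": 90_000,
                  "followed_by": ["alice", "bob"]}
random_account = {"verified": False, "followers": 120,
                  "followed_by": []}
print(credibility_heuristic(epidemiologist, my_follows) >
      credibility_heuristic(random_account, my_follows))  # True
```

Notice that every term is a proxy for trust, not truth, which is exactly why these heuristics can be gamed.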

If our online communities are our InfoTribes, then the people we look to for ground truth are our village elders, those who tell stories around the campfire. Because they are seen to have insight into the nature of reality, we look to them as our illiterate ancestors looked to those who could read, or as pre-Reformation Christians looked to the priests who could read Biblical Latin. Hand in hand with the emergence of these decentralized and fractured realities, we are seeing the rise of those who define the reality of each InfoTribe. It’s no wonder the term Thought Leader rose to prominence as this landscape clarified itself. We are also arguably in the midst of a paradigm shift from content being the main object of verification online to content creators themselves being those verified. As Robyn Caplan points out astutely in Pornhub Is Just the Latest Example of the Move Toward a Verified Internet:

It is often said that pornography drives innovation in technology, so perhaps that’s why many outlets have framed Pornhub’s verification move as “unprecedented.” However, what is happening on Pornhub is part of a broader shift online: Many, even most, platforms are using “verification” as a way to distinguish between sources, often framing these efforts within concerns about safety or trustworthiness.

But mainstream journalists are more likely to be verified than independent journalists, men more likely than women, and, as Caplan points out, “there is a dearth of publicly available information about the demographics of verification in general—for instance, whether BIPOC users are verified at the same rates as white users.” And it is key to note that many platforms are increasingly verifying and surfacing content created by “platform partners,” an approach also driven by business incentives. Who decides who we listen to? And, as Shoshana Zuboff continually asks, Who decides who decides?

This isn’t likely to get better anytime soon, with the retreat to private and higher signal communication channels, the next generation of personalized products, the advent of deep fakes, the increasing amount of information we’ll be getting from voice assistants over the coming 5-10 years, the proportion of information consumed via ephemeral voice-only apps such as Clubhouse, and the possibility of augmented reality playing an increasing role in our daily lives.

So what to do? Perhaps instead of trying to convince people of what we believe to be true, we need to stop asking “What planet are you from?” and start looking for shared foundations in our conversations, a sense of shared reality. We also have a public awareness crisis on our hands as the old methods of media literacy and education have stopped working. We need to construct new methods for people to build awareness, educate, and create the ability to dissent. Public education will need to bring to light the true contours of the emergent InfoLandscapes, some key aspects of which I have attempted to highlight in this essay. It will also likely include developing awareness of all our information platforms as multi-sided marketplaces, a growing compendium of all the informational dark patterns at play, the development of informational diets and new ways to count InfoCalories, and bringing antitrust suits against the largest reality brokers. Watch these spaces.

Many thanks to Angela Bowne, Anthony Gee, Katharine Jarmul, Jamie Joyce, Mike Loukides, Emanuel Moss, and Peter Wang for their valuable and critical feedback on drafts of this essay along the way.


1. A term first coined in 1990 by the playwright Steve Tesich and that was the Oxford Dictionaries 2016 Word of the Year (source: Post-Truth and Its Consequences: What a 25-Year-Old Essay Tells Us About the Current Moment)
2. See Benedict Anderson’s Imagined Communities for more about the making of nations through shared reading of print media and newspapers.
3. I discovered this reference in Fred Turner’s startling book From Counterculture to Cyberculture, which traces the countercultural roots of the internet to movements such as the New Communalists, leading many tech pioneers to have a vision of the web as “a collaborative and digital utopia modeled on the communal ideals” and “reimagined computers as tools for personal [and societal] liberation.”
4. There is a growing movement recognizing the importance of information flows in society. See, for example, OpenMined’s free online courses which are framed around the theme that “Society runs on information flows.”
5. Think Twitter, for example, which builds communities by surfacing specific tweets for specific groups of people, a surfacing that’s driven by economic incentives, among others; although do note that TweetDeck, owned by Twitter, does not show ads, surface tweets, or recommend follows: perhaps the demographic that mostly uses TweetDeck doesn’t click on ads?
6. Having said this, there are some ethical constraints in the physical publishing business, for example, you can’t run an ad for a product across from an article or review of the product; there are also forms of transparency and accountability in physical publishing: we can all see what any given broadsheet publishes, discuss it, and interrogate it collectively.
7. Related concepts are the digital tribe, a group of people who share common interests online, and the memetic tribe, “a group of agents with a meme complex, or memeplex, that directly or indirectly seeks to impose its distinct map of reality—along with its moral imperatives—on others.”
8. Is it a coincidence that we’re also currently seeing the rise of non-linear note-taking, knowledge base, and networked thought tools, such as Roam Research and Obsidian?

Categories: Technology

The End of Silicon Valley as We Know It?

O'Reilly Radar - Thu, 2021/03/11 - 10:22

High-profile entrepreneurs like Elon Musk, venture capitalists like Peter Thiel and Keith Rabois, and big companies like Oracle and HP Enterprise are all leaving California. During COVID-19, Zoom-enabled tech workers have discovered the benefits of remote work from cheaper, less congested communities elsewhere. Is this the end of Silicon Valley as we know it? Perhaps. But other challenges to Silicon Valley’s preeminence are more fundamental than the tech diaspora.

Understanding four trends that may shape the future of Silicon Valley is also a road map to some of the biggest technology-enabled opportunities of the next decades:

  1. Consumer internet entrepreneurs lack many of the skills needed for the life sciences revolution.
  2. Internet regulation is upon us.
  3. Climate response is capital intensive, and inherently local.
  4. The betting economy is coming to an end.

Inventing the future

“The best way to predict the future is to invent it,” Alan Kay once said. 2020 proved him both right and wrong. The coronavirus pandemic, or something worse, had long been predicted, but it still caught the world unprepared, a better future not yet invented. Climate change too has been on the radar, not just for decades but for over a century, since Arrhenius’s 1896 paper on the greenhouse effect. And it has long been known that inequality and caste are corrosive to social stability and predict the fate of nations. Yet again and again the crisis finds us unprepared when it comes.

In each case, though, the long-predicted future is still not foreordained. It is up to us whether we are steamrollered by events beyond our control or whether we have the collective power to invent a better future. Awakening may have come later than we might have wished, but crises like the pandemic and climate change can still be massive drivers of innovation. If entrepreneurs, investors, and governments step up to solve the hard problems that we face today, the future remains bright. But one thing is certain: the inventions we most urgently need will take us in a very different direction than the consumer internet and social media revolution that is coming to an unsightly end.

The coronavirus is a case in point. The explosion of biomedical invention it has accelerated may well have impacts that extend well beyond the pandemic itself. mRNA vaccines have given us a promising path to COVID immunity, developed in record time. Moderna’s vaccine was designed within two days of Chinese scientist Yong-Zhen Zhang’s release of the genetic sequence of the virus! And mRNA vaccines are also easily tweaked, raising the possibility of quicker responses to mutations and even a framework for the rapid development of many more vaccines. We are starting to see the payoff of radically new approaches to biomedical innovation, in particular the way machine learning is turbocharging research. During 2020, more than 21,000 biomedical research papers made reference to AI and machine learning.

The recent announcement by DeepMind that its AlphaFold technology is able to predict protein structure with accuracy comparable to slow and costly experimental methods is a harbinger of breakthroughs to come. As geneticist Tim Hubbard wrote, “The genomes we believed were blueprints for life were effectively encrypted—this will unlock them and transform biological and biomedical research.”

Prediction: The nexus of machine learning and medicine, biology, and materials science will be to the coming decades what Silicon Valley has been to the late 20th and early 21st century.

Why might this mark the end of Silicon Valley as we know it? First, the required skills are different. Yes, machine learning, statistical analysis, and programming are all needed, but so is deep knowledge of relevant science. The hubs where that knowledge can be found are not the special province of Silicon Valley, suggesting that other regions may take the lead. Second, many of the markets where fortunes will be made are regulated; navigating regulated markets also takes skills that are conspicuously missing in Silicon Valley. Finally, as Theranos demonstrated so vividly, it is harder to sustain a hype balloon in a scientific enterprise than in many of the markets where Silicon Valley has prospered. Many Silicon Valley investors have been lucky rather than smart. They may not do so well in a world where capital must be directed toward solving hard problems rather than toward winning a popularity contest.

Mastering “the demons of our own design”

The opportunity for machine learning in scientific R&D is profound. But machine learning also challenges our current approach to science, which relies on human theorizing and experiments. A machine learning model may be able to make successful predictions but not to explain them. When Arthur C. Clarke wrote “Any sufficiently advanced technology is indistinguishable from magic,” was he imagining a future in which our own science would leave our understanding behind? As Judea Pearl has noted, excessive identification of correlations (i.e., “curve fitting”) makes the definition of authentic causal relationships more challenging. And “real science” needs causal relationships.
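
Pearl’s point can be made concrete with a toy example: a hidden confounder (temperature) drives both ice cream sales and drownings, so a curve-fitting model finds a near-perfect correlation between two variables that have no causal link at all. The data and coefficients here are fabricated purely for illustration.

```python
# Correlation without causation: temperature confounds two unrelated outcomes.
import random

random.seed(0)
temp = [random.uniform(0, 35) for _ in range(1000)]          # the confounder
ice_cream = [2.0 * t + random.gauss(0, 1) for t in temp]     # driven by temp
drownings = [0.5 * t + random.gauss(0, 1) for t in temp]     # also driven by temp

def corr(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / (vx * vy) ** 0.5

# Strong correlation (roughly 0.98) despite zero causal relationship:
print(round(corr(ice_cream, drownings), 2))
```

A model fit on `ice_cream` and `drownings` alone would predict one from the other very well while explaining nothing, which is precisely the gap between prediction and understanding.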

I suspect that we will come to terms with machine learning-enabled science, just as we’ve come to terms with instruments that let us see far beyond the capabilities of the naked eye. But without a better understanding of our machine helpers, we may set them down paths that take us to the edge of a cliff, much as we’ve done with social media and our fractured information landscape.

That fractured landscape is not what was predicted—internet pioneers expected freedom and the wisdom of crowds, not that we would all be under the thumb of giant corporations profiting from a market in disinformation. What we invented was not what we hoped for. The internet became the stuff of our nightmares rather than of our dreams. We can still recover, but at least so far, Silicon Valley appears to be part of the problem more than it is part of the solution.

Can technology platforms rein in the demons of our own design (to use Richard Bookstaber’s memorable phrase)? That too will be one of the challenges that shape the coming decades.

Government regulators in Europe and the US have set their sights on Facebook, Google, Amazon, and Apple, but the regulatory responses will be insufficient if they are based on old theories, old understandings that the platforms have already outstripped. The US theory of antitrust has largely been based on the question of consumer harm, which is difficult to prove in marketplaces where services are provided to consumers at zero cost and where the marginal cost of experimenting on those consumers is also close to zero. The emerging European regulatory effort is properly focused on the role of dominant tech firms as “gatekeepers.” It aims to systematically limit their ability to shape the market for their own advantage. Its remedies, though, are blunt, and the processes for assessing harms will most likely proceed more slowly than the harms themselves.

Markets are ecosystems, and like other ecosystems, there are hidden dependencies everywhere. The harm of Google abusing its monopoly position will not show up first in harm to consumers, but in depressed profits, decreased R&D investment, and lower wages at the web companies to which Google once directed traffic. For Amazon, it will show up in the increased fees and advertising costs sellers must pay for their products to appear in search results.

These harms to the supply side of marketplace platforms, with most of the gains captured by the winners of the winner-takes-all model that Silicon Valley has encouraged, do eventually cascade to consumers. But because the pain is widely distributed and because the platforms are not required to report the information that would make it visible, the problem will not be obvious until much of the damage is irreversible.

When the “superstar firms” ruthlessly compete with smaller firms that come up with fresh ideas, not only starving them of talent but often introducing copycat products and services, there is decreased innovation from the market as a whole. Cities are dominated by a new class of highly paid big-company employees driving up housing costs and forcing out lower wage workers; wages and working conditions of workers in less profitable industries are squeezed to drive the growth of the giants. Their very jobs are made contingent and disposable, with inequality baked in from the beginning of their employment. Governments are starved of revenue by giant companies that have mastered the art of tax avoidance. The list is far longer than that.

In the case of social media platforms, manipulation of users for profit has frayed the fabric of democracy and the respect for truth. Silicon Valley, which once harnessed the collective intelligence of its users, now uses its deep knowledge of its users to “trade against them.” (I predicted the broad outline of this turn back in 2007, after conversations with venture capitalist and economist Bill Janeway about what we might learn from Wall Street about the future of the internet.)

Technology is far from the only offender. It is merely the most visible mirror of our values as a society. The extractive behavior the tech giants exhibit has been the norm for modern capitalism since Milton Friedman set its objective function in 1970: “The social responsibility of business is to increase its profits.” This is all the sadder, though, since the tech industry set out to model something better. The generosity of open source software and the World Wide Web, the genius of algorithmically amplified collective intelligence are still there, pointing the way to the Next Economy, but it is an economy we must actively choose, rather than riding the rails of a system that is taking us in the wrong direction.

Prediction: Because platform businesses have failed to regulate themselves, they will have limits placed on their potential for good as well as harm.

It’s a sad time for Silicon Valley, because we are seeing not only the death of its youthful idealism but a missed opportunity. Paul Cohen, the former DARPA program manager for AI, made a powerful statement a few years ago at a meeting of the National Academy of Sciences that we both attended: “The opportunity of AI is to help humans model and manage complex interacting systems.”

That statement sums up so much of the potential that is squandered when firms like Google, Amazon, and Facebook fall prey to the Friedman doctrine rather than setting more ambitious goals for their algorithms.

I’m not talking about future breakthroughs in AI so much as I’m talking about the fundamental advances in market coordination that the internet gatekeepers have demonstrated. These powers can be used to better model and manage complex interacting systems for the good of all. Too often, though, they have been made subservient to the old extractive paradigm.

To explain what I mean requires a small aside.

Free market economists believe that the willingness of producers and consumers to agree on prices at which they will exchange goods or services (in idealized markets that are characterized by perfect competition with no asymmetries of power or information) leads to the best allocation of society’s resources. The solution to complex equations representing supply chains of self-interest is expressed in these market prices. Money, in effect, is the coordinating power behind Adam Smith’s “invisible hand.”
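The textbook version of that idea can be made concrete with a toy model. All the numbers below are invented purely for illustration; the point is only that a market-clearing price falls out of the equations:

```python
# Toy linear supply and demand: demand q_d = a - b*p, supply q_s = c + d*p.
# The equilibrium price is the one at which supply equals demand.
a, b = 100.0, 2.0   # demand intercept and slope (illustrative)
c, d = 10.0, 1.0    # supply intercept and slope (illustrative)

p_star = (a - c) / (b + d)   # price where q_d == q_s
q_star = a - b * p_star      # quantity exchanged at that price

print(p_star, q_star)  # 30.0 40.0
```

Two equations, one price: that single number coordinates the behavior of every buyer and seller in the model.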

Like the anonymous internet wag who wrote, “The difference between theory and practice is always greater in practice than it is in theory,” economists recognize that perfect competition exists only in theory, that “externalities” exist where costs are borne by people other than the buyer and the seller, and that few markets are completely efficient. The role of the state, in many ways, is to address the shortcomings of the market. Diane Coyle’s book Markets, State, and People gives an excellent account of how economic policy makers think about the trade-offs they make when they intervene. Even at their best, though, the available interventions—taxes, monetary policy, and regulations—are piecemeal and take years or decades to agree on and implement. (Carbon pricing is a case in point.)

Google’s search engine has given us a convincing demonstration of a radically different method for managing an economic system. Constantly refined, dynamic, and infused with AI, Google’s algorithmic systems demonstrate that it is possible to manage an economy in ways not imagined by 20th century economists. 40,000 times a second, 3.5 billion times a day, Google’s centrally managed search performs the magic that, for so long, was thought to be the unique province of decentralized, self-interested actors transacting in priced markets.
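As a back-of-the-envelope check, the two rates quoted above are consistent with each other:

```python
# Do 40,000 searches per second and 3.5 billion searches per day
# describe the same rate?
per_second = 40_000
per_day = per_second * 60 * 60 * 24
print(f"{per_day:,} searches/day")  # 3,456,000,000 -- consistent with ~3.5 billion
```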

In a brilliant stroke, Google built an algorithmic system that uses hundreds of distinct information signals to make the best match between tens of millions of information providers and billions of information consumers—but price is not one of those signals. That is not to say that Google does not participate in the money economy. Far from it. But for Google’s first decade and a half, the priced market of pay-per-click advertising was a sidecar to the primary matching marketplace of search. The initial genius of Google was to run the market coordinated by collective intelligence (organic search) and the market coordinated by money (pay-per-click advertising) in parallel. And when producers with economic motivations manipulated organic search results for profit but to the detriment of Google’s users, producing pages that satisfied the algorithms but failed to satisfy consumers, Google was ruthless in updating the algorithms to focus on consumer benefit.
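Google's actual signals and weights are proprietary; as a purely hypothetical sketch of the pattern described here—matching by weighted quality signals, with price deliberately absent—consider:

```python
# Hypothetical weighted-signal ranking. The signal names, weights, and pages
# are all invented; the one faithful detail is that "price" appears nowhere.
def score(page: dict, weights: dict) -> float:
    return sum(w * page.get(signal, 0.0) for signal, w in weights.items())

weights = {"relevance": 0.5, "authority": 0.3, "freshness": 0.2}

pages = [
    {"url": "a.example", "relevance": 0.9, "authority": 0.4, "freshness": 0.5},
    {"url": "b.example", "relevance": 0.6, "authority": 0.9, "freshness": 0.9},
]
ranked = sorted(pages, key=lambda p: score(p, weights), reverse=True)
print([p["url"] for p in ranked])  # ['b.example', 'a.example']
```

The match is made entirely on estimates of quality; willingness to pay never enters the objective function.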

To be sure, a great deal of content on the World Wide Web and in social media is produced and consumed with commercial intent, but a remarkable amount is produced entirely without a profit motive. Google economists have told me that only 6% of Google search result pages carry any advertising at all. The other 94% of pages are the product of the joyful exuberance of humanity, creating and sharing for the joy of it. If there has ever been a harbinger of a possible economy of abundance, we can see it in the best of the internet sharing economy.

In recent years, though, Google has increasingly blurred the lines between the two information markets it manages (the price-free market of search and the priced market of advertising). And that has made commercially valuable search results less effective than those with no economic value at stake. That is, Google appears to match information producers and consumers more effectively in the absence of the distorting power of money.

So too Amazon. Unlike Google, Amazon has always used price as an important signal in its search rankings, but price was intelligently combined with measures of collective opinion—what other consumers thought was the best product—to create a market that was more efficient than any previous consumer goods marketplace. But in recent years, with the introduction of search advertising as a major new revenue line, Amazon too has turned away from using the tools of collective intelligence to find the best products for its customers. Its search is now dominated by “featured” products—that is, products that producers have paid to put in front of consumers. With advertising now one of the biggest drivers of Amazon’s profits, it is hard to imagine that the company can remain, as Jeff Bezos has proudly boasted, the most consumer-centric platform on earth. I wrote about this problem at length last year, in “Antitrust regulators are using the wrong tools to regulate big tech.”

So many of the problems that antitrust actions and other regulations are now gearing up to address are, paradoxically, the result of the prime directive by which our economic and legal system governs its corporations: “Thou must maximize profits.”

The notion of maximizing profit is so ingrained in our society that in 2014, when Facebook researchers published a paper called “Experimental Evidence of Massive-Scale Emotional Contagion Through Social Networks,” the response was swift and savage. It was considered a terrible breach of research ethics to test whether the mix of stories in the Facebook news feed made its readers happier or sadder. The reaction was particularly striking because no one seemed to notice that Silicon Valley explicitly celebrates and teaches its entrepreneurs how to manipulate the emotional state of users, calling it “growth hacking” or “A/B testing” or “creating habit-forming products.” No one complains about these experiments. It’s considered a best practice to experiment on your customers as long as it is in pursuit of growth and profits.
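The mechanics of such experiments are simple, which is part of why they are so pervasive. A minimal sketch of the A/B testing pattern (the bucketing scheme and the event data are hypothetical):

```python
# Assign each user deterministically to a variant, then compare conversion
# rates. Real platforms add statistical significance testing; this shows
# only the mechanics.
import hashlib
from collections import defaultdict

def assign(user_id: str) -> str:
    # Deterministic bucketing: the same user always sees the same variant.
    digest = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
    return "A" if digest % 2 == 0 else "B"

# Hypothetical stream of (user, converted?) events.
events = [("u1", True), ("u2", False), ("u3", True), ("u4", True), ("u5", False)]

shown = defaultdict(int)
converted = defaultdict(int)
for user_id, did_convert in events:
    variant = assign(user_id)
    shown[variant] += 1
    converted[variant] += did_convert  # bool counts as 0 or 1

rates = {v: converted[v] / shown[v] for v in shown}
print(rates)  # per-variant conversion rates
```

Swap "converted" for "felt happier" and the mechanics are identical to the experiment that caused the outcry.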

Because the cost of those experiments is so low—it’s a sunk cost of the business—experimental mistakes and unforeseen consequences are only to be expected. They become a new class of externality little considered by economists and regulators.

In retrospect, some formal experimentation on emotional contagion and reflection on its implications would have been a good idea. Instead, we continue to run global-scale unsupervised experiments on the power of social media to spread negative emotional contagion for profit, while any effort by the platforms to influence their users in positive directions is still considered by many to be inappropriate intervention, or is abandoned because it might reduce user activity and growth.

For example, during the 2020 US presidential election, Facebook engineers reportedly trained a machine learning algorithm to recognize posts that their users would consider “bad for the world,” but the company found that showing fewer of them reduced the number of user sessions and thus, presumably, revenue and profits. So they retrained the algorithm to find the point where “bad for the world” posts were reduced, but not by so much that they impacted user sessions. Other changes to optimize for “news ecosystem quality” were put in place for a few weeks leading up to the election, but reversed thereafter.

“Shareholder value” is so ingrained in corporate governance that a special class of corporation, “the public benefit corporation,” has been defined to protect companies that are managed to take considerations other than profit into account. All “normal” companies are expected to treat employees, the environment, and society as costs to be minimized, avoided, or eliminated.

Silicon Valley is a mirror of what is wrong with our economy and corporate governance, not the cause of it, or even the worst exemplar. (Tobacco, oil, and pharma companies vie for the top spot.)

In many ways, regulators can still learn from Silicon Valley. Our economy too is shaped by invisible algorithms and embedded objectives. If regulators can see the analogies between the way Google, Amazon, and Facebook’s algorithms shape their services and the way that law, tax, and monetary policy shape who gets what and why in our society, and why corporate leaders act the way they do, we can use the current moment to improve not only Silicon Valley but the fairness and the goals of our entire economy.

As I wrote last year in “We Have Already Let the Genie Out of the Bottle,” an essay for a Rockefeller Foundation workshop on regulating AI, our corporations and our government and our markets are what science fiction writer Charlie Stross calls “slow AIs.” I made the case that we cannot regulate them without rebuilding the rules by which they operate:

“Attempts at governance…are futile until we recognize that we have built a machine and set it on its course. Instead, we pretend that the market is a natural phenomenon best left alone, and we fail to hold its mechanism designers to account. We need to tear down and rebuild that machine, reprogramming it so that human flourishing, not corporate profit, becomes its goal. We need to understand that we can’t just state our values. We must implement them in a way that our machines can understand and execute.”

Silicon Valley can still lead in this effort. The big platforms must understand their social responsibility to create more value than they capture, focus their algorithmic systems on improving human welfare, find ways to measure and communicate the value that they create, and help our broader society to better “model and manage complex interacting systems.”

The danger of regulatory response that simply tries to turn back the clock and doesn’t take into account the ways technology done right could point the way forward is illustrated by the battle over California’s Proposition 22. Its passage overturned state regulations requiring gig economy companies to treat their workers as employees rather than independent contractors.

Traditional labor protections and benefits assumed a world in which individuals worked for a single employer. An attempt to impose those assumptions on companies reliant on gig workers was seen as an existential threat by those companies, who mounted a massive campaign against the new rules. Their customers agreed, and regulations were rolled back by the will of the public.

The gig economy companies have made some small steps toward flexible benefits on their own, but they are a pale shadow of what they might have been if the companies and their gig workers and their customers, not to mention their regulators, had been working together to build systems that would allow benefits to be managed as dynamically as employment. In the German model of stakeholder capitalism, workers are at the management table rather than pitted against the companies they work for. Is there a 21st century version of stakeholder capitalism yet to be designed, one that is not zero-sum but instead “models and manages complex interacting systems” to find better solutions for all?

As I argued five years ago in “Workers in a World of Continuous Partial Employment,” we need a much more robust benefit system that is centered on the worker, not on the company. The gig economy companies are not outliers. Continuous partial employment has become the norm in much of the economy. A combination of the rise of the Friedman doctrine and the demise of labor unions has reset the balance of power between companies and their workers. The legislative and regulatory response needs to address this power imbalance systematically, across the entire labor economy, using the capabilities of technology to create new models of cooperation between companies and their workers, and a safety net that catches everyone, not just a lucky few.

Climate change and the energy economy

The recent news that Elon Musk is one of the world’s richest people is also a harbinger of the biggest opportunity of the 21st century: to avert climate change. Electric vehicles are the tip of the iceberg. Heating and cooling, agriculture, raw materials and manufacturing—all need reinvention. Climate will reshape residential and office construction, insurance, finance, and where and how food is produced. Massive climate migrations have only just begun; tens or hundreds of millions of people will need to be resettled. Will we offer them shantytowns, or will we help them become settlers building a new, better world?

Prediction: There will be more climate billionaires created in the next two decades than in the internet boom.

With the exception of Musk, many of the already-minted climate billionaires are outside the US, highlighting the way that other countries already have the lead in these industries of the future. Bloomberg recently named a few: China’s Zeng Yuqun, Huang Shilin, Pei Zhenhua, and Li Ping (electric vehicle batteries); Li Zhenguo, Li Chunan, and Li Xiyan (solar panels and films); Lin Jianhua (solar panels and films); and Wang Chuanfu (electric vehicles); Germany’s Aloys Wobben (wind turbines); and Spain’s Jose Manuel Entrecanales (renewable power generation).

There are great fortunes yet to be made, of course. While Impossible Foods CEO Patrick Brown and Beyond Meat founder Ethan Brown (no relation) and Plenty’s Matt Barnard, Bowery’s Irving Fain, or Nordic Harvest’s Anders Riemann are not yet billionaires, it is quite possible that they will be. But for the most part, Silicon Valley entrepreneurs and investors are not leaders in this sector.

In any case, who will get rich helping us transition to a new energy economy is unimportant compared to the question of whether we will summon the political will to make the transition in time to avoid the most disastrous consequences of climate change, which could, at their worst, bring an end to civilization as we know it.

A strong argument can be made that only a crash mobilization of the economy to electrify everything can get us there in time. Saul Griffith, Alex Laskey, and Sam Calisch of the nonprofit Rewiring America have made just that argument. And here, the algorithms that guide our economy to focus on “efficiency” need to be questioned. As economist and former venture capitalist Bill Janeway said to me, mobilizations can get hung up and stalled out due to excessive concern with efficiency as the dominant metric of value. “World War II was won on ‘the momentum of production,’” he noted, quoting from The Struggle for Survival, his father Eliot Janeway’s book about the World War II mobilization. “Similarly the WPA put millions to work during the Depression precisely because effective employment—not efficiency—was the dominant goal.”

There are five pillars to Rewiring America’s case for electrification as the answer to our urgent need to limit greenhouse gas emissions:

1. Electrifying everything requires only half as much energy as our current system. Saul and his team worked with the US Department of Energy in 2018 to create an interactive map of all the source-to-use energy flows in America. This map of our energy economy was started under the Nixon administration, but its true implications are only now being realized. One of the surprising consequences of their analysis is that half the energy we use is spent collectively on things like mining and transporting fossil fuels, and in thermoelectric losses from converting them to electricity, to heat, or to movement. Direct electrification of as much of our economy as possible is not only achievable but also the fastest way to avert climate disaster.

2. We need to reconceive solar panels, batteries, electric cars, and electric appliances as part of our national energy infrastructure, even when they are on or in people’s homes, rather than thinking of infrastructure as something owned only by utilities or the government. Electric heat pumps can be used for both hot water and home heating; hot water storage can in effect act as a battery, heating up with solar electricity during the day and giving heat back at night. We won’t balance a future renewables-heavy grid without using local (i.e., home and business) batteries and thermal loads (water and space heat) as part of the overall demand response and storage.

3. Markets won’t move fast enough without a World War II-style mobilization of private industry. We need a heroic 4–5 year effort to get to 100% transformation of our energy infrastructure. Otherwise, we will have to wait for the natural replacement rate on infrastructure, which will take decades that we don’t have. That heroic 4–5 year effort gets us to the scale of production appropriate to enable 100% adoption of the solution technologies, which will then require a consistent 10–20 year rollout beyond that initial ramp-up period.

4. Electrifying the US will create jobs—lots of them. Rewiring America estimates that such an effort could, at peak, create as many as 25 million US jobs, and 5 million ongoing jobs in the new industries. The cost of the retrofit will be high, but so will the payoff, in both jobs and in savings to consumers.

Rooftop solar can produce at most 25% of the total needs of a fully electrified economy, so there’s still plenty of room and need for grid-scale solar—the electrified economy will require 3x the capacity of the current grid—but local is the cheapest energy and the best way to pass savings to the consumer, as well as to create highly localized jobs across the country.

Rooftop solar jobs are of necessity geographically decentralized, potentially enabling an ecosystem of small local firms rather than rewarding a few giants.

5. Who gets the financial benefit of this massive investment—utilities, solar installers, or consumers—depends on interest rates.

“The miracle technology is much more likely to be finance than it is to be fusion,” Saul said in a recent presentation. Arguably, it was the invention of the auto loan by Alfred P. Sloan of General Motors and the later financial innovation by the Roosevelt administration of the Federal Housing Administration and the home mortgage that created the US middle class, he noted. “Mortgages are time machines that let you have the future you want today.” We need something similar for the electrification transformation. Otherwise, “only rich people can afford to decarbonize today.”

Utilities already have access to low-cost loans. But consumers don’t, and if you want to create both jobs and cost savings for consumers, low-cost interest rates for home electrification are the best way to do it. Otherwise, the savings all get captured by middlemen, or by utilities, and adoption is much slower.

This observation is entirely in line with my broader point that regulations and the tax code play much the same role in shaping who gets what and why in markets as do the controlling algorithms in online platforms.
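The interest-rate effect Griffith describes is the standard amortized-loan payment formula at work. A sketch, using an invented $30,000 retrofit and two illustrative rates rather than any figures from Rewiring America:

```python
# Standard amortized-loan payment: M = P*r / (1 - (1+r)^-n), where r is the
# monthly rate and n the number of monthly payments. The principal and the
# two rates below are hypothetical.
def monthly_payment(principal: float, annual_rate: float, years: int) -> float:
    r = annual_rate / 12
    n = years * 12
    if r == 0:
        return principal / n
    return principal * r / (1 - (1 + r) ** -n)

for rate in (0.03, 0.07):  # utility-grade credit vs. typical consumer credit
    print(f"{rate:.0%}: ${monthly_payment(30_000, rate, 15):,.2f}/month")
```

At the cheaper rate, the same retrofit costs roughly a quarter less every month; that difference is precisely the value that flows either to the consumer or to whoever supplies the financing.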

The end of casino capitalism?

The final, and perhaps most important, reason why Silicon Valley as we know it may be over is that its current incarnation is a product of the extraordinarily cheap capital of the years since the global financial crisis of 2008–2009.

There are two economies, often confused: the operating economy, in which companies make and sell products and services, and the betting economy, in which wealthy people bet on which companies will win and which will lose in the beauty contest that stock markets have become. In the operating economy, the measure of success is, as Nick Hanauer and Eric Beinhocker memorably put it, “the solution to human problems.” Companies compete to solve those problems more effectively and earn a profit thereby. Along the way, they employ people productively, create valuable new goods and services, and contribute to their communities.

In the betting economy, the measure of success is stock price, the higher the better. Fueled by massive money creation by central banks, capital is abundant (for those who, by virtue of existing wealth, already have access to it), and traditional sources of return, such as interest on loans or ROI on investment in plants and equipment or employees, are dwarfed by the potential returns that can be achieved by playing on the madness of crowds. What can you call it but a bubble when the median valuation of this past year’s tech IPOs was 24 times trailing revenue, while tech IPOs during most of the past decade were valued at only about six times trailing revenue? Data collected by University of Florida professor Jay Ritter shows that it’s even worse than it appears: only 16% of 2020’s tech IPOs had any profits at all.
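To see what that multiple expansion means, apply both multiples to the same hypothetical company:

```python
# Apply the two price/sales multiples quoted above to an invented company
# with $100M in trailing revenue.
trailing_revenue = 100e6
decade_multiple = 6    # typical for tech IPOs over most of the past decade
bubble_multiple = 24   # 2020 median

print(f"at {decade_multiple}x: ${trailing_revenue * decade_multiple / 1e9:.1f}B")
print(f"at {bubble_multiple}x: ${trailing_revenue * bubble_multiple / 1e9:.1f}B")
# The identical operating business is priced four times higher.
```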

Capital markets do play an important role in our society. Bets on an unknown future are an important way to fund innovation and to build out infrastructure in advance of the prosperity that it will bring once that innovation has been widely deployed. But in today’s financialized economy, the returns on betting for its own sake have grown far faster than the returns on true operating investment.

There are many who will argue that the enormous payoffs coming to today’s entrepreneurs and investors are the result of their world-changing innovations. History suggests otherwise. There was plenty of innovation when the returns to investors and entrepreneurs were a fraction of what they are today.

Silicon Valley is named for the semiconductor manufacturing companies that became the foundation of all that followed. Intel, one of the most successful of those companies, went public in 1971 with a valuation of about $58 million (about $372 million in today’s dollars). Intel had a small profit when it went public, but it went on to earn hundreds of billions of dollars in operating profit over the succeeding decades. Apple and Microsoft, the standard bearers of the next generation of Silicon Valley companies, were also profitable at IPO. Two decades later, Google too was highly profitable when it went public, and while Amazon was one of the first companies to legitimize the profitless IPO, its losses were falling as it grew. All have turned into companies that generate enormous profits in the operating economy.

Few of the companies in the recent crop of Silicon Valley companies can make that claim. At its IPO in late 2020, Palantir had prior year revenues of $743 million, on which it posted a loss of $576 million. Uber went public in 2019 with an operating loss of over $3 billion on $11 billion in revenue. When DoorDash went public, it had revenues of $1.92 billion for the trailing nine months, on which it had a net loss of $149 million. All these companies have valuations in the tens of billions, making their founders and investors very rich, despite not making any money at all in the operating economy. In many cases, the money invested in these companies was used to create the illusion of growth, acquiring customers below the cost of delivering services to them. It is money invested in the promise of more money, a kind of Ponzi scheme decoupled from the operating economy.
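The figures quoted above translate into strikingly different loss margins:

```python
# Loss as a share of revenue, using the figures quoted in the text.
companies = {
    "Palantir": (743e6, 576e6),     # prior-year revenue, net loss
    "Uber": (11e9, 3e9),            # revenue, operating loss at IPO
    "DoorDash": (1.92e9, 149e6),    # trailing-nine-months revenue, net loss
}
for name, (revenue, loss) in companies.items():
    print(f"{name}: ${loss / 1e6:,.0f}M lost on ${revenue / 1e9:.2f}B "
          f"of revenue ({loss / revenue:.0%})")
```

Palantir lost roughly 78 cents for every dollar of revenue; even DoorDash, the best of the three, lost about 8 cents per dollar.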

Intel’s stock market investors were making a rational bet that a world-changing technology would earn a huge stream of future profits. Palantir’s, Uber’s, and DoorDash’s investors were betting on how other investors might value their stocks, much as 17th century Dutch investors bet on the “value” of unique tulips or mid-19th century British investors bet on the prospects for railroads in distant countries, many of which were never built. Some of these companies may eventually turn an operating profit, but it is likely that when they do, investors will realize that those profits don’t justify the sky-high valuations, which will then come back down to earth. As Benjamin Graham, the father of the style of value investing favored by Warren Buffett, is reported to have said, “In the short run, the market is a voting machine. In the long run, it is a weighing machine.”

Were Gordon Moore and Robert Noyce, the founders of Intel, less motivated to build world-changing products because the proceeds were orders of magnitude less than they are for today’s Silicon Valley entrepreneurs? I suspect that it is the other way around. The easy profits from today’s financial betting markets encourage unproductive innovation. I’d take Gordon Moore over WeWork’s Adam Neumann any day. When investors and entrepreneurs who promise future innovation but are unable to deliver it still walk away with billions, something is seriously wrong.

As John Maynard Keynes wrote in his General Theory during the depths of the Great Depression, “Speculators may do no harm as bubbles on a steady stream of enterprise. But the position is serious when enterprise becomes the bubble on a whirlpool of speculation. When the capital development of a country becomes a by-product of the activities of a casino, the job is likely to be ill-done.”

The problem is that money “invested” in the betting economy is not really invested. It is spent, just like money at the gaming table. When the WeWork bubble popped, the money SoftBank had spent propping up its valuation might just as well have gone up in smoke. The end of this process could look something like the financial crisis of 2008–2009. Money invested in the collateralized debt obligations of the first decade of this century was not backed by true worth in the operating economy, so when the CDOs went bust, the money simply vanished.

Prediction: When the bubble ends, greater opportunities will remain.

One of the gifts—if you can call it that—of crises like the pandemic and climate change is that they may teach us that we no longer have time for frivolity. We need our investment capital to flow back to the operating economy.

There is a robust strategy for investors and entrepreneurs: Work on stuff that matters. Invest in solving problems. Make a real difference in people’s lives. You will know you have done that when operating profits fairly earned, not stock market gains, are your measure of investment success.

Two of the big areas of innovation that I highlight in this essay—life sciences and climate change—require large amounts of real investment capital. Unlike money invested in internet companies that used it to buy unprofitable growth, money invested in Tesla was used to build factories, to manufacture cars and electric batteries, and to roll out national charging networks. The path to high returns may take longer, but the need is real, and so is the value created.

Solving global crises requires the best of what we have to offer. If the best way to predict the future is to invent it, it’s time we got busy. Which world do we want to invent? It’s up to us.

Categories: Technology

Topic for the March Virtual Meeting

PLUG - Wed, 2021/03/10 - 10:30
Dawn E. Collett: The Privacy Tax: How tracking and hacking affect

Surveillance is everywhere. From CCTV cameras on the streets to cookies that track users around the web as they browse, the vast majority of human activity is monitored in some form. The burden of being watched disproportionately falls on marginalised groups, and people with disabilities are no exception in this regard. If your medical records contain sensitive information that you've discussed with your doctor to get appropriate treatment, information security suddenly becomes far more important to you. And if your only method of communication is connected to the Internet 24/7, and thus vulnerable to hackers, true privacy is a great deal harder to achieve.

In this talk, we'll examine how data-driven systems view disability, and look at how accessible technology can be exploited to reveal information about its users. We'll break down ways that disabled people can - and do - avoid tracking and hacking, and find out why paying a 'privacy tax' isn't always feasible for everyone. Finally, we'll discuss open-source software that's already reducing the impact of surveillance and security risks on people with disabilities, and examine what technologists can do to make the privacy tax less necessary.

Dawn is a DevOps/Site Reliability Engineer who started out as a freelance developer, and realised that learning about infrastructure and release systems would save time and money for everyone involved.
As well as accidental accessibility advocacy, Dawn is on the organising team for the Melbourne AWS Programming and Tools meetup, and can regularly be found sharing knowledge within the Melbourne cloud infrastructure and DevOps communities.
Outside work, Dawn is an occasional author and kitchen alchemist.

0x6C: Even More DMCA Exemption Requests!

FAIF - Tue, 2021/03/09 - 11:42

Karen and Bradley discuss two other DMCA exemptions filed by Software Freedom Conservancy during the 2020/2021 Triennial Rulemaking Process at the copyright office: one for wireless router firmwares and one for privacy research.

Show Notes: Segment 0 (00:39) Segment 1 (06:30) Segment 2 (29:10)
Categories: Free Software

The Next Generation of AI

O'Reilly Radar - Tue, 2021/03/09 - 06:46

Programs like AlphaZero and GPT-3 are massive accomplishments: they represent years of sustained work solving a difficult problem. But these problems are squarely within the domain of traditional AI. Playing Chess and Go or building ever-better language models have been AI projects for decades. The following projects have a different flavor:

  • One group of researchers used generative neural networks to create synthetic human genomes for use in genetics research.
  • Another group of researchers published an article about using NLP (natural language processing) to analyze viral genomes and, specifically, to predict the behavior of mutations. They were able to distinguish between errors in “syntax” (which make the gene non-viable), and changes in semantics (which result in a viable virus that functions differently).
  • Yet another group of researchers modelled a small portion of a fruit fly’s brain (the part used for smell), and were able to train that to create a model for natural language processing. This new model appears to be orders of magnitude more efficient than state-of-the-art models like GPT-3.

The common thread through these advances is applying work in one field to another area that’s apparently unrelated—not sustained research at cracking a core AI problem. Using NLP to analyze mutations? That’s brilliant—and it’s one of those brilliant things that sounds so obvious once you think about it. And it’s an area where NLP may have a significant advantage precisely because it doesn’t actually understand language, any more than humans understand DNA.
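As a toy illustration of the “DNA as language” idea (not the researchers’ actual pipeline), a genome can be tokenized into overlapping k-mers, the genomic analogue of words; the function names and tiny corpus here are invented for the example:

```python
# Toy sketch: tokenize DNA into overlapping k-mer "words", the way NLP
# tokenizes text. Real systems feed tokens like these to a language model.

def kmer_tokenize(sequence: str, k: int = 3) -> list[str]:
    """Split a DNA sequence into overlapping k-mer 'words'."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def kmer_vocabulary(sequences: list[str], k: int = 3) -> set[str]:
    """Collect the set of k-mers seen across a corpus of sequences."""
    vocab = set()
    for seq in sequences:
        vocab.update(kmer_tokenize(seq, k))
    return vocab

corpus = ["ATGCGTA", "ATGCCTA"]
print(kmer_tokenize("ATGCGTA"))  # ['ATG', 'TGC', 'GCG', 'CGT', 'GTA']
print(len(kmer_vocabulary(corpus)))
```

A model trained on such tokens can then score whether a mutated sequence still “reads” like a viable genome, much as a language model scores whether a sentence is grammatical.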

The ability to create artificial human genomes is important in the short term because the human genome data available to researchers is limited by privacy laws. Synthetic genomes aren’t subject to privacy laws, because they don’t belong to any person. Data limitations aren’t a new problem; AI researchers frequently face the problem of finding sufficient data to train a model. So they have developed a lot of techniques for generating “synthetic” data: for example, cropping, rotating, or distorting pictures to get more data for image recognition. Once you’ve realized that it’s possible to create synthetic data, the jump to creating synthetic genomes isn’t far-fetched; you just have to make the connection. Asking where it might lead in the long term is even more important.
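The augmentation techniques mentioned above can be sketched in a few lines; the 3×3 grid below is a stand-in for a real image, and the transforms are deliberately minimal:

```python
# A minimal sketch of data augmentation: generating extra "synthetic"
# training examples by transforming ones you already have. Real pipelines
# crop, rotate, and distort actual pixels with image libraries.

def rotate90(image):
    """Rotate a row-major grid 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def flip_horizontal(image):
    """Mirror a grid left-to-right."""
    return [row[::-1] for row in image]

def augment(image):
    """Return the original plus simple transformed copies."""
    return [image, rotate90(image), flip_horizontal(image)]

original = [[1, 2, 3],
            [4, 5, 6],
            [7, 8, 9]]
print(len(augment(original)))  # 3 examples from 1
```

The jump to synthetic genomes swaps these geometric transforms for a generative model, but the motivation is identical: more training data than you could otherwise collect.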

It’s not hard to come up with more examples of surprising work that comes from bringing techniques from one field into another. DALL-E (which combines NLP with image analysis to create a new image from a description) is another example. So is ShadowSense, which uses image analysis to let robots determine when they are touched.

These results suggest that we’re at the start of something new. The world isn’t a better place because computers can play Go; but it may become a better place if we can understand how our genomes work. Using adversarial techniques outside of game play or NLP techniques outside of language will inevitably lead to solving the problems we actually need to solve.

Unfortunately, that’s really only half the story. While we may be on the edge of making great advances in applications, we aren’t making the same advances in fairness and justice. Here are some key indicators:

  • Attempts to train models to predict the pain that Black patients will suffer as a result of medical procedures have largely failed. Recently, research discovered that the models were more successful if they got their training data by actually listening to Black patients, rather than just using records from their doctors.
  • A study by MIT discovered that training predictive crime models on crime reports rather than arrests doesn’t make them less racist.

Fortunately, the doctors modeling medical pain decided to listen to their Black patients; unfortunately, that kind of listening is still rare. Listening to Black patients shouldn’t be a breakthrough akin to using NLP to analyze DNA. Why weren’t we listening to the patients in the first place? And why are the patients’ assessments of their pain so different from the doctors’?  This is clearly progress, but more than that, it’s a sign of how much progress has yet to be made in treating minorities fairly.

And I’m afraid that MIT has only discovered that there aren’t any historical data sources about crime that aren’t biased, something we already knew. If you look at so-called “white collar” crime, Midtown Manhattan is the most dangerous neighborhood in New York. But that’s not where the police are spending their time. The only-somewhat-tongue-in-cheek paper accompanying the map of White Collar Crime Risk Zones suggests that their next step will be using “facial features to quantify the ‘criminality’ of the individual.” That would clearly be a joke if such techniques weren’t already under development, and not just in China.

It looks like we’re at the cusp of some breakthroughs in AI—not new algorithms or approaches, but new ways to use the algorithms we already have. But the more things change, the more they stay the same. Our ability to think about our ethical responsibilities, and, more specifically, to put in place mechanisms to redress harms caused by unfair decisions, is slow to catch up.

Categories: Technology

Radar trends to watch: March 2021

O'Reilly Radar - Mon, 2021/03/01 - 07:13

For a short month, a lot happened in February–perhaps because the US elections are behind us, perhaps because COVID case numbers are dropping, perhaps for any number of reasons. Some of the most interesting articles I’ve seen have been about the Internet of Things, ranging from wireless peas to Elon Musk’s neural interfaces.

AI and ML
  • An AI system is being used to train crisis counsellors. Roleplaying plays a critical part in training staff at suicide prevention services. The AI plays the patient, freeing staff so that they can spend more time helping clients, rather than training other staff.
  • Google, in firing Margaret Mitchell weeks after Timnit Gebru, has abandoned any pretense of ethical leadership in AI. What signals does a company send when it fires both leaders of the best ethics research team in the industry? The paper Gebru and Mitchell co-authored with Emily Bender and Angelina McMillan-Major, On the Dangers of Stochastic Parrots, is a must-read.
  • How an AI algorithm learns has an impact on bias and fairness: it’s not just training data. Harder examples are “learned” later, and attempts to shorten training forego accuracy for these portions of the training data.
  • Researchers are using generative neural networks to create synthetic human genomes.  These genomes are then used for research in genetics.  They aren’t subject to privacy restrictions because they don’t belong to anyone. Creating synthetic data isn’t a new idea, but this research pushes the limits in a spectacular way.
  • It’s not really surprising, but MIT reports that training predictive policing algorithms on crime reports rather than arrests doesn’t make them less racist.
  • Spotify has filed a patent on monitoring users’ speech to help it make recommendations. They’re looking for gender, age, emotion, ethnicity (via accent), and whether the user is alone or with others. It’s possible that this patent will never be put into production.
  • Building a digital twin of the Earth as an aid to modeling and decision-making could enable precise predictions about what the future holds. The model will incorporate data on all aspects of the earth, including the impact of human systems. Whether building and running this model will take more energy than is used to mine Bitcoin is a worthwhile question.
  • The Texas power outages are a new chapter in the same old story of underinvestment in infrastructure: many of last year’s fires were due to infrastructure problems, as were massive outages in other states. We need to re-think the US power grid, which is not ready for the transition to renewable resources.
  • Hydrogen may play a role in reducing carbon emissions. Using wind or solar power to produce hydrogen by electrolysis might be an effective way to store energy, though that raises the question of how hydrogen itself is stored.
  • Over the last decades, APIs have evolved from complex (RPC) interfaces to REST to GraphQL, and in doing so, have enabled new business models based on disaggregating services. Where is the evolution of APIs headed?
  • Microsoft’s DAPR reaches 1.0: Microsoft’s Distributed Application Runtime (DAPR) is an open-source attempt to make Kubernetes less difficult. Now that cloud services themselves have largely been commoditized, this is the level at which cloud vendors will try to compete.
  • Low-code databases have been around since Microsoft Access (1992); they’re proliferating as data democratizes, enabling people who need to work with data to build the tools they need.
  • Joe Hellerstein’s New directions in cloud programming proposes programming languages that take cloud computing “beyond serverless.” The idea is to decouple semantics, availability, consistency, and optimization. This may free cloud computing from the complexity of Kubernetes and other orchestration technologies.
  • Procedural Connectivity is a technique that radically reduces the storage required for large simulations. Programs that required a supercomputer can now run on a laptop with a GPU.
  • Web programming without frameworks: Is “the stackless way” the route to simplification? For many web applications, JavaScript Modules and Web Components may be the path away from React and Angular.
  • Hotwire is a new minimal-JavaScript web framework that relies on Rails for the backend. Developed by @dhh. Do we really need another JavaScript framework?
  • No more third party tracking cookies: The Chrome browser will cease to support 3rd party cookies, in favor of a Google alternative called “federated learning of cohorts” (FLoC). FLoC is a win for privacy; it also tips the balance of power in advertising further in Google’s direction.
  • Will the response to COVID push us into the era of data-driven medicine? If nothing else, it has taught us what’s possible in a limited amount of time.
  • Using neural networks to discover antibiotics isn’t a new idea, but researchers appear to be making progress, both in discovering new compounds and in discovering new mechanisms.
  • mRNA (messenger RNA), the basis for the Pfizer and Moderna COVID vaccines, is useful for many applications.  There’s potential for treating genetic diseases (like Sickle Cell Anemia), HIV, and many other conditions. COVID may have pushed us to make a great leap forward.
  • Spinach that can send email: The spinach has been engineered to detect certain chemicals in the ground, and send a wireless signal to sensors. This clearly has big implications for precision agriculture and for an internet of (living) things.
  • Computing touch: ShadowSense can give soft robots the ability to sense “touch” using cameras to analyze shadows. Does this pave the way for interactions between robots and humans?
  • France is requiring manufacturers to publish a “repairability index” with their products to minimize the creation of waste.
  • Elon Musk’s Neuralink hopes to begin human trials of (wireless) direct computer-brain interfaces this year. Fixing neurological conditions like paralysis is one thing, but I worry about unmediated social media.
  • Beyond 5G: Transceivers for digital communication in the 300 GHz band. That is largely unexplored (and unused) territory.
  • More supply chain attacks: a researcher demonstrated that it was possible to insert code into corporate projects by uploading modules to package managers (npm, Ruby Gems, pip) that matched the names of internal packages.
  • API-first brings its own set of security problems. APIs by definition have large attack surfaces. Developers need a better understanding of security basics, along with better systems for detecting attacks in real-time.
  • China is way ahead of the rest of the world (including the US) in developing a virtual currency.  Their goal is to replace the dollar and become the standard currency for the international monetary system.  This would give them a highly detailed view of the flow of money (down to individual transactions), both within China and internationally. However, China also sees virtual currency as a means of increasing social control within the country. These two objectives seem mutually exclusive. It will be interesting to see how they bridge the gap.
  • Major credit cards and other institutions are beginning to adopt cryptocurrencies for payment.
  • The GameStop short squeeze is a new phenomenon: a meme threatening Wall Street. Whether it was just a meme that grew without control, or an intentional movement to punish hedge funds, it’s once again apparent that social media can break the assumptions on which “business as usual” depends.
Quantum Computing
  • The biggest problem facing quantum computing is error correction.  Is a fault-tolerant quantum computer possible?  Some intriguing results show that it might be.
Categories: Technology

Product Management for AI

O'Reilly Radar - Fri, 2021/02/26 - 12:40

A couple of years ago, Pete Skomoroch, Roger Magoulas, and I talked about the problems of being a product manager for an AI product. We decided that would be a good topic for an article, and possibly more.

After Pete and I wrote the first article for O’Reilly Radar, it was clear that there was “more”–a lot more. We then added Justin Norman, VP of Data Science at Yelp, to the team. Justin did the lion’s share of the work from that point on. He has a great perspective on product management and AI, with deep practical experience with real-world products: not just building and deploying them, but shepherding them through the process from the initial idea to maintaining them after deployment–including interfacing with management.

Many organizations start AI projects, but relatively few of those projects make it to production.  These articles show you how to minimize your risk at every stage of the project, from initial planning through to post-deployment monitoring and testing.  We’ve said that AI projects are inherently probabilistic. That’s true at every stage of the process.  But there’s no better way to maximize your probability of success than to understand the challenges you’ll face.


Product Management for AI


What you need to know about product management for AI
Practical Skills for the AI Product Manager
Bringing an AI Product to Market

Categories: Technology

5 things on our data and AI radar for 2021

O'Reilly Radar - Fri, 2021/02/19 - 08:23

Here are some of the most significant themes we see as we look toward 2021. Some of these are emerging topics and others are developments on existing concepts, but all of them will inform our thinking in the coming year.


MLOps

MLOps attempts to bridge the gap between Machine Learning (ML) applications and the CI/CD pipelines that have become standard practice. ML presents a problem for CI/CD for several reasons. The data that powers ML applications is as important as code, making version control difficult; outputs are probabilistic rather than deterministic, making testing difficult; training a model is processor intensive and time consuming, making rapid build/deploy cycles difficult. None of these problems are unsolvable, but developing solutions will require substantial effort over the coming years.
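One common response to the testing difficulty just described is to assert on aggregate metrics over a fixed, seeded evaluation set rather than on exact outputs; everything in this sketch (the model, the data, the threshold) is an illustrative placeholder:

```python
# Sketch of a CI-style check for a probabilistic model: because outputs
# vary run to run, the test pins the random seed and asserts on an
# aggregate metric (accuracy over a fixed eval set), not exact outputs.
import random

def toy_model(x: float) -> int:
    """Stand-in for a trained classifier: a noisy threshold at 0.5."""
    return int(x + random.gauss(0, 0.05) > 0.5)

def accuracy(model, eval_set):
    hits = sum(model(x) == label for x, label in eval_set)
    return hits / len(eval_set)

random.seed(0)  # pin randomness so the CI check is reproducible
eval_set = [(0.1, 0), (0.2, 0), (0.8, 1), (0.9, 1), (0.95, 1), (0.05, 0)]
acc = accuracy(toy_model, eval_set)
assert acc >= 0.8, f"model regressed: accuracy {acc:.2f} below threshold"
```

The same pattern (seed, fixed evaluation data, metric threshold) is what lets a probabilistic model live inside an otherwise deterministic CI/CD pipeline.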

The Time Is Now to Adopt Responsible Machine Learning

The era in which tech companies had a regulatory “free ride” has come to an end. Data use is no longer a “wild west” in which anything goes; there are legal and reputational consequences for using data improperly. Responsible Machine Learning (ML) is a movement to make AI systems accountable for the results they produce. Responsible ML includes explainable AI (systems that can explain why a decision was made), human-centered machine learning, regulatory compliance, ethics, interpretability, fairness, and building secure AI. Until now, corporate adoption of responsible ML has been lukewarm and reactive at best. In the next year, increased regulation (such as GDPR and CCPA), antitrust actions, and other legal forces will push companies to adopt responsible ML practices.

The Right Solution for Your Data: Cloud Data Lakes and Data Lakehouses

Data lakes have experienced a fairly robust resurgence over the last few years, specifically cloud data lakes. With more businesses migrating their data infrastructure to the cloud, as well as the increase of open source projects driving innovation in cloud data lakes, these will remain on the radar in 2021. Similarly, the data lakehouse, an architecture that features attributes of both the data lake and the data warehouse, gained traction in 2020 and will continue to grow in prominence in 2021. Cloud data warehouse engineering is developing into a particular focus as database solutions move more and more to the cloud.

A Wave of Cloud-Native, Distributed Data Frameworks

Data science grew up with Hadoop and its vast ecosystem.  Hadoop is now last decade’s news, and momentum has shifted to Spark, which now dominates the way Hadoop used to. But there are new challengers out there. New distributed computing frameworks like Ray and Dask are more flexible, and are cloud-native: they make it very simple to move workloads to the cloud.  Both are seeing strong growth. What’s the next platform on the horizon?  We’ll see in the coming year.

Natural Language Processing Advances Significantly

This year, the biggest story in AI was GPT-3, and its ability to generate almost human-sounding prose.  What will that lead to in 2021? There are many possibilities, ranging from interactive assistants and automated customer service to automated fake news. Looking at GPT-3 more closely, here are the questions you should be asking. GPT-3 is being delivered via an API, not by incorporating the model directly into applications. Is “Language-as-a-service” the future? GPT-3 is great at creating English text, but has no concept of common sense or even facts; for example, it has recommended suicide as a cure for depression.  Can more sophisticated language models overcome those limitations?  GPT-3 reflects the biases and prejudices that are built into languages. How are those to be overcome, and is that the responsibility of the model or of the application developers?  GPT-3 is the most exciting development to appear during the last year; in 2021, our attention will remain focused on it and its successors. We can’t help but be excited (and maybe a little scared) by GPT-4.

O’Reilly’s online learning platform can give your employees the resources they need to upskill and stay up to date on AI, data and hundreds of other technology and business topics. Request a demo.

Categories: Technology

5 infrastructure and operations trends to watch in 2021

O'Reilly Radar - Fri, 2021/02/19 - 08:22

Change is the only constant in the technology world, and that’s particularly true in the realm of sysops, infrastructure, and security. Here’s a look ahead to 2021 and five of the trends we’re watching closely.

Kubernetes Complexity

To say that Kubernetes is the leading container orchestration platform is an understatement.  It’s the only container orchestration platform that counts. However, to say that Kubernetes is complex is also an understatement.  The learning curve is both steep and long. How will developers address Kubernetes’ complexity? We are starting to see some simpler alternatives for specific use cases–for example, K3S for edge computing.  Will that trend continue?  Or will Kubernetes be subsumed into cloud providers’ management consoles in a way that simplifies the options available to developers (aka “opinionated”)? We don’t know, but we believe that an important trend for the next year will be attempts to simplify cloud orchestration.

Site Reliability and Observability

The sea change in workplace dynamics brought about by COVID-19 has had a parallel effect on the world of site reliability engineering. Companies for whom an online presence was an afterthought are now finding it essential to survival. And they’re also finding it necessary to keep their online presence available 24×7. So we foresee an increase in the demand for site reliability engineers–but beyond that, we expect an emphasis on the tools that SREs need. Look for a heavy emphasis on system observability and its fruits: high-speed, actionable data allowing engineers to understand, prevent, and mitigate outages. Although it’s only part of the story, we’re particularly interested in OpenTelemetry, a vendor-neutral standard for collecting system data. OpenTelemetry promises an array of more refined and calibrated open source tools for observability in the years ahead.
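As a loose illustration of the kind of data observability tooling collects (this is not the actual OpenTelemetry API, just a toy in its spirit), a minimal “span” recorder might look like:

```python
# Toy span recorder, loosely in the spirit of OpenTelemetry traces:
# each span captures a named unit of work and how long it took.
# Real instrumentation exports far richer, structured telemetry.
import time
from contextlib import contextmanager

spans = []

@contextmanager
def span(name: str):
    start = time.perf_counter()
    try:
        yield
    finally:
        # Inner spans finish first, so they are recorded first.
        spans.append({"name": name,
                      "duration_s": time.perf_counter() - start})

with span("handle_request"):
    with span("db_query"):
        time.sleep(0.01)

print([s["name"] for s in spans])  # ['db_query', 'handle_request']
```

Nested spans like these are what let an engineer see, during an incident, that a slow request was slow because of one particular database query.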

GitOps: The Future of DevOps

For a decade or more, the slogan “Infrastructure as Code” has driven efforts to make configuration programmable. We’ve made lots of progress; and perhaps the best example of that progress is Kubernetes, which orchestrates the deployment, creation, and construction of containers. There’s one more piece (for now) to consider: how do you automate the configuration of Kubernetes itself? How do you make new deployments faster, while minimizing the possibility of human error at the same time? That’s achieved by using Git to manage Kubernetes’ configuration files and any other artifacts it needs to run. When anything changes, a Kubernetes operator manages the process of informing Kubernetes and related orchestration tools and gradually pushing the deployed system to the desired state. GitOps may be the ultimate expression of “Infrastructure as Code”; we expect it to have a big impact in the coming year.
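The operator behavior described above is, at heart, a reconciliation loop; this sketch is a drastic simplification of what GitOps tools like Argo CD or Flux do, with plain dictionaries standing in for the Git-held desired state and the observed cluster state:

```python
# Highly simplified sketch of GitOps reconciliation: diff the desired
# state (from Git) against the observed state and emit the actions
# needed to converge. Real operators watch the cluster API instead.

def reconcile(desired: dict, observed: dict) -> dict:
    """Return the actions needed to move observed state toward desired."""
    actions = {}
    for name, spec in desired.items():
        if observed.get(name) != spec:
            actions[name] = ("apply", spec)
    for name in observed:
        if name not in desired:
            actions[name] = ("delete", None)
    return actions

desired = {"web": {"replicas": 3}, "worker": {"replicas": 2}}
observed = {"web": {"replicas": 2}, "old-job": {"replicas": 1}}
print(reconcile(desired, observed))
# web needs an apply (replicas 2 -> 3), worker is missing, old-job is pruned
```

Because the loop runs continuously, any drift (a manual change, a failed node) is corrected automatically, and Git remains the single source of truth.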

Cyber Resilience

Saying that cyber threats will increase, and that attacks will become more dangerous, hardly qualifies as a prediction or a trend. The sophisticated cyber attacks that compromised the U.S. Treasury, Commerce, and Homeland Security departments are, sadly, hardly surprising. What’s more important is how organizations respond to those threats. In the past, most companies have taken a reactive approach to security: address breaches as they happen, and if nothing happens, they’ve spent too much on security. That approach has failed time and time again. In the coming year, we expect companies to take a dynamic, holistic approach that strengthens their security posture. Steps toward resilience include adopting a robust Identity and Access Management policy that implements zero trust, MFA, and passwordless authentication. Expect to see increased use of AI and Machine Learning (ML) by both good and bad actors. Bad actors will use AI to find and exploit new vulnerabilities (including vulnerabilities in AI systems themselves); security teams will use AI and ML tools to detect and block attacks, and to automate routine tasks.

Multi-cloud and Hybrid Clouds

It’s too easy to think of “the cloud” as a place: a single virtual location, with a single provider. But that’s not reality.  As IBM has said frequently, the cloud is a capability, not a destination.  By the time most companies start thinking seriously about a “cloud strategy,” they already have pilot projects in multiple clouds.  Mergers and acquisitions complicate the situation even more, as does data that has to remain on-premises for regulatory or security reasons. What counts isn’t moving applications to a specific provider, but having a uniform interface that lets you use capabilities regardless of their physical location. 2021 will be the year that companies (officially) adopt multi- and hybrid clouds, removing the operational and developmental barriers between their own, on-premises IT and cloud providers. We will discover what it really means to be “cloud-native.”

O’Reilly’s online learning platform can give your employees the resources they need to upskill and stay up to date on Kubernetes, SRE, cybersecurity, cloud, and hundreds of other technology and business topics. Request a demo.

Categories: Technology

The Wrong Question

O'Reilly Radar - Tue, 2021/02/09 - 05:19

“If they can get you asking the wrong questions, they don’t have to worry about answers.”

Thomas Pynchon, Gravity’s Rainbow

The deplatforming of Donald Trump and his alt-right coterie has led to many discussions of free speech.  Some of the discussions make good points, most don’t, but it seems to me that all of them miss the real point.  We shouldn’t be discussing “speech” at all; we should be discussing the way social platforms amplify certain kinds of speech.

What is free speech, anyway?  In a strictly legal sense, “free speech” is only a term that makes sense in the context of government regulation. The First Amendment to the US constitution says that the government can’t pass a law that restricts your speech. And neither Twitter nor Facebook are the US government, so whatever they do to block content isn’t a “free speech” issue, at least strictly interpreted.

Admittedly, that narrow view leaves out a lot. Both the right and the left can agree that we don’t really want Zuck or @jack determining what kinds of speech are legitimate. And most of us can agree that there’s a time when abstract principles have to give way to concrete realities, such as terrorists storming the US capitol building. That situation resulted from years of abusive speech that the social platforms had ignored, so that when the corporate power finally stepped in, their actions were too little, too late.

But as I said, the focus on “free speech” misframes the issue. The important issue here isn’t speech itself; it’s how and why speech is amplified—an amplification that can be used to drown out or intimidate other voices, or to selectively amplify voices for reasons that may be well-intended, self-interested, or even hostile to the public interest. The discussion we need, the discussion of amplification and its implications, has largely been supplanted by arguments about “free speech.”

The US Constitution also guarantees a “free press,” likewise in the First Amendment. A free press is important because the press has the power of replication: of taking speech and making it available more broadly. In the 18th, 19th, and 20th centuries, that largely meant newspapers, which had the ability to reproduce tens of thousands of copies overnight. But freedom of the press has an important limitation. Anyone can talk, but to have freedom of the press you have to have a press–whether that’s a typewriter and a mimeograph, or all the infrastructure of a publisher like The New York Times, CNN, or Fox News. And being a “press” has its own constraints: an editorial staff, an editorial policy, and so on. Because they’re in the business of replication, it’s probably more correct to think of Twitter and Facebook as exercising “press” functions.

But what is the editorial function for Facebook, Twitter, YouTube, and most other social media platforms? There isn’t an editor who decides whether your writing is insightful. There’s no editorial viewpoint. There’s only the shallowest attempt to verify facts. The editorial function is driven entirely by the desire to increase engagement, and this is done algorithmically. And what algorithms have “learned” perhaps isn’t surprising: showing people content that makes them angry is the best way to keep them coming back for more. And the more they come back, the more ads are clicked, and the more income flows in. Over the past few years, that editorial strategy has certainly played into the hands of the alt-right and neo-Nazi groups, who learned quickly how to take advantage of it. Nor have left-leaning polemicists missed the opportunity. The battle of overheated rhetoric has cheapened the public discourse and made consensus almost unattainable. Indeed, it has made attention itself unattainable: and, as Peter Wang has argued, scarcity of attention–particularly the “synchronous attention of a group”–is the biggest problem we face, because it rules out thoughtful consensus.
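The dynamic just described is easy to see in a deliberately crude sketch; the posts and scoring below are invented, but any ranker that counts all reactions equally as “engagement” behaves this way:

```python
# Deliberately crude sketch of engagement-driven ranking: a feed that
# optimizes only for total reactions will surface whatever provokes the
# strongest response. The posts and weights are invented for illustration.

posts = [
    {"text": "thoughtful policy analysis", "likes": 40, "angry_reacts": 2},
    {"text": "inflammatory hot take", "likes": 25, "angry_reacts": 90},
]

def engagement_score(post):
    # The optimizer does not care whether a reaction is anger or
    # appreciation -- every reaction counts as "engagement".
    return post["likes"] + post["angry_reacts"]

feed = sorted(posts, key=engagement_score, reverse=True)
print(feed[0]["text"])  # the inflammatory post wins
```

No one has to decide to promote anger; ranking on an anger-blind engagement metric is enough.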

Again, that’s been discussed many times over the past few years, but we seem to have lost that thread. We’ve had reproduction—we’ve had a press—but with the worst possible kind of editorial values. There are plenty of discussions of journalistic values and ethics that might be appropriate; but an editorial policy that has no other value than increasing engagement doesn’t even pass the lowest bar. And that editorial policy has left the user communities of Facebook, Twitter, YouTube, and other media vulnerable to deafening feedback loops.

Social media feedback loops can be manipulated in many ways: by automated systems that reply or “like” certain kinds of content, as well as by individual users who can also reply and “like” by the thousands.  And those loops are aided by the platforms’ recommendation systems: either by recommending specific inflammatory posts, or by recommending that users join specific groups. An internal Facebook report showed that, by their own reckoning, 70% of all “civic” groups on Facebook contained “hate speech, misinformation, violent rhetoric, or other toxic behavior”; and the company has been aware of that since 2016.

So where are we left?  I would rather not have Zuck and @jack determine what kinds of speech are acceptable. That’s not the editorial policy we want.  And we certainly need protections for people saying unpopular things on social media; eliminating those protections cuts both ways. What needs to be controlled is different altogether: it’s the optimization function that maximizes engagement, measured by time spent on the platform. And we do want to hold Zuck and @jack responsible for that optimization function, just as we want the publisher of a newspaper or a television news channel to be responsible for the headlines they write and what they put on their front page.

Simply stripping Section 230 protection strikes me as irrelevant to dealing with what Shoshana Zuboff terms an “epistemic coup.” Is the right solution to do away with algorithmic engagement enhancement entirely?  Facebook’s decision to stop recommending political groups to users is a step forward. But they need to go much farther in stripping algorithmic enhancement from their platform. Detecting bots would be a start; a better algorithm for “engagement,” one that promotes well-being rather than anger, would be a great ending point. As Apple CEO Tim Cook, clearly thinking about Facebook, recently said, “A social dilemma cannot be allowed to become a social catastrophe…We believe that ethical technology is technology that works for you… It’s technology that helps you sleep, not keeps you up. It tells you when you’ve had enough. It gives you space to create or draw or write or learn, not refresh just one more time.”  This reflects Apple’s values rather than Facebook’s (and one would do well to reflect on Facebook’s origins at Harvard); but it is leading towards the right question.

Making people angry might increase shareholder value short-term. But that probably isn’t a sustainable business; and if it is, it’s a business that does incredible social damage. The “solution” isn’t likely to be legislation; I can’t imagine laws that regulate algorithms effectively, and that can’t be gamed by people who are willing to work hard to game them. I guarantee that those people are out there. We can’t say that the solution is to “be better people,” because there are plenty of people who don’t want to be better; just look at the reaction to the pandemic. Just look at the frustration of the many Facebook and Twitter employees who realized that the time to lay aside abstract principles like “free speech” was long before the election.

We could perhaps return to the original idea of “incorporation,” when incorporation meant a “body created by law for the purpose of attaining public ends through an appeal to private interests”–one of Zuboff’s solutions is to “tie data collection to fundamental rights and data use to public services.” However, that would require legal bodies that made tough decisions about whether corporations were indeed working towards “public ends.”  As Zuboff points out earlier in her article, it’s easy to look to antitrust, but the Sherman Antitrust Act was largely a failure.  Would courts ruling on “public ends” be any different?

In the end, we will get the social media we deserve. And that leads to the right question. How do we build social media that maintains social good, rather than destroying it?  What kinds of business models are needed to support that kind of social good, rather than merely maximizing shareholder value?

Categories: Technology

Radar trends to watch: February 2021

O'Reilly Radar - Mon, 2021/02/01 - 07:54

A lot happened in the last month, and not just in Washington. Important developments appeared all through the technology world. Perhaps the most spectacular was the use of Natural Language Processing techniques to analyze viral DNA. It’s actually sort of obvious once you think about it. If DNA is a language, then it should have syntax and semantics. And tools that don’t actually understand the language might have a unique ability to analyze it.

AI
  • Can a fruit fly learn word embeddings? Researchers have modelled the structure of the portion of a fruit-fly’s brain that is used for smell perception, and trained that model for natural language processing. It appears to work, and requires a fraction of the training time and power used by current approaches.
  • Facebook is using AI to generate verbal descriptions of photos, which can then be read back to blind or vision-impaired users.  This application combines image recognition, concept recognition, natural language processing, and voice synthesis.
  • To train AI systems to evaluate pain in Black patients correctly, don’t train them to match doctors’ evaluations; doctors systematically undervalue pain in Black patients. Take the patient’s assessment as truth. This shouldn’t need to be said, but it’s important that it has been said.
  • The Allen Institute’s Genie is a human-in-the-loop tool for evaluation of synthetic texts produced by NLP. Genie coordinates the work of crowdsourced humans who annotate NLP output, among other things, standardizing their annotations.
  • Explainability is good, but it only goes part way. Can an AI system teach humans how it solves problems like the Rubik’s Cube? This is a new frontier.
  • Natural language algorithms can detect genetic mutations, specifically in Coronavirus.  Essentially, a mutation looks like a sentence that has changed its meaning. It’s a spectacular example of interdisciplinary AI applications.
  • Startups are offering tools to help monitor and audit AI systems for ethical issues like fairness and bias. This kind of business has been needed for a long time; Cathy O’Neil has been a pioneer in AI auditing. The world may be ready now.
  • Generating pictures from descriptive text with GPT-3: Another tour de force. If you can imagine or describe something, DALL·E and CLIP might be able to draw it.  (Still, I’d like to know how many bad drawings they had to discard before they came up with the avocado armchair.)
  • Expressive Robotics: robots that understand (and can create) human expressions. This is an important step in making interactions between humans and robots less creepy. But more than that, it can be a safety issue. Can an autonomous vehicle read the expressions of bicyclists and pedestrians and use that to make predictions about what they will do?
  • RoboGrammar describes robot designs in a way that makes the physical design programmable and highly adaptable to different environments and applications.  It’s a step towards machine-learning based design tools for robotics.
Security and Privacy
  • A government (probably North Korea) is targeting security researchers in the US and elsewhere, using various forms of social engineering (including asking researchers to collaborate on a research project) and malware.
  • Guerilla tactics in the struggle against online surveillance: MIT Technology Review does a study of Ad Nauseam, a browser extension to create random ad clicks, and its effectiveness. The goal of Ad Nauseam isn’t so much to protect individuals, though it may do that; it’s to disrupt the entire advertising ecosystem.
Programming
  • The parent of all low-code languages, Excel, gets user-definable functions. It’s now Turing-complete. Not just that, it’s a functional language.  (That is not a pun; functions are true lambdas.)
  • The new Raspberry Pi Pico is a $4 microcontroller board that can be used for almost any kind of project. It’s very cheap, widely available, and programmable in MicroPython.
  • Distributed systems from the command line: Posh is a data-aware shell that can send data-intensive processing tasks off to remote systems. It works by adding metadata to common UNIX commands for working with files.
  • Continuous documentation? Continuous all the things! Integrating tutorial style documentation (and testing of documentation) with CI pipelines as a way of helping new hires is certainly a new idea. Very few companies take documentation seriously; could this be the start of a trend?
  • RStudio is continuing to incorporate new features to support Python, including support for VSCode and Jupyter Notebooks. RStudio looks like it’s positioning itself as a general purpose (not language-specific) platform for data development.
  • Julia adoption continues to grow. We’ve been watching Julia for a long time. It’s not going to displace Python or R in the near future, but it has definitely become a contender.
Biology and Medicine
  • The Pandemic Technology Project is evaluating how tools like exposure-tracking apps and algorithms for determining who should get vaccinations are working in practice.
  • Senti Bio is building tools to make biology programmable: literally building control flow into genetic circuits, with the goal of programming better vaccines and other drugs. This is the future that synthetic biology has been looking for: is it possible to build medications that actually incorporate complex logic?
  • Japan’s COINS center is working on several moon-shot projects for medicine. “A hospital in every body” aims to develop organic nanomachines that live permanently in your body and can treat diseases autonomously, or send data outside for diagnosis and treatment planning.
Web
  • Is the future of social media audio?  It’s an interesting thesis; even though Clubhouse has gotten poor reviews, there’s a new wave of apps designed for audio-based social networking. Discord is well-positioned; safety and content moderation are issues.
  • TabFS is a browser extension that allows you to mount your browser tabs as a filesystem. While this seems like gratuitous hackery, it means that just about everything a browser can do becomes scriptable with standard Unix/Linux utilities.
  • WebAssembly Studio: There’s not much information, but this is clearly some kind of IDE for working with wasm, with support for projects in C, Rust, AssemblyScript, and Wat (whatever that is). The existence of an IDE is more important than the IDE itself; wasm won’t succeed unless tools become widely available.
Quantum Computing
  • Quantum algorithms for nonlinear dynamics: We are slowly expanding the domain in which quantum computers will be able to deliver useful results. Nonlinear systems show up everywhere (fluid flow, hence weather, for example), but are extremely difficult to model with classical techniques.
  • Towards a quantum internet: Next generation networking might be based on the recent advances in quantum teleportation.
Miscellaneous
  • Are hologram capabilities the next step in smartphones?  Hologram projectors don’t require goggles or headsets; they could be what makes “virtual reality” real.
  • Google employees unionize: This union is less about collective bargaining than about social issues, and about non-employee staff (contractors, etc.) who don’t benefit from traditional collective bargaining.

Where Programming, Ops, AI, and the Cloud are Headed in 2021

O'Reilly Radar - Mon, 2021/01/25 - 05:03

In this report, we look at the data generated by the O’Reilly online learning platform to discern trends in the technology industry—trends technology leaders need to follow.

But what are “trends”? All too often, trends degenerate into horse races over languages and platforms. Look at all the angst heating up social media when TIOBE or RedMonk releases their reports on language rankings. Those reports are valuable, but their value isn’t in knowing what languages are popular in any given month. And that’s what I’d like to get to here: the real trends that aren’t reflected (or at best, are indirectly reflected) by the horse races. Sometimes they’re only apparent if you look carefully at the data; sometimes it’s just a matter of keeping your ear to the ground.

In either case, there’s a difference between “trends” and “trendy.” Trendy, fashionable things are often a flash in the pan, forgotten or regretted a year or two later (like Pet Rocks or Chia Pets). Real trends unfold on much longer time scales and may take several steps backward during the process: civil rights, for example. Something is happening and, over the long arc of history, it’s not going to stop. In our industry, cloud computing might be a good example.


This study is based on title usage on O’Reilly online learning. The data includes all usage of our platform, not just content that O’Reilly has published, and certainly not just books. We’ve explored usage across all publishing partners and learning modes, from live training courses and online events to interactive functionality provided by Katacoda and Jupyter notebooks. We’ve included search data in the graphs, although we have avoided using search data in our analysis. Search data is distorted by how quickly customers find what they want: if they don’t succeed, they may try a similar search with many of the same terms. (But don’t even think of searching for R or C!) Usage data shows what content our members actually use, though we admit it has its own problems: usage is biased by the content that’s available, and there’s no data for topics that are so new that content hasn’t been developed.

We haven’t combined data from multiple terms. Because we’re doing simple pattern matching against titles, usage for “AWS security” is a subset of the usage for “security.” We made a (very) few exceptions, usually when there are two different ways to search for the same concept. For example, we combined “SRE” with “site reliability engineering,” and “object oriented” with “object-oriented.”
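As a minimal sketch (with hypothetical titles and a made-up synonym table, not our actual pipeline), the simple title pattern matching and synonym merging described above might look like:

```python
# Hypothetical usage records: one title per platform interaction.
titles = [
    "AWS Security Best Practices",
    "Site Reliability Engineering",
    "SRE with Kubernetes",
    "Object-Oriented Design",
    "Object Oriented Analysis",
]

# Terms whose counts are merged because they name the same concept.
SYNONYMS = {
    "site reliability engineering": "sre",
    "object oriented": "object-oriented",
}

def count_topic(topic, titles):
    """Count titles containing the topic (simple substring match),
    folding in any synonym spellings that map to this topic."""
    variants = {topic} | {s for s, c in SYNONYMS.items() if c == topic}
    return sum(any(v in title.lower() for v in variants) for title in titles)

print(count_topic("security", titles))         # 1
print(count_topic("aws security", titles))     # 1 (a subset of "security")
print(count_topic("sre", titles))              # 2 (merged with the long form)
print(count_topic("object-oriented", titles))  # 2 (merged spellings)
```

Note how the “AWS security” count is necessarily a subset of the “security” count, which is exactly why most terms aren’t combined.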

The results are, of course, biased by the makeup of the user population of O’Reilly online learning itself. Our members are a mix of individuals (professionals, students, hobbyists) and corporate users (employees of a company with a corporate account). We suspect that the latter group is somewhat more conservative than the former. In practice, this means that we may have less meaningful data on the latest JavaScript frameworks or the newest programming languages. New frameworks appear every day (literally), and our corporate clients won’t suddenly tell their staff to reimplement the ecommerce site just because last year’s hot framework is no longer fashionable.

Usage and query data for each group are normalized to the highest value in each group. Practically, this means that you can compare topics within a group, but you can’t compare the groups with each other. Year-over-year (YOY) growth compares January through September 2020 with the same months of 2019. Small fluctuations (under 5% or so) are likely to be noise rather than a sign of a real trend.
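To make the normalization and year-over-year calculation concrete, here’s a sketch with made-up usage counts (chosen to echo the growth figures discussed later, not our actual data):

```python
# Hypothetical raw usage counts for topics in one group.
usage_2019 = {"python": 80_000, "java": 70_000, "rust": 5_000}
usage_2020 = {"python": 101_600, "java": 67_900, "rust": 9_700}

# Normalize to the highest value in the group: topics within a group
# are comparable, but groups can't be compared with each other.
peak = max(usage_2020.values())
normalized = {t: v / peak for t, v in usage_2020.items()}

# Year-over-year growth compares the same months across years.
yoy = {t: (usage_2020[t] - usage_2019[t]) / usage_2019[t] for t in usage_2019}

for topic in usage_2019:
    print(f"{topic}: normalized {normalized[topic]:.2f}, YOY {yoy[topic]:+.0%}")
```

With these numbers, Python normalizes to 1.00 with +27% YOY growth, Java shows a -3% decline, and Rust grows +94% from its small base.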

Enough preliminaries. Let’s look at the data, starting at the highest level: O’Reilly online learning itself.

O’Reilly Online Learning

Usage of O’Reilly online learning grew steadily in 2020, with 24% growth since 2019. That may not be surprising, given the COVID-19 pandemic and the resulting changes in the technology industry. Companies that once resisted working from home were suddenly shutting down their offices and asking their staff to work remotely. Many have said that remote work will remain an option indefinitely. COVID had a significant effect on training: in-person training (whether on- or off-site) was no longer an option, so organizations of all sizes increased their participation in live online training, which grew by 96%. More traditional modes also saw increases: usage of books increased by 11%, while videos were up 24%. We also added two new learning modes, Katacoda scenarios and Jupyter notebooks, during the year; we don’t yet have enough data to see how they’re trending.

It’s important to place our growth data in this context. We frequently say that 10% growth in a topic is “healthy,” and we’ll stand by that, but remember that O’Reilly online learning itself showed 24% growth. So while a technology whose usage is growing 10% annually is healthy, it’s not keeping up with the platform.
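The arithmetic behind “healthy but not keeping up” is worth spelling out (hypothetical counts): a topic growing 10% on a platform growing 24% is actually losing share.

```python
platform_2019, platform_2020 = 1_000_000, 1_240_000  # platform up 24% overall
topic_2019 = 50_000
topic_2020 = topic_2019 * 1.10                       # topic up a "healthy" 10%

share_2019 = topic_2019 / platform_2019
share_2020 = topic_2020 / platform_2020

print(f"share 2019: {share_2019:.2%}")  # 5.00%
print(f"share 2020: {share_2020:.2%}")  # 4.44%
```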

As travel ground to a halt, so did traditional in-person conferences. We closed our conference business in March, replacing it with live virtual Superstreams. While we can’t compare in-person conference data with virtual event data, we can make a few observations. The most successful superstream series focused on software architecture and infrastructure and operations. Why? The in-person O’Reilly Software Architecture Conference was small but growing. But when the pandemic hit, companies found out that they really were online businesses—and if they weren’t, they had to become online to survive. Even small restaurants and farm markets were adding online ordering features to their websites. Suddenly, the ability to design, build, and operate applications at scale wasn’t optional; it was necessary for survival.

Programming Languages

Although we’re not fans of the language horse race, programming languages are as good a place as any to start. Figure 1 shows usage, year-over-year growth in usage, and the number of search queries for several popular languages. The top languages for O’Reilly online learning are Python (up 27%), Java (down 3%), C++ (up 10%), C (up 12%), and JavaScript (up 40%). Looking at 2020 usage rather than year-over-year changes, it’s surprising to see JavaScript so far behind Python and Java. (JavaScript usage is 20% of Python’s, and 33% of Java’s.)

Past the top five languages, we see healthy growth in Go (16%) and Rust (94%). Although we believe that Rust’s popularity will continue to grow, don’t get too excited; it’s easy to grow 94% when you’re starting from a small base. Go has clearly established itself, particularly as a language for concurrent programming, and Rust is likely to establish itself for “system programming”: building new operating systems and tooling for cloud operations. Julia, a language designed for mathematical computation, is an interesting wild card. It’s slightly down over the past year, but we’re optimistic about its long term chances.

Figure 1. Programming languages

We shouldn’t separate usage of titles specifically aimed at learning a programming language from titles applying the language or using frameworks based on it. After all, many Java developers use Spring, and searching for “Java” misses content that only has the word “Spring” in the title. The same is true for JavaScript, with the React, Angular, and Node.js frameworks. With Python, the most heavily used libraries are PyTorch and scikit-learn. Figure 2 shows what happens when you add the use of content about Python, Java, and JavaScript to the most important frameworks for those languages.

Figure 2. Programming languages and frameworks combined

It probably isn’t a surprise that the results are similar, but there are some key differences. Adding usage and search query data for Spring (up 7%) reverses Java’s apparent decline (net-zero growth). Zero growth isn’t inappropriate for an established enterprise language, particularly one owned by a company that has mired the language in controversy. Looking further at JavaScript, if you add in usage for the most popular frameworks (React, Angular, and Node.js), JavaScript usage on O’Reilly online learning rises to 50% of Python’s, only slightly behind Java and its frameworks. However, Python, when added to the heavily used frameworks PyTorch and scikit-learn, remains the clear leader.

It’s important to understand what we’ve done though. We’re trying to build a more comprehensive picture of language use that includes the use of various frameworks. We’re not pretending the frameworks themselves are comparable—Spring is primarily for backend and middleware development (though it includes a web framework); React and Angular are for frontend development; and scikit-learn and PyTorch are machine learning libraries. And although it’s widely used, we didn’t assign TensorFlow to any language; it has bindings for Python, Java, C++, and JavaScript, and it’s not clear which language predominates. (Google Trends suggests C++.) We also ignored thousands (literally) of minor platforms, frameworks, and libraries for all these languages; once you get past the top few, you’re into the noise.

We aren’t advocating for Python, Java, or any other language. None of these top languages are going away, though their stock may rise or fall as fashions change and the software industry evolves. We’re just saying that when you make comparisons, you have to be careful about exactly what you’re comparing. The horse race? That’s just what it is. Fun to watch, and have a mint julep when it’s over, but don’t bet your savings (or your job) on it.

If the horse race isn’t significant, just what are the important trends for programming languages? We see several factors changing programming in significant ways:

  • Multiparadigm languages
    Since last year, O’Reilly online learning has seen a 14% increase in the use of content on functional programming. However, Haskell and Erlang, the classic functional languages, aren’t where the action is; neither shows significant usage, and both are headed down (roughly 20% decline year over year). Object-oriented programming is up even more than functional programming: 29% growth since last year. This suggests that the real story is the integration of functional features into procedural and object-oriented languages. Starting with Python 3.0 in 2008 and continuing with Java 8 in 2014, programming languages have added higher-order functions (lambdas) and other “functional” features. Several popular languages (including JavaScript and Go) have had functional features from the beginning. This trend started over 20 years ago (with the Standard Template Library for C++), and we expect it to continue.
  • Concurrent programming
    Platform data for concurrency shows an 8% year-over-year increase. This isn’t a large number, but don’t miss the story because the numbers are small. Java was the first widely used language to support concurrency as part of the language. In the mid-’90s, thread support was a luxury; Moore’s law had plenty of room to grow. That’s no longer the case, and support for concurrency, like support for functional programming, has become table stakes. Go, Rust, and most other modern languages have built-in support for concurrency. Concurrency has always been one of Python’s weaknesses.
  • Dynamic versus static typing
    This is another important paradigmatic axis. The distinction between languages with dynamic typing (like Ruby and JavaScript) and statically typed languages (like Java and Go) is arguably more important than the distinction between functional and object-oriented languages. Not long ago, the idea of adding static typing to dynamic languages would have started a brawl. No longer. Combining paradigms to form a hybrid is taking hold here too. Python 3.5 added type hinting, and more recent versions have added additional static typing features. TypeScript, which adds static typing to JavaScript, is coming into its own (12% year-over-year increase).
  • Low-code and no-code computing
    It’s hard for a learning platform to gather data about a trend that minimizes the need to learn, but low-code is real and is bound to have an effect. Spreadsheets were the forerunner of low-code computing. When VisiCalc was first released in 1979, it enabled millions to do significant and important computation without learning a programming language. Democratization is an important trend in many areas of technology; it would be surprising if programming were any different.
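Several of these trends show up in a few lines of modern Python, which mixes object-oriented structure, functional features, and optional static typing in one program (an illustrative sketch, not platform data):

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable

# Object-oriented: a typed value object.
@dataclass(frozen=True)  # immutability, a discipline borrowed from FP
class Order:
    item: str
    price: float

# Functional: higher-order functions and lambdas, annotated with the
# type hints that Python 3.5+ grafted onto a dynamically typed language.
def total(orders: list[Order], keep: Callable[[Order], bool]) -> float:
    return reduce(lambda acc, o: acc + o.price, filter(keep, orders), 0.0)

orders = [Order("book", 25.0), Order("video", 40.0), Order("book", 15.0)]
print(total(orders, lambda o: o.item == "book"))  # 40.0
```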

What’s important isn’t the horse race so much as the features that languages are acquiring, and why. Given that we’ve run to the end of Moore’s law, concurrency will be central to the future of programming. We can’t just get faster processors. We’ll be working with microservices and serverless/functions-as-a-service in the cloud for a long time–and these are inherently concurrent systems. Functional programming doesn’t solve the problem of concurrency—but the discipline of immutability certainly helps avoid pitfalls. (And who doesn’t love first-class functions?) As software projects inevitably become larger and more complex, it makes eminent sense for languages to extend themselves by mixing in functional features. We need programmers who are thinking about how to use functional and object-oriented features together; what practices and patterns make sense when building enterprise-scale concurrent software?

Low-code and no-code programming will inevitably change the nature of programming and programming languages:

  • There will be new languages, new libraries, and new tools to support no- or low-code programmers. They’ll be very simple. (Horrors, will they look like BASIC? Please no.) Whatever form they take, it will take programmers to build and maintain them.
  • We’ll certainly see sophisticated computer-aided coding as an aid to experienced programmers. Whether that means “pair programming with a machine” or algorithms that can write simple programs on their own remains to be seen. These tools won’t eliminate programmers; they’ll make programmers more productive.

There will be a predictable backlash against letting the great unwashed into the programmers’ domain. Ignore it. Low-code is part of a democratization movement that puts the power of computing into more people’s hands, and that’s almost always a good thing. Programmers who realize what this movement means won’t be put out of jobs by nonprogrammers. They’ll be the ones becoming more productive and writing the tools that others will use.

Whether you’re a technology leader or a new programmer, pay attention to these slow, long-term trends. They’re the ones that will change the face of our industry.

Operations or DevOps or SRE

The science (or art) of IT operations has changed radically in the last decade. There’s been a lot of discussion about operations culture (the movement frequently known as DevOps), continuous integration and deployment (CI/CD), and site reliability engineering (SRE). Cloud computing has replaced data centers, colocation facilities, and in-house machine rooms. Containers allow much closer integration between developers and operations and do a lot to standardize deployment.

Operations isn’t going away; there’s no such thing as NoOps. Technologies like Function as a Service (a.k.a. FaaS, a.k.a. serverless, a.k.a. AWS Lambda) only change the nature of the beast. The number of people needed to manage an infrastructure of a given size has shrunk, but the infrastructures we’re building have expanded, sometimes by orders of magnitude. It’s easy to round up tens of thousands of nodes to train or deploy a complex AI application. Even if those machines are all in Amazon’s giant data centers and managed in bulk using highly automated tools, operations staff still need to keep systems running smoothly, monitoring, troubleshooting, and ensuring that you’re not paying for resources you don’t need. Serverless and other cloud technologies allow the same operations team to manage much larger infrastructures; they don’t make operations go away.

The terminology used to describe this job fluctuates, but we don’t see any real changes. The term “DevOps” has fallen on hard times. Usage of DevOps-titled content in O’Reilly online learning has dropped by 17% in the past year, while SRE (including “site reliability engineering”) has climbed by 37%, and the term “operations” is up 25%. While SRE and DevOps are distinct concepts, for many customers SRE is DevOps at Google scale–and who doesn’t want that kind of growth? Both SRE and DevOps emphasize similar practices: version control (62% growth for GitHub, and 48% for Git), testing (high usage, though no year-over-year growth), continuous deployment (down 20%), monitoring (up 9%), and observability (up 128%). Terraform, HashiCorp’s open source tool for automating the configuration of cloud infrastructure, also shows strong (53%) growth.

Figure 3. Operations, DevOps, and SRE

It’s more interesting to look at the story the data tells about the tools. Docker is close to flat (5% decline year over year), but usage of content about containers skyrocketed by 99%. So yes, containerization is clearly a big deal. Docker itself may have stalled—we’ll know more next year—but Kubernetes’s dominance as the tool for container orchestration keeps containers central. Docker was the enabling technology, but Kubernetes made it possible to deploy containers at scale.

Kubernetes itself is the other superstar, with 47% growth, along with the highest usage (and the most search queries) in this group. Kubernetes isn’t just an orchestration tool; it’s the cloud’s operating system (or, as Kelsey Hightower has said, “Kubernetes will be the Linux of distributed systems”). But the data doesn’t show the number of conversations we’ve had with people who think that Kubernetes is just “too complex.” We see three possible solutions:

  • A “simplified” version of Kubernetes that isn’t as flexible, but trades off a lot of the complexity. K3s is a possible step in this direction. The question is, What’s the trade-off? Here’s my version of the Pareto principle, also known as the 80/20 rule. Given any system (like Kubernetes), it’s usually possible to build something simpler by keeping the most widely used 80% of the features and cutting the other 20%. And some applications will fit within the 80% of the features that were kept. But most applications (maybe 80% of them?) will require at least one of the features that were sacrificed to make the system simpler.
  • An entirely new approach, some tool that isn’t yet on the horizon. We have no idea what that tool is. In Yeats’s words, “What rough beast…slouches towards Bethlehem to be born”?
  • An integrated solution from a cloud vendor (for example, Microsoft’s open source Dapr distributed runtime). I don’t mean cloud vendors that provide Kubernetes as a service; we already have those. What if the cloud vendors integrate Kubernetes’s functionality into their stack in such a way that that functionality disappears into some kind of management console? Then the question becomes, What features do you lose, and do you need them? And what kind of vendor lock-in games do you want to play?

The rich ecosystem of tools surrounding Kubernetes (Istio, Helm, and others) shows how valuable it is. But where do we go from here? Even if Kubernetes is the right tool to manage the complexity of modern applications that run in the cloud, the desire for simpler solutions will eventually lead to higher-level abstractions. Will they be adequate?

Observability saw the greatest growth in the past year (128%), while monitoring is only up 9%. While observability is a richer, more powerful capability than monitoring—observability is the ability to find the information you need to analyze or debug software, while monitoring requires predicting in advance what data will be useful—we suspect that this shift is largely cosmetic. “Observability” risks becoming the new name for monitoring. And that’s unfortunate. If you think observability is merely a more fashionable term for monitoring, you’re missing its value. Complex systems running in the cloud will need true observability to be manageable.

Infrastructure is code, and we’ve seen plenty of tools for automating configuration. But Chef and Puppet, two leaders in this movement, are both significantly down (49% and 40% respectively), as is Salt. Ansible is the only tool from this group that’s up (34%). Two trends are responsible for this. Ansible appears to have supplanted Chef and Puppet, possibly because Ansible is multilingual, while Chef and Puppet are tied to Ruby. Second, Docker and Kubernetes have changed the configuration game. Our data shows that Chef and Puppet peaked in 2017, when Kubernetes started an almost exponential growth spurt, as Figure 4 shows. (Each curve is normalized separately to 1; we wanted to emphasize the inflection points rather than compare usage.) Containerized deployment appears to minimize the problem of reproducible configuration, since a container is a complete software package. You have a container; you can deploy it many times, getting the same result each time. In reality, it’s never that simple, but it certainly looks that simple–and that apparent simplicity reduces the need for tools like Chef and Puppet.

Figure 4. Docker and Kubernetes versus Chef and Puppet

The biggest challenge facing operations teams in the coming year, and the biggest challenge facing data engineers, will be learning how to deploy AI systems effectively. In the past decade, a lot of ideas and technologies have come out of the DevOps movement: the source repository as the single source of truth, rapid automated deployment, constant testing, and more. They’ve been very effective, but AI breaks the assumptions that lie behind them, and deployment is frequently the greatest barrier to AI success.

AI breaks these assumptions because data is more important than code. We don’t yet have adequate tools for versioning data (though DVC is a start). Models are neither code nor data, and we don’t have adequate tools for versioning models either (though tools like MLflow are a start). Frequent deployment assumes that the software can be built relatively quickly, but training a model can take days. It’s been suggested that model training doesn’t need to be part of the build process, but that’s really the most important part of the application. Testing is critical to continuous deployment, but the behavior of AI systems is probabilistic, not deterministic, so it’s harder to say that this test or that test failed. It’s particularly difficult if testing includes issues like fairness and bias.
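One way to test a probabilistic system is to assert a statistical property over many runs rather than an exact output. A sketch (the “model” here is a random stand-in, not a real trained model):

```python
import random

def model_predict(x):
    """Stand-in for a trained model: correct roughly 90% of the time."""
    return x if random.random() < 0.9 else 1 - x

def test_accuracy(n_samples=10_000, threshold=0.85):
    """Deterministic pass/fail on a probabilistic system: assert that
    aggregate accuracy stays above a threshold, not that any single
    prediction is exact."""
    random.seed(42)  # pin the seed so the test is reproducible in CI
    cases = [random.randint(0, 1) for _ in range(n_samples)]
    correct = sum(model_predict(x) == x for x in cases)
    accuracy = correct / n_samples
    assert accuracy >= threshold, f"accuracy {accuracy:.3f} below {threshold}"
    return accuracy

print(f"accuracy: {test_accuracy():.3f}")
```

Even this sketch sidesteps the harder problems the text raises: choosing thresholds for properties like fairness is far less obvious than choosing one for accuracy.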

Although there is a nascent MLOps movement, our data doesn’t show that people are using (or searching for) content in these areas in significant numbers. Usage is easily explainable; in many of these areas, content doesn’t exist yet. But users will search for content whether or not it exists, so the small number of searches shows that most of our users aren’t yet aware of the problem. Operations staff too frequently assume that an AI system is just another application—but they’re wrong. And AI developers too frequently assume that an operations team will be able to deploy their software, and they’ll be able to move on to the next project—but they’re also wrong. This situation is a train wreck in slow motion, and the big question is whether we can stop the trains before they crash. These problems will be solved eventually, with a new generation of tools—indeed, those tools are already being built—but we’re not there yet.

AI, Machine Learning, and Data

Healthy growth in artificial intelligence has continued: machine learning is up 14%, while AI is up 64%; data science is up 16%, and statistics is up 47%. While AI and machine learning are distinct concepts, there’s enough confusion about definitions that they’re frequently used interchangeably. We informally define machine learning as “the part of AI that works”; AI itself is more research oriented and aspirational. If you accept that definition, it’s not surprising that content about machine learning has seen the heaviest usage: it’s about taking research out of the lab and putting it into practice. It’s also not surprising that we see solid growth for AI, because that’s where bleeding-edge engineers are looking for new ideas to turn into machine learning.

Figure 5. Artificial intelligence, machine learning, and data

Have the skepticism, fear, and criticism surrounding AI taken a toll, or are “reports of AI’s death greatly exaggerated”? We don’t see that in our data, though there are certainly some metrics to say that artificial intelligence has stalled. Many projects never make it to production, and while the last year has seen amazing progress in natural language processing (up 21%), such as OpenAI’s GPT-3, we’re seeing fewer spectacular results like winning Go games. It’s possible that AI (along with machine learning, data, big data, and all their fellow travelers) is descending into the trough of the hype cycle. We don’t think so, but we’re prepared to be wrong. As Ben Lorica has said (in conversation), many years of work will be needed to bring current research into commercial products.

It’s certainly true that there’s been a (deserved) backlash over heavy-handed use of AI. A backlash is only to be expected when deep learning applications are used to justify arresting the wrong people, and when some police departments are comfortable using software with a 98% false positive rate. A backlash is only to be expected when software systems designed to maximize “engagement” end up spreading misinformation and conspiracy theories. A backlash is only to be expected when software developers don’t take into account issues of power and abuse. And a backlash is only to be expected when too many executives see AI as a “magic sauce” that will turn their organization around without pain or, frankly, a whole lot of work.

But we don’t think those issues, as important as they are, say a lot about the future of AI. The future of AI is less about breathtaking breakthroughs and creepy face or voice recognition than it is about small, mundane applications. Think quality control in a factory; think intelligent search on O’Reilly online learning; think optimizing data compression; think tracking progress on a construction site. I’ve seen too many articles saying that AI hasn’t helped in the struggle against COVID, as if someone was going to click a button on their MacBook and a superdrug was going to pop out of a USB-C port. (And AI has played a huge role in COVID vaccine development.) AI is playing an important supporting role—and that’s exactly the role we should expect. It’s enabling researchers to navigate tens of thousands of research papers and reports, design drugs and engineer genes that might work, and analyze millions of health records. Without automating these tasks, getting to the end of the pandemic will be impossible.

So here’s the future we see for AI and machine learning:

  • Natural language has been (and will continue to be) a big deal. GPT-3 has changed the world. We’ll see AI being used to create “fake news,” and we’ll find that AI gives us the best tools for detecting what’s fake and what isn’t.
  • Many companies are placing significant bets on using AI to automate customer service. We’ve made great strides in our ability to synthesize speech, generate realistic answers, and search for solutions.
  • We’ll see lots of tiny, embedded AI systems in everything from medical sensors to appliances to factory floors. Anyone interested in the future of technology should watch Pete Warden’s work on TinyML very carefully.
  • We still haven’t faced squarely the issue of user interfaces for collaboration between humans and AI. We don’t want AI oracles that just replace human errors with machine-generated errors at scale; we want the ability to collaborate with AI to produce results better than either humans or machines could alone. Researchers are starting to catch on.

TensorFlow is the leader among machine learning platforms; it gets the most searches, while usage has stabilized at 6% growth. Content about scikit-learn, Python’s machine learning library, is used almost as heavily, with 11% year-over-year growth. PyTorch is in third place (yes, this is a horse race), but usage of PyTorch content has gone up 159% year over year. That increase is no doubt influenced by the popularity of Jeremy Howard’s Practical Deep Learning for Coders course and the PyTorch-based fastai library (no data for 2019). It also appears that PyTorch is more popular among researchers, while TensorFlow remains dominant in production. But as Jeremy’s students move into industry, and as researchers migrate toward production positions, we expect to see the balance between PyTorch and TensorFlow shift.

Kafka is a crucial tool for building data pipelines; it’s stable, with 6% growth and usage similar to Spark. Pulsar, Kafka’s “next generation” competition, isn’t yet on the map.

Tools for automating AI and machine learning development (IBM’s AutoAI, Google’s Cloud AutoML, Microsoft’s AutoML, and Amazon’s SageMaker) have gotten a lot of press attention in the past year, but we don’t see any signs that they’re making a significant dent in the market. That content usage is nonexistent isn’t a surprise; O’Reilly members can’t use content that doesn’t exist. But our members aren’t searching for these topics either. It may be that AutoAI is relatively new or that users don’t think they need to search for supplementary training material.

What about data science? The report What Is Data Science is a decade old, but surprisingly for a 10-year-old paper, views are up 142% over 2019. The tooling has changed though. Hadoop was at the center of the data science world a decade ago. It’s still around, but now it’s a legacy system, with a 23% decline since 2019. Spark is now the dominant data platform, and it’s certainly the tool engineers want to learn about: usage of Spark content is about three times that of Hadoop. But even Spark is down 11% since last year. Ray, a newcomer that promises to make it easier to build distributed applications, doesn’t yet show usage to match Spark (or even Hadoop), but it does show 189% growth. And there are other tools on the horizon: Dask is newer than Ray, and has seen nearly 400% growth.

It’s been exciting to watch the discussion of data ethics and activism in the past year. Broader societal movements (such as #BlackLivesMatter), along with increased industry awareness of diversity and inclusion, have made it more difficult to ignore issues like fairness, power, and transparency. What’s sad is that our data shows little evidence that this is more than a discussion. Usage of general content (not specific to AI and ML) about diversity and inclusion is up significantly (87%), but the absolute numbers are still small. Topics like ethics, fairness, transparency, and explainability don’t make a dent in our data. That may be because few books have been published and few training courses have been offered—but that’s a problem in itself.

Web Development

Since the invention of HTML in the early 1990s, the first web servers, and the first browsers, the web has exploded (or degenerated) into a proliferation of platforms. Those platforms make web development infinitely more flexible: They make it possible to support a host of devices and screen sizes. They make it possible to build sophisticated applications that run in the browser. And with every new year, “desktop” applications look more old-fashioned.

So what does the world of web frameworks look like? React leads in usage of content and also shows significant growth (34% year over year). Despite rumors that Angular is fading, it’s the #2 platform, with 10% growth. And usage of content about the server-side platform Node.js is just behind Angular, with 15% growth. None of this is surprising.

It’s more surprising that Ruby on Rails shows extremely strong growth (77% year over year) after several years of moderate, stable performance. Likewise, Django (which appeared at roughly the same time as Rails) shows both heavy usage and 63% growth. You might wonder whether this growth holds for all older platforms; it doesn’t. Usage of content about PHP is relatively low and declining (8% drop), even though it’s still used by almost 80% of all websites. (It will be interesting to see how PHP 8 changes the picture.) And while jQuery shows healthy 18% growth, usage of jQuery content was lower than any other platform we looked at. (Keep in mind, though, that there are literally thousands of web platforms. A complete study would be either heroic or foolish. Or both.)

Vue and Flask make surprisingly weak showings: for both platforms, content usage is about one-eighth of React’s. Usage of Vue-related content declined 13% in the past year, while Flask grew 10%. Neither is challenging the dominant players. It’s tempting to think of Flask and Vue as “new” platforms, but they were released in 2010 and 2014, respectively; they’ve had time to establish themselves. Two of the most promising new platforms, Svelte and Next.js, don’t yet produce enough data to chart—possibly because there isn’t yet much content to use. Likewise, WebAssembly (Wasm) doesn’t show up. (It’s also too new, with little content or training material available.) But WebAssembly represents a major rethinking of web programming and bears watching closely. Could WebAssembly turn JavaScript’s dominance of web development on its head? We suspect that nothing will happen quickly. Enterprise customers will be reluctant to bear the cost of moving from an older framework like PHP to a more fashionable JavaScript framework. It costs little to stick with an old stalwart.

Figure 6. Web development

The foundational technologies HTML, CSS, and JavaScript are all showing healthy growth in usage (22%, 46%, and 40%, respectively), though they’re behind the leading frameworks. We’ve already noted that JavaScript is one of the top programming languages—and the modern web platforms are nothing if not the apotheosis of JavaScript. We find that chilling. The original vision for the World Wide Web was radically empowering and democratizing. You didn’t need to be a techno-geek; you didn’t even need to program—you could just click “view source” in the browser and copy bits you liked from other sites. Twenty-five years later, that’s no longer true: you can still “view source,” but all you’ll see is a lot of incomprehensible JavaScript. Ironically, just as other technologies are democratizing, web development is increasingly the domain of programmers. Will that trend be reversed by a new generation of platforms, or by a reformulation of the web itself? We shall see.

Clouds of All Kinds

It’s no surprise that the cloud is growing rapidly. Usage of content about the cloud is up 41% since last year. Usage of cloud titles that don’t mention a specific vendor (e.g., Amazon Web Services, Microsoft Azure, or Google Cloud) grew at an even faster rate (46%). Our customers don’t see the cloud through the lens of any single platform. We’re only at the beginning of cloud adoption; while most companies are using cloud services in some form, and many have moved significant business-critical applications and datasets to the cloud, we have a long way to go. If there’s one technology trend you need to be on top of, this is it.

The horse race between the leading cloud vendors, AWS, Azure, and Google Cloud, doesn’t present any surprises. Amazon is winning, even ahead of the generic “cloud”—but Microsoft and Google are catching up, and Amazon’s growth has stalled (only 5%). Use of content about Azure shows 136% growth—more than any of the competitors—while Google Cloud’s 84% growth is hardly shabby. When you dominate a market the way AWS dominates the cloud, there’s nowhere to go but down. But with the growth that Azure and Google Cloud are showing, Amazon’s dominance could be short-lived.

What’s behind this story? Microsoft has done an excellent job of reinventing itself as a cloud company. In the past decade, it’s rethought every aspect of its business: Microsoft has become a leader in open source; it owns GitHub; it owns LinkedIn. It’s hard to think of any corporate transformation so radical. This clearly isn’t the Microsoft that declared Linux a “cancer,” and that Microsoft could never have succeeded with Azure.

Google faces a different set of problems. Twelve years ago, the company arguably delivered serverless with App Engine. It open sourced Kubernetes and bet very heavily on its leadership in AI, with the leading AI platform TensorFlow highly optimized to run on Google hardware. So why is it in third place? Google’s problem hasn’t been its ability to deliver leading-edge technology but rather its ability to reach customers—a problem that Thomas Kurian, Google Cloud’s CEO, is attempting to address. Ironically, part of Google’s customer problem is its focus on engineering to the detriment of the customers themselves. Any number of people have told us that they stay away from Google because they’re too likely to say, “Oh, that service you rely on? We’re shutting it down; we have a better solution.” Amazon and Microsoft don’t do that; they understand that a cloud provider has to support legacy software, and that all software is legacy the moment it’s released.

Figure 7. Cloud usage

While our data shows very strong growth (41%) in usage for content about the cloud, it doesn’t show significant usage for terms like “multicloud” and “hybrid cloud” or for specific hybrid cloud products like Google’s Anthos or Microsoft’s Azure Arc. These are new products, for which little content exists, so low usage isn’t surprising. But the usage of specific cloud technologies isn’t that important in this context; what’s more important is that usage of all the cloud platforms is growing, particularly content that isn’t tied to any vendor. We also see that our corporate clients are using content that spans all the cloud vendors; it’s difficult to find anyone who’s looking at a single vendor.

Not long ago, we were skeptical about hybrid and multicloud. It’s easy to assume that these concepts are pipe dreams springing from the minds of vendors who are in second, third, fourth, or fifth place: if you can’t win customers from Amazon, at least you can get a slice of their business. That story isn’t compelling—but it’s also the wrong story to tell. Cloud computing is hybrid by nature. Think about how companies “get into the cloud.” It’s often a chaotic grassroots process rather than a carefully planned strategy. An engineer can’t get the resources for some project, so they create an AWS account, billed to the company credit card. Then someone in another group runs into the same problem, but goes with Azure. Next there’s an acquisition, and the new company has built its infrastructure on Google Cloud. And there’s petabytes of data on-premises, and that data is subject to regulatory requirements that make it difficult to move. The result? Companies have hybrid clouds long before anyone at the C-level perceives the need for a coherent cloud strategy. By the time the C suite is building a master plan, there are already mission-critical apps in marketing, sales, and product development. And the one way to fail is to dictate that “we’ve decided to unify on cloud X.”

All the cloud vendors, including Amazon (which until recently didn’t even allow its partners to use the word multicloud), are being drawn to a strategy based not on locking customers into a specific cloud but on facilitating management of a hybrid cloud, and all offer tools to support hybrid cloud development. They know that support for hybrid clouds is key to cloud adoption, and if there is any lock-in, it will be around management. As IBM’s Rob Thomas has frequently said, “Cloud is a capability, not a location.”

As expected, we see a lot of interest in microservices, with a 10% year-over-year increase—not large, but still healthy. Serverless (a.k.a. functions as a service) also shows a 10% increase, but with lower usage. That’s important: while it “feels like” serverless adoption has stalled, our data suggests that it’s growing in parallel with microservices.

Security and Privacy

Security has always been a problematic discipline: defenders have to get thousands of things right, while an attacker only has to discover one mistake. And that mistake might have been made by a careless user rather than someone on the IT staff. On top of that, companies have often underinvested in security: when the best sign of success is that “nothing bad happened,” it’s very difficult to say whether money was well spent. Was the team successful or just lucky?

Yet the last decade has been full of high-profile break-ins that have cost billions of dollars (including increasingly hefty penalties) and led to the resignations and firings of C-suite executives. Have companies learned their lessons?

The data doesn’t tell a clear story. While we’ve avoided discussing absolute usage, usage of content about security is very high—higher than for any other topic except for the major programming languages like Java and Python. Perhaps a better comparison would be to compare security with a general topic like programming or cloud. If we take that approach, programming usage is heavier than security, and security is only slightly behind cloud. So the usage of content about security is high, indeed, with year-over-year growth of 35%.

Figure 8. Security and privacy

But what content are people using? Certification resources, certainly: CISSP content and training is 66% of general security content, with a slight (2%) decrease since 2019. Usage of content about the CompTIA Security+ certification is about 33% of general security, with a strong 58% increase.

There’s a fair amount of interest in hacking, which shows 16% growth. Interestingly, ethical hacking (a subset of hacking) shows about half as much usage as hacking, with 33% growth. So we’re evenly split between good and bad actors, but the good guys are increasing more rapidly. Penetration testing, which should be considered a kind of ethical hacking, shows a 14% decrease; this shift may only reflect which term is more popular.

Beyond those categories, we get into the long tail: there’s only minimal usage of content about specific topics like phishing and ransomware, though ransomware shows a huge year-over-year increase (155%); that increase no doubt reflects the frequency and severity of ransomware attacks in the past year. There’s also a 130% increase in content about “zero trust,” a technology used to build defensible networks—though again, usage is small.

It’s disappointing that we see so little interest in content about privacy, including content about specific regulatory requirements such as GDPR. We don’t see heavy usage; we don’t see growth; we don’t even see significant numbers of search queries. This doesn’t bode well.

Not the End of the Story

We’ve taken a tour through a significant portion of the technology landscape. We’ve reported on the horse races along with the deeper stories underlying those races. Trends aren’t just the latest fashions; they’re also long-term processes. Containerization goes back to Unix version 7 in 1979; and didn’t Sun Microsystems invent the cloud in the 1990s with its workstations and Sun Ray terminals? We may talk about “internet time,” but the most important trends span decades, not months or years—and often involve reinventing technology that was useful but forgotten, or technology that surfaced before its time.

With that in mind, let’s take several steps back and think about the big picture. How are we going to harness the computing power needed for AI applications? We’ve talked about concurrency for decades, but it was only an exotic capability important for huge number-crunching tasks. That’s no longer true; we’ve run out of Moore’s law, and concurrency is table stakes. We’ve talked about system administration for decades, and during that time, the ratio of IT staff to computers managed has gone from many-to-one (one mainframe, many operators) to one-to-thousands (monitoring infrastructure in the cloud). As part of that evolution, automation has also gone from an option to a necessity.

We’ve all heard that “everyone should learn to program.” This may be correct…or maybe not. It doesn’t mean that everyone should be a professional programmer but that everyone should be able to use computers effectively, and that requires programming. Will that be true in the future? No-code and low-code products are reaching the market, allowing users to build everything from business applications to AI prototypes. Again, this trend goes way back: in the late 1950s, the first modern programming languages made programming much easier. And yes, even back then there were those who said “real men use machine language.” (And that sexism was no doubt intentional, since the first generation of programmers included many women.) Will our future bring further democratization? Or a return to a cult of “wizards”? Low-code AI and complex JavaScript web platforms offer conflicting visions of what the future may bring.

Finally, the most important trend may not yet appear in our data at all. Technology has largely gotten a free ride as far as regulation and legislation are concerned. Yes, there are heavily regulated sectors like healthcare and finance, but social media, much of machine learning, and even much of online commerce have only been lightly regulated. That free ride is coming to an end. Between GDPR, the California Consumer Privacy Act (which will probably be copied by many states), California Propositions 22 and 24, many city ordinances regarding the use of face recognition, and rethinking the meaning of Section 230 of the Communications Decency Act, laws and regulations will play a big role in shaping technology in the coming years. Some of that regulation was inevitable, but a lot of it is a direct response to an industry that moved too fast and broke too many things. In this light, the lack of interest in privacy and related topics is unhealthy. Twenty years ago, we built a future that we don’t really want to live in. The question facing us now is simple:

What future will we build?

Categories: Technology

Seven Legal Questions for Data Scientists

O'Reilly Radar - Tue, 2021/01/19 - 05:21

“[T]he threats to consumers arising from data abuse, including those posed by algorithmic harms, are mounting and urgent.”

FTC Commissioner Rebecca K. Slaughter

Variants of artificial intelligence (AI), such as predictive modeling, statistical learning, and machine learning (ML), can create new value for organizations. AI can also cause costly reputational damage, get your organization slapped with a lawsuit, and run afoul of local, federal, or international regulations. Difficult questions about compliance and legality often pour cold water on late-stage AI deployments as well, because data scientists rarely get attorneys or oversight personnel involved in the build-stages of AI systems. Moreover, like many powerful commercial technologies, AI is likely to be highly regulated in the future.

This article poses seven legal questions that data scientists should address before they deploy AI. This article is not legal advice. However, these questions and answers should help you better align your organization’s technology with existing and future laws, leading to less discriminatory and invasive customer interactions, fewer regulatory or litigation headwinds, and better return on AI investments. As the questions below indicate, it’s important to think about the legal implications of your AI system as you’re building it. Although many organizations wait until there’s an incident to call in legal help, compliance by design saves resources and reputations.

Fairness: Are there outcome or accuracy differences in model decisions across protected groups? Are you documenting efforts to find and fix these differences?

Examples: Alleged discrimination in credit lines; Poor experimental design in healthcare algorithms

Federal regulations require non-discrimination in consumer finance, employment, and other practices in the U.S. Local laws often extend these protections or define separate protections. Even if your AI isn’t directly affected by existing laws today, algorithmic discrimination can lead to reputational damage and lawsuits, and the current political winds are blowing toward broader regulation of AI. To deal with the issue of algorithmic discrimination and to prepare for pending future regulations, organizations must improve cultural competencies, business processes, and tech stacks.

Technology alone cannot solve algorithmic discrimination problems. Solid technology must be paired with culture and process changes, like increased demographic and professional diversity on the teams that build AI systems and better audit processes for those systems. Some additional non-technical solutions involve ethical principles for organizational AI usage, and a general mindset change. Going fast and breaking things isn’t the best idea when what you’re breaking are people’s loans, jobs, and healthcare.

From a technical standpoint, you’ll need to start with careful experimental design and data that truly represents modeled populations. After your system is trained, all aspects of AI-based decisions should be tested for disparities across demographic groups: the system’s primary outcome, follow-on decisions, such as limits for credit cards, and manual overrides of automated decisions, along with the accuracy of all these decisions. In many cases, discrimination tests and any subsequent remediation must also be conducted using legally sanctioned techniques—not just your new favorite Python package. Measurements like adverse impact ratio, marginal effect, and standardized mean difference, along with prescribed methods for fixing discovered discrimination, are enshrined in regulatory commentary. Finally, you should document your efforts to address algorithmic discrimination. Such documentation shows your organization takes accountability for its AI systems seriously and can be invaluable if legal questions arise after deployment.
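One of the measurements named above, the adverse impact ratio, compares favorable-outcome rates between a protected group and a reference group; ratios below 0.8 (the EEOC's "four-fifths rule") are a common red flag. The sketch below shows the arithmetic; the decision data is invented for illustration, and this is not legal guidance.

```python
def adverse_impact_ratio(outcomes_protected, outcomes_reference):
    """Ratio of favorable-outcome rates: protected group vs. reference
    group. Values below 0.8 (the "four-fifths rule") are a common
    red flag for disparate impact."""
    rate_p = sum(outcomes_protected) / len(outcomes_protected)
    rate_r = sum(outcomes_reference) / len(outcomes_reference)
    return rate_p / rate_r

# Hypothetical approval decisions (1 = approved, 0 = denied).
protected = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]   # 30% approved
reference = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]   # 70% approved

air = adverse_impact_ratio(protected, reference)
assert round(air, 2) == 0.43   # well below 0.8: investigate
```

In practice the same test should be run on the primary outcome, follow-on decisions, and manual overrides, as the paragraph above describes, and the remediation method must be one regulators recognize.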

Privacy: Is your model complying with relevant privacy regulations?

Examples: Training data violates new state privacy laws

Personal data is highly regulated, even in the U.S., and nothing about using data in an AI system changes this fact. If you are using personal data in your AI system, you need to be mindful of existing laws and watch evolving state regulations, like the Biometric Information Privacy Act (BIPA) in Illinois or the new California Privacy Rights Act (CPRA).

To cope with the reality of privacy regulations, teams that are engaged in AI also need to comply with organizational data privacy policies. Data scientists should familiarize themselves with these policies from the early stages of an AI project to help avoid privacy problems. At a minimum, these policies will likely address:

  • Consent for use: how consumer consent for data-use is obtained; the types of information collected; and ways for consumers to opt-out of data collection and processing.
  • Legal basis: any applicable privacy regulations to which your data or AI are adhering; why you’re collecting certain information; and associated consumer rights.
  • Anonymization requirements: how consumer data is aggregated and anonymized.
  • Retention requirements: how long you store consumer data; the security you have to protect that data; and if and how consumers can request that you delete their data.

Given that most AI systems will change over time, you should also regularly audit your AI to ensure that it remains in compliance with your privacy policy over time. Consumer requests to delete data, or the addition of new data-hungry functionality, can cause legal problems, even for AI systems that were in compliance at the time of their initial deployment.

One last general tip is to have an incident response plan. This is a lesson learned from general IT security. Among many other considerations, that plan should detail systematic ways to inform regulators and consumers if data has been breached or misappropriated.

Security: Have you incorporated applicable security standards in your model? Can you detect if and when a breach occurs?

Examples: Poor physical security for AI systems; Security attacks on ML; Evasion attacks

Like other consumer software systems, AI systems likely fall under various security standards and breach reporting laws. You’ll need to update your organization’s IT security procedures to apply to AI systems, and you’ll need to make sure that you can report if AI systems—data or algorithms—are compromised.

Luckily, the basics of IT security are well-understood. First, ensure that these are applied uniformly across your IT assets, including that super-secret new AI project and the rock-star data scientists working on it. Second, start preparing for inevitable attacks on AI. These attacks tend to involve adversarial manipulation of AI-based decisions or the exfiltration of sensitive data from AI system endpoints. While these attacks are not common today, you don’t want to be the object lesson in AI security for years to come. So update your IT security policies to consider these new attacks. Standard counter-measures such as authentication and throttling at system endpoints go a long way toward promoting AI security, but newer approaches such as robust ML, differential privacy, and federated learning can make AI hacks even more difficult for bad actors.
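Throttling at a model-serving endpoint is one of the standard countermeasures mentioned above: limiting request rates slows down model-extraction and adversarial-probing attacks. Here is a minimal token-bucket rate limiter as a sketch of the idea; capacity and refill values are illustrative, and a production system would use infrastructure-level rate limiting rather than application code like this.

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter of the kind used to throttle
    requests at a model-serving endpoint. Each request spends a token;
    tokens refill at a fixed rate up to a fixed capacity."""
    def __init__(self, capacity=5, refill_per_sec=1.0):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill = refill_per_sec
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Refill disabled so the demo is deterministic: 3 requests pass, then blocked.
bucket = TokenBucket(capacity=3, refill_per_sec=0.0)
results = [bucket.allow() for _ in range(5)]
assert results == [True, True, True, False, False]
```

Throttling raises the cost of the thousands of queries an attacker needs to probe decision boundaries or reconstruct training data, which is why it pairs naturally with authentication at the same endpoints.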

Finally, you’ll need to report breaches if they occur in your AI systems. If your AI system is a labyrinthian black-box, that could be difficult. Avoid overly complex, black-box algorithms whenever possible, monitor AI systems in real-time for performance, security, and discrimination problems, and ensure system documentation is applicable for incident response and breach reporting purposes.

Agency: Is your AI system making unauthorized decisions on behalf of your organization?

Examples: Gig economy robo-firing; AI executing equities trades

If your AI system is making material decisions, it is crucial to ensure that it cannot make unauthorized decisions. If your AI is based on ML, as most are today, your system’s outcome is probabilistic: it will make wrong decisions. Wrong AI-based decisions about material matters—lending, financial transactions, employment, healthcare, or criminal justice, among others—can cause serious legal liabilities (see Negligence below). Worse still, using AI to mislead consumers can put your organization on the wrong side of an FTC enforcement action or a class action.

Every organization approaches risk management differently, so setting necessary limits on automated predictions is a business decision that requires input from many stakeholders. Furthermore, humans should review any AI decisions that implicate such limits before a customer’s final decision is issued. And don’t forget to routinely test your AI system with edge cases and novel situations to ensure it stays within those preset limits.
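The limit-setting described above often reduces to a routing rule in code: decisions that exceed business thresholds, or that the model is unsure about, go to a human reviewer instead of being issued automatically. A minimal sketch, with entirely hypothetical thresholds and function names:

```python
def route_decision(amount, confidence,
                   max_auto_amount=10_000, min_confidence=0.9):
    """Route a model's recommendation (e.g., a credit limit): anything
    outside preset business limits, or below a confidence floor, goes
    to a human reviewer. Thresholds here are illustrative."""
    if amount > max_auto_amount or confidence < min_confidence:
        return "human_review"
    return "auto_approve"

assert route_decision(5_000, 0.95) == "auto_approve"
assert route_decision(50_000, 0.95) == "human_review"   # exceeds limit
assert route_decision(5_000, 0.60) == "human_review"    # model is unsure
```

The thresholds themselves are the business decision that requires stakeholder input; the code merely enforces them, and edge-case tests should confirm the guard can't be bypassed.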

Relatedly, and to quote the FTC, “[d]on’t deceive consumers about how you use automated tools.” In their Using Artificial Intelligence and Algorithms guidance, the FTC specifically called out companies for manipulating consumers with digital avatars posing as real people. To avoid this kind of violation, always inform your consumers that they are interacting with an automated system. It’s also a best practice to implement recourse interventions directly into your AI-enabled customer interactions. Depending on the context, an intervention might involve options to interact with a human instead, options to avoid similar content in the future, or a full-blown appeals process.

Negligence: How are you ensuring your AI is safe and reliable?

Examples: Releasing the wrong person from jail; autonomous vehicle kills pedestrian

AI decision-making can lead to serious safety issues, including physical injuries. To keep your organization’s AI systems in check, the practice of model risk management, based roughly on the Federal Reserve’s SR 11-7 letter, is among the most tested frameworks for safeguarding predictive models against stability and performance failures.
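One small piece of model risk management is ongoing performance monitoring. The sketch below, with illustrative baseline and tolerance values, flags a model for review when live accuracy drifts below its validated baseline:

```python
# Hypothetical sketch of one model-risk-management control: alert when
# live accuracy falls more than `tolerance` below the validated baseline.
# Numbers are illustrative, not from SR 11-7.

def accuracy(preds, labels):
    """Fraction of predictions that match the observed labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def monitor(preds, labels, baseline=0.90, tolerance=0.05):
    """Return live accuracy and whether it breaches the review threshold."""
    acc = accuracy(preds, labels)
    return {"accuracy": acc, "alert": acc < baseline - tolerance}

print(monitor([1, 1, 0, 0], [1, 1, 0, 1]))  # accuracy 0.75 -> alert
```

A real deployment would run this on a schedule against fresh outcome data and route alerts into the incident-response process.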

For more advanced AI systems, a lot can go wrong. When creating autonomous vehicle or robotic process automation (RPA) systems, be sure to incorporate practices from the nascent discipline of safe and reliable machine learning. Diverse teams, including domain experts, should think through possible incidents, compare their designs to known past incidents, document steps taken to prevent such incidents, and develop response plans to prevent inevitable glitches from spiraling out of control.

Transparency: Can you explain how your model arrives at a decision?

Examples: Proprietary algorithms hide data errors in criminal sentencing and DNA testing

Federal law already requires explanations for certain consumer finance decisions. Beyond meeting regulatory requirements, interpretability of AI system mechanisms enables human trust and understanding of these high-impact technologies, meaningful recourse interventions, and proper system documentation. Over recent years, two promising technological approaches have increased AI systems’ interpretability: interpretable ML models and post-hoc explanations. Interpretable ML models (e.g., explainable boosting machines) are algorithms that are both highly accurate and highly transparent. Post-hoc explanations (e.g., Shapley values) attempt to summarize ML model mechanisms and decisions. These two tools can be used together to increase your AI’s transparency. Given both the fundamental importance of interpretability and the technological progress made toward this goal, it’s not surprising that new regulatory initiatives, like the FTC’s AI guidance and the CPRA, prioritize both consumer-level explanations and overall transparency of AI systems.
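To make the post-hoc idea concrete: for a plain linear model, attributing the gap between a prediction and a baseline prediction to each feature is exact (and for linear models coincides with Shapley values); real systems typically use dedicated explanation libraries. The weights and features below are hypothetical:

```python
# Hypothetical sketch of a post-hoc explanation for a linear model:
# attribute the difference between f(x) and f(baseline) to each feature.
# Weights and feature values are made up for illustration.

def explain_linear(weights, x, baseline):
    """Per-feature contribution: weight * (feature value - baseline value)."""
    return {name: w * (x[name] - baseline[name])
            for name, w in weights.items()}

weights  = {"income": 0.002, "debt": -0.01}
x        = {"income": 55000, "debt": 2000}
baseline = {"income": 50000, "debt": 3000}

print(explain_linear(weights, x, baseline))
# {'income': 10.0, 'debt': 10.0}
```

For nonlinear models the attribution is no longer exact, which is why approximation methods like Shapley-value estimators exist.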

Third Parties: Does your AI system depend on third-party tools, services, or personnel? Are they addressing these questions?

Examples: Natural language processing tools and training data images conceal discriminatory biases

It is rare for an AI system to be built entirely in-house without dependencies on third-party software, data, or consultants. When you use these third-party resources, third-party risk is introduced into your AI system. And, as the old saying goes, a chain is only as strong as its weakest link. Even if your organization takes the utmost precaution, any incidents involving your AI system, even those stemming from a third party you relied on, can potentially be blamed on you. Therefore, it is essential to ensure that any parties involved in the design, implementation, review, or maintenance of your AI systems follow all applicable laws, policies, and regulations.

Before contracting with a third party, due diligence is required. Ask third parties for documentary proof that they take discrimination, privacy, security, and transparency seriously. And be on the lookout for signs of negligence, such as shoddy documentation, erratic software release cadences, lack of warranty, or unreasonably broad exceptions in terms of service or end-user license agreements (EULAs). You should also have contingency plans, including technical redundancies, incident response plans, and insurance covering third-party dependencies. Finally, don’t be shy about grading third-party vendors on a risk-assessment report card. Make sure these assessments happen over time, and not just at the beginning of the third-party contract. While these precautions may increase costs and delay your AI implementation in the short term, they are the only way to mitigate third-party risks in your system consistently over time.
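A report card like the one described can be as simple as scoring each vendor on a few criteria and flagging those below a preset bar. The criteria, rating scale, and passing score below are illustrative assumptions, not a standard:

```python
# Hypothetical sketch of a third-party risk report card. Criteria mirror
# the warning signs above; the 1-5 scale and passing bar are made up.

CRITERIA = ("documentation", "release_cadence", "warranty", "eula_terms")

def grade_vendor(scores, passing=3.0):
    """scores maps each criterion to a 1-5 rating; returns avg and verdict."""
    avg = sum(scores[c] for c in CRITERIA) / len(CRITERIA)
    return {"average": avg, "passes": avg >= passing}

print(grade_vendor({"documentation": 4, "release_cadence": 2,
                    "warranty": 3, "eula_terms": 3}))
```

Re-running the grading on a schedule, rather than only at contract signing, captures the "over time" requirement.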

Looking Ahead

Several U.S. states and federal agencies have telegraphed their intentions regarding the future regulation of AI. Three of the broadest efforts to be aware of include the Algorithmic Accountability Act, the FTC’s AI guidance, and the CPRA. Numerous other industry-specific guidance documents are being drafted, such as the FDA’s proposed framework for AI in medical devices and FINRA’s Artificial Intelligence (AI) in the Securities Industry. Furthermore, other countries are setting examples for U.S. policymakers and regulators to follow. Canada, the European Union, Singapore, and the United Kingdom, among others, have all drafted or implemented detailed regulations for different aspects of AI and automated decision-making systems. In light of this government movement, and the growing public and government distrust of big tech, now is the perfect time to start minimizing AI system risk and prepare for future regulatory compliance.

Categories: Technology

0x6B: GPL Enforcement Investigation DMCA Exemption Request

FAIF - Thu, 2021/01/14 - 11:00

Software Freedom Conservancy filed multiple exemptions in the USA Copyright Office Triennial Rulemaking Process under the Digital Millennium Copyright Act (DMCA). In this episode, Karen and Bradley explore the details of Conservancy's filing to request permission to circumvent technological restriction measures in order to investigate infringement of other people's copyright, which is a necessary part of investigations of alleged violations of the GPL and other copyleft licenses.

Show Notes: Segment 0 (00:39)
  • Bradley claims that you'll now love the audcast more than ever (02:51)
  • Conservancy filed many exemptions as part of the currently ongoing triennial DMCA Process. (02:50)
Segment 1 (04:22)
Segment 2 (28:07)
Segment 3 (34:36)

If you are a Conservancy Supporter as well as being a FaiFCast listener, you can join this mailing list to receive announcements of live recordings and attend them through Conservancy's Big Blue Button (BBB) server.

Send feedback and comments on the cast to <>. You can keep in touch with Free as in Freedom on our IRC channel, #faif on, and by following Conservancy on Twitter and FaiF on Twitter.

Free as in Freedom is produced by Dan Lynch of Theme music written and performed by Mike Tarantino with Charlie Paxson on drums.

The content of this audcast, and the accompanying show notes and music are licensed under the Creative Commons Attribution-Share-Alike 4.0 license (CC BY-SA 4.0).

Categories: Free Software


O'Reilly Radar - Tue, 2021/01/12 - 05:56

A few months ago, I said that “making everything into a design pattern is a sign that you don’t know what design patterns really are.” So now, I feel obliged to say something about what design patterns are.

Design patterns are frequently observed solutions to common problems. The idea comes from the work of Christopher Alexander in architecture; patterns are things like “rooms on both sides of a hallway” or “door on the front of a building.”  There’s a lot we can unpack from this simple definition:

  • Design patterns are not invented.  They are observed. They aren’t about inventing (or re-inventing) wheels; they’re about noticing “I’ve put wheels on three things lately. Might be a good idea…” The first time you put wheels on something is an invention.  It becomes a pattern when you observe that you’re re-inventing the wheel, and that’s a good thing. The wheel becomes part of your repertoire of solutions.
  • Design patterns are not algorithms, which are specific solutions to generalized problems. Quicksort isn’t a pattern–nor is sorting itself.  Patterns have more to do with how software is organized. They are more like stories, in which a problem leads to a solution that coordinates a number of different parts.
  • Design patterns are often used without thinking; they feel natural, not clever, and that’s why they’re common. You’ll find them in your code; you’ll find them in the code of others.  You can find them even if they weren’t put there consciously; patterns are often no more than a common solution to a certain kind of problem, something that looks obvious in retrospect. Patterns become a problem when programmers try to force the issue–to use patterns where they don’t quite fit, because they’ve heard that design patterns make their code better.
  • Design patterns aren’t inherently good–and if you read Alexander, you’ll find that there are plenty of architectural patterns that he really doesn’t like. A corridor with rooms on both sides is a solution to certain architectural problems. It is frequently found in very boring hotels and offices.
  • “Anti-patterns” may be worth avoiding, but that doesn’t mean they aren’t patterns. Frequently observed bad solutions to common problems are still frequently observed solutions to common problems. And sometimes, anti-patterns will be the best possible solution to an otherwise intractable problem. That’s the kind of technical debt that enables you to ship, as Kevlin Henney and Ward Cunningham have written.

There isn’t any magic here. While the book Design Patterns: Elements of Reusable Object-Oriented Software by Gamma, Helm, Johnson, and Vlissides (the “Gang of Four”) is a classic, design patterns really aren’t things you look up in books. Design patterns are things you find in your code; they’d probably be there whether or not they had a name. So what’s the value?

The biggest value in design patterns is that it gives us a common language for talking about software and how it’s organized. That’s why Alexander named one of his books A Pattern Language. We’ve all spent hours making diagrams on black- or white-boards to show how some software we’re writing is organized. Design patterns give a common vocabulary so that we can discuss software with some certainty that we all mean the same thing. I eventually realized that UML had the same aim: UML diagrams are like architectural blueprints, in which one kind of line represents a brick wall, another wood, another plasterboard. Unfortunately, UML was never quite standard enough, and like design patterns, was perceived as a good in itself. In the end, a common vocabulary (whether a pattern catalog or UML) is a tool, and any tool can be abused.

Since the Gang of Four, design patterns have been associated with object-oriented programming, but the idea that patterns aren’t applicable to functional languages doesn’t hold up. It’s certainly true that in functional languages, some well-known patterns (like strategy or map/reduce) are either primitives or simple library functions; but saying that there aren’t patterns in functional programming is equivalent to saying that there are no common solutions to common problems. It’s still useful to point out that the “strategy” pattern is equivalent to passing a function as a parameter to another function. That language gives you an intuitive and descriptive way to discuss solutions to a problem.
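A minimal sketch of that equivalence, with toy strategies of my own invention: the object-oriented version wraps the behavior in a class, while the functional version just passes a function.

```python
# Object-oriented "strategy": behavior is chosen via an object.
class Shout:
    def apply(self, text):
        return text.upper()

class Whisper:
    def apply(self, text):
        return text.lower()

def render_oo(text, strategy):
    return strategy.apply(text)

# Functional equivalent: the strategy is just a function parameter.
def render_fn(text, strategy):
    return strategy(text)

print(render_oo("Hello", Shout()))    # HELLO
print(render_fn("Hello", str.lower))  # hello
```

The pattern is the same either way; the language simply determines how much ceremony it takes to express.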

Patterns change over time, as problems change; there’s nothing special about the patterns the Gang of Four observed in the 1990s. In the 2020s, we should be building a pattern language for concurrent programming–and we may find that our patterns for the enterprise software of the 90s are less relevant.

You may find that you use patterns without thinking about it; you may discover patterns in your code; you may realize you’re facing a problem that a pattern will help you to solve; or you may use patterns to describe solutions to someone else. When solving a problem, sometimes the only thing you’re missing is a name; having a name for your solution crystallizes it. Studying patterns is useful because it gives you a larger vocabulary with which to think about problems and solutions. But using patterns for their own sake leads you nowhere. I remember hearing about programmers boasting about how many Gang of Four patterns they used, and managers telling programmers to use more patterns.  That’s not productive.

Like any good thing in programming, using design patterns should help you solve complex problems more simply. But there’s no guarantee that they’ll do that. And if you find that they aren’t helping, you should look for other solutions.

Categories: Technology
Subscribe to LuftHans aggregator