Topics for PLUG's 11/11 remote meeting

PLUG - Thu, 2021/11/11 - 16:25

der.hans has two topics for this Thursday's meeting: "Intro to jq: grep for JSON" and "Command Line Quiz Time".
This is a remote meeting.  Please join by going to at 7pm on Thursday November 11th

der.hans: Intro to jq: grep for JSON

Want to parse JSON on the command line?
Want a pipeable inline tool for JSON manipulation?

jq is a powerful tool that's easy to fit into your data pipeline.
It can parse, search and manipulate JSON documents.

This talk will cover an introduction to jq and using it to search JSON objects similar to how grep is used for plain text.

Attendees will learn:

* basic parts of JSON
* key search
* value matching
* string match
* using variables with jq
* conditionals
* regular expressions
* prettified JSON output
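
To give a feel for the "grep for JSON" idea, here is a rough Python analogue of key search, value matching, regular expressions and prettified output. It is not jq and not material from the talk; the sample document and the `walk` and `grep_json` helpers are invented for illustration only.

```python
import json
import re

# Hypothetical sample document, made up for this sketch.
SAMPLE = """
{
  "users": [
    {"name": "alice", "shell": "/bin/bash", "uid": 1000},
    {"name": "bob", "shell": "/bin/zsh", "uid": 1001}
  ],
  "host": {"name": "plug-demo", "os": "Linux"}
}
"""


def walk(node, path=""):
    """Yield (path, value) for every leaf of a parsed JSON document."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from walk(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from walk(value, f"{path}[{index}]")
    else:
        yield path, node


def grep_json(doc, key_pattern=None, value_pattern=None):
    """Return leaves whose path and/or value match the given regexes."""
    hits = []
    for path, value in walk(doc):
        if key_pattern and not re.search(key_pattern, path):
            continue
        if value_pattern and not re.search(value_pattern, str(value)):
            continue
        hits.append((path, value))
    return hits


doc = json.loads(SAMPLE)

# Key search: every leaf stored under a key called "name".
name_hits = grep_json(doc, key_pattern=r"\.name$")

# Value matching with a regular expression: shells ending in "zsh".
zsh_hits = grep_json(doc, value_pattern=r"zsh$")
print(zsh_hits)  # [('.users[1].shell', '/bin/zsh')]

# Prettified JSON output for one sub-object.
print(json.dumps(doc["host"], indent=2))
```

The paths printed here only loosely resemble jq's path syntax; jq has its own filter language, which is what the talk covers.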

der.hans: Command Line Quiz Time

An audience-participation, community-building exercise for sharing geekly knowledge in a short question-and-exercise format.

About der.hans:
der.hans works remotely in the US.
In the FLOSS community he is active with conferences and local groups.

He's chairman of PLUG, chair for SeaGL Finance committee, founder of SeaGL Career Expo, RaiseMe career counselor, BoF organizer for the Southern California Linux Expo (SCaLE) and founder of the Free Software Stammtisch.

He speaks regularly at community-led FLOSS conferences such as SeaGL, Tux-Tage, Penguicon, SCaLE, LCA, FOSSASIA, Tübix, CLT, Kieler Linux Tage, LFNW, OLF, SELF and GeekBeacon Fest.

Hans is available to speak remotely for local groups.

He currently leads a support engineering team at Object Rocket. Public statements are not representative of $dayjob.

Fediverse/Mastodon -

Remote Teams in ML/AI

O'Reilly Radar - Tue, 2021/11/09 - 07:05

I’m well-versed in the ups and downs of remote work. I’ve been doing some form thereof for most of my career, and I’ve met plenty of people who have a similar story. When companies ask for my help in building their ML/AI teams, I often recommend that they consider remote hires. Sometimes I’ll even suggest that they build their data function as a fully-remote, distributed group. (I’ll oversimplify for brevity, using “remote team” and “distributed team” interchangeably. And I’ll treat both as umbrella terms that cover “remote-friendly” and “fully-distributed.”)

Remote hiring has plenty of benefits. As an employer, your talent pool spans the globe and you save a ton of money on office rent and insurance. The people you hire get a near-zero commute and a Covid-free workplace.

Then again, even though you really should build a remote team, you also shouldn’t. Not just yet. You first want to think through one very important question:

Do I, as a leader, really want a remote team?

The Litmus Test

The key ingredient to successful remote work is, quite simply, whether company leadership wants it to work. Yes, it also requires policies, tooling, and re-thinking a lot of interactions. Not to mention, your HR team will need to double-check local laws wherever team members choose to live.  But before any of that, the people in charge have to actually want a remote team.

Here’s a quick test for the executives and hiring managers among you:

  • As the Covid-19 pandemic forced your team to work from home, did you insist on hiring only local candidates (so they could eventually work in the office)?
  • With wider vaccine rollouts and lower case counts, do you now require your team to spend some time in the office every week?
  • Do you see someone as “not really part of the team” or “less suitable for promotion” because they don’t come into the office?

If you’ve said yes to any of these, then you simply do not want a distributed team. You want an in-office team that you begrudgingly permit to work from home now and then. And as long as you don’t truly want one, any attempts to build and support one will not succeed.

And if you don’t want that, that’s fine. I’m not here to change your mind.

But if you do want to build a successful remote team, and you want some ideas on how to make it work, read on.

How You Say What You Have to Say

As a leader, most of your job involves communicating with people. This will require some adjustment in a distributed team environment.

A lot of you have developed a leadership style that’s optimized for everyone being in the same office space during working hours. That has cultivated poor, interruption-driven communication habits. It’s too easy to stop by someone’s office, pop over a cubicle wall, or bump into someone in the hallway and share some information with them.

With a remote team you’ll need to write these thoughts down instead. That also means deciding what you want to do before you even start writing, and then sticking with it after you’ve filed the request.

By communicating your thoughts in clear, unambiguous language, you’ve demonstrated your commitment to what you’re asking someone to do. You’re also leaving them a document they can refer to as they perform the task you’ve requested. This is key because, depending on work schedules, a person can’t just tap you on the shoulder to ask you to clarify a point.

(Side note: I’ve spent my career working with extremely busy people, and being one myself. That’s taught me a lot about how to communicate in written form. Short sentences, bullet points, and starting the message with the call-to-action—sometimes referred to as BLUF: Bottom Line Up-Front—will go a long way in making your e-mails clearer.)

The same holds true for meetings: the person who called the meeting should send an agenda ahead of time and follow up with recap notes. Attendees will be able to confirm their shared understanding of what is to be done and who is doing what.

Does this feel like a lot of documentation? That’s great. In my experience, what feels like over-communication for an in-office scenario is usually the right amount for a distributed team.

Embracing Remote for What It Is

Grammar rules differ by language. You won’t get very far speaking the words of a new language while using grammatical constructs from your native tongue. It takes time, practice, and patience to learn the new language so that you can truly express yourself.  The path takes you from “this is an unnatural and uncomfortable word order” to “German requires that I put the verb’s infinitive at the end of the clause.  That’s just how it works.”

There are parallels here to leading a distributed team. It’s too easy to assume that “remote work” is just “people re-creating the in-office experience, from their kitchen tables.” It will most certainly feel unnatural and uncomfortable if you hold that perspective.  And it should feel weird, since optimizing for remote work will require re-thinking a lot of the whats and hows of team interactions and success metrics.  You start winning when you determine where a distributed team works out better than the in-office alternative.

Remote work is people getting things done from a space that is not your central office, on time schedules that aren’t strict 9-to-5, and maybe even communicating in text-based chat systems.  Remote work is checking your messages in the morning, and seeing a stream of updates from your night-owl teammates.  Remote work is its own thing, and trying to shoe-horn it into the shape of an in-office setup means losing out on all of the benefits.

Embracing remote teams will require letting go of outdated in-office tropes to accept some uncomfortable truths. People will keep working when you’re not looking over their shoulder.  Some of them will work even better when they can do so in the peace and quiet of an environment they control.  They can be fully present in a meeting, even if they’ve turned off their video. They can most certainly be productive on a work schedule that doesn’t match yours, while wearing casual attire.

The old tropes were hardly valid to begin with. And now, 18 months after diving head-first into remote work, those tropes are officially dead. It’s up to you to learn new ways to evaluate team (and team member) productivity. More importantly, in true remote work fashion, you’ll have to step back and trust the team you’ve hired.

Exploring New Terrain

If distributed teamwork is new territory for your company, expect to stumble now and then. You’re walking through a new area and instead of following your trusty old map, you’re now creating the map. One step at a time, one stubbed toe at a time.

You’ll spend time defining new best practices that are specific to this environment. This will mean thinking through a lot more decisions than before—decisions that you used to be able to handle on autopilot—and as such you will find yourself saying “I don’t know” a lot more than you used to.

You’ll feel some of this friction when sorting out workplace norms.  What are “working hours,” if your team even has any?  Maybe all you need is a weekly group check-in, after which everyone heads in separate directions to focus on their work?  In that case, how will individuals specify their working hours and their off-time?  With so much asynchronous communication, there’s bound to be confusion around when a person is expected to pick up on an ongoing conversation in a chat channel, versus their name being @-mentioned, or contacting them by DM.  Setting those expectations will help the team shift into (the right kind of) autopilot, because they’ll know to not get frustrated when a person takes a few hours to catch up on a chat thread.  As a bonus, going through this exercise will sort out when you really need to hold a group meeting versus when you have to just make an announcement (e-mail) or pose a quick question (chat).

Security will be another source of friction.  When everyone is in the same physical office space, there’s little question of the “inside” versus the “outside” network.  But when your teammates are connecting to shared resources from home or a random cafe, how do you properly wall off the office from everything else? Mandating VPN usage is a start, but it’s hardly the entire picture.  There are also questions around company-issued devices having visibility into home-network traffic, and what they’re allowed to do with that information.  Or even a company laptop, hacked through the company network, infecting personal devices on the home LAN. Is your company’s work so sensitive that employees will require a separate, work-only internet service for their home office?  That would be fairly extreme—in my experience, I haven’t even seen banks go that far—but it’s not out of the realm of possibility.  At some point a CISO may rightfully determine that this is the best path.

Saying “I don’t know” is OK in all of these cases, so long as you follow that with “so let’s figure it out.” Be honest with your team to explain that you, as a group, may have to try a few rounds of something before it all settles. The only two sins here are to refuse to change course when it’s not working, and to revert to the old, familiar, in-office ways just to ease your cognitive burden. So long as you are thoughtful and intentional in your approach, you’ll succeed over the long run.

It’s Here to Stay

Your data scientists (and developers, and IT ops team) have long known that remote work is possible. They communicate through Slack and collaborate using shared documents. They see that their “datacenter” is a cloud infrastructure. They already know that a lot of their day-to-day interactions don’t require everyone being in the same office. Company leadership is usually the last to pick up on this, which is why they tend to show the most resistance.

If adaptive leadership is the key to success with distributed teams, then discipline is the key to that adaptation. You’ll need the discipline to plan your communication, to disable your office autopilot, and to trust your team more.

You must focus on what matters—defining what needs to get done, and letting people do it—and learn to let go of what doesn’t. That will be uncomfortable, yes. But your job as a leader is to clear the path for people who are doing the implementation work. What makes them comfortable trumps what makes you comfortable.

Not every company will accept this. Some are willing to trade the benefits of a distributed team for what they perceive to be a superior in-office experience. And that’s fine. But for those who want it, remote is here to stay.

Categories: Technology

Radar trends to watch: November 2021

O'Reilly Radar - Tue, 2021/11/02 - 04:40

While October’s news was dominated by Facebook’s (excuse me, Meta’s) continued problems (you’d think they’d get tired of the apology tour), the most interesting news comes from the AI world. I’m fascinated by the use of large language models to analyze the “speech” of whales, and to preserve endangered human languages. It’s also important that machine learning seems to have taken a step (pun somewhat intended) forward, with robots that teach themselves to walk by trial and error, and with robots that learn how to assemble themselves to perform specific tasks.

  • The design studio Artefact has created a game to teach middle school students about algorithmic bias.
  • Researchers are building large natural language models, potentially the size of GPT-3, to decode the “speech” of whales.
  • A group at Berkeley has built a robot that uses reinforcement learning to teach itself to walk from scratch–i.e., through trial and error. They used two levels of simulation before loading the model into a physical robot.
  • AI is reinventing computers: AI is driving new kinds of CPUs, new “out of the box” form factors (doorbells, appliances), decision-making rather than traditional computation. The “computer” as the computational device we know may be on the way out.
  • Weird creatures: Unimals, or universal animals, are robots that can use AI to evolve their body shapes so they can solve problems more efficiently. Future generations of robotics might not be designed with fixed bodies, but have the capability to adapt their shape as needed.
  • Would a National AI Cloud be a subsidy to Google and Facebook, a threat to privacy, or a valuable academic research tool?
  • I’ve been skeptical about digital twins; they seem to be a technology looking for an application. However, Digital Twins (AI models of real-world systems, used for predicting their behavior) seem like a useful technology for optimizing the performance of large batteries.
  • Digital Twins could provide a way to predict supply chain problems and work around shortages. They could allow manufacturers to navigate a compromise between just-in-time stocking processes, which are vulnerable to shortages, and resilience.
  • Modulate is a startup currently testing real-time voice changing software. They provide realistic, human sounding voices that replace the user’s own voice. They are targeting gaming, but the software is useful in many situations where harassment is a risk.
  • Voice copying algorithms were able to fool both people and voice-enabled devices roughly 50% of the time (30% for Azure’s voice recognition service, 62% for Alexa). This is a new front in deep fakery.
  • Facebook AI Research has created a set of first-person (head-mounted camera) videos called Ego4D for training AI.  They want to build AI models that see the world “as a person sees it,” and be able to answer questions like “where did I leave my keys.” In essence, this means that they will need to collect literally everything that a subscriber does.  Although Facebook denies that they are thinking about commercial applications, there are obvious connections to Ray-Ban Stories and their interest in augmented reality.
  • DeepMind is working on a deep learning model that can emulate the output of any algorithm.  This is called Neuro Algorithmic Reasoning; it may be a step towards a “general AI.”
  • Microsoft and NVIDIA announce a 530 billion parameter natural language model named Megatron-Turing NLG 530B.  That’s bigger than GPT-3 (175B parameters).
  • Can machine learning be used to document endangered indigenous languages and aid in language reclamation?
  • Beethoven’s 10th symphony completed by AI: I’m not convinced that this is what Beethoven would have written, but this is better than other (human) attempts to complete the 10th that I’ve heard. It sounds like Beethoven, for the most part, though it quickly gets aimless.
  • I’m still fascinated by techniques to foil face recognition. Here’s a paper about an AI system that designs minimal, natural-looking makeup that reshapes the parts of the face that face recognition algorithms are most sensitive to, without substantially altering a person’s appearance.
  • Thoughtworks’ Responsible Tech Playbook is a curated collection of tools and techniques to help organizations become more aware of bias and become more inclusive and transparent.
  • Kerla is a Linux-like operating system kernel written in Rust that can run most Linux executables. I doubt this will ever be integrated into Linux, but it’s yet another sign that Rust has joined the big time.
  • OSS Port is an open source tool that aims to help developers understand large codebases. It parses a project repository on GitHub and produces maps and tours of the codebase. It currently works with JavaScript, Go, Java, and Python, with Rust support promised soon.
  • Turing Complete is a game about computer science. That about says it…
  • wasmCloud is a runtime environment that can be used to build distributed systems with wasm in the cloud. WebAssembly was designed as a programming-language-neutral virtual machine for  browsers, but it increasingly looks like it will also find a home on the server side.
  • Adobe Photoshop now runs in the browser, using wasm and Emscripten (the C++ toolchain for wasm).  In addition to compiling C++ to wasm, Emscripten also translates POSIX system calls to web API calls and converts OpenGL to WebGL.
  • JQL (JSON Query Language) is a Rust-based language for querying JSON (what else?).
  • Microsoft has launched an effort to train 250,000 cyber security workers in the US by 2025. This effort will work with community colleges. They estimate that it will only make up 50% of the shortfall in security talent.
  • Integrating zero trust security into the software development lifecycle is really the only way forward for companies who rely on systems that are secure and available.
  • A supply chain attack against a Node.js library (UA-Parser-JS) installs crypto miners and trojans for stealing passwords on Linux and Windows systems. The library’s normal function is to parse user agent strings, identifying the browser, operating system, and other parameters.
  • A cybercrime group has created penetration testing consultancies whose purpose is to acquire clients and then gather information and initiate ransomware attacks against those clients.
  • A federated cryptographic system will allow sharing of medical data without compromising patient privacy.  This is an essential element in “predictive, preventive, personalized, and participatory” medicine (aka P4).
  • The European Parliament has taken steps towards banning surveillance based on biometric data, private face recognition databases, and predictive policing.
  • Is it possible to reverse-engineer the data on which a model was trained? An attack against a fake face generator was able to identify the original faces in the training data. This has important implications for privacy and security, since it appears to generalize to other kinds of data.
  • Adversarial attacks against machine learning systems present a different set of challenges for cybersecurity. Models aren’t code, and have their own vulnerabilities and attack vectors. Atlas is a project to define the machine learning threat landscape. Tools to harden machine learning models against attack include IBM’s Adversarial Robustness Toolbox and Microsoft’s Counterfit.
  • Researchers have discovered that you can encode malware into DNA that attacks sequencing software and gives the attacker control of the computer.  This attack hasn’t (yet) been found in the wild.
  • Masscan is a next generation, extremely fast port scanner.  It’s similar to nmap, but much faster; it claims to be able to scan the entire internet in 6 minutes.
  • ethr is an open source cross-platform network performance measurement tool developed by Microsoft in Go. Right now, it looks like the best network performance tool out there.
  • Self-aware systems monitor themselves constantly and are capable of detecting (and even repairing) attacks.
Infrastructure and Operations Devices and Things
  • Amazon is working on an Internet-enabled refrigerator that will keep track of what’s in it and notify you when you’re low on supplies.  (And there are already similar products on the market.) Remember when this was a joke?
  • Consumer-facing AI: On one hand, “smart gadgets” present a lot of challenges and opportunities. On the other hand, it needs better deliverables than “smart” doorbells. Smart hearing aids that are field-upgradable as a subscription service?
  • A drone has been used to deliver a lung for organ transplant. This is only the second time a drone has been used to carry organs for transplantation.
  • Intel has released its next generation neuromorphic processor, Loihi. Neuromorphic processors are based on the structure of the brain, in which neurons asynchronously send each other signals. While they are still a research project, they appear to require much less power than traditional CPUs.
  • ipleak and dnsleaktest are sites that tell you what information your browser leaks. They are useful tools if you’re interested in preserving privacy. The results can be scary.
  • Dark design is the practice of designing interfaces that manipulate users into doing things they might not want to do, whether that’s agreeing to give up information about their web usage or clicking to buy a product. Dark patterns are already common, and becoming increasingly prevalent.
  • Black Twitter has become the new “Green Book,” a virtual place for tips on dealing with a racist society. The original Green Book was a Jim Crow-era publication that told Black people where they could travel safely, which hotels would accept them, and where they were likely to become victims of racist violence.
Quantum Computing
  • A group at Duke University has made significant progress on error correcting quantum computing. They have created a “logical qubit” that can be read with a 99.4% probability of being correct. (Still well below what is needed for practical quantum computing.)
  • There are now two claims of quantum supremacy from Chinese quantum computing projects.
  • Would our response to the COVID pandemic have been better if it had been approached as an engineering problem, rather than scientific research?
Categories: Technology

The Sobering Truth About the Impact of Your Business Ideas

O'Reilly Radar - Tue, 2021/10/26 - 06:07

The introduction of data science into the business world has contributed far more than recommendation algorithms; it has also taught us a lot about the efficacy with which we manage our businesses. Specifically, data science has introduced rigorous methods for measuring the outcomes of business ideas. These are the strategic ideas that we implement in order to achieve our business goals. For example, “We’ll lower prices to increase demand by 10%” and “we’ll implement a loyalty program to improve retention by 5%.” Many companies simply execute on their business ideas without measuring if they delivered the impact that was expected. But, science-based organizations are rigorously quantifying this impact and have learned some sobering lessons:

  1. The vast majority of business ideas fail to generate a positive impact.
  2. Most companies are unaware of this.
  3. It is unlikely that companies will increase the success rate for their business ideas.

These are lessons that could profoundly change how businesses operate. In what follows, we flesh out the three assertions above with the bulk of the content explaining why it may be difficult to improve the poor success rate for business ideas. Despite the challenges, we conclude with some recommendations for better managing your business.

(1) The vast majority of business ideas fail to generate positive results

To properly measure the outcomes of business ideas, companies are embracing experimentation (a.k.a. randomized controlled trials or A/B testing). The process is simple in concept. Before rolling out a business idea, you test; you try the idea out on a subset of customers¹ while another group, the control group, is not exposed to the new idea. When properly sampled, the two groups will exhibit the same attributes (demographics, geographics, etc.) and behaviors (purchase rates, life-time-value, etc.). Therefore, when the intervention is introduced (i.e., the exposure to the new business idea), any changes in behavior can be causally attributed to the new business idea. This is the gold standard in scientific measurement used in clinical trials for medical research, biological studies, pharmaceutical trials, and now to test business ideas.
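
The mechanics described above can be sketched in a few lines of Python. This is a simulation under made-up assumptions, not any company's actual method: the purchase rates, the `lift` parameter, and the function names `run_experiment` and `two_proportion_z` are all hypothetical, and the two-proportion z-test is just one common way to compare the groups.

```python
import math
import random


def run_experiment(n_customers, base_rate, lift, seed=0):
    """Randomly assign customers to control or treatment and record purchases.

    base_rate is the assumed purchase probability without the idea;
    lift is the assumed change caused by the idea (may be negative).
    """
    rng = random.Random(seed)
    counts = {"control": [0, 0], "treatment": [0, 0]}  # [customers, purchases]
    for _ in range(n_customers):
        # Random assignment is what makes the two groups comparable.
        group = "treatment" if rng.random() < 0.5 else "control"
        rate = base_rate + (lift if group == "treatment" else 0.0)
        counts[group][0] += 1
        if rng.random() < rate:
            counts[group][1] += 1
    return counts


def two_proportion_z(counts):
    """Return (difference in purchase rate, z statistic, two-sided p-value)."""
    n_c, x_c = counts["control"]
    n_t, x_t = counts["treatment"]
    p_c, p_t = x_c / n_c, x_t / n_t
    pooled = (x_c + x_t) / (n_c + n_t)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_t))
    z = (p_t - p_c) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return p_t - p_c, z, p_value


# Hypothetical scenario: a 10% baseline purchase rate and a 2-point true lift.
counts = run_experiment(n_customers=10_000, base_rate=0.10, lift=0.02)
diff, z, p_value = two_proportion_z(counts)
print(f"observed lift: {diff:+.4f}, z = {z:.2f}, p = {p_value:.4f}")
```

Because assignment is random, a difference between the groups that is too large to be chance can be attributed to the intervention rather than to who happened to be in each group.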

For the very first time in many business domains, experimentation reveals the causal impact of our business ideas. The results are humbling. They indicate that the vast majority of our business ideas fail to generate positive results. It’s not uncommon for 70-90% of ideas to either have no impact at all or actually move the metrics in the opposite direction of what was intended. Here are some statistics from a few notable companies that have disclosed their success rates publicly:

  • Microsoft declared that roughly one-third of their ideas yield negative results, one-third yield no results, and one-third yield positive results (Kohavi and Thomke, 2017).
  • Streaming service Netflix believes that 90% of its ideas are wrong (Moran, 2007).
  • Google reported that as much as 96.1% of their ideas fail to generate positive results (Thomke, 2020).
  • Travel site shared that 9 out of 10 of their ideas fail to improve metrics (Thomke, 2020).

To be sure, the statistics cited above reflect a tiny subset of the ideas implemented by companies. Further, they probably reflect a particular class of ideas: those that are conducive to experimentation² such as changes to user interfaces, new ad creatives, subtle messaging variants, and so on. Moreover, the companies represented are all relatively young and either in the tech sector or leverage technology as a medium for their business. This is far from a random sample of all companies and business ideas. So, while it’s possible that the high failure rates are specific to the types of companies and ideas that are convenient to test experimentally, it seems more plausible that the high failure rates are reflective of business ideas in general and that the disparity in perception of their success can be attributed to the method of measurement. We shouldn’t be surprised; high failure rates are common in many domains. Venture capitalists invest in many companies because most fail; similarly, most stock portfolio managers fail to outperform the S&P 500; in biology, most mutations are unsuccessful; and so on. The more surprising aspect of the low success rates for business ideas is that most of us don’t seem to know about it.

(2) Most companies are unaware of the low success rates for their business ideas

Those statistics should be sobering to any organization. Collectively, business ideas represent the roadmap companies rely upon to hit their goals and objectives. However, the dismal failure rates appear to be known only to the few companies that regularly conduct experiments to scientifically measure the impact of their ideas. Most companies do not appear to employ such a practice and seem to have the impression that all or most of their ideas are or will be successful. Planners, strategists, and functional leaders rarely convey any doubts about their ideas. To the contrary, they set expectations on the predicted impact of their ideas and plan for them as if they are certain. They attach revenue goals and even their own bonuses to those predictions. But, how much do they really know about the outcomes of those ideas? If they don’t have an experimentation practice, they likely know very little about the impact their roadmap is actually having.

Without experimentation, companies either don’t measure the outcomes of their ideas at all or use flimsy methods to assess their impacts. In some situations, ideas are acted upon so fluidly that they are not recognized as something that merits measurement. For example, in some companies an idea such as “we’ll lower prices to increase demand by 10%” might be adopted on a whim by a marketing exec and there will be no follow-up at all to see if it had the expected impact on demand. In other situations, a post-implementation assessment of a business idea is done, but in terms of execution, not impact (“Was it implemented on time?” “Did it meet requirements?” etc., not “What was the causal impact on business metrics?”).

In other cases still, post hoc analysis is performed in an attempt to quantify the impact of the idea. But, this is often done using subjective or less-than-rigorous methods to justify the idea as a success. That is, the team responsible for doing the analysis often is motivated either implicitly or explicitly to find evidence of success. Bonuses are often tied to the outcomes of business ideas. Or, perhaps the VP whose idea it was is the one commissioning the analysis. In either case, there is a strong motivation to find success. For example, a company may seek qualitative customer feedback on the new loyalty program in order to craft a narrative for how it is received. Yet, the customers willing to give feedback are often biased towards the positive. Even if more objective feedback were to be acquired it would still not be a measure of impact; customers often behave differently from the sentiments they express.

In still other cases, empirical analysis is performed on transaction data in an attempt to quantify the impact. But, without experimentation, at best, such analysis can only capture correlation, not causation. Business metrics are influenced simultaneously by many factors, including random fluctuations. Without properly controlling for these factors, it can be tempting to attribute any uptick in metrics to the new business idea. The combination of malleable measurements and strong incentives to show success likely explains why so many business initiatives are perceived to be successful.
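
The pull of random fluctuations is easy to underestimate. The sketch below simulates a before/after comparison in which nothing at all changes between the two periods, and counts how often the "after" period still shows an uptick. The customer count, purchase rate, and function names are hypothetical choices for illustration, not figures from any real business.

```python
import random


def weekly_purchases(rng, n_customers, rate):
    """Simulate one week: each customer buys with the same fixed probability."""
    return sum(1 for _ in range(n_customers) if rng.random() < rate)


def share_of_false_upticks(trials=1000, n_customers=500, rate=0.10, seed=1):
    """Fraction of before/after comparisons that show an uptick by chance alone."""
    rng = random.Random(seed)
    upticks = 0
    for _ in range(trials):
        before = weekly_purchases(rng, n_customers, rate)
        after = weekly_purchases(rng, n_customers, rate)  # nothing has changed
        if after > before:
            upticks += 1
    return upticks / trials


share = share_of_false_upticks()
print(f"share of comparisons showing an uptick with no real change: {share:.1%}")
```

Under these assumptions, a bit under half of the comparisons should show an improvement even though the underlying behavior is identical, which is why a before/after uptick on its own says little about causal impact.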

By contrast, the results of experimentation are numeric and austere. They do not care about the hard work that went into executing on a business initiative. They are unswayed by well-crafted narratives, emotional reviews by customers, or an executive’s influence. In short, they are brutally honest and often hard to accept.³ Without experimentation, companies don’t learn the sobering truth about their high failure rate. While ignorance is bliss, it is not an effective way to run your business.

(3) It is unlikely that companies will increase the success rate for their business ideas.

At this point, you may be thinking, “we need to get better at separating the wheat from the chaff, so that we only allocate resources to the good ideas.” Sadly, without experimentation, we see little reason for optimism as there are forces that will actively work against your efforts.

One force that is actively working against us is the way we reason about our companies.

We like to reason about our businesses as if they are simple, predictable systems. We build models of their component parts and manage them as if they are levers we can pull in order to predictably manage the business to a desired state. For example, a marketer seeking to increase demand builds a model that allows her to associate each possible price with a predicted level of demand. The scope of the model is intentionally narrow so that she can isolate the impact price has on demand. Other factors like consumer perception, the competitive assortment, operational capacity, the macroeconomic landscape, and so on are out of her control and assumed to remain constant. Equipped with such an intuitive model, she can identify the price that optimizes demand. She’s in control and hitting her goal is merely a matter of execution.

However, experimentation reveals that our predictions for the impact of new business ideas can be radically off—not just a little off in terms of magnitude, but often in the completely wrong direction. We lower prices and see demand go down. We launch a new loyalty program and it hurts retention. Such unintuitive results are far more common than you might think.

The problem is that many businesses behave as complex systems, which cannot be understood by studying their components in isolation. Customers, competitors, partners, market forces: each can adjust in response to the intervention in ways that are not observable from simple models of the components. Just as you can’t learn about an ant colony by studying the behaviors of an individual ant (Mauboussin, 2009), the insights derived from modeling individual components of a business in isolation often have little relevance to the way the business behaves as a whole.

It’s important to note that our use of the term complex does not just mean ‘not simple.’ Complexity is an entire area of research within Systems Theory. Complexity arises in systems with many interacting agents that react and adapt to one another and their environment. Examples of complex systems include weather systems, rain forest ecology, economies, the nervous system, cities, and yes, many businesses.

Reasoning about complex systems requires a different approach. Rather than focusing on component parts, attention needs to be directed at system-wide behaviors. These behaviors are often termed “emergent,” to indicate that they are very hard to anticipate. This frame orients us around learning, not executing. It encourages more trial and error with less attachment to the outcomes of a narrow set of ideas. As complexity researcher Scott E. Page says, “An actor in a complex system controls almost nothing but influences almost everything” (Page, 2009).

An example of an attempt to manage a complex system to change behaviors

To make this tangible, let’s take a look at a real example. Consider the story of the child daycare company featured in the popular book, Freakonomics (the original paper can be found here). The company faced a challenge with late pickups. The daycare closed at 4:00pm, yet parents would frequently pick up their children several minutes later. This required staff to stay late, causing both expense and inconvenience. Someone in the company had a business idea to address the situation: a fine for late pickups.

Many companies would simply implement the fine and not think to measure the outcome. Fortunately for the daycare, a group of researchers convinced them to run an experiment to measure the effectiveness of the policy. The daycare operates many locations, which were randomly divided into test and control groups; the test sites would implement the late pickup fine while the control sites would leave things as is. The experiment ran its course and, to everyone’s surprise, they learned that the fine actually increased the number of late pickups.
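
The random division into test and control groups can be sketched in a few lines of Python. This is a minimal illustration, not the procedure from the paper: the site names, the number of sites, the even split, and the `assign_groups` helper are all assumptions made for the example.

```python
import random

def assign_groups(sites, seed=0):
    """Randomly split sites into test and control groups of (near) equal size."""
    rng = random.Random(seed)       # fixed seed so the split can be reproduced
    shuffled = list(sites)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical site names, used only for illustration.
sites = [f"site_{i}" for i in range(10)]
test, control = assign_groups(sites)
print("test (fine introduced):", sorted(test))
print("control (no change):   ", sorted(control))
```

Because chance alone decides which sites get the fine, any systematic difference in late pickups between the two groups can be attributed to the fine rather than to how the sites were chosen.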

How is it possible that the business idea had the opposite effect of what was intended? There are several very plausible explanations, which we summarize below—some of these come from the paper while others are our own hypotheses.

  • The authors of the paper assert that imposing a fine makes the penalty for a late pickup explicit. Parents are generally aware that late pickups are not condoned. But in the absence of a fine, they are unsure what the penalty may be. Some parents may have imagined a penalty much worse than the fine (e.g., expulsion from the daycare). This belief might have been an effective deterrent. But when the fine was imposed, it explicitly quantified the amount of the penalty for late pickups (roughly equivalent to $2.75 in 1998 dollars). For some parents this was a relief: expulsion was not on the docket. One merely has to pay a fine for the transgression, making the cost of a late pickup less than what was believed. Hence, late pickups increase (Gneezy & Rustichini, 2000).

  • Another explanation from the paper involves social norms. Many parents may have considered late pickups socially inappropriate and would therefore go to great lengths to avoid them (leaving work early, scrambling for backup coverage, etc.). The fine, however, provides an easier way to stay in good social standing. It’s as if it signals ‘late pickups are not condoned. But if you pay us the fine you are forgiven.’ Therefore, the fine acts as the price to pay to stay in good standing. For some parents this price is low relative to the arduous and diligent planning required to prevent a late pickup. Hence, late pickups increase in the presence of the fine (Gneezy & Rustichini, 2000).

  • Still another explanation (which was only alluded to in the paper) has to do with the perceived cost structure associated with the staff having to stay late. From the parent’s perspective, the burden to the daycare of a late pickup might be considered fixed. If there is already at least one other parent also running late, then there is no extra burden imposed since staff already have to stay. As surmised by the other explanations above, the fine increases the number of late pickups, which therefore increases the probability that staff will have to stay late due to some other parent’s tardiness. Thus, one extra late pickup is no additional burden. Late pickups increase further.

  • One of our own explanations has to do with social norm thresholds. Each parent has a threshold for the appropriateness of late pickups based on social norms. The threshold might be the number of other parents observed or believed to be doing late pickups before such activity is believed to be appropriate. That is, if others are doing it, it must be okay. (Note: this signal of appropriateness is independent of the perceived fixed cost structure mentioned above.) Since the fine increased the number of late pickups for some parents, other parents observed more late pickups and then followed suit.

The above are plausible explanations for the observed outcome. Some may even seem obvious in hindsight.4 However, these behaviors are extremely difficult to anticipate by focusing your attention on an individual component part: the fine. Such surprising outcomes are less rare than you might think. In this case, the increase in late pickups might have been so apparent that it could have been detected even without the experiment. However, the impact of many ideas often goes undetected.

Another force that is actively working against our efforts to discern good ideas from bad is our cognitive biases. You might be thinking: “Thank goodness my company has processes that filter away bad ideas, so that we only invest in great ideas!” Unfortunately, all companies probably try hard to select only the best ideas, and yet we assert that they are not particularly successful at separating good from bad ideas. We suggest that this is because these processes are deeply human in nature, leaving them vulnerable to cognitive biases.

Cognitive biases are systematic errors in human thinking and decision making (Tversky & Kahneman, 1974). They result from the core thinking and decision making processes that we developed over our evolutionary history. Unfortunately, evolution adapted us to an environment with many differences from the modern world. This can lead to a habit of poor decision making. To illustrate: we know that a healthy bundle of kale is better for our bodies than a big juicy burger. Yet, we have an innate preference for the burger. Many of us will decide to eat the burger tonight. And tomorrow night. And again next week. We know we shouldn’t. Yet our society continues consuming too much meat, fat, and sugar. Obesity is now a major public health problem. Why are we doing this to ourselves? Why are we imbued with such a strong urge (a literal gut instinct) to repeatedly make decisions that have negative consequences for us? It’s because meat, fat, and sugar were scarce and precious resources for most of our evolutionary history. Consuming these resources at every opportunity was an adaptive behavior, and so humans evolved a strong desire to do so. Unfortunately, we remain imbued with this desire despite the modern world’s abundance of burger joints.

Cognitive biases are predictable and pervasive. We fall prey to them despite believing that we are rational and objective thinkers. Business leaders (ourselves included) are not immune. These biases compromise our ability to filter out bad business ideas. They can also make us feel extremely confident as we make a bad business decision. See the following sidebar for examples of cognitive biases manifesting in business environments and producing bad decisions.

Cognitive bias examples

Groupthink (Whyte, 1952) describes our tendency to converge towards shared opinions when we gather in groups. This emerges from a very human impulse to conform. Group cohesion was important in our evolutionary past. You might have observed this bias during a prioritization meeting: The group entered with disparate, weakly held opinions, but exited with a consensus opinion, which everyone felt confident about. As a hypothetical example: A meeting is called to discuss a disagreement between two departments. Members of the departments have differing but strong opinions, based on solid lines of reasoning and evidence. But once the meeting starts the attendees begin to self-censor. Nobody wants to look difficult. One attendee recognizes a gaping flaw in the “other side’s” analysis, but they don’t want to make their key cross-functional partner look bad in front of their boss. Another attendee may have thought the idea was too risky, but, because the responsibility for the idea is now diffused across everyone in the meeting, it won’t be her fault if the project fails and so she acquiesces. Finally, a highly admired senior executive speaks up and everyone converges towards this position (in business lingo we just heard the HiPPO, or Highest Paid Person’s Opinion; in the scientific vernacular, the Authority Bias (Milgram, 1963)). These social pressures will have collectively stifled the meaningful debate that could have filtered out a bad business decision.

The Sunk Cost bias (Arkes & Blumer, 1985) describes our tendency to justify new investments via past expenditures. In colloquial terms, it’s our tendency to throw good money after bad. We suspect you’ve seen this bias more than a few times in the workplace. As another hypothetical example: A manager is deciding what their team will prioritize over the next fiscal year. They naturally think about incremental improvements that they could make to their team’s core product. This product is based on a compelling idea; however, it hasn’t yet delivered the impact that everyone expected. But the manager has spent so much time and effort building organizational momentum behind the product. The manager gave presentations about it to senior leadership and painstakingly cultivated a sense of excitement about it with their cross-functional partners. As a result, the manager decides to prioritize incremental work on the existing product, without properly investigating a new idea that would have yielded much more impact. In this case, the manager’s decision was driven by thinking about the sunk costs associated with the existing system. This created a barrier to innovation and yielded a bad business decision.

The Confirmation Bias (Nickerson, 1998) describes our tendency to focus upon evidence that confirms our beliefs, while discounting evidence that challenges our beliefs. We’ve certainly fallen prey to this bias in our personal and professional lives. As a hypothetical example: An exec wonders ‘should we implement a loyalty program to improve client retention?’ They find a team member who thinks this sounds like a good idea. So the exec asks the team member to do some market research to inform whether the company should create their own loyalty program. The team member looks for examples of highly successful loyalty programs from other companies. Why look for examples of bad programs? This company has no intention of implementing a bad loyalty program. Also, the team member wants to impress the exec by describing all the opportunities that could be unlocked with this program. They want to demonstrate their abilities as a strategic thinker. They might even get to lead the implementation of the program, which could be great for their career. As a result, the team member builds a presentation that emphasizes positive examples and opportunities, while discounting negative examples and risks. This presentation leads the exec to overestimate the probability that this initiative will improve client retention, and thus fail to filter out a bad business decision.

The biases we’ve listed above are just a sample of the extensive and well documented set of cognitive biases (e.g., Availability Bias, Survivorship Bias, Dunning-Kruger effect, etc.) that limit business leaders’ ability to identify and implement only successful business initiatives. Awareness of these biases can decrease our probability of committing them. However, awareness isn’t a silver bullet. We have a desk mat in our office that lists many of these cognitive biases. We regret to report that we often return to our desks, stare down at the mat … and realize that we’ve just fallen prey to another bias. 

A final force that is actively working against efforts to discern good ideas from bad is your business maturing. A thought experiment: Suppose a local high school coach told NBA superstar Stephen Curry how to adjust his jump shot. Would implementing these changes improve or hurt his performance? It is hard to imagine it would help. Now, suppose the coach gave this advice to a local 6th grader. It seems likely that it would help the kid’s game.

Now, imagine a consultant telling Google how to improve their search algorithm versus advising a startup on setting up a database. It’s easier to imagine the consultant helping the startup. Why? Well, Google search is a cutting edge system that has received extensive attention from numerous world class experts—kind of like Steph Curry. It’s going to be hard to offer a new great idea. In contrast, the startup will benefit from getting pointed in a variety of good directions—kind of like a 6th grader.

To use a more analytic framework, imagine a hill which represents a company’s objective function5 like profit, revenue, or retention. The company’s goal is to climb to the peak, where its objective is maximized. However, the company can’t see very far in this landscape. It doesn’t know where the peak is. It can only assess (if it’s careful and uses experimentation) whether it’s going up or downhill by taking small steps in different directions, perhaps by tweaking its pricing strategy and measuring if revenue goes up.

When a company (or basketball player) is young, its position on this objective function (profit, etc.) landscape is low. It can step in many directions and go uphill. Through this process, a company can grow (walk up Mount Revenue). However, as it climbs the mountain, a smaller proportion of the possible directions to step will lead uphill. At the summit a step in any direction will take you downhill.
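
The shrinking share of uphill directions can be checked numerically on a toy landscape. The sketch below is only an illustration under stated assumptions: the objective is a single smooth two-dimensional peak at the origin, every step has the same fixed size, and the names `objective` and `uphill_fraction` are invented for this example.

```python
import math
import random

def objective(x, y):
    """Toy 'Mount Revenue': one smooth peak located at (0, 0)."""
    return -(x * x + y * y)

def uphill_fraction(x, y, step=1.0, trials=10_000, seed=0):
    """Estimate the share of random fixed-size steps that improve the objective."""
    rng = random.Random(seed)
    here = objective(x, y)
    wins = 0
    for _ in range(trials):
        angle = rng.uniform(0.0, 2.0 * math.pi)   # pick a random direction
        nx = x + step * math.cos(angle)
        ny = y + step * math.sin(angle)
        if objective(nx, ny) > here:
            wins += 1
    return wins / trials

# Distance from the summit: far away, closer, one step away, at the peak.
for distance in (50.0, 5.0, 1.0, 0.0):
    print(distance, round(uphill_fraction(distance, 0.0), 3))
```

Far from the peak, close to half of all directions lead uphill. The share falls as the position approaches the summit and reaches zero at the top, where every step of this size goes downhill.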

This is admittedly a simple model of a business (and we already discussed the follies of using simple models). However, all companies will eventually face the truism that as they improve, there are fewer ways to continue to improve (the low apples have been plucked), as well as the extrinsic constraints of market saturation, commoditization, etc. that make it harder to improve your business as it matures.6

So, what to do

We’ve argued that most business ideas fail to deliver on their promised goals. We’ve also explained that there are systematic reasons that make it unlikely that companies will get better, just by trying harder. So where does this leave you? Are you destined to implement mostly bad ideas? Here are a few recommendations that might help:

  1. Run experiments and exercise your optionality. Recognize that your business may be a complex system, making it very difficult to predict how it will respond to your business ideas. Instead of rolling out your new business ideas to all customers, try them on a sample of customers as an experiment. This will show you the impact your idea has on the company. You can then make an informed decision about whether or not to roll out your idea. If your idea has a positive impact, great. Roll it out to all customers. But in the more likely event that your idea does not have the positive impact you were hoping for you can end the experiment and kill the idea. It may seem wasteful to use company resources to implement a business idea only to later kill it, but this is better than unknowingly providing on-going support to an idea that is doing nothing or actually hurting your metrics—which is what happens most of the time.
  2. Recognize your cognitive biases, collect a priori predictions, and celebrate learnings. Your company’s ability to filter out bad business ideas will be limited by your team members’ cognitive biases. You can start building a culture that appreciates this fact by sending a survey to all of a project’s stakeholders before your next big release. Ask everyone to predict how the metrics will move. Make an anonymized version of these predictions and their accuracy available to employees. We expect your team members will become less confident in their predictions over time. This process may also reveal that big wins tend to emerge from a string of experiments, rather than a single stroke of inspiration. So celebrate all of the necessary stepping stones on the way to a big win.
  3. Recognize that it’s going to get harder to find successful ideas, so try more things, and get more skeptical. As your company matures, it may get harder to find ways to improve it. We see three ways to address this challenge. First, try more ideas. It will be hard to increase the success rate of your ideas, so try more ideas. Consider building a leverageable and reusable experimentation platform to increase your bandwidth. Follow the lead of the venture world: fund a lot of ideas to get a few big wins.7 Second, as your company matures, you might want to adjust the amount of evidence that is required before you roll out a change—a more mature company should require a higher degree of statistical certainty before inferring that a new feature has improved metrics. In experimental lingo, you might want to adjust the “p-value thresholds” that you use to assess an experiment. Or to use our metaphor, a 6th grader should probably just listen whenever a coach tells them to adjust their jump shot, but Steph Curry should require a lot of evidence before he adjusts his.
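
As a rough sketch of how the first and third recommendations fit together, the Python below compares a test group with a control group and only recommends a rollout when the evidence clears a chosen p-value threshold. Everything here is an assumption made for illustration: the metric values are made up, the permutation test is just one of several reasonable analysis methods, and `permutation_p_value`, `decide`, and the two `alpha` values are hypothetical names and settings.

```python
import random
from statistics import mean

def permutation_p_value(test, control, n_permutations=5_000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(test) - mean(control))
    pooled = list(test) + list(control)
    n_test = len(test)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)                      # relabel customers at random
        diff = abs(mean(pooled[:n_test]) - mean(pooled[n_test:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)

def decide(test, control, alpha):
    """Roll out only if the test group did better and the p-value is below alpha."""
    p = permutation_p_value(test, control)
    improved = mean(test) > mean(control)
    decision = "roll out" if improved and p < alpha else "kill the idea"
    return decision, p

# Made-up per-customer metric values, for illustration only.
control = [10, 12, 9, 11, 10, 13, 9, 10, 11, 12]
test = [11, 13, 10, 12, 12, 14, 10, 11, 13, 12]

# A looser threshold for a young company, a stricter one for a mature company.
for alpha in (0.10, 0.01):
    print(alpha, decide(test, control, alpha))
```

Lowering `alpha` is the code-level counterpart of asking for more evidence before changing something that already works well.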

This may be a hard message to accept. It’s easier to assume that all of our ideas are having the positive impact that we intended. It’s more inspiring to believe that successful ideas and companies are the result of brilliance rather than trial and error. But, consider the deference we give to mother nature. She is able to produce such exquisite creatures—the giraffe, the mighty oak tree, even us humans—each so perfectly adapted to their environment that we see them as the rightful owners of their respective niches. Yet, mother nature achieves this not through grandiose ideas, but through trial and error… with a success rate far more dismal than that of our business ideas. It’s an effective strategy if we can convince our egos to embrace it.


Arkes, H. R., & Blumer, C. (1985). The psychology of sunk costs. Organizational Behavior and Human Decision Processes, 35, 124-140.

Gneezy, U., & Rustichini, A. (2000). A Fine is a Price. The Journal of Legal Studies, 29(1), 1-17. doi:10.1086/468061

Kahneman, D., & Klein, G. (2009). Conditions for intuitive expertise: A failure to disagree. American Psychologist, 64(6), 515–526.

Kohavi, R., & Thomke, S. (2017). The Surprising Power of Online Experiments. Harvard Business Review, 95(5).

Mauboussin, M. J. (2009). Think Twice: Harnessing the Power of Counterintuition. Harvard Business Review Press.

Milgram, S. (1963). Behavioral study of obedience. The Journal of Abnormal and Social Psychology, 67(4), 371–378.

Moran, M. (2007). Do It Wrong Quickly: How the Web Changes the Old Marketing Rules. IBM Press. ISBN 0132255960.

Nickerson, R. S. (1998). Confirmation bias: A ubiquitous phenomenon in many guises. Review of General Psychology, 2(2), 175–220.

Page, S. E. (2009). Understanding Complexity – The Great Courses – Lecture Transcript and Course Guidebook (1st ed.). The Teaching Company.

Thomke, S. H. (2020). Experimentation Works: The Surprising Power of Business Experiments. Harvard Business Review Press.

Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185(4157), 1124-1131.

Whyte, W. H. (1952). Groupthink. Fortune, 114-117, 142, 146.

  1. Do not confuse the term ‘test’ to mean a process by which a nascent idea is vetted to get feedback. In an experiment, the test group receives a full-featured implementation of an idea. The goal of the experiment is to measure impact—not get feedback.
  2. In some cases there may be insufficient sample size, ethical concerns, lack of a suitable control group, and many other conditions that can inhibit experimentation.
  3. Even trained statisticians can fall victim to pressures to cajole the data. “P-hacking”, “significance chasing” and other terms refer to the temptation to use flawed methods in statistical analysis.
  4. We believe that these types of factors are only obvious in hindsight because the signals are often unobserved until we know to look for them (Kahneman & Klein, 2009).
  5. One reason among many why this mental picture is oversimplified is that it implicitly takes business conditions and the world at large to be static—the company “state vector” that maximizes the objective function today is the same as what maximizes the objective function tomorrow. In other words, it ignores that, in reality, the hill is changing shape under our feet as we try to climb it. Still, it’s a useful toy model.
  6. Finding a new market (jumping to a new “hill” in the “Mount Revenue” metaphor), as recommended in the next section, is one way to continue improving business metrics even as your company matures.
  7. VCs are able to learn about the outcomes of the startups even without experimentation. This is because the outcomes are far more readily apparent than those of business ideas. It’s difficult to cajole results to show a successful outcome when the company is out of business.

Categories: Technology

MLOps and DevOps: Why Data Makes It Different

O'Reilly Radar - Tue, 2021/10/19 - 07:17

Much has been written about the struggles of deploying machine learning projects to production. As with many burgeoning fields and disciplines, we don’t yet have a shared canonical infrastructure stack or best practices for developing and deploying data-intensive applications. This is both frustrating for companies that would prefer making ML an ordinary, fuss-free value-generating function like software engineering, and exciting for vendors who see the opportunity to create buzz around a new category of enterprise software.

The new category is often called MLOps. While there isn’t an authoritative definition for the term, it shares its ethos with its predecessor, the DevOps movement in software engineering: by adopting well-defined processes, modern tooling, and automated workflows, we can streamline the process of moving from development to robust production deployments. This approach has worked well for software development, so it is reasonable to assume that it could address struggles related to deploying machine learning in production too.

However, the concept is quite abstract. Just introducing a new term like MLOps doesn’t solve anything by itself; rather, it just adds to the confusion. In this article, we want to dig deeper into the fundamentals of machine learning as an engineering discipline and outline answers to key questions:

  1. Why does ML need special treatment in the first place? Can’t we just fold it into existing DevOps best practices?
  2. What does a modern technology stack for streamlined ML processes look like?
  3. How can you start applying the stack in practice today?

Why: Data Makes It Different

All ML projects are software projects. If you peek under the hood of an ML-powered application, these days you will often find a repository of Python code. If you ask an engineer to show how they operate the application in production, they will likely show containers and operational dashboards—not unlike any other software service.

Since software engineers manage to build ordinary software without experiencing as much pain as their counterparts in the ML department, it raises the question: should we just start treating ML projects as ordinary software engineering projects, maybe educating ML practitioners about the existing best practices?

Let’s start by considering the job of a non-ML software engineer: writing traditional software deals with well-defined, narrowly-scoped inputs, which the engineer can exhaustively and cleanly model in the code. In effect, the engineer designs and builds the world wherein the software operates.

In contrast, a defining feature of ML-powered applications is that they are directly exposed to a large amount of messy, real-world data which is too complex to be understood and modeled by hand.

This characteristic makes ML applications fundamentally different from traditional software. It has far-reaching implications as to how such applications should be developed and by whom:

  1. ML applications are directly exposed to the constantly changing real world through data, whereas traditional software operates in a simplified, static, abstract world which is directly constructed by the developer.
  2. ML apps need to be developed through cycles of experimentation: due to the constant exposure to data, we don’t learn the behavior of ML apps through logical reasoning but through empirical observation.
  3. The skillset and the background of people building the applications get realigned: while it is still effective to express applications in code, the emphasis shifts to data and experimentation, more akin to empirical science than to traditional software engineering.

This approach is not novel. There is a decades-long tradition of data-centric programming: developers who have been using data-centric IDEs, such as RStudio, Matlab, Jupyter Notebooks, or even Excel to model complex real-world phenomena, should find this paradigm familiar. However, these tools have been rather insular environments: they are great for prototyping but lacking when it comes to production use.

To make ML applications production-ready from the beginning, developers must adhere to the same set of standards as all other production-grade software. This introduces further requirements:

  1. The scale of operations is often two orders of magnitude larger than in the earlier data-centric environments. Not only is data larger, but models—deep learning models in particular—are much larger than before.
  2. Modern ML applications need to be carefully orchestrated: with the dramatic increase in the complexity of apps, which can require dozens of interconnected steps, developers need better software paradigms, such as first-class DAGs.
  3. We need robust versioning for data, models, code, and preferably even the internal state of applications—think Git on steroids to answer inevitable questions: What changed? Why did something break? Who did what and when? How do two iterations compare?
  4. The applications must be integrated with the surrounding business systems so ideas can be tested and validated in the real world in a controlled manner.

Two important trends collide in these lists. On the one hand we have the long tradition of data-centric programming; on the other hand, we face the needs of modern, large-scale business applications. Either paradigm is insufficient by itself: it would be ill-advised to suggest building a modern ML application in Excel. Similarly, it would be pointless to pretend that a data-intensive application resembles a run-of-the-mill microservice which can be built with the usual software toolchain consisting of, say, GitHub, Docker, and Kubernetes.

We need a new path that allows the results of data-centric programming, models and data science applications in general, to be deployed to modern production infrastructure, similar to how DevOps practices allow traditional software artifacts to be deployed to production continuously and reliably. Crucially, the new path is analogous but not equal to the existing DevOps path.

What: The Modern Stack of ML Infrastructure

What kind of foundation would the modern ML application require? It should combine the best parts of modern production infrastructure to ensure robust deployments, as well as draw inspiration from data-centric programming to maximize productivity.

While implementation details vary, the major infrastructural layers we’ve seen emerge are relatively uniform across a large number of projects. Let’s now take a tour of the various layers, to begin to map the territory. Along the way, we’ll provide illustrative examples. The intention behind the examples is not to be comprehensive (perhaps a fool’s errand, anyway!), but to reference concrete tooling used today in order to ground what could otherwise be a somewhat abstract exercise.

Adapted from the book Effective Data Science Infrastructure

Foundational Infrastructure Layers

Data

Data is at the core of any ML project, so data infrastructure is a foundational concern. ML use cases rarely dictate the master data management solution, so the ML stack needs to integrate with existing data warehouses. Cloud-based data warehouses, such as Snowflake, AWS’ portfolio of databases like RDS, Redshift or Aurora, or an S3-based data lake, are a great match to ML use cases since they tend to be much more scalable than traditional databases, both in terms of the data set sizes as well as query patterns.

Compute

To make data useful, we must be able to conduct large-scale compute easily. Since the needs of data-intensive applications are diverse, it is useful to have a general-purpose compute layer that can handle different types of tasks from IO-heavy data processing to training large models on GPUs. Besides variety, the number of tasks can be high too: imagine a single workflow that trains a separate model for 200 countries in the world, running a hyperparameter search over 100 parameters for each model—the workflow yields 20,000 parallel tasks.
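
The task count in this example is simple fan-out arithmetic, sketched below. The country names and the hyperparameter grid are placeholders, and the sketch reads "100 parameters" as 100 candidate hyperparameter settings per model, which is an interpretation rather than something the text spells out.

```python
from itertools import product

# Placeholder inputs: 200 countries and 100 candidate hyperparameter settings.
countries = [f"country_{i:03d}" for i in range(200)]
param_sets = [{"learning_rate": 0.001 * (i + 1)} for i in range(100)]

# One independent training task per (country, hyperparameter setting) pair.
tasks = list(product(countries, param_sets))
print(len(tasks))  # 200 * 100 = 20000
```

Each pair can run on its own, which is why a general-purpose compute layer that schedules many independent tasks matters here.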

Prior to the cloud, setting up and operating a cluster that can handle workloads like this would have been a major technical challenge. Today, a number of cloud-based, auto-scaling systems are easily available, such as AWS Batch. Kubernetes, a popular choice for general-purpose container orchestration, can be configured to work as a scalable batch compute layer, although the downside of its flexibility is increased complexity. Note that container orchestration for the compute layer is not to be confused with the workflow orchestration layer, which we will cover next.


Orchestration

The nature of computation is structured: we must be able to manage the complexity of applications by structuring them, for example, as a graph or a workflow that is orchestrated.

The workflow orchestrator needs to perform a seemingly simple task: given a workflow or DAG definition, execute the tasks defined by the graph in order using the compute layer. There are countless systems that can perform this task for small DAGs on a single server. However, as the workflow orchestrator plays a key role in ensuring that production workflows execute reliably, it makes sense to use a system that is both scalable and highly available, which leaves us with a few battle-hardened options, for instance: Airflow, a popular open-source workflow orchestrator; Argo, a newer orchestrator that runs natively on Kubernetes; and managed solutions such as Google Cloud Composer and AWS Step Functions.
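To illustrate what "execute the tasks defined by the graph in order" means, here is a minimal single-process sketch using Python's standard `graphlib` module. The task names and the `run_workflow` helper are made up for illustration; this is not how Airflow or Argo are implemented, and it ignores retries, scheduling, and high availability.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each key maps to the set of tasks it depends on.
dag = {
    "load_data": set(),
    "build_features": {"load_data"},
    "train_model": {"build_features"},
    "evaluate": {"train_model"},
    "publish_report": {"evaluate", "build_features"},
}

def run_workflow(dag, run_task):
    """Run every task only after all of its dependencies have finished."""
    executed = []
    for task in TopologicalSorter(dag).static_order():
        run_task(task)  # a real orchestrator would hand this to the compute layer
        executed.append(task)
    return executed

order = run_workflow(dag, run_task=lambda name: None)
print(order)
```

The hard part of a production orchestrator is not the ordering shown here but doing it reliably for large graphs across many machines.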

Software Development Layers

While these three foundational layers, data, compute, and orchestration, are technically all we need to execute ML applications at arbitrary scale, building and operating ML applications directly on top of these components would be like hacking software in assembly language: technically possible but inconvenient and unproductive. To make people productive, we need higher levels of abstraction. Enter the software development layers.


Versioning

ML applications and software artifacts exist and evolve in a dynamic environment. To manage the dynamism, we can resort to taking snapshots that represent immutable points in time: of models, of data, of code, and of internal state. For this reason, we require a strong versioning layer.

While Git, GitHub, and other similar tools for software version control work well for code and the usual workflows of software development, they are a bit clunky for tracking all experiments, models, and data. To plug this gap, frameworks like Metaflow or MLflow provide a custom solution for versioning.
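One way to picture an immutable snapshot is as an identifier derived from the exact code, parameters, and data that went into a run. The sketch below is an assumption-laden illustration of that idea only; the `snapshot_id` function is hypothetical and does not reflect how Metaflow or MLflow version artifacts.

```python
import hashlib
import json

def snapshot_id(code: str, params: dict, data: bytes) -> str:
    """Derive a short identifier from the exact code, parameters, and data."""
    digest = hashlib.sha256()
    # Separate the parts with a NUL byte so they cannot blur into each other.
    for part in (
        code.encode("utf-8"),
        json.dumps(params, sort_keys=True).encode("utf-8"),
        data,
    ):
        digest.update(part)
        digest.update(b"\0")
    return digest.hexdigest()[:12]

v1 = snapshot_id("def train(): ...", {"alpha": 0.1}, b"rows-2021-10")
v2 = snapshot_id("def train(): ...", {"alpha": 0.2}, b"rows-2021-10")
print(v1, v2)  # changing any input yields a different identifier
```

Because the identifier depends on all three inputs, two runs share a version only when code, parameters, and data are all identical.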

Software Architecture

Next, we need to consider who builds these applications and how. They are often built by data scientists who are not software engineers or computer science majors by training. Arguably, high-level programming languages like Python are the most expressive and efficient ways that humankind has conceived to formally define complex processes. It is hard to imagine a better way to express non-trivial business logic and convert mathematical concepts into an executable form.

However, not all Python code is equal. Python written in Jupyter notebooks following the tradition of data-centric programming is very different from Python used to implement a scalable web server. To make the data scientists maximally productive, we want to provide supporting software architecture in terms of APIs and libraries that allow them to focus on data, not on the machines.

Data Science Layers

With these five layers, we can present a highly productive, data-centric software interface that enables iterative development of large-scale data-intensive applications. However, none of these layers help with modeling and optimization. We cannot expect data scientists to write modeling frameworks like PyTorch or optimizers like Adam from scratch! Furthermore, there are steps that are needed to go from raw data to features required by models.

Model Operations

When it comes to data science and modeling, we separate three concerns, starting from the most practical and progressing towards the most theoretical. Assuming you have a model, how can you use it effectively? Perhaps you want to produce predictions in real-time or as a batch process. No matter what you do, you should monitor the quality of the results. Altogether, we can group these practical concerns in the model operations layer. There are many new tools in this space helping with various aspects of operations, including Seldon for model deployments, Weights & Biases for model monitoring, and TruEra for model explainability.

Feature Engineering

Before you have a model, you have to decide how to feed it with labelled data. Managing the process of converting raw facts to features is a deep topic of its own, potentially involving feature encoders, feature stores, and so on. Producing labels is another, equally deep topic. You want to carefully manage consistency of data between training and predictions, as well as make sure that there’s no leakage of information when models are being trained and tested with historical data. We bucket these questions in the feature engineering layer. There’s an emerging space of ML-focused feature stores such as Tecton and labeling solutions like Scale and Snorkel. Feature stores aim to solve the challenge that many data scientists in an organization require similar data transformations and features for their work, while labeling solutions deal with the very real challenges associated with hand labeling datasets.
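A common guard against leakage with historical data is to split by time, so that the model is never trained on records from the period it is tested on. The following is a minimal sketch under that assumption; the event log and the `split_by_time` helper are invented for illustration.

```python
from datetime import date

# Hypothetical event log: (event date, feature value, label).
events = [
    (date(2021, 1, 5), 0.2, 0),
    (date(2021, 2, 9), 0.7, 1),
    (date(2021, 3, 14), 0.4, 0),
    (date(2021, 4, 21), 0.9, 1),
    (date(2021, 5, 30), 0.1, 0),
]

def split_by_time(rows, cutoff):
    """Train only on rows strictly before the cutoff; test on the rest."""
    train = [row for row in rows if row[0] < cutoff]
    test = [row for row in rows if row[0] >= cutoff]
    return train, test

train, test = split_by_time(events, cutoff=date(2021, 4, 1))
print(len(train), len(test))
```

A random shuffle of the same rows would mix future and past records, which is one of the ways information can leak from the test period into training.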

Model Development

Finally, at the very top of the stack we get to the question of mathematical modeling: What kind of modeling technique to use? What model architecture is most suitable for the task? How to parameterize the model? Fortunately, excellent off-the-shelf libraries like scikit-learn and PyTorch are available to help with model development.

An Overarching Concern: Correctness and Testing

Regardless of the systems we use at each layer of the stack, we want to guarantee the correctness of results. In traditional software engineering we can do this by writing tests: for instance, a unit test can be used to check the behavior of a function with predetermined inputs. Since we know exactly how the function is implemented, we can convince ourselves through inductive reasoning that the function should work correctly, based on the correctness of a unit test.

This process doesn’t work when the function, such as a model, is opaque to us. We must resort to black box testing: testing the behavior of the function with a wide range of inputs. Even worse, sophisticated ML applications can take a huge number of contextual data points into account as inputs, like the time of day, the user’s past behavior, or device type, so an accurate test setup may need to become a full-fledged simulator.
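As a rough sketch of black box testing, the code below probes a function with many random inputs and checks only expectations about its outputs. Both `opaque_model` (a stand-in for a trained model) and the monotonicity expectation are assumptions made up for this example.

```python
import math
import random

def opaque_model(hour: int, past_purchases: int, device: str) -> float:
    """Stand-in for a trained model whose internals we treat as unknown."""
    device_bias = {"phone": 0.3, "tablet": 0.1, "desktop": -0.2}[device]
    score = 0.05 * past_purchases + 0.1 * math.sin(hour / 24 * 2 * math.pi) + device_bias
    return 1 / (1 + math.exp(-score))

def black_box_check(model, trials=1000, seed=0):
    """Probe the model with random inputs and collect violated expectations."""
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        hour = rng.randrange(24)
        purchases = rng.randrange(50)
        device = rng.choice(["phone", "tablet", "desktop"])
        p = model(hour, purchases, device)
        # Expectation: the output is a probability.
        if not 0.0 <= p <= 1.0:
            failures.append(("out of range", hour, purchases, device))
        # Expectation (an assumption): more past purchases never lowers the score.
        if model(hour, purchases + 1, device) < p:
            failures.append(("not monotonic", hour, purchases, device))
    return failures

failures = black_box_check(opaque_model)
print(len(failures))
```

Note that the checks never look inside the model; they only state what any acceptable model should do across the sampled inputs.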

Since building an accurate simulator is a highly non-trivial challenge in itself, often it is easier to use a slice of the real world as a simulator and A/B test the application in production against a known baseline. To make A/B testing possible, all layers of the stack should be able to run many versions of the application concurrently, so an arbitrary number of production-like deployments can be run simultaneously. This poses a challenge to many infrastructure tools of today, which have been designed with more rigid traditional software in mind. Besides infrastructure, effective A/B testing requires a control plane: a modern experimentation platform, such as StatSig.
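One small piece of such a control plane is deciding which version a given user sees. A frequently used approach, sketched here under the assumption of two variants, is to hash a stable identifier so that assignment is deterministic. The `assign_variant` function and the experiment name are hypothetical and do not describe StatSig's API.

```python
import hashlib

def assign_variant(user_id: str, experiment: str, variants=("baseline", "candidate")) -> str:
    """Map a user to a variant deterministically, so repeat visits stay consistent."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return variants[bucket % len(variants)]

# Simulate traffic from 10,000 hypothetical users.
counts = {"baseline": 0, "candidate": 0}
for i in range(10_000):
    counts[assign_variant(f"user-{i}", "ranking-model-v2")] += 1

print(counts)
```

Including the experiment name in the hash keeps assignments independent across experiments, so the same users are not always grouped together.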

How: Wrapping The Stack For Maximum Usability

Imagine choosing a production-grade solution for each layer of the stack: for instance, Snowflake for data, Kubernetes for compute (container orchestration), and Argo for workflow orchestration. While each system does a good job in its own domain, it is not trivial to build a data-intensive application that has cross-cutting concerns touching all the foundational layers. In addition, you have to layer the higher-level concerns from versioning to model development on top of the already complex stack. It is not realistic to ask a data scientist to prototype quickly and deploy to production with confidence using such a contraption. Adding more YAML to cover cracks in the stack is not an adequate solution.

Many data-centric environments of the previous generation, such as Excel and RStudio, really shine at maximizing usability and developer productivity. Optimally, we could wrap the production-grade infrastructure stack inside a developer-oriented user interface. Such an interface should allow the data scientist to focus on concerns that are most relevant for them, namely the topmost layers of the stack, while abstracting away the foundational layers.

The combination of a production-grade core and a user-friendly shell makes sure that ML applications can be prototyped rapidly, deployed to production, and brought back to the prototyping environment for continuous improvement. The iteration cycles should be measured in hours or days, not in months.

Over the past five years, a number of such frameworks have started to emerge, both as commercial offerings and as open source.

Metaflow is an open-source framework, originally developed at Netflix, specifically designed to address this concern (disclaimer: one of the authors works on Metaflow): How can we wrap robust production infrastructure in a single coherent, easy-to-use interface for data scientists? Under the hood, Metaflow integrates with best-of-breed production infrastructure, such as Kubernetes and AWS Step Functions, while providing a development experience that draws inspiration from data-centric programming, that is, by treating local prototyping as a first-class citizen.

Google’s open-source Kubeflow addresses similar concerns, although with a more engineer-oriented approach. As a commercial product, Databricks provides a managed environment that combines data-centric notebooks with a proprietary production infrastructure. All cloud providers offer commercial solutions as well, such as AWS SageMaker or Azure ML Studio.

While these solutions, and many lesser-known ones, seem similar on the surface, there are many differences between them. When evaluating solutions, consider focusing on the three key dimensions covered in this article:

  1. Does the solution provide a delightful user experience for data scientists and ML engineers? There is no fundamental reason why data scientists should accept a worse level of productivity than is achievable with existing data-centric tools.
  2. Does the solution provide first-class support for rapid iterative development and frictionless A/B testing? It should be easy to take projects quickly from prototype to production and back, so production issues can be reproduced and debugged locally.
  3. Does the solution integrate with your existing infrastructure, in particular with the foundational data, compute, and orchestration layers? It is not productive to operate ML as an island. When operating ML in production, it is beneficial to leverage existing production tooling, for example for observability and deployments, as much as possible.

It is safe to say that all existing solutions still have room for improvement. Yet it seems inevitable that over the next five years the whole stack will mature, and the user experience will converge towards and eventually beyond the best data-centric IDEs. Businesses will learn how to create value with ML much as they do with traditional software engineering, and empirical, data-driven development will take its place amongst other ubiquitous software development paradigms.

Categories: Technology

PLUG Topic for 10/14

PLUG - Thu, 2021/10/14 - 16:58

Thursday's topic is Introduction to Nextcloud by der.hans
This is a remote meeting.  Please join by going to at 7pm on Thursday October 14th

der.hans: Introduction to Nextcloud

Nextcloud provides private, secure cloud features such as collaboration tools, Enterprise File Sync and Share, and phone syncing.

Your data, your cloud.

Nextcloud integrates with Collabora Online for a collaborative office suite.

In addition to Nextcloud Talk for video calls, it integrates with Big Blue Button for full classroom- and meeting-style video conferencing.

Nextcloud also has options like phone contact and file sync, web forms, project management and password management.

Nextcloud is used by large, enterprise organizations, but can also be self-hosted at home for personal use. There are commercial vendors that offer hosting and some Free Software organizations offer hosting as well.

Nextcloud was created to be Free Software. Using the AGPL helps ensure it will remain Free in the future.

Attendees will learn about:

* Basic features of Nextcloud
* Integrations with Collabora Online office suite and Big Blue Button
* Nextcloud apps
* Nextcloud history
* Hosting options

About der.hans:
der.hans works remotely in the US.
In the FLOSS community he is active with conferences and local groups.

He's chairman of PLUG, chair for SeaGL Finance committee, founder of SeaGL Career Expo, RaiseMe career counselor, BoF organizer for the Southern California Linux Expo (SCaLE) and founder of the Free Software Stammtisch.

He speaks regularly at community-led FLOSS conferences such as SeaGL, Penguicon, SCaLE, LCA, FOSSASIA, Tübix, CLT, Kieler Linux Tage, LFNW, OLF, SELF and GeekBeacon Fest.

Hans is available to speak remotely for local groups.

Currently a Customer Data Engineer at Object Rocket. Public statements are not representative of $dayjob.

Fediverse/Mastodon -

The Quality of Auto-Generated Code

O'Reilly Radar - Tue, 2021/10/12 - 06:45

Kevlin Henney and I were riffing on some ideas about GitHub Copilot, the tool for automatically generating code based on GPT-3’s language model, trained on the body of code that’s in GitHub. This article poses some questions and (perhaps) some answers, without trying to present any conclusions.

First, we wondered about code quality. There are lots of ways to solve a given programming problem; but most of us have some ideas about what makes code “good” or “bad.” Is it readable, is it well-organized? Things like that.  In a professional setting, where software needs to be maintained and modified over long periods, readability and organization count for a lot.

We know how to test whether or not code is correct (at least up to a certain limit). Given enough unit tests and acceptance tests, we can imagine a system for automatically generating code that is correct. Property-based testing might give us some additional ideas about building test suites robust enough to verify that code works properly. But we don’t have methods to test for code that’s “good.” Imagine asking Copilot to write a function that sorts a list. There are lots of ways to sort. Some are pretty good, such as quicksort. Some of them are awful. But a unit test has no way of telling whether a function is implemented using quicksort, permutation sort (which completes in factorial time), sleep sort, or one of the other strange sorting algorithms that Kevlin has been writing about.
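The point about unit tests can be made concrete with a small sketch. The two implementations and the test cases below are written for this illustration; they are not output from Copilot.

```python
from itertools import permutations

def quicksort(items):
    """Average-case O(N log N) sort."""
    if len(items) <= 1:
        return list(items)
    pivot, rest = items[0], items[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    return quicksort(smaller) + [pivot] + quicksort(larger)

def permutation_sort(items):
    """O(N!) sort: try orderings until one happens to be sorted."""
    for candidate in permutations(items):
        if all(a <= b for a, b in zip(candidate, candidate[1:])):
            return list(candidate)
    return []

def passes_unit_test(sort_fn):
    """A typical correctness test: it checks results, not the algorithm."""
    cases = [[], [1], [3, 1, 2], [5, 5, 1, 4], [9, 8, 7, 6, 5]]
    return all(sort_fn(case) == sorted(case) for case in cases)

print(passes_unit_test(quicksort), passes_unit_test(permutation_sort))
```

Both functions pass, because the test only observes inputs and outputs. Telling them apart requires a different kind of check, such as timing them on larger inputs or reading the code.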

Do we care? Well, we care about O(N log N) behavior versus O(N!). But assuming that we have some way to resolve that issue, if we can specify a program’s behavior precisely enough so that we are highly confident that Copilot will write code that’s correct and tolerably performant, do we care about its aesthetics? Do we care whether it’s readable? 40 years ago, we might have cared about the assembly language code generated by a compiler. But today, we don’t, except for a few increasingly rare corner cases that usually involve device drivers or embedded systems. If I write something in C and compile it with gcc, realistically I’m never going to look at the compiler’s output. I don’t need to understand it.
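To see why the gap between O(N log N) and O(N!) matters, it helps to compare the two growth rates for a few small values of N. The `growth` helper below is a hypothetical name used only for this comparison, and the figures are abstract step counts, not measured running times.

```python
import math

def growth(n: int):
    """Return (N log2 N, N!) for a given input size."""
    return n * math.log2(n), math.factorial(n)

for n in (5, 10, 15, 20):
    n_log_n, n_factorial = growth(n)
    print(f"N={n:2d}  N log N ~ {n_log_n:6.1f}  N! = {n_factorial:,}")
```

Even at N = 10, the factorial is already in the millions while N log N is still in the tens.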

To get to this point, we may need a meta-language for describing what we want the program to do that’s almost as detailed as a modern high-level language. That could be what the future holds: an understanding of “prompt engineering” that lets us tell an AI system precisely what we want a program to do, rather than how to do it. Testing would become much more important, as would understanding precisely the business problem that needs to be solved. “Slinging code” in whatever the language would become less common.

But what if we don’t get to the point where we trust automatically generated code as much as we now trust the output of a compiler? Readability will be at a premium as long as humans need to read code. If we have to read the output from one of Copilot’s descendants to judge whether or not it will work, or if we have to debug that output because it mostly works, but fails in some cases, then we will need it to generate code that’s readable. Not that humans currently do a good job of writing readable code; but we all know how painful it is to debug code that isn’t readable, and we all have some concept of what “readability” means.

Second: Copilot was trained on the body of code in GitHub. At this point, it is all (or almost all) written by humans. Some of it is good, high quality, readable code; a lot of it isn’t. What if Copilot became so successful that Copilot-generated code came to constitute a significant percentage of the code on GitHub? The model will certainly need to be re-trained from time to time. So now, we have a feedback loop: Copilot trained on code that has been (at least partially) generated by Copilot. Does code quality improve? Or does it degrade? And again, do we care, and why?

This question can be argued either way. People working on automated tagging for AI seem to be taking the position that iterative tagging leads to better results: i.e., after a tagging pass, use a human-in-the-loop to check some of the tags, correct them where wrong, and then use this additional input in another training pass. Repeat as needed. That’s not all that different from current (non-automated) programming: write, compile, run, debug, as often as needed to get something that works. The feedback loop enables you to write good code.

A human-in-the-loop approach to training an AI code generator is one possible way of getting “good code” (for whatever “good” means)—though it’s only a partial solution. Issues like indentation style, meaningful variable names, and the like are only a start. Evaluating whether a body of code is structured into coherent modules, has well-designed APIs, and could easily be understood by maintainers is a more difficult problem. Humans can evaluate code with these qualities in mind, but it takes time. A human-in-the-loop might help to train AI systems to design good APIs, but at some point, the “human” part of the loop will start to dominate the rest.

If you look at this problem from the standpoint of evolution, you see something different. If you breed plants or animals (a highly selected form of evolution) for one desired quality, you will almost certainly see all the other qualities degrade: you’ll get large dogs with hips that don’t work, or dogs with flat faces that can’t breathe properly.

What direction will automatically generated code take? We don’t know. Our guess is that, without ways to measure “code quality” rigorously, code quality will probably degrade. Ever since Peter Drucker, management consultants have liked to say, “If you can’t measure it, you can’t improve it.” And we suspect that applies to code generation, too: aspects of the code that can be measured will improve, aspects that can’t won’t.  Or, as the accounting historian H. Thomas Johnson said, “Perhaps what you measure is what you get. More likely, what you measure is all you’ll get. What you don’t (or can’t) measure is lost.”

We can write tools to measure some superficial aspects of code quality, like obeying stylistic conventions. We already have tools that can “fix” fairly superficial quality problems like indentation. But again, that superficial approach doesn’t touch the more difficult parts of the problem. If we had an algorithm that could score readability, and restrict Copilot’s training set to code that scores in the 90th percentile, we would certainly see output that looks better than most human code. Even with such an algorithm, though, it’s still unclear whether that algorithm could determine whether variables and functions had appropriate names, let alone whether a large project was well-structured.

And a third time: do we care? If we have a rigorous way to express what we want a program to do, we may never need to look at the underlying C or C++. At some point, one of Copilot’s descendants may not need to generate code in a “high level language” at all: perhaps it will generate machine code for your target machine directly. And perhaps that target machine will be WebAssembly, the JVM, or something else that’s very highly portable.

Do we care whether tools like Copilot write good code? We will, until we don’t. Readability will be important as long as humans have a part to play in the debugging loop. The important question probably isn’t “do we care”; it’s “when will we stop caring?” When we can trust the output of a code model, we’ll see a rapid phase change.  We’ll care less about the code, and more about describing the task (and appropriate tests for that task) correctly.

Categories: Technology

Radar trends to watch: October 2021

O'Reilly Radar - Tue, 2021/10/05 - 04:42

The unwilling star of this month’s trends is clearly Facebook. Between reports that they knew about the damage that their applications were causing long before that damage hit the news, their continued denials and apologies, and their attempts to block researchers from studying the consequences of their products, they’ve been in the news almost every day. Perhaps the most interesting item, though, is the introduction of Ray-Ban Stories, a pair of sunglasses with a camera built in. We’ve been talking about virtual and augmented reality for years; when will it enter the mainstream? Will Stories be enough to make it cool, or will it have the same fate as Google Glass?

  • Researchers at Samsung and Harvard are proposing to copy the neuronal interconnections of parts of the brain, and “paste” them onto a semiconductor array, creating an integrated circuit that directly models the brain’s interconnections.
  • Using AI to understand “lost” languages, written languages that we don’t know how to translate, isn’t just about NLP; it sometimes requires deciphering damaged texts (such as eroded stone tablets) where humans can no longer recognize the written characters.
  • Inaccurate face recognition is preventing people from getting necessary government aid, and there are few (if any) systems for remediation.
  • DeepMind has been studying techniques for making the output of language generation models like GPT-3 less toxic, and found that there are no good solutions.
  • Apple is working on iPhone features to detect depression, cognitive decline, and autism.  A phone that plays psychiatrist is almost certainly a bad idea. How intrusive do you want your phone to be?
  • Reservoir computing is a neural network technique that has been used to solve computationally difficult problems in dynamic systems. It is very resource intensive, but recent work has led to speedups by factors of up to a million. It may be the next step forward in AI.
  • Can AI be used to forecast (and even plan) the future of scientific research?  Not yet, but one group is working on analyzing the past 10 years of research for NASA’s Decadal Survey.
  • There have been many articles about using AI to read X-Rays. This one covers an experiment that uses training data from multiple sources to reduce one of the problems plaguing this technology: different X-ray machines, different calibration, different staff. It also places a human radiologist in the loop; the AI is only used to detect areas of possible abnormality.
  • It isn’t a surprise, but undergraduates who are studying data science receive little training in ethics, including issues like privacy and systemic bias.
  • Stanford’s Institute for Human-Centered Artificial Intelligence is creating a group to study the impact of “foundational” models on issues like bias and fairness. Foundational models are very large models like GPT-3 on which other models are built. Problems with foundational models are easily inherited by models on top of them.
  • Can machine learning learn to unlearn?  That may be required by laws like GDPR and the European “right to be forgotten.” Can a model be trained to eliminate the influence of some of its training data, without being retrained from the beginning?
  • DeepMind’s technology for up-scaling image resolution looks really good. It produces excellent high-resolution images from pixelated originals, works on natural scenes as well as portraits, and they appear to have used a good number of Black people as models.
  • Amazon has announced details about Astro, its home robot. But questions remain: is this a toy? A data collection ploy? I don’t know that we need something that follows you around playing podcasts. It integrates with Amazon products like Ring and Alexa Guard.
  • Is self-healing cybersecurity possible by killing affected containers and starting new ones? That’s an interesting partial solution to cloud security, though it only comes into play after an attack has succeeded.
  • With three months to go in 2021, we’ve already seen a record number of zero-day exploits. Is this a crisis? Or is it good news, because bad actors are discovered more effectively? One thing is clear: discovering new 0days is becoming more difficult, making them more valuable.
  • The FBI had the decryption key for the Kaseya ransomware attack, but delayed sharing it with victims for three weeks. The FBI claims it withheld the key because it was planning a counterattack against the REvil group, which disappeared before the attack was executed.
  • Privacy for the masses? iOS 15 has a beta “private relay” feature that appears to be something like Tor. And Nahoft, an application for use in Iran, encodes private messages as sequences of innocuous words that can get by automated censors.
  • HIPv2 is an alternative to TLS that is designed for implementing zero-trust security for embedded devices.
  • Kubescape is an open source tool to test whether Kubernetes has been deployed securely.  The tests are based on the NSA’s guidance for hardening Kubernetes.
  • Rootkits are hardly new, but now they’re being used to attack containers. Their goal is usually to mine bitcoin, and to hide that mining from monitoring tools. Tracee is a new tool, built with eBPF, that may help to detect successful attacks.
User Interfaces
  • Kids these days don’t understand files and directories. Seriously, Google’s dominance in everyday life means that users expect to find things through search. But search is often inadequate. It will be important for software designers to think through these issues.
  • Holograms you can touch? Aerohaptics uses jets of air to create the feeling of “touch” when interacting with a hologram. Another step towards the Star Trek Holodeck.
  • Fraunhofer has developed a system for detecting whether a driver is tired or asleep.  Software like this will be particularly important for semi-automated driving systems, which require support from a human driver.
  • What is property-based testing, anyway? Fuzzing? Unit tests at scale? Greater testing discipline will be required if we expect AI systems to generate code. Can property-based testing get there?
  • Google Cloud has introduced Supply Chain Twin, a “digital twin” service for supply chain management.
  • Open VSCodeServer is an open source project that allows VSCode to run on a remote machine and be accessed through a web browser.
  • Ent is an open source object-relational mapping tool for Go that uses graph concepts to model the database schema. Facebook has contributed Ent to the CNCF.
  • Glean is an open source search engine for source code.  Looks like it’s a LOT better than grepping through your src directories.
  • Urbit looks like it could be an interesting operating system for decentralized peer-to-peer applications.
  • Facebook on regulation: Please require competitors to do the things we do. And don’t look at targeted advertising.
  • NFTs, generative art, and open source: do we need a new kind of license to protect artistic works that are generated by software?
  • China issues a Request for Comments on their proposed social media regulations. Google Translate’s translation isn’t bad, and CNBC has a good summary. Users must be notified about the use of algorithmic recommendations; users must be able to disable recommendations; and algorithmic recommendations must not be designed to create addictive behavior.
  • South Korea has passed a law that will force Apple and Google to open their devices to other app stores.
  • Research by Google shows that, worldwide, government-ordered Internet shutdowns have become much more common in the past year. These shutdowns are usually to suppress dissent. India has shut down Internet access more than any other country.
  • George Church’s startup Colossal has received venture funding for developing “cold tolerant Asian elephants” (as Church puts it), a project more commonly known as de-extincting woolly mammoths.
  • Researchers at NYU have created artificial cell-like objects that can ingest, process, and expel objects. These aren’t artificial cells, but represent a step towards creating them.
  • A breakthrough in building phase change memory that consumes little power may make phase change memory practical, allowing tighter integration between processors and storage.
  • Mainframes aren’t dead. The Telum is IBM’s new processor for its System Z mainframes. 7nm technology, 5 GHz base clock speed, 8 cores, 16 threads; it’s a very impressive chip.
  • One of Google’s X companies has deployed a 20 Gbps Internet trunk using lasers. The connection crosses the Congo River, a path that is difficult because of the river’s depth and speed.  This technology could be used in other places where running fiber is difficult.
  • Facebook and Ray-Ban have released smart glasses (branded as Ray-Ban Stories), which are eyeglasses with a built-in camera and speakers. This is not AR (there is no projector), but a step on the way. Xiaomi also appears to be working on smart glasses, and Linux is getting into the act with a work-oriented headset called Simula One.
Quantum Computing
  • IBM introduces Qiskit Nature, a platform for using quantum computers to experiment with quantum effects in the natural sciences. Because these experiments are about the behavior of quantum systems, they (probably) don’t require the error correction that’s necessary to make quantum computing viable.
  • Want to build your own quantum computer?  IBM has open sourced Qiskit Metal, a design automation tool for superconducting quantum computers.
  • Curiously-named Valleytronics uses electrons’ “valley pseudospin” to store quantum data. It might enable small, room-temperature quantum computers.
Social Media
  • Facebook has put “Instagram for Kids” on hold. While they dispute the evidence that Instagram harms teenagers, public outcry and legislative pressure, along with Facebook’s own evidence that Instagram is particularly damaging to teenage girls, have caused them to delay the release.
  • Twitter is allowing bot accounts to identify themselves as bots.  Labeling isn’t mandatory.
  • Facebook adds junk content to its HTML to prevent researchers from using automated tools to collect posts.
Categories: Technology

Ethical Social Media: Oxymoron or Attainable Goal?

O'Reilly Radar - Tue, 2021/09/21 - 04:55

Humans have wrestled with ethics for millennia. Each generation spawns a fresh batch of ethical dilemmas and then wonders how to deal with them.

For this generation, social media has generated a vast set of new ethical challenges, which is unsurprising when you consider the degree of its influence. Social media has been linked to health risks in individuals and political violence in societies. Despite growing awareness of its potential for causing harm, social media has received what amounts to a free pass on unethical behavior.

Minerva Tantoco, who served as New York City’s first chief technology officer, suggests that “technology exceptionalism” is the root cause. Unlike the rapacious robber barons of the Gilded Age, today’s tech moguls were viewed initially as eccentric geeks who enjoyed inventing cool new products. Social media was perceived as a harmless timewaster, rather than as a carefully designed tool for relentless commerce and psychological manipulation.

“The idea of treating social media differently came about because the individuals who started it weren’t from traditional media companies,” Tantoco says. “Over time, however, the distinction between social media and traditional media has blurred, and perhaps the time has come for social media to be subject to the same rules and codes that apply to broadcasters, news outlets and advertisers. Which means that social media would be held accountable for content that causes harm or violates existing laws.”

Ethical standards that were developed for print, radio, television, and telecommunications during the 20th century could be applied to social media. “We would start with existing norms and codes for media generally and test whether these existing frameworks and laws would apply to social media,” Tantoco says.

Taking existing norms and applying them, with modifications, to novel situations is a time-honored practice.  “When e-commerce web sites first started, it was unclear if state sales taxes would apply to purchases,” Tantoco says. “It turned out that online sales were not exempt from sales taxes and that rules that had been developed for mail-order sites decades earlier could be fairly applied to e-commerce.”

Learning from AI

Christine Chambers Goodman, a professor at Pepperdine University’s Caruso School of Law, has written extensively on the topic of artificial intelligence and its impact on society. She sees potential in applying AI guidelines to social media, and she cited the European Commission’s High-Level Expert Group on Artificial Intelligence’s seven key ethical requirements for trustworthy AI:1

  • Human agency and oversight
  • Technical robustness and safety
  • Privacy and data governance
  • Transparency
  • Diversity, non-discrimination and fairness
  • Societal and environmental well-being
  • Accountability

The commission’s proposed requirements for AI would be a good starting point for conversations about ethical social media. Ideally, basic ethical components would be designed into social media platforms before they are built. Software engineers should be trained to recognize their own biases and learn specific techniques for writing code that is inherently fair and non-discriminatory.

“It starts with that first requirement of human agency and oversight,” Goodman says. If ethical standards are “paramount” during the design phase of a platform, “then I see some room for optimism.”

Colleges and universities also can play important roles in training a new generation of ethical software engineers by requiring students to take classes in ethics, she says.

Economic Fairness and Equity

Social media companies are private business entities, even when they are publicly held. But the social media phenomenon has become so thoroughly woven into the fabric of our daily lives that many people now regard it as a public utility such as gas, electricity, and water. In a remarkably brief span of time, social media has become an institution, and generally speaking, we expect our institutions to behave fairly and equitably.  Clearly, however, the social media giants see no reason to share the economic benefits of their success with anyone except their shareholders.

“The large social media companies make hundreds of billions of dollars from advertising revenue and share almost none of it with their users,” says Greg Fell, CEO of Display Social, a platform that shares up to 50 percent of its advertising revenue with content creators who post on its site.

Historically, content creators have been paid for their work. Imagine if CBS had told Lucille Ball and Desi Arnaz that they wouldn’t be paid for creating episodes of “I Love Lucy,” but that instead they would be allowed to sell “I Love Lucy” coffee mugs and T-shirts. If the original TV networks had operated like social media corporations, there never would have been a Golden Age of Television.

Most societies reward creators, artists, entertainers, athletes, and influencers for their contributions. Why does social media get to play by a different set of rules?

“Economic fairness should be part of the social media ethos. People should be rewarded financially for posting on social media, instead of being exploited by business models that are unfair and unethical,” Fell says.

From Fell’s perspective, the exploitive and unfair economic practices of the large social media companies represent short-term thinking. “Ultimately, they will burn out their audiences and implode. Meantime, they are causing harm. That’s the problem with unethical behavior—in the long run, it’s self-destructive and self-defeating.”

Transforming Attention into Revenue

Virtually all of the large social media platforms rely on some form of advertising to generate revenue. Their business models are exceedingly simple: they attract the attention of users and then sell the attention to advertisers. In crude terms, they’re selling your eyeballs to the highest bidder.

As a result, their only real interest is attracting attention. The more attention they attract, the more money they make. Their algorithms are brilliantly designed to catch and hold your attention by serving up content that will trigger dopamine rushes in your brain. Dopamine isn’t a cause of addiction, but it plays a role in addictive behaviors. So, is it fair to say that social media is intentionally addictive? Maybe.

“For many social media companies, addictive behavior (as in people consuming more than they intend to and regretting it afterwards) is the point,” says Esther Dyson, an author, philanthropist, and investor focused on health, open government, digital technology, biotechnology, and aerospace. “Cigarettes, drugs, and gambling are all premised on the model that too much is never enough.  And from the point of view of many investors, sustainable profits are not enough.  They want exits. Indeed, the goal of these investors is creating ever-growing legions of addicts. That starts with generating and keeping attention.”

Monetizing Misinformation

As it happens, misinformation is highly attractive to many users. It’s a digital version of potato chips—you can’t eat just one. The algorithms figure this out quickly, and feed users a steady supply of misinformation to hold their attention.

In an advertising-driven business model, attention equals dollars. With the help of machine learning and sophisticated algorithms, social media has effectively monetized misinformation, creating a vicious, addictive cycle that seems increasingly difficult to stop.

Social media has staked its fortunes to a business model that is deeply unethical and seems destined to fail in the long term. But could the industry survive, at least in the short term, with a business model that hews more closely to ethical norms?

Greg Fell doesn’t believe that ethical guidelines will slow the industry’s growth or reduce its profitability. “People expect fairness. They want to be treated as human beings, not as products,” he says. “You can build fairness into a platform if you make it part of your goal from the start. But it shouldn’t be an afterthought.”

Slowing the Spread of False Narratives

In addition to implementing structural design elements that would make it easier for people to recognize misinformation and false narratives, social media companies could partner with the public sector to promote media literacy. Renée DiResta is the technical research manager at the Stanford Internet Observatory, a cross-disciplinary program of research, teaching, and policy engagement for the study of abuse in current information technologies. She investigates the spread of narratives across social and traditional media networks.

“I think we need better ways for teaching people to distinguish between rhetoric and reality,” DiResta says, noting that tropes such as “dead people are voting” are commonly repeated and reused from one election cycle to the next, even when they are provably false. These kinds of tropes are the “building blocks” of misinformation campaigns designed to undermine confidence in elections, she says.

“If we can help people recognize the elements of false narratives, maybe they will build up an immunity to them,” DiResta says.

It’s Not Too Late to Stop the Train

The phenomenon we recognize today as “social media” only began taking shape in the late 1990s and early 2000s. It is barely two decades old, which makes it far too young to have developed iron-clad traditions. It is an immature field by any measure, and it’s not too late to alter its course.

Moreover, social media’s business model is not terribly complicated, and it’s easy to envision a variety of other models that might be equally or even more profitable, and represent far less of a threat to society. Newer platforms such as Substack, Patreon, OnlyFans, Buy Me a Coffee, and Display Social are opening the door to a creator-centric social media industry that isn’t fueled primarily by advertising dollars.

“Social media has its positives, and it isn’t all doom and gloom, but it certainly isn’t perfect and resolving some of these issues could ensure these applications are the fun and happy escape they need to be,” says Ella Chambers, UX designer and creator of the UK-based Ethical Social Media Project. “The majority of social media is okay.”

That said, some of the problems created by social media are far from trivial. “My research led me to conclude that the rise of social media has brought the downfall of many users’ mental health,” Chambers says. A recent series of investigative articles in the Wall Street Journal casts a harsh spotlight on the mental health risks of social media, especially to teenage girls. Facebook has issued a rebuttal3 to the WSJ, but it is unlikely to persuade critics that social media is some kind of wonderful playground for kids and teens.

Creating a practical framework of ethical guidelines would be a positive step forward. Ideally, the framework would evolve into a set of common practices and processes for ensuring fairness, diversity, inclusion, equity, safety, accuracy, accountability, and transparency in social media.

Chinese officials recently unveiled a comprehensive draft of proposed rules governing the use of recommendation algorithms in China.2 One of the proposed regulations would require algorithm providers to “respect social ethics and ethics, abide by business ethics and professional ethics, and follow the principles of fairness, openness, transparency, scientific rationality, and honesty.”

Another proposed regulation would provide users with “convenient options to turn off algorithm recommendation services” and enable users to select, modify or delete user tags. And another proposed rule would restrict service providers from using algorithms “to falsely register accounts … manipulate user accounts, or falsely like, comment, forward, or navigate through web pages to implement traffic fraud or traffic hijacking …”

Eloy Sasot, group chief data and analytics officer at Richemont, the Switzerland-based luxury goods holding company, agrees that regulations are necessary. “And the regulations also should be managed with extreme care. When you add rules to an already complex system, there can be unintended consequences, both at the AI-solution level and the macro-economic level,” he says.

For instance, small companies, which have limited resources, may be less able to counter negative business impacts created by regulations targeting large companies. “So, in effect, regulations, if not carefully supervised, might result in a landscape that is less competitive and more monopolistic, with unintended consequences for end consumers whom the regulations were designed to protect,” he explains.

Technology Problem, or a People Problem?

Casey Fiesler is an assistant professor in the Department of Information Science at the University of Colorado Boulder. She researches and teaches in the areas of technology ethics, internet law and policy, and online communities.

“I do not think that social media—or more broadly, online communities—are inherently harmful,” says Fiesler. “In fact, online communities have also done incredible good, especially in terms of social support and activism.”

But the harm caused by unfettered use of social media “often impacts marginalized and vulnerable users disproportionately,” she notes. Ethical social media platforms would consider those effects and work proactively to reduce or eliminate hate speech, trolling, defamation, cyber bullying, swatting, doxing, impersonation, and the intentional spread of false narratives.

“I consider myself an optimist who thinks that it is very important to think like a pessimist. And we should critique technology like social media because it has so much potential for good, and if we want to see those benefits, then we need to push for it to be better,” Fiesler says.

Ultimately, the future of ethical social media may depend more on the behaviors of people than on advances in technology.

“It’s not the medium that’s unethical—it’s the business people controlling it,” Dyson observes. “Talking about social media ethics is like talking about telephone ethics. It really depends on the people involved, not the platform.”

From Dyson’s point of view, the quest for ethical social media represents a fundamental challenge for society. “Are parents teaching their children to behave ethically? Are parents serving as role models for ethical behavior? We talk a lot about training AI, but are we training our children to think long-term, or just to seek short-term relief? Addiction is not about pleasure; it’s about relief from discomfort, from anxiety, from uncertainty, from a sense that we have no future,” she adds. “I personally think we’re just being blind to the consequences of short-term thinking. Silicon Valley is addicted to profits and exponential growth. But we need to start thinking about what we’re creating for the long term.”


Categories: Technology

2021 Data/AI Salary Survey

O'Reilly Radar - Wed, 2021/09/15 - 04:32

In June 2021, we asked the recipients of our Data & AI Newsletter to respond to a survey about compensation. The results gave us insight into what our subscribers are paid, where they’re located, what industries they work for, what their concerns are, and what sorts of career development opportunities they’re pursuing.

While it’s sadly premature to say that the survey took place at the end of the COVID-19 pandemic (though we can all hope), it took place at a time when restrictions were loosening: we were starting to go out in public, have parties, and in some cases even attend in-person conferences. The results then provide a place to start thinking about what effect the pandemic had on employment. There was a lot of uncertainty about stability, particularly at smaller companies: Would the company’s business model continue to be effective? Would your job still be there in a year? At the same time, employees were reluctant to look for new jobs, especially if they would require relocating—at least according to the rumor mill. Were those concerns reflected in new patterns for employment?

Executive Summary
  • The average salary for data and AI professionals who responded to the survey was $146,000.
  • The average change in compensation over the last three years was $9,252. This corresponds to an annual increase of 2.25%. However, 8% of the respondents reported decreased compensation, and 18% reported no change.
  • We don’t see evidence of a “great resignation.” 22% of respondents said they intended to change jobs, roughly what we would have expected. Respondents seemed concerned about job security, probably because of the pandemic’s effect on the economy.
  • Average compensation was highest in California ($176,000), followed by Eastern Seaboard states like New York and Massachusetts.
  • Compensation for women was significantly lower than for men: women’s average salary was 84% of men’s. Salaries were lower regardless of education or job title. Women were more likely than men to have advanced degrees, particularly PhDs.
  • Many respondents acquired certifications. Cloud certifications, specifically in AWS and Microsoft Azure, were most strongly associated with salary increases.
  • Most respondents participated in training of some form. Learning new skills and improving old ones were the most common reasons for training, though hireability and job security were also factors. Company-provided training opportunities were most strongly associated with pay increases.

The survey was publicized through O’Reilly’s Data & AI Newsletter and was limited to respondents in the United States and the United Kingdom. There were 3,136 valid responses, 2,778 from the US and 284 from the UK. This report focuses on the respondents from the US, with only limited attention paid to those from the UK. A small number of respondents (74) identified as residents of the US or UK, but their IP addresses indicated that they were located elsewhere. We didn’t use the data from these respondents; in practice, discarding this data had no effect on the results.

Of the 2,778 US respondents, 2,225 (81%) identified as men, and 383 (14%) identified as women (as identified by their preferred pronouns). 113 (4%) identified as “other,” and 14 (0.5%) used “they.”

The results are biased by the survey’s recipients (subscribers to O’Reilly’s Data & AI Newsletter). Our audience is particularly strong in the software (20% of respondents), computer hardware (4%), and computer security (2%) industries—over 25% of the total. Our audience is also strong in the states where these industries are concentrated: 42% of the US respondents lived in California (20%), New York (9%), Massachusetts (6%), and Texas (7%), though these states only make up 27% of the US population.

Compensation Basics

The average annual salary for employees who worked in data or AI was $146,000. Most salaries were between $100,000 and $150,000 yearly (34%); the next most common salary tier was from $150,000 to $200,000 (26%). Compensation depended strongly on location, with average salaries highest in California ($176,000).

The average salary change over the past three years was $9,252, which is 2.25% per year (assuming a final salary equal to the average). A small number of respondents (8%) reported salary decreases, and 18% reported no change. Economic uncertainty caused by the pandemic may be responsible for the declines in compensation. 19% reported increases of $5,000 to $10,000 over that period; 14% reported increases of over $25,000. A study by the IEEE suggests that the average salary for technical employees increased 3.6% per year, higher than our respondents indicated.
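As a rough check on the 2.25% figure, the sketch below redoes the arithmetic. It assumes, as the text does, that the final salary equals the $146,000 average, and it further assumes that the rate is a simple (non-compounded) average measured against the starting salary. The function name is ours and is purely illustrative; the report does not say how its figure was computed.

```python
def annualized_raise(final_salary, total_change, years):
    """Return (simple, compound) average annual raise as fractions."""
    # Assumption: the starting salary is the final salary minus the change.
    start_salary = final_salary - total_change
    # Simple rate: total percentage change divided evenly across the years.
    simple = (total_change / start_salary) / years
    # Compound rate: the constant yearly growth that yields the same total.
    compound = (final_salary / start_salary) ** (1 / years) - 1
    return simple, compound


simple, compound = annualized_raise(146_000, 9_252, 3)
print(f"simple: {simple:.2%}, compound: {compound:.2%}")
```

The simple rate lands very close to the 2.25% quoted above; the compound rate is slightly lower, as it always is for a positive change. Either way, the result sits well below the 3.6% per year reported by the IEEE study.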

39% of respondents reported promotions in the past three years, and 37% reported changing employers during that period. 22% reported that they were considering changing jobs because their salaries hadn’t increased during the past year. Is this a sign of what some have called a “great resignation”? Common wisdom has it that technical employees change jobs every three to four years. LinkedIn and Indeed both recommend staying for at least three years, though they observe that younger employees change jobs more often. LinkedIn elsewhere states that the annual turnover rate for technology employees is 13.2%—which suggests that employees stay at their jobs for roughly seven and a half years. If that’s correct, the 37% that changed jobs over three years seems about right, and the 22% who said they “intend to leave their job due to a lack of compensation increase” doesn’t seem overly high. Keep in mind that intent to change and actual change are not the same—and that there are many reasons to change jobs aside from salary, including flexibility around working hours and working from home.
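The turnover reasoning above can be made explicit. The sketch below is a back-of-the-envelope model, not the method LinkedIn or this survey used: it assumes that every employee has the same 13.2% chance of leaving in any given year, independent of earlier years. The function names are hypothetical.

```python
def implied_tenure_years(annual_turnover):
    """Average years in a job if a fixed share of employees leaves each year."""
    return 1 / annual_turnover


def share_changing_jobs(annual_turnover, years):
    """Share of employees who leave at least once within the given period."""
    # Assumption: each year is an independent draw with the same probability.
    return 1 - (1 - annual_turnover) ** years


turnover = 0.132  # LinkedIn's annual turnover rate, quoted above
print(f"implied tenure: {implied_tenure_years(turnover):.1f} years")
print(f"changed jobs within 3 years: {share_changing_jobs(turnover, 3):.1%}")
```

Under these assumptions the implied tenure is about seven and a half years, and roughly 35% of employees would change jobs within three years, which is in the same range as the 37% our respondents reported.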

64% of the respondents took part in training or obtained certifications in the past year, and 31% reported spending over 100 hours in training programs, ranging from formal graduate degrees to reading blog posts. As we’ll see later, cloud certifications (specifically in AWS and Microsoft Azure) were the most popular and appeared to have the largest effect on salaries.

The reasons respondents gave for participating in training were surprisingly consistent. The vast majority reported that they wanted to learn new skills (91%) or improve existing skills (84%). Data and AI professionals are clearly interested in learning—and that learning is self-motivated, not imposed by management. Relatively few (22%) said that training was required by their job, and even fewer participated in training because they were concerned about losing their job (9%).

However, there were other motives at work. 56% of our respondents said that they wanted to increase their “job security,” which is at odds with the low number who were concerned about losing their job. And 73% reported that they engaged in training or obtained certifications to increase their “hireability,” which may suggest more concern about job stability than our respondents would admit. The pandemic was a threat to many businesses, and employees were justifiably concerned that their job could vanish after a bad pandemic-influenced quarter. A desire for increased hireability may also indicate that we’ll see more people looking to change jobs in the near future.

Finally, 61% of the respondents said that they participated in training or earned certifications because they wanted a salary increase or a promotion (“increase in job title/responsibilities”). It isn’t surprising that employees see training as a route to promotion—especially as companies that want to hire in fields like data science, machine learning, and AI contend with a shortage of qualified employees. Given the difficulty of hiring expertise from outside, we expect an increasing number of companies to grow their own ML and AI talent internally using training programs.

Salaries by Gender

To nobody’s surprise, our survey showed that data science and AI professionals are mostly male. The number of respondents tells the story by itself: only 14% identified as women, which is lower than we’d have guessed, though it’s roughly consistent with our conference attendance (back when we had live conferences) and roughly equivalent to other technical fields. A small number (5%) reported their preferred pronoun as “they” or Other, but this sample was too small to draw any significant comparisons about compensation.

Women’s salaries were sharply lower than men’s salaries, averaging $126,000 annually, or 84% of the average salary for men ($150,000). That differential held regardless of education, as Figure 1 shows: the average salary for a woman with a doctorate or master’s degree was 82% of the salary for a man with an equivalent degree. The difference wasn’t quite as high for people with bachelor’s degrees or who were still students, but it was still significant: women with bachelor’s degrees or who were students earned 86% or 87% of the average salary for men. The difference in salaries was greatest between people who were self-taught: in that case, women’s salaries were 72% of men’s. An associate’s degree was the only degree for which women’s salaries were higher than men’s.

Figure 1. Women’s and men’s salaries by degree

Despite the salary differential, a higher percentage of women had advanced degrees than men: 16% of women had a doctorate, as opposed to 13% of men. And 47% of women had a master’s degree, as opposed to 46% of men. (If those percentages seem high, keep in mind that many professionals in data science and AI are escapees from academia.)

Women’s salaries also lagged men’s salaries when we compared women and men with similar job titles (see Figure 2). At the executive level, the average salary for women was $163,000 versus $205,000 for men (a 20% difference). At the director level, the difference was much smaller—$180,000 for women versus $184,000 for men—and women’s salaries were actually higher than those at the executive level. It’s easy to hypothesize about this difference, but we’re at a loss to explain it. For managers, women’s salaries were $143,000 versus $154,000 for men (a 7% difference).
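The percentage differences quoted above follow directly from the salary pairs. The short sketch below recomputes them, taking the gap as one minus the ratio of women’s to men’s average salary; that definition is our assumption, since the report does not spell it out, and the function name is illustrative.

```python
# Average salaries (women, men) by job title, as quoted in the text above.
salaries = {
    "executive": (163_000, 205_000),
    "director": (180_000, 184_000),
    "manager": (143_000, 154_000),
}


def pay_gap(women, men):
    """Shortfall of women's average salary relative to men's, as a fraction."""
    return 1 - women / men


for title, (women, men) in salaries.items():
    ratio = women / men
    print(f"{title}: women earn {ratio:.0%} of men's average "
          f"(gap {pay_gap(women, men):.0%})")
```

With this definition, the executive and manager gaps round to the 20% and 7% given above, and the director gap comes out at about 2%.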

Career advancement is also an issue: 18% of the women who participated in the survey were executives or directors, compared with 23% of the men.

Figure 2. Women’s and men’s salaries by job title

Before moving on from our consideration of the effect of gender on salary, let’s take a brief look at how salaries changed over the past three years. As Figure 3 shows, the percentage of men and women respondents who saw no change was virtually identical (18%). But more women than men saw their salaries decrease (10% versus 7%). Correspondingly, more men saw their salaries increase. Women were also more likely to have a smaller increase: 24% of women had an increase of under $5,000 versus 17% of men. At the high end of the salary spectrum, the difference between men and women was smaller, though still not zero: 19% of men saw their salaries increase by over $20,000, but only 18% of women did. So the most significant differences were in the midrange. One anomaly sticks out: a slightly higher percentage of women than men received salary increases in the $15,000 to $20,000 range (8% versus 6%).

Figure 3. Change in salary for women and men over three years

Salaries by Programming Language

When we looked at the most popular programming languages for data and AI practitioners, we didn’t see any surprises: Python was dominant (61%), followed by SQL (54%), JavaScript (32%), HTML (29%), Bash (29%), Java (24%), and R (20%). C++, C#, and C were further back in the list (12%, 12%, and 11%, respectively).

Discussing the connection between programming languages and salary is tricky because respondents were allowed to check multiple languages, and most did. But when we looked at the languages associated with the highest salaries, we got a significantly different list. The most widely used and popular languages, like Python ($150,000), SQL ($144,000), Java ($155,000), and JavaScript ($146,000), were solidly in the middle of the salary range. The outliers were Rust, which had the highest average salary (over $180,000), Go ($179,000), and Scala ($178,000). Other less common languages associated with high salaries were Erlang, Julia, Swift, and F#. Web languages (HTML, PHP, and CSS) were at the bottom (all around $135,000). See Figure 4 for the full list.

Figure 4. Salary vs. programming language

How do we explain this? It’s difficult to say that data and AI developers who use Rust command a higher salary, since most respondents checked several languages. But we believe that this data shows something significant. The supply of talent for newer languages like Rust and Go is relatively small. While there may not be a huge demand for data scientists who use these languages (yet), there’s clearly some demand—and with experienced Go and Rust programmers in short supply, they command a higher salary. Perhaps it is even simpler: regardless of the language someone will use at work, employers interpret knowledge of Rust and Go as a sign of competence and willingness to learn, which increases candidates’ value. A similar argument can be made for Scala, which is the native language for the widely used Spark platform. Languages like Python and SQL are table stakes: an applicant who can’t use them could easily be penalized, but competence doesn’t confer any special distinction.
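To see why multiple selections make these averages hard to interpret, consider the sketch below. The respondents and salaries in it are invented for illustration and are not survey data; the function is a plausible way to compute a per-language average, not necessarily how this report did it.

```python
from collections import defaultdict

# Hypothetical respondents: (salary, languages checked). NOT survey data.
respondents = [
    (190_000, {"Python", "SQL", "Rust"}),
    (150_000, {"Python", "SQL"}),
    (120_000, {"SQL", "HTML"}),
    (170_000, {"Python", "Go", "Rust"}),
]


def average_salary_by_language(rows):
    """Average salary per language; a respondent counts once per language."""
    totals = defaultdict(lambda: [0, 0])  # language -> [salary sum, count]
    for salary, languages in rows:
        for language in languages:
            totals[language][0] += salary
            totals[language][1] += 1
    return {lang: total / count for lang, (total, count) in totals.items()}


averages = average_salary_by_language(respondents)
for language, average in sorted(averages.items(), key=lambda kv: -kv[1]):
    print(f"{language}: ${average:,.0f}")
```

In this made-up sample, every Rust user also uses Python, so the same salaries feed both averages. The Rust average is higher only because the lower-paid Python users did not check Rust, which says nothing about whether Rust itself is what employers are paying for.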

One surprise is that 10% of the respondents said that they didn’t use any programming languages. We’re not sure what that means. It’s possible they worked entirely in Excel, which should be considered a programming language but often isn’t. It’s also possible that they were managers or executives who no longer did any programming.

Salaries by Tool and Platform

We also asked respondents what tools they used for statistics and machine learning and what platforms they used for data analytics and data management. We observed some of the same patterns that we saw with programming languages. And the same caution applies: respondents were allowed to select multiple answers to our questions about the tools and platforms that they use. (However, multiple answers weren’t as frequent as for programming languages.) In addition, if you’re familiar with tools and platforms for machine learning and statistics, you know that the boundary between them is fuzzy. Is Spark a tool or a platform? We considered it a platform, though two Spark libraries are in the list of tools. What about Kafka? A platform, clearly, but a platform for building data pipelines that’s qualitatively different from a platform like Ray, Spark, or Hadoop.

Just as with programming languages, we found that the most widely used tools and platforms were associated with midrange salaries; older tools, even if they’re still widely used, were associated with lower salaries; and some of the tools and platforms with the fewest users corresponded to the highest salaries. (See Figure 5 for the full list.)

The most common responses to the question about tools for machine learning or statistics were “I don’t use any tools” (40%) or Excel (31%). Ignoring the question of how one does machine learning or statistics without tools, we’ll only note that those who didn’t use tools had an average salary of $143,000, and Excel users had an average salary of $138,000—both below average. Stata ($120,000) was also at the bottom of the list; it’s an older package with relatively few users and is clearly falling out of favor.

The popular machine learning packages PyTorch (19% of users, $166,000 average salary), TensorFlow (20%, $164,000), and scikit-learn (27%, $157,000) occupied the middle ground. Those salaries were above the average for all respondents, which was pulled down by the large numbers who didn’t use tools or only used Excel. The highest salaries were associated with H2O (3%, $183,000), KNIME (2%, $180,000), Spark NLP (5%, $179,000), and Spark MLlib (8%, $175,000). It’s hard to trust conclusions based on 2% or 3% of the respondents, but it appears that salaries are higher for people who work with tools that have a lot of “buzz” but aren’t yet widely used. Employers pay a premium for specialized expertise.

Figure 5. Average salary by tools for statistics or machine learning

We see almost exactly the same thing when we look at data frameworks (Figure 6). Again, the most common response was from people who didn’t use a framework; that group also received the lowest salaries (30% of users, $133,000 average salary).

In 2021, Hadoop often seems like legacy software, but 15% of the respondents were working on the Hadoop platform, with an average salary of $166,000. That was above the average salary for all users and at the low end of the midrange for salaries sorted by platform.

The highest salaries were associated with Clicktale (now ContentSquare), a cloud-based analytics system for researching customer experience: only 0.2% of respondents use it, but they have an average salary of $225,000. Other frameworks associated with high salaries were Tecton (the commercial version of Michelangelo, at $218,000), Ray ($191,000), and Amundsen ($189,000). These frameworks had relatively few users—the most widely used in this group was Amundsen with 0.8% of respondents (and again, we caution against reading too much into results based on so few respondents). All of these platforms are relatively new, frequently discussed in the tech press and social media, and appear to be growing healthily. Kafka, Spark, Google BigQuery, and Dask were in the middle, with a lot of users (15%, 19%, 8%, and 5%) and above-average salaries ($179,000, $172,000, $170,000, and $170,000). Again, the most popular platforms occupied the middle of the range; experience with less frequently used and growing platforms commanded a premium.

Figure 6. Average salary by data framework or platform

Salaries by Industry

The greatest number of respondents worked in the software industry (20% of the total), followed by consulting (11%) and healthcare, banking, and education (each at 8%). Relatively few respondents listed themselves as consultants (also 2%), though consultancy tends to be cyclic, depending on current thinking on outsourcing, tax law, and other factors. The average income for consultants was $150,000, which is only slightly higher than the average for all respondents ($146,000). That may indicate that we’re currently in some kind of an equilibrium between consultants and in-house talent.

While data analysis has become essential to every kind of business and AI is finding many applications outside of computing, salaries were highest in the computer industry itself, as Figure 7 makes clear. For our purposes, the “computer industry” was divided into four segments: computer hardware, cloud services and hosting, security, and software. Average salaries in these industries ranged from $171,000 (for computer hardware) to $164,000 (for software). Salaries for the advertising industry (including social media) were surprisingly low, only $150,000.

Figure 7. Average salary by industry

Education and nonprofit organizations (including trade associations) were at the bottom end of the scale, with compensation just above $100,000 ($106,000 and $103,000, respectively). Salaries for technical workers in government were slightly higher ($124,000).

Salaries by State

When looking at data and AI practitioners geographically, there weren’t any big surprises. The states with the most respondents were California, New York, Texas, and Massachusetts. California accounted for 19% of the total, with over double the number of respondents from New York (8%). To understand how these four states dominate, remember that they make up 42% of our respondents but only 27% of the United States’ population.

Salaries in California were the highest, averaging $176,000. The Eastern Seaboard did well, with an average salary of $157,000 in Massachusetts (second highest). New York, Delaware, New Jersey, Maryland, and Washington, DC, all reported average salaries in the neighborhood of $150,000 (as did North Dakota, with five respondents). The average salary reported for Texas was $148,000, which is slightly above the national average but nevertheless seems on the low side for a state with a significant technology industry.

Salaries in the Pacific Northwest were not as high as we expected. Washington just barely made it into the top 10 in terms of the number of respondents, and average salaries in Washington and Oregon were $138,000 and $133,000, respectively. (See Figure 8 for the full list.)

The highest-paying jobs, with salaries over $300,000, were concentrated in California (5% of the state’s respondents) and Massachusetts (4%). There were a few interesting outliers: North Dakota and Nevada both had very few respondents, but each had one respondent making over $300,000. In Nevada, we’re guessing that’s someone who works for the casino industry—after all, the origins of probability and statistics are tied to gambling. Most states had no respondents with compensation over $300,000.

Figure 8. Average salary by state

The lowest salaries were, for the most part, from states with the fewest respondents. We’re reluctant to say more than that. These states typically had under 10 respondents, which means that averaging salaries is extremely noisy. For example, Alaska only had two respondents and an average salary of $75,000; Mississippi and Louisiana each only had five respondents, and Rhode Island only had three. In any of these states, one or two additional respondents at the executive level would have a huge effect on the state’s average. Furthermore, the averages in those states are so low that all (or almost all) respondents must be students, interns, or in entry-level positions. So we don’t think we can make any statement stronger than “the high paying jobs are where you’d expect them to be.”
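To make the small-sample point concrete, here is a minimal sketch of how one extra respondent moves an average. The Alaska figures (two respondents, $75,000 average) come from the survey; the $300,000 newcomer and the 500-respondent comparison group are hypothetical values chosen only for illustration.

```python
def mean_after_adding(current_mean, current_n, new_salaries):
    """Return the group average after new respondents are added."""
    total = current_mean * current_n + sum(new_salaries)
    return total / (current_n + len(new_salaries))

# Survey figures for Alaska: two respondents averaging $75,000.
# The $300,000 respondent is a hypothetical executive-level addition.
small_state = mean_after_adding(75_000, 2, [300_000])

# Hypothetical large group of 500 respondents with the same starting average.
large_state = mean_after_adding(75_000, 500, [300_000])

print(f"2 respondents + 1 executive:   ${small_state:,.0f}")
print(f"500 respondents + 1 executive: ${large_state:,.0f}")
```

Under these assumptions the two-person average doubles, while the 500-person average barely moves, which is why averages from states with only a handful of respondents deserve caution.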

Job Change by Salary

Despite the differences between states, we found that the desire to change jobs based on lack of compensation didn’t depend significantly on geography. There were outliers at both extremes, but they were all in states where the number of respondents was small and one or two people looking to change jobs would make a significant difference. It’s not terribly interesting to say that 24% of respondents from California intend to change jobs (only 2% above the national average); after all, you’d expect California to dominate. There may be a small signal from states like New York, with 232 respondents, of whom 27% intend to change jobs, or from a state like Virginia, with 137 respondents, of whom only 19% were thinking of changing. But again, these numbers aren’t much different from the total percentage of possible job changers.
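One rough way to judge whether the New York and Virginia figures are more than noise is to compare each state's gap from the national rate with the standard error of a proportion. The sketch below is our own back-of-the-envelope check, not the survey's method; it assumes a simple binomial model and takes the national rate as 22%, as implied by the California comparison above.

```python
import math

def proportion_standard_error(p, n):
    """Standard error of a sample proportion under a binomial model."""
    return math.sqrt(p * (1 - p) / n)

# Assumed from the text: California's 24% is 2 points above the national rate.
NATIONAL_RATE = 0.22

# (share intending to change jobs, number of respondents), from the survey.
states = {"New York": (0.27, 232), "Virginia": (0.19, 137)}

gaps_in_se = {}
for name, (rate, n) in states.items():
    se = proportion_standard_error(rate, n)
    gaps_in_se[name] = (rate - NATIONAL_RATE) / se
    print(f"{name}: standard error {se:.3f}, "
          f"gap from national rate = {gaps_in_se[name]:+.1f} standard errors")
```

Both gaps come out under two standard errors, which is consistent with reading them as a small signal at most.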

If intent to change jobs due to compensation isn’t dependent on location, then what does it depend on? Salary. It’s not at all surprising that respondents with the lowest salaries (under $50,000/year) are highly motivated to change jobs (29%); this group is composed largely of students, interns, and others who are starting their careers. The group that showed the second highest desire to change jobs, however, had the highest salaries: over $400,000/year (27%). It’s an interesting pairing: those with the highest and lowest salaries were most intent on getting a salary increase.

26% of those with annual salaries between $50,000 and $100,000 indicated that they intend to change jobs because of compensation. For the remainder of the respondents (those with salaries between $100,000 and $400,000), the percentage who intend to change jobs was 22% or lower.

Salaries by Certification

Over a third of the respondents (37%) replied that they hadn’t obtained any certifications in the past year. The next biggest group replied “other” (14%), meaning that they had obtained certifications in the past year but not one of the certifications we listed. We allowed them to write in their own responses, and they shared 352 unique answers, ranging from vendor-specific certifications (e.g., DataRobot) to university degrees (e.g., University of Texas) to well-established certifications in any number of fields (e.g., Certified Information Systems Security Professional a.k.a. CISSP). While there were certainly cases where respondents used different words to describe the same thing, the amount of unique write-in responses reflects the great number of certifications available.

Cloud certifications were by far the most popular. The top certification was for AWS (3.9% obtained AWS Certified Solutions Architect-Associate), followed by Microsoft Azure (3.8% had AZ-900: Microsoft Azure Fundamentals), then two more AWS certifications and CompTIA’s Security+ certification (1% each). Keep in mind that 1% only represents 27 respondents, and all the other certifications had even fewer respondents.

As Figure 9 shows, the highest salaries were associated with AWS certifications, the Microsoft AZ-104 (Azure Administrator Associate) certification, and the CISSP security certification. The average salary for people listing these certifications was higher than the average salary for US respondents as a whole. And the average salary for respondents who wrote in a certification was slightly above the average for those who didn’t earn any certifications ($149,000 versus $143,000).

Figure 9. Average salary by certification earned

Certifications were also associated with salary increases (Figure 10). Again AWS and Microsoft Azure dominate, with Microsoft’s AZ-104 leading the way, followed by three AWS certifications. And on the whole, respondents with certifications appear to have received larger salary increases than those who didn’t earn any technical certifications.

Figure 10. Average salary change by certification

Google Cloud is an obvious omission from this story. While Google is the third-most-important cloud provider, only 26 respondents (roughly 1%) claimed any Google certification, all under the “Other” category.

Among our respondents, security certifications were relatively uncommon and didn’t appear to be associated with significantly higher salaries or salary increases. Cisco’s CCNP was associated with higher salary increases; respondents who earned the CompTIA Security+ or CISSP certifications received smaller increases. Does this reflect that management undervalues security training? If this hypothesis is correct, undervaluing security is clearly a significant mistake, given the ongoing importance of security and the possibility of new attacks against AI and other data-driven systems.

Cloud certifications clearly had the greatest effect on salary increases. With very few exceptions, any certification was better than no certification: respondents who wrote in a certification under “Other” averaged a $9,600 salary increase over the last few years, as opposed to $8,900 for respondents who didn’t obtain a certification and $9,300 for all respondents regardless of certification.


Participating in training resulted in salary increases—but only for those who spent more than 100 hours in a training program. As Figure 11 shows, those respondents had an average salary increase of $11,000. This was also the largest group of respondents (19%). Respondents who only reported undertaking 1–19 hours of training (8%) saw lower salary increases, with an average of $7,100. It’s interesting that those who participated in 1–19 hours of training saw smaller increases than those who didn’t participate in training at all. It doesn’t make sense to speculate about this difference, but the data does make one thing clear: if you engage in training, be serious about it.

Figure 11. Average salary change vs. hours of training

We also asked what types of training respondents engaged in: whether it was company provided (for which there were three alternatives), a certification program, a conference, or some other kind of training (detailed in Figure 12). Respondents who took advantage of company-provided opportunities had the highest average salaries ($156,000, $150,000, and $149,000). Those who obtained certifications were next ($148,000). The results are similar if we look at salary increases over the past three years: Those who participated in various forms of company-offered training received increases between $11,000 and $10,000. Salary increases for respondents who obtained a certification were in the same range ($11,000).

Figure 12. Average salary change vs. type of training

The Last Word

Data and AI professionals—a rubric under which we include data scientists, data engineers, and specialists in AI and ML—are well-paid, reporting an average salary just under $150,000. However, there were sharp state-by-state differences: salaries were significantly higher in California, though the Northeast (with some exceptions) did well.

There were also significant differences between salaries for men and women. Men’s salaries were higher regardless of job title, regardless of training and regardless of academic degrees—even though women were more likely to have an advanced academic degree (PhD or master’s degree) than were men.

We don’t see evidence of a “great resignation.” Job turnover through the pandemic was roughly what we’d expect (perhaps slightly below normal). Respondents did appear to be concerned about job security, though they didn’t want to admit it explicitly. But with the exception of the least- and most-highly compensated respondents, the intent to change jobs because of salary was surprisingly consistent and nothing to be alarmed at.

Training was important, in part because it was associated with hireability and job security but more because respondents were genuinely interested in learning new skills and improving current ones. Cloud training, particularly in AWS and Microsoft Azure, was the most strongly associated with higher salary increases.

But perhaps we should leave the last word to our respondents. The final question in our survey asked what areas of technology would have the biggest effect on salary and promotions in the coming year. It wasn’t a surprise that most of the respondents said machine learning (63%)—these days, ML is the hottest topic in the data world. It was more of a surprise that “programming languages” was noted by just 34% of respondents. (Only “Other” received fewer responses—see Figure 13 for full details.) Our respondents clearly aren’t impressed by programming languages, even though the data suggests that employers are willing to pay a premium for Rust, Go, and Scala.

There’s another signal worth paying attention to if we look beyond the extremes. Data tools, cloud and containers, and automation were nearly tied (46, 47, and 44%). The cloud and containers category includes tools like Docker and Kubernetes, cloud providers like AWS and Microsoft Azure, and disciplines like MLOps. The tools category includes tools for building and maintaining data pipelines, like Kafka. “Automation” can mean a lot of things but in this context probably means automated training and deployment.

Figure 13. What technologies will have the biggest effect on compensation in the coming year?

We’ve argued for some time that operations—successfully deploying and managing applications in production—is the biggest issue facing ML practitioners in the coming years. If you want to stay on top of what’s happening in data, and if you want to maximize your job security, hireability, and salary, don’t just learn how to build AI models; learn how to deploy applications that live in the cloud.

In the classic movie The Graduate, one character famously says, “There’s a great future in plastics. Think about it.” In 2021, and without being anywhere near as repulsive, we’d say, “There’s a great future in the cloud. Think about it.”

Categories: Technology

Topic for meeting on 9/9

PLUG - Wed, 2021/09/08 - 11:01

This month we'll have "Thank You" from der.hans.
Attend the meeting by visiting: on the 8th of July at 7pm MST

Thank you to all the developers, translators, documentors, artists and maintainers who make Free Software.
Thank you to all the project managers, mailing list admins, SREs and toolsmiths who enable the making.
Thanks to the people who use Free Software and do amazing things with it.
This talk will use stories and anecdotes to explore the thanks we can have for the Free Software community.

About der.hans:
der.hans works remotely in the US.
In the FLOSS community he is active with conferences and local groups.
He's chairman of PLUG, chair for SeaGL Finance committee, founder of SeaGL Career Expo, RaiseMe career counselor, BoF organizer for the Southern California Linux Expo (SCaLE) and founder of the Free Software Stammtisch.
He speaks regularly at community-led FLOSS conferences such as SeaGL, Penguicon, SCaLE, LCA, FOSSASIA, Tübix, CLT, Kieler Linux Tage, LFNW, OLF, SELF and GeekBeacon Fest.
Hans is available to speak remotely for local groups.
Currently a Customer Data Engineer at Object Rocket. Public statements are not representative of $dayjob.
Fediverse/Mastodon -

Radar trends to watch: September 2021

O'Reilly Radar - Wed, 2021/09/01 - 05:18

Let’s start with a moment of silence for O’Reilly Author Toby Segaran, who passed away on August 11, 2021.  Toby was one of the people who got the Data Science movement started. His book, Programming Collective Intelligence, taught many how to start using their data. Throughout his career, he mentored many, and was particularly influential in mentoring young women interested in science and technology. Toby is greatly missed by everyone in the Data Science community.

AI and Data
  • Margaret Mitchell joins HuggingFace to create tools to help build fair algorithms.
  • Embedded Machine Learning for Hard Hat Detection is an interesting real-world application of AI on the edge. Wearing hard hats is essential to work site safety; this project developed a model for detecting whether workers were wearing hard hats that could easily be deployed without network connectivity. It also goes into rebalancing datasets–in this case, public datasets with too few hard hats, but this technique is applicable to other instances of bias.
  • Liquid Neural Networks are neural networks that can adapt in real time to incoming data.  They are particularly useful for time series data–which, as the author points out, is almost all data.
  • US Government agencies plan to increase their use of facial recognition, in many cases for law enforcement, despite well-known accuracy problems for minorities and women.  Local bans on face recognition cannot prohibit federal use.
  • Data and Politics is an ongoing research project that studies how political organizations are collecting and using data.
  • FaunaDB is a distributed document database designed for serverless architectures. It comes with REST API support, GraphQL, built-in attribute based access control, and a lot of other great features.
  • Facial expression recognition is being added to a future version of Android as part of their accessibility package. Developers can create applications where expressions (smiles, etc.) can be used as commands.
  • OpenAI’s Codex (the technology behind Copilot) takes the next step: translating English into runnable code, rather than making suggestions.  Codex is now in private beta.
  • Who is responsible for publicly available datasets, and how do you ensure that they’re used appropriately? Margaret Mitchell suggests organizations for data stewardship. These would curate, maintain, and enforce legal standards for the use of public data.
  • An AI system can predict race accurately based purely on medical images, with no other information about the subject. This creates huge concerns about how bias could enter AI-driven diagnostics; but it also raises the possibility that we might discover better treatments for minorities who are underserved (or badly served) by the medical industry.
  • DeepMind has made progress in building a generalizable AI: AI agents that can solve problems that they have never seen before, and transfer learning from one problem to another. They have developed XLand, an environment that creates games and problems, to enable this research.
  • GPT-J is one of a number of open source alternatives to GitHub Copilot. It is smaller and faster, and appears to be at least as good.
  • “Master faces” are images generated by adversarial neural networks that are capable of passing facial recognition tests without corresponding to any specific face.
  • Researchers have created a 3D map of a small part of a mouse’s brain. This is the most detailed map of how neurons connect that has ever been made.  The map contains 75,000 neurons and 523 million synapses; the map and the data set have been released to the public.
  • Robotic chameleons (or chameleon robotics): Researchers have developed a robotic “skin” that can change color in real time to match its surroundings.
  • Elon Musk announces that Tesla will release a humanoid robot next year; it will be capable of performing tasks like going to the store. Is this real, or just a distraction from investigations into the safety of Tesla’s autonomous driving software?
  • According to the UN, lethal autonomous robots (robots capable of detecting and attacking a target without human intervention) have been deployed and used by the Libyan government.
  • A new generation of warehouse robots is capable of simple manipulation (picking up and boxing objects); robots capable of more fine-grained manipulation are coming.
  • The end of passwords draws even closer. GitHub is now requiring 2-factor authentication, preferably using WebAuthn or Yubikey. Amazon will be giving free USB authentication keys to some customers (root account owners spending over $100/month).
  • There are many vulnerabilities in charging systems for electric vehicles. This is sad, but not surprising: the automotive industry hasn’t learned from the problems of IoT security.
  • Advances in cryptography may make it more efficient to do computation without decrypting encrypted data.
  • Amazon is offering store credit to people who give them their palm prints, for use in biometric checkout at their brick-and-mortar stores.
  • Amazon, Google, Microsoft, and others join the US Joint Cyber Defense Collaborative to fight the spread of ransomware.
  • Apple will be scanning iPhones for images of child abuse.  Child abuse aside, this decision raises questions about cryptographic backdoors for government agencies and Apple’s long-standing marketing of privacy. If they can monitor for one thing, they can monitor for others, and can presumably be legally forced to do so.
  • Automating incident response: self-healing auto-remediation could be the next step in automating all the things, building more reliable systems, and eliminating the 3AM pager.
  • Hearables are very small computers, worn in the ear, for which the only interface is a microphone, a speaker, and a network. They may have applications in education, music, real time translation (like Babelfish), and of course, next-generation hearing aids.
  • Timekeeping is an old and well-recognized problem in distributed computing. Facebook’s Time cards are an open-source (code and hardware) solution for accurate time keeping. The cards are PCIe bus cards (PC standard) and incorporate a satellite receiver and an atomic clock.
  • A new cellular board for IoT from Ray Ozzie’s company Blues Wireless is a very interesting product. It is easy to program (JSON in and out), interfaces easily to Raspberry Pi and other systems, and $49 includes 10 years of cellular connectivity.
Social Media
  • Researchers are using Google Trends data to identify COVID symptoms as a proxy for hospital data, since hospital data isn’t publicly available. The key is distinguishing between flu-like flu symptoms and flu-like COVID symptoms.
  • A topic-based approach to targeted advertising may be Google’s new alternative to tracking cookies, replacing the idea of assigning users to cohorts with similar behavior.
  • Facebook shares a little information about what’s most widely viewed on their network. It only covers the top 20 URLs and, given Facebook’s attempts to shut down researchers studying their behavior, qualifies as transparency theater rather than substance.
  • As an experiment, Twitter is allowing certain users to mark misleading content.  They have not specified (and presumably won’t specify) how to become one of these users. The information they gain won’t be used directly for blocking misinformation, but to study how it propagates.
  • Banning as a service: It’s now possible to hire a company to get someone banned from Instagram and other social media. Not surprisingly, these organizations may be connected to organizations that specialize in restoring banned accounts.
  • Facebook may be researching ways to use some combination of AI and homomorphic encryption to place targeted ads on encrypted messages without decrypting them.
  • Inspired by the security community and bug bounties, Twitter offers a bounty to people who discover algorithmic bias.
  • Facebook’s virtual reality workrooms could transform remote meetings by putting all the participants in a single VR conference room–assuming that all the participants are willing to wear goggles.
  • A survey shows that 70% of employees would prefer to work at home, even if it costs them in benefits, including vacation time and salaries.  Eliminating the commute adds up.
  • Sky computing–the next step towards true utility computing–is essentially what we now call “multi cloud,” but with an inter-cloud layer that provides interoperability between cloud providers.
  • Thoughts on the future of the data stack as data starts to take advantage of cloud: how do organizations get beyond “lift and shift” and other early approaches to use clouds effectively?
  • Matrix is another protocol for decentralized messaging (similar in concept to Scuttlebutt) that appears to be getting some enterprise traction.
  • Using federated learning to build decentralized intelligent wireless communications systems that predict traffic patterns to help traffic management may be part of 6G.
  • How do you scale intelligence at the edge of the network? APIs, industrially hardened Linux systems, and Kubernetes adapted to small systems (e.g., K3S).
  • The EU is considering a law that would require cryptocurrency transactions to be traceable.  An EU-wide authority to prevent money laundering would have authority over cryptocurrencies.
  • Autocorrect errors in Excel are a problem in genomics: autocorrect modifies gene names, which are frequently “corrected” to dates.
  • Google may have created the first time crystals in a quantum computer. Time crystals are a theoretical construct that has a structure that constantly changes but repeats over time, without requiring additional energy.
Categories: Technology

Rebranding Data

O'Reilly Radar - Tue, 2021/08/24 - 07:16

There’s a flavor of puzzle in which you try to determine the next number or shape in a sequence. We’re living that now, but for naming the data field.  “Predictive analytics.” “Big Data.” “Data science.” “Machine learning.” “AI.” What’s next?

It’s hard to say.  These terms all claim to be different, but they are very much the same.  They are supersets, subsets, and Venn diagrams with a lot of overlap.  Case in point: machine learning used to be considered part of data science; now it’s seen as a distinct (and superior) field.  What gives?

Since the promise of “analyzing data for fun and profit” has proven so successful, it’s odd that the field would feel the need to rebrand every couple of years.  You’d think that it would build on a single name, to drive home its transformative power.  Unless, maybe, it’s not all it claims to be?

Resetting the hype cycle

In a typical bubble—whether in the stock market, or the Dot-Com era—you see a large upswing and then a crash.  The upswing is businesses over-investing time, money, and effort in The New Thing. The crash happens when those same groups realize that The New Thing won’t ultimately help them, and they suddenly stop throwing money at it.

In finance terms, we’d say that the upswing represents a large and growing delta between the fundamental price (what The New Thing is actually worth) and the observed price (what people are spending on it, which is based on what they think it’s worth).  The ensuing crash represents a correction: a sharp, sudden reduction in that delta, as the observed price falls to something closer to the fundamental price.

Given that, we should have seen the initial Big Data hype bubble expand and then burst once businesses determined that this would only help a very small number of companies.  Big Data never crashed, though. Instead, we saw “data science” take off.  What’s weird is that companies were investing in roughly the same thing as before. It’s as though the rebranding was a way of laundering the data name, so that businesses and consumers could more easily forget that the previous version didn’t hold up to its claims.  This is the old “hair of the dog” hangover cure.

And it actually works.  Until it doesn’t.

Data success is not dead; it’s just unevenly distributed

This isn’t to say that data analysis has no value. The ability to explore massive amounts of data can be tremendously useful.  And lucrative.  Just not for everyone.

Too often, companies look to the FAANGs—Facebook, Amazon, Apple, Netflix, Google: the businesses that have clearly made a mint in data analysis—and figure they can copycat their way to the same success.  Reality’s harsh lesson is that it’s not so simple.  “Collect and analyze data” is just one ingredient of a successful data operation. You also need to connect those activities to your business model, and hand-waving over that part is only a temporary solution. At some point, you need to actually determine whether the fancy new thing can improve your business.  If not, it’s time to let it go.

We saw the same thing in the 1990s Dot-Com bust. The companies that genuinely needed developers and other in-house tech staff continued to need them; those that didn’t, well, they were able to save money by shedding jobs that weren’t providing business value.

Maybe data’s constant re-branding is the lesson learned from the 1990s? That if we keep re-branding, we can ride the misplaced optimism, and we’ll never hit that low point?

Why it matters

If the data world is able to sustain itself by simply changing its name every few years, what’s the big deal? Companies are making money, consumers are happy with claims of AI-driven products, and some people have managed to find very lucrative jobs.  Why worry about this now?

This quote from Cem Karsan, founder of Aegea Capital Management, sums it up well.  He’s talking about flows of money on Wall St. but the analogy applies just as well to the AI hype bubble:

If you’re on an airplane, and you’re 30,000 feet off the ground, that 30,000 feet off the ground is the valuation gap.  That’s where valuations are really high. But if those engines are firing, are you worried up in that plane about the valuations?  No!  You’re worried about the speed and trajectory of where you’re going, based on the engines.  […]  But, when all of the sudden, those engines go off, how far off the ground you are is all that matters.

—Cem Karsan, from Corey Hoffstein’s Flirting with Models podcast, S4E1 (2021/05/03), starting 37:30

Right now most of AI’s 30,000-foot altitude is hype. When the hype fades—when changing the name fails to keep the field aloft—that hype dissipates.  At that point you’ll have to sell based on what AI can really do, instead of a rosy, blurry picture of what might be possible.

This is when you might remind me of the old saying: “Make hay while the sun shines.”  I would agree, to a point.  So long as you’re able to cash out on the AI hype, even if that means renaming the field a few more times, go ahead.  But that’s a short-term plan.  Long-term survival in this game means knowing when that sun will set and planning accordingly.  How many more name-changes do we get?  How long before regulation and consumer privacy frustrations start to chip away at the façade?  How much longer will companies be able to paper over their AI-based systems’ mishaps?

Where to next?

If you’re building AI that’s all hype, then these questions may trouble you.  Post-bubble AI (or whatever we call it then) will be judged on meaningful characteristics and harsh realities: “Does this actually work?” and “Do the practitioners of this field create products and analyses that are genuinely useful?”  (For the investors in the crowd, this is akin to judging a company’s stock price on market fundamentals.)  Surviving long-term in this field will require that you find and build on realistic, worthwhile applications of AI.

Does our field need some time to sort that out?  I figure we have at least one more name change before we lose altitude.  We’ll need to use that time wisely, to become smarter about how we use and build around data.  We have to be ready to produce real value after the hype fades.

That’s easier said than done, but it’s far from impossible. We can start by shifting our focus to the basics, like reviewing our data and seeing whether it’s any good.  Accepting the uncomfortable truth that BI’s sums and groupings will help more businesses than AI’s neural networks. Evaluating the true total cost of AI, such that each six-figure data scientist salary is a proper business investment and not a very expensive lottery ticket.

We’ll also have to get better about folding AI into products (and understanding the risks in doing so), which will require building interdisciplinary, cognitively-diverse teams where everyone gets a chance to weigh in. Overall, then, we’ll have to educate ourselves and our customers on what data analysis can really achieve, and then plan our efforts accordingly.

We can do it. We’ll pretty much have to do it.  The question is: will we start before the plane loses altitude?

Categories: Technology

A Way Forward with Communal Computing

O'Reilly Radar - Tue, 2021/08/17 - 05:45

Communal devices in our homes and offices aren’t quite right. In previous articles, we discussed the history of communal computing and the origin of the single user model. Then we reviewed the problems that arise due to identity, privacy, security, experience, and ownership issues. They aren’t solvable by just making a quick fix. They require a huge reorientation in how these devices are framed and designed.

This article focuses on modeling the communal device you want to build and understanding how it fits into the larger context. This includes how it interoperates with services that are connected, and how it communicates across boundaries with other devices in people’s homes. Ignore these warnings at your own peril: people can always unplug the device and recycle it.

Let’s first talk about how we gain an understanding of the environment inside homes and offices.

Mapping the communal space

We have seen a long list of problems that keep communal computing from aligning with people’s needs. This misalignment arises from the assumption that there is a single relationship between a person and a device, rather than between all the people involved and their devices.

Dr. S.A. Applin has referred to this assumption as “design individualism”; it is a common misframing used by technology organizations. She uses this term most recently in the paper “Facebook’s Project Aria indicates problems for responsible innovation when broadly deploying AR and other pervasive technology in the Commons:”

“Unfortunately, this is not an uncommon assumption in technology companies, but is a flaw in conceptual modelling that can cause great problems when products based on this ‘design individualism’ are deployed into the Commons (Applin, 2016b). In short, Facebook acknowledges the plural of ‘people’, but sees them as individuals collectively, not as a collective that is enmeshed, intertwined and exists based on multiple, multiplex, social, technological, and socio-technological relationships as described through [PolySocial Reality].”

PolySocial Reality (PoSR) is a theory described in a series of papers by Applin and Fisher (2010-ongoing) on the following:

“[PoSR] models the outcomes when all entities in networks send both synchronous and asynchronous messages to maintain social relationships. These messages can be human-to-human, human-to-machine, and machine-to-machine. PoSR contains the entirety of all messages at all times between all entities, and we can use this idea to understand how various factors in the outcomes from the way that messages are sent and received, can impact our ability to communicate, collaborate, and most importantly, cooperate with each other.”

In the case of PoSR, we need to consider how agents make decisions about the messages between entities. The designers of these non-human entities will make decisions that impact all entities in a system.

The reality is that the “self” only exists as part of a larger network. It is the connections between us and the rest of the network that are meaningful. We pull all of the pseudo-identities for those various connections together to create our “one” self.

The model that I’ve found most helpful to address this problem attempts to describe the complete environment of the communal space. It culminates in a map of the connections between nodes, or relationships between entities. This web of interactions includes all the individuals, the devices they use, and the services that intermediate them. The key is to understand how non-human entities intermediate the humans, and how those messages eventually make it to human actors.

The home is a network, like an ecosystem, of people, devices, and services all interacting to create an experience. It is connected with services, people, and devices outside the home as well, such as my mom, my mom’s picture frame, and the Google services that enable it.

To see why this map is helpful, consider an ecosystem (or food web). When we only consider interactions between individual animals, like a wolf eating a sheep, we ignore how the changes in population of each animal impact other actors in the web: too many wolves mean the sheep population dies off. In turn, this change has an impact on other elements of the ecosystem like how much the grass grows. Likewise, when we only consider a single person interacting with one device, we find that most interactions are simple: some input from the user is followed by a response from the device. We often don’t consider other people interacting with the device, nor do we consider how other personal devices exist within that space. We start to see these interactions when we consider other people in the communal space, the new communal device, and all other personal devices. In a communal map, these all interact.

These ecosystems already exist within a home or office. They are made up of items ranging from refrigerator magnets for displaying physical pictures to a connected TV, and they include personal smartphones. The ecosystem extends to the services that the devices connect to outside the home, and to the other people whom they intermediate. We get an incomplete picture if we don’t consider the entire graph. Adding a new device isn’t about filling a specific gap in the ecosystem. The ecosystem may have many problems or challenges, but the ecosystem isn’t actively seeking to solve them. The new device needs to adapt and find its own niche. This includes making the ecosystem more beneficial to the device, something that evolutionary biologists call ‘niche expansion’. Technologists would think about this as building a need for their services.

Thinking about how a device creates a space within an already complex ecosystem is key to understanding what kinds of experiences the team building the device should create. It will help us do things like building for everyone and evolving with the space. It will also help us to avoid the things we should not do, like assuming that every device has to do everything.

Do’s and don’ts of building communal devices

With so much to consider when building communal devices, where do you start? Here are a few do’s and don’ts:

Do user research in the users’ own environment

Studying and understanding expectations and social norms is the key discovery task for building communal devices. Expectations and norms dictate the rules of the environment into which your device needs to fit, including people’s pseudo-identities, their expectations around privacy, and how willing they are to deal with the friction of added security. Just doing a survey isn’t enough.  Find people who are willing to let you see how they use these devices in their homes, and ask lots of questions about how they feel about the devices.

“If you are going to deal with social, people, communal, community, and general sociability, I would suggest hiring applied anthropologists and/or other social scientists on product teams. These experts will save you time and money, by providing you with more context and understanding of what you are making and its impact on others. This translates into more accurate and useful results.”

– Dr. S.A. Applin

Observing where the devices are placed and how the location’s use changes over time will give you fascinating insights about the context in which the device is used. A living room may be a children’s play area in the morning, a home office in the middle of the day, and a guest bedroom at night. People in these contexts have different sets of norms and privacy expectations.

As part of the user research, you should be building an ecosystem graph of all people present and the devices that they use. What people not present are intermediated by technology? Are there stories where this intermediation went wrong? Are there frictions that are created between people that your device should address? Are there frictions that the device should get out of the way of?
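One way to make the ecosystem graph concrete is to record it as a small data structure during research. The sketch below is only an illustration: the household, the entity names, and the helper functions are all hypothetical, and a real study would capture far richer relationships than plain links. It answers one of the questions above: which people who are not present are still reachable through devices and services?

```python
from collections import deque

# Hypothetical household: each edge links two entities that exchange messages.
EDGES = [
    ("alex", "kitchen_display"),
    ("sam", "kitchen_display"),
    ("kitchen_display", "photo_service"),
    ("photo_service", "grandma_frame"),
    ("grandma_frame", "grandma"),
]
PEOPLE = {"alex", "sam", "grandma"}
PRESENT = {"alex", "sam"}  # people physically in the communal space


def build_graph(edges):
    """Return an undirected adjacency map from a list of (a, b) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def intermediated_people(graph, start, people, present):
    """People who are not present but reachable from `start` via the graph."""
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first walk over every connected entity
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return sorted((seen & people) - present)


graph = build_graph(EDGES)
print(intermediated_people(graph, "kitchen_display", PEOPLE, PRESENT))
```

In this toy household the walk from the kitchen display reaches the grandmother through the photo service and her picture frame, so she shows up as someone the device intermediates even though she is never in the room.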

Do build for everyone who might have access

Don’t focus on the identity of the person who buys and sets up the device. You need to consider the identity (or lack) of everyone who could have access. Consider whether they feel that information collected about them violates their desire to control the information (as in Contextual Integrity). This could mean you need to put up walls to prevent users from doing something sensitive without authorization. Using the Zero Trust framework’s “trust engine” concept, you should ask for the appropriate level of authentication before proceeding.

Most of today’s user experience design is focused on making frictionless or seamless experiences. This goal doesn’t make sense when considering a risk tradeoff. In some cases, adding friction increases the chance that a user won’t move forward with a risky action, which could be a good thing. If the potential risk of showing a private picture is high, you should make it harder to show that picture.
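The idea of adding friction in proportion to risk can be sketched in a few lines. Everything below is an assumption made for illustration: the action names, the risk scores, the thresholds, and the authentication levels are hypothetical and are not taken from any Zero Trust specification or product.

```python
from enum import IntEnum


class AuthLevel(IntEnum):
    NONE = 0      # anyone in the room may proceed
    CONFIRM = 1   # ask "are you sure?" on the device
    PIN = 2       # require a household PIN
    PERSONAL = 3  # require approval from a personal device


# Hypothetical risk scores from 0 (harmless) to 100 (very sensitive).
ACTION_RISK = {
    "show_time": 0,
    "play_music": 20,
    "show_private_photo": 70,
    "make_purchase": 90,
}


def required_auth(action, context_known):
    """Map an action's risk to the friction to add before proceeding."""
    # Unknown actions are treated as maximally risky: a safe default.
    risk = ACTION_RISK.get(action, 100)
    # If the device does not understand the current context, raise the risk.
    if not context_known:
        risk = min(100, risk + 30)
    if risk < 10:
        return AuthLevel.NONE
    if risk < 50:
        return AuthLevel.CONFIRM
    if risk < 80:
        return AuthLevel.PIN
    return AuthLevel.PERSONAL


print(required_auth("show_private_photo", context_known=True).name)
print(required_auth("show_private_photo", context_known=False).name)
```

The design choice worth noting is the default: an action the device has never heard of, or a context it cannot interpret, pushes the request toward more friction rather than less.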

Realize you may not always understand the right context. Having good and safe default states for those cases is important. It is your job to adjust or simplify the model so that people can understand and interpret why the device does something.

Do consider pseudo-identities for individuals and groups

Avoid singular identities and focus on group pseudo-identities. If users don’t consider these devices their own, why not have the setup experience mirror those expectations? Build device setup, usage, and management around everyone who should have a say in the device’s operation.

Pseudo-identities become very interesting when you start to learn what certain behaviors mean for subgroups. Is this music being played for an individual with particular tastes? Or does the choice reflect a compromise between multiple people in the room? Should it avoid explicit language since there are children present?

Group norms and relationships need to be made more understandable. It will take technology advances to make these norms more visible. These advances include using machine learning to help the device understand what kind of content it is showing, and who (or what) is depicted in that content. Text, image, and video analysis needs to take place to answer the question: what type of content is this and who is currently in that context? It also means using contextual prediction to consider who may be in the room, their relationship to the people in the content, and how they may feel about the content. When in doubt, restrict what you do.

Do evolve with the space

As time goes on, life events will change the environment in which the device operates. Try to detect those changes and adapt accordingly. New pseudo-identities could be present, or the identity representing the group may shift. It is like moving into a new home. You may set things up in one way only to find months later there is a better configuration. Be aware of these changes and adapt.

If behavior that would be considered anomalous becomes the norm, something may have changed about the use of that space. Changes in use are usually led by a change in life–for example, someone moving in or out could trigger a change in how a device is used. Unplugging the device and moving it to a different part of the room or to a different shelf symbolizes a new need for contextual understanding. If you detect a change in the environment but don’t know why the change was made, ask.

Do use behavioral data carefully, or don’t use it at all

All communal devices end up collecting data. For example, Spotify uses what you are listening to when building recommendation systems. When dealing with behavioral information, the group’s identity is important, not the individual’s. If you don’t know who is in front of the device, you should consider whether you can use that behavioral data at all. Rather than using an individual identity, you may want to default to the group pseudo-identity’s recommendations. What does the whole house usually like to listen to?

When the whole family is watching, how do we find a common ground based on all of our preferences, rather than the owner’s? Spotify has a Premium Family package where each person gets a recommended playlist, called a Family Mix, based on everyone’s listening behavior, whereas Netflix requires users to choose between individual profiles.

Spotify has family and couple accounts that allow multiple people to have an account under one bill. Each person gets their own login and recommendations. Spotify gives all sub-accounts on the subscription access to a shared playlist (like a Family Mix) that makes recommendations based on the group’s preferences.

Spotify, and services like it, should go a step further to reduce the weight of a song in their recommendations algorithm when it is being played on a shared device in a communal place–a kitchen, for example. It’s impossible to know everyone who is in a communal space. There’s a strong chance that a song played in a kitchen may not be preferred by anyone that lives there. To give that particular song a lot of weight will start to change recommendations on the group members’ personal devices.
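As a rough sketch of that suggestion, play events could carry a device context, and plays from shared devices could count for less when scoring songs. The weights and the event format below are hypothetical; nothing here describes how Spotify or any other service actually builds recommendations.

```python
from collections import defaultdict

# Hypothetical weights: plays on shared devices say less about personal taste.
CONTEXT_WEIGHT = {"personal": 1.0, "shared": 0.25}


def song_scores(play_events):
    """Sum weighted plays per song.

    Each event is a (song, device_context) pair. Contexts that are not
    listed in CONTEXT_WEIGHT contribute nothing to the score.
    """
    scores = defaultdict(float)
    for song, context in play_events:
        scores[song] += CONTEXT_WEIGHT.get(context, 0.0)
    return dict(scores)


events = [
    ("song_a", "personal"),
    ("song_a", "personal"),
    ("song_b", "shared"),
    ("song_b", "shared"),
    ("song_b", "shared"),
]
print(song_scores(events))
```

Here the song played three times in the kitchen still scores lower than the song played twice on a personal device, so it is less likely to reshape anyone’s individual recommendations.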

If you can’t use behavioral data appropriately, don’t bring it into a user’s profile on your services. You should probably not collect it at all until you can handle the many people who could be using the device. Edge processing can allow a device to build context that respects the many people and their pseudo-identities that are at play in a communal environment. Sometimes it is just safer to not track.

Don’t assume that automation will work in all contexts

Prediction technology helps communal devices by finding behavior patterns. These patterns allow the device to calculate what content should be displayed and the potential trust. If a student always listens to music after school while doing homework, the device can assume that contextual integrity holds if the student is the only person there. These assumptions get problematic when part of the context is no longer understood, like when the student has other classmates over. That’s when violations of norms or of privacy expectations are likely to occur. If other people are around, different content is being requested, or if it is a different time of day, the device may not know enough to predict the correct information to display.

Amazon’s Alexa has started wading into these waters with their Hunches feature. If you say “good night” to Alexa, it can decide to turn off the lights. What happens if someone is quietly reading in the living room when the lights go out?  We’ve all accidentally turned the lights out on a friend or partner, but such mistakes quickly become more serious when they’re made by algorithm.

When the prediction algorithm’s confidence is low, it should disengage and try to learn the new behavior. Worst case, just ask the user what is appropriate and gauge the trust vs risk tradeoff accordingly. The more unexpected the context, the less likely it is that the system should presume anything. It should progressively restrict features until it is at its core: for home assistants, that may just mean displaying the current time.
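Progressive restriction can be sketched as tiers of features that unlock only as confidence in the predicted context rises. The feature names and thresholds below are hypothetical, chosen only to show the shape of the idea.

```python
# Feature tiers ordered from the core outward; names and thresholds are
# hypothetical. Each tier unlocks once confidence reaches its threshold.
FEATURE_TIERS = [
    (0.0, ["show_time"]),                  # the core: always available
    (0.5, ["play_music", "show_weather"]),
    (0.8, ["show_calendar"]),
    (0.95, ["show_personal_photos"]),
]


def allowed_features(confidence):
    """Return the features enabled at a context confidence from 0.0 to 1.0."""
    features = []
    for threshold, tier in FEATURE_TIERS:
        if confidence >= threshold:
            features.extend(tier)
    return features


def should_ask_user(confidence, ask_below=0.5):
    """Below this confidence, stop presuming and ask what is appropriate."""
    return confidence < ask_below


for confidence in (0.2, 0.6, 0.99):
    print(confidence, allowed_features(confidence), should_ask_user(confidence))
```

At very low confidence the device falls back to showing the time and asks the people in the room what they want, which matches the worst case described above.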

Don’t include all service functionality on the device

All product teams consider what they should add next to make a device “fully functional” and reflect all of the service possibilities. For a communal device, you can’t just think about what you could put there; you also have to consider what you will never put there. An example could be allowing access to Gmail messages from a Google Home Hub. If it doesn’t make sense for most people to have access to some feature, it shouldn’t be there in the first place. It just creates clutter and makes the device harder to use. It is entirely appropriate to allow people to change personal preferences and deal with highly personal information on their own, private devices. There is a time and place for the appropriate content.

Amazon has considered whether Echo users should be allowed to complete a purchase or be limited to just adding items to a shopping list. They have had to add four-digit codes and voice profiles. The resulting interface is complex enough to warrant a top-level help article on why people can’t make the purchases.

If you have already built too much, think about how to sunset certain features so that the value and differentiator of your device is clearer. Full access to personal data doesn’t work in the communal experience. It is a chance for some unknown privacy violation to occur.

Don’t assume your devices will be the only ones

Never assume that your company’s devices will be the only ones in the space. Even for large companies like Amazon, there is no future in which the refrigerator, oven, and TV will all be Amazon devices (even if they are trying really hard). The communal space is built up over a long time, and devices like refrigerators have lifetimes that can span decades.

Think about how your device might work alongside other devices, including personal devices. To do this, you need to integrate with network services (e.g. Google Calendar) or local device services (e.g. Amazon Ring video feed). This is the case for services within a communal space as well. People have different preferences for the services they use to communicate and entertain themselves. For example, Snapchat’s adoption by 13-24 year olds (~90% in the US market) accounts for 70% of its usage. This means that people over 24 years old are using very different services to interact with their family and peers.

Apple’s iOS has started to realize that apps need to ask for permission before collecting information from other devices on a local network. It verifies that an app is allowed to access other devices on the network. Local network access is not a foregone conclusion either: different routers and wifi access points are increasingly managed by network providers.

Communal device manufacturers must build for interoperability between devices whether they like it or not, taking into account industry standards for communicating state, messaging, and more. A device that isn’t networked with the other devices in the home is much more likely to be replaced when the single, non-networked use is no longer valid or current.

Don’t change the terms without an ‘out’ for owners

Bricking a device because someone doesn’t want to pay for a subscription or doesn’t like the new data use policy is bad. Not only will it create distrust in users but it violates the idea that they are purchasing something for their home.

When you need to change terms, allow owners to make a decision about whether they want new functionality or to stop getting updates. Not having an active subscription is no excuse for a device to fail, since devices should be able to work when a home’s WiFi is down or when AWS has a problem that stops a home’s light bulbs from working. Baseline functionality should always be available, even if leading edge features (for example, features using machine learning) require a subscription. “Smart” or not, there should be no such thing as a light bulb that can’t be turned on.
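The separation between baseline and subscription features can be sketched as a toy model. The class and method names are hypothetical and do not correspond to any vendor’s API; the point is only that the basic switch never passes through the code path that checks for a network or a subscription.

```python
class LightBulb:
    """Toy model of a bulb whose basic switch never depends on the cloud."""

    def __init__(self):
        self.is_on = False
        self.scene = None

    def switch(self, on):
        # Baseline: handled locally, no network or subscription required.
        self.is_on = on
        return True

    def apply_smart_scene(self, scene, cloud_available, subscribed):
        # Premium: may be refused, but a refusal leaves the baseline untouched.
        if not (cloud_available and subscribed):
            return False
        self.scene = scene
        self.is_on = True
        return True


bulb = LightBulb()
print(bulb.apply_smart_scene("sunset", cloud_available=False, subscribed=False))
print(bulb.switch(True), bulb.is_on)
```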

When a company can no longer support a device–either because they’re sunsetting it or, in the worst case, because they are going out of business–they should consider how to allow people to keep using their devices. In some cases, a motivated community can take on the support; this happened with the Jibo community when the device creator shut down.

Don’t require personal mobile apps to use the device

One bad limitation that I’ve seen is requiring an app to be installed on the purchaser’s phone, and requiring the purchaser to be logged in to use the device. Identity and security aren’t always necessary, and being too strict about identity tethers the device to a particular person’s phone.

The Philips Hue smart light bulbs are a way to turn any light fixture into a component in a smart lighting system. However, you need one of their branded apps to control the lightbulbs. If you integrate your lighting system with your Amazon or Google accounts, you still need to know what the bulb or “zone” of your house is called. As a host you end up having to take the action for someone else (say by yelling at your Echo for them) or put a piece of paper in the room with all of the instructions. We are back in the age of overly complicated instructions to turn on a TV and AV system.

In addition to making sure you can integrate with other touch and voice interfaces, you need to consider physical ways to allow anyone to interact. IoT power devices like the VeSync Smart Plug by Etekcity (I have a bunch around the house) have a physical button to allow manual switching, in addition to integrating with your smart home or using their branded apps. If you can’t operate the device manually when you are standing in front of it, is it really being built for everyone in the home?

How do you know you got this right?

Once you have implemented all of the recommendations, how do you know you are on the right track?

A simple way to figure out whether you are building a communal-friendly device is to look for people adding their profiles to the device. This means linking their accounts to other services like Spotify (if you allow that kind of linking). However, not everyone will want to or be able to add their accounts, especially people who are passing through (guests) or who cannot legally consent (children).

Using behavior to detect whether someone else is using the device can be difficult. While people don’t change their taste in music or other interests quickly, they slowly drift through the space of possible options. We seek things that are similar to what we like, but just different enough to be novel. In fact, we see that most of our music tastes are set in our teenage years. Therefore, if a communal device is asked to play songs in a language or genre that never shows up on the owner’s personal device, it’s more likely that someone new is listening than that the owner has suddenly learned a new language. Compare what users are doing on your device to their behavior on other platforms (for example, compare a Google Home Hub in the kitchen to a personal iPhone) to determine whether new users are accessing the platform.
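One simple way to sketch that comparison is to treat each device’s listening history as a distribution over genres and measure how far apart the two distributions are. The example below uses total variation distance, which is a choice made here for illustration and not something the comparison requires; the genre labels and the 0.5 threshold are likewise hypothetical.

```python
from collections import Counter


def genre_shares(plays):
    """Turn a list of genre labels into a {genre: share} distribution."""
    counts = Counter(plays)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {genre: n / total for genre, n in counts.items()}


def taste_distance(personal_plays, communal_plays):
    """Total variation distance between two genre distributions.

    0.0 means identical shares; 1.0 means no genres in common.
    """
    p = genre_shares(personal_plays)
    q = genre_shares(communal_plays)
    genres = set(p) | set(q)
    return 0.5 * sum(abs(p.get(g, 0.0) - q.get(g, 0.0)) for g in genres)


def likely_other_listener(personal_plays, communal_plays, threshold=0.5):
    """Flag the communal device when its plays diverge from the owner's."""
    return taste_distance(personal_plays, communal_plays) > threshold


phone = ["rock", "rock", "rock", "rock"]
kitchen = ["rock", "kpop", "kpop", "kpop"]
print(taste_distance(phone, kitchen), likely_other_listener(phone, kitchen))
```

A large distance is only a hint that someone else is listening. It says nothing about who that person is, which is one more reason to ask rather than presume.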

Behavioral patterns can also be used to predict demographic information. For example, you may be able to predict that someone is a parent based on their usage patterns. If this confidence is high, and you only see their interests showing up in the behavioral data, that means that other people who are around the device are not using it.

Don’t forget that you can ask the users themselves about who is likely to use the device. This is information that you can collect during initial setup. This can help ensure you are not making incorrect assumptions about the placement and use of the device.

Finally, consider talking with customers about how they use the device, the issues that come up, and how it fits into their lives. Qualitative user research doesn’t end after the initial design phase. You need to be aware of how the device has changed the environment it fits into. Without social scientists you can’t know this.

Is everything a communal experience?

Up until this point we have been talking about devices that are part of the infrastructure of a home, like a smart screen or light switch. Once we realize that technology serves as an intermediary between people, everything is communal.

Inside of a home, roommates generally have to share expenses like utilities with each other. Companies like Klarna and Braid make finances communal. How you pay together is an important aspect to harmony within a home.

You are also part of communities in your neighborhoods. Amazon Sidewalk extends your devices into the neighborhood you live in. This mesh technology starts to map and extend further with each communal space. Where does your home’s communal space end? If you misplaced your keys a block away, a Tile could help you find them. It could also identify people in your neighborhood without considering your neighbors’ privacy expectations.

Communities aren’t just based on proximity. We can extend the household to connect with other households far away. Amazon’s Drop In has started their own calling network between households. Loop, a new startup, is focused on building a device for connecting families in their own social network.

Google/Alphabet’s Sidewalk Labs has taken on projects that aim to make the connected world part of the cityscape. An early project called LinkNYC (owned through a shell corporation) was digital signage that included free calling and USB hubs. This changed how homeless people used the built environment. When walking down the street you could see people’s smartphones dangling from a LinkNYC while they were panhandling nearby. Later, a district-wide project called Sidewalk Toronto was cancelled in 2020 amid sustained public criticism. Every object within the urban environment becomes something that not only collects data but that could be interactive.

The town square and public park have been built to be welcoming to people and to set expectations of what they do there, unlike online social media. New Public is taking cues from this type of physical shared space for reimagining the online public square.

Taking cues from the real world, groups like New Public are asking what would happen if we built social media the same way we build public spaces. What if social media followed the norms that we have in social spaces like the public parks or squares?

A key aspect of communal computing is the natural limitation of physical and temporal use. Only so many people can fit inside a kitchen or a meeting room. Only so many people can use a device at once, even if it is a subway ticket machine that services millions of people per month. Only so many can fit onto a sidewalk. We need to consider the way that space and time play a part in these experiences.

Adapt or be unplugged

Rethinking how people use devices together inside our homes, offices, and other spaces is key to the future of ubiquitous computing. We have a long way to go in understanding how context changes the expectations and norms of the people in those spaces. Without updating how we design and build these devices, the device you build will just be one more addition to the landfill.

To understand how devices are used in these spaces, we need to expand our thinking beyond the single owner and design for communal use from the start. If we don’t, the devices will never fit properly into our shared and intimate spaces. The mismatch between expectations and what is delivered will grow greater and lead to more dire problems.

This is a call for change in how we consider devices integrated into our lives. We shouldn’t assume that because humans are adaptive, we can adapt to the technologies built. We should design the technologies to fit into our lives, making sure the devices understand the context in which they’re working.

The future of computing that is contextual, is communal.


Thanks to Adam Thomas, Mark McCoy, Hugo Bowne-Anderson, and Danny Nou for their thoughts and edits on the early draft of this. Also, Dr. S.A. Applin for all of the great work on PoSR. Finally, from O’Reilly, Mike Loukides for being a great editor and Susan Thompson for the art.

Categories: Technology

Defending against ransomware is all about the basics

O'Reilly Radar - Tue, 2021/08/10 - 05:18

The concept behind ransomware is simple. An attacker plants malware on your system that encrypts all the files, making your system useless, then offers to sell you the key you need to decrypt the files. Payment is usually in bitcoin (BTC), and the decryption key is deleted if you don’t pay within a certain period. Payments have typically been relatively small—though that’s obviously no longer true, with Colonial Pipeline’s multimillion-dollar payout.

Recently, ransomware attacks have been coupled with extortion: the malware sends valuable data (for example, a database of credit card numbers) back to the attacker, who then threatens to publish the data online if you don’t comply with the request.  

A survey on O’Reilly’s website [1] showed that 6% of the respondents worked for organizations that were victims of ransomware attacks. How do you avoid joining them? We’ll have more to say about that, but the tl;dr is simple: pay attention to security basics. Strong passwords, two-factor authentication, defense in depth, staying on top of software updates, good backups, and the ability to restore from backups go a long way. Not only do they protect you from becoming a ransomware victim, but those basics can also help protect you from data theft, cryptojacking, and most other forms of cybercrime. The sad truth is that few organizations practice good security hygiene, and those that don’t end up paying the price.

But what about ransomware? Why is it such an issue, and how is it evolving? Historically, ransomware has been a relatively easy way to make money: set up operations in a country that’s not likely to investigate cybercrime, attack targets that are more likely to pay a ransom, keep the ransom small so it’s easier to pay than to restore from backup, and accept payment via some medium that’s perceived as anonymous. Like most things on the internet, ransomware’s advantage is scale: The WannaCry attack infected around 230,000 systems. If even a small percentage paid the US$300 ransom, that’s a lot of money.
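To put rough numbers on that scale, the short calculation below multiplies the two figures cited above by a few payment rates. The rates are hypothetical, chosen only to show how quickly the total grows; they are not estimates of what WannaCry’s victims actually paid.

```python
INFECTED_SYSTEMS = 230_000  # approximate figure cited above for WannaCry
RANSOM_USD = 300            # ransom amount cited above


def total_paid(payment_rate):
    """Total ransom collected if `payment_rate` (0.0 to 1.0) of victims pay."""
    paying_victims = round(INFECTED_SYSTEMS * payment_rate)
    return paying_victims * RANSOM_USD


for rate in (0.001, 0.01, 0.05):  # hypothetical payment rates
    print(f"{rate:.1%} paying -> US${total_paid(rate):,}")
```

Even if only 1% of victims paid, the total would come to US$690,000, for an attack that costs almost nothing extra per additional victim.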

Early on, attacks focused on small and midsize businesses, which often have limited IT staff and no professional security specialists. But more recently, hospitals, governments, and other organizations with valuable data have been attacked. A modern hospital can’t operate without patient data, so restoring systems is literally a matter of life and death. Most recently, we’ve seen attacks against large enterprises, like Colonial Pipeline. And this move toward bigger targets, with more valuable data, has been accompanied by larger ransoms.

Attackers have also gotten more sophisticated and specialized. They’ve set up help desks and customer service agents (much like any other company) to help customers make their payments and decrypt their data. Some criminal organizations offer “ransomware as a service,” running attacks for customers. Others develop the software or create the attacks that find victims. Initiating an attack doesn’t require any technical knowledge; it can all be contracted out, and the customer gets a nice dashboard to show the attack’s progress.

While it’s easy to believe (and probably correct) that government actors have gotten into the game, it’s important to keep in mind that attribution of an attack is very difficult—not least because of the number of actors involved. An “as a service” operator really doesn’t care who its clients are, and its clients may be (willingly) unaware of exactly what they’re buying. Plausible deniability is also a service.

How an attack begins

Ransomware attacks frequently start with phishing. An email to a victim entices them to open an attachment or to visit a website that installs malware. So the first thing you can do to prevent ransomware attacks is to make sure everyone is aware of phishing, very skeptical of any attachments they receive, and appropriately cautious about the websites they visit. Unfortunately, teaching people how to avoid being victimized by a phish is a battle you’re not likely to win. Phishes are getting increasingly sophisticated and now do a good job of impersonating people the victim knows. Spear phishing requires extensive research, and ransomware criminals have typically tried to compromise systems in bulk. But recently, we’ve been seeing attacks against more valuable victims. Larger, more valuable targets, with correspondingly bigger payouts, will merit the investment in research.

It’s also possible for an attack to start when a victim visits a legitimate but compromised website. In some cases, an attack can start without any action by the victim. Some ransomware (for example, WannaCry) can spread directly from computer to computer. One recent attack started through a supply chain compromise: attackers planted the ransomware in an enterprise security product, which was then distributed unwittingly to the product’s customers. Almost any vulnerability can be exploited to plant a ransomware payload on a victim’s device. Keeping browsers up-to-date helps to defend against compromised websites.

Most ransomware attacks begin on Windows systems or on mobile phones. This isn’t to imply that macOS, Linux, and other operating systems are less vulnerable; it’s just that other attack vectors are more common. We can guess at some reasons for this. Mobile phones move between different domains, as the owner goes from a coffee shop to home to the office, and are exposed to different networks with different risk factors. Although they are often used in risky territory, they’re rarely subject to the same device management that’s applied to “company” systems—but they’re often accorded the same level of trust. Therefore, it’s relatively easy for a phone to be compromised outside the office and then bring the attacker onto the corporate network when its owner returns to work.

It’s possible that Windows systems are common attack vectors just because there are so many of them, particularly in business environments. Many also believe that Windows users install updates less often than macOS and Linux users. Microsoft does a good job of patching vulnerabilities before they can be exploited, but that doesn’t do any good if updates aren’t installed. For example, Microsoft discovered and patched the vulnerability that WannaCry exploited well before the attacks began, but many individuals, and many companies, never installed the updates.

Preparations and precautions

The best defense against ransomware is to be prepared, starting with basic security hygiene. Frankly, this is true of any attack: get the basics right and you’ll have much less to worry about. If you’ve defended yourself against ransomware, you’ve done a lot to defend yourself against data theft, cryptojacking, and many other forms of cybercrime.

Security hygiene is simple in concept but hard in practice. It starts with passwords: Users must have nontrivial passwords. And they should never give their password to someone else, whether or not “someone else” is on staff (or claims to be).

Two-factor authentication (2FA), which requires something in addition to a password (for example, biometric authentication or a text message sent to a cell phone) is a must. Don’t just recommend 2FA; require it. Too many organizations buy and install the software but never require their staff to use it. (76% of the respondents to our survey said that their company used 2FA; 14% said they weren’t sure.)

Users should be aware of phishing and be extremely skeptical of email attachments that they weren’t expecting and websites that they didn’t plan to visit. It’s always a good practice to type URLs in yourself, rather than clicking on links in email, even those in messages that appear to be from friends or associates.

Backups are absolutely essential. But what’s even more important is the ability to restore from a backup. The easiest solution to ransomware is to reformat the disks and restore from backup. Unfortunately, few companies have good backups or the ability to restore from a backup—one security expert guesses that it’s as low as 10%. Here are a few key points:

  • You actually have to do the backups. (Many companies don’t.) Don’t rely solely on cloud storage; backup on physical drives that are disconnected when a backup isn’t in progress. (70% of our survey respondents said that their company performed backups regularly.)
  • You have to test the backups to ensure that you can restore the system. If you have a backup but can’t restore, you’re only pretending that you have a backup. (Only 48% of the respondents said that their company regularly practiced restoring from backups; 36% said they didn’t know.)
  • The backup device needs to be offline, connected only when a backup is in progress. Otherwise, it’s possible for the ransomware attack to encrypt your backup.

Don’t overlook testing your backups. Your business continuity planning should include ransomware scenarios: how do you continue doing business while systems are being restored? Chaos engineering, an approach developed at Netflix, is a good idea. Make a practice of breaking your storage capability, then restoring it from backup. Do this monthly—if possible, schedule it with the product and project management teams. Testing the ability to restore your production systems isn’t just about proving that everything works; it’s about training staff to react calmly in a crisis and resolve the outage efficiently. When something goes bad, you don’t want to be on Stack Overflow asking how to do a restore. You want that knowledge imprinted in everyone’s brains.
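One way to make a restore drill checkable is to compare checksums of the original files against the restored copies. The following is a minimal sketch under that assumption, not a description of any particular backup product; the function names (`file_digest`, `tree_digests`, `verify_restore`) are hypothetical and chosen only for illustration.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_restore(original: Path, restored: Path) -> List[str]:
    """List every file that is missing from, or differs in, the restored tree."""
    want = tree_digests(original)
    got = tree_digests(restored)
    problems = [f"missing: {name}" for name in want if name not in got]
    problems += [
        f"changed: {name}"
        for name in want
        if name in got and got[name] != want[name]
    ]
    return sorted(problems)
```

An empty list from `verify_restore` means every original file came back byte-for-byte. In a real drill you would more likely record the digests at backup time and compare the restored tree against that record, since the original systems may no longer be trustworthy after an attack.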

Keep operating systems and browsers up-to-date. Too many have become victims because of a vulnerability that was patched in a software update that they didn’t install. (79% of our survey respondents said that their company had processes for updating critical software, including browsers.)

An important principle in any kind of security is “least privilege.” No person or system should be authorized to do anything it doesn’t need to do. For example, no one outside of HR should have access to the employee database. “Of course,” you say—but that includes the CEO. No one outside of sales should have access to the customer database. And so on. Least privilege works for software too. Services need access to other services—but services must authenticate to each other and should only be able to make requests appropriate to their role. Any unexpected request should be rejected and treated as a signal that the software has been compromised. And least privilege works for hardware, whether virtual or physical: finance systems and servers shouldn’t be able to access HR systems, for example. Ideally, they should be on separate networks. You should have a “defense in depth” security strategy that focuses not only on keeping “bad guys” out of your network but also on limiting where they can go once they’re inside. You want to stop an attack that originates on HR systems from finding its way to the finance systems or some other part of the company. Particularly when you’re dealing with ransomware, making it difficult for an attack to propagate from one system to another is all-important.
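The idea that a service may only make requests appropriate to its role, and that anything else is a warning sign, can be sketched as a default-deny allowlist. This is an illustrative sketch only: the service names, resource names, and the `Gatekeeper` class are hypothetical, and the sketch assumes the caller has already been authenticated by some other mechanism.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Hypothetical allowlist: each service may perform only the listed
# (resource, action) pairs. Anything not listed is denied.
ALLOWED: Dict[str, Set[Tuple[str, str]]] = {
    "payroll-service": {("hr-db", "read")},
    "billing-service": {("customer-db", "read"), ("customer-db", "write")},
}


@dataclass
class Gatekeeper:
    allowed: Dict[str, Set[Tuple[str, str]]]
    alerts: List[Tuple[str, str, str]] = field(default_factory=list)

    def authorize(self, caller: str, resource: str, action: str) -> bool:
        """Allow only listed requests; record everything else as an alert."""
        if (resource, action) in self.allowed.get(caller, set()):
            return True
        # An unexpected request is rejected and kept as a possible
        # sign that the calling service has been compromised.
        self.alerts.append((caller, resource, action))
        return False
```

The design choice that matters is the default: an unknown caller or an unlisted request is refused without any special case, and the refusal leaves a record that someone can investigate.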

Attribute-based access control (ABAC) can be seen as an extension of least privilege. ABAC is based on defining policies about exactly who and what should be allowed to access every service: What are the criteria on which trust should be based? And how do these criteria change over time? If a device suddenly moves between networks, does that represent a risk? If a system suddenly makes a request that it has never made before, has it been compromised? At what point should access to services be denied? ABAC, done right, is difficult and requires a lot of human involvement: looking at logs, deciding what kinds of access are appropriate, and keeping policies up-to-date as the situation changes. Working from home is an example of a major change that security people will need to take into account. You might have “trusted” an employee’s laptop, but should you trust it when it’s on the same network as their children? Some of this can be automated, but the bottom line is that you can’t automate security.
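To make the questions above concrete, here is a small sketch of an attribute-based decision function. The attributes, the rules, and the three outcomes are assumptions made up for illustration; a real ABAC policy would be far richer and, as noted, would need people to keep it current. The "review" outcome stands in for the human involvement that can't be automated away.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    department: str       # requester's department, e.g. "hr"
    device_managed: bool  # is the device under company device management?
    network: str          # "office", "home", or "unknown"
    resource: str         # what is being accessed, e.g. "hr-db"
    seen_before: bool     # has this requester made this kind of request before?


def decide(req: Request) -> str:
    """Return "allow", "deny", or "review" (escalate to a person)."""
    # Least privilege: only HR may touch the HR database.
    if req.resource == "hr-db" and req.department != "hr":
        return "deny"
    # Unmanaged devices and unrecognized networks are not trusted.
    if not req.device_managed or req.network == "unknown":
        return "deny"
    # A changed context or a never-before-seen request is not proof of
    # compromise, but it is worth a second look.
    if req.network == "home" or not req.seen_before:
        return "review"
    return "allow"
```

For example, under these assumed rules `decide(Request("hr", True, "home", "hr-db", True))` returns `"review"`: the same laptop that is allowed at the office gets extra scrutiny on a home network.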

Finally: detecting a ransomware attack isn’t difficult. If you think about it, this makes a lot of sense: encrypting all your files requires a lot of CPU and filesystem activity, and that’s a red flag. The way files change is also a giveaway. Most unencrypted files have low entropy: they have a high degree of order. (On the simplest level, you can glance at a text file and tell that it’s text. That’s because it has a certain kind of order. Other kinds of files are also ordered, though the order isn’t as apparent to a human.) Encrypted files have high entropy (i.e., they’re very disordered)—they have to be; otherwise, they’d be easy to decrypt. Computing a file’s entropy is simple and for these purposes doesn’t require looking at the entire file. Many security products for desktop and laptop systems are capable of detecting and stopping a ransomware attack. We don’t do product recommendations, but we do recommend that you research the products that are available. (PC Magazine’s 2021 review of ransomware detection products is a good place to start.)
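The entropy check described above can be sketched in a few lines. This is a minimal illustration using Shannon entropy over byte frequencies, not the method used by any particular security product; the 64 KiB sample size and the 7.5 bits-per-byte threshold are assumptions picked for the example.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of data in bits per byte, from 0.0 to 8.0."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def sample_entropy(path: str, sample_size: int = 65536) -> float:
    """Entropy of the first sample_size bytes; the whole file isn't needed."""
    with open(path, "rb") as f:
        return shannon_entropy(f.read(sample_size))


def looks_encrypted(data: bytes, threshold: float = 7.5) -> bool:
    """Flag data whose entropy is close to the 8 bits-per-byte maximum.

    The threshold is an illustrative assumption, not a tuned value.
    """
    return shannon_entropy(data) > threshold


if __name__ == "__main__":
    text = b"the quick brown fox jumps over the lazy dog " * 100
    print(looks_encrypted(text))               # ordinary text: low entropy
    print(looks_encrypted(os.urandom(65536)))  # random bytes stand in for ciphertext
```

One caveat: compressed formats (ZIP archives, JPEG images, most video) also have high entropy, so a single high-entropy file proves little. What stands out is many files that used to have low entropy turning into high-entropy files in a short time.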

In the data center or the cloud

Detecting ransomware once it has escaped into a data center, whether in the cloud or on-premises, isn’t a fundamentally different task, but commercial products aren’t there yet. Again, prevention is the best defense, and the best defense is strong on the fundamentals. Ransomware makes its way from a desktop to a data center via compromised credentials and operating systems that are unpatched and unprotected. We can’t say this too often: make sure secrets are protected, make sure identity and access management are configured correctly, make sure you have a backup strategy (and that the backups work), and make sure operating systems are patched—zero-trust is your friend.

Amazon Web Services, Microsoft Azure, and Google Cloud all have services named “Identity and Access Management” (IAM); the fact that they all converged on the same name tells you something about how important it is. These are the services that configure users, roles, and privileges, and they’re the key to protecting your cloud assets. IAM doesn’t have a reputation for being easy. Nevertheless, it’s something you have to get right; misconfigured IAM is at the root of many cloud vulnerabilities. One report claims that well over 50% of the organizations using Google Cloud were running workloads with administrator privileges. While that report singles out Google, we believe that the same is true at other cloud providers. All of these workloads are at risk; administrator privileges should only be used for essential management tasks. Google Cloud, AWS, Azure, and the other providers give you the tools you need to secure your workloads, but they can’t force you to use them correctly.
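As one illustration of checking for over-broad privileges, the sketch below scans a policy document for a statement that allows every action on every resource. It assumes the JSON policy layout used by AWS IAM (`Statement`, `Effect`, `Action`, `Resource`) and works on a document you have already exported; it calls no cloud API, and the helper names are hypothetical. It is deliberately narrow: it does not consider `NotAction`, conditions, or service-wide wildcards such as `iam:*`.

```python
import json


def _as_list(value):
    """Policy fields may hold a single item or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def grants_admin(policy: dict) -> bool:
    """True if any statement allows all actions ("*") on all resources ("*")."""
    for stmt in _as_list(policy.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action"))
        resources = _as_list(stmt.get("Resource"))
        if "*" in actions and "*" in resources:
            return True
    return False


if __name__ == "__main__":
    doc = json.loads(
        '{"Version": "2012-10-17",'
        ' "Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}]}'
    )
    print(grants_admin(doc))
```

A check like this could run over every policy attached to a workload role, with anything it flags reviewed by a person before the role is used outside essential management tasks.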

It’s worth asking your cloud vendor some hard questions. Specifically, what kind of support can your vendor give you if you are a victim of a security breach? What can your vendor do if you lose control of your applications because IAM has been misconfigured? What can your vendor do to restore your data if you succumb to ransomware? Don’t assume that everything in the cloud is “backed up” just because it’s in the cloud. AWS and Azure offer backup services; Google Cloud offers backup services for SQL databases but doesn’t appear to offer anything comprehensive. Whatever your solution, don’t just assume it works. Make sure that your backups can’t be accessed via the normal paths for accessing your services—that’s the cloud version of “leave your physical backup drives disconnected when not in use.” You don’t want an attacker to find your cloud backups and encrypt them too. And finally, test your backups and practice restoring your data.

Any frameworks your IT group has in place for observability will be a big help: Abnormal file activity is always suspicious. Databases that suddenly change in unexpected ways are suspicious. So are services (whether “micro” or “macroscopic”) that suddenly start to fail. If you have built observability into your systems, you’re at least partway there.
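If your observability tooling already counts file modifications per time window, flagging "abnormal file activity" can start as simply as comparing the latest count with recent history. The sketch below is one hypothetical way to do that; the three-sigma rule, the minimum history length, and the function name are assumptions for illustration rather than recommended settings.

```python
from statistics import mean, pstdev
from typing import Sequence


def is_abnormal(
    history: Sequence[float],
    current: float,
    sigmas: float = 3.0,
    min_history: int = 5,
) -> bool:
    """Flag a count far above what recent windows looked like.

    history: modification counts from earlier windows (e.g. per minute).
    current: the count for the window being checked.
    """
    if len(history) < min_history:
        return False  # not enough data to know what normal looks like
    baseline = mean(history)
    # Use at least 1.0 for the spread so a perfectly flat history
    # does not make every small change look abnormal.
    spread = max(pstdev(history), 1.0)
    return current > baseline + sigmas * spread
```

A mass-encryption run, which rewrites thousands of files in minutes, would stand far above a baseline of a few dozen modifications per window. Legitimate bulk jobs would too, so a flag like this is a prompt to look, not a verdict.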

How confident are you that you can defend against a ransomware attack? In our survey, 60% of the respondents said that they were confident; another 28% said “maybe,” and 12% said “no.” We’d give our respondents good, but not great, marks on readiness (2FA, software updates, and backups). And we’d caution that confidence is good but overconfidence can be fatal. Make sure that your defenses are in place and that those defenses work.

If you become a victim

What do you do? Many organizations just pay. (One tracker of total payments to ransomware sites currently estimates them at $92,120,383.83.) The FBI says that you shouldn’t pay, but if you don’t have the ability to restore your systems from backups, you might not have an alternative. Although the FBI was able to recover the ransom paid by Colonial Pipeline, I don’t think there’s any case in which they’ve been able to recover decryption keys.

Whether paying the ransom is a good option depends on how much you trust the cybercriminals responsible for the attack. The common wisdom is that ransomware attackers are trustworthy, that they’ll give you the key you need to decrypt your data and even help you use it correctly. If the word gets out that they can’t be trusted to restore your systems, they’ll find fewer victims willing to pay up. However, at least one security vendor says that 40% of ransomware victims who pay never get their files restored. That’s a very big “however,” and a very big risk—especially as ransomware demands skyrocket. Criminals are, after all, criminals. It’s all the more reason to have good backups.

There’s another reason not to pay that may be more important. Ransomware is a big business, and like any business, it will continue to exist as long as it’s profitable. Paying your attackers might be an easy solution short-term, but you’re just setting up the next victim. We need to protect each other, and the best way to do that is to make ransomware less profitable.

Another problem that victims face is extortion. If the attackers steal your data in addition to encrypting it, they can demand money not to publish your confidential data online—which may leave you with substantial penalties for exposing private data under laws such as GDPR and CCPA. This secondary attack is becoming increasingly common.

Whether or not they pay, ransomware victims frequently face revictimization because they never fix the vulnerability that allowed the ransomware in the first place. So they pay the ransom, and a few months later, they’re attacked again, using the same vulnerability. The attack may come from the same people or it may come from someone else. Like any other business, an attacker wants to maximize its profits, and that might mean selling the information they used to compromise your systems to other ransomware outfits. If you become a victim, take that as a very serious warning. Don’t think that the story is over when you’ve restored your systems.

Here’s the bottom line, whether or not you pay. If you become a victim of ransomware, figure out how the ransomware got in and plug those holes. We began this article by talking about basic security practices. Keep your software up-to-date. Use two-factor authentication. Implement defense in depth wherever possible. Design zero-trust into your applications. And above all, get serious about backups and practice restoring from backup regularly. You don’t want to become a victim again.

Thanks to John Viega, Dean Bushmiller, Ronald Eddings, and Matthew Kirk for their help. Any errors or misunderstandings are, of course, mine.

  1. The survey ran July 21, 2021, through July 23, 2021, and received more than 700 responses.
Categories: Technology

Radar trends to watch: August 2021

O'Reilly Radar - Mon, 2021/08/02 - 07:27

Security continues to be in the news: most notably the Kaseya ransomware attack, which was the first case of a supply chain ransomware attack that we’re aware of. That’s new and very dangerous territory. However, the biggest problem in security remains simple: take care of the basics. Good practices for authentication, backups, and software updates are the best defense against ransomware and many other attacks.

Facebook has said that it is now focusing on building the virtual reality Metaverse, which will be the successor to the web. To succeed, VR will have to get beyond ultra geeky goggles. But Google Glass showed the way, and that path is being followed by Apple and Facebook in their product development.

AI and Data
  • There’s a new technique for protecting natural language systems from attack by misinformation and malware bots: using honeypots to capture attackers’ key phrases proactively, and incorporate defenses into the training process.
  • DeepMind’s AlphaFold has made major breakthroughs in protein folding. DeepMind has released the source code for AlphaFold 2.0 on GitHub. DeepMind will also release the structure of every known protein. The database currently includes over 350,000 protein structures, but is expected to grow to over 100,000,000. This is of immense importance to research in biology and medicine.
  • Google searches can now tell you why a given result was included. It’s a minor change, but we’ve long argued that in AI, “why” may give you more information than “what.”
  • Researchers have been able to synthesize speech using the brainwaves of a patient who has been paralyzed and unable to talk. The process combines brain wave detection with models that predict the next word.
  • The National Institute of Standards and Technology (NIST) tests systems for identifying airline passengers for flight boarding. They claim that they have achieved 99.87% accuracy, without significant differences in performance between different demographic groups.
  • One attempt at adding imagination to AI works by combining different attributes of known objects. Humans are good at this: we can imagine a green dog, for example.
  • Phase precession is a recently discovered phenomenon by which neurons encode information in the timing of their firing.  It may relate to humans’ ability to learn on the basis of a small number of examples.
  • Yoshua Bengio, Geoff Hinton, and Yann LeCun give an assessment of the state of Deep Learning, its future, and its ability to solve problems.
  • AI is learning to predict human behavior from videos (e.g., movies). This research attempts to answer the question “What will someone do next?” in situations where there are large uncertainties. One trick is reverting to high-level concepts (e.g., “greet”) when the system can’t predict more specific behaviors (e.g., “shake hands”).
  • JAX is a new Python library for high-performance mathematics. It includes a just-in-time compiler, support for GPUs and TPUs, automatic differentiation, and automatic vectorization and parallelization.
  • Matrix is an open standard for a decentralized “conversation store” that is used as the background for many other kinds of applications. Germany has announced that it will use Matrix as the standard for digital messaging in its national electronic health records system.
  • Brython is Python 3.9.5 running in the browser, with access to the DOM.  It’s not a replacement for JavaScript, but there are a lot of clever things you can do with it.
  • Using a terminal well has always been a superpower. Warp is a new terminal emulator built in Rust with features that you’d never expect: command sharing, long-term cloud-based history, a true text editor, and a lot more.
  • Is it WebAssembly’s time? Probably not yet, but it’s coming. Krustlets allow you to run WebAssembly workloads under Kubernetes. There is also an alternative to a filesystem written in wasm; JupyterLite is an attempt to build a complete distribution of Jupyter, including JupyterLab, that runs entirely in the browser.
  • Google launches Intrinsic, a moonshot project to develop industrial robots.
  • 21st Century Problems: should autonomous delivery robots be allowed in bike lanes? The Austin (Texas) City Council already has to consider this issue.
  • Veins in materials? Researchers have greatly reduced the time it takes to build vascular systems into materials, which could have an important impact on our ability to build self-healing structures.
  • Researchers have designed fabrics that can cool the body by up to 5 degrees Celsius by absorbing heat and re-emitting it in the near-infrared range.
  • A bendable processor from ARM could be the future of wearable computing. It’s far from a state-of-the-art CPU, and probably will never be one, but with further development could be useful in edge applications that require flexibility.
  • Google experiments with error correction for quantum computers.  Developing error correction is a necessary step towards making quantum computers “real.”
  • Attackers have learned to scan repos like GitHub to find private keys and other credentials that have inadvertently been left in code that has been checked in. Checkov, a code analysis tool for detecting vulnerabilities in cloud infrastructure, can now find these credentials in code.
  • Amnesty International has released an open source tool for checking whether a phone has been compromised by Pegasus, the spyware sold by the NSO group to many governments, and used (among other things) to track journalists. Matthew Green’s perspective on “security nihilism” discusses the NSO’s activity; it is a must-read.
  • The REvil ransomware gang (among other things, responsible for the Kaseya attack, which infected over 1,000 businesses) has disappeared; all of its web sites went down at the same time. Nobody knows why; possibilities include pressure from law enforcement, reorganization, and even retirement.
  • DID is a newly proposed form of decentralized digital identity. It is currently being tested in the travel passports with COVID data that the International Air Transport Association is developing.
  • A massive ransomware attack by the REvil cybercrime group exploited supply chain vulnerabilities. The payload was implanted in a security product by Kaseya that is used to automate software installation and updates. The attack apparently only affects on-premises infrastructure. Victims are worldwide; the number of victims is in the “low thousands.”
  • Kubernetes is being used by the FancyBear cybercrime group, and other groups associated with the Russian government, to orchestrate a worldwide wave of brute-force attacks aimed at data theft and credential stealing.
  • Observability is the next step beyond monitoring.  That applies to data and machine learning, too, and is part of incorporating ML into production processes.
  • A new load balancing algorithm does a much better job of managing load at datacenters, and reduces power consumption by allowing servers to be shut down when not in use.
  • MicroK8S is a version of Kubernetes designed for small clusters that claims to be fault tolerant and self-healing, requiring little administration.
  • Calico is a Kubernetes plugin that simplifies network configuration. 
Web and Mobile
  • Scuttlebutt is a protocol for the decentralized web that’s “a way out of the social media rat race.”  It’s (by definition) “sometimes on,” not a constant presence.
  • Storywrangler is a tool for analyzing Twitter at scale.  It picks out the most popular word combinations in a large number of languages.
  • Google is adding support for “COVID vaccination passports” to Android devices.
  • Tim Berners-Lee’s Solid protocol appears to be getting real, with a small ecosystem of pod providers (online data stores) and apps.
  • Why are Apple and Google interested in autonomous vehicles? What’s the business model? They are after the last few minutes of attention. If you aren’t driving, you’ll be in an app.
Virtual Reality
  • Mark Zuckerberg has been talking up the Metaverse as the next stage in the Internet’s evolution: a replacement for the Web as an AR/VR world. But who will want to live in Facebook’s world?
  • Facebook is committing to the OpenXR standard for its Virtual Reality products. In August 2022, all new applications will be required to use OpenXR; its proprietary APIs will be deprecated.
  • The Open Voice Network is an industry association organized by the Linux Foundation that is dedicated to ethics in voice-driven applications. Their goal is to close the “trust gap” in voice applications.
Categories: Technology