Feed aggregator

Four short links: 31 January 2019

O'Reilly Radar - Thu, 2019/01/31 - 04:55

Locke the Thinkfluencer, Open Source Semiconductor Manufacturing, AR/VR, and IT's Recycling Shame

  1. Cory Doctorow at Grand Reopening of the Public Domain -- Locke was a thinkfluencer. No transcript yet, but audio ripped on the Internet Archive.
  2. Libre Silicon -- We develop a free and open source semiconductor manufacturing process standard and provide a quick, easy, and inexpensive way for manufacturing. No NDAs will be required anywhere to get started, making it possible to build the designs in your basement if you wish. We are aiming to revolutionize the market by breaking through the monopoly of proprietary closed-source manufacturers.
  3. Predicting Visual Discomfort with Stereo Displays -- In a third experiment, we measured phoria and the zone of clear single binocular vision, which are clinical measurements commonly associated with correcting refractive error. Those measurements predicted susceptibility to discomfort in the first two experiments. A simple predictor of whether and when you're going to puke with an AR/VR headset would be a wonderful thing. The perception of synthetic realities is weird: a friend told me about encountering a bug in a VR renderer that made him immediately (a) fall over, and (b) puke. Core dumped?
  4. A New Circular Vision for Electronics (World Economic Forum) -- getting coverage because it says: Each year, close to 50 million tonnes of electronic and electrical waste (e-waste) are produced, equivalent in weight to all commercial aircraft ever built; only 20% is formally recycled. If nothing is done, the amount of waste will more than double by 2050, to 120 million tonnes annually. [...] That same e-waste represents a huge opportunity. The material value alone is worth $62.5 billion (€55 billion), three times more than the annual output of the world’s silver mines and more than the GDP of most countries. There is 100 times more gold in a tonne of mobile phones than in a tonne of gold ore. (via Slashdot)
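
The WEF figures hide a little arithmetic worth making explicit. A quick sketch in Python using only the numbers quoted above, and assuming (the quote doesn't say so outright) that the $62.5 billion material value refers to one year's roughly 50 million tonnes:

```python
# Figures as quoted from the WEF report.
ewaste_mt_now = 50.0           # "close to 50 million tonnes" per year
recycled_share = 0.20          # "only 20% is formally recycled"
ewaste_mt_2050 = 120.0         # projected annual tonnage by 2050
material_value_usd = 62.5e9    # "$62.5 billion"

recycled_mt = ewaste_mt_now * recycled_share
growth_factor = ewaste_mt_2050 / ewaste_mt_now
# Assumption: the $62.5 billion values one year's ~50 Mt of e-waste.
usd_per_tonne = material_value_usd / (ewaste_mt_now * 1e6)

print(f"formally recycled: ~{recycled_mt:.0f} Mt/year")
print(f"growth to 2050: {growth_factor:.1f}x")
print(f"material value: ~${usd_per_tonne:,.0f} per tonne")
```

On those assumptions, about 10 million tonnes a year are formally recycled, "more than double" works out to 2.4x, and the raw materials are worth roughly $1,250 per tonne.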

Categories: Technology

Four short links: 30 January 2019

O'Reilly Radar - Wed, 2019/01/30 - 11:35

No Code, Enterprise Sales, Deep-Learning the Brain, and Computer Architecture

  1. The Rise of No Code -- As creating things on the internet becomes more accessible, more people will become makers. It’s no longer limited to the >1% of engineers who can code, resulting in an explosion of ideas from all kinds of people. We see “no code” projects on Product Hunt often. This is related to my ongoing interest in Ways In Which Programmers Are Automating Themselves Out of A Job. This might be bad for some low-complexity programmers in the short term, but good for society. Or it might be that the AI Apocalypse is triggered by someone's Glitch bot achieving sentience. Watch this space!
  2. My Losing Battle with Enterprise Sales (Luke Kanies) -- All that discounting you have to do for enterprise clients? It’s because procurement’s bonus is based on how much of a discount they force you to give. Absolutely everyone knows this is how it works, and that everyone knows this, so it’s just a game. I offer my product for a huge price, you try to force a discount, and then at the end we all compare notes to see how we did relative to market. Neither of us really wants to be too far out of spec; I want to keep my average prices the same, and you just want to be sure you aren’t paying too much. Luke tells all.
  3. Decoding Words from Brain Waves -- In each study, electrodes placed directly on the brain recorded neural activity while brain-surgery patients listened to speech or read words out loud. Then, researchers tried to figure out what the patients were hearing or saying. In each case, researchers were able to convert the brain's electrical activity into at least somewhat-intelligible sound files.
  4. A New Golden Age for Computer Architecture (ACM) -- the authors predict that future improvements in speed and energy efficiency will come from compiler technology and domain-specific architectures. This is a very good overview of how we got here, by way of Moore's Law, Dennard scaling, and Amdahl's Law.
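
Of the three laws, Amdahl's is the one that fits in a line of code: if only a fraction p of a program's work can be sped up by a factor s, the overall speedup is 1 / ((1 - p) + p / s). A minimal sketch in Python, with a hypothetical workload that is 90% parallelizable:

```python
def amdahl_speedup(parallel_fraction: float, n_processors: float) -> float:
    """Overall speedup when only `parallel_fraction` of the work scales across
    `n_processors`; the serial remainder (1 - parallel_fraction) does not."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if n_processors < 1:
        raise ValueError("n_processors must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_processors)


# Hypothetical program that is 90% parallelizable, on more and more cores.
for cores in (10, 100, 1000):
    print(cores, round(amdahl_speedup(0.9, cores), 2))
```

However many cores you add, the speedup creeps toward but never reaches 1 / (1 - 0.9) = 10x; the serial 10% sets the ceiling.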

Categories: Technology

How companies are building sustainable AI and ML initiatives

O'Reilly Radar - Tue, 2019/01/29 - 05:00

A recent survey investigated how companies are approaching their AI and ML practices, and measured the sophistication of their efforts.

In 2017, we published “How Companies Are Putting AI to Work Through Deep Learning,” a report based on a survey designed to help leaders better understand how organizations are applying AI through deep learning. We found companies were planning to use deep learning over the next 12-18 months. In 2018, we ran a follow-up survey to determine whether companies’ machine learning (ML) and AI initiatives are sustainable; the results are in our recently published report, “Evolving Data Infrastructure.”

The current generation of AI and ML methods and technologies relies on large amounts of data, specifically labeled training data. In order to have a longstanding AI and ML practice, companies need to have data infrastructure in place to collect, transform, store, and manage data. On one hand, we wanted to see whether companies were building out key components. On the other hand, we wanted to measure the sophistication of their use of these components. In other words, could we see a roadmap for transitioning from legacy use cases (perhaps some business intelligence) toward data science practices, and from there into the tooling required for more substantial AI adoption?

Here are some notable findings from the survey:

  • Companies are serious about machine learning and AI. Fifty-eight percent of respondents indicated that they were either building or evaluating data science platform solutions. Data science (or machine learning) platforms are essential for companies that are keen on growing their data science teams and machine learning capabilities.
  • Companies are building or evaluating solutions in foundational technologies needed to sustain success in analytics and AI. These include data integration and extract, transform, and load (ETL) (60% of respondents indicated they were building or evaluating solutions), data preparation and cleaning (52%), data governance (31%), metadata analysis and management (28%), and data lineage management (21%).
  • Data scientists and data engineers are in demand. When asked which were the main skills related to data that their teams needed to strengthen, 44% chose data science and 41% chose data engineering.
  • Companies are building data infrastructure in the cloud. Eighty-five percent indicated that they had data infrastructure in at least one of the seven cloud providers we listed, with nearly two-thirds (63%) using Amazon Web Services (AWS) for some portion of their data infrastructure. We found that users of AWS, Microsoft Azure, and Google Cloud Platform (GCP) tended to use multiple cloud providers.

Categories: Technology

Four short links: 29 January 2019

O'Reilly Radar - Tue, 2019/01/29 - 04:50

Git Tool, Linear Algebra, Steganography, and WebAssembly

  1. git-absorb -- git commit --fixup, but automatic.
  2. Coding the Matrix -- linear algebra was where math broke me at university, so my eyes are always drawn to presentations of the subject that promise relevance and comprehensibility. (via Academic Torrents)
  3. A List of Useful Steganography Tools and Resources -- what it says on the box.
  4. Analyzing the Performance of WebAssembly vs. Native Code -- Across the SPEC CPU suite of benchmarks, we find a substantial performance gap: applications compiled to WebAssembly run slower by an average of 50% (Firefox) to 89% (Chrome), with peak slowdowns of 2.6x (Firefox) and 3.14x (Chrome). We identify the causes of this performance degradation, some of which are due to missing optimizations and code generation issues, while others are inherent to the WebAssembly platform.
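
It helps to translate the paper's percentages into runtime ratios: "slower by 50%" means a WebAssembly build takes 1.5x as long as native code, so a hypothetical 10-second native run would take 15 s in Firefox and about 18.9 s in Chrome at the reported averages. Benchmark suites such as SPEC conventionally summarize per-benchmark ratios with a geometric mean; that the paper's averages are computed exactly this way is an assumption here, and the per-benchmark ratios below are made up for illustration:

```python
import math


def slowdown_ratio(percent_slower: float) -> float:
    """'Slower by X%' as a runtime ratio: 50% slower means 1.5x native time."""
    return 1.0 + percent_slower / 100.0


def geometric_mean(ratios):
    """Geometric mean of per-benchmark ratios (wasm runtime / native runtime)."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# The reported averages as ratios, applied to a hypothetical 10-second native run.
native_seconds = 10.0
for browser, pct in (("Firefox", 50), ("Chrome", 89)):
    print(f"{browser}: {native_seconds * slowdown_ratio(pct):.1f} s")

# Made-up per-benchmark ratios, for illustration only.
ratios = [1.2, 1.5, 2.6]
print(f"geometric mean slowdown: {geometric_mean(ratios):.2f}x")
```

The geometric mean suits ratios because it is symmetric: the average slowdown of wasm relative to native is exactly the reciprocal of the average speedup of native relative to wasm, which an arithmetic mean does not guarantee.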

Categories: Technology

Four short links: 28 January 2019

O'Reilly Radar - Mon, 2019/01/28 - 06:15

Medical AI, Opinion Mapping, Voting-Free Democracy, and a Typed Graph Database

  1. AI Helps Amputees Walk With a Robotic Knee (IEEE) -- Normally, human technicians spend hours working with amputees to manually adjust robotic limbs to work well with each person’s style of walking. By comparison, the reinforcement learning technique automatically tuned a robotic knee, enabling the prosthetic wearers to walk smoothly on level ground within 10 minutes.
  2. Penelope -- a cloud-based, open, and modular platform that consists of tools and techniques for mapping landscapes of opinions expressed in online (social) media. The platform is used for analyzing the opinions that dominate the debate on certain crucial social issues, such as immigration, climate change, and national identity. Penelope is part of the H2020 EU project ODYCCEUS (Opinion Dynamics and Cultural Conflict in European Spaces).
  3. What MMOs Can Teach Us About Real-Life Politics -- Larry Lessig is designing the political mechanics for a videogame, and this interview is very intriguing. Lessig is also interested in possibly implementing an in-game process in which democracy doesn’t depend on voting: “I’m eager to experiment or enable the experimentation of systems that don’t need to be tied so much to election.” (via BoingBoing)
  4. The AtomSpace: a Typed Graphical Distributed in-RAM Knowledgebase (OpenCog) -- Here’s my sales pitch: you want a graph database with a sophisticated type system built into it. Maybe you don’t know this yet. But you do. You will. You’ll have trouble doing anything reasonable with your knowledge (like reasoning, inferencing, and learning) if you don’t. This is why the OpenCog AtomSpace is a graph database, with types.
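
To make "a graph database with types" concrete, here is a toy in-memory typed graph in Python. It is not the AtomSpace API; the type names, the `TYPE_PARENT` hierarchy, and the `LINK_SIGNATURE` table are all hypothetical. What it shows is that type checks can happen when an edge is inserted, and that a query for a supertype returns all of its subtypes:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical type hierarchy: each type maps to its parent (None = root).
TYPE_PARENT: Dict[str, Optional[str]] = {
    "Atom": None,
    "Node": "Atom",
    "Concept": "Node",
    "Predicate": "Node",
    "Link": "Atom",
    "Inheritance": "Link",
}

# Hypothetical signatures: the type each argument slot of a link must have.
LINK_SIGNATURE: Dict[str, Tuple[str, ...]] = {
    "Inheritance": ("Concept", "Concept"),
}


def is_subtype(t: Optional[str], ancestor: str) -> bool:
    """Walk up the hierarchy from t; True if we reach ancestor."""
    while t is not None:
        if t == ancestor:
            return True
        t = TYPE_PARENT[t]
    return False


@dataclass(frozen=True)
class Atom:
    kind: str                            # a key of TYPE_PARENT
    name: str = ""                       # used by nodes
    outgoing: Tuple["Atom", ...] = ()    # used by links


class TypedGraph:
    def __init__(self) -> None:
        self.atoms: Set[Atom] = set()

    def add(self, atom: Atom) -> Atom:
        """Insert an atom, rejecting links whose arguments have the wrong types."""
        sig = LINK_SIGNATURE.get(atom.kind)
        if sig is not None:
            if len(sig) != len(atom.outgoing):
                raise TypeError(f"{atom.kind} expects {len(sig)} arguments")
            for expected, arg in zip(sig, atom.outgoing):
                if not is_subtype(arg.kind, expected):
                    raise TypeError(f"{atom.kind} needs {expected}, got {arg.kind}")
        for arg in atom.outgoing:  # a link's targets become part of the graph
            self.add(arg)
        self.atoms.add(atom)
        return atom

    def of_type(self, kind: str) -> List[Atom]:
        """All atoms of `kind` or any of its subtypes."""
        return [a for a in self.atoms if is_subtype(a.kind, kind)]


g = TypedGraph()
cat, animal = Atom("Concept", "cat"), Atom("Concept", "animal")
g.add(Atom("Inheritance", outgoing=(cat, animal)))
print(sorted(a.name for a in g.of_type("Node")))  # ['animal', 'cat']
```

Asking for `"Node"` returns both concepts because `Concept` is a subtype of `Node`, while an `Inheritance` link pointing at a `Predicate` is rejected with a `TypeError` before anything is stored.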

Categories: Technology

Rethinking informed consent

O'Reilly Radar - Mon, 2019/01/28 - 05:00

Consent is the first step toward the ethical use of data, but it's not the last.

Informed consent is part of the bedrock of data ethics. DJ Patil, Hilary Mason, and I have written about it, as have many others. It's rightfully part of every code of data ethics I've seen. But I have to admit misgivings—not so much about the need for consent, but about what it means. Obtaining consent to collect and use data isn't the end of the process; at best, it's the beginning, and perhaps not a very good one.

Helen Nissenbaum, in an interview with Scott Berinato, articulates some of the problems. It's easy to talk about informed consent, but what do we mean by "informed"? Almost everyone who reads this article has consented to some kind of medical procedure; did any of us have a real understanding of what the procedure was and what the risks were? We rely on the prestige or status of the doctor, but unless we're medical professionals, or have done significant online research, we have, at best, a vague notion of what's going to happen and what the risks are. In medicine, for the most part, things come out all right. The problems with consent to data collection are much deeper.

The problem starts with the origin of the consent criterion. It comes from medicine and the social sciences, in which consenting to data collection and to being a research subject has a substantial history. It arose out of experiments with mind-boggling ethical problems (for example, the Tuskegee syphilis experiment), and it still isn't always observed (paternalism is still a thing). "Consent" in medicine is limited: whether or not you understand what you're consenting to, you are consenting to a single procedure (plus emergency measures if things go badly wrong). The doctor can't come back and do a second operation without further consent. And likewise, "consent" in the social sciences is limited to a single study: you become a single point in an array of data that ceases to exist when the study is complete.

That may have been true years ago, but those limitations on how consent is used seem very shaky, as Nissenbaum argues. Consent is fundamentally an assurance about context: consenting to a medical procedure means the doctors do their stuff, and that's it. The outcome might not be what you want, but you've agreed to take the risk. But what about the insurance companies? They get the data, and they can repackage and exchange it. What happens when, a few years down the road, you're denied coverage because of a "pre-existing condition"? That data has moved beyond the bounds of an operating room. What happens when data from an online survey or social media profile is shared with another organization and combined and re-combined with other data? When it is used in other contexts, can it be de-anonymized and used to harm the participants? That single point in an array of data has now become a constellation of points feeding many experiments, not all of which are benign.

I'm haunted by the question, "what are users consenting to?" Technologists rarely think through the consequences of their work carefully enough; but even if they did, there will always be consequences that can't be foreseen or understood, particularly when data from different sources is combined. So, consenting to data collection, whether it's clicking on the ever-present checkbox about cookies or agreeing to Facebook's license agreement, is significantly different from agreeing to surgery. We really don't know how that data is used, or might be used, or could be used in the future. To use Nissenbaum's language, we don't know where data will flow, nor can we predict the contexts in which it will be used.

Consent frequently isn't optional, but compelled. Writing about the #DeleteFacebook movement, Jillian York argues that for many, deleting Facebook is not an option: "for people with marginalized identities, chronic illnesses, or families spread across the world, walking away [from Facebook] means leaving behind a potentially vital safety net of support." She continues by writing that small businesses, media outlets, artists, and activists rely on it to reach audiences. While no one is compelled to sign up, or to remain a user, for many "deleting Facebook" means becoming a non-entity. If Facebook is your only way to communicate with friends, relatives, and support communities, refusing "consent" may not be an option; consent is effectively compelled. The ability to withdraw consent from Facebook is a sign of privilege. If you lack privilege, an untrustworthy tool may be better than no tool at all.

One alternative to consent is the idea that you own the data and should be compensated for its use. Eric Posner, Glen Weyl, and others have made this argument, which essentially substitutes a market economy for consent: if you pay me enough, I'll let you use my data. However, markets don’t solve many problems. In "It's time for a bill of data rights," Martin Tisne argues that data ownership is inadequate. When everything you do creates data, it's no more meaningful to own your "digital shadow" than your physical one. How do you "own" your demographic profile? Do you even "own" your medical record? Tisne writes: "A person doesn’t 'own' the fact that she has diabetes—but she can have the right not to be discriminated against because of it... But absent government regulation to prevent health insurance companies from using data about preexisting conditions, individual consumers lack the ability to withhold consent. ... Consent, to put it bluntly, does not work." And it doesn't work whether or not consent is mediated by a market. At best, the market may give some incremental income, but at worst, it gives users incentives to act against their best interest.

It's also easy to forget that in many situations, users are compensated for their data: we're compensated by the services that Facebook, Twitter, Google, and Amazon provide. And that compensation is significant; how many of us could do our jobs without Google? The economic value of those services to me is large, and the value of my data is actually quite limited. To Google, the dozens of Google searches I do in a day are worth a few cents at most. Google's market valuation doesn't derive from the value of my data or yours in isolation, but the added value that comes from aggregating data across billions of searches and other sources. Who owns that added value? Not me. An economic model for consent (I consent to let you use my data if you pay me) misses the point: data’s value doesn’t live with the individual.

It would be tragic to abandon consent, though I agree with Nissenbaum that we urgently need to get beyond "incremental improvement to consent mechanisms." It is time to recognize that consent has serious limitations, due partly to its academic and historical origins. It's important to gain consent for participation in an experiment; otherwise, the subject isn't a participant but a victim. However, while understanding the consequences of any action has never been easy, the consent criterion arose when consequences were far more limited and data didn't spread at the speed of light.

So, the question is: how do we get beyond consent? What kinds of controls can we place on the collection and use of data that align better with the problems we're facing? Tisne suggests a "data bill of rights": a set of general legal principles about how data can be used. The GDPR is a step in this direction; the Montreal Declaration for the Responsible Development of Artificial Intelligence could be reformulated as a "bill of data rights." But a data bill of rights assumes a new legal infrastructure, and by nature such infrastructures place the burden of redress on the user. Would one bring a legal action against Facebook or Google for violation of one's data rights? Europe's enforcement of GDPR will provide an important test case, particularly since GDPR is essentially about data flows and contexts. It isn't clear that our current legal institutions can keep pace with the many flows and contexts in which data travels.

Nissenbaum starts from the knowledge that data moves, and that the important questions aren't around how our data is used, but where our data travels. This shift in perspective is important precisely because data sets become more powerful when they're combined; because it isn't possible to anticipate all the ways data might be used; and because once data has started flowing, it's very hard to stop it. But we have to admit we don't yet know how to ask for consent about data flows or how to prove they are under control. Which data flows should be allowed? Which shouldn't? We want to enable medical research on large aggregated data sets without jeopardizing the insurance coverage of the people whose data are in those sets. Data would need to carry metadata with it that describes where it could be transferred and how it could be used once it's transferred; it makes no sense to talk about controlling data flows if that control can't be automated.
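
What might data that carries its own flow metadata look like? A minimal sketch in Python, assuming a made-up policy format (`FlowPolicy`, listing allowed recipients and purposes) rather than any existing standard. The check is automated, and combining two data sets keeps only the flows both inputs allowed, which is one conservative answer to the problem of recombined data:

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class FlowPolicy:
    """Hypothetical metadata that travels with a data set."""
    allowed_recipients: FrozenSet[str]
    allowed_purposes: FrozenSet[str]


@dataclass(frozen=True)
class TaggedDataset:
    name: str
    policy: FlowPolicy


def may_flow(ds: TaggedDataset, recipient: str, purpose: str) -> bool:
    """Automated check: is this transfer permitted by the data's own metadata?"""
    return (recipient in ds.policy.allowed_recipients
            and purpose in ds.policy.allowed_purposes)


def combine(a: TaggedDataset, b: TaggedDataset, name: str) -> TaggedDataset:
    """A combined data set may only flow where *both* inputs were allowed to flow."""
    return TaggedDataset(name, FlowPolicy(
        a.policy.allowed_recipients & b.policy.allowed_recipients,
        a.policy.allowed_purposes & b.policy.allowed_purposes,
    ))


records = TaggedDataset("clinical_records", FlowPolicy(
    frozenset({"research_consortium"}), frozenset({"aggregate_research"})))
survey = TaggedDataset("online_survey", FlowPolicy(
    frozenset({"research_consortium", "ad_partner"}),
    frozenset({"aggregate_research", "advertising"})))

merged = combine(records, survey, "merged")
print(may_flow(merged, "research_consortium", "aggregate_research"))  # True
print(may_flow(merged, "insurer", "underwriting"))                    # False
print(may_flow(merged, "ad_partner", "advertising"))                  # False
```

The merged set can still feed aggregate medical research but cannot reach an insurer or an ad partner. A real policy language would likely need recipient/purpose pairs, expiry, and provenance, and any human override would sit outside a check like this; the sketch only shows that the check itself can be mechanical.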

As Ben Lorica and I have argued, the only way forward is through more automation, not less; issues of scale won't let us have it any other way. In a conversation, Andrew Zaldivar told me of his work with Margaret Mitchell, Timnit Gebru, and others, on model cards that describe the behavior of a machine learning model, and of Timnit Gebru's work on Datasheets for Datasets, which specify how a data set was collected, how it is intended to be used, and other information. Model cards and data set datasheets are a step toward the kind of metadata we'd need to automate control over data flows, to build automated tools that manage where data can and can't travel, to protect public goods as well as personal privacy. In the past year, we’ve seen how easy it is to be overly optimistic about tool building, but we are all already using data at the scale of Google and Facebook. There will need to be human systems that override automatic control over data flows, but automation is an essential ingredient.

Consent is the first step along the path toward ethical use of data, but not the last one. What is the next step?

Categories: Technology

Four short links: 25 January 2019

O'Reilly Radar - Fri, 2019/01/25 - 05:25

IT Failures, Paradigms, AI Governance, Quantum Hokum

  1. Biggest IT Failures of 2018 (IEEE) -- a coding error with the spot-welding robots at Subaru’s Indiana Automotive plant in Lafayette, Ind., meant 293 of its new Subaru Ascents had to be sent to the car crusher. A similar problem is suspected as the reason behind the welding problems affecting the steering on Fiat Chrysler Jeep Wranglers. This is not the "crushing it" that brogrammers intended.
  2. Programming Paradigms for Dummies: What Every Programmer Should Know -- This chapter gives an introduction to all the main programming paradigms, their underlying concepts, and the relationships between them. We give a broad view to help programmers choose the right concepts they need to solve the problems at hand. We give a taxonomy of almost 30 useful programming paradigms and how they are related. Most of them differ only in one or a few concepts, but this can make a world of difference in programming. (via Adrian Colyer)
  3. Proposed Model Governance -- Singapore Government's work on regulating AI.
  4. Talent Shortage in Quantum Computing (MIT) -- an argument that we need special training for quantum computing, as it's a mix of engineering and science at this stage in its evolution. This chap would disagree, colorfully: when a subject which claims to be a technology, which lacks even the rudiments of experiment that may one day make it into a technology, you can know with absolute certainty that this "technology" is total nonsense. That was the politest quote I could make.

Categories: Technology

Four short links: 24 January 2019

O'Reilly Radar - Thu, 2019/01/24 - 04:55

Computational Periscopy, Automating Data Structures, Multi-Stream Processing, and Open Source Bioinstruments

  1. Computational Periscopy with an Ordinary Camera (Nature) -- Here we introduce a two-dimensional computational periscopy technique that requires only a single photograph captured with an ordinary digital camera. Our technique recovers the position of an opaque object and the scene behind (but not completely obscured by) the object, when both the object and scene are outside the line of sight of the camera, without requiring controlled or time-varying illumination. Such recovery is based on the visible penumbra of the opaque object having a linear dependence on the hidden scene that can be modeled through ray optics. Computation and vision, whether deep learning or this kind of mathematical witchcraft, has brought about an age of truly amazing advances. Digital cameras are going to make film cameras look like pinhole cameras because the digital feature set will be staggering. (All of it requiring computational power, on- or off-device.)
  2. The Data Calculator: Data Structure Design and Cost Synthesis From First Principles, and Learned Cost Models -- We present a design engine, the Data Calculator, which enables interactive and semi-automated design of data structures. It brings two innovations. First, it offers a set of fine-grained design primitives that capture the first principles of data layout design: how data structure nodes lay out data, and how they are positioned relative to each other. This allows for a structured description of the universe of possible data structure designs that can be synthesized as combinations of those primitives. The second innovation is computation of performance using learned cost models. I'm always interested in augmentation for programmers. (via Adrian Colyer)
  3. Confluo (Berkeley) -- open source system for real-time distributed analysis of multiple data streams. Confluo simultaneously supports high throughput concurrent writes, online queries at millisecond timescales, and CPU-efficient ad hoc queries via a combination of data structures carefully designed for the specialized case of multiple data streams, and an end-to-end optimized system design. The home page has more information. Designing for multiple data streams is an interesting architectural choice. Any interesting business will track multiple data streams, but will they do that in one system or bolt together multiple systems?
  4. Open-Sourcing Bioinstruments -- the story of the Poseidon syringe pump system, which has free hardware designs and software.
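
The key sentence in the periscopy abstract is that the penumbra depends linearly on the hidden scene, which makes recovery a linear inverse problem: photo = A · scene + noise, solve for the scene. A toy version in Python with NumPy, in which the forward matrix `A` is random rather than derived from ray optics, and the occluder's position (which the paper also recovers) is assumed known and baked into `A`:

```python
import numpy as np

rng = np.random.default_rng(0)

n_scene = 50     # unknown hidden-scene intensities (hypothetical size)
n_pixels = 200   # measured pixel values in the photo (hypothetical size)

# Stand-in forward model. In the paper this would come from ray optics and the
# occluder's geometry; here it is just a random nonnegative matrix.
A = rng.uniform(0.0, 1.0, size=(n_pixels, n_scene))
x_true = rng.uniform(0.0, 1.0, size=n_scene)

y = A @ x_true + rng.normal(0.0, 1e-3, size=n_pixels)  # photo = A x + noise


def reconstruct(A, y, lam=1e-6):
    """Tikhonov-regularized least squares: argmin ||A x - y||^2 + lam ||x||^2."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)


x_hat = reconstruct(A, y)
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
print(f"relative reconstruction error: {rel_err:.2e}")
```

A realistic forward matrix may be much worse conditioned than a random one, which is where the regularization weight `lam` earns its keep; in this toy it barely matters.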

Categories: Technology

7 web dev trends on our radar

O'Reilly Radar - Thu, 2019/01/24 - 04:00

Experts weigh in on GraphQL, machine learning, React, micro-frontends, and other trends that will shape web development.

The Greek philosopher Heraclitus’ saying that “change is the only constant in life” resonates strongly with web developers. We asked our community of experts for their take on the tools and trends that will usher in the greatest changes for the web development world in the coming months.

GraphQL leaves the nest

2019 is going to be a big year for figuring out how larger organizations are going to work with GraphQL at scale. For companies that aren't set up like Facebook, reconciling large-scale service-oriented architecture (SOA) efforts with GraphQL will require thinking around schema composition and quite a bit of exciting tooling to make development fast and easy. — Adam Neary, Tech Lead at Airbnb

Machine learning in the browser

With machine learning (ML) shifting toward a larger developer audience, we expect to see new use cases for ML in the browser and connected IoT devices such as the Raspberry Pi and Google AIY projects. Tools like TensorFlow and TensorFlow.js are enabling developers to build ML-enabled applications without first completing their PhDs, while easier-to-use APIs in TensorFlow.js and TensorFlow with Keras are lowering the barrier to building deep learning models. These advances make it possible to quickly deploy off-the-shelf models and research paper models into production environments. — Nick Kreeger, Senior Software Engineer, Google

React introduces the notion of hooks

The announcement of React Hooks demonstrates how the React team is making great decisions about the future of the library. In 2019, teams using React will be able to opt in to new features, and it's very likely that other organizations will move to React due to the strength of these new proposals. — Alex Banks and Eve Porcello, Software Engineers, Moon Highway

Micro-frontends and/or ES6 modules scale frontend applications

The frontend ecosystem is looking for a better way to collaborate with distributed teams in medium to large projects in order to speed up the delivery of new features or products. Micro-frontend principles and ES6 modules are the answer to this challenge, bringing to the table a smart and consistent way to slice an application via subdomains. — Luca Mezzalira, Chief Architect, DAZN

Less typing for mobile users

For mobile web users, web authentication and web payments will finally appear on a lot of sites in 2019, allowing users to log in and pay without entering any details in a web form, just by accessing native features from the browsers. — Maximiliano Firtman, Mobile and Web Developer and Consultant

Progressive web apps catch on

Ever since Alex Russell and Frances Berriman first described them in 2015, people have been talking about progressive web apps (PWAs). In 2019 they’ll become increasingly pervasive. Developers are likely to write more PWAs primarily because the libraries they use, such as Redux and Firebase, encourage them to design apps that align with a PWA architecture. — David Griffiths and Dawn Griffiths, authors of Head First Kotlin

MobX and MobX State Tree usage expands

MobX has gotten a lot of traction in the past year, and it will be consolidated even further in 2019. It’s the perfect companion for working in combination with React. Its reactivity model makes it really easy to implement an application end to end—particularly when you use the MobX State Tree to provide a structure to your projects. — Luca Mezzalira

Categories: Technology

Four short links: 23 January 2019

O'Reilly Radar - Wed, 2019/01/23 - 06:10

NLP, Verified Software, LiveJournal, and Personal CRM

  1. Zero-Shot Transfer Across 93 Languages (Facebook) -- we have significantly expanded and enhanced our LASER (Language-Agnostic SEntence Representations) toolkit. We are now open-sourcing our work, making LASER the first successful exploration of massively multilingual sentence representations to be shared publicly with the NLP community. The toolkit now works with more than 90 languages, written in 28 different alphabets.
  2. Formally Verified Software in the Real World (CACM) -- This was not the first autonomous flight of the AH-6, dubbed the Unmanned Little Bird (ULB); it had been doing them for years. This time, however, the aircraft was subjected to mid-flight cyber attacks. The central mission computer was attacked by rogue camera software as well as by a virus delivered through a compromised USB stick that had been inserted during maintenance. The attack compromised some subsystems but could not affect the safe operation of the aircraft.
  3. The Linux of Social Media: How LiveJournal Pioneered Then Lost Web Blogging -- “We were always saying we were fighting for the users, that we would run everything by the community before we did anything,” says Mark Smith, a software engineer who worked on LiveJournal and became the co-creator of Dreamwidth. “Well, as it turns out, when you do that, you end up with the community telling you they want everything to stay the same, forever."
  4. Monica -- open source personal CRM. Monica helps you organize the social interactions with your loved ones.

Categories: Technology

The trinity of errors in financial models: An introductory analysis using TensorFlow Probability

O'Reilly Radar - Tue, 2019/01/22 - 05:00

An exploration of three types of errors inherent in all financial models.

At Hedged Capital, an AI-first financial trading and advisory firm, we use probabilistic models to trade the financial markets. In this blog post, we explore three types of errors inherent in all financial models, with a simple example of a model in TensorFlow Probability (TFP).

Finance is not physics

Adam Smith, generally recognized as the founder of modern economics, was in awe of Newton’s laws of mechanics and gravitation. Ever since then, economists have endeavored to make their discipline into a science like physics. They aspire to formulate theories that accurately explain and predict the economic activities of human beings at the micro and macro levels. This desire gathered momentum in the early 20th century with economists like Irving Fisher and culminated in the Econophysics movement of the late 20th century.

Figure 1. Image by Mike Shwe and Deepak Kanungo. Used with permission.

Despite all the complicated mathematics of modern finance, its theories are woefully inadequate, especially when compared to those of physics. For instance, physics can predict the motion of the moon and the electrons in your computer with jaw-dropping precision. These predictions can be calculated by any physicist, at any time, anywhere on the planet. By contrast, market participants have trouble explaining the causes of daily market movements or predicting the price of a stock at any time, anywhere in the world.

Perhaps finance is harder than physics. Unlike atoms and pendulums, people are complex, emotional beings with free will and latent cognitive biases. They tend to behave inconsistently and continually react to the actions of others. Furthermore, market participants profit by beating or gaming the systems in which they operate.

After losing a fortune on his investment in the South Sea Company, Newton remarked, “I can calculate the movement of the stars, but not the madness of men.” Note that Newton was not “retail dumb money.” He served at England's Royal Mint for about 31 years, first as Warden and then as Master, helping put the British pound on the gold standard, where it would stay for over two centuries.

All financial models are wrong

Models are used to simplify the complexity of the real world, enabling us to focus on the features of a phenomenon that interest us. Clearly, a map will not be able to capture the richness of the terrain it models. George Box, a statistician, famously quipped, “All models are wrong, but some are useful.”

This observation is particularly applicable to finance. Some academics have even argued that financial models are not only wrong, but also dangerous; the veneer of a physical science lulls adherents of economic models into a false sense of certainty about the accuracy of their predictive powers. This blind faith has led to many disastrous consequences for their adherents and for society at large. The most successful hedge fund in history, Renaissance Technologies, has put its critical views of financial theories into practice. Instead of hiring people with a finance or Wall Street background, they prefer to hire physicists, mathematicians, statisticians, and computer scientists. They trade the markets using quantitative models based on non-financial theories such as information theory, data science, and machine learning.

Whether financial models are based on academic theories or empirical data mining strategies, they are all subject to the trinity of modeling errors explained below. All models, therefore, need to quantify the uncertainty inherent in their predictions. Errors in analysis and forecasting may arise from any of the following modeling issues: using an inappropriate functional form, inputting inaccurate parameters, or failing to adapt to structural changes in the market.

The trinity of modeling errors

1. Errors in model specification: Almost all financial theories use the normal distribution in their models. For instance, the normal distribution is the foundation upon which Markowitz’s Modern Portfolio Theory and Black-Scholes-Merton Option Pricing Theory are built. However, it is a well-documented fact that stocks, bonds, currencies, and commodities have fat-tailed distributions. In other words, extreme events occur far more frequently than predicted by the normal distribution.

If asset price returns were normally distributed, none of the following financial disasters would occur within the age of the universe: Black Monday, the Mexican Peso Crisis, the Asian Currency Crisis, the bankruptcy of Long Term Capital Management (whose partners incidentally included two “Nobel prize”-winning economists), or the Flash Crash. “Mini flash crashes” of individual stocks occur even more frequently than these macro events.
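To get a feel for how thin the normal distribution's tails are, the sketch below compares the probability of a single-day drop of k standard deviations under a normal model and under a fat-tailed Student's t distribution with 3 degrees of freedom, rescaled to unit variance. The 20-standard-deviation move, the 252 trading days per year, and the choice of t(3) as the fat-tailed alternative are illustrative assumptions, not figures from the authors.

```python
import math

AGE_OF_UNIVERSE_YEARS = 13.8e9  # approximate age of the universe
TRADING_DAYS_PER_YEAR = 252     # assumed: a common convention for equity markets

def normal_lower_tail(k):
    """P(Z <= -k) for a standard normal Z."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))

def student_t3_lower_tail(k):
    """P(X <= -k) for a Student's t(3) variable rescaled to unit variance.

    A t(3) variable T has variance 3, so X = T / sqrt(3) has variance 1 and
    P(X <= -k) = P(T <= -k * sqrt(3)). The closed-form t(3) CDF then reduces to
    (atan(1/k) - k / (1 + k**2)) / pi for k > 0, written this way to avoid
    subtracting two nearly equal numbers.
    """
    return (math.atan(1.0 / k) - k / (1.0 + k * k)) / math.pi

k = 20.0  # hypothetical size of a crash day, in standard deviations
days = AGE_OF_UNIVERSE_YEARS * TRADING_DAYS_PER_YEAR
for name, tail in (("normal", normal_lower_tail), ("t(3)", student_t3_lower_tail)):
    p = tail(k)
    print(f"{name:>6}: P(one day) = {p:.3g}, expected such days over the universe's age = {p * days:.3g}")
```

Under the normal model the expected count is vanishingly small even over the age of the universe, while under the fat-tailed t(3) model the same move is merely rare (on the order of once in tens of thousands of trading days).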

Yet, finance textbooks, programs, and professionals continue to use the normal distribution in their asset valuation and risk models because of its simplicity and analytical tractability. These reasons are no longer justifiable given today’s advanced algorithms and computational resources. This reluctance to abandon the normal distribution is a clear example of “the drunkard’s search”: a principle derived from a joke about a drunkard who loses his key in the darkness of a park but frantically searches for it under a lamppost because that’s where the light is.

2. Errors in model parameter estimates: Errors of this type may arise because market participants have access to different levels of information, delivered at varying speeds. They also differ in how sophisticated they are at processing that information and in their cognitive biases. These factors lead to profound epistemic uncertainty about model parameters.

Let’s consider the specific example of interest rates. Fundamental to the valuation of any financial asset, interest rates are used to discount the asset’s uncertain future cash flows and estimate its value in the present. At the consumer level, for example, credit cards have variable interest rates pegged to a benchmark called the prime rate. This rate generally changes in lockstep with the federal funds rate, an interest rate of seminal importance to the U.S. and world economies.
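As a minimal illustration of discounting (not taken from the original post), the present value of a stream of cash flows is the sum of each flow divided by (1 + r)^t, so a higher rate r lowers today's value. The cash flows and rates below are made up for illustration.

```python
def present_value(cash_flows, rate):
    """Discount cash flows received at the end of years 1, 2, ... at a flat annual rate."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows, start=1))

# Hypothetical asset paying 100 per year for five years.
flows = [100.0] * 5
for r in (0.03, 0.05, 0.07):
    print(f"rate {r:.0%}: present value {present_value(flows, r):.2f}")
```

Because every term has the rate in its denominator, any error in the rate estimate propagates directly into the asset's estimated value.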

Let’s imagine that you would like to estimate the interest rate on your credit card one year from now. Suppose the current prime rate is 2% and your credit card company charges you 10% plus prime. Given the strength of the current economy, you believe the Federal Reserve is more likely to raise interest rates than not. The Fed will meet eight times in the next 12 months and at each meeting will either raise the federal funds rate by 0.25 percentage points or leave it unchanged.

In the following TFP code example (see entire Colab), we use the binomial distribution to model your credit card’s interest rate at the end of the 12-month period. Specifically, we’ll use the TensorFlow Probability Binomial distribution class with the following parameters: total_count = 8 (the number of trials, or meetings) and probs = {0.6, 0.7, 0.8, 0.9}, our range of estimates for the probability that the Fed raises the federal funds rate by 0.25 percentage points at each meeting.

# Imports assumed by the Colab (not shown in the original excerpt).
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

# First we encode our assumptions.
num_times_fed_meets_per_year = 8.
possible_fed_increases = tf.range(
    start=0., limit=num_times_fed_meets_per_year + 1)  # 0, 1, ..., 8 hikes
possible_cc_interest_rates = 2. + 10. + 0.25 * possible_fed_increases
prob_fed_raises_rates = tf.constant([0.6, 0.7, 0.8, 0.9])

# Now we use TFP to compute probabilities in a vectorized manner.
# Pad a dim so we broadcast fed probs against CC interest rates.
prob_fed_raises_rates = prob_fed_raises_rates[..., tf.newaxis]  # shape [4, 1]
prob_cc_interest_rate = tfd.Binomial(
    total_count=num_times_fed_meets_per_year,
    probs=prob_fed_raises_rates).prob(possible_fed_increases)  # shape [4, 9]

In the graphs below, notice how the probability distribution for your credit card rate in 12 months depends critically on your estimate of the probability of the Fed raising rates at each of the eight meetings. For every increase of 0.1 in that estimate, the expected interest rate on your credit card in 12 months rises by 0.2 percentage points: the expected number of hikes is 8 × probs, and each hike adds 0.25 points.

Figure 2. Image by Josh Dillon and Deepak Kanungo. Used with permission.
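The sensitivity described above follows from the binomial mean. The plain-Python sketch below (standard library only, independent of the TFP Colab) recomputes the distribution of your card rate and its mean for each assumed value of probs, under the same assumptions: a 2% prime rate, a 10% margin, eight meetings, and 0.25-point hikes.

```python
from math import comb

NUM_MEETINGS = 8        # FOMC meetings in the next 12 months
BASE_RATE = 2.0 + 10.0  # assumed 2% prime rate plus the card's 10% margin
STEP = 0.25             # each hike, in percentage points

def rate_distribution(p):
    """Map each possible card rate (percent) to its binomial probability.

    p is the assumed, constant probability that the Fed hikes at any one meeting.
    """
    return {
        BASE_RATE + STEP * k: comb(NUM_MEETINGS, k) * p**k * (1 - p) ** (NUM_MEETINGS - k)
        for k in range(NUM_MEETINGS + 1)
    }

def expected_rate(p):
    """Mean card rate under rate_distribution(p); analytically 12 + 0.25 * 8 * p."""
    return sum(rate * prob for rate, prob in rate_distribution(p).items())

for p in (0.6, 0.7, 0.8, 0.9):
    print(f"probs={p:.1f}: expected card rate {expected_rate(p):.2f}%")
```

The loop reproduces the pattern in the graphs: each 0.1 step in probs moves the mean by 8 × 0.1 × 0.25 = 0.2 points.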

Even if all market participants used the binomial distribution in their models, it’s easy to see how they could disagree about the future prime rate because of differences in their estimates of probs. Indeed, this parameter is hard to estimate. Many institutions have dedicated analysts, including former Fed employees, who scrutinize the Fed’s every document, speech, and event to try to estimate it.

Recall that we assumed this parameter probs was constant in our model for each of the next eight Fed meetings. How realistic is that? Members of the Federal Open Market Committee (FOMC), the rate-setting body, are not just a set of biased coins. They can and do change their individual biases based on how the economy changes over time. The assumption that the parameter probs will be constant over the next 12 months is not only unrealistic, but also risky.

3. Errors from the failure of a model to adapt to structural changes: The underlying data-generating stochastic process may vary over time—i.e., the process is not stationary ergodic. We live in a dynamic capitalist economy characterized by technological innovations and changing monetary and fiscal policies. Time-variant distributions for asset values and risks are the rule, not the exception. For such distributions, parameter values based on historical data are bound to introduce errors into forecasts.

In our example above, if the economy were to show signs of slowing down, the Fed might decide to adopt a more neutral stance in its fourth meeting, making you change your probs parameter from 70% to 50% going forward. This change in your probs parameter will in turn change the forecast of your credit card interest rate.
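Once probs differs from meeting to meeting, the number of hikes is no longer binomial; it follows a Poisson binomial distribution (a sum of independent but non-identical coin flips). The sketch below assumes, for illustration, that the first three meetings keep probs at 0.7 and the remaining five use 0.5, that meetings are independent, and it builds the distribution by convolving one meeting at a time.

```python
BASE_RATE = 12.0  # assumed 2% prime + 10% card margin, in percent
STEP = 0.25       # each hike, in percentage points

def hike_count_distribution(per_meeting_probs):
    """Return dist where dist[k] = P(exactly k hikes), meetings assumed independent."""
    dist = [1.0]  # before any meeting: zero hikes with certainty
    for p in per_meeting_probs:
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * (1.0 - p)  # this meeting holds
            new[k + 1] += prob * p      # this meeting hikes
        dist = new
    return dist

def expected_card_rate(per_meeting_probs):
    """Mean card rate implied by the per-meeting hike probabilities."""
    dist = hike_count_distribution(per_meeting_probs)
    return BASE_RATE + STEP * sum(k * prob for k, prob in enumerate(dist))

constant = [0.7] * 8
shifted = [0.7] * 3 + [0.5] * 5  # hypothetical: neutral stance from the fourth meeting on
print(f"constant probs: {expected_card_rate(constant):.3f}%")
print(f"shifted probs:  {expected_card_rate(shifted):.3f}%")
```

With a constant probs the result matches the binomial case; the shifted schedule lowers the expected rate because three expected hikes' worth of probability mass moves from 0.7 to 0.5.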

Sometimes the time-variant distributions and their parameters change continuously or abruptly, as in the Mexican Peso Crisis. For either continuous or abrupt changes, the models used will need to adapt to evolving market conditions. A new functional form with different parameters might be required to explain and predict asset values and risks in the new regime.

Suppose after the fifth meeting in our example, the U.S. economy is hit by an external shock—say a new populist government in Greece decides to default on its debt obligations. Now the Fed may be more likely to cut interest rates than to raise them. Given this structural change in the Fed’s outlook, we will have to change the binomial probability distribution in our model to a trinomial distribution with appropriate parameters.
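A minimal sketch of that switch, under hypothetical numbers: the first five meetings can only hike (probability 0.7 each) or hold, so their hikes are binomial; the last three can cut, hold, or raise with probabilities 0.5, 0.4, and 0.1, so their counts follow a trinomial distribution. For simplicity the sketch ignores the earlier change of probs to 0.5; none of these probabilities come from the authors.

```python
from math import comb, factorial

BASE_RATE = 12.0  # assumed 2% prime + 10% card margin, in percent
STEP = 0.25       # size of each hike or cut, in percentage points

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def trinomial_pmf(cuts, holds, raises, p_cut, p_hold, p_raise):
    """Probability of exactly these outcome counts over cuts + holds + raises meetings."""
    n = cuts + holds + raises
    coeff = factorial(n) // (factorial(cuts) * factorial(holds) * factorial(raises))
    return coeff * p_cut**cuts * p_hold**holds * p_raise**raises

def card_rate_distribution(n_before, p_raise_before, n_after, p_cut, p_hold, p_raise):
    """Combine a binomial regime before the shock with a trinomial regime after it."""
    dist = {}
    for hikes in range(n_before + 1):
        p_before = binomial_pmf(hikes, n_before, p_raise_before)
        for cuts in range(n_after + 1):
            for raises in range(n_after - cuts + 1):
                holds = n_after - cuts - raises
                p_after = trinomial_pmf(cuts, holds, raises, p_cut, p_hold, p_raise)
                rate = BASE_RATE + STEP * (hikes + raises - cuts)
                dist[rate] = dist.get(rate, 0.0) + p_before * p_after
    return dist

dist = card_rate_distribution(5, 0.7, 3, p_cut=0.5, p_hold=0.4, p_raise=0.1)
expected = sum(rate * prob for rate, prob in dist.items())
print(f"expected card rate: {expected:.3f}%")
```

Note that the new regime changes the support of the distribution, not just its parameters: rates below today's 12% become possible, which no choice of binomial probs could represent.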


Finance is not a precise predictive science like physics. Not even close. So, let’s not treat academic theories and models of finance as if they were models of quantum physics.

All financial models, whether based on academic theories or data mining strategies, are at the mercy of the trinity of modeling errors. While this trifecta of errors can be mitigated with appropriate modeling tools, it cannot be eliminated. There will always be asymmetry of information and cognitive biases. Models of asset values and risks will change over time due to the dynamic nature of capitalism, human behavior, and technological innovation.

Financial models need a framework that quantifies the uncertainty inherent in predictions of time-variant stochastic processes. Equally importantly, the framework needs to continually update the model or its parameters—or both—based on materially new data sets. Such models will have to be trained using small data sets, since the underlying environment may have changed too quickly to collect a sizable amount of relevant data.
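One standard way to update a parameter like probs from a handful of observations is conjugate Bayesian updating. This is our illustration, not necessarily the framework the authors use: with a Beta(a, b) prior on the per-meeting probability of a hike and k hikes observed in n meetings, the posterior is Beta(a + k, b + n - k). The prior and the observed meetings below are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BetaBelief:
    """Beta(a, b) belief about the per-meeting probability that the Fed hikes."""
    a: float
    b: float

    @property
    def mean(self) -> float:
        return self.a / (self.a + self.b)

    def update(self, hikes: int, meetings: int) -> "BetaBelief":
        """Conjugate update after observing `hikes` hikes in `meetings` meetings."""
        if not 0 <= hikes <= meetings:
            raise ValueError("hikes must be between 0 and meetings")
        return BetaBelief(self.a + hikes, self.b + (meetings - hikes))

prior = BetaBelief(7.0, 3.0)                   # hypothetical prior centered on probs = 0.7
posterior = prior.update(hikes=1, meetings=3)  # hypothetical: one hike in three meetings
print(f"prior mean {prior.mean:.3f}, posterior mean {posterior.mean:.3f}")
```

Because this prior is worth only ten pseudo-meetings, three real meetings move the estimate noticeably; a stronger prior (larger a + b) would react more slowly, which is the trade-off to tune when data is scarce and the regime may be shifting.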


We thank the TensorFlow Probability team, especially Mike Shwe and Josh Dillon, for their help in earlier drafts of this blog post.

  1. The Money Formula, by David Orrell and Paul Wilmott, Wiley, 2017
  2. Nobels For Nonsense, by J.R. Thompson, L.S. Baggett, W.C. Wojciechowski, and E.E. Williams, Journal of Post Keynesian Economics, Fall 2006
  3. Model Error, by Katerina Simons, New England Economic Review, November 1997
  4. Bayesian Risk Management, by Matt Sekerke, Wiley, 2015

Continue reading The trinity of errors in financial models: An introductory analysis using TensorFlow Probability.

Categories: Technology

Four short links: 22 January 2019

O'Reilly Radar - Tue, 2019/01/22 - 04:50

Data Science with Puzzles, Formal Methods, Sketching from Photos, and Teaching Machines

  1. Teaching Data Science with Puzzles (Irene Steves) -- genius! The repo has the puzzles in an R project.
  2. Why Don't People Use Formal Methods? -- a really good introduction to the field and current challenges. And entertainingly written: Proofs are hard. Obnoxiously hard. “Quit programming and join the circus” hard. Surprisingly, formal code proofs are often more rigorous than the proofs most mathematicians write! Mathematics is a very creative activity with a definite answer that’s only valid if you show your work. Creativity, formalism, and computers are a bad combination.
  3. Photo Sketching: Inferring Contour Drawings from Images -- the examples in the paper are impressive.
  4. History of Teaching Machines (Audrey Watters) -- a bit of context for the ZOMG APPS WILL SAVE EDUCATION mania.

Continue reading Four short links: 22 January 2019.

Categories: Technology

9 trends to watch in systems engineering and operations

O'Reilly Radar - Tue, 2019/01/22 - 04:00

From artificial intelligence to serverless to Kubernetes, here’s what's on our radar.

If your job or business relies on systems engineering and operations, be sure to keep an eye on the following trends in the months ahead.


Artificial intelligence for IT operations (AIOps) will allow for improved software delivery pipelines in 2019. This practice incorporates machine learning in order to make sense of data and keep engineers informed about both patterns and problems so they can address them swiftly. Rather than replace current approaches, however, the goal of AIOps is to enhance these processes by consolidating, automating, and updating them. A related innovation, Robotic Process Automation (RPA), presents options for task automation and is expected to see rapid and substantial growth as well.

Knative vs. AWS Lambda vs. Microsoft Azure Functions vs. Google Cloud Functions

The serverless craze is in full swing and shows no signs of stopping: since December 2017 alone, the technology has grown 22%, and Gartner reports that by 2020, more than 20% of global enterprises will be deploying serverless. That is a huge projected increase from the mere 5% currently using it. The advantages of serverless are numerous: it saves money, allows organizations to scale and pivot quickly, and helps them better manage development and testing.

Knative was introduced in July at Google Cloud Next as a joint initiative of Google, Red Hat, Pivotal, SAP, and IBM, but the jury’s still out as to whether it will become the industry standard for building serverless applications on top of Kubernetes. It does have a lot going for it, though: it’s open source and portable between cloud providers. Because it’s not tied to any specific cloud platform (unlike its competitors AWS Lambda, Microsoft Azure Functions, and Google Cloud Functions), it may also appeal to organizations looking to avoid lock-in to a particular vendor, and it has the potential to unify the serverless world.

Cloud-native infrastructure

Another fast-growing trend, cloud-native applications in production have seen a 200% spike in the past year. This development marks a clear shift from merely doing business in the cloud. Instead, it moves the focus to creating cloud-native applications, and puts the spotlight on the advantages of cloud-based infrastructure.


As both security threats and compliance pressures grow, automating security and baking security controls into the software development process is now critical. With recently established GDPR regulations that necessitate the notification of a security breach within 72 hours, DevOps and security practices are becoming intertwined not only in process, but in culture as well. Gone are the days of simply responding to issues as they pop up. Instead, a proactive approach that seamlessly weaves security into the development lifecycle will optimize productivity and efficiency for development teams into next year and beyond.

Service mesh

The movement from monolith to microservices has already started, and service meshes will be a key component in fast-tracking the transition. A service mesh can best be described as a dedicated layer of infrastructure that enables rapid, secure, and dependable communication between and among service instances. Among the projects to watch are Istio, HashiCorp's Consul, and Linkerd.

DevOps breaks down more silos; rise of the SRE

Teams and departments will continue to merge across organizations, as both data management and security requirements demand cross-functional processes and the lines between traditional role definitions blur. Reliability will be key to ensuring always-on, always-available performance, so we’ll see more engineers and administrators adding reliability to their titles. Database reliability engineers (DBREs), for starters, will replace database administrators (DBAs), incorporating site reliability engineering processes into their day-to-day routines and adopting the same code review practices that DevOps teams use to create and deploy applications.


The current industry standard for container orchestration, Kubernetes will continue to own the spotlight in 2019 as more organizations start to walk the talk—either by implementing their own Kube-based infrastructure or letting their cloud vendor manage the complexity through a hosted solution like Microsoft’s Azure Kubernetes Service (AKS), Amazon’s EKS, or Google’s Kubernetes Engine. A recent O’Reilly survey found that less than 40% of respondents have actually implemented Kubernetes, suggesting that the hype currently outweighs the reality. There’s still plenty of room for adoption and growth within organizations—despite how oversaturated the landscape may currently seem. If you haven’t worked with Kubernetes yet, you likely will soon.

Distributed tracing

Distributed tracing, a tool for monitoring and debugging microservices applications, is poised to become a critical trend going forward. The prevalence of heterogeneous distributed systems has made distributed tracing somewhat harder to put into practice. However, service meshes, another hot topic, have made communication between services more hassle-free, so the time is right for this method to shine. While there are a variety of distributed tracing tools and APIs, including Google's Dapper and the OpenTracing API, Twitter’s Zipkin is currently the most buzzed about, and it will likely rise to the top and set a precedent for the industry.


According to a 2018 survey of 576 IT leaders, 44% were planning to replace at least a portion of their virtual machines with containers. One major reason for this switch is C-suite executives’ ever-increasing dissatisfaction with VM licensing fees. In addition to a major infrastructure change, the move to containers also necessitates the adoption of both DevOps processes and culture, affecting the makeup of IT departments.

Continue reading 9 trends to watch in systems engineering and operations.

Categories: Technology

Four short links: 21 January 2019

O'Reilly Radar - Mon, 2019/01/21 - 06:05

Programming Spreadsheets, Star Emulator, AI for Social Good, Participatory Democracy

  1. Applying Programming Language Research Ideas to Transform Spreadsheets (Microsoft) -- an Excel cell can now contain a first-class record, linked to external data sources. And ordinary Excel formulas can now compute array values, that “spill” into adjacent cells (dynamic arrays). There is more to come: we have a working version of full, higher-order, lexically scoped lambdas (and let-expressions) in Excel’s formula language and we are prototyping sheet-defined functions and full nesting of array values.
  2. Darkstar: A Xerox Star Emulator -- this blog post describes the journey of building the emulator for this historic system. From the good folks at the Living Computer Museum in Seattle.
  3. AI for Social Good Impact Challenge -- $25M pool, $500K-$2M for 1-3 years. If you are selected to receive a grant, the standard grant agreement will require any intellectual property created with grant funding from Google be made available for free to the public under a permissive open source license.
  4. Decidim -- free open source participatory democracy for cities and organizations.

Continue reading Four short links: 21 January 2019.

Categories: Technology

Four short links: 18 January 2019

O'Reilly Radar - Fri, 2019/01/18 - 05:00

Remove Filters, Quantum Cables, Embedded Vision, and Citizen Developers

  1. Desnapify -- deep convolutional generative adversarial network (DCGAN) trained to remove Snapchat filters from selfie images.
  2. Quantum Computer Component Shortage (MIT TR) -- cables for superconducting quantum computing experiments turn out to be hard to find at Radio Shack. Reminder: QC is in its infancy.
  3. SOD -- an embedded computer vision and machine learning library (CPU optimized and IoT capable).
  4. Devsumer -- interesting argument: lots of people have been exposed to programming via Hour of Code-type things, and IT departments are too busy to build all the apps people want, so [a] number of products have emerged that allow people to build simple software applications, or to use templated applications for their own work flow or productivity. You can think of this as taking a SQL database or excel spreadsheet and turning it into an app platform.

Continue reading Four short links: 18 January 2019.

Categories: Technology

How machine learning impacts information security

O'Reilly Radar - Thu, 2019/01/17 - 06:50

The O’Reilly Data Show Podcast: Andrew Burt on the need to modernize data protection tools and strategies.

In this episode of the Data Show, I spoke with Andrew Burt, chief privacy officer and legal engineer at Immuta, a company building data management tools tuned for data science. Burt and cybersecurity pioneer Daniel Geer recently released a must-read white paper (“Flat Light”) that provides a great framework for how to think about information security in the age of big data and AI. They list important changes to the information landscape and offer suggestions on how to alleviate some of the new risks introduced by the rise of machine learning and AI.

We discussed their new white paper, cybersecurity (Burt was previously a special advisor at the FBI), and an exciting new Strata Data tutorial that Burt will be co-teaching in March.

Continue reading How machine learning impacts information security.

Categories: Technology

Four short links: 17 January 2019

O'Reilly Radar - Thu, 2019/01/17 - 05:00

Git, SMS Deep Dive, Rethinking Capitalism, and Polarized Opinions

  1. pushb -- like pushd/popd, but for git branches. See also "How to Teach Git," Rachel Carmena's visual explanation of the mental models of directories that will help new git users.
  2. The Route of a Text Message (Scott Weingart) -- a single text message: how it was typed, stored, sent, received, and displayed. I sprinkle in some history and context to break up the alphabet soup of protocols, but though the piece gets technical, it should all be easily understood.
  3. The Market-Shaping Forces of Capitalism (Mariana Mazzucato) -- YouTube video of her first lecture on the "Rethinking Capitalism" undergraduate module at UCL. Tim's a big fan of her work.
  4. nuclear -- describes itself as "Popcorn Time for music," but far more interesting for this fantastic line: highly polarized opinions about languages and frameworks are characteristic of people who lack real-world programming experience and are more interested in building an identity than creating computer programs.

Continue reading Four short links: 17 January 2019.

Categories: Technology

What lies ahead for Python, Java, Go, C#, Kotlin, and Rust

O'Reilly Radar - Thu, 2019/01/17 - 04:00

O’Reilly authors and instructors explore the near-term future of popular and growing programming languages.

Change is the only constant in the technology world, and programming languages are no exception. Competition among languages has led to improvements across the board. Established players like Java have added major features, while upstart languages like Go and Rust look to improve packaging and exception handling to add “fit and finish” to their ecosystems. As we enter 2019, we asked some of our O’Reilly authors and training course instructors for their thoughts on what’s in store for established players and fast-growing languages.

Python

Python's incredible growth over the past decade shows no signs of slowing. In addition to maintaining its position as the most popular introductory language for students, scientists, and knowledge workers, Python will continue its widespread adoption in web development, DevOps, data analysis, and machine learning circles. Matt Harrison, who runs the Python and data science training and consulting company MetaSnake (and is a frequent instructor of Python courses on the O'Reilly online learning platform), offers his take:

Python has traditionally been more focused on small data, but I think that as other tools that enable big data—such as Dask and flexible Python solutions on top of Kubernetes—continue to improve, we will see Python dominate in big data as well. I’m continuing to see large companies that have traditionally used Java or proprietary languages replacing those with Python.

In 2019, the Python community will cohere around Python 3, as maintenance for Python 2 will end on January 1, 2020. And it will do so under a new governance model, as Guido van Rossum, the creator of the language, stepped down as "benevolent dictator for life" in July 2018. After months of debate, the community recently voted to go forward under a steering council model.

Java

The release of Java 11 in September introduced major new features, such as nest-based access controls, which eliminate the need for compilers to insert bridge methods; dynamic class-file constants; the new HttpClient, which removes the need for an external dependency when writing applications to communicate with web services; and the adoption of the Unicode 10 standard for localization. As Ben Evans, coauthor of Optimizing Java and Java in a Nutshell, explains: "Java has adapted well to new frontiers such as cloud and microservices. Java 8 had problems with microservice startup times, but Java 11 solves that problem. It's a much better environment for developing new microservice applications from scratch."

Looking ahead to future versions of Java, Evans says that bringing value types to Java is a major current project. Value types are intended to be a third form of data type (to complement the existing primitive types and object references), which Evans sees as one way to future-proof the JVM, calling it one of the major changes to the language that "will change the character of Java development in fundamental ways."

Go

The Go team is working on a prototype command called vgo. Currently, when you install third-party libraries with the go get tool, the latest available version of a package is retrieved, even if it includes incompatibilities that can break your code. The vgo tool will “help you manage the versions of the various packages your app requires, without introducing conflicts,” explains Jay McGavren, author of the forthcoming Head First Go.

The August 2018 release of Go 1.11 provided experimental support for compiling Go to WebAssembly, a binary format for code that can run in a web browser. “This promises to be faster and more efficient than JavaScript,” McGavren says. “And it’s supported by all the major browsers. The ability to make apps using Go that can run inside the browser offers new possibilities that I’m excited to explore.”

C#

The upcoming release of C# 8.0 will include a number of new features, notably nullable reference types. Andrew Stellman, coauthor of Head First C#, calls it “code safety for the rest of us,” as it causes the compiler to give warnings any time a reference type variable can potentially be assigned a null value, thus “giving developers a new way to write safer code.”

Stellman notes that another upcoming feature that has C# developers talking is asynchronous streams: await foreach, a new form of the familiar foreach loop, consumes an asynchronous stream represented by the IAsyncEnumerable interface, automatically pausing the loop until the next value is available. Other expected new features include an asynchronous version of yield return and asynchronous disposables.

Kotlin

Kotlin's latest release (Kotlin 1.3, released in late October 2018) saw coroutines, lightweight threads that allow code to scale out efficiently, graduate from experimental to stable status. Coroutines enable the creation of multiple pieces of code that can run asynchronously; for example, you can launch a background job (such as reading data from an external server) without the rest of your code having to wait for the job to complete before doing anything else. “This gives users a more fluid experience, and it also makes your application more scalable,” says David Griffiths, coauthor (along with Dawn Griffiths) of the forthcoming Head First Kotlin. Coroutines are also at the heart of Ktor, a new framework for building asynchronous servers and clients in connected systems using the Kotlin language.

Looking ahead to 2019, Kotlin is “likely to see significant use beyond the Java world,” says Griffiths. “It is proving to be an excellent language for library builders. If you have an application that performs some complex financial calculation on the server, Kotlin allows you to convert that server code into a Kotlin library which can run on both the server and the client.” Also anticipated for Kotlin, according to Griffiths, are first-class immutability support for the language and features that reduce or eliminate shared mutable state in concurrent code.

Rust

Rust 2018, released in December, was the first major new edition of the language since Rust 1.0 in 2015. Rust 2018 also laid the groundwork for async (asynchronous) functions and await expressions, which are still being stabilized and are intended to make Rust more effective for writing network servers and other I/O-intensive applications. “An async function can use an await expression to suspend its execution until the data it needs becomes available,” says Jim Blandy, coauthor of Programming Rust. “Rust has supported asynchronous programming in one form or another for a long time,” he notes, “but async functions provide a syntax for this sort of code that is a major improvement over what Rust has had before.”

Another in-the-works enhancement is improving Rust’s existing support for the WebAssembly standard for running code as part of a web page. “This will make it easier to integrate WebAssembly packages written in Rust with the existing JavaScript ecosystem,” says Blandy.

What's next?

“What’s next?” is the question that's always on every programmer’s mind. In 2019 and beyond, language design will continue to look for new ways to help programmers manage complexity and abstraction, as applications and data grow ever larger and become more crucial to the modern enterprise.

Continue reading What lies ahead for Python, Java, Go, C#, Kotlin, and Rust.

Categories: Technology

Security Meeting Topic for January 17th

PLUG - Wed, 2019/01/16 - 13:38

Sebastian Tuchband: The Blockchain Utility

Many people use blockchains incorrectly in their implementations, and many think of horribly inefficient implementations as blockchains.
There are many considerations that developers and businesspeople don't seem to take into account.
This presentation covers the blockchain and better ways to achieve what people actually wanted from it.

I am a privacy enthusiast with a background in programming.
I also enjoy meeting people and teaching the ideal method over the common one.
Many people don't seem to know these alternative methods, and I intend to inform them.

AI brings speed to security

O'Reilly Radar - Wed, 2019/01/16 - 07:40

Survey results indicate incident response times improve with AI-based security services.

Organizations that use security tools with artificial intelligence (AI) and machine learning (ML) see a significant decrease in incident response time, according to a survey of 457 security practitioners conducted by O’Reilly Media in conjunction with Oracle.

Twenty percent of IT professionals who rely on traditional security measures said their teams can detect a malware infection or other attack within minutes, according to the survey. But among IT pros who reported using AI and ML security services, that number more than doubled to 45%. The long tail shows a similar trend: only 16% of IT professionals need days or longer to find an infection when AI or ML is involved, versus a whopping 35% for those who don’t use these technologies.

As cyberattacks become more malicious and stealthy, it's increasingly important to improve incident response time in order to detect and mitigate threats before they unleash their full fury. Eighty-four percent of survey respondents who use ML and AI security services said their response times are within minutes or hours. Among respondents who don't use these technologies, that number was substantially lower: 66%.

But AI and ML alone aren't responsible for this improvement in incident response time. According to the survey, shorter response times were also associated with the use of security information and event management, antimalware, vulnerability scanning, and bot management software. It’s also worth noting that many vendors market traditional business intelligence techniques as artificial intelligence. As a result, some respondents may have said they use AI when they actually use more traditional algorithms.

AI security services still catching on

Despite the improvements that AI and ML bring to incident response time, the survey showed that most organizations have not yet adopted the technologies. Just 26% of respondents have started to embrace ML and AI security services, and another 28% said they're interested in learning more about them.

According to the survey report, we can expect growing interest in AI-based security tools over the next few years, just as AI is entering other industries.

Given the faster response times reported, adoption may happen very quickly. Speed can be a useful differentiator between businesses that avoid crippling attacks and those that fall victim to them.

To cloud or not to cloud?

Surprisingly, 38% of respondents still use only on-premises, stand-alone appliances. A significant proportion of IT professionals rely solely on traditional security tools and are missing the trend toward more modern, scalable solutions.

As for the rest, 51% of respondents employ a combination of on-premises and cloud-based security tools, but just 9% use only cloud-based security services.

One reason so few professionals have embraced cloud cybersecurity solutions could be concern about cloud breaches. The potential for data breaches is the top cybersecurity concern IT pros have about using the public cloud.

Security is integral to IT budgets for organizations with CISOs

We asked respondents what percentage of their IT budget went to security. Of those who answered, the vast majority (79%) indicated they spend 10% or less of their IT budget on security.

Some respondents reported that responsibility for security lies with the director or VP of IT, the CIO, or the CEO. Among these groups, the lowest spending category (less than 5%) was the most frequently selected response (45%, 46%, and 44% of respondents, respectively).

In contrast, respondents who reported that responsibility for security falls to the CISO cited higher levels of spending: 49% of them selected the 5%-10% range.

Smaller budgets also go hand in hand with less modern tools. Respondents with the smallest security budgets (less than 5% of IT spend) were more likely to deploy security tools only on-premises: 49% of these sites did so, versus 23-26% of sites with higher budgets. This suggests that organizations that move to the cloud are willing to spend more on security. We don't know why. Cloud security tools may be more expensive, their clients may care more about security, or they may feel more at risk in the cloud than on-premises.

Additional findings

The report also found the top tools and strategies used to preemptively mitigate attacks on websites and applications are vulnerability scans, privileged access management, network firewalls, and web application firewalls.

About the respondents

We asked the respondents to tell us a little about themselves and their organizations, and the results were similar to those for our resilience survey. For instance, organizational size was dramatically skewed toward the smallest and largest organizations. Forty percent of respondents work in organizations with 1-199 people, while 25% work in organizations with 10,000 or more.

Respondents hold a wide range of roles, from system administrators and network operations staff to upper management. They also come from a variety of industries, although two stand out: IT services accounts for 21% of respondents, and software for another 15%.

This post is a collaboration between O’Reilly and Oracle Dyn. See our statement of editorial independence.

Continue reading AI brings speed to security.

Categories: Technology

