Feed aggregator

170+ live online training courses opened for March and April

O'Reilly Radar - Wed, 2019/02/06 - 04:00

Get hands-on training in machine learning, microservices, blockchain, Python, Java, and many other topics.

Learn new topics and refine your skills with more than 170 new live online training courses we opened up for March and April on the O'Reilly online learning platform.

AI and machine learning

Spotlight on Innovation: Succeeding with Machine Learning with Alex Jaimes, February 13

Hands-On Adversarial Machine Learning, February 25

Probabilistic Modeling With TensorFlow Probability, February 27

Deep Learning Fundamentals, March 5

An Introduction to Amazon Machine Learning on AWS, March 6-7

Natural Language Processing (NLP) from Scratch, March 11

Deep Reinforcement Learning, March 12

Sentiment Analysis for Chatbots in Python, March 13

Hands-on Machine Learning with Python: Classification and Regression, March 13

TensorFlow Extended: Data Validation and Transform, March 14

Hands-On Machine Learning with Python: Clustering, Dimension Reduction, and Time Series Analysis, March 14

Building a Robust Machine Learning Pipeline, March 14-15

Machine Learning in Practice, March 19

TensorFlow Extended: Model Build, Analysis, and Serving, March 20

Artificial Intelligence: An Overview of AI and Machine Learning, March 20

Machine Learning for IoT, March 20

Next Generation Decision Making: Pragmatic Artificial Intelligence, March 20-21

Getting Started with Machine Learning, March 21

Artificial Intelligence for Robotics, March 21-22

Beginning Machine Learning with PyTorch, March 25

Artificial Intelligence: Real-World Applications, March 28

Active Learning, April 9

Hands-On Adversarial Machine Learning, April 11

Practical Deep Learning with PyTorch, April 11-12

Blockchain

Introducing Blockchain, March 8

Building Smart Contracts on the Blockchain, March 21-22

IBM Blockchain Platform as a Service, March 25-26

Understanding Hyperledger Fabric Blockchain, March 28-29

Blockchain for Enterprise, April 1

Business

Innovative Teams, March 11

Fundamentals of Cognitive Biases, March 11

Artificial Intelligence: AI For Business, March 12

Business Strategy Fundamentals, March 13

The Power of Lean in Software Projects: Less Wasted Effort and More Product Results, March 14

Leadership Communication Skills for Managers, March 14

Emotional Intelligence in the Workplace, March 14

Thinking Like a Manager, March 14

Tools for the Digital Transformation, March 14-15

Introduction to Delegation Skills, March 21

Negotiation Fundamentals, March 22

Introduction to Critical Thinking, March 26

Your First 30 Days as a Manager, April 2

How to Give Great Presentations, April 5

Introduction to Strategic Thinking Skills, April 8

Data science and data tools

Business Data Analytics Using Python, February 27

Hands-on Introduction to Apache Hadoop and Spark Programming, March 5-6

Designing and Implementing Big Data Solutions with Azure, March 11-12

Time Series Forecasting, March 14

Cleaning Data at Scale, March 19

Practical Data Cleaning with Python, March 20-21

Building Distributed Pipelines for Data Science Using Kafka, Spark, and Cassandra, April 8-10

Real-Time Data Foundations: Kafka, April 9

Real-Time Data Foundations: Spark, April 10

Building Data APIs with GraphQL, April 11

Design and product management

From User Experience Designer to Digital Product Designer, March 1

Mastering UX Mapping, March 7-8

Writing User Stories, March 13

Product Roadmaps from the Ground Up, April 3

Programming

Design Patterns Boot Camp, February 19-20

Discovering Modern Java, March 1

Beginner’s Guide to Writing AWS Lambda Functions in Python, March 1

Building APIs with Django REST Framework, March 4

SQL for Any IT Professional, March 4

Spring Boot and Kotlin, March 5

Programming with Java Lambdas and Streams, March 5

Bootiful Testing, March 6

Learning Python 3 by Example, March 7

Getting Started with OpenShift, March 8

Setting Up Scala Projects, March 11

Getting Started with Pandas, March 11

Getting Started with Python 3, March 11-12

Java Full Throttle with Paul Deitel: A One-Day, Code-Intensive Java Standard Edition Presentation, March 12

Mastering Pandas, March 12

Scalable Concurrency with the Java Executor Framework, March 12

Getting Started with Python's Pytest, March 13

Python Programming Fundamentals, March 13

Mastering Python's Pytest, March 14

Kotlin Fundamentals, March 14

Quantitative Trading with Python, March 14

Advanced TDD (Test-Driven Development), March 15

Introduction to Python Programming, March 15

Bash Shell Scripting in 4 Hours, March 18

Java Testing with Mockito and the Hamcrest Matchers, March 19

Scala Core Programming: Methods, Classes, Traits, March 19

Ansible in 4 Hours, March 19

Getting Started with PHP and MySQL, March 20

Mastering the Basics of Relational SQL Querying, March 20-21

Reactive Spring and Spring Boot, March 21

Automating with Ansible, March 22

Scala Core Programming: Sealed Traits, Collections, and Functions, March 25

Mastering SELinux, March 25

Intermediate Git, March 25

Scalable Programming with Java 8 Parallel Streams, March 27

Design Patterns Boot Camp, March 27-28

Mastering C# 8.0 and .NET Core 3.0, March 27-28

Rethinking REST: A Hands-On Guide to GraphQL and Queryable APIs, March 28

C# Programming: A Hands-On Guide, March 28

Web Application Programming in C# and ASP.NET Core with MVC and Entity Framework, March 28-29

Introduction to JavaScript Programming, April 2-3

Visualization in Python with Matplotlib, April 8

Python for Finance, April 8-9

Practical MQTT for the Internet of Things, April 8-9

Getting Started with Pandas, April 9

Getting Started with Python 3, April 9-10

Getting Started with React.js, April 10

What's New In Java, April 11

Fundamentals of Rust, April 11-12

Security

CompTIA PenTest+ Crash Course, March 5-6

Start Your Security Certification Career Today, March 8

Protecting Data Privacy in a Machine Learning World, March 11-12

Certified Ethical Hacker (CEH) Crash Course, March 12-13

CompTIA Security+ SY0-501 Crash Course, March 18-19

Intense Introduction to Hacking Web Applications, March 19

Cyber Security Fundamentals, March 26-27

CISSP Crash Course, March 26-27

CISSP Certification Practice Questions and Exam Strategies, March 27

AWS Certified Security - Specialty Crash Course, March 27-28

Systems engineering and operations

Software Architecture by Example, February 21

Red Hat Certified System Administrator (RHCSA) Crash Course, March 4-7

Creating Serverless APIs with AWS Lambda and API Gateway, March 5

Amazon Web Services (AWS): Up and Running, March 6

Docker Compose, March 6

Microservice Collaboration, March 7

Docker CI/CD, March 7

OpenStack for Cloud Architects, March 7-8

Red Hat RHEL 8 New Features, March 11

From Developer to Software Architect, March 11-12

Google Cloud Certified Associate Cloud Engineer Crash Course, March 11-12

AWS Certified Solutions Architect Associate Crash Course, March 11-12

9 Steps to Awesome with Kubernetes, March 12

IP Subnetting from Beginning to Mastery, March 12-13

Istio on Kubernetes: Enter the Service Mesh, March 14

How the Internet Really Works, March 15

Kubernetes Serverless with Knative, March 15

AWS Advanced Security with Config, GuardDuty, and Macie, March 18

Software Architecture by Example, March 18

Amazon Web Services: AWS Managed Services, March 18-19

Practical Kubernetes, March 18-19

AWS Certified SysOps Administrator (Associate) Crash Course, March 18-19

CCNA Routing and Switching 200-125 Crash Course, March 18-22

Managing Containers on Linux, March 19

Docker Images, March 19

Docker: Up and Running, March 19-20

Docker Containers, March 20

Implementing Evolutionary Architectures, March 20-21

Kubernetes in 4 Hours, March 21

AWS Security Fundamentals, March 21

Deploying Container-Based Microservices on AWS, March 21-22

Google Cloud Platform (GCP) for AWS Professionals, March 22

Architecture for Continuous Delivery, March 25

Docker for JVM projects, March 25

Implementing Azure for Enterprises, March 25-26

Building and Managing Kubernetes Applications, March 26

Cloud Computing Governance, March 26

Getting Started with Amazon Web Services (AWS), March 26-27

Microservices Caching Strategies, March 27

Cloud Complexity Management, March 28

Comparing Service-Based Architectures, March 28

Network DevOps, March 29

API Driven Architecture with Swagger and API Blueprint, March 29

Software Architecture for Developers, April 1

Implementing and Troubleshooting TCP/IP, April 2

Amazon Web Services (AWS) Technical Essentials, April 2

Building Applications with Apache Cassandra, April 3-4

Introduction to Kubernetes, April 3-4

CCNA Routing and Switching Crash Course, April 4-5

Architecting Secure IoT Applications with Azure Sphere, April 4-5

AWS Design Fundamentals, April 9-10

Microservices Architecture and Design, April 9-10

Practical Docker, April 10

Automation with AWS Serverless Technologies, April 10

Categories: Technology

The future of cloud-native programming

O'Reilly Radar - Tue, 2019/02/05 - 14:00

Tamar Eilam offers an overview of cloud-native programming and outlines a path toward the unification of the cloud programming model.

Highlights from the O'Reilly Software Architecture Conference in New York 2019

O'Reilly Radar - Tue, 2019/02/05 - 14:00

Watch highlights from expert talks covering cloud-native programming, software architecture career advice, and more.

People from across the software architecture world came together in New York for the O'Reilly Software Architecture Conference. Below you'll find links to highlights from the event.

Architecting IT transformation

Gregor Hohpe explains how software architects can use what they know about technical systems to help refactor organizations.

Career advice for architects

Trisha Gee shares lessons she learned the hard way while managing her career as a developer, lead, and technical advocate.

From the trenches: An interview with Mark Richards

Neal Ford talks with Mark Richards about his career path and his work as a software architect.

The future of cloud-native programming

Tamar Eilam offers an overview of cloud-native programming and outlines a path toward the unification of the cloud programming model.

Design and architecture: Special Dumpster Fire Unit

Matt Stine looks at the tricky situations that sometimes emerge from design and architecture.

Design after Agile: How to succeed by trying less

Stuart Halloway explains how to augment agility with principles for designing systems.

Roaming free: The power of reading beyond your field

Glenn Vanderburg talks about the importance of letting your attention roam, and he shares examples of how insights from other fields have inspired software practitioners.

Career advice for architects

O'Reilly Radar - Tue, 2019/02/05 - 14:00

Trisha Gee shares lessons she learned the hard way while managing her career as a developer, lead, and technical advocate.

From the trenches: An interview with Mark Richards

O'Reilly Radar - Tue, 2019/02/05 - 14:00

Neal Ford talks with Mark Richards about his career path and his work as a software architect.

Four short links: 5 February 2019

O'Reilly Radar - Tue, 2019/02/05 - 05:00

Creating the Future, LIDAR, Human-AI Design, and Command-line Course

  1. The Best Way to Predict the Future is to Create It. But Is It Already Too Late? (Alan Kay) -- Virtually everybody in the computing science has almost no sense of human history and context of where we are and where we are going. So, I think of much of the stuff that has been done as inverse vandalism. Inverse vandalism is making things just because you can. Every sentence is a cracker. (via Daniel G. Siegel)
  2. Trying to Make Powerful, Low-cost LIDAR (Ars Technica) -- a good intro to the tech and competition in the space.
  3. Guidelines for Human-AI Interaction -- Microsoft paper on design challenges in "smart" apps.
  4. MIT Hacker Tools -- lectures on the Unix tools that command-line natives use.

3 emerging trends tech leaders should watch

O'Reilly Radar - Tue, 2019/02/05 - 04:00

Analysis of the O’Reilly online learning platform reveals a new approach to technical architecture, the rise of blockchain, and shifts in programming language adoption.

Keeping up with technology can be a daunting task for tech leaders. Each year, to make the task a little easier, we analyze behavior on the O’Reilly online learning platform, using the platform as a massive sensor that yields insights and identifies areas tech leaders should pay attention to, explore, and learn.

Our analysis includes the top search terms and the topics that garner the most usage on our learning platform.[1] This combination of search and usage data provides a holistic view; search data shows the areas where subscribers are exploring, and usage identifies topics where they’re actively engaged.

The signals from the O’Reilly online learning platform reveal:

  • Strong growth in cloud topics and Kubernetes, as well as interest in containers and decomposition (microservices), points toward the rise of a “Next Architecture.”
  • Interest in blockchain, which we first noted in 2017, continues. While the full potential of the blockchain gets sorted out, consider that if you’re not investigating blockchain, someone you compete with is.
  • Python, Java, and JavaScript—the “big three” languages on our learning platform—continue to dominate usage year after year. In addition, Rust and Go showed growing interest on the platform, suggesting that organizations are using languages that emphasize developer productivity while also embracing languages that tilt the balance toward performance and scaling.
Figure 1. The top search terms on the O’Reilly online learning platform in 2018 (left) and the rate of change for each term (right)

Figure 2. Topics on the O’Reilly online learning platform with the most usage in 2018 (left) and the rate of change for each topic (right)

The signs of a Next Architecture

The growth we’ve seen on our online learning platform in cloud topics, in orchestration and container-related terms such as Kubernetes and Docker, and in microservices is part of a larger trend in how organizations plan, code, test, and deploy applications that we call the Next Architecture. This architecture allows fast, flexible deployment, feature flexibility, efficient use of programmer resources, and rapid adapting, including scaling, to unpredictable resource requirements. These are all goals businesses feel increasingly pressured to achieve to keep up with nimble competitors.

There are four aspects of the Next Architecture, each of which shows up in the platform’s search and usage data.

Figure 3. AWS, Kubernetes, Docker, and Microservices—each representing an important part of the Next Architecture—appear in the top search terms from the O’Reilly online learning platform

Decomposition

Organizations get a lot of benefits by breaking large and complex activities into small, loosely connected pieces. Through decomposition, these activities can be turned into standalone services that can be developed independently and linked together to create a more complex application. Microservices, the manifestation of decomposition, was the number 13 search term on our online learning platform in 2018.
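
The decomposition idea can be illustrated with a toy standalone service. The sketch below uses only the Python standard library; the `/price` route, payload, and port handling are invented for illustration and not taken from the article. It shows one narrowly scoped service that could be developed, tested, and replaced independently of the rest of an application:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class PriceHandler(BaseHTTPRequestHandler):
    """One narrowly scoped service: it knows prices and nothing else."""

    def do_GET(self):
        if self.path == "/price":
            body = json.dumps({"sku": "demo", "price": 9.99}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet

# Run the service on an ephemeral port and call it the way a peer
# service (or an API gateway) would: over HTTP, knowing only the contract.
server = HTTPServer(("127.0.0.1", 0), PriceHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

with urllib.request.urlopen(f"http://127.0.0.1:{port}/price") as resp:
    payload = json.loads(resp.read())
server.shutdown()
print(payload)
```

In the Next Architecture, many such small services would then be packaged into containers and composed, via orchestration, into a larger application.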

Cloud

An organization needs the flexibility to adjust, scale, and innovate its digital presence—often across different time zones and geographies. The cloud supports these goals with compute instances that are fungible, coming and going as needed, and easy to replace automatically if failures are detected. The move toward decomposition (microservices) helps accelerate the trend toward the cloud by providing more impetus for quickly spinning up and managing services that support the need for dynamic, adaptable applications.

Cloud-related terms had a significant presence in the search and usage data. AWS, Amazon’s suite of cloud-based tools, was the number 4 search term, and it had 28% growth in year-over-year usage. Google Cloud (66% growth in usage over 2017) and Microsoft Azure (60% growth in usage) also increased. In addition, the topic “cloud migration” was up 40% in usage in 2018.

Containers

Containers provide a lightweight way to achieve the modularity favored by decomposition and the cloud. Docker, the number 7 search term in 2018, makes it easy to automate the deployment of the microservices that are created through decomposition.

Orchestration

The huge number of microservices running on containers—often in the hundreds or thousands—exceeds the capacity of humans to track and manage them. Orchestration tools, notably Kubernetes, fill the gap through rigorous specifications and automation. Kubernetes was the number 5 search term in 2018, jumping 11 spots, and usage growth was up a notable 160% year over year.

We’ll continue to explore the Next Architecture in the coming months.

Keep an eye on blockchain

Blockchain, which was one of the stars in our 2017 results, jumped seven spots in the top search terms (number 13), and it was up 36% in usage in 2018. Ethereum, a tool for implementing blockchains, was up 66% in year-over-year usage from a small base. Platform subscribers were likely exploring blockchain to assess its potential, developing an awareness of where blockchain may fit into their strategic plans or evaluating it as an existential threat, mostly in the areas of payments, supply chain logistics, and provenance.

Python, Java, and JavaScript continue their dominance

In 2018 we saw Python, Java, and JavaScript maintain the strong positions they’ve gained on our online learning platform over the years.

Python gets a boost, in part, from the increased interest in machine learning (ML). Many ML libraries, such as TensorFlow, are wrapped in Python libraries and promoted with Python interfaces. Ascendant ML tools also bolster interest in Python. For example, PyTorch, a library for computer vision and natural language processing, saw a 300% increase in year-over-year usage from a small base, and scikit-learn, another Python-based machine learning library, was up 39% in usage.
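
As a concrete illustration of the concise Python interfaces described above, here is a minimal scikit-learn workflow. The dataset and model choice are illustrative only, not tied to the article's data:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small built-in dataset and hold out a test split.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit a classifier and evaluate it in a few lines -- the kind of
# compact Python API that helps drive the language's ML adoption.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

TensorFlow and PyTorch expose similarly Python-first interfaces, which is part of why ML interest and Python usage tend to rise together.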

Many tools used in big data applications—notably the ones from the Apache Foundation, such as Spark and Kafka—feature Java interfaces. Thus, machine learning and big data may explain the popularity of both Python and Java. Java also remains a workhorse language for large-scale applications.

The JavaScript ecosystem of web frameworks and libraries saw less growth than Java and Python. However, usage trends show engagement with the popular JavaScript web frameworks. Angular was up 23% in usage, and React was up 39%, though search activity on both topics was flat. A third JavaScript framework, Vue, showed big usage growth, up 220% from a small base.

After JavaScript, one more language appears in our top searches. Go, the number 11 search term, jumped three spots in the top search results, and content usage was up 14%. Go sits conveniently between high-level, interpreted languages like Python and low-level, fast, compiled systems languages like C. It combines the syntactic ease of the high-level languages with compiler-driven performance, good concurrency support, an active and growing developer community, and the full support of Google. When performance matters, or when an app or service written in a high-level language needs a performance boost, Go is (sorry for the pun) the go-to language for an increasing number of developers.

Finally, the fastest usage growth we saw for any language between 2017 and 2018 was for Rust (up 44%). Rust is a systems language with near-C performance, safe, efficient memory management, native concurrency support, and a modern syntax. Developers are increasingly finding Rust a good fit when performance is or becomes a priority.

Other findings

There are a few more items from the analysis that are worth calling out.

  • Machine learning (ML), the number 10 search term, has been a leader on our learning platform for more than a year, as we showed in last year's trends. In 2018 we saw a change in the distribution of interest in ML topics within the search and usage results. There was less growth in exploratory topics and phrases like “machine learning” and “deep learning.” This was coupled with a shift toward more specific topics like “natural language processing” (up 22% in search and 11% in usage) and “reinforcement learning” (up 122% in search and 331% in usage from a small base). We attribute the shift to the maturation of the ML topic and a move beyond exploration toward more engaged implementation. This is a trend reinforced by the ML and artificial intelligence surveys we’ve run.
  • A 5% increase in usage for business-related material on the platform highlights the importance of tech for every facet of a business. It also aligns with the idea that all companies are now tech companies.
  • Security content went up 6% in usage in 2018, which is a good sign since we’ve noted in the past that security was underappreciated. Increased scrutiny from notable breaches may partly explain the increase. The development of distributed systems also presents new security challenges organizations must confront.
  • Web and mobile topics showed slight but noticeable declines in search and usage. We think the decline relates to maturity and a semantic transition. Organizations no longer pursue “web” and “mobile” computing; the web and mobile are now endemic enough that it’s all just “computing.”
Looking ahead

The rise of the Next Architecture, the maturation of blockchain, and emerging patterns in programming languages are areas of focus for us in the year ahead. We’ll continue to examine search and usage data on the platform, and we’ll also engage in research via conversations with our conference speakers and attendees, through perspectives from our community of practitioners and thought leaders, from media coverage, and from other sources. Ultimately, we want to see if these additional signals reinforce or challenge the findings from our platform data.

[1] This article is based on non-personally-identifiable information about the top search terms and topics on the O’Reilly online learning platform in 2018.

Artificial intelligence and machine learning adoption in European enterprise

O'Reilly Radar - Mon, 2019/02/04 - 12:20

How companies in Europe are preparing for and adopting AI and ML technologies.

In a recent survey, we explored how companies were adjusting to the growing importance of machine learning and analytics, while also preparing for the explosion in the number of data sources. In practice this means developing a coherent strategy for integrating artificial intelligence (AI), big data, and cloud components, and specifically investing in foundational technologies needed to sustain the sensible use of data, analytics, and machine learning. (You can find full results from the survey in the free report “Evolving Data Infrastructure”.)

This survey drew from more than 3,200 respondents, including more than 1,000 respondents from Western and Eastern Europe. In this post, I’ll describe some of the key areas of interest and concern highlighted by respondents from Europe, while describing how some of these topics will be covered at the upcoming Strata Data conference in London (April 29 - May 2, 2019).

As interest in machine learning (ML) and AI grows, organizations are realizing that model building is but one aspect they need to plan for. Given the end-to-end nature of many data products and applications, sustaining ML and AI requires a host of tools and processes: collecting, cleaning, and harmonizing data; understanding what data is available and who has access to it; tracing changes made to data as it travels across a pipeline; and many other components. Our survey showed that companies are beginning to build some of the foundational pieces needed to sustain ML and AI within their organizations:

Solutions, including those for data governance, data lineage management, data integration and ETL, need to integrate with existing big data technologies used within companies. To that end, we also asked respondents what technologies (open source, managed services) they use for things like data storage, data management, and data processing. For example, the chart below lists popular (batch and streaming) data processing tools used by respondents based in Europe:

Many of the systems listed in the previous chart—Apache Spark, Kafka, Hadoop, etc.—have been in use at enterprises across the globe for several years. One of the newer systems is Apache Pulsar, a promising messaging system that unifies queuing and streaming. Pulsar will be covered in a popular new tutorial at Strata Data London, “Architecture and Algorithms for End-to-End Streaming Data Processing”. More importantly, there will be many sessions on the foundational technologies needed for machine learning and AI:

Our survey also aligned with recent articles describing the strong demand for data scientists. As noted above, ML and AI involve more than model building. Just as one needs a suite of technologies to sustain success in ML and AI, one also needs a team with a broad range of skills that go beyond model building. Not only is ML quite different from traditional software engineering; as noted in a previous post, ML is changing the nature of software development itself. The chart below lists demand for data-related skills in Europe:

The data science and machine learning program for Strata Data London will cover tools and methodologies, case studies and best practices, deep dives into familiar data types (text, temporal data, graphs), and new automation tools for data and machine learning professionals:

At the 2018 Strata Data London, data privacy and GDPR were big topics. In fact, our 2018 conference happened the same week GDPR came online. A year later, companies are still navigating through GDPR while also preparing for a new set of regulations (including the California Consumer Privacy Act). At this year’s conference, we will continue to have tutorials and sessions on data privacy and data security, but we will also have sessions on techniques and tools for privacy-preserving analytics—the very tools needed to build analytic and AI products that respect user privacy:

We are beginning to see interesting industrial IoT applications and systems. There’s good reason to expect that streaming and real-time applications will explode in the years to come. Tools and infrastructure for collecting streaming data have improved and continue to get easier to use. 5G mobile services are just around the corner and will pave the way for many new machine-to-machine applications. Since 5G increases the network bandwidth to mobile devices, it potentially will make it much more attractive to put machine learning at the edge of the network. Coincidentally, we are also beginning to see specialized hardware for intelligence on edge devices.

The good news is that companies are beginning to build foundational technologies (described in Figure 1) that will be essential in a world where the number of machine learning models and AI applications explode. The program at the Strata Data Conference in London will cover all these areas and more:

In an upcoming survey on the use of AI technologies (report forthcoming), we found that companies consider their inability to maintain a portfolio of use cases to be a major obstacle to AI adoption. At this year’s conference, we have presentations from leading companies detailing how they have successfully deployed data and machine learning technologies in real-world settings:

Reinforcement learning for the birds

O'Reilly Radar - Mon, 2019/02/04 - 08:35

Much like human speech, bird song learning is social; perhaps we'll discover machine learning is social, too.

I just read a fascinating article about an experiment in bird psychology. We've known for a long time that bird songs aren't innate; they're learned. If you listen carefully to your backyard birds in the spring, you can hear the young birds learning their songs; you'll probably hear a few that can't get it right, and that gradually get better as summer progresses.

We've also known that bird songs (as distinct from other bird calls) are strictly a male behavior: they're part of the mating rituals. Getting its song right is an important step in a male bird's education. Female birds don't sing. They stay quiet and choose the mate whose song they like best. (For a fascinating discussion of birds, mating rituals, and aesthetics, see The Evolution of Beauty.)

The common sense understanding of how birds learn their songs has been that it's simply imitation: the baby bird tries to sing like its father. But it's not that simple. The mother plays a crucial role. She gives tiny signals to her children that show them whether they're getting the song right. It's basically a birdie "thumbs up": fluffing her feathers or twitching a wing to show that she likes the song. We haven't noticed because a bird's mental processing is much faster than ours. We're too slow to see the minute twitches that the mother birds use to signal to their offspring. They only became evident when scientists used high-speed cameras and slowed down the video.

So, my first thought (well, actually, third or fourth thought) was, how does this relate to machine learning? And I realized this is essentially reinforcement learning: the mother is rewarding her offspring as it progresses toward a better song. In this context, "better" presumably means "more attractive to female birds," and the mother trains her male offspring to go forth and find mates. Get yourself good singing lessons, and you're all set.

Neural networks are about imitation. You get your tagged training set, see how well the model can imitate the tags in the training set, and when it's good enough, you try real-world data. That's gotten us a long way, but it has limitations: it requires a lot of training data, and lengthy training through many iterations. Bird learning, like reinforcement learning, is all about rewards, and that's fundamentally different. There's relatively little training data: just a mother that gives cues about whether the child is getting it right. (And even as I write this, I think, just a mother? Even considered purely as data, a mother probably represents more data than the largest training sets imaginable.) Is that terribly surprising? Human babies don't learn to speak on their own; they're constantly getting feedback from their parents.
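
The contrast can be sketched in a few lines of plain Python. This toy example is entirely hypothetical, not from the article: a learner converges on a "song" from nothing but a binary reward signal, the way the mother bird's thumbs-up guides her offspring, with no labeled examples to imitate.

```python
import random

random.seed(0)

TARGET = [3, 1, 4, 1, 5, 9, 2, 6]  # the adult song; hidden from the learner
song = [random.randint(0, 9) for _ in TARGET]  # the chick's first attempt

def closeness(s):
    """How many notes match the adult song (known only to the 'mother')."""
    return sum(a == b for a, b in zip(s, TARGET))

def thumbs_up(old, new):
    """Binary reward: the mother signals only whether the new song is better."""
    return closeness(new) > closeness(old)

for _ in range(5000):
    trial = song[:]
    trial[random.randrange(len(trial))] = random.randint(0, 9)  # vary one note
    if thumbs_up(song, trial):  # keep only the rewarded variations
        song = trial

print(song)
```

Note how little "training data" is involved: the learner never sees the target song, only a stream of better/worse signals.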

Now we know that's how birds do it, too. And maybe that's how our machines will do it. Learning is social; perhaps we'll find out that, in the end, machine learning is also social.

Four short links: 4 February 2019

O'Reilly Radar - Mon, 2019/02/04 - 05:00

Information Theory, Event Sourcing, Sunsetting Software, and Social Perception

  1. A Mini-Introduction To Information Theory -- This article consists of a very short introduction to classical and quantum information theory. Basic properties of the classical Shannon entropy and the quantum von Neumann entropy are described, along with related concepts such as classical and quantum relative entropy, conditional entropy, and mutual information. A few more detailed topics are considered in the quantum case.
  2. Event Sourcing is Hard (Chris Kiehl) -- In practice, this manages to somehow simultaneously be both extremely coupled and yet excruciatingly opaque.
  3. Executing a Sunset (Etsy) -- In this blog post, we will explore how we sunset these products at Etsy. This process involves a host of stakeholders, including marketing, product, customer support, finance, and many other teams, but the focus of this blog post is on engineering and the actual execution of the sunset.
  4. Social Perception for Machines -- a lecture by CMU's Yaser Ajmal Sheikh. In this talk, I will describe our research arc over the past decade at CMU to make human signaling a perceptible channel of information for machines.
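On the classical side of the first item, the basic quantities are straightforward to compute directly. A small sketch (the distributions below are arbitrary examples, not from the article):

```python
import math

def shannon_entropy(probs):
    """H(X) = -sum p * log2(p), measured in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin carries exactly one bit; a biased coin carries less.
print(shannon_entropy([0.5, 0.5]))  # 1.0
print(shannon_entropy([0.9, 0.1]))  # ~0.469

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), from a joint distribution table."""
    px = [sum(row) for row in joint]               # marginal of X
    py = [sum(col) for col in zip(*joint)]         # marginal of Y
    hxy = shannon_entropy([p for row in joint for p in row])
    return shannon_entropy(px) + shannon_entropy(py) - hxy

# Perfectly correlated variables share all their entropy...
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # 1.0
# ...while independent variables share none.
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0
```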

Continue reading Four short links: 4 February 2019.

Categories: Technology

Four short links: 1 February 2019

O'Reilly Radar - Fri, 2019/02/01 - 04:55

GPU Analytics, 8-Bit Coding, Evil HCI, and CGI for Websockets

  1. AresDB -- Uber’s GPU-powered open source, real-time analytics engine.
  2. 8 Bit Workshop -- Learn how classic game hardware worked. Write code and see it run instantly. In your browser.
  3. CHI4Evil -- In this workshop, we will explore the creative use of HCI methods and concepts such as design fiction or speculative design to help anticipate and reflect on the potential downsides of our technology design, research, and implementation. Call for papers. Channel your inner Black Mirror. (via BoingBoing)
  4. websocketd -- CGI for WebSockets.

Continue reading Four short links: 1 February 2019.

Categories: Technology

Using machine learning and analytics to attract and retain employees

O'Reilly Radar - Thu, 2019/01/31 - 07:00

The O’Reilly Data Show Podcast: Maryam Jahanshahi on building tools to help improve efficiency and fairness in how companies recruit.

In this episode of the Data Show, I spoke with Maryam Jahanshahi, research scientist at TapRecruit, a startup that uses machine learning and analytics to help companies recruit more effectively. In an upcoming survey, we found that a “skills gap” or “lack of skilled people” was one of the main bottlenecks holding back adoption of AI technologies. Many companies are exploring a variety of internal and external programs to train staff on new tools and processes. The other route is to hire new talent. But recent reports suggest that demand for data professionals is strong and competition for experienced talent is fierce. Jahanshahi and her team are building natural language and statistical tools that can help companies improve their ability to attract and retain talent across many key areas.

Continue reading Using machine learning and analytics to attract and retain employees.

Categories: Technology

Four short links: 31 January 2019

O'Reilly Radar - Thu, 2019/01/31 - 04:55

Locke the Thinkfluencer, Open Source Semiconductor Manufacturing, AR/VR, and IT's Recycling Shame

  1. Cory Doctorow at Grand Reopening of the Public Domain -- Locke was a thinkfluencer. No transcript yet, but audio ripped on the Internet Archive.
  2. Libre Silicon -- We develop a free and open source semiconductor manufacturing process standard and provide a quick, easy, and inexpensive way for manufacturing. No NDAs will be required anywhere to get started, making it possible to build the designs in your basement if you wish. We are aiming to revolutionize the market by breaking through the monopoly of proprietary closed-source manufacturers.
  3. Predicting Visual Discomfort with Stereo Displays -- In a third experiment, we measured phoria and the zone of clear single binocular vision, which are clinical measurements commonly associated with correcting refractive error. Those measurements predicted susceptibility to discomfort in the first two experiments. A simple predictor of whether and when you're going to puke with an AR/VR headset would be a wonderful thing. Perception of synthetic realities is weird: a friend told me about encountering a bug in a VR renderer that made him immediately (a) fall over, and (b) puke. Core dumped?
  4. A New Circular Vision for Electronics (World Economic Forum) -- getting coverage because it says: Each year, close to 50 million tonnes of electronic and electrical waste (e-waste) are produced, equivalent in weight to all commercial aircraft ever built; only 20% is formally recycled. If nothing is done, the amount of waste will more than double by 2050, to 120 million tonnes annually. [...] That same e-waste represents a huge opportunity. The material value alone is worth $62.5 billion (€55 billion), three times more than the annual output of the world’s silver mines and more than the GDP of most countries. There is 100 times more gold in a tonne of mobile phones than in a tonne of gold ore. (via Slashdot)

Continue reading Four short links: 31 January 2019.

Categories: Technology

Four short links: 30 January 2019

O'Reilly Radar - Wed, 2019/01/30 - 11:35

No Code, Enterprise Sales, Deep-Learning the Brain, and Computer Architecture

  1. The Rise of No Code -- As creating things on the internet becomes more accessible, more people will become makers. It’s no longer limited to the <1% of engineers who can code, resulting in an explosion of ideas from all kinds of people. We see “no code” projects on Product Hunt often. This is related to my ongoing interest in Ways In Which Programmers Are Automating Themselves Out of A Job. This might be bad for some low-complexity programmers in the short term, and good for society. Or it might be that the AI Apocalypse is triggered by someone's Glitch bot achieving sentience. Watch this space!
  2. My Losing Battle with Enterprise Sales (Luke Kanies) -- All that discounting you have to do for enterprise clients? It’s because procurement’s bonus is based on how much of a discount they force you to give. Absolutely everyone knows this is how it works, and that everyone knows this, so it’s just a game. I offer my product for a huge price, you try to force a discount, and then at the end we all compare notes to see how we did relative to market. Neither of us really wants to be too far out of spec; I want to keep my average prices the same, and you just want to be sure you aren’t paying too much. Luke tells all.
  3. Decoding Words from Brain Waves -- In each study, electrodes placed directly on the brain recorded neural activity while brain-surgery patients listened to speech or read words out loud. Then, researchers tried to figure out what the patients were hearing or saying. In each case, researchers were able to convert the brain's electrical activity into at least somewhat-intelligible sound files.
  4. A New Golden Age for Computer Architecture (ACM) -- the opportunities for future improvements in speed and energy efficiency will come from (the authors predict): compiler tech and domain-specific architectures. This is a very good overview of how we got here, by way of Moore's Law, Dennard scaling, and Amdahl's Law.

Continue reading Four short links: 30 January 2019.

Categories: Technology

How companies are building sustainable AI and ML initiatives

O'Reilly Radar - Tue, 2019/01/29 - 05:00

A recent survey investigated how companies are approaching their AI and ML practices, and measured the sophistication of their efforts.

In 2017, we published “How Companies Are Putting AI to Work Through Deep Learning,” a report based on a survey we ran aiming to help leaders better understand how organizations are applying AI through deep learning. We found companies were planning to use deep learning over the next 12-18 months. In 2018, we decided to run a follow-up survey to determine whether companies’ machine learning (ML) and AI initiatives are sustainable—the results of which are in our recently published report, “Evolving Data Infrastructure.”

The current generation of AI and ML methods and technologies rely on large amounts of data—specifically, labeled training data. In order to have a longstanding AI and ML practice, companies need to have data infrastructure in place to collect, transform, store, and manage data. On one hand, we wanted to see whether companies were building out key components. On the other hand, we wanted to measure the sophistication of their use of these components. In other words, could we see a roadmap for transitioning from legacy cases (perhaps some business intelligence) toward data science practices, and from there into the tooling required for more substantial AI adoption?

Here are some notable findings from the survey:

  • Companies are serious about machine learning and AI. Fifty-eight percent of respondents indicated that they were either building or evaluating data science platform solutions. Data science (or machine learning) platforms are essential for companies that are keen on growing their data science teams and machine learning capabilities.
  • Companies are building or evaluating solutions in foundational technologies needed to sustain success in analytics and AI. These include data integration and extract, transform, and load (ETL) (60% of respondents indicated they were building or evaluating solutions), data preparation and cleaning (52%), data governance (31%), metadata analysis and management (28%), and data lineage management (21%).
  • Data scientists and data engineers are in demand. When asked which were the main skills related to data that their teams needed to strengthen, 44% chose data science and 41% chose data engineering.
  • Companies are building data infrastructure in the cloud. Eighty-five percent indicated that they had data infrastructure in at least one of the seven cloud providers we listed, with nearly two-thirds (63%) using Amazon Web Services (AWS) for some portion of their data infrastructure. We found that users of AWS, Microsoft Azure, and Google Cloud Platform (GCP) tended to use multiple cloud providers.

Continue reading How companies are building sustainable AI and ML initiatives.

Categories: Technology

Four short links: 29 January 2019

O'Reilly Radar - Tue, 2019/01/29 - 04:50

Git Tool, Linear Algebra, Steganography, and WebAssembly

  1. git-absorb -- git commit --fixup, but automatic.
  2. Coding the Matrix -- linear algebra was where math broke me at university, so my eyes are always drawn to presentations of the subject that promise relevance and comprehensibility. (via Academic Torrents)
  3. A List of Useful Steganography Tools and Resources -- what it says on the box.
  4. Analyzing the Performance of WebAssembly vs. Native Code -- Across the SPEC CPU suite of benchmarks, we find a substantial performance gap: applications compiled to WebAssembly run slower by an average of 50% (Firefox) to 89% (Chrome), with peak slowdowns of 2.6x (Firefox) and 3.14x (Chrome). We identify the causes of this performance degradation, some of which are due to missing optimizations and code generation issues, while others are inherent to the WebAssembly platform.

Continue reading Four short links: 29 January 2019.

Categories: Technology

Four short links: 28 January 2019

O'Reilly Radar - Mon, 2019/01/28 - 06:15

Medical AI, Opinion Mapping, Voting-Free Democracy, and a Typed Graph Database

  1. AI Helps Amputees Walk With a Robotic Knee (IEEE) -- Normally, human technicians spend hours working with amputees to manually adjust robotic limbs to work well with each person’s style of walking. By comparison, the reinforcement learning technique automatically tuned a robotic knee, enabling the prosthetic wearers to walk smoothly on level ground within 10 minutes.
  2. Penelope -- a cloud-based, open, and modular platform that consists of tools and techniques for mapping landscapes of opinions expressed in online (social) media. The platform is used for analyzing the opinions that dominate the debate on certain crucial social issues, such as immigration, climate change, and national identity. Penelope is part of the H2020 EU project ODYCCEUS (Opinion Dynamics and Cultural Conflict in European Spaces).
  3. What MMOs Can Teach Us About Real-Life Politics -- Larry Lessig is designing the political mechanics for a videogame, and this interview is very intriguing. Lessig is also interested in possibly implementing an in-game process in which democracy doesn’t depend on voting: “I’m eager to experiment or enable the experimentation of systems that don’t need to be tied so much to election.” (via BoingBoing)
  4. The AtomSpace: a Typed Graphical Distributed in-RAM Knowledgebase (OpenCog) -- Here’s my sales pitch: you want a graph database with a sophisticated type system built into it. Maybe you don’t know this yet. But you do. You will. You’ll have trouble doing anything reasonable with your knowledge (like reasoning, inferencing, and learning) if you don’t. This is why the OpenCog AtomSpace is a graph database, with types.

Continue reading Four short links: 28 January 2019.

Categories: Technology

Rethinking informed consent

O'Reilly Radar - Mon, 2019/01/28 - 05:00

Consent is the first step toward the ethical use of data, but it's not the last.

Informed consent is part of the bedrock of data ethics. DJ Patil, Hilary Mason, and I have written about it, as have many others. It's rightfully part of every code of data ethics I've seen. But I have to admit misgivings—not so much about the need for consent, but about what it means. Obtaining consent to collect and use data isn't the end of the process; at best, it's the beginning, and perhaps not a very good one.

Helen Nissenbaum, in an interview with Scott Berinato, articulates some of the problems. It's easy to talk about informed consent, but what do we mean by "informed"? Almost everyone who reads this article has consented to some kind of medical procedure; did any of us have a real understanding of what the procedure was and what the risks were? We rely on the prestige or status of the doctor, but unless we're medical professionals, or have done significant online research, we have, at best, a vague notion of what's going to happen and what the risks are. In medicine, for the most part, things come out all right. The problems with consent to data collection are much deeper.

The problem starts with the origin of the consent criterion. It comes from medicine and the social sciences, in which consenting to data collection and to being a research subject has a substantial history. It arose out of experiments with mind-boggling ethical problems (for example, the Tuskegee syphilis experiment), and it still isn't always observed (paternalism is still a thing). "Consent" in medicine is limited: whether or not you understand what you're consenting to, you are consenting to a single procedure (plus emergency measures if things go badly wrong). The doctor can't come back and do a second operation without further consent. And likewise, "consent" in the social sciences is limited to a single study: you become a single point in an array of data that ceases to exist when the study is complete.

That may have been true years ago, but those limitations on how consent is used seem very shaky, as Nissenbaum argues. Consent is fundamentally an assurance about context: consenting to a medical procedure means the doctors do their stuff, and that's it. The outcome might not be what you want, but you've agreed to take the risk. But what about the insurance companies? They get the data, and they can repackage and exchange it. What happens when, a few years down the road, you're denied coverage because of a "pre-existing condition"? That data has moved beyond the bounds of an operating room. What happens when data from an online survey or social media profile is shared with another organization and combined and re-combined with other data? When it is used in other contexts, can it be de-anonymized and used to harm the participants? That single point in an array of data has now become a constellation of points feeding many experiments, not all of which are benign.

I'm haunted by the question, "what are users consenting to?" Technologists rarely think through the consequences of their work carefully enough; but even if they did, there will always be consequences that can't be foreseen or understood, particularly when data from different sources is combined. So, consenting to data collection, whether it's clicking on the ever-present checkbox about cookies or agreeing to Facebook's license agreement, is significantly different from agreeing to surgery. We really don't know how that data is used, or might be used, or could be used in the future. To use Nissenbaum's language, we don't know where data will flow, nor can we predict the contexts in which it will be used.

Consent frequently isn't optional, but compelled. Writing about the #DeleteFacebook movement, Jillian York argues that for many, deleting Facebook is not an option: "for people with marginalized identities, chronic illnesses, or families spread across the world, walking away [from Facebook] means leaving behind a potentially vital safety net of support." She continues by writing that small businesses, media outlets, artists, and activists rely on it to reach audiences. While no one is compelled to sign up, or to remain a user, for many "deleting Facebook" means becoming a non-entity. If Facebook is your only way to communicate with friends, relatives, and support communities, refusing "consent" may not be an option; consent is effectively compelled. The ability to withdraw consent from Facebook is a sign of privilege. If you lack privilege, an untrustworthy tool may be better than no tool at all.

One alternative to consent is the idea that you own the data and should be compensated for its use. Eric Posner, Glen Weyl, and others have made this argument, which essentially substitutes a market economy for consent: if you pay me enough, I'll let you use my data. However, markets don’t solve many problems. In "It's time for a bill of data rights," Martin Tisne argues that data ownership is inadequate. When everything you do creates data, it's no more meaningful to own your "digital shadow" than your physical one. How do you "own" your demographic profile? Do you even "own" your medical record? Tisne writes: "A person doesn’t 'own' the fact that she has diabetes—but she can have the right not to be discriminated against because of it... But absent government regulation to prevent health insurance companies from using data about preexisting conditions, individual consumers lack the ability to withhold consent. ... Consent, to put it bluntly, does not work." And it doesn't work whether or not consent is mediated by a market. At best, the market may give some incremental income, but at worst, it gives users incentives to act against their best interest.

It's also easy to forget that in many situations, users are compensated for their data: we're compensated by the services that Facebook, Twitter, Google, and Amazon provide. And that compensation is significant; how many of us could do our jobs without Google? The economic value of those services to me is large, and the value of my data is actually quite limited. To Google, the dozens of Google searches I do in a day are worth a few cents at most. Google's market valuation doesn't derive from the value of my data or yours in isolation, but the added value that comes from aggregating data across billions of searches and other sources. Who owns that added value? Not me. An economic model for consent (I consent to let you use my data if you pay me) misses the point: data’s value doesn’t live with the individual.

It would be tragic to abandon consent, though I agree with Nissenbaum that we urgently need to get beyond "incremental improvement to consent mechanisms." It is time to recognize that consent has serious limitations, due partly to its academic and historical origins. It's important to gain consent for participation in an experiment; otherwise, the subject isn't a participant but a victim. However, while understanding the consequences of any action has never been easy, the consent criterion arose when consequences were far more limited and data didn't spread at the speed of light.

So, the question is: how do we get beyond consent? What kinds of controls can we place on the collection and use of data that align better with the problems we're facing? Tisne suggests a "data bill of rights": a set of general legal principles about how data can be used. The GDPR is a step in this direction; the Montreal Declaration for the Responsible Development of Artificial Intelligence could be reformulated as a "bill of data rights." But a data bill of rights assumes a new legal infrastructure, and by nature such infrastructures place the burden of redress on the user. Would one bring a legal action against Facebook or Google for violation of one's data rights? Europe's enforcement of GDPR will provide an important test case, particularly since this case is essentially about data flows and contexts. It isn't clear that our current legal institutions can keep pace with the many flows and contexts in which data travels.

Nissenbaum starts from the knowledge that data moves, and that the important questions aren't around how our data is used, but where our data travels. This shift in perspective is important precisely because data sets become more powerful when they're combined; because it isn't possible to anticipate all the ways data might be used; and because once data has started flowing, it's very hard to stop it. But we have to admit we don't yet know how to ask for consent about data flows or how to prove they are under control. Which data flows should be allowed? Which shouldn't? We want to enable medical research on large aggregated data sets without jeopardizing the insurance coverage of the people whose data are in those sets. Data would need to carry metadata with it that describes where it could be transferred and how it could be used once it's transferred; it makes no sense to talk about controlling data flows if that control can't be automated.
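What such machine-readable metadata might look like can be sketched in a few lines. The names here (DatasetRecord, allowed_contexts) and the example contexts are hypothetical, invented for illustration rather than drawn from any existing model-card or datasheet format:

```python
from dataclasses import dataclass, field

@dataclass
class DatasetRecord:
    """A data set that carries metadata about where it may flow."""
    source: str
    allowed_contexts: set = field(default_factory=set)

def authorize_transfer(record: DatasetRecord, destination_context: str) -> bool:
    """An automated gate: a transfer proceeds only if the destination
    context is one the metadata explicitly permits."""
    return destination_context in record.allowed_contexts

survey = DatasetRecord(
    source="patient survey",
    allowed_contexts={"aggregate medical research"},
)
print(authorize_transfer(survey, "aggregate medical research"))  # True
print(authorize_transfer(survey, "insurance underwriting"))      # False
```

The point of the sketch is the default: a flow not explicitly permitted is refused automatically, without a human in the loop for every transfer.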

As Ben Lorica and I have argued, the only way forward is through more automation, not less; issues of scale won't let us have it any other way. In a conversation, Andrew Zaldivar told me of his work with Margaret Mitchell, Timnit Gebru, and others, on model cards that describe the behavior of a machine learning model, and of Timnit Gebru's work on Datasheets for Datasets, which specify how a data set was collected, how it is intended to be used, and other information. Model cards and data set datasheets are a step toward the kind of metadata we'd need to automate control over data flows, to build automated tools that manage where data can and can't travel, to protect public goods as well as personal privacy. In the past year, we’ve seen how easy it is to be overly optimistic about tool building, but we are all already using data at the scale of Google and Facebook. There will need to be human systems that override automatic control over data flows, but automation is an essential ingredient.

Consent is the first step along the path toward ethical use of data, but not the last one. What is the next step?

Continue reading Rethinking informed consent.

Categories: Technology

Four short links: 25 January 2019

O'Reilly Radar - Fri, 2019/01/25 - 05:25

IT Failures, Paradigms, AI Governance, Quantum Hokum

  1. Biggest IT Failures of 2018 (IEEE) -- a coding error with the spot-welding robots at Subaru’s Indiana Automotive plant in Lafayette, Ind., meant 293 of its new Subaru Ascents had to be sent to the car crusher. A similar problem is suspected as the reason behind the welding problems affecting the steering on Fiat Chrysler Jeep Wranglers. This is not the "crushing it" that brogrammers intended.
  2. Programming Paradigms for Dummies: What Every Programmer Should Know -- This chapter gives an introduction to all the main programming paradigms, their underlying concepts, and the relationships between them. We give a broad view to help programmers choose the right concepts they need to solve the problems at hand. We give a taxonomy of almost 30 useful programming paradigms and how they are related. Most of them differ only in one or a few concepts, but this can make a world of difference in programming. (via Adrian Colyer)
  3. Proposed Model Governance -- Singapore Government's work on regulating AI.
  4. Talent Shortage in Quantum Computing (MIT) -- an argument that we need special training for quantum computing, as it's a mix of engineering and science at this stage in its evolution. This chap would disagree, colorfully: when a subject which claims to be a technology, which lacks even the rudiments of experiment that may one day make it into a technology, you can know with absolute certainty that this "technology" is total nonsense. That was the politest quote I could make.

Continue reading Four short links: 25 January 2019.

Categories: Technology

Four short links: 24 January 2019

O'Reilly Radar - Thu, 2019/01/24 - 04:55

Computational Periscopy, Automating Data Structures, Multi-Stream Processing, and Open Source Bioinstruments

  1. Computational Periscopy with an Ordinary Camera (Nature) -- Here we introduce a two-dimensional computational periscopy technique that requires only a single photograph captured with an ordinary digital camera. Our technique recovers the position of an opaque object and the scene behind (but not completely obscured by) the object, when both the object and scene are outside the line of sight of the camera, without requiring controlled or time-varying illumination. Such recovery is based on the visible penumbra of the opaque object having a linear dependence on the hidden scene that can be modeled through ray optics. Computation and vision, whether deep learning or this kind of mathematical witchcraft, have brought about an age of truly amazing advances. Digital cameras are going to make film cameras look like pinhole cameras because the digital feature set will be staggering. (All requiring computational power, on- or off-device.)
  2. The Data Calculator: Data Structure Design and Cost Synthesis From First Principles, and Learned Cost Models -- We present a design engine, the Data Calculator, which enables interactive and semi-automated design of data structures. It brings two innovations. First, it offers a set of fine-grained design primitives that capture the first principles of data layout design: how data structure nodes lay out data, and how they are positioned relative to each other. This allows for a structured description of the universe of possible data structure designs that can be synthesized as combinations of those primitives. The second innovation is computation of performance using learned cost models. I'm always interested in augmentation for programmers. (via Adrian Colyer)
  3. Confluo (Berkeley) -- open source system for real-time distributed analysis of multiple data streams. Confluo simultaneously supports high throughput concurrent writes, online queries at millisecond timescales, and CPU-efficient ad hoc queries via a combination of data structures carefully designed for the specialized case of multiple data streams, and an end-to-end optimized system design. The home page has more information. Designing for multiple data streams is an interesting architectural choice. Any interesting business will track multiple data streams, but will they do that in one system or bolt together multiple?
  4. Open-Sourcing Bioinstruments -- story of the poseidon syringe pump system, which has free hardware designs and software.
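The linear dependence in the first item means recovery is, at heart, a linear inverse problem: if the penumbra y depends linearly on the hidden scene x (y ≈ Ax), the scene can be estimated by least squares. A toy sketch, with a random matrix standing in for the real ray-optics transport model derived in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hidden scene: what the camera cannot see directly (8 unknowns).
x_true = rng.random(8)

# Stand-in transport matrix: in the paper this comes from ray optics and
# the geometry of the occluding object, not from random numbers.
A = rng.random((32, 8))

# Observed penumbra: a linear function of the hidden scene, plus noise.
y = A @ x_true + 0.001 * rng.standard_normal(32)

# Least-squares recovery of the hidden scene from the single photograph.
x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)

print(np.allclose(x_hat, x_true, atol=0.05))  # True
```

With 32 measurements for 8 unknowns and low noise, the reconstruction is nearly exact; the hard part the paper solves is deriving a usable A from an ordinary photograph.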

Continue reading Four short links: 24 January 2019.

Categories: Technology

