Feed aggregator

Four short links: 12 August 2019

O'Reilly Radar - Mon, 2019/08/12 - 04:00

Retro Hacking, Explaining AI, Teacher Ratings, and Algorithmic Bias

  1. First Person Adventure via Mario Maker (Vice) -- the remarkable “3D Maze House (P59-698-55G)” by creator ねぎちん somehow manages to credibly re-create the experience of playing a first-person (!!) adventure game like Wizardry, something Nintendo clearly never intended.
  2. Measurable Counterfactual Local Explanations for Any Classifier -- generates w-counterfactual explanations that state minimum changes necessary to flip a prediction’s classification [and ...] builds local regression models, using the w-counterfactuals to measure and improve the fidelity of its regressions. Making AI "explain itself" is useful and hard; this seems like an interesting step forward.
  3. Student Evaluation of Teaching Ratings and Student Learning are Not Related (Science Direct) -- Students do not learn more from professors with higher student evaluation of teaching (SET) ratings. [...] New meta-analyses of multisection studies show that SET ratings are unrelated to student learning. (via Sciblogs)
  4. Apparent Gender-Based Discrimination in the Display of STEM Career Ads -- women disproportionately click on job ads, so bidding algorithms charge advertisers more to show ads to women, so men see more job ads. (via Ethan Mollick)
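
To make the counterfactual idea in item 2 concrete, here is a minimal sketch for the simplest possible case, a linear classifier. It is not the paper's method, which targets any classifier; the function names and the example weights are assumptions chosen for illustration. For a linear score, the smallest (L2) change that flips the prediction moves the input straight along the weight vector until it just crosses the decision boundary.

```python
def predict(w, b, x):
    """Return 1 if the linear score w.x + b is positive, else 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else 0


def counterfactual(w, b, x, margin=1e-6):
    """Smallest L2 change to x that flips a linear classifier's prediction.

    Illustrative only: assumes a linear model with a nonzero weight vector.
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm_sq = sum(wi * wi for wi in w)
    # Aim for a score just on the other side of the decision boundary.
    target = -margin if score > 0 else margin
    # Moving x by t * w changes the score by t * norm_sq.
    t = (target - score) / norm_sq
    return [xi + t * wi for wi, xi in zip(w, x)]


# Hypothetical weights and input, chosen only for the example.
w, b = [2.0, -1.0], -1.0
x = [3.0, 1.0]
x_cf = counterfactual(w, b, x)
print(predict(w, b, x), predict(w, b, x_cf))
```

The difference between `x_cf` and `x` is the "explanation": it states how far, and in which direction, each feature would have to move for the decision to change.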

Continue reading Four short links: 12 August 2019.

Categories: Technology

Four short links: 9 August 2019

O'Reilly Radar - Fri, 2019/08/09 - 04:05

Shadow Ban Patent, Abusing Unix Tools, Deblurring Photos, and Postal Vectors

  1. Facebook Patents Shadow Banning -- which has a long history elsewhere.
  2. Living Off The Land in Linux -- legitimate functions of Unix binaries that can be abused to break out restricted shells, escalate or maintain elevated privileges, transfer files, spawn bind and reverse shells, and facilitate the other post-exploitation tasks. Interesting to see the surprising functionality built into some utilities.
  3. Neural Blind Deconvolution Using Deep Priors -- deblurring photos with neural nets. Very cool, and they've posted code. (via @roadrunning01)
  4. Warshipping (TechCrunch) -- I mail you a package that contains a Wi-Fi sniffer with cellular connection back to me. It ships me your Wi-Fi handshake, I crack it, ship it back, now it joins your network and the game is afoot. (via BoingBoing)

Continue reading Four short links: 9 August 2019.

Categories: Technology

PLUG Security meeting topic for Aug 15th

PLUG - Thu, 2019/08/08 - 09:45

Gavin Klondike: Machine Learning for Security Analysts

Today, over a quarter of security products for detection have some form of machine learning built in. However, “machine learning” is nothing more than a mysterious buzzword for many security analysts. In order to properly deploy and manage these products, analysts will need to understand how the machine learning components operate to ensure they are working efficiently. In this talk, we will dive head first into building and training our own machine learning models using the 7-step machine learning process.

Gavin is a senior consultant and researcher who has a passion for network security, both attack and defense. Through that passion, he runs NetSec Explained, a blog and YouTube channel that covers intermediate and advanced network security topics in an easy-to-understand way. His work has given him the opportunity to be published in industry magazines and to speak at conferences such as Defcon and CactusCon. Currently, he is researching ways to address the cybersecurity skills gap by using machine learning to augment the capabilities of current security analysts.

Got speech? These guidelines will help you get started building voice applications

O'Reilly Radar - Thu, 2019/08/08 - 07:45

Speech adds another level of complexity to AI applications—today’s voice applications provide a very early glimpse of what is to come.

As companies begin to explore AI technologies, three areas in particular are garnering a lot of attention: computer vision, natural language applications, and speech technologies. A recent report from the World Intellectual Property Organization (WIPO) found that together these three areas accounted for a majority of patents related to AI: computer vision (49% of all patents), natural language processing (NLP) (14%), and speech (13%).

Figure 1. A 2019 WIPO Study shows patent publications in a few key areas. Image by Ben Lorica.

Companies are awash with unstructured and semi-structured text, and many organizations already have some experience with NLP and text analytics. While fewer companies have infrastructure for collecting and storing images or video, computer vision is an area that many companies are beginning to explore. The rise of deep learning and other techniques has led to startups commercializing computer vision applications in security and compliance, media and advertising, and content creation.

Companies are also exploring speech and voice applications. Recent progress in natural language and speech models has increased accuracy and opened up new applications. Contact centers, sales and customer support, and personal assistants lead the way in enterprise speech applications. Voice search, smart speakers, and digital assistants are increasingly prevalent on the consumer side. While far from perfect, the current generation of speech and voice applications works well enough to drive an explosion in voice applications. An early sign of the potential of speech technologies is the growth of voice-driven searches: Comscore estimates that by 2020 about half of all online searches will use voice; Gartner recommends that companies redesign their websites to support both visual and voice search. Additionally, smart speakers are projected to grow by more than 82% from 2018 to 2019, and by the end of the year, the installed base for such devices will exceed 200 million.

Figure 2. Types of voice interactions. Image source: Yishay Carmiel and Ben Lorica.

Audio content is also exploding, and this new content will need to be searched, mined, and unlocked using speech technologies. For example, according to a recent New York Times article, in the US, “nearly one out of three people listen to at least one podcast every month.” The growth in podcasts isn't limited to the US: podcasts are growing in other parts of the world, including China.

Voice and conversational applications can be challenging

Unlike text and NLP, or computer vision, where one can pull together simple applications, voice applications—that venture beyond simple voice commands—remain challenging for many organizations. Spoken language tends to be “noisier” than written text. For example, having read many podcast transcripts, we can attest that transcripts from spoken conversations still require a lot of editing. Even if you have access to the best transcription (speech-to-text) technology available, you often end up with a document with sentences that contain pauses, fillers, restarts, interjections (in the case of conversations), and ungrammatical constructs. The transcript may also contain passages that need to be refined due to the possibility that someone is “thinking out loud” or had trouble articulating or formulating specific points. Also, the resulting transcript may not be properly punctuated or capitalized in the right places. Thus, in many applications, post-processing of transcripts will require human editors.
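
As a rough illustration of the kind of automated cleanup that can precede human editing, the sketch below strips a few fillers and collapses immediate word repetitions. The filler list and the function name are assumptions made for this example, and the rules are deliberately naive: they do not restore punctuation and would wrongly collapse legitimate repeats such as "had had."

```python
import re

# Hypothetical filler list; a real application would tune this per domain.
FILLERS = ("um", "uh", "er", "you know")


def tidy_transcript(text):
    """Apply a few naive cleanup rules to a raw speech-to-text transcript."""
    # Drop filler words and phrases, plus a trailing comma if one follows.
    pattern = r"\b(?:" + "|".join(re.escape(f) for f in FILLERS) + r")\b,?\s*"
    text = re.sub(pattern, "", text, flags=re.IGNORECASE)
    # Collapse immediate word repetitions ("we we" becomes "we").
    text = re.sub(r"\b(\w+)(\s+\1\b)+", r"\1", text, flags=re.IGNORECASE)
    # Normalize whitespace and capitalize the first character.
    text = re.sub(r"\s+", " ", text).strip()
    return text[:1].upper() + text[1:]


raw = "um so we we tested the uh model you know on real data"
print(tidy_transcript(raw))
```

Rules like these only handle the easiest cases; passages where a speaker is thinking out loud or restarting a thought still call for an editor's judgment.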

In computer vision (and now in NLP), we are at a stage where data has become at least as important as algorithms. Specifically, pre-trained models have achieved state-of-the-art results on several tasks in computer vision and NLP. What about speech? There are a few reasons why a “one size fits all” speech model hasn’t materialized:

  • There are a variety of acoustic environments and background noises: indoor or outdoor, in a car, in a warehouse, or in a home, etc.
  • Multiple languages (English, Spanish, Mandarin, etc.) may need to be supported, particularly in situations where speakers use (or mix and match) several languages in the course of conversations.
  • The type of application (search, personal assistant, etc.) impacts dialog flow and vocabulary.
  • Depending on the level of sophistication of an application, language models and vocabulary will need to be tuned for specific domains and topics. This is also true for text and natural language applications.

Building voice applications

Challenges notwithstanding, there is already considerable excitement surrounding speech technologies and voice applications, as we noted. We haven’t reached the stage where a general-purpose solution can be used to power a wide variety of voice applications, nor do we have voice-enabled intelligent assistants that can handle multiple domains.

There are, however, good building blocks from which one can assemble interesting voice applications. To assist companies that are exploring speech technologies, we assembled the following guidelines:

  • Narrow your focus. As we noted, “one size fits all” is not possible with the current generation of speech technologies, so it is best to focus on specific tasks, languages, and domains.
  • Understand the goal of the application, then backtrack to the types of techniques that will be needed. If you know the KPIs for your application, this will let you target the language models needed to achieve those metrics for the specific application domain.
  • Experiment with “real data and real scenarios.” If you plan to get started with off-the-shelf models and services, test them on the data and interactions you expect in production. In many cases, your initial test data will not be representative of how users will interact with the system you hope to deploy.
  • Acquire labeled examples for each specific task. For example, recognizing the word “cat” in English and “cat” in Mandarin will require different models and different labeled data.
  • Develop a data-acquisition strategy to gather appropriate data. Make sure you build a system that can learn as it gathers more data, and an iterative process that fosters ongoing improvement.
  • Users of speech applications are concerned about outcomes. Speech models are only as interesting as the insights that can be derived and the actions that are taken using those insights. For example, if a user asks a smart speaker to play a specific song, the only thing that matters to this user is that it plays that exact song.

Figure 3. Models should be used to derive insights. Image source: Yishay Carmiel and Ben Lorica.

  • Automate workflows. Ideally, the needed lexicon and speech models can be updated without much intervention (from machine learning or speech technology experts).
  • Voice applications are complex end-to-end systems, so optimize when possible. Speech recognition systems alone are composed of several building blocks, which we described in a previous post. Training and retraining models can be expensive. Depending on the application and setup, latency and continuous connectivity can be important considerations.

From NLU to SLU

We are still in the early stages for voice applications in the enterprise. The past 12 months have seen rapid progress in pre-trained natural language models that set records across multiple NLP benchmarks. Developers are beginning to take these language models and tune them for specific domains and applications.

Speech adds another level of complexity—beyond natural language understanding (NLU)—to AI applications. Spoken language understanding (SLU) requires the ability to extract meaning from speech utterances. While SLU is not yet on hand for voice or speech applications, the good news is that one can already build simple, narrowly focused voice applications using existing models. To find the right use cases, companies will need to understand the limitations of current technologies and algorithms.

In the meantime, we'll proceed in stages. As Alan Nichol noted in a post focused on text-based applications, “Chatbots are just the first step in the journey to achieve true AI assistants and autonomous organizations.” In the same way, today’s voice applications provide a very early glimpse of what is to come.

Continue reading Got speech? These guidelines will help you get started building voice applications.

Categories: Technology

Four short links: 8 August 2019

O'Reilly Radar - Thu, 2019/08/08 - 04:00

Counterfeit Security, Poses in Art, Content Moderation, and iPhone Remote Attack Surface

  1. From The Depths Of Counterfeit Smartphones -- security look at the counterfeit phones. Spoiler: they're nasty, stay away. Both the Galaxy S10 and iPhone 6 counterfeits we assessed contained malware and rootkits. And that's the most straightforward nastiness: even if you removed the rootkit they'd still be shocking. In the case of the "iPhone," further digging revealed that it runs a far older version of Android: Kitkat 4.4.0. Kitkat’s last update came in 2014.
  2. Linking Art through Human Poses -- arXiv paper that finds artwork with matching poses using OpenPose. (via MIT TR)
  3. A Framework for Content Moderation (Ben Thompson) -- pretty good post, tackling why and where the different levels of moderation make sense.
  4. Fully Remote Attack Surface of the iPhone (Google Project Zero) -- very interesting read, showing the detail and dead ends of a security tester. The method [...] processes incoming MIME messages, and sends them to specific decoders based on the MIME type. Unfortunately, the implementation did this by appending the MIME type string from an incoming message to the string "decode" and calling the resulting method. This meant that an unintended selector could be called, leading to memory corruption.
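
The bug class described in item 4 can be sketched in Python as a loose analogy. The original code is Objective-C and dispatches on selectors, so the class, method names, and allow-list below are hypothetical and only illustrate the pattern: building a method name from untrusted input lets a sender reach handlers that were never meant to be exposed, while an explicit allow-list does not.

```python
import json


class MessageDecoder:
    """Toy decoder used to contrast name-based and allow-list dispatch."""

    # Explicit mapping from accepted type suffixes to handler names.
    ALLOWED = {"text": "decode_text", "json": "decode_json"}

    def decode_text(self, body):
        return body.decode("utf-8")

    def decode_json(self, body):
        return json.loads(body)

    def decode_internal_state(self, body):
        # Not meant to be reachable from message input.
        return "internal"

    def unsafe_dispatch(self, type_suffix, body):
        # Mirrors the flaw: the method name is built from sender-controlled text.
        return getattr(self, "decode_" + type_suffix)(body)

    def safe_dispatch(self, type_suffix, body):
        # Only handlers named in the allow-list can ever be called.
        name = self.ALLOWED.get(type_suffix)
        if name is None:
            raise ValueError("unsupported type: " + repr(type_suffix))
        return getattr(self, name)(body)


decoder = MessageDecoder()
print(decoder.unsafe_dispatch("internal_state", b""))  # reaches an unintended method
print(decoder.safe_dispatch("json", b'{"a": 1}'))
```

In Python the unintended call is merely surprising; in the case described above, calling an unintended selector led to memory corruption.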

Continue reading Four short links: 8 August 2019.

Categories: Technology

New live online training courses

O'Reilly Radar - Wed, 2019/08/07 - 05:15

Get hands-on training in deep learning, AI applications, business strategies, Python, data analysis, and many other topics.

Learn new topics and refine your skills with more than 180 new live online training courses we opened up for August and September on the O'Reilly online learning platform.

AI and machine learning

Introduction to AI on Google Cloud, August 28

Robotics 2.0, August 28

Debugging and Optimizing Convolutional Neural Networks with Keras, September 5

Fundamentals of Machine Learning and Data Analytics, September 5-6

Fundamentals of Machine Learning with AWS, September 10

Introduction to Reinforcement Learning, September 11

Probabilistic Modeling with TensorFlow Probability, September 16

Graphs and Network Algorithms from Scratch, September 16

Artificial intelligence: An Overview of AI and Machine Learning, September 17

Machine Learning from Scratch, September 18

Deep Learning Fundamentals, September 18

TensorFlow Extended: Model Build, Analysis, and Serving, September 19

Hands-On Deep Neural Networks with PyTorch, September 19

Getting Started with Machine Learning, September 23

Getting Started with TensorFlow.js, September 23

Building Machine Learning Models with AWS Sagemaker, September 23

Automated Machine Learning for Hyperparameter Optimization and Algorithm Selection, September 24

Deep Learning with TensorFlow, September 25

Deep Learning from Scratch, September 30

Machine Learning for Business Analytics: A Deep Dive into Data with Python, October 3

Deploying Machine Learning Models to Production: A Toolkit for Real-World Success, October 15-16

Real-Time Streaming Analytics and Algorithms for AI Applications, October 16


Introducing Blockchain, September 23


Unlocking Agility: 7 Signs Your Company is Becoming More Agile, August 12

Product Management in 90 minutes, August 15

60 Minutes to Better Email, August 19

60 Minutes to a Better Daily Scrum, August 26

Building the Skills to Succeed as a Remote Worker in 90 Minutes, August 27

60 Minute Introduction to Hypothesis-Driven Software Development, August 28

Fundamentals of Financial Decision-Making, September 3

Building a Case to Start Working Remotely in 90 Minutes, September 3

Incident Management, September 3

Introduction to Quantitative Financial Risk Management with R, September 6

Fundamentals of Management, September 12

Quantitative Trading with Python, September 16

60 Minutes to Better Product Metrics, September 16

Better Business Writing, September 17

Unlock Your Potential, September 17

Competing with Business Strategy, September 19

Giving a Powerful Presentation, September 19

Building Intelligent Analytics Through Time Series Data, September 20

How the Internet Really Works, September 20

Performance goals for growth, September 24

Introduction to Critical Thinking, September 24

Spotlight on Learning from Failure: Setting Objectives Matters with Josh Seiden, September 24

Negotiation Fundamentals, September 25

Emotional Intelligence in the Workplace, September 25

Managing a Toxic Work Environment, September 25

Introduction to Time Management Skills, September 26

90 Minutes to Better Decision-Making, September 27

Getting Unstuck, October 1

Core Agile, October 2

Fundamentals of Learning: Learn Faster and Better Using Neuroscience, October 3

Building the Courage to Take Risks, October 3

How to Give Great Presentations, October 8

How to Be a Better Mentor, October 8

Managing Stress, October 8

Introduction to Leadership Skills, October 10

Empathy at Work, October 15

Managing Team Conflict, October 15

Product Management in 90 minutes, October 15

Business Fundamentals, October 15

Building Your LinkedIn Network, October 16

60 Minutes to Better User Stories and Backlog Management, October 16

Unlock Your Potential, October 17

60 Minutes to Better Email, October 17

Data science and data tools

Data Visualization with Matplotlib and Seaborn, August 9

Introduction to DAX Using Power BI, September 6

Linear Regression with Python: Essential Math for Data Science, September 9

Probability with Python: Essential Math for Data Science, September 12

Foundational Python for Data Science, September 17

Programming with Data: Advanced Python and Pandas, September 17

Statistics and Hypothesis Testing with Python: Essential Math for Data Science, September 18

Inferential Statistics Using R, September 18

Bridging the Gap from Excel to SQL, September 18

SQL Data Wrangling for Excel Users, September 20

Beginner’s Guide to Writing AWS Lambda Functions in Python, September 24

SQL-Powered Excel for Business Analytics, September 25

Debugging Data Science, September 26

Data-Driven Fundamentals of Google Analytics, September 26

R-Powered Excel for Business Analytics, September 27

Intermediate SQL for Data Analysis, September 30

First Steps in Data Analysis, September 30

Data Engineering for Data Scientists, October 1

Intro to Mathematical Optimization, October 7

Introduction to Algorithms and Data Structures, October 9

Algorithmic Risk Management in Trading and Investing, October 10

Visualization and Presentation of Data, October 15

SQL Fundamentals for Data, October 15-16

Introduction to Statistics for Data Analysis with Python, October 16


Data Structures in Java, August 30

Programming with Java Lambdas and Streams, September 3

Python Data Handling—a Deeper Dive, September 4

Advanced Test-Driven Development (TDD), September 5

SQL for Any IT Professional, September 10

Java Full Throttle with Paul Deitel: A One-Day, Code-Intensive Java Standard Edition Presentation, September 10

Scale Your Python Processing with Dask, September 11

What's New in Java, September 16

Python Full Throttle with Paul Deitel, September 16

Software Development and UX, September 17

Next-Generation Java Testing with JUnit 5, September 17

Design Patterns in Java, September 17-18

Java Testing with Mockito and the Hamcrest Matchers, September 18

Advanced React.js, September 18

Scala Fundamentals: From Core Concepts to Real Code in 5 Hours, September 19

Introduction to the Go Programming Language, September 20

Rust Crash Course: Learn Enough to Get Over the Hump, September 23

Interactive Visualization Approaches in Jupyter Notebooks, September 23

Modern Java Exception Handling, September 23

Clean Code, September 23

Kotlin Fundamentals, September 24

Java 8 Generics in 3 Hours, September 24

Design Patterns with TypeScript, September 25

Creating Angular Applications with GraphQL, September 25

Introduction to Python Programming, September 30

Learning Python 3 by Example, September 30

Getting Started with Spring and Spring Boot, October 7-8


Ethical Hacking Bootcamp with Hands-on Labs, September 4-6

Getting Started with Cyber Investigations and Digital Forensics, September 6

Intense Introduction to Hacking Web Applications, September 11

CompTIA Security+ SY0-501 Crash Course, September 11-12

Introduction to Encryption, September 17

Privacy Practices for Cybersecurity Professionals, September 26

Introduction to Digital Forensics and Incident Response (DFIR), September 27

Defensive Cybersecurity Fundamentals, October 15

Systems engineering and operations

Take Terraform to the Next Level, August 8

Red Hat Certified Engineer (RHCE) EX294 Crash Course: Red Hat Ansible Automation, September 3-6

Container Revolution, September 4

Network DevOps, September 5

Introduction to Jenkins for DevOps, September 5

Amazon Web Services (AWS) Security Crash Course, September 5

Building Applications with Apache Cassandra, September 6

Linux Under the Hood, September 9

Getting Started: Microsoft Certified Azure Administrator, September 9

Getting Started with Amazon Web Services (AWS), September 9-10

Essential Machine Learning and Exploratory Data Analysis with Python and Jupyter Notebook, September 9-10

Exam AZ-103: Microsoft Azure Administrator Crash Course, September 9-10

DevOps on Google Cloud Platform (GCP), September 10

Learn Linux in 3 Hours, September 10

Microservices Architecture and Design, September 10-11

Building APIs with Django REST Framework, September 11

AWS Certified Solutions Architect Associate Exam Crash Course, September 11-12

Getting Started With Jenkins X, September 12

Getting Started with Cloud Computing, September 13

Software Architecture by Example, September 16

Introduction to Docker CI/CD, September 16

Getting Started with OpenShift, September 16

IP Subnetting from Beginning to Mastery, September 16-17

AWS Machine Learning Specialty Certification Crash Course, September 16-17

Software Architecture Foundations: Characteristics and Tradeoffs, September 17

Practical Linux Command Line for Data Engineers and Analysts, September 17

Exam AZ-300: Microsoft Azure Architect Technologies Crash Course, September 17-18

CompTIA Linux+ Certification Crash Course, September 18-19

Amazon Web Services: AWS Managed Services, September 18-19

AWS Certified Developer Associate Crash Course, September 18-19

Introduction to Docker Images, September 19

AWS Security Fundamentals, September 19

Building a Cloud Roadmap, September 19

Kubernetes in 4 Hours, September 20

Managing Containers on Linux, September 23

Introduction to Kubernetes, September 23-24

Microservice Decomposition Patterns, September 24

Docker for JVM Projects, September 24

Apache Hadoop, Spark, and Big Data Foundations, September 24

AWS Certified SysOps Administrator (Associate) Crash Course, September 25-26

Windows 10 Crash Course (Exam MD-100), September 25-26

Analyzing Software Architecture, September 26

Certified Kubernetes Application Developer (CKAD) Crash Course, September 26-27

Building Micro-Frontends, September 30

Automating Architectural Governance, September 30

Exam AZ-301: Microsoft Azure Architect Design Crash Course, October 1-2

Building and Managing Kubernetes Applications, October 3

Docker: Beyond the Basics (CI & CD), October 7-8

Amazon Web Services (AWS) Technical Essentials, October 10

Designing Serverless Architecture with AWS Lambda, October 15-16

Azure Cognitive Services, October 16

Microservice Fundamentals, October 16

Building Simple Serverless Applications with Azure Functions, October 17

Microservice Collaboration, October 17

AWS Design Fundamentals, October 17-18

Continue reading New live online training courses.

Categories: Technology

Four short links: 7 August 2019

O'Reilly Radar - Wed, 2019/08/07 - 03:55

Checklists, Farewells, De-Risking, and Statistical Complexity of Brain Activity

  1. Why Checklists Fail (Nature) -- After the NHS mandated the WHO checklist, researchers at Imperial College London launched a project to monitor the tool's use and found that staff were often not using it as they should. In a review of nearly 7,000 surgical procedures performed at five NHS hospitals, they found that the checklist was used in 97% of cases, but was completed only 62% of the time. When the researchers watched a smaller number of procedures in person, they found that practitioners often failed to give the checks their full attention, and read only two-thirds of the items out loud. In slightly more than 40% of cases, at least one team member was absent during the checks; 10% of the time, the lead surgeon was missing. If you give a checklist that ensures X to workers who don't value X, you get workers who half-arse their way through a checklist. And, in this case, unnecessarily hurt and/or killed patients.
  2. Rowboats and Magic Feathers: Reflections on 13 Years of Museum 2.0 (Nina Simon) -- popular social media productions twist the creators' perceptions and become burdens. I kept to a rigorous schedule and never took a week off. Even weeks when I was giving birth, on vacation, or exhausted from challenges at work, I blogged. My attitude was, "readers don't care what's going on with me. They want the content." This blog became like Dumbo's feather. I loved it, but I also let it overpower my sense of self. As long as I was holding it—as long as I was pumping out content—I could soar. But I was terrified to let it drop. Without the blog, I presumed I could not fly. Compare Overly-Attached Girlfriend's video on leaving YouTube. It's hard stuff.
  3. De-Risking Custom Technology Projects (18F) -- sweet advice.
  4. Distinguishing States of Conscious Arousal Using Statistical Complexity -- how can you tell whether someone is awake or sedated, just from their brain activity? By analyzing signals from individual electrodes and disregarding spatial correlations, we find that statistical complexity distinguishes between the two states of conscious arousal through temporal correlations alone. In particular, as the degree of temporal correlations increases, the difference in complexity between the wakeful and anaesthetized states becomes larger. Uses an "epsilon machine," which I'd not heard of before but which is a "minimal, unifilar presentation of a stationary stochastic process" (particular type of hidden Markov model). The entropy of the epsilon machine's states yields a measure of statistical complexity, which this paper shows maps to sedated/wake states.
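
The last step in item 4, turning an epsilon machine's states into a complexity number, can be sketched as follows. This assumes the machine has already been reconstructed and is given as a matrix of state-to-state transition probabilities; the reconstruction itself, which is the hard part, is not shown. The two-state example (the so-called golden mean process) and the function names are illustrative choices, not taken from the paper.

```python
import math


def stationary_distribution(transitions, steps=200):
    """Approximate the stationary distribution over states by power iteration.

    Assumes a row-stochastic matrix for an irreducible, aperiodic chain.
    """
    n = len(transitions)
    dist = [1.0 / n] * n
    for _ in range(steps):
        dist = [sum(dist[i] * transitions[i][j] for i in range(n)) for j in range(n)]
    return dist


def statistical_complexity(transitions):
    """Shannon entropy, in bits, of the stationary state distribution."""
    dist = stationary_distribution(transitions)
    return -sum(p * math.log2(p) for p in dist if p > 0)


# Two-state example: state A goes to A or B with equal probability,
# and state B always returns to A.
golden_mean = [[0.5, 0.5],
               [1.0, 0.0]]
print(round(statistical_complexity(golden_mean), 4))
```

A process whose machine needs more states, visited more evenly, scores higher; a process with a single state scores zero.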

Continue reading Four short links: 7 August 2019.

Categories: Technology

Four short links: 6 August 2019

O'Reilly Radar - Tue, 2019/08/06 - 04:10

Path Tracing, Games Experiences, Cinematic Visualization, and IoT Security

  1. The Path to Traced Movies (Pixar) -- Until recently, brute-force path tracing techniques were simply too noisy and slow to be practical for movie production rendering.[...] In this survey, we provide an overview of path tracing and highlight important milestones in its development that have led to it becoming the preferred movie rendering technique today.
  2. Free to Play? Hate, Harassment, and Positive Social Experiences in Online Games (ADL) -- The survey found that 88% of adults who play online multiplayer games in the US reported positive social experiences while playing games online. The most common experiences were making friends (51%) and helping other players (50%). [...] Seventy-four percent of adults who play online multiplayer games in the US experience some form of harassment while playing games online. Sixty-five percent of players experience some form of severe harassment, including physical threats, stalking, and sustained harassment. Alarmingly, nearly a third of online multiplayer gamers (29%) have been doxed.
  3. Cinematic Scientific Visualization: The Art of Communicating Science -- slides and words from SIGGRAPH talk on advanced film-style techniques for telling science stories.
  4. Core Cybersecurity Feature Baseline for Securable IoT Devices: A Starting Point for IoT Device Manufacturers (NIST) -- draft of some excellent guidelines to device manufacturers. Device identifiers, firmware updates and resets, data protection, disabling and restricting access to local and network interfaces, event logging, etc. Doesn't specify how to do these things, just that manufacturers should do them. Important so we don't build more future botfarms.

Continue reading Four short links: 6 August 2019.

Categories: Technology

Topics for Aug 8th's meeting

PLUG - Mon, 2019/08/05 - 09:28

Dhruva Lokegaonkar: Shell Scripting for everyone

An introduction to Shell scripting.
- The basics of stringing together various commands
- Pipes and Parallelization
- Conditionals and Loops
- How to use these things to create useful scripts, like creating basic website generators, background switches, keyboard hotkeys, etc.

Dhruva is an ASU computer science freshman. He's been using Linux for the past five years. He's been involved with the Indian Linux Users Group Bombay (ILUG-BOM) in their mission to introduce Linux to high school and college students by making it a default in the Indian curriculum.

Austin Godber: Stream Processing with Python and Kafka

A quick intro to Kafka, a distributed log system, and how to interact with it using Python.
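
For readers new to the idea of a log system, here is a toy in-memory sketch of the core concept: records are appended to an ordered log, and each consumer tracks its own read position (offset). It is not Kafka's API and leaves out distribution, partitions, and persistence; the class and method names are invented for illustration.

```python
class TopicLog:
    """Toy append-only log with independent per-consumer read offsets."""

    def __init__(self):
        self._records = []
        self._offsets = {}  # consumer name -> next offset to read

    def append(self, record):
        """Add a record to the end of the log and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, consumer, max_records=10):
        """Return the next unread records for this consumer and advance its offset."""
        start = self._offsets.get(consumer, 0)
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch


log = TopicLog()
for event in ("a", "b", "c"):
    log.append(event)

first = log.poll("billing", max_records=2)   # billing reads the first two records
second = log.poll("billing")                 # then picks up where it left off
replay = log.poll("audit")                   # audit independently reads from the start
print(first, second, replay)
```

Because reading never removes records, a new consumer can replay the whole history without affecting anyone else.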

Four short links: 5 August 2019

O'Reilly Radar - Mon, 2019/08/05 - 06:20

Innovation Policy Toolkit, Differential Privacy, Ethically Aligned Design, Low-N Learning

  1. Toolkit of Policies to Promote Innovation (Journal of Economic Perspectives) -- We discuss a number of the main innovation policy levers and describe the available evidence on their effectiveness: tax policies to favor research and development, government research grants, policies aimed at increasing the supply of human capital focused on innovation, intellectual property policies, and pro-competitive policies. In the conclusion, we synthesize this evidence into a single-page “toolkit,” in which we rank policies in terms of the quality and implications of the available evidence and the policies’ overall impact from a social cost-benefit perspective. We also score policies in terms of their speed and likely distributional effects. (via Marginal Revolution)
  2. A Brief Tour of Differential Privacy -- lecture slides from a CMU course. Content warning: Comic Sans.
  3. Ethically Aligned Design, First Edition -- read online. The most comprehensive, crowd-sourced global treatise regarding the ethics of autonomous and intelligent systems available today.
  4. N-Shot Learning -- brief overview of machine learning from zero, one, or a handful of examples.

Continue reading Four short links: 5 August 2019.

Categories: Technology

Four short links: 2 August 2019

O'Reilly Radar - Fri, 2019/08/02 - 01:00

Cognitive Biases, Conflict, Language Models, and Programmable Memristor Computer

  1. The Evolutionary Roots of Human Decision Making (NCBI) — paper showing that we share cognitive biases with other primates. In one study, monkeys had a choice between one experimenter (the gains experimenter) who started by showing the monkey one piece of apple and sometimes added an extra piece of apple, and a second experimenter (the losses experimenter) who started by showing the monkey two pieces of apple and sometimes removed one. Monkeys showed an overwhelming preference for the gains experimenter over the losses experimenter—even though they received the same payoff from both. In this way, capuchins appear to avoid options that are framed as a loss, just as humans do.
  2. 6 Must Reads for Cutting Through Conflict and Tough Conversations (First Round Capital) — a summary of good (?) advice from books. Some I agree with, but others ... having worked for narcissists and bean counters, find a new job. Don't stay any longer than you have to with those jerks.
  3. ERNIE — Baidu's open source continual pre-training framework for language understanding. Baidu says: Integrating both phrase information and named entity information enables the model to obtain better language representation compared to BERT. ERNIE is trained on multi-source data and knowledge collected from encyclopedia articles, news, and forum dialogues, which improves its performance in context-based knowledge reasoning. See also the ERNIE paper.
  4. First Programmable Memristor Computer (IEEE) — The new chip combines an array of 5,832 memristors with an OpenRISC processor. 486 specially-designed digital-to-analog converters, 162 analog-to-digital converters, and two mixed-signal interfaces act as translators between the memristors’ analog computations and the main processor.

Continue reading Four short links: 2 August 2019.

Categories: Technology

Make data science more useful

O'Reilly Radar - Thu, 2019/08/01 - 04:00

The O’Reilly Data Show Podcast: Cassie Kozyrkov on connecting data and AI to business.

In this episode of the Data Show, I speak with Cassie Kozyrkov, technical director and chief decision scientist at Google Cloud. She describes "decision intelligence" as an interdisciplinary field concerned with all aspects of decision-making, and which combines data science with the behavioral sciences. Most recently she has been focused on developing best practices that can help practitioners make safe, effective use of AI and data. Kozyrkov uses her platform to help data scientists develop skills that will enable them to connect data and AI with their organizations' core businesses.

Continue reading Make data science more useful.

Categories: Technology

Taming chaos: Preparing for your next incident

O'Reilly Radar - Thu, 2019/08/01 - 03:55

Tim Craig and Gustavo Franco on establishing robust and well-supported incident response processes.

Incident response is like security: when you do it well, no one notices because everything just works the way it should. With both incident response and security, the costs are obvious to the organization while the benefits remain amorphous. And as with security, neglecting incident response is something you are likely to regret: you can lose a good deal of money while your systems are non-functional. Even worse, your clients and customers might lose confidence in your product or organization, potentially costing you the entire business.

In this interview, Tim Craig and fellow Googler Gustavo Franco, a site reliability engineer (SRE), discuss the wide range of events that qualify as “incidents”; the need for a conscious, robust, and well-defined process for understanding them; the role of training; and how to get buy-in from management so you can spread incident response training throughout an organization.

The concept of incident response is very broad, according to Craig. It goes far beyond the people usually considered, such as SREs and network administrators. Imagine a major incident requires a statement from an executive who happens to be eight time zones away or on an airplane. Can you reach them quickly? Are your legal and PR teams ready? Thus, incident response can cross many teams and involve an entire organization.

Craig may also surprise you by elevating processes above tools. But this is natural because incident response is an organizational issue, not just a technical one. Tools are important, of course—more on that in a moment—but people are even more important. When you choose the types of incidents for which you need to train and prepare, consider not just what’s most important to the business, but also what can best teach your staff.

Ideally, when a disaster happens, everybody who can help will immediately take their places and perform a useful role, like the crew of an airplane or ship. This requires regular training, just as people who earn first aid and CPR badges must complete ongoing follow-up training.

Tools enter the picture in order to automate disaster recovery testing. Teams create software to aid as much as possible in the design, scheduling, and evaluation of side effects from tests. Incident response is often practiced and measured during these tests as well. Automation is important for several reasons:

  • To scale up and protect as many aspects of your systems as possible, you need to shorten the amount of time a responder requires to create and run tests.
  • To persuade busy coworkers to adopt incident response, you need to make it easy for them.
  • To get approval for the incident response program from managers, you need to minimize costs, an aspect with which automation can help.

Many organizations can start small incident response initiatives without approval from higher management. But to move beyond isolated teams and to try out incident response where it really matters—on the production systems facing your clients—you’ll need buy-in from high up in an organization.

Craig endorses starting small and simple. He reminds the audience that Google has been practicing its current incident response program for 15 years. Don’t try to attain Google’s level of organization and automation from the start. It can be useful just to sit with the people responsible for handling incidents and talk through what they do, while making notes on paper. The chapter on incident response in The Site Reliability Workbook offers additional detail.

Craig and Franco use a couple of abbreviations that I’ll define here:

  • MTTM: Mean time to mitigate, one of several terms designating how long you can take to recover from an incident.
  • SLO: Service-level objective, part of a service-level agreement (SLA).
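
As a rough illustration of the first term, the sketch below averages detection-to-mitigation intervals, which is how MTTM is usually computed, and compares the result with a target. The incident records, timestamps, and 30-minute target are hypothetical and are not taken from the interview.

```python
from datetime import datetime, timedelta

# Hypothetical incident records: (detected_at, mitigated_at).
INCIDENTS = [
    (datetime(2019, 7, 1, 9, 0), datetime(2019, 7, 1, 9, 20)),
    (datetime(2019, 7, 8, 14, 5), datetime(2019, 7, 8, 14, 50)),
    (datetime(2019, 7, 19, 22, 30), datetime(2019, 7, 19, 22, 55)),
]

# Hypothetical target: mitigate within 30 minutes on average.
TARGET = timedelta(minutes=30)


def mean_time_to_mitigate(incidents):
    """Return the average detection-to-mitigation interval as a timedelta."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum((end - start for start, end in incidents), timedelta())
    return total / len(incidents)


mttm = mean_time_to_mitigate(INCIDENTS)
print(f"MTTM: {mttm}, within target: {mttm <= TARGET}")
```

Even a hand-maintained list like this one can feed the calculation, which fits the advice to start small before automating.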

The term chaos engineering comes up during the interview, but Craig points out that incident response is a broader activity, and much more under the team’s control than the term “chaos” would suggest.

Take a listen for more interesting details about incident response processes that have worked at Google.

This post is a collaboration between O’Reilly and Google. See our statement of editorial independence.

Continue reading Taming chaos: Preparing for your next incident.

Categories: Technology

Four short links: 1 August 2019

O'Reilly Radar - Thu, 2019/08/01 - 01:00

Software-Defined Analog Circuits, Public Domain, Talk Radio Corpus, and Bad Science

  1. Software-Defined Analog Circuits -- Zrna hardware realizes the analog circuit you specify in software, in real time. Change any circuit parameter on the fly with an API request, at your lab bench or embedded in-application. This is ... weird. But cool. Cool and weird.
  2. Most Pre-1964 US Books are in the Public Domain — and finally, thanks to the work of librarians and archivists, for anything that's unambiguously a "book", we have a parseable record of its pre-1964 interactions with the Copyright Office: the initial registration and any potential renewal. (via Evil Mad Scientist)
  3. RadioTalk: A Large-Scale Corpus of Talk Radio Transcripts — arxiv paper and github.
  4. A Rough Guide to Spotting Bad Science — some very useful heuristics. Via this considered evaluation of wild claims.

Continue reading Four short links: 1 August 2019.

Categories: Technology

Learning from adversaries

O'Reilly Radar - Wed, 2019/07/31 - 06:20

Adversarial images aren’t a problem—they’re an opportunity to explore new ways of interacting with AI.

A recent paper, Natural Adversarial Examples, pointed out many real-world images that machine learning (ML) systems identify incorrectly: squirrels classified as sea lions or frogs, eagles classified as limousines, mushrooms classified as pretzels or nails, and so on. If you’re involved with machine learning, you’ve probably seen a number of these images already, in addition to others.

It's important to realize that machine learning makes mistakes. Not only does it make mistakes, it always will make mistakes; unlike traditional programming, it's impossible for any ML system to be perfect. I’ve called that “the paradox of machine learning.” And that fact requires us to treat ML with a certain amount of caution.

But we need to remember three other things:

  1. Humans make mistakes, too. It's easy to look at a picture of a mushroom and say "that's so obviously not a nail." But all of us have been fooled at one time or another—possibly many times—about some visual object. And some of those ML misidentifications are mistakes I'd make. Find a picture of the harvestman (daddy longlegs) and see if you think it doesn't look like a ladybug. Or if you could blame anyone for misidentifying the squirrel hidden in the grass as a frog. It's important to approach ML's limitations with at least some humility.
  2. ML mistakes are often completely different from human mistakes. When ML is wrong, it’s “really” wrong—really? Or do ML mistakes seem outlandish only because they’re different from the ones we’d make? ML mistakes often occur because systems lack context. When humans see a picture, we usually know what we’re supposed to look at. ML systems often mistake the background for the thing itself, so a bird feeder is identified as a bird, even if there isn’t a bird on it. Humans are very good at ignoring extraneous information.
  3. Humans are good at correcting mistakes. This is perhaps where ML systems and humans are most different. ML systems have trouble saying, "I don't know." More than that, they can't say, "Oh, I see, that really isn't a nail; it's a mushroom." They rarely get second chances.

The last point bears some thought. I don't see why software can't admit it made a mistake, particularly if it has a live video feed rather than a static image. As a system looks at something from different perspectives, it should be possible to say "that thing I thought was a nail, it's really a mushroom." It's possible there are already systems that do this; the ability to correct mistaken judgements would be essential for an autonomous vehicle.

That ability is also essential for collaboration between humans and machine learning systems. Collaboration isn't possible when one member of the team (or both) is an oracle that is never wrong, or can never admit it is wrong. When a system only presents a single answer that can't be discussed, humans respond predictably. If they agree with the machine, the machine is useless because it didn't tell them anything they didn't already know. If they disagree, the machine is just wrong. And if the human decides what course of action to take (for example, a patient's diagnosis and treatment), no one may ever find out whether the machine was right.

Part of the solution is exposing other possibilities: if the machine is classifying images, what classifications had high probabilities but were rejected? A machine might get bonus points for saying "why." Explainability may never be one of ML's strengths, but even neural networks can build a list of alternatives weighted by probabilities. How do we build an interface that exposes these alternatives and lets a human evaluate them? What kind of interface would let a human hold a discussion with an AI? An argument? I don't mean a silly chatbot, like Siri; I mean a reasoned discussion about a situation that demands a decision. What would it mean for a human to convince a machine learning system that it’s wrong?
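
To make the idea of "a list of alternatives weighted by probabilities" concrete, here is a minimal sketch that turns a classifier's raw scores into probabilities with a softmax and reports the top few candidates rather than a single answer. The labels and scores are made up for illustration; a real system would take them from its own model.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ranked_alternatives(labels, logits, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical classifier output for one image.
LABELS = ["nail", "mushroom", "pretzel", "frog"]
LOGITS = [2.1, 1.9, 0.3, -1.0]

for label, prob in ranked_alternatives(LABELS, LOGITS):
    print(f"{label}: {prob:.2f}")
```

When the first two candidates score nearly the same, as in this made-up case, showing both gives a human reviewer something to evaluate instead of a bare verdict.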

Adversarial images aren’t a problem; they’re an opportunity—and not just an opportunity to fix our classifiers. They’re an opportunity to explore new ways of interacting with AI. We need to move beyond the interfaces and experiences that have informed desktop apps, web apps, and mobile apps. We need to design for collaboration between machines and humans. That's the big challenge facing AI designers.

Continue reading Learning from adversaries.

Categories: Technology

Four short links: 31 July 2019

O'Reilly Radar - Wed, 2019/07/31 - 04:05

Provably Correct AI, Porn & Privacy, Math for CS and ML, and Xenophobia Classifier

  1. ART: Abstraction Refinement-Guided Training for Provably Correct Neural Networks -- provably correct neural networks—now there's an interesting idea...
  2. Tracking Sex: The Implications of Widespread Sexual Data Leakage and Tracking on Porn Websites -- Our analysis of 22,484 pornography websites indicated that 93% leak user data to a third party. Tracking on these sites is highly concentrated by a handful of major companies, which we identify. We successfully extracted privacy policies for 3,856 sites, 17% of the total. The policies were written such that one might need a two-year college education to understand them. Our content analysis of the sample's domains indicated 44.97% of them expose or suggest a specific gender/sexual identity or interest likely to be linked to the user.
  3. Algebra, Topology, Differential Calculus, and Optimization Theory For Computer Science and Machine Learning -- a 1,962-page LaTeX book which some wag listed as Math Basics for CS and ML on Hacker News.
  4. Open Source Xenophobia Classifier for Tweets -- source is a Colab notebook, and they make their labeled training data available, too.

Continue reading Four short links: 31 July 2019.

Categories: Technology

Four short links: 30 July 2019

O'Reilly Radar - Tue, 2019/07/30 - 03:55

Game Translation, Modern Hypercard, Cryptographic Attacks, and Digital Hardware Debugger

  1. The Near Impossible 20-Year Journey to Translate "Fire Emblem: Thracia 776" (Vice) -- an incredible story of translation philosophy, playing out in the context of fan attempts to make an English-language version of a 1999 tactical RPG.
  2. LiveCode -- open source (GPL) HyperCard-esque app developer, for the modern age. Very nice!
  3. Cryptographic Attacks: A Guide for the Perplexed (Checkpoint) -- various types of cryptographic attacks, with a focus on the attacks' underlying principles.
  4. Glasgow -- FPGA-based tool for exploring digital interfaces, aimed at embedded developers, reverse engineers, digital archivists, electronics hobbyists, and everyone else who wants to communicate to a wide selection of digital devices with high reliability and minimum hassle. It can be attached to most devices without additional active or passive components, and includes extensive protection from unexpected conditions and operator error.

Continue reading Four short links: 30 July 2019.

Categories: Technology

Four short links: 29 July 2019

O'Reilly Radar - Mon, 2019/07/29 - 03:40

Email, End-to-End Encryption, AI Ethics, Reliable Distributed Systems

  1. Notqmail -- Collaborative open source successor to qmail.
  2. The Encryption Debate is Over—Dead at the Hands of Facebook (Forbes) -- Facebook’s model entirely bypasses the encryption debate by globalizing the current practice of compromising devices by building those encryption bypasses directly into the communications clients themselves and deploying what amounts to machine-based wiretaps to billions of users at once.
  3. Why Ethics Cannot be Replaced by the UDHR -- Ethics and the UDHR are on the same page, if we keep it general. But questions about what is the right thing to do or what policy is the right one to implement become challenging only when these dearly held values conflict, necessarily involving trade-offs. When we dive deep, the UDHR is simply unable to guide us on those questions. Solving such challenges is the job of ethical reasoning.
  4. Operating a Large, Distributed System in a Reliable Way: Practices I Learned (Gergely Orosz) -- This post is the collection of the practices I've found useful to reliably operate a large system at Uber, while working here. Generalizable beyond Uber.

Continue reading Four short links: 29 July 2019.

Categories: Technology

Four short links: 26 July 2019

O'Reilly Radar - Fri, 2019/07/26 - 04:40

Disinformation, Election Meddling, Quantum Supremacy, and International Pineapple Day

  1. Disinformation’s Spread: Bots, Trolls, and All of Us (Kate Starbird) -- a short and on-the-mark summary of misconceptions about disinformation.
  2. The Unsexy Threat to Our Election Security (Krebs) -- surprisingly low-tech threats (SIM stealing, hijacking a Twitter account) that could bugger up elections.
  3. Quantum Supremacy is Coming (Quanta) -- "supremacy" is marketing hype. Quantum computers will still be useless for a while to come. "Supremacy" refers to conquering errors and noise enough to make a system that can use quantum phenomena to do in parallel what classical computers must do in serial—even if it's only on a toy problem.
  4. How I Started Pineapple Day (Andrew Lee) -- “That’s not a real thing,” James retorted with an eyeroll as he set his bag down and sat down at his desk. “Sure it is,” I insisted, and to back my claim up I pulled up Google Calendar and added “International Bring Your Pineapple to Work Day” to our shared company calendar. I set the event to repeat every year on June 27th. Have a great weekend!

Continue reading Four short links: 26 July 2019.

Categories: Technology

Four short links: 25 July 2019

O'Reilly Radar - Thu, 2019/07/25 - 07:10

Mutable Web, Re-Identification, Rule-Based Programming, and Risks of Government Hacking

  1. The Mutable Web -- rewriting Twitter's web styling is hard but not impossible, and makes the author muse on the value of the mutable web. Transparency and introspection are fundamental to the way the web works, and obfuscation, intentional or not, can't really change that.
  2. Estimating the Success of Re-Identifications in Incomplete Data Sets Using Generative Models (Nature) -- Using our model, we find that 99.98% of Americans would be correctly re-identified in any data set using 15 demographic attributes. Reminds me of the finding (claim?) that it only takes 8 (12? citation needed) words to uniquely identify a text.
  3. Picat -- a simple, and yet powerful, logic-based multi-paradigm programming language aimed for general-purpose applications. Picat is a rule-based language, in which predicates, functions, and actors are defined with pattern-matching rules. Interesting take on a language, which made more sense after I read this Hacker News comment.
  4. Security Risks of Government Hacking (Stanford Cyberlaw) -- This paper addresses six main ways that government hacking can raise broader computer security risks. These include: creating a disincentive to disclose vulnerabilities that should be disclosed because other attackers might independently discover them; cultivating a market for surveillance tools and 0-days; risking that vulnerabilities exploited by the malware will be identified and used by other attackers, as a result of either law enforcement’s losing control of the hacking tools, or discovery by outsiders of law enforcement’s hacking activity; creating an incentive to push for less-secure software and standards; and risking that the malware will affect innocent users.

Continue reading Four short links: 25 July 2019.

Categories: Technology

