Four short links: 21 November 2018

O'Reilly Radar - Wed, 2018/11/21 - 05:55

Black Mirror, Innovation Toolkits, Code-Generator for APIs, and Hardware Effects

  1. Black Mirror Brainstorms (Aaron Lewis) -- In light of the latest FB scandal, here's my proposal for replacing Design Sprints: "Black Mirror Brainstorms." A workshop in which you create a Black Mirror episode. The plot must revolve around misuse of your team's product. See Casey Fiesler's Black Mirror, Light Mirror, which I've linked to before on 4SL.
  2. Toolkit Navigator -- A compendium of toolkits for public sector innovation and transformation, curated by OPSI and our partners around the world.
  3. Conjure -- Palantir's open source simple but opinionated toolchain for defining APIs once and generating client/server interfaces in multiple languages. For more, read the blog post.
  4. Hardware Effects -- this repository demonstrates various hardware effects that can degrade application performance in surprising ways and that may be very hard to explain without knowledge of the low-level CPU and OS architecture. For each effect I try to create a proof of concept program that is as small as possible so it can be understood easily. How full stack ARE you?

Continue reading Four short links: 21 November 2018.

Categories: Technology

Four short links: 20 November 2018

O'Reilly Radar - Tue, 2018/11/20 - 05:15

East African ML Needs, Autonomy Corrections, Information Security, and UIs from Doodles

  1. Some Requests for Machine Learning Research from the East African Tech Scene -- Based on 46 in-depth interviews [...] a list of concrete machine learning research problems, progress on which would directly benefit tech ventures in East Africa. Example: Priors for autocorrect and low-literacy SMS use—SMS text contains many language misuses due to a combination of autocorrection and low literacy. E.g., “poultry farmer” becoming “poetry farmer.” Such mistakes are bound to occur in any written language corpus, but engineers working with rural populations in East Africa report that this is a prevalent issue for them, confounding the use of pretrained language models. This problem also exists to some degree in voice data with respect to English spoken in different accents. Priors over autocorrect substitution rules, or custom, per-dialect confusion matrices between phonetically similar words could potentially help. Expect much more work like this as AI/ML moves into non-WEIRD (Western Educated Industrialized Rich Democratic) nations.
  2. How the Media Gets Tesla Wrong -- a reminder that our convenient shorthand and once-over-lightly reading of the news gives a false and rosy picture of what's possible.
  3. Why Information Security is Hard: An Economic Perspective -- fascinating arguments! I particularly like the statistical argument: a lone attacker might find 10 bugs a year, a well-prepared defender might find 1,000 bugs a year, but if there are 100,000 available bugs for exploitation, then there's very low probability that the defender found and patched the same bugs that the attacker found...
  4. DoodleMaster -- sketches->UI via a CNN, a proof-of-concept.
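The statistical argument in the information-security link can be made concrete with a back-of-envelope calculation. The figures below are the article's illustrative numbers; the assumption that bugs are found independently and uniformly at random is mine:

```python
# Illustrative figures from the argument, assuming bugs are found
# independently and uniformly at random.
total_bugs = 100_000    # exploitable bugs available in the system
defender_finds = 1_000  # bugs the defender finds and patches per year
attacker_finds = 10     # bugs a lone attacker finds per year

# Chance that any single attacker-found bug was also patched.
p_patched = defender_finds / total_bugs

# Chance the defender happened to patch *every* bug the attacker found,
# i.e., the chance the defender's effort stops this attacker outright.
p_all_patched = p_patched ** attacker_finds

print(f"P(one bug patched) = {p_patched:.0%}")      # 1%
print(f"P(all 10 patched)  = {p_all_patched:.0e}")  # ~1e-20
```

Under these (toy) numbers, the defender works 100 times harder than the attacker and still has essentially no chance of having pre-empted all of the attacker's bugs.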

Continue reading Four short links: 20 November 2018.

Four short links: 19 November 2018

O'Reilly Radar - Mon, 2018/11/19 - 05:00

Partial Time, Black Mirror, Implant Usability, and Open Source Game

  1. Time is Partial -- Even though time naturally feels like a total order, studying distributed systems or weak memory exposes you, head on, to how it isn’t. And that’s precisely because these are both cases where our standard over-approximation of time being total limits performance—which we obviously can’t have.
  2. Black Mirror, Light Mirror: Teaching Technology Ethics Through Speculation (Casey Fiesler) -- This is not a new idea, and I’m certainly not the only one to do a lot of thinking about it (e.g., see “How to Teach Computer Ethics Through Science Fiction”), but I wanted to share two specific exercises that I use and that I think are easily adaptable.
  3. How I Lost and Regained Control of My Microchip Implant (Vice) -- After a year of living with a totally useless NFC implant, I kind of started to like it. That small, almost imperceptible little bump on my left hand was a constant reminder that even the most sophisticated and fool-proof technologies are no match for human incompetence. (via Slashdot)
  4. System Syzygy -- open source puzzle game for Mac, Windows, and Linux. (via Andrew Plotkin)

Continue reading Four short links: 19 November 2018.

10 top Java resources on O’Reilly’s online learning platform

O'Reilly Radar - Mon, 2018/11/19 - 05:00

Our most-used Java resources will help you stay on track in your journey to learn and apply Java.

We dove into the data on our online learning platform to identify the most-used Java resources. These are the items our platform subscribers regularly turn to as they apply Java in their projects and organizations.

Effective Java, 3rd Edition — Joshua Bloch covers language and library features added in Java 7, 8, and 9, including the functional programming constructs that were added to its object-oriented roots. Many new items have been added, including a chapter devoted to lambdas and streams.

Java 8 and 9 Fundamentals: Modern Java Development with Lambdas, Streams, and Introducing Java 9’s JShell and the Java Platform Module System (JPMS) — Paul Deitel applies the Deitel signature live-code approach to teaching programming and explores the Java language and Java APIs in depth.

Java 8 in Action: Lambdas, streams, and functional-style programming — Raoul-Gabriel Urma, Mario Fusco, and Alan Mycroft cover lambdas, streams, and functional-style programming in this clearly written guide to the new features of Java 8.

Head First Java, 2nd Edition — Bert Bates and Kathy Sierra offer a complete introduction to object-oriented programming and Java.

OCP Oracle Certified Professional Java SE 8 Programmer II — Scott Selikoff and Jeanne Boyarsky bring you a comprehensive companion for preparing for Exam 1Z0-809 as well as upgrade Exam 1Z0-810 and Exam 1Z0-813.

Java Concurrency in Practice — This book arms readers with both the theoretical underpinnings and concrete techniques for building reliable, scalable, maintainable concurrent applications.

Optimizing Java — Chris Newland, James Gough, and Benjamin Evans teach you how to tune Java applications for performance using a quantitative, verifiable approach.

Java: The Complete Reference, 10th Edition — Herbert Schildt covers the entire Java language, including its syntax, keywords, and fundamental programming principles.

Java for Beginners: Step-by-Step Hands-On Guide to Java — Manuj Aggarwal and the TetraTutorials Team bring you a course jam-packed with practical demos, homework assignments, and live coding to help you grasp the complex topics.

Cloud Native Java — Josh Long and Kenny Bastani show Java/JVM developers how to build better software, faster, using Spring Boot, Spring Cloud, and Cloud Foundry.

Continue reading 10 top Java resources on O’Reilly’s online learning platform.

Four short links: 16 November 2018

O'Reilly Radar - Fri, 2018/11/16 - 05:00

Illuminated Paper, Software Forge, Leak Checklist, and PC on ESP

  1. IllumiPaper -- illuminated elements built into regular paper, with implementation.
  2. sr.ht -- (pronounced "sir hat") a software forge like GitHub or GitLab, but with interesting strengths (e.g., very lightweight pages, and the CI system).
  3. Leak Mitigation Checklist -- If you just leaked sensitive information in public source code, read this document as part of your emergency procedure.
  4. Emulating an IBM PC on an ESP8266 -- an 8086 PC-XT emulation with 640K RAM, 80×25 CGA composite video, and a 1.44MB MS-DOS disk on an ESP12E without additional components. (via Alasdair Allen)

Continue reading Four short links: 16 November 2018.

Four short links: 15 November 2018

O'Reilly Radar - Thu, 2018/11/15 - 04:50

Punish Online Criminals, Fake Fingerprints, Implementing Identity, and Project Visbug

  1. USA Needs to Pursue Malicious Cyber Actors -- a report that argues that the United States currently lacks a comprehensive overarching strategic approach to identify, stop, and punish cyberattackers. (1) There is a burgeoning cybercrime wave. (2) There is a stunning cyber enforcement gap. (3) There is no comprehensive U.S. cyber enforcement strategy aimed at the human attacker. This is definitely a golden age of online crime.
  2. DeepMasterPrints: Generating MasterPrints for Dictionary Attacks via Latent Variable Evolution -- MasterPrints are real or synthetic fingerprints that can fortuitously match with a large number of fingerprints, thereby undermining the security afforded by fingerprint systems. Previous work by Roy, et al., generated synthetic MasterPrints at the feature level. In this work, we generate complete image-level MasterPrints known as DeepMasterPrints, whose attack accuracy is found to be much superior than that of previous methods. (via Mikko Hypponen)
  3. The Tripartite Identity Pattern (Randy Farmer) -- The three components of user identity are: the account identifier, the login identifier, and the public identifier.
  4. Project VisBug -- edit/tweak existing webpages.
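Farmer's tripartite pattern separates three concerns that are often conflated in a single "username" field. A minimal sketch of the idea (the class and field names are mine, not from the post):

```python
from dataclasses import dataclass

@dataclass
class UserIdentity:
    account_id: int   # immutable internal key; what the database references
    login_id: str     # credential handle (e.g., an email); private and rotatable
    public_id: str    # display name other users see; carries no security role

    def change_login(self, new_login: str) -> None:
        # Rotating the login touches neither internal references
        # (account_id) nor what other users see (public_id).
        self.login_id = new_login
```

Keeping the three identifiers independent means a user can change their email or display name without breaking foreign keys, and the credential identifier never has to be shown publicly.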

Continue reading Four short links: 15 November 2018.

Four short links: 14 November 2018

O'Reilly Radar - Wed, 2018/11/14 - 05:45

ML Risk, IGF Session, Feature Engineering, and Solving Snake

  1. Managing Risk in Machine Learning Projects (Ben Lorica) -- Considerations for a world where ML models are becoming mission critical.
  2. Transcripts of 2018 IGF -- Internet Governance Forum session transcripts.
  3. Featuretools -- open source Python framework for automated feature engineering.
  4. Solving Snake -- fun exploration of different algorithms you might use to play the Snake game.

Continue reading Four short links: 14 November 2018.

Four short links: 13 November 2018

O'Reilly Radar - Tue, 2018/11/13 - 05:05

Ways of Working, Too-Smart AI, Wi-Fi Vision, and Materials Science AI

  1. Internet-Era Ways of Working -- an elegant brief summary of how we do software in 2018, from Tom Loosemore's team.
  2. Examples of AI Gaming the System -- a list of examples of AIs learning more than was intended. Neural nets evolved to classify edible and poisonous mushrooms, took advantage of the data being presented in alternating order, and didn't actually learn any features of the input images. (via BoingBoing)
  3. Using Wi-Fi to “See” Behind Closed Doors is Easier than Anyone Thought (MIT TR) -- if all you are interested in is the movement of people. Humans also reflect and distort this Wi-Fi light. The distortion, and the way it moves, would be clearly visible through Wi-Fi eyes, even though the other details would be smeared. This crazy Wi-Fi vision would clearly reveal whether anybody was behind a wall and, if so, whether the person was moving. That’s the basis of Zhu and co’s Wi-Fi-based peeping tom. It looks for changes in an ordinary Wi-Fi signal that reveal the presence of humans.
  4. Learning Process-Structure-Property Relations -- clever research project that mines research literature to learn relationships about the physical properties and processes in materials science, then automatically generates a diagram for the particular constraints your project has. Code released as open source.

Continue reading Four short links: 13 November 2018.

Managing risk in machine learning

O'Reilly Radar - Tue, 2018/11/13 - 05:00

Considerations for a world where ML models are becoming mission critical.

In this post, I share slides and notes from a keynote I gave at the Strata Data Conference in New York last September. As the data community begins to deploy more machine learning (ML) models, I wanted to review some important considerations.

Let’s begin by looking at the state of adoption. We recently conducted a survey which garnered more than 11,000 respondents—our main goal was to ascertain how enterprises were using machine learning. One of the things we learned was that many companies are still in the early stages of deploying machine learning (ML):

As far as reasons for companies holding back, we found from a survey we conducted earlier this year that companies cited lack of skilled people, a “skills gap,” as the main challenge holding back adoption.

Interest on the part of companies means the demand side for “machine learning talent” is healthy. Developers have taken notice and are beginning to learn about ML. In our own online training platform (which has more than 2.1 million users), we’re finding strong interest in machine learning topics. Below are the top search topics on our training platform:

Beyond “search,” note that we’re seeing strong growth in consumption of content related to ML across all formats—books, posts, video, and training.

Before I continue, it’s important to emphasize that machine learning is much more than building models. You need to have the culture, processes, and infrastructure in place before you can deploy many models into products and services. At the recent Strata Data conference, we had a series of talks on relevant cultural, organizational, and engineering topics. Here's a list of a few clusters of relevant sessions:

Over the last 12-18 months, companies that use a lot of ML and employ teams of data scientists have been describing their internal data science platforms (see, for example, Uber, Netflix, Twitter, and Facebook). They share some of the features I list below, including support for multiple ML libraries and frameworks, notebooks, scheduling, and collaboration. Some companies include advanced capabilities, including a way for data scientists to share features used in ML models, tools that can automatically search through potential models, and some platforms even have model deployment capabilities:

As you get beyond prototyping and you actually begin to deploy ML models, there are many challenges that will arise as those models begin to interact with real users or devices. David Talby summarized some of these key challenges in a recent post:

  • Your models may start degrading in accuracy
  • Models will need to be customized (for specific locations, cultural settings, domains, and applications)
  • Real modeling begins once in production

There are also many important considerations that go beyond optimizing a statistical or quantitative metric. For instance, there are certain areas—such as credit scoring or health care—that require a model to be explainable. In certain application domains (including autonomous vehicles or medical applications), safety and error estimates are paramount. As we deploy ML in many real-world contexts, optimizing statistical or business metrics alone will not suffice. The data science community has been increasingly engaged in two topics I want to cover in the rest of this post: privacy and fairness in machine learning.

Privacy and security

Given the growing interest in data privacy among users and regulators, there is a lot of interest in tools that will enable you to build ML models while protecting data privacy. These tools rely on building blocks, and we are beginning to see working systems that combine many of these building blocks. Some of these tools are open source and are becoming available for use by the broader data community:

  • Federated learning is useful when you want to collaborate and build a centralized model without sharing private data. It’s used in production at Google, but we still are in need of tools to make federated learning broadly accessible.
  • We’re starting to see tools that allow you to build models while guaranteeing differential privacy, one of the most popular and powerful definitions of privacy. At a high-level these methods inject random noise at different stages of the model building process. These emerging sets of tools aim to be accessible to data scientists who are already using libraries such as scikit-learn and TensorFlow. The hope is that data scientists will soon be able to routinely build differentially private models.
  • There’s a small and growing number of researchers and entrepreneurs who are investigating whether we can build or use machine learning models on encrypted data. This past year, we’ve seen open source libraries (HElib and Palisade) for fast homomorphic encryption, and we have startups that are building machine learning tools and services on top of those libraries. The main bottleneck here is speed: many researchers are actively investigating hardware and software tools that can speed up model inference (and perhaps even model building) on encrypted data.
  • Secure multi-party computation is another promising class of techniques used in this area.
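The "inject random noise" idea behind differentially private model building can be illustrated with the classic Laplace mechanism for a simple count query. This is a minimal sketch of the principle, not one of the libraries mentioned above:

```python
import math
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    # Inverse-CDF sampling of a zero-mean Laplace distribution.
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    # A count query has sensitivity 1 (one person changes it by at most 1),
    # so adding Laplace(1/epsilon) noise yields epsilon-differential privacy.
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(0)
print(private_count(1234, epsilon=0.1, rng=rng))  # a noisy count near 1234
```

Smaller epsilon means stronger privacy but noisier answers; the same trade-off shows up at every stage where noise is injected during model training.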

Fairness

Now let’s consider fairness. Over the last couple of years, many ML researchers and practitioners have started investigating and developing tools that can help ensure ML models are fair and just. Just the other day, I searched Google for recent news stories about AI, and I was surprised by the number of articles that touch on fairness.

For the rest of this section, let’s assume one is building a classifier and that certain variables are considered “protected attributes” (this can include things like age, ethnicity, gender, ...). It turns out that the ML research community has used numerous mathematical criteria to define what it means for a classifier to be fair. Fortunately, a recent survey paper from Stanford—A Critical Review of Fair Machine Learning—simplifies these criteria and groups them into the following types of measures:

  • Anti-classification means the omission of protected attributes and their proxies from the model or classifier.
  • Classification parity means that one or more of the standard performance measures (e.g., false positive and false negative rates, precision, recall) are the same across groups defined by the protected attributes.
  • Calibration: If an algorithm produces a “score,” that “score” should mean the same thing for different groups.
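Classification parity is easy to check mechanically once predictions are grouped by the protected attribute. A toy sketch comparing false positive rates across two groups (the records and group labels are made up for illustration):

```python
# Each record: (protected group, true label, predicted label).
records = [
    ("A", 0, 0), ("A", 0, 1), ("A", 1, 1), ("A", 1, 1),
    ("B", 0, 1), ("B", 0, 1), ("B", 1, 0), ("B", 1, 1),
]

def false_positive_rate(rows):
    # Fraction of true negatives that the classifier labeled positive.
    negatives = [r for r in rows if r[1] == 0]
    return sum(1 for r in negatives if r[2] == 1) / len(negatives)

for group in ("A", "B"):
    rows = [r for r in records if r[0] == group]
    print(f"group {group}: FPR = {false_positive_rate(rows):.2f}")
# On this toy data group A has FPR 0.50 and group B has FPR 1.00,
# so classification parity (on false positive rate) fails.
```

The same pattern works for any of the performance measures named above: compute the metric per group and compare.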

However, as the authors from Stanford point out in their paper, each of the mathematical formulations described above suffers from limitations. With respect to fairness, there is no black box or series of procedures that you can stick your algorithm into that can give it a clean bill of health. There is no such thing as a “one size fits all” procedure.

Because there’s no ironclad procedure, you will need a team of humans-in-the-loop. Notions of fairness are not only domain and context sensitive, but as researchers from UC Berkeley recently pointed out, there is a temporal dimension as well (“We advocate for a view toward long-term outcomes in the discussion of ‘fair’ machine learning”). What is needed are data scientists who can interrogate the data and understand the underlying distributions, working alongside domain experts who can evaluate models holistically.

Culture and organization

As we deploy more models, it’s becoming clear that we will need to think beyond optimizing statistical and business metrics. While I haven’t touched on them during this short post, it’s clear that reliability and safety are going to be extremely important moving forward. How do you build and organize your team in a world where ML models have to take many other important things under consideration?

Fortunately there are members of our data community who have been thinking about these problems. The Future of Privacy Forum and Immuta recently released a report with some great suggestions on how one might approach machine learning projects with risk management in mind:

  • When you’re working on a machine learning project, you need to employ a mix of data engineers, data scientists, and domain experts.
  • One important change outlined in the report is the need for a set of data scientists who are independent from this model-building team. This team of “validators” can then be tasked with evaluating the ML model on things like explainability, privacy, and fairness.

Closing remarks

So, what skills will be needed in a world where ML models are becoming mission critical? As noted above, fairness audits will require a mix of data and domain experts. In fact, a recent analysis of job postings from NBER found that compared with other data analysis skills, machine learning skills tend to be bundled with domain knowledge.

But you’ll also need to supplement your data and domain experts with legal and security experts. Moving forward, we’ll need to have legal, compliance, and security people working more closely with data scientists and data engineers.

This shouldn’t come as a shock: we already invest in desktop security, web security, and mobile security. If machine learning is going to eat software, we will need to grapple with AI and ML security, too.

Continue reading Managing risk in machine learning.

Four short links: 12 November 2018

O'Reilly Radar - Mon, 2018/11/12 - 06:15

Gov Open Source, Bruce Sterling, Robot Science, and Illustrated TLS 1.3

  1. FDA MyStudies App -- open source from government, designed to facilitate the input of real-world data directly by patients which can be linked to electronic health data supporting traditional clinical trials, pragmatic trials, observational studies, and registries.
  2. Bruce Sterling Interview -- on architecture, design, science fiction, futurism, and involuntary parks. (via Cory Doctorow)
  3. Inventing New Materials with AI (MIT TR) -- using machine learning to generate hypotheses for new materials, to be explored and tested by actual humans.
  4. The New Illustrated TLS Connection -- Every byte explained and reproduced. A revised edition in which we dissect the new manner of secure and authenticated data exchange, the TLS 1.3 cryptographic protocol.

Continue reading Four short links: 12 November 2018.

Four short links: 9 November 2018

O'Reilly Radar - Fri, 2018/11/09 - 06:00

Counting Computers, New Software, Unix History, and Tencent Framework

  1. How Many Computers Are In Your Computer? -- So, a desktop or smartphone can reasonably be expected to have anywhere from 15 to several thousand computers in the sense of a Turing-complete device which can be programmed and which is computationally powerful enough to run many programs from throughout computing history and which can be exploited by an adversary for surveillance, exfiltration, or attacks against the rest of the system. Which is why security folks sometimes sleep poorly at night.
  2. Some Notes on Running New Software in Production (Julia Evans) -- The playbook for understanding the software you run in production is pretty simple. Here it is: (1) Start using it in production in a non-critical capacity (by sending a small percentage of traffic to it, on a less critical service, etc); (2) Let that bake for a few weeks. (3) Run into problems. (4) Fix the problems. Go to step 3.
  3. Unix History (Rob Pike) -- know your past.
  4. Omi -- Tencent's next-generation web framework in 4KB JavaScript (Web Components + JSX + Proxy + Store + Path Updating).

Continue reading Four short links: 9 November 2018.

Lessons learned while helping enterprises adopt machine learning

O'Reilly Radar - Thu, 2018/11/08 - 05:20

The O’Reilly Data Show Podcast: Francesca Lazzeri and Jaya Mathew on digital transformation, culture and organization, and the team data science process.

In this episode of the Data Show, I spoke with Francesca Lazzeri, an AI and machine learning scientist at Microsoft, and her colleague Jaya Mathew, a senior data scientist at Microsoft. We conducted a couple of surveys this year—“How Companies Are Putting AI to Work Through Deep Learning” and “The State of Machine Learning Adoption in the Enterprise”—and we found that while many companies are still in the early stages of machine learning adoption, there’s considerable interest in moving forward with projects in the near future. Lazzeri and Mathew spend a considerable amount of time interacting with companies that are beginning to use machine learning and have experiences that span many different industries and applications. I wanted to learn some of the processes and tools they use when they assist companies in beginning their machine learning journeys.

Continue reading Lessons learned while helping enterprises adopt machine learning.

Four short links: 8 November 2018

O'Reilly Radar - Thu, 2018/11/08 - 04:55

Approximate Graph Pattern Mining, Ephemeral Containers, SaaS Metrics, and Edge Neural Networks

  1. ASAP: Fast, Approximate, Graph Pattern Mining at Scale (Usenix) -- we present A Swift Approximate Pattern-miner (ASAP), a system that enables both fast and scalable pattern mining. ASAP is motivated by one key observation: in many pattern mining tasks, it is often not necessary to output the exact answer [...] an approximate count is good enough. (via Morning Paper)
  2. Binci -- tackling the same problem space as Docker Compose, but aimed at ephemeral containers rather than long-running ones (e.g., for test/CI systems).
  3. Metrics for Investors (Andrew Chen) -- detailed take on the metrics through which investors view SaaS businesses.
  4. How to Fit Large Neural Networks on the Edge -- This blog explores a few techniques that can be used to fit neural networks in memory-constrained settings. Different techniques are used for the “training” and “inference” stages, and hence they are discussed separately.

Continue reading Four short links: 8 November 2018.

Four short links: 7 November 2018

O'Reilly Radar - Wed, 2018/11/07 - 05:10

Summarizing Text, Knowledge Database, AI Park, and Approximate Regexes

  1. Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting -- Inspired by how humans summarize long documents, we propose an accurate and fast summarization model that first selects salient sentences and then rewrites them abstractively (i.e., compresses and paraphrases) to generate a concise overall summary. We use a novel sentence-level policy gradient method to bridge the non-differentiable computation between these two neural networks in a hierarchical way, while maintaining language fluency. Source code available.
  2. KBPedia -- a comprehensive knowledge structure for promoting data interoperability and knowledge-based artificial intelligence, [which] combines seven "core" public knowledge bases—Wikipedia, Wikidata, schema.org, DBpedia, GeoNames, OpenCyc, and UMBEL—into an integrated whole. Now has a serious open source offering.
  3. Baidu Opens AI Park in Beijing -- autonomous buses, smart walkways that track people's steps using facial recognition, intelligent pavilions equipped with the company's conversational DuerOS system, and augmented reality Tai Chi lessons. It's theatre, but theatre sets perceptions. In this case, the perception that China is miles ahead of America in AI. It was the AR Tai Chi that caught my eye.
  4. TRE: A Regex Engine with Approximate Matching -- It does this by calculating the Levenshtein distance (number of insertions, deletions, or substitutions it would take to make the strings equal) as it searches for a match.
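The Levenshtein distance TRE computes during matching is the standard dynamic-programming edit distance. A minimal sketch of that computation (not TRE's actual implementation, which works incrementally as the regex engine scans):

```python
def levenshtein(a: str, b: str) -> int:
    # Row-by-row dynamic programming over the edit-distance table.
    prev = list(range(len(b) + 1))      # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]                        # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

print(levenshtein("poetry", "poultry"))  # 2: one insertion, one substitution
```

TRE's approximate matching lets you bound this distance per match (e.g., "match this pattern within two edits"), which is what makes it useful for noisy text.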

Continue reading Four short links: 7 November 2018.

140 live online training courses opened for November, December, and January

O'Reilly Radar - Wed, 2018/11/07 - 04:00

Get hands-on training in deep learning, Python, Kubernetes, blockchain, security, and many other topics.

Learn new topics and refine your skills with 140 live online training courses we opened up for November, December, and January on our learning platform.

Artificial intelligence and machine learning

Artificial Intelligence for Big Data, November 28-29

Essential Machine Learning and Exploratory Data Analysis with Python and Jupyter Notebook, December 3

Deep Learning for Machine Vision, December 4

Beginning Machine Learning with Scikit-Learn, December 5

Managed Machine Learning Systems and Internet of Things, December 5-6

Natural Language Processing (NLP) from Scratch, December 7

Machine Learning in Practice, December 7

Deep Learning with TensorFlow, December 12

Getting Started with Machine Learning, December 12

Essential Machine Learning and Exploratory Data Analysis with Python and Jupyter Notebook, January 7-8

Artificial Intelligence: AI for Business, January 9

Managed Machine Learning Systems and Internet of Things, January 9-10

Applied Deep Learning for Coders with Apache MXNet, January 10-11

Artificial Intelligence: An Overview of AI and Machine Learning, January 15

Hands-On Machine Learning with Python: Classification and Regression, January 16

Hands-On Machine Learning with Python: Clustering, Dimension Reduction, and Time Series Analysis, January 17

Blockchain

Building Smart Contracts on the Blockchain, November 29-30

Introducing Blockchain, December 7

Understanding Hyperledger Fabric Blockchain, December 10-11

Blockchain Applications and Smart Contracts: Developing Decentralized, December 13


Spotlight on Innovation: The Future Beyond Digital, Entering a New Era of Exploration and Collaboration, November 28

Negotiation Fundamentals, December 7

Applying Critical Thinking, December 10

How to Give Great Presentations, December 10

Performance Goals for Growth, December 12

Leadership Communication Skills for Managers, January 9

Introduction to Critical Thinking, January 10

Introduction to Delegation Skills, January 10

Why Smart Leaders Fail, January 15

Data science and data tools

Real-Time Data Foundations: Kafka, December 3

Real-Time Data Foundations: Spark, December 4

Getting Started with Pandas, December 5

Getting Started with Python 3, December 5-6

Mastering Pandas, December 6

Real-Time Data Foundations: Flink, December 7

Apache Hadoop, Spark, and Big Data Foundations, December 10

Real-Time Data Foundations: Time Series Architectures, December 10

Sentiment Analysis for Chatbots in Python, December 11

Hands-on Introduction to Apache Hadoop and Spark Programming, December 12-13

Building Intelligent Bots in Python, December 13

Intermediate Machine Learning with Scikit-Learn, December 17

Design

3ds Max and V-Ray: The Path Towards Photorealism, December 14

Programming

Java Full Throttle with Paul Deitel: A One-Day, Code-Intensive Java Standard Edition Presentation, November 15

Next-Generation Java Testing with JUnit 5, November 15

Mastering the Basics of Relational SQL Querying, November 19-20

Designing Bots and Conversational Apps for Work, November 29

Pythonic Object-Oriented Programming, December 3

Bash Shell Scripting in 3 Hours, December 3

Beyond Python Scripts: Logging, Modules, and Dependency Management, December 5

Linux Filesystem Administration, December 5-6

Beyond Python Scripts: Exceptions, Error Handling, and Command-Line Interfaces, December 6

Next Level Git - Master your Workflow, December 6

Programming with Java 8 Lambdas and Streams, December 6

Consumer Driven Contracts - A Hands-On Guide to Spring Cloud Contract, December 10

SQL for Any IT Professional, December 10

Linux Under the Hood, December 10

Linux Troubleshooting, December 11

Scalable Concurrency with the Java Executor Framework, December 11

Next Level Git - Master your Content, December 13

Linux Performance Optimization, December 13

Mastering Go for UNIX Administrators, UNIX Developers, and Web Developers, December 13-14

Getting Started with Java: From Core Concepts to Real Code in 4 Hours, December 17

Reactive Spring Boot, December 17

Scala Fundamentals: From Core Concepts to Real Code in 5 Hours, December 18

Programming with Data: Python and Pandas, December 18

Spring Boot and Kotlin, December 18

Julia 1.0 Essentials, December 18

Functional Design for Java 8, December 18-19

Java 8 Generics in 3 Hours, December 20

Python: The Next Level, January 7-8

Design Patterns Boot Camp, January 9-10

Learning Python 3 by Example, January 10

Modern JavaScript, January 14

Learn the Basics of Scala, January 14

Getting Started with Pandas, January 14

Introduction to JavaScript Programming, January 14-15

Getting Started with Python 3, January 14-15

Mastering Pandas, January 15

Scaling Python with Generators, January 15

Getting Started with Pytest, January 16

OCA Java SE 8 Programmer Certification Crash Course, January 16-18

Mastering Python's Pytest, January 17

Pythonic Design Patterns, January 18

Visualization in Python with Matplotlib, January 18


Security

Cybersecurity Offensive and Defensive Techniques in 3 Hours, December 7

Cyber Security Fundamentals, December 10-11

Certified Ethical Hacker (CEH) Crash Course, December 13-14

Intense Introduction to Hacking Web Applications, December 17

CCNA Security Crash Course, December 18-19

CompTIA PenTest+ Crash Course, December 18-19

CompTIA Security+ SY0-501 Crash Course, January 7-8

AWS Certified Security - Specialty Crash Course, January 7-8

AWS Advanced Security with Config, GuardDuty, and Macie, January 14

Ethical Hacking Bootcamp with Hands-on Labs, January 15-17

CompTIA Security+ SY0-501 Certification Practice Questions and Exam Strategies, January 16

Cyber Ops SECFND 210-250 Crash Course, January 16

CCNA Cyber Ops SECOPS 210-255 Crash Course, January 18

Software architecture

Developing Incremental Architecture, December 10

Implementing Evolutionary Architectures, December 13-14

Architecture for Continuous Delivery, December 17

Architecture by Example, December 17-18

Comparing Service-Based Architectures, December 18

Software Architecture for Developers, January 7

Systems engineering and operations

Automating with Ansible, December 3

An Introduction to DevOps with AWS, December 3

Red Hat Certified Engineer (RHCE) Crash Course, December 4-7

9 Steps to Awesome with Kubernetes, December 5

Ansible for Managing Network Devices, December 5

Amazon Web Services: AWS Managed Services, December 5-6

Network Troubleshooting Using the Half Split and OODA, December 6

Google Cloud Certified Associate Cloud Engineer Crash Course, December 6-7

Getting Started with Continuous Delivery (CD), December 10

AWS Monitoring Strategies, December 10

Practical Docker, December 11

Getting Started with Amazon Web Services (AWS), December 11-12

Amazon Web Services: Architect Associate Certification - AWS Core Architecture Concepts, December 11-12

CCNP R/S ROUTE (300-101) Crash Course, December 11-13

Ansible in 3 Hours, December 12

Amazon Web Services: AWS Design Fundamentals, December 13-14

Deploying Container-Based Microservices on AWS, December 13-14

Kubernetes in 3 Hours, December 14

Jenkins 2: Up and Running, December 17

CompTIA Cloud+ CV0-002 Exam Prep, December 17

CCNP R/S SWITCH (300-115) Crash Course, December 17-19

Google Cloud Platform (GCP) for AWS Professionals, December 18

AWS CloudFormation Deep Dive, January 7-8

Red Hat Certified System Administrator (RHCSA) Crash Course, January 7-10

Building a Cloud Roadmap, January 9

Implementing and Troubleshooting TCP/IP, January 9

Docker: Up and Running, January 9-10

Building Distributed Pipelines for Data Science Using Kafka, Spark, and Cassandra, January 9-11

Understanding AWS Cloud Compute Options, January 10-11

Istio on Kubernetes: Enter the Service Mesh, January 16

AWS Certified SysOps Administrator (Associate) Crash Course, January 16-17

Chaos Engineering: Planning and Running Your First Game Day, January 17

Visualizing Software Architecture with the C4 Model, January 18

Web programming

Hands-On Chatbot and Conversational UI Development, December 3-4

Building APIs with Django REST Framework, December 17

Developing Web Apps with Angular and TypeScript, December 17-19

Rethinking REST: A Hands-On Guide to GraphQL and Queryable APIs, December 18

Continue reading 140 live online training courses opened for November, December, and January.

Categories: Technology

Kubernetes' scheduling magic revealed

O'Reilly Radar - Tue, 2018/11/06 - 07:40

Understanding how the Kubernetes scheduler makes scheduling decisions is critical to ensure consistent performance and optimal resource utilization.

Kubernetes is an industry-changing technology that allows massive scale and simplicity for the orchestration of containers. Most of us happily push thousands of deployments and pods to Kubernetes every day. Have you ever wondered what sorcery is at play in Kubernetes to determine where all those pods will be created in the Kubernetes cluster? All of this is made possible by the kube-scheduler.

Understanding how the Kubernetes scheduler makes scheduling decisions is critical to ensure consistent performance and optimal resource utilization. All scheduling in Kubernetes is based on a few key pieces of information. First, the scheduler uses information about the worker node to determine the node's total capacity. Running kubectl describe node <node> gives you all the information you need about how the scheduler sees the world.

Capacity:
  cpu:                4
  ephemeral-storage:  103079200Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             16427940Ki
  pods:               110
Allocatable:
  cpu:                3600m
  ephemeral-storage:  98127962034
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             14932524020
  pods:               110

Here we see what the scheduler sees as being the total capacity of the worker node as well as the allocatable capacity. The allocatable numbers factor in kubelet settings for Kubernetes and system reserved space. Allocatable represents the total space the scheduler has to work with for a given node.
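The relationship between capacity and allocatable can be sketched in a few lines of Python. This is a simplified model, not kubelet code; the reserved amounts below are hypothetical values chosen only so the result matches the describe output above:

```python
# Sketch of how allocatable capacity is derived from total capacity:
# allocatable = capacity - kube-reserved - system-reserved - eviction threshold.
# The reserved values used below are made-up illustrations.

def allocatable(capacity_m, kube_reserved_m, system_reserved_m, eviction_m=0):
    """All quantities in CPU millicores (the same arithmetic applies to memory in bytes)."""
    return capacity_m - kube_reserved_m - system_reserved_m - eviction_m

# A node with 4 CPUs (4000m); hypothetical reservations of 200m each
# leave 3600m for pods, matching the Allocatable line above.
print(allocatable(4000, 200, 200))  # -> 3600
```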

Next, we need to look at how we instruct the scheduler about our workload. It is important to note that Kubernetes does not consider actual CPU and memory utilization of a workload. It factors in only the resource descriptions provided by the developer or operator. Here is an example from a pod object definition:

resources:
  limits:
    cpu: 100m
    memory: 170Mi
  requests:
    cpu: 100m
    memory: 170Mi

These are the specifications provided at the container level. The developer must provide resource requests and limits on a per-container basis, not per pod. What do these specifications mean? The limits are considered only by the kubelet and are not a factor during scheduling. Here, the cgroup of this container will be set to limit CPU utilization to 10% of a single CPU core, and if memory utilization exceeds 170MB, the process will be killed and restarted; there is no "soft" memory limit in Kubernetes' use of cgroups. The requests are used by the scheduler to determine the best worker node on which to place this workload. Note that the scheduler sums the resource requests of all containers in the pod to determine where to place it, while the kubelet enforces limits on a per-container basis.
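That summing of container-level requests into a pod-level figure can be sketched as follows. This is a toy model rather than the actual scheduler code, and it assumes quantities have already been parsed into millicores and bytes:

```python
# Sketch: the scheduler's effective request for a pod is the sum of its
# containers' requests; limits are enforced separately by the kubelet.

def pod_requests(containers):
    """containers: list of dicts with 'cpu_m' (millicores) and 'mem' (bytes)."""
    return {
        "cpu_m": sum(c["cpu_m"] for c in containers),
        "mem": sum(c["mem"] for c in containers),
    }

# Two containers like the example above: 100m CPU and 170Mi memory each.
pod = [
    {"cpu_m": 100, "mem": 170 * 1024 * 1024},
    {"cpu_m": 100, "mem": 170 * 1024 * 1024},
]
print(pod_requests(pod))  # -> {'cpu_m': 200, 'mem': 356515840}
```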

We now have enough information to understand the basic resource-based scheduling logic that Kubernetes uses. When a new pod is created, the scheduler looks at the total resource requests of the pod and then attempts to find the worker node that has the most available resources. This is tracked by the scheduler for each node, as seen in kubectl describe node:

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits   Memory Requests   Memory Limits
  ------------  ----------   ---------------   -------------
  1333m (37%)   2138m (59%)  1033593344 (6%)   1514539264 (10%)

You can investigate the exact details of the Kubernetes scheduler in its source code. Scheduling happens in two key passes. On the first pass, the scheduler filters the nodes down to those capable of running a given pod, based on resource requests and other scheduling requirements. On the second pass, it weighs the eligible nodes based on the absolute and relative resource utilization of the nodes, along with other factors. The highest-weighted eligible node is selected to run the pod.
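The two-pass logic can be sketched in Python. This is a toy model of the real scheduler, which weighs many more factors (affinity, taints, spreading, and so on); the node names and numbers below are invented for illustration:

```python
# Toy model of the scheduler's two passes: filter out nodes that cannot
# fit the pod's requests, then score the survivors by how much of their
# allocatable capacity would remain free, and pick the best.

def schedule(pod_req, nodes):
    """pod_req: {'cpu_m': ..., 'mem': ...} (millicores, memory in arbitrary units);
    nodes: {name: {'alloc_cpu_m', 'alloc_mem', 'req_cpu_m', 'req_mem'}}."""
    # Pass 1: keep only nodes with enough unrequested capacity.
    feasible = {
        name: n for name, n in nodes.items()
        if n["alloc_cpu_m"] - n["req_cpu_m"] >= pod_req["cpu_m"]
        and n["alloc_mem"] - n["req_mem"] >= pod_req["mem"]
    }
    if not feasible:
        return None  # the pod stays Pending

    # Pass 2: score by the fraction left free after placement
    # (a "least requested" strategy); highest score wins.
    def score(n):
        cpu_free = (n["alloc_cpu_m"] - n["req_cpu_m"] - pod_req["cpu_m"]) / n["alloc_cpu_m"]
        mem_free = (n["alloc_mem"] - n["req_mem"] - pod_req["mem"]) / n["alloc_mem"]
        return (cpu_free + mem_free) / 2

    return max(feasible, key=lambda name: score(feasible[name]))

nodes = {
    "node-a": {"alloc_cpu_m": 3600, "alloc_mem": 14000, "req_cpu_m": 3500, "req_mem": 6000},
    "node-b": {"alloc_cpu_m": 3600, "alloc_mem": 14000, "req_cpu_m": 1000, "req_mem": 4000},
}
# node-a has only 100m CPU free, so it is filtered out; node-b wins.
print(schedule({"cpu_m": 200, "mem": 400}, nodes))  # -> node-b
```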

This post is part of a collaboration between O'Reilly and IBM. See our statement of editorial independence.

Continue reading Kubernetes' scheduling magic revealed.

Categories: Technology

Four short links: 6 November 2018

O'Reilly Radar - Tue, 2018/11/06 - 05:50

People Don't Change, Open Access, Event Database, and Apple Maps

  1. People Don't Change -- an interesting and entertaining talk reminding you that modern people, with their selfies and mobile phone obsessions, aren't a special new breed of creature unlike the people of the past. The first half covers non-technical similarities, and the second half kicks into how the same human drives behind our tech obsessions can be found (with different tech) in the past. (via Daniel Siegel)
  2. Bill and Melinda Gates Foundation Endorses European Open-Access Plan (Nature) -- the Wellcome Trust, which funds over a billion pounds of research each year, will only permit publication in subscription journals if there's simultaneous release in PubMed Central. The Gates Foundation, which is already strongly pro-OA, is bringing its requirements in line with the new European Plan S. (via Slashdot)
  3. EventStore -- open source, functional database with complex event processing in JavaScript.
  4. Apple's New Maps -- fantastically detailed write-up of the new Apple Maps, coverage, visuals, omissions.

Continue reading Four short links: 6 November 2018.

Categories: Technology

Four short links: 5 November 2018

O'Reilly Radar - Mon, 2018/11/05 - 05:00

Probabilistic Model Checker, Notebooks to Docs, AWS 12-Factor Apps, and AI Physicist

  1. Stormchecker -- A modern model checker for probabilistic systems. Test your models of your distributed system.
  2. MonoCorpus -- a note-taking app for software and machine learning engineers meant to encourage learning, sharing, and easier development. Increase documentation for yourself and your team without slowing your velocity. Take notes as part of your process instead of dedicating time to writing them. An interesting use for notebooks.
  3. Odin -- Deploy your 12-factor applications to AWS easily and securely with Odin, an AWS Step Function based on the step framework that deploys services as auto-scaling groups (ASGs).
  4. Toward an AI Physicist for Unsupervised Learning -- We investigate opportunities and challenges for improving unsupervised machine learning using four common strategies with a long history in physics: divide-and-conquer, Occam's Razor, unification, and lifelong learning. Instead of using one model to learn everything, we propose a novel paradigm centered around the learning and manipulation of *theories*, which parsimoniously predict both aspects of the future (from past observations) and the domain in which these predictions are accurate. (see also MIT TR)

Continue reading Four short links: 5 November 2018.

Categories: Technology

What changes when we go offline-first?

O'Reilly Radar - Fri, 2018/11/02 - 13:00

Martin Kleppmann shows how recent computer science research is helping develop the abstractions and APIs for the next generation of applications.

Continue reading What changes when we go offline-first?.

Categories: Technology

The freedom of Kubernetes

O'Reilly Radar - Fri, 2018/11/02 - 13:00

Kris Nova looks at the new era of the cloud native space and the kernel that has made it all possible: Kubernetes.

Continue reading The freedom of Kubernetes.

Categories: Technology
