Reposted

[Translation] 7 Skills Required for Machine Learning Jobs

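This article is translated from: 7 key skills required for Machine Learning jobs, published on "Big Data Made Simple". The site was created by Crayon Data, a company jointly established in Singapore and India and one of several technology investors focused on big data and its analysis.

Author: Alexa Strife (compiled from the Quora question: What skills are needed for machine learning jobs?)

Translator: @善良的右行, from the NLPJob translation group

Article URL: http://blog.nlpjob.com/?p=959

When reposting, please be sure to retain the notice and source above; otherwise it will be treated as infringement.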


Machine learning is usually associated with artificial intelligence (AI): it gives computers the ability to perform tasks such as recognition, diagnosis, planning, robot control, and prediction without being explicitly programmed. It focuses on developing algorithms that can teach themselves to grow and change when exposed to new data.

In a way, the process of machine learning is similar to that of data mining: both search through data to look for patterns. However, where data mining applications extract data for human comprehension, machine learning uses that data to improve the program's own understanding. Machine learning programs detect patterns in data and adjust their actions accordingly.

So, what skills do you need to get a machine learning job? A good candidate should have a deep understanding of a broad set of algorithms and applied math, strong problem-solving and analytical skills, a grounding in probability and statistics, and fluency in programming languages such as Python, C++, R, and Java. Above all, machine learning requires innate curiosity, so if you never lost the curiosity you had as a child, you are a natural candidate for machine learning. The key skill sets are listed in detail below.

1. Python/C++/R/Java

If you want a job in machine learning, you will probably have to learn all of these languages at some point. C++ can help speed code up. R works well for statistics and plots. Hadoop is Java-based, so you will probably need to implement mappers and reducers in Java.
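The mapper and reducer idea mentioned above can be illustrated without a Hadoop cluster. The following is only a minimal single-process sketch, written in Python rather than Java for brevity; the names `mapper`, `reducer`, and `run_job` are hypothetical and are not part of the Hadoop API.

```python
from collections import defaultdict


def mapper(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Combine all counts that were emitted for a single word."""
    return word, sum(counts)


def run_job(lines):
    """Simulate the map, shuffle, and reduce phases in one process."""
    groups = defaultdict(list)
    for line in lines:                       # map phase
        for key, value in mapper(line):
            groups[key].append(value)        # shuffle: group values by key
    # reduce phase: one reducer call per distinct key
    return dict(reducer(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(run_job(["big data made simple", "big data big models"]))
```

In a real cluster the framework performs the grouping step and runs many mappers and reducers in parallel on different machines; only the two small functions are written by the user.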

2. Probability and Statistics

Theory helps in learning about algorithms. Good examples are Naive Bayes, Gaussian Mixture Models, and Hidden Markov Models: you need a firm understanding of probability and statistics to understand these models. Go nuts and study measure theory. Use statistics for model evaluation: confusion matrices, receiver operating characteristic (ROC) curves, p-values, and so on.
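Two of the evaluation tools named above, the confusion matrix and the ROC curve, are small enough to compute by hand. The sketch below is an illustration in plain Python under stated assumptions: labels are binary with 1 as the positive class, both classes occur in the data, and the function names are hypothetical rather than taken from any library.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    Assumes y_true contains at least one positive and one negative label.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        y_pred = [1 if s >= threshold else 0 for s in scores]
        tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve, computed with the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    labels = [1, 0, 1, 0]
    scores = [0.9, 0.8, 0.4, 0.1]
    print(confusion_matrix(labels, [1, 1, 0, 0]))
    print(auc(roc_points(labels, scores)))
```

Sweeping the threshold from high to low traces the curve from (0, 0) to (1, 1); the area under it equals the fraction of positive and negative pairs that the scores rank correctly.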

3. Applied Math and Algorithms

A firm understanding of algorithm theory, and of how each algorithm works, also lets you distinguish between models such as SVMs. (Translator's note: SVMs come with many different kernel functions, and the principles, applications, and conclusions of the specific model differ with the kernel.) You will need to understand subjects such as gradient descent, convex optimization, Lagrangian methods, quadratic programming, partial differential equations, and the like. Also, get used to looking at summations [ http://en.wikipedia.org/wiki/Summation ].
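Gradient descent, the first subject in that list, is easy to try on a small convex problem. The sketch below is an illustration only: the objective f(x, y) = (x - 3)^2 + 2(y + 1)^2, the step size, and the iteration count are assumptions chosen so that the minimum at (3, -1) is known in advance.

```python
def gradient(x, y):
    """Gradient of the assumed objective f(x, y) = (x - 3)^2 + 2 * (y + 1)^2."""
    return 2.0 * (x - 3.0), 4.0 * (y + 1.0)


def gradient_descent(x, y, step=0.1, iterations=200):
    """Repeatedly move against the gradient with a fixed step size."""
    for _ in range(iterations):
        gx, gy = gradient(x, y)
        x -= step * gx
        y -= step * gy
    return x, y


if __name__ == "__main__":
    # Starting from the origin, the iterates approach the minimizer (3, -1).
    print(gradient_descent(0.0, 0.0))
```

Because the objective is convex, any sufficiently small fixed step converges to the single global minimum; on non-convex problems the same update only finds a stationary point.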

4. Distributed Computing

These days, most machine learning jobs entail working with large data sets. You cannot process this data on a single machine; you need to distribute it across an entire cluster. Projects such as Apache Hadoop and cloud services such as Amazon's EC2 make this easier and more cost-effective.

5. Expanding Your Expertise in Unix Tools

You should also master the great Unix tools that were designed for this kind of work: cat, grep, find, awk, sed, sort, cut, tr, and more. Since the processing will most likely run on Linux-based machines, you need access to these tools and familiarity with them. Learn what they do and use them well. They have certainly made my life a lot easier.
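To make concrete what these tools do when chained together, here is a rough Python rendering of a pipeline such as `grep ERROR app.log | awk '{print $2}' | sort | uniq -c`. It is only a sketch: the file name `app.log`, the log format, and the function name `count_second_field` are hypothetical, and the function returns (field, count) pairs instead of printing counts first as `uniq -c` does.

```python
from collections import Counter


def count_second_field(lines, pattern="ERROR"):
    """Count the second whitespace-separated field of every matching line."""
    matching = (line for line in lines if pattern in line)    # grep ERROR
    split_lines = (line.split() for line in matching)
    fields = (parts[1] for parts in split_lines               # awk '{print $2}'
              if len(parts) >= 2)                             # skip short lines
    return sorted(Counter(fields).items())                    # sort | uniq -c


if __name__ == "__main__":
    sample_log = [
        "ERROR disk full",
        "INFO service started",
        "ERROR net timeout",
        "ERROR disk failing",
    ]
    print(count_second_field(sample_log))
```

The shell version is a single line, which is the point of learning the tools: each one does one small job, and the pipe composes them.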

6. Learning More about Advanced Signal Processing Techniques

Feature extraction is one of the most important parts of machine learning. Different types of problems need different solutions, and you may be able to use some really cool advanced signal processing algorithms, such as wavelets, shearlets, curvelets, contourlets, and bandlets. Learn about time-frequency analysis and try to apply it to your problems. If you have not read about Fourier analysis and convolution, you will need to learn about them too. Those two are Signal Processing 101 material, though.
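Fourier analysis and convolution are linked by the convolution theorem: multiplying two discrete Fourier transforms corresponds to circularly convolving the original signals. The sketch below checks this numerically using only the standard library. It assumes two real-valued signals of equal length, uses a direct O(n^2) transform for clarity instead of a fast Fourier transform, and all function names are hypothetical.

```python
import cmath


def dft(signal):
    """Discrete Fourier transform computed directly from its definition."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]


def inverse_dft(spectrum):
    """Inverse discrete Fourier transform, including the 1/n scaling."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def circular_convolution(a, b):
    """Circular convolution of two equal-length signals, computed directly."""
    n = len(a)
    return [sum(a[m] * b[(t - m) % n] for m in range(n)) for t in range(n)]


def convolution_via_dft(a, b):
    """Same result obtained by multiplying spectra and transforming back."""
    product = [x * y for x, y in zip(dft(a), dft(b))]
    return [value.real for value in inverse_dft(product)]


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0]
    b = [1.0, 0.0, 0.0, 1.0]
    print(circular_convolution(a, b))
    print(convolution_via_dft(a, b))
```

The two printed lists agree up to floating-point rounding, which is why long convolutions are usually computed in the frequency domain in practice.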

7. Other skills

(a) Update yourself: You must stay up to date with upcoming changes. That means following news about the development of tools (changelogs, conferences, etc.) and of theory and algorithms (research papers, blogs, conference videos, etc.). The online community changes quickly. Expect this change and cultivate it.

(b) Read a lot: Read papers such as Google MapReduce, Google File System, Google Bigtable, and The Unreasonable Effectiveness of Data. There are great free machine learning books online, and you should read those as well.

Happy Machine Learning!

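Translator's afterword: What is the most painful thing in the world? Believing your work is original when someone else did it long ago. This translation is of 7 key skills required for Machine Learning jobs; there is also a post on the Q&A social network Quora, What skills are needed for machine learning jobs?. The two are very similar, the latter is more comprehensive, and it has already been translated by the user 羽林飞扬. Sadly, I only discovered this at the very end of my translation, while looking up how to render a technical term. Had I known earlier, I would not have done this wasted work (my translation really is poor). Here is the address of that user's translation for everyone to read:
http://www.cnblogs.com/zhengyuhong/p/3381331.html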

Note: when reposting, please credit the source and author: http://blog.nlpjob.com
