Data Scientist
A Data Scientist is a data practitioner who can perform data science tasks (using data science systems).
- Context:
- They can (typically) have skill in Pattern Recognition (e.g. data clustering, time-series analysis).
- They can (typically) have skill in Predictive Modeling (e.g. supervised learning).
- They can (typically) have skill in Statistics (e.g. hypothesis testing and statistical software).
- They can (typically) have skill in Programming (e.g. Python programming and script programming).
- They can (typically) have skill in Prototyping and Scripting Languages.
- They can (typically) have skill in Data Processing.
- They can (typically) have skill in Large-Data Querying (and use a database framework such as Apache Hive).
- They can (typically) have skill in Large-Data Processing (and use a data processing framework such as Apache Spark).
- They can (typically) specialize in a Data Mining Subtask.
- They can (typically) have knowledge of Applied Statistics (e.g. hypothesis testing) and Statistical Software.
- They can (typically) have knowledge of Predictive Modeling Algorithms (e.g. supervised learning).
- ...
- They can range from (typically) being a Data Science Worker to being a Data Science Hobbyist.
- They can range from being an Inexperienced Data Scientist to being an Experienced Data Scientist.
- ...
- They can (typically) have skill in Communication of Findings (to product, engineering, and management teams).
- …
- Example(s):
- Counter-Example(s):
- a Data Analyst.
- a Business Intelligence Report Writer, who may lack pattern recognition concepts.
- a Statistician, who may lack important computer science concepts.
- a Computer Systems Analyst, who may lack important statistical concepts.
- a Software Programmer, who may lack important statistical concepts.
- a Machine Learning Researcher, who may not be a practitioner.
- a Database Administrator.
- See: Data Munging, Data Science Course.
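- Illustration: the hypothesis testing skill named in the Context above can be sketched as a two-sided permutation test on a difference in means. This is a minimal sketch using only the Python standard library; the function name `permutation_test`, its parameters, and the sample values are hypothetical and chosen for illustration only, and a practitioner would more likely rely on dedicated statistical software.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=5_000, seed=0):
    """Estimate a two-sided p-value for a difference in means.

    Hypothetical helper for illustration: group labels are shuffled
    repeatedly, and we count how often the shuffled difference is at
    least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1

    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Made-up session lengths (minutes) for two product variants.
control = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3]
treatment = [12.9, 13.1, 12.7, 13.4, 12.8, 13.0]

p_value = permutation_test(control, treatment)
print(f"estimated p-value: {p_value:.4f}")
```

A small estimated p-value suggests that the observed difference would be unlikely if the variant labels were interchangeable. The permutation approach is used here only because it needs no distributional assumptions and no third-party packages.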
References
2018
- (Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/Data_science Retrieved:2018-7-25.
- Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from data in various forms, both structured and unstructured, similar to data mining. Data science is a "concept to unify statistics, data analysis, machine learning and their related methods" in order to "understand and analyze actual phenomena" with data. It employs techniques and theories drawn from many fields within the context of mathematics, statistics, information science, and computer science. Turing award winner Jim Gray imagined data science as a "fourth paradigm" of science (empirical, theoretical, computational and now data-driven) and asserted that "everything about science is changing because of the impact of information technology" and the data deluge. When Harvard Business Review called it "The Sexiest Job of the 21st Century", the term "data science" became a buzzword, and is now often applied to business analytics, business intelligence, predictive modeling, or any arbitrary use of data, or used as a glamorized term for statistics. In many cases, earlier approaches and solutions are now simply rebranded as "data science" to be more attractive, which can cause the term to become "dilute[d] beyond usefulness." While many university programs now offer a data science degree, there exists no consensus on a definition or suitable curriculum contents. To its discredit, however, many data science and big data projects fail to deliver useful results, often as a result of poor management and utilization of resources.
2018b
- https://linkedin.com/pulse/one-data-science-job-doesnt-fit-all-elena-grewal
- QUOTE: ... We decided to restructure data science along three tracks. These described what we were looking for and are areas we want to attract talent. The Analytics track is ideal for those who are skilled at asking a great question, exploring cuts of the data in a revealing way, automating analysis through dashboards and visualizations, and driving changes in the business as a result of recommendations. The Algorithms track would be the home for those with expertise in machine learning, passionate about creating business value by infusing data in our product and processes. And the Inference track would be perfect for our statisticians, economists, and social scientists using statistics to improve our decision making and measure the impact of our work. …
2015
- http://dataconomy.com/the-22-skills-of-a-data-scientist/
- QUOTE: To recap: Data Business People (DB) are leaders and entrepreneurs. Data Creatives (DC) are multi-talented artists and hackers. Data Developers (DD) are programmers and engineers. Data Researchers (DR) are scientists and statisticians. ...
- Algorithms (ex: computational complexity, CS theory) DD, DR
- Back-End Programming (ex: JAVA/Rails/Objective C) DC, DD
- Bayesian/Monte-Carlo Statistics (ex: MCMC, BUGS) DD, DR
- Big and Distributed Data (ex: Hadoop, Map/Reduce) DB, DC, DD
- Business (ex: management, business development, budgeting) DB
- Classical Statistics (ex: general linear model, ANOVA) DB, DC, DR
- Data Manipulation (ex: regexes, R, SAS, web scraping) DC, DR
- Front-End Programming (ex: JavaScript, HTML, CSS) DC, DD
- Graphical Models (ex: social networks, Bayes networks) DD, DR
- Machine Learning (ex: decision trees, neural nets, SVM, clustering) DC, DD
- Math (ex: linear algebra, real analysis, calculus) DD, DR
- Optimization (ex: linear, integer, convex, global) DD, DR
- Product Development (ex: design, project management) DB
- Science (ex: experimental design, technical writing/publishing) DC, DR
- Simulation (ex: discrete, agent-based, continuous) DD, DR
- Spatial Statistics (ex: geographic covariates, GIS) DC, DR
- Structured Data (ex: SQL, JSON, XML) DC, DD
- Surveys and Marketing (ex: multinomial modeling) DC, DR
- Systems Administration (ex: *nix, DBA, cloud tech.) DC, DD
- Temporal Statistics (ex: forecasting, time-series analysis) DC, DR
- Unstructured Data (ex: noSQL, text mining) DC, DD
- Visualisation (ex: statistical graphics, mapping, web-based data-viz) DC, DR
2014
- http://www.experfy.com/jobs/915-Data-Scientist
- QUOTE: As a Twitter Data Scientist specializing in analysis, you will use statistical analysis and data mining techniques to better understand how users engage with Twitter, participate in creation and measurement of new and experimental features, and define meaningful success metrics for Twitter products. You should be passionate about finding insights in data and using quantitative analysis to answer complex questions. You should have a strong background in statistics and data analysis. Experience in modeling, machine learning, and working with large datasets is a plus.
Responsibilities
- Conduct statistical analyses to learn from and scale to petabytes of data
- Use Map-Reduce frameworks such as Pig and Scalding, statistical software such as R, and scripting languages like Python and Ruby
- Write and interpret complex SQL queries for standard as well as ad hoc data mining purposes
- Communicate findings to product, engineering, and management teams
2013
- http://wikipedia.org/wiki/Data_science
- … A practitioner of data science is called a data scientist. The term was coined by DJ Patil and Jeff Hammerbacher.[1] Data scientists solve complex data problems through employing deep expertise in some scientific discipline. It is generally expected that data scientists are able to work with various elements of mathematics, statistics and computer science, although expertise in these subjects are not required. However, a data scientist is most likely to be an expert in only one or two of these disciplines and proficient in another two or three. There is probably no living person who is an expert in all of these disciplines - if so they would be extremely rare. This means that data science must be practiced as a team, where across the membership of the team there is expertise and proficiency across all the disciplines.
Good data scientists are able to apply their skills to achieve a broad spectrum of end results. Some of these include the ability to find and interpret rich data sources, manage large amounts of data despite hardware, software and bandwidth constraints, merge data sources together, ensure consistency of data-sets, create visualizations to aid in understanding data and building rich tools that enable others to work effectively. The skill-sets and competencies that data scientists employ vary widely. Data scientists are an integral part of competitive intelligence, a newly emerging field that encompasses a number of activities, such as data mining and analysis, that can help businesses gain a competitive edge.[2]
A major goal of data science is to make it easier for others to find and coalesce data with greater ease. Data science technologies impact how we access data and conduct research across various domains, including the biological sciences, medical informatics, social sciences and the humanities.
- ↑ Template:Cite news
- ↑ LaPonsie, Maryalene. "Data scientists: The Hottest Job You Haven't Heard Of". http://jobs.aol.com/articles/2011/08/10/data-scientist-the-hottest-job-you-havent-heard-of/. Retrieved 7 October 2012.