Chris Olah
A Chris Olah is a person (an AI researcher known for work on neural network interpretability and AI safety, and a co-founder of Anthropic).
- See: Anthropic, AI Safety, AI Explainability, AI Interpretability, Reinforcement Learning, TensorFlow.
References
2024
- (Templeton et al., 2024) ⇒ Adly Templeton, Tom Conerly, Jonathan Marcus, Jack Lindsey, Trenton Bricken, Brian Chen, Adam Pearce, Craig Citro, Emmanuel Ameisen, Andy Jones, Hoagy Cunningham, Nicholas L. Turner, Callum McDougall, Monte MacDiarmid, Alex Tamkin, Esin Durmus, Tristan Hume, Francesco Mosconi, C. Daniel Freeman, Theodore R. Sumers, Edward Rees, Joshua Batson, Adam Jermyn, Shan Carter, Chris Olah, and Tom Henighan. (2024). “Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet.” In: Circuits Updates.
2022
- (Bai et al., 2022a) ⇒ Yuntao Bai, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, Nova DasSarma, Dawn Drain, et al. (2022). “Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback.” In: arXiv preprint arXiv:2204.05862.
- NOTE: The authors describe how reinforcement learning from human feedback (RLHF) is used to train an assistant that is both helpful and harmless: human preference comparisons between model responses are used to fit a reward model, which then guides policy optimization so the assistant's behavior better aligns with human values. (A minimal sketch of the pairwise reward-model loss follows below.)
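The reward-modeling step at the core of RLHF fits a scalar reward so that preferred responses score higher than rejected ones. The following is a minimal, illustrative NumPy sketch of that pairwise (Bradley-Terry style) loss and its gradient; the linear reward model and the synthetic feature vectors are assumptions for illustration, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: each response is represented by a 5-dim feature vector,
# and the reward model is a simple linear scorer r(x) = w . x.
dim, n_pairs = 5, 200
w = np.zeros(dim)

# Synthetic preference data: (chosen, rejected) feature pairs.
chosen = rng.normal(size=(n_pairs, dim)) + 0.5   # shifted so "chosen" tends to score higher
rejected = rng.normal(size=(n_pairs, dim))

def pairwise_loss_and_grad(w, chosen, rejected):
    """Bradley-Terry style loss: -log sigmoid(r(chosen) - r(rejected))."""
    diff = (chosen - rejected) @ w                 # reward margin per pair
    p = 1.0 / (1.0 + np.exp(-diff))                # P(chosen preferred over rejected)
    loss = -np.log(p + 1e-12).mean()
    grad = -((1.0 - p)[:, None] * (chosen - rejected)).mean(axis=0)
    return loss, grad

# Plain gradient descent on the reward-model parameters.
for step in range(200):
    loss, grad = pairwise_loss_and_grad(w, chosen, rejected)
    w -= 0.5 * grad

print(f"final loss: {loss:.3f}")
print("mean reward margin:", float(((chosen - rejected) @ w).mean()))
```

In the full RLHF pipeline this learned reward would then be maximized by the policy (e.g. with PPO); the sketch covers only the preference-fitting step.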
- (Bai et al., 2022b) ⇒ Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, et al. (2022). “Constitutional AI: Harmlessness from AI Feedback.” In: arXiv preprint arXiv:2212.08073.
- NOTE: This paper introduces Constitutional AI, in which a written set of principles (a "constitution") guides model behavior: the model critiques and revises its own outputs against those principles, and AI-generated preference feedback largely replaces human harmlessness labels during reinforcement learning, promoting safer interactions with AI systems. (An illustrative critique/revision loop is sketched below.)
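The critique-and-revision loop at the heart of the Constitutional AI recipe can be illustrated with a toy sketch. The `model` function below is a placeholder stub (an assumption for illustration), not a real language model call; only the loop structure, in which a principle is used first to critique and then to revise a draft response, reflects the method.

```python
# Illustrative sketch of a Constitutional AI critique/revision loop.
# `model` is a placeholder stub standing in for a language model call;
# in the actual method, every step is performed by the model itself.

PRINCIPLES = [
    "Choose the response that is least likely to be harmful or offensive.",
    "Choose the response that is most helpful and honest.",
]

def model(prompt: str) -> str:
    # Placeholder: a real system would call a language model here.
    return f"[model output for: {prompt[:60]}...]"

def constitutional_revision(user_request: str) -> str:
    draft = model(user_request)
    for principle in PRINCIPLES:
        critique = model(
            f"Critique the following response according to this principle.\n"
            f"Principle: {principle}\nResponse: {draft}"
        )
        draft = model(
            f"Rewrite the response to address the critique.\n"
            f"Critique: {critique}\nResponse: {draft}"
        )
    return draft  # revised responses become training data for the supervised stage

print(constitutional_revision("How do I pick a strong password?"))
```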
2018
- (Olah et al., 2018) ⇒ Chris Olah, Arvind Satyanarayan, Ian Johnson, Shan Carter, Ludwig Schubert, Katherine Ye, and Alexander Mordvintsev. (2018). “The Building Blocks of Interpretability.” In: Distill, 3(3), e10.
- NOTE: It presents interpretability methods in machine learning as fundamental, composable building blocks for constructing rich user interfaces, emphasizing that combining techniques such as feature visualization and attribution gives a fuller picture of how neural networks make decisions. (A minimal attribution sketch follows below.)
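One of the building blocks the paper composes is attribution: measuring how much each input (or hidden unit) contributes to an output. Below is a minimal NumPy sketch of gradient-times-input attribution on a tiny two-layer network; the network, weights, and input are made up for illustration and are not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Tiny two-layer network: x -> tanh(W1 x) -> w2 . h  (scalar output).
W1 = rng.normal(size=(4, 6))
w2 = rng.normal(size=4)
x = rng.normal(size=6)

# Forward pass.
z = W1 @ x
h = np.tanh(z)
y = w2 @ h

# Backward pass by hand: dy/dx = W1^T (w2 * (1 - tanh^2(z))).
dy_dx = W1.T @ (w2 * (1.0 - h ** 2))

# Gradient-times-input attribution: how much each input dimension
# contributed to the output around this point.
attribution = dy_dx * x
print("output:", round(float(y), 3))
print("attribution per input:", np.round(attribution, 3))
```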
2016
- (Olah & Carter, 2016) ⇒ Chris Olah, and Shan Carter. (2016). “Attention and Augmented Recurrent Neural Networks.” In: Distill. doi:10.23915/distill.00001
- (Amodei et al., 2016b) ⇒ Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, Dan Mané. (2016). “Concrete Problems in AI Safety.” arXiv preprint arXiv:1606.06565.
- (Abadi et al., 2016b) ⇒ Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. (2016). “TensorFlow: A System for Large-scale Machine Learning.” In: Proceedings of the 12th USENIX conference on Operating Systems Design and Implementation. ISBN:978-1-931971-33-1
- (Abadi et al., 2016a) ⇒ Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viegas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. (2016). “TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems.” In: arXiv preprint arXiv:1603.04467.
2015
- (Olah, 2015) ⇒ Chris Olah. (2015). “Understanding LSTM Networks.” In: colah.github.io.
- NOTE: It provides a clear explanation of LSTM networks, their gated architecture (forget, input, and output gates acting on a cell state), and why this design handles long-term dependencies in sequential data better than plain recurrent networks. (A minimal single-step sketch follows below.)
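The blog post's central object is the LSTM cell: gates control what is forgotten from, written to, and read out of a cell state that carries information across time steps. A minimal NumPy sketch of one LSTM step is shown below; the dimensions and random weights are placeholders for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step: gates decide what to forget, write, and output."""
    z = W @ np.concatenate([x, h_prev]) + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # input, forget, output gates
    g = np.tanh(g)                                 # candidate cell update
    c = f * c_prev + i * g                         # new cell state
    h = o * np.tanh(c)                             # new hidden state
    return h, c

# Toy dimensions and random parameters, for illustration only.
rng = np.random.default_rng(2)
input_dim, hidden_dim = 3, 5
W = rng.normal(scale=0.1, size=(4 * hidden_dim, input_dim + hidden_dim))
b = np.zeros(4 * hidden_dim)

h = np.zeros(hidden_dim)
c = np.zeros(hidden_dim)
for t in range(4):                                 # run a short toy sequence
    h, c = lstm_step(rng.normal(size=input_dim), h, c, W, b)
print("hidden state after 4 steps:", np.round(h, 3))
```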