Feature engineering

Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work. Feature engineering is fundamental to the application of machine learning, and is both difficult and expensive. The need for manual feature engineering can be obviated by automated feature learning.

Feature engineering is an informal topic, but it is considered essential in applied machine learning.

Coming up with features is difficult, time-consuming, requires expert knowledge. "Applied machine learning" is basically feature engineering.

— Andrew Ng, "Machine Learning and AI via Brain simulations"[1]

When working on a machine learning problem, feature engineering is manually designing what the input x's should be.

— Shayne Miel, "What is the intuitive explanation of feature engineering in machine learning?"[2]


A feature is a piece of information that might be useful for prediction. Any attribute could be a feature, as long as it is useful to the model.

What distinguishes a feature from a mere attribute is easier to understand in the context of a problem: a feature is a characteristic that might help when solving that problem.[3]
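
As an illustration of turning attributes into features, here is a minimal sketch. The record layout (`timestamp`, `total_price`, `quantity`) and the function name `engineer_features` are hypothetical and chosen only for this example; which derived values are actually useful depends on the problem.

```python
from datetime import datetime


def engineer_features(record):
    """Derive candidate features from one raw record (hypothetical schema)."""
    ts = datetime.fromisoformat(record["timestamp"])
    return {
        # Time of day is often more informative than a raw timestamp.
        "hour": ts.hour,
        # weekday() returns 5 for Saturday and 6 for Sunday.
        "is_weekend": int(ts.weekday() >= 5),
        # A ratio of two raw attributes.
        "price_per_unit": record["total_price"] / record["quantity"],
    }


raw = {"timestamp": "2015-11-14T21:30:00", "total_price": 45.0, "quantity": 3}
features = engineer_features(raw)
print(features)
```

None of the three derived values exists as a column in the raw record; each encodes a guess, based on domain knowledge, about what might help prediction.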

Importance of features

The features in your data directly influence the predictive models you use and the results you can achieve. Both the quality and the quantity of the features strongly affect how well the model performs.[4]

You could say the better the features are, the better the result is. This isn't entirely true, because the results achieved also depend on the model and the data, not just the chosen features. That said, choosing the right features is still very important. Better features can produce simpler and more flexible models, and they often yield better results.[3]

The algorithms we used are very standard for Kagglers. […] We spent most of our efforts in feature engineering. […] We were also very careful to discard features likely to expose us to the risk of over-fitting our model.

— Xavier Conort, "Q&A with Xavier Conort"[5]

…some machine learning projects succeed and some fail. What makes the difference? Easily the most important factor is the features used.

— Pedro Domingos, "A Few Useful Things to Know about Machine Learning"[6]

The process of feature engineering[7]

  1. Brainstorming features;
  2. Deciding what features to create;
  3. Creating features;
  4. Checking how the features work with your model;
  5. Improving your features if needed;
  6. Going back to brainstorming and creating more features until the work is done.
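
The loop above can be sketched as follows. This is an illustration under stated assumptions, not a prescribed method: the toy data set, the candidate feature functions and the choice of a nearest-centroid classifier scored by leave-one-out accuracy are all made up for the example.

```python
def nearest_centroid_loo_accuracy(rows, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    correct = 0
    for i, (row, label) in enumerate(zip(rows, labels)):
        # Build one centroid per class from every example except the held-out one.
        centroids = {}
        for cls in set(labels):
            members = [r for j, (r, l) in enumerate(zip(rows, labels))
                       if j != i and l == cls]
            centroids[cls] = [sum(col) / len(members) for col in zip(*members)]
        # Predict the class whose centroid is closest (squared Euclidean distance).
        predicted = min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
        )
        correct += predicted == label
    return correct / len(rows)


# Toy problem: label 1 means the point lies close to the origin.
points = [(0.1, 0.2), (-0.3, 0.1), (0.2, -0.2), (-0.1, -0.3),
          (1.0, 0.9), (-1.1, 0.8), (0.9, -1.0), (-1.0, -1.1)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Steps 1-3: brainstorm, decide on and create candidate feature sets.
candidates = {
    "raw_xy": lambda x, y: [x, y],
    "squared_radius": lambda x, y: [x * x + y * y],
}

# Step 4: check how each feature set works with the model.
scores = {
    name: nearest_centroid_loo_accuracy([make(x, y) for x, y in points], labels)
    for name, make in candidates.items()
}

# Steps 5-6: keep the best set so far; in practice, iterate with new ideas.
best = max(scores, key=scores.get)
print(scores, best)
```

In this toy case the raw coordinates are not enough for such a simple model, while the engineered distance feature separates the classes.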

Feature relevance[8]

A feature can be strongly relevant (it carries information that no other feature contains), relevant, weakly relevant (it carries some information that other features also include), or irrelevant. It is worth creating many candidate features: even if some of them turn out to be irrelevant, you cannot afford to miss the useful ones. Afterwards, feature selection can be used to prevent overfitting.[9]
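
A simple filter-style selection step can be sketched as below. It uses absolute Pearson correlation as a rough proxy for relevance and for redundancy between features; this is an assumption made for illustration, since correlation only captures linear relationships and is not the formal definition of relevance. The feature names, data and thresholds are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


target = [1, 2, 3, 4, 5, 6]
features = {
    "size": [2, 4, 6, 8, 10, 12],
    # Same information as "size", only rescaled: redundant once "size" is kept.
    "size_in_other_units": [20, 40, 60, 80, 100, 120],
    # No linear relationship with the target.
    "symmetric_noise": [1, 2, 3, 3, 2, 1],
}

relevance = {name: abs(pearson(vals, target)) for name, vals in features.items()}

# Greedy filter: skip features that look irrelevant, then skip features that
# nearly duplicate one that has already been selected.
selected = []
for name in sorted(relevance, key=relevance.get, reverse=True):
    if relevance[name] < 0.1:
        continue
    if any(abs(pearson(features[name], features[kept])) > 0.95 for kept in selected):
        continue
    selected.append(name)

print(relevance, selected)
```

Only one of the two size features survives, and the noise feature is dropped, which reduces the number of inputs the model can overfit to.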

Feature explosion

Feature explosion can be caused by feature combinations or feature templates, both of which quickly increase the total number of features.

  • Feature templates - implementing feature templates instead of coding each new feature by hand
  • Feature combinations - combinations of features that capture dependencies a linear model cannot represent on its own
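
The growth can be made concrete with a small sketch. The template string `contains_{word}`, the vocabulary and the helper names are hypothetical and serve only to show how one template and one round of pairwise combination multiply the feature count.

```python
from itertools import combinations


def apply_template(template, vocabulary):
    """Instantiate one feature template once per vocabulary item."""
    return [template.format(word=word) for word in vocabulary]


def total_features(vocabulary_size):
    """Count of single features plus all unordered pairwise combinations."""
    return vocabulary_size + vocabulary_size * (vocabulary_size - 1) // 2


vocabulary = ["cheap", "free", "meeting", "invoice"]

# One template yields one feature per vocabulary entry.
unigram_features = apply_template("contains_{word}", vocabulary)

# Combining every pair of features adds n * (n - 1) / 2 more.
pair_features = [f"{a}_AND_{b}" for a, b in combinations(unigram_features, 2)]

print(len(unigram_features), len(pair_features))
print(total_features(10_000))
```

With four words the totals are small, but the same template applied to a 10,000-word vocabulary already gives about fifty million features once pairs are included.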

Several techniques help limit feature explosion, such as regularisation, kernel methods, and feature selection.[10]
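
To show why a kernel method helps, the following sketch compares an explicit expansion into all pairwise products with a quadratic kernel that gives the same inner product without building those features. The function names and vectors are illustrative assumptions, not taken from the cited source.

```python
from itertools import product


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def explicit_pair_features(x):
    """All ordered pairwise products x_i * x_j: n**2 explicit features."""
    return [a * b for a, b in product(x, repeat=2)]


def quadratic_kernel(x, z):
    """Same inner product as the explicit expansion, computed in n steps."""
    return dot(x, z) ** 2


x = [1.0, 2.0, 3.0]
z = [4.0, 5.0, 6.0]

explicit = dot(explicit_pair_features(x), explicit_pair_features(z))
implicit = quadratic_kernel(x, z)
print(len(explicit_pair_features(x)), explicit, implicit)
```

The explicit route builds n² combination features per example, while the kernel works directly on the n original values, so a model that only needs inner products never has to materialise the expanded feature set.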

References

  1. "Machine Learning and AI via Brain simulations". Stanford University. Retrieved 2015-03-23.
  2. "What is the intuitive explanation of feature engineering in machine learning? - Quora". www.quora.com. Retrieved 2015-11-11.
  3. "Discover Feature Engineering, How to Engineer Features and How to Get Good at It - Machine Learning Mastery". Machine Learning Mastery. Retrieved 2015-11-11.
  4. "Feature Engineering: How to transform variables and create new ones?". Analytics Vidhya. 2015-03-12. Retrieved 2015-11-12.
  5. "Q&A with Xavier Conort". kaggle.com. http://blog.kaggle.com/2013/04/10/qa-with-xavier-conort/. Retrieved November 2015.
  6. Domingos, Pedro. "A Few Useful Things to Know about Machine Learning" (PDF). Retrieved 12 November 2015.
  7. "Big Data: Week 3 Video 3 - Feature Engineering". youtube.com.
  8. "Feature Engineering" (PDF). 2010-04-22. Retrieved 12 November 2015.
  9. "Feature engineering and selection" (PDF). Alexandre Bouchard-Côté. Retrieved 12 November 2015.
  10. "Feature engineering in Machine Learning" (PDF). Zdenek Zabokrtsky. Retrieved 12 November 2015.