sal <- read.csv(file='C://Users//ckt//OneDrive - MUNI//Institutions//MU//ML Finance//2024//MidTerm//jobsal2024.csv')

# These are salary data for 'data scientist' jobs around the world.
# The list of variables can be found in 'DataDescription.csv'.
# The variable of interest is:
#   salary - annual salary in USD
# There are multiple groups of features:
#   year - the year for which the data are recorded
#   Experience level
#   Contract type
#   Company size
#   Employee residence
#   Company residence
#   Job title
# Your goal is to design a prediction model that will predict the salary in year 2024.
# Think about which variables you will use. You are strongly encouraged to
# create your own variables.

# Your graded tasks are as follows:
# 1) Prepare a table with descriptive statistics and correlations for the key variables
#    (not necessarily all) that enter your model. (6 points)
# 2) Use at least 4 models to predict salary. Using more complex models (random forest,
#    xgb, elastic net) is more likely to earn the maximum of 5 points here. (5 points)
# 3) Evaluate the models and recommend which model to use in the future.
#    Explain your recommendation. (5 points)

# The remaining 14 points are awarded as follows. Apart from fulfilling the goal,
# you will be evaluated on these criteria:
# - if you submit working code (I load the data and run your code without errors;
#   think about reproducibility). (4 points)
# - if you present figures as well. (1 point)
# - if you select different 'types' of models; 4 models are enough. (2 points)
# - if you use multiple loss functions. (1 point)
# - if you interpret your results correctly. (2 points)
# - if you create your own variables and consider different transformations. (4 points)

# - Upload this script containing only your working code.
# - Add comments to it describing your results.
# - Use only the prediction models we discussed during lectures and seminars.
# - You are allowed to use the 'functions' script, which contains helpful functions.
# - You are allowed to use the official scripts from classes.

# Tips
# - Do not use the response/target variable salary as a feature; you would be
#   surprised how often that happens.
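
# A minimal sketch of how several loss functions could be reported side by side.
# Assumption: loss_table() is a hypothetical helper, not part of the 'functions'
# script or the official class scripts. It compares observed and predicted
# salaries and returns MSE (in squared USD), RMSE and MAE (both in USD).
# RMSE penalises large errors more heavily than MAE, so reporting both shows
# whether a model's accuracy is driven by a few extreme salaries.
loss_table <- function(actual, predicted) {
  err <- actual - predicted           # prediction errors in USD
  c(MSE  = mean(err^2),               # mean squared error
    RMSE = sqrt(mean(err^2)),         # root mean squared error
    MAE  = mean(abs(err)))            # mean absolute error
}

# Toy usage with made-up numbers (illustration only, not real salaries):
loss_table(actual = c(100000, 150000), predicted = c(110000, 140000))

# Applied to each model's hold-out predictions, the resulting vectors can be
# combined with rbind() into one comparison table for task 3.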