- Understand the project goal (what you are expected to deliver)
You need to build a classification model that predicts whether a patient has diabetes. There are no restrictions on tools, new (engineered) fields, or data encoding methods.
The dataset is a collection of medical and demographic data of patients, as well as their diagnosis of diabetes (positive or negative).
The data includes characteristics such as age, gender, body mass index (BMI), hypertension, heart disease, smoking history, HbA1c level, and blood glucose level. This dataset can be used to build machine learning models to predict diabetes in patients based on their medical history and demographic information.
You will be provided with a second dataset without the target variable (target: diabetes). Generate predictions for this dataset with your model and submit them to Google Classroom as a .csv file with 2 columns: ID and prediction.
The prediction field must be a class prediction (predict), i.e. 1 or 0, not a probability (predict_proba).
import pandas as pd

# The first column of each file is used as the row index
df_train = pd.read_csv('training_data.csv', index_col=0)
df_test = pd.read_csv('test_data.csv', index_col=0)
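The submission format described above (two columns, ID and prediction, with class labels rather than probabilities) can be sketched as follows. This is a minimal illustration, not the required method: the IDs and probabilities are made-up stand-ins for real model output, and the 0.5 threshold is an assumption (calling a classifier's predict directly returns labels without this step).

```python
import io

import pandas as pd

# Hypothetical positive-class probabilities, standing in for something like
# model.predict_proba(X_test)[:, 1]. The IDs are invented for illustration.
proba = pd.Series(
    [0.91, 0.12, 0.50, 0.03],
    index=pd.Index([101, 102, 103, 104], name="ID"),
)

THRESHOLD = 0.5  # assumed cut-off; not specified by the assignment

# Convert probabilities to 0/1 class labels and move the ID index into a column
submission = (proba >= THRESHOLD).astype(int).rename("prediction").reset_index()

# Written to an in-memory buffer here; in practice, pass a file name
# such as 'submission.csv' (hypothetical) to to_csv instead.
buffer = io.StringIO()
submission.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing index=False keeps the file to exactly the two required columns, since the ID has already been moved out of the index.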