Controversial Anyone good with Python here?

ShishioIsBack · Dec 30, 2022

I just need some help

Infitima · Dec 30, 2022

What are you scripting?

U c 4 up da idly · Dec 30, 2022

Should be asking stack overflow

ShishioIsBack · Dec 30, 2022

I have 3 CSV files, I am looping through that to parse data.

So basically I loop through 3 CSV files to make Key Value pairs to make another resultant CSV File.
I want to remove duplicates from that file and duplicates are based on one of the keys

So let me give you a code snippet

for master in master_dict_list:
    for data in data_gen_dict_list:
        for account in account_list:
            if(float(master[account['Target Model']])>0):
                deviation = float((random.randint(-int(data['Security deviation %']), int(data['Security deviation %']))))/100
                security_weight = float(master[account['Target Model']]) + (float(master[account['Target Model']])*deviation)
                security_market_value = float(account['Market Value'])*security_weight/100
                shares = round(security_market_value/float(master['Price']))
                positions_list.append({
                    'Account Code' : account['Account Code'],
                    'Security ID': master['SecurityID'],
                    'Shares' : shares,
                    'Security Market Value' : security_market_value,
                    'Security % Weight' : str(security_weight),
                    'Security Price' : master['Price']
                })

Due to data_gen_dict_list, which has 2 entries, there are duplicates.
I want to remove duplicates where security id and account numbers are the same

Here is what a duplicate could look like
account no : 11 security id : abcd shares : 1 smv : 22
account no : 11 security id : abcd shares : 2 smv : 23

And so on

Post automatically merged: Dec 30, 2022

@comrade can you help me?

Midnight Delight · Dec 30, 2022

I am a beginner

Warchief Sanji D Goat · Dec 30, 2022

@Worst can probably help.

U c 4 up da idly · Dec 30, 2022

Can't you just sort the list by account no and security ID and then remove the duplicates, or are you looking for a Pythonic/elegant solution?
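
For what it's worth, a minimal sketch of the sort-then-dedupe idea, assuming every row is a dict with 'Account Code' and 'Security ID' keys like in the snippet above. The function name dedupe_sorted and the sample rows are made up for illustration. It keeps whichever row of each pair came first in the input.

```python
from itertools import groupby
from operator import itemgetter

def dedupe_sorted(rows):
    """Sort rows by (Account Code, Security ID) and keep one row per pair."""
    key = itemgetter('Account Code', 'Security ID')
    # sorted() is stable, so within a pair the input order is preserved
    # and next(group) yields the row that appeared first in the input.
    return [next(group) for _, group in groupby(sorted(rows, key=key), key=key)]

# Hypothetical sample rows, not real data
rows = [
    {'Account Code': '11', 'Security ID': 'abcd', 'Shares': 1},
    {'Account Code': '12', 'Security ID': 'abcd', 'Shares': 5},
    {'Account Code': '11', 'Security ID': 'abcd', 'Shares': 2},
]
unique_rows = dedupe_sorted(rows)
```

Note that the output comes back sorted by the key, so the original row order is lost. If that order matters, a set of already-seen keys is probably the simpler route.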

Worst · Dec 30, 2022

Warchief Sanji D Goat said:

Ehhh, I'm not a Python guy xD so I wouldn't know how to give you a code snippet, but a super dumb solution could be storing the Account Code and SecurityID as keys in an object (idk if Python has a better data structure for this). Before you push a new element to positions_list, you check whether that key is already in the object you declared.

In JS I would do something like

JavaScript:

// support variable
const csvKeys = {};

for (.....)
  for (.....)
    for (.....) {
      const checkForDuplicateskey = `${account["Account Code"]}-${master.SecurityID}`;

      // if we DON'T find the key inside the support variable we're good to go
      if (!csvKeys[checkForDuplicateskey]) {
        csvKeys[checkForDuplicateskey] = true;
        positions_list.push(YourStuff);
      }
    }

so assuming we have these 2 elements

account no : 11 security id : abcd shares : 1 smv : 22
account no : 11 security id : abcd shares : 2 smv : 23

During the first loop (smv: 22), csvKeys is empty, so we enter the if and csvKeys becomes

csvKeys = {
  "11-abcd": true
}

During the second loop, checkForDuplicateskey is 11-abcd again. It is found inside our csvKeys object, so the duplicate (smv: 23) won't be added.

I hope this helps :D
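
A rough Python take on the same idea, as a sketch only. It assumes the rows are dicts with 'Account Code' and 'Security ID' keys; dedupe_by_key and the sample rows are hypothetical names and data for illustration. A set of tuples stands in for the csvKeys object.

```python
def dedupe_by_key(rows, key_fields):
    """Return rows with only the first occurrence of each key kept."""
    seen = set()   # plays the role of csvKeys in the JS version
    result = []
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result

# Hypothetical sample rows mirroring the two-line example above
positions = [
    {'Account Code': '11', 'Security ID': 'abcd', 'Shares': 1, 'Security Market Value': 22},
    {'Account Code': '11', 'Security ID': 'abcd', 'Shares': 2, 'Security Market Value': 23},
]
unique_positions = dedupe_by_key(positions, ['Account Code', 'Security ID'])
```

This version runs on the finished list, so it can be applied after the three loops without touching them.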

Worst · Dec 30, 2022

ShishioIsBack said:

but
That's the point of the logic: not appending an element with the same account code and security id twice

btw 4 for loops ? TF ?

Basically before appending an element to the list we check if that element was already appended before

i don't know which data structure is more appropriate in python probably a dictionary but let's simplify it and use an array of strings cuz it's the most simple

so before the 3 for loops we declare a variable which is an empty list

# here we're gonna store a list of strings where each string is a combination of the account code and security id pairs
alreadyAppendedElements = list()

so let's assume we ran a couple of iterations and we print it it might look something like this

["11-abc","12-fgh","13-etc",.......]

also before you do the if(float(master.......) we save the Account Code and SecurityID in 2 variables ( actually we gonna use 3 variables )

accountCode = account["Account Code"]
securityID = master["SecurityID"]
# this one is important (str() so it also works if the values are numbers)
duplicateKey = str(accountCode) + "-" + str(securityID)

and basically before appending we check if this duplicateKey is inside our alreadyAppendedElements so now you can for example do an and in the if like this

if(float(master....... and duplicateKey not in alreadyAppendedElements)
# rest of the code

and after you append your stuff to positions_list we make sure we also add the duplicateKey to alreadyAppendedElements

basically this way, when you are populating your positions_list, you are making sure that the same account code and security id pair has not been appended already

i hope it makes sense :)
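
Put together, the in-loop check might look roughly like this. It is a sketch, not tested against the real files: the three lists at the top are hypothetical stand-ins for the parsed CSVs, the column name 'ModelA' is invented, and a set of tuples is used instead of a list of strings because membership checks on a set stay fast as it grows.

```python
import random

# Hypothetical stand-ins for the three parsed CSV files
master_dict_list = [{'SecurityID': 'abcd', 'Price': '10', 'ModelA': '50'}]
data_gen_dict_list = [{'Security deviation %': '5'}, {'Security deviation %': '10'}]
account_list = [{'Account Code': '11', 'Target Model': 'ModelA', 'Market Value': '1000'}]

positions_list = []
already_appended = set()  # (account code, security id) pairs seen so far

for master in master_dict_list:
    for data in data_gen_dict_list:
        for account in account_list:
            duplicate_key = (account['Account Code'], master['SecurityID'])
            target = float(master[account['Target Model']])
            # Skip the row if this pair was already appended
            if target > 0 and duplicate_key not in already_appended:
                max_dev = int(data['Security deviation %'])
                deviation = random.randint(-max_dev, max_dev) / 100
                security_weight = target + target * deviation
                security_market_value = float(account['Market Value']) * security_weight / 100
                positions_list.append({
                    'Account Code': account['Account Code'],
                    'Security ID': master['SecurityID'],
                    'Shares': round(security_market_value / float(master['Price'])),
                    'Security Market Value': security_market_value,
                    'Security % Weight': str(security_weight),
                    'Security Price': master['Price'],
                })
                already_appended.add(duplicate_key)
```

With the two data_gen_dict_list entries above, only the first one produces a row for account 11 / security abcd; the second is skipped.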

ShishioIsBack · Dec 30, 2022

I am using a dictionary, so the string you made, duplicateKey = accountCode + "-"+securityID, is not gonna work.

Here is what one entry looks like

Hiding the Account Code and Security ID values cuz they are confidential
Rest are just calculations so it doesn't matter

Post automatically merged: Dec 30, 2022

But I got the general idea and I think that can work

Worst · Dec 30, 2022

ShishioIsBack said:

Ofc it's gonna work
If the object we're looping through rn is made like this

{'Account Code': 333, 'SecurityID': 'abc', .......}

And we do

accountCode = account["Account Code"]
# Same for the securityId

If we print accountCode it will be 333

And if we do

duplicateKey = str(accountCode) + "-" + securityID

duplicateKey will become '333-abc'

U can put some prints here and there to check ;)
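
A small sketch of the key building in Python, in case it helps. The helper names string_key and tuple_key and the sample dicts are made up for illustration. It shows the str() conversion that is needed when the account code is a number, and a tuple key as an alternative that needs no conversion at all.

```python
def string_key(account_code, security_id):
    # str() makes the concatenation work for ints as well as strings
    return str(account_code) + "-" + str(security_id)

def tuple_key(account_code, security_id):
    # Tuples are hashable, so they can go in a set or be used as dict keys
    return (account_code, security_id)

account = {'Account Code': 333}   # hypothetical row
master = {'SecurityID': 'abc'}    # hypothetical row

print(string_key(account['Account Code'], master['SecurityID']))  # 333-abc
print(tuple_key(account['Account Code'], master['SecurityID']))   # (333, 'abc')

# A joined string can collide when the values themselves contain "-"
assert string_key('1-2', '3') == string_key('1', '2-3')
assert tuple_key('1-2', '3') != tuple_key('1', '2-3')
```

The tuple avoids that collision, which only matters if the codes or IDs can contain the separator.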

Nidai_Kitetsu · Dec 31, 2022

You should ask @Bogard who does it for a living!

I am not that good with Python, but I would have done as @U c 4 up da idly said: sort the 30000 elements and then create a new array with 15000 elements to discard the duplicates, sort of like this =>

S = ['a','b','b','a','c','e','f','f','c','d','g','g','d','e']

S.sort()

new_array = []

temp = S[0]
new_array.append(temp)

for i in range(len(S)):
    # after sorting, duplicates sit next to each other
    if S[i] != temp:
        new_array.append(S[i])
    temp = S[i]
