Controversial Anyone good with Python here?

#4
What are you scripting?
I have 3 CSV files that I loop through to parse data.

Basically, I loop through the 3 CSV files to build key-value pairs, which go into a new resultant CSV file.
I want to remove duplicates from that file, where a duplicate is decided by the values of certain keys.

So let me give you a code snippet:


for master in master_dict_list:
    for data in data_gen_dict_list:
        for account in account_list:
            if(float(master[account['Target Model']])>0):
                deviation = float((random.randint(-int(data['Security deviation %']), int(data['Security deviation %']))))/100
                security_weight = float(master[account['Target Model']]) + (float(master[account['Target Model']])*deviation)
                security_market_value = float(account['Market Value'])*security_weight/100
                shares = round(security_market_value/float(master['Price']))
                positions_list.append({
                    'Account Code' : account['Account Code'],
                    'Security ID': master['SecurityID'],
                    'Shares' : shares,
                    'Security Market Value' : security_market_value,
                    'Security % Weight' : str(security_weight),
                    'Security Price' : master['Price']
                })

Because data_gen_dict_list has 2 entries, the result contains duplicates.
I want to remove duplicates where the security ID and account number are the same.

Here is what a duplicate could look like:
account no : 11 security id : abcd shares : 1 smv : 22
account no : 11 security id : abcd shares : 2 smv : 23

And so on.

@comrade can you help me?
 
#7
Can't you just sort the list by account no and security ID and then remove the duplicates, or are you looking for a pythonic/elegant solution?
 

Worst

#8
Ehhh, I'm not a Python guy xD so I wouldn't know how to give you a Python snippet, but a super dumb solution could be storing the Account Code and SecurityID pair as a key in an object (idk if Python has a better data structure for this). Then, before you push a new element to positions_list, you check whether that key is already in the object.


In JS I would do something like:

JavaScript:
// support variable: remembers which keys were already pushed
const csvKeys = {};

for (.....)
  for (.....)
    for (.....) {
      const checkForDuplicateskey = `${account["Account Code"]}-${master.SecurityID}`;

      // if we DON'T find the key inside the support variable we're good to go
      if (!csvKeys[checkForDuplicateskey]) {
        csvKeys[checkForDuplicateskey] = true;
        positions_list.push(YourStuff);
      }
    }
So assuming we have these 2 elements:

account no : 11 security id : abcd shares : 1 smv : 22
account no : 11 security id : abcd shares : 2 smv : 23

During the first loop (smv: 22), csvKeys is empty, so we enter the if and csvKeys becomes:

csvKeys = {
"11-abcd" : true
}

During the second loop, checkForDuplicateskey will again be 11-abcd. It will be found inside our csvKeys object, so the duplicate (smv: 23) won't be added.
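For reference, the same walk-through could look roughly like this in Python. This is only a sketch: a set called seen_keys (a name made up here) plays the role of csvKeys, the key is a tuple rather than a joined string, and the two rows are the example duplicates from above with simplified field names.

```python
# The two example rows from the walk-through (field names simplified).
rows = [
    {"account no": 11, "security id": "abcd", "shares": 1, "smv": 22},
    {"account no": 11, "security id": "abcd", "shares": 2, "smv": 23},
]

seen_keys = set()      # plays the role of csvKeys
positions_list = []

for row in rows:
    key = (row["account no"], row["security id"])
    # Only append when this account/security pair has not been seen yet.
    if key not in seen_keys:
        seen_keys.add(key)
        positions_list.append(row)

print(positions_list)
```

Only the first row (smv 22) ends up in positions_list; the second one is skipped because its key is already in seen_keys.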




I hope this helps :D
 

Worst

#9
Oh, maybe I didn't explain properly. None of the lists that I am looping over (master_dict_list, data_gen_dict_list or account_list) has duplicates.

The resultant list, positions_list, that I am appending to is getting duplicates because the data_gen_dict_list loop runs for 2 iterations, as it has two entries (ordered pairs).

positions_list has 30,000 entries but it should have 15,000, because for each of the 15,000 accounts there are two duplicate securities.

I want to remove elements from positions_list where the account number and the security are both the same, as one account cannot hold the same security twice.

Does that make sense?
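If positions_list has already been built, one possible way to clean it up afterwards is a dictionary keyed by the (Account Code, Security ID) pair. The sketch below assumes the first entry for each pair is the one to keep; dedupe_positions and the sample rows are made up for illustration.

```python
def dedupe_positions(positions, key_fields=("Account Code", "Security ID")):
    """Return one entry per key combination, keeping the first one seen."""
    first_by_key = {}
    for position in positions:
        key = tuple(position[field] for field in key_fields)
        # setdefault stores the position only if the key is not present yet.
        first_by_key.setdefault(key, position)
    # Dicts keep insertion order, so the original ordering is preserved.
    return list(first_by_key.values())


# Hypothetical sample data; the real list comes from the CSV loop.
positions_list = [
    {"Account Code": "11", "Security ID": "abcd", "Shares": 1},
    {"Account Code": "11", "Security ID": "abcd", "Shares": 2},
    {"Account Code": "11", "Security ID": "wxyz", "Shares": 5},
    {"Account Code": "12", "Security ID": "abcd", "Shares": 7},
]
deduped = dedupe_positions(positions_list)
print(len(positions_list), "->", len(deduped))
```

If the later entry should win instead, replacing the setdefault call with a plain assignment (first_by_key[key] = position) would do that.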

@Worst Here is a code snapshot
But that's the point of the logic: not appending an element with the same account code and security ID twice.

btw 4 for loops? TF?

Basically, before appending an element to the list, we check whether that element was already appended.

I don't know which data structure is most appropriate in Python, probably a dictionary, but let's keep it simple and use an array of strings.

So before the 3 for loops we declare a variable which is an empty array:

# here we're gonna store a list of strings where each string is a combination of an account code and security id pair
alreadyAppendedElements = list()

So let's assume we ran a couple of iterations and print it. It might look something like this:

["11-abc","12-fgh","13-etc",.......]

Also, before you do the if(float(master.......) we save the Account Code and SecurityID in 2 variables (actually we're gonna use 3 variables):

accountCode = account["Account Code"]
securityID = master["SecurityID"]
# this one is important
duplicateKey = accountCode + "-" + securityID

Before appending, we check whether this duplicateKey is inside alreadyAppendedElements, so you can, for example, add an and to the if like this:

if(float(master....... and duplicateKey not in alreadyAppendedElements)
# rest of the code

And after you append your stuff to positions_list, we make sure we also add the duplicateKey to alreadyAppendedElements.


Basically, this way, while you are populating positions_list, you make sure that the same account code and security ID pair has not been appended already.

i hope it makes sense :)
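Put together, the three steps might slot into the original loops roughly as follows. This is a sketch under assumptions: the three input lists are tiny made-up stand-ins for the CSV data, the names are snake_case versions of the ones above, and a set is swapped in for the list because a membership check on a set does not get slower as it grows, which matters with tens of thousands of entries.

```python
import random

# Toy stand-ins for the three lists built from the CSV files (hypothetical values).
master_dict_list = [
    {"SecurityID": "abcd", "Price": "10", "Model A": "50"},
    {"SecurityID": "wxyz", "Price": "20", "Model A": "0"},
]
data_gen_dict_list = [
    {"Security deviation %": "5"},
    {"Security deviation %": "10"},
]
account_list = [
    {"Account Code": "11", "Target Model": "Model A", "Market Value": "1000"},
    {"Account Code": "12", "Target Model": "Model A", "Market Value": "2000"},
]

already_appended = set()   # step 1: declared before the loops
positions_list = []

for master in master_dict_list:
    for data in data_gen_dict_list:
        for account in account_list:
            # step 2: build the key before the if
            duplicate_key = (account["Account Code"], master["SecurityID"])
            target_weight = float(master[account["Target Model"]])
            if target_weight > 0 and duplicate_key not in already_appended:
                max_dev = int(data["Security deviation %"])
                deviation = random.randint(-max_dev, max_dev) / 100
                security_weight = target_weight + target_weight * deviation
                security_market_value = float(account["Market Value"]) * security_weight / 100
                positions_list.append({
                    "Account Code": account["Account Code"],
                    "Security ID": master["SecurityID"],
                    "Shares": round(security_market_value / float(master["Price"])),
                    "Security Market Value": security_market_value,
                    "Security % Weight": str(security_weight),
                    "Security Price": master["Price"],
                })
                # step 3: remember the key right after appending
                already_appended.add(duplicate_key)

print(len(positions_list))
```

One side effect worth knowing: with this check in place, only the first data_gen_dict_list entry ever produces a position for a given pair, so the deviation from the second entry is never used.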
 
#10
I am using a dictionary, so the string you made, duplicateKey = accountCode + "-"+securityID, is not gonna work.

Here is what one entry looks like:

Hiding the Account Code and Security ID values because they are confidential.
The rest are just calculations, so they don't matter.

But I got the general idea and I think that can work
 

Worst

#11
Ofc it's gonna work.
If the object we're looping through rn looks like this

{'Account Code': 333, 'SecurityID': 'abc'.......}

and we do

accountCode = account["Account Code"]
# same for the securityID

then printing accountCode gives 333.

And if we do

duplicateKey = str(accountCode) + "-" + securityID

duplicateKey will become '333-abc'. (The str() is only needed when the account code is a number, like 333 here, because + can't join a number and a string.)

U can put some prints here and there to check ;)
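A quick way to run that check is sketched below, using the 333 / abc values from the example. One caveat it shows: if the account code is a number rather than a string, plain + raises a TypeError, while an f-string or a tuple key works either way. The snake_case variable names are just for this sketch.

```python
account = {"Account Code": 333}      # numeric code, as in the example above
master = {"SecurityID": "abc"}

account_code = account["Account Code"]
security_id = master["SecurityID"]

try:
    duplicate_key = account_code + "-" + security_id   # int + str
except TypeError as error:
    print("plain + fails:", error)

string_key = f"{account_code}-{security_id}"   # works for str and int alike
tuple_key = (account_code, security_id)        # no string conversion needed

print(string_key)
print(tuple_key)
```

Values read from a CSV file are normally strings already, in which case the plain concatenation works as written.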
 
#12
You should ask @Bogard who does it for a living!

I am not that good with Python, but I would have done as @U c 4 up da idly said: sort the 30000 elements and then create a new array with 15000 elements, discarding the duplicates, sort of like this:

S = ['a','b','b','a','c','e','f','f','c','d','g','g','d','e']

S.sort()

new_array = []

temp = S[0]
new_array.append(temp)

for i in range(len(S)):
    if S[i] != temp:
        new_array.append(S[i])
    temp = S[i]
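Applied to the dictionaries in positions_list, the sort-then-skip idea could be sketched with sorted() and itertools.groupby from the standard library. The sample rows are made up, and keeping the first entry of each group is an assumption about which duplicate should survive.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data; the real list comes from the CSV loop.
positions_list = [
    {"Account Code": "12", "Security ID": "abcd", "Shares": 7},
    {"Account Code": "11", "Security ID": "abcd", "Shares": 1},
    {"Account Code": "11", "Security ID": "wxyz", "Shares": 5},
    {"Account Code": "11", "Security ID": "abcd", "Shares": 2},
]

pair_key = itemgetter("Account Code", "Security ID")

# sorted() is stable, so duplicates keep their original relative order.
sorted_positions = sorted(positions_list, key=pair_key)

# groupby() groups consecutive equal keys; take the first entry of each group.
deduped = [next(group) for _, group in groupby(sorted_positions, key=pair_key)]

for position in deduped:
    print(position)
```

Unlike the check-before-append approach, this changes the order of the result, since everything ends up sorted by account code and security ID.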
 